SQL Findings

Tuesday 13 December 2011

Best SQL Date Truncator

Let's say you want to capture or query the date but discard a portion of it:
Instead of 2011-12-13 01:05:05.154, just 2011-12-13 00:00:00.000 order 2011-12-01 00:00:00.000

The reasons you would need this? Perhaps you want to filter records only by day. Perhaps you're dealing with a database that doesn't use complete datetime in any records and you want <=today.

I can think of a number of ways to do it, but I can't decide which is best.
First of all - it's fairly easy if you want to simply eliminate the time portion. A number of tools will do that, but the most primitive, I think, would be cast-convert:

select cast(convert(char(10),getdate(),120) as datetime)
1. 2011-12-13 01:05:05:154 becomes a string in this format
2. the string gets chomped to the first 10 characters 2011-12-13
3. the remaining string 2011-12-13 casts to a datetime and because no further information is given, all other values default to 00:00:00.000

But is that the most efficient way of getting the job done? Is there any advantage to using a timestyle like 112 so you don't have to manually truncate?

The reason I tend to use the method above is that I can use it to find the Floor of a date at whatever level:
To find the first day of the month:
cast(convert(char(7),getdate(),120)+'-01' as datetime)
Or the first day of the year:
select cast(convert(char(4),getdate(),120)+'-01-01' as datetime)

Or any other day or day month or time or whatever - simply hack off what you don't want and replace it with what you do want.

Of course, you would have to be careful with capturing the 31st of months - that would be more of a datediff function and I'm thinking less efficient.

But what about creating a date out of capturing the year, month, dayofmonth separately and assembling a date. Is there any reason we should be using datepart rather than using the text functions?

Monday 12 December 2011

Querying for Top 1 of group by (row_number() subqueries)

Often, there are clusters of records and the requirement is to return only one record per cluster based on a max or min of one of the values.

In the past, I've done this a number of ways

Group to get the max/min value (maxkey, shall we say) then create functions to retrieve the column values for primary keys plus maxkey within the source table
Group to get the max/min value and create subqueries on the table within the select clause referencing the primary keys plus maxkey (same as #1 when it comes to performance)
Collect all records handle the filtering of non maxkey records at the consumption level (outside of sql)

These have not been satisfying at all. But the use of row_number() under SQL2005 allows for a bit cleaner implementation. I would welcome feedback with respect to efficiency. I'd really like to believe that this is head and shoulders above the other options.

So here's the pseudo-query:

Get the entire set and capture the row_number() partitioned by your intended output, sorted by your maxkey column name (desc or asc)

Wrap this in an outer query and only capture rows with the row_number() = 1.

That's it.
So here's an example. You have a table with sellers, buyers, product and transaction date. You want to see all of the content from the table but only for the last transaction per buyer (to see what it was that was bought, who sold it).
my_table (seller_id (PK), buyer_id (PK), product_id(PK), transaction_date (PK))
-------------------------
select seller_id, buyer_id, product_id, transaction_date from
(
select row_number() over (partition by buyer_id order by transaction_date desc) rown
, seller_id, buyer_id, product_id, transaction_date
from my_table
) x where rown=1
-------------------------

So in reading a few places about expensive routines in SQL, I haven't seen issues mentioned with this type of technique. However, it's hard from this point to decide how nasty the magic of row_number() is for efficiency.

So time for feedback - is this an appropriate method for getting those top values? What is another method that could be better?

Thanks for reading...