Let's say you want to capture or query the date but discard a portion of it:
Instead of 2011-12-13 01:05:05.154, just 2011-12-13 00:00:00.000, or 2011-12-01 00:00:00.000
The reasons you would need this? Perhaps you want to filter records by day only. Perhaps you're dealing with a database that doesn't store the complete datetime in its records and you want everything <= today.
I can think of a number of ways to do it, but I can't decide which is best.
First of all - it's fairly easy if you want to simply eliminate the time portion. A number of tools will do that, but the most primitive, I think, would be cast-convert:
select cast(convert(char(10),getdate(),120) as datetime)
1. 2011-12-13 01:05:05.154 becomes a string in the style-120 format (yyyy-mm-dd hh:mi:ss)
2. the char(10) chomps the string down to its first 10 characters: 2011-12-13
3. the remaining string 2011-12-13 casts back to a datetime, and because no time information is given, the time portion defaults to 00:00:00.000
But is that the most efficient way of getting the job done? Is there any advantage to using a style like 112 (yyyymmdd) so you don't have to truncate manually?
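For what it's worth, the style-112 version would look like this; since 112 emits only the date digits and no time portion, there's nothing to chop off:

select cast(convert(varchar(8),getdate(),112) as datetime)
-- 2011-12-13 01:05:05.154 -> '20111213' -> 2011-12-13 00:00:00.000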
The reason I tend to use the method above is that I can use it to find the Floor of a date at whatever level:
To find the first day of the month:
select cast(convert(char(7),getdate(),120)+'-01' as datetime)
Or the first day of the year:
select cast(convert(char(4),getdate(),120)+'-01-01' as datetime)
Or any other day, month, or time boundary - simply hack off what you don't want and replace it with what you do want.
Of course, you would have to be careful with capturing the 31st of a month - that calls for more of a dateadd/datediff calculation, which I suspect is less efficient.
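For reference, the dateadd/datediff style of flooring I'm alluding to looks like this; each statement counts whole periods since base date 0 (1900-01-01) and adds them back, so nothing below the chosen granularity survives:

select dateadd(day, datediff(day, 0, getdate()), 0)      -- floor to midnight
select dateadd(month, datediff(month, 0, getdate()), 0)  -- first of the month
select dateadd(day, -1, dateadd(month, datediff(month, 0, getdate()) + 1, 0)) -- last day of the month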
But what about building a date by capturing the year, month, and day of month separately and assembling them? Is there any reason we should be using datepart rather than the text functions?
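For concreteness, the datepart route might look something like this (just a sketch; the zero-padding is the fiddly part):

select cast(
       cast(datepart(year,  getdate()) as char(4)) + '-'
     + right('0' + cast(datepart(month, getdate()) as varchar(2)), 2) + '-'  -- pad 1..9 to 01..09
     + right('0' + cast(datepart(day,   getdate()) as varchar(2)), 2)
     as datetime)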
Monday, 12 December 2011
Querying for Top 1 of group by (row_number() subqueries)
Often, there are clusters of records and the requirement is to return only one record per cluster based on a max or min of one of the values.
In the past, I've done this a number of ways:
- Group to get the max/min value (maxkey, shall we say), then create functions that retrieve the column values for the primary keys plus maxkey from the source table
- Group to get the max/min value and create correlated subqueries on the table within the select clause, referencing the primary keys plus maxkey (same as #1 when it comes to performance)
- Collect all records and handle the filtering of non-maxkey records at the consumption level (outside of SQL)
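For comparison, here's roughly what the grouped max/min idea (the first two bullets) boils down to as a single correlated subquery, written against the my_table example further down. Note that, unlike the row_number() version, this returns every row tied for the max:

-------------------------
select seller_id, buyer_id, product_id, transaction_date
from my_table t
where transaction_date = (select max(transaction_date)
                          from my_table t2
                          where t2.buyer_id = t.buyer_id)
-------------------------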
So here's the pseudo-query:
1. Get the entire set and capture the row_number() partitioned by your intended output, sorted by your maxkey column (desc or asc).
2. Wrap this in an outer query and keep only the rows where row_number() = 1.
That's it.
So here's an example. You have a table with sellers, buyers, products, and transaction dates. You want to see all of the columns from the table, but only for the last transaction per buyer (to see what was bought and who sold it).
my_table (seller_id (PK), buyer_id (PK), product_id(PK), transaction_date (PK))
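To make the example concrete, here's a hypothetical setup for that table (the column names match the schema above; the sample rows are invented):

-------------------------
create table my_table (
    seller_id        int      not null,
    buyer_id         int      not null,
    product_id       int      not null,
    transaction_date datetime not null,
    primary key (seller_id, buyer_id, product_id, transaction_date)
);

insert into my_table values (1, 100, 7, '2011-12-10 09:15');
insert into my_table values (2, 100, 8, '2011-12-12 14:30'); -- buyer 100's latest
insert into my_table values (1, 200, 7, '2011-12-11 11:00');
-------------------------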
-------------------------
select seller_id, buyer_id, product_id, transaction_date
from (
    -- number each buyer's transactions, newest first
    select row_number() over (partition by buyer_id
                              order by transaction_date desc) as rown,
           seller_id, buyer_id, product_id, transaction_date
    from my_table
) x
where rown = 1  -- keep only the newest transaction per buyer
-------------------------
Reading around about expensive routines in SQL, I haven't seen this type of technique called out as a problem. Still, from here it's hard to judge how much the magic of row_number() actually costs.
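One crude way to get past guessing is to run the two shapes back to back with statistics on and compare the logical reads and CPU time reported in the Messages tab (a measurement sketch, not a verdict):

-------------------------
set statistics io on;
set statistics time on;

-- shape 1: row_number()
select seller_id, buyer_id, product_id, transaction_date
from (
    select row_number() over (partition by buyer_id
                              order by transaction_date desc) as rown,
           seller_id, buyer_id, product_id, transaction_date
    from my_table
) x
where rown = 1;

-- shape 2: correlated max()
select seller_id, buyer_id, product_id, transaction_date
from my_table t
where transaction_date = (select max(transaction_date)
                          from my_table t2
                          where t2.buyer_id = t.buyer_id);

set statistics io off;
set statistics time off;
-------------------------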
So, time for feedback - is this an appropriate method for getting those top values? Is there another method that would be better?
Thanks for reading...