Comparing Dates and Timestamps in SQL
We (humans) intuitively think we understand dates and times. There’s a lot out there on the subject of how this creates problems when it comes to programming (both of those are good reads, by the way).
Within that space, something I’ve seen cause a decent amount of pain and frustration is the notion of date (i.e., YYYY-MM-DD) to timestamp (i.e., YYYY-MM-DD HH:MM:SS …) comparison in SQL. To sharpen my thinking on the subject, I wrote an initial version of this post, which I’ve expanded and tried to clean up a bit.
The TL;DR
When comparing dates and timestamps, you have to be careful about how they interact. Most SQL compliant databases are happy to do the backend conversions needed to do the comparison for you, but will - like all computers - follow a set of rules to compare the two data types.
If nothing else, it’s helpful to remember what happens when a database casts a date into a timestamp: YYYY-MM-DD becomes YYYY-MM-DD 00:00:00.000; this value is then used in the comparison. If you don’t account for the implicit cast, you’ll get results which don’t “make sense.”
A Note on Timezones
This problem we’re looking at here compounds when you factor in that timestamps can be timezone aware, and that timezones can cause two clock-times to be further apart or closer together than they’d appear just looking at the YYYY-MM-DD HH:MM:SS portion of the timestamp. For the rest of this post, I’m abstracting some of that complexity by assuming UTC.
A Concrete Example
The date five days ago is between the dates four and five days ago, but it isn’t between the current time five days ago and four days ago.
SELECT
CURRENT_DATE AS dt_today
, CURRENT_TIMESTAMP AS ts_today
, CURRENT_DATE - INTERVAL '4 DAYS' AS dt_four_days_ago
, CURRENT_DATE - INTERVAL '5 DAYS' AS dt_five_days_ago
, CURRENT_TIMESTAMP - INTERVAL '4 DAYS' AS ts_four_days_ago
, CURRENT_TIMESTAMP - INTERVAL '5 DAYS' AS ts_five_days_ago
, dt_five_days_ago BETWEEN dt_five_days_ago AND dt_four_days_ago
AS dt_comparison
, dt_five_days_ago BETWEEN ts_five_days_ago AND ts_four_days_ago
AS ts_comparison
You can see the reason for this by doing a TIMESTAMPDIFF
instead, and seeing how far outside of the BETWEEN
the value is:
SELECT
CURRENT_TIMESTAMP - INTERVAL '5 DAYS' AS ts_five_days_ago
, CURRENT_DATE - INTERVAL '5 DAYS' AS dt_five_days_ago
, TIMESTAMPDIFF(second, dt_five_days_ago, ts_five_days_ago)
AS ts_outside_range_second
… and it gets more fun. If you write a query to ask, well, how many days outside the bound is it, the answer is zero:
SELECT
CURRENT_TIMESTAMP - INTERVAL '5 DAYS' AS ts_five_days_ago
, CURRENT_DATE - INTERVAL '5 DAYS' AS dt_five_days_ago
, TIMESTAMPDIFF(day, dt_five_days_ago, ts_five_days_ago)
AS ts_outside_range_day
From this, it’s easy to see how where we get tripped up - there’s no difference between compared values at one granularity, but there is at a more nuanced granularity. Combine that with tooling which abstracts / hides away some of the complexity, and it makes it easy to get caught in a false equivalency.
Footgun: BETWEEN
is directional
It’s also worth noting that you can easily shoot yourself in the foot without even getting into the nuances of date-timestamp parsing. For example, directionality matters in BETWEEN
statements; if you write a query with a set of dates which nothing can ever be “between”, it won’t tell you you’ve made a mistake, it’ll just happily give you the (correct) results to your question.
SELECT
CURRENT_DATE - INTERVAL '4 DAYS' AS dt_four_days_ago
, CURRENT_DATE - INTERVAL '5 DAYS' AS dt_five_days_ago
, dt_five_days_ago BETWEEN dt_four_days_ago AND dt_five_days_ago
AS dt_comparison_reversed
Looker Dates and Timestamps
Looker has a really cool set of features around timestamps; dimension_group
. dimension_group
parameters save the user from having to write the truncation of timestamps of timestamp and date fields manually. But they make it really easy to fall into the trap we’ve been talking about. Indeed, Looker foresaw this, and gives the strong recommendation in their best-practice guidelines:
Don’t: Use formatted timestamps within joins. Instead, use the raw timeframe option for joining on any date or time fields. This will avoid the inclusion of casting and timezone conversion in join predicates.
You can, of course, do exactly what they recommend against on purpose. Joining on created_date
instead of created_raw
will do a text-join on TO_CHAR(created_at, “YYYY-MM-MM”) instead of on the timestamp, at which point the challenges and less-intuitive aspects of comparing dates and times becomes baked into every query Looker writes for you. The thing to be aware of here is that formatting a timestamp as a date for joining truncates the time-component, adding another layer of logic to unpack to understand the join condition (for the humans involved). Where you can avoid this extra logic (either by pre-transforming the data or doing a _raw
join), it makes the model easier to intuit.
Some fun examples of how Looker casts timestamps:
SELECT
CURRENT_DATE - INTERVAL '5 DAYS' AS dt_five_days_ago
, CURRENT_TIMESTAMP - INTERVAL '5 DAYS' AS ts_five_days_ago
, TO_CHAR(dt_five_days_ago, 'YYYY-MM-DD') AS lk_five_days_ago_date
, TO_CHAR(ts_five_days_ago, 'YYYY-MM-DD') AS lk_five_days_ago_time
, TO_CHAR(DATE_TRUNC('month', dt_five_days_ago), 'YYYY-MM') AS lk_five_days_ago_month
, TO_CHAR(
DATE_TRUNC('month', CAST(DATE_TRUNC('quarter', dt_five_days_ago) AS DATE)),
'YYYY-MM'
) AS lk_five_days_ago_quarter