Date Islands and Data Warehousing

Data warehouse design has a big component about exposing the data to users in a way that is both logical and easy to use. After all, no one wants to go into a warehouse and spend hours trying to look for the Ikea flat pack that is out of stock!

This blog post explores of the problems that date ranges and how to get around them using some Kimball-like thinking.

Building a proper warehouse is important or you’ll get lost! Photo by Pashminu Mansukhani from Pixabay

All about those dates

The classic example of dealing with date ranges is if you have multiple dimension tables with different date ranges and you need to join them up before joining with a fact table.

For example, you have two tables are per below, sourced from a CRM system:

Customer Contract Dimension table

Customer ID	Contract GUID	Contract Start Date	Contract End Date
1	12b4-39A7	2011-01-05	2016-09-08
1	676H-AABC	2017-01-01	2019-10-27
2	91AF-110A	2011-01-08	2014-10-15
3	368A-192F	2017-05-04	2017-05-31

As above, the table has date ranges for contracts, where Customer ID is foreign key to the above table, with a surrogate key to uniquely identify each row.

Customer Dimension table

Status ID	Customer ID	Name	Status	Valid From	Valid To	Current Flag
1-2012-01-01	1	Mark	Gold	2012-01-01	2013-10-15	N
1-2013-10-16	1	Mark	Bronze	2013-10-16	2018-05-09	Y
1-2018-05-10	1	Mark	Silver	2018-05-10	9999-12-31	Y
3-2017-05-04	3	Jane	Silver	2017-05-04	9999-12-31	Y

The Customer Dimension table uses a Kimball-style Slowly Changing Dimension (Type 2).

Purchase Orders Fact table

Contract ID	Product	Purchase Date
12b4-39A7	Flat Pack Shelf	2011-12-17
91AF-110A	Compact Chair	2013-10-16

This fact table has all the purchases made, where the Contract ID is the primary key and part of the composite key in the Customer Contract table above.

Now, what if you wanted to query what the status of a customer was when the purchase order was made?

So many dates! I need an anchor!

With a date range problem, it is easier to visualise the ranges.

If we took the contract date ranges of one contract (12b4-39A7) and visualised it against the customer status, you would get this:

From this visualisation, you can see that if you want to check whether a status has at least a partial date range within the contract, it is a matter of comparing:

Red Line -> Contract From Date with Status To Date
Green Line -> Contract To Date with Status From Date

From there, it’s a simple comparison!

If the Status To Date is AFTER the Contract From Date, it’s covered!
If the Status From Date is BEFORE the Contract To Date, it’s covered!

A new dimension!

Now that we have the logic, we need to build a new dimension that has the appropriate surrogate key which brings both tables together! Using Kimball best practices, this will be an incrementing integer!

This means if you want to see a customer contract and status for a purchase date, you only need to do one join.

Customer Contract Status table

SK	Status ID	Contract GUID	From Date	To Date
11	1-2012-01-01	12b4-39A7	2012-01-01	2013-10-15
12	1-2013-10-16	12b4-39A7	2013-10-16	2016-09-08

As bolded in the above new dimension table, now there’s one date range we can reference - if you want to see the periods for the specific customer status and contracts, you just need to use the keys to join back to the above tables!

No need to do anymore date range joins! Also without the customer ID as a key, you don’t need to worry about accidental duplicates resulting from joining customer ID to multiple statuses/contracts.

No more messy date joins! Photo by Pixabay from Pexels

For example, if customer ID 1 was used to join status and contracts, you would get at least 3 rows - potentially even 6 (2 rows in customer contract x 3 rows in customer status).

Bringing it into SQL!

Using the above logic, to get a result with all three tables joined, you would do:

Closing thoughts

Navigating the date ranges is not too difficult when you have an anchor! In my case, my go-to is always that visualisation!

Date Islands - A Kimball Approach

Navigating the islands without sinking