Date Islands - A Kimball Approach

Navigating the islands without sinking

Posted by Albert Cheng on 08 February 2020

Last Updated: 18 May 2020

Date Islands and Data Warehousing

Data warehouse design has a big component about exposing the data to users in a way that is both logical and easy to use. After all, no one wants to go into a warehouse and spend hours trying to look for the Ikea flat pack that is out of stock!

This blog post explores of the problems that date ranges and how to get around them using some Kimball-like thinking.

Building a proper warehouse is important or you’ll get lost! Photo by Pashminu Mansukhani from Pixabay

All about those dates

The classic example of dealing with date ranges is if you have multiple dimension tables with different date ranges and you need to join them up before joining with a fact table.

For example, you have two tables are per below, sourced from a CRM system:

Customer Contract Dimension table

Customer ID Contract GUID Contract Start Date Contract End Date
1 12b4-39A7 2011-01-05 2016-09-08
1 676H-AABC 2017-01-01 2019-10-27
2 91AF-110A 2011-01-08 2014-10-15
3 368A-192F 2017-05-04 2017-05-31

As above, the table has date ranges for contracts, where Customer ID is foreign key to the above table, with a surrogate key to uniquely identify each row.

Customer Dimension table

Status ID Customer ID Name Status Valid From Valid To Current Flag
1-2012-01-01 1 Mark Gold 2012-01-01 2013-10-15 N
1-2013-10-16 1 Mark Bronze 2013-10-16 2018-05-09 Y
1-2018-05-10 1 Mark Silver 2018-05-10 9999-12-31 Y
3-2017-05-04 3 Jane Silver 2017-05-04 9999-12-31 Y

The Customer Dimension table uses a Kimball-style Slowly Changing Dimension (Type 2).

Purchase Orders Fact table

Contract ID Product Purchase Date
12b4-39A7 Flat Pack Shelf 2011-12-17
91AF-110A Compact Chair 2013-10-16

This fact table has all the purchases made, where the Contract ID is the primary key and part of the composite key in the Customer Contract table above.

Now, what if you wanted to query what the status of a customer was when the purchase order was made?

So many dates! I need an anchor!

With a date range problem, it is easier to visualise the ranges.

If we took the contract date ranges of one contract (12b4-39A7) and visualised it against the customer status, you would get this:

From this visualisation, you can see that if you want to check whether a status has at least a partial date range within the contract, it is a matter of comparing:

  • Red Line -> Contract From Date with Status To Date
  • Green Line -> Contract To Date with Status From Date

From there, it’s a simple comparison!

  • If the Status To Date is AFTER the Contract From Date, it’s covered!
  • If the Status From Date is BEFORE the Contract To Date, it’s covered!

A new dimension!

Now that we have the logic, we need to build a new dimension that has the appropriate surrogate key which brings both tables together! Using Kimball best practices, this will be an incrementing integer!

This means if you want to see a customer contract and status for a purchase date, you only need to do one join.

Customer Contract Status table

SK Status ID Contract GUID From Date To Date
11 1-2012-01-01 12b4-39A7 2012-01-01 2013-10-15
12 1-2013-10-16 12b4-39A7 2013-10-16 2016-09-08

As bolded in the above new dimension table, now there’s one date range we can reference - if you want to see the periods for the specific customer status and contracts, you just need to use the keys to join back to the above tables!

No need to do anymore date range joins! Also without the customer ID as a key, you don’t need to worry about accidental duplicates resulting from joining customer ID to multiple statuses/contracts.

No more messy date joins! Photo by Pixabay from Pexels

For example, if customer ID 1 was used to join status and contracts, you would get at least 3 rows - potentially even 6 (2 rows in customer contract x 3 rows in customer status).

Bringing it into SQL!

Using the above logic, to get a result with all three tables joined, you would do:

Closing thoughts

Navigating the date ranges is not too difficult when you have an anchor! In my case, my go-to is always that visualisation!