Need a Date Dimension Table? Why not BYO!

A dimension date that fits into your very own Jupyter Notebook!

Posted by Albert Cheng on 11 February 2020

Last Updated: 19 May 2020

Time series data - too many dates

Analysing time series data and sales/finance data often involves dealing with a lot of dates! While it’s not hard to do date differences, aggregations in SQL and even Pandas (Python), the moment you need to start dealing with weekends, public holidays, it becomes a lot of coding! For example, which days in my dataset are work days? Lots of IF statements to figure that out!

Wouldn’t it be nice to navigate finance data easily? Photo by energepic.com from Pexels

The solution - build your own date dimension!

What on earth is a date dimension?

A date dimension is basically a small table that has a row for every single day (or even down to the hour), for a particular period. In addition, it includes other columns, such as whether it’s a weekend, fiscal year, etc.

Having this date dimension table lets you do filtering very simply - e.g. to find all the weekends, you just join on date and filter where ‘Day’ NOT IN [‘Saturday’, ‘Sunday’].

Date Day Fiscal Year EndOfMonth
2018-01-01 Monday 2017 2018-01-31
2018-01-02 Tuesday 2017 2018-01-31
2018-01-03 Wednesday 2017 2018-01-31
... ... ... ...
2018-12-29 Saturday 2018 2018-12-31
2018-12-30 Sunday 2018 2018-12-31

All the navigation you need in one spot!

BYO date dimension!

Generally in a data warehouse, there will be a date dimension - in fact, it’s something Kimball mentions too.

But what if you don’t have a database handy? The solution - just build your own! An example below in Python using Pandas:

It even includes a workday YN column - very useful for dealing with finance time series data!

For example, if you have some finance time series data, you can find all the missing work days like below:

You can even export out the date dimension, such as df.to_csv(‘mydatedim.csv’) and then import it into your local friendly database!

Closing thoughts

So if you need a dimension date in a pinch to have an anchor when dealing with time series data, the Python BYO date dimension is a good way to go!