Time series data - too many dates
Analysing time series data and sales/finance data often involves dealing with a lot of dates! While it’s not hard to do date differences, aggregations in SQL and even Pandas (Python), the moment you need to start dealing with weekends, public holidays, it becomes a lot of coding! For example, which days in my dataset are work days? Lots of IF statements to figure that out!
The solution - build your own date dimension!
What on earth is a date dimension?
A date dimension is basically a small table that has a row for every single day (or even down to the hour), for a particular period. In addition, it includes other columns, such as whether it’s a weekend, fiscal year, etc.
Having this date dimension table lets you do filtering very simply - e.g. to find all the weekends, you just join on date and filter where ‘Day’ NOT IN [‘Saturday’, ‘Sunday’].
Date | Day | Fiscal Year | EndOfMonth |
---|---|---|---|
2018-01-01 | Monday | 2017 | 2018-01-31 |
2018-01-02 | Tuesday | 2017 | 2018-01-31 |
2018-01-03 | Wednesday | 2017 | 2018-01-31 |
... | ... | ... | ... |
2018-12-29 | Saturday | 2018 | 2018-12-31 |
2018-12-30 | Sunday | 2018 | 2018-12-31 |
All the navigation you need in one spot!
BYO date dimension!
Generally in a data warehouse, there will be a date dimension - in fact, it’s something Kimball mentions too.
But what if you don’t have a database handy? The solution - just build your own! An example below in Python using Pandas:
It even includes a workday YN column - very useful for dealing with finance time series data!
For example, if you have some finance time series data, you can find all the missing work days like below:
You can even export out the date dimension, such as df.to_csv(‘mydatedim.csv’) and then import it into your local friendly database!
Closing thoughts
So if you need a dimension date in a pinch to have an anchor when dealing with time series data, the Python BYO date dimension is a good way to go!