Albert Cheng

Data Engineering. Machine Learning. Data & Cloud. Data & Law.


Blog Posts


Target Identified! How Advanced Fingerprinting is Used to Identify Cyber Attackers

Identifying the bad actors!





Unleashing the Power of AI - Learning to Learn with Little Real-World Examples

Overcoming real-world limitations of data availability





The Machine Learning Minefield - How to Avoid Getting Hit by Machine Learning Poisoning

Data corruption that destroys machine learning systems





The Art of Keeping Secrets - Data Anonymization and Synthetic Data Explained

How to safely handle personally identifiable information





How the Pandemic Impacted Human Traffic

Diving into mobility traffic and seeing how COVID-19 impacted human movement





X-Ray Vision for Machine Learning - How Model Interpretability demystifies the ML black-box

The 'why' is just as important as the 'what'





A guide on Airflow best practices

Go with the flow!





Special Sauce of ML - Heuristics and Feature Engineering

The best way to improve ML isn't always in the algorithm





From Chaos to Control - Using Automated Testing for Big Data Pipelines

Like seat belts, once you have 'em, you appreciate them a lot!





DataOps Part 2 - using data to find data

Data-driven techniques apply to finding data as well.





DataOps - using data to manage data

The way to manage big data is using data





Data science/analytics pitfalls Part 2 - a focus on machine learning

This time focusing on machine learning!





Exploring some common data science/analytics pitfalls

It's a trap!





From Zero to Kimball - A Guide and Tips on Good Data Warehousing Practices

Exploring Kimball dimensional modelling best practices





Unlocking the Mystery of Google Analytics and Advertising

Why do I keep seeing ads about shoes!?





Property vs Shares - Which One is the Better Investment?

Let's see which one wins!





Lock It Up - Keeping Your Data Safe with Security Best Practices

Prevention is definitely better than cure!





Sailing Through Data Privacy Waters - A Recap on How Data Protection Laws Work

A practical guide to privacy law in the context of data work





Event, Action! A Beginner's Guide to Event-Driven Architecture

Fully embracing the scalability and flexibility of cloud serverless infrastructure





Time Series Forecasting - Different to Regular Machine Learning

An example of using data science to predict electricity usage





Navigating the date islands and finding the gaps - Part 2

Mind the Gap!





Need a Date Dimension Table? Why not BYO!

A date dimension that fits into your very own Jupyter Notebook!





Date Islands - A Kimball Approach

Navigating the islands without sinking





The Secret Sauce of Scalability - Building data pipelines without servers!

A Guide to Serverless Data Pipelines without the big bill!





The Prophets have Spoken! A Quick Exploration of Weather Forecasting

An experiment using Facebook's Prophet library





How it all started - From TV to Data Visualisation

A data visualisation journey from humble beginnings





A Peek Under the Hood - this Website's Architecture and Analytics

Explaining how this website's infrastructure works