Articles


14 September 2024 · Reading Time: 6 min

Simple Questions, Hard Answers

When business stakeholders ask data analysts seemingly simple questions, they often expect a quick, straightforward answer. On the surface, it seems like a piece of cake. But in the messy reality of data, what appears to be a simple question can quickly turn into a multi-layered onion of a problem, with each layer revealing more complexity and ambiguity. Example from the insurance domain: Let’s take a practical example from the insurance industry....

1 December 2020 · Reading Time: 6 min

Data Leakage in Machine Learning

Recently, I read a thread on Twitter about several Machine Learning papers that contained severe cases of data leakage. The authors of the papers seemed unaware of this phenomenon and therefore trained models that appeared to perform exceptionally well. Unfortunately, this was mainly due to data leakage. Not many beginners are aware of this problem and, in my opinion, not many courses emphasize it early enough. In this post, I would therefore like to cover what you need to know about data leakage and some ways to prevent it....

1 September 2020 · Reading Time: 9 min

Detect Forged Banknotes with a Logistic Regression

Counterfeit money is a real problem for both individuals and businesses. Counterfeiters constantly find new ways and techniques to produce fake banknotes that are essentially indistinguishable from real money. At least to the human eye! Identifying forged banknotes is a typical example of a binary classification task in Machine Learning. If we have enough data on both real and forged banknotes, we can use it to train a model that classifies new banknotes as either real or fake....

1 July 2020 · Reading Time: 7 min

Linear and Logistic Regression

Linear and logistic regression are among the most elementary algorithms for supervised learning. Supervised learning describes the situation where we deal with labelled data, which means that each input comes with a known target variable. Although both have the word “regression” in their name, only one of them is typically used for solving regression problems! Let’s see how they work! Linear Regression: Linear regression is possibly the easiest, most intuitive way of making a quantitative prediction....

1 May 2020 · Reading Time: 9 min

How to Create a Racing Bar Chart with Python

After reading this article from Pratap Vardhan with great interest, I wanted to build my own version of a Bar Chart Race that is smoother and a bit more beautiful. The biggest improvement is the interpolation (or augmentation) of the available data points in order to make the animation smoother. Here is the Bar Chart Race we are going to build in this article: For the purpose of this demonstration, we are going to use a GDP per capita forecast dataset provided by the OECD....