Articles


1 December 2020 · Reading Time: 6 min

Data Leakage in Machine Learning

Recently, I read a thread on Twitter about several Machine Learning papers that contained severe cases of data leakage. The authors of the papers seemed unaware of the phenomenon and reported models that performed exceptionally well; unfortunately, that performance was mainly due to data leakage. Not many beginners are aware of this problem, and in my opinion, not many courses emphasize it early enough. In this post, I would therefore like to tell you everything you need to know about data leakage and some ways to prevent it....
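As a minimal sketch of one common leakage pattern (not taken from the post itself, and using synthetic data), consider fitting a scaler on the full dataset before splitting, so information from the test set leaks into training:

```python
# Illustrative sketch of a common data-leakage pattern (synthetic data, assumed setup).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Leaky: the scaler is fit on all data, so the test set influences preprocessing.
X_scaled = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_scaled, y, random_state=0)

# Leak-free: split first, then fit the scaler on the training split only.
X_tr_raw, X_te_raw, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr_raw)
model = LogisticRegression().fit(scaler.transform(X_tr_raw), y_tr)
print("test accuracy:", model.score(scaler.transform(X_te_raw), y_te))
```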

1 September 2020 · Reading Time: 9 min

Detect Forged Banknotes with a Logistic Regression

Counterfeit money is a real problem for both individuals and businesses. Counterfeiters constantly find new ways and techniques to produce fake banknotes that are essentially indistinguishable from real money, at least to the human eye! Identifying forged banknotes is a typical example of a binary classification task in Machine Learning. If we have enough data on both real and forged banknotes, we can use it to train a model that classifies new banknotes as either real or fake....
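As a hedged sketch of the kind of binary classifier the post describes (the features and data below are synthetic placeholders, not the article's actual banknote dataset):

```python
# Minimal sketch of a logistic-regression classifier for real vs. forged banknotes.
# The two features are hypothetical stand-ins; the article presumably uses real measurements.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
genuine = rng.normal(loc=[2.0, 1.0], scale=1.0, size=(200, 2))
forged = rng.normal(loc=[-1.5, -0.5], scale=1.0, size=(200, 2))
X = np.vstack([genuine, forged])
y = np.array([0] * 200 + [1] * 200)  # 0 = genuine, 1 = forged

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```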

1 July 2020 · Reading Time: 7 min

Linear and Logistic Regression

Linear and Logistic regression are among the most elementary algorithms for supervised learning. Supervised Learning describes the situation where we deal with labelled data, meaning that we have inputs together with a known target variable. Despite the fact that both have the word “regression” in their name, only one of them is typically used for solving regression problems! Let’s see how they work! Linear regression is possibly the easiest, most intuitive way of making a quantitative prediction....
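A minimal sketch contrasting the two models on toy data (not the article's own examples; the scikit-learn setup here is an assumption):

```python
# Linear regression predicts a continuous target; logistic regression predicts a class probability.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=(100, 1))

# Linear regression: recover y = 3x + 2 from noisy observations.
y_cont = 3 * x[:, 0] + 2 + rng.normal(scale=1.0, size=100)
lin = LinearRegression().fit(x, y_cont)
print("slope:", lin.coef_[0], "intercept:", lin.intercept_)

# Logistic regression: classify whether x is greater than 5.
y_bin = (x[:, 0] > 5).astype(int)
log = LogisticRegression().fit(x, y_bin)
print("P(class=1 | x=7):", log.predict_proba([[7.0]])[0, 1])
```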

1 May 2020 · Reading Time: 9 min

How to Create a Racing Bar Chart with Python

After reading this article by Pratap Vardhan with great interest, I wanted to build my own version of a Bar Chart Race that is smoother and a bit more beautiful. The biggest improvement is the interpolation (or augmentation) of the available data points to make the animation smoother. Here is the Bar Chart Race we are going to build in this article: For the purpose of this demonstration, we are going to use a GDP per capita forecast dataset provided by the OECD....
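A minimal sketch of the interpolation idea (the column names and number of in-between steps are assumptions, not the article's actual code): yearly values are reindexed onto a finer grid and filled in linearly, so each animation frame changes only a little.

```python
# Interpolate yearly values to generate smoother in-between animation frames (illustrative only).
import pandas as pd

df = pd.DataFrame(
    {"year": [2020, 2021, 2022], "A": [10.0, 15.0, 30.0], "B": [20.0, 18.0, 25.0]}
).set_index("year")

steps = 10  # frames to generate between consecutive years (assumed value)
new_index = pd.Index(
    [y + i / steps for y in df.index[:-1] for i in range(steps)] + [df.index[-1]],
    name="year",
)
smooth = df.reindex(df.index.union(new_index)).interpolate(method="index").loc[new_index]
print(smooth.head(12))
```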

1 May 2020 · Reading Time: 5 min

k-Nearest Neighbors

k-Nearest Neighbors, or k-NN as I am going to call it from now on, is one of the simplest algorithms for solving classification tasks. It can be used for regression problems as well, but I am going to focus on the more common use case of classification in this post. In a nutshell, k-NN assigns a new data point to the class that the majority of its k nearest neighbors in the training set belong to....
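A from-scratch sketch of that majority-vote rule (illustrative only, not the post's code; the toy points and labels are made up):

```python
# Minimal k-NN classifier: assign a point to the majority class among its k nearest training points.
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    distances = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distance to every training point
    nearest = np.argsort(distances)[:k]                  # indices of the k closest points
    votes = Counter(y_train[nearest])                    # count class labels among the neighbors
    return votes.most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array(["red", "red", "blue", "blue"])
print(knn_predict(X_train, y_train, np.array([0.8, 0.9]), k=3))  # -> "blue"
```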