Understanding the Log Loss Metric

Understanding how the log loss metric works and coding it from scratch

Temiloluwa Awoyele
5 min read · Oct 28, 2020
Photo by Kunal Shinde on Unsplash

The logarithmic loss (log loss) penalizes our model for being uncertain about correct predictions and heavily penalizes it for making confident wrong predictions.

In this article, we will understand how the log loss metric works and we will be coding the log loss metric from scratch ourselves.

Log Loss, as the name implies, is a loss metric, and loss is not something we want, so we keep it as small as possible. In other words, we minimize the log loss. Its value can lie anywhere between 0 and infinity.
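To get a feel for that range, consider the negative log of a few probabilities: the per-point loss is the negative log of the probability assigned to the true class, so a confident correct prediction costs almost nothing, while a confident wrong one blows up.

    import numpy as np

    print(-np.log(1.0))    # 0.0 -> probability 1 on the true class costs nothing
    print(-np.log(0.5))    # ~0.693 -> an unsure prediction pays a moderate price
    print(-np.log(1e-10))  # ~23.03 -> near-zero probability on the true class is punished hard
    # -np.log(0.0) would be infinity (NumPy returns inf with a divide-by-zero warning)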

The log loss metric is mainly for binary classification problems of 0's and 1's, but it can be extended to multi-class problems by one-hot encoding the targets and treating the task as a multi-label classification problem. Log loss also works well with binary multi-label classification problems.

In a binary classification task, when we make our predictions using the .predict_proba( ) method, it returns an array with two columns: the first holds the probabilities of each prediction being 0, while the second holds the probabilities of each prediction being 1. Here, we are interested in the second column, the probabilities of the predictions being 1.

The output looks like the array below, where the left column holds the 0 probabilities and the right column holds the 1 probabilities.
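As a minimal sketch that produces such an array (the toy dataset and logistic regression model here are placeholders, assumed for illustration):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Hypothetical toy setup, just to show the shape of the output
    X, y = make_classification(n_samples=10, random_state=0)
    model = LogisticRegression().fit(X, y)

    proba = model.predict_proba(X)
    print(proba)        # shape (10, 2): column 0 -> P(class 0), column 1 -> P(class 1)
    print(proba[:, 1])  # the column we feed into log loss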

We can also see that log loss takes probabilities as its predicted values, unlike most metrics, which use the predicted class directly.

To import the log loss metric from scikit-learn:
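    from sklearn.metrics import log_loss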

The formula for log loss is

-log P(yt | yp) = -(yt * log(yp) + (1 - yt) * log(1 - yp))

where yp is the predicted probability of a point belonging to class 1, yt is the target or label of the point, either 0 or 1, and log is the natural logarithm (the base scikit-learn uses).

Log loss takes the predicted probabilities from our model along with the ground truth of each point and applies the above formula. Let's break the formula down.

The part before the equals sign

-log P(yt | yp)

could be read in plain English as “Negative Logarithm of yt given yp”.

The first part of the equation:

yt * log(yp)

takes the log of the predicted probability and multiplies it by yt.

The second part of the equation:

(1 - yt) * log(1 - yp)

The log(1 - yp) factor subtracts the predicted probability from 1 before taking its log, and the (1 - yt) factor subtracts the class, either 1 or 0, from 1; the product of the two is then taken.

Let’s take 4 examples:

Example 1: let yt = 1 and yp = 0.88

The first part of the equation:

yt * log(yp) = 1 * log(0.88) = -0.1278

The second part of the equation:

(1 - yt) * log(1 - yp) = (1 - 1) * log(1 - 0.88) = 0 * log(0.12) = 0

Putting the two equations together

log loss = -(-0.1278 + 0) = 0.1278

Example 2: let yt = 0 and yp = 0.21

The first part of the equation:

yt * log(yp) = 0 * log(0.21) = 0

The second part of the equation:

(1 - yt) * log(1 - yp) = (1 - 0) * log(1 - 0.21) = 1 * log(0.79) = -0.2357

Putting the two equations together

log loss = -(0 + (-0.2357)) = 0.2357

Example 3: let yt = 1 and yp = 0.10

The first part of the equation:

yt * log(yp) = 1 * log(0.10) = -2.3026

The second part of the equation:

(1 - yt) * log(1 - yp) = (1 - 1) * log(1 - 0.10) = 0 * log(0.90) = 0

Putting the two equations together

log loss = -(-2.3026 + 0) = 2.3026

Example 4: let yt = 0 and yp = 0.90

The first part of the equation:

yt * log(yp) = 0 * log(0.90) = 0

The second part of the equation:

(1 - yt) * log(1 - yp) = (1 - 0) * log(1 - 0.90) = 1 * log(0.10) = -2.3026

Putting the two equations together

log loss = -(0 + (-2.3026)) = 2.3026
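We can double-check all four examples at once with NumPy, using the same yt/yp pairs as above:

    import numpy as np

    yt = np.array([1, 0, 1, 0])
    yp = np.array([0.88, 0.21, 0.10, 0.90])

    # Per-point loss: -(yt * log(yp) + (1 - yt) * log(1 - yp))
    losses = -(yt * np.log(yp) + (1 - yt) * np.log(1 - yp))
    print(losses)  # ~[0.1278, 0.2357, 2.3026, 2.3026], matching the worked examples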

You should have noticed by now that one part of the equation always returns 0, depending on the target: when yt = 1 the second part vanishes, and when yt = 0 the first part does.

The formula is then applied to each point, and the average of the per-point results is returned: log loss = -(1/N) * sum over all N points of (yt * log(yp) + (1 - yt) * log(1 - yp)).

This can also be extended to multi-class problems, but the target will have to be one-hot encoded.
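As a quick illustration (the labels and probabilities below are made up), scikit-learn's log_loss handles the multi-class case directly when you pass one probability column per class, which is equivalent to one-hot encoding the targets and summing yt * log(yp) across classes:

    from sklearn.metrics import log_loss

    # Three points, three classes; each row of probabilities sums to 1
    y_true = [0, 2, 1]
    y_prob = [[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6],
              [0.2, 0.6, 0.2]]
    print(log_loss(y_true, y_prob))  # mean of -log(0.7), -log(0.6), -log(0.6)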

So now we'll implement the log loss metric in code, both with scikit-learn's metrics module and with a custom function of our own, coded from scratch with the help of NumPy.

Implementation of Log Loss from scratch and comparison to Scikit-Learn Log Loss Metric
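Below is a minimal sketch of such an implementation; the function name custom_log_loss and the eps clipping value are assumptions made for this sketch, while log_loss is scikit-learn's own implementation. It applies the formula above with NumPy and compares the two results:

    import numpy as np
    from sklearn.metrics import log_loss

    def custom_log_loss(y_true, y_prob, eps=1e-15):
        """Binary log loss, coded from scratch with NumPy."""
        y_true = np.asarray(y_true, dtype=float)
        # Clip so we never take log(0), mirroring what scikit-learn does internally
        y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
        point_losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
        return point_losses.mean()

    y_true = [1, 0, 1, 0]
    y_prob = [0.88, 0.21, 0.10, 0.90]

    print(custom_log_loss(y_true, y_prob))  # ~1.2422
    print(log_loss(y_true, y_prob))         # same value from scikit-learn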

We can see that our result is exactly the same as scikit-learn's, right down to the decimals. Good!!

Pros:

  1. Optimizing for log loss encourages well-calibrated probability estimates, not just correct class labels.

Cons:

  1. It is tricky to interpret, since its value can range from 0 to infinity with no fixed upper bound.

Thanks for Reading

I hope I've given you some understanding of the log loss metric. A little bit of motivation would be appreciated, and you can do that by giving a clap 👏. I am also open to questions and suggestions. You can share this with friends or post it on your favorite social media platform so that someone who needs it might stumble on it.

You can reach me on:

LinkedIn: https://www.linkedin.com/in/temiloluwa-awoyele/

Twitter: https://twitter.com/temmyzeus100

Github: https://github.com/temmyzeus
