The dataset contains a log of 286,500 user interactions. It includes details about the user, the interactions, timestamps, the software used by the user, and so on. The schema of the data is as follows:

A snapshot of the dataset is shown below:

Data Cleaning

We clean up the dataset by removing records with a missing userId or sessionId, as well as records where userId is empty. This leaves 278,154 records; the removed records belonged to users who had not yet logged in or were looking to sign up. We also convert the registration date and the current time from timestamps to a more readable date-time format. The page attribute lists all of the possible user actions in the dataset.
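The cleaning step can be sketched in plain Python as follows. This is a minimal illustration, not the code used for the analysis: the sample records are made up, the field names `ts` and `registration` are assumptions (only userId and sessionId are named above), and we assume the timestamps are epoch milliseconds.

```python
from datetime import datetime, timezone

# Hypothetical log records; "ts" and "registration" are assumed field names.
records = [
    {"userId": "30", "sessionId": 29, "ts": 1538352117000, "registration": 1538173362000},
    {"userId": "", "sessionId": 8, "ts": 1538355745000, "registration": None},
    {"userId": None, "sessionId": 8, "ts": 1538355807000, "registration": None},
    {"userId": "9", "sessionId": None, "ts": 1538352180000, "registration": 1538331630000},
]


def is_valid(record):
    """Keep a record only if userId is present and non-empty and sessionId is present."""
    user_id = record.get("userId")
    return user_id not in (None, "") and record.get("sessionId") is not None


def to_datetime(ms):
    """Convert epoch milliseconds (assumed unit) to a UTC datetime; pass None through."""
    if ms is None:
        return None
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


cleaned = []
for record in filter(is_valid, records):
    row = dict(record)  # copy so the raw record is left untouched
    row["event_time"] = to_datetime(row["ts"])
    row["registration_time"] = to_datetime(row["registration"])
    cleaned.append(row)
```

On a dataset of this size the same filter and conversion would normally be expressed in a dataframe library, but the logic is the same.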

We define churn as the event where a user performs the Cancellation Confirmation action. There are two subscription levels in the service: Free and Paid. A user can upgrade or downgrade their subscription level. We create flags to identify when a user downgrades their account and whether a user churned.
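One way to build these per-user flags is sketched below. The events are made up, and the page value `Submit Downgrade` is an assumed label for the downgrade action; only the Cancellation Confirmation action is named in the definition above.

```python
from collections import defaultdict

# Hypothetical events; only "Cancellation Confirmation" comes from the churn definition.
events = [
    {"userId": "1", "page": "NextSong"},
    {"userId": "1", "page": "Submit Downgrade"},
    {"userId": "2", "page": "NextSong"},
    {"userId": "2", "page": "Cancellation Confirmation"},
    {"userId": "3", "page": "NextSong"},
]

CHURN_PAGE = "Cancellation Confirmation"
DOWNGRADE_PAGE = "Submit Downgrade"  # assumed label for the downgrade action

# Every user starts with both flags at 0; a flag flips to 1 once the action is seen.
flags = defaultdict(lambda: {"churned": 0, "downgraded": 0})
for event in events:
    user_flags = flags[event["userId"]]
    if event["page"] == CHURN_PAGE:
        user_flags["churned"] = 1
    elif event["page"] == DOWNGRADE_PAGE:
        user_flags["downgraded"] = 1
```

Looking up `flags[event["userId"]]` for every event ensures that users who never churn or downgrade still appear in the result with both flags set to 0.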


We look at the distribution of a few features to identify patterns in the data.

The distribution of churned users shows that the dataset is heavily imbalanced. While this is typical for churn analysis problems, we need to account for it during modeling, either by balancing the dataset through under-sampling or by choosing metrics that better account for the minority class.
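Both options can be illustrated with a small sketch. The labels below are made up, and the F1 score is used here as one example of a metric that is sensitive to the minority class; it is our assumption, not necessarily the metric chosen for the final model.

```python
import random


def undersample(labels, seed=42):
    """Return sorted indices with the majority class randomly cut to the minority size."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted((positives, negatives), key=len)
    return sorted(minority + rng.sample(majority, len(minority)))


def f1_score(y_true, y_pred):
    """F1 for the positive (churned) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 2 churned users out of 10.
labels = [1] * 2 + [0] * 8
balanced_indices = undersample(labels)

# A model that never predicts churn looks accurate but has an F1 of zero.
always_negative = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_negative)) / len(labels)
f1 = f1_score(labels, always_negative)
```

The last lines show why accuracy alone is misleading here: predicting "no churn" for everyone scores 80% accuracy on these labels while identifying none of the churned users.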

There are many more free users of our music streaming service than paid users. We should also check whether users of a given subscription level are more prone to churn.

We notice that paid users are more likely to churn than free users. This could be an important factor for predicting user churn.
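A comparison like this comes down to a churn rate per group. The sketch below uses made-up per-user rows, so its numbers are purely illustrative and are not the results of the analysis; the field names `level`, `gender`, and `churned` are assumptions, and we assume each user is assigned a single subscription level (for example, the last one observed).

```python
from collections import Counter

# Hypothetical per-user rows; values are illustrative, not taken from the dataset.
users = [
    {"userId": "1", "level": "paid", "gender": "M", "churned": 1},
    {"userId": "2", "level": "paid", "gender": "F", "churned": 0},
    {"userId": "3", "level": "free", "gender": "M", "churned": 0},
    {"userId": "4", "level": "free", "gender": "F", "churned": 0},
]


def churn_rate_by(rows, key):
    """Fraction of churned users within each distinct value of rows[key]."""
    totals = Counter()
    churned = Counter()
    for row in rows:
        totals[row[key]] += 1
        churned[row[key]] += row["churned"]
    return {group: churned[group] / totals[group] for group in totals}


rate_by_level = churn_rate_by(users, "level")
rate_by_gender = churn_rate_by(users, "gender")
```

Because the grouping key is a parameter, the same helper can be reused for other categorical features such as gender.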

We have more male users than female users, but the numbers don't differ by much, so we can consider our dataset mostly balanced with respect to gender.

We notice that male users have a higher propensity to churn than female users. We could also analyze this trend with subscription level in the mix.

We see that free-tier users listen to far fewer songs per session than paid users. Also, users who did not churn listen to somewhat more songs per session on average, especially the paid-tier users.
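The songs-per-session comparison can be computed roughly as follows. The events are made up, and both the page value `NextSong` and the per-event `level` and `churned` fields are assumptions made for this sketch.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical events; "NextSong" is an assumed page value for playing a song.
events = [
    {"userId": "1", "sessionId": 1, "page": "NextSong", "level": "paid", "churned": 0},
    {"userId": "1", "sessionId": 1, "page": "NextSong", "level": "paid", "churned": 0},
    {"userId": "1", "sessionId": 1, "page": "NextSong", "level": "paid", "churned": 0},
    {"userId": "1", "sessionId": 2, "page": "NextSong", "level": "paid", "churned": 0},
    {"userId": "2", "sessionId": 5, "page": "Home", "level": "free", "churned": 1},
    {"userId": "2", "sessionId": 5, "page": "NextSong", "level": "free", "churned": 1},
]

songs_per_session = Counter()  # (userId, sessionId) -> number of songs played
session_group = {}             # (userId, sessionId) -> (level, churned)
for event in events:
    key = (event["userId"], event["sessionId"])
    session_group[key] = (event["level"], event["churned"])
    if event["page"] == "NextSong":
        songs_per_session[key] += 1

# Average the per-session counts within each (level, churned) group.
counts_by_group = defaultdict(list)
for key, group in session_group.items():
    counts_by_group[group].append(songs_per_session[key])  # 0 if no songs were played

avg_songs = {group: mean(counts) for group, counts in counts_by_group.items()}
```

Sessions are tracked separately from song counts so that a session with no songs still contributes a zero to its group's average.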

Finally, let's look at the most popular artists on our streaming service.

