Python pandas is a powerful and widely used library for data analysis.
It comes with 200+ functions and methods, making data manipulation and transformation easy. However, knowing all of these functions and using them where they fit in actual work isn’t feasible.
One of the common tasks in data manipulation is converting a column of continuous numerical values into a column of discrete or categorical values. Pandas has two built-in functions for this, and they can certainly save you a few minutes.
You can use this type of data transformation for a variety of applications, such as grouping data, analyzing data by discrete groups, or visualizing data using histograms.
Recently, I calculated the Herfindahl-Hirschman Index (HHI) to understand the market concentration of multiple brands. So in a pandas DataFrame, I had a column with continuous HHI values for all brands. Ultimately, I wanted to convert this column to a discrete one to categorize each brand as having low, medium, or high market concentration. That’s where I got the inspiration for this story.
Without knowing these built-in pandas functions, you might need to write multiple if-else and for statements to get the same work done.
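To make that concrete, here is a minimal sketch of what the manual approach could look like. The HHI values, the thresholds (1500 and 2500), and the `categorize` helper are all hypothetical and chosen only for illustration; they are not taken from my project.

```python
# Hypothetical HHI values, one per brand (illustrative only)
hhi_values = [800, 1400, 1900, 2600, 4000]


def categorize(hhi):
    """Map a single HHI value to a label using assumed thresholds."""
    if hhi <= 1500:
        return "low"
    elif hhi <= 2500:
        return "medium"
    else:
        return "high"


# Loop over every value and collect its label
categories = []
for value in hhi_values:
    categories.append(categorize(value))

print(categories)
```

It works, but every new category means another branch, and the thresholds end up buried inside the function.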
Therefore, here you’ll explore two super-useful built-in pandas functions along with interesting examples (including my project), which will supercharge your data analysis and save you a couple of minutes.
Often in an analytics project, you need to convert a column with continuous values into another column with discrete values.
In other words, you sort the continuous data into several categories, called buckets or bins. You can do so either by defining the bin edges, i.e. specifying the minimum and maximum values of each bin, or by specifying the number of bins.
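As a rough sketch of both options, the snippet below uses `pandas.cut`, which accepts either a list of bin edges or an integer number of bins. The brands, HHI values, bin edges, and labels are made up for illustration, and I am assuming `pandas.cut` is a fair stand-in for the kind of function meant here.

```python
import pandas as pd

# Hypothetical data: made-up brands and HHI values (illustrative only)
df = pd.DataFrame({
    "brand": ["A", "B", "C", "D", "E"],
    "hhi": [800, 1400, 1900, 2600, 4000],
})

labels = ["low", "medium", "high"]

# Option 1: define the bin edges yourself (assumed thresholds).
# By default each bin includes its right edge: (0, 1500], (1500, 2500], ...
edges = [0, 1500, 2500, 10000]
df["by_edges"] = pd.cut(df["hhi"], bins=edges, labels=labels)

# Option 2: only give the number of bins.
# pandas splits the observed range of the column into 3 equal-width bins.
df["by_count"] = pd.cut(df["hhi"], bins=3, labels=labels)

print(df)
```

Notice that the two options can disagree: with explicit edges, brand D (2600) falls into the high bin, while with three equal-width bins over the range 800 to 4000 it lands in the medium bin.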
Depending on your purpose of splitting a continuous series into a discrete one, you can…