How to Apply Functions in Pandas
Before we start: This Python tutorial is a part of our series of Python Package tutorials. The steps explained ahead are related to the sample project introduced here.
The Pandas apply() function lets you manipulate columns and rows in a DataFrame. Let’s see how. First, we read our DataFrame from a CSV file and display it.
import pandas as pd
Report_Card = pd.read_csv("Grades.csv")
Let’s assume we need to create a column called Retake, which indicates whether a student needs to retake an exam. For that, we need to check whether a student’s grade in a class is lower than 45. We can achieve this with the following code snippet:
Report_Card["Retake"] = Report_Card["Grades"].apply(lambda val: "Yes" if val < 45 else "No")
What we do here is simply assign a new column to the DataFrame called Retake, which is built from applying the lambda function above to evaluate every row of the Grades column. Since the Retake column didn’t exist before, Pandas creates it for us and assigns a Yes or No value for each row.
In this way, we have created a whole new column indicating whether students need to retake the exam for the respective classes.
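As a minimal, self-contained sketch of the same idea, the snippet below builds a small DataFrame by hand instead of reading Grades.csv. The student names and grade values are made up for illustration; only the Grades column name and the threshold of 45 come from the example above.

```python
import pandas as pd

# Hypothetical sample data standing in for Grades.csv
report_card = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cem"],
    "Grades": [44, 45, 90],
})

# Apply the lambda to every value in the Grades column
report_card["Retake"] = report_card["Grades"].apply(
    lambda val: "Yes" if val < 45 else "No"
)

print(report_card)
```

Note that a grade of exactly 45 yields No, because the comparison is strictly "lower than 45".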
Another example is to apply a reduction operation to the whole DataFrame. Let’s say we want to get the total credits and grades received by all students so we can determine the average. We would achieve this with the following code snippet:
import numpy as np
credits = Report_Card[["Credits","Grades"]]
credits.apply(np.sum)
Here we are using the sum function from the NumPy package to add up all rows of the Credits and Grades columns separately. The result is a Series holding one total per column.
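To make the reduction concrete, here is a small sketch on made-up data. The values are hypothetical, and dividing the totals by the number of rows is only one possible reading of "determine the average", not necessarily the calculation the tutorial had in mind.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real values live in Grades.csv
report_card = pd.DataFrame({
    "Credits": [4, 3, 5],
    "Grades": [80, 40, 90],
})

subset = report_card[["Credits", "Grades"]]

# Default axis=0: np.sum receives each column in turn
totals = subset.apply(np.sum)

# One way to turn the totals into per-column averages (an assumption)
averages = totals / len(subset)

print(totals)
print(averages)
```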
We could apply the same summation across the columns of each row instead by passing axis=1 to apply().
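A minimal sketch of the row-wise variant, assuming it is selected by passing axis=1 to apply(). The data is made up, and adding credits to grades has no real meaning here; the point is only to show which direction the function runs in.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
report_card = pd.DataFrame({
    "Credits": [4, 3, 5],
    "Grades": [80, 40, 90],
})

# axis=1: np.sum receives each row, so Credits and Grades are added together
row_sums = report_card[["Credits", "Grades"]].apply(np.sum, axis=1)

print(row_sums)
```

The result has one entry per row of the DataFrame rather than one entry per column.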
A more complex example will serve to show how you can use multiple values from multiple columns as arguments in the apply function. Let’s assume students can receive a credit bonus if they get an average grade higher than 75 for Mathematics, Geography or German. The following code block will:
- Define the formula for calculating the credit bonus
- Create a new column called Bonus
- Populate each row of the Bonus column with the bonus value, or 0 if no bonus applies
def bonus(lecture, grade, credits):
    if lecture in ["Mathematics", "Geography", "German"] and grade > 75:
        return (grade - 75) / 10 * credits
    else:
        return 0

Report_Card["Bonus"] = Report_Card.apply(
    lambda row: bonus(row["Lectures"], row["Grades"], row["Credits"]), axis=1)
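The bonus calculation can be tried without Grades.csv using the sketch below. The rows are invented for illustration; the column names Lectures, Grades and Credits and the bonus formula follow the code above.

```python
import pandas as pd

# Hypothetical sample data with the columns the bonus function expects
report_card = pd.DataFrame({
    "Lectures": ["Mathematics", "History", "German", "Geography"],
    "Grades": [85, 95, 75, 80],
    "Credits": [4, 3, 2, 4],
})

def bonus(lecture, grade, credits):
    # Only the three listed lectures qualify, and only above a grade of 75
    if lecture in ["Mathematics", "Geography", "German"] and grade > 75:
        return (grade - 75) / 10 * credits
    return 0

# axis=1 passes each row to the lambda, which unpacks the three columns
report_card["Bonus"] = report_card.apply(
    lambda row: bonus(row["Lectures"], row["Grades"], row["Credits"]),
    axis=1,
)

print(report_card)
```

In this sample, History earns no bonus despite the high grade because it is not one of the qualifying lectures, and German earns none because a grade of exactly 75 does not pass the strict "higher than 75" check.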
You now know how to apply functions to DataFrames using Python’s Pandas library.