Skip to Content
ModulesPandas TutorialData Aggregation and Grouping

Data Aggregation and Grouping in Pandas

Pandas provides powerful tools for grouping and aggregating data, allowing you to analyze datasets effectively. This page covers groupby operations, built-in aggregation methods, and applying custom functions.


GroupBy Operations

The groupby() method splits data into groups based on a column or set of columns, enabling aggregation or transformation on each group.

Example: Grouping Data by a Column

import pandas as pd # Sample DataFrame data = { 'Category': ['Electronics', 'Electronics', 'Clothing', 'Clothing', 'Groceries'], 'Sales': [1000, 1500, 800, 1200, 500], 'Quantity': [5, 7, 10, 4, 20] } df = pd.DataFrame(data) # Group by 'Category' grouped = df.groupby('Category') print(grouped)

Output:

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x...>

To view the grouped data:

for name, group in grouped: print(name) print(group)

Aggregations

Aggregation refers to performing operations like sum, mean, count, etc., on grouped data.

Built-In Aggregations

# Aggregate sales by category sales_sum = grouped['Sales'].sum() print(sales_sum)

Output:

Category Clothing 2000 Electronics 2500 Groceries 500 Name: Sales, dtype: int64

Multiple Aggregations

You can apply multiple aggregation functions to a column or across multiple columns.

# Apply multiple aggregations agg_result = grouped.agg({'Sales': ['sum', 'mean'], 'Quantity': 'sum'}) print(agg_result)

Output:

Sales Quantity sum mean sum Category Clothing 2000 1000.0 14 Electronics 2500 1250.0 12 Groceries 500 500.0 20

Applying Custom Functions

Custom aggregation functions can be applied using the agg() method or apply() method.

Example: Custom Aggregation with agg()

# Custom function to calculate range (max - min) def range_func(series): return series.max() - series.min() # Apply custom aggregation df_range = grouped['Sales'].agg(range_func) print(df_range)

Output:

Category Clothing 400 Electronics 500 Groceries 0 Name: Sales, dtype: int64

Example: Custom Transformation with apply()

# Apply a custom function to each group def normalize(group): return group - group.mean() normalized_sales = grouped['Sales'].apply(normalize) print(normalized_sales)

Output:

0 -250.0 1 250.0 2 -200.0 3 200.0 4 0.0 Name: Sales, dtype: float64

Try It Yourself

Problem 1: Aggregate Grouped Data

Given the following DataFrame:

import pandas as pd data = { 'Department': ['HR', 'HR', 'IT', 'IT', 'Sales'], 'Salary': [50000, 60000, 70000, 80000, 55000], 'Experience': [5, 7, 10, 12, 6] } df = pd.DataFrame(data)
  1. Group by Department.
  2. Calculate the total and average salary for each department.

Show Code

grouped = df.groupby('Department') # Total and average salary result = grouped['Salary'].agg(['sum', 'mean']) print(result)

Problem 2: Apply Custom Functions

  1. Use the same DataFrame.
  2. Create a custom function to calculate the square of each Salary value within groups and apply it.

Show Code

# Custom function def square_values(series): return series ** 2 squared_salaries = grouped['Salary'].apply(square_values) print(squared_salaries)

Pyground

Play with Python!

Output:

Last updated on