Real-World Examples with Pandas

Pandas is a go-to tool for real-world data analysis, with functions for loading, cleaning, aggregating, and summarizing data efficiently. This page works through practical use cases and case studies to show Pandas in action.


Pandas in Data Analysis

Example 1: Analyzing Sales Data

Scenario: A company wants to analyze its sales data to find trends and identify top-performing products.

import pandas as pd

# Sample sales data
data = {
    "Product": ["A", "B", "C", "A", "B", "C"],
    "Region": ["North", "South", "East", "West", "North", "South"],
    "Sales": [150, 200, 300, 100, 250, 400]
}
df = pd.DataFrame(data)

# Analyze total sales by product
product_sales = df.groupby("Product")["Sales"].sum()
print("Total Sales by Product:\n", product_sales)

# Analyze total sales by region
region_sales = df.groupby("Region")["Sales"].sum()
print("\nTotal Sales by Region:\n", region_sales)

Output:

Total Sales by Product:
 Product
A    250
B    450
C    700
Name: Sales, dtype: int64

Total Sales by Region:
 Region
East     300
North    400
South    600
West     100
Name: Sales, dtype: int64
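
The scenario also asks for the top-performing products, which the totals above only imply. A minimal sketch of one way to surface them, reusing the same sample data; the variable names `ranked`, `top_product`, and `by_product_region` are illustrative and not part of the original example:

```python
import pandas as pd

# Same sample sales data as in Example 1
df = pd.DataFrame({
    "Product": ["A", "B", "C", "A", "B", "C"],
    "Region": ["North", "South", "East", "West", "North", "South"],
    "Sales": [150, 200, 300, 100, 250, 400],
})

# Rank products by total sales, highest first
ranked = df.groupby("Product")["Sales"].sum().sort_values(ascending=False)
print(ranked)

# idxmax() returns the index label (here, the product) with the largest total
top_product = ranked.idxmax()
print("Top product:", top_product)

# Cross-tabulate products against regions; combinations with no sales become 0
by_product_region = df.pivot_table(
    index="Product", columns="Region", values="Sales",
    aggfunc="sum", fill_value=0,
)
print(by_product_region)
```

The pivot table is one possible way to see which region drives each product's total; whether it is the right view depends on how many products and regions the real data contains.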

Example 2: Handling Missing Data in Weather Reports

Scenario: A meteorological department needs to clean and analyze temperature data with missing values.

import pandas as pd

# Sample temperature data with missing values
data = {
    "City": ["Delhi", "Mumbai", "Chennai", "Kolkata"],
    "Temperature": [40, None, 35, None]
}
df = pd.DataFrame(data)

# Fill missing values with the average of the known temperatures
df["Temperature"] = df["Temperature"].fillna(df["Temperature"].mean())
print(df)

Output:

      City  Temperature
0    Delhi         40.0
1   Mumbai         37.5
2  Chennai         35.0
3  Kolkata         37.5
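
Filling with the mean hides which readings were actually measured. The sketch below counts the gaps first and records which rows were filled, using the same sample data; the `Imputed` column and the `missing_count` variable are hypothetical additions for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Delhi", "Mumbai", "Chennai", "Kolkata"],
    "Temperature": [40, None, 35, None],
})

# Count the gaps before changing anything
missing_count = int(df["Temperature"].isna().sum())
print("Missing readings:", missing_count)

# Record which rows will be filled (hypothetical helper column)
df["Imputed"] = df["Temperature"].isna()

# mean() skips missing values by default, so this is the mean of 40 and 35
df["Temperature"] = df["Temperature"].fillna(df["Temperature"].mean())
print(df)
```

Keeping a flag like this makes it possible to exclude filled values from later statistics if they would distort the result.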

Case Studies

Case Study 1: Customer Segmentation for Marketing

Problem: A retail company wants to segment its customers based on their purchasing behavior.

import pandas as pd

# Sample customer data
data = {
    "CustomerID": [1, 2, 3, 4],
    "Purchases": [500, 300, 700, 200],
    "Region": ["North", "South", "North", "East"]
}
df = pd.DataFrame(data)

# Segment customers into low, medium, and high spenders.
# Bins are right-inclusive by default: (0, 300], (300, 600], (600, 1000]
df["Segment"] = pd.cut(
    df["Purchases"],
    bins=[0, 300, 600, 1000],
    labels=["Low", "Medium", "High"]
)
print(df)

Output:

   CustomerID  Purchases Region Segment
0           1        500  North  Medium
1           2        300  South     Low
2           3        700  North    High
3           4        200   East     Low
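
Customer 2 lands in "Low" because `pd.cut` includes the right edge of each bin by default, so a purchase of exactly 300 falls in the first bin. The sketch below shows how `right=False` moves that boundary and how the segments might then be summarized; the names `summary` and `left_closed` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Purchases": [500, 300, 700, 200],
    "Region": ["North", "South", "North", "East"],
})
bins = [0, 300, 600, 1000]
labels = ["Low", "Medium", "High"]

# Default: right-inclusive bins, so 300 is "Low"
df["Segment"] = pd.cut(df["Purchases"], bins=bins, labels=labels)

# Customers per segment and their average purchase value.
# observed=True limits the result to categories that actually occur.
summary = df.groupby("Segment", observed=True)["Purchases"].agg(["count", "mean"])
print(summary)

# right=False makes bins left-inclusive instead: [0, 300), [300, 600), [600, 1000)
left_closed = pd.cut(df["Purchases"], bins=bins, labels=labels, right=False)
print(left_closed.tolist())
```

Which edge to include is a business decision; the point is only that the choice should be made deliberately rather than left to the default.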

Case Study 2: Time Series Analysis for Energy Usage

Problem: An energy company wants to analyze hourly energy usage data to detect peaks and valleys.

import pandas as pd

# Sample energy usage data
# freq="h" means hourly; the uppercase "H" alias is deprecated as of pandas 2.2
data = {
    "Time": pd.date_range("2023-01-01", periods=6, freq="h"),
    "Usage": [100, 120, 150, 90, 80, 200]
}
df = pd.DataFrame(data)

# Resample data to daily total usage
daily_usage = df.resample("D", on="Time").sum()
print(daily_usage)

Output:

            Usage
Time
2023-01-01    740
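
The daily total does not by itself show where the peaks and valleys are. A minimal sketch of one way to locate them in the hourly data, assuming the single highest and lowest readings are what is wanted; `usage`, `peak_time`, `valley_time`, and `change` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    # A Timedelta step avoids depending on a particular frequency alias
    "Time": pd.date_range("2023-01-01", periods=6, freq=pd.Timedelta(hours=1)),
    "Usage": [100, 120, 150, 90, 80, 200],
})

# Index the readings by time so lookups return timestamps
usage = df.set_index("Time")["Usage"]

# idxmax()/idxmin() return the timestamp of the highest/lowest reading
peak_time = usage.idxmax()
valley_time = usage.idxmin()
print("Peak:", peak_time, usage.max())
print("Valley:", valley_time, usage.min())

# Hour-over-hour change; large positive jumps mark the ramp into a peak
change = usage.diff()
print(change)
```

For longer series with several local peaks, a rolling window or a dedicated peak-finding routine would likely be more appropriate than a single global maximum.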

Try It Yourself

Problem 1: Analyze Movie Ratings

Create a DataFrame containing movie names, their genres, and ratings. Group the movies by genre and find the average rating for each genre.

import pandas as pd

# Sample movie data
data = {
    "Movie": ["Movie1", "Movie2", "Movie3", "Movie4"],
    "Genre": ["Action", "Comedy", "Action", "Drama"],
    "Rating": [4.5, 3.8, 4.7, 4.0]
}
df = pd.DataFrame(data)

# Group by genre and calculate average rating
genre_ratings = df.groupby("Genre")["Rating"].mean()
print(genre_ratings)

Problem 2: Analyze Employee Salaries

Create a DataFrame with employee IDs, departments, and salaries. Calculate the total and average salary for each department.

import pandas as pd

# Sample employee data
data = {
    "EmployeeID": [1, 2, 3, 4],
    "Department": ["HR", "Finance", "HR", "IT"],
    "Salary": [50000, 70000, 55000, 65000]
}
df = pd.DataFrame(data)

# Group by department and calculate total and average salary
total_salary = df.groupby("Department")["Salary"].sum()
avg_salary = df.groupby("Department")["Salary"].mean()
print("Total Salary by Department:\n", total_salary)
print("\nAverage Salary by Department:\n", avg_salary)
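
As an alternative sketch, both statistics can be computed in a single pass with `agg`, which returns one table with a column per aggregation. The name `salary_summary` is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4],
    "Department": ["HR", "Finance", "HR", "IT"],
    "Salary": [50000, 70000, 55000, 65000],
})

# One groupby, two aggregations: columns are named "sum" and "mean"
salary_summary = df.groupby("Department")["Salary"].agg(["sum", "mean"])
print(salary_summary)
```

This keeps the totals and averages aligned by department in one DataFrame, which can be easier to read or export than two separate Series.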
