
Introduction to Pandas

Pandas is one of the most powerful and widely used Python libraries for data manipulation and analysis. Its user-friendly data structures and rich set of functions make it a favorite among data scientists and analysts.


What is Pandas?

Pandas is an open-source Python library designed for:

  • Manipulating and analyzing structured data.
  • Processing datasets that fit in memory efficiently, using vectorized operations instead of explicit Python loops.
  • Providing easy-to-use data structures such as Series (1D) and DataFrame (2D).

It is built on top of NumPy and integrates well with other popular libraries such as Matplotlib and scikit-learn.
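The short sketch below makes the NumPy connection concrete. The values are arbitrary illustrations. It shows that a Series exposes its data as a NumPy array, and that NumPy functions can be applied to a Series directly.

```python
import numpy as np
import pandas as pd

# A Series keeps its values in a NumPy array
s = pd.Series([1.5, 2.5, 3.5])

# .to_numpy() returns those values as a NumPy ndarray
values = s.to_numpy()
print(type(values))   # <class 'numpy.ndarray'>
print(values.dtype)   # float64

# NumPy functions work on a Series and return a Series with the same index
roots = np.sqrt(pd.Series([4.0, 9.0, 16.0]))
print(roots.tolist())  # [2.0, 3.0, 4.0]
```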


Key Features and Use Cases

Key Features:

  1. Data Structures: Offers Series and DataFrame for handling one-dimensional and two-dimensional data.
  2. Indexing: Enables intuitive indexing and selection of data.
  3. Data Cleaning: Simplifies handling missing data and duplicates.
  4. Data Aggregation: Provides powerful GroupBy operations for summarizing data.
  5. File Handling: Supports reading from and writing to multiple file formats (CSV, Excel, SQL, JSON, etc.).
  6. Time Series: Offers tools for working with time-series data.
  7. Visualization: Provides basic plotting through the .plot() method, which uses Matplotlib by default.
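Here is a minimal sketch of two of these features working together: data cleaning and GroupBy aggregation. The sales records, column names, and fill value are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records: row 3 duplicates row 0, and one amount is missing
sales = pd.DataFrame({
    "region": ["North", "South", "North", "North", "South"],
    "amount": [100.0, 250.0, np.nan, 100.0, 150.0],
})

# Data cleaning: drop exact duplicate rows, then replace the missing amount with 0
clean = sales.drop_duplicates().fillna({"amount": 0.0})

# Data aggregation: total amount per region using GroupBy
totals = clean.groupby("region")["amount"].sum()
print(totals)
```

Filling with 0 is just one choice. Depending on the data, dropping the row with dropna() or filling with a column mean may be more appropriate.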

Use Cases:

  • Data Wrangling: Cleaning and reshaping raw data for analysis.
  • Exploratory Data Analysis (EDA): Summarizing datasets to uncover patterns.
  • Data Aggregation: Summarizing sales, user behavior, and other metrics.
  • Time Series Analysis: Stock market data, IoT data, etc.
  • ETL Pipelines: Extracting, transforming, and loading data efficiently.
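For time series analysis, the sketch below uses a small series of hypothetical daily closing prices (illustrative values, not real market data). It shows two common operations: downsampling with resample() and smoothing with a rolling mean.

```python
import pandas as pd

# Hypothetical daily closing prices for one stock (illustrative values only)
index = pd.date_range("2024-01-01", periods=6, freq="D")
prices = pd.Series([20.0, 21.0, 22.0, 23.0, 24.0, 25.0], index=index)

# Downsample: average price over consecutive 2-day windows
per_2d = prices.resample("2D").mean()
print(per_2d)

# Smooth with a 3-day rolling mean (the first two values are NaN)
trend = prices.rolling(window=3).mean()
print(trend)
```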

Installing Pandas

To install Pandas, use the following command:

pip install pandas

If you’re using Anaconda, Pandas comes pre-installed. To update:

conda update pandas

Verifying Installation

After installation, verify the version using:

import pandas as pd
print(pd.__version__)

Importing and Basic Usage

To use Pandas, import it as pd (a convention in the Python community):

import pandas as pd

Creating a Series

A Series is a one-dimensional array-like object.

# Create a Series
s = pd.Series([10, 20, 30, 40])
print(s)

Output:

0    10
1    20
2    30
3    40
dtype: int64
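The left column in the output above is the index, which pandas fills with 0, 1, 2, … by default. Below is a small sketch with a custom index. The subject names and scores are hypothetical. It shows label-based access, position-based access, and boolean filtering.

```python
import pandas as pd

# A Series with a custom (string) index instead of the default 0..n-1 labels
scores = pd.Series([88, 92, 79], index=["math", "physics", "chemistry"])

print(scores["physics"])    # label-based access: 92
print(scores.iloc[0])       # position-based access: 88
print(scores[scores > 80])  # boolean filtering keeps math and physics
print(scores.mean())        # average of the three scores
```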

Creating a DataFrame

A DataFrame is a two-dimensional table-like data structure.

# Create a DataFrame
data = {
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore']
}
df = pd.DataFrame(data)
print(df)

Output:

    Name  Age       City
0  Anika   25      Delhi
1  Rahul   30     Mumbai
2  Sneha   22  Bangalore
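Once a DataFrame exists, most everyday work is selecting and deriving data. The sketch below rebuilds the same table and demonstrates several of these operations: picking a column, filtering rows, selecting columns with .loc, and adding a computed column. The AgeIn5Years column is a hypothetical example.

```python
import pandas as pd

data = {
    "Name": ["Anika", "Rahul", "Sneha"],
    "Age": [25, 30, 22],
    "City": ["Delhi", "Mumbai", "Bangalore"],
}
df = pd.DataFrame(data)

# Select a single column (returns a Series)
ages = df["Age"]

# Filter rows with a boolean condition
older = df[df["Age"] > 23]

# Select all rows and two columns by label with .loc
names_and_cities = df.loc[:, ["Name", "City"]]

# Add a derived column (hypothetical: age five years from now)
df["AgeIn5Years"] = df["Age"] + 5

print(older)
print(df.shape)  # (3, 4)
```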

Reading a CSV File

Pandas makes it easy to load datasets from files.

# Reading a CSV file
df = pd.read_csv('data.csv')
print(df.head())  # Display the first 5 rows
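If you don't have a data.csv file at hand, you can try read_csv on in-memory text instead. It accepts file-like objects as well as paths. The sketch below assumes hypothetical CSV contents and wraps them in io.StringIO. It then writes the DataFrame back out with to_csv.

```python
import io
import pandas as pd

# Stand-in for the contents of a CSV file (hypothetical data)
csv_text = """Name,Age,City
Anika,25,Delhi
Rahul,30,Mumbai
Sneha,22,Bangalore
"""

# read_csv accepts a path or any file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())  # first 5 rows (here, all 3)
print(df.dtypes)  # Age is inferred as an integer column

# Write back to CSV text; index=False leaves out the row labels
out = df.to_csv(index=False)
print(out)
```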

Try It Yourself

Problem 1: Create a Series

Create a Pandas Series from a list of integers [1, 2, 3, 4, 5] and display it.


import pandas as pd

# Create a Series
series = pd.Series([1, 2, 3, 4, 5])
print(series)
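As an optional follow-up, you can try a few operations on the same Series. Arithmetic is applied element-wise, and summary statistics are available as methods. The choice of operations here is just an illustration.

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5])

# Arithmetic applies to every element, with no explicit loop
doubled = series * 2
print(doubled.tolist())   # [2, 4, 6, 8, 10]

# Built-in summary methods
print(series.sum())       # 15
print(series.mean())      # 3.0
print(series.describe())  # count, mean, std, min, quartiles, max
```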

Problem 2: Create a DataFrame

Create a DataFrame with columns Product, Price, and Stock. Add sample data for 3 products and display the DataFrame.


import pandas as pd

# Create a DataFrame
data = {
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [80000, 30000, 20000],
    'Stock': [50, 150, 100]
}
df = pd.DataFrame(data)
print(df)
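As an optional extension, you can derive a new column from existing ones. The sketch below adds a hypothetical StockValue column (Price multiplied by Stock). It then sorts the products by that column and totals it.

```python
import pandas as pd

data = {
    "Product": ["Laptop", "Phone", "Tablet"],
    "Price": [80000, 30000, 20000],
    "Stock": [50, 150, 100],
}
df = pd.DataFrame(data)

# Hypothetical derived column: element-wise Price * Stock per product
df["StockValue"] = df["Price"] * df["Stock"]

# Sort by the new column, highest value first
by_value = df.sort_values("StockValue", ascending=False)
print(by_value)

# Total value of all stock
total = df["StockValue"].sum()
print(total)  # 10500000
```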
