Pandas

Introduction Reading Time: 12 min

Description

Pandas is a fast, powerful, and flexible open-source data analysis and manipulation library for Python. It provides two primary data structures:

  • Series: a one-dimensional labeled array.
  • DataFrame: a two-dimensional labeled, tabular structure.

Pandas makes it easy to perform indexing, filtering, grouping, merging, and cleaning on structured data.

Prerequisites

  • Understanding of basic Python
  • Familiarity with NumPy
  • Concept of rows and columns in tables

Examples

Here's a simple example that creates a Series and a DataFrame, then indexes, filters, and groups the data:


import pandas as pd

# Creating a Series
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print("Series:\n", s)

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print("DataFrame:\n", df)

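# Indexing
# iloc selects by position; df.loc[0] returns the same row here
# only because the default index labels are 0, 1, 2
print("First row:\n", df.iloc[0])
print("Name column:\n", df['Name'])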

# Filtering
print("Age > 25:\n", df[df['Age'] > 25])

# Grouping (every age is unique in this data, so each group holds one row)
grouped = df.groupby('Age')
for age, group in grouped:
    print(f"\nGroup: Age = {age}\n", group)
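
The example above covers indexing, filtering, and grouping. Merging and cleaning, also mentioned in the description, could look roughly like the sketch below. The `people` and `cities` tables and their values are hypothetical, made up for illustration, and filling a missing age with the median is only one of several reasonable cleaning choices.

```python
import pandas as pd

# Hypothetical data for illustration only
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Bob"],
    "Age": [25, 30, None, 30],
})
cities = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "City": ["Paris", "Berlin", "Madrid"],
})

# Cleaning: drop exact duplicate rows, then fill the missing age with the median
cleaned = people.drop_duplicates().copy()
cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].median())

# Merging: left join on the shared "Name" column
merged = cleaned.merge(cities, on="Name", how="left")
print(merged)
```

A left join keeps every row from `cleaned` even when no matching city exists; `how="inner"` would keep only the names present in both tables.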
          

Real-World Applications

Data Analysis

Load, clean, and analyze structured data from CSV files, Excel workbooks, and databases
Compute descriptive statistics, trends, and summaries
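
A minimal sketch of that load, clean, and summarize workflow is shown here. The CSV content, column names, and the decision to treat missing unit counts as zero are all assumptions for illustration; with a real file, a path would be passed to `pd.read_csv` instead of an in-memory buffer.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = """product,units,price
pen,10,1.5
notebook,,3.0
pen,4,1.5
"""

sales = pd.read_csv(io.StringIO(csv_text))

# Clean: treat missing unit counts as zero (an assumption for this sketch)
sales["units"] = sales["units"].fillna(0).astype(int)

# Analyze: add a derived column, then summarize the numeric columns
sales["revenue"] = sales["units"] * sales["price"]
print(sales.describe())
print("Total revenue:", sales["revenue"].sum())
```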

Image & Signal Processing

Representing pixel data as arrays
Applying filters via convolution

Finance

Time-series stock data analysis
Portfolio performance calculations
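
As a rough illustration of time-series analysis, the sketch below computes daily returns, a moving average, and a total return for a single price series. The dates and prices are invented for the example, and real portfolio calculations would typically also account for weights, dividends, and fees.

```python
import pandas as pd

# Hypothetical closing prices, made up for illustration
dates = pd.date_range("2024-01-01", periods=5, freq="D")
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0], index=dates, name="close")

# Day-over-day fractional change; the first value is NaN (no previous day)
daily_returns = prices.pct_change()

# 3-day moving average; the first two values are NaN (window not yet full)
rolling_mean = prices.rolling(window=3).mean()

# Simple total return over the whole period
total_return = prices.iloc[-1] / prices.iloc[0] - 1

print(daily_returns)
print(rolling_mean)
print(f"Total return: {total_return:.2%}")
```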

Where Pandas Is Applied

Healthcare

  • Analyzing patient records and lab test results
  • Grouping patients by age, disease, or treatment plans

E-commerce

  • Analyzing user purchase behavior
  • Grouping orders by product/category

Machine Learning

  • Underlying numerical operations in models like linear regression, PCA
  • Data preprocessing and augmentation

Robotics

  • Coordinate transformations and movement control using arrays
  • Sensor data processing with broadcasting

Resources

Data Science topic PDF

Harvard Data Science Course

Free online course from Harvard covering data science foundations

Interview Questions

➤ Pandas is a Python library that provides data structures like Series and DataFrame for efficient manipulation and analysis of structured data.

➤ A Series is a 1-dimensional labeled array, whereas a DataFrame is a 2-dimensional table of data with rows and columns.
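
One way to see the difference is to pull a column out of a DataFrame in two ways. The small table here is only an illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

ages = df["Age"]       # single brackets: the column comes back as a Series
subset = df[["Age"]]   # double brackets: a one-column DataFrame

print(type(ages).__name__, ages.ndim, ages.shape)
print(type(subset).__name__, subset.ndim, subset.shape)
```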

➤ You can filter rows with boolean indexing:
df[df['Age'] > 30] returns only the rows where Age is greater than 30.
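
The sketch below spells out the intermediate boolean mask and shows how two conditions can be combined. It reuses the names and ages from the earlier example; the variable names `mask`, `older`, and `in_range` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

# The comparison produces one True/False value per row
mask = df["Age"] > 30
older = df[mask]

# Combine conditions with & (and) or | (or); each condition needs parentheses
in_range = df[(df["Age"] >= 25) & (df["Age"] < 35)]

print(mask.tolist())
print(older)
print(in_range)
```

Python's `and` and `or` keywords do not work element-wise on a Series, which is why `&` and `|` are used instead.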

➤ groupby() is used to split data into groups based on a column and then apply functions like sum, mean, or count to each group.
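
As a small sketch of this split, apply, and combine pattern, the example below groups a hypothetical `orders` table by category and computes all three aggregates at once. The data is invented for illustration.

```python
import pandas as pd

# Hypothetical orders, made up for illustration
orders = pd.DataFrame({
    "category": ["books", "toys", "books", "toys", "books"],
    "amount": [12.0, 30.0, 8.0, 10.0, 10.0],
})

# Split by category, then compute several aggregates of "amount" per group
summary = orders.groupby("category")["amount"].agg(["sum", "mean", "count"])
print(summary)
```

The result is a DataFrame indexed by category, with one column per aggregate.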

➤ .loc[] selects by label (index and column names), while .iloc[] selects by integer position.
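
The difference is easiest to see when the index is not the default 0, 1, 2. The sketch below uses hypothetical string labels "x", "y", "z" for that purpose.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=["x", "y", "z"],  # hypothetical labels for illustration
)

by_label = df.loc["y", "Age"]   # row labeled "y", column labeled "Age"
by_position = df.iloc[1, 1]     # second row, second column

# Slicing differs too: label slices include the end, position slices do not
label_slice = df.loc["x":"y"]   # rows "x" and "y"
pos_slice = df.iloc[0:1]        # only the first row

print(by_label, by_position)
print(len(label_slice), len(pos_slice))
```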