Value counts, groupby summaries

Description

value_counts() and groupby() are fundamental Pandas operations for summarizing categorical and grouped data:

  • value_counts(): counts the occurrences of each unique value in a Series and returns them sorted from most to least frequent.
  • groupby(): splits the data into groups based on one or more columns, then applies aggregation functions (such as mean(), sum(), or count()) to summarize each group.

Both are staples of exploratory data analysis (EDA).

Prerequisites

  • Python basics and the Pandas library
  • Understanding of DataFrames and Series
  • Familiarity with aggregation functions (mean(), sum(), etc.)

Examples

The following example builds a small employee dataset, counts how many employees belong to each department, and then summarizes salaries per department:


import pandas as pd

# Sample dataset
data = {
    'Department': ['HR', 'HR', 'IT', 'IT', 'IT', 'Sales', 'Sales'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
    'Salary': [50000, 52000, 60000, 58000, 62000, 45000, 47000]
}

df = pd.DataFrame(data)

# Value counts: Frequency of each department
dept_counts = df['Department'].value_counts()
print("Value Counts:\n", dept_counts)

# Groupby summaries: Average salary by department
salary_summary = df.groupby('Department')['Salary'].mean()
print("\nAverage Salary by Department:\n", salary_summary)

# Groupby with multiple aggregates
grouped_summary = df.groupby('Department')['Salary'].agg(['mean', 'sum', 'max', 'min', 'count'])
print("\nGroupBy with Multiple Aggregations:\n", grouped_summary)
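
A couple of options extend the calls above. The sketch below rebuilds the same sample data and shows normalize=True for proportions instead of raw counts, plus named aggregation for custom column labels. The output column names (avg_salary, headcount) are illustrative choices, not part of the original example.

```python
import pandas as pd

# Same departments and salaries as the example above
df = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT', 'IT', 'Sales', 'Sales'],
    'Salary': [50000, 52000, 60000, 58000, 62000, 45000, 47000],
})

# Share of rows per department instead of raw counts
dept_share = df['Department'].value_counts(normalize=True)
print(dept_share)

# Named aggregation: keyword=(column, function) sets each output column name
named_summary = df.groupby('Department').agg(
    avg_salary=('Salary', 'mean'),
    headcount=('Salary', 'count'),
)
print(named_summary)
```

Named aggregation keeps the result columns flat and self-describing, which tends to be easier to work with than the default function names.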

          

Real-World Applications

Finance: Value counts for transaction types; grouping expenses by category, month, or year

Healthcare: Counting diagnosis types; summarizing average treatment cost per department

E-commerce: Counting orders per product or category; grouping by user to find average order value
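
As a rough illustration of the e-commerce case, and of grouping by month, here is a minimal sketch. The orders table, its column names, and all of its values are invented for the example.

```python
import pandas as pd

# Hypothetical order data; column names and values are made up for illustration
orders = pd.DataFrame({
    'user': ['u1', 'u1', 'u2', 'u3', 'u3', 'u3'],
    'category': ['Books', 'Toys', 'Books', 'Books', 'Toys', 'Garden'],
    'order_value': [20.0, 35.0, 15.0, 40.0, 10.0, 25.0],
    'order_date': pd.to_datetime([
        '2024-01-05', '2024-01-20', '2024-02-02',
        '2024-02-14', '2024-03-01', '2024-03-09',
    ]),
})

# Number of orders per category
orders_per_category = orders['category'].value_counts()
print(orders_per_category)

# Average order value per user
avg_order_value = orders.groupby('user')['order_value'].mean()
print(avg_order_value)

# Total order value per calendar month (dates converted to monthly periods)
month = orders['order_date'].dt.to_period('M')
monthly_total = orders.groupby(month)['order_value'].sum()
print(monthly_total)
```

Grouping on a derived Series (here, the month of each order) avoids adding a helper column to the DataFrame.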

Where It Is Applied

Finance

  • Grouping accounts by type or branch

Retail

  • Count of sold items by category

Logistics

  • Deliveries grouped by region, value counts by carrier
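
The logistics case can be sketched in a few lines. The delivery log below is hypothetical (region and carrier names are placeholders), and it includes one missing carrier to show how the two counting approaches treat missing values.

```python
import pandas as pd

# Hypothetical delivery log; one delivery has no carrier recorded
deliveries = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South', 'South', 'East'],
    'carrier': ['CarrierA', 'CarrierB', 'CarrierA', None, 'CarrierA', 'CarrierB'],
})

# Deliveries per region: size() counts every row in each group
per_region = deliveries.groupby('region').size()
print(per_region)

# Deliveries per carrier: missing values are excluded by default
per_carrier = deliveries['carrier'].value_counts()
print(per_carrier)

# Pass dropna=False to count the missing carrier as its own entry
per_carrier_with_missing = deliveries['carrier'].value_counts(dropna=False)
print(per_carrier_with_missing)
```

Checking the totals with and without dropna=False is a quick way to spot missing values before trusting a frequency table.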

Interview Questions

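What does value_counts() return?
➤ A Series containing the count of each unique value in the original Series, sorted in descending order by default.

When should you use groupby()?
➤ When you need to summarize or aggregate data based on categories or groups.

Can you apply multiple aggregation functions in a single groupby() call?
➤ Yes, using .agg() with a list of functions like ['mean', 'sum', 'count'].

How does groupby() differ from pivot_table()?
➤ groupby() is more programmatic and flexible, while pivot_table() is table-oriented and used for reshaping.

What happens when you group by multiple columns?
➤ By default, the result is indexed by a hierarchical index (MultiIndex), and the aggregation is applied to each combination of values in those columns.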
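
The last two answers can be illustrated together. The sketch below is a minimal example that assumes an extra Level column, added here purely for illustration. It groups by two columns to produce a MultiIndex, then compares the unstacked result with the equivalent pivot_table() call.

```python
import pandas as pd

# Hypothetical data: 'Level' is an added column used only for this illustration
df = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT', 'IT', 'Sales', 'Sales'],
    'Level': ['Junior', 'Senior', 'Junior', 'Junior', 'Senior', 'Junior', 'Senior'],
    'Salary': [50000, 52000, 60000, 58000, 62000, 45000, 47000],
})

# Grouping by two columns returns a Series indexed by a two-level MultiIndex
by_dept_level = df.groupby(['Department', 'Level'])['Salary'].mean()
print(by_dept_level)

# reset_index() flattens the MultiIndex back into ordinary columns
flat = by_dept_level.reset_index()
print(flat)

# unstack() moves 'Level' into the columns: one row per department
wide = by_dept_level.unstack('Level')
print(wide)

# pivot_table() builds the same department-by-level layout in a single call
pivot = df.pivot_table(index='Department', columns='Level',
                       values='Salary', aggfunc='mean')
print(pivot)
```

In this sketch groupby() followed by unstack() and pivot_table() arrive at the same values; the choice between them is mostly about whether you want the long (grouped) or wide (tabular) form first.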