
Dropping/Filtering rows and columns

Description

Dropping or filtering rows and columns is a fundamental step in data wrangling. It lets you clean and reshape a dataset by removing data you do not need. Common operations include:
Dropping rows or columns with missing or irrelevant values
Filtering rows based on conditions
Dropping duplicate rows or outliers
These steps focus the analysis on the most relevant features and can improve the performance of data analysis and machine learning models.

Prerequisites

Basic understanding of Python and Pandas
Familiarity with DataFrame structure
Knowledge of Boolean indexing and conditions

Examples

The following example builds a small DataFrame with Pandas and applies the most common drop and filter operations:


import pandas as pd

# Sample dataset
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Tom'],
    'Age': [25, 22, 29, 33, 40],
    'Gender': ['M', 'F', 'M', 'F', 'M'],
    'Salary': [50000, 60000, 45000, 52000, 49000]
}
df = pd.DataFrame(data)

# Drop a column by name
df_dropped_col = df.drop(columns=['Salary'])

# Drop a row by index
df_dropped_row = df.drop(index=2)

# Filter rows where Age is greater than 30
df_filtered = df[df['Age'] > 30]

# Drop columns where every value is missing
# (dropna returns a new DataFrame, so the result must be assigned)
df_no_empty_cols = df.dropna(axis=1, how='all')

# Drop duplicate rows
df_no_duplicates = df.drop_duplicates()

# Keep rows where Salary is at least 50000 (drops rows where Salary < 50000)
df_salary_filter = df[df['Salary'] >= 50000]

# Display results
print(df_dropped_col)
print(df_filtered)
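
The sample data above contains no duplicates or outliers, so those steps leave it unchanged. A minimal sketch on a hypothetical dataset (the names and values below are made up for illustration) shows both steps in action. It assumes the common 1.5 × IQR rule as the outlier definition, which is only one of several possible choices:

```python
import pandas as pd

# Hypothetical data: one exact duplicate row (Anna) and one extreme salary (Tom)
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Anna', 'Peter', 'Linda', 'Tom'],
    'Salary': [50000, 60000, 60000, 45000, 52000, 500000],
})

# Drop exact duplicate rows, keeping the first occurrence
df_unique = df.drop_duplicates()

# Keep rows whose Salary lies within 1.5 * IQR of the quartiles
q1 = df_unique['Salary'].quantile(0.25)
q3 = df_unique['Salary'].quantile(0.75)
iqr = q3 - q1
in_range = df_unique['Salary'].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df_no_outliers = df_unique[in_range]

print(df_no_outliers)
```

Duplicates are removed before computing the quartiles so that repeated rows do not skew them.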

Real-World Applications

Healthcare

Remove patients with missing diagnosis data
Filter rows with critical conditions for urgent analysis
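
The two healthcare tasks above could be sketched as follows. The column names, the records, and the list of critical conditions are all hypothetical and stand in for whatever a real dataset defines:

```python
import pandas as pd

# Hypothetical patient records; names and values are illustrative only
patients = pd.DataFrame({
    'patient_id': [101, 102, 103, 104],
    'diagnosis': ['sepsis', None, 'fracture', 'stroke'],
    'age': [67, 45, 30, 72],
})

# Remove patients whose diagnosis is missing;
# subset limits the missing-value check to that one column
with_diagnosis = patients.dropna(subset=['diagnosis'])

# Keep only rows whose diagnosis is in an assumed list of critical conditions
critical_conditions = ['sepsis', 'stroke']
urgent = with_diagnosis[with_diagnosis['diagnosis'].isin(critical_conditions)]

print(urgent)
```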

Finance

Drop transactions with missing merchant IDs
Filter clients based on credit score thresholds

Where This Topic Is Applied

Healthcare

Removing null test results
Filtering patient records by age or condition

Finance

Cleaning loan applications by dropping incomplete rows
Filtering transactions above a certain amount
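
As a rough sketch of the finance tasks above, the example below treats a loan application as "incomplete" when it has fewer than three filled-in fields. That cutoff, the column names, and the amount threshold are assumptions chosen for illustration. Calling dropna() with no arguments would instead drop every row that has any missing value.

```python
import pandas as pd

# Hypothetical loan applications; None marks a missing value
loans = pd.DataFrame({
    'applicant': ['A', 'B', 'C', 'D'],
    'income': [52000, None, 61000, None],
    'credit_score': [710, 640, None, None],
    'amount': [15000, 8000, 25000, 12000],
})

# Keep rows that have at least 3 non-missing values out of the 4 columns
mostly_complete = loans.dropna(thresh=3)

# Keep applications above an assumed amount threshold
large = mostly_complete[mostly_complete['amount'] > 10000]

print(large)
```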

Resources

Data Science topic PDF

Harvard Data Science Course

Free online course from Harvard covering data science foundations

Interview Questions

➤ Use df.drop(columns=['column_name']).

➤ Filtering selects rows/columns that meet conditions; dropping removes them directly.

➤ Use df.dropna() to remove rows with missing values.

➤ Use Boolean indexing: df[(df['Age'] > 25) & (df['Gender'] == 'F')]

➤ When it has too many missing values or offers no useful information (low variance).
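
Several of the answers above can be tried on one small DataFrame. This is a sketch under stated assumptions: the Country and Notes columns are hypothetical additions to the earlier sample data, the 50% missing-value cutoff is arbitrary, and a constant column is used as the simplest case of low variance.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Tom'],
    'Age': [25, 22, 29, 33, 40],
    'Gender': ['M', 'F', 'M', 'F', 'M'],
    'Country': ['US', 'US', 'US', 'US', 'US'],   # constant: carries no information
    'Notes': [None, None, None, 'ok', None],     # mostly missing
})

# Filtering keeps rows that match a condition; dropping removes rows by label.
# Both routes can reach the same result.
kept_by_filter = df[df['Age'] > 30]
kept_by_drop = df.drop(index=df[df['Age'] <= 30].index)
print(kept_by_filter['Name'].tolist() == kept_by_drop['Name'].tolist())

# Combine conditions with & (and), | (or), ~ (not);
# each condition needs its own parentheses
women_over_25 = df[(df['Age'] > 25) & (df['Gender'] == 'F')]

# Columns where more than half of the values are missing (assumed cutoff)
missing_share = df.isna().mean()
too_sparse = list(missing_share[missing_share > 0.5].index)

# Columns with a single distinct value
constant = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]

pruned = df.drop(columns=too_sparse + constant)
print(pruned.columns.tolist())
```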
