Reading from CSV, Excel, JSON

Description

Data scientists often work with data stored in external files like .csv, .xlsx, and .json. Python’s Pandas library provides easy-to-use methods like read_csv(), read_excel(), and read_json() to load these files into DataFrames for analysis.

Prerequisites

  • Basic Python syntax
  • Familiarity with file formats: CSV, Excel, JSON
  • Installed libraries: pandas, openpyxl (for Excel)

Examples

The following example loads a CSV file, an Excel file, and a JSON file into DataFrames and prints the first rows of each:


import pandas as pd

# Read a CSV file
csv_data = pd.read_csv('data/sample.csv')  # Path is relative to the working directory, so it must contain data/sample.csv
print(csv_data.head())  # Display first 5 rows

# Read an Excel file
excel_data = pd.read_excel('data/sample.xlsx', engine='openpyxl')  # engine is optional; openpyxl must be installed to read .xlsx
print(excel_data.head())

# Read a JSON file
json_data = pd.read_json('data/sample.json')
print(json_data.head())

          

📝 Comments:
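pd.read_csv() reads delimited text files; the delimiter defaults to a comma and can be changed with sep.
pd.read_excel() needs an engine installed: openpyxl for .xlsx, xlrd for older .xls files.
pd.read_json() reads JSON whose layout matches its orient parameter (for example 'records' or 'columns'); deeply nested JSON usually needs pd.json_normalize() to flatten it.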
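
To make the orientation point concrete, here is a minimal sketch that parses the same two rows from two different JSON layouts. The sample strings are made up for illustration, and StringIO stands in for a file on disk.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: the same two rows serialized in two layouts.
records_json = '[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]'
columns_json = '{"id": {"0": 1, "1": 2}, "name": {"0": "Ada", "1": "Grace"}}'

# orient="records": a list of row objects.
df_records = pd.read_json(StringIO(records_json), orient="records")

# orient="columns": a mapping of column -> {index -> value}.
df_columns = pd.read_json(StringIO(columns_json), orient="columns")

print(df_records)
print(df_columns)
```

Both calls should produce a DataFrame with the columns id and name; only the on-disk layout differs.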

Real-World Applications

Finance


Import transaction logs, pricing data from CSV/Excel
Load JSON-formatted API data (e.g., stock tickers)

Healthcare


Excel files for patient records
JSON for medical device data exports

E-commerce


Read product, order, and customer data from CSVs
JSON data from REST APIs for real-time analytics

Where Reading CSV, Excel, and JSON Is Applied

Finance

  • Bank statements, audit logs in CSV

Healthcare

  • EHR systems exporting Excel/JSON

Marketing

  • Reading analytics reports in JSON

Resources

Harvard Data Science Course

Free online course from Harvard covering data science foundations

Interview Questions

➤ To load a CSV file into a DataFrame, use pd.read_csv('filename.csv').
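
A small self-contained sketch of the call, assuming made-up CSV content held in memory so it runs without a file on disk (the column names are hypothetical):

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV content; StringIO stands in for a real file path.
csv_text = "order_id,amount\n1,9.99\n2,24.50\n"

df = pd.read_csv(StringIO(csv_text))

print(df.dtypes)  # column types are inferred from the data
print(df.shape)   # (rows, columns)
```

With a real file, the path string replaces the StringIO object and the rest stays the same.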

➤ To read a specific Excel sheet, pass sheet_name='Sheet1' to read_excel(); pass sheet_name=None to read all sheets into a dict of DataFrames keyed by sheet name.
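
One possible way to work with the result of sheet_name=None is sketched below. The helper names load_all_sheets and sheet_shapes are hypothetical, and the usage lines assume a workbook exists at the path shown and that openpyxl is installed.

```python
import pandas as pd


def load_all_sheets(path):
    """Read every worksheet; sheet_name=None returns {sheet name: DataFrame}."""
    return pd.read_excel(path, sheet_name=None)


def sheet_shapes(sheets):
    """Map each sheet name to its (rows, columns) shape."""
    return {name: df.shape for name, df in sheets.items()}


# Usage, assuming a workbook exists at this hypothetical path:
# sheets = load_all_sheets("data/sample.xlsx")
# print(sheet_shapes(sheets))
```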

➤ To handle nested JSON, use pd.json_normalize() to flatten it into a tabular format.
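
As an illustration of the flattening step, the sketch below uses hypothetical order records of the kind json.loads() might return from an API response:

```python
import pandas as pd

# Hypothetical nested records, e.g. parsed from an API response.
orders = [
    {"id": 1, "customer": {"name": "Ada", "city": "London"}, "total": 25.0},
    {"id": 2, "customer": {"name": "Grace", "city": "New York"}, "total": 40.5},
]

# Nested keys become dotted column names such as "customer.name".
flat = pd.json_normalize(orders)

print(flat.columns.tolist())
print(flat)
```

Each nested customer field ends up in its own column, so the result can be filtered and grouped like any other DataFrame.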

➤ If the file does not exist at the given path, Python raises a FileNotFoundError.
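
A minimal sketch of handling that error, assuming the caller prefers a None result over a crash (the helper name load_csv_or_none and the path are hypothetical):

```python
import pandas as pd


def load_csv_or_none(path):
    """Return a DataFrame, or None if no file exists at the given path."""
    try:
        return pd.read_csv(path)
    except FileNotFoundError:
        print(f"File not found: {path}")
        return None


# Hypothetical path that is assumed not to exist.
df = load_csv_or_none("data/does_not_exist_12345.csv")
```

Whether to return None, re-raise, or fall back to a default file is a design choice that depends on the surrounding program.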

➤ For reading Excel files, pandas typically uses openpyxl for .xlsx and xlrd for older .xls files.