Reading from CSV, Excel, JSON
Description
Data scientists often work with data stored in external files like .csv, .xlsx, and .json. Python’s Pandas library provides easy-to-use methods like read_csv(), read_excel(), and read_json() to load these files into DataFrames for analysis.
Prerequisites
- Basic Python syntax
- Familiarity with file formats: CSV, Excel, JSON
- Installed libraries: pandas, openpyxl (for Excel)
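Before running the examples, it can help to confirm that the libraries listed above are importable. The following is a minimal sketch; the helper name check_libraries is hypothetical, and it only checks that a module can be found, not that its version is compatible.

```python
import importlib.util


def check_libraries(names):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # pandas and openpyxl are the libraries named in the prerequisites.
    for name, found in check_libraries(["pandas", "openpyxl"]).items():
        status = "installed" if found else "missing"
        print(f"{name}: {status}")
```

Any library reported as missing can usually be installed with pip, for example pip install openpyxl.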
Examples
The following example loads a CSV file, an Excel file, and a JSON file into pandas DataFrames:
import pandas as pd
# Read a CSV file
csv_data = pd.read_csv('data/sample.csv') # Path is relative to your working directory, so 'sample.csv' must be in its 'data' folder
print(csv_data.head()) # Display first 5 rows
# Read an Excel file
excel_data = pd.read_excel('data/sample.xlsx', engine='openpyxl') # engine is optional here: pandas uses openpyxl for .xlsx by default
print(excel_data.head())
# Read a JSON file
json_data = pd.read_json('data/sample.json')
print(json_data.head())
📝 Comments:
pd.read_csv() reads comma-separated values files; pass sep to handle other delimiters such as tabs or semicolons.
pd.read_excel() needs openpyxl for .xlsx files and xlrd for older .xls files.
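pd.read_json() accepts several JSON layouts, selected with the orient parameter. Nested objects are not flattened automatically; they stay as dictionaries or lists inside cells unless you flatten them with pd.json_normalize().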
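The sep and orient parameters mentioned above can be tried without any files on disk. This is a minimal sketch that uses made-up in-memory text (wrapped in StringIO) in place of real files; the sample names and scores are illustrative only.

```python
from io import StringIO

import pandas as pd

# read_csv handles other delimiters through the sep parameter.
semicolon_text = "name;score\nAda;91\nLin;78\n"
scores = pd.read_csv(StringIO(semicolon_text), sep=";")

# orient="records": the JSON is a list of row objects.
records_text = '[{"name": "Ada", "score": 91}, {"name": "Lin", "score": 78}]'
from_records = pd.read_json(StringIO(records_text), orient="records")

# orient="split": separate lists for column names, index labels, and rows.
split_text = (
    '{"columns": ["name", "score"], "index": [0, 1], '
    '"data": [["Ada", 91], ["Lin", 78]]}'
)
from_split = pd.read_json(StringIO(split_text), orient="split")

print(scores)
print(from_records)
print(from_split)
```

All three calls describe the same two-row table; only the text layout handed to pandas differs.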
Real-World Applications
Finance
Import transaction logs, pricing data from CSV/Excel
Load JSON-formatted API data (e.g., stock tickers)
Healthcare
Excel files for patient records
JSON for medical device data exports
E-commerce
Read product, order, and customer data from CSVs
JSON data from REST APIs for real-time analytics
Where Reading CSV, Excel, and JSON Is Applied
Finance
- Bank statements, audit logs in CSV
Healthcare
- EHR systems exporting Excel/JSON
Marketing
- Reading analytics reports in JSON
Resources
Data Science topic PDF
Harvard Data Science Course
Free online course from Harvard covering data science foundations
Interview Questions
➤ Use pd.read_csv('filename.csv') to load the file into a DataFrame.
➤ Use sheet_name='Sheet1' in read_excel() to read a specific sheet, or sheet_name=None to read all sheets into a dictionary of DataFrames keyed by sheet name.
➤ Use pd.json_normalize() to flatten nested JSON into tabular format.
➤ If the file path does not exist, Python raises a FileNotFoundError.
➤ Typically openpyxl for .xlsx and xlrd for older .xls files.
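Several of the answers above can be sketched in a few lines of code. This is an illustrative sketch, not a definitive implementation: the helper names load_csv_or_none and load_all_sheets are hypothetical, the order data is made up, and load_all_sheets assumes openpyxl is installed and that the path points to a real workbook.

```python
import pandas as pd


def load_csv_or_none(path):
    """Return a DataFrame, or None if the file does not exist."""
    try:
        return pd.read_csv(path)
    except FileNotFoundError:
        print(f"File not found: {path}")
        return None


def load_all_sheets(path):
    """Return {sheet_name: DataFrame} for every sheet in an Excel workbook."""
    return pd.read_excel(path, sheet_name=None)


# Nested JSON records, similar to what an API might return (made-up data).
orders = [
    {"id": 1, "customer": {"name": "Ada", "city": "Oslo"}, "total": 40.5},
    {"id": 2, "customer": {"name": "Lin", "city": "Lima"}, "total": 12.0},
]

# json_normalize expands nested keys into dotted column names
# such as "customer.name" and "customer.city".
flat = pd.json_normalize(orders)
print(flat.columns.tolist())

# A missing file is reported and handled instead of stopping the script.
missing = load_csv_or_none("data/no_such_file.csv")
print(missing is None)
```

Returning None for a missing file is only one option; re-raising the error with a clearer message, or letting it propagate, may suit a pipeline better.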