Data type conversions
Description
Data type conversion (also called type casting) is the process of converting a value from one data type to another. In Python-based data science, it is essential for making data from different sources compatible and for preparing data for analysis or modeling.
Common conversions include:
String ↔ Integer/Float
Object ↔ DateTime
Integer ↔ Float
Categorical ↔ Numeric
In Pandas, this is often done using .astype() or functions like pd.to_numeric(), pd.to_datetime(), or pd.to_timedelta().
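As a minimal sketch of the conversions listed above that are less often shown, the snippet below converts a text column to a categorical with integer codes, a string column to floats, and a string column to timedeltas. The column names and values are hypothetical, invented only for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration
df = pd.DataFrame({
    "size": ["small", "large", "medium", "small"],
    "score": ["1.5", "2", "3.25", "4"],
    "duration": ["1 days", "2 days 06:00:00", "0 days 12:00:00", "3 days"],
})

# Categorical <-> Numeric: astype("category") stores each distinct label once;
# .cat.codes exposes the integer code assigned to each row
df["size"] = df["size"].astype("category")
size_codes = df["size"].cat.codes

# String -> Float
df["score"] = pd.to_numeric(df["score"])

# String -> Timedelta
df["duration"] = pd.to_timedelta(df["duration"])

print(df.dtypes)
print(size_codes.tolist())  # categories are sorted: large=0, medium=1, small=2
```

The integer codes follow the sorted order of the category labels, so they carry no meaning beyond identifying each label.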
Prerequisites
- Basic Python data types
- Understanding of Pandas and NumPy
- Familiarity with data loading and cleaning
Examples
The following example converts the columns of a small DataFrame between string, integer, float, and datetime types:
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'age': ['25', '30', '28'],  # numbers stored as strings
    'salary': [40000.0, 50000.0, 45000.0],
    'joining_date': ['2021-01-01', '2022-03-15', '2020-07-20']
})

# Convert 'age' from string to integer
df['age'] = df['age'].astype(int)

# Convert 'salary' from float to int (any fractional part is truncated)
df['salary'] = df['salary'].astype(int)

# Convert 'joining_date' from string to datetime
df['joining_date'] = pd.to_datetime(df['joining_date'])

# Convert 'age' back to string
df['age'] = df['age'].astype(str)

print(df.dtypes)  # Check data types after conversion
Real-World Applications
Healthcare
Convert string-based dates to datetime objects for tracking patient history
Finance
Change transaction amounts from strings to numeric types for aggregation
E-commerce
Transform product IDs to strings or customer ratings to integers
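The finance case above can be sketched as follows. The amounts are hypothetical, and the cleaning step assumes the strings contain only a dollar sign and comma thousands separators; real exports may need additional handling.

```python
import pandas as pd

# Hypothetical transaction amounts as they might arrive from an export
amounts = pd.Series(["$1,200.50", "$300", "$45.25"])

# Remove the currency symbol and thousands separators, then convert to numbers
cleaned = (
    amounts.str.replace("$", "", regex=False)
           .str.replace(",", "", regex=False)
)
numeric = pd.to_numeric(cleaned)

# Aggregation only gives a numeric total once the values are numeric
total = numeric.sum()
print(total)  # 1545.75
```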
Where Data Type Conversion Is Applied
Finance
- Casting transaction amounts for analysis
- Converting date strings into date objects for time-series analysis
E-commerce
- Parsing product prices and quantities correctly
- Handling data imported from Excel or CSV
Manufacturing
- Converting time logs to datetime
- Handling numeric sensor readings
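For data imported from CSV, types can also be set while loading instead of afterwards. The sketch below uses the `dtype` and `parse_dates` parameters of `pd.read_csv`; the CSV content and column names are hypothetical, and an in-memory string stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file
csv_text = """product_id,price,quantity,logged_at
00123,19.99,3,2023-05-01 08:30:00
00456,5.50,10,2023-05-01 09:15:00
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"product_id": str, "quantity": "int64"},  # keep leading zeros in IDs
    parse_dates=["logged_at"],                        # parse time logs as datetime
)

print(df.dtypes)
print(df["product_id"].tolist())
```

Reading `product_id` as a string preserves the leading zeros, which would be lost if the column were inferred as an integer.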
Resources
Harvard Data Science Course
Free online course from Harvard covering data science foundations
Interview Questions
➤ Data type conversion is the process of converting data from one type to another, for example with int(), float(), or str().
➤ In Pandas, use the .astype() method or functions like pd.to_numeric().
➤ astype() converts strictly to the specified type and raises an error if any value cannot be converted; pd.to_numeric() is more flexible (e.g., errors='coerce').
➤ errors='coerce' converts values that cannot be parsed into NaN instead of raising an error.
➤ Type conversion is needed when loading data from external sources such as CSV files, where numbers or dates are often read as strings.
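The difference between astype() and pd.to_numeric() with errors='coerce' can be seen in a short sketch. The input values are hypothetical and include one entry that is not a number.

```python
import pandas as pd

# Hypothetical column with one value that cannot be parsed as a number
raw = pd.Series(["10", "20", "abc", "40"])

# astype(int) is strict: the unparseable value raises a ValueError
try:
    raw.astype(int)
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric with errors="coerce" replaces the unparseable value with NaN
coerced = pd.to_numeric(raw, errors="coerce")

print(astype_failed)         # True
print(coerced.isna().sum())  # 1
print(coerced.dtype)         # float64, because NaN is a float value
```

Because NaN is a float, the coerced column is float64 even though the valid entries are whole numbers.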