Epochs, Batches, Learning Rate

Description

When training deep learning models, three critical hyperparameters define the training process: Epochs, Batches, and Learning Rate.

  • Epoch: One complete pass through the entire training dataset.
  • Batch: A subset of the training dataset used to train the model in one forward/backward pass.
  • Learning Rate: The step size used when the model updates its weights after each batch; the gradient is scaled by this value before being applied.

The combination of these parameters influences training speed, model accuracy, and convergence stability.
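
A quick way to see how these three hyperparameters interact is to count weight updates. The sketch below is illustrative only and assumes a hypothetical dataset of 1,000 samples:

import math

n_samples = 1000      # size of the training dataset (assumed for illustration)
batch_size = 64       # samples processed per forward/backward pass
epochs = 20           # complete passes through the dataset
learning_rate = 0.01  # scales each weight update

# One weight update happens per batch:
steps_per_epoch = math.ceil(n_samples / batch_size)  # 16
total_updates = steps_per_epoch * epochs             # 320

# The core update rule, applied once per batch:
#   weight = weight - learning_rate * gradient
print(steps_per_epoch, total_updates)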

Did You Know?

Using too high a learning rate can cause the model to diverge, while too low a rate might slow convergence drastically.
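
You can see this with one-dimensional gradient descent on f(w) = w², whose gradient is 2w. This is a toy sketch with made-up step sizes, not a recipe for real models:

def gradient_descent(lr, steps=10, w=1.0):
    """Minimize f(w) = w^2 (gradient 2w), starting from w = 1.0."""
    for _ in range(steps):
        w -= lr * 2 * w  # each step multiplies w by (1 - 2*lr)
    return w

print(gradient_descent(lr=0.1))  # converges toward 0 (about 0.107)
print(gradient_descent(lr=1.1))  # diverges: w oscillates and grows (about 6.19)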

Examples

Here's how you set epochs, batch size, and learning rate in Keras and PyTorch:

TensorFlow/Keras

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# Placeholder data so the example runs end to end:
# 1,000 samples, 100 features, 10 one-hot encoded classes
X_train = np.random.rand(1000, 100)
y_train = np.eye(10)[np.random.randint(0, 10, size=1000)]

model = Sequential([
    Dense(64, activation='relu', input_shape=(100,)),
    Dense(10, activation='softmax')
])

# Compile with a custom learning rate
optimizer = Adam(learning_rate=0.01)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

# Train with the specified number of epochs and batch size
model.fit(X_train, y_train, epochs=20, batch_size=64)
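
Keras handles batching and shuffling internally: batch_size=64 means Adam applies one update per 64-sample batch, and epochs=20 repeats the full pass over the data 20 times.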

PyTorch

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(100, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax
        # internally, so a softmax layer here would be a bug
        return self.fc2(torch.relu(self.fc1(x)))

# Placeholder data so the example runs end to end:
# 1,000 samples, 100 features, integer class labels in [0, 10)
X_train = torch.rand(1000, 100)
y_train = torch.randint(0, 10, (1000,))

model = Net()
optimizer = optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
train_loader = DataLoader(TensorDataset(X_train, y_train), batch_size=64, shuffle=True)

# Train for 20 epochs, with one optimizer step per batch
for epoch in range(20):
    for batch_X, batch_y in train_loader:
        optimizer.zero_grad()              # clear gradients from the previous batch
        output = model(batch_X)            # forward pass
        loss = criterion(output, batch_y)  # compare logits against labels
        loss.backward()                    # backpropagate
        optimizer.step()                   # apply one learning-rate-scaled update

Real-World Applications

Health Diagnostics

Batch size and learning rate tuning are crucial when training medical diagnosis models, where poorly chosen settings can lead to overfitting or inflated false-positive rates.

Robotics

Deep models controlling robotic motion are trained using small batches and adaptive learning rates for precision tasks.

Chatbots

Training language models with the right batch size and learning rate enhances conversational fluency and coherence.

Stock Prediction

Batch gradient descent and learning rate scheduling are vital to adapt models to changing financial data trends.
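
As a hedged illustration, the sketch below decays the learning rate by a factor of 10 every 10 epochs using PyTorch's built-in StepLR scheduler; the model, optimizer, criterion, and train_loader are assumed to come from the PyTorch example above:

from torch.optim.lr_scheduler import StepLR

# Multiply the learning rate by 0.1 every 10 epochs
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(20):
    for batch_X, batch_y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_X), batch_y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # adjust the learning rate once per epoch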

Interview Questions

What is the difference between batch size and epoch?

Batch size is the number of samples processed before the model updates its weights; one epoch is one complete pass of the entire dataset through the model. For example, with 10,000 samples and a batch size of 100, one epoch consists of 100 weight updates.

How does the learning rate affect model training?

A high learning rate can overshoot minima, causing the loss to oscillate or diverge, while a low one slows training considerably. Finding the right balance is critical for efficient learning.

What is mini-batch gradient descent?

Mini-batch gradient descent updates the model weights after computing the loss over a subset (mini-batch) of the training data, balancing speed and stability.
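
To make this concrete, here is a minimal NumPy sketch of mini-batch gradient descent on a made-up linear regression problem; all data, shapes, and hyperparameters are illustrative:

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 5))                      # 1,000 samples, 5 features (made up)
true_w = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = X @ true_w + 0.01 * rng.standard_normal(1000)

w = np.zeros(5)
lr, batch_size, epochs = 0.1, 64, 20

for epoch in range(epochs):
    indices = rng.permutation(len(X))          # reshuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        # Gradient of mean squared error over this mini-batch only
        error = X[batch] @ w - y[batch]
        grad = 2 * X[batch].T @ error / len(batch)
        w -= lr * grad                         # one weight update per mini-batch

print(w)  # approaches true_w as training proceeds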