
Batch Normalization

Description

Batch Normalization is a technique for improving the training of deep neural networks by normalizing the inputs of each layer. It stabilizes and accelerates training by reducing internal covariate shift, i.e. the change in the distribution of each layer's inputs as the parameters of earlier layers change during training.

  • Normalizes layer inputs to have zero mean and unit variance within a mini-batch.
  • Introduces two learnable parameters per activation, a scale (γ) and a shift (β), to restore the representational power of the network (see the sketch after this list).
  • Allows higher learning rates, reduces sensitivity to initialization, and acts as a form of regularization.
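
Concretely, within a mini-batch each feature is normalized to zero mean and unit variance and then rescaled by γ and shifted by β. The following is a minimal NumPy sketch of that training-time transformation (the function name, toy shapes, and epsilon value are illustrative, not from the original page):

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: mini-batch of activations with shape (batch_size, num_features)
    mu = x.mean(axis=0)                      # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero mean, unit variance per feature
    return gamma * x_hat + beta              # learnable scale and shift

x = np.random.randn(32, 4) * 3.0 + 5.0       # toy batch: 32 samples, 4 features
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))   # approximately 0 and 1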
Tip

Batch normalization is commonly applied between the linear transformation (a Dense or Conv layer) and the activation function, as in the example below.

Batch Normalization Illustration

Examples

Example of Batch Normalization in Keras:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Activation

input_dim = 20      # example input dimensionality (adjust to your data)
num_classes = 10    # example number of output classes (adjust to your data)

model = Sequential([
    Dense(128, input_shape=(input_dim,)),
    BatchNormalization(),       # normalize the pre-activations of the first Dense layer
    Activation('relu'),
    Dense(64),
    BatchNormalization(),
    Activation('relu'),
    Dense(num_classes, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

Real-World Applications

Autonomous Driving

Stabilizes training of deep perception models, improving accuracy in real-time environments.

Hardware-Efficient Models

Enables faster convergence, reducing computational resources during training.

Computer Vision

Used in convolutional neural networks (CNNs) for image classification and detection tasks.

Natural Language Processing

Improves training stability for transformer and recurrent models in NLP tasks.


Interview Questions

What problem does batch normalization solve?

Batch normalization reduces internal covariate shift by normalizing layer inputs, which stabilizes and accelerates the training process.

How does batch normalization improve training?

It allows higher learning rates, reduces sensitivity to initialization, provides some regularization effect, and helps gradients flow better through deep networks.

Where is batch normalization typically applied in a neural network?

It is usually applied after the linear transformation (a Dense or Conv layer) and before the activation function, as in the sketch below.
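
A minimal sketch of that placement in a convolutional block, assuming the Keras functional API (the input shape and filter count are illustrative):

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation

inputs = Input(shape=(32, 32, 3))                                        # example image input
x = Conv2D(32, kernel_size=3, padding='same', use_bias=False)(inputs)    # linear transformation (bias is redundant before BN)
x = BatchNormalization()(x)                                              # normalize the pre-activations
x = Activation('relu')(x)                                                # non-linearity applied after normalization
model = Model(inputs, x)
model.summary()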