1. Fundamentals of Backpropagation
- Forward Pass to Compute Output
- The forward pass propagates input data through the network layers to compute the final output.
- Each neuron computes a weighted sum of its inputs plus a bias, then applies an activation function.
- Outputs are calculated layer-by-layer until the network produces predictions for the given input.
- Backward Pass to Calculate Gradients Using the Chain Rule
- The backward pass calculates the gradients of the loss function with respect to each weight and bias in the network.
- Gradients are computed layer-by-layer in reverse order, starting from the output layer back to the input layer.
- The chain rule of calculus is applied to propagate errors backward through the network:
- Compute the derivative of the loss with respect to the output of each neuron.
- Use these derivatives to compute the gradients for weights and biases.
- By reusing intermediate results layer by layer, the chain rule handles inter-layer dependencies efficiently and yields correct gradients for every parameter.
- Weight Updates Based on Computed Gradients
- After gradients are calculated, the weights and biases are updated to minimize the loss function.
- Update rule:
- \( w_{\text{new}} = w_{\text{old}} - \eta \cdot \nabla_w L \)
- \( \eta \): learning rate, determining the step size for updates.
- \( \nabla_w L \): gradient of the loss \( L \) with respect to the weight \( w \).
- Repeated iterations of forward and backward passes refine the weights, improving the network’s predictions; a single training step combining all three stages is sketched below.
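The following is a minimal sketch of one such training step, assuming NumPy, a single sigmoid hidden layer, a linear output, and a squared-error loss; the layer sizes, learning rate, and variable names (`W1`, `b1`, `W2`, `b2`, `eta`) are illustrative choices, not references to any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4 samples, 3 input features, 1 target value each (illustrative only).
X = rng.normal(size=(4, 3))
y = rng.normal(size=(4, 1))

# Parameters of a 3 -> 5 -> 1 network.
W1, b1 = rng.normal(size=(3, 5)) * 0.1, np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)) * 0.1, np.zeros(1)
eta = 0.1  # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# ---- Forward pass: weighted sums + activations, layer by layer ----
z1 = X @ W1 + b1           # pre-activation of the hidden layer
a1 = sigmoid(z1)           # hidden activations
y_hat = a1 @ W2 + b2       # linear output layer (predictions)
loss = 0.5 * np.mean((y_hat - y) ** 2)
print(f"loss before update: {loss:.4f}")

# ---- Backward pass: chain rule, from the output layer back to the input layer ----
n = X.shape[0]
d_yhat = (y_hat - y) / n            # dL/d y_hat
dW2 = a1.T @ d_yhat                 # dL/dW2
db2 = d_yhat.sum(axis=0)            # dL/db2
d_a1 = d_yhat @ W2.T                # propagate the error to the hidden layer
d_z1 = d_a1 * a1 * (1 - a1)         # chain rule through the sigmoid
dW1 = X.T @ d_z1                    # dL/dW1
db1 = d_z1.sum(axis=0)              # dL/db1

# ---- Weight update: w_new = w_old - eta * gradient ----
W2 -= eta * dW2;  b2 -= eta * db2
W1 -= eta * dW1;  b1 -= eta * db1
```

In practice this step is repeated over many batches, and the manual gradient code above is exactly what automatic differentiation frameworks compute for you.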
2. Applications
- Backpropagation in Feedforward Neural Networks
- Feedforward networks use backpropagation to train weights across fully connected layers.
- Applications:
- Image classification, where layers progressively learn hierarchical features.
- Regression tasks, such as predicting house prices based on input features.
- Backpropagation Through Time (BPTT) for Recurrent Networks
- An extension of backpropagation for sequential data in RNNs.
- Handles dependencies across time steps by unrolling the network across the sequence.
- Steps:
- Forward pass through the entire sequence to compute the output and loss.
- Backward pass through the unrolled network to compute gradients across all time steps.
- Challenges:
- High memory requirements for storing intermediate states across time steps.
- Risk of vanishing or exploding gradients in long sequences (see the sketch after this list).
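Below is a minimal BPTT sketch for a vanilla tanh RNN with a single linear readout at the last time step, again assuming NumPy and a squared-error loss; the sequence length, sizes, and names (`Wxh`, `Whh`, `Why`) are illustrative. It shows why every hidden state must be stored during the forward pass, and how the error is multiplied by `Whh` once per time step on the way back, which is where vanishing and exploding gradients come from.

```python
import numpy as np

rng = np.random.default_rng(1)

T, input_dim, hidden_dim = 5, 2, 4          # sequence length and sizes (illustrative)
xs = rng.normal(size=(T, input_dim))        # one input sequence
target = rng.normal(size=(1,))              # scalar target for the last step

Wxh = rng.normal(size=(input_dim, hidden_dim)) * 0.1
Whh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
Why = rng.normal(size=(hidden_dim, 1)) * 0.1

# ---- Forward pass through the whole sequence (the "unrolled" network) ----
hs = [np.zeros(hidden_dim)]                 # h_0; all hidden states must be stored
for t in range(T):
    hs.append(np.tanh(xs[t] @ Wxh + hs[-1] @ Whh))
y_hat = hs[-1] @ Why
loss = 0.5 * np.sum((y_hat - target) ** 2)

# ---- Backward pass across all time steps ----
dWxh, dWhh = np.zeros_like(Wxh), np.zeros_like(Whh)
d_yhat = y_hat - target                     # dL/d y_hat
dWhy = np.outer(hs[-1], d_yhat)             # readout gradient
dh = Why @ d_yhat                           # error flowing into the last hidden state
for t in reversed(range(T)):
    dz = dh * (1 - hs[t + 1] ** 2)          # chain rule through tanh
    dWxh += np.outer(xs[t], dz)             # accumulate gradients over time steps
    dWhh += np.outer(hs[t], dz)
    dh = Whh @ dz                           # pass the error one step further back;
                                            # this repeated multiplication by Whh is
                                            # why gradients can vanish or explode

# Clipping gradient values is a common mitigation for exploding gradients.
for g in (dWxh, dWhh, dWhy):
    np.clip(g, -5.0, 5.0, out=g)
```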
3. Examples
- Gradient Calculation:
- Example: A small network with a single hidden layer calculates the loss gradients for weights and biases using backpropagation.
- Error Minimization:
- Example: Minimizing cross-entropy loss in a binary classification task using iterative gradient-descent updates of the weights.
- Training Small Networks:
- Example: A simple XOR classification problem solved using backpropagation, demonstrating why a hidden layer with non-linear activation functions is required, since XOR is not linearly separable (see the sketch below).
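As a concrete instance of the examples above, here is a minimal sketch of training a small network on XOR with backpropagation and a binary cross-entropy loss, assuming NumPy and sigmoid activations; the hidden-layer width, learning rate, random seed, and iteration count are illustrative choices, and other settings work as well.

```python
import numpy as np

rng = np.random.default_rng(42)

# The four XOR input patterns and their targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# A 2 -> 4 -> 1 network with sigmoid activations.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
eta = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    # Forward pass
    a1 = sigmoid(X @ W1 + b1)
    p = sigmoid(a1 @ W2 + b2)          # predicted probability of class 1

    # Binary cross-entropy loss, averaged over the four patterns
    # (the small epsilon keeps the logs finite if p saturates).
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

    # Backward pass: with a sigmoid output and cross-entropy loss,
    # the output-layer error simplifies to (p - y).
    d_z2 = (p - y) / len(X)
    dW2, db2 = a1.T @ d_z2, d_z2.sum(axis=0)
    d_z1 = (d_z2 @ W2.T) * a1 * (1 - a1)
    dW1, db1 = X.T @ d_z1, d_z1.sum(axis=0)

    # Gradient-descent weight updates
    W2 -= eta * dW2;  b2 -= eta * db2
    W1 -= eta * dW1;  b1 -= eta * db1

print("final loss:", round(float(loss), 4))
print("predictions:", np.round(p.ravel(), 3))   # should approach [0, 1, 1, 0]
```

For most random initializations the printed predictions approach [0, 1, 1, 0], which no single-layer (linear) model can achieve because XOR is not linearly separable.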