Lecture 5 discusses how neural networks can be trained with gradient descent, using the backpropagation algorithm to compute gradients of the loss via the chain rule. Key phrases: Neural networks. Forward computation. Backward propagation. Neuron units. Max-margin loss. Gradient checks. Xavier parameter initialization. Learning rates. Adagrad.
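As a rough illustration of the training loop the lecture describes, the following minimal NumPy sketch runs a forward computation through a one-hidden-layer network, backpropagates gradients with the chain rule, and applies a gradient descent update; the toy data, layer sizes, learning rate, and cross-entropy loss are illustrative assumptions rather than the lecture's setup, and the weight scale of 1/sqrt(fan_in) only approximates Xavier initialization.

    import numpy as np

    # Toy data (illustrative assumption, not from the lecture)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))                           # 100 examples, 4 features
    y = (X.sum(axis=1, keepdims=True) > 0).astype(float)    # toy binary labels

    # Xavier-style initialization: scale weights by 1/sqrt(fan_in)
    W1 = rng.normal(size=(4, 8)) / np.sqrt(4)
    b1 = np.zeros((1, 8))
    W2 = rng.normal(size=(8, 1)) / np.sqrt(8)
    b2 = np.zeros((1, 1))

    lr = 0.1                                                # learning rate
    for step in range(500):
        # Forward computation
        h = np.tanh(X @ W1 + b1)                            # hidden layer of neuron units
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))            # sigmoid output
        loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))  # cross-entropy

        # Backward propagation: chain rule from the output back to the first layer
        dscore = (p - y) / len(X)                           # gradient w.r.t. pre-sigmoid score
        dW2 = h.T @ dscore
        db2 = dscore.sum(axis=0, keepdims=True)
        dh = dscore @ W2.T
        dz1 = dh * (1 - h ** 2)                             # derivative of tanh
        dW1 = X.T @ dz1
        db1 = dz1.sum(axis=0, keepdims=True)

        # Gradient descent update
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

A gradient check, also named in the key phrases, would compare each analytic gradient above against a centered finite-difference estimate of the loss before trusting the backward pass.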