neuron (node/unit)
inputs, each with an associated weight; neuron output = f(weighted sum of inputs)
function f = Activation Function (the four below are sketched in code after the list)
- Sigmoid: σ(x) = 1 / (1 + e^(-x)), output range (0, 1)
- tanh: output range (-1, 1)
- ReLU f(x) = max(0, x)
- Softmax: outputs sum to 1 and form a probability distribution, e.g. Probability(Pass) + Probability(Fail) = 1
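A minimal numpy sketch of these four activation functions (the helper names and input values are my own, not from the notes):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes input to (0, 1)

def tanh(x):
    return np.tanh(x)                 # squashes input to (-1, 1)

def relu(x):
    return np.maximum(0, x)           # zeroes out negatives, keeps positives

def softmax(x):
    e = np.exp(x - x.max())           # shift by max for numerical stability
    return e / e.sum()                # outputs sum to 1

x = np.array([-2.0, 0.0, 3.0])        # arbitrary demo inputs
print(sigmoid(x), tanh(x), relu(x), softmax(x), sep="\n")
print(softmax(x).sum())               # 1.0 (up to float rounding)
```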
Feedforward Neural Network
neurons, layers, edges, weights
information moves in only one direction: forward, from the input nodes through the hidden nodes to the output nodes (no cycles or loops)
Multi Layer Perceptron (MLP)
Backpropagation algorithm
Backward Propagation of Errors (BackProp) = learning from mistakes
- Forward Propagation
- weights are initialized randomly
- feed a training example (inputs + desired output) through the network; each node computes e.g. V = f(1*w1 + 35*w2 + 67*w3)
- Back Propagation and Weight Update
- calculate total error at the output nodes
- calculate the gradients (propagate these errors back through the network)
- optimization by Gradient Descent to adjust all weights and reduce the error (worked sketch after this list)
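A worked sketch of these steps for a single sigmoid neuron, assuming squared error and the illustrative inputs (1, 35, 67) above; the target value and learning rate are made up for the demo:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 35.0, 67.0])            # bias input 1 plus two feature values
target = 1.0                               # made-up desired output

rng = np.random.default_rng(0)
w = rng.normal(scale=0.01, size=3)         # step 1: random initial weights
lr = 0.001                                 # learning rate (arbitrary)

for step in range(1000):
    v = sigmoid(w @ x)                     # forward propagation: V = f(1*w1 + 35*w2 + 67*w3)
    error = 0.5 * (target - v) ** 2        # total error at the output node
    grad = (v - target) * v * (1 - v) * x  # backprop: dError/dw for sigmoid + squared error
    w -= lr * grad                         # gradient descent weight update

print(sigmoid(w @ x))                      # output moves toward the target of 1.0
```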
https://www.cs.ryerson.ca/~aharley/vis/fc/
MNIST Database of handwritten digits
- Inputs: 28 x 28 image = 784 numeric pixel values
- first hidden layer: 300 nodes
- second hidden layer: 100 nodes
- output layer: 10 nodes (10 digits; forward pass sketched in code below)
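A minimal numpy forward pass through this 784-300-100-10 network with random weights; the ReLU hidden activations and the weight scale are assumptions, since the notes don't specify them:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.random(784)                            # a flattened 28 x 28 image
W1 = rng.normal(scale=0.05, size=(300, 784))   # first hidden layer: 300 nodes
W2 = rng.normal(scale=0.05, size=(100, 300))   # second hidden layer: 100 nodes
W3 = rng.normal(scale=0.05, size=(10, 100))    # output layer: 10 nodes

h1 = relu(W1 @ x)
h2 = relu(W2 @ h1)
p = softmax(W3 @ h2)                           # 10 digit probabilities
print(p.shape, p.sum())                        # (10,) 1.0
```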
ConvNet (Convolutional Neural Network / CNN)
Operations:
- Convolution
- Non Linearity (ReLU)
- Pooling or Sub Sampling
- Classification (Fully Connected Layer)
Convolution
Image: three channels (red, green and blue), each pixel value in the range 0 to 255
grayscale image: one channel (0 = black, 255 = white)
3 x 3 matrix = filter / kernel / feature detector
output matrix = Convolved Feature / Feature Map / Activation Map
size of the Feature Map is controlled by three parameters (demonstrated in the sketch below):
- Depth: number of filters used (each filter produces its own feature map)
- Stride: number of pixels by which the filter slides; larger stride = smaller feature maps
- Zero-padding: pad the border of the input with zeros so the filter can cover edge pixels (with padding = wide convolution; without = narrow convolution)
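A minimal numpy sketch of the convolution operation showing all three parameters; conv2d is a hypothetical helper (and, as in most deep learning libraries, it actually computes cross-correlation), and the 3 x 3 kernel values are arbitrary:

```python
import numpy as np

def conv2d(image, kernel, stride=1, pad=0):
    if pad:
        image = np.pad(image, pad)               # zero-padding around the border
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1     # feature map height
    ow = (image.shape[1] - kw) // stride + 1     # feature map width
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride : i*stride+kh, j*stride : j*stride+kw]
            out[i, j] = (patch * kernel).sum()   # slide, multiply elementwise, sum
    return out

img = np.arange(25.0).reshape(5, 5)              # toy 5 x 5 "image"
k = np.array([[1., 0., -1.],
              [1., 0., -1.],
              [1., 0., -1.]])                    # a 3 x 3 filter
print(conv2d(img, k).shape)                      # (3, 3): narrow convolution
print(conv2d(img, k, pad=1).shape)               # (5, 5): wide convolution
print(conv2d(img, k, stride=2).shape)            # (2, 2): larger stride, smaller map
```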
Non Linearity (ReLU)
most real-world data we want our ConvNet to learn is non-linear, while Convolution is a linear operation (element-wise multiplication and addition), so ReLU introduces non-linearity by replacing negative values in the feature map with zero
The Pooling Step
Spatial Pooling / subsampling / downsampling
- reduces dimensionality but retains important information
- Max, Average, Sum etc. (max pooling sketched in code below)
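A minimal sketch of 2 x 2 max pooling with stride 2 in numpy; max_pool is a hypothetical helper, and swapping .max() for .mean() or .sum() gives average or sum pooling:

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    oh = (fmap.shape[0] - size) // stride + 1
    ow = (fmap.shape[1] - size) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = fmap[i*stride : i*stride+size, j*stride : j*stride+size]
            out[i, j] = window.max()             # keep only the largest value
    return out

fmap = np.array([[1., 1., 2., 4.],
                 [5., 6., 7., 8.],
                 [3., 2., 1., 0.],
                 [1., 2., 3., 4.]])
print(max_pool(fmap))                            # [[6. 8.] [3. 4.]]: 4 x 4 -> 2 x 2
```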
Fully Connected Layer
softmax activation: sum of output probabilities = 1
training process (the full loop is sketched in code after these steps):
- randomly initialize filters and parameters / weights
- forward propagation
- the image goes through the convolution, ReLU, pooling and fully connected steps
- finds the output probabilities, e.g. [0.2, 0.4, 0.1, 0.3]
- Calculate the total error at the output layer, e.g. Total Error = Σ ½ (target probability - output probability)²
- Backpropagation
- calculate the gradients of the error with respect to all weights in the network
- minimize error: use gradient descent to update all filter values / weights and parameters
- weights are adjusted in proportion to their contribution to the total error
- output [0.1, 0.1, 0.7, 0.1], closer to the target vector [0, 0, 1, 0]
- only the values of the filter matrices and connection weights get updated (number of filters, filter sizes and the network architecture are fixed before training)
- Repeat steps 2-4 with all images in the training set
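A minimal PyTorch sketch of this loop on random stand-in data; the tiny model, data and hyperparameters are all placeholders (the notes don't prescribe a framework), and the softmax is folded into CrossEntropyLoss:

```python
import torch
import torch.nn as nn

images = torch.randn(8, 1, 32, 32)                   # 8 random 1-channel "images"
labels = torch.randint(0, 10, (8,))                  # random target classes

model = nn.Sequential(                               # stand-in ConvNet
    nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(6 * 14 * 14, 10),
)
loss_fn = nn.CrossEntropyLoss()                      # error at the output layer
opt = torch.optim.SGD(model.parameters(), lr=0.01)   # gradient descent

for epoch in range(5):                               # repeat steps 2-4
    opt.zero_grad()
    logits = model(images)                           # step 2: forward propagation
    loss = loss_fn(logits, labels)                   # step 3: total error
    loss.backward()                                  # step 4a: gradients via backprop
    opt.step()                                       # step 4b: weight / filter update
```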
new image: run only the forward propagation step and output a probability for each class (calculated using the weights learned during training)
http://scs.ryerson.ca/~aharley/vis/conv/flat.html
https://www.cs.ryerson.ca/~aharley/vis/conv/
Input: 32 x 32 image = 1024 pixel values
- Convolution Layer 1: six unique 5 × 5 (stride 1) filters
- feature map of depth six
- Pooling Layer 1: 2 × 2 max pooling (with stride 2)
- sixteen 5 × 5 (stride 1) convolutional filters
- 2 × 2 max pooling (with stride 2)
- three fully-connected (FC) layers
- 120 neurons in the first FC layer
- 100 neurons in the second FC layer
- 10 neurons in the third FC layer, corresponding to the 10 digits (full architecture sketched in code below)
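A sketch of this exact layer stack in PyTorch; the ReLU activations and the lack of padding are assumptions (the notes give sizes and strides but not activations):

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, stride=1),    # 32x32 -> six 28x28 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2, stride=2),                   # -> six 14x14 maps
    nn.Conv2d(6, 16, kernel_size=5, stride=1),   # -> sixteen 10x10 maps
    nn.ReLU(),
    nn.MaxPool2d(2, stride=2),                   # -> sixteen 5x5 maps
    nn.Flatten(),                                # 16 * 5 * 5 = 400 values
    nn.Linear(400, 120), nn.ReLU(),              # first FC layer: 120 neurons
    nn.Linear(120, 100), nn.ReLU(),              # second FC layer: 100 neurons
    nn.Linear(100, 10),                          # third FC layer: 10 digits
)
print(net(torch.randn(1, 1, 32, 32)).shape)      # torch.Size([1, 10])
```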
references
https://ujjwalkarn.me/2016/08/09/quick-intro-neural-networks/
https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/