### neuron (node/unit)

takes inputs, multiplies each by a **weight**, and sums them

function *f* applied to the weighted sum = **Activation Function**

- Sigmoid: σ(x) = 1 / (1 + e^(−x)), output range (0, 1)
- tanh: output range (−1, 1)
- ReLU: f(x) = max(0, x)
- Softmax: outputs a probability distribution, e.g. Probability(Pass) + Probability(Fail) = 1
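A minimal sketch of these activation functions in NumPy (my own illustration, not from the linked posts):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                 # squashes to (-1, 1)

def relu(x):
    return np.maximum(0, x)           # f(x) = max(0, x)

def softmax(z):
    e = np.exp(z - z.max())           # shift for numerical stability
    return e / e.sum()                # probabilities that sum to 1

print(softmax(np.array([2.0, 1.0]))) # e.g. [P(Pass), P(Fail)] -> sums to 1
```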

### Feedforward Neural Network

neurons, layers, edges, weights

information moves in only one direction: forward from the input nodes, through the hidden layers, to the output nodes; no cycles or loops

#### Multi Layer Perceptron (MLP)

one or more hidden layers between the input and output layers; can learn non-linear functions

#### Backpropagation algorithm

Backward Propagation of Errors (BackProp) = learning from mistakes

**Forward Propagation**

- initialize all weights randomly
- feed a training example (inputs + desired output) through the network, e.g. V = *f*(1·w1 + 35·w2 + 67·w3)

**Back Propagation and Weight Update**

- calculate the total error at the output nodes
- calculate the *gradients* of the error (propagate these errors back through the network)
- optimize by *Gradient Descent*: 'adjust' all weights to reduce the error
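A minimal NumPy sketch of one such forward/backward cycle, reusing the example inputs 1, 35, 67 from above (the hidden-layer size, learning rate, and squared-error loss are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([1.0, 35.0, 67.0])       # one training example's inputs
target = np.array([1.0, 0.0])         # desired output, e.g. [P(Pass), P(Fail)]

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.01, size=(3, 4))  # input -> hidden (4 hidden units, assumed)
W2 = rng.normal(scale=0.01, size=(4, 2))  # hidden -> output
lr = 0.05                                 # learning rate (assumed)

for step in range(1000):
    # forward propagation
    h = sigmoid(x @ W1)
    y = sigmoid(h @ W2)

    # total error at the output nodes (squared error)
    err = y - target

    # back propagation: gradients via the chain rule; sigmoid'(z) = y(1 - y)
    d_out = err * y * (1 - y)
    d_hid = (d_out @ W2.T) * h * (1 - h)

    # gradient descent: adjust all weights
    W2 -= lr * np.outer(h, d_out)
    W1 -= lr * np.outer(x, d_hid)

print(0.5 * (err ** 2).sum())         # total error shrinks over the iterations
```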

https://www.cs.ryerson.ca/~aharley/vis/fc/

**MNIST Database of handwritten digits**

- Inputs: 28 x 28 image = 784 numeric pixel values
- first hidden layer: 300 nodes
- second hidden layer: 100 nodes
- output layer: 10 nodes (10 digits)
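A shape-only sketch of this 784-300-100-10 network's forward pass in NumPy (the ReLU hidden activations and final softmax are assumptions; the notes above only fix the layer sizes):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.01, size=(784, 300))  # input -> first hidden layer
W2 = rng.normal(scale=0.01, size=(300, 100))  # first -> second hidden layer
W3 = rng.normal(scale=0.01, size=(100, 10))   # second hidden -> output (10 digits)

image = rng.random(28 * 28)                   # stand-in for one flattened MNIST image
probs = softmax(relu(relu(image @ W1) @ W2) @ W3)
print(probs.shape, probs.sum())               # (10,) 1.0
```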

**ConvNet (Convolutional Neural Network)**

**Operations:**

- Convolution
- Non Linearity (ReLU)
- Pooling or Sub Sampling
- Classification (Fully Connected Layer)

**Convolution**

**Image**: three **channels** (red, green and blue), pixel values ranging 0 to 255

**grayscale** image: one channel (0 = black, 255 = white)

3 x 3 matrix = **filter **/ kernel / feature detector

output matrix = Convolved Feature / **Feature Map** / Activation Map

**size of the Feature Map is controlled by three parameters:**

- **Depth:** number of filters used
- **Stride:** number of pixels by which the filter slides over the input; a larger stride produces smaller feature maps
- **Zero-padding:** pad the input matrix with zeros around the border (a *wide convolution*)
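A naive single-channel convolution sketch in NumPy showing how filter size, stride, and zero-padding set the feature map size via the standard formula (W − F + 2P)/S + 1 (the formula and variable names are textbook conventions, not from the notes above):

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Naive 2D convolution (cross-correlation, as in most CNNs)."""
    if padding:
        image = np.pad(image, padding)        # zero-padding around the border
    H, W = image.shape
    F = kernel.shape[0]
    out = (H - F) // stride + 1, (W - F) // stride + 1  # (W - F + 2P)/S + 1
    fmap = np.zeros(out)
    for i in range(out[0]):
        for j in range(out[1]):
            patch = image[i*stride:i*stride+F, j*stride:j*stride+F]
            fmap[i, j] = (patch * kernel).sum()   # elementwise multiply, then sum
    return fmap

img = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3)) / 9.0                 # a 3 x 3 averaging filter
print(conv2d(img, k).shape)               # (3, 3): stride 1, no padding
print(conv2d(img, k, stride=2).shape)     # (2, 2): larger stride = smaller map
print(conv2d(img, k, padding=1).shape)    # (5, 5): zero-padding preserves size
```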

**Non Linearity (ReLU)**

Convolution is a linear operation, but most real-world data we want our ConvNet to learn is non-linear; ReLU introduces non-linearity by replacing all negative values in the feature map with zero

**The Pooling Step**

Spatial Pooling / subsampling / downsampling

- reduces dimensionality but retains important information
- types: Max, Average, Sum, etc.
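A minimal 2 × 2 max-pooling sketch in NumPy (stride 2, matching the pooling layers in the example network below):

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    """Downsample a feature map, keeping the max of each size x size window."""
    H, W = fmap.shape
    out_h, out_w = (H - size) // stride + 1, (W - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            pooled[i, j] = fmap[i*stride:i*stride+size,
                                j*stride:j*stride+size].max()
    return pooled

fmap = np.array([[1., 3., 2., 4.],
                 [5., 7., 6., 8.],
                 [9., 2., 1., 0.],
                 [3., 4., 5., 6.]])
print(max_pool(fmap))   # [[7. 8.] [9. 6.]]: half the size, key values retained
```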

**Fully Connected Layer**

softmax activation: sum of output probabilities = 1
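A sketch of this final step, assuming sixteen 5 × 5 pooled feature maps as input (the sizes match the example network below; the random values are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
pooled = rng.random((16, 5, 5))              # sixteen 5 x 5 pooled feature maps
flat = pooled.reshape(-1)                    # flatten to 400 values

W = rng.normal(scale=0.01, size=(400, 10))   # fully connected: 400 -> 10 classes
z = flat @ W
probs = np.exp(z - z.max()) / np.exp(z - z.max()).sum()
print(probs.sum())                           # 1.0: softmax probabilities sum to 1
```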

**training process:**

1. randomly initialize all filters and parameters / weights
2. forward propagation
    - a training image goes through the convolution, ReLU, pooling, and fully connected steps
    - outputs class probabilities, e.g. [0.2, 0.4, 0.1, 0.3]
3. calculate the total error at the output layer (see the sketch after this list)
4. backpropagation
    - calculate the gradients of the error with respect to all weights
    - minimize the error by gradient descent: update all filter values / weights and parameters
    - weights are adjusted in proportion to their contribution to the error
    - output becomes e.g. [0.1, 0.1, 0.7, 0.1], closer to the target vector [0, 0, 1, 0]
    - only the values of the filter matrices and connection weights get updated
5. repeat steps 2-4 with all images in the training set
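For instance, with a squared-error loss E = Σ ½(target − output)² (one common choice), the total error for the example probabilities above works out as:

```python
output = [0.2, 0.4, 0.1, 0.3]
target = [0, 0, 1, 0]

# total error at the output layer: half the sum of squared differences
error = sum(0.5 * (t - o) ** 2 for t, o in zip(target, output))
print(error)   # 0.55
```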

for a new (unseen) image, the network runs only the forward propagation step and outputs a probability for each class, calculated using the weights learned during training

http://scs.ryerson.ca/~aharley/vis/conv/flat.html

https://www.cs.ryerson.ca/~aharley/vis/conv/

1024 pixels (32 x 32 image)

- Convolution Layer 1: six unique 5 × 5 (**stride** 1) filters
    - feature map of **depth** six
- Pooling Layer 1: 2 × 2 max pooling (with stride 2)
- Convolution Layer 2: sixteen 5 × 5 (stride 1) convolutional filters
- Pooling Layer 2: 2 × 2 max pooling (with stride 2)
- three fully-connected (FC) layers:
    - 120 neurons in the first FC layer
    - 100 neurons in the second FC layer
    - 10 neurons in the third FC layer, corresponding to the 10 digits
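A quick check of the feature map sizes through this network, applying (W − F + 2P)/S + 1 layer by layer (this matches the visualization linked above):

```python
layers = [
    ("conv 5x5, stride 1, 6 filters",  lambda w: (w - 5) // 1 + 1),
    ("max pool 2x2, stride 2",         lambda w: (w - 2) // 2 + 1),
    ("conv 5x5, stride 1, 16 filters", lambda w: (w - 5) // 1 + 1),
    ("max pool 2x2, stride 2",         lambda w: (w - 2) // 2 + 1),
]

w = 32                                  # 32 x 32 input image
for name, f in layers:
    w = f(w)
    print(f"{name}: {w} x {w}")
# conv1 -> 28 x 28, pool1 -> 14 x 14, conv2 -> 10 x 10, pool2 -> 5 x 5
# flattened: 16 * 5 * 5 = 400 values feed the 120-100-10 FC layers
```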

#### references

https://ujjwalkarn.me/2016/08/09/quick-intro-neural-networks/

https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/