Activation Functions in Neural Networks
An overview of activation functions for neural networks.
Neural Network Basics
A neural network consists of interconnected layers of simple processing units (neurons) that learn to map inputs to outputs by adjusting weights based on data.
Activation functions introduce the non-linearity that lets these networks model complex relationships between inputs and outputs, rather than only simple linear mappings. Below are key functions commonly used in practice.
1. Sigmoid
σ(z) = 1 / (1 + e^(-z))
Maps inputs to (0, 1); saturates for large |z|, which can cause vanishing gradients.
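A minimal sketch, assuming NumPy (the helper name `sigmoid` is illustrative); the definition translates directly to code:

```python
import numpy as np

def sigmoid(z):
    # 1 / (1 + e^(-z)); outputs lie strictly between 0 and 1
    # np.exp(-z) may overflow for very negative z, but the result still saturates to 0
    return 1.0 / (1.0 + np.exp(-z))
```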
2. Tanh
tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))
Zero-centered (-1,1); still prone to saturation at extremes.
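A small sketch under the same NumPy assumption (the helper name is illustrative); in practice np.tanh is the numerically safer built-in:

```python
import numpy as np

def tanh_from_definition(z):
    # (e^z - e^(-z)) / (e^z + e^(-z)); equivalent to np.tanh(z)
    ez, emz = np.exp(z), np.exp(-z)
    return (ez - emz) / (ez + emz)
```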
3. ReLU & Leaky ReLU
ReLU(z) = max(0, z); Leaky(z) = max(α·z, z)
Simple and efficient; the Leaky variant keeps a small gradient for negative inputs, mitigating dead neurons.
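A minimal NumPy sketch of both variants (the helper names and the default alpha=0.01 are illustrative choices, not prescribed above):

```python
import numpy as np

def relu(z):
    # max(0, z): passes positive inputs unchanged, zeroes out negatives
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # max(alpha * z, z) with 0 < alpha < 1: negatives keep a small, nonzero slope
    return np.maximum(alpha * z, z)
```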
4. Softmax
softmax(z_i) = e^(z_i) / Σ_j e^(z_j)
Converts a logit vector into a probability distribution; standard for multi-class classification output layers.
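A NumPy sketch; subtracting the maximum before exponentiating is a common stabilization trick added here, not part of the formula itself:

```python
import numpy as np

def softmax(z):
    # e^(z_i) / sum_j e^(z_j), computed on a shifted copy to avoid overflow
    shifted = z - np.max(z)
    exps = np.exp(shifted)
    return exps / np.sum(exps)
```

For batched logits, the same idea applies along the class axis (e.g. axis=-1).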
5. GELU
GELU(z) = z · Φ(z), where Φ is the standard normal CDF
Smooth, probabilistic gating; popular in transformer models.
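A sketch of the exact form, assuming SciPy's error function (scipy.special.erf) is available; many frameworks also expose a tanh-based approximation:

```python
import numpy as np
from scipy.special import erf

def gelu(z):
    # z * Phi(z), with Phi(z) = 0.5 * (1 + erf(z / sqrt(2))) the standard normal CDF
    return 0.5 * z * (1.0 + erf(z / np.sqrt(2.0)))
```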
6. Swish
swish(z) = z · σ(z)
Smooth, self-gated; often improves deep network training.
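A one-line NumPy sketch (helper name illustrative), using the fact that z · σ(z) simplifies to z / (1 + e^(-z)):

```python
import numpy as np

def swish(z):
    # z * sigma(z) = z / (1 + e^(-z)); smooth, with a small dip for negative z
    return z / (1.0 + np.exp(-z))
```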
7. Mish
mish(z) = z · tanh(ln(1 + e^z))
Combines smoothness with strong non-linearity; an increasingly popular alternative to Swish.
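A NumPy sketch using the softplus identity ln(1 + e^z); log1p keeps the small-z case accurate, while large-z overflow handling is omitted for brevity:

```python
import numpy as np

def mish(z):
    # z * tanh(softplus(z)), with softplus(z) = ln(1 + e^z)
    softplus = np.log1p(np.exp(z))  # np.exp(z) can overflow for very large z; frameworks stabilize this
    return z * np.tanh(softplus)
```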
8. PReLU
prelu(z) = max(0, z) + a · min(0, z)
Learnable slope for negative inputs; flexible alternative to Leaky ReLU.
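A NumPy sketch; here `a` is passed in explicitly, whereas in a real layer it would be a learned parameter (often one per channel):

```python
import numpy as np

def prelu(z, a):
    # max(0, z) + a * min(0, z): like Leaky ReLU, but the negative slope a is learnable
    return np.maximum(0.0, z) + a * np.minimum(0.0, z)
```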
Practical Tips
- Use ReLU for most hidden layers to speed up training.
- Try Swish or Mish for deeper networks where smoothness aids gradients.
- Plot functions to understand behavior around zero and in saturation regions (see the sketch after this list).
- Switch activation if you observe dead neurons or slow convergence.
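A quick plotting sketch, assuming NumPy and Matplotlib; the input range and the subset of functions shown are arbitrary choices:

```python
import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-5.0, 5.0, 200)
curves = {
    "sigmoid": 1.0 / (1.0 + np.exp(-z)),
    "tanh": np.tanh(z),
    "ReLU": np.maximum(0.0, z),
    "swish": z / (1.0 + np.exp(-z)),
}
for name, y in curves.items():
    plt.plot(z, y, label=name)  # saturation shows up as flat tails on sigmoid/tanh
plt.axvline(0.0, color="gray", linewidth=0.8)  # highlight behavior around zero
plt.legend()
plt.title("Common activation functions")
plt.show()
```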