Activation Functions in Neural Networks
An overview of activation functions for neural networks.
Neural Network Basics
A neural network consists of interconnected layers of simple processing units (neurons) that learn to map inputs to outputs by adjusting weights based on data.
Activation functions introduce the non-linearity that lets these networks model complex relationships between inputs and outputs, rather than only simple linear mappings. Below are key functions commonly used in practice.
1. Sigmoid
σ(z) = 1 / (1 + e^(-z))
Maps inputs to (0, 1); saturates for large |z|, which can cause vanishing gradients.
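A minimal sketch, assuming NumPy (the helper name `sigmoid` is illustrative); the definition translates directly to code:

```python
import numpy as np

def sigmoid(z):
    # 1 / (1 + e^(-z)); outputs lie strictly between 0 and 1
    # np.exp(-z) may overflow for very negative z, but the result still saturates to 0
    return 1.0 / (1.0 + np.exp(-z))
```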
2. Tanh
tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))
Zero-centered (-1,1); still prone to saturation at extremes.
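A small sketch under the same NumPy assumption (the helper name is illustrative); in practice np.tanh is the numerically safer built-in:

```python
import numpy as np

def tanh_from_definition(z):
    # (e^z - e^(-z)) / (e^z + e^(-z)); equivalent to np.tanh(z)
    ez, emz = np.exp(z), np.exp(-z)
    return (ez - emz) / (ez + emz)
```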
3. ReLU & Leaky ReLU
ReLU(z) = max(0, z); Leaky(z) = max(α·z, z)
Simple and efficient; the Leaky variant keeps a small gradient for negative inputs, mitigating dead neurons.
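A minimal NumPy sketch of both variants (the helper names and the default alpha=0.01 are illustrative choices, not prescribed above):

```python
import numpy as np

def relu(z):
    # max(0, z): passes positive inputs unchanged, zeroes out negatives
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # max(alpha * z, z) with 0 < alpha < 1: negatives keep a small, nonzero slope
    return np.maximum(alpha * z, z)
```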
4. Softmax
softmax(z_i) = e^(z_i) / Σ_j e^(z_j)
Converts a logit vector into a probability distribution; standard for multi-class classification output layers.
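A NumPy sketch; subtracting the maximum before exponentiating is a common stabilization trick added here, not part of the formula itself:

```python
import numpy as np

def softmax(z):
    # e^(z_i) / sum_j e^(z_j), computed on a shifted copy to avoid overflow
    shifted = z - np.max(z)
    exps = np.exp(shifted)
    return exps / np.sum(exps)
```

For batched logits, the same idea applies along the class axis (e.g. axis=-1).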
5. GELU
GELU(z) = z · Φ(z), where Φ is the standard normal CDF
Smooth, probabilistic gating; popular in transformer models.
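A sketch of the exact form, assuming SciPy's error function (scipy.special.erf) is available; many frameworks also expose a tanh-based approximation:

```python
import numpy as np
from scipy.special import erf

def gelu(z):
    # z * Phi(z), with Phi(z) = 0.5 * (1 + erf(z / sqrt(2))) the standard normal CDF
    return 0.5 * z * (1.0 + erf(z / np.sqrt(2.0)))
```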
6. Swish
swish(z) = z · σ(z)
Smooth, self-gated; often improves deep network training.
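A one-line NumPy sketch (helper name illustrative), using the fact that z · σ(z) simplifies to z / (1 + e^(-z)):

```python
import numpy as np

def swish(z):
    # z * sigma(z) = z / (1 + e^(-z)); smooth, with a small dip for negative z
    return z / (1.0 + np.exp(-z))
```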
7. Mish
mish(z) = z · tanh(ln(1 + e^z))
Combines smoothness with strong non-linearity; an increasingly popular alternative to Swish.
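A NumPy sketch using the softplus identity ln(1 + e^z); log1p keeps the small-z case accurate, while large-z overflow handling is omitted for brevity:

```python
import numpy as np

def mish(z):
    # z * tanh(softplus(z)), with softplus(z) = ln(1 + e^z)
    softplus = np.log1p(np.exp(z))  # np.exp(z) can overflow for very large z; frameworks stabilize this
    return z * np.tanh(softplus)
```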
8. PReLU
prelu(z) = max(0, z) + a · min(0, z)
Learnable slope for negative inputs; flexible alternative to Leaky ReLU.
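A NumPy sketch; here `a` is passed in explicitly, whereas in a real layer it would be a learned parameter (often one per channel):

```python
import numpy as np

def prelu(z, a):
    # max(0, z) + a * min(0, z): like Leaky ReLU, but the negative slope a is learnable
    return np.maximum(0.0, z) + a * np.minimum(0.0, z)
```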
Practical Tips
- Use ReLU for most hidden layers to speed up training.
- Try Swish or Mish for deeper networks where smoothness aids gradients.
- Plot functions to understand behavior around zero and in saturation regions (see the sketch after this list).
- Switch activation if you observe dead neurons or slow convergence.
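A quick plotting sketch, assuming NumPy and Matplotlib; the input range and the subset of functions shown are arbitrary choices:

```python
import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-5.0, 5.0, 200)
curves = {
    "sigmoid": 1.0 / (1.0 + np.exp(-z)),
    "tanh": np.tanh(z),
    "ReLU": np.maximum(0.0, z),
    "swish": z / (1.0 + np.exp(-z)),
}
for name, y in curves.items():
    plt.plot(z, y, label=name)  # saturation shows up as flat tails on sigmoid/tanh
plt.axvline(0.0, color="gray", linewidth=0.8)  # highlight behavior around zero
plt.legend()
plt.title("Common activation functions")
plt.show()
```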