Feature scaling is important in machine learning because many algorithms perform better or converge faster when features are on the same scale. Here’s why:
🔍 1. Algorithms Rely on Distance (e.g., KNN, SVM, K-Means)
- These algorithms use Euclidean distance or dot products to compute similarity.
- If features have different scales (e.g., height in cm vs income in lakhs), one can dominate the calculation.
- Example:
- Feature A (0–1000)
- Feature B (0–1)
→ Feature A will dominate the distance calculation unless both features are scaled (see the sketch after this list).
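For instance, this toy NumPy/scikit-learn sketch (the two samples and their feature ranges are made up for illustration) shows the Euclidean distance being driven almost entirely by the large-scale feature until both features are min-max scaled:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two samples: [feature_A (0-1000 range), feature_B (0-1 range)]
a = np.array([[900.0, 0.2]])
b = np.array([[100.0, 0.9]])

# Unscaled: the distance is driven almost entirely by feature A
print(np.linalg.norm(a - b))        # ~800.0

# Rescale both features to [0, 1] using the combined data
scaler = MinMaxScaler().fit(np.vstack([a, b]))
a_s, b_s = scaler.transform(a), scaler.transform(b)

# Scaled: both features now contribute comparably to the distance
print(np.linalg.norm(a_s - b_s))    # ~1.41
```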
📉 2. Gradient Descent Convergence (used in Linear Regression, Neural Nets)
- Gradient descent optimizes parameters iteratively.
- If features are on different scales, the contours of the cost function become elongated, and gradient descent zig-zags, taking many more steps to converge.
- Feature scaling makes the cost surface more symmetric (closer to circular contours), leading to faster and more stable convergence, as in the sketch below.
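As a rough illustration, the hand-rolled gradient descent below (synthetic data and arbitrary learning rates, chosen only to make the contrast visible) reaches a much lower error once the features are standardized, because the unscaled version needs a tiny learning rate just to stay stable:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0, 1000, n)    # large-scale feature (0-1000)
x2 = rng.uniform(0, 1, n)       # small-scale feature (0-1)
y = 0.003 * x1 + 2.0 * x2 + rng.normal(0, 0.1, n)

def gradient_descent(X, y, lr, steps=500):
    """Plain batch gradient descent on mean squared error, with a bias term."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        err = X @ w + b - y
        w -= lr * 2 * X.T @ err / len(y)
        b -= lr * 2 * err.mean()
    return np.mean((X @ w + b - y) ** 2)

X_raw = np.column_stack([x1, x2])
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Unscaled: the learning rate must be tiny so the large-scale feature does
# not diverge, which leaves the small-scale feature barely updated.
print("raw MSE:", gradient_descent(X_raw, y, lr=1e-7))

# Standardized: a much larger learning rate is stable and both weights
# converge, so the error drops close to the noise level.
print("std MSE:", gradient_descent(X_std, y, lr=1e-1))
```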
📈 3. Improves Model Performance
- Some models work best when features are centered around 0 with unit variance: PCA picks components by variance, so an unscaled large-range feature dominates the result, and regularized models such as Logistic Regression penalize coefficients on the feature's raw scale.
- Scaling improves numerical stability and ensures the penalty in regularized models like Ridge/Lasso treats all coefficients fairly, as in the PCA sketch below.
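A small sketch of the PCA case (synthetic data with one large-scale and one small-scale feature): without scaling, the first component explains nearly all the variance purely because of units:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(0, 100, 500),    # feature measured on a large scale
    rng.normal(0, 1, 500),      # feature measured on a small scale
])

# Without scaling, PCA essentially just picks the large-scale feature
print(PCA(n_components=2).fit(X).explained_variance_ratio_)         # ~[1.0, 0.0]

# After standardization, both features contribute to the components
X_scaled = StandardScaler().fit_transform(X)
print(PCA(n_components=2).fit(X_scaled).explained_variance_ratio_)  # ~[0.5, 0.5]
```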
✅ 4. Makes Model Weights Interpretable
- In linear models, unscaled features can lead to misleading interpretations of weights.
- Scaling puts coefficients on a common per-standard-deviation footing, so you can compare them to see which feature contributes more (see the sketch below).
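For example, in the illustrative regression below (synthetic height/income data, with coefficients chosen so both features matter about equally per standard deviation), the raw coefficients look lopsided while the standardized ones are directly comparable:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
height_cm = rng.normal(170, 10, 300)       # spread of ~10 cm
income_lakhs = rng.normal(8, 3, 300)       # spread of ~3 lakhs
# Both features move the target by about one unit per standard deviation
y = 0.1 * height_cm + (1 / 3) * income_lakhs + rng.normal(0, 0.5, 300)

X = np.column_stack([height_cm, income_lakhs])

# Raw coefficients: income looks about 3x more important, an artifact of units
print(LinearRegression().fit(X, y).coef_)          # ~[0.10, 0.33]

# Standardized coefficients: per standard deviation, both contribute equally
X_std = StandardScaler().fit_transform(X)
print(LinearRegression().fit(X_std, y).coef_)      # ~[1.0, 1.0]
```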
Feature scaling helps ensure that all features contribute comparably to the model and speeds up training. The two most common rescaling techniques are:
- Normalization (min-max scaling)
  - X_scaled = (X - X_min) / (X_max - X_min)
- Standardization (z-score)
  - X_scaled = (X - mean) / standard_deviation
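A minimal sketch of both formulas, computed by hand with NumPy and checked against the equivalent scikit-learn transformers:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [5.0], [10.0], [20.0]])

# Normalization (min-max): (X - X_min) / (X_max - X_min), result in [0, 1]
x_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization (z-score): (X - mean) / standard_deviation
x_std = (X - X.mean(axis=0)) / X.std(axis=0)

# The library transformers produce the same results
assert np.allclose(x_norm, MinMaxScaler().fit_transform(X))
assert np.allclose(x_std, StandardScaler().fit_transform(X))
```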