Many machine-learning algorithms need to have features on the same scale.
There are different types of feature scaling methods that we can choose from in various scenarios. They have different (technical) names. The term feature scaling simply refers to any of those methods.
1. Feature scaling in different scenarios
a. Feature scaling in PCA
b. Feature scaling in k-means
c. Feature scaling in KNN and SVM
d. Feature scaling in linear models
e. Feature scaling in neural networks
f. Feature scaling in the convergence
g. Feature scaling in tree-based algorithms
h. Feature scaling in LDA
2. Feature scaling methods
3. Feature scaling and distribution of data
4. Data leakage when feature scaling
5. Summary of feature scaling methods
- Feature scaling in PCA: In principal component analysis (PCA), the components are highly sensitive to the relative ranges of the original features when those features are not measured on the same scale. PCA chooses the components that maximize the variance of the data. If some features have much larger ranges than others, their variance is larger simply because of their units, and those features tend to dominate the PCA process. In this case, the components may not capture the true variance structure of the data. To avoid this, we generally perform feature scaling before PCA. However, there are two exceptions. First, if there is no significant difference in scale between the features, for example, one feature ranges between 0 and 1 and another ranges between 0 and 1.2, we do not need to perform feature scaling, although there is no harm if we do. Second, if you perform PCA by decomposing the correlation matrix instead of the covariance matrix, you do not need to do feature scaling even though the features are not measured on the same…
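
The effect of scale on PCA can be checked with a small experiment. The following is a minimal sketch, not a production implementation: it assumes two synthetic, correlated features (the names `small`, `large`, and `shared`, and the helpers `cov2`, `first_component`, and `standardize`, are hypothetical and chosen only for illustration). It uses the closed-form eigen-decomposition of a 2x2 symmetric matrix, so it needs only the Python standard library. In practice you would more likely use a library such as scikit-learn.

```python
import math
import random


def cov2(x, y):
    """Sample variances and covariance of two equal-length feature lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / (n - 1)
    var_y = sum((b - my) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return var_x, cov_xy, var_y


def first_component(var_x, cov_xy, var_y):
    """Leading eigenvector and explained-variance ratio of the symmetric
    2x2 matrix [[var_x, cov_xy], [cov_xy, var_y]] (assumes cov_xy != 0)."""
    trace = var_x + var_y
    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    top = (trace + math.hypot(var_x - var_y, 2 * cov_xy)) / 2
    vx, vy = cov_xy, top - var_x  # solves (A - top * I) v = 0
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), top / trace


def standardize(values):
    """Rescale to zero mean and unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]


# Synthetic data (an assumption for illustration): both features share a
# common signal, but `large` is measured in units 1000 times bigger.
random.seed(0)
shared = [random.gauss(0, 1) for _ in range(200)]
small = [s + random.gauss(0, 1) for s in shared]
large = [1000 * (s + random.gauss(0, 1)) for s in shared]

# PCA on the covariance matrix of the raw features.
raw_vec, raw_ratio = first_component(*cov2(small, large))

# PCA on the covariance matrix of the standardized features.
std_vec, std_ratio = first_component(
    *cov2(standardize(small), standardize(large))
)

# PCA on the correlation matrix of the raw features:
# ones on the diagonal, the correlation coefficient r off the diagonal.
var_s, cov_sl, var_l = cov2(small, large)
r = cov_sl / math.sqrt(var_s * var_l)
corr_vec, corr_ratio = first_component(1.0, r, 1.0)

print("raw covariance PCA:         ", raw_vec, raw_ratio)
print("standardized covariance PCA:", std_vec, std_ratio)
print("correlation-matrix PCA:     ", corr_vec, corr_ratio)
```

On the raw data, the first component should point almost entirely along `large`, because its variance is inflated by its units. After standardization, both features contribute with equal weight, and decomposing the correlation matrix of the raw data gives the same result as decomposing the covariance matrix of the standardized data.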