KNN (K-Nearest Neighbors) Algorithm
What is KNN?
KNN (K-Nearest Neighbors) is a simple, instance-based machine learning algorithm. For classification, it predicts the class of a new point based on the majority class among its K nearest neighbors in the training data.
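A minimal sketch of this prediction rule using scikit-learn's KNeighborsClassifier (the toy 2-D points and K = 3 are illustrative assumptions):

```python
# Minimal KNN classification sketch with scikit-learn (toy data made up for illustration).
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]]
y_train = [0, 0, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)  # K = 3
knn.fit(X_train, y_train)                  # "fitting" just stores the training data

print(knn.predict([[1.2, 1.9]]))  # -> [0]: the majority class among the 3 nearest neighbors
```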
How It Works
- Choose K: Select the number of neighbors to consider
- Calculate Distance: Compute distances from the new point to all training points
- Find Neighbors: Identify the K closest points
- Vote: Classification assigns the majority class; regression averages values
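These steps map almost directly onto code. A from-scratch sketch with NumPy, using Euclidean distance and a majority vote for classification (the data and K are made up for illustration):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # 1. Calculate distance: Euclidean distance from the new point to every training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # 2. Find neighbors: indices of the K closest training points
    nearest = np.argsort(dists)[:k]
    # 3. Vote: return the majority class among the K neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.2, 1.9]), k=3))  # -> 0
```

For regression, the final step would average the neighbors' target values instead of voting.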
Distance Metrics
- Euclidean: Straight-line distance (most common)
- Manhattan: Sum of absolute differences
- Minkowski: Generalization of Euclidean and Manhattan
- Cosine: Angle-based similarity
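For concreteness, each metric can be computed directly with NumPy; the vectors a and b and the choice p = 3 below are illustrative:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

euclidean = np.linalg.norm(a - b)                  # sqrt(sum((a_i - b_i)^2))
manhattan = np.sum(np.abs(a - b))                  # sum(|a_i - b_i|)
p = 3
minkowski = np.sum(np.abs(a - b) ** p) ** (1 / p)  # p=1 -> Manhattan, p=2 -> Euclidean
cosine_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
cosine_dist = 1 - cosine_sim                       # angle-based dissimilarity
```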
Choosing K
- Small K (1-3): More sensitive to noise, complex boundaries
- Large K (10+): Smoother boundaries, may miss local patterns
- Odd K: Avoids ties in binary classification
- Rule of Thumb: Start with K ≈ √n, where n is the training set size
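In practice, K is often chosen by cross-validation: score a range of candidate values and keep the best. A sketch with scikit-learn (the Iris dataset and the odd-K range 1–19 are illustrative choices):

```python
# Sketch: pick K by cross-validated accuracy over a range of candidates.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
n = len(X)  # 150, so sqrt(n) is roughly 12, a reasonable starting point

scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in range(1, 21, 2)  # odd K avoids ties in binary problems
}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```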
Advantages
- Simple: Easy to understand and implement
- No Training Phase: A lazy learner; it simply stores the training data
- Adaptive: Naturally handles multi-class problems
- Non-parametric: No assumptions about data distribution
Disadvantages
- Slow Prediction: Must compute distances to all training points at query time
- Memory Intensive: Stores the entire training set
- Sensitive to Scale: Large-scale features dominate distances, so feature normalization is required (see the sketch after this list)
- Curse of Dimensionality: Performance degrades with many features
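To illustrate the scale sensitivity, the sketch below compares KNN with and without StandardScaler on scikit-learn's Wine dataset, whose features span very different ranges (the dataset choice and K = 5 are assumptions; exact scores will vary with the split):

```python
# Sketch: KNN is distance-based, so features on large scales dominate unless normalized.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)  # feature magnitudes range from ~0.1 to ~1000
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

raw = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)).fit(X_tr, y_tr)

print("unscaled:", raw.score(X_te, y_te))     # large-scale features dominate distances
print("scaled:  ", scaled.score(X_te, y_te))  # typically noticeably higher
```

Using a Pipeline ensures the scaler is fit only on the training data and then applied consistently at prediction time.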
Related Terms
- Standard Scaling: Essential preprocessing for KNN
- Classification: Primary KNN application
- Distance: Core concept in KNN