KNN (K-Nearest Neighbors) Algorithm

What is KNN?

KNN (K-Nearest Neighbors) is a simple, instance-based machine learning algorithm used for both classification and regression. For classification, it predicts the class of a new point as the majority class among its K nearest neighbors in the training data.
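
As a quick illustration, here is a minimal classification sketch using scikit-learn (assuming it is installed); the toy data points are invented for the example:

    from sklearn.neighbors import KNeighborsClassifier

    # Toy 2-D training data: two features per point, two classes (0 and 1)
    X_train = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]]
    y_train = [0, 0, 1, 1]

    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, y_train)          # "fitting" just stores the data
    print(knn.predict([[1.2, 1.9]]))   # -> [0]; two of the three nearest points are class 0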

How It Works

  1. Choose K: Select the number of neighbors to consider
  2. Calculate Distance: Find distances from new point to all training points
  3. Find Neighbors: Identify the K closest points
  4. Vote: Classification assigns the majority class; regression averages the neighbors' values (a from-scratch sketch of these steps follows below)
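
The sketch below walks through these four steps in plain Python; helper names such as knn_predict are illustrative, not a standard API:

    import math
    from collections import Counter

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def knn_predict(X_train, y_train, query, k=3):
        # Step 1: K is chosen by the caller (default 3 here)
        # Step 2: distance from the query to every training point
        dists = [(euclidean(x, query), label) for x, label in zip(X_train, y_train)]
        # Step 3: keep the K closest points
        neighbors = sorted(dists)[:k]
        # Step 4: majority vote among the neighbors' labels
        votes = Counter(label for _, label in neighbors)
        return votes.most_common(1)[0][0]

    X = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]]
    y = [0, 0, 1, 1]
    print(knn_predict(X, y, [1.2, 1.9], k=3))  # -> 0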

Distance Metrics

  • Euclidean: Straight-line distance (most common)
  • Manhattan: Sum of absolute differences
  • Minkowski: Generalization of Euclidean and Manhattan (order p; p = 1 gives Manhattan, p = 2 gives Euclidean)
  • Cosine: Angle-based similarity, typically turned into a distance as 1 - cosine similarity
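
The sketches below show these metrics in plain Python (function names are illustrative; both vectors are assumed to have the same length):

    import math

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def manhattan(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    def minkowski(a, b, p=2):
        # p = 1 reduces to Manhattan, p = 2 to Euclidean
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

    def cosine_distance(a, b):
        # 1 - cosine similarity; assumes neither vector is all zeros
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return 1 - dot / norm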

Choosing K

  • Small K (1-3): More sensitive to noise, complex boundaries
  • Large K (10+): Smoother boundaries, may miss local patterns
  • Odd K: Avoids ties in binary classification
  • Rule of Thumb: Start with K ≈ √n, where n is the training set size
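
In practice, K is usually tuned by cross-validation rather than fixed in advance. A sketch with scikit-learn (assuming it is installed; the Iris dataset is just a convenient stand-in):

    import math
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    print("sqrt(n) starting point:", round(math.sqrt(len(X))))  # n = 150 -> ~12

    # 5-fold cross-validated accuracy for a range of odd K values
    for k in (1, 3, 5, 7, 9, 11, 13):
        scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
        print(f"K={k}: mean accuracy {scores.mean():.3f}")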

Advantages

  • Simple: Easy to understand and implement
  • No Training Phase: Simply stores the data; all computation happens at prediction time
  • Flexible: Naturally handles multi-class problems without modification
  • Non-parametric: No assumptions about data distribution

Disadvantages

  • Slow Prediction: Naive search compares the query to every training point; KD-trees or ball trees can speed this up in low dimensions
  • Memory Intensive: Stores entire training set
  • Sensitive to Scale: Requires feature normalization (see the scaling sketch after this list)
  • Curse of Dimensionality: Performance degrades with many features
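
To address the scale sensitivity noted above, a common pattern is to standardize features before computing distances. A sketch using scikit-learn's StandardScaler in a pipeline (assuming scikit-learn is available):

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Without scaling, a feature measured in the thousands (e.g., income) would
    # dominate the distance over one in single digits (e.g., years of experience).
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
    # model.fit(X_train, y_train); model.predict(X_test)  # once data is available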

Related Terms

  • Standard Scaling: Essential preprocessing for KNN
  • Classification: Primary KNN application
  • Distance: Core concept in KNN