Logistic Regression Algorithm

What is Logistic Regression?

Logistic Regression is a statistical model used for binary classification problems. Despite the word "regression" in its name, it is used as a classification algorithm: it estimates the probability that an observation belongs to one of two classes, and that probability is then turned into a class label.

How It Works

  1. Linear Combination: Calculate the weighted sum of the features plus an intercept
  2. Sigmoid Function: Map that sum to a probability between 0 and 1
  3. Threshold: Assign a class by comparing the probability to a cutoff (typically 0.5)
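
The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the weights, intercept, and sample values are hypothetical, and the function names are chosen only for this example.

```python
import math

def sigmoid(z):
    """Map any real number to a probability, avoiding overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(features, weights, intercept):
    """Steps 1 and 2: weighted sum of the features, then the sigmoid."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict(features, weights, intercept, threshold=0.5):
    """Step 3: assign class 1 when the probability reaches the threshold."""
    return 1 if predict_proba(features, weights, intercept) >= threshold else 0

# Hypothetical coefficients and one hypothetical sample (illustration only).
weights = [0.5, 0.25]
intercept = -1.0
sample = [2.0, 4.0]

p = predict_proba(sample, weights, intercept)  # z = -1 + 0.5*2 + 0.25*4 = 1
label = predict(sample, weights, intercept)
print(round(p, 3), label)  # prints: 0.731 1
```

In practice the weights and intercept are learned from labeled data; here they are fixed by hand so that the arithmetic is easy to follow.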

Formula

P(Y=1) = 1 / (1 + e^(-z))
where z = b0 + b1*x1 + b2*x2 + ... + bn*xn
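
Rearranging the formula shows why the model is called linear: the log of the odds, not the probability itself, is a linear function of the features. The derivation below uses the same symbols as the formula above.

```latex
\begin{aligned}
P(Y=1) &= \frac{1}{1 + e^{-z}} \\
1 - P(Y=1) &= \frac{e^{-z}}{1 + e^{-z}} \\
\frac{P(Y=1)}{1 - P(Y=1)} &= e^{z} \\
\log\frac{P(Y=1)}{1 - P(Y=1)} &= z = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n
\end{aligned}
```

A one-unit increase in x1 therefore adds b1 to the log-odds, which is the same as multiplying the odds by e^(b1).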

Advantages

  • Simple and Fast: Quick to train and predict
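  • Interpretable: Each coefficient gives the change in the log-odds per unit increase in a feature; coefficients are comparable as importance measures only when features are on the same scale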
  • Probability Output: Provides confidence in predictions
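  • Few Hyperparameters: Tuning is usually limited to the regularization type and strength
  • Works with Small Data: Can be fit on modest datasets, provided there are enough examples of each class relative to the number of features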
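
As an illustration of the interpretability point, coefficients are often reported as odds ratios by exponentiating them. The sketch below assumes a model has already been fitted; the feature names and coefficient values are hypothetical and do not come from any real dataset.

```python
import math

# Hypothetical fitted coefficients (log-odds per unit increase), illustration only.
coefficients = {"age_decades": 0.4, "biomarker_level": -0.7}

# exp(b) is the factor by which the odds of class 1 are multiplied
# when the feature increases by one unit and the others are held fixed.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, ratio in odds_ratios.items():
    direction = "raises" if ratio > 1 else "lowers"
    print(f"{name}: odds ratio = {ratio:.2f} ({direction} the odds)")
```

An odds ratio above 1 means the feature pushes predictions toward class 1, and a ratio below 1 pushes them toward class 0.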

Disadvantages

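  • Linear Boundaries: The decision boundary is linear in the input features, so nonlinear relationships are missed unless transformed or interaction features are added
  • Binary Only: Standard form only handles two classes; more classes require a multinomial or one-vs-rest extension
  • Feature Engineering: Requires careful feature preparation, such as scaling, encoding categorical variables, and handling strongly correlated features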
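
The linear-boundary limitation and the role of feature engineering can be seen on XOR-style data, where no straight line through the two raw inputs separates the classes. The sketch below adds a hypothetical interaction feature x1*x2; the weights are picked by hand for illustration, not fitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def classify(features, weights, intercept, threshold=0.5):
    """Weighted sum, sigmoid, then threshold."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1 if sigmoid(z) >= threshold else 0

# XOR-style data: class 1 when exactly one of the two inputs is 1.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

# Hand-picked (not fitted) weights on the engineered features [x1, x2, x1*x2].
weights = [1.0, 1.0, -2.0]
intercept = -0.5

predictions = [
    classify([x1, x2, x1 * x2], weights, intercept) for x1, x2 in samples
]
print(predictions == labels)  # prints: True
```

The model is still linear in its inputs; the added interaction term is what lets it represent the nonlinear pattern.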

When to Use

  • Binary classification problems
  • When interpretability is important
  • As a baseline model for comparison
  • When you need probability estimates

Applications in CMMI-DCC

  • Disease Classification: Disease vs. healthy
  • Treatment Response: Responder vs. non-responder
  • Feature Importance: Understanding which features predict outcomes

Related Terms

  • Classification: Type of ML task
  • Random Forest: More complex, nonlinear alternative
  • ROC-AUC: Evaluation metric for classifiers
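
Because logistic regression outputs probabilities, ROC-AUC is a natural way to evaluate it: it is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; the function name and the toy labels and scores are made up for illustration, and the pairwise loop is only practical for small datasets.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one sample from each class")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))

# Toy data (illustration only): true classes and predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # prints: 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.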