Derivation

$$
P(y=1 \mid x) = p \quad \text{(probability of success)}\\
P(y=0 \mid x) = 1 - p \quad \text{(probability of failure)}\\[6pt]
\text{Odds} = \frac{p}{1-p}\\[6pt]
\text{log-odds} = \log\!\left(\frac{p}{1-p}\right) = w^T x + b
$$
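As a small numeric illustration of the odds transform (the value of $p$ here is made up):

```python
import numpy as np

# Probability of success p, its odds, and its log-odds (the logit)
p = 0.8
odds = p / (1 - p)       # 0.8 / 0.2 = 4.0: success is 4x as likely as failure
log_odds = np.log(odds)  # logit(p); this is the quantity modeled as w^T x + b
```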

Transforming back into probabilities using the sigmoid function:

$$ p = \frac{1}{1 + e^{-(w^T x + b)}}  $$
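A minimal sketch of this mapping in numpy (the weights, bias, and feature vector are illustrative, not fitted):

```python
import numpy as np

def sigmoid(z):
    """Map log-odds z = w^T x + b to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and one feature vector
w = np.array([0.5, -1.2])
b = 0.1
x = np.array([2.0, 1.0])

p = sigmoid(w @ x + b)  # P(y = 1 | x)
```

Note that `sigmoid(0)` is exactly 0.5: a log-odds of zero means success and failure are equally likely.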

The goal is to maximize the log-likelihood (or, equivalently, minimize the negative log-likelihood) to find w and b. The likelihood function represents the probability of observing the data:

$$     L(w, b) = \prod_{i=1}^n P(y_i|x_i)  $$

Substituting the sigmoid, with $p_i = P(y_i = 1 \mid x_i)$:

$$     L(w, b) = \prod_{i=1}^n p_i^{y_i} (1-p_i)^{1-y_i}  $$

Taking the logarithm turns the product into a sum, giving the log-likelihood:

$$ \ell(w, b) = \sum_{i=1}^n \left[ y_i \log(p_i) + (1-y_i) \log(1-p_i) \right]  $$
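The maximization can be sketched with batch gradient descent on the negative log-likelihood (a minimal numpy sketch on synthetic, linearly separable data; the true weights and learning rate are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data generated from a known linear rule (assumed for illustration)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -2.0]) + 0.3 > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_likelihood(w, b, X, y, eps=1e-12):
    p = sigmoid(X @ w + b)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Gradient descent: the gradient of -l(w, b) has the simple form X^T (p - y)
w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= lr * (X.T @ (p - y)) / len(y)   # d(-l)/dw
    b -= lr * np.sum(p - y) / len(y)     # d(-l)/db
```

The compact gradient `X.T @ (p - y)` follows from differentiating the log-likelihood above with the sigmoid's identity $\sigma'(z) = \sigma(z)(1-\sigma(z))$.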

Assumptions

  1. Independent Observations: each sample $(x_i, y_i)$ is drawn independently of the others.
  2. No Multicollinearity: predictor variables are not highly correlated with one another.
  3. Predictor Variables Are Correctly Specified: the log-odds are a linear function of the predictors.
  4. Large Sample Size: maximum-likelihood estimates are only reliable with enough observations per predictor.
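The no-multicollinearity assumption can be spot-checked by inspecting pairwise feature correlations (a minimal numpy sketch; the features and the 0.9 threshold are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=100)  # nearly collinear with x1
x3 = rng.normal(size=100)                     # independent of both

X = np.column_stack([x1, x2, x3])
corr = np.corrcoef(X, rowvar=False)

# Flag feature pairs whose absolute correlation exceeds a threshold
threshold = 0.9
high = [(i, j) for i in range(3) for j in range(i + 1, 3)
        if abs(corr[i, j]) > threshold]
```

Here only the `(x1, x2)` pair would be flagged; in practice, one of a flagged pair is dropped or the features are combined.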

Interpretability: High. Each weight $w_j$ shifts the log-odds by $w_j$ per unit increase in feature $j$, so $e^{w_j}$ is directly interpretable as an odds ratio.