Synthetic Controls

Why Synthetic Controls

When we looked at DiD, we had data on units from 2 different cities and 2 different time periods (before & after intervention). What if we only have aggregated data on the city level?

We could still use DiD, but the sample size would then be the number of cities rather than the number of individual units.
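Here is a rough illustration of the city-level case. It computes the 2x2 DiD estimate from four aggregated numbers, one per city and period. The city labels and cash-trip shares are made up. The point is that each city-period contributes a single observation.

```python
# Hypothetical city-level aggregates: share of trips paid in cash (made-up numbers).
city_means = {
    ("treated", "pre"): 0.60,
    ("treated", "post"): 0.52,
    ("control", "pre"): 0.58,
    ("control", "post"): 0.55,
}


def did_estimate(means):
    """Difference-in-differences: change in the treated city minus change in the control city."""
    treated_change = means[("treated", "post")] - means[("treated", "pre")]
    control_change = means[("control", "post")] - means[("control", "pre")]
    return treated_change - control_change


effect = did_estimate(city_means)
print(f"DiD estimate: {effect:+.2f}")  # DiD estimate: -0.05
```

With one treated and one control city there are only four data points, so no residual variation is left to estimate a standard error.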

With Synthetic Controls, we don’t need to find a single untreated unit that matches the treated one. Instead, we can construct our own control unit as a weighted average of multiple untreated units that together best mirror the treated unit’s characteristics.

Example

In some of Uber’s markets, credit cards are used far less than cash. Uber charges drivers a service fee of roughly 25% of their earnings, so drivers who are paid in cash have to send Uber its share of those earnings afterwards, which is a hassle for them. On the other hand, some drivers may prefer cash over credit card payments.

Uber considered showing drivers the payment type before they accept a trip.
However, if treated drivers accept more (or fewer) cash trips, this changes the proportion of cash trips left for control drivers in the same city (indirect spillover), so a driver-level randomized experiment would be biased.
Solution: treat a whole city and use synthetic controls to construct the counterfactual scenario.

Implementation

Donor Pool
- Treat one city and use matching to find n untreated cities (donor pool) with similar statistics (credit card usage, population, age, ride volume)
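The matching step can be sketched as a nearest-neighbour search on standardized city features. This sketch assumes z-scored features and Euclidean distance, since the notes don't name a matching method. The candidate cities (A to E) and their statistics are hypothetical numbers chosen for illustration.

```python
import numpy as np

# Hypothetical pre-treatment city statistics (made-up numbers). Columns:
# credit card usage share, population (millions), median age, weekly rides (thousands)
cities = ["treated", "A", "B", "C", "D", "E"]
features = np.array([
    [0.30, 2.0, 31.0, 120.0],  # treated city
    [0.32, 2.2, 30.0, 115.0],  # A
    [0.70, 8.0, 38.0, 600.0],  # B
    [0.28, 1.8, 32.0, 130.0],  # C
    [0.55, 5.0, 35.0, 300.0],  # D
    [0.35, 2.5, 29.0, 140.0],  # E
])


def select_donor_pool(features, treated_idx, n):
    """Return row indices of the n untreated cities closest to the treated city."""
    # Standardize each column so large-scale features (e.g. rides) don't dominate the distance
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    dist = np.linalg.norm(z - z[treated_idx], axis=1)
    dist[treated_idx] = np.inf  # never select the treated city itself
    return [int(i) for i in np.argsort(dist)[:n]]


donors = select_donor_pool(features, treated_idx=0, n=3)
print([cities[i] for i in donors])
```

Weighting all features equally in the distance is a simplification. Features that market research flags as more relevant could be up-weighted before taking the norm.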
Weight Each Unit
- not all donor (untreated) units are equally important
- use pre-treatment data to find the weight vector W:
  - choose features X that can be used to predict outcome Y and find their importances (V)
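  - build a model to predict each unit’s pre-treatment outcome, then optimize the weights (non-negative and summing to 1) to minimize the V-weighted distance between the treated unit’s features X_1 and the weighted combination of donor features X_0W: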
    
    $$ \lVert X_1 - X_0 W \rVert_V = \sqrt{ \sum_{h=1}^{k} v_h \Big( X_{h1} - \sum_{j=2}^{J+1} w_j X_{hj} \Big)^2 } $$
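The weight optimization can be sketched with projected gradient descent. This assumes the importances v_h are already fixed. In the full method, V is itself chosen, for example so that the synthetic control best reproduces the treated unit's pre-treatment outcomes. After each step, the weights are projected onto the probability simplex, which keeps them non-negative and summing to 1. A quadratic-programming solver would work equally well. The features and importances are made up and constructed so that the treated unit is exactly 0.6 times donor 1 plus 0.4 times donor 2.

```python
import numpy as np


def project_to_simplex(y):
    """Euclidean projection of y onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(y) + 1)
    rho = ks[u - (css - 1) / ks > 0][-1]
    theta = (css[rho - 1] - 1) / rho
    return np.maximum(y - theta, 0.0)


def weighted_distance(X1, X0, w, v):
    """||X1 - X0 w||_V = sqrt(sum_h v_h * (X_h1 - sum_j w_j X_hj)^2)."""
    diff = X1 - X0 @ w
    return float(np.sqrt(np.sum(v * diff ** 2)))


def fit_weights(X1, X0, v, n_iter=5000):
    """Minimize the squared V-weighted distance over the simplex by projected gradient descent."""
    V = np.diag(v)
    H = X0.T @ V @ X0                                # the loss has Hessian 2 * H
    step = 1.0 / (2 * np.linalg.eigvalsh(H).max())   # 1 / Lipschitz constant of the gradient
    w = np.full(X0.shape[1], 1.0 / X0.shape[1])      # start from equal weights
    for _ in range(n_iter):
        grad = -2 * X0.T @ V @ (X1 - X0 @ w)
        w = project_to_simplex(w - step * grad)
    return w


# Rows = (standardized) features, columns = donor cities; values are made up.
X0 = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 2.0],
    [2.0, 1.0, 0.0],
    [1.0, 3.0, 0.0],
])
X1 = 0.6 * X0[:, 0] + 0.4 * X0[:, 1]   # treated city's features
v = np.array([0.4, 0.3, 0.2, 0.1])      # assumed feature importances v_h

w = fit_weights(X1, X0, v)
# Expected: weights close to [0.6, 0.4, 0.0] and a distance close to 0
print(np.round(w, 3), weighted_distance(X1, X0, w, v))
```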
After obtaining W, use the weighted average of the donor pool’s outcomes to project what the treated unit’s post-treatment outcome would have been if the treatment hadn’t occurred:

$$ \sum_{j=2}^{J+1} w_j Y_{jt} $$

This “Synthetic Control” is the best guess for the counterfactual reality that would have occurred in the treated city.
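Given the weights, the counterfactual for every period is a single matrix-vector product. The estimated effect is the gap between the treated city's observed outcome and its synthetic control. This sketch assumes illustrative weights (0.6, 0.4, 0.0), made-up outcome series, and T0 = 3 pre-treatment periods. The pre-treatment gap doubles as a fit check.

```python
import numpy as np

# Made-up outcome series (e.g. share of cash trips); rows = periods, columns = donor cities.
Y0 = np.array([
    [0.50, 0.70, 0.40],
    [0.52, 0.71, 0.42],
    [0.51, 0.69, 0.41],
    [0.53, 0.72, 0.43],
    [0.54, 0.73, 0.44],
])
y1 = np.array([0.580, 0.596, 0.582, 0.556, 0.566])  # treated city's observed outcomes
w = np.array([0.6, 0.4, 0.0])  # weights assumed to come from the pre-treatment fit
T0 = 3                          # periods 0..T0-1 are pre-treatment

synthetic = Y0 @ w              # sum_j w_j * Y_jt for every period t
gap = y1 - synthetic            # treated minus synthetic control

pre_rmspe = np.sqrt(np.mean(gap[:T0] ** 2))  # how closely the synthetic control tracks the pre-period
post_effect = gap[T0:].mean()                # average estimated effect after treatment
print(f"pre-treatment RMSPE: {pre_rmspe:.4f}, average post-treatment effect: {post_effect:+.3f}")
# pre-treatment RMSPE: 0.0000, average post-treatment effect: -0.050
```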
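Finally, run a hypothesis test to check whether the treatment effect (the post-treatment gap between the treated city’s actual outcome and its synthetic control) is significant; a common choice is a placebo test that reruns the procedure on each donor city as if it had been treated.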
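Here is a sketch of such a placebo (permutation) test, which rests on several assumptions:

- Weights are fitted on pre-treatment outcomes only, not on V-weighted features.
- Each placebo city's control is built from the other donors, with the actually treated city left out. Some implementations keep it in.
- The test statistic is the ratio of post-treatment to pre-treatment RMSPE.

The panel is simulated with a built-in drop of 0.10 in the treated city, so the numbers say nothing about the real experiment.

```python
import numpy as np


def project_to_simplex(y):
    """Euclidean projection of y onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(y) + 1)
    rho = ks[u - (css - 1) / ks > 0][-1]
    return np.maximum(y - (css[rho - 1] - 1) / rho, 0.0)


def synth_gap(Y, unit, pool, T0, n_iter=3000):
    """Gap between column `unit` and a synthetic control built from columns `pool`,
    with simplex weights fitted on the first T0 (pre-treatment) periods only."""
    A, b = Y[:T0, pool], Y[:T0, unit]
    step = 1.0 / (2 * np.linalg.eigvalsh(A.T @ A).max())
    w = np.full(len(pool), 1.0 / len(pool))
    for _ in range(n_iter):
        w = project_to_simplex(w + 2 * step * A.T @ (b - A @ w))
    return Y[:, unit] - Y[:, pool] @ w


def rmspe_ratio(gap, T0):
    """Post-treatment RMSPE divided by pre-treatment RMSPE."""
    return np.sqrt(np.mean(gap[T0:] ** 2)) / np.sqrt(np.mean(gap[:T0] ** 2))


# Simulated panel (made up): column 0 is the treated city, columns 1-6 are donors.
rng = np.random.default_rng(0)
T, T0 = 30, 20
alpha = np.array([0.50, 0.40, 0.45, 0.55, 0.60, 0.48, 0.52])  # city-level baselines
Y = alpha + np.linspace(0.0, 0.1, T)[:, None] + rng.normal(0.0, 0.005, (T, alpha.size))
Y[T0:, 0] -= 0.10  # simulated treatment effect in the treated city

donors = list(range(1, alpha.size))
ratios = [rmspe_ratio(synth_gap(Y, 0, donors, T0), T0)]
for j in donors:  # placebo: pretend donor j was treated; build its control from the other donors
    ratios.append(rmspe_ratio(synth_gap(Y, j, [d for d in donors if d != j], T0), T0))
ratios = np.array(ratios)

# p-value: share of cities whose ratio is at least as large as the treated city's
p_value = np.mean(ratios >= ratios[0])
print(np.round(ratios, 2), p_value)
```

With J donors, the smallest attainable p-value is 1/(J+1). A city-level placebo test therefore needs a reasonably large donor pool before it can detect anything.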

Considerations

Donor pool selection: we need good market research to choose the right features for matching in order to create a good counterfactual
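Overfitting: the synthetic control may mimic the treated unit’s pre-treatment noise too closely, so a good pre-treatment fit does not guarantee a reliable counterfactual
Data Leakage: if we peek at post-treatment data when choosing features or weights, we may end up finding false-positive treatment effects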
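One simple guard against both problems is to hold out the last few pre-treatment periods. Post-treatment rows are sliced off before any fitting, the weights are fitted on the earlier pre-treatment periods, and the fit is then checked on the held-out ones. A holdout RMSPE much larger than the fitting RMSPE suggests overfitting. The sketch assumes outcome-based weights fitted by projected gradient descent and a simulated panel. The post-treatment rows are filled with NaN so that any accidental use of them would surface as NaN in the results.

```python
import numpy as np


def project_to_simplex(y):
    """Euclidean projection of y onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(y) + 1)
    rho = ks[u - (css - 1) / ks > 0][-1]
    return np.maximum(y - (css[rho - 1] - 1) / rho, 0.0)


def fit_outcome_weights(y1, Y0, n_iter=3000):
    """Least-squares fit of y1 by a convex combination of the columns of Y0."""
    step = 1.0 / (2 * np.linalg.eigvalsh(Y0.T @ Y0).max())
    w = np.full(Y0.shape[1], 1.0 / Y0.shape[1])
    for _ in range(n_iter):
        w = project_to_simplex(w + 2 * step * Y0.T @ (y1 - Y0 @ w))
    return w


def rmspe(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def holdout_check(y1, Y0, T0, n_holdout):
    """Fit on pre-treatment periods [0, T0 - n_holdout), validate on [T0 - n_holdout, T0).
    Post-treatment rows (index >= T0) are sliced away before any computation."""
    y_pre, Y_pre = y1[:T0], Y0[:T0]
    split = T0 - n_holdout
    w = fit_outcome_weights(y_pre[:split], Y_pre[:split])
    return rmspe(y_pre[:split], Y_pre[:split] @ w), rmspe(y_pre[split:], Y_pre[split:] @ w)


# Simulated panel (made up): 8 donor cities; the treated city mixes three of them plus noise.
rng = np.random.default_rng(1)
T, T0 = 30, 20
Y0 = rng.uniform(0.4, 0.6, 8) + np.linspace(0.0, 0.1, T)[:, None] + rng.normal(0.0, 0.005, (T, 8))
y1 = Y0[:, :3] @ np.array([0.5, 0.3, 0.2]) + rng.normal(0.0, 0.005, T)
y1[T0:] = np.nan  # post-treatment data is off limits for this check
Y0[T0:] = np.nan

fit_err, val_err = holdout_check(y1, Y0, T0, n_holdout=5)
print(f"fit RMSPE: {fit_err:.4f}, holdout RMSPE: {val_err:.4f}")
```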