Why Synthetic Controls

With difference-in-differences (DiD), we had unit-level data from two cities across two time periods (before and after the intervention). What if we only have data aggregated at the city level?

With Synthetic Controls, we don’t need to find a single untreated unit that closely matches the treated one. Instead, we can construct our own comparison unit as a weighted average of multiple untreated units, with weights chosen so that the combination best mirrors the treated unit’s characteristics.

Example

Uber operates in markets where cash is used far more widely than credit cards. Uber charges drivers ~25% of their earnings as a service fee, so drivers who are paid in cash need to wire the fee on those earnings to Uber, which is a hassle for them. On the other hand, drivers may prefer cash over credit payments.

Implementation

  1. Donor Pool: select the untreated units that are allowed to contribute to the synthetic control

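  2. Weight Each Unit: assign each donor unit j a weight w_j (collected in the vector W) so that the weighted donors track the treated unit as closely as possible before the treatment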

  3. After obtaining W, use the weighted average of the donor pool to project the post-treatment outcome the treated unit would have had if the treatment had not occurred:

    $$ \sum_{j=2}^{J+1} w_j Y_{jt} $$

    This “Synthetic Control” is our best guess for the counterfactual: what would have happened in the treated city without the treatment.

  4. Hypothesis Test: check whether the estimated treatment effect is statistically distinguishable from zero
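
The four steps above can be sketched in Python. This is a minimal illustration on simulated data, not a definitive implementation. The notes do not say how W is obtained, so the sketch assumes the common formulation: non-negative weights that sum to one, chosen to minimise the squared pre-treatment gap between the treated unit and the weighted donors. For step 4 it assumes a placebo test, one common choice, which refits the model pretending each donor was treated. The names `fit_weights` and `rmse_ratio`, the data, and the true effect of 5 are all made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def fit_weights(Y0_pre, y1_pre):
    """Step 2: convex weights (w >= 0, sum to 1) minimising pre-treatment squared error.

    Y0_pre: (T_pre, J) donor outcomes, y1_pre: (T_pre,) treated-unit outcomes.
    """
    J = Y0_pre.shape[1]
    res = minimize(
        lambda w: np.sum((y1_pre - Y0_pre @ w) ** 2),
        x0=np.full(J, 1.0 / J),  # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * J,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return res.x


def rmse_ratio(Y0, y1, t_pre):
    """Post/pre RMSE of the gap between a unit and its synthetic control."""
    w = fit_weights(Y0[:t_pre], y1[:t_pre])
    gap = y1 - Y0 @ w
    return np.sqrt(np.mean(gap[t_pre:] ** 2)) / np.sqrt(np.mean(gap[:t_pre] ** 2))


# Step 1: simulated donor pool of J untreated cities (hypothetical data).
rng = np.random.default_rng(0)
T_PRE, T_POST, J = 20, 10, 8
TRUE_EFFECT = 5.0
t = np.arange(T_PRE + T_POST)
levels = rng.uniform(8, 12, J)
slopes = rng.uniform(0.2, 0.8, J)
Y0 = levels + np.outer(t, slopes) + rng.normal(0, 0.3, (T_PRE + T_POST, J))

# Treated city: a mix of two donors plus noise, with an effect added after treatment.
y1 = 0.6 * Y0[:, 0] + 0.4 * Y0[:, 1] + rng.normal(0, 0.1, T_PRE + T_POST)
y1[T_PRE:] += TRUE_EFFECT

# Step 2: fit the weights on pre-treatment periods only.
w = fit_weights(Y0[:T_PRE], y1[:T_PRE])

# Step 3: the weighted donor average is the synthetic control (the counterfactual).
synthetic = Y0 @ w
effect = np.mean(y1[T_PRE:] - synthetic[T_PRE:])

# Step 4: placebo test. Pretend each donor was treated, using the other donors
# as its pool, and see how extreme the treated unit's RMSE ratio is by comparison.
treated_ratio = rmse_ratio(Y0, y1, T_PRE)
placebo_ratios = [
    rmse_ratio(np.delete(Y0, j, axis=1), Y0[:, j], T_PRE) for j in range(J)
]
p_value = (1 + sum(r >= treated_ratio for r in placebo_ratios)) / (J + 1)

print("weights:", np.round(w, 2))
print("estimated effect:", round(float(effect), 2))
print("placebo p-value:", round(float(p_value), 3))
```

With only J donors, the smallest p-value this placebo test can produce is 1 / (J + 1), so a small donor pool limits how strong the evidence can be.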

Considerations

  1. Donor pool selection: we need good market research to choose the right features for matching in order to create a good counterfactual
  2. Overfitting: the synthetic control may mimic the treated unit too closely, fitting noise in the pre-treatment data rather than the underlying trend
  3. Data Leakage: if we peek at post-treatment data when selecting donors or fitting weights, we may end up finding false-positive treatment effects
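
One way to guard against the overfitting and leakage concerns above is to hold out the last part of the pre-treatment window: fit the weights on the earlier periods, then check the fit on the held-out periods, without touching any post-treatment data. The sketch below assumes simulated data and the same convex-weight formulation (non-negative weights summing to one). `fit_weights`, `rmse`, and the 20/10 split are hypothetical choices for illustration, not a prescribed procedure.

```python
import numpy as np
from scipy.optimize import minimize


def fit_weights(Y0, y1):
    """Convex weights (w >= 0, sum to 1) minimising squared error on the given rows."""
    J = Y0.shape[1]
    res = minimize(
        lambda w: np.sum((y1 - Y0 @ w) ** 2),
        x0=np.full(J, 1.0 / J),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * J,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return res.x


def rmse(a, b):
    """Root mean squared difference between two series."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Simulated PRE-treatment data only (hypothetical): 30 periods, 6 donor cities.
rng = np.random.default_rng(1)
T_PRE, J = 30, 6
t = np.arange(T_PRE)
Y0_pre = 10 + np.outer(t, rng.uniform(0.2, 0.8, J)) + rng.normal(0, 0.3, (T_PRE, J))
y1_pre = Y0_pre[:, :2] @ np.array([0.5, 0.5]) + rng.normal(0, 0.1, T_PRE)

# Fit on the first 20 pre-treatment periods, validate on the remaining 10.
split = 20
w = fit_weights(Y0_pre[:split], y1_pre[:split])
train_rmse = rmse(y1_pre[:split], Y0_pre[:split] @ w)
valid_rmse = rmse(y1_pre[split:], Y0_pre[split:] @ w)

print("train RMSE:", round(train_rmse, 3))
print("validation RMSE:", round(valid_rmse, 3))
```

A validation error much larger than the training error suggests the weights are fitting noise. Because both windows lie before the treatment, this check cannot be influenced by the treatment effect itself.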