Datasets
========

Built-in real-world datasets from published studies for examples, tutorials,
and testing.

All datasets are downloaded from public sources on first use and cached locally
at ``~/.cache/diff_diff/datasets/``. Pass ``force_download=True`` to any loader
to refresh the cache. If the download fails and a cached copy exists, the cached
version is used automatically.

Dataset Loaders
---------------

load_card_krueger
~~~~~~~~~~~~~~~~~

Card & Krueger (1994) minimum wage study. Classic 2x2 DiD comparing fast-food
employment in New Jersey (treated) and Pennsylvania (control) around NJ's 1992
minimum wage increase.

.. autofunction:: diff_diff.load_card_krueger

Example
^^^^^^^

.. code-block:: python

   from diff_diff.datasets import load_card_krueger
   from diff_diff import DifferenceInDifferences

   ck = load_card_krueger()

   # Reshape to long format for DiD estimation
   ck_long = ck.melt(
       id_vars=['store_id', 'state', 'treated'],
       value_vars=['emp_pre', 'emp_post'],
       var_name='period',
       value_name='employment'
   )
   ck_long['post'] = (ck_long['period'] == 'emp_post').astype(int)

   did = DifferenceInDifferences()
   results = did.fit(ck_long, outcome='employment', treatment='treated', time='post')

load_castle_doctrine
~~~~~~~~~~~~~~~~~~~~

Castle doctrine (Stand Your Ground) gun law study. Staggered adoption of
self-defense law expansions across U.S. states (2000--2010), suitable for
Callaway--Sant'Anna or Sun--Abraham estimation.

.. autofunction:: diff_diff.load_castle_doctrine

Example
^^^^^^^

.. code-block:: python

   from diff_diff.datasets import load_castle_doctrine
   from diff_diff import CallawaySantAnna

   castle = load_castle_doctrine()

   cs = CallawaySantAnna(control_group="never_treated")
   results = cs.fit(
       castle,
       outcome="homicide_rate",
       unit="state",
       time="year",
       first_treat="first_treat"
   )

load_divorce_laws
~~~~~~~~~~~~~~~~~

Unilateral (no-fault) divorce law reforms. Staggered adoption across U.S.
states (1968--1988) from Stevenson & Wolfers (2006), with outcomes for divorce
rate, female labor force participation, and female suicide rate.

.. autofunction:: diff_diff.load_divorce_laws

Example
^^^^^^^

.. code-block:: python

   from diff_diff.datasets import load_divorce_laws
   from diff_diff import CallawaySantAnna

   divorce = load_divorce_laws()

   cs = CallawaySantAnna(control_group="never_treated")
   results = cs.fit(
       divorce,
       outcome="divorce_rate",
       unit="state",
       time="year",
       first_treat="first_treat"
   )

load_mpdta
~~~~~~~~~~

Minimum wage panel data for training (Callaway & Sant'Anna 2021). Simulated
county-level employment data with staggered minimum wage increases
(2003--2007), from the R ``did`` package.

.. autofunction:: diff_diff.load_mpdta

Example
^^^^^^^

.. code-block:: python

   from diff_diff.datasets import load_mpdta
   from diff_diff import CallawaySantAnna

   mpdta = load_mpdta()

   cs = CallawaySantAnna()
   results = cs.fit(
       mpdta,
       outcome="lemp",
       unit="countyreal",
       time="year",
       first_treat="first_treat"
   )

Utility Functions
-----------------

load_dataset
~~~~~~~~~~~~

Generic loader that fetches a dataset by name.

.. autofunction:: diff_diff.load_dataset

list_datasets
~~~~~~~~~~~~~

List all available datasets with descriptions.

.. autofunction:: diff_diff.list_datasets

clear_cache
~~~~~~~~~~~

Remove all cached dataset files from ``~/.cache/diff_diff/datasets/``.

.. autofunction:: diff_diff.clear_cache

Listing and Loading Datasets
----------------------------

.. code-block:: python

   from diff_diff.datasets import list_datasets, load_dataset

   # See what's available
   for name, description in list_datasets().items():
       print(f"{name}: {description}")

   # Load by name
   df = load_dataset("card_krueger")
   print(df.shape)
   print(df.columns.tolist())
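
Cache Behavior Sketch
---------------------

The caching policy described at the top of this page (use the cache when
present, refresh on ``force_download=True``, fall back to the cache when a
download fails) can be illustrated with a small standalone sketch. This is not
the library's implementation: ``fetch_with_cache``, its ``download`` callable,
the ``.csv`` file naming, and the choice of ``OSError`` as the failure signal
are all assumptions made for illustration only.

.. code-block:: python

   import tempfile
   from pathlib import Path
   from typing import Callable


   def fetch_with_cache(
       name: str,
       download: Callable[[], bytes],
       cache_dir: Path,
       force_download: bool = False,
   ) -> bytes:
       """Hypothetical helper: return raw bytes for ``name`` using a file cache."""
       cache_file = cache_dir / f"{name}.csv"  # assumed naming scheme

       # Serve from the cache unless a refresh was requested.
       if cache_file.exists() and not force_download:
           return cache_file.read_bytes()

       try:
           data = download()
       except OSError:
           # Download failed: fall back to a cached copy if one exists.
           if cache_file.exists():
               return cache_file.read_bytes()
           raise

       cache_dir.mkdir(parents=True, exist_ok=True)
       cache_file.write_bytes(data)
       return data


   def failing_download() -> bytes:
       """Stand-in for a network call that cannot reach the server."""
       raise OSError("network unreachable")


   with tempfile.TemporaryDirectory() as tmp:
       cache = Path(tmp) / "datasets"

       # First use: nothing cached yet, so the downloader runs and the result is stored.
       first = fetch_with_cache("demo", lambda: b"a,b\n1,2\n", cache)

       # Warm cache: the downloader is never called.
       second = fetch_with_cache("demo", failing_download, cache)

       # Forced refresh that fails: the cached copy is returned instead.
       third = fetch_with_cache("demo", failing_download, cache, force_download=True)

       print(first == second == third)

Passing the downloader in as a callable keeps the sketch testable without
network access. A failed download with no cached copy re-raises the error
rather than returning empty data.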