r-econometrics
Run IV, DiD, and RDD analyses in R with proper diagnostics.
analysis · Other (see repo) · data-analysis
R Econometrics
Purpose
This skill helps economists run rigorous econometric analyses in R, including Instrumental Variables (IV), Difference-in-Differences (DiD), and Regression Discontinuity Design (RDD). It generates publication-ready code with proper diagnostics and robust standard errors.
When to Use
- Running causal inference analyses
- Estimating treatment effects with panel data
- Creating publication-ready regression tables
- Implementing modern econometric methods (two-way fixed effects, event studies)
Instructions
Step 1: Understand the Research Design
Before generating code, ask the user:
1. What is your identification strategy? (IV, DiD, RDD, or simple regression)
2. What is the unit of observation? (individual, firm, country-year, etc.)
3. What fixed effects do you need? (entity, time, two-way)
4. How should standard errors be clustered?
Step 2: Generate Analysis Code
Based on the research design, generate R code that:
- Uses the `fixest` package: modern, fast, and feature-rich for panel data
- Includes proper diagnostics:
  - For IV: first-stage F-statistics, weak instrument tests
  - For DiD: parallel trends visualization, event study plots
  - For RDD: bandwidth selection, density tests
- Uses robust/clustered standard errors appropriate for the data structure
- Creates publication-ready output using `modelsummary` or `etable`
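As an illustration of the IV diagnostics listed above, here is a minimal sketch on simulated data. The data frame `sim`, its column names, and the true effect of 0.5 are all made up for the example; only `feols()`, `summary()`, and `fitstat()` come from `fixest`.

```r
library(fixest)

# Simulated data (hypothetical): x is endogenous because it shares the
# unobserved confounder u with the outcome y; z is a valid instrument.
set.seed(123)
n <- 5000
sim <- data.frame(
  g = sample(1:50, n, replace = TRUE),  # group id, used as fixed effect and cluster
  z = rnorm(n),                         # instrument
  w = rnorm(n),                         # exogenous control
  u = rnorm(n)                          # unobserved confounder
)
sim$x <- 0.8 * sim$z + 0.3 * sim$w + sim$u + rnorm(n)  # endogenous regressor
sim$y <- 0.5 * sim$x + 0.2 * sim$w + sim$u + rnorm(n)  # true effect of x is 0.5

# Formula layout: outcome ~ controls | fixed effects | endogenous ~ instrument
iv_model <- feols(y ~ w | g | x ~ z, data = sim, cluster = ~g)

summary(iv_model, stage = 1)   # first-stage regression
fitstat(iv_model, ~ ivf + wh)  # first-stage F-test and Wu-Hausman endogeneity test
summary(iv_model)              # second stage; the coefficient on x is named fit_x
```

With real data, replace `sim` and the variable names with your own, and choose the clustering level to match how the instrument or treatment varies.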
Step 3: Structure the Output
Always include:
## 1. Setup and packages
## 2. Data loading and preparation
## 3. Descriptive statistics
## 4. Main specification
## 5. Robustness checks
## 6. Visualization
## 7. Export results
Step 4: Add Documentation
Include comments explaining:
- Why each specification choice was made
- Interpretation of key coefficients
- Limitations and assumptions
Example Prompts
- "Run a DiD analysis with state and year fixed effects, clustering at the state level"
- "Estimate the effect of X on Y using Z as an instrument"
- "Create an event study plot showing treatment effects by year"
- "Run a sharp RDD with optimal bandwidth selection"
Example Output
## ============================================
## Difference-in-Differences Analysis
## ============================================
## Setup
library(tidyverse)
library(fixest)
library(modelsummary)
## Load data
df <- read_csv("data.csv")
## Prepare treatment variable
## NOTE: treatment_year (a single adoption year) and treatment_states
## (a vector of treated states) must be defined before this step.
df <- df %>%
  mutate(
    post = as.integer(year >= treatment_year),
    # Numeric 0/1 so that i() below treats it as a continuous interaction
    treated = as.integer(state %in% treatment_states),
    treat_post = treated * post
  )
## ----------------------------------------
## Main DiD Specification
## ----------------------------------------
## Two-way fixed effects; treat_post is the DiD estimate
did_model <- feols(
  outcome ~ treat_post | state + year,
  data = df,
  cluster = ~state
)
## View results
summary(did_model)
## ----------------------------------------
## Event Study
## ----------------------------------------
## Create relative time variable
df <- df %>%
  mutate(rel_time = year - treatment_year)
## Event study regression (the year before treatment is the reference period)
event_study <- feols(
  outcome ~ i(rel_time, treated, ref = -1) | state + year,
  data = df,
  cluster = ~state
)
## Plot coefficients; pre-treatment estimates near zero support parallel trends
iplot(event_study,
      main = "Event Study: Effect on Outcome",
      xlab = "Years Relative to Treatment")
## ----------------------------------------
## Robustness: Alternative Specifications
## ----------------------------------------
## Different clustering
did_robust <- feols(
  outcome ~ treat_post | state + year,
  data = df,
  cluster = ~state + year  # Two-way clustering
)
## ----------------------------------------
## Export Results
## ----------------------------------------
## Make sure the output directory exists before writing the table
dir.create("results", showWarnings = FALSE)
modelsummary(
  list("Main" = did_model, "Two-way Cluster" = did_robust),
  stars = c('*' = 0.1, '**' = 0.05, '***' = 0.01),
  output = "results/did_table.tex"
)
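The example above covers DiD only. For the sharp RDD case, a minimal sketch could look like the following. It assumes the `rdrobust` and `rddensity` packages, which are not in this skill's package list, and uses simulated data with a made-up cutoff at 0 and a true jump of 2.

```r
library(rdrobust)
library(rddensity)

# Simulated sharp RDD (hypothetical): treatment switches on when x >= 0
set.seed(42)
n <- 2000
x <- runif(n, -1, 1)                           # running variable
d <- as.integer(x >= 0)                        # sharp treatment assignment
y <- 1 + 0.5 * x + 2 * d + rnorm(n, sd = 0.5)  # true discontinuity is 2

# Local linear estimate with data-driven bandwidth and robust bias-corrected inference
rd <- rdrobust(y = y, x = x, c = 0)
summary(rd)
rd$bws  # bandwidths that were selected

# Density test for manipulation of the running variable at the cutoff
dens <- rddensity(X = x, c = 0)
summary(dens)
```

Reporting the selected bandwidth alongside estimates at a few alternative bandwidths is a common robustness check for this design.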
Requirements
Software
- R 4.0+
Packages
- `fixest` - Fast fixed effects estimation
- `modelsummary` - Publication-ready tables
- `tidyverse` - Data manipulation
- `ggplot2` - Visualization
Install with: `install.packages(c("fixest", "modelsummary", "tidyverse", "ggplot2"))`
Best Practices
- Always cluster standard errors at the level of treatment assignment
- Run pre-trend tests for DiD designs
- Report first-stage F-statistics for IV (a common rule of thumb is F > 10; lower values suggest a weak instrument)
- Use `feols` over `lm` for panel data (faster and more features)
- Document all specification choices in your code comments
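One way to run the pre-trend test mentioned above is a joint Wald test on the pre-treatment event-study coefficients. The sketch below uses `base_did`, an example panel shipped with `fixest`, and assumes treatment starts in period 6 in that dataset, so period 5 is the reference and periods 1 to 4 are the leads being tested.

```r
library(fixest)

data(base_did)  # example panel from fixest (assumed: treatment begins in period 6)

# Event-study specification with period 5 as the reference period
es <- feols(y ~ x1 + i(period, treat, ref = 5) | id + period,
            data = base_did, cluster = ~id)

# Joint test that the interaction terms for periods 1 to 4 are all zero.
# The regular expression matches coefficient names such as "period::1:treat".
pre_trend_test <- wald(es, keep = "period::[1-4]:treat")
```

A large p-value here means the pre-treatment coefficients are not jointly distinguishable from zero. That is consistent with parallel trends but does not prove it, so the event-study plot should still be inspected.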
Common Pitfalls
- ❌ Not clustering standard errors at the right level
- ❌ Ignoring weak instruments in IV estimation
- ❌ Using TWFE with staggered treatment timing (use `did` or `sunab()` instead)
- ❌ Not reporting robustness checks
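For the staggered-timing pitfall, a minimal sketch of the `sunab()` alternative follows. It uses `base_stagg`, an example dataset shipped with `fixest`, so the variable names (`year_treated`, `year`, `id`, `x1`, `y`) belong to that dataset and not to your own data.

```r
library(fixest)

data(base_stagg)  # example staggered-adoption panel from fixest

# sunab(cohort, period) builds cohort-by-relative-period interactions
# (Sun and Abraham estimator) in place of a single TWFE treatment dummy
stagg_model <- feols(
  y ~ x1 + sunab(year_treated, year) | id + year,
  data = base_stagg
)

summary(stagg_model, agg = "att")  # aggregate to one average treatment effect on the treated
iplot(stagg_model)                 # event-study plot of the aggregated period effects
```

In your own data, the first argument to `sunab()` should hold each unit's treatment adoption period, with never-treated units given a value outside the observed periods.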
References
- fixest documentation
- Cunningham (2021) Causal Inference: The Mixtape
- Angrist & Pischke (2009) Mostly Harmless Econometrics
Changelog
v1.0.0
- Initial release with IV, DiD, RDD support