python-panel-data
Panel data analysis with Python using linearmodels and pandas.
Category: analysis
Field: econometrics
License: Other (see repo)
Updated: 2026
Stages: data-analysis

Python Panel Data
Purpose
This skill helps economists run panel data models in Python using pandas, statsmodels, and linearmodels, with correct fixed effects, clustering, and diagnostics.
When to Use
- Estimating fixed effects or random effects models
- Running difference-in-differences on panel data
- Creating regression tables and plots in Python
Instructions
Follow these steps to complete the task:
Step 1: Understand the Context
Before generating any code, ask the user:
- What is the unit of observation, and which variables identify the panel (entity and time)?
- Which outcomes and regressors are required?
- What fixed effects or time effects are needed?
- How should standard errors be clustered?
Step 2: Generate the Output
Based on the context, generate Python code that:
- Loads and cleans the data with `pandas`
- Sets a MultiIndex for panel structure
- Fits the model using `linearmodels.PanelOLS` or `RandomEffects`
- Outputs results in a readable table and, optionally, LaTeX
Step 3: Verify and Explain
After generating output:
- Interpret key coefficients
- Note assumptions (strict exogeneity, parallel trends, etc.)
- Suggest robustness checks (alternative clustering, placebo tests)
Example Prompts
- "Run a two-way fixed effects model with firm and year effects"
- "Estimate a DiD using state and year fixed effects"
- "Export panel regression results to LaTeX"
Example Output

```python
# ============================================
# Panel Data Analysis in Python
# ============================================
import pandas as pd
from linearmodels.panel import PanelOLS

# Load data
df = pd.read_csv("panel_data.csv")

# Set panel index: entity first, time second (the order linearmodels expects)
df = df.set_index(["firm_id", "year"])

# Create treatment indicator (treated group x post period)
df["treat_post"] = df["treated"] * df["post"]

# Two-way fixed effects model: firm (entity) and year (time) effects
model = PanelOLS.from_formula(
    "outcome ~ 1 + treat_post + EntityEffects + TimeEffects",
    data=df,
)

# Standard errors clustered by firm
results = model.fit(cov_type="clustered", cluster_entity=True)
print(results.summary)
```
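Because `statsmodels` is also listed in the requirements, the two-way fixed effects estimate can be cross-checked with least-squares dummy variables (LSDV). The sketch adds firm and year dummies via `C()` in a statsmodels formula and clusters standard errors by firm. It simulates hypothetical data with the same column names as `panel_data.csv` in the example. The point estimate should agree with the two-way within (demeaned) estimator. Standard errors may differ somewhat, since the two libraries can apply different small-sample corrections.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated panel using the column names from the example
rng = np.random.default_rng(2)
n_firms, n_years = 100, 6
df = pd.DataFrame({
    "firm_id": np.repeat(np.arange(n_firms), n_years),
    "year": np.tile(np.arange(2000, 2000 + n_years), n_firms),
})
df["treated"] = (df["firm_id"] % 2 == 0).astype(int)
df["post"] = (df["year"] >= 2003).astype(int)
df["treat_post"] = df["treated"] * df["post"]
firm_fe = rng.normal(size=n_firms)[df["firm_id"].to_numpy()]
df["outcome"] = (1.5 * df["treat_post"] + firm_fe
                 + 0.3 * (df["year"] - 2000) + rng.normal(size=len(df)))

# LSDV: one dummy per firm and per year via C(); standard errors clustered by firm
lsdv = smf.ols("outcome ~ treat_post + C(firm_id) + C(year)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["firm_id"].to_numpy()}
)
print(lsdv.params["treat_post"], lsdv.bse["treat_post"])
```

LSDV estimates one dummy per firm, so it grows slow for large panels. That is one reason to prefer `PanelOLS`, which absorbs the effects. If rows are dropped for missing values, subset the `groups` array to the same rows so the cluster labels line up with the estimation sample.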
Requirements
Software
- Python 3.10+
Packages
- pandas
- linearmodels
- statsmodels
Install with `pip install pandas linearmodels statsmodels`.
Best Practices
- Always verify the panel identifiers and check whether the panel is balanced or unbalanced
- Cluster standard errors at the appropriate level
- Check for missing data before estimation
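The checks above can be run before estimation with a small helper. `panel_diagnostics` is a hypothetical function written for illustration, using only pandas. It:

- counts duplicate entity-time pairs;
- tests whether every entity is observed in every period;
- reports missing values per column.

```python
import numpy as np
import pandas as pd


def panel_diagnostics(df, entity="firm_id", time="year"):
    """Hypothetical helper: basic checks on a long-format panel before estimation."""
    # Rows repeating an (entity, time) pair make the panel index non-unique
    duplicate_ids = int(df.duplicated(subset=[entity, time]).sum())
    # Number of distinct periods observed for each entity
    periods = df.groupby(entity)[time].nunique()
    # Balanced only if every entity appears in every period of the sample
    balanced = bool((periods == df[time].nunique()).all())
    # Missing values per column, keeping only columns that have any
    missing = df.isna().sum()
    return {
        "duplicate_ids": duplicate_ids,
        "balanced": balanced,
        "min_periods": int(periods.min()),
        "max_periods": int(periods.max()),
        "missing": missing[missing > 0],
    }


# Toy panel: firm 2 repeats 2001 and is never observed in 2002; one outcome is missing
toy = pd.DataFrame({
    "firm_id": [1, 1, 1, 2, 2, 2],
    "year": [2000, 2001, 2002, 2000, 2001, 2001],
    "outcome": [1.0, np.nan, 2.0, 0.5, 0.7, 0.7],
})
report = panel_diagnostics(toy)
print(report)
```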
Common Pitfalls
- Failing to set a proper panel index
- Using pooled OLS when fixed effects are required
- Misinterpreting coefficients without accounting for fixed effects
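The first and last pitfalls can be illustrated with a small simulation. The data-generating process is an assumption: an unobserved firm effect raises both the regressor and the outcome. Under it, pooled OLS mixes the between-firm correlation into the slope. The within (entity-demeaned) estimator, which is what entity fixed effects compute, recovers the true coefficient of 1.0. The helpers `slope` and `demean_by` are illustrative names, and only NumPy is used.

```python
import numpy as np

# Assumed DGP: firm effect alpha shifts both x and y
rng = np.random.default_rng(3)
n_firms, n_years, beta = 200, 5, 1.0
firm = np.repeat(np.arange(n_firms), n_years)
alpha = rng.normal(size=n_firms)[firm]          # unobserved firm-level effect
x = alpha + rng.normal(size=firm.size)          # regressor correlated with alpha
y = beta * x + 3.0 * alpha + rng.normal(size=firm.size)


def slope(xv, yv):
    """OLS slope of yv on xv with an intercept."""
    xc, yc = xv - xv.mean(), yv - yv.mean()
    return (xc @ yc) / (xc @ xc)


def demean_by(v, groups):
    """Subtract each group's mean (the within transformation)."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]


# Pooled OLS converges to 1 + 3 * Cov(x, alpha) / Var(x) = 2.5 in this setup
pooled = slope(x, y)
# Within estimator: uses only variation inside each firm over time
within = slope(demean_by(x, firm), demean_by(y, firm))
print(f"pooled OLS: {pooled:.2f}, within (entity FE): {within:.2f}")
```

The fixed-effects coefficient is identified only from variation within each firm over time, so it should be read as a within-firm association. Time-invariant regressors are absorbed by the entity effects and cannot be estimated.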
References
- linearmodels documentation
- statsmodels documentation
- Wooldridge (2010) Econometric Analysis of Cross Section and Panel Data
Changelog
v1.0.0
- Initial release