Suggestions for my model
Posted: Tue Jun 02, 2026 6:39 am
Hi everyone. I am an undergraduate economics student working on this model. I am posting here not just to get answers, but to learn and to test my own understanding of the methodology I applied. Any feedback, criticism, or suggestions are welcome; I want to understand where I might be wrong.
The primary objective of the model is to isolate and quantify the effect of meteorological drought, measured by the SPEI_7 index, on annual barley production. ΔCultivatedArea is included strictly as a control variable, so that the drought coefficient does not absorb the effect of changes in cultivated land; it is not a variable of independent interest.
Here is my setup:
Model: Production_t = β0 + β1·SPEI7_t + β2·ΔCultivatedArea_t + ε_t
n = 26 (after differencing)
t = year
Where:
PRODUCTION: Annual barley production (tonnes)
SPEI_7: 7-month SPEI index for August
ΔCultivatedArea: First difference of barley cultivated area
Steps followed:
ADF unit root tests (intercept for PRODUCTION and SPEI_7; intercept+trend for CultivatedArea due to visible deterministic trend)
First-differenced CultivatedArea to achieve stationarity
Pearson correlation matrix to check multicollinearity (r = -0.081 between SPEI_7 and ΔCultivatedArea)
OLS estimation
Breusch-Godfrey test for autocorrelation (lag=1)
Breusch-Pagan-Godfrey test for heteroskedasticity
Jarque-Bera and Shapiro-Wilk tests for normality of residuals
Ramsey RESET test for functional form (F p=0.8856)
Results:
SPEI_7: β=874,320, p=0.0021 (significant at 1%)
ΔCultivatedArea: β=1.983, p=0.0188 (significant at 5%)
R²=0.453, Adjusted R²=0.401, F p=0.0014
All diagnostic tests passed at the 5% level (no autocorrelation, no heteroskedasticity, normality satisfied, correct functional form).
MY QUESTIONS:
Two of the diagnostic tests produced borderline results that I would like to highlight:
1. Breusch-Godfrey Test (Autocorrelation)
Chi-Square p = 0.0691
F p = 0.0874
Both values exceed the 0.05 threshold, so the null hypothesis of no autocorrelation cannot be rejected. However, the margin is relatively narrow. I am wondering whether this should be a concern or whether it is simply a consequence of the small sample size (n=26).
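To make clear what the Chi-Square statistic above measures, here is a minimal sketch of the LM version of the Breusch-Godfrey test with one lag, written with NumPy only. It is not my actual estimation code: the data are synthetic placeholders, the function names are made up for illustration, and setting the missing pre-sample residual to zero is one common convention that I am assuming here.

```python
import math
import numpy as np

def ols_residuals(y, X):
    """Residuals from regressing y on X (X must already contain a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def breusch_godfrey_lag1(y, X):
    """LM form of the Breusch-Godfrey test with one lag.

    Auxiliary regression: e_t on X_t and e_{t-1}. The missing pre-sample
    residual is set to zero (an assumed convention). Under the null of no
    autocorrelation, LM = n * R^2 is asymptotically chi-square with 1 df.
    """
    e = ols_residuals(y, X)
    e_lag = np.concatenate(([0.0], e[:-1]))
    Z = np.column_stack([X, e_lag])
    u = ols_residuals(e, Z)
    # e has mean zero because X contains a constant, so SST = e'e.
    r2 = 1.0 - (u @ u) / (e @ e)
    lm = max(len(y) * r2, 0.0)
    # Survival function of chi-square(1): P(Z^2 > lm) = erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

# Synthetic placeholder data with the same shape as my model (n = 26).
rng = np.random.default_rng(0)
n = 26
spei = rng.normal(size=n)
d_area = rng.normal(size=n)
production = 1.0 + 0.8 * spei + 0.5 * d_area + rng.normal(size=n)
X = np.column_stack([np.ones(n), spei, d_area])

lm_stat, lm_p = breusch_godfrey_lag1(production, X)
print(f"LM = {lm_stat:.3f}, p = {lm_p:.4f}")
```

Because the chi-square reference distribution is only asymptotic, the p-value from a sample of 26 is an approximation, which is part of why I am unsure how much weight to put on 0.0691.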
2. Shapiro-Wilk Test (Normality of Residuals)
p = 0.0532
The null hypothesis of normality cannot be rejected at the 5% level, but the p-value is only marginally above 0.05. Again, I suspect this may be related to the limited number of observations.
3. Unit Root Tests (Low Power)
With only n=26 observations, ADF unit root tests are known to have low power. Is there a more appropriate unit root test for a sample this small, and should I run it alongside the ADF test for robustness?
4. Endogeneity of ΔCultivatedArea
While I argue that SPEI_7 is strictly exogenous, the same argument does not hold for ΔCultivatedArea, as annual planting decisions may be correlated with omitted socioeconomic variables such as input costs or government subsidies. However, since the sample correlation between SPEI_7 and ΔCultivatedArea is negligible (r=-0.081, p=0.73), I argue that even if the ΔCultivatedArea coefficient is biased, this does not contaminate the SPEI_7 estimate. Is this reasoning valid, or should I be more concerned about the potential endogeneity of ΔCultivatedArea?
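To check the logic of this argument for myself, here is a small Monte Carlo sketch. Everything in it is a made-up design and not my data: `w` stands in for a hypothetical omitted factor (for example subsidies) that drives both ΔCultivatedArea and production, `rho` controls how strongly ΔCultivatedArea depends on SPEI_7, and the coefficient values are arbitrary. SPEI_7 is uncorrelated with `w` by construction, which is the exogeneity assumption I am making.

```python
import numpy as np

def mean_spei_bias(rho, n=26, reps=4000, b1=1.0, b2=2.0, delta=2.0, seed=0):
    """Average bias of the OLS SPEI coefficient when w is omitted.

    rho   : dependence of d_area on spei (0 means uncorrelated regressors)
    delta : effect of the omitted factor w on production
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        spei = rng.normal(size=n)
        w = rng.normal(size=n)  # omitted factor, never seen by the regression
        d_area = rho * spei + w + rng.normal(size=n)  # endogenous through w
        y = 0.5 + b1 * spei + b2 * d_area + delta * w + rng.normal(size=n)
        X = np.column_stack([np.ones(n), spei, d_area])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates[r] = beta[1]  # coefficient on spei
    return estimates.mean() - b1

bias_uncorrelated = mean_spei_bias(rho=0.0)
bias_correlated = mean_spei_bias(rho=0.6)
print(f"bias of SPEI coefficient, rho = 0.0: {bias_uncorrelated:+.3f}")
print(f"bias of SPEI coefficient, rho = 0.6: {bias_correlated:+.3f}")
```

Under this particular design the bias of the SPEI coefficient works out to -rho * delta / 2, so it vanishes only when the two regressors are uncorrelated in the population. What I observe is a sample correlation from 26 observations, so I am not sure how far r=-0.081 lets me conclude that the population correlation is close to zero.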