Post by rileyjiang » Fri Dec 20, 2013 7:54 pm
Dear Garath,
I’m a PhD student and fairly new to EViews programming. Over the past several days I have been reading and learning from the programs you posted here, and I have found them very helpful.
I have a question about EViews programming. It’s for my PhD study, and I was hoping that you could help me. Part of my PhD study involves testing the statistical robustness of explanatory regressors in explaining a dependent variable. When testing a variable’s robustness, one often encounters the following situation: x1 (an explanatory regressor) may be statistically significant in explaining the dependent variable when the regression includes x2 and x3, but not when x4 is included. So which combination of the available x’s do we choose? To tackle this, we use extreme bounds analysis (hereafter EBA), a linear-regression-based method for finding out whether the determinants of the dependent variable are robust.
EBA estimates linear regressions for all combinations of the available regressors, as opposed to selectively testing some combinations and reporting the ones that are most “favourable” (in the situation above, the most favourable combination for x1 would be x1, x2, x3). Because EBA pushes the set of variable combinations to the extreme, setting up the regressions can get tricky: the number of regressions to estimate can become very large (in some cases it grows exponentially with the number of regressors).
In terms of data, my (cross-sectional) sample contains 1 dependent variable and 21 regressors. If we include 11 regressors in each model, then under EBA the total number of regressor combinations to be tested is 352,716. This is given by the formula “n!/[(n-r)!*r!]” with n = 21 and r = 11. Now fix one regressor in the model, because we want to test how robust it is against all the other regressors. We then have 10 free slots (11 minus 1) to fill from the remaining 20 regressors (21 minus 1, since 1 is now locked in). The total number of combinations becomes 184,756 (n = 20, r = 10).
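The counts above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not part of the EViews workflow; the variable names are chosen for illustration.

```python
from math import comb  # comb(n, r) = n! / ((n - r)! * r!), Python 3.8+

n_total = 21   # regressors available in the sample
n_model = 11   # regressors included in each model

# All 11-regressor models that can be formed from 21 regressors.
all_models = comb(n_total, n_model)

# One regressor is fixed, so 10 slots are filled from the remaining 20.
per_focus = comb(n_total - 1, n_model - 1)

# Repeating this for each of the 21 regressors in turn.
grand_total = per_focus * n_total

print(all_models)   # 352716
print(per_focus)    # 184756
print(grand_total)  # 3879876
```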
I have found a way to tackle this, but it led me into running over 3.8 million regressions*. I was hoping there is a way to program it, so that the regressions run more efficiently and key statistics such as R^2, beta coefficients and t-stats can be extracted without opening every single equation. I thought of this after reading the programs you posted here, in particular Program No.2, which runs pairwise regressions between each X and every other X.
*3.8m ≈ 184,756 combinations for testing 1 regressor × 21 regressors = 3,879,876
Just to quickly show you how I did it (very tedious, but I guess it’s still better than manually typing up some 184 thousand regressions one by one):
The process is fairly straightforward. I found a web-based combination calculator that uses letters to represent the entries of each combination, so I coded the names of all 21 regressors as letters for easy modelling. Next comes generating the combinations: since we want to test each regressor’s robustness against every combination of 10 other regressors, we need to run 184,756 regressions per tested regressor. This is the number of ways to choose 10 regressors from a set of 20. The first combination is “a b c d e f g h i j”.
Because we want one regressor to appear in every regression equation in order to test its statistical robustness, I represent it with a letter that does not clash with the others, namely “z”. The regressor combination of our first equation now becomes “z a b c d e f g h i j”.
The next step is to write each model in a form that an EViews program understands. The first equation is constructed as follows (I used “aa” instead of just “a” because some names are reserved in EViews, e.g. “c” for the beta coefficients):
“equation zabcdefghij.ls score c zz aa bb cc dd ee ff gg hh ii jj” (“score” is my dependent var)
And the last equation:
“equation zklmnopqrst.ls score c zz kk ll mm nn oo pp qq rr ss tt”
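As a rough sketch, the same list of equation lines could be generated by a short script instead of a web calculator. The Python below is an illustration only: the function name `eba_commands` and its parameters are hypothetical, and it simply reproduces the naming scheme described above (letters a to t for the pool, “z” for the fixed regressor, doubled letters for the series names).

```python
from itertools import combinations
from string import ascii_lowercase

def eba_commands(focus="z", pool=ascii_lowercase[:20], k=10, dep="score"):
    """Yield one 'equation <name>.ls ...' line per k-letter subset of the pool."""
    for combo in combinations(pool, k):              # lexicographic order
        letters = (focus,) + combo
        name = "".join(letters)                      # e.g. zabcdefghij
        series = " ".join(ch * 2 for ch in letters)  # e.g. zz aa bb ... jj
        yield f"equation {name}.ls {dep} c {series}"

commands = list(eba_commands())
print(len(commands))   # 184756
print(commands[0])     # equation zabcdefghij.ls score c zz aa bb cc dd ee ff gg hh ii jj
print(commands[-1])    # equation zklmnopqrst.ls score c zz kk ll mm nn oo pp qq rr ss tt
```

The first and last lines produced match the two equations quoted above, which is a quick check that the enumeration order agrees with the calculator's.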
Once all the combinations were formatted as equation commands, I pasted them into an EViews program and let the program generate the equation output itself. It turns out that an EViews program has a limit: apparently I can only enter somewhere around 10,000 equations at a time. Nonetheless, I managed to finish it. For each set of 184,756 regressions I ran, I created a new workfile and changed “zz” to the next regressor to be tested.
For reporting statistics (e.g. R^2, beta coefficients, etc.), I used the add-in “EqTab”. However, I discovered that the giant spreadsheet it produces also has a limit; it takes literally hours to generate the output and very often crashes the whole system.
In my opinion, the process I used is manageable to some extent. However, it is by no means replicable or reusable. For instance, if we want to change the total number of regressors from 21 to, say, 15 and reduce the regressors included in the model from 11 to 5, we have to go back to day one and rewrite the whole set of equations, because the number of combinations is no longer the same.
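If the lines have to be fed in groups of roughly 10,000, the splitting itself can be automated. The sketch below is one possible way to do it in Python; the helper names `batches` and `write_batches`, the file prefix, and the batch size of 10,000 (taken from the approximate limit mentioned above) are all assumptions for illustration.

```python
from itertools import islice

def batches(lines, size=10_000):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def write_batches(lines, size=10_000, prefix="eba_part"):
    """Write each batch to its own text file and return the file names."""
    paths = []
    for i, chunk in enumerate(batches(lines, size), start=1):
        path = f"{prefix}_{i:03d}.prg"   # hypothetical file naming
        with open(path, "w") as fh:
            fh.write("\n".join(chunk) + "\n")
        paths.append(path)
    return paths

# 184,756 lines in groups of 10,000 give 19 batches (the last one is partial).
sizes = [len(b) for b in batches(range(184_756))]
print(len(sizes), sizes[-1])   # 19 4756
```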
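One way to make the setup reusable is to generate the equation lines from a list of regressor names and a model size, so that changing 21 to 15 or 11 to 5 only changes two inputs. The Python sketch below assumes hypothetical series names x1 to x15 and a hypothetical equation naming scheme (`eq_<focus>_<index>`); it is an illustration of the idea, not a tested EViews workflow.

```python
from itertools import combinations
from math import comb

def eba_plan(regressors, k, dep="score"):
    """Yield (focus, command) for every focus regressor combined with
    every (k - 1)-subset of the remaining regressors."""
    for focus in regressors:
        others = [r for r in regressors if r != focus]
        for i, combo in enumerate(combinations(others, k - 1), start=1):
            rhs = " ".join((focus,) + combo)
            yield focus, f"equation eq_{focus}_{i}.ls {dep} c {rhs}"

def eba_count(n, k):
    """Number of regressions: n focus regressors times C(n - 1, k - 1)."""
    return n * comb(n - 1, k - 1)

names = [f"x{j}" for j in range(1, 16)]   # hypothetical names x1 .. x15
plan = list(eba_plan(names, k=5))

print(len(plan), eba_count(15, 5))   # 15015 15015
print(plan[0][1])                    # equation eq_x1_1.ls score c x1 x2 x3 x4 x5
print(eba_count(21, 11))             # 3879876
```

With 15 regressors and 5 per model, the count drops to 15,015 regressions, and the original 21 and 11 case gives back the 3,879,876 figure.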
Thank you very much for your help in advance and I look forward to hearing from you.
Any thoughts and comments are greatly appreciated.
Merry Christmas and Happy New Year!
Kind Regards,
Riley