Logit model, 'Overflow' message problem
I would be grateful for your advice on the following issue. I have a simple model for bank failure prediction. My sample consists of 26 failed banks (Y=1) and 24 operating banks (Y=0). For each bank I have 7 variables for each of the 4 quarters prior to failure. I need to run a logistic regression with all variables for each quarter separately, so this is a cross-section study with quarterly data. The problem is that the software refuses to run a regression with 7 variables (an 'Overflow' message appears), but when I keep ANY 5 variables the estimation proceeds without problems. Ideally, I need to use even more variables, 10 or more, so I am really puzzled about how to do it.
-
EViews Gareth
- We came, we saw, we estimated
- Posts: 13600
- Joined: Tue Sep 16, 2008 5:38 pm
Re: Logit model, 'Overflow' message problem
Could you post your workfile?
Re: Logit model, 'Overflow' message problem
Thank You for the reply. Sure, this is what I need to estimate:
y ni_q4/(ta_q4-tl_q4) log(ta_q4*1000) (rfr_q4-re_q4) div_q4 ni_q4/ta_q4 latl_q4 c
These are the variables for banks 4 quarters before they were reported to have failed or appeared financially strong.
Definitions:
ni_q4/(ta_q4-tl_q4) - Return on Equity, ni - Net income, TA - Total Assets, TL - Total Liabilities
log(ta_q4*1000) - Natural Logarithm of Total Assets (bank size proxy)
(re_q4-rfr_q4) - Cumulative quarterly excess returns on a bank's stock (%), re - stock return, rfr - risk free rate (3-month T bills)
div_q4 - Dividend dummy, 1 if dividends were paid during the quarter; 0 - otherwise
ni_q4/ta_q4 - Return on Total Assets
latl_q4 - Loan and Lease provisions to Total Loans and Leases (%)
P.S. To be more precise, the aim of the work is to find out how equity market data can supplement traditional Call Report based models for failure prediction. Thus, I will first run a regression with purely financial variables and then run a new one with market-based variables added. Finally, I will compare the two models' pseudo R^2, Akaike Information Criterion and so forth.
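For the planned comparison, both fit measures can be computed from the maximized log-likelihoods alone. The sketch below is in Python rather than EViews and is only illustrative: the function names are made up, it uses McFadden's version of the pseudo R^2 and the textbook AIC = 2k - 2 log L, and a given package may scale AIC differently (for instance per observation). The intercept-only log-likelihood uses the 26/24 split from the sample described above; the two model log-likelihoods in the usage lines are placeholders, not estimates from Banks.WF1.

```python
import math


def null_loglik(n_ones, n_zeros):
    """Log-likelihood of an intercept-only binary model (same for logit and probit)."""
    n = n_ones + n_zeros
    return n_ones * math.log(n_ones / n) + n_zeros * math.log(n_zeros / n)


def mcfadden_r2(loglik_model, loglik_null):
    """McFadden pseudo R^2: 1 - logL(model) / logL(intercept only)."""
    return 1.0 - loglik_model / loglik_null


def aic(loglik, n_params):
    """Textbook AIC; n_params counts every estimated coefficient, including the constant."""
    return 2.0 * n_params - 2.0 * loglik


ll0 = null_loglik(26, 24)        # 26 failed banks, 24 operating banks
ll_financial = -20.0             # placeholder value, NOT an estimate
ll_with_market = -15.0           # placeholder value, NOT an estimate

for label, ll, k in [("financial only", ll_financial, 4),
                     ("financial + market", ll_with_market, 6)]:
    print(label, round(mcfadden_r2(ll, ll0), 3), round(aic(ll, k), 2))
```

With only 50 observations, a comparison like this is only meaningful once both models have actually converged to finite estimates.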
- Attachments
-
- Banks.WF1
- Bank Failure Prediction, Logit model
- (30.72 KiB) Downloaded 1770 times
-
startz
- Non-normality and collinearity are NOT problems!
- Posts: 3797
- Joined: Wed Sep 17, 2008 2:25 pm
Re: Logit model, 'Overflow' message problem
You might want to look at "Estimating the Value of Implicit Government Guarantees to Thai Banks," by Idanna Kaplan-Appio, Review of International Economics, Vol. 10, pp. 26-35, 2002.
-
startz
- Non-normality and collinearity are NOT problems!
- Posts: 3797
- Joined: Wed Sep 17, 2008 2:25 pm
Re: Logit model, 'Overflow' message problem
This is probably a problem in the numerical algorithms. If you do a probit there's no overflow message, but there aren't any useful results either. You might try different starting values, but I wouldn't be too optimistic.
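One way an 'Overflow' can arise in logit code is sketched below in Python. It is not a description of what EViews does internally, which the thread does not reveal, and the function names are hypothetical. The point is only that the textbook formula for the logistic function exponentiates the negated index, and once the coefficients grow large that exponential can exceed the floating-point range, while an algebraically equivalent form never exponentiates a positive number.

```python
import math


def naive_logistic(z):
    # Direct formula 1 / (1 + exp(-z)): math.exp raises OverflowError
    # when -z is larger than roughly 709.
    return 1.0 / (1.0 + math.exp(-z))


def stable_logistic(z):
    # Same function, but exp() is only ever called on a non-positive
    # argument, so it can underflow to 0.0 but never overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


for z in (-1000.0, 0.0, 1000.0):
    try:
        naive = naive_logistic(z)
    except OverflowError:
        naive = "overflow"
    print(z, naive, stable_logistic(z))
```

A stable formula removes the error message but not the underlying problem: if the index is heading to plus or minus infinity, the estimates are still not meaningful.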
-
EViews Glenn
- EViews Developer
- Posts: 2682
- Joined: Wed Oct 15, 2008 9:17 am
Re: Logit model, 'Overflow' message problem
My quick look at this suggests that it's overfitting. If you do a probit, you get no overflow but a collinearity message. If you look at the gradient summary, you see that the gradients are effectively zero (which is good), but if you also look at the residuals, you'll see that you get zeros (that's bad).
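The symptom described here (gradients near zero while the fit becomes exact) is what happens when some linear combination of the regressors separates the Y=1 cases from the Y=0 cases. A toy Python sketch with made-up data (one regressor, no intercept, four observations; none of it is from the posted workfile) shows that the logit log-likelihood then keeps rising toward zero as the coefficient grows, so there is no finite maximum for the optimizer to find.

```python
import math


def softplus(t):
    # Numerically stable log(1 + exp(t)).
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def logit_loglik(b, xs, ys):
    # Log-likelihood of a one-regressor logit without intercept:
    # P(y = 1 | x) = 1 / (1 + exp(-b * x)).
    total = 0.0
    for x, y in zip(xs, ys):
        margin = b * x if y == 1 else -b * x
        total -= softplus(-margin)   # adds log P(observed y | x)
    return total


# Made-up, perfectly separated data: y = 1 exactly when x > 0.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

slopes = (1.0, 5.0, 25.0, 125.0)
logliks = [logit_loglik(b, xs, ys) for b in slopes]
for b, ll in zip(slopes, logliks):
    print(b, ll)
```

Every increase in the slope improves the fit a little, which is why an iterative estimator keeps pushing the coefficients outward until something breaks numerically.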
Re: Logit model, 'Overflow' message problem
Thank You startz and QMS Gareth. In fact, this is my major project, and before creating this model I read some papers like the one suggested by startz for writing the Literature Review. One of the works was Distinguin, I, Rous, P & Tarazi, A 2005, 'Market discipline and the use of stock market data to predict bank financial distress'. That work did exactly what I planned; it studied a cross-section of 63 European banks and even had more explanatory variables: 15, as I remember. No overfitting problems were reported and collinearity was said to be "in acceptable limits".
Moreover, I estimated a probit; I got only coefficients, but for significance, standard errors and other results I got 'NA'. An LPM (linear probability model) was also estimated, and in general it gave coefficient signs consistent with expectations, and the coefficients were statistically significant in most cases. I experimented a little with the logit, that is, I combined variables from different quarters instead of using them all from a single quarter; as a result, the estimation went well. Thus it can be concluded that the number of variables is not the main reason for the problem.
Interestingly, I estimated the model with EViews versions 4 and 5, and version 5 didn't handle more than three variables, while version 4 handled five.
What should I do in my situation? Would a larger sample solve this problem, or is the task too complex for the software to cope with? Maybe the data have numbers with too many digits and I should round them?
-
EViews Glenn
- EViews Developer
- Posts: 2682
- Joined: Wed Oct 15, 2008 9:17 am
Re: Logit model, 'Overflow' message problem
Looking at this one again this morning, I think it's pretty clear that your equation is overfitting.
The reason you are able to get results for the linear probability model is that the linearity is helping you out relative to the probit and the logit. The latter two are able to use the nonlinearity in their specification to (almost) exactly fit the binary response. Hence the (almost) zero gradients, hence the singularity. You can see this by noting that the LPM has almost a perfect separation between the (almost always positive) residuals for the Y=1 cases, and the (almost always negative) residuals for the Y=0 cases.
Note that in general, overfitting isn't just a property of the number of variables, it's a property of the data and the variables. But to put this in some perspective for those keeping score, there are 50 binary observations and 7 explanatory variables in the preferred specification. That's not very many observations relative to the number of coefficients. Even if you were able to get estimates, I would be somewhat wary about invoking asymptotic results.
The only reasonable way of dealing with the problem in your situation is to restrict your set of explanatory variables or to get more data. The latter is probably the preferred approach.
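A rough way to check for this kind of separation is to fit the linear probability model by least squares and see whether the fitted values for the Y=1 cases all lie above those for the Y=0 cases. If they do, the fitted linear index separates the two groups and the logit and probit likelihoods have no finite maximum. The Python sketch below is an illustration with hypothetical function names and made-up data, written for a single regressor to stay short; it is not the diagnostic run on the posted workfile, and a failed check does not rule out separation by some other linear combination.

```python
def lpm_fitted(xs, ys):
    """Fitted values from an OLS regression of y on a constant and one regressor."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return [intercept + slope * x for x in xs]


def fitted_values_separate(fitted, ys):
    """True if every fitted value with y == 1 exceeds every fitted value with y == 0."""
    lowest_for_ones = min(f for f, y in zip(fitted, ys) if y == 1)
    highest_for_zeros = max(f for f, y in zip(fitted, ys) if y == 0)
    return lowest_for_ones > highest_for_zeros


xs = [-2.0, -1.0, 1.0, 2.0]          # made-up regressor values
separated_y = [0, 0, 1, 1]           # outcome switches exactly once along x
overlapping_y = [0, 1, 0, 1]         # outcomes interleave along x

print(fitted_values_separate(lpm_fitted(xs, separated_y), separated_y))
print(fitted_values_separate(lpm_fitted(xs, overlapping_y), overlapping_y))
```

With several regressors the same idea applies to the fitted values from a multiple regression; with 50 observations and 7 coefficients, a separating combination is easy to hit by chance.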
Re: Logit model, 'Overflow' message problem
Thank You QMS Glenn for the support. I shall not bother You anymore until I get more data and try it out.
