What to do in the face of multicollinearity
Posted: Fri Dec 11, 2015 6:05 am
                GDP_PER_CAPITA   EDUC_PER        LAW  LIFE_EXPEC    REL_PER    URB_PER
GDP_PER_CAPITA        1.000000   0.895168   0.956279    0.952616  -0.973835   0.807385
EDUC_PER              0.895168   1.000000   0.975186    0.977476  -0.948947   0.824475
LAW                   0.956279   0.975186   1.000000    0.997357  -0.991417   0.853135
LIFE_EXPEC            0.952616   0.977476   0.997357    1.000000  -0.985784   0.867973
REL_PER              -0.973835  -0.948947  -0.991417   -0.985784   1.000000  -0.823132
URB_PER               0.807385   0.824475   0.853135    0.867973  -0.823132   1.000000
Sorry that the data isn't very clear, but just from the numbers you can see that I have a serious multicollinearity problem. Even after dropping the variables that are not significant, I still get very high correlations (over 0.8).
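Pairwise correlations only show part of the picture. A common summary is the variance inflation factor (VIF): for regressor j it equals 1/(1 - R_j^2), where R_j^2 comes from regressing that variable on the other regressors, and it also equals the j-th diagonal entry of the inverse of the regressors' correlation matrix. Below is a minimal pure-Python sketch that computes VIFs from the posted matrix. It assumes that the five variables named in the post (everything except LIFE_EXPEC) are the regressors; the function names are hypothetical. A frequently quoted rule of thumb treats VIFs above 10 as a sign of serious collinearity.

```python
# Minimal sketch: variance inflation factors (VIFs) from a correlation matrix.
# Assumption: the regressors are GDP_PER_CAPITA, EDUC_PER, LAW, REL_PER and
# URB_PER; LIFE_EXPEC is left out because it is assumed to be the dependent variable.


def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Use the row with the largest absolute pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right half of the augmented matrix is now the inverse.
    return [row[n:] for row in aug]


def vifs_from_correlation(corr):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix of the regressors."""
    inv = invert(corr)
    return [inv[j][j] for j in range(len(corr))]


names = ["GDP_PER_CAPITA", "EDUC_PER", "LAW", "REL_PER", "URB_PER"]
# Sub-matrix of the posted correlations, with the LIFE_EXPEC row and column removed.
corr = [
    [1.000000, 0.895168, 0.956279, -0.973835, 0.807385],
    [0.895168, 1.000000, 0.975186, -0.948947, 0.824475],
    [0.956279, 0.975186, 1.000000, -0.991417, 0.853135],
    [-0.973835, -0.948947, -0.991417, 1.000000, -0.823132],
    [0.807385, 0.824475, 0.853135, -0.823132, 1.000000],
]

for name, vif in zip(names, vifs_from_correlation(corr)):
    print(f"{name:15s} VIF = {vif:10.1f}")
```

Working from the correlation matrix avoids needing the raw data, but it gives the same VIFs as running each auxiliary regression only if the matrix is the exact (unrounded) correlation matrix of the regressors.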
This is actually a big surprise to me. I am looking at quite different things for my country, Austria: GDP, a law change, the share of women in tertiary education, the percentage of religious people, and how urbanised the society is. This setup is similar to some past papers on other countries, so I am surprised to see such high multicollinearity. I chose this topic with these variables, and at this stage I can't add or change variables anymore. Should I then just address it in my conclusion, stating that my test results, even when significant, have to be taken with a pinch of salt and that other variables would be needed to compensate for this?
Edit: could the root of the problem be that three of my variables take values between 0 and 1?
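On the edit: Pearson's correlation is unchanged when a variable is shifted or multiplied by a positive constant, so measuring a share on a 0 to 1 scale rather than in percent should not by itself produce these values. A minimal sketch with made-up numbers (the variable names and data are hypothetical, not from the post) illustrates this:

```python
# Minimal sketch: Pearson's correlation is invariant to positive rescaling,
# e.g. a share in [0, 1] rewritten as a percentage.
# The data below are invented for illustration only.
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


share = [0.21, 0.25, 0.31, 0.36, 0.44]      # hypothetical share in [0, 1]
gdp = [25000, 26100, 27900, 28400, 30200]   # hypothetical level in euros

percent = [100 * s for s in share]          # the same variable expressed in percent
r_share = pearson(share, gdp)
r_percent = pearson(percent, gdp)
print(r_share, r_percent)  # the two values agree up to floating-point rounding
```

Multiplying by a negative constant would only flip the sign of the correlation, not its size, so the 0 to 1 scale of a variable is not what drives the magnitudes in the table.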