Many discussions of missing data focus on imputation methods, such as mean substitution or EM. But in some cases the data will be missing as a necessary consequence of the data generation process.
For instance, let's say I'm trying to predict students' grades, and one of the inputs I want to analyze is the average grades of the student's siblings. If a particular student is an only child, then that value will be missing, not because we failed to collect the data, but because logically there is no data to collect. This is distinct from cases where the student has siblings, but we can't find their grades.
Other examples abound: say we're in college admissions and we want to include students' AP exam results, but not all students took AP exams. Or we're looking at social network data, but not all subjects have Facebook and/or Twitter accounts.
These data are missing, but they're certainly not missing at random. How have people dealt with this in the past? This must be a solved problem in econometrics, but I can't find a good reference.
How To Handle Necessarily Missing Data
-
startz
- Non-normality and collinearity are NOT problems!
- Posts: 3797
- Joined: Wed Sep 17, 2008 2:25 pm
Re: How To Handle Necessarily Missing Data
One method, which may not be perfect, is to include a dummy D for the missing X and then in place of X in the equation use the interaction D*X. In other words, instead of
    ls y c x
use
    ls y c D D*X
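For readers outside EViews, here is a rough Python sketch of the same dummy-plus-interaction idea. It rests on assumptions the post does not spell out: D is coded 1 when X is observed and 0 when it is necessarily missing, and the product D*X is set to 0 wherever X is missing. The function name, variable names, and the simulated data are hypothetical.

```python
import numpy as np

def fit_with_missing_indicator(y, x):
    """OLS of y on [1, D, D*x], where D = 1 when x is observed (assumed coding)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    d = (~np.isnan(x)).astype(float)     # D = 1 if x observed, 0 if necessarily missing
    dx = np.where(np.isnan(x), 0.0, x)   # D*x, defined as 0 where x is missing
    design = np.column_stack([np.ones_like(y), d, dx])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # [intercept, coefficient on D, coefficient on D*x]

# Hypothetical simulated data: only children have no sibling average at all.
rng = np.random.default_rng(0)
n = 2000
has_siblings = rng.random(n) < 0.7
sibling_avg = np.where(has_siblings, rng.normal(3.0, 0.5, n), np.nan)
noise = rng.normal(0.0, 0.1, n)
# Students with siblings: grade = 1.0 + 0.6 * sibling_avg; only children: 2.5.
grade = np.where(has_siblings, 1.0 + 0.6 * sibling_avg, 2.5) + noise

b0, b_d, b_dx = fit_with_missing_indicator(grade, sibling_avg)
print(b0, b_d, b_dx)
```

Under this coding, the intercept estimates the mean outcome for the group with no X (about 2.5 in the simulation), the coefficient on D is the intercept shift for the group with X (about 1.0 - 2.5 = -1.5), and the coefficient on D*X is the slope among those with X (about 0.6).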
Carlo Lazzaro
- Posts: 9
- Joined: Wed Sep 03, 2014 5:32 am
Re: How To Handle Necessarily Missing Data
Qbert123 raises an interesting issue. However, as far as her/his examples are concerned, the risk of non-ignorable missing values can be avoided (or at least reduced) by fine-tuning the study's inclusion criteria or improving the questionnaire items administered to participants.
Kind regards,
Carlo
Re: How To Handle Necessarily Missing Data
Thanks to Startz; I think the interactive term added *without* including the original variable itself is the trick I was looking for. EM algorithm approaches would be the other way to do it.
Carlo, I don't understand your comment. In the cases I'm talking about, no survey is going to pick up the AP scores of students who don't take AP tests. These data are *necessarily* missing, so we have to find a way to deal with it.
-
Carlo Lazzaro
- Posts: 9
- Joined: Wed Sep 03, 2014 5:32 am
Re: How To Handle Necessarily Missing Data
Qbert123: my previous reply referred to the case where the survey has not yet started.
Now I see that your query had a different flavour.
An interesting textbook covering this (and related issues about dealing with missing values) is: Van Buuren, S. (2012), Flexible Imputation of Missing Data. Chapman & Hall/CRC, Boca Raton, FL. ISBN 9781439868249.
Re: How To Handle Necessarily Missing Data
Carlo -- thanks very much for the tip. I'll look up the Van Buuren book.
