How To Handle Necessarily Missing Data

For econometric discussions not necessarily related to EViews.


Qbert123
Posts: 3
Joined: Mon Oct 13, 2014 8:49 am

How To Handle Necessarily Missing Data

Postby Qbert123 » Mon Oct 13, 2014 9:00 am

Many discussions of missing data deal with various imputation methods, such as mean imputation or the EM algorithm. But in some cases the data will be missing as a necessary consequence of the data-generating process.

For instance, let's say I'm trying to predict students' grades, and one of the inputs I want to analyze is the average grades of the student's siblings. If a particular student is an only child, then that value will be missing, not because we failed to collect the data, but because logically there is no data to collect. This is distinct from cases where the student has siblings, but we can't find their grades.
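To make the distinction concrete, one way to store the sibling-average input is to carry an explicit status alongside the value, so that "no siblings exist" and "siblings exist but their grades were not found" are never collapsed into the same blank. A minimal sketch, assuming a hypothetical `Student` record and made-up status labels:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Student:
    # Hypothetical record: how many siblings the student has, and
    # whichever sibling grades we actually managed to collect.
    name: str
    n_siblings: int
    sibling_grades: List[float] = field(default_factory=list)


def sibling_feature(s: Student) -> Tuple[str, Optional[float]]:
    """Return (status, value) for the 'average sibling grade' input."""
    if s.n_siblings == 0:
        # Necessarily missing: there is no sibling grade to collect.
        return ("not_applicable", None)
    if not s.sibling_grades:
        # Siblings exist, but we failed to find any of their grades.
        return ("unobserved", None)
    # Average over the sibling grades we did collect (possibly a subset).
    return ("observed", sum(s.sibling_grades) / len(s.sibling_grades))


students = [
    Student("only_child", 0),
    Student("grades_not_found", 2),
    Student("two_siblings", 2, [80.0, 90.0]),
]
for s in students:
    print(s.name, sibling_feature(s))
```

Only the `not_applicable` rows are the necessarily missing case; the `unobserved` rows are ordinary missing data, for which imputation may still make sense.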

Other examples abound: say we're in college admissions and we want to include students' AP exam results, but not all students took AP exams. Or we're looking at social network data, but not all subjects have Facebook and/or Twitter accounts.

These data are missing, but they're certainly not missing at random. How have people dealt with this in the past? This must be a solved problem in econometrics, but I can't find a good reference.

startz
Non-normality and collinearity are NOT problems!
Posts: 3797
Joined: Wed Sep 17, 2008 2:25 pm

Re: How To Handle Necessarily Missing Data

Postby startz » Mon Oct 13, 2014 9:26 am

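One method, which may not be perfect, is to include a dummy D equal to 1 when X is observed and 0 when it is necessarily missing, and then use the interaction D*X in place of X. Recode the missing X values to 0 first (here with @nan), so that D*X is 0 rather than NA for those observations and they are not dropped from the sample. In other words, instead of

ls y c x

use

series d = 1 - @isna(x)
ls y c d d*@nan(x, 0)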
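For readers working outside EViews, the same specification can be sketched in plain Python. This is a minimal illustration under stated assumptions, not EViews output: the helper names `ols` and `dummy_interaction_design`, the use of `None` to mark a necessarily missing X, and the small data set are all made up, and the hand-rolled normal-equations solver only stands in for a proper regression routine.

```python
# Sketch of the dummy/interaction approach: regress y on c, D, D*X,
# where D = 1 if X is observed and X is recoded to 0 where it is missing.


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def dummy_interaction_design(x):
    """Hypothetical helper: rows [1, D, D*X]; None marks a missing X."""
    rows = []
    for xi in x:
        d = 0.0 if xi is None else 1.0
        x0 = 0.0 if xi is None else xi  # recode missing X to 0
        rows.append([1.0, d, d * x0])
    return rows


# Made-up illustrative data: the last two units have no X at all.
x = [1.0, 2.0, 3.0, 4.0, None, None]
y = [3.1, 4.9, 7.2, 8.8, 5.0, 6.0]

coef = ols(dummy_interaction_design(x), y)  # coefficients on [c, D, D*X]

# Complete-case regression of y on c and x, for comparison.
obs = [(xi, yi) for xi, yi in zip(x, y) if xi is not None]
cc = ols([[1.0, xi] for xi, _ in obs], [yi for _, yi in obs])

print("c, D, D*X:", coef)
print("complete-case c, x:", cc)
```

With no other regressors, the two groups are effectively fitted separately: the constant equals the mean of y among units without X, and the coefficient on D*X equals the complete-case slope. The approaches differ once further regressors are added, because their coefficients are then shared across both groups.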

Carlo Lazzaro
Posts: 9
Joined: Wed Sep 03, 2014 5:32 am

Re: How To Handle Necessarily Missing Data

Postby Carlo Lazzaro » Tue Oct 14, 2014 6:13 am

Qbert123 raises an interesting issue. However, as far as her/his examples are concerned, the risk of non-ignorable missing values can be avoided (or at least reduced) by fine-tuning the inclusion criteria of the study or by improving the questionnaire items administered to participants.

Kind regards,
Carlo

Qbert123
Posts: 3
Joined: Mon Oct 13, 2014 8:49 am

Re: How To Handle Necessarily Missing Data

Postby Qbert123 » Tue Oct 14, 2014 6:39 am

Thanks, startz; I think the interaction term added *without* including the original variable itself is the trick I was looking for. EM-algorithm approaches would be the other way to do it.

Carlo, I don't understand your comment. In the cases I'm talking about, no survey is going to pick up the AP scores of students who don't take AP tests. These data are *necessarily* missing, so we have to find a way to deal with them.

Carlo Lazzaro
Posts: 9
Joined: Wed Sep 03, 2014 5:32 am

Re: How To Handle Necessarily Missing Data

Postby Carlo Lazzaro » Fri Apr 24, 2015 12:41 am

Qbert123: my previous reply referred to a situation in which the survey had not yet been started.
Now I see that your query had a different flavour.
An interesting textbook covering this (and related issues in dealing with missing values) is: Van Buuren, S. (2012). Flexible Imputation of Missing Data. Chapman & Hall/CRC, Boca Raton, FL. ISBN 9781439868249.

Qbert123
Posts: 3
Joined: Mon Oct 13, 2014 8:49 am

Re: How To Handle Necessarily Missing Data

Postby Qbert123 » Sun Apr 26, 2015 9:23 pm

Carlo -- thanks very much for the tip. I'll look up the Van Buuren book.

