imputing missing data in panel/pooled dataset

bcchen
Posts: 31
Joined: Tue May 02, 2017 8:34 am

imputing missing data in panel/pooled dataset

Postby bcchen » Wed Jul 05, 2017 10:28 am

Hello,

I have a question regarding simple imputing of missing data in a panel/pooled data set.

Suppose I have dated panel data for 10 quarters (Q1-Q10) for 10 people (ID1-ID10). For a certain field Y, 2 people are each missing 2 quarters of data.

For each person, I want to use:
1) the average of the other 8 quarters Y for each person to fill in the 2 missing ones for each of them;
2) the mode of the other 8 quarters to fill in the 2 missing ones;
3) the value for the quarter before the missing one to fill in each missing one;
4) the value for the quarter after the missing one to fill in each missing one;

Any suggestions on how to do it?

Thanks!

Best,
BC

EViews Glenn
EViews Developer
Posts: 2671
Joined: Wed Oct 15, 2008 9:17 am

Re: imputing missing data in panel/pooled dataset

Postby EViews Glenn » Thu Jul 06, 2017 10:47 am

If I understand the question correctly, for a panel:

1. series y_new = @recode(y<>na, y, @meansby(y, @crossid))
replaces NA values for an observation with the means of the remaining observations for that cross-section

2. you are asking for an ambiguous operation as the mode is potentially non-unique

3. series y_new = @recode(y=na and @obsid>1, @recode(y(-1)=na, y_new(-1), y(-1)), y)
this implementation pulls from the last available non-NA lagged value - not sure if this is what you want -- it's easier if you don't flag the last available

4. <not sure>
it's easy if you don't want to pull from the next available non-lagged value, otherwise it's a bit tricky. If the former, let me know, otherwise, we'll have to think about this one for a bit.
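
For readers who want to check the logic of items 1 and 3 outside EViews, here is a minimal Python sketch (standard library only). It is an illustration, not EViews code: the dictionary `panel`, the helper names `fill_with_group_mean` and `fill_forward`, and the use of `None` for NA are all assumptions made for the example.

```python
from statistics import mean

def fill_with_group_mean(values):
    """Replace None entries with the mean of the non-missing entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to average, leave the series unchanged
    m = mean(observed)
    return [m if v is None else v for v in values]

def fill_forward(values):
    """Replace None with the most recent earlier non-missing value."""
    filled = []
    last = None  # stays None until the first observed value is seen
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

# Hypothetical panel: one list of quarterly values per person ID.
panel = {
    "ID1": [1.0, None, None, 4.0],
    "ID2": [None, 2.0, 6.0, None],
}

# Each person is handled separately, so values never cross cross-sections.
mean_filled = {pid: fill_with_group_mean(ys) for pid, ys in panel.items()}
forward_filled = {pid: fill_forward(ys) for pid, ys in panel.items()}
```

Note that in the forward fill a missing value at the start of a person's history stays missing, because there is no earlier observation to pull from.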

EViews Gareth
Fe ddaethom, fe welon, fe amcangyfrifon
Posts: 13307
Joined: Tue Sep 16, 2008 5:38 pm

Re: imputing missing data in panel/pooled dataset

Postby EViews Gareth » Thu Jul 06, 2017 11:34 am

Stealing from Glenn:
4.

Code:

genr(r) y_new2 = @recode(y=na and @obsid>1, @recode(y(1)=na, y_new2(1), y(1)), y)

EViews Glenn
EViews Developer
Posts: 2671
Joined: Wed Oct 15, 2008 9:17 am

Re: imputing missing data in panel/pooled dataset

Postby EViews Glenn » Thu Jul 06, 2017 11:36 am

Thought there was an "r" in genr, which reverses the recursion, but we didn't have it in the docs and I wasn't 100% certain. We'll update the docs.
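
To illustrate what a reversed recursion does, here is a small Python sketch (not EViews code). The helper name `fill_backward` and the use of `None` for NA are assumptions for the example. The point is that the pass runs from the last observation to the first and reads from its own already-filled output, so a run of consecutive missing values is filled in a single sweep.

```python
def fill_backward(values):
    """Replace None with the next later non-missing value."""
    filled = list(values)
    # Walk from the second-to-last observation back to the first.
    for i in range(len(filled) - 2, -1, -1):
        if filled[i] is None:
            # Read from the output series, which may itself have just
            # been filled on the previous step of this reverse pass.
            filled[i] = filled[i + 1]
    return filled

# A run of three missing values is filled in one sweep.
example = fill_backward([None, None, None, 4.0, 5.0])

# A missing value at the end has nothing after it and stays missing.
trailing = fill_backward([1.0, None])
```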

bcchen
Posts: 31
Joined: Tue May 02, 2017 8:34 am

Re: imputing missing data in panel/pooled dataset

Postby bcchen » Thu Jul 06, 2017 11:38 am

Hi Glenn and Gareth,

Thank you so much for your answers! They are exactly what I want!

I have a follow-up on it:

For (2), I agree with your point, and I will think more about the implementation I want when the mode is not unique. But assuming the mode is unique, is there a function to pull the mode? Something like @modeby(y,@crossid)?

Please advise.

Again, you guys are awesome!

Best,
BC

EViews Glenn
EViews Developer
Posts: 2671
Joined: Wed Oct 15, 2008 9:17 am

Re: imputing missing data in panel/pooled dataset

Postby EViews Glenn » Thu Jul 06, 2017 12:04 pm

No. EViews doesn't offer a mode function because the return type would have to be (in general) a vector. And then you couldn't do the @modesby since that has to feed into a series.

There's certainly a way to get at this via programming, but it's not quite as simple as the examples we gave above.
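
As a rough illustration of why the mode naturally comes back as a list, and of one possible tie policy, here is a Python sketch (not EViews code). `statistics.multimode` is a real standard-library function (Python 3.8+); the helper name `fill_with_mode` and the choice to leave the series untouched when the mode is not unique are assumptions made for the example.

```python
from statistics import multimode

def fill_with_mode(values):
    """Fill None with the mode of the observed values, if it is unique."""
    observed = [v for v in values if v is not None]
    modes = multimode(observed)  # list of all most-frequent values
    if len(modes) != 1:
        # Tie (or no data at all): the mode is ambiguous, so do nothing.
        return list(values)
    return [modes[0] if v is None else v for v in values]

# Unique mode: 3 occurs most often, so it fills the gaps.
unique_case = fill_with_mode([3, 3, None, 5, None, 3])

# Tied modes: 1 and 2 both occur twice, so the gap is left alone.
tied_case = fill_with_mode([1, 1, 2, 2, None])
```

Other tie policies (smallest mode, first mode encountered, and so on) are equally defensible; the choice has to be made explicitly.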

bcchen
Posts: 31
Joined: Tue May 02, 2017 8:34 am

Re: imputing missing data in panel/pooled dataset

Postby bcchen » Thu Jul 13, 2017 4:19 pm

Guys, for the reverse generation,

Code:

genr(r) y_new2 = @recode(y=na and @obsid>1, @recode(y(1)=na, y_new2(1), y(1)), y)


Why does @obsid have to be greater than 1? What will happen if I leave that condition out?

bcchen
Posts: 31
Joined: Tue May 02, 2017 8:34 am

Re: imputing missing data in panel/pooled dataset

Postby bcchen » Mon Jul 31, 2017 2:47 pm

EViews Gareth wrote:Stealing from Glenn:
4.

Code:

genr(r) y_new2 = @recode(y=na and @obsid>1, @recode(y(1)=na, y_new2(1), y(1)), y)


Hi guys,

When I ran code similar to the above on an alpha series, it only imputed the observation right before the non-empty ones. When I ran the command again, it filled in the second observation before the non-empty ones.

For example: for 2011-2015 data, 2011-2013 are missing. The first time I run the command, the 2013 obs is filled with the 2014 data; when I run the command again (just that command line), the 2012 obs is filled, and so on.

Is there any way to fill them all at the same time?

FYI, I panelized the data by an ID series and a quarter series.

Thanks.
BC

