### imputing missing data in panel/pooled dataset

Posted: Wed Jul 05, 2017 10:28 am
Hello,

I have a question regarding simple imputing of missing data in a panel/pooled data set.

Suppose I have dated panel data covering 10 quarters (Q1-Q10) for 10 people (ID1-ID10). For a certain field Y, 2 people are each missing 2 quarters of data.

For each person, I want to use:
1) the average of that person's other 8 quarters of Y to fill in the 2 missing ones;
2) the mode of that person's other 8 quarters to fill in the 2 missing ones;
3) the value from the quarter before the missing one to fill in each missing one;
4) the value from the quarter after the missing one to fill in each missing one.

Any suggestions on how to do this?

Thanks!

Best,
BC

### Re: imputing missing data in panel/pooled dataset

Posted: Thu Jul 06, 2017 10:47 am
If I understand the question correctly, for a panel:

1. series y_new = @recode(y<>na, y, @meansby(y, @crossid))
replaces each NA value with the mean of the non-missing observations for that cross-section

2. you are asking for an ambiguous operation as the mode is potentially non-unique

3. series y_new = @recode(y=na and @obsid>1, @recode(y(-1)=na, y_new(-1), y(-1)), y)
this implementation pulls from the last available non-NA lagged value. I'm not sure if this is what you want; it's easier if you don't need the last available value

4. <not sure>
it's easy if you don't want to pull from the next available non-lagged value; otherwise it's a bit tricky. If the former, let me know; otherwise, we'll have to think about this one for a bit.
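
To make the logic of items 1, 3 and 4 concrete outside EViews, here is a minimal Python sketch for a single person's series. It is only an illustration of the three fill rules, not EViews code: the function names are hypothetical, `None` stands in for NA, and the sample values are made up.

```python
from statistics import mean

def fill_with_mean(values):
    """Item 1: replace each missing value with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to average, leave the gaps alone
    m = mean(observed)
    return [m if v is None else v for v in values]

def fill_forward(values):
    """Item 3: replace each missing value with the last available earlier value."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)  # stays None until a first observed value is seen
    return out

def fill_backward(values):
    """Item 4: replace each missing value with the next available later value."""
    return fill_forward(values[::-1])[::-1]

# One person's quarters; None marks a missing quarter (made-up numbers).
y = [None, 2.0, None, None, 8.0, 5.0]

print(fill_with_mean(y))  # [5.0, 2.0, 5.0, 5.0, 8.0, 5.0]
print(fill_forward(y))    # [None, 2.0, 2.0, 2.0, 8.0, 5.0]
print(fill_backward(y))   # [2.0, 2.0, 8.0, 8.0, 8.0, 5.0]
```

Note that a forward fill cannot fill a gap at the start of a person's history, and a backward fill cannot fill one at the end, because there is no value to pull from.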

### Re: imputing missing data in panel/pooled dataset

Posted: Thu Jul 06, 2017 11:34 am
Stealing from Glenn:
4.

`genr(r) y_new2 = @recode(y=na and @obsid>1, @recode(y(1)=na, y_new2(1), y(1)), y)`

### Re: imputing missing data in panel/pooled dataset

Posted: Thu Jul 06, 2017 11:36 am
Thought there was an "r" in genr, which reverses the recursion, but we didn't have it in the docs and I wasn't 100% certain. We'll update the docs.

### Re: imputing missing data in panel/pooled dataset

Posted: Thu Jul 06, 2017 11:38 am
Hi Glenn and Gareth,

Thank you so much for your answers! They are exactly what I want!

I have a follow-up on it:

For (2), I agree with your point, and I will think more about the implementation I want if the modes are not unique. But assuming the mode is unique, is there a function to pull the mode? I imagine something like @modeby(y,@crossid)?

Again, you guys are awesome!

Best,
BC

### Re: imputing missing data in panel/pooled dataset

Posted: Thu Jul 06, 2017 12:04 pm
No. EViews doesn't offer a mode function because the return type would have to be (in general) a vector. And then you couldn't do the @modesby since that has to feed into a series.

There's certainly a way to get at this via programming, but it's not quite as simple as the examples we gave above.
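
The non-uniqueness issue can be illustrated outside EViews with a short Python sketch. This is not an EViews program: `fill_with_unique_mode` and the sample panel are hypothetical, `None` stands in for NA, and the rule "leave the gaps alone when the mode is tied" is just one possible choice.

```python
from statistics import multimode

def fill_with_unique_mode(values):
    """Fill missing values with the mode, but only when the mode is unique."""
    observed = [v for v in values if v is not None]
    modes = multimode(observed)  # a list, because ties are possible
    if len(modes) != 1:
        return list(values)      # tied (or no data): leave the gaps as they are
    return [modes[0] if v is None else v for v in values]

# Hypothetical panel: one list of quarterly values per person.
panel = {
    "ID1": [3, 3, None, 5, None],  # unique mode: 3
    "ID2": [1, 2, None, 2, 1],     # tied modes: 1 and 2
}

for person, values in panel.items():
    print(person, fill_with_unique_mode(values))
# ID1 [3, 3, 3, 5, 3]
# ID2 [1, 2, None, 2, 1]
```

The mode comes back as a list rather than a single number, which is the same reason given above for why a by-group mode cannot feed directly into a series.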

### Re: imputing missing data in panel/pooled dataset

Posted: Thu Jul 13, 2017 4:19 pm
Guys, for the reverse generation,

`genr(r) y_new2 = @recode(y=na and @obsid>1, @recode(y(1)=na, y_new2(1), y(1)), y)`

Why does @obsid have to be greater than 1? What will happen if I leave that out?
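
This thread does not settle whether EViews itself needs that guard. As a general illustration only, here is a Python sketch of why a boundary check can matter when several people's histories are stacked in one column. It is not EViews code and makes no claim about how EViews handles leads or lags at cross-section boundaries; `backfill_stacked`, its `respect_boundaries` flag and the data are hypothetical.

```python
def backfill_stacked(ids, values, respect_boundaries=True):
    """Fill each gap from the next row, walking the stacked column backwards."""
    out = list(values)
    for i in range(len(out) - 2, -1, -1):
        same_person = ids[i] == ids[i + 1]
        if out[i] is None and (same_person or not respect_boundaries):
            out[i] = out[i + 1]
    return out

# Two people stacked on top of each other; None marks a missing value.
ids  = ["A", "A", "A", "B", "B", "B"]
vals = [1.0, 2.0, None, 9.0, None, 7.0]

print(backfill_stacked(ids, vals))
# [1.0, 2.0, None, 9.0, 7.0, 7.0]   A's last gap stays empty
print(backfill_stacked(ids, vals, respect_boundaries=False))
# [1.0, 2.0, 9.0, 9.0, 7.0, 7.0]    A's last gap is filled with B's value
```

In this sketch the boundary that matters for a backward fill is the last row of each person, since that is where the "next" row belongs to someone else; for a forward fill it would be the first row.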

### Re: imputing missing data in panel/pooled dataset

Posted: Mon Jul 31, 2017 2:47 pm
EViews Gareth wrote: Stealing from Glenn:
4.

`genr(r) y_new2 = @recode(y=na and @obsid>1, @recode(y(1)=na, y_new2(1), y(1)), y)`

Hi guys,

When I ran code similar to the above on an alpha series, it only imputed the observation immediately before each non-empty one. When I ran the command again, it filled the second observation before the non-empty one.

For example, with 2011-2015 data where 2011-2013 are missing: the first time I run the command, the 2013 observation is filled with the 2014 data; when I run that command again (just that one command line), 2012 is filled, and so on.

Is there any way to fill them all at once?

FYI, I panelized the data by an ID series and a quarter series.

Thanks.
BC
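
The one-step-per-run pattern described above is what a fill produces when each gap is filled from the original, unfilled next value rather than from the value that was just filled in. The Python sketch below reproduces that symptom and contrasts it with a fill that walks backwards and reads its own output. It is an illustration under that assumption, not a diagnosis of the EViews command; the function names are hypothetical and an empty string stands in for a missing alpha value.

```python
def fill_from_next_once(values, missing=""):
    """Each gap copies the ORIGINAL next value, so one pass closes one step."""
    out = list(values)
    for i in range(len(values) - 1):
        if values[i] == missing:
            out[i] = values[i + 1]  # reads the input, not the filled output
    return out

def fill_from_next_recursive(values, missing=""):
    """Walk from the end so each gap can copy a value that was just filled."""
    out = list(values)
    for i in range(len(out) - 2, -1, -1):
        if out[i] == missing:
            out[i] = out[i + 1]     # reads the filled output
    return out

# 2011-2015 for one ID, with 2011-2013 missing (made-up label "B").
years = ["", "", "", "B", "B"]

once = fill_from_next_once(years)
print(once)                             # ['', '', 'B', 'B', 'B']
print(fill_from_next_once(once))        # ['', 'B', 'B', 'B', 'B']
print(fill_from_next_recursive(years))  # ['B', 'B', 'B', 'B', 'B']
```

The first two printed lines match the behaviour reported here (2013 filled on the first run, 2012 on the second), while the backward, self-referencing pass fills the whole run of gaps in one go.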