Identify and delete duplicate observations.
Posted: Wed Aug 31, 2011 3:23 pm
This ought to be very easy, but I can't find a way to count the number of observations within groups defined by a set of variables. There ought to be a by-group function that does this, right? It is quite simple to do in SAS, SQL, or Stata. I think the question has been posed on this forum before, but I can't find a thread where it was answered.
My exact problem is that I have several duplicate observations in a workfile from which I want to create a panel dataset. I have a cross-section id and have created a date id. I should be able to create a series that counts the number of observations in each cross-section/date group, and then delete any observation whose count is greater than 1.
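For comparison, here is the by-group count described above, sketched in Python outside EViews. It assumes the cross-section and date ids are available as two parallel lists, and the function name `mark_duplicates` is hypothetical. "Count of more than 1" can be read two ways, so the sketch returns both the total group size (drop every copy of a duplicated key) and a running occurrence number (keep the first copy, drop the rest).

```python
from collections import Counter, defaultdict


def mark_duplicates(ids, dates):
    """Hypothetical helper: per-observation group counts for (id, date) keys.

    Returns two lists aligned with the input:
      group_size[i]: total number of observations sharing (ids[i], dates[i])
      occurrence[i]: 1 the first time a key is seen, 2 the second time, ...
    """
    keys = list(zip(ids, dates))
    sizes = Counter(keys)  # one pass: total observations per key
    seen = defaultdict(int)
    group_size, occurrence = [], []
    for k in keys:  # second pass: running count within each key
        seen[k] += 1
        group_size.append(sizes[k])
        occurrence.append(seen[k])
    return group_size, occurrence


# Example panel with one duplicated (A, 2001) row and one duplicated (B, 2001) row.
ids = ["A", "A", "A", "B", "B"]
dates = [2001, 2001, 2002, 2001, 2001]
group_size, occurrence = mark_duplicates(ids, dates)

# Interpretation 1: keep the first copy of each key.
keep_first = [i for i, o in enumerate(occurrence) if o == 1]
# Interpretation 2: drop every observation whose key is duplicated.
keep_unique_only = [i for i, s in enumerate(group_size) if s == 1]
print(keep_first, keep_unique_only)
```

The surviving row indices could then drive whatever contraction step is used to remove the other rows.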
(It doesn't seem possible to delete observations directly from code, but that's a separate issue in EViews. I'll use the pagecontract work-around that has been mentioned before on this forum.)
Any advice?
Thanks,
R.
Update: It looks like the last couple of posts in this thread raised the same question, but the moderators don't seem to have given any resolution. http://forums.eviews.com/viewtopic.php?f=3&t=2004
Or am I missing some follow-on?
I'm working with a very large dataset, so the work-around suggested there would be very inefficient. That work-around is to create a new variable that combines the two variables of interest (i.e., combines the cross-section and date ids into a unique identifier), then loop over each value of that variable, set the sample to it, and count the number of observations in the sample.
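To illustrate why the loop is slow, here is a Python sketch (not EViews code) contrasting the suggested work-around with a single hash-based pass over the same combined key. The function names and the `"_"` separator used to build the combined identifier are assumptions for illustration. The looping version rescans all n observations once per distinct key, so it costs roughly O(k·n); the single pass costs O(n) expected time.

```python
def combined_keys(ids, dates):
    # Build the combined cross-section/date identifier. The "_" separator is an
    # assumption; a tuple (id, date) avoids collisions if ids can contain "_".
    return [f"{i}_{d}" for i, d in zip(ids, dates)]


def count_by_looping(keys):
    # Mimics the work-around: for each distinct key, "set the sample" to that
    # key and count the observations in it. Rescans the data k times.
    counts = {}
    for key in set(keys):
        counts[key] = sum(1 for k in keys if k == key)
    return counts


def count_single_pass(keys):
    # Hash-based alternative: one pass over the data, incrementing a counter.
    counts = {}
    for k in keys:
        counts[k] = counts.get(k, 0) + 1
    return counts


keys = combined_keys(["A", "A", "B", "B", "B"], [1, 1, 1, 2, 2])
print(count_single_pass(keys))
```

Both functions return the same counts; only the amount of work differs as the number of distinct keys grows.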