delete duplicate observations within a group

Posted: Wed Jan 17, 2018 2:06 pm
by lpm
Here is my data:

name value
A 1
A 1
B 2
B 2

Clearly I have duplicates. How do I delete the duplicates within each name, so that I end up with:

name value
A 1
B 2

Re: delete duplicate observations within a group

Posted: Thu Jan 18, 2018 10:30 am
by EViews Matt
Hello,

Is it sufficient to construct a sample that excludes the duplicates? Under the assumption that all duplicates in "name" are contiguous, it's simple to construct a dummy series that retains the first instance within a group of duplicates.

series dummy = @nan(name <> name(-1), 1)
smpl if dummy
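
To make the logic of that expression concrete, here is a minimal sketch in Python (not EViews code). It only mirrors the idea of comparing each row with the one before it and treating the first row, which has no predecessor, as a new group. The function name `first_in_run_flags` and the `scattered` list are hypothetical and used purely for illustration.

```python
def first_in_run_flags(names):
    """Return 1 for the first row of each contiguous run of equal names, else 0."""
    flags = []
    previous = None
    for position, name in enumerate(names):
        # The first row has no predecessor, so it always starts a run
        # (this plays the role of the NA-to-1 fallback in the expression above).
        if position == 0 or name != previous:
            flags.append(1)
        else:
            flags.append(0)
        previous = name
    return flags


# The data from the original question: duplicates are contiguous.
contiguous = ["A", "A", "B", "B"]
flags_contiguous = first_in_run_flags(contiguous)

# Hypothetical data where the duplicates of "A" are NOT contiguous.
scattered = ["A", "B", "A"]
flags_scattered = first_in_run_flags(scattered)

print(flags_contiguous)
print(flags_scattered)
```

In the second case every row is flagged as a first instance, which is why the contiguity assumption matters: if the duplicates are scattered, the data would need to be sorted by name first.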

Re: delete duplicate observations within a group

Posted: Thu Jan 18, 2018 12:03 pm
by lpm
I think this is the right approach, but it didn't work. I have thousands of observations, and some of the groups under name have multiple duplicates (up to 15). Using my data and your commands, would I be able to create a dummy such as:

Name value dummy
A 1 1
A 1 0
B 2 1
B 2 0
C 3 1
C 3 0
C 3 0

So the first observation in each group gets a value of 1, and all other observations in the group get a value of 0. I could then use pagecontract to drop every observation where dummy is 0. Is that possible?

Re: delete duplicate observations within a group

Posted: Thu Jan 18, 2018 2:27 pm
by EViews Matt
Sure, you can use my expression directly with pagecontract:

pagecontract if @nan(name <> name(-1), 1)

Obviously, "name" above should be your series holding the name information. What didn't work when you tried my commands?

Re: delete duplicate observations within a group

Posted: Fri Jan 19, 2018 6:43 am
by lpm
Looks like I got it to work. I believe this is what was causing the problem: my dataset is large, and I want to look at a subset of it. To do so, I used the smpl if command. From that point I wanted to delete duplicates, so I applied the command you gave me. It created the dummy variable, but lots of observations that should have been coded as 1 were instead coded as 0.

This is what I did to get it to work: instead of using the smpl if command to limit the size of the data, I used pagecontract. I then ran the command you provided, and it worked perfectly. Any ideas on why this makes a difference? Regardless, it seems to work.

PS. Obviously my data is a lot more complicated than the simple example I provided. But I can't delete duplicates under smpl @all. I have to create subsets and then delete duplicates. That is, I need the first observation under name in each subset to take on a value of 1, and the duplicate observations within that subset to take on values of 0. Using pagecontract to keep only that subset works. Is there a way to write the program so that it runs your code within each subset?

Thanks for your help. Very useful.
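
One possible explanation for the difference, offered as an assumption and not confirmed anywhere in this thread: a lag such as name(-1) refers to the previous observation in the workfile, not the previous observation in the current sample. Under smpl if, the first in-sample row of a group can then be compared with an out-of-sample row that has the same name, and so be coded 0. After pagecontract the excluded rows no longer exist, so the comparison is made against the previous remaining row. The Python sketch below (not EViews code) illustrates that difference, plus one way to flag first instances within each subset in a single pass. The subset labels `s1` and `s2`, the sample rows, and all function names are hypothetical.

```python
def first_in_run_flags(names):
    """Return 1 for the first row of each contiguous run of equal names, else 0."""
    flags = []
    previous = None
    for position, name in enumerate(names):
        flags.append(1 if position == 0 or name != previous else 0)
        previous = name
    return flags


def flags_by_subset(rows):
    """Flag the first row of each run of equal names, tracked separately per subset."""
    last_name = {}  # subset label -> most recent name seen in that subset
    flags = []
    for name, subset in rows:
        flags.append(0 if last_name.get(subset) == name else 1)
        last_name[subset] = name
    return flags


# Hypothetical rows: (name, subset label). Subset "s1" is the one of interest.
rows = [("A", "s2"), ("A", "s1"), ("A", "s1"),
        ("B", "s1"), ("B", "s2"), ("B", "s1")]

# Sample-restriction style: compare against the previous row of the FULL data,
# then look only at the rows that belong to subset "s1".
flags_all = first_in_run_flags([name for name, _ in rows])
flags_filtered = [flag for flag, (_, subset) in zip(flags_all, rows) if subset == "s1"]

# Contraction style: drop the other rows first, then compare within what remains.
flags_contracted = first_in_run_flags([name for name, subset in rows if subset == "s1"])

# Per-subset flags for every row, computed in one pass.
flags_per_subset = flags_by_subset(rows)

print(flags_filtered)
print(flags_contracted)
print(flags_per_subset)
```

In this made-up example the first "A" of subset s1 is coded 0 in the filtered version because the row before it in the full data is also "A", while the contracted and per-subset versions code it as 1.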

Re: delete duplicate observations within a group

Posted: Fri Jan 19, 2018 10:18 am
by EViews Matt
Curious. Would it be possible for you to post a portion of your actual dataset (large enough to exhibit the problem you're experiencing)? If you'd rather not do so publicly on the forum, you can attach it to a private message to me.

Re: delete duplicate observations within a group

Posted: Fri Jan 19, 2018 12:07 pm
by lpm
I can't share the data publicly or privately, but I can create a made-up dataset with the same characteristics that is easy to follow. I'm swamped at the moment. Give me until about Wednesday to make the data available on the public forum.

Re: delete duplicate observations within a group

Posted: Fri Jan 19, 2018 12:17 pm
by EViews Gareth
You may also email the data to support@eviews.com and reference this forum thread if that is easier than posting on the forum.