delete duplicate observations within a group

lpm · Postby **lpm** » Wed Jan 17, 2018 2:06 pm

Here is my data:

name value
A 1
A 1
B 2
B 2

Clearly I have duplicates. How do I delete the duplicates within name, ending up with:

name value
A 1
B 2

EViews Matt · Postby **EViews Matt** » Thu Jan 18, 2018 10:30 am

Hello,

Is it sufficient to construct a sample that excludes the duplicates? Under the assumption that all duplicates in "name" are contiguous, it's simple to construct a dummy series that retains the first instance within a group of duplicates.

Code:

series dummy = @nan(name <> name(-1), 1)
smpl if dummy
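For readers who want to check the logic outside EViews, here is a minimal Python sketch of the same idea: flag a row with 1 when its name differs from the name on the row before it, and treat the very first row (which has no previous row) as 1. The function name `first_in_run_flags` is hypothetical and not part of EViews; like the expression above, it assumes duplicates are contiguous.

```python
def first_in_run_flags(names):
    """Return 1 for the first row of each run of equal names, else 0.

    Assumes duplicates are contiguous, as in the EViews expression above.
    """
    flags = []
    for i, name in enumerate(names):
        # Row 0 has no previous row, so it always counts as a first instance
        # (this mirrors replacing the missing comparison with 1).
        is_first = i == 0 or name != names[i - 1]
        flags.append(1 if is_first else 0)
    return flags


names = ["A", "A", "B", "B"]
flags = first_in_run_flags(names)
```

Keeping only the rows where the flag is 1 leaves one row per name.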

lpm · Postby **lpm** » Thu Jan 18, 2018 12:03 pm

I think this is the right approach, but it didn't work. I have thousands of observations. Some of the groups under name have multiple duplicates (up to 15). Using my data and your commands, would I be able to create a dummy such as:

Name value dummy
A 1 1
A 1 0
B 2 1
B 2 0
C 3 1
C 3 0
C 3 0

So the first observation in each group gets a value of 1. All other observations in the group get a value of 0. I could then pagecontract out all the 0 values under dummy. Is that possible?

EViews Matt · Postby **EViews Matt** » Thu Jan 18, 2018 2:27 pm

Sure, you can use my expression directly with pagecontract:

Code:

pagecontract if @nan(name <> name(-1), 1)

Obviously, "name" above should be your series holding the name information. What didn't work when you tried my commands?
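As a rough illustration of what contracting on that condition does, the Python sketch below drops every row whose name equals the name on the row before it, without building an intermediate dummy. The helper `contract_to_first_in_run` is a hypothetical name used only for this example, and the rows are the made-up ones from the post above.

```python
def contract_to_first_in_run(rows):
    """Keep only the first row of each contiguous run of equal names.

    Each row is a (name, value) tuple; contiguous duplicates are assumed.
    """
    kept = []
    for i, row in enumerate(rows):
        # Keep the first row, and any row whose name differs from the previous one.
        if i == 0 or row[0] != rows[i - 1][0]:
            kept.append(row)
    return kept


rows = [("A", 1), ("A", 1), ("B", 2), ("B", 2), ("C", 3), ("C", 3), ("C", 3)]
contracted = contract_to_first_in_run(rows)
```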

lpm · Postby **lpm** » Fri Jan 19, 2018 6:43 am

Looks like I got it to work. I believe this is what was causing the problem: my dataset is large, and I want to look at a subset of it. To do so I used the smpl if command. From that point I wanted to delete duplicates, and I did so by applying the command you gave me. It created the dummy variable, but lots of observations that should have been coded as 1 were instead coded as 0.

This is what I did to get it to work: instead of using the smpl if command to limit the size of the data, I used pagecontract. I then used the command you provided. It worked perfectly. Any ideas on why this makes sense? Regardless, it seems to work.

PS. Obviously my data is a lot more complicated than the simple example I provided. But I can't delete duplicates from smpl @all. I have to create subsets and then delete duplicates. That is, I need the first observation under name in each subset to take on a value of 1 and duplicate observations within that subset to take on values of 0. Using pagecontract to include only that subset works. Is there a way to write the program so that it runs your code within each subset?

Thanks for your help. Very useful.
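One possible reading of the behaviour described above, which is not confirmed anywhere in this thread: a lag such as name(-1) may look at the previous row of the whole workfile rather than the previous row of the current sample, so the first in-sample row of a group gets 0 whenever the row just before it (outside the sample) has the same name. The Python sketch below contrasts the two interpretations on made-up data; both function names and the `in_subset` mask are hypothetical and used only for illustration.

```python
def flags_with_full_file_lag(names, in_subset):
    """Compare each in-subset row with the row physically before it in the full data."""
    out = []
    for i, name in enumerate(names):
        if not in_subset[i]:
            continue  # rows outside the subset get no flag
        out.append(1 if i == 0 or name != names[i - 1] else 0)
    return out


def flags_with_subset_lag(names, in_subset):
    """Compare each in-subset row with the previous in-subset row (as after contracting)."""
    out = []
    previous = None
    seen_any = False
    for name, keep in zip(names, in_subset):
        if not keep:
            continue
        out.append(1 if not seen_any or name != previous else 0)
        previous = name
        seen_any = True
    return out


names = ["A", "A", "A", "B", "B", "B"]
in_subset = [False, True, True, True, False, True]

full_lag = flags_with_full_file_lag(names, in_subset)
subset_lag = flags_with_subset_lag(names, in_subset)
```

Under this assumption the first in-subset "A" is flagged 0 with the full-file lag but 1 once the data has been reduced to the subset, which matches the symptom of rows that should be 1 being coded 0. Reducing the data to one subset at a time before flagging corresponds to the second function.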

EViews Matt · Postby **EViews Matt** » Fri Jan 19, 2018 10:18 am

Curious. Would it be possible for you to post a portion of your actual dataset (large enough to exhibit the problem you're experiencing)? If you'd rather not do so publicly on the forum, you can attach it to a private message to me.

lpm · Postby **lpm** » Fri Jan 19, 2018 12:07 pm

I can't share the data publicly or privately. But I can create a made up data set with the same characteristics that is clear to follow. Currently I'm swamped. Give me till about Wed to make data available on the public forum.

EViews Gareth · Postby **EViews Gareth** » Fri Jan 19, 2018 12:17 pm

You may also email the data to support@eviews.com and reference this forum thread if that is easier than posting on the forum.
