Here is my data:
name value
A 1
A 1
B 2
B 2
Clearly I have duplicates. How do I delete the duplicates within each name, so that I end up with:
name value
A 1
B 2
delete duplicate observations within a group
-
- EViews Developer
- Posts: 563
- Joined: Thu Apr 25, 2013 7:48 pm
Re: delete duplicate observations within a group
Hello,
Is it sufficient to construct a sample that excludes the duplicates? Under the assumption that all duplicates in "name" are contiguous, it's simple to construct a dummy series that retains the first instance within a group of duplicates.
Code:
series dummy = @nan(name <> name(-1), 1)
smpl if dummy
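For anyone who wants to try the expression on a toy workfile first, here is a minimal sketch. The workfile name dedup_demo and the numeric series name_id (standing in for the "name" column, with 1, 2, 3 for A, B, C) are made up for illustration and are not from the thread.

```
' Hypothetical 7-row undated workfile mirroring the example data.
wfcreate(wf=dedup_demo) u 7

' Numeric stand-in for the name column: A, A, B, B, C, C, C.
series name_id
name_id.fill 1, 1, 2, 2, 3, 3, 3

' Flag rows whose value differs from the previous row.
' The first row has no previous row, so @nan supplies 1 there.
series dummy = @nan(name_id <> name_id(-1), 1)

' Restrict the sample to the flagged rows (first row of each run).
smpl if dummy
```

Like the reply above, this relies on equal values being contiguous. If they are not, the workfile would need to be sorted by the name series first.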
Re: delete duplicate observations within a group
I think this is the right approach, but it didn't work. I have thousands of observations, and some of the groups under name have multiple duplicates (up to 15). Using my data and your commands, would I be able to create a dummy such as:
Name value dummy
A 1 1
A 1 0
B 2 1
B 2 0
C 3 1
C 3 0
C 3 0
So the first observation in each group gets a value of 1, and all other observations in the group get a value of 0. I could then use pagecontract to drop every observation where dummy is 0. Is that possible?
-
- EViews Developer
- Posts: 563
- Joined: Thu Apr 25, 2013 7:48 pm
Re: delete duplicate observations within a group
Sure, you can use my expression directly with pagecontract:
Code:
pagecontract if @nan(name <> name(-1), 1)
Obviously, "name" above should be your series holding the name information. What didn't work when you tried my commands?
Re: delete duplicate observations within a group
Looks like I got it to work. I believe this is what was causing the problem: my dataset is large, and I want to look at a subset of it. To do so, I used the smpl if command. From that point I wanted to delete duplicates, and I did so by applying the command you gave me. It created the dummy variable, but lots of observations that should have been coded as 1 were instead coded as 0.
This is what I did to get it to work: instead of using the smpl if command to limit the size of the data, I used pagecontract. I then used the command you provided, and it worked perfectly. Any ideas on why this makes sense? Regardless, it seems to work.
PS. Obviously my data is a lot more complicated than the simple example I provided. But I can't delete duplicates from smpl @all; I have to create subsets and then delete duplicates. That is, I need the first observation under name in each subset to take on a value of 1, and duplicate observations within that subset to take on a value of 0. Using pagecontract to keep only that subset works. Is there a way to write the program so that it runs your code within each subset?
Thanks for your help. Very useful.
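One possible explanation, not confirmed anywhere in this thread: a lag such as name(-1) refers to the preceding row of the workfile, not the preceding row of the current sample. Under smpl if, the first row of a name inside the subset may therefore be compared with a row of the same name that lies outside the subset, and get a 0. Contracting the page first removes those outside rows, which would explain why that route worked. A rough sketch of running the expression within one subset without touching the original page follows. The series insub (a 0/1 marker for the subset) and the page name sub are hypothetical.

```
' Assumption: insub is a hypothetical 0/1 series marking the subset of interest,
' and equal names are contiguous within that subset.

' Copy only the subset rows to a new page called sub.
pagecopy(page=sub, smpl="@all if insub")

' Make sure the copied page is the active one.
pageselect sub

' On this page name(-1) is the previous row of the subset,
' so the original expression keeps the first row of each name.
pagecontract if @nan(name <> name(-1), 1)
```

Repeating these steps with a different condition and page name for each subset would leave the full dataset intact on the original page.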
-
- EViews Developer
- Posts: 563
- Joined: Thu Apr 25, 2013 7:48 pm
Re: delete duplicate observations within a group
Curious. Would it be possible for you to post a portion of your actual dataset (large enough to exhibit the problem you're experiencing)? If you'd rather not do so publicly on the forum, you can attach it to a private message to me.
Re: delete duplicate observations within a group
I can't share the data publicly or privately. But I can create a made up data set with the same characteristics that is clear to follow. Currently I'm swamped. Give me till about Wed to make data available on the public forum.
-
- Fe ddaethom, fe welon, fe amcangyfrifon
- Posts: 13319
- Joined: Tue Sep 16, 2008 5:38 pm
Re: delete duplicate observations within a group
You may also email the data to support@eviews.com and reference this forum thread if that is easier than posting on the forum.