Removing first duplicate dataframe

Asked 12 years, 9 months ago

Viewed 148 times

\$\begingroup\$

I have a data frame with several columns. One of them is a user ID column, in which the same ID can appear several times.

What I want to do is remove the first occurrence of each ID. For instance, given:

1,2,3,4,3,4,2,1,3,4,6,7,7

I would like to have an output like this:

3,4,2,1,3,4,7

Here is what I have done:

# flag every occurrence of a user ID after the first
dup <- duplicated(results$user)
# build a new data frame, adding each row whose flag is TRUE
results1 <- NULL
for (i in 1:length(results$user)) {
  if (dup[i] == TRUE) {
    rbind(results1, results[i, ]) -> results1
  }
}

Since I'm more used to thinking in Python, I have a feeling this is a very ugly solution for R. I would like some feedback, as well as some pointers on how to improve this piece of code.

edited Mar 3, 2015 at 4:12 by Jamal

asked Dec 31, 2012 at 9:46 by psoares

\$\endgroup\$

2 Answers

\$\begingroup\$

Here's a more efficient solution:

# an example data frame
results <- data.frame(user = c(1,2,3,4,3,4,2,1,3,4,6,7,7), a = 1)
# the solution
results[duplicated(results$user), ]

How it works: for each value of results$user, duplicated returns TRUE if that value already appeared at an earlier position in the vector and FALSE otherwise, so the first occurrence of every ID is FALSE.

This logical vector is then used to select the matching rows of the original data frame. It is passed as the first argument to [, and the second argument is left empty to select all columns.

The result:

   user a
5     3 1
6     4 1
7     2 1
8     1 1
9     3 1
10    4 1
13    7 1
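The intermediate steps can also be inspected one at a time. This is only a minimal sketch reusing the example data frame from this answer; the name keep is introduced here for illustration and is not part of the original solution.

```r
# Example data frame from the answer above
results <- data.frame(user = c(1, 2, 3, 4, 3, 4, 2, 1, 3, 4, 6, 7, 7), a = 1)

# Step 1: TRUE marks every value already seen at an earlier position
keep <- duplicated(results$user)   # `keep` is an illustrative name
print(keep)

# Step 2: the row positions that will be selected
print(which(keep))                 # 5 6 7 8 9 10 13

# Step 3: index rows with the logical vector; the empty second
# argument keeps all columns
print(results[keep, ]$user)        # 3 4 2 1 3 4 7
```

The selected users match the output asked for in the question, and the row names of the subset (5, 6, ..., 13) are the original row positions.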

edited Jan 10, 2013 at 14:18

answered Jan 10, 2013 at 13:29 by Sven Hohenstein

\$\endgroup\$

You're right! It is better. With R I have a tendency to overcomplicate things. Thank you for your response.

– psoares, commented Jan 11, 2013 at 11:24


\$\begingroup\$

After some more reading, I've come to the conclusion that I can eliminate several lines and do this instead:

rbind(results1, results[dup,]) -> results1

It is much quicker and seems more efficient.
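One way to check that the shorter form selects the same rows as the loop is sketched below. It assumes the 13 IDs from the question as example data, with a filler column a. The names loop_result and vec_result are introduced here for illustration. Because results1 starts out as NULL, the subset is assigned directly rather than passed through rbind().

```r
# Example data built from the IDs in the question; column `a` is filler
results <- data.frame(user = c(1, 2, 3, 4, 3, 4, 2, 1, 3, 4, 6, 7, 7), a = 1)
dup <- duplicated(results$user)

# Loop approach: grow a data frame one row at a time
loop_result <- NULL
for (i in seq_along(dup)) {
  if (dup[i]) {
    loop_result <- rbind(loop_result, results[i, ])
  }
}

# Vectorised approach: a single logical subset, no rbind() needed
vec_result <- results[dup, ]

# Both select the same users in the same order
stopifnot(identical(loop_result$user, vec_result$user))
stopifnot(nrow(loop_result) == nrow(vec_result))
```

Growing a data frame with rbind() inside a loop copies the accumulated rows on every iteration, which is a likely reason the single subset feels so much quicker.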

However, any suggestions or recommendations are welcome :)

answered Jan 6, 2013 at 10:45 by psoares

\$\endgroup\$
