Asked By: Anonymous
Thanks in advance for any help.
my Data looks like this:
|year|class| |---|----| |2007|a| |2007|b| |2007|c| |2007|d| |2008|a| |2008|b| |2008|e| |2008|f| |2009|c| |2009|d| |2009|e| |2009|g|
The goal would be to delete any classes which occure in the previous year, so the final data looks like this:
|year|class| |---|----| |2007|a| |2007|b| |2007|c| |2007|d| |2008|e| |2008|f| |2009|c| |2009|d| |2009|g|
I tried this code, I intendet to group the data and then delete all within group duplicates but it did not remove everything just a few rows.
Instead of duplicates() I also tried unique() which did not work.
d %>% group_by(class, Group = c(0, cumsum(diff(year) != 1))) %>% filter(!(duplicated(class, fromLast = TRUE)| duplicated(class))) %>% ungroup() %>% select(-Group)
Is there maybe another R function which can look at group differences?
Thanks for any help
Edit: Thanks too all for your very helpfull answers!
Answered By: Anonymous
Left join DF to itself on class and a year difference of 1 and retain only those rows for which there is no such match.
library(sqldf) sqldf("select a.* from DF a left join DF b on b.class = a.class and b.year = a.year - 1 where b.year is null")
year class 1 2007 a 2 2007 b 3 2007 c 4 2007 d 5 2008 e 6 2008 f 7 2009 c 8 2009 d 9 2009 g
Lines <- "|year|class| |2007|a| |2007|b| |2007|c| |2007|d| |2008|a| |2008|b| |2008|e| |2008|f| |2009|c| |2009|d| |2009|e| |2009|g|" DF <- read.table(text = Lines, sep = "|", header = TRUE)[2:3]