Asked By: Anonymous
Thanks in advance for any help.
my Data looks like this:
|year|class|
|---|----|
|2007|a|
|2007|b|
|2007|c|
|2007|d|
|2008|a|
|2008|b|
|2008|e|
|2008|f|
|2009|c|
|2009|d|
|2009|e|
|2009|g|
The goal would be to delete any classes which occure in the previous year, so the final data looks like this:
|year|class|
|---|----|
|2007|a|
|2007|b|
|2007|c|
|2007|d|
|2008|e|
|2008|f|
|2009|c|
|2009|d|
|2009|g|
I tried this code, I intendet to group the data and then delete all within group duplicates but it did not remove everything just a few rows.
Instead of duplicates() I also tried unique() which did not work.
d %>% group_by(class, Group = c(0, cumsum(diff(year) != 1))) %>%
filter(!(duplicated(class, fromLast = TRUE)| duplicated(class))) %>%
ungroup() %>%
select(-Group)
Is there maybe another R function which can look at group differences?
Thanks for any help
Edit: Thanks too all for your very helpfull answers!
Solution
Answered By: Anonymous
Left join DF to itself on class and a year difference of 1 and retain only those rows for which there is no such match.
library(sqldf)
sqldf("select a.*
from DF a
left join DF b on b.class = a.class and b.year = a.year - 1
where b.year is null")
giving:
year class
1 2007 a
2 2007 b
3 2007 c
4 2007 d
5 2008 e
6 2008 f
7 2009 c
8 2009 d
9 2009 g
Note
Lines <- "|year|class|
|2007|a|
|2007|b|
|2007|c|
|2007|d|
|2008|a|
|2008|b|
|2008|e|
|2008|f|
|2009|c|
|2009|d|
|2009|e|
|2009|g|"
DF <- read.table(text = Lines, sep = "|", header = TRUE)[2:3]