# Calculate number of cases per year from pandas data frame

I have a data frame in the following format:

``````import pandas as pd
d = {'case_id': [1, 2, 3], 'begin': [2002, 1996, 2001], 'end': [2019, 2001, 2002]}
df = pd.DataFrame(data=d)
``````

I need to calculate how many cases are in force by year. This information can be derived from the ‘begin’ and ‘end’ columns.
For example, case 2 was in force between the years 1996 and 2001.

The resulting data frame should like as follows:

``````e = {'year': [1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
'cases': [1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
df_ = pd.DataFrame(data=e)
``````

Any idea how I can do this in a few lines for 1,000 cases?

## Solution

Assign the new value with `range` then `explode`

``````df['new'] = [range(x,y+1) for x , y in zip(df.begin,df.end)]
df = df.explode('new')
``````

And we do `groupby` + `nunique`

``````out = df.groupby(['new']).case_id.nunique().reset_index()
Out[257]:
new  case_id
0   1996        1
1   1997        1
2   1998        1
3   1999        1
4   2000        1
5   2001        2
6   2002        2
7   2003        1
8   2004        1
9   2005        1
10  2006        1
11  2007        1
12  2008        1
13  2009        1
14  2010        1
15  2011        1
16  2012        1
17  2013        1
18  2014        1
19  2015        1
20  2016        1
21  2017        1
22  2018        1
23  2019        1
``````

﻿﻿