Introduction to PHEindicatormethods

Introduction

This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code included is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail but these can (optionally) be output alongside the statistics or for a more detailed explanation, please see the references section of the function documentation.

The following packages must be installed and loaded if not already available

library(PHEindicatormethods)
library(dplyr)

Package functions

This vignette covers the following functions available within the first release of the package (v1.0.8) but has been updated to apply to these functions in their latest release versions. If further functions are added to the package in future releases these will be explained elsewhere.

Function Type Description
phe_proportion Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_rate Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_mean Aggregate Performs a calculation on each grouping set
phe_dsr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
phe_smr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
phe_isr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs

Non-aggregate functions

Create some test data for the non-aggregate functions

The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:

df <- data.frame(
area = rep(c("Area1","Area2","Area3","Area4"), 2),
year = rep(2015:2016, each = 4),
obs = sample(100, 2 * 4, replace = TRUE),
pop = sample(100:200, 2 * 4, replace = TRUE))
df
#>    area year obs pop
#> 1 Area1 2015  92 168
#> 2 Area2 2015  28 137
#> 3 Area3 2015  44 182
#> 4 Area4 2015  97 110
#> 5 Area1 2016  59 127
#> 6 Area2 2016  23 172
#> 7 Area3 2016  80 135
#> 8 Area4 2016  26 103

Execute phe_proportion and phe_rate

INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.

OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified

# default proportion
phe_proportion(df, obs, pop)
#>    area year obs pop     value    lowercl   uppercl confidence       statistic
#> 1 Area1 2015  92 168 0.5476190 0.47212935 0.6209797        95% proportion of 1
#> 2 Area2 2015  28 137 0.2043796 0.14535931 0.2795259        95% proportion of 1
#> 3 Area3 2015  44 182 0.2417582 0.18530912 0.3088834        95% proportion of 1
#> 4 Area4 2015  97 110 0.8818182 0.80824955 0.9296187        95% proportion of 1
#> 5 Area1 2016  59 127 0.4645669 0.38014301 0.5510714        95% proportion of 1
#> 6 Area2 2016  23 172 0.1337209 0.09078473 0.1926607        95% proportion of 1
#> 7 Area3 2016  80 135 0.5925926 0.50826068 0.6718008        95% proportion of 1
#> 8 Area4 2016  26 103 0.2524272 0.17847829 0.3441789        95% proportion of 1
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence=99.8)
#>    area year obs pop     value    lowercl   uppercl confidence       statistic
#> 1 Area1 2015  92 168 0.5476190 0.42959835 0.6605173      99.8% proportion of 1
#> 2 Area2 2015  28 137 0.2043796 0.11891920 0.3283667      99.8% proportion of 1
#> 3 Area3 2015  44 182 0.2417582 0.15817246 0.3510929      99.8% proportion of 1
#> 4 Area4 2015  97 110 0.8818182 0.75511680 0.9475208      99.8% proportion of 1
#> 5 Area1 2016  59 127 0.4645669 0.33512834 0.5989615      99.8% proportion of 1
#> 6 Area2 2016  23 172 0.1337209 0.07258592 0.2333886      99.8% proportion of 1
#> 7 Area3 2016  80 135 0.5925926 0.46003563 0.7129155      99.8% proportion of 1
#> 8 Area4 2016  26 103 0.2524272 0.14516538 0.4017008      99.8% proportion of 1
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify to output proportions as percentages
phe_proportion(df, obs, pop, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl confidence  statistic method
#> 1 Area1 2015  92 168 54.76190 47.212935 62.09797        95% percentage Wilson
#> 2 Area2 2015  28 137 20.43796 14.535931 27.95259        95% percentage Wilson
#> 3 Area3 2015  44 182 24.17582 18.530912 30.88834        95% percentage Wilson
#> 4 Area4 2015  97 110 88.18182 80.824955 92.96187        95% percentage Wilson
#> 5 Area1 2016  59 127 46.45669 38.014301 55.10714        95% percentage Wilson
#> 6 Area2 2016  23 172 13.37209  9.078473 19.26607        95% percentage Wilson
#> 7 Area3 2016  80 135 59.25926 50.826068 67.18008        95% percentage Wilson
#> 8 Area4 2016  26 103 25.24272 17.847829 34.41789        95% percentage Wilson

# specify level of detail to output for proportion
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl confidence  statistic method
#> 1 Area1 2015  92 168 54.76190 42.959835 66.05173      99.8% percentage Wilson
#> 2 Area2 2015  28 137 20.43796 11.891920 32.83667      99.8% percentage Wilson
#> 3 Area3 2015  44 182 24.17582 15.817246 35.10929      99.8% percentage Wilson
#> 4 Area4 2015  97 110 88.18182 75.511680 94.75208      99.8% percentage Wilson
#> 5 Area1 2016  59 127 46.45669 33.512834 59.89615      99.8% percentage Wilson
#> 6 Area2 2016  23 172 13.37209  7.258592 23.33886      99.8% percentage Wilson
#> 7 Area3 2016  80 135 59.25926 46.003563 71.29155      99.8% percentage Wilson
#> 8 Area4 2016  26 103 25.24272 14.516538 40.17008      99.8% percentage Wilson

# specify level of detail to output for proportion and remove metadata columns
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100, type="standard")
#>    area year obs pop    value   lowercl  uppercl
#> 1 Area1 2015  92 168 54.76190 42.959835 66.05173
#> 2 Area2 2015  28 137 20.43796 11.891920 32.83667
#> 3 Area3 2015  44 182 24.17582 15.817246 35.10929
#> 4 Area4 2015  97 110 88.18182 75.511680 94.75208
#> 5 Area1 2016  59 127 46.45669 33.512834 59.89615
#> 6 Area2 2016  23 172 13.37209  7.258592 23.33886
#> 7 Area3 2016  80 135 59.25926 46.003563 71.29155
#> 8 Area4 2016  26 103 25.24272 14.516538 40.17008

# default rate
phe_rate(df, obs, pop)
#>    area year obs pop    value   lowercl   uppercl confidence       statistic
#> 1 Area1 2015  92 168 54761.90 44144.645  67161.31        95% rate per 100000
#> 2 Area2 2015  28 137 20437.96 13577.873  29539.64        95% rate per 100000
#> 3 Area3 2015  44 182 24175.82 17564.489  32455.70        95% rate per 100000
#> 4 Area4 2015  97 110 88181.82 71507.769 107575.39        95% rate per 100000
#> 5 Area1 2016  59 127 46456.69 35362.950  59926.80        95% rate per 100000
#> 6 Area2 2016  23 172 13372.09  8474.053  20065.59        95% rate per 100000
#> 7 Area3 2016  80 135 59259.26 46987.312  73754.19        95% rate per 100000
#> 8 Area4 2016  26 103 25242.72 16485.200  36987.90        95% rate per 100000
#>   method
#> 1  Byars
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify rate parameters
phe_rate(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop    value   lowercl   uppercl confidence    statistic
#> 1 Area1 2015  92 168 54.76190 38.787820  74.81532      99.8% rate per 100
#> 2 Area2 2015  28 137 20.43796 10.517905  35.44235      99.8% rate per 100
#> 3 Area3 2015  44 182 24.17582 14.441203  37.71066      99.8% rate per 100
#> 4 Area4 2015  97 110 88.18182 63.063667 119.52675      99.8% rate per 100
#> 5 Area1 2016  59 127 46.45669 29.964550  68.37318      99.8% rate per 100
#> 6 Area2 2016  23 172 13.37209  6.355108  24.45284      99.8% rate per 100
#> 7 Area3 2016  80 135 59.25926 40.859655  82.74312      99.8% rate per 100
#> 8 Area4 2016  26 103 25.24272 12.621892  44.63484      99.8% rate per 100
#>   method
#> 1  Byars
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify rate parameters and reduce columns output and remove metadata columns
phe_rate(df, obs, pop, type="standard", confidence=99.8, multiplier=100)
#>    area year obs pop    value   lowercl   uppercl
#> 1 Area1 2015  92 168 54.76190 38.787820  74.81532
#> 2 Area2 2015  28 137 20.43796 10.517905  35.44235
#> 3 Area3 2015  44 182 24.17582 14.441203  37.71066
#> 4 Area4 2015  97 110 88.18182 63.063667 119.52675
#> 5 Area1 2016  59 127 46.45669 29.964550  68.37318
#> 6 Area2 2016  23 172 13.37209  6.355108  24.45284
#> 7 Area3 2016  80 135 59.25926 40.859655  82.74312
#> 8 Area4 2016  26 103 25.24272 12.621892  44.63484

These functions can also return aggregate data if the input dataframes are grouped:

# default proportion - grouped
df %>%
group_by(year) %>%
phe_proportion(obs, pop)
#> # A tibble: 2 x 9
#> # Groups:   year [2]
#>    year   obs   pop value lowercl uppercl confidence statistic       method
#>   <int> <int> <int> <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr>
#> 1  2015   261   597 0.437   0.398   0.477 95%        proportion of 1 Wilson
#> 2  2016   188   537 0.350   0.311   0.391 95%        proportion of 1 Wilson

# default rate - grouped
df %>%
group_by(year) %>%
phe_rate(obs, pop)
#> # A tibble: 2 x 9
#> # Groups:   year [2]
#>    year   obs   pop  value lowercl uppercl confidence statistic       method
#>   <int> <int> <int>  <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr>
#> 1  2015   261   597 43719.  38575.  49357. 95%        rate per 100000 Byars
#> 2  2016   188   537 35009.  30183.  40387. 95%        rate per 100000 Byars

Aggregate functions

The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.

The df test data generated earlier can be used to demonstrate phe_mean:

Execute phe_mean

INPUT: The phe_mean function take a single data frame as input with a column representing the numbers to be averaged.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified

# default mean
phe_mean(df,obs)
#>   value_sum value_count    stdev  value  lowercl  uppercl confidence statistic
#> 1       449           8 30.42291 56.125 30.69081 81.55919        95%      mean
#>                     method
#> 1 Student's t-distribution

# multiple means in a single execution with 99.8% confidence
df %>%
group_by(year) %>%
phe_mean(obs, confidence=0.998)
#> # A tibble: 2 x 10
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl confidence statistic
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl> <chr>      <chr>
#> 1  2015       261           4  34.5  65.2  -111.     241. 99.8%      mean
#> 2  2016       188           4  27.4  47     -92.9    187. 99.8%      mean
#> # ... with 1 more variable: method <chr>

# multiple means in a single execution with 99.8% confidence and data-only output
df %>%
group_by(year) %>%
phe_mean(obs, type = "standard", confidence=0.998)
#> # A tibble: 2 x 7
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl>
#> 1  2015       261           4  34.5  65.2  -111.     241.
#> 2  2016       188           4  27.4  47     -92.9    187.

Standardised Aggregate functions

Create some test data for the standardised aggregate functions

The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:

df_std <- data.frame(
area = rep(c("Area1", "Area2", "Area3", "Area4"), each = 19 * 2 * 5),
year = rep(2006:2010, each = 19 * 2),
sex = rep(rep(c("Male", "Female"), each = 19), 5),
ageband = rep(c(0, 5,10,15,20,25,30,35,40,45,
50,55,60,65,70,75,80,85,90), times = 10),
obs = sample(200, 19 * 2 * 5 * 4, replace = TRUE),
pop = sample(10000:20000, 19 * 2 * 5 * 4, replace = TRUE))
#>    area year  sex ageband obs   pop
#> 1 Area1 2006 Male       0  54 15659
#> 2 Area1 2006 Male       5 175 11467
#> 3 Area1 2006 Male      10  84 16941
#> 4 Area1 2006 Male      15 162 18434
#> 5 Area1 2006 Male      20  52 10299
#> 6 Area1 2006 Male      25 169 16142

Execute phe_dsr

INPUT: The minimum input requirement for the phe_dsr function is a single data frame with columns representing the numerators and denominators for each standardisation category. This is sufficient if the data is:

• broken down into 19 standardisation categories per grouping set representing the 19 x 5-year age bands from 00-04 to 90+
• sorted, within each grouping set, by these 19 age bands in ascending order
• to be standardised against the 2013 European Standard Population

The 2013 European Standard Population is provided within the package in vector form (esp2013) and is used by default by this function. Alternative standard populations can be used but must be provided by the user. When the function joins a standard population vector to the input data frame it does this by position so it is important that the data is sorted accordingly. This is a user responsibility.

The function can also accept standard populations provided as a column within the input data frame.

• standard populations provided as a vector - the vector and the input data frame must both contain rows for the same standardisation categories, and both must be sorted, within each grouping set, by these standardisation categories in the same order

• standard populations provided as a column within the input data frame - the standard populations can be appended to the input data frame by the user prior to execution of the function - if the data is grouped to generate multiple dsrs then the standard populations will need to be repeated and appended to the data rows for every grouping set.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If standard populations are being provided as a column within the input data frame then the user must specify this using the stdpoptype argument as the function expects a vector by default. The function also accepts additional arguments to specify the standard populations, the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_dsr function and the arguments that can optionally be specified

# calculate separate dsrs for each area, year and sex
df_std %>%
group_by(area, year, sex) %>%
phe_dsr(obs, pop)
#> # A tibble: 40 x 11
#> # Groups:   area, year, sex [40]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>
#>  1 Area1  2006 Fema~        1716    305697  590.    561.    620. 95%
#>  2 Area1  2006 Male         1820    265371  703.    669.    738. 95%
#>  3 Area1  2007 Fema~        2021    295895  665.    635.    697. 95%
#>  4 Area1  2007 Male         1966    290233  601.    572.    632. 95%
#>  5 Area1  2008 Fema~        2057    276281  802.    764.    840. 95%
#>  6 Area1  2008 Male         1936    289989  650.    618.    682. 95%
#>  7 Area1  2009 Fema~        1403    270814  526.    496.    557. 95%
#>  8 Area1  2009 Male         2070    282766  791.    756.    829. 95%
#>  9 Area1  2010 Fema~        1254    272712  527.    496.    559. 95%
#> 10 Area1  2010 Male         2694    273206 1061.   1018.   1104. 95%
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate separate dsrs for each area, year and sex and drop metadata fields from output
df_std %>%
group_by(area, year, sex) %>%
phe_dsr(obs, pop, type="standard")
#> # A tibble: 40 x 8
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_pop value lowercl uppercl
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female        1716    305697  590.    561.    620.
#>  2 Area1  2006 Male          1820    265371  703.    669.    738.
#>  3 Area1  2007 Female        2021    295895  665.    635.    697.
#>  4 Area1  2007 Male          1966    290233  601.    572.    632.
#>  5 Area1  2008 Female        2057    276281  802.    764.    840.
#>  6 Area1  2008 Male          1936    289989  650.    618.    682.
#>  7 Area1  2009 Female        1403    270814  526.    496.    557.
#>  8 Area1  2009 Male          2070    282766  791.    756.    829.
#>  9 Area1  2010 Female        1254    272712  527.    496.    559.
#> 10 Area1  2010 Male          2694    273206 1061.   1018.   1104.
#> # ... with 30 more rows

# calculate same specifying standard population in vector form
df_std %>%
group_by(area, year, sex) %>%
phe_dsr(obs, pop, stdpop = esp2013)
#> # A tibble: 40 x 11
#> # Groups:   area, year, sex [40]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>
#>  1 Area1  2006 Fema~        1716    305697  590.    561.    620. 95%
#>  2 Area1  2006 Male         1820    265371  703.    669.    738. 95%
#>  3 Area1  2007 Fema~        2021    295895  665.    635.    697. 95%
#>  4 Area1  2007 Male         1966    290233  601.    572.    632. 95%
#>  5 Area1  2008 Fema~        2057    276281  802.    764.    840. 95%
#>  6 Area1  2008 Male         1936    289989  650.    618.    682. 95%
#>  7 Area1  2009 Fema~        1403    270814  526.    496.    557. 95%
#>  8 Area1  2009 Male         2070    282766  791.    756.    829. 95%
#>  9 Area1  2010 Fema~        1254    272712  527.    496.    559. 95%
#> 10 Area1  2010 Male         2694    273206 1061.   1018.   1104. 95%
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate the same dsrs by appending the standard populations to the data frame
df_std %>%
mutate(refpop = rep(esp2013,40)) %>%
group_by(area, year, sex) %>%
phe_dsr(obs,pop, stdpop=refpop, stdpoptype="field")
#> # A tibble: 40 x 11
#> # Groups:   area, year, sex [40]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>
#>  1 Area1  2006 Fema~        1716    305697  590.    561.    620. 95%
#>  2 Area1  2006 Male         1820    265371  703.    669.    738. 95%
#>  3 Area1  2007 Fema~        2021    295895  665.    635.    697. 95%
#>  4 Area1  2007 Male         1966    290233  601.    572.    632. 95%
#>  5 Area1  2008 Fema~        2057    276281  802.    764.    840. 95%
#>  6 Area1  2008 Male         1936    289989  650.    618.    682. 95%
#>  7 Area1  2009 Fema~        1403    270814  526.    496.    557. 95%
#>  8 Area1  2009 Male         2070    282766  791.    756.    829. 95%
#>  9 Area1  2010 Fema~        1254    272712  527.    496.    559. 95%
#> 10 Area1  2010 Male         2694    273206 1061.   1018.   1104. 95%
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
df_std %>%
filter(ageband <= 70) %>%
group_by(area, year, sex) %>%
phe_dsr(obs, pop, stdpop = esp2013[1:15])
#> # A tibble: 40 x 11
#> # Groups:   area, year, sex [40]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>
#>  1 Area1  2006 Fema~        1438    252522  590.    559.    622. 95%
#>  2 Area1  2006 Male         1416    218146  681.    645.    718. 95%
#>  3 Area1  2007 Fema~        1602    242161  662.    630.    696. 95%
#>  4 Area1  2007 Male         1313    224706  563.    532.    595. 95%
#>  5 Area1  2008 Fema~        1612    215664  809.    769.    850. 95%
#>  6 Area1  2008 Male         1507    225263  662.    628.    697. 95%
#>  7 Area1  2009 Fema~        1054    207128  533.    501.    567. 95%
#>  8 Area1  2009 Male         1625    215894  786.    748.    826. 95%
#>  9 Area1  2010 Fema~        1064    218006  550.    516.    585. 95%
#> 10 Area1  2010 Male         2251    217491 1080.   1034.   1127. 95%
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate separate dsrs for persons for each area and year)
df_std %>%
group_by(area, year, ageband) %>%
summarise(obs = sum(obs),
pop = sum(pop),
.groups = "drop_last") %>%
phe_dsr(obs,pop)
#> # A tibble: 20 x 10
#> # Groups:   area, year [20]
#>    area   year total_count total_pop value lowercl uppercl confidence statistic
#>    <chr> <int>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>      <chr>
#>  1 Area1  2006        3536    571068  635.    613.    657. 95%        dsr per ~
#>  2 Area1  2007        3987    586128  631.    610.    653. 95%        dsr per ~
#>  3 Area1  2008        3993    566270  718.    694.    743. 95%        dsr per ~
#>  4 Area1  2009        3473    553580  652.    629.    675. 95%        dsr per ~
#>  5 Area1  2010        3948    545918  784.    758.    810. 95%        dsr per ~
#>  6 Area2  2006        4462    583452  759.    735.    782. 95%        dsr per ~
#>  7 Area2  2007        3261    558711  649.    626.    673. 95%        dsr per ~
#>  8 Area2  2008        3767    541450  724.    700.    748. 95%        dsr per ~
#>  9 Area2  2009        4373    594046  738.    715.    762. 95%        dsr per ~
#> 10 Area2  2010        3827    547360  697.    674.    721. 95%        dsr per ~
#> 11 Area3  2006        4421    623195  750.    727.    774. 95%        dsr per ~
#> 12 Area3  2007        3574    558936  640.    618.    662. 95%        dsr per ~
#> 13 Area3  2008        4076    595022  692.    669.    715. 95%        dsr per ~
#> 14 Area3  2009        4076    562820  742.    717.    767. 95%        dsr per ~
#> 15 Area3  2010        3802    584059  658.    636.    681. 95%        dsr per ~
#> 16 Area4  2006        3966    597001  686.    663.    709. 95%        dsr per ~
#> 17 Area4  2007        3262    550467  576.    555.    597. 95%        dsr per ~
#> 18 Area4  2008        3935    609070  683.    661.    706. 95%        dsr per ~
#> 19 Area4  2009        3129    564811  584.    562.    606. 95%        dsr per ~
#> 20 Area4  2010        3861    588521  710.    687.    734. 95%        dsr per ~
#> # ... with 1 more variable: method <chr>

Execute phe_smr and phe_isr

INPUT: Unlike the phe_dsr function, there is no default standard or reference data for the phe_smr and phe_isr functions. These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.

The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame:

• reference data provided as a data frame or as vectors - the data frame/vectors and the input data frame must both contain rows for the same standardisation categories, and both must be sorted, within each grouping set, by these standardisation categories in the same order.

• reference data provided as columns within the input data frame - the reference numerators and denominators can be appended to the input data frame prior to execution of the function - if the data is grouped to generate multiple smrs/isrs then the reference data will need to be repeated and appended to the data rows for every grouping set.

OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (isr only), the smr or isr, the lower 95% confidence limit, and the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the function expects vectors by default. The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:

df_ref <- df_std %>%
filter(year == 2006) %>%
group_by(ageband) %>%
summarise(obs = sum(obs),
pop = sum(pop),
.groups = "drop_last")

#> # A tibble: 6 x 3
#>   ageband   obs    pop
#>     <dbl> <int>  <int>
#> 1       0   888 140943
#> 2       5  1065 127287
#> 3      10   832 126762
#> 4      15   896 120984
#> 5      20   846 127930
#> 6      25   934 129497

Here are some example code chunks to demonstrate the phe_smr function and the arguments that can optionally be specified

# calculate separate smrs for each area, year and sex
df_std %>%
group_by(area, year, sex) %>%
phe_smr(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 x 11
#> # Groups:   area, year, sex [40]
#>    area   year sex   observed expected value lowercl uppercl confidence
#>    <chr> <int> <chr>    <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>
#>  1 Area1  2006 Fema~     1716    2103. 0.816   0.778   0.855 95%
#>  2 Area1  2006 Male      1820    1820. 1.00    0.954   1.05  95%
#>  3 Area1  2007 Fema~     2021    2034. 0.994   0.951   1.04  95%
#>  4 Area1  2007 Male      1966    2030. 0.968   0.926   1.01  95%
#>  5 Area1  2008 Fema~     2057    1922. 1.07    1.02    1.12  95%
#>  6 Area1  2008 Male      1936    1991. 0.972   0.930   1.02  95%
#>  7 Area1  2009 Fema~     1403    1871. 0.750   0.711   0.790 95%
#>  8 Area1  2009 Male      2070    1982. 1.04    1.00    1.09  95%
#>  9 Area1  2010 Fema~     1254    1895. 0.662   0.625   0.699 95%
#> 10 Area1  2010 Male      2694    1894. 1.42    1.37    1.48  95%
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate the same smrs by appending the reference data to the data frame
df_std %>%
mutate(refobs = rep(df_ref$obs,40), refpop = rep(df_ref$pop,40)) %>%
group_by(area, year, sex) %>%
phe_smr(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 x 11
#> # Groups:   area, year, sex [40]
#>    area   year sex   observed expected value lowercl uppercl confidence
#>    <chr> <int> <chr>    <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>
#>  1 Area1  2006 Fema~     1716    2103. 0.816   0.778   0.855 95%
#>  2 Area1  2006 Male      1820    1820. 1.00    0.954   1.05  95%
#>  3 Area1  2007 Fema~     2021    2034. 0.994   0.951   1.04  95%
#>  4 Area1  2007 Male      1966    2030. 0.968   0.926   1.01  95%
#>  5 Area1  2008 Fema~     2057    1922. 1.07    1.02    1.12  95%
#>  6 Area1  2008 Male      1936    1991. 0.972   0.930   1.02  95%
#>  7 Area1  2009 Fema~     1403    1871. 0.750   0.711   0.790 95%
#>  8 Area1  2009 Male      2070    1982. 1.04    1.00    1.09  95%
#>  9 Area1  2010 Fema~     1254    1895. 0.662   0.625   0.699 95%
#> 10 Area1  2010 Male      2694    1894. 1.42    1.37    1.48  95%
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate separate smrs for each year and drop metadata columns from output
df_std %>%
group_by(year, ageband) %>%
summarise(obs = sum(obs),
pop = sum(pop),
.groups = "drop_last") %>%
phe_smr(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 x 6
#> # Groups:   year [5]
#>    year observed expected value lowercl uppercl
#>   <int>    <int>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2006    16385   16385  1       0.985   1.02
#> 2  2007    14084   15606. 0.902   0.888   0.917
#> 3  2008    15771   15983. 0.987   0.971   1.00
#> 4  2009    15051   15760. 0.955   0.940   0.970
#> 5  2010    15438   15636. 0.987   0.972   1.00

The phe_isr function works exactly the same way but instead of expressing the result as a ratio of the observed and expected rates the result is expressed as a rate and the reference rate is also provided. Here are some examples:

# calculate separate isrs for each area, year and sex
df_std %>%
group_by(area, year, sex) %>%
phe_isr(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 x 12
#> # Groups:   area, year, sex [40]
#>    area   year sex   observed expected ref_rate value lowercl uppercl confidence
#>    <chr> <int> <chr>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>
#>  1 Area1  2006 Fema~     1716    2103.     690.  563.    537.    590. 95%
#>  2 Area1  2006 Male      1820    1820.     690.  690.    658.    722. 95%
#>  3 Area1  2007 Fema~     2021    2034.     690.  686.    656.    716. 95%
#>  4 Area1  2007 Male      1966    2030.     690.  668.    639.    698. 95%
#>  5 Area1  2008 Fema~     2057    1922.     690.  739.    707.    771. 95%
#>  6 Area1  2008 Male      1936    1991.     690.  671.    641.    702. 95%
#>  7 Area1  2009 Fema~     1403    1871.     690.  517.    491.    545. 95%
#>  8 Area1  2009 Male      2070    1982.     690.  721.    690.    752. 95%
#>  9 Area1  2010 Fema~     1254    1895.     690.  456.    432.    482. 95%
#> 10 Area1  2010 Male      2694    1894.     690.  981.    945.   1019. 95%
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate the same isrs by appending the reference data to the data frame
df_std %>%
mutate(refobs = rep(df_ref$obs,40), refpop = rep(df_ref$pop,40)) %>%
group_by(area, year, sex) %>%
phe_isr(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 x 12
#> # Groups:   area, year, sex [40]
#>    area   year sex   observed expected ref_rate value lowercl uppercl confidence
#>    <chr> <int> <chr>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>
#>  1 Area1  2006 Fema~     1716    2103.     690.  563.    537.    590. 95%
#>  2 Area1  2006 Male      1820    1820.     690.  690.    658.    722. 95%
#>  3 Area1  2007 Fema~     2021    2034.     690.  686.    656.    716. 95%
#>  4 Area1  2007 Male      1966    2030.     690.  668.    639.    698. 95%
#>  5 Area1  2008 Fema~     2057    1922.     690.  739.    707.    771. 95%
#>  6 Area1  2008 Male      1936    1991.     690.  671.    641.    702. 95%
#>  7 Area1  2009 Fema~     1403    1871.     690.  517.    491.    545. 95%
#>  8 Area1  2009 Male      2070    1982.     690.  721.    690.    752. 95%
#>  9 Area1  2010 Fema~     1254    1895.     690.  456.    432.    482. 95%
#> 10 Area1  2010 Male      2694    1894.     690.  981.    945.   1019. 95%
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate separate isrs for each year and drop metadata columns from output
df_std %>%
group_by(year, ageband) %>%
summarise(obs = sum(obs),
pop = sum(pop),
.groups = "drop_last")  %>%
phe_isr(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 x 7
#> # Groups:   year [5]
#>    year observed expected ref_rate value lowercl uppercl
#>   <int>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2006    16385   16385      690.  690.    679.    701.
#> 2  2007    14084   15606.     690.  623.    612.    633.
#> 3  2008    15771   15983.     690.  681.    670.    692.
#> 4  2009    15051   15760.     690.  659.    648.    670.
#> 5  2010    15438   15636.     690.  681.    671.    692.