This vignette focuses on how to create in-text tables with the `inTextSummaryTable` package.

This vignette assumes that the `data.frame`(s) used to create the tables are already available. If you have doubts about the data format, please see the section “data format” of the introductory vignette.

We will use the example data available in the `clinUtils` package. Let’s load the packages and the data, and get started!

```
library(inTextSummaryTable)
library(pander)
library(tools) # toTitleCase
```

```
library(clinUtils)
# load example data
data(dataADaMCDISCP01)
dataAll <- dataADaMCDISCP01
labelVars <- attr(dataAll, "labelVars")
```

The **`getSummaryStatisticsTable`** function creates an in-text table of summary statistics for the variable(s) of interest.

The *Demographic* data (`ADSL` dataset) is used as an example for the summary statistics table.

```
dataSL <- dataAll$ADSL
```

The variable(s) to summarize in the table are specified via the **`var` parameter**.

A different set of statistics is reported depending on the variable type: categorical or continuous.

See the documentation, section *Base statistics*, for more details on the statistics included by default for each type:

```
? `inTextSummaryTable-stats`
```

For a **discrete/categorical variable**, the in-text table can display the **counts/percentages of the number of subjects or records for each category** of the variable.

If **no variable is specified** (via the `var` parameter), the counts are displayed for the **entire dataset**.

```
getSummaryStatisticsTable(data = dataSL)
```

| Statistic | StatisticValue |
|---|---|
| statN | 7 |
| statm | 7 |
| statPercTotalN | 7 |
| statPercN | 100 |

Note that this is equivalent to setting `var = 'all'`.

If a **variable is specified** (via the `var` parameter), the counts are displayed **for each category**.

```
getSummaryStatisticsTable(data = dataSL, var = "SEX")
```

| Variable group / Statistic | StatisticValue |
|---|---|
| F | |
| statN | 5 |
| statm | 5 |
| statPercTotalN | 7 |
| statPercN | 71.43 |
| M | |
| statN | 2 |
| statm | 2 |
| statPercTotalN | 7 |
| statPercN | 28.57 |
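The count statistics can be approximated in base R, which may help clarify what each one measures. This is a minimal sketch, not the package's implementation: it assumes that `statN` is the number of unique subjects, `statm` the number of records, `statPercTotalN` the total number of subjects used as denominator and `statPercN` the percentage of subjects. The toy data frame `dataToy` and the helper `countStats` are hypothetical.

```r
# Hypothetical toy subject-level data (not the clinUtils example data):
# 7 subjects, one record per subject
dataToy <- data.frame(
  USUBJID = paste0("S", 1:7),
  SEX = c("F", "F", "M", "F", "F", "M", "F"),
  stringsAsFactors = FALSE
)

# Hypothetical helper approximating the count statistics per category
countStats <- function(data, var, subjectVar = "USUBJID") {
  # denominator: total number of unique subjects in the data
  totalN <- length(unique(data[[subjectVar]]))
  # one data subset per category (alphabetical order for a character variable)
  groups <- split(data, data[[var]])
  do.call(rbind, lapply(names(groups), function(g) {
    d <- groups[[g]]
    n <- length(unique(d[[subjectVar]]))
    data.frame(
      group = g,
      statN = n,                              # number of subjects
      statm = nrow(d),                        # number of records
      statPercTotalN = totalN,                # total number of subjects
      statPercN = round(100 * n / totalN, 2), # percentage of subjects
      stringsAsFactors = FALSE
    )
  }))
}

countStats(dataToy, "SEX")
```

With one record per subject, `statN` and `statm` coincide, as in the table above; they differ when a subject contributes several records to the same category.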

By default, the categories of the variable are sorted alphabetically. To sort the categories in a specific order, the variable should be formatted as a **factor**, whose `levels` define the order of the categories.

```
# specify manually the order of the categories
dataSL$SEX <- factor(dataSL$SEX, levels = c("M", "F"))
getSummaryStatisticsTable(data = dataSL, var = "SEX")
```

| Variable group / Statistic | StatisticValue |
|---|---|
| M | |
| statN | 2 |
| statm | 2 |
| statPercTotalN | 7 |
| statPercN | 28.57 |
| F | |
| statN | 5 |
| statm | 5 |
| statPercTotalN | 7 |
| statPercN | 71.43 |

```
# order categories based on a numeric variable
dataSL$SEXN <- ifelse(dataSL$SEX == "M", 2, 1)
dataSL$SEX <- reorder(dataSL$SEX, dataSL$SEXN)
getSummaryStatisticsTable(data = dataSL, var = "SEX")
```

| Variable group / Statistic | StatisticValue |
|---|---|
| F | |
| statN | 5 |
| statm | 5 |
| statPercTotalN | 7 |
| statPercN | 71.43 |
| M | |
| statN | 2 |
| statm | 2 |
| statPercTotalN | 7 |
| statPercN | 28.57 |
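The category order used in the table follows the factor levels, so the effect of both approaches can be checked directly on the levels in base R. A small sketch on a hypothetical vector with the same composition as the example data (5 female, 2 male subjects):

```r
# Hypothetical vector of sex values (5 female, 2 male)
sexChr <- c("F", "F", "M", "F", "F", "M", "F")

# Default: categories of a character vector are sorted alphabetically
levelsDefault <- levels(factor(sexChr))

# Manual order, via the levels of the factor
sexFactor <- factor(sexChr, levels = c("M", "F"))
levelsManual <- levels(sexFactor)

# Order based on a numeric variable: reorder() sorts the levels by the
# mean of the numeric values within each category (F = 1, M = 2)
sexN <- ifelse(sexFactor == "M", 2, 1)
levelsNumeric <- levels(reorder(sexFactor, sexN))

print(levelsDefault)  # "F" "M"
print(levelsManual)   # "M" "F"
print(levelsNumeric)  # "F" "M"
```

Because `reorder()` uses the mean per category by default, a numeric variable that is constant within each category (such as `SEXN`) gives a well-defined order.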

By default, the table only includes the categories present in the input data, to ensure a compact table for CSR export.

```
dataSLExample <- dataSL
# 'SEX' formatted as character with only male
dataSLExample$SEX <- "M" # only male
getSummaryStatisticsTable(data = dataSLExample, var = "SEX")
```

| Variable group / Statistic | StatisticValue |
|---|---|
| M | |
| statN | 7 |
| statm | 7 |
| statPercTotalN | 7 |
| statPercN | 100 |

If extra categories should be represented in the table, the categorical variable should be **formatted as a factor**, whose **levels contain all categories** to be displayed in the table.

Furthermore, the `varInclude0` parameter should be set to `TRUE`, or to the name of the specific variable (in case multiple variables are specified), to indicate that categories with 0 counts should be included.

```
# 'SEX' formatted as factor, to include also female in the table
# (even if not available in the data)
dataSLExample$SEX <- factor("M", levels = c("F", "M"))
getSummaryStatisticsTable(data = dataSLExample, var = "SEX", varInclude0 = TRUE)
```

| Variable group / Statistic | StatisticValue |
|---|---|
| F | |
| statN | 0 |
| statm | 0 |
| statPercTotalN | 7 |
| statPercN | 0 |
| M | |
| statN | 7 |
| statm | 7 |
| statPercTotalN | 7 |
| statPercN | 100 |

```
# or:
getSummaryStatisticsTable(data = dataSLExample, var = "SEX", varInclude0 = "SEX")
```

| Variable group / Statistic | StatisticValue |
|---|---|
| F | |
| statN | 0 |
| statm | 0 |
| statPercTotalN | 7 |
| statPercN | 0 |
| M | |
| statN | 7 |
| statm | 7 |
| statPercTotalN | 7 |
| statPercN | 100 |
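The role of the factor levels mirrors the behaviour of base R `table()`: a character vector only yields the observed categories, while a factor keeps all its levels, including those with 0 counts. A minimal base-R illustration on hypothetical data (this is not how the package computes the counts internally):

```r
# Hypothetical data where only male subjects are present
sexChr <- rep("M", 7)
sexFactor <- factor(sexChr, levels = c("F", "M"))

# a character vector only yields the observed categories
countsChr <- table(sexChr)
# a factor keeps all its levels, including those with 0 counts
countsFactor <- table(sexFactor)
# droplevels() removes the unused levels again
countsDropped <- table(droplevels(sexFactor))

print(countsChr)
print(countsFactor)
print(countsDropped)
```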

A specific type of categorical variable is a **‘flag variable’**, which indicates whether a record fulfills a specific criterion.

Such a variable is typically coded in the data as:

- ‘Y’ if the criterion is met for the specific record
- ‘N’ if the criterion is not met for the specific record
- ‘’ (empty string) if the criterion is missing for this record

The name of such a variable typically ends with **‘FL’** in a CDISC-compliant *ADaM* or *SDTM* dataset.

For example, the subject-level dataset contains the following flag variables:

```
labelVars[grep("FL$", colnames(dataSL), value = TRUE)]
```

```
## SAFFL ITTFL EFFFL COMP8FL
## "Safety Population Flag" "Intent-to-Treat Population Flag" "Efficacy Population Flag" "Completers of Week 8 Population Flag"
## COMP16FL COMP24FL DISCONFL DSRAEFL
## "Completers of Week 16 Population Flag" "Completers of Week 24 Population Flag" "Did the Subject Discontinue the Study?" "Discontinued due to AE?"
## DTHFL
## "Subject Died?"
```

```
# has the subject discontinued from the study?
dataSL$DISCONFL
```

```
## [1] "" "" "Y" "Y" "Y" "Y" "Y"
```

If such a flag variable is specified in `var`, the counts for each category are reported:

```
getSummaryStatisticsTable(
  data = dataSL,
  var = "SAFFL"
)
```

| Variable group / Statistic | StatisticValue |
|---|---|
| Y | |
| statN | 7 |
| statm | 7 |
| statPercTotalN | 7 |
| statPercN | 100 |

However, the interest is often only in the counts of the records fulfilling the criterion (records with ‘Y’). In this case, the variable should also be specified via the `varFlag` parameter.

```
getSummaryStatisticsTable(
  data = dataSL,
  var = "SAFFL",
  varFlag = "SAFFL"
)
```

| Statistic | StatisticValue |
|---|---|
| statN | 7 |
| statm | 7 |
| statPercTotalN | 7 |
| statPercN | 100 |
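The difference between specifying a flag variable in `var` only and also in `varFlag` can be sketched in base R with the `DISCONFL` values printed above. This is an illustration, not the package's code; it assumes that all subjects in the data are used as denominator for the percentage.

```r
# Values of the flag variable DISCONFL, as printed above
disconfl <- c("", "", "Y", "Y", "Y", "Y", "Y")

# As with 'var' only: counts for every category, including the empty one
countsAll <- table(disconfl)

# As with 'varFlag': only records fulfilling the criterion ('Y'),
# assuming all subjects are used as denominator
nFlag <- sum(disconfl == "Y")
nTotal <- length(disconfl)
flagSummary <- sprintf("%d/%d (%.1f)", nFlag, nTotal, 100 * nFlag / nTotal)

print(countsAll)
print(flagSummary)  # "5/7 (71.4)"
```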

To include the total counts across categories, the `varTotalInclude` parameter should be set to `TRUE` (or to the specific variable).

```
getSummaryStatisticsTable(
  data = dataSL,
  var = "SEX",
  varTotalInclude = TRUE
)
```

| Variable group / Statistic | StatisticValue |
|---|---|
| Total | |
| statN | 7 |
| statm | 7 |
| statPercTotalN | 7 |
| statPercN | 100 |
| F | |
| statN | 5 |
| statm | 5 |
| statPercTotalN | 7 |
| statPercN | 71.43 |
| M | |
| statN | 2 |
| statm | 2 |
| statPercTotalN | 7 |
| statPercN | 28.57 |
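When each subject falls in exactly one category, the ‘Total’ row equals the sum of the category counts. A base-R sketch of prepending such a total to the category counts, on a hypothetical vector with the same composition as the example data:

```r
# Hypothetical vector of sex values (5 female, 2 male), one per subject
sex <- c("F", "F", "M", "F", "F", "M", "F")
counts <- table(sex)

# prepend the total across categories, as in the 'Total' row above
countsWithTotal <- c(
  Total = sum(counts),
  setNames(as.vector(counts), names(counts))
)
print(countsWithTotal)
```

If a subject could contribute records to several categories, the total number of subjects would be smaller than this sum, which is why the sketch assumes one category per subject.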

For a **continuous variable**, the in-text table displays **standard distribution statistics** of the variable.

Please note that **missing records (NA) for the variable are filtered out**, so the **count statistics** (number of subjects, number of records, percentage) are based **only on the non-missing records**.

For a continuous variable, the function checks whether multiple different values are available for the same subject (also taking into account the row/column variables), and returns an informative error message if this is the case.
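A check of this kind can be sketched in base R by counting the distinct non-missing values per subject. The data and the variable names below are hypothetical, and the sketch only reports a message instead of stopping; it is not the package's own check.

```r
# Hypothetical long data: subject S1 has two different ages
dataLong <- data.frame(
  USUBJID = c("S1", "S1", "S2", "S2"),
  AGE = c(70, 71, 65, 65),
  stringsAsFactors = FALSE
)

# number of distinct non-missing values per subject
nValues <- tapply(
  dataLong$AGE, dataLong$USUBJID,
  function(x) length(unique(x[!is.na(x)]))
)
subjectsMultiple <- names(nValues)[nValues > 1]

if (length(subjectsMultiple) > 0) {
  message(
    "Multiple different values for subject(s): ",
    paste(subjectsMultiple, collapse = ", ")
  )
}
```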

```
getSummaryStatisticsTable(data = dataSL, var = "AGE")
```

| Statistic | StatisticValue |
|---|---|
| statN | 7 |
| statm | 7 |
| statMean | 74.29 |
| statSD | 9.827 |
| statSE | 3.714 |
| statMedian | 75 |
| statMin | 57 |
| statMax | 89 |
| statPercTotalN | 7 |
| statPercN | 100 |
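The distribution statistics correspond to standard base-R computations on the non-missing values; for instance, the standard error is the standard deviation divided by the square root of the number of values (9.827 / √7 ≈ 3.714 in the table above). A sketch on hypothetical ages, assuming one record per subject so that the number of values equals the number of subjects:

```r
# Hypothetical ages, with one missing value
age <- c(60, 72, NA, 81, 75)

# missing values are filtered before computing any statistic
ageNonMissing <- age[!is.na(age)]
n <- length(ageNonMissing)

contStats <- c(
  statN = n,
  statMean = mean(ageNonMissing),
  statSD = sd(ageNonMissing),
  statSE = sd(ageNonMissing) / sqrt(n),  # standard error of the mean
  statMedian = median(ageNonMissing),
  statMin = min(ageNonMissing),
  statMax = max(ageNonMissing)
)
print(contStats)
```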

The table can contain a mix of categorical and continuous variables.

```
getSummaryStatisticsTable(
  data = dataSL,
  var = c("AGE", "SEX")
)
```

| Variable / Variable group / Statistic | StatisticValue |
|---|---|
| AGE | |
| statN | 7 |
| statm | 7 |
| statMean | 74.29 |
| statSD | 9.827 |
| statSE | 3.714 |
| statMedian | 75 |
| statMin | 57 |
| statMax | 89 |
| statPercTotalN | 7 |
| statPercN | 100 |
| SEX | |
| F | |
| statN | 5 |
| statm | 5 |
| statPercTotalN | 7 |
| statPercN | 71.43 |
| M | |
| statN | 2 |
| statm | 2 |
| statPercTotalN | 7 |
| statPercN | 28.57 |

The statistics of interest and their format are specified via the **`stats` parameter**.

If a single statistic is specified, the ‘Statistic’ column does not appear in the table.

If multiple statistics are specified, each one is included as a separate row.

A standard set of statistics can be specified via specific tags passed to the `stats` parameter.

The list of available statistics is described in the section ‘*Formatted statistics*’ of:

```
? `inTextSummaryTable-stats`
```

Examples of commonly used statistics are shown below.

```
# count: n, '%' and m
getSummaryStatisticsTable(
  data = dataSL,
  var = "SEX",
  stats = "count"
)
```

| Variable group / Statistic | StatisticValue |
|---|---|
| F | |
| n | 5 |
| % | 71.4 |
| m | 5 |
| M | |
| n | 2 |
| % | 28.6 |
| m | 2 |

```
# n (%)
getSummaryStatisticsTable(
  data = dataSL,
  var = "SEX",
  stats = "n (%)"
)
```

| Variable group | n (%) |
|---|---|
| F | 5 (71.4) |
| M | 2 (28.6) |

```
# n/N (%)
getSummaryStatisticsTable(
  data = dataSL,
  var = "SEX",
  stats = "n/N (%)"
)
```

| Variable group | n/N (%) |
|---|---|
| F | 5/7 (71.4) |
| M | 2/7 (28.6) |
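The ‘n (%)’ and ‘n/N (%)’ formats combine the count, the total and the percentage in a single cell. A base-R sketch reproducing the values above with `sprintf()`, on a hypothetical vector with the same composition as the example data; the one-decimal rounding is chosen here to match the displayed output and is not necessarily the package's rounding rule:

```r
# Hypothetical vector of sex values (5 female, 2 male), one per subject
sex <- c("F", "F", "M", "F", "F", "M", "F")
counts <- table(sex)

n <- as.vector(counts)  # number of subjects per category
N <- length(sex)        # total number of subjects (one record per subject)
pct <- 100 * n / N

nPct <- sprintf("%d (%.1f)", n, pct)
nNPct <- sprintf("%d/%d (%.1f)", n, N, pct)

data.frame(group = names(counts), nPct = nPct, nNPct = nNPct)
```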

```
## continuous variable
# all summary stats
getSummaryStatisticsTable(
  data = dataSL,
  var = "AGE",
  stats = "summary"
)
```

| Statistic | StatisticValue |
|---|---|
| n | 7 |
| Mean | 74.3 |
| SD | 9.8 |
| SE | 3.71 |
| Median | 75.0 |
| Min | 57 |
| Max | 89 |
| % | 100 |
| m | 7 |

```
# median (range)
getSummaryStatisticsTable(
  data = dataSL,
  var = "AGE",
  stats = "median (range)"
)
```

| Median (range) |
|---|
| 75.0 (57,89) |

```
# median and (range) in a different line:
getSummaryStatisticsTable(
  data = dataSL,
  var = "AGE",
  stats = "median\n(range)"
)
```

| Median<br>(range) |
|---|
| 75.0<br>(57,89) |

```
# mean (se)
getSummaryStatisticsTable(
  data = dataSL,
  var = "AGE",
  stats = "mean (se)"
)
```

| Mean (SE) |
|---|
| 74.3 (3.71) |

```
# mean (sd)
getSummaryStatisticsTable(
  data = dataSL,
  var = "AGE",
  stats = "mean (sd)"
)
```

| Mean (SD) |
|---|
| 74.3 (9.8) |
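The formatted continuous statistics are string combinations of the base statistics, and a `"\n"` in the label places the second part on a new line within the same cell. A base-R sketch with `sprintf()`, starting from the rounded `AGE` statistics reported in the tables above; the numbers of decimals are picked to match the displayed output and are an assumption, not the package's rounding rule:

```r
# Summary statistics of AGE, as reported (rounded) in the tables above
statMedian <- 75
statMin <- 57
statMax <- 89
statMean <- 74.29
statSD <- 9.827
statSE <- 3.714

medianRange <- sprintf("%.1f (%s,%s)", statMedian, statMin, statMax)
# "\n" puts the range on a second line of the same cell
medianRangeTwoLines <- sprintf("%.1f\n(%s,%s)", statMedian, statMin, statMax)
meanSE <- sprintf("%.1f (%.2f)", statMean, statSE)
meanSD <- sprintf("%.1f (%.1f)", statMean, statSD)

cat(medianRange, medianRangeTwoLines, meanSE, meanSD, sep = "\n")
```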

To change the statistics or their formatting, the `stats` parameter should contain a list of language objects (e.g. `expression` or `call`) built from the default base set of statistics.

See the documentation, section ‘*Base statistics*’, for more details on the base statistics included by default:

```
? `inTextSummaryTable-stats`
```

For example, the following count table is restricted to the number of subjects per category:

```
getSummaryStatisticsTable(
  data = dataSL,
  var = c("RACE", "SEX"),
  stats = list(N = expression(statN))
)
```

| Variable / Variable group | N |
|---|---|
| RACE | |
| BLACK OR AFRICAN AMERICAN | 1 |
| WHITE | 6 |
| SEX | |
| F | 5 |
| M | 2 |

The following summary statistics table is restricted to the median and range, by treatment group:

```
getSummaryStatisticsTable(
  data = dataSL,
  var = c("AGE", "HEIGHTBL", "WEIGHTBL", "BMIBL"),
  varGeneralLab = "Parameter", statsGeneralLab = "",
  colVar = "TRT01P",
  stats = list(
    `median` = expression(statMedian),
    `(min, max)` = expression(paste0("(", statMin, ",", statMax, ")"))
  )
)
```

| Parameter | Placebo | Xanomeline High Dose | Xanomeline Low Dose |
|---|---|---|---|
| AGE | | | |
| median | 82 | 69 | 78 |
| (min, max) | (75,89) | (57,74) | (76,80) |
| HEIGHTBL | | | |
| median | 167.65 | 158.8 | 155.55 |
| (min, max) | (157.5,177.8) | (154.9,175.3) | (151.1,160) |
| WEIGHTBL | | | |
| median | 59.65 | 66.7 | 54.45 |
| (min, max) | (47.2,72.1) | (51.7,87.1) | (45.4,63.5) |
| BMIBL | | | |
| median | 20.9 | 27.8 | 22.75 |
| (min, max) | (19,22.8) | (20.5,28.3) | (17.7,27.8) |
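The entries of `stats` are plain R language objects, which produce the displayed value once evaluated with the base statistics as variables. A minimal base-R sketch of such an evaluation, using the Placebo `AGE` values from the table above; `baseStats` is a hypothetical stand-in for the base statistics, and this is not how the package evaluates them internally:

```r
# Custom statistics, as passed to 'stats' in the example above
statsCustom <- list(
  `median` = expression(statMedian),
  `(min, max)` = expression(paste0("(", statMin, ",", statMax, ")"))
)

# Hypothetical base statistics for one group (Placebo, AGE)
baseStats <- list(statMedian = 82, statMin = 75, statMax = 89)

# evaluate each expression, looking up the base statistics by name
formatted <- lapply(statsCustom, function(expr) eval(expr, envir = baseStats))
str(formatted)
```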

Note that the ‘Standard statistics set’ is formatted internally via the `getStatsData` (and `getStats`) functions, which consistently create a list of `language` objects.

```
# this count table:
getSummaryStatisticsTable(
  data = dataSL,
  var = "SEX",
  stats = "count"
)
```