mean_group_tbl() calculates descriptive statistics (mean, standard deviation, minimum, maximum, and number of non-missing observations) for interval and ratio-level variables that share a common prefix (variable stem), grouped either by another variable in your dataset or by a matched pattern in the variable names. A variable 'stem' is a shared naming pattern across related variables, often representing repeated measures of the same concept or a series of items measuring a single construct. By default, missing data are excluded using listwise deletion.

Usage

mean_group_tbl(
  data,
  var_stem,
  group,
  escape_stem = FALSE,
  ignore_stem_case = FALSE,
  group_type = "variable",
  group_name = NULL,
  escape_group = FALSE,
  ignore_group_case = FALSE,
  remove_group_non_alnum = TRUE,
  na_removal = "listwise",
  only = NULL,
  var_labels = NULL,
  ignore = NULL
)

Arguments

data

A data frame.

var_stem

A character string of a variable stem or the full name of a variable in data.

group

A character string representing a variable name or a pattern used to search for variables in data.

escape_stem

A logical value indicating whether to escape var_stem. Default is FALSE.

ignore_stem_case

A logical value indicating whether the search for columns matching the supplied var_stem is case-insensitive. Default is FALSE.
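
To make the stem options concrete, here is a minimal base R sketch of prefix matching on column names. It does not call the package and is not its internal implementation. The column names are hypothetical, and it assumes the stem is treated as a regular expression unless escaped.

```r
# Hypothetical column names, for illustration only
cols <- c("symptoms.t1", "symptoms.t2", "Symptoms_total", "symptomsXt1", "age")

# Case-sensitive prefix match (the ignore_stem_case = FALSE situation)
case_sensitive <- grep("^symptoms", cols, value = TRUE)

# Case-insensitive prefix match (the ignore_stem_case = TRUE situation)
case_insensitive <- grep("^symptoms", cols, value = TRUE, ignore.case = TRUE)

# Unescaped, "." is a regex wildcard, so "symptomsXt1" also matches
unescaped <- grep("^symptoms.t", cols, value = TRUE)

# Treated literally (the idea behind escaping), only a real "." matches
literal <- cols[startsWith(cols, "symptoms.t")]
```

Escaping matters mainly when a stem contains characters such as "." or "(", which would otherwise be read as regular expression syntax.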

group_type

A character string that defines how the group argument should be interpreted. Should be one of pattern or variable. Defaults to variable, which searches for a matching variable name in data.

group_name

An optional character string used to rename the group column in the final table. When group_type is set to variable, the column name defaults to the matched variable name from data. When set to pattern, the default column name is group.

escape_group

A logical value indicating whether to escape the string supplied to group. Default is FALSE.

ignore_group_case

A logical value specifying whether the search for a grouping variable (if group_type is variable) or for variables matching a pattern (if group_type is pattern) should be case-insensitive. Default is FALSE. Set to TRUE to ignore case.

remove_group_non_alnum

A logical value indicating whether to remove all non-alphanumeric characters (i.e., anything that is not a letter or number) from group. Default is TRUE.
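
As a rough illustration of pattern-based grouping, the base R sketch below pulls a group label out of each variable name and then strips non-alphanumeric characters. It is an approximation of the idea under stated assumptions, not the package's own code. The column names are hypothetical and the pattern is written with an explicitly escaped dot.

```r
# Hypothetical column names sharing the stem "symptoms"
cols <- c("symptoms.t1", "symptoms.t2")

# Extract the part of each name that matches the group pattern
matched <- regmatches(cols, regexpr("\\.t\\d", cols))

# Drop anything that is not a letter or number
# (the remove_group_non_alnum = TRUE situation)
group_labels <- gsub("[^[:alnum:]]", "", matched)
```

Here `matched` holds ".t1" and ".t2", and `group_labels` holds "t1" and "t2".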

na_removal

A character string that specifies the method for handling missing values: pairwise or listwise. Defaults to listwise.
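
The difference between the two methods can be sketched in base R without the package. This toy example uses hypothetical data and ignores grouping. It only illustrates the general definitions of listwise and pairwise deletion, which may differ in detail from how the function applies them within groups.

```r
# Hypothetical data with one missing value in each column
df <- data.frame(
  x1 = c(1, 2, NA, 4),
  x2 = c(10, NA, 30, 40)
)

# Listwise: keep only rows that are complete across all selected columns
complete_rows <- df[stats::complete.cases(df), ]
listwise_n    <- vapply(complete_rows, function(v) sum(!is.na(v)), integer(1))
listwise_mean <- vapply(complete_rows, mean, numeric(1))

# Pairwise: each column uses all of its own non-missing values
pairwise_n    <- vapply(df, function(v) sum(!is.na(v)), integer(1))
pairwise_mean <- vapply(df, mean, numeric(1), na.rm = TRUE)
```

Under listwise deletion both columns are summarised on the same two rows. Under pairwise deletion each column keeps three observations, so the means and counts can differ between methods.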

only

A character string or vector of character strings specifying which summary statistics to return. Defaults to NULL, which includes mean (mean), standard deviation (sd), minimum (min), maximum (max), and count of non-missing values (nobs).

var_labels

An optional named character vector or list used to assign custom labels to variable names. Each element should be named and correspond to a variable in the returned table. If any element is unnamed or references a variable not returned in the table, all labels will be ignored and the table will be printed without them.

ignore

An optional named vector or list that defines values to exclude from variables matching the specified variable stem and, if applicable, a grouping variable in data. If set to NULL (default), all values are retained. To exclude values from variables identified by var_stem, use the stem name as the key. To exclude multiple values from both var_stem variables and a grouping variable, supply a named list.
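
The sketch below shows, in base R, one way to think about what a stem-keyed ignore value does. It assumes that ignored values are treated like missing values before summarising. That is an assumption made for illustration, not a description of the package internals, and the data are hypothetical.

```r
# Hypothetical data in which -999 is a missing-data code
df <- data.frame(
  symptoms.t1 = c(3, -999, 5, 7),
  symptoms.t2 = c(4, 6, -999, 8)
)

# Named vector keyed by the stem, in the same form as ignore = c(symptoms = -999)
ignore <- c(symptoms = -999)

# Columns whose names start with the stem
stem_cols <- names(df)[startsWith(names(df), names(ignore))]

# Assumption: ignored values are set to NA before computing statistics
cleaned <- df
cleaned[stem_cols] <- lapply(cleaned[stem_cols], function(v) {
  v[v %in% ignore] <- NA
  v
})

col_means <- colMeans(cleaned, na.rm = TRUE)
```

Without this step the -999 codes would be averaged as if they were real scores, pulling both means far below zero.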

Value

A tibble presenting summary statistics for continuous variables that share a common stem in their names. The statistics are grouped either by a specified grouping variable within the dataset or by a matched pattern in the variable names.

Author

Ama Nyame-Mensah

Examples

sdoh_child_ages_region <- dplyr::select(sdoh, c(REGION, ACS_PCT_AGE_0_4, ACS_PCT_AGE_5_9,
                                                ACS_PCT_AGE_10_14, ACS_PCT_AGE_15_17))
mean_group_tbl(data = sdoh_child_ages_region,
               var_stem = "ACS_PCT_AGE",
               group = "REGION",
               group_name = "us_region",
               na_removal = "pairwise",
               var_labels = c(ACS_PCT_AGE_0_4 = "Percentage of population between ages 0-4",
                              ACS_PCT_AGE_5_9 = "Percentage of population between ages 5-9",
                              ACS_PCT_AGE_10_14 = "Percentage of population between ages 10-14",
                              ACS_PCT_AGE_15_17 = "Percentage of population between ages 15-17"))
#> # A tibble: 16 × 8
#>    variable          variable_label      us_region  mean    sd   min   max  nobs
#>    <chr>             <chr>               <chr>     <dbl> <dbl> <dbl> <dbl> <int>
#>  1 ACS_PCT_AGE_0_4   Percentage of popu… Midwest    5.90 1.13   2.4  12.0   1055
#>  2 ACS_PCT_AGE_0_4   Percentage of popu… Northeast  5.04 0.829  0.95  8.12   217
#>  3 ACS_PCT_AGE_0_4   Percentage of popu… South      5.76 1.26   0.98 18.4   1422
#>  4 ACS_PCT_AGE_0_4   Percentage of popu… West       5.80 1.67   0.23 13.8    449
#>  5 ACS_PCT_AGE_5_9   Percentage of popu… Midwest    6.17 1.18   0.95 12.9   1055
#>  6 ACS_PCT_AGE_5_9   Percentage of popu… Northeast  5.28 0.762  0.53  7.53   217
#>  7 ACS_PCT_AGE_5_9   Percentage of popu… South      5.99 1.24   0    14.9   1422
#>  8 ACS_PCT_AGE_5_9   Percentage of popu… West       6.23 1.78   0    12.2    449
#>  9 ACS_PCT_AGE_10_14 Percentage of popu… Midwest    6.48 1.15   1.71 11.6   1055
#> 10 ACS_PCT_AGE_10_14 Percentage of popu… Northeast  5.69 0.779  1.08  7.94   217
#> 11 ACS_PCT_AGE_10_14 Percentage of popu… South      6.48 1.23   0    13.6   1422
#> 12 ACS_PCT_AGE_10_14 Percentage of popu… West       6.46 1.62   0    11.6    449
#> 13 ACS_PCT_AGE_15_17 Percentage of popu… Midwest    3.94 0.635  0.64  7.83  1055
#> 14 ACS_PCT_AGE_15_17 Percentage of popu… Northeast  3.59 0.383  2.02  4.67   217
#> 15 ACS_PCT_AGE_15_17 Percentage of popu… South      3.86 0.747  0    11.9   1422
#> 16 ACS_PCT_AGE_15_17 Percentage of popu… West       3.80 0.985  0    11.6    449

grouped_data <-
  data.frame(
    symptoms.t1 = sample(c(0:10, -999), replace = TRUE, size = 50),
    symptoms.t2 = sample(c(NA, 0:10, -999), replace = TRUE, size = 50)
  )

mean_group_tbl(data = grouped_data,
               var_stem = "symptoms",
               group = ".t\\d",
               group_type = "pattern",
               escape_group = TRUE,
               na_removal = "listwise",
               ignore = c(symptoms = -999))
#> # A tibble: 2 × 7
#>   variable    group  mean    sd   min   max  nobs
#>   <chr>       <chr> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 symptoms.t1 t1     4.84  3.18     0    10    37
#> 2 symptoms.t2 t2     4.76  3.09     0    10    37