Summarize continuous variables by group
mean_group_tbl.Rd
mean_group_tbl() calculates descriptive statistics (mean, standard deviation, minimum,
maximum, and number of non-missing observations) for interval- and ratio-level variables
that share a common prefix (variable stem), grouped either by another variable in your
dataset or by a matched pattern in the variable names. A variable 'stem' is a shared
naming pattern across related variables, often representing repeated measures of the same
concept or a series of items measuring a single construct. By default, missing data are
excluded using listwise deletion.
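To make the idea of a variable stem concrete, here is a minimal base R sketch of how columns sharing a stem could be located. The column names are hypothetical, and anchoring the match at the start of the name is an assumption made for illustration; the function's internal matching rules are not shown on this page.

```r
# Hypothetical column names; not taken from any package dataset.
cols <- c("id", "symptoms.t1", "symptoms.t2", "Symptoms.t3", "region")

# Columns whose names begin with the stem "symptoms" (case-sensitive).
# Anchoring with "^" is an assumption of this sketch.
stem_cols <- grep("^symptoms", cols, value = TRUE)
print(stem_cols)

# The same search ignoring case also picks up "Symptoms.t3".
stem_cols_ci <- grep("^symptoms", cols, value = TRUE, ignore.case = TRUE)
print(stem_cols_ci)
```

The case-insensitive variant mirrors what an option such as ignore_stem_case = TRUE would be expected to do.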
Usage
mean_group_tbl(
  data,
  var_stem,
  group,
  escape_stem = FALSE,
  ignore_stem_case = FALSE,
  group_type = "variable",
  group_name = NULL,
  escape_group = FALSE,
  ignore_group_case = FALSE,
  remove_group_non_alnum = TRUE,
  na_removal = "listwise",
  only = NULL,
  var_labels = NULL,
  ignore = NULL
)
Arguments
- data
A data frame.
- var_stem
A character string of a variable stem or the full name of a variable in data.
- group
A character string representing a variable name or a pattern used to search for variables in data.
- escape_stem
A logical value indicating whether to escape var_stem. Default is FALSE.
- ignore_stem_case
A logical value indicating whether the search for columns matching the supplied var_stem is case-insensitive. Default is FALSE.
- group_type
A character string that defines how the group argument should be interpreted. Should be one of "pattern" or "variable". Defaults to "variable", which searches for a matching variable name in data.
- group_name
An optional character string used to rename the group column in the final table. When group_type is set to "variable", the column name defaults to the matched variable name from data. When set to "pattern", the default column name is group.
- escape_group
A logical value indicating whether to escape the string supplied to group. Default is FALSE.
- ignore_group_case
A logical value specifying whether the search for a grouping variable (if group_type is "variable") or for variables matching a pattern (if group_type is "pattern") should be case-insensitive. Default is FALSE. Set to TRUE to ignore case.
- remove_group_non_alnum
A logical value indicating whether to remove all non-alphanumeric characters (i.e., anything that is not a letter or number) from group. Default is TRUE.
- na_removal
A character string that specifies the method for handling missing values: "pairwise" or "listwise". Defaults to "listwise".
- only
A character string or vector of character strings specifying which summary statistics to return. Defaults to NULL, which includes mean (mean), standard deviation (sd), minimum (min), maximum (max), and count of non-missing values (nobs).
- var_labels
An optional named character vector or list used to assign custom labels to variable names. Each element should be named and correspond to a variable in the returned table. If any element is unnamed or references a variable not returned in the table, all labels will be ignored and the table will be printed without them.
- ignore
An optional named vector or list that defines values to exclude from variables matching the specified variable stem and, if applicable, a grouping variable in data. If set to NULL (default), all values are retained. To exclude values from variables identified by var_stem, use the stem name as the key. To exclude multiple values from both var_stem variables and a grouping variable, supply a named list.
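The difference between the two na_removal settings, and the effect of ignore, can be illustrated with a short base R sketch. The data and column names are hypothetical, and recoding ignored values to NA before applying the deletion rule is an assumption about the general approach, not a description of the function's internals.

```r
# Hypothetical data: -999 stands in for a code that should be ignored.
df <- data.frame(
  score_a = c(1, 2, NA, 4, -999),
  score_b = c(5, NA, 7, 8, 9)
)

# Step 1 (assumed): treat ignored values as missing.
df[] <- lapply(df, function(x) replace(x, x %in% -999, NA))

# Listwise: keep only rows that are complete across all stem variables,
# so every variable is summarized on the same set of rows.
listwise <- df[complete.cases(df), ]
listwise_n <- vapply(listwise, function(x) sum(!is.na(x)), integer(1))

# Pairwise: each variable keeps all of its own non-missing values.
pairwise_n <- vapply(df, function(x) sum(!is.na(x)), integer(1))

print(listwise_n)
print(pairwise_n)
```

Under listwise deletion every variable ends up with the same number of observations, while pairwise deletion lets the counts differ from one variable to the next.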
Value
A tibble presenting summary statistics for continuous variables that share a common stem in their names. The statistics are grouped either by a specified grouping variable within the dataset or by a matched pattern in the variable names.
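As a rough picture of the returned layout (one row per variable and group, with one column per statistic), the following base R sketch builds a comparable table by hand. The data, the column names, and the helper summarize_one() are hypothetical; the result is a plain data frame rather than a tibble.

```r
# Hypothetical data with a grouping variable and two stem variables.
df <- data.frame(
  region = c("North", "North", "South", "South"),
  pct_a  = c(1, 3, 5, 7),
  pct_b  = c(2, 4, 6, 10)
)

# Hypothetical helper: summarize one variable within each region.
summarize_one <- function(var) {
  pieces <- lapply(split(df[[var]], df$region), function(x) {
    data.frame(mean = mean(x), sd = sd(x), min = min(x), max = max(x),
               nobs = sum(!is.na(x)))
  })
  out <- do.call(rbind, pieces)
  data.frame(variable = var, region = names(pieces), out)
}

# Stack the per-variable summaries into one long table.
result <- do.call(rbind, lapply(c("pct_a", "pct_b"), summarize_one))
rownames(result) <- NULL
print(result)
```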
Examples
# Keep the region identifier and the four child age-share columns.
sdoh_child_ages_region <- dplyr::select(
  sdoh,
  c(REGION, ACS_PCT_AGE_0_4, ACS_PCT_AGE_5_9, ACS_PCT_AGE_10_14, ACS_PCT_AGE_15_17)
)

# Summarize each ACS_PCT_AGE_* column by REGION, removing missing values pairwise.
mean_group_tbl(
  data = sdoh_child_ages_region,
  var_stem = "ACS_PCT_AGE",
  group = "REGION",
  group_name = "us_region",
  na_removal = "pairwise",
  var_labels = c(
    ACS_PCT_AGE_0_4 = "Percentage of population between ages 0-4",
    ACS_PCT_AGE_5_9 = "Percentage of population between ages 5-9",
    ACS_PCT_AGE_10_14 = "Percentage of population between ages 10-14",
    ACS_PCT_AGE_15_17 = "Percentage of population between ages 15-17"
  )
)
#> # A tibble: 16 × 8
#>    variable          variable_label      us_region  mean    sd   min   max  nobs
#>    <chr>             <chr>               <chr>     <dbl> <dbl> <dbl> <dbl> <int>
#>  1 ACS_PCT_AGE_0_4   Percentage of popu… Midwest    5.90 1.13   2.4  12.0   1055
#>  2 ACS_PCT_AGE_0_4   Percentage of popu… Northeast  5.04 0.829  0.95  8.12   217
#>  3 ACS_PCT_AGE_0_4   Percentage of popu… South      5.76 1.26   0.98 18.4   1422
#>  4 ACS_PCT_AGE_0_4   Percentage of popu… West       5.80 1.67   0.23 13.8    449
#>  5 ACS_PCT_AGE_5_9   Percentage of popu… Midwest    6.17 1.18   0.95 12.9   1055
#>  6 ACS_PCT_AGE_5_9   Percentage of popu… Northeast  5.28 0.762  0.53  7.53   217
#>  7 ACS_PCT_AGE_5_9   Percentage of popu… South      5.99 1.24   0    14.9   1422
#>  8 ACS_PCT_AGE_5_9   Percentage of popu… West       6.23 1.78   0    12.2    449
#>  9 ACS_PCT_AGE_10_14 Percentage of popu… Midwest    6.48 1.15   1.71 11.6   1055
#> 10 ACS_PCT_AGE_10_14 Percentage of popu… Northeast  5.69 0.779  1.08  7.94   217
#> 11 ACS_PCT_AGE_10_14 Percentage of popu… South      6.48 1.23   0    13.6   1422
#> 12 ACS_PCT_AGE_10_14 Percentage of popu… West       6.46 1.62   0    11.6    449
#> 13 ACS_PCT_AGE_15_17 Percentage of popu… Midwest    3.94 0.635  0.64  7.83  1055
#> 14 ACS_PCT_AGE_15_17 Percentage of popu… Northeast  3.59 0.383  2.02  4.67   217
#> 15 ACS_PCT_AGE_15_17 Percentage of popu… South      3.86 0.747  0    11.9   1422
#> 16 ACS_PCT_AGE_15_17 Percentage of popu… West       3.80 0.985  0    11.6    449
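# Simulated repeated measures; -999 marks a value to exclude.
# The values are drawn at random, so results vary between runs.
grouped_data <-
  data.frame(
    symptoms.t1 = sample(c(0:10, -999), replace = TRUE, size = 50),
    symptoms.t2 = sample(c(NA, 0:10, -999), replace = TRUE, size = 50)
  )

# Group by the time suffix matched in the column names.
mean_group_tbl(
  data = grouped_data,
  var_stem = "symptoms",
  group = ".t\\d",
  group_type = "pattern",
  escape_group = TRUE,
  na_removal = "listwise",
  ignore = c(symptoms = -999)
)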
#> # A tibble: 2 × 7
#>   variable    group  mean    sd   min   max  nobs
#>   <chr>       <chr> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 symptoms.t1 t1     4.84  3.18     0    10    37
#> 2 symptoms.t2 t2     4.76  3.09     0    10    37
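The group labels t1 and t2 in the output above are consistent with extracting the matched portion of each column name and then stripping non-alphanumeric characters (the remove_group_non_alnum default). A minimal base R sketch of that idea follows; the explicitly escaped dot in the pattern and the two-step procedure are assumptions made for illustration, since the function's internals are not shown here.

```r
cols <- c("symptoms.t1", "symptoms.t2")

# Pull out the part of each name that matches the pattern: ".t1", ".t2".
# The dot is escaped by hand in this sketch so it matches a literal ".".
matched <- regmatches(cols, regexpr("\\.t\\d", cols))

# Strip anything that is not a letter or digit: "t1", "t2".
labels <- gsub("[^[:alnum:]]", "", matched)
print(labels)
```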