R/mean_group_tbl.R
mean_group_tbl() calculates summary statistics (i.e.,
mean, median, standard deviation, minimum, maximum, and count of
non-missing values) for continuous (i.e., interval and ratio-level)
variables, grouped either by another variable in your dataset or by
a matched pattern in the variable names.
mean_group_tbl(
data,
var_stem,
group,
var_input = "stem",
regex_stem = FALSE,
ignore_stem_case = FALSE,
group_type = "variable",
group_name = NULL,
regex_group = FALSE,
ignore_group_case = FALSE,
remove_group_non_alnum = TRUE,
na_removal = "listwise",
only = NULL,
var_labels = NULL,
ignore = NULL
)
data: A data frame.
var_stem: A character vector with one or more elements, where each
represents either a variable stem or the complete name of a variable present
in data. A variable "stem" refers to a common naming pattern shared among
related variables, typically reflecting repeated measures of the same idea
or a group of items assessing a single concept.
group: A character string representing a variable name or a pattern
used to search for variables in data.
var_input: A character string specifying whether the values supplied
to var_stem should be treated as variable stems ("stem") or as complete
variable names ("name"). By default, this is set to "stem", so the function
searches for variables that begin with each stem provided. Setting this
argument to "name" directs the function to look for variables that exactly
match the provided names.
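The difference between the two modes can be pictured with a short base R sketch. It only illustrates prefix matching versus exact matching; it is not the function's internal code, and the column names are made up for the illustration.

```r
# Hypothetical column names, for illustration only
cols <- c("symptoms.t1", "symptoms.t2", "symptoms_total", "age")

# var_input = "stem": keep columns that begin with the stem
stem_matches <- cols[startsWith(cols, "symptoms")]

# var_input = "name": keep columns that match the supplied names exactly
name_matches <- cols[cols %in% c("symptoms.t1", "age")]

stem_matches
name_matches
```

Note that a stem can pick up more columns than intended (here, symptoms_total), which is one reason to switch to complete names.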
regex_stem: A logical value indicating whether to use Perl-compatible
regular expressions when searching for variable stems. Default is FALSE.
ignore_stem_case: A logical value indicating whether the search for
columns matching the supplied var_stem is case-insensitive. Default is
FALSE.
group_type: A character string that defines how the group argument
should be interpreted. Must be one of "pattern" or "variable". Defaults
to "variable", which searches for a matching variable name in data.
group_name: An optional character string used to rename the group
column in the final table. When group_type is set to "variable", the column
name defaults to the matched variable name from data. When set to "pattern",
the default column name is group.
regex_group: A logical value indicating whether to use Perl-compatible
regular expressions when searching for group variables or matching variable
name patterns. Default is FALSE.
ignore_group_case: A logical value specifying whether the search for a
grouping variable (if group_type is "variable") or for variables matching a
pattern (if group_type is "pattern") should be case-insensitive. Default is
FALSE. Set to TRUE to ignore case.
remove_group_non_alnum: A logical value indicating whether to remove
all non-alphanumeric characters (i.e., anything that is not a letter or
number) from group. Default is TRUE.
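As a rough picture of pattern-based grouping, the base R sketch below extracts the part of each variable name matched by a pattern and then strips non-alphanumeric characters. How the function derives its group labels internally is an assumption here; the sketch only reproduces the kind of labels (t1, t2) that appear in the second example's output.

```r
# Hypothetical variable names sharing the stem "symptoms"
cols <- c("symptoms.t1", "symptoms.t2")

# Pull out the substring matched by the group pattern: any character,
# then "t", then a digit
matched <- regmatches(cols, regexpr(".t\\d", cols))

# Drop everything that is not a letter or a number
cleaned <- gsub("[^[:alnum:]]", "", matched)

matched
cleaned
```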
na_removal: A character string specifying how missing values are
handled. Must be one of "listwise" or "pairwise". Defaults to "listwise".
  listwise: Removes any row with a missing value in any of the variables
  returned or analyzed. (Effectively uses complete cases only.)
  pairwise: Handles missing values per variable or per pair of variables,
  using all available data, even if other variables in the row have missing
  values.
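The two strategies can give different counts and means for the same data. The base R sketch below illustrates the general idea on a made-up data frame; it is not the function's implementation.

```r
# Hypothetical data with missing values in different rows
df <- data.frame(
  x = c(1, 2, NA, 4),
  y = c(10, NA, 30, 40)
)

# Listwise: drop every row that has any missing value, then summarise
complete <- df[complete.cases(df), ]
listwise_n <- colSums(!is.na(complete))
listwise_mean <- colMeans(complete)

# Pairwise: summarise each variable using all of its own non-missing values
pairwise_n <- colSums(!is.na(df))
pairwise_mean <- colMeans(df, na.rm = TRUE)

rbind(listwise_n, pairwise_n)
rbind(listwise_mean, pairwise_mean)
```

Under listwise deletion every variable is summarised over the same rows, so the counts are identical across variables; under pairwise deletion each variable keeps its own count.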
only: A character string or vector of character strings specifying
which summary statistics to return. Defaults to NULL, which includes mean
(mean), median (median), standard deviation (sd), minimum (min), maximum
(max), and count of non-missing values (nobs).
var_labels: An optional named character vector or list used to assign
custom labels to variable names. Each element must be named and correspond
to a variable included in the returned table. If var_input is set to "stem"
and any element is either unnamed or refers to a variable not present in the
table, all labels will be ignored and the table will be printed without them.
ignore: An optional named vector or list indicating values to exclude
from variables matching specified stems (or names), and, if applicable, from
a grouping variable in data. Defaults to NULL, indicating that all values
are retained. To specify exclusions for variables identified by var_stem,
use the corresponding stems or variable names as names in the vector or list.
To exclude multiple values from these variables or a grouping variable, supply
them as a named list.
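The two accepted shapes, and a rough picture of their effect, might look like the sketch below. It assumes excluded values are treated like missing values before summarising, which is an assumption about the behaviour rather than something stated above; the value -888 and the vector x are hypothetical.

```r
# One value to exclude from every variable sharing the stem "symptoms"
ignore_one <- c(symptoms = -999)

# Several values for the same stem must be supplied as a named list
ignore_many <- list(symptoms = c(-999, -888))

# Assumed effect, shown on a plain vector: excluded values become NA
x <- c(3, -999, 7, -888, 5)
x[x %in% ignore_many$symptoms] <- NA

mean(x, na.rm = TRUE)
```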
A tibble showing summary statistics for continuous variables, grouped either by a specified variable in the dataset or by matching patterns in variable names.
# Keep the region variable and the four child age-group percentage columns
sdoh_child_ages_region <-
  dplyr::select(sdoh, c(REGION, ACS_PCT_AGE_0_4, ACS_PCT_AGE_5_9,
                        ACS_PCT_AGE_10_14, ACS_PCT_AGE_15_17))

# Summarise each age variable by region, using all available values
# per variable (pairwise) and attaching a label to each variable
mean_group_tbl(data = sdoh_child_ages_region,
               var_stem = "ACS_PCT_AGE",
               group = "REGION",
               group_name = "us_region",
               na_removal = "pairwise",
               var_labels = c(
                 ACS_PCT_AGE_0_4 = "% of population between ages 0-4",
                 ACS_PCT_AGE_5_9 = "% of population between ages 5-9",
                 ACS_PCT_AGE_10_14 = "% of population between ages 10-14",
                 ACS_PCT_AGE_15_17 = "% of population between ages 15-17"))
#> # A tibble: 16 × 9
#> variable variable_label us_region mean median sd min max nobs
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 ACS_PCT_AGE_0_4 % of populati… Midwest 5.90 5.81 1.13 2.4 12.0 1055
#> 2 ACS_PCT_AGE_0_4 % of populati… Northeast 5.04 5 0.829 0.95 8.12 217
#> 3 ACS_PCT_AGE_0_4 % of populati… South 5.76 5.78 1.26 0.98 18.4 1422
#> 4 ACS_PCT_AGE_0_4 % of populati… West 5.80 5.71 1.67 0.23 13.8 449
#> 5 ACS_PCT_AGE_5_9 % of populati… Midwest 6.17 6.11 1.18 0.95 12.9 1055
#> 6 ACS_PCT_AGE_5_9 % of populati… Northeast 5.28 5.35 0.762 0.53 7.53 217
#> 7 ACS_PCT_AGE_5_9 % of populati… South 5.99 6.03 1.24 0 14.9 1422
#> 8 ACS_PCT_AGE_5_9 % of populati… West 6.23 6.1 1.78 0 12.2 449
#> 9 ACS_PCT_AGE_10… % of populati… Midwest 6.48 6.49 1.15 1.71 11.6 1055
#> 10 ACS_PCT_AGE_10… % of populati… Northeast 5.69 5.77 0.779 1.08 7.94 217
#> 11 ACS_PCT_AGE_10… % of populati… South 6.48 6.48 1.23 0 13.6 1422
#> 12 ACS_PCT_AGE_10… % of populati… West 6.46 6.29 1.62 0 11.6 449
#> 13 ACS_PCT_AGE_15… % of populati… Midwest 3.94 3.94 0.635 0.64 7.83 1055
#> 14 ACS_PCT_AGE_15… % of populati… Northeast 3.59 3.61 0.383 2.02 4.67 217
#> 15 ACS_PCT_AGE_15… % of populati… South 3.86 3.88 0.747 0 11.9 1422
#> 16 ACS_PCT_AGE_15… % of populati… West 3.80 3.78 0.985 0 11.6 449
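# Simulate symptom scores at two time points; -999 is a code to be excluded
set.seed(0222)
grouped_data <-
  data.frame(
    symptoms.t1 = sample(c(0:10, -999), replace = TRUE, size = 50),
    symptoms.t2 = sample(c(NA, 0:10, -999), replace = TRUE, size = 50)
  )

# Group by the time-point suffix matched in the variable names,
# excluding -999 and keeping complete cases only (listwise)
mean_group_tbl(data = grouped_data,
               var_stem = "symptoms",
               group = ".t\\d",
               group_type = "pattern",
               na_removal = "listwise",
               ignore = c(symptoms = -999))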
#> # A tibble: 2 × 8
#> variable group mean median sd min max nobs
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 symptoms.t1 t1 5.51 6 3.19 0 10 37
#> 2 symptoms.t2 t2 4.95 5 2.97 0 10 37