mean_tbl() calculates summary statistics (i.e., mean,
median, standard deviation, minimum, maximum, and count of non-missing
values) for continuous (i.e., interval and ratio-level) variables.
mean_tbl(
  data,
  var_stem,
  var_input = "stem",
  regex_stem = FALSE,
  ignore_stem_case = FALSE,
  na_removal = "listwise",
  only = NULL,
  var_labels = NULL,
  ignore = NULL
)
data: A data frame.
var_stem: A character vector with one or more elements, where each
represents either a variable stem or the complete name of a variable present
in data. A variable 'stem' refers to a common naming pattern shared among
related variables, typically reflecting repeated measures of the same idea
or a group of items assessing a single concept.
var_input: A character string specifying whether the values supplied
to var_stem should be treated as variable stems ("stem") or as complete
variable names ("name"). Defaults to "stem", in which case the function
searches for variables that begin with each stem provided. Setting this
argument to "name" directs the function to look for variables that exactly
match the provided names.
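The difference between the two modes can be sketched in base R. This only illustrates prefix matching versus exact matching and is not the function's internal code; the column names and the helper match_vars() are hypothetical.

```r
# Hypothetical column names, for illustration only
cols <- c("dep_1", "dep_2", "dep_total", "anx_1")

# Hypothetical helper that mimics the two documented matching modes
match_vars <- function(cols, var_stem, var_input = c("stem", "name")) {
  var_input <- match.arg(var_input)
  if (var_input == "stem") {
    # Keep columns that begin with any of the supplied stems
    keep <- vapply(cols, function(x) any(startsWith(x, var_stem)), logical(1))
    unname(cols[keep])
  } else {
    # Keep columns whose names match exactly
    cols[cols %in% var_stem]
  }
}

stem_hits <- match_vars(cols, "dep", "stem")    # every column starting with "dep"
name_hits <- match_vars(cols, "dep_1", "name")  # the single column named "dep_1"
```

Under these assumptions, the stem "dep" picks up dep_1, dep_2, and dep_total, while the name "dep_1" picks up only that column.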
regex_stem: A logical value indicating whether to use Perl-compatible
regular expressions when searching for variable stems. Default is FALSE.
ignore_stem_case: A logical value indicating whether the search for
columns matching the supplied var_stem is case-insensitive. Default is
FALSE.
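As a rough illustration of what these two switches mean, the base R sketch below searches a set of hypothetical column names with a Perl-compatible pattern, first case-sensitively and then case-insensitively. Whether mean_tbl() anchors a regular expression at the start of the name is an assumption made here, not something the documentation states.

```r
# Hypothetical column names, for illustration only
cols <- c("ACS_PCT_AGE_0_4", "acs_pct_age_5_9", "ACS_PCT_INCOME")

# A stem written as a Perl-compatible regular expression,
# anchored at the start of the name (an assumption of this sketch)
pattern <- "^ACS_PCT_AGE_\\d+"

# Case-sensitive search, analogous to ignore_stem_case = FALSE
strict_hits <- grep(pattern, cols, perl = TRUE, value = TRUE)

# Case-insensitive search, analogous to ignore_stem_case = TRUE
loose_hits <- grep(pattern, cols, perl = TRUE, ignore.case = TRUE, value = TRUE)
```

Here the case-sensitive search finds only the upper-case age column, and the case-insensitive search also finds the lower-case one.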
na_removal: A character string specifying how missing values are
handled. Must be either "listwise" or "pairwise". Defaults to "listwise".
listwise: Removes any row that has a missing value in at least one of the
variables returned or analyzed. (Effectively uses complete cases
only.)
pairwise: Handles missing values per variable or per pair of variables,
using all available data, even if other variables in the row have missing
values.
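The practical effect of the two options can be seen on a small made-up data frame. This is a minimal base R sketch of the general idea (complete cases versus per-variable removal), not the function's own implementation; the data and object names are hypothetical.

```r
# Hypothetical data with scattered missing values
df <- data.frame(
  x = c(1, 2, NA, 4),
  y = c(10, NA, 30, 40)
)

# Listwise: drop every row with at least one NA, then summarise
complete_rows <- df[complete.cases(df), ]
listwise_mean <- colMeans(complete_rows)
listwise_nobs <- nrow(complete_rows)

# Pairwise: each variable uses all of its own non-missing values
pairwise_mean <- colMeans(df, na.rm = TRUE)
pairwise_nobs <- colSums(!is.na(df))
```

In this sketch, listwise removal leaves two rows for both variables, while the per-variable approach keeps three observations for each, so the means and counts differ between the two.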
only: A character string or vector of character strings specifying
which summary statistics to return. Defaults to NULL, which includes the mean
("mean"), median ("median"), standard deviation ("sd"), minimum ("min"),
maximum ("max"), and count of non-missing values ("nobs").
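The statistic names accepted by this argument can be read as keys into the full set of results. The base R sketch below computes all six statistics for a hypothetical vector and keeps only the requested ones; it illustrates the selection idea and makes no claim about how mean_tbl() does this internally.

```r
x <- c(2, 4, 4, 10)  # hypothetical values

# All six statistics, named as in the documentation
stats <- c(
  mean = mean(x), median = median(x), sd = sd(x),
  min = min(x), max = max(x), nobs = sum(!is.na(x))
)

# Keep only the statistics that were asked for
only <- c("mean", "sd")
selected <- stats[only]
```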
var_labels: An optional named character vector or list used to assign
custom labels to variable names. Each element must be named and correspond
to a variable included in the returned table. If var_input is set to "stem"
and any element is either unnamed or refers to a variable not present in the
table, all labels are ignored and the table is printed without them.
ignore: An optional named vector or list indicating values to exclude
from variables matching specified stems (or names). Defaults to NULL,
indicating that all values are retained. To specify exclusions for variables
identified by var_stem, use the corresponding stems or variable names as
names in the vector or list. To exclude multiple values from these variables,
supply them as a named list.
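One way to picture this argument is as a recoding step applied before the statistics are computed. The base R sketch below treats the listed values as missing for every column that starts with a named stem. Recoding to NA is an assumption made for illustration, and the data, the stem, and the -99 code are all hypothetical.

```r
# Hypothetical data in which -99 is a "not applicable" code
df <- data.frame(
  score_1 = c(3, -99, 5),
  score_2 = c(4, 2, -99)
)

# Named list in the documented shape: names are stems, elements are values to drop
ignore <- list(score = c(-99))

# For every column that starts with a named stem, recode the listed values to NA
for (stem in names(ignore)) {
  hits <- names(df)[startsWith(names(df), stem)]
  for (col in hits) {
    df[[col]][df[[col]] %in% ignore[[stem]]] <- NA
  }
}

# Means computed without the excluded values
score_means <- colMeans(df, na.rm = TRUE)
```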
A tibble showing summary statistics for continuous variables.
sdoh_child_ages <-
  dplyr::select(sdoh, c(ACS_PCT_AGE_0_4, ACS_PCT_AGE_5_9,
                        ACS_PCT_AGE_10_14, ACS_PCT_AGE_15_17))
mean_tbl(data = sdoh_child_ages, var_stem = "ACS_PCT_AGE")
#> # A tibble: 4 × 7
#>   variable           mean median    sd   min   max  nobs
#>   <chr>             <dbl>  <dbl> <dbl> <dbl> <dbl> <int>
#> 1 ACS_PCT_AGE_0_4    5.72   5.71 1.29   0.23  18.4  3221
#> 2 ACS_PCT_AGE_5_9    6.01   5.98 1.31   0     14.9  3221
#> 3 ACS_PCT_AGE_10_14  6.42   6.39 1.25   0     13.6  3221
#> 4 ACS_PCT_AGE_15_17  3.86   3.86 0.730  0     11.9  3221
mean_tbl(data = sdoh_child_ages,
         var_stem = "ACS_PCT_AGE",
         na_removal = "pairwise",
         var_labels = c(
           ACS_PCT_AGE_0_4 = "% of population between ages 0-4",
           ACS_PCT_AGE_5_9 = "% of population between ages 5-9",
           ACS_PCT_AGE_10_14 = "% of population between ages 10-14",
           ACS_PCT_AGE_15_17 = "% of population between ages 15-17"))
#> # A tibble: 4 × 8
#>   variable          variable_label           mean median    sd   min   max  nobs
#>   <chr>             <chr>                   <dbl>  <dbl> <dbl> <dbl> <dbl> <int>
#> 1 ACS_PCT_AGE_0_4   % of population betwee…  5.72   5.71 1.29   0.23  18.4  3221
#> 2 ACS_PCT_AGE_5_9   % of population betwee…  6.01   5.98 1.31   0     14.9  3221
#> 3 ACS_PCT_AGE_10_14 % of population betwee…  6.42   6.39 1.25   0     13.6  3221
#> 4 ACS_PCT_AGE_15_17 % of population betwee…  3.86   3.86 0.730  0     11.9  3221