Summarize continuous variables — mean

mean_tbl() calculates summary statistics (i.e., mean, median, standard deviation, minimum, maximum, and count of non-missing values) for continuous (i.e., interval and ratio-level) variables.

mean_tbl(
  data,
  var_stem,
  var_input = "stem",
  regex_stem = FALSE,
  ignore_stem_case = FALSE,
  na_removal = "listwise",
  only = NULL,
  var_labels = NULL,
  ignore = NULL
)

Arguments

data

A data frame.

var_stem

A character vector with one or more elements, where each represents either a variable stem or the complete name of a variable present in data. A variable 'stem' refers to a common naming pattern shared among related variables, typically reflecting repeated measures of the same idea or a group of items assessing a single concept.

var_input

A character string specifying whether the values supplied to var_stem should be treated as variable stems ("stem") or as complete variable names ("name"). The default is "stem", so the function searches for variables whose names begin with each stem provided. Setting this argument to "name" directs the function to look for variables that exactly match the provided names.
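The difference between the two matching modes can be sketched in base R. This is only an illustration of prefix matching versus exact matching; the column names and the helper functions match_stem() and match_name() are hypothetical and are not part of the package.

```r
# Hypothetical column names, used only for illustration.
cols <- c("dep_1", "dep_2", "dep_total", "age", "DEP_3")

# Like var_input = "stem": keep columns whose names begin with the stem.
match_stem <- function(cols, stem) cols[startsWith(cols, stem)]

# Like var_input = "name": keep columns whose names match exactly.
match_name <- function(cols, name) cols[cols %in% name]

match_stem(cols, "dep")    # prefix match, case-sensitive
match_name(cols, "dep_1")  # exact match only
match_name(cols, "dep")    # no column is named exactly "dep"
```

Under prefix matching, "dep" selects dep_1, dep_2, and dep_total; under exact matching, "dep" selects nothing because no column carries that full name.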

regex_stem

A logical value indicating whether to use Perl-compatible regular expressions when searching for variable stems. Default is FALSE.

ignore_stem_case

A logical value indicating whether the search for columns matching the supplied var_stem is case-insensitive. Default is FALSE.
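How regex_stem and ignore_stem_case might change the column search can be sketched with base R's grepl() and startsWith(). The helper find_cols() and the column names are hypothetical, and anchoring the pattern at the start of the name is an assumption made here, not documented package behavior.

```r
# Hypothetical column names, used only for illustration.
cols <- c("dep_1", "dep_2", "DEP_3", "anx_1", "age")

find_cols <- function(cols, stem, regex = FALSE, ignore_case = FALSE) {
  if (regex) {
    # Treat the stem as a Perl-compatible pattern anchored at the start
    # of the column name (the anchoring is an assumption of this sketch).
    pattern <- paste0("^(?:", stem, ")")
    hits <- grepl(pattern, cols, perl = TRUE, ignore.case = ignore_case)
  } else if (ignore_case) {
    # Literal prefix comparison after lower-casing both sides.
    hits <- startsWith(tolower(cols), tolower(stem))
  } else {
    # Literal, case-sensitive prefix comparison.
    hits <- startsWith(cols, stem)
  }
  cols[hits]
}

find_cols(cols, "dep")                      # literal, case-sensitive
find_cols(cols, "dep", ignore_case = TRUE)  # also picks up DEP_3
find_cols(cols, "dep|anx", regex = TRUE)    # alternation needs a regex
```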

na_removal

A character string specifying how missing values are handled. Must be either "listwise" or "pairwise". Defaults to "listwise".

listwise: Removes any row that has at least one missing value across all variables returned or analyzed. (Effectively uses complete cases only.)
pairwise: Handles missing values per variable or per pair of variables, using all available data, even if other variables in the row have missing values.
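The two strategies can give different results on the same data. The base R sketch below uses a small hypothetical data frame and is meant only to show the idea, not the function's internals.

```r
# Hypothetical data with one missing value in each column.
df <- data.frame(
  x = c(1, 2, NA, 4),
  y = c(10, NA, 30, 40)
)

# Listwise: drop every row that has any missing value, then summarize.
complete <- df[complete.cases(df), ]
listwise_means <- colMeans(complete)
listwise_n <- nrow(complete)

# Pairwise: each variable uses all of its own non-missing values.
pairwise_means <- colMeans(df, na.rm = TRUE)
pairwise_n <- colSums(!is.na(df))

listwise_means; listwise_n
pairwise_means; pairwise_n
```

Here the listwise summary rests on the two fully observed rows, while the pairwise summary uses three observations per variable, so the means differ.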

only

A character string or vector of character strings specifying which summary statistics to return. Defaults to NULL, which includes the mean (mean), median (median), standard deviation (sd), minimum (min), maximum (max), and count of non-missing values (nobs).
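As a rough picture of what selecting statistics by name means, the base R sketch below computes the six statistics for a single vector and then keeps only the requested ones. The helper summarize_one() is hypothetical and is not how the package is implemented.

```r
# Hypothetical helper: compute all six statistics, then subset by name.
summarize_one <- function(x, only = NULL) {
  x <- x[!is.na(x)]  # statistics are based on non-missing values
  stats <- c(
    mean = mean(x), median = median(x), sd = sd(x),
    min = min(x), max = max(x), nobs = length(x)
  )
  if (is.null(only)) stats else stats[only]
}

x <- c(2, 4, 4, 5, NA, 10)
summarize_one(x)                           # all six statistics
summarize_one(x, only = c("mean", "sd"))   # just the requested two
```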

var_labels

An optional named character vector or list used to assign custom labels to variable names. Each element must be named and correspond to a variable included in the returned table. If var_input is set to stem, and any element is either unnamed or refers to a variable not present in the table, all labels will be ignored and the table will be printed without them.

ignore

An optional named vector or list indicating values to exclude from variables matching specified stems (or names). Defaults to NULL, indicating that all values are retained. To specify exclusions for variables identified by var_stem, use the corresponding stems or variable names as names in the vector or list. To exclude multiple values from these variables, supply them as a named list.
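One way to picture the exclusion is to treat the listed values as missing before summarizing. The base R sketch below does that for columns matching a stem; the helper apply_ignore(), the data, and the choice to recode excluded values as NA are all assumptions made for illustration.

```r
# Hypothetical data in which 98 and 99 are non-response codes.
df <- data.frame(
  score_1 = c(1, 99, 3),
  score_2 = c(99, 2, 98),
  age     = c(30, 99, 41)
)

# Hypothetical helper: for each named stem, set the listed values to NA
# in every column whose name begins with that stem.
apply_ignore <- function(data, ignore) {
  for (stem in names(ignore)) {
    targets <- names(data)[startsWith(names(data), stem)]
    for (col in targets) {
      data[[col]][data[[col]] %in% ignore[[stem]]] <- NA
    }
  }
  data
}

# A named list allows several excluded values per stem.
cleaned <- apply_ignore(df, list(score = c(98, 99)))
colMeans(cleaned, na.rm = TRUE)
```

Only the score columns are affected; the 99 in age is kept because no entry in the list names that variable.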

Value

A tibble showing summary statistics for continuous variables.

Author

Ama Nyame-Mensah

Examples

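library(summarytabl)  # assumed source of mean_tbl() and the sdoh dataset

# Keep only the child age-group percentage columns
sdoh_child_ages <- 
  dplyr::select(sdoh, c(ACS_PCT_AGE_0_4, ACS_PCT_AGE_5_9,
                        ACS_PCT_AGE_10_14, ACS_PCT_AGE_15_17))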

mean_tbl(data = sdoh_child_ages, var_stem = "ACS_PCT_AGE")
#> # A tibble: 4 × 7
#>   variable           mean median    sd   min   max  nobs
#>   <chr>             <dbl>  <dbl> <dbl> <dbl> <dbl> <int>
#> 1 ACS_PCT_AGE_0_4    5.72   5.71 1.29   0.23  18.4  3221
#> 2 ACS_PCT_AGE_5_9    6.01   5.98 1.31   0     14.9  3221
#> 3 ACS_PCT_AGE_10_14  6.42   6.39 1.25   0     13.6  3221
#> 4 ACS_PCT_AGE_15_17  3.86   3.86 0.730  0     11.9  3221

mean_tbl(data = sdoh_child_ages,
         var_stem = "ACS_PCT_AGE",
         na_removal = "pairwise",
         var_labels = c(
           ACS_PCT_AGE_0_4 = "% of population between ages 0-4",
           ACS_PCT_AGE_5_9 = "% of population between ages 5-9",
           ACS_PCT_AGE_10_14 = "% of population between ages 10-14",
           ACS_PCT_AGE_15_17 = "% of population between ages 15-17"))
#> # A tibble: 4 × 8
#>   variable          variable_label           mean median    sd   min   max  nobs
#>   <chr>             <chr>                   <dbl>  <dbl> <dbl> <dbl> <dbl> <int>
#> 1 ACS_PCT_AGE_0_4   % of population betwee…  5.72   5.71 1.29   0.23  18.4  3221
#> 2 ACS_PCT_AGE_5_9   % of population betwee…  6.01   5.98 1.31   0     14.9  3221
#> 3 ACS_PCT_AGE_10_14 % of population betwee…  6.42   6.39 1.25   0     13.6  3221
#> 4 ACS_PCT_AGE_15_17 % of population betwee…  3.86   3.86 0.730  0     11.9  3221