Summarize multiple response variables by group
select_group_tbl() displays frequency counts and percentages (i.e., count and percent)
for multiple response variables that share a common variable stem, including binary
variables (such as Unselected/Selected) and ordinal variables (such as responses ranging
from strongly disagree to strongly agree). Results are grouped either by another variable
in your dataset or by a matched pattern in the variable names. A variable 'stem' is a
shared naming pattern across related variables, often representing repeated measures of
the same concept or a series of items measuring a single construct. Missing data are
excluded using listwise deletion by default.
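To make the ideas of a variable stem and listwise deletion concrete, here is a minimal base R sketch. The data frame `survey` and its column names are made up for illustration, and anchoring the stem at the start of the name is an assumption; the sketch shows the concepts only, not how select_group_tbl() is implemented internally.

```r
# Toy data (hypothetical): three items sharing the stem "belong_" plus an id column.
survey <- data.frame(
  id       = 1:5,
  belong_1 = c(1, 2, NA, 4, 5),
  belong_2 = c(2, 2, 3, NA, 5),
  belong_3 = c(1, 3, 3, 4, 5)
)

# Columns whose names start with the stem (assumed matching rule).
stem_cols <- grep("^belong_", names(survey), value = TRUE)

# Listwise deletion: keep only rows that are complete on every stem column.
listwise <- survey[complete.cases(survey[stem_cols]), ]

# Pairwise deletion: count non-missing values one variable at a time.
pairwise_n <- colSums(!is.na(survey[stem_cols]))

nrow(listwise)  # 3 rows survive listwise deletion
pairwise_n      # 4, 4, and 5 non-missing values per item
```

Listwise deletion discards a whole row when any stem variable is missing, so it can leave fewer observations per item than pairwise deletion does.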
Usage
select_group_tbl(
  data,
  var_stem,
  group,
  escape_stem = FALSE,
  ignore_stem_case = FALSE,
  group_type = "variable",
  group_name = NULL,
  margins = "all",
  escape_group = FALSE,
  ignore_group_case = FALSE,
  remove_group_non_alnum = TRUE,
  na_removal = "listwise",
  pivot = "longer",
  only = NULL,
  var_labels = NULL,
  ignore = NULL
)
Arguments
- data
A data frame.
- var_stem
A character string of a variable stem or the full name of a variable in data.
- group
A character string representing a variable name or a pattern used to search for variables in data.
- escape_stem
A logical value indicating whether to escape var_stem. Default is FALSE.
- ignore_stem_case
A logical value indicating whether the search for columns matching the supplied var_stem is case-insensitive. Default is FALSE.
- group_type
A character string that defines how the group argument should be interpreted. Should be one of "pattern" or "variable". Defaults to "variable", which searches for a matching variable name in data.
- group_name
An optional character string used to rename the group column in the final table. When group_type is set to "variable", the column name defaults to the matched variable name from data. When set to "pattern", the default column name is group.
- margins
A character string that determines how percentage values are calculated: whether they sum to one across rows, across columns, or across the entire variable (i.e., all). Defaults to "all", but can also be set to "rows" or "columns". Note: This argument only affects the final table when group_type is "variable".
- escape_group
A logical value indicating whether to escape the string supplied to group. Default is FALSE.
- ignore_group_case
A logical value specifying whether the search for a grouping variable (if group_type is "variable") or for variables matching a pattern (if group_type is "pattern") should be case-insensitive. Default is FALSE. Set to TRUE to ignore case.
- remove_group_non_alnum
A logical value indicating whether to remove all non-alphanumeric characters (i.e., anything that is not a letter or number) from group. Default is TRUE.
- na_removal
A character string that specifies the method for handling missing values: "pairwise" or "listwise". Defaults to "listwise".
- pivot
A character string that determines the format of the table. By default, "longer" returns the data in the long format. To receive the data in the wide format, specify "wider".
- only
A character string or vector of character strings of the types of summary data to return. Default is NULL, which returns both counts and percentages. To return only counts or percentages, use "count" or "percent", respectively.
- var_labels
An optional named character vector or list used to assign custom labels to variable names. Each element should be named and correspond to a variable in the returned table. If any element is unnamed or references a variable not returned in the table, all labels will be ignored and the table will be printed without them.
- ignore
An optional named vector or list that defines values to exclude from variables matching the specified variable stem and, if applicable, a grouping variable in data. If set to NULL (default), all values are retained. To exclude values from variables identified by var_stem, use the stem name as the key. To exclude multiple values from both var_stem variables and a grouping variable, supply a named list.
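The three margins settings can be illustrated with base R's table() and prop.table(). This is a sketch of the arithmetic only, using made-up data. Which dimension of the returned table counts as "rows" or "columns" is an assumption here (response values as rows, groups as columns), not a statement about the function's internals.

```r
# Toy data (hypothetical): one binary item and one grouping variable.
responses <- data.frame(
  item = c(rep("selected", 4), rep("unselected", 6)),
  sex  = c("female", "female", "female", "male",
           "female", "female", "male", "male", "male", "male")
)

# Cross-tabulate response values (rows) by group (columns).
counts <- table(values = responses$item, sex = responses$sex)

# "all": every cell is divided by the grand total, so all cells sum to one.
pct_all <- prop.table(counts)

# "rows": each row sums to one.
pct_rows <- prop.table(counts, margin = 1)

# "columns": each column sums to one.
pct_columns <- prop.table(counts, margin = 2)

pct_all["selected", "female"]      # 3 / 10
pct_rows["selected", "female"]     # 3 / 4
pct_columns["selected", "female"]  # 3 / 5
```

The same count (3 selected responses from the female group) yields three different percentages depending on the denominator, which is why the choice of margins matters when comparing groups of unequal size.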
Value
A tibble showing the counts and/or percentages of multiple response variables sharing a common variable stem. The statistics are grouped either by a specified grouping variable within the dataset or by a matched pattern in the variable names.
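Grouping by a matched pattern can be pictured as pulling a label out of each variable name. The base R sketch below uses hypothetical column names and the pattern "\\d"; it illustrates the idea under those assumptions and is not the function's actual extraction code.

```r
# Hypothetical column names that share a stem and end in a wave number.
col_names <- c("belong_item_w1", "belong_item_w2", "belong_other_w1")

# Take the first match of the pattern in each name; the match acts as the group label.
wave <- regmatches(col_names, regexpr("\\d", col_names))

# Pair each variable with the group label derived from its name.
data.frame(variable = col_names, wave = wave)
```

With a pattern such as "\\d", variables measured at different time points fall into separate groups without needing a dedicated grouping column in the data.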
Examples
# Group by a pattern in the variable names (a digit) and return counts only
select_group_tbl(data = stem_social_psych,
                 var_stem = "belong_belong",
                 group = "\\d",
                 group_type = "pattern",
                 group_name = "wave",
                 na_removal = "pairwise",
                 pivot = "wider",
                 only = "count")
#> # A tibble: 2 × 7
#>   variable         wave  count_value_1 count_value_2 count_value_3 count_value_4
#>   <chr>            <chr>         <int>         <int>         <int>         <int>
#> 1 belong_belongSt… 1                22            50           144           264
#> 2 belong_belongSt… 2                12            12            48           125
#> # ℹ 1 more variable: count_value_5 <int>

# Recode numeric codes into labels, then group binary items by a variable
tas_recoded <-
  tas |>
  dplyr::mutate(sex = dplyr::case_when(
    sex == 1 ~ "female",
    sex == 2 ~ "male",
    TRUE ~ NA)) |>
  dplyr::mutate(dplyr::across(
    .cols = dplyr::starts_with("involved_"),
    .fns = ~ dplyr::case_when(
      .x == 1 ~ "selected",
      .x == 0 ~ "unselected",
      TRUE ~ NA)
  ))

select_group_tbl(data = tas_recoded,
                 var_stem = "involved_",
                 group = "sex",
                 group_type = "variable",
                 na_removal = "pairwise",
                 pivot = "wider")
#> # A tibble: 12 × 6
#>    variable            values count_sex_female count_sex_male percent_sex_female
#>    <chr>               <chr>             <int>          <int>              <dbl>
#>  1 involved_arts       selec…              212            187             0.0839
#>  2 involved_arts       unsel…              998           1129             0.395
#>  3 involved_sports     selec…              270            142             0.107
#>  4 involved_sports     unsel…              940           1174             0.372
#>  5 involved_schoolClu… selec…              167            185             0.0674
#>  6 involved_schoolClu… unsel…             1021           1106             0.412
#>  7 involved_election   selec…              531            717             0.233
#>  8 involved_election   unsel…              538            490             0.236
#>  9 involved_socialAct… selec…               52             55             0.0206
#> 10 involved_socialAct… unsel…             1158           1261             0.458
#> 11 involved_volunteer  selec…              370            424             0.146
#> 12 involved_volunteer  unsel…              840            892             0.333
#> # ℹ 1 more variable: percent_sex_male <dbl>

# Recode ordinal items, then return percentages only with custom variable labels
depressive_recoded <-
  depressive |>
  dplyr::mutate(sex = dplyr::case_when(
    sex == 1 ~ "male",
    sex == 2 ~ "female",
    TRUE ~ NA)) |>
  dplyr::mutate(dplyr::across(
    .cols = dplyr::starts_with("dep_"),
    .fns = ~ dplyr::case_when(
      .x == 1 ~ "often",
      .x == 2 ~ "sometimes",
      .x == 3 ~ "hardly",
      TRUE ~ NA
    )
  ))

select_group_tbl(data = depressive_recoded,
                 var_stem = "dep",
                 group = "sex",
                 group_type = "variable",
                 na_removal = "listwise",
                 pivot = "wider",
                 only = "percent",
                 var_labels =
                   c("dep_1" = "how often child feels sad and blue",
                     "dep_2" = "how often child feels nervous, tense, or on edge",
                     "dep_3" = "how often child feels happy",
                     "dep_4" = "how often child feels bored",
                     "dep_5" = "how often child feels lonely",
                     "dep_6" = "how often child feels tired or worn out",
                     "dep_7" = "how often child feels excited about something",
                     "dep_8" = "how often child feels too busy to get everything"))
#> # A tibble: 24 × 5
#>    variable variable_label            values percent_sex_female percent_sex_male
#>    <chr>    <chr>                     <chr>               <dbl>            <dbl>
#>  1 dep_1    how often child feels sa… hardly             0.230            0.274
#>  2 dep_1    how often child feels sa… often              0.0336           0.0342
#>  3 dep_1    how often child feels sa… somet…             0.227            0.202
#>  4 dep_2    how often child feels ne… hardly             0.217            0.229
#>  5 dep_2    how often child feels ne… often              0.0386           0.0510
#>  6 dep_2    how often child feels ne… somet…             0.234            0.230
#>  7 dep_3    how often child feels ha… hardly             0.0156           0.0174
#>  8 dep_3    how often child feels ha… often              0.368            0.355
#>  9 dep_3    how often child feels ha… somet…             0.106            0.138
#> 10 dep_4    how often child feels bo… hardly             0.0473           0.0585
#> # ℹ 14 more rows