R/select_group_tbl.R
select_group_tbl() displays frequency counts and
percentages for multiple response variables (e.g., a series of
questions where participants answer "Yes" or "No" to each item) as
well as ordinal variables (such as Likert or Likert-type items with
responses ranging from "Strongly Disagree" to "Strongly Agree", where
respondents select one response per statement, question, or item),
grouped either by another variable in your dataset or by a matched
pattern in the variable names.
select_group_tbl(
  data,
  var_stem,
  group,
  var_input = "stem",
  regex_stem = FALSE,
  ignore_stem_case = FALSE,
  group_type = "variable",
  group_name = NULL,
  margins = "all",
  regex_group = FALSE,
  ignore_group_case = FALSE,
  remove_group_non_alnum = TRUE,
  na_removal = "listwise",
  pivot = "longer",
  only = NULL,
  var_labels = NULL,
  ignore = NULL,
  force_pivot = FALSE
)
data
A data frame.
var_stem
A character vector with one or more elements, where each element
is either a variable stem or the complete name of a variable present
in data. A variable 'stem' is a naming pattern shared by related
variables, typically repeated measures of the same idea or a group of
items assessing a single concept.
group
A character string giving either a variable name or a pattern
used to search for variables in data.
var_input
A character string specifying whether the values supplied
to var_stem should be treated as variable stems ("stem") or as complete
variable names ("name"). Defaults to "stem", so the function
searches for variables that begin with each stem provided. Set this
argument to "name" to look for variables that exactly
match the provided names.
regex_stem
A logical value indicating whether to use Perl-compatible
regular expressions when searching for variable stems. Default is FALSE.
ignore_stem_case
A logical value indicating whether the search for
columns matching the supplied var_stem is case-insensitive. Default is
FALSE.
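The three matching options (var_input, regex_stem, ignore_stem_case) can be pictured with base R alone. The helper below, match_columns(), is hypothetical and is not the package's implementation; it only illustrates prefix matching, exact-name matching, and Perl-compatible regex matching on column names. It also assumes that a regex stem is anchored at the start of the name, which the documentation does not state.

```r
# Hypothetical helper (not part of the package): return the column names
# that a stem or a complete name would select under each matching mode.
match_columns <- function(col_names, var_stem, var_input = "stem",
                          regex_stem = FALSE, ignore_stem_case = FALSE) {
  if (var_input == "name") {
    # Exact match on the complete variable name.
    return(col_names[col_names %in% var_stem])
  }
  hits <- lapply(var_stem, function(stem) {
    if (regex_stem) {
      # Assumption: a regex stem is anchored at the start of the name.
      grepl(paste0("^(?:", stem, ")"), col_names,
            perl = TRUE, ignore.case = ignore_stem_case)
    } else if (ignore_stem_case) {
      startsWith(tolower(col_names), tolower(stem))
    } else {
      startsWith(col_names, stem)
    }
  })
  # A column is kept if it matches at least one of the supplied stems.
  col_names[Reduce(`|`, hits)]
}

cols <- c("involved_arts", "involved_sports", "Involved_election", "sex")

stem_cols   <- match_columns(cols, "involved_")
nocase_cols <- match_columns(cols, "involved_", ignore_stem_case = TRUE)
name_cols   <- match_columns(cols, "sex", var_input = "name")
regex_cols  <- match_columns(cols, "involved_(arts|sports)", regex_stem = TRUE)
```

With these toy names, the case-sensitive stem search skips "Involved_election", while the case-insensitive search picks it up.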
group_type
A character string that defines how the group argument
should be interpreted. Must be one of "pattern" or "variable". Defaults to
"variable", which searches for a matching variable name in data.
group_name
An optional character string used to rename the group
column in the final table. When group_type is "variable", the column
name defaults to the matched variable name from data. When it is "pattern",
the default column name is group.
margins
A character string that determines how percentage values are
calculated: whether they sum to one across rows, across columns, or across
the entire variable. Must be one of "all", "rows", or "columns". Defaults to
"all". Note: This argument only affects the final table when group_type
is "variable".
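The three margin choices correspond to the familiar ways of turning a contingency table into proportions. The sketch below uses base R's table() and prop.table() on made-up data; it is an illustration, not the package's code, and which axis the function treats as "rows" versus "columns" is an assumption here. The last line recomputes one value from the involved_arts example on this page, where the four counts 212, 187, 998, and 1129 share a single denominator under margins = "all".

```r
# Toy data: one multiple response item crossed with a grouping variable.
item  <- c("selected", "selected", "unselected",
           "unselected", "unselected", "selected")
group <- c("female", "male", "female", "male", "male", "male")
counts <- table(item, group)

pct_all     <- prop.table(counts)              # all cells sum to 1
pct_rows    <- prop.table(counts, margin = 1)  # each row sums to 1
pct_columns <- prop.table(counts, margin = 2)  # each column sums to 1

# Check against the involved_arts example: female and "selected",
# divided by the total count for that variable.
arts_female_selected <- round(212 / (212 + 187 + 998 + 1129), 4)
```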
regex_group
A logical value indicating whether to use Perl-compatible
regular expressions when searching for group variables or matching variable
name patterns. Default is FALSE.
ignore_group_case
A logical value specifying whether the search for a
grouping variable (if group_type is "variable") or for variables matching a
pattern (if group_type is "pattern") should be case-insensitive. Default is
FALSE. Set to TRUE to ignore case.
remove_group_non_alnum
A logical value indicating whether to remove
all non-alphanumeric characters (i.e., anything that is not a letter or
number) from group. Default is TRUE.
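As a rough picture of what stripping non-alphanumeric characters does to a pattern-derived group label, the base R sketch below extracts a pattern from some hypothetical column names and then cleans the match. The column names and the pattern are invented for illustration, and the exact point at which the package applies this cleaning is an assumption.

```r
# Hypothetical column names that encode a wave in their suffix.
col_names <- c("item_w1", "item_w2", "item_w3")

# Extract the part of each name that matches the group pattern.
raw_group <- regmatches(col_names, regexpr("_w\\d", col_names, perl = TRUE))

# Drop anything that is not a letter or a digit.
clean_group <- gsub("[^[:alnum:]]", "", raw_group)
```

Here raw_group keeps the leading underscore from the pattern, while clean_group holds only the letters and digits.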
na_removal
A character string specifying how missing values are
handled. Must be one of "listwise" or "pairwise". Defaults to "listwise".
"listwise": Removes any row that has at least one missing value
across all variables returned or analyzed. (Effectively uses complete cases
only.)
"pairwise": Handles missing values per variable or per pair of variables,
using all available data, even if other variables in the row have missing
values.
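The difference between the two options shows up in how many rows each item contributes. The following base R sketch uses a small invented data frame and complete.cases(); it mirrors the definitions above but is not taken from the package source.

```r
# Invented data: two items and a grouping variable, with scattered NAs.
df <- data.frame(
  dep_1 = c("often", NA, "hardly", "sometimes"),
  dep_2 = c("hardly", "often", NA, "sometimes"),
  sex   = c("male", "female", "female", "male")
)

# Listwise: keep only rows that are complete on every analyzed variable.
listwise_df <- df[complete.cases(df), ]

# Pairwise: for each item, keep rows complete on that item and the group.
pairwise_n <- vapply(
  c("dep_1", "dep_2"),
  function(v) sum(complete.cases(df[, c(v, "sex")])),
  integer(1)
)
```

Listwise deletion leaves the same two rows for both items, whereas the pairwise approach keeps three rows per item because a missing value in one item does not remove the row for the other.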
pivot
A character string that determines the format of the table. By
default, "longer" returns the data in the long format. To return the data in
the wide format, specify "wider".
only
A character string or vector of character strings naming the types of
summary data to return. Default is NULL, which returns both counts and
percentages. To return only counts or only percentages, use "count" or
"percent", respectively.
var_labels
An optional named character vector or list used to assign
custom labels to variable names. Each element must be named and correspond
to a variable included in the returned table. If var_input is set to "stem"
and any element is either unnamed or refers to a variable not present in the
table, all labels will be ignored and the table will be printed without them.
ignore
An optional named vector or list indicating values to exclude
from variables matching specified stems (or names), and, if applicable, from a
grouping variable in data. Defaults to NULL, indicating that all values are
retained. To specify exclusions for variables identified by var_stem, use the
corresponding stems or variable names as names in the vector or list. To exclude
multiple values from these variables or a grouping variable, supply them as a
named list.
force_pivot
A logical value that enables pivoting to the 'wider' format
even when variables have inconsistent value sets. By default, this is set to
FALSE to prevent reshaping errors when values differ across variables in the
returned table. Set to TRUE to override this safeguard and pivot to the
'wider' format regardless of value inconsistencies.
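One way to picture the "inconsistent value sets" that this safeguard refers to is to compare the distinct values of each variable. The helper same_values() below is hypothetical and only sketches such a comparison in base R; how the package actually performs the check is not documented here.

```r
# Hypothetical helper: TRUE when every variable has the same set of values.
same_values <- function(values_by_var) {
  value_sets <- lapply(values_by_var, function(v) sort(unique(v)))
  length(unique(value_sets)) == 1
}

# Invented value sets for three variables.
values_by_var <- list(
  dep_1         = c("hardly", "often", "sometimes"),
  dep_2         = c("sometimes", "hardly", "often"),
  involved_arts = c("selected", "unselected")
)

consistent   <- same_values(values_by_var[c("dep_1", "dep_2")])
inconsistent <- same_values(values_by_var)
```

In this sketch the two dep_ items share one value set, so a wide table would have the same value columns for both; adding involved_arts breaks that, which is the kind of situation force_pivot = TRUE is described as overriding.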
A tibble displaying the count and percentage for each category in a multiple response or ordinal variable, grouped either by a specified variable in the dataset or by matching patterns in variable names.
select_group_tbl(data = stem_social_psych,
                 var_stem = "belong_belong",
                 group = "\\d",
                 group_type = "pattern",
                 group_name = "wave",
                 na_removal = "pairwise",
                 pivot = "wider",
                 only = "count")
#> # A tibble: 2 × 7
#>   variable         wave  count_value_1 count_value_2 count_value_3 count_value_4
#>   <chr>            <chr>         <int>         <int>         <int>         <int>
#> 1 belong_belongSt… 1                22            50           144           264
#> 2 belong_belongSt… 2                12            12            48           125
#> # ℹ 1 more variable: count_value_5 <int>
# Recode the numeric codes into labels before tabulating.
tas_recoded <-
  tas |>
  dplyr::mutate(sex = dplyr::case_when(
    sex == 1 ~ "female",
    sex == 2 ~ "male",
    TRUE ~ NA)) |>
  dplyr::mutate(dplyr::across(
    .cols = dplyr::starts_with("involved_"),
    .fns = ~ dplyr::case_when(
      .x == 1 ~ "selected",
      .x == 0 ~ "unselected",
      TRUE ~ NA)
  ))
select_group_tbl(data = tas_recoded,
                 var_stem = "involved_",
                 group = "sex",
                 group_type = "variable",
                 na_removal = "pairwise",
                 pivot = "wider")
#> # A tibble: 12 × 6
#>    variable            values count_sex_female count_sex_male percent_sex_female
#>    <chr>               <chr>             <int>          <int>              <dbl>
#>  1 involved_arts       selec…              212            187             0.0839
#>  2 involved_arts       unsel…              998           1129             0.395
#>  3 involved_sports     selec…              270            142             0.107
#>  4 involved_sports     unsel…              940           1174             0.372
#>  5 involved_schoolClu… selec…              167            185             0.0674
#>  6 involved_schoolClu… unsel…             1021           1106             0.412
#>  7 involved_election   selec…              531            717             0.233
#>  8 involved_election   unsel…              538            490             0.236
#>  9 involved_socialAct… selec…               52             55             0.0206
#> 10 involved_socialAct… unsel…             1158           1261             0.458
#> 11 involved_volunteer  selec…              370            424             0.146
#> 12 involved_volunteer  unsel…              840            892             0.333
#> # ℹ 1 more variable: percent_sex_male <dbl>
# Recode the numeric codes into labels before tabulating.
depressive_recoded <-
  depressive |>
  dplyr::mutate(sex = dplyr::case_when(
    sex == 1 ~ "male",
    sex == 2 ~ "female",
    TRUE ~ NA)) |>
  dplyr::mutate(dplyr::across(
    .cols = dplyr::starts_with("dep_"),
    .fns = ~ dplyr::case_when(
      .x == 1 ~ "often",
      .x == 2 ~ "sometimes",
      .x == 3 ~ "hardly",
      TRUE ~ NA
    )
  ))
select_group_tbl(data = depressive_recoded,
                 var_stem = "dep",
                 group = "sex",
                 group_type = "variable",
                 na_removal = "listwise",
                 pivot = "wider",
                 only = "percent",
                 var_labels =
                   c("dep_1" = "how often child feels sad and blue",
                     "dep_2" = "how often child feels nervous, tense, or on edge",
                     "dep_3" = "how often child feels happy",
                     "dep_4" = "how often child feels bored",
                     "dep_5" = "how often child feels lonely",
                     "dep_6" = "how often child feels tired or worn out",
                     "dep_7" = "how often child feels excited about something",
                     "dep_8" = "how often child feels too busy to get everything"))
#> # A tibble: 24 × 5
#>    variable variable_label            values percent_sex_female percent_sex_male
#>    <chr>    <chr>                     <chr>               <dbl>            <dbl>
#>  1 dep_1    how often child feels sa… hardly             0.230            0.274
#>  2 dep_1    how often child feels sa… often              0.0336           0.0342
#>  3 dep_1    how often child feels sa… somet…             0.227            0.202
#>  4 dep_2    how often child feels ne… hardly             0.217            0.229
#>  5 dep_2    how often child feels ne… often              0.0386           0.0510
#>  6 dep_2    how often child feels ne… somet…             0.234            0.230
#>  7 dep_3    how often child feels ha… hardly             0.0156           0.0174
#>  8 dep_3    how often child feels ha… often              0.368            0.355
#>  9 dep_3    how often child feels ha… somet…             0.106            0.138
#> 10 dep_4    how often child feels bo… hardly             0.0473           0.0585
#> # ℹ 14 more rows