
select_group_tbl() displays frequency counts and percentages (i.e., count and percent) for multiple response variables that share a common variable stem, grouped either by another variable in your dataset or by a matched pattern in the variable names. Supported variables include binary variables (such as Unselected/Selected) and ordinal variables (such as responses ranging from strongly disagree to strongly agree). A variable 'stem' is a shared naming pattern across related variables, often representing repeated measures of the same concept or a series of items measuring a single construct. Missing data are excluded using listwise deletion by default.
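To make the idea of a stem concrete, the base R sketch below picks out columns that share a stem and then derives a group label from a pattern in their names. The column names are hypothetical, and matching the stem as a regular-expression prefix is an assumption made for illustration, not a description of the package's internals.

```r
# Hypothetical column names: two waves of one item plus an unrelated column.
cols <- c("belong_w1", "belong_w2", "sex")

# Assumption: a stem is matched as a prefix of the column name.
stem <- "belong"
stem_cols <- cols[grepl(paste0("^", stem), cols)]

# Grouping by pattern: pull the first digit out of each matched name.
wave <- regmatches(stem_cols, regexpr("\\d", stem_cols))

# One row per matched variable, labelled with the group taken from its name.
data.frame(variable = stem_cols, wave = wave)
```

Here the digit in each column name plays the role that a separate grouping column would play when grouping by a variable.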

Usage

select_group_tbl(
  data,
  var_stem,
  group,
  escape_stem = FALSE,
  ignore_stem_case = FALSE,
  group_type = "variable",
  group_name = NULL,
  margins = "all",
  escape_group = FALSE,
  ignore_group_case = FALSE,
  remove_group_non_alnum = TRUE,
  na_removal = "listwise",
  pivot = "longer",
  only = NULL,
  var_labels = NULL,
  ignore = NULL
)

Arguments

data

A data frame.

var_stem

A character string of a variable stem or the full name of a variable in data.

group

A character string representing a variable name or a pattern used to search for variables in data.

escape_stem

A logical value indicating whether to escape var_stem. Default is FALSE.

ignore_stem_case

A logical value indicating whether the search for columns matching the supplied var_stem is case-insensitive. Default is FALSE.
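The effect of escaping and case-insensitive matching can be sketched in base R as follows. The column names are hypothetical, and the use of grepl() with a PCRE literal quote is an assumption chosen for illustration; the package may escape stems differently.

```r
# Hypothetical column names; the stem "q.1" contains a regex metacharacter.
cols <- c("q.1_a", "q.1_b", "qx1_a", "Q.1_c")
stem <- "q.1"

# Unescaped: "." matches any character, so "qx1_a" is picked up too.
unescaped <- cols[grepl(paste0("^", stem), cols)]

# Escaped: \Q...\E (PCRE) makes the stem match literally.
escaped <- cols[grepl(paste0("^\\Q", stem, "\\E"), cols, perl = TRUE)]

# Escaped and case-insensitive: "Q.1_c" now matches as well.
escaped_nocase <- cols[grepl(paste0("^\\Q", stem, "\\E"), cols,
                             perl = TRUE, ignore.case = TRUE)]
```

Escaping matters mainly when a stem contains characters such as ".", "(", or "+" that carry special meaning in regular expressions.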

group_type

A character string that defines how the group argument should be interpreted. Should be one of pattern or variable. Defaults to variable, which searches for a matching variable name in data.

group_name

An optional character string used to rename the group column in the final table. When group_type is set to variable, the column name defaults to the matched variable name from data. When set to pattern, the default column name is group.

margins

A character string that determines how percentage values are calculated; whether they sum to one across rows, columns, or the entire variable (i.e., all). Defaults to all, but can also be set to rows or columns. Note: This argument only affects the final table when group_type is variable.

escape_group

A logical value indicating whether to escape the string supplied to group. Default is FALSE.

ignore_group_case

A logical value specifying whether the search for a grouping variable (if group_type is variable) or for variables matching a pattern (if group_type is pattern) should be case-insensitive. Default is FALSE. Set to TRUE to ignore case.

remove_group_non_alnum

A logical value indicating whether to remove all non-alphanumeric characters (i.e., anything that is not a letter or number) from group. Default is TRUE.

na_removal

A character string that specifies the method for handling missing values: pairwise or listwise. Defaults to listwise.
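The difference between the two methods can be illustrated with a small base R sketch. The data frame is hypothetical, and treating the grouping column as part of the completeness check is an assumption; this is not the package's own implementation.

```r
# Hypothetical data: two items sharing a stem, each with one missing value.
dat <- data.frame(
  item_1 = c("a", NA, "b", "a"),
  item_2 = c("b", "a", NA, "a"),
  grp    = c("x", "x", "y", "y")
)
items <- c("item_1", "item_2")

# Listwise: keep only rows that are complete on every item and the group,
# so every item is summarised from the same set of rows.
listwise <- dat[complete.cases(dat[, c(items, "grp")]), ]
listwise_n <- vapply(items, function(v) nrow(listwise), integer(1))

# Pairwise: for each item, keep rows complete on that item and the group,
# so different items may be summarised from different rows.
pairwise_n <- vapply(
  items,
  function(v) sum(complete.cases(dat[, c(v, "grp")])),
  integer(1)
)
```

In this toy data, listwise deletion leaves two rows for both items, while pairwise deletion leaves three rows for each item, drawn from different rows.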

pivot

A character string that determines the format of the table. By default, longer returns the data in the long format. To receive the data in the wide format, specify wider.

only

A character string or vector of character strings of the types of summary data to return. Default is NULL, which returns both counts and percentages. To return only counts or percentages, use count or percent, respectively.

var_labels

An optional named character vector or list used to assign custom labels to variable names. Each element should be named and correspond to a variable in the returned table. If any element is unnamed or references a variable not returned in the table, all labels will be ignored and the table will be printed without them.

ignore

An optional named vector or list that defines values to exclude from variables matching the specified variable stem and, if applicable, a grouping variable in data. If set to NULL (default), all values are retained. To exclude values from variables identified by var_stem, use the stem name as the key. To exclude multiple values from both var_stem variables and a grouping variable, supply a named list.

Value

A tibble showing the frequency counts and/or percentages of multiple response variables sharing a common variable stem. The statistics are grouped either by a specified grouping variable within the dataset or by a matched pattern in the variable names.

Author

Ama Nyame-Mensah

Examples

select_group_tbl(data = stem_social_psych,
                 var_stem = "belong_belong",
                 group = "\\d",
                 group_type = "pattern",
                 group_name = "wave",
                 na_removal = "pairwise",
                 pivot = "wider",
                 only = "count")
#> # A tibble: 2 × 7
#>   variable         wave  count_value_1 count_value_2 count_value_3 count_value_4
#>   <chr>            <chr>         <int>         <int>         <int>         <int>
#> 1 belong_belongSt… 1                22            50           144           264
#> 2 belong_belongSt… 2                12            12            48           125
#> # ℹ 1 more variable: count_value_5 <int>

tas_recoded <-
  tas |>
  dplyr::mutate(sex = dplyr::case_when(
    sex == 1 ~ "female",
    sex == 2 ~ "male",
    TRUE ~ NA)) |>
  dplyr::mutate(dplyr::across(
    .cols = dplyr::starts_with("involved_"),
    .fns = ~ dplyr::case_when(
      .x == 1 ~ "selected",
      .x == 0 ~ "unselected",
      TRUE ~ NA)
  ))

select_group_tbl(data = tas_recoded,
                 var_stem = "involved_",
                 group = "sex",
                 group_type = "variable",
                 na_removal = "pairwise",
                 pivot = "wider")
#> # A tibble: 12 × 6
#>    variable            values count_sex_female count_sex_male percent_sex_female
#>    <chr>               <chr>             <int>          <int>              <dbl>
#>  1 involved_arts       selec…              212            187             0.0839
#>  2 involved_arts       unsel…              998           1129             0.395 
#>  3 involved_sports     selec…              270            142             0.107 
#>  4 involved_sports     unsel…              940           1174             0.372 
#>  5 involved_schoolClu… selec…              167            185             0.0674
#>  6 involved_schoolClu… unsel…             1021           1106             0.412 
#>  7 involved_election   selec…              531            717             0.233 
#>  8 involved_election   unsel…              538            490             0.236 
#>  9 involved_socialAct… selec…               52             55             0.0206
#> 10 involved_socialAct… unsel…             1158           1261             0.458 
#> 11 involved_volunteer  selec…              370            424             0.146 
#> 12 involved_volunteer  unsel…              840            892             0.333 
#> # ℹ 1 more variable: percent_sex_male <dbl>
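As a check on how the percentages above relate to the counts, the base R sketch below re-derives the involved_arts rows from the printed counts. With the default margins = "all", the four cells for a variable sum to one. The mapping of margins = "rows" and margins = "columns" onto prop.table()'s margin argument is an assumption made for illustration, not something the output above confirms.

```r
# Counts for involved_arts, copied from the printed table above.
arts <- matrix(
  c(212, 998, 187, 1129),
  nrow = 2,
  dimnames = list(values = c("selected", "unselected"),
                  sex = c("female", "male"))
)

# Whole-table proportions: all four cells sum to 1 (margins = "all").
pct_all <- prop.table(arts)
round(pct_all[, "female"], 4)

# Assumed analogue of margins = "rows": each value level sums to 1 across sex.
pct_rows <- prop.table(arts, margin = 1)

# Assumed analogue of margins = "columns": each sex sums to 1 across values.
pct_cols <- prop.table(arts, margin = 2)
```

The female column of pct_all agrees with percent_sex_female in the table above (0.0839 and 0.395) up to rounding.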

depressive_recoded <-
  depressive |>
  dplyr::mutate(sex = dplyr::case_when(
    sex == 1 ~ "male",
    sex == 2 ~ "female",
    TRUE ~ NA)) |>
  dplyr::mutate(dplyr::across(
    .cols = dplyr::starts_with("dep_"),
    .fns = ~ dplyr::case_when(
      .x == 1 ~ "often",
      .x == 2 ~ "sometimes",
      .x == 3 ~ "hardly",
      TRUE ~ NA
    )
  ))

select_group_tbl(data = depressive_recoded,
                 var_stem = "dep",
                 group = "sex",
                 group_type = "variable",
                 na_removal = "listwise",
                 pivot = "wider",
                 only = "percent",
                 var_labels =
                   c("dep_1" = "how often child feels sad and blue",
                     "dep_2" = "how often child feels nervous, tense, or on edge",
                     "dep_3" = "how often child feels happy",
                     "dep_4" = "how often child feels bored",
                     "dep_5" = "how often child feels lonely",
                     "dep_6" = "how often child feels tired or worn out",
                     "dep_7" = "how often child feels excited about something",
                     "dep_8" = "how often child feels too busy to get everything"))
#> # A tibble: 24 × 5
#>    variable variable_label            values percent_sex_female percent_sex_male
#>    <chr>    <chr>                     <chr>               <dbl>            <dbl>
#>  1 dep_1    how often child feels sa… hardly             0.230            0.274 
#>  2 dep_1    how often child feels sa… often              0.0336           0.0342
#>  3 dep_1    how often child feels sa… somet…             0.227            0.202 
#>  4 dep_2    how often child feels ne… hardly             0.217            0.229 
#>  5 dep_2    how often child feels ne… often              0.0386           0.0510
#>  6 dep_2    how often child feels ne… somet…             0.234            0.230 
#>  7 dep_3    how often child feels ha… hardly             0.0156           0.0174
#>  8 dep_3    how often child feels ha… often              0.368            0.355 
#>  9 dep_3    how often child feels ha… somet…             0.106            0.138 
#> 10 dep_4    how often child feels bo… hardly             0.0473           0.0585
#> # ℹ 14 more rows