cat_group_tbl() summarizes nominal or categorical variables by a grouping variable, returning frequency counts and percentages. It supports long or wide output formats, handles missing data, and allows percentage calculations across rows, columns, or the full table.

Usage

cat_group_tbl(
  data,
  row_var,
  col_var,
  margins = "all",
  na.rm.row_var = FALSE,
  na.rm.col_var = FALSE,
  pivot = "longer",
  only = NULL,
  ignore = NULL
)

Arguments

data

A data frame.

row_var

A character string naming a variable in data that contains categorical data. This is the primary categorical variable.

col_var

A character string naming a variable in data that contains categorical data. This is the grouping variable.

margins

A character string that determines how percentage values are calculated: whether they sum to one across rows, across columns, or across the entire table. The default is "all" (the entire table); the other options are "rows" and "columns".

na.rm.row_var

A logical value indicating whether missing values for row_var should be removed before calculations. Default is FALSE.

na.rm.col_var

A logical value indicating whether missing values for col_var should be removed before calculations. Default is FALSE.

pivot

A character string that determines the format of the table. The default, "longer", returns the data in long format. To return the data in wide format, specify "wider".

only

A character string or vector of strings indicating the types of summary data to return. The default is NULL, which returns both counts and percentages. To return only one type, specify "count" or "percent".

ignore

An optional named vector or list that defines values to exclude from row_var and col_var. If set to NULL (default), all values are retained. To exclude multiple values from both row_var and col_var, supply a named list.
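
The margins and pivot options can be illustrated with a rough base R analogue. This is only a sketch and not how cat_group_tbl() is implemented: the toy data frame and its columns grp and yes are hypothetical, and it assumes that "rows" means each row sums to one and "columns" means each column sums to one.

```r
# Hypothetical toy data; not the nlsy data set used in the Examples.
toy <- data.frame(
  grp = c("a", "a", "a", "b", "b", "b", "b", "b"),
  yes = c(1, 0, 0, 1, 1, 1, 0, 0)
)

# Cross-tabulate the two variables.
counts <- table(grp = toy$grp, yes = toy$yes)

# Three ways to normalize, loosely mirroring the margins options.
p_all  <- prop.table(counts)              # like "all": whole table sums to one
p_rows <- prop.table(counts, margin = 1)  # like "rows": each row sums to one
p_cols <- prop.table(counts, margin = 2)  # like "columns": each column sums to one

# Two layouts, loosely mirroring the pivot options.
long <- as.data.frame(counts, responseName = "count")  # one row per grp/yes pair
wide <- as.data.frame.matrix(counts)                   # one row per grp, one column per yes value
```

In the long layout every combination of the two variables gets its own row, which suits further filtering or plotting; the wide layout reads more like a printed contingency table.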

Value

A tibble showing the frequency counts and/or percentages of row_var by col_var. Percentages are expressed as proportions between 0 and 1.

Author

Ama Nyame-Mensah

Examples

cat_group_tbl(data = nlsy,
              row_var = "gender",
              col_var = "bthwht",
              pivot = "wider",
              only = "count")
#> # A tibble: 2 × 3
#>   gender count_bthwht_0 count_bthwht_1
#>    <dbl>          <int>          <int>
#> 1      0           1340            123
#> 2      1           1409            104

cat_group_tbl(data = nlsy,
              row_var = "birthord",
              col_var = "breastfed",
              pivot = "longer")
#> # A tibble: 16 × 4
#>    birthord breastfed count  percent
#>       <dbl>     <dbl> <int>    <dbl>
#>  1        1         0   431 0.145   
#>  2        1         1   614 0.206   
#>  3        2         0   573 0.193   
#>  4        2         1   499 0.168   
#>  5        3         0   319 0.107   
#>  6        3         1   242 0.0813  
#>  7        4         0   115 0.0386  
#>  8        4         1    77 0.0259  
#>  9        5         0    49 0.0165  
#> 10        5         1    23 0.00773 
#> 11        6         0    13 0.00437 
#> 12        6         1    11 0.00370 
#> 13        7         0     7 0.00235 
#> 14        7         1     1 0.000336
#> 15        8         0     1 0.000336
#> 16        8         1     1 0.000336
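
As a quick check on how the percent column relates to count under the default margins = "all", the counts printed above can be divided by their grand total. This sketch only re-uses the numbers shown in that output; it does not call cat_group_tbl().

```r
# Counts copied from the output above, in row order.
count <- c(431, 614, 573, 499, 319, 242, 115, 77,
           49, 23, 13, 11, 7, 1, 1, 1)

# With margins = "all", each percent is the count divided by the grand total.
percent <- count / sum(count)

sum(count)
#> [1] 2976
signif(percent[1:3], 3)
#> [1] 0.145 0.206 0.193
```

The first three values match rows 1 to 3 of the tibble, and the full vector sums to one.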