Summarize two categorical variables — cat_group

cat_group_tbl() summarizes nominal or categorical variables by a grouping variable, returning frequency counts and percentages.

cat_group_tbl(
  data,
  row_var,
  col_var,
  margins = "all",
  na.rm.row_var = FALSE,
  na.rm.col_var = FALSE,
  pivot = "longer",
  only = NULL,
  ignore = NULL
)

Arguments

data: A data frame.
row_var: A character string naming a variable in data that contains categorical data. This is the primary categorical variable.
col_var: A character string naming a variable in data that contains categorical data. This is the secondary categorical variable.
margins: A character string that determines how percentages are calculated: whether they sum to one across rows ("rows"), across columns ("columns"), or across the entire table ("all"). Defaults to "all".
na.rm.row_var: A logical value indicating whether missing values for row_var should be removed before calculations. Default is FALSE.
na.rm.col_var: A logical value indicating whether missing values for col_var should be removed before calculations. Default is FALSE.
pivot: A character string that determines the format of the returned table. The default, "longer", returns the data in long format. To return the data in wide format, specify "wider".
only: A character string or vector of character strings specifying which summary statistics to return. The default, NULL, returns both counts and percentages. To return only counts or only percentages, use "count" or "percent", respectively.
ignore: An optional named vector or list that defines values to exclude from row_var and col_var. If set to NULL (default), all values are retained. To exclude multiple values from row_var or col_var, provide them as a named list.
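
The three margins settings can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: it uses table() and prop.table() on a made-up data frame (the names toy, grp, and ans are hypothetical) to show what it means for percentages to sum to one across the entire table, within each row, or within each column.

```r
# Toy data; `toy`, `grp`, and `ans` are made-up names for this sketch.
toy <- data.frame(
  grp = c("a", "a", "a", "b", "b", "b", "b", "b"),
  ans = c("no", "yes", "yes", "no", "no", "no", "yes", "yes")
)

# Two-way frequency counts: rows are grp, columns are ans.
counts <- table(toy$grp, toy$ans)

# Entire table: all cells together sum to one.
pct_all <- prop.table(counts)

# Within rows: each row sums to one.
pct_rows <- prop.table(counts, margin = 1)

# Within columns: each column sums to one.
pct_cols <- prop.table(counts, margin = 2)

print(pct_all)
print(pct_rows)
print(pct_cols)
```

The counts in each cell are the same in all three cases; only the denominator changes (the grand total, the row total, or the column total).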

Value

A tibble showing the count and percentage of each category in row_var by each category in col_var.

Author

Ama Nyame-Mensah

Examples

cat_group_tbl(data = nlsy,
              row_var = "gender",
              col_var = "bthwht",
              pivot = "wider",
              only = "count")
#> # A tibble: 2 × 3
#>   gender count_bthwht_0 count_bthwht_1
#>    <dbl>          <int>          <int>
#> 1      0           1340            123
#> 2      1           1409            104
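
To make the relationship between the wide and long formats concrete, the sketch below rebuilds the wide table above by hand and rearranges it into one row per gender and bthwht combination using base R only. It is an illustration of the two layouts, not output from cat_group_tbl(). The column layout of the long version (gender, bthwht, count) is assumed from the long-format example on this page.

```r
# Counts copied from the wide-format output above.
wide <- data.frame(
  gender = c(0, 1),
  count_bthwht_0 = c(1340L, 1409L),
  count_bthwht_1 = c(123L, 104L)
)

# Stack the two count columns into one, recording which bthwht value
# each count belongs to.
long <- data.frame(
  gender = rep(wide$gender, times = 2),
  bthwht = rep(c(0, 1), each = nrow(wide)),
  count  = c(wide$count_bthwht_0, wide$count_bthwht_1)
)

# Order rows by gender, then bthwht, as in the long-format example.
long <- long[order(long$gender, long$bthwht), ]
rownames(long) <- NULL

print(long)
```

Both layouts carry the same four counts. The wide format has one row per row_var category, and the long format has one row per combination of categories.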

cat_group_tbl(data = nlsy,
              row_var = "birthord",
              col_var = "breastfed",
              pivot = "longer")
#> # A tibble: 16 × 4
#>    birthord breastfed count  percent
#>       <dbl>     <dbl> <int>    <dbl>
#>  1        1         0   431 0.145   
#>  2        1         1   614 0.206   
#>  3        2         0   573 0.193   
#>  4        2         1   499 0.168   
#>  5        3         0   319 0.107   
#>  6        3         1   242 0.0813  
#>  7        4         0   115 0.0386  
#>  8        4         1    77 0.0259  
#>  9        5         0    49 0.0165  
#> 10        5         1    23 0.00773 
#> 11        6         0    13 0.00437 
#> 12        6         1    11 0.00370 
#> 13        7         0     7 0.00235 
#> 14        7         1     1 0.000336
#> 15        8         0     1 0.000336
#> 16        8         1     1 0.000336
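
The percent column in the output above is consistent with the default margins = "all": each value equals the cell count divided by the total count across the whole table. The short check below recomputes the percentages from the printed counts using base R. It is a sanity check on the displayed numbers, not the package's internal code.

```r
# Counts copied from the long-format output above, in row order.
counts <- c(431, 614, 573, 499, 319, 242, 115, 77,
            49, 23, 13, 11, 7, 1, 1, 1)

# Grand total across all birthord-by-breastfed cells.
total <- sum(counts)

# Each cell's share of the grand total.
percent <- counts / total

# Round to three significant digits, as in the tibble display.
print(total)
print(signif(percent, 3))
print(sum(percent))
```

With these counts the total is 2976, so the first row works out to 431 / 2976, or about 0.145, matching the printed value.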