Summarize a categorical variable by a grouping variable
cat_group_tbl() summarizes nominal or categorical variables by a grouping
variable, returning frequency counts and percentages. It supports long or wide
output formats, handles missing data, and allows percentage calculations across
rows, columns, or the full table.
Usage
cat_group_tbl(
  data,
  row_var,
  col_var,
  margins = "all",
  na.rm.row_var = FALSE,
  na.rm.col_var = FALSE,
  pivot = "longer",
  only = NULL,
  ignore = NULL
)
Arguments
- data
  A data frame.
- row_var
  A character string of the name of a variable in data containing categorical data. This is the primary categorical variable.
- col_var
  A character string of the name of a variable in data containing categorical data.
- margins
  A character string that determines how percentage values are calculated; whether they sum to one across rows, columns, or the entire table (i.e., all). Defaults to "all", but can also be set to "rows" or "columns".
- na.rm.row_var
  A logical value indicating whether missing values for row_var should be removed before calculations. Default is FALSE.
- na.rm.col_var
  A logical value indicating whether missing values for col_var should be removed before calculations. Default is FALSE.
- pivot
  A character string that determines the format of the table. By default, "longer" returns the data in the long format. To return the data in the wide format, specify "wider".
- only
  A character string or vector of strings indicating the types of summary data to return. The default is NULL, which includes both counts and percentages. To return only one type, specify "count" or "percent".
- ignore
  An optional named vector or list that defines values to exclude from row_var and col_var. If set to NULL (default), all values are retained. To exclude multiple values from both row_var and col_var, supply a named list.
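The three margins settings differ only in the denominator used for each percentage. The sketch below illustrates that idea with base R's prop.table() on the gender by bthwht counts shown in the first example. It is an illustration of the arithmetic under that assumption, not the function's own implementation, and the object names (counts, pct_all, pct_rows, pct_cols) are made up for this sketch.

```r
# Counts taken from the first example's output (gender x bthwht).
counts <- matrix(
  c(1340, 1409, 123, 104),
  nrow = 2,
  dimnames = list(gender = c("0", "1"), bthwht = c("0", "1"))
)

# Analogue of margins = "all": every cell is divided by the grand total.
pct_all <- prop.table(counts)

# Analogue of margins = "rows": each cell is divided by its row total.
pct_rows <- prop.table(counts, margin = 1)

# Analogue of margins = "columns": each cell is divided by its column total.
pct_cols <- prop.table(counts, margin = 2)

sum(pct_all)       # whole table sums to one
rowSums(pct_rows)  # each row sums to one
colSums(pct_cols)  # each column sums to one
```

With "all", percentages describe each cell's share of the whole table; "rows" and "columns" are the better choice when comparing distributions within one level of row_var or col_var.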
Examples
cat_group_tbl(data = nlsy,
              row_var = "gender",
              col_var = "bthwht",
              pivot = "wider",
              only = "count")
#> # A tibble: 2 × 3
#>   gender count_bthwht_0 count_bthwht_1
#>    <dbl>          <int>          <int>
#> 1      0           1340            123
#> 2      1           1409            104
cat_group_tbl(data = nlsy,
              row_var = "birthord",
              col_var = "breastfed",
              pivot = "longer")
#> # A tibble: 16 × 4
#>    birthord breastfed count percent
#>       <dbl>     <dbl> <int>   <dbl>
#>  1        1         0   431 0.145
#>  2        1         1   614 0.206
#>  3        2         0   573 0.193
#>  4        2         1   499 0.168
#>  5        3         0   319 0.107
#>  6        3         1   242 0.0813
#>  7        4         0   115 0.0386
#>  8        4         1    77 0.0259
#>  9        5         0    49 0.0165
#> 10        5         1    23 0.00773
#> 11        6         0    13 0.00437
#> 12        6         1    11 0.00370
#> 13        7         0     7 0.00235
#> 14        7         1     1 0.000336
#> 15        8         0     1 0.000336
#> 16        8         1     1 0.000336
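
The long and wide layouts carry the same information. As a rough illustration of how they relate, the sketch below rebuilds the wide table from the first example out of an equivalent long table using base R's reshape(). This is not how cat_group_tbl() pivots internally; the long and wide objects are hypothetical and only the counts come from the example output above.

```r
# Long layout: one row per gender x bthwht combination
# (counts copied from the first example's output).
long <- data.frame(
  gender = c(0, 0, 1, 1),
  bthwht = c(0, 1, 0, 1),
  count  = c(1340, 123, 1409, 104)
)

# Wide layout: one row per gender, one count column per bthwht level.
# The separator is chosen so the column names match the example output.
wide <- reshape(
  long,
  idvar     = "gender",
  timevar   = "bthwht",
  direction = "wide",
  sep       = "_bthwht_"
)
rownames(wide) <- NULL

names(wide)
```

The long layout is usually easier to filter or plot, while the wide layout reads more like a conventional cross-tabulation.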