forcats
The forcats package contains tools for working with categorical data in R (see EDAV - Categorical Data and R Type#Factors).
Recode Factor Levels
To recode the levels, do not use levels(x) <- c("A", "B"); use fct_recode() in forcats instead.
library(forcats)
x <- factor(c("G234", "G452", "G136"))
y <- fct_recode(x, Physics = "G234", Math = "G452", Chemistry = "G136")
y
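One reason to avoid levels(x) <- ... is that the assignment is positional: labels are matched to the existing level order (alphabetical by default), not to the order of the data. The sketch below reuses the same x; the names x_bad and x_good are made up for illustration.

```r
library(forcats)

x <- factor(c("G234", "G452", "G136"))
levels(x)  # default level order is alphabetical: "G136" "G234" "G452"

# Positional assignment: labels are matched to the sorted levels,
# so writing them in data order silently mislabels every value.
x_bad <- x
levels(x_bad) <- c("Physics", "Math", "Chemistry")
as.character(x_bad)

# fct_recode() maps new = "old" by name, so the level order does not matter.
x_good <- fct_recode(x, Physics = "G234", Math = "G452", Chemistry = "G136")
as.character(x_good)
```

Comparing the two as.character() results shows that only the fct_recode() version keeps "G234" paired with Physics.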
Reorder Factor Levels
- fct_inorder(): by the order in which they first appear
- fct_infreq(): by number of observations with each level (largest first)
  - useful for unbinned data
- fct_inseq(): by numeric value of level
- fct_relevel(x, level1, level2, after = 4): by hand
  - without after, the listed levels are moved to the beginning
  - with after = Inf, the listed levels are moved to the end
- fct_rev(): reverse the current order
- fct_reorder(x, y): by sorting along another variable
  - accepts .desc (default .desc = FALSE, ascending; use .desc = TRUE for descending order)
- fct_reorder2(x, y, z): by sorting along another 2 variables
All the above functions return new factors.
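A minimal sketch of the reordering functions on a toy factor may help. The data in x, y, and n and the lv_* variable names are invented for illustration; only the level order is inspected.

```r
library(forcats)

x <- factor(c("b", "c", "a", "c", "c", "b"))
y <- c(2, 30, 100, 10, 20, 4)   # made-up values, one per element of x

levels(x)                                  # default: alphabetical
lv_inorder <- levels(fct_inorder(x))       # order of first appearance
lv_infreq  <- levels(fct_infreq(x))        # most frequent level first
lv_rev     <- levels(fct_rev(x))           # current order reversed
lv_front   <- levels(fct_relevel(x, "c"))  # no `after`: "c" moves to the front
lv_back    <- levels(fct_relevel(x, "a", after = Inf))  # "a" moves to the end
lv_by_y    <- levels(fct_reorder(x, y))    # ascending by the median of y per level
lv_by_y_d  <- levels(fct_reorder(x, y, .desc = TRUE))   # descending instead

# fct_inseq() needs levels that look like numbers
n <- factor(c("10", "2", "1"))
lv_inseq <- levels(fct_inseq(n))

levels(x)  # unchanged: each call above returned a new factor
```

Note that fct_reorder() summarises y within each level before sorting; the median is its default summary function.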
Include NA
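When a factor contains NAs, NA does not form a level. When plotting, NA is always placed at the end of the axis (at the top of a flipped, top-down graph and at the far right of a left-right graph). We can use fct_explicit_na(x, "NA") to make NA a real level with the name "NA"; in forcats 1.0.0 and later this function is deprecated in favor of fct_na_value_to_level(x, level = "NA"). Then we can reorder the levels, including "NA":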
library(dplyr)    # mutate(), %>%
library(ggplot2)  # ggplot(), geom_col(), coord_flip()
library(forcats)

df <- data.frame(temperature = factor(c("cold", "warm", "hot", NA)),
                 count = c(15, 5, 22, 12))
df %>%
  mutate(temperature = fct_explicit_na(temperature, "NA") %>% # try commenting out this line and the next one
           fct_relevel("NA", "hot", "warm", "cold")) %>%
  ggplot(aes(x = temperature, y = count)) +
  geom_col() +
  coord_flip()
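The effect on the levels can also be checked without plotting. The following is a sketch that assumes forcats 1.0.0 or later, where fct_na_value_to_level() is the non-deprecated counterpart of fct_explicit_na(); the variable names with_na and releveled are made up here.

```r
library(forcats)

temperature <- factor(c("cold", "warm", "hot", NA))
levels(temperature)   # NA is not among the levels

# Assumes forcats >= 1.0.0; on older versions use fct_explicit_na(temperature, "NA")
with_na <- fct_na_value_to_level(temperature, level = "NA")
levels(with_na)       # now includes a real level named "NA"

# The new level can be positioned like any other level
releveled <- fct_relevel(with_na, "NA", "hot", "warm", "cold")
levels(releveled)
```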
Lumping
fct_lump_*: lump together factor levels into "other"
- fct_lump_min(): lumps levels that appear fewer than min times
  - cc |> mutate(continent_new = fct_lump_min(continent, 30)) puts all continents with fewer than 30 countries (items) into the "Other" category (level)
- fct_lump_prop(): lumps levels that appear fewer than prop * n times, where n is the size of the data
- fct_lump_n(): lumps all levels except for the n most frequent (or least frequent if n < 0)
- fct_lump_lowfreq(): lumps together the least frequent levels, ensuring that "other" is still the smallest level
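As a rough illustration of how the lumping variants differ, the sketch below applies three of them to a small made-up factor (the data and the lumped_* names are hypothetical, not taken from the cc data set mentioned above).

```r
library(forcats)

# Hypothetical data: 10 observations over 4 levels (A: 5, B: 3, C: 1, D: 1)
x <- factor(c(rep("A", 5), rep("B", 3), "C", "D"))

lumped_min <- fct_lump_min(x, min = 2)       # C and D appear fewer than 2 times
table(lumped_min)

lumped_n <- fct_lump_n(x, n = 1)             # keep only the single most frequent level
table(lumped_n)

lumped_prop <- fct_lump_prop(x, prop = 0.2)  # C and D each make up 10% of the data
table(lumped_prop)
```

In each case the lumped levels are collected under the default label "Other", which can be changed with the other_level argument.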