Bar Chart
A bar chart is a chart or graph that presents categorical data (see EDAV - Categorical Data) with rectangular bars whose heights or lengths are proportional to the values they represent. The bars can be plotted vertically or horizontally. A vertical bar chart is sometimes called a column chart.
One axis of the chart shows the specific categories being compared, and the other axis represents a measured value.
Some bar graphs present bars clustered in groups of more than one, showing the values of more than one measured variable.
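A minimal sketch of the vertical and horizontal forms in ggplot2, assuming the mpg dataset that ships with ggplot2 (one row per car model, with a categorical class column). Mapping the category to y instead of x produces horizontal bars; this relies on the automatic orientation detection added in ggplot2 3.3.0, and older versions would need coord_flip() instead.

```r
library(ggplot2)

# mpg ships with ggplot2: one row per car model, categorical `class` column
p_vertical   <- ggplot(mpg, aes(x = class)) + geom_bar()  # column chart
p_horizontal <- ggplot(mpg, aes(y = class)) + geom_bar()  # horizontal bars (ggplot2 >= 3.3.0)
```

Print either object to draw it.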
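- For unbinned data (one row per observation), use geom_bar in ggplot2; it counts the rows in each category.
- For binned data (counts already computed), use geom_col in ggplot2; the bar heights come from a y variable.
- Use geom_text to put labels directly in or on the bars.
- Do not color bars when it is not necessary. Color is needed for multivariate categorical data (see For Multivariate Categorical Data).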
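The three functions above can be compared in a short sketch, assuming the mpg dataset from ggplot2 as the unbinned data and a count table built from it with dplyr's count() as the binned data. Both plots draw the same bar heights; the binned version also labels each bar with geom_text. The vjust value is just one reasonable choice for placing labels slightly above the bars.

```r
library(dplyr)
library(ggplot2)

# Unbinned: one row per car; geom_bar() counts the rows in each class
p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# Binned: one row per class with a precomputed count in column n
class_counts <- mpg %>% count(class)
p_col <- ggplot(class_counts, aes(x = class, y = n)) +
  geom_col() +                              # bar height taken directly from n
  geom_text(aes(label = n), vjust = -0.3)   # count label just above each bar
```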
Ordering
A categorical variable is typically on one of two scales: ordinal or nominal.
For ordinal data, sort bars in the logical order of the categories. For example, novice -> advanced -> expert.
You can sort in either direction (top <-> bottom or left <-> right).
For nominal data, sort bars from highest to lowest (top -> bottom, left -> right).
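Both ordering rules can be sketched in ggplot2. The skill-level responses below are hypothetical illustrative data; the ordinal order is fixed with explicit factor levels, while the nominal example (mpg from ggplot2) uses forcats::fct_infreq(), which orders levels from most to least frequent.

```r
library(forcats)
library(ggplot2)

# Hypothetical responses (illustrative only, not real survey data)
skills <- data.frame(level = c("expert", "novice", "advanced", "novice", "advanced", "novice"))

# Ordinal: set the logical order explicitly via factor levels
skills$level <- factor(skills$level, levels = c("novice", "advanced", "expert"))
p_ordinal <- ggplot(skills, aes(x = level)) + geom_bar()

# Nominal: order bars from most to least frequent
p_nominal <- ggplot(mpg, aes(x = fct_infreq(class))) + geom_bar()
```

For a top-to-bottom horizontal layout, the order can be flipped with fct_rev(), since ggplot2 draws the first level at the bottom of a y axis.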
Rebinning
With geom_col, multiple rows that belong to the same category are automatically stacked into a single bar.
However, when color is added (for example an outline), each stacked row is colored as its own segment, so the bar no longer looks like a single bin.
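A small sketch of this stacking effect, assuming the built-in Titanic table converted with as.data.frame(), which gives one row per Class/Sex/Age/Survived combination with counts in Freq. The white outline is an arbitrary choice that makes the per-row segments visible inside each Class bar.

```r
library(ggplot2)

# Built-in 4-way table -> one row per Class/Sex/Age/Survived combination
titanic <- as.data.frame(Titanic)

# Not rebinned: several rows per Class are stacked into one bar, and the
# outline draws each row as a separate segment
p_unbinned <- ggplot(titanic, aes(x = Class, y = Freq)) +
  geom_col(color = "white")
```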
To bin the data by hand, use summarise() (or its alias summarize()) from dplyr after group_by() on the category column:
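library(dplyr)    # group_by(), summarize()
library(forcats)  # fct_reorder()
library(ggplot2)
df <- as.data.frame(Titanic)  # assumed example data: counts in Freq, category in Class
df %>%
  group_by(Class) %>%
  summarize(Freq = sum(Freq)) %>%  # collapse to one row (bin) per Class
  ggplot(aes(fct_reorder(Class, Freq, .desc = TRUE), Freq)) +  # nominal: highest to lowest
  geom_col() +
  xlab("")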
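The same rebinning can also be written with dplyr's count(), whose wt argument sums a weight column per group. This sketch assumes the same Titanic-derived data and only checks that both forms produce the same table; which one to use is a matter of taste.

```r
library(dplyr)

titanic <- as.data.frame(Titanic)

# group_by() + summarize(): explicit sum of Freq per Class
by_summarize <- titanic %>% group_by(Class) %>% summarize(Freq = sum(Freq))

# count() with wt = Freq: the same per-Class sum in one call
by_count <- titanic %>% count(Class, wt = Freq, name = "Freq")
```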
For Multivariate Categorical Data
For multivariate categorical data, instead of drawing multiple bar charts (possible with facets), we can put the bars for different categories together in one chart:
- Stacked bar chart
  - Better for comparing the overall (total) value
- Grouped bar chart
  - Better for comparing the individual values
  - Grouped bar charts are less commonly used, but more often recommended
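A sketch of both layouts in ggplot2, assuming the Titanic counts rebinned to one row per Class and Survived combination. Rebinning first matters for the grouped version: position = "dodge" places bars side by side but does not sum rows, so unbinned rows would overlap. Color is used here because the second variable (Survived) needs it.

```r
library(dplyr)
library(ggplot2)

# Rebin to one row per Class x Survived combination (8 rows)
survival <- as.data.frame(Titanic) %>%
  group_by(Class, Survived) %>%
  summarize(Freq = sum(Freq), .groups = "drop")

# Stacked: one bar per Class, split by Survived; totals are easy to read
p_stacked <- ggplot(survival, aes(x = Class, y = Freq, fill = Survived)) +
  geom_col()                      # default position = "stack"

# Grouped: Survived bars side by side within each Class
p_grouped <- ggplot(survival, aes(x = Class, y = Freq, fill = Survived)) +
  geom_col(position = "dodge")
```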