Categorical Data

Categorical data is data whose values fall into a limited set of groups or categories.

Graphs

Points vs Areas

Some suggest plotting each case as an individual point (not the same as a Cleveland Dot Plot) and jittering the points to keep them apart. This does not work well for high-frequency groups, whose densities are hard to assess, and the displays for low-frequency groups may show patterns that do not exist, produced purely by the random jittering.
Nevertheless, as always with exploratory graphics, if a graphic helps to uncover information, it is worth using.
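
To make the comparison concrete, here is a minimal base R sketch that draws the same data once as jittered points and once as bars. The data (`group`, with one large and one tiny category) and the jitter amount are made up for illustration.

```r
# Hypothetical data: one high-frequency group and one low-frequency group.
set.seed(1)  # jitter is random; the seed is fixed only for reproducibility
group <- factor(c(rep("common", 500), rep("rare", 5)))

# Point display: one jittered point per case.
x <- jitter(as.integer(group), amount = 0.2)
y <- jitter(rep(0, length(group)), amount = 0.2)
plot(x, y, xaxt = "n", yaxt = "n", xlab = "", ylab = "", pch = 16, cex = 0.6)
axis(1, at = seq_along(levels(group)), labels = levels(group))

# Area display: bar heights encode the counts directly.
counts <- table(group)
barplot(counts)
```

In the point display the "common" cloud saturates, so its density is hard to read, while the five "rare" points form an arrangement that changes with every seed.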

Ordinal vs Nominal

The nominal scale and the ordinal scale are the two typical measurement scales for categorical variables: nominal categories have no inherent order, while ordinal categories do.
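
In R the two scales map onto unordered and ordered factors. A small sketch with made-up variables (`colour`, `size`):

```r
# Nominal: the categories have no order.
colour <- factor(c("red", "blue", "green", "blue"))

# Ordinal: declare the order of the levels explicitly.
size <- factor(c("small", "large", "medium", "small"),
               levels = c("small", "medium", "large"),
               ordered = TRUE)

is.ordered(colour)  # FALSE
is.ordered(size)    # TRUE

# Order comparisons are meaningful only for ordered factors.
smaller_than_large <- size < "large"
```

Plotting functions generally draw categories in level order, so setting `levels` matters for ordinal variables even when no comparison is made.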

Discrete Data

A variable with discrete values and a small range is not strictly categorical, but we can treat it as a categorical variable.
In this case, a Bar Chart is the same as a Histogram if each bin in the histogram contains only one value.
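
The equivalence can be checked numerically. The sketch below uses simulated Poisson counts (an assumption, any small-range integer variable would do) and compares bar-chart counts with histogram counts when each bin covers exactly one value.

```r
set.seed(42)
x <- rpois(200, lambda = 2)  # hypothetical small-range discrete variable

# Bar chart counts: one bar per integer value, including empty ones.
vals <- min(x):max(x)
bar_counts <- as.vector(table(factor(x, levels = vals)))

# Histogram with unit-width bins centred on each integer.
h <- hist(x, breaks = seq(min(x) - 0.5, max(x) + 0.5, by = 1), plot = FALSE)

same <- all(bar_counts == h$counts)  # TRUE: the two displays carry the same heights
```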

Features

Facets

Sometimes categories can be further divided into groups whose values lie on different scales. We can use facets with different scales if needed. See ggplot2#^32e6e9 for code. Usually, however, we should not use different scales within one plot.
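
As a rough illustration (not necessarily the code the referenced note contains), free scales can be requested in ggplot2 through the `scales` argument of `facet_wrap()`. The data frame `df` and its columns are hypothetical.

```r
library(ggplot2)

# Hypothetical counts: two groups with different categories and magnitudes.
df <- data.frame(
  group    = c("A", "A", "A", "B", "B"),
  category = c("a1", "a2", "a3", "b1", "b2"),
  n        = c(1200, 900, 700, 12, 7)
)

p <- ggplot(df, aes(x = category, y = n)) +
  geom_col() +
  # "free" frees both axes; "free_x" or "free_y" frees only one of them.
  facet_wrap(~ group, scales = "free")
p
```

With the default `scales = "fixed"`, the bars of group B would be nearly invisible next to those of group A, which is the case where free scales may be worth their cost.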

Top/Bottom-Coded Data

When there are too many categories to present, we can combine the top or bottom categories into an "or more" category. However, the "or more" category may then contain too much data to deserve that label, so a smart cut is needed. For example, when two adjacent bars are similar, it is not reasonable to cut between them; look for a "jump" instead.
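
One possible way to look for such a jump, sketched in base R: take the largest drop between adjacent counts as the cut point. Both the counts and the "largest drop" rule are assumptions made for illustration; in practice the cut is a judgment call.

```r
# Hypothetical counts of a discrete variable with a long right tail.
counts <- c("0" = 410, "1" = 380, "2" = 350, "3" = 90, "4" = 40, "5" = 15, "6" = 5)
values <- as.integer(names(counts))

# Heuristic: cut where the count drops the most between neighbours.
drops  <- diff(unname(counts))
cut_at <- values[which.min(drops) + 1]  # first value of the combined category

# Combine everything from cut_at upwards into one "or more" category.
label <- ifelse(values >= cut_at, paste(cut_at, "or more"), as.character(values))
label <- factor(label, levels = unique(label))
top_coded <- tapply(counts, label, sum)
```

Here the bars for 0, 1 and 2 are similar, so cutting between them would be arbitrary; the jump sits between 2 and 3.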

Data Formats

Conversions:

| From \ To | cases | counts | table |
| --- | --- | --- | --- |
| cases | - | as.data.frame(table()) or group_by() %>% summarise(Freq = n()) | table() |
| counts | link | - | xtabs() |
| table | link | as.data.frame() | - |
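
A short base R round trip through the three formats, using a made-up data frame `cases`. The counts-to-cases step uses row repetition with `rep()`, which is one common approach and an assumption here; it is not necessarily what the linked material describes.

```r
# Hypothetical case-level data: one row per observation.
cases <- data.frame(
  sex    = c("F", "F", "M", "M", "M", "F"),
  smoker = c("no", "yes", "no", "no", "yes", "no")
)

tab    <- table(cases)                               # cases  -> table
counts <- as.data.frame(tab)                         # table  -> counts (adds Freq)
tab2   <- xtabs(Freq ~ sex + smoker, data = counts)  # counts -> table

# counts -> cases: repeat each row Freq times (assumed approach).
cases2 <- counts[rep(seq_len(nrow(counts)), counts$Freq), c("sex", "smoker")]
```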

Likert Data

Likert data is a special kind of categorical data, recorded on a psychometric scale commonly used in questionnaires. For example, a typical five-point item ranges from "strongly disagree" to "strongly agree".

Relative frequency stacked Bar Charts are used to present this kind of data.

Colors play an important role in presenting this kind of data: we use a neutral color to present a neutral category, and use two different sets of colors for categories on two sides.

Another type of Bar Chart, the diverging stacked bar chart, is sometimes more suitable. It aligns the bars so that the neutral category is always in the center, which makes the overall inclination of the responses stand out.

Furthermore, we can separate and even remove the neutral category.
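
The quantities behind both chart types can be sketched in base R. The response counts, question names, and colors below are invented for illustration, and only the relative-frequency version is actually drawn; for the diverging version the sketch just computes where each bar would start.

```r
# Hypothetical responses to two questions on a five-point scale.
lv <- c("Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree")
counts <- rbind(
  Q1 = c(5, 10, 20, 40, 25),
  Q2 = c(30, 30, 20, 15, 5)
)
colnames(counts) <- lv

# Relative frequencies: each row sums to 1.
rel <- prop.table(counts, margin = 1)

# Neutral grey in the middle, two different hues for the two sides.
cols <- c("#ca0020", "#f4a582", "#bdbdbd", "#92c5de", "#0571b0")
barplot(t(rel), horiz = TRUE, col = cols, legend.text = TRUE)

# Diverging layout: start each bar so that the middle of the
# neutral segment sits at 0.
left_edge <- -(rel[, "Strongly disagree"] + rel[, "Disagree"] + rel[, "Neutral"] / 2)
```

A more negative `left_edge` means the bar extends further toward the disagreeing side, which is how the diverging layout shows the leaning of each question.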

Combine Continuous Variables and Categorical Variables

When combining Continuous Variables and Categorical Variables, we should consider

Creative Commons License by zcysxy