How to Visualize Categorical Data (With Examples)
Multivariate data visualization is the presentation of more than two variables in a graphical format. This type of data can be difficult to interpret if not displayed correctly. Categorical data, specifically, can be challenging to visualize because it is often non-numeric.
In this article, we’ll give you a quick overview of categorical data, how to visualize it using the most popular methods, and the best tools your business can use to visualize this data.
What Is Categorical Data?
Categorical data can be:
- Nominal: Nominal data are those that can only be classified, but not ranked in any way. For example, gender (male/female) is a type of categorical data that is nominal. We can only say that males and females are different, but we can’t say that one is better or worse than the other. Similarly, eye color (brown, blue, green) is a type of categorical data that is also nominal.
- Ordinal: Ordinal data are those that can be classified and ranked. For example, if we were to ask people to rate their satisfaction with a new product on a scale from 1 to 5, with 1 being very unsatisfied and 5 being very satisfied, we would be collecting ordinal data. In this case, we could say that a rating of 3 is better than a rating of 2, but we couldn’t say that a rating of 3 is twice as good as a rating of 2.
For visualization, the main difference between the two types of categorical data is ordering. Ordinal data has a natural order (for example, 1 through 5) that a chart should preserve. Nominal data has no inherent order, so you are free to arrange the categories however reads best, such as alphabetically or from most to least frequent.
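To make the ordering point concrete, here is a minimal Python sketch using only the standard library. The responses and the five-level satisfaction scale are hypothetical values made up for illustration. It orders nominal categories by frequency (one common convention, not a rule) and orders ordinal categories by their position on the scale.

```python
from collections import Counter

# Hypothetical survey responses (illustrative values, not real data).
eye_colors = ["Brown", "Blue", "Brown", "Green", "Brown", "Blue"]
ratings = ["Satisfied", "Neutral", "Very satisfied", "Neutral", "Satisfied", "Neutral"]

# Nominal: no inherent order, so sorting by frequency is a common choice.
eye_counts = Counter(eye_colors)
nominal_order = [label for label, _ in eye_counts.most_common()]

# Ordinal: the scale defines the order, regardless of how often each level occurs.
SCALE = ["Very unsatisfied", "Unsatisfied", "Neutral", "Satisfied", "Very satisfied"]
rating_counts = Counter(ratings)
ordinal_order = [level for level in SCALE if level in rating_counts]

print(nominal_order)  # ['Brown', 'Blue', 'Green']
print(ordinal_order)  # ['Neutral', 'Satisfied', 'Very satisfied']
```

Note that "Neutral" is the most frequent rating, yet it still appears first only because of its position on the scale.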
Categorical data typically comes in one of three forms:
- Raw data (e.g. survey responses, individual observations)
- Aggregated data (e.g. means, totals, counts)
- Cross-tabulated data (e.g. contingency tables)
Raw data from a survey or individual observations can be difficult to visualize because it is often non-numeric. Here is what raw data about people’s features would look like:
##     Hair   Eye    Sex
## 1 Blonde Hazel Female
## 2  Brown Brown Female
## 3    Red Brown Female
## 4  Brown  Blue Female
## 5  Brown Green   Male
## 6 Blonde  Blue   Male
You can aggregate raw categorical data into counts or proportions. This can be helpful if you want to compare how common different categories are.
library(dplyr)  # provides count()
agg <- count(raw, Hair, Eye, Sex)  # one row per Hair/Eye/Sex combination
agg
## # A tibble: 6 x 4
##   Hair   Eye   Sex        n
##   <fct>  <fct> <fct>  <int>
## 1 Blonde Hazel Female    31
## 2 Brown  Brown Female    32
## 3 Red    Brown Female    12
## 4 Brown  Blue  Female     8
## 5 Brown  Hazel Male      10
## 6 Blonde Blue  Male       7
Using count() from the dplyr package, we can aggregate our raw data into counts. Printing agg shows us what the aggregated data looks like, with the count for each combination in the n column.
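For readers who work in Python rather than R, the same counting step can be sketched with the standard library. This is an illustrative analogue, not the dplyr implementation. It uses only the six raw rows shown earlier, so the counts are much smaller than in the aggregated table.

```python
from collections import Counter

# The six raw rows shown earlier, as (hair, eye, sex) tuples.
raw = [
    ("Blonde", "Hazel", "Female"),
    ("Brown", "Brown", "Female"),
    ("Red", "Brown", "Female"),
    ("Brown", "Blue", "Female"),
    ("Brown", "Green", "Male"),
    ("Blonde", "Blue", "Male"),
]

# Count how often each (hair, eye, sex) combination occurs.
agg = Counter(raw)

# Proportions: divide each count by the total number of observations.
total = sum(agg.values())
proportions = {combo: n / total for combo, n in agg.items()}

# Counting a single variable works the same way.
hair_counts = Counter(hair for hair, _eye, _sex in raw)
print(hair_counts)
```

Counts answer "how many," while proportions make categories comparable across samples of different sizes.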
Another way to summarize categorical data is to cross-tabulate it. This means that you create a table showing how two or more variables are related.
Using xtabs, we can cross-tabulate the Hair and Eye colors of our aggregate data:
xtabs(n ~ Hair + Eye, data = agg)
##         Eye
## Hair     Blue Brown Hazel
##   Blonde    7     0    31
##   Brown     8    32    10
##   Red       0    12     0
The table shows us that there are seven people with blonde hair and blue eyes, 31 people with blonde hair and hazel eyes, and so on. A zero means that combination does not appear in the data.
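The same cross-tabulation can be sketched in plain Python to show what happens under the hood: the counts are summed for every hair and eye pair, and sex is ignored. This is a minimal illustration and not how xtabs is implemented. The rows are copied from the aggregated table above.

```python
from collections import defaultdict

# Aggregated counts copied from the table above: (hair, eye, sex, n).
agg = [
    ("Blonde", "Hazel", "Female", 31),
    ("Brown", "Brown", "Female", 32),
    ("Red", "Brown", "Female", 12),
    ("Brown", "Blue", "Female", 8),
    ("Brown", "Hazel", "Male", 10),
    ("Blonde", "Blue", "Male", 7),
]

# Sum n for each (hair, eye) pair, ignoring sex.
crosstab = defaultdict(int)
for hair, eye, _sex, n in agg:
    crosstab[(hair, eye)] += n

hairs = sorted({row[0] for row in agg})
eyes = sorted({row[1] for row in agg})

# Print one row per hair color and one column per eye color.
print("Hair".ljust(8) + "".join(eye.rjust(7) for eye in eyes))
for hair in hairs:
    cells = "".join(str(crosstab.get((hair, eye), 0)).rjust(7) for eye in eyes)
    print(hair.ljust(8) + cells)
```

Pairs that never occur, such as red hair with blue eyes, are printed as 0.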
A few other commands that might be helpful when working with categorical data are:
- table(): This command will create a table showing how many observations there are for each category.
- tapply(): This command will apply a function (such as sum or mean) to the values in each group defined by the levels of a factor.
- split(): This command will split a vector or data frame into groups based on the levels of a factor.
And while this list is not very comprehensive, it should give you a quick understanding of how categorical data works and what it looks like. The same logic and methodology can be applied to much larger data sets at the enterprise level.
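For comparison, here is a rough Python analogue of split() and tapply(), using only the standard library. It is a sketch and not an exact equivalent. The rows reuse the aggregated counts shown earlier, and the variable names are chosen for illustration.

```python
from collections import defaultdict

# Aggregated rows from the example above: (hair, eye, sex, n).
agg = [
    ("Blonde", "Hazel", "Female", 31),
    ("Brown", "Brown", "Female", 32),
    ("Red", "Brown", "Female", 12),
    ("Brown", "Blue", "Female", 8),
    ("Brown", "Hazel", "Male", 10),
    ("Blonde", "Blue", "Male", 7),
]

# Like split(): group the rows by the levels of one factor (sex).
groups = defaultdict(list)
for row in agg:
    groups[row[2]].append(row)

# Like tapply(n, sex, sum): apply a function to the values in each group.
totals = {sex: sum(r[3] for r in rows) for sex, rows in groups.items()}
print(totals)  # {'Female': 83, 'Male': 17}
```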
What is the Best Visualization for Categorical Data?
Most of us aren’t analysts. But we still need to use categorical data to make data-driven decisions for our businesses. Raw numbers, tables, and spreadsheets are unappealing and hard to read unless you work with them every day, so you need another way to make sense of all that data.
That’s where data visualization comes in.
There are many ways to visualize categorical data, but not all of them are created equal. The best examples of categorical data visualization include:
- Bar charts: The easiest way to read multivariate data
- Pie charts or donut charts: The best way to read data with five or fewer categories
- Line charts: For tracking how data changes over time
- Stacked bar charts: When you want to compare two or more variables
- Scatterplots, heat grids, and scorecards: When you need to get more granular
For businesses that deal with a lot of categorical data (e.g. customer demographics, product data, survey responses), the best visualization is usually a bar chart or a pie chart.
Bar charts are simple and easy to understand, even for people who don’t have a lot of experience working with data. And they can be customized to show as much–or as little–detail as you want.
Pie charts are also easy to understand, but they’re best used for data sets that have a limited number of categories (usually no more than five).
And if you want to show changes in categorical data over time, a line chart is usually the best option.
Scatterplots, heat grids, and scorecards are other options, but they’re typically used by analysts and data scientists to find trends and patterns in large data sets.
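The rules of thumb in this section can be written down as a small helper. This is only a sketch: the function name suggest_chart and its parameters are hypothetical, and the thresholds restate the guidance above without adding any formal standard.

```python
def suggest_chart(n_categories: int, over_time: bool = False, n_variables: int = 1) -> str:
    """Suggest a chart type from the rules of thumb in this article (hypothetical helper)."""
    if over_time:
        # Changes over time read best as a line chart.
        return "line chart"
    if n_variables >= 2:
        # Comparing two or more variables calls for a stacked bar chart.
        return "stacked bar chart"
    if n_categories <= 5:
        # With few categories, a pie chart is as readable as a bar chart.
        return "bar chart or pie chart"
    # Many categories: bars stay readable where pie slices do not.
    return "bar chart"


print(suggest_chart(n_categories=4))                  # bar chart or pie chart
print(suggest_chart(n_categories=12))                 # bar chart
print(suggest_chart(n_categories=5, over_time=True))  # line chart
print(suggest_chart(n_categories=5, n_variables=2))   # stacked bar chart
```

Treat the result as a starting point. The audience and the message you want to convey still matter more than any fixed threshold.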
The Worst Ways to Visualize Categorical Data
There are also some methods for visualizing categorical data that you should generally avoid when presenting to a non-technical audience. These include:
- Tables: Too much information, not enough context
- Radar charts: Only use if you have three or fewer categories
- Bubble charts: Unless you’re an experienced data visualization expert, stay away from these
While these forms of visualization have their place, they’re not very user-friendly for people who don’t work with data on a daily basis.
What Visualization Tool Can Be Used for Categorical Data?
Putting your data into an Excel spreadsheet is a familiar and straightforward way to visualize it, but it is difficult to create anything beyond fairly basic charts and graphs.
You’ll need a dedicated data visualization tool to create more advanced visualizations. Many different options are available, including Qlik, Tableau, and Power BI.
Let’s say you run a large ecommerce brand and want to visualize how your customers find you. Those ways could be:
- Organic search
- Social media
- Paid advertising
- Referral traffic
- Email marketing
You would first gather the data from each of these sources and warehouse it in Snowflake. Then, you would extract that data and load it into your visualization tool of choice. From there, you would be able to create a bar chart that shows how each of these channels contributes to your overall traffic.
The flow described above is a high-level overview of the architecture you would need to set up in order to effectively visualize categorical data, with the end users being you and members of your organization.
Once you have your data in Snowflake, connect to your Snowflake account from the Power BI dashboard.
Then, go through the following steps to create a Power BI report from your categorical data:
1. Set Up Power BI Dashboard and Import Data
From the “Home” tab, select “Get Data.”
Under “Database,” select “Snowflake.”
Input your information, including the server URL and the warehouse name.
Enter your Snowflake account name, username, and password. Then select “OK.”
Now that you’re connected to Snowflake, select the database and table that contains your categorical data.
Click “Load” to load the data into Power BI or click “Transform Data” to clean and reshape it before loading. As in any ETL process, you would typically want to transform first so that your data is clean and prepared before you create visualizations.
2. Model Your Marketing Data
In order to create a visual report, you need to model your marketing data. Microsoft has a tutorial on how to do this, but modeling is essentially the process of organizing your data into tables, columns, and measures.
Select “Transform data” (labeled “Edit Queries” in older versions of Power BI Desktop) from the “Home” tab.
From there, you can select which columns you want to include in your report and how you want to transform the data. For example, you might want to create a new column that calculates the percentage of each channel’s contribution to your overall traffic.
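As a rough illustration of that percentage column, the calculation can be sketched in Python outside of Power BI. The session counts below are made-up numbers for the five channels listed earlier. In practice, the figures would come from your own Snowflake tables.

```python
# Hypothetical session counts per acquisition channel (made-up numbers).
sessions = {
    "Organic search": 4200,
    "Social media": 1800,
    "Paid advertising": 2500,
    "Referral traffic": 900,
    "Email marketing": 600,
}

total = sum(sessions.values())

# Percentage of overall traffic contributed by each channel.
share = {channel: 100 * n / total for channel, n in sessions.items()}

# Print channels from largest to smallest with a simple text bar (one '#' per 2%).
for channel, pct in sorted(share.items(), key=lambda item: item[1], reverse=True):
    bar = "#" * round(pct / 2)
    print(f"{channel:<17} {pct:5.1f}%  {bar}")
```

Sorting by share before plotting is a common choice for nominal categories like channels, since it makes the largest contributors easy to spot.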
After you’ve modeled your data, it’s time to start creating visualizations.
3. Create Your Visualizations
Once your data is loaded into Power BI, you can start creating visuals. To do that, open the “Visualizations” pane and select the type of visualization you want to create.
Once you’ve created your visualization, you can save it by selecting the “File” menu and then selecting “Save.” You can also segment the data in your visualization by opening the “Fields” pane and choosing the fields you want to segment by. This allows your team members to quickly and easily filter the data based on their needs.
Once the report is published to the Power BI service, you can share it with other people in your organization by selecting “Share.” From there, you can enter the email addresses of the people with whom you want to share the report.
There are endless possibilities for how you can use data visualization to improve your business, so don’t be afraid to experiment and see what works best for you.
How Can I Make My Visualizations More Effective?
You can do a few things to make sure your Snowflake data visualizations are as effective as possible. Many of them may seem obvious, but it’s important to keep them in mind as you’re creating your visualizations:
- Use contrasting colors: This will make it easier for people to quickly see the differences between the data points in your visualization.
- Make sure your labels are legible: Use a clear, easy-to-read font and make sure the labels are big enough to be seen from a distance.
- Choose an appropriate chart type: Not all data is best represented by one kind of chart. Choose the chart type that best illustrates the data points’ relationships.
- Don’t overload your visualization with too much information: Since others in your organization aren’t data professionals, it’s best to keep it simple and focus on one message.
- Think about the story you’re trying to tell: A visualization is only effective if it tells a clear story. Before you start creating your visualization, take some time to think about what message you want to communicate and how best to do that.
Keep these tips in mind as you create your own dashboards and import data from Snowflake.
Streamline the Data Visualization Process With DataLakeHouse
The first step in creating impactful data visualizations that enable smarter decision-making is having a central platform where all your data lives. But often, data is siloed across the organization in different departments, sources, and formats.
DataLakeHouse is the 100% Snowflake-focused end-to-end data platform that enables you to bring in data from multiple sources, both on-premises and in the cloud, and unify it into a single source of truth.
ELT synchronization automatically keeps your data up-to-date and ready for analysis in Snowflake. And our industry-specific data models give you a head start on your analytics so you can quickly start generating insights.
And with the power of ML, you can automate the data visualization process with self-service access to charts, graphs, and tables that are automatically created and updated as new data comes in.
If you’re ready to streamline the data visualization process, book a demo with us today.