Many data visualizations are very pretty, but don’t present the viewer with all that much information (or at least not any new information). It is true that confirming what you already know is a great way to test whether the way you are organizing the data has any merit, and that it is an important first step. But for a lot of projects, that process of confirmation seems to be the final step. The data are collected, something visually cool (but not all that useful) is built, and the creators promise to work on it more. A few months later, the visualization is even cooler, but the utility has not improved; what I would suggest is the most important part of the process has been left to wither.
We have to consider not just how to visualize a data set but also the virtue that the visualization has. I used to be dismissive of a good number of the visualizations and infographics that made their way onto the web, but organizing them into categories helped me appreciate that different types of data representation have their place.
In my mind I roughly categorize visualizations in the following way (the organization is only roughly hierarchical):
The chart above shows levels of data visualizations. At the lowest level is a pretty picture or animation that can be used to impress visitors to your office/site/etc, but which doesn’t really contain any information.
Next up would be a pretty picture that does contain information, but just confirms what you already know. If you have a visualization of geolocation data which illustrates that people go to work during the day and bars at night, you haven’t gained anything. It isn’t quite useless, because if things were the other way round it would be a good indication you are crunching the data incorrectly, but no decision is going to be made or affected by that information.
The next level up is where the visualization improves understanding or comprehension of a system, often by offering a top-down view. These are terrifically effective when a system is too large to comprehend, often because the numbers have become so big that they can’t be understood well on an absolute basis. Great examples of this kind of visual are the Death and Taxes poster and the spending grams at Information is Beautiful.
Next, or perhaps additionally, is a visualization that contains information that affects decisions or causes decisions to be made that would not have been made otherwise. Visualizations of this type can still be largely qualitative in nature.
Next is where the visualization yields quantitative information.
The final level, the brass ring, is where machine-actionable quantitative information can be extracted from the visualization. The computer can generate it and the computer can use it. The visualization still matters for letting a person see and check what is going on, but the person is no longer essential to the process. These are an accomplishment.
The worst-kept secret in graph theory is that the most effective tool for pattern recognition is the human eye, so replicating any part of that is to be commended. As an example I would consider clustering of nodes in a graph. After coming up with a measure (the hard part) and selecting a graph layout, clustering can give you lists of groups that can then be used for other purposes. It might even be more apt to call this last level information processing that has an obvious visual representation.
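To make the clustering example a little more concrete, here is a minimal sketch in Python. It assumes the hard part is already done: each edge carries a similarity score from whatever measure you came up with. The function name `cluster_nodes`, the sample names, the scores, and the threshold are all hypothetical, and treating clusters as the connected components left after dropping weak edges is just one simple stand-in for a real clustering method.

```python
from collections import defaultdict


def cluster_nodes(edges, threshold):
    """Return clusters as sorted lists of node names.

    Assumption for this sketch: a cluster is a connected component of the
    graph that remains once edges scoring below `threshold` are dropped.
    """
    adjacency = defaultdict(set)
    nodes = set()
    for a, b, weight in edges:
        nodes.update((a, b))
        if weight >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    clusters = []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first walk over the strong edges reachable from `start`.
        seen.add(start)
        stack = [start]
        group = []
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(group))
    return clusters


# Hypothetical data: (node, node, similarity score from your own measure).
edges = [
    ("ann", "bob", 0.9),
    ("bob", "cat", 0.8),
    ("cat", "dan", 0.1),
    ("dan", "eve", 0.7),
]
clusters = cluster_nodes(edges, threshold=0.5)
print(clusters)
```

The result is a plain list of groups, which is the point: a person could pick out the same groups by eye from a drawing of the graph, but a program can hand these lists straight to whatever comes next.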
I think all of these types of visualization have their place, and by recognizing what a thing is you can avoid being disappointed that it doesn’t do more, or being overwhelmed with considerations of how much effort you should put into it.
When the amount of raw information you are dealing with exceeds your ability to digest it, a good visual representation can make the information comprehensible. Bad data visualizations just dazzle; good ones illuminate; great ones allow for discovery.