In today’s blog, I would like to introduce everyone to a new type of network visualization chart: the hive plot. Conventional network visualization charts such as node-link diagrams suffer from the hairball problem when dealing with large networks. The hive plot is a rational visualization method for representing large networks.
Conventional network visualization charts like node-link diagrams are generated using layout algorithms that are not closely connected with the properties of the network that we would like to explore. This leads to layouts that are at best accidentally informative and cannot be relied upon to consistently reveal meaningful patterns.
Node-link diagrams are extremely effective in communicating the idea of a network. A network typically consists of nodes, shown as little dots or circles, connected by edges or links. The direction of a link is indicated using an arrow. Most people can use such images intuitively to answer basic questions, like finding the person with the most friends (the node with the highest in-degree) or looking for highly connected groups that have only a small number of links between them (cliques). However, the simplicity and effectiveness of node-link diagrams disappear when the number of nodes and links becomes too high and we start encountering the dreaded hairball. The above figure is one such example.
The hairball problem
Hairballs turn complex data into visualizations that are even more complex. They trick us into thinking they carry a lot of information, whereas the reality is the opposite. Can you interpret anything meaningful from the hairball below?
Martin Krzywinski from Genome Sciences Center, Vancouver, BC has this to say about hairballs.
Hairballs are the junk food of network visualization — they have very low nutritional value, leaving the user hungry.
Hairball networks are difficult to interpret for many reasons.
- Inability to address user’s specific questions: The assumption that all of a user’s questions can be addressed by the layout algorithm is usually wrong since most layout algorithms are based on aesthetics. In such cases users need to construct another hairball using a different layout algorithm to answer the unanswered questions.
- Stochastic nature of layout algorithms: The inherent stochastic nature of these layout algorithms leads to different layouts for the same network.
- Sensitivity: The layout algorithms tend to be drastically affected by small changes in the network. This runs counter to our experience when dealing with large volumes of data, where small changes in the data do not significantly impact the big picture.
- Inability to make comparisons: Layouts of the same network created by different algorithms cannot be easily compared. Likewise, layouts of different networks created by the same algorithm cannot be easily compared.
In summary, conventional network visualizations like node-link diagrams are not suitable for visual analytics of large networks. The hive plot is a rational visualization method that attempts to address the problems caused by hairball layouts.
The hive plot
The concept of hive plots was proposed by Martin Krzywinski at the Genome Sciences Center in a 2012 paper in Briefings in Bioinformatics titled Hive plots–rational approach to visualizing networks. The hive plot is based on a rational layout algorithm that depends on the network features.
A hive plot uses a linear network layout in which the nodes are constrained to radial axes and edges are drawn as curves between connected nodes.
The node-to-axis assignment and node-on-axis position are determined solely by properties of the network that are of interest to users. There is no additional aesthetic sauce added to the layout of hive plots. So unlike in node-link diagrams, any pattern shown by the hive plot layout can be directly traced back to the underlying structure of the data.
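To make the geometry concrete, here is a minimal sketch of how a node's place in a hive plot could be turned into drawing coordinates. It assumes the axes are spread evenly around the origin with the first axis pointing straight up; the function names axis_angles and node_xy are hypothetical and not taken from any hive plot library.

```python
import math

def axis_angles(n_axes):
    """Spread n_axes evenly around the circle, first axis pointing up (assumption)."""
    return [math.pi / 2 - 2 * math.pi * i / n_axes for i in range(n_axes)]

def node_xy(axis_index, radius, n_axes=3):
    """Convert (axis, distance from the origin) into Cartesian x, y coordinates."""
    angle = axis_angles(n_axes)[axis_index]
    return (radius * math.cos(angle), radius * math.sin(angle))

# A node halfway along the first axis and one at the tip of the second axis.
print(node_xy(0, 0.5))
print(node_xy(1, 1.0))
```

Because a node's location depends only on its axis and its distance from the origin, the same network always produces the same picture, unlike a stochastic layout.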
Layout based on structure and function
The axis and node mapping can be carried out in a number of ways based on the properties of the network data.
1. Node to Axis:
In the figure below, the nodes are mapped to the 3 linear axes based on the node type (source, sink, both). Here the axes are used to categorize the nodes.
Axis A (Source): All nodes that have only outgoing links (no in-degree) are mapped to this axis
Axis B (Sink): All nodes that have only incoming links (no out-degree) are mapped to this axis
Axis C (Both): All nodes that have both incoming and outgoing links are mapped to this axis
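The source/sink/both rule is easy to express in code. The following is a rough sketch, assuming the network is given as a list of (source, target) pairs; the function name assign_axes is hypothetical, and nodes without any links are simply not covered by this version.

```python
from collections import Counter

def assign_axes(edges):
    """Map each node to axis "A" (source), "B" (sink) or "C" (both)."""
    edges = list(edges)  # allow generators, since we iterate twice
    out_degree = Counter(source for source, _ in edges)
    in_degree = Counter(target for _, target in edges)
    axes = {}
    for node in set(out_degree) | set(in_degree):
        if in_degree[node] == 0:
            axes[node] = "A"   # only outgoing links
        elif out_degree[node] == 0:
            axes[node] = "B"   # only incoming links
        else:
            axes[node] = "C"   # both incoming and outgoing links
    return axes

# Toy network: a -> b, b -> c, a -> c
print(assign_axes([("a", "b"), ("b", "c"), ("a", "c")]))
```

Any other categorical property of the nodes could be swapped in for the degree test without changing the overall structure.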
2. Axis node position
In the figure below, the nodes are positioned on axis A in ascending order of node degree.
Nodes higher up on axis A have a higher degree than nodes close to the origin.
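One way to turn this ordering into positions is to rank the nodes by degree and space them evenly along the axis. This is only a sketch under that assumption (positions could equally be proportional to the degree itself); the function name positions_on_axis is hypothetical, and ties are broken by node name so the result is deterministic.

```python
def positions_on_axis(degrees):
    """Return a position in (0, 1] for each node, ranked by ascending degree."""
    # Sort by degree first, then by node name to break ties predictably.
    ordered = sorted(degrees, key=lambda node: (degrees[node], node))
    count = len(ordered)
    return {node: (rank + 1) / count for rank, node in enumerate(ordered)}

# Degrees of three nodes that share one axis.
print(positions_on_axis({"x": 3, "y": 1, "z": 2}))
```

The node with the highest degree ends up at the outer tip of the axis (position 1.0), and lower-degree nodes sit closer to the origin.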
3. Color and line weight
The figure below shows how the color and line weight attributes can be overlaid on top of a hive plot. The color of the nodes and the curves can be used to classify the nodes and links based on additional properties of the data such as gender, nationality, relationship type, etc. The thickness of the curves can be used to represent the edge weight.
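As an illustration of the line weight encoding, the sketch below maps edge weights linearly onto a range of stroke widths. The width range of 0.5 to 4.0 and the function name line_widths are assumptions made for this example, not values from the original figure.

```python
def line_widths(weights, min_width=0.5, max_width=4.0):
    """Linearly scale edge weights into stroke widths between min_width and max_width."""
    if not weights:
        return {}
    low, high = min(weights.values()), max(weights.values())
    span = high - low
    widths = {}
    for edge, weight in weights.items():
        # If all weights are equal, draw every curve at the middle width.
        fraction = 0.5 if span == 0 else (weight - low) / span
        widths[edge] = min_width + fraction * (max_width - min_width)
    return widths

print(line_widths({("a", "b"): 1, ("b", "c"): 3, ("a", "c"): 5}))
```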
4. Scale, orientation and segmentation
The scale, orientation and subdivision of the axes can be used effectively to reveal additional patterns.
Axis length can be absolute or normalized.
An axis can be divided into segments for classifying the nodes based on some attributes.
The axes or individual segments can be reversed or scaled.
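The difference between absolute and normalized axes can be sketched in a few lines. Here I am assuming that "normalized" means each axis is scaled to its own largest value, while "absolute" means all axes share one common scale; the function name axis_positions and its parameters are hypothetical.

```python
def axis_positions(values_by_axis, normalized=True, reverse=()):
    """Scale node values to positions in [0, 1] on each axis.

    values_by_axis: {axis name: {node: non-negative value}}
    normalized:     True  -> each axis is scaled by its own maximum
                    False -> all axes share the global maximum (absolute)
    reverse:        axis names whose direction should be flipped
    """
    global_max = max(
        (v for vals in values_by_axis.values() for v in vals.values()),
        default=0,
    )
    positions = {}
    for axis, vals in values_by_axis.items():
        axis_max = max(vals.values(), default=0)
        scale = (axis_max if normalized else global_max) or 1  # avoid dividing by zero
        positions[axis] = {}
        for node, value in vals.items():
            fraction = value / scale
            positions[axis][node] = 1 - fraction if axis in reverse else fraction
    return positions

values = {"A": {"n1": 2, "n2": 4}, "B": {"n3": 10}}
print(axis_positions(values, normalized=True))
print(axis_positions(values, normalized=False))
print(axis_positions(values, normalized=True, reverse=("A",)))
```

With a shared scale, the nodes on axis A bunch up near the origin because axis B contains a much larger value; normalizing stretches each axis to its full length, which makes patterns within an axis easier to see but hides the difference in magnitude between axes.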
A hive plot in action
Let’s now take a look at an application of the hive plot to some real-world data. In the figure below, the dependency graph of the Flare visualization toolkit is represented using a hive plot.
Node: A class in the software library. Related classes have the same node color.
Line: An import statement from one class to another
Axis classification: Nodes are divided into 3 categories based on their in-degree/out-degree. The top axis shows classes with only outgoing dependencies (source nodes). The bottom left axis shows classes with only incoming dependencies (target nodes). The bottom right axes show classes with both incoming and outgoing dependencies – here the axes have been duplicated to show dependencies within this category.
Hovering over a link reveals details of the exact class dependency (see the red line in the figure displayed above).
The hive plot layout immediately reveals the following patterns:
- The highest-level implementations, such as layouts and controls, are in the top axis
- Interfaces and internal abstractions are in the bottom right axis
- The few de-coupled classes (those with only incoming dependencies) are shown in the bottom left axis.
In a typical hive plot, nodes are usually sorted by link degree. However, in this case the nodes are arranged by package, bringing related classes together. This reveals the macro structure of the dependencies much better.
The above example shows how the hive plot layout can be customized to suit different needs by choosing different methods to group and position nodes along the axis. What we covered in today’s blog was just a basic introduction to hive plots. Hive plots are extremely useful in the following scenarios:
- Comparing multiple networks
- Inspecting different aspects of a network using hive panels – an extension of hive plots
- Visualizing a large number of ratios
These advanced concepts related to hive plots will be covered in more detail in a subsequent blog.
The dodgy grouped bar chart
In my previous blog, I had asked readers if they had any issues with the following revenue chart.
The problem with this chart is that the scale does not begin at zero and thus ends up encoding the quantitative values inaccurately. Normally the height of a bar represents its quantitative value. For example, the bars that represent the values for June suggest that the revenue was about 5 times the cost. However, a close examination of the scale tells us that this impression is incorrect: the revenue (1500) is only three times the cost (500). Starting the scale of a bar chart at a non-zero value (here 400) leads to an incorrect visual perception.
A Question for readers
The following bar chart is used to compare the average load percentages for flights originating from LAX for the month of February. Even though bar graphs can do an excellent job of displaying quantitative data, in the example below they have clearly been misused. Can you list the problems that you see with the design of this bar chart?