Learning Objectives
By the end of this tutorial, you will
- understand the grammar of graphics for networks
- be able to work confidently with the R package ggraph
- be able to create high quality network visualizations in R
Target audience
This tutorial is aimed at beginners in network analysis with knowledge in R.
Setting up the computational environment
To run all the code in this tutorial, you need to install and load the following R packages.
install.packages(c("igraph", "graphlayouts", "ggraph","ggforce"))
devtools::install_github("schochastics/networkdata")
Make sure you have at least the version given below. Some of the examples may not be backward compatible.
packageVersion("igraph")
[1] '2.0.3'
packageVersion("graphlayouts")
[1] '1.1.1'
packageVersion("ggraph")
[1] '2.2.1'
packageVersion("networkdata")
[1] '0.2.2'
packageVersion("ggforce")
[1] '0.4.2'
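If you prefer to check all versions in one go, something along these lines should work. This is only a sketch: the helper name meets_minimum() and the required vector are made up for illustration, and the version numbers are simply copied from the output above.

```r
# Minimum versions, copied from the packageVersion() output above.
required <- c(
  igraph = "2.0.3", graphlayouts = "1.1.1", ggraph = "2.2.1",
  networkdata = "0.2.2", ggforce = "0.4.2"
)

# Hypothetical helper (not part of any package): returns TRUE for each
# package that is installed in at least the requested version.
meets_minimum <- function(required) {
  vapply(names(required), function(pkg) {
    requireNamespace(pkg, quietly = TRUE) &&
      packageVersion(pkg) >= required[[pkg]]
  }, logical(1))
}

status <- meets_minimum(required)

# Names of packages that are missing or too old (empty if all is fine).
names(status)[!status]
```

Any package listed in the last line would need to be installed or updated before continuing.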
igraph is mostly used for its data structures, while graphlayouts and ggraph handle the visualizations. The networkdata package contains a large collection of example networks that always comes in handy for learning new visualization techniques.
library(igraph)
Attaching package: 'igraph'
The following objects are masked from 'package:stats':
decompose, spectrum
The following object is masked from 'package:base':
union
library(ggraph)
Loading required package: ggplot2
library(graphlayouts)
library(ggforce)
Introduction
Most network analytic tasks are fairly straightforward to do in R. But when it comes to visualizing networks, R may lag behind some standalone software tools. Not because it is not possible to produce nice figures, but rather because it requires some time to obtain pleasing results. Just take a look at the default output when plotting a network with the plot() function.
library(networkdata)
data("got")
gotS1 <- got[[1]]
plot(gotS1)
It is definitely possible to produce nice figures with the igraph package (Check out this wonderful tutorial), yet it may take some time to familiarize yourself with the syntax. Additionally, most of the layout algorithms of igraph are non-deterministic. This means that running the same plot call twice may produce different results.
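If you stay with igraph layouts, a common workaround is to fix the random seed before computing the layout. The sketch below assumes the Zachary karate club graph that ships with igraph as a stand-in for the GoT network; the seed value is arbitrary.

```r
library(igraph)

# Stand-in graph so that the example does not depend on networkdata.
g <- make_graph("Zachary")

# Setting the same seed before each call should give the same coordinates.
set.seed(123)
xy1 <- layout_with_fr(g)
set.seed(123)
xy2 <- layout_with_fr(g)
isTRUE(all.equal(xy1, xy2))

# plot() accepts a precomputed coordinate matrix via the layout argument.
plot(g, layout = xy1)
```

Without the set.seed() calls, the two coordinate matrices would generally differ.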
In this tutorial, you will learn the basics of ggraph, the “ggplot2 of networks”, together with the graphlayouts package, which introduces additional useful layout algorithms to R. Arguably, using ggraph is not really easier than igraph. But once the underlying principle of the grammar of graphics is understood, you’ll see that it is actually quite intuitive to work with.
Quick plots
It is always a good idea to take a quick look at your network before starting any analysis. This can be done with the function autograph() from the ggraph package.
autograph(gotS1)
autograph() allows you to specify node/edge colours too, but it really is only meant to give you a quick overview without writing a massive amount of code. Think of it as the plot() function for ggraph.
Before we continue, we add some more node attributes to the GoT network that can be used during visualization.
# define a custom color palette
got_palette <- c(
"#1A5878", "#C44237", "#AD8941", "#E99093",
"#50594B", "#8968CD", "#9ACD32"
)
# compute a clustering for node colors
V(gotS1)$clu <- as.character(membership(cluster_louvain(gotS1)))
# compute degree as node size
V(gotS1)$size <- degree(gotS1)
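One thing to keep in mind is that cluster_louvain() is a randomized algorithm, so the number of clusters can vary between runs. The sketch below, which assumes the Zachary karate club graph as a stand-in and an arbitrary seed, shows one way to make the attributes reproducible and to check that the palette has enough colours.

```r
library(igraph)

# Stand-in graph; replace with gotS1 when networkdata is installed.
g <- make_graph("Zachary")

got_palette <- c(
  "#1A5878", "#C44237", "#AD8941", "#E99093",
  "#50594B", "#8968CD", "#9ACD32"
)

# Fix the seed (arbitrary value) so the clustering is repeatable.
set.seed(42)
V(g)$clu <- as.character(membership(cluster_louvain(g)))
V(g)$size <- degree(g)

# Warn if there are more clusters than colours in the palette.
n_clusters <- length(unique(V(g)$clu))
if (n_clusters > length(got_palette)) {
  warning("The palette has fewer colours than there are clusters.")
}

# List the node attributes that are now available for plotting.
vertex_attr_names(g)
```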
The basics of ggraph
Once you move beyond quick plots, you need to understand the basics of, or at least develop a feeling for, the grammar of graphics to work with ggraph.
Instead of explaining the grammar, let us directly jump into some code and work through it one line at a time.
ggraph(gotS1, layout = "stress") +
geom_edge_link0(aes(edge_linewidth = weight), edge_colour = "grey66") +
geom_node_point(aes(fill = clu, size = size), shape = 21) +
geom_node_text(aes(filter = size >= 26, label = name), family = "serif") +
scale_fill_manual(values = got_palette) +
scale_edge_width_continuous(range = c(0.2, 3)) +
scale_size_continuous(range = c(1, 6)) +
theme_graph() +
theme(legend.position = "none")
ggraph works with layers. Each layer adds a new feature to the plot and thus builds the figure step-by-step. We will work through each of the layers separately in the following sections.
Layout
ggraph(gotS1, layout = "stress")
The first step is to compute a layout. The layout parameter specifies the algorithm to use. The “stress” layout is part of the graphlayouts package and is always a safe choice since it is deterministic and produces nice layouts for almost any graph. I recommend using it as your default choice. Other algorithms for, e.g., concentric layouts and clustered networks are described further down in this tutorial. For the sake of completeness, here is a list of layout algorithms of igraph.
c(
"layout_with_dh", "layout_with_drl", "layout_with_fr",
"layout_with_gem", "layout_with_graphopt", "layout_with_kk",
"layout_with_lgl", "layout_with_mds", "layout_with_sugiyama",
"layout_as_bipartite", "layout_as_star", "layout_as_tree"
)
To use them, you just need the last part of the name.
ggraph(gotS1, layout = "dh") +
...
Note that there technically is no right or wrong choice. All layout algorithms are in a sense arbitrary since we can choose x and y coordinates freely (compare this to ordinary data!). It is all mostly about aesthetics.
You can also precompute the layout with the create_layout() function. This makes sense in cases where the calculation of the layout takes very long and you want to play around with other visual aspects.
gotS1_layout <- create_layout(gotS1, layout = "stress")
ggraph(gotS1_layout) +
...
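A complete version of this workflow might look as follows. It is a minimal sketch that assumes the Zachary karate club graph from igraph as a stand-in; the object names lay and p are illustrative.

```r
library(igraph)
library(ggraph)
library(graphlayouts)

# Stand-in graph; replace with gotS1 when networkdata is installed.
g <- make_graph("Zachary")

# Compute the layout once. The result is a data frame with one row per
# node, holding the x and y coordinates next to the node attributes.
lay <- create_layout(g, layout = "stress")
summary(lay$x)
summary(lay$y)

# "stress" is deterministic: two runs should give the same coordinates.
isTRUE(all.equal(layout_with_stress(g), layout_with_stress(g)))

# Reuse the precomputed layout while experimenting with other layers.
p <- ggraph(lay) +
  geom_edge_link0(edge_colour = "grey66") +
  geom_node_point(shape = 21, fill = "grey25", size = 3) +
  theme_graph()
```

Printing p draws the figure; changing colours or sizes afterwards does not trigger a new layout computation.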
Edges
geom_edge_link0(aes(edge_linewidth = weight), edge_colour = "grey66")
The second layer specifies how to draw the edges. Edges can be drawn in many different ways as the list below shows.
c(
"geom_edge_arc", "geom_edge_arc0", "geom_edge_arc2", "geom_edge_density",
"geom_edge_diagonal", "geom_edge_diagonal0", "geom_edge_diagonal2",
"geom_edge_elbow", "geom_edge_elbow0", "geom_edge_elbow2", "geom_edge_fan",
"geom_edge_fan0", "geom_edge_fan2", "geom_edge_hive", "geom_edge_hive0",
"geom_edge_hive2", "geom_edge_link", "geom_edge_link0", "geom_edge_link2",
"geom_edge_loop", "geom_edge_loop0"
)
You can do a lot of fancy things with these geoms but for a standard network plot, you should always stick with geom_edge_link0 since it simply draws a straight line between the endpoints. Some tools draw curved edges by default. While this may add some artistic value, it reduces readability. Always go with straight lines! If your network has multiple edges between two nodes, then you can switch to geom_edge_parallel().
In case you are wondering what the “0” stands for: The standard geom_edge_link() draws 100 dots on each edge compared to only two dots (the endpoints) in geom_edge_link0(). This is done to allow, e.g., gradients along the edge.
You can reproduce this figure by substituting
geom_edge_link(aes(edge_alpha = after_stat(index)), edge_colour = "black")
in the code above. Older examples write the mapping as ..index.., a notation that was deprecated in ggplot2 3.4.0 in favour of after_stat(index).
The drawback of using geom_edge_link() is that the time to render the plot increases and so does the size of the file if you export the plot (example). Typically, you do not need gradients along an edge. Hence, geom_edge_link0() should be your default choice to draw edges.
Within geom_edge_link0, you can specify the appearance of the edge, either by mapping edge attributes to aesthetics or setting them globally for the graph. Mapping attributes to aesthetics is done within aes(). In the example, we map the edge width to the edge attribute “weight”. ggraph then automatically scales the edge width according to the attribute. The colour of all edges is globally set to “grey66”.
The following aesthetics can be used within geom_edge_link0 either within aes() or globally:
- edge_colour (colour of the edge)
- edge_linewidth (width of the edge)
- edge_linetype (linetype of the edge, defaults to “solid”)
- edge_alpha (opacity; a value between 0 and 1)
ggraph does not automatically draw arrows if your graph is directed. You need to do this manually using the arrow parameter.
geom_edge_link0(aes(...), ...,
arrow = arrow(
angle = 30, length = unit(0.15, "inches"),
ends = "last", type = "closed"
)
)
The default arrowhead type is “open”, yet “closed” usually has a nicer appearance.
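Filled in with concrete values, the arrow specification could be used as in the sketch below. The four-node directed graph is a toy example made up for illustration; arrow() and unit() are available once ggraph (and with it ggplot2) is loaded.

```r
library(igraph)
library(ggraph)

# Small directed toy graph (illustrative only).
g <- graph_from_literal(A -+ B, B -+ C, C -+ A, A -+ D)

p <- ggraph(g, layout = "stress") +
  geom_edge_link0(
    edge_colour = "grey66",
    arrow = arrow(
      angle = 30, length = unit(0.15, "inches"),
      ends = "last", type = "closed"
    )
  ) +
  # Small nodes, so that the arrowheads are not hidden underneath them.
  geom_node_point(shape = 21, fill = "grey25", size = 1.5) +
  geom_node_text(aes(label = name), family = "serif", repel = TRUE) +
  theme_graph()
```

Printing p draws the graph with a closed arrowhead at the target end of every edge.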
Nodes
geom_node_point(aes(fill = clu, size = size), shape = 21) +
geom_node_text(aes(filter = size >= 26, label = name), family = "serif")
On top of the edge layer, we draw the node layer. Always draw the node layer above the edge layer. Otherwise, edges will be visible on top of nodes. There are slightly fewer geoms available for nodes.
c(
"geom_node_arc_bar", "geom_node_circle", "geom_node_label",
"geom_node_point", "geom_node_text", "geom_node_tile", "geom_node_treemap"
)
The most important ones here are geom_node_point() to draw nodes as simple geometric objects (circles, squares,…) and geom_node_text() to add node labels. You can also use geom_node_label(), but this draws labels within a box.
The mapping of node attributes to aesthetics is similar to edge attributes. In the example code, we map the fill attribute of the node shape to the “clu” attribute, which holds the result of a clustering, and the size of the nodes to the attribute “size”. The shape of the node is globally set to 21.
The figure below shows all possible shapes that can be used for the nodes.
Personally, I prefer “21” since it draws a border around the nodes. If you prefer another shape, say “19”, you have to be aware of several things. To change the colour of shapes 0-20, you need to use the colour parameter. For shapes 21-25, you need to use fill; the colour parameter only controls the border in these cases.
The following aesthetics can be used within geom_node_point() either within aes() or globally:
- alpha (opacity; a value between 0 and 1)
- colour (colour of shapes 0-20 and border colour for 21-25)
- fill (fill colour for shape 21-25)
- shape (node shape; a value between 0 and 25)
- size (size of node)
- stroke (size of node border)
For geom_node_text(), there are a lot more options available, but the most important ones are:
- label (attribute to be displayed as node label)
- colour (text colour)
- family (font to be used)
- size (font size)
Note that we also used a filter within aes() of geom_node_text(). The filter parameter allows you to specify a rule for when to apply the aesthetic mappings. The most frequent use case is node labels, but it can also be used for edges or nodes. In the example, we only display the node label if the size attribute is at least 26.
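To see the filter in isolation, consider the following sketch. It assumes the Zachary karate club graph as a stand-in; the node names and the threshold of 10 are made up for illustration.

```r
library(igraph)
library(ggraph)

# Stand-in graph; names are invented because this graph has none.
g <- make_graph("Zachary")
V(g)$name <- paste0("v", seq_len(vcount(g)))
V(g)$size <- degree(g)

# Number of nodes that satisfy the rule and will therefore be labelled.
sum(V(g)$size >= 10)

p <- ggraph(g, layout = "stress") +
  geom_edge_link0(edge_colour = "grey66") +
  geom_node_point(aes(size = size), shape = 21, fill = "grey66") +
  # Only nodes with size >= 10 receive a label.
  geom_node_text(aes(filter = size >= 10, label = name), family = "serif") +
  theme_graph()
```

All nodes are still drawn by geom_node_point(); the rule only restricts which rows are passed on to the text layer.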
Scales
scale_fill_manual(values = got_palette) +
scale_edge_width_continuous(range = c(0.2, 3)) +
scale_size_continuous(range = c(1, 6))
The scale_* functions are used to control aesthetics that are mapped within aes(). You do not necessarily need to set them, since ggraph can take care of it automatically.
ggraph(gotS1, layout = "stress") +
geom_edge_link0(aes(edge_linewidth = weight), edge_colour = "grey66") +
geom_node_point(aes(fill = clu, size = size), shape = 21) +
geom_node_text(aes(filter = size >= 26, label = name), family = "serif") +
theme_graph() +
theme(legend.position = "none")
While the node fill and size seem reasonable, the edges are a little too thick. In general, it is always a good idea to add a scale_* for each aesthetic within aes().
What kind of scale_* function you need depends on the aesthetic and on the type of attribute you are mapping. Generally, scale functions are structured like this:
scale_<aes>_<variable type>().
The “aes” part is easy. Just use the type you specified within aes(). For edges, however, you have to prepend edge_. The “variable type” part depends on which scale the attribute is on. Before we continue, it may be a good idea to briefly discuss what aesthetics make sense for which variable type.
| aesthetic | variable type | notes |
|---|---|---|
| node size | continuous | |
| edge width | continuous | |
| node colour/fill | categorical/continuous | use a gradient for continuous variables |
| edge colour | continuous | categorical only if there are different types of edges |
| node shape | categorical | only if there are a few categories (1-5). Colour should be the preferred choice |
| edge linetype | categorical | only if there are a few categories (1-5). Colour should be the preferred choice |
| node/edge alpha | continuous | |
The easiest to use scales are those for continuous variables mapped to edge width and node size (also the alpha value, which is not used here). While there are several parameters within scale_edge_width_continuous() and scale_size_continuous(), the most important one is “range” which fixes the minimum and maximum width/size. It usually suffices to adjust this parameter.
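The naming scheme scale_<aes>_<variable type>() can be spelled out with a small helper. The function scale_name() below is hypothetical and exists only to illustrate the pattern; it is not part of ggraph or ggplot2.

```r
library(ggraph)

# Hypothetical helper: assemble a scale function name from its parts.
# Set edge = TRUE to prepend "edge_" for edge aesthetics.
scale_name <- function(aesthetic, type, edge = FALSE) {
  paste0("scale_", if (edge) "edge_" else "", aesthetic, "_", type)
}

scale_name("size", "continuous")
scale_name("width", "continuous", edge = TRUE)
scale_name("fill", "manual")

# Check that functions with these names are actually available.
exists(scale_name("size", "continuous"), mode = "function")
exists(scale_name("width", "continuous", edge = TRUE), mode = "function")
```

The node scales come from ggplot2, while the edge_ variants are provided by ggraph.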
For continuous variables that are mapped to node/edge colour, you can use scale_colour_gradient(), scale_colour_gradient2(), or scale_colour_gradientn() (add edge_ before colour for edge colours). The difference between these functions is in how the gradient is constructed. gradient creates a two-colour gradient (low-high): simply specify the two colours to be used (e.g. low = “blue”, high = “red”). gradient2 creates a diverging colour gradient (low-mid-high) (e.g. low = “blue”, mid = “white”, high = “red”), and gradientn creates a gradient through an arbitrary number of colours (specified with the colours parameter).
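The following sketch combines a two-colour gradient for edges with a diverging gradient for nodes. It assumes the Zachary karate club graph as a stand-in; the attributes deg and btw, the colours, and the choice of the mean degree as midpoint are illustrative.

```r
library(igraph)
library(ggraph)

# Stand-in graph with two continuous attributes.
g <- make_graph("Zachary")
V(g)$deg <- degree(g)
E(g)$btw <- edge_betweenness(g)

p <- ggraph(g, layout = "stress") +
  geom_edge_link0(aes(edge_colour = btw)) +
  geom_node_point(aes(fill = deg), shape = 21, size = 4) +
  # Two-colour gradient for edges (note the edge_ prefix).
  scale_edge_colour_gradient(low = "grey80", high = "black") +
  # Diverging gradient for the node fill, centred on the mean degree.
  scale_fill_gradient2(
    low = "blue", mid = "white", high = "red",
    midpoint = mean(V(g)$deg)
  ) +
  theme_graph()
```

Since shape 21 is used, the node colour is controlled through fill, hence scale_fill_gradient2() rather than scale_colour_gradient2().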
For categorical variables that are mapped to node colours (or fill in our example), you can use scale_fill_manual(). This forces you to choose a color for each category yourself. Simply create a vector of colors (see the got_palette) and pass it to the function with the parameter values.
ggraph then assigns the colours in the order of the unique values of the categorical variable. These are either the factor levels (if the variable is a factor) or the result of sorting the unique values (if the variable is a character).
sort(unique(V(gotS1)$clu))
[1] "1" "2" "3" "4" "5" "6" "7"
If you want more control over which value is mapped to which colour, you can pass the vector of colours as a named vector.
got_palette2 <- c(
"5" = "#1A5878", "3" = "#C44237", "2" = "#AD8941",
"1" = "#E99093", "4" = "#50594B", "7" = "#8968CD", "6" = "#9ACD32"
)
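With a named vector, colours are looked up by name instead of by position. The short sketch below shows the lookup and how the palette would be handed to the scale; the object name fill_scale is illustrative.

```r
library(ggplot2)

got_palette2 <- c(
  "5" = "#1A5878", "3" = "#C44237", "2" = "#AD8941",
  "1" = "#E99093", "4" = "#50594B", "7" = "#8968CD", "6" = "#9ACD32"
)

# Lookup by name: cluster "1" gets its own colour, regardless of where it
# appears in the vector.
got_palette2[c("1", "5")]

# The scale can be stored and added to any plot with `+ fill_scale`.
fill_scale <- scale_fill_manual(values = got_palette2)
```

In the plots of this tutorial, this would replace the scale_fill_manual(values = got_palette) line.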
Using your own colour palette gives your network a unique touch. If you can’t be bothered with choosing colours, you may want to consider scale_fill_brewer() and scale_colour_brewer(). These functions offer all palettes available at colorbrewer2.org.
ggraph(gotS1, layout = "stress") +
geom_edge_link0(aes(edge_linewidth = weight), edge_colour = "grey66") +
geom_node_point(aes(fill = clu, size = size), shape = 21) +
geom_node_text(aes(filter = size >= 26, label = name), family = "serif") +
scale_fill_brewer(palette = "Dark2") +
scale_edge_width_continuous(range = c(0.2, 3)) +
scale_size_continuous(range = c(1, 6)) +
theme_graph() +
theme(legend.position = "none")
(Check out this github repo from Emil Hvitfeldt for a comprehensive list of color palettes available in R)
Themes
theme_graph() +
theme(legend.position = "none")
Themes control the overall look of the plot. There are a lot of options within the theme() function of ggplot2. Luckily, we really don’t need any of those. theme_graph() is used to erase all of the default ggplot theme elements (e.g. axes, background, grid lines) since they are irrelevant for networks. The only option worthwhile in theme() is legend.position, which we set to “none”, i.e. don’t show the legend.
The code below gives an example for a plot with a legend.
ggraph(gotS1, layout = "stress") +
geom_edge_link0(aes(edge_linewidth = weight), edge_colour = "grey66") +
geom_node_point(aes(fill = clu, size = size), shape = 21) +
geom_node_text(aes(filter = size >= 26, label = name), family = "serif") +
scale_fill_manual(values = got_palette) +
scale_edge_width_continuous(range = c(0.2, 3)) +
scale_size_continuous(range = c(1, 6)) +
theme_graph() +
theme(legend.position = "bottom")
Another full example
Let us work through one more visualization using a very special data set: the “Grey’s Anatomy” hook-up network.
data("greys")
Start with the autograph call.
autograph(greys)
The network consists of several components. Note that the igraph standard is to pack all components in a circle. The standard in graphlayouts is to arrange them in a rectangle. You can specify the bbox parameter to arrange the components differently. The plot above arranges all components on one level, but two levels may be desirable. You may need to experiment a bit with the parameter, but for this network, bbox=15 seems to work best (see below).
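To get a feeling for the bbox parameter without the greys data, you can experiment with a small disconnected graph. The toy graph below is made up for illustration; only the bbox argument of the “stress” layout is taken from the text above.

```r
library(igraph)
library(ggraph)
library(graphlayouts)

# Toy graph with three components (illustrative only).
g <- disjoint_union(
  make_ring(6),
  make_star(5, mode = "undirected"),
  make_full_graph(4)
)
components(g)$no

# The coordinates can be inspected directly ...
xy <- layout_with_stress(g, bbox = 15)
dim(xy)

# ... or the parameter can be passed through ggraph().
p_default <- ggraph(g, layout = "stress") +
  geom_edge_link0(edge_colour = "grey66") +
  geom_node_point(shape = 21, fill = "grey25", size = 3) +
  theme_graph()

p_bbox <- ggraph(g, layout = "stress", bbox = 15) +
  geom_edge_link0(edge_colour = "grey66") +
  geom_node_point(shape = 21, fill = "grey25", size = 3) +
  theme_graph()
```

Printing p_default and p_bbox side by side shows how the value affects the arrangement of the components.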
We will use this network to quickly illustrate what can be done with geom_edge_link2(). The function allows you to interpolate node attributes between the start and end node along the edges. In the code below, we use the “position” attribute. The line which adds the node labels illustrates two further features of ggraph. First, aesthetics don’t need to be node attributes. Here, for instance, we calculate the degree and then map it to the font size. Second, the repel = TRUE argument places the node labels so that they do not overlap.
ggraph(greys, "stress", bbox = 15) +
geom_edge_link2(aes(edge_colour = node.position), edge_linewidth = 0.5) +
geom_node_point(aes(fill = sex), shape = 21, size = 3) +
geom_node_text(aes(label = name, size = degree(greys)),
family = "serif", repel = TRUE
) +
scale_edge_colour_brewer(palette = "Set1") +
scale_fill_manual(values = c("grey66", "#EEB422", "#424242")) +
scale_size(range = c(2, 5), guide = "none") +
theme_graph() +
theme(legend.position = "bottom")
While the coloured edges look kind of artistic, we should go back to the “0” version.
ggraph(greys, "stress", bbox = 15) +
geom_edge_link0(edge_colour = "grey66", edge_linewidth = 0.5) +
geom_node_point(aes(fill = sex), shape = 21, size = 3) +
geom_node_text(aes(label = name, size = degree(greys)),
family = "serif", repel = TRUE
) +
scale_fill_manual(values = c("grey66", "#EEB422", "#424242")) +
scale_size(range = c(2, 5), guide = "none") +
theme_graph() +
theme(legend.position = "bottom")
Code through: Recreate the polblogs viz
In this section, we do a little code through to recreate the figure shown below.
Social Science Usecase(s)
Network visualization offers social scientists a powerful tool for analyzing relationships and interactions within digital traces. For instance, researchers studying online communities can use network visualization to map interactions on social media platforms, such as X or Reddit. By visualizing user interactions (like replies, mentions, or shared links), researchers can uncover patterns in information flow, identify influential users, and explore the formation of communities or echo chambers. Network visualization can reveal clusters of users who frequently engage with one another, suggesting tightly-knit subgroups with shared interests or beliefs. It also helps identify key influencers within these networks, who may play a critical role in spreading information or shaping public opinion. This analysis is particularly useful for understanding phenomena like misinformation spread, public discourse on sensitive topics, or the social dynamics of online activism, offering insights into how ideas and behaviors propagate through digital spaces.