Graphs are sets of vertices and their edges:
Where the edges represent connections between the nodes. If edges do not have directions, we call a graph undirected. A real-life example of an undirected graph can be a chemical molecule, where the vertices are atoms, and bonds are represented as edges.
However, sometimes we need information about whether the edge goes from u to v, from v to u, or both ways. For example, if Mark likes Alice, it doesn’t necessarily mean it’s mutual ( ☹ ). In those situations, we can define the edge as an ordered tuple instead of unordered one.
Using the graph structure, we can define a centrality measure. It’s a metric used for answering the question:
How important is this vertex/edge in a graph?”
And there are many ways to answer it.
Depending on the task, we can start from a different point evaluating centrality. One of the most common metrics are: Degree, Closeness and Betweenness. We will discuss them using Zachary’s Karate Club graph [more info]. It presents ties between different karate club members. You can find code used to generate pictures below here.
The most basic of centralities. It’s defined only for vertices and it’s equal to the degree of the vertex (which is the number of the neighboring vertices). As an example, we can think back to the graph of human relationships, and in case of the friendships among people this metric would answer the question
“How popular is this person?”
Paths in graph
For the next two centralities, we need to introduce a few concepts to our knowledge of the graph theory. All of them are very intuitive, starting from the edge’s weights. We can add weights to our edges, to mark the difference between them. For example, this can be road length in case of traffic graph.
In graphs we can define paths, which are lists of vertices we need to traverse to get from A to B. Consecutive vertices in the path are neighbors, first vertex is the A, and the last is B. Path distance is the sum of the edges weights along of it. The shortest path between A and B is the path with the smallest distance.
Having all this new knowledge, we can go back to our metrics. Next one is closeness centrality, which tells us how close a node to the rest of the graph is. It’s defined for a specific vertex as an inverse of a mean of shortest paths to all other vertices in the graph. This way, shorter average path translates to higher closeness centrality.
Betweenness centrality gives us information, which nodes of a graph are crucial for the traffic going through it. Imagine a city with an extensive road network, where every junction is a node. Some of those serve as a key connectors in daily commutes, while others may be a cul-de-sacs with close to none impact on traffic flow. The former one possess high Betweenness centrality scores, calculated as proportion of the shortest paths traversing through the intersection.
Now, as we have tools for describing and analyzing graph, we can start extracting city’s plan to a graph form. To do that we can Open Street Maps (OSM), to import it in Python as NX graph using osmnx library. We’ll start with a smaller example to discuss what additional process we need to apply, in order to improve time and efficiency of our work.
Grzegórzki is one of the eighteen districts of Krakow’s city, with two complex roundabouts — Mogilskie and Grzegórzeckie, and many junctions. Thus, we’ll be able to see most of potential pitfalls with data engineering.
Let’s start with importing data from the OSM repository to a Python graph, and plot the results:
There’s something wrong with this graph — can you spot what it is?
We get multiple edges for single sections of roads, resulting the graph with almost 3 000 “junctions”. This does not provide proper representation (we can’t make a U-turn in the middle of a road, and every node cause calculation to be slower). To fix this situation, we will perform graph topology simplification by removing all nodes on the road between two junctions. In OSMnx, we have a function for that called ox.simplify_graph().
There’s one more catch — as you may see, we have two edges for the most of roads, one for each way. Due to this, we have multiple nodes for every intersection, which is an unwanted behavior. Imagine that we’re on a junction, we’re turning left, and there’s no separate lane for a left turn (or it’s already full). As long as we won’t be able to do the turn, the other cars are blocked. In our current graph, this isn’t the truth. The left turn is made of 2 separate nodes, one for turning left, and the other for crossing opposite lane. This would indicate that those are two independent operations, while they are not.
That’s why we’re going to consolidate intersections, meaning that we will combine multiple nodes close to each other into one. We’ll choose the consolidation radius big enough to consolidate multiple parts of the intersections into one, but on the other hand keep roundabouts as multiple node structures, as they can be only partially blocked. To do this we will use osmnx function ox.consolidate_intersections().
After those operations, we’re almost ready for the analysis. The last caveat is Krakow’s municipality borders — as many people travel from the neighboring towns, and graph analysis includes only data within the graph, we need to include those areas. I’ll present in the next chapter implications of not doing that. And here’s our graph:
You can find the source code used to generate this map, as well as all graphic used in the next chapter in this jupyter notebook.
For this case study we will focus only on Betweenness centrality measurement for estimating road traffic. In future, this might be extended to other techniques from graph theory, including GNN usage (Graph Neural Networks).
We will start with calculating Betweenness centrality measurement for all nodes and edges in a road layout representation. For that we will use NetworkX library.
Due to a high number of roads on a graph, it’s hard to see which components have highest probability of being critical for traffic. Let’s take a look at a centrality measurement distribution for the graph.
We can use those distributions to filter out less important junctions and streets. We will select top 2% of each where the threshold values are:
- 0.047 for nodes,
- 0.021 for edges.
We can see that the most important road segments by betweenness are:
- The A4 highway and the S7 being the beltway of Krakow (note that Krakow does not have northern part of the beltway),
- The western part of 2nd ring road and it’s connection to A4,
- The northern part of 3rd ring road (substituting missing northern beltway),
- The Nowohucka street connecting 2nd ring road with north-eastern part of the city,
- The Wielicka road leading from city center to the south-eastern highway part.
Let’s compare this information to a real life traffic map of Krakow from Google Maps:
We can see that our insights correlate with the results from traffic radar. The mechanism behind that is quite simple — components with high betweenness centrality are those used to commute most of shortest paths in the graph. If car drivers select the best paths for their routes, then the streets and junctions with the highest traffic volumes will be the ones with the highest betweenness centrality.
Let’s head back to the last part of the graph engineering — extending graph borders. We can check what would happen if we only took the city’s borders to our analysis:
The A4 highway, which is one of the most important component due to the beltway nature, has one of the lowest centrality measures in the whole graph! This happens because as the A4 is at the outskirts of the city, and most of its traffic comes from the outside, we cannot include this factor in the betweenness centrality.
Let’s take a look at a different scenario for graph analysis. Suppose that we want to predict how a road closure (for example due to the accident) affects the traffic. We can use the centrality measurements to compare differences between two graphs, and thus examine changes in the centrality.
In this study, we will simulate car accident on A4–7 highway segment, which is a common occurrence. The accident will cause a complete closure of the segment.
We will start by creating a new road network by removing A4–7 segment from graph, and recalculating centrality measurement.
Let’s take a look at a centrality distribution:
We can see that it’s still very similar to the original one. To inspect changes in the centrality measurements we will calculate residual graph, where centrality measurements are the difference between original road layout and after the accident. Positive values will indicate higher centrality after the accident. Nodes and junctions missing in one the graphs (such as A4–7) won’t be included in the residual graph. Below is the measurement distribution of the residuals:
Again, we will filter out top 2% of streets and nodes affected. The threshold values this time are:
- 0.018 for nodes,
- 0.017 for edges.
We can see increases in roads connecting split parts of beltway to the city center, where the 2nd ring road is located. The highest change can be seen in the 2nd ring road which contains one of two left bridges over Vistula river on the western side of the city.
There are a few things that we cannot take account in during graph analysis. The two most important ones, that we could see in this analysis, are:
- Graph centrality analysis assumes uniform distribution of traffic among the nodes.
Which is false in most cases, as villages and cities have different population densities. However, there are other effects that can reduce this, for example a higher amount of people living in neighboring villages will choose a car as a commute option in comparison to the people living in a city center.
- Graph analysis takes into the account only things that are present within the graph.
This is harder to see in the provided examples, especially for someone outside the Krakow. Let’s take a look at Zakopianka. It’s a major traffic artery between the city centre and most of the municipalities south of Krakow, and it’s also part of DK7 (national road no. 7) which spans across whole country.
If we compare typical traffic on DK7 in Krakóow to our centrality measures, they’re completely different. Average betweenness centrality is around 0.01, which is a two times smaller value than the top 2% threshold. While in reality, it’s one of the most blocked sections.
Graph theory and its analysis have applications in multiple scenarios, such as traffic analysis presented in this study. Using basic operations and metrics on graphs, we can get valuable insights in much shorter time in comparison to building a whole simulation model.
This whole analysis can be performed using several dozen lines of Python code, and it’s not limited to one road layout. We can also very easily transition to other analysis tools from Graph Theory.
As all things, this method has also its drawbacks. The major ones being assumptions about uniform traffic distribution and scope limited to graph structure.
Github repository containing code used in this study can be found here.