Learn about communities and closeness centrality in social network analysis with Python and NetworkX
In Part 2, we expanded our understanding of social network analysis by graphing the relationships between the members of the bands Smashing Pumpkins and Zwan. Then, we examined metrics like degree centrality and betweenness centrality to investigate the relationships between the members of the different bands. At the same time, we discussed how domain knowledge helps to inform our understanding of the results.
In Part 3, we will cover the basics of closeness centrality, how it is calculated, and how it relates to communities. Then, we will demonstrate how to calculate closeness centrality with NetworkX using Billy Corgan’s network as an example.
Before you start…
- Do you have basic knowledge of Python? If not, start here.
- Are you familiar with basic concepts in social network analysis, like nodes and edges? If not, start here.
- Are you comfortable with degree centrality and betweenness centrality? If not, start here.
Closeness centrality is a measure in social network analysis that quantifies how close a node is to all other nodes in a network in terms of the shortest path distance.
Closeness centrality focuses on the efficiency of information or resource flow within a network. The idea is that nodes with higher closeness centrality are able to reach other nodes more quickly and efficiently, as they have shorter average distances to the rest of the network.
The closeness centrality of a node is calculated as the reciprocal of the sum of the shortest path distances (SPD) from that node to all other nodes in the network.
Closeness Centrality = 1 / (Sum of SPD from the node to all other nodes)
Higher values indicate greater centrality and efficiency in information flow within the network.
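Let d(u, v) be the shortest path distance between nodes u and v, and let n be the number of nodes in a connected graph. The first expression below is the formula above written in symbols. NetworkX’s `closeness_centrality` reports the second, normalized form. It multiplies by n − 1 so that a node directly connected to every other node scores exactly 1, which is why the values later in this article fall between 0 and 1.

```latex
C(u) = \frac{1}{\sum_{v \neq u} d(u, v)}
\qquad
C_{\text{norm}}(u) = \frac{n - 1}{\sum_{v \neq u} d(u, v)}
```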
Calculating Closeness Centrality
Let’s break it down, using a simple network with eight nodes.
1. Calculate the shortest path distances (SPD) from node A to all other nodes. For our example, we will use simple, hand-picked distances. In practice, this would be done with a shortest path algorithm such as breadth-first search (for unweighted graphs) or Dijkstra’s algorithm (for weighted graphs).
2. Calculate the sum of the shortest path distances from node A to all other nodes.
3. Apply the closeness centrality formula.
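The three steps can be sketched in plain Python. This is a minimal illustration under stated assumptions. The eight-node network (nodes A to H and their edges) is hypothetical and chosen only to make the arithmetic easy to follow. Because the edges are unweighted, breadth-first search handles step 1. The helper name `shortest_path_distances` is also hypothetical.

```python
from collections import deque

# Hypothetical eight-node, unweighted, undirected network (adjacency lists).
NETWORK = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D", "F", "G"],
    "F": ["E"],
    "G": ["E", "H"],
    "H": ["G"],
}


def shortest_path_distances(graph, source):
    """Step 1: hop counts from source to every reachable node, via breadth-first search."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distances:  # first visit is the shortest path
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


distances = shortest_path_distances(NETWORK, "A")
# Step 2: sum the distances from A to all other nodes.
total = sum(d for node, d in distances.items() if node != "A")
# Step 3: closeness centrality is the reciprocal of that sum.
closeness = 1 / total

print(distances)  # {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': 3, 'F': 4, 'G': 4, 'H': 5}
print(total, closeness)  # 20 0.05
```

Multiplying by n − 1 = 7, which is the normalization NetworkX’s `closeness_centrality` applies, would give node A a score of 7/20 = 0.35.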
Closeness and Community
We can think of communities as groups of nodes that are more densely connected to each other than to nodes outside the group. Communities capture the idea of cohesive subgroups, or modules, within a network: dense intra-community connections and relatively sparse inter-community connections.
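To see the “dense inside, sparse between” idea in code, here is a minimal sketch on a hypothetical toy network. It uses two four-node cliques joined by a single edge, built with NetworkX’s `barbell_graph`. The example finds communities with NetworkX’s greedy modularity method, which is one community-detection technique among several and not necessarily the one used elsewhere in this series. It then counts intra- and inter-community edges.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical toy network: two 4-node cliques (0-3 and 4-7) joined by the edge (3, 4).
G = nx.barbell_graph(4, 0)

# Detect communities by greedily merging groups that increase modularity.
communities = [set(c) for c in greedy_modularity_communities(G)]

# Map each node to the index of its community.
membership = {node: i for i, comm in enumerate(communities) for node in comm}

# Count edges that stay inside a community versus edges that cross between communities.
intra = sum(1 for u, v in G.edges() if membership[u] == membership[v])
inter = G.number_of_edges() - intra

print(sorted(sorted(c) for c in communities))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(intra, inter)  # 12 1
```

Twelve of the thirteen edges fall inside a community and only one crosses between them, which is exactly the pattern the definition describes.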
When we consider the members of Smashing Pumpkins and Zwan, it is easy to see how the two bands are connected by the members they share, Billy Corgan and Jimmy Chamberlin. This demonstrates both intra-group connectivity among the members of each band and inter-group connectivity between the two bands.
While closeness centrality measures individual node importance and information flow efficiency, communities capture cohesive subgroups with dense connections. Together, they contribute to understanding the dynamics of information flow and the organization of the network.
Let’s discuss a few ways that we can use closeness centrality and communities to interpret network dynamics.
1. Closeness centrality within communities
When closeness centrality is computed only among the members of a community, those members tend to score higher than they do across the whole network. This is because they are closely connected and can reach each other in few steps. High closeness centrality within a community reflects efficient information flow and communication inside that subgroup.
2. Bridging communities with closeness centrality
Nodes that connect different communities, or act as bridges between them, may have higher closeness centrality than nodes that sit entirely inside one community. These nodes play a crucial role in linking separate communities and in facilitating communication and information flow between them.
3. Community-level analysis using closeness centrality
Closeness centrality can also be used at the community level to analyze the importance of communities within the network. Averaging the closeness centrality of a community’s members gives a sense of how efficiently that community reaches the rest of the network. Communities with higher average closeness centrality may be considered more central and influential in their ability to access and disseminate information within the network.
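All three ideas can be illustrated on a small hypothetical network. This is a minimal sketch, assuming two four-node cliques joined by one bridge edge (NetworkX’s `barbell_graph(4, 0)`). The communities are assigned by hand because the cliques are known by construction, so no detection algorithm is involved. Treating a “bridge node” as any node with a neighbor in another community is an assumption made for this illustration.

```python
import networkx as nx

# Hypothetical toy network: cliques {0, 1, 2, 3} and {4, 5, 6, 7} joined by the edge (3, 4).
G = nx.barbell_graph(4, 0)
communities = [{0, 1, 2, 3}, {4, 5, 6, 7}]  # known by construction

# Closeness centrality over the whole network.
global_closeness = nx.closeness_centrality(G)

# 1. Closeness within communities: compute it on each community's subgraph only.
within = {}
for comm in communities:
    within.update(nx.closeness_centrality(G.subgraph(comm)))

# 2. Bridge nodes: members with at least one neighbor in a different community.
membership = {n: i for i, comm in enumerate(communities) for n in comm}
bridges = sorted(n for n in G if any(membership[m] != membership[n] for m in G[n]))

# 3. Community-level analysis: average global closeness of each community's members.
community_avg = [sum(global_closeness[n] for n in comm) / len(comm) for comm in communities]

print(global_closeness[0], within[0])  # 0.5 1.0
print(bridges, global_closeness[3])  # [3, 4] 0.7
print([round(avg, 4) for avg in community_avg])  # [0.55, 0.55]
```

Node 0 scores 1.0 inside its own clique but only 0.5 across the whole network. The bridge nodes 3 and 4 score 0.7, higher than any node confined to one clique. Because this toy network is symmetric, both communities have the same average.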
When considering Billy Corgan’s sphere of influence, closeness centrality can provide insight into how members of Smashing Pumpkins and Zwan directly and indirectly influence the other musicians in his network. We can use the concept of community to describe each band, but we can also use it to describe the two bands combined. In reality, the community of 1990s alternative rock musicians is vast, and as we add more bands to the network, more communities will emerge.
1. Just as we did in Part 2, we are going to create a function that generates all of the combinations of band members for each band.
2. Next, we define each band and apply the function to generate its list of tuples. Then, we combine the lists and use a list comprehension to remove any duplicates.
3. Now we can draw the graph.
It should look something like this:
4. Finally, let’s calculate the closeness centrality and analyze the values.
The output should look something like this:
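For reference, here is a minimal sketch of steps 1 to 4. It rests on a few assumptions. The band rosters are reconstructed from the members named in the results below, and the article’s own lists may be ordered or spelled differently. `band_combinations` and `draw_network` are hypothetical helper names, not the ones from Part 2. Each band is treated as fully connected, so every pair of bandmates is an edge.

```python
from itertools import combinations

import networkx as nx

# Assumed rosters, reconstructed from the members discussed in the results.
SMASHING_PUMPKINS = [
    "Billy Corgan", "Jimmy Chamberlin", "James Iha", "D'arcy Wretzky",
    "Melissa Auf der Maur", "Ginger Pooley", "Mike Byrne",
    "Nicole Fiorentino", "Katie Cole",
]
ZWAN = ["Billy Corgan", "Jimmy Chamberlin", "Matt Sweeney", "David Pajo", "Paz Lenchantin"]


def band_combinations(members):
    """Step 1: every pair of bandmates becomes an edge (hypothetical helper)."""
    return list(combinations(members, 2))


# Step 2: combine both bands' edge lists, sorting each pair so (a, b) == (b, a),
# then keep only the first occurrence of each pair with a list comprehension.
all_edges = [
    tuple(sorted(pair))
    for pair in band_combinations(SMASHING_PUMPKINS) + band_combinations(ZWAN)
]
unique_edges = [pair for i, pair in enumerate(all_edges) if pair not in all_edges[:i]]

G = nx.Graph()
G.add_edges_from(unique_edges)


def draw_network(graph):
    """Step 3: draw the graph (hypothetical helper; requires matplotlib)."""
    import matplotlib.pyplot as plt

    nx.draw(graph, with_labels=True, node_color="lightblue", font_size=8)
    plt.show()


# Step 4: closeness centrality for every member, highest first.
closeness = nx.closeness_centrality(G)
ranked = sorted(closeness.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    draw_network(G)
    for name, value in ranked:
        print(f"{name}: {value:.6f}")
```

With these rosters, the script prints 1.000000 for Billy Corgan and Jimmy Chamberlin, 0.785714 for the other Smashing Pumpkins members, and 0.611111 for the other Zwan members. These match the values discussed below. NetworkX would also ignore a duplicate edge on its own, but the explicit de-duplication mirrors step 2.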
So what can we say about the values?
- Billy Corgan and Jimmy Chamberlin have the highest closeness centrality, 1.00, because they belong to both bands and are directly connected to every other member. This makes them the most central members in terms of reaching other members quickly.
- James Iha, Katie Cole, D’arcy Wretzky, Melissa Auf der Maur, Ginger Pooley, Mike Byrne, and Nicole Fiorentino share a closeness centrality of 0.785714. Each of them reaches the other Smashing Pumpkins members directly but needs two steps, through Corgan or Chamberlin, to reach the members who played only in Zwan.
- Paz Lenchantin, David Pajo, and Matt Sweeney have a lower closeness centrality of 0.611111. They connect directly to only four members, their Zwan bandmates, and need two steps to reach the rest of the network. This makes them less central than the previous group, although they are still reasonably well connected within the network.
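These three values can be checked by hand with the normalized formula C_norm(u) = (n − 1) / Σ d(u, v). The check assumes n = 12 members: nine in Smashing Pumpkins and five in Zwan, with Billy Corgan and Jimmy Chamberlin in both bands. Each band is fully connected, so every distance is either 1 (bandmates) or 2 (reached through Corgan or Chamberlin).

```latex
\begin{aligned}
\text{Corgan, Chamberlin:} \quad & \frac{11}{11 \times 1} = 1.00 \\
\text{Smashing Pumpkins only:} \quad & \frac{11}{8 \times 1 + 3 \times 2} = \frac{11}{14} \approx 0.785714 \\
\text{Zwan only:} \quad & \frac{11}{4 \times 1 + 7 \times 2} = \frac{11}{18} \approx 0.611111
\end{aligned}
```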
Since we are still dealing with a relatively simple network, these results do not reveal anything beyond what we learned when we calculated degree centrality and betweenness centrality for Billy Corgan’s network. In Part 4, we will add complexity by introducing more bands and musicians to the network. As a bonus, we will introduce some advanced Matplotlib techniques to make your NetworkX graphs even more engaging!