Storytelling via Temporal Networks

We take a quick dive into the evolution of characters across the Harry Potter series, using Python programming and network science to find out how the relationships between characters change over time.

Zhiheng Jiang
7 min read · Apr 9, 2022

This post was co-authored with Tan Yi Kai.

Photo by Shayna Douglas on Unsplash

The Harry Potter series is a timeless classic that has been enjoyed by both children and adults for more than two decades, in books and in cinema. The Harry Potter movie series comprises 8 movies spanning 7 years in the Harry Potter universe. In this post, we explore the relationships between characters in the Harry Potter movies from a temporal perspective, and shed some light on how Graph Theory techniques and Python programming can be used to discover how characters and communities evolve over time.

Data

We analyzed the scripts of the Harry Potter movies rather than the books themselves because the scripts have a regular structure. This makes defining the network more straightforward and removes the need to denoise the data, which would be a necessary step if we extracted characters using Natural Language Processing techniques like Named-Entity Recognition. The script data for the Harry Potter movie series was obtained from Kaggle, at the link below.

The license to the data can be found here.

Defining the network

We are going to construct a dynamic network of characters in Harry Potter. A graph comprises a set of nodes and a set of edges. In our case, we examine 8 snapshots of the character network, one for each of the 8 Harry Potter movies. In each snapshot, each character is a node, and two characters are connected (i.e. related) by an edge if they share a scene in a movie. The snapshots are cumulative: snapshot i contains all the edges from the first movie up to movie i.

The weight (or significance) of an edge is the total number of times the two characters appear together. As an initial pruning step to remove unimportant characters, we only considered characters who appear at least 10 times in total.
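The construction above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the notebook: the input format `scenes_by_movie`, the function name `build_snapshots`, and the choice to count an "appearance" as one scene are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def build_snapshots(scenes_by_movie, min_appearances=10):
    """Return one cumulative graph per movie as a (nodes, weights) pair.

    scenes_by_movie is a hypothetical input format: a list with one entry
    per movie, each entry a list of scenes, each scene a list of names.
    """
    # Pruning: count the scenes each character appears in across all movies.
    appearances = Counter(
        name for movie in scenes_by_movie for scene in movie for name in set(scene)
    )
    keep = {name for name, count in appearances.items() if count >= min_appearances}

    nodes, weights, snapshots = set(), Counter(), []
    for movie in scenes_by_movie:
        for scene in movie:
            cast = sorted(set(scene) & keep)
            nodes.update(cast)
            # Every pair sharing the scene gains one unit of edge weight.
            weights.update(combinations(cast, 2))
        # Snapshot i holds everything seen from movie 1 up to movie i.
        snapshots.append((set(nodes), dict(weights)))
    return snapshots
```

Because each scene's cast is sorted before pairing, an edge is always stored under the same `(a, b)` key with `a < b`, so repeated co-appearances accumulate on a single entry.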

Network Structure

Let’s take a look at the number of nodes and edges over time as shown below in Fig 1.

Fig 1: Number of nodes and edges over time

We can see that at the start of the Harry Potter series, the number of nodes (or characters) increases slightly faster than the number of interactions between them, as new characters are being introduced into the story. Towards the end of the series, the author starts to focus on developing existing relationships between the characters and the number of new main characters introduced decreases.

This pattern also shows up in the density of the network over time. The network density is the number of edges divided by the maximum possible number of edges in the network. For an undirected network with n nodes and m edges, the maximum possible number of edges is n(n − 1)/2, so the density is 2m / (n(n − 1)).
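As a small worked example of the density definition, here is a sketch that assumes an undirected network without self-loops. The snapshot sizes are made-up toy numbers, not the values behind the figures in this post.

```python
def density(num_nodes, num_edges):
    """Density of an undirected simple graph: edges / possible edges."""
    if num_nodes < 2:
        return 0.0  # no pair of nodes exists, so define density as 0
    max_edges = num_nodes * (num_nodes - 1) / 2
    return num_edges / max_edges


# Toy (nodes, edges) sizes, invented for illustration only: when nodes are
# added faster than edges, the density falls from one snapshot to the next.
toy_sizes = [(4, 5), (8, 12), (12, 20)]
densities = [density(n, m) for n, m in toy_sizes]
```

Graph libraries usually ship this metric already (NetworkX, for instance, has `nx.density`), so the function here is only meant to make the arithmetic explicit.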

Fig 2 below shows the change in network density over time.

Fig 2: Network density over time

As shown above, the network density initially decreases as new characters are introduced whose relationships are not yet fully developed. By the 4th movie (index 3 above), the network density reaches a minimum and stabilizes as fewer new characters are added relative to the number of new interactions.

The 4th movie, Harry Potter and the Goblet of Fire, marks the return of the main villain of the story, Lord Voldemort, together with his cronies, the Death Eaters. At this point in the story, most of the main characters have already been introduced. It serves as a turning point: subsequent movies like the Order of the Phoenix, the Half-Blood Prince and especially the Deathly Hallows largely involve existing characters and existing interactions, which add no new edges because edge weights are disregarded for now.

Centrality of characters

Eigenvector centrality is often used to measure the centrality of a node in a graph, in relation to the other nodes it is connected to. In other words, a node has higher eigenvector centrality (greater importance) if it is connected to other important nodes. This model is useful in the context of a story network, as the importance of a character does indeed depend on the people that particular character interacts with. Fig 3 below shows the eigenvector centralities of some of the characters selected.
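To make the idea concrete, here is a minimal power-iteration sketch of eigenvector centrality for an unweighted, undirected graph. The function name and the `{node: set_of_neighbours}` input format are assumptions for illustration. In practice a library routine such as NetworkX's `nx.eigenvector_centrality` would normally be used instead.

```python
import math


def eigenvector_centrality(adjacency, max_iter=1000, tol=1e-10):
    """Power iteration on an undirected graph given as {node: set(neighbours)}."""
    scores = {node: 1.0 for node in adjacency}
    for _ in range(max_iter):
        # Each node keeps its own score and adds its neighbours' scores.
        # Keeping the own score (a shift by the identity matrix) avoids
        # oscillation on bipartite graphs and leaves the leading
        # eigenvector unchanged.
        new = {
            node: scores[node] + sum(scores[nb] for nb in neighbours)
            for node, neighbours in adjacency.items()
        }
        norm = math.sqrt(sum(value * value for value in new.values()))
        new = {node: value / norm for node, value in new.items()}
        if sum(abs(new[node] - scores[node]) for node in new) < tol:
            return new
        scores = new
    raise RuntimeError("power iteration did not converge")
```

On a toy graph where Harry is linked to Ron, Hermione and Draco, and Ron is linked to Hermione, Harry receives the highest score and Draco the lowest, since Draco's only connection is a single edge to Harry.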

Fig 3: Eigenvector centralities of characters over time

We can make a few key observations regarding the data above. Firstly, we can see that the main characters, Harry Potter, Ron Weasley and Hermione Granger, consistently remain at the top in eigenvector centrality, since the story revolves around them. Their centrality scores decrease slightly as the network becomes more complex and new characters are added over time.

We can also use the centrality scores to find when a character's influence peaks. Let’s take a look at Draco Malfoy. His eigenvector centrality peaked during the 2nd and 3rd books (indexes 1 and 2 above), as Draco Malfoy was Harry Potter’s main nemesis in the early books before Voldemort took over that role.

Similarly, we can see how certain characters become more influential and consequential to the storyline over time. For example, Voldemort had not regained his full strength in the early books and hence remained largely in the background until the 4th book (index 3). Towards the end, the eigenvector centrality shows that his influence on other characters (especially key characters) increased significantly.

Other metrics of centrality (degree centrality, closeness centrality, etc.) can also be used, and they yield similar patterns here. For networks with more complex social interactions, however, different metrics may diverge more than they do in this study. More examples can be found in the notebook attached.

Forming Communities

Another aspect of the network we can analyze is its communities, which are clusters of characters who commonly interact with each other (i.e. social groups). In this project, we detect communities using Louvain Modularity Maximization (see here for documentation).

Firstly, we removed the main agents in the story (Harry, Ron and Hermione), because their role is so central that they interact with almost all the other characters and can obscure the community structure of the overall network. We then removed the 50% of edges with the lowest edge weights as part of our initial cleansing process.
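The two cleansing steps can be sketched as follows. The function name `prune_for_communities` and the edge format (a dict mapping character pairs to weights) are hypothetical, and ties between equal weights are broken arbitrarily here, which may differ from what the notebook does.

```python
def prune_for_communities(weights, hubs, drop_fraction=0.5):
    """Drop hub characters, then the lightest share of the remaining edges.

    weights maps (character_a, character_b) pairs to edge weights.
    """
    # Step 1: remove every edge that touches one of the hub characters.
    remaining = {
        pair: weight
        for pair, weight in weights.items()
        if pair[0] not in hubs and pair[1] not in hubs
    }
    # Step 2: sort edges from lightest to heaviest and discard the first share.
    ordered = sorted(remaining.items(), key=lambda item: item[1])
    num_to_drop = int(len(ordered) * drop_fraction)
    return dict(ordered[num_to_drop:])
```

Calling it with `hubs={"Harry", "Ron", "Hermione"}` and the default `drop_fraction=0.5` mirrors the two steps described above.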

In order to measure the strength of community structure of a network, we can use the modularity metric to gauge how well a graph of nodes and edges can be partitioned into different communities. Fig 4 below shows the modularity score over the different stories.
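For intuition, the modularity of a given partition can be computed directly. The sketch below uses the standard Newman definition for weighted undirected graphs and assumes the graph has at least one edge. The function name and input format are illustrative. Louvain itself, which searches for the partition that maximizes this score, is best left to a library (NetworkX, for example, offers `louvain_communities` and `modularity` in `networkx.algorithms.community`).

```python
from collections import defaultdict


def modularity(weights, communities):
    """Newman modularity of a partition of an undirected weighted graph.

    weights maps (node_a, node_b) pairs to edge weights; communities is a
    list of disjoint sets of nodes covering every node that has an edge.
    """
    community_of = {node: i for i, group in enumerate(communities) for node in group}
    total = sum(weights.values())
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed weighted degree of each community
    for (a, b), weight in weights.items():
        degree[community_of[a]] += weight
        degree[community_of[b]] += weight
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += weight
    # Observed share of internal weight minus the share expected at random.
    return sum(
        internal[c] / total - (degree[c] / (2 * total)) ** 2
        for c in range(len(communities))
    )
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives a modularity of 5/14 (about 0.36), while putting all six nodes in one community gives 0. A higher score therefore signals a partition with denser connections inside communities than between them.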

Fig 4: Changes in modularity of network partitioning over time

From Fig 4 above, we can see that the modularity score has two peaks: a small peak at around index 2 and a large peak at the last index, 7. The first peak is located in the middle of the story and can be considered a turning point in the community structure of the network. It occurs just before the return to power of the villain, Voldemort. This suggests that the Harry Potter series can be split into two parts: the first is mainly about developing Harry Potter’s connections at Hogwarts, and the second is mainly about the interactions between characters during the fight against Voldemort. At the end of the story, at index 7, modularity reaches a new high. This can be an indicator that the communities of characters are fully formed only by the last installment, when the author wraps up the relationships between the different characters and ties up loose ends.

Conclusion

A dynamic, time-varying network is a useful model for understanding how agents interact, because it adds a temporal perspective. In the context of narratives, studying the temporal evolution of a story network can help us discover new insights into how authors introduce and develop characters over time, through network and community structures that can be measured with network metrics. Using temporal networks gives us a macro view of the entire storyline, which can be useful in comprehending such literature. Beyond stories, temporal networks can also be applied to other forms of influence networks, helping us better analyze how the interactions between different agents vary over time.

The code used in this post can be found here.


Zhiheng Jiang

Enthusiast about AI, Data Science and Complex Systems.