Movie Plots Explained: Deriving the Most Profitable Movie Plot from Data

A sentiment-driven method of identifying which kind of plotline maximizes revenue

Image by Igor Ovsyannykov from Pixabay.


In today’s world, stories (in TV shows, movies, books, even commercials) generally adopt one of a few famous plotlines.

For example, there’s the “rags-to-riches” story, in which the main character starts in strife and ends in happiness (e.g. Hansel and Gretel, Cinderella, Austen’s Pride and Prejudice, etc.).

There’s the “man-in-the-hole” story (one of the most common), in which the main character starts out happy, goes through pain and loss to solve a problem, and ultimately returns to happiness (e.g. Disney’s Finding Nemo, Tolkien’s The Hobbit, etc.). Some stories don’t fit any single pattern but still share characteristics with these categories.

Whatever shape their stories take, movies capture everyone’s attention. Especially in 2020 (and so far in 2021), movies have been how people keep themselves entertained! With all the attention on film, a relevant question to ask is:

Which type of story arc is the most successful? What is the “ideal” story plot that will make money in the film industry?

To find out, I pulled some movie metadata (revenue, title, etc.) from a dataset on Kaggle and found another containing Wikipedia’s plot descriptions for about 35,000 movies. You can find the code for this project on my GitHub, here.

Digitizing the Emotions of a Movie

My approach was to use each movie’s plot summary to estimate the overall emotion at a particular point in the movie. More specifically, I wanted to take the full plot summary and run sentiment analysis sentence by sentence, so that the sequence of scores traces out the story curve. For example, here’s an excerpt of Cinderella’s plot description:

Cinderella is a kind young woman who lives with her wicked stepmother and ugly stepsisters. They abuse her and use her as the house maid. Cinderella thinks she’s all alone in the world, but doesn’t know a fairy godmother is constantly helping her.

A sentiment analysis program might look at these brief sentences and interpret the first two as clearly negative and the third as negative with a bit of upside (since “help” is a key part of the sentence).

We can therefore conclude that, at this point in the movie, things aren’t looking so great for Cinderella.
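
As a rough illustration of sentence-by-sentence scoring, here is a minimal Python sketch. It is not the project’s actual code: the tiny word list `TOY_LEXICON`, its scores, and the helper names are all invented for this example, and a real run would swap in a proper sentiment model or an established lexicon.

```python
import re

# Toy lexicon invented for illustration only (an assumption, not the
# sentiment tool used in the project). Positive words score above zero,
# negative words below zero.
TOY_LEXICON = {
    "kind": 1.0, "helping": 1.0, "happily": 1.0,
    "wicked": -1.0, "ugly": -1.0, "abuse": -1.0, "alone": -1.5,
}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def score_sentence(sentence):
    """Sum the lexicon scores of the words in a sentence (unknown words score 0)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(TOY_LEXICON.get(word, 0.0) for word in words)

def plot_to_scores(plot_text):
    """Turn a plot summary into one sentiment score per sentence."""
    return [score_sentence(s) for s in split_sentences(plot_text)]

excerpt = (
    "Cinderella is a kind young woman who lives with her wicked stepmother "
    "and ugly stepsisters. They abuse her and use her as the house maid. "
    "Cinderella thinks she's all alone in the world, but doesn't know a "
    "fairy godmother is constantly helping her."
)

scores = plot_to_scores(excerpt)
print(scores)  # [-1.0, -1.0, -0.5]
```

With this toy word list, the first two sentences come out negative and the third comes out mildly negative because “helping” offsets part of “alone”, which mirrors the reading described above.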

My aim was to construct a curve for each movie using this approach, then take an average of the curves weighted by how successful each film was (how much revenue it produced). This way, I pay more attention to the plots that performed well. The result can also be interpreted as the kind of movie most people would enjoy watching.

After I used sentiment analysis on the plot descriptions, I ended up with numbers that traced out an approximate plot of the movie. For most of the films, it worked really well. Here’s the plot curve for Cinderella (1950):

Cinderella’s plot curve. Cinderella starts poor (1), discovers she has a fairy godmother (2), leaves the prince at the ball when the clock strikes midnight (losing her slipper) (3), then finally gets together with Prince Charming for a happily ever after (4). Image by author.

Note that the negative numbers on the y-axis correspond to negative emotion. Cinderella starts negative, becomes positive, turns negative once more, then finally ends positive.
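
Plot summaries differ in length, so curves built from them have different numbers of points. One way to make them comparable is to stretch each one onto a common 0-to-1 timeline. The sketch below assumes linear interpolation and an optional moving-average smoother; the function names, the default of 100 points, and the window size are illustrative choices, not details taken from the project.

```python
import numpy as np

def resample_curve(scores, n_points=100):
    """Interpolate sentence scores onto n_points evenly spaced positions
    between 0 (start of the film) and 1 (end of the film)."""
    scores = np.asarray(scores, dtype=float)
    if scores.size == 0:
        raise ValueError("need at least one sentence score")
    if scores.size == 1:
        return np.full(n_points, scores[0])
    old_x = np.linspace(0.0, 1.0, scores.size)
    new_x = np.linspace(0.0, 1.0, n_points)
    return np.interp(new_x, old_x, scores)

def smooth(curve, window=5):
    """Moving average with an odd window; the ends are padded with the
    edge values so the output has the same length as the input."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    padded = np.pad(np.asarray(curve, dtype=float), window // 2, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

# Made-up scores with a Cinderella-like shape: negative, positive,
# negative, positive.
cinderella_like = [-1.0, 1.0, -1.0, 1.0]
curve = resample_curve(cinderella_like, n_points=7)
smoothed = smooth(curve, window=3)
```

Once every film’s curve has the same number of points, the curves can be stacked into a single array and averaged position by position.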

Another, more complicated example is Twelve Angry Men (1957):

Twelve Angry Men’s plot curve. The movie starts fairly positive (1), then runs into issues as people debate (2), change sides (3) (4), then finally reach the resolution (5) (6). Image by author.

Results: The “Ideal” Movie

Taking a simple average of all the plots yields the most common movie plotline throughout history:

The average movie curve demonstrates a “man-in-the-hole” plot. Image by author.

Not surprisingly, it’s the “man-in-the-hole” storyline.

This makes sense: it’s one of the easier arcs to create, and it leaves the audience feeling complete and content by the end (especially useful for children’s movies!).

Some more examples of this you might know: Disney’s Monsters Inc., Lewis Carroll’s Alice in Wonderland, and The Day After Tomorrow (2004).

But the question I was trying to answer wasn’t which arc was the most common — it was which arc was the most successful.

I weighted each movie by its total revenue (essentially paying less attention to movies that didn’t perform so well and more attention to movies that were successful) and recalculated the movie plot.

Weighted average movie curves by revenue. The arc is still a variant of the “man-in-the-hole”! Image by author.
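
The difference between the simple average and the revenue-weighted average can be sketched in a few lines of NumPy. The three curves and the revenue figures below are made up for illustration, and the sketch assumes every curve has already been brought to the same length.

```python
import numpy as np

# Hypothetical data: each row is one film's emotional curve, sampled at
# the same five points from start to end.
curves = np.array([
    [ 0.5, -0.5, -1.0,  0.0,  1.0],   # "man-in-the-hole"-like shape
    [-1.0, -0.5,  0.0,  0.5,  1.0],   # "rags-to-riches"-like shape
    [ 1.0,  0.5,  0.0, -0.5, -1.0],   # steady decline
])

# Made-up revenue per film, in arbitrary units.
revenue = np.array([300.0, 100.0, 0.0])

# Simple average: every film counts equally (the "most common" arc).
simple_mean = curves.mean(axis=0)

# Weighted average: each film counts in proportion to its revenue
# (the "most successful" arc). A film with zero revenue drops out.
weighted_mean = np.average(curves, axis=0, weights=revenue)

print(weighted_mean)
```

In this toy example the weighted curve is 0.75 times the first film plus 0.25 times the second, so the highest-earning film dominates the shape while the zero-revenue film contributes nothing.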

The “ideal” story arc is… the “man-in-the-hole”! This plotline is extremely common, so it accounts for plenty of failures as well as successes (and more failures than successes, since doing well in film is a tough task). The fact that it still comes out on top after weighting by revenue speaks volumes about the power of a simple plot brought to life by other themes and characters.

There’s a reason so many of the ancient myths all over the world (from the Greeks to the Ancient Indians to the Aztecs) are based primarily on the man-in-the-hole storyline.

I personally find it fascinating that one of the oldest and most popular story types continues to live on in our modern world, even through a multitude of mediums.

Written by Abhinav Raghunathan

Edited by Alexander Fleiss
