Analyzing the Highest-Grossing Movies of All Time
Let's see what are some secrets of the highest-grossing movies of all time.
Introduction
In my last post of scraping and cleaning the data from the IMDb website, I introduced ways to collect desired variables and put them together as a dataframe from html code. However, that’s just data collecting; not even the beginning of a real data analysis.
In this post, I am going to analyze the dataframe I gather last time using several different charts in seaborn. We will see what insteresting or suprising things we can find.
This is the IMDb dataframe we collected last time
IMDb vs Box Office and Rating
We are all curious at the relationship between IMDb, Box Office, and Rating. There are several insteresting questions we are going to explore here in this section. These are the questions like Is a high IMDb a guarantee of a high Box Office? Or what kind of rating has the highest Box Office and IMDb? And the lowest Box Office and IMDb? Do people really love PG-13 and R more than G and PG?
Let’s see what we found
According to the regression chart above, IMDb and Box Office are strongly positively related. That means one will increase when the other increase. Therefore, we can clearly answer the first question we asked. This is the question we asked: Is a high IMDb a guarantee of a high Box Office? Yes, in most cases, a high IMDb is almost always a garantee of a high Box Office. This definitely makes sense, people are more likely go to see a particular movie if others give it a good rating.
Here is a both interesting and surprising fact I found which verify a thought I always have:
Old movies are better!
Isn’t that shocking? We can see that IMDb is decreasing over time by the trend line chart above. There are definitely ups and downs, but the overall trend is going down… No wonder all the movies I love are from at least a decade ago like “The Shawshank Redemption(1994)” and “Schindler’s List(1993).” The fact is, in the top 10 IMDb movies, 8 of them are at least a decade old.
The last one
Several things I found in this scatter plot:
- The highest-grossing movies are mostly rated as PG and PG-13
- The No.1 highest-grossing movie’s IMDb is below a 8.
- There is not a single R-rated movie’s Box Office is below 200M
- PG-13 has the most spread IMDb and Box Office among all ratings</ul>
Rating vs IMDb and Box Office
Let’s continue our findings! We still have a couple of questions to answer. Let’s not forget our questions: First, what kind of rating has the highest Box Office and IMDb? Second,Do people really love PG-13 and R more than G and PG?
The boxplot above will help us find out the answer of our first question. According to the plot, the mean IMDb of G, PG, PG-13, and R are 7.7, 7.1, 7.2, and 7.9. People really tend to love R-rated movies! Of course it’s not accurate to say that is absolutely right, but at least from this chart we can see that is the tendency.
According to boxplot1 and boxplot2, my conclusion is people do tend to got to movies and give it a higher rating if it’s a G or R rated movie. I think it’s very interesting, if I were a producer or a director, now I know what kind of movie will make more money :)
The above heat map sums up the whole analysis pretty well! We can see that the ranking and the box office have the strongest relationship, although it’s a negtive relationship. The ranking is according to the worldwide grossing so far, no.1 has the highest worldwide grossing. Therefore, it makes prefect sense to say that the higher your box office is, the higher your gross ranking is going to be.
Conclusion
There are my main findings:
- The higher the IMDb, the higher the box office, vice versa.
- Generally, old movies’ IMDb are higher.
- The highest-grossing movies are mostly rated as PG and PG-13.
- People love R-rated movies and G-rated movies more than PG and PG-13.</ul>
During the EDA, We really found a lot of interesting things. Some of them are more of a common sense of IMDb, rating, and box office. Others are surprising and we didn’t expect that. The important thing is we have done the EDA successfully and beautifully. I hope you all enjoy this post and found it useful.
Here’s a link to my github repo link