Introduction


Thank you for walking along with me on this data analysis journey. I hope you all enjoy my posts and are learning something cool about data! My motivation to start this journey comes from my love for great movies. I wanted to add more awesome, high-rating movies to my pocket movie list. That raised a question in my mind: does a high box office necessarily mean a movie is also highly rated? To answer this question and the series of questions that follow, I collected the data and built a dataframe using the BeautifulSoup and Pandas libraries. I then performed an Exploratory Data Analysis (EDA) on that dataframe and found several interesting and surprising facts that helped me better understand the relationship between IMDb score, rating, box office, runtime, and more. For more details, please check my previous blog posts.

The Data Story


In my previous post, I claimed that the EDA revealed several facts about the relationship between IMDb score, box office, and rating. In this post, I want to further support that claim and tell a story, starting with a graph that I think summarizes it. We can observe a few things in the plot. First, if we drew a line roughly through all the dots, it would slope upward, which is a positive correlation: as the IMDb score goes up, the box office tends to go up as well. Second, most of the dots are either blue or orange, which stand for PG-13 or PG; this means the highest-grossing movies are mostly rated PG or PG-13. The third is less obvious because there are so many dots, but the R and G dots sit mostly to the right of IMDb 7; this means people generally give R or G movies a fairly high IMDb score.

[Figure: scatter plot of box office against IMDb score, with dots colored by rating]
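
If you want to check the three observations with numbers rather than by eye, here is a minimal sketch in plain Python. Please note that the rows are made-up placeholders, not the scraped dataset, and that the names `movies`, `pearson`, `rating_counts`, and `share_above_7` are hypothetical ones chosen just for this illustration. It is one possible way to do the check, not the exact code behind the plot.

```python
from collections import Counter
from math import sqrt

# Made-up sample rows for illustration only; NOT the scraped dataset.
# Each tuple is (imdb_score, box_office_in_millions, rating).
movies = [
    (6.1, 310.0, "PG-13"),
    (6.8, 420.0, "PG"),
    (7.3, 510.0, "PG-13"),
    (7.9, 640.0, "PG-13"),
    (8.2, 380.0, "R"),
    (7.6, 290.0, "G"),
    (8.4, 720.0, "PG-13"),
    (7.0, 450.0, "PG"),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Observation 1: sign of the correlation between IMDb score and box office.
scores = [m[0] for m in movies]
gross = [m[1] for m in movies]
r = pearson(scores, gross)

# Observation 2: how many movies fall under each rating.
rating_counts = Counter(m[2] for m in movies)

# Observation 3: share of R and G movies with an IMDb score above 7.
share_above_7 = {}
for rating in ("R", "G"):
    group = [m for m in movies if m[2] == rating]
    above = [m for m in group if m[0] > 7]
    share_above_7[rating] = len(above) / len(group)

print("correlation:", round(r, 3))
print("movies per rating:", dict(rating_counts))
print("share above IMDb 7:", share_above_7)
```

A positive `r` matches the upward-sloping line in the first observation. With the real dataframe, the same three checks could be run on its IMDb, box office, and rating columns instead of these placeholder tuples.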

The Story Doesn’t End Here


There are far more facts and secrets to be discovered when you dig deeper into the dataframe. The data analysis I performed and the story I told are only the tip of a huge iceberg. I hope you will spend more time on this fascinating data set and uncover more than I did; if you do, please comment below so we can learn from each other. I really appreciate your time, and see you again soon :)

Link to GitHub Repository