Gearing up for the Microsoft Data Insights Summit, Microsoft hosted a Power BI Data Analytics Challenge. We entered the “Just For Fun” category with a quickly assembled dashboard of analytics about The Simpsons.
The data was sourced from Kaggle – The Simpsons by the Data. The data is driven by script lines. In addition, we supplemented the data with images and bios from http://www.simpsonsworld.com/. The Simpsons Data Analysis dashboard allows users to explore character profiles, season appearances, and dialog sentiment.
The Character Report tab assesses who’s talking and where they are talking, measured by word count. Images and bios enrich the character narratives. There are over 6,000 characters, so we’ve grouped the Top 10 Talking Characters by word count to narrow the analysis to the most prevalent characters in the series.
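As a rough illustration of the Top 10 grouping described above, the sketch below totals words per character and keeps the highest talkers. It is a minimal Python sketch, not the dashboard's actual Power BI logic; the sample rows and the `top_talkers` helper are hypothetical, and the real Kaggle dataset has its own column layout.

```python
from collections import Counter

# Hypothetical sample rows: (character, spoken line).
# The real dataset's columns are not reproduced here.
script_lines = [
    ("Homer Simpson", "Mmm, donuts."),
    ("Marge Simpson", "Homer, please be careful."),
    ("Homer Simpson", "Why you little!"),
    ("Ned Flanders", "Hi-diddly-ho, neighborino!"),
]

def top_talkers(lines, n=10):
    """Return the n characters with the highest total word count."""
    counts = Counter()
    for character, text in lines:
        # Whitespace splitting is a simplification of real word counting.
        counts[character] += len(text.split())
    return counts.most_common(n)

top = top_talkers(script_lines, n=2)
print(top)
```

With the full script data, calling `top_talkers(lines)` with the default `n=10` would give the kind of shortlist used to filter the report.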
What’s interesting here is that if we click Ned Flanders on the Word Count by Character bar chart, the Word Count by Locations bubble chart shows he has a higher word count at the Simpson Home than at his own home (Flanders Home). We can also see his image below and his occupation, Left Handed Items Merchant.
The next tab on the dashboard is Character Appearances. Here, the analytics provide Season Appearances by character and show the trend of Speaking Lines by Season. The heat map provides Location Count by season for script lines.
It appears that Homer has the most speaking lines in Season 10. If we click this data point on the Speaking Lines by Season visualization, the location count heat map highlights where he has the most speaking lines: it appears to be Alec and Kim’s House.
Lastly, the Character Sentiment tab analyzes speaking lines at detailed levels of sentiment by character in the Character Sentiment Breakdown visualization. Sentiment was derived using R and normalized so that lines could be grouped into finer levels of negative and positive. The Sentiment by Season visualization shows how overall sentiment trends across seasons, and it dynamically populates a word cloud of the most frequently used words in script lines.
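The normalization and level grouping can be pictured with a small sketch. The dashboard's sentiment was derived in R and the exact method is not described here, so this Python version is only an assumption-laden illustration: the scaling rule, the five level names, and the 0.5 thresholds are all hypothetical.

```python
def normalize(scores):
    """Scale raw scores to [-1, 1] by dividing by the largest absolute value."""
    peak = max((abs(s) for s in scores), default=0)
    if peak == 0:
        return [0.0 for _ in scores]
    return [s / peak for s in scores]

def sentiment_level(score):
    """Map a normalized score to one of five hypothetical levels."""
    if score <= -0.5:
        return "very negative"
    if score < 0:
        return "negative"
    if score == 0:
        return "neutral"
    if score < 0.5:
        return "positive"
    return "very positive"

# Hypothetical raw per-line sentiment scores.
raw_scores = [-4, -1, 0, 1, 4]
levels = [sentiment_level(s) for s in normalize(raw_scores)]
print(levels)
```

Splitting each side of neutral into two bands is one simple way to get "more levels" than a plain positive/negative label; other cut points would work equally well.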
What we found interesting here was that Mr. Burns and Ned Flanders have some of the highest positive sentiment among the top ten characters, at almost equal levels. Mr. Burns does have the sinister catch phrase “Excellent…”, which could be driving the higher sentiment: the sentiment analysis reads it as a positive word even though, in context, it is sinister rather than affirming.
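To see why a word-level approach might misread the catch phrase, consider a toy lexicon-based scorer. This is a sketch under stated assumptions, not the method used for the dashboard: the four-word `LEXICON` and the `lexicon_score` helper are hypothetical, and real sentiment lexicons are far larger.

```python
import re

# Tiny hypothetical lexicon mapping words to a polarity of +1 or -1.
LEXICON = {"excellent": 1, "good": 1, "terrible": -1, "bad": -1}

def lexicon_score(line):
    """Sum the polarity of each known word, ignoring context entirely."""
    words = re.findall(r"[a-z']+", line.lower())
    return sum(LEXICON.get(word, 0) for word in words)

burns_score = lexicon_score("Excellent…")
print(burns_score)
```

Because the scorer looks only at individual words, “Excellent…” counts as positive no matter who says it or why, which is consistent with the effect suspected above.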
If we had more time, we would have added audible catch phrases and interactive geo locations. We also wanted to explore character interactions, identifying when characters were talking to each other and where, given the sentiment of the dialog.
Explore for yourself here. Can you find anything interesting about the characters, their appearances, and dialog sentiment?
Special thanks to Hana Rizvić, Le Bui, Charles Yorek, and Dusty Mangum for their contributions.