Abbey Code

A Picture Is Worth 1000 Words. In This Case, I Decided To Stick With 400.

The Simpsons in Abbey Road by rastaman77 on DeviantArt

Introduction

Thanks for visiting my blog today!

For those of you who may not know, I love music and also play a couple of instruments. One of my top 3 favorite bands of all time is The Beatles (#1 is Chumbawamba). The Beatles need no introduction; they are widely regarded as the most influential musical group in history.

Today’s blog is about The Beatles and their lyrics. Later on, I will do some topic modeling and discuss how I might run an unsupervised learning algorithm on Beatles lyrics. For today, however, I’m going to share how I created a customized word cloud. That may not sound terribly exciting, but trust me: this will be fun and useful.

Data Collection

First things first: we need some data, since this is part of a larger topic modeling project. I’m sure I could find a data repository or two with Beatles lyrics, but I decided to use web scraping instead. I found a lyrics website called lyricsfreak.com (https://www.lyricsfreak.com/b/beatles/), scraped the links for every song, and then visited each link to extract the lyrics with more web scraping and put them into a data frame. Interestingly, a couple of the “songs” were literally just short speeches (thankfully, the site labeled them as speeches).

This source has some limitations. The website sometimes cuts repeated lyrics. I first noticed this when I was checking the lyrics of “All My Loving.” It’s a great song that hopefully you’ve heard before. All the unique phrases and groups of lyrics in the song are listed, but some of them aren’t repeated even though the song itself repeats them. Also, the site doesn’t have every song. It was a little tricky to find a good lyrics website that works well with web scraping, so I will be sticking with a slightly limited amount of data for this blog. I still think my results will be highly representative; you’ll see what I mean if you know The Beatles.

So, a quick recap: I web scraped all my lyrics. More importantly, the process I built is fairly robust, and I now believe I can quickly and easily scale it to scrape a whole discography’s worth of lyrics from any other band. We’ll see if anything comes of that in the future.
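Here is a rough sketch of the link-gathering pass using only Python’s standard library. It is an illustration rather than the exact code behind this post: it assumes song pages live under the same /b/beatles/ path as the index page, and the names LinkCollector, song_links, and fetch are hypothetical. The second pass (fetching each link and pulling the lyrics out of the page) depends on the site’s HTML markup, which I am not reproducing here, so it is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import Request, urlopen

# Artist index page from the post; song pages are ASSUMED to live under it.
BASE_URL = "https://www.lyricsfreak.com/b/beatles/"


class LinkCollector(HTMLParser):
    """Collect unique absolute links that point inside the artist directory."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links and drop any "#fragment" part.
        url = urldefrag(urljoin(self.base_url, href)).url
        # Assumption: song pages share the /b/beatles/ prefix; skip the index itself.
        if url.startswith(self.base_url) and url != self.base_url and url not in self.links:
            self.links.append(url)


def song_links(index_html, base_url=BASE_URL):
    """Return the song-page URLs found in the artist index HTML."""
    parser = LinkCollector(base_url)
    parser.feed(index_html)
    parser.close()
    return parser.links


def fetch(url, timeout=10):
    """Download a page and decode it as UTF-8 text."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

In practice you would call fetch(BASE_URL) once, pass the HTML to song_links, and then loop over the results with a short time.sleep between requests to stay polite to the site.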

Data Cleaning

I mentioned speeches. The first step in this process was to get rid of those. I’ll show you all the libraries I used (so keep them in mind for later) and what these speeches looked like:

Imports:

Some songs:

One speech:

Here are all the speeches:
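Filtering out the speeches comes down to one condition per row. The sketch below assumes each scraped song is a dictionary with a title field and a lyrics field, and that spoken tracks carry the word “speech” in their title; the field names, the labeling rule, and the drop_speeches helper are assumptions for illustration. With pandas, the same filter would be df[~df["title"].str.contains("speech", case=False)].

```python
def drop_speeches(songs):
    """Keep only real songs.

    Assumption: spoken tracks are labeled with the word "speech" in their title.
    """
    return [song for song in songs if "speech" not in song["title"].lower()]


# Hypothetical sample rows, not real scraped data.
songs = [
    {"title": "A Day In The Life", "lyrics": ["first line", "second line"]},
    {"title": "Example Speech", "lyrics": ["spoken words"]},
]
print([song["title"] for song in drop_speeches(songs)])  # ['A Day In The Life']
```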

Let’s take a look at one song. I picked “A Day In The Life” because it was the first non-speech and is also widely considered the greatest Beatles song ever written, given its advanced composition. (I know a bunch of readers are going to disagree and say that a song like “Hey Jude” or “Strawberry Fields Forever” is better, or contend that early Beatles music is better than later Beatles music. You’re entitled to your opinion; this blog is not an authoritative essay on what the best Beatles song may be.)

By the way, the last chord in “A Day In The Life” (https://www.youtube.com/watch?v=YSGHER4BWME&ab_channel=TheBeatles-Topic, around the 4:19 mark) is an E major chord, even though the song is in the key of G major. An E major chord (E, G#, and B) needs a G# note, which the G major scale does not have. We call that move from E minor to E major (raising the G note to a G#) a “Picardy third” in music theory, and that difference is part of what makes this last chord so magical. If you’ve never heard the song, you’re missing out on possibly the most famous single chord in Beatles history. The other contender is the first chord (and literally the first thing you hear) in “A Hard Day’s Night.” Google says this other chord is an Fadd9: an F major chord (F, A, and C) with a high G note added. (Special thanks to https://www.youtube.com/watch?v=jGaNdKabvQ4&ab_channel=DavidBennettPiano.)

OK, hopefully that made a little bit of sense or at least sparked some intrigue. Back to data cleaning: you may notice that the lyrics are stored as a list. Here is how I dealt with that problem:
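Joining each song’s list of lines back into a single string is enough to fix this. Here is a small sketch; lyrics_to_text is a hypothetical name, not necessarily what I used. It also covers a common gotcha that may or may not apply to your data: if a data frame is saved to CSV and read back, list cells come back as strings like "['line one', 'line two']", and ast.literal_eval can safely turn those back into real lists.

```python
import ast


def lyrics_to_text(lyrics):
    """Turn a song's lyrics (a list of lines) into one string.

    If the list was saved to CSV and read back, it arrives as a string such as
    "['line one', 'line two']"; ast.literal_eval turns it back into a list.
    """
    if isinstance(lyrics, str) and lyrics.startswith("["):
        lyrics = ast.literal_eval(lyrics)
    if isinstance(lyrics, str):
        return lyrics
    # Drop empty lines and stray whitespace, then join with single spaces.
    return " ".join(line.strip() for line in lyrics if line.strip())


print(lyrics_to_text(["First line", "", "  second line  "]))  # First line second line
```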

Here is a cleaning function designed to remove annoying characters like digits and punctuation:
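A cleaning function along these lines lowercases the text and strips digits and punctuation with regular expressions. It is a sketch rather than my exact function; note that it also removes curly apostrophes, so “don’t” becomes “dont”, which is usually fine for counting words.

```python
import re


def clean_text(text):
    """Lowercase text and strip punctuation (including curly quotes) and digits."""
    text = text.lower()
    text = re.sub(r"[^\w\s]|[\d_]", "", text)  # keep only letters and whitespace
    return re.sub(r"\s+", " ", text).strip()  # collapse leftover runs of spaces


print(clean_text("Love, love, love! (1967) Don’t   stop"))  # love love love dont stop
```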

Here, I combined every lyric into a list:
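Once each song is a cleaned string, combining everything is a one-line flattening step. A minimal sketch, assuming the cleaned lyrics are held in a list of strings (all_words is a hypothetical name):

```python
def all_words(cleaned_songs):
    """Flatten a list of cleaned song strings into one list of words."""
    return [word for song in cleaned_songs for word in song.split()]


print(all_words(["love me do", "help"]))  # ['love', 'me', 'do', 'help']
```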

Next, I created a word count dictionary:
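Python’s collections.Counter builds exactly this kind of dictionary in one call; a plain loop using counts.get(word, 0) + 1 would do the same job. A minimal sketch (word_counts is a hypothetical helper):

```python
from collections import Counter


def word_counts(words):
    """Map each word to how many times it appears across all lyrics."""
    return dict(Counter(words))


counts = word_counts(["love", "me", "do", "love"])
print(counts)  # {'love': 2, 'me': 1, 'do': 1}
```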

Earlier, I imported a list of stop words (words like “the” that don’t add much meaning). Here, I removed them and created a new list:
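Here is a sketch of the stop word step. The small STOP_WORDS set is a stand-in for illustration; the post used a full list imported earlier, and while NLTK’s stopwords.words("english") is a common source, that specific choice is an assumption. Recounting after filtering keeps the frequency dictionary in sync with the new list.

```python
from collections import Counter

# Small stand-in list. A real project would use a full stop word list
# (for example NLTK's stopwords.words("english"); that choice is an assumption).
STOP_WORDS = {"the", "a", "and", "i", "you", "to", "it", "in", "me", "my"}


def remove_stop_words(words, stop_words=STOP_WORDS):
    """Return a new word list with stop words filtered out."""
    return [word for word in words if word not in stop_words]


words = ["i", "love", "you", "and", "you", "love", "me"]
filtered = remove_stop_words(words)
counts = Counter(filtered)  # recount so the dictionary matches the new list
print(filtered)  # ['love', 'love']
```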

That’s basically it.

Word Cloud

So here is the basic word cloud (notice that we are using the generate_from_frequencies function so it can work directly from the dictionary created above):

Looks pretty simple… and boring. Let’s try to make this a bit cooler. Wouldn’t it be nice to work in an actual Beatles theme? How about the iconic Abbey Road album cover? Here is the picture you’ve likely seen before:

Why The Beatles' Abbey Road Album Was Streets Ahead Of Its Time

First, I decided to find a silhouette online that is a bit more black-and-white. I found this picture:

To actually use this picture, we need to convert it to a NumPy array. That array is what’s called a mask: it tells the word cloud which parts of the image it can fill with words. I’ll copy the file path into Python to get started.
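One way to do that conversion is with Pillow and NumPy. This is a sketch under my own assumptions: the 128 threshold, the handling of transparent backgrounds, and the function names are illustrative. The key detail is that WordCloud treats pure white (255) as “keep out,” so the silhouette needs to end up dark on a white background. Snapping every pixel to either 0 or 255 is one way to tidy up a messy conversion.

```python
import numpy as np
from PIL import Image


def image_to_mask(img, threshold=128):
    """Turn a PIL image into a WordCloud mask.

    WordCloud leaves pure-white (255) pixels empty and draws words everywhere
    else, so dark silhouette pixels become 0 and everything else becomes 255.
    """
    if img.mode in ("RGBA", "LA", "P"):
        # Put transparent images on white so see-through areas count as background.
        background = Image.new("RGBA", img.size, "white")
        img = Image.alpha_composite(background, img.convert("RGBA"))
    gray = np.array(img.convert("L"))  # 2-D uint8 array, 0 = black
    return np.where(gray < threshold, 0, 255).astype(np.uint8)


def load_mask(path, threshold=128):
    """Open the silhouette image at `path` and return it as a mask array."""
    with Image.open(path) as img:
        return image_to_mask(img, threshold)
```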

What does it look like now?

It’s a bit messy, but…

And one step further…

The wordcloud library in Python lets you pass in a mask to shape your word cloud. In the code below, we have basically the same word cloud as before, with a mask added.

Output:

I arrived at the magic number of 400 words through trial and error. I needed enough words to fill out the logo and capture the letters of “The Beatles” and the individual silhouettes, while avoiding so many words that each one became hard to read and identify.

In terms of design aesthetics, we are almost done here, but the color scheme doesn’t look great. Conveniently, word clouds let you pass in a colormap.

Output:

That looks a lot better now that we are using a new colormap. Interestingly, the word that stands out the most is “love.” The Beatles were well known for writing love songs, and “love” is in fact the most common word after data cleaning.

Conclusion

Word clouds are fun and effective, and they can be used in many different ways. If you want to customize your own, find an image or two that stand out to you, copy the file path (or move the image into your project folder), and then work out the details in the code. The process is pretty scalable. In this blog, we walked through web scraping lyrics, cleaning the data, and creating an exciting and unique visualization at the end.

Thanks for reading!

Sources and further reading

(https://towardsdatascience.com/create-word-cloud-into-any-shape-you-want-using-python-d0b88834bc32)