
Did Bach use seventh chords?

Bach's use of chords, mainly major and minor chords and the related intervals of those chords, agrees with the Baroque music form. More complicated chords, such as 7th chords, were considered dissonant and were avoided in the Baroque era.

Bach’s favorite note

I always liked classical music. As a child, I spent about 10 years studying music theory and playing the violin. My great passion, of course, is science and data. I always tried to find ways to combine the two, to analyze classical music using data science. I didn’t want to analyze the sound of classical works, I wanted to focus my analysis on music theory. I wanted to study how a musical composer composed the music, not how it is played. In essence, I was looking for a way to transform a music score to structured data. I chose the music of Bach for several reasons. First, Bach is considered a musical genius and one of the greatest composers of classical music, so his music is always interesting to analyze. Second, Bach has written more than 1000 music pieces, so there was a good amount of data to be analyzed. Third, Bach’s music is written following the rules of the tonal system and counterpoint very strictly, which makes the music more structured and easier to analyze from a chord analysis perspective. Finally, I really enjoy Bach’s music so there was some personal preference as well. In this article, I don’t go into the technical details of the code behind this analysis, but if interested, the reader can find the full code here.

The data

My first problem was obtaining data that would let me analyze the music theory, as explained above. Audio files such as .wav and .aiff, or compressed audio files such as .wma and .mp3, were not suitable for this kind of analysis. Going from raw audio of a full orchestra to specific notes, chords, keys, etc. would be a much more difficult project, one that I wouldn’t know how to finish in a reasonable amount of time. There is a format, though, that could be used for my analysis: the .midi format, which is used to control electronic or digital instruments. With .midi, the note (pitch) being played, the duration of the note, and the instrument (channel) playing the note are directly specified. Furthermore, I had an easy way to transform .midi files to .csv files, i.e. structured data, at which point I could analyze them in familiar ways, like any other structured data.

I discovered The Mutopia Project, which has a large collection of classical music works, all in the public domain. I scraped the site for works of J.S. Bach. The site has 417 works of Bach, but some have multiple files, so the total number of files was 529. I transformed all the .midi files to .csv files using code I found here. After I had the .csv files, I could start my analysis. The analysis was done in Python and my code can be found here.

Counting music notes

The first question about a musical piece I wanted to answer was which notes were used to compose it. Which note was the most used? Which note was the least used? Is the distribution of notes more or less uniform, or are some notes used a lot more than others?

Counting the occurrences of a note, in other words the events in which a new note appears, could give us such a distribution. On the other hand, some notes may last for a full bar, while others only appear as sixteenth notes during a fast passage. I believe that the duration of a note should act as a weight in the distribution, so that longer notes count more than shorter ones.

In a .midi file, an instrument plays one or multiple notes for a specific amount of time. The time the note starts playing and the time the note stops playing are recorded. This gives us the duration of the note, which is measured in clock ticks. It can be converted to a conventional note duration with the conversion factor (how many ticks there are in a quarter note), which is also included in the file. Thus, we can extract the duration of notes in conventional music notation, i.e. quarter notes, eighth notes, etc. Adding up the durations of all the times a note is played in a piece gives us the total duration of that note. We can do that for every note, so in the end we can compare the total durations of all notes to see which one was played the most.

As an example, we look at Bach’s Invention 1 in C major (BWV 772). The sheet music can be found here. We can see that the piece is written in C major, which has no sharps or flats, so the natural notes all have longer total durations. Also, C, the key’s tonic, is the most used note, as expected, with the fifth (G) and the third (E) in second and third place. Even though the piece is only 22 bars long, there is modulation to keys such as G major, A minor, and F major, which is where the B flat, F sharp and C sharp come from.

This can be extended to all of Bach’s works in our dataset. The values of each note in each piece are added, and we get the total duration of every note. The results can be seen in the following graph.

The note D is the note most used by Bach, with very slightly higher usage than A, which comes second, and the rest of the natural notes following. In the graph, we plot the natural notes, the flats, and the sharps separately. This is done to demonstrate a striking observation. The natural notes certainly show some variability, with the most played note (D) having almost double the duration of the least played note (F). Nevertheless, the distribution is similar to a uniform distribution. On the other hand, the duration of the sharp notes varies greatly, with the most played note (F sharp) having a duration 46.5 times higher than the least played note (B sharp). The same applies to the flats. What is even more interesting, though, is that the order of the sharps from most to least played matches exactly the order of sharps in key signature notation: F♯, C♯, G♯, D♯, A♯, E♯, B♯. In the same fashion, the order of flats matches exactly the order of flats in key signature notation: B♭, E♭, A♭, D♭, G♭, C♭, F♭. Finally, the double sharp and double flat notes have durations hundreds of times lower than those of the natural notes.
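
The duration-weighted count described in this section can be sketched in a few lines of Python. This is only an illustration, not the code behind the article: the function name `note_durations` and the input layout (a list of `(note_name, start_tick, end_tick)` tuples) are assumptions, and the real .csv files would first need to be parsed into that shape.

```python
from collections import defaultdict


def note_durations(events, ticks_per_quarter):
    """Sum the duration of every note name, measured in quarter notes.

    `events` is a hypothetical list of (note_name, start_tick, end_tick)
    tuples; `ticks_per_quarter` is the conversion factor stored in the file.
    """
    totals = defaultdict(float)
    for name, start_tick, end_tick in events:
        # Longer notes contribute more: the duration acts as the weight.
        totals[name] += (end_tick - start_tick) / ticks_per_quarter
    return dict(totals)


# Toy example: a quarter-note C, an eighth-note E, and an eighth-note C.
toy_events = [("C", 0, 480), ("E", 480, 720), ("C", 720, 960)]
toy_totals = note_durations(toy_events, ticks_per_quarter=480)
```

Summing the per-piece dictionaries over all files would then give the totals for the whole dataset.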

Counting chords

So far, we have been counting individual notes. In music, though, several notes can sound simultaneously, forming chords. This happens when various instruments or voices play different notes at the same time, or when the same instrument plays more than one note at the same time. Here, we will define a chord as any number of notes played together. Thus, three notes played together, e.g. C, E, G, are a chord, but two notes played together (E, G) or even one note by itself (E) are also considered a chord. Also, we will consider notes in different octaves as the same note and only count it once in the chord. For example, if C4, E4, C5, and E5 sound together, we will consider this as the (C, E) chord. Additionally, the order of the notes within a chord does not matter. For example, the chord C4, E4, G4 will be considered the same as E3, G3, C4. More formally, our definition of chords is that of a set in music theory.

We wish to apply the same method we used for notes to the chords. But how do we measure the duration of a chord? In order to do this, we split the piece into the smallest rhythmic values, which are sixty-fourth notes. For each sixty-fourth-note interval, we collect all notes that are played in that interval and form a chord. We sum all the sixty-fourth-note intervals in which each chord appears, which gives us the total duration of each chord in sixty-fourth notes. Finally, we convert the duration to quarter notes.

There are thousands of ways that the notes can be combined to form a chord, but most of them are too dissonant to be used in practice. Also, a lot of pieces are written for only one instrument, such as a violin, a cello, or a lute, so their full score is usually just single notes, with no chords appearing. This skews the data towards single notes. So, here we show only chords with two or more notes. Still, there are thousands of combinations of notes used in the works of Bach, even if some of them are used only once, or for a very small amount of time.

In the following plot, we show the 30 most used chords in Bach’s works. We notice that the interval (B, G) takes first place, followed by the intervals (E, G) and (A, C). The first full chord, i.e. a harmonic triad having a tonic, a third and a fifth, is (B, D, G) or G major, which appears in fourth place. The harmonic triads that follow are D major, C major, A major, D minor and A minor.
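
The slicing procedure described above could look roughly like the following sketch. It is written under several assumptions that are not taken from the article's code: notes arrive as hypothetical `(note_name, start_tick, end_tick)` tuples with the octave already dropped, the ticks-per-quarter value is divisible by 16, and a note counts as sounding in a slice if it is active at the start of that slice.

```python
from collections import Counter

SLICES_PER_QUARTER = 16  # a quarter note contains sixteen sixty-fourth notes


def chord_durations(events, ticks_per_quarter):
    """Return the total duration of every chord, measured in quarter notes."""
    slice_ticks = ticks_per_quarter // SLICES_PER_QUARTER
    last_tick = max(end for _, _, end in events)
    slice_counts = Counter()
    for tick in range(0, last_tick, slice_ticks):
        # A set removes octave doublings; sorting makes note order irrelevant.
        sounding = {name for name, start, end in events if start <= tick < end}
        if sounding:
            slice_counts[tuple(sorted(sounding))] += 1
    # Convert from sixty-fourth-note slices back to quarter notes.
    return {chord: n / SLICES_PER_QUARTER for chord, n in slice_counts.items()}


# Toy example: C and E (C doubled at the octave) for one quarter note,
# G joining halfway through, then a single D for one quarter note.
toy_events = [
    ("C", 0, 480), ("C", 0, 480), ("E", 0, 480),
    ("G", 240, 480), ("D", 480, 960),
]
toy_chords = chord_durations(toy_events, ticks_per_quarter=480)
```

Scanning every event for every slice is slow on large scores, but it keeps the sketch close to the description in the text.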

Chord ranking

As mentioned, we found thousands of note combinations in Bach’s works, but most of them are very rarely used, while others are used all the time. This is similar to the use of words in natural language. Zipf’s law describes this behavior of ranked data. Specifically, the Zipf–Mandelbrot law, which is an extension of Zipf’s law, describes the distribution of ranked data. We fitted a function in the form of the Zipf–Mandelbrot law and obtained a relatively good fit. We should emphasize that, unlike the last graph, single-note chords are included in this graph. A paper by Juan Ignacio Perotti and Orlando Vito Billoni describes similar findings when Zipf’s law is applied to notes and chords in musical pieces.
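
For reference, the Zipf–Mandelbrot law is commonly written in the form below. The article does not state the exact parametrization that was fitted, so the symbols here are assumptions: $r$ is the rank of a chord, $f(r)$ its total duration (or frequency), and $C$, $q$, $s$ are the fitted constants.

```latex
f(r) = \frac{C}{(r + q)^{s}}, \qquad r = 1, 2, 3, \dots
% Setting q = 0 recovers the plain Zipf law, f(r) = C / r^{s}.
% Taking logarithms gives \log f(r) = \log C - s \log(r + q),
% which is a straight line in a log-log plot when q = 0.
```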

Chord types

Another interesting finding is that all the full chords in the first positions are either major or minor chords. We would like to explore this idea even further and determine all the types of chords used by Bach. In order to do that, we need to convert the chords from regular chords of notes to pitch-class sets. A pitch-class set is a set of numbers denoting the distance of each note in the chord from the first pitch, which is given the number 0. The distance is measured in semitones. For example, a major triad corresponds to the set (0,4,7), since the first and the second note are 4 semitones apart (a major third), and the first and the third note are 7 semitones apart (a perfect fifth). This way we can group the types of chords together, since all major chords correspond to the same set. In the same manner, all minor chords correspond to the set (0,3,7). Other common sets have traditional names as well: for example, the set (0,2,7) is called a suspended chord and the set (0,1,4,7) is called a diminished major seventh chord.

A set of notes usually corresponds to more than one pitch-class set. As an example, the set (C, E, G) can correspond to the set (0,4,7) but also to the set (0,3,8), depending on the order of the notes. We always take the most compact form of a set, i.e. the form where the distance between the first and last pitch is the smallest, to be the pitch-class set. Furthermore, there might be more than one most compact form. In this case, we choose the set that comes first in sorted order. For example, between the sets (0,1,5) and (0,4,5) we choose the first one.

We present the results in the following figure. Whenever possible, we substitute number sets with their traditional chord names. Immediately, we notice that all first positions correspond to well-known intervals and chords. Here, we have identified major seventh chords, suspended chords, and other types, using modern terminology.

Of course, in Baroque music, the composers did not think in those terms. Except for major, minor, diminished and augmented chords, the rest are instances where one voice is playing a melody with “non-chord” notes, before joining the rest of the voices in a consonant chord.

Additionally, we notice that single notes and two-note intervals are a lot more common than full chords. This is, again, because several works are for solo instruments, so they contain a lot more single notes. Another interesting observation is that major chords are used almost 1.5 times as often as minor chords.
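
The conversion from note names to a pitch-class set described in this section can be sketched as follows. This is a minimal illustration and not the article's own code: the names `pitch_class` and `chord_type` are hypothetical, and note names are assumed to be a letter followed by `#` or `b` accidentals. Every rotation of the chord is tried, the one with the smallest distance between first and last pitch is kept, and ties are broken by sorted order.

```python
NATURAL_PITCH_CLASSES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pitch_class(name):
    """Map a note name such as 'F#' or 'Bb' to a number from 0 to 11."""
    value = NATURAL_PITCH_CLASSES[name[0]]
    for accidental in name[1:]:
        if accidental == "#":
            value += 1
        elif accidental == "b":
            value -= 1
        else:
            raise ValueError(f"unknown accidental in {name!r}")
    return value % 12


def chord_type(notes):
    """Return the most compact pitch-class set for a collection of notes."""
    pcs = sorted({pitch_class(n) for n in notes})
    candidates = []
    for i in range(len(pcs)):
        # Rotate the chord so that a different note becomes the first pitch.
        rotation = pcs[i:] + [p + 12 for p in pcs[:i]]
        form = tuple(p - rotation[0] for p in rotation)
        # Compare by span first, then by the set itself (sorted order).
        candidates.append((form[-1], form))
    return min(candidates)[1]


major = chord_type(["C", "E", "G"])         # (0, 4, 7)
minor = chord_type(["A", "C", "E"])         # (0, 3, 7)
suspended = chord_type(["C", "F", "G"])     # tie between (0,2,7) and (0,5,7)
```

In the last example two rotations are equally compact, so the tie-breaking rule selects (0,2,7).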

Chord progression

We now move on to the progression of chords. From a music theory perspective, a piece can be thought of as a series of chords, one after the other. This is called a chord progression and is a major topic in music theory. We need to clarify that the composers of the Baroque era did not think in terms of chord progressions; this is a modern interpretation. Nevertheless, it is interesting to look at the chord progressions in Bach’s works from a modern music theory standpoint.

This time, the duration of a chord is not important. We only care about the chord that comes next. We convert the chords to pitch-class sets, as we saw before. For each piece, we create a list with the chords in order, from the first chord played to the last one. We then group the chords in sequences of two or more chords. This process is equivalent to building n-grams for a piece of text, where the chords can be thought of as the words of the text and the piece as a document. In fact, in the code, we used exactly the same methodology as we would use for natural language processing.

At this point, we restrict our search to chord progressions with at least one triad of notes. This is done because, as we noticed before, the data is skewed towards single-note and double-note chords, but to better understand the harmony of a piece we need to study full chords. If we didn’t use this restriction, the first positions would all be single-note and two-note progressions.

In the following graph, we show the first 30 two-chord sequences (bigrams). We notice that Bach is usually moving from a major chord to a minor chord and then back to a major chord, or moving from one major chord to another. We also notice the use of suspended chords. A common technique in counterpoint is to carry over a note from a previous chord to the next chord and finally to resolve it to the third or the tonic.

We could generalize to all chords that are dissonant, at least according to Baroque standards, such as suspended chords, seventh chords, etc. We can see that these chords are always resolved to a consonant chord (such as a major or a minor chord).

Next, we plot three-chord sequences (trigrams). This time, we notice that the progressions become more specific. It is less probable that we will find a specific order of three chords. For example, the progression (major -> minor -> major) is different from (minor -> major -> minor). Even though it is obvious that in both progressions we are just moving from a major chord to a minor and back, we could have several permutations of the order of the progression, all resulting in different progressions. If we extended this to even longer sequences (n-grams), the progressions would become even more specific and less frequent. Thus, we stop the analysis here. Maybe in the future we could try to group the permutations to achieve more interesting results for long sequences.
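
The n-gram counting described in this section can be sketched as below. It is an illustration under stated assumptions rather than the article's implementation: the function name `chord_ngrams` is hypothetical, chords are given as pitch-class-set tuples, consecutive repeats of the same chord are collapsed (because duration is ignored here), and an n-gram is kept only if at least one of its chords has three or more notes.

```python
from collections import Counter
from itertools import groupby


def chord_ngrams(chord_sequence, n=2):
    """Count n-chord progressions that contain at least one full chord."""
    # Collapse runs of the same chord: only the change to the next chord matters.
    collapsed = [chord for chord, _ in groupby(chord_sequence)]
    # Slide a window of length n over the sequence, as with words in a text.
    windows = zip(*(collapsed[i:] for i in range(n)))
    return Counter(w for w in windows if any(len(chord) >= 3 for chord in w))


MAJOR, MINOR = (0, 4, 7), (0, 3, 7)
toy_piece = [MAJOR, MAJOR, MINOR, MAJOR, (0,), (0, 3)]
bigrams = chord_ngrams(toy_piece, n=2)
trigrams = chord_ngrams(toy_piece, n=3)
```

Calling `bigrams.most_common(30)` on counts accumulated over all pieces would give a ranking comparable to the one plotted in the text.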

Conclusion

In this article, I tried to analyze the music of J.S. Bach using data science methods. The findings, combined with historical knowledge about the period in which Bach composed his music, agree with well-known facts from a music theory perspective.

First, the frequency of the notes used by Bach almost perfectly fits the well-tempered tuning and common practice tonality. Bach played a significant role in showcasing the possibilities of well-tempered tuning, which later evolved into the standard equal temperament system used in all Western music.

Furthermore, Bach’s use of chords, mainly major and minor chords and the related intervals of those chords, agrees with the Baroque music form. More complicated chords, such as 7th chords, were considered dissonant and were avoided in the Baroque era. In Bach’s works, such dissonant chords mostly occur when one of the voices plays a melodic passage over a major or minor chord. The notes of the passage combined with the main chord might give a more dissonant chord, but in Bach’s era, this would not even be considered a chord.

Finally, the chord progression analysis is in agreement with the common practice chord progression. We observe that the most common progressions move from a major to a minor chord and back, or occasionally sustain the major chord with some additional notes. We also notice that most dissonant chords are immediately resolved to a major or a minor chord.

This analysis can be extended to the works of any other composer of the classical period, given enough data, and even to collections of works by several composers of the same time period. Other ideas can also be explored, such as grouping the works in clusters to reveal interesting patterns, or using this kind of analysis to identify the composer of a musical piece.

The full code for this analysis can be found here.
