Music Understood By Numbers

If there is one topic I am quite ignorant about, it is probably music. One reason for that is that I never learned to play an instrument (other than the computer :), because traditional music notation has always seemed highly inelegant to me, for something as elegant as music, so I refused to deal with it. I still wanted to understand music, but since no book talked about it in my “language”, I did what every programmer would: write some code to figure it all out on the basis of just numbers.

The initial goal is to find a justification for our current musical system (12 half notes).

For any musical system, our goal is to layer sine waves on top of each other to create more complex sounds.

To create sounds that sound “pleasant”, we need to be able to layer sine waves such that, even though they have different frequencies, their wavelengths are in fixed ratios to each other, so that they overlap in the same way again after very few cycles. In fact, a ratio of 2:1 (an octave) is easiest on the ear, and repeats in the shortest amount of time (after 1 cycle of the lower tone). Other ratios, such as 2:3 and 3:4, are important too. Two sine waves which don’t have such a ratio will take many cycles to coincide again, by which time the ear has lost track, so they sound as if they don’t belong together or work against each other. This is what you get with a dissonance, or just a sound that is “off”.
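To make the “few cycles” idea concrete, here is a minimal Haskell sketch. It assumes two sine waves whose frequencies are in an exact whole-number ratio, written here as upper tone : lower tone (so the 2:3 above becomes 3 and 2). The name `cyclesToRealign` is made up for this example.

```haskell
-- Hypothetical helper: the upper tone completes p cycles for every q cycles
-- of the lower tone. After n cycles of the lower tone the upper tone has done
-- n*p/q cycles, so both are back in step at the smallest n for which n*p/q
-- is a whole number.
cyclesToRealign :: Int -> Int -> Int
cyclesToRealign p q = q `div` gcd p q

main :: IO ()
main = mapM_ report [(2, 1), (3, 2), (4, 3), (45, 32)]
  where
    report (p, q) =
      putStrLn (show p ++ ":" ++ show q ++ " realigns after "
                ++ show (cyclesToRealign p q) ++ " cycle(s) of the lower tone")
```

The last pair is just an arbitrary awkward ratio, included to show how quickly the count grows once the numbers stop being small.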

So we need a system whereby the notes we have chosen can most easily build these ratios. To start with, the concept of an octave is a given, if you see that we need to divide up notes in a logarithmic rather than linear fashion. The question that remains is: how do we split up the octaves?

Let’s first look at our current 12 half note system, and then see if we can do any better with an N other than 12. If we divide an octave into 12 steps, we get the following ratios going from 1 to 2:

[1.00000,1.05946,1.12246,1.1892,1.25991,1.33482,1.41419,
 1.49828,1.58736,1.68175,1.78175,1.88769,1.99993]
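
These ratios can be regenerated with a few lines. This is a small sketch, assuming every step multiplies the frequency by the same factor so that n steps give exactly one octave; `stepRatios` is a name made up here. It computes each ratio directly as 2^(k/n) instead of repeatedly multiplying a rounded factor, so the last digits can differ slightly from the list above.

```haskell
-- Ratios of an octave split into n equal logarithmic steps:
-- step k has ratio 2^(k/n) relative to the starting note.
stepRatios :: Int -> [Double]
stepRatios n = [2 ** (fromIntegral k / fromIntegral n) | k <- [0 .. n]]

main :: IO ()
main = mapM_ print (stepRatios 12)
```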

To get an impression of the layering capacity of a musical system, we can simply compute, for each note, how many wavelength cycles are needed before we are back at the same starting point relative to the governing octave. The lower this number, the better suited the note is for building pleasant ratios.

The problem is that getting waveforms back to exactly the starting point is almost an impossibility with our logarithmic sequence. We can assume that if the waveforms almost coincide, the ear won’t hear the difference. How much tolerance the ear has exactly before a pleasant note turns into a dissonance I don’t know, and it probably differs per person, so I experimented with different error tolerances to be able to compare the results:

tolerance ->   0.01   0.03   0.05   0.10
 1               84     17     16     16
 2               49      8      8      8
 3               37     16     16      5
 4               50     23      4      4
 5                3      3      3      3
 6               70     12     12      5
 7                2      2      2      2
 8               63     17     12      5
 9               22     22      3      3
10               55     23      9      5
11               89      9      9      9
12                1      1      1      1

The numbers in the table are the number of cycles before the waveforms coincide again, and that is the important number. As you can see, even at the very low tolerance of 0.01, the 5th and 7th half notes stand out as being very close to perfect ratios with the octave… this is no surprise, as they correspond to well-known intervals/ratios in traditional music, i.e.:

half note   cycles   ratio   name
12          1        1:2     octave
 7          2        2:3     fifth (quint)
 5          3        3:4     fourth (quart)

The code to compute the above table is, btw (in Haskell):

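tolerance :: Double
tolerance = 0.01          -- allowed distance from a whole number of cycles
octave :: Int
octave = 12               -- number of equal steps per octave (3 to 16)
-- Step ratio 2^(1/octave), truncated to five decimals, indexed by divisor.
factor :: Double
factor = [0,0,0,1.25992,1.18920,1.14869,1.12246,1.10408,1.09050,1.08005,
          1.07177,1.06504,1.05946,1.05476,1.05075,1.04729,1.04427] !! octave
-- Walk up the scale from ratio x, pairing step n with its cycle count.
f x n = (n, g y) : f y (n + 1) where y = x * factor
-- First (cycles, product) whose product lands near a whole number.
g y = take 1 (filter (close [1..100]) (h (y - 1) 1))
-- True if x is within the tolerance of any whole number in the list.
close ns (_, x) = any (\n -> abs (n - x) < tolerance) ns
-- Infinite list of (n, y*n) for n = 1, 2, 3, ...
h y n = (n, y * n) : h y (n + 1)
main :: IO ()
main = mapM_ print (take octave (f 1 1))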
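
For playing around with other tolerances and divisors, the same search can be sketched with both as parameters. This is my own compact variant, not the listing above: the name `cyclesFor` is made up, it computes the ratio as 2^(k/steps) rather than from a truncated factor, and it has no upper limit of 100 on the whole number being matched, so borderline entries may come out differently from the table.

```haskell
-- cyclesFor tol steps k: smallest n >= 1 such that n * (ratio - 1) lies
-- within tol of a whole number >= 1, where ratio = 2^(k/steps).
cyclesFor :: Double -> Int -> Int -> Int
cyclesFor tol steps k = until nearWhole (+ 1) 1
  where
    frac = 2 ** (fromIntegral k / fromIntegral steps) - 1
    nearWhole n =
      let v = fromIntegral n * frac
          r = fromIntegral (round v :: Integer)
      in r >= 1 && abs (v - r) < tol

-- Print one row per half note, one column per tolerance.
main :: IO ()
main = mapM_ (putStrLn . row) [1 .. 12]
  where
    tolerances = [0.01, 0.03, 0.05, 0.10]
    row k = unwords (show k : [show (cyclesFor t 12 k) | t <- tolerances])
```

Requiring the matched whole number to be at least 1 mirrors the `[1..100]` list in the original code; without it, small ratios would “coincide” with zero straight away.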

So using these tables we can order the notes by “pleasantness”, depending on tolerance:

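tolerance ->   0.01   0.03   0.05   0.10
rank  1          12     12     12     12
rank  2           7      7      7      7
rank  3           5      5      5      5
rank  4           9      2      9      9
rank  5           3     11      4      4
rank  6           2      6      2      3
rank  7           4      3     10      6
rank  8          10      8     11      8
rank  9           8      1      6     10
rank 10           6      9      8      2
rank 11           1      4      3     11
rank 12          11     10      1      1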
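
The ordering itself is just a sort on the cycle counts. A minimal sketch, using the 0.01 column copied from the cycles table earlier; `cycles001` and `ranking` are names made up for this example.

```haskell
import Data.List (sortOn)

-- Cycle counts per half note at tolerance 0.01, copied from the cycles table.
cycles001 :: [(Int, Int)]
cycles001 = zip [1 ..] [84, 49, 37, 50, 3, 70, 2, 63, 22, 55, 89, 1]

-- Half notes ordered from fewest cycles (most "pleasant") to most.
ranking :: [(Int, Int)] -> [Int]
ranking = map fst . sortOn snd

main :: IO ()
main = print (ranking cycles001)
```

`sortOn` is stable, so notes with equal cycle counts stay in ascending note order. For the other tolerance columns, which do contain ties, that can differ from the order in which I listed them.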

The ones in red are notes that would be “not pleasant”. As you can see from the numbers, the notes that rank highest are close to the ratios used in music, predicted purely numerically.

As it turns out, not every tuning follows the exact logarithmic progression. Standard modern tuning (equal temperament) does, but older tunings such as just intonation don’t. This makes sense too, as we can see from these numbers that the important ratios like 2:3 etc. don’t coincide exactly in the logarithmic progression. So we can make them coincide exactly by fiddling with the numbers slightly, making the 2:3 and 3:4 sound even better, probably at the cost of notes like 2 and 11, but they sounded bad anyway.
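
To put a number on how far the equal logarithmic steps miss the exact ratios, here is a small sketch. It measures the gap in cents (1200 cents per octave, so 100 per half note), which is a unit I am bringing in for convenience; `cents` and `centsOff` are names made up here.

```haskell
-- Size of a frequency ratio in cents (1200 cents per octave).
cents :: Double -> Double
cents r = 1200 * logBase 2 r

-- How far step k of an n-step equal division lies from an exact ratio, in cents.
-- Positive means the equal-step note is higher than the exact ratio.
centsOff :: Int -> Int -> Double -> Double
centsOff n k exact = 1200 * fromIntegral k / fromIntegral n - cents exact

main :: IO ()
main = do
  putStrLn ("7 steps vs 2:3 -> " ++ show (centsOff 12 7 (3 / 2)))
  putStrLn ("5 steps vs 3:4 -> " ++ show (centsOff 12 5 (4 / 3)))
```

Both gaps come out at just under 2 cents, in opposite directions: the 7th half note sits slightly below an exact 2:3 and the 5th slightly above an exact 3:4.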

Another interesting question remains, and that is: is a logarithmic subdivision of an octave by 12 the optimal system? An easy way to find out is to simply compute the above tables for systems other than 12, and see which one has the best number of cycles for 2:3 (most important after 1:2, which is present by definition):

Other octave divisions (tolerance 0.01):

divisor   2:3 cycles
16        11
15        11
14        25
13        13
12         2   <-
11         7
10         8
 9         6
 8         6
 7        26
 6        49
 5        27
 4        22
 3        50

So maybe an 8 or 9 tone system could work, but not nearly as well as the 12 tone one.
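
As a cross-check, here is a sketch of a different and simpler measure than the cycle counts in the table: for each divisor it looks at the step that comes closest to an exact 2:3 and reports how far off it is in cents (1200 per octave). This is not the method used for the table, and the names `fifthError` and `bestDivisor` are made up for the example.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Smallest distance, in cents, between any step of an n-step octave
-- and the exact 2:3 ratio.
fifthError :: Int -> Double
fifthError n =
  minimum [abs (1200 * fromIntegral k / fromIntegral n - target) | k <- [0 .. n]]
  where
    target = 1200 * logBase 2 1.5

-- Divisor from the given list whose closest step misses 2:3 by the least.
bestDivisor :: [Int] -> Int
bestDivisor = minimumBy (comparing fifthError)

main :: IO ()
main = do
  mapM_ (\n -> putStrLn (show n ++ " " ++ show (fifthError n))) [3 .. 16]
  print (bestDivisor [3 .. 16])
```

By this measure 12 also comes out on top within the range 3 to 16, which at least agrees with the table on the winner, though the ordering of the runners-up need not match.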

So assuming we have just proven that 12 steps is the best subdivision, how did we ever get into this mess of having whole and half notes, shifted differently depending on sharp/flat?

[edit: the text below is questionable, this article explains it much better. Feel free to keep reading for amusement, however]. [edit2: this article is also pretty good, an easier to follow intro into the topic].

Well, for playing on instruments (piano being a good example), we want to play notes with distances such as 7, 5, 3, 4 etc. all the time. But those are large numbers, so it is inconvenient. What if we could somehow divide that number 12 by 2 or 3 (6 notes of 2 steps, or 4 notes of 3 steps)? Sadly, the distances we care about most, 7 and 5, are not divisible by 2 or 3, so they would not land on those notes.

But it can be made to work if, instead of 6 notes, we use 7 of them and make 2 of the steps just 1. If we split the remaining 5 steps of 2 in the middle as best we can (2 on one side, 3 on the other), then the 5 and 7 distances are always available on the main 7 notes!

So starting from 12, 7 and 5, the standard piano layout actually makes mathematical sense, in that there is no better system for splitting up 12 notes (!). The big problem this of course introduces is that, depending on which note you start an octave on, certain half notes are not available for playing these ratios, or in the case of a piano, would be on the black keys. This is where the ugly hack of major/minor and sharp/flat got introduced, where the starting half note effectively gets shifted, and thus the 7 full notes are mapped to different half notes every time. This takes what is such a clean system, and makes it very messy.
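
The layout is easy to poke at in code. A minimal sketch, assuming the step pattern 2-2-1-2-2-2-1 for the 7 main notes (the white keys starting from C); `stepPattern`, `mainNotes` and `missing` are names made up here.

```haskell
-- Step sizes between the 7 main notes: five steps of 2 and two steps of 1.
stepPattern :: [Int]
stepPattern = [2, 2, 1, 2, 2, 2, 1]

-- Positions of the main notes, counted in half notes from the start.
mainNotes :: [Int]
mainNotes = init (scanl (+) 0 stepPattern)

-- Main notes that have no main-note partner d half notes higher
-- (wrapping around at the octave).
missing :: Int -> [Int]
missing d = [n | n <- mainNotes, ((n + d) `mod` 12) `notElem` mainNotes]

main :: IO ()
main = do
  print mainNotes
  print (missing 7)
  print (missing 5)
```

Starting from position 0, both the 5 and the 7 land on main notes. Starting elsewhere, the 7 is missing only from position 11 and the 5 only from position 5, which is exactly the kind of gap that sharps and flats were brought in to fill.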

So just for the convenience of having fewer keys/strings than 12, we have to cope with those 7 meaning something different every time. I don’t think that is worth it, and prefer to stick with the 12, which can express all ratios without sharp/flat needing to exist at all. Take the famous C-E-G combination, which is only C-E-G in C major. In many other scales, they are different notes, or you have to introduce sharp/flat to reproduce them (e.g. half a note up it becomes C#-F-G#, or something different yet again in a different scale… is that intuitive to you? not to me). In the 12 half note system, they are always the same 0-4-7 ratios, no matter where you place them. As long as you have these ratios, it will sound good in a certain way, independently of “scale”.
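
Here is what that looks like as numbers. A small sketch that places the 0-4-7 shape on every starting half note and prints the traditional names next to it; the sharps-only spelling in `noteNames` is just a choice for illustration, and `triad` and `triadNames` are made-up names.

```haskell
-- Names of the 12 half notes, using sharps only (an illustrative choice).
noteNames :: [String]
noteNames = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

-- The 0-4-7 shape placed on a starting half note, wrapping at the octave.
triad :: Int -> [Int]
triad start = [(start + d) `mod` 12 | d <- [0, 4, 7]]

-- The same chord spelled with traditional names.
triadNames :: Int -> [String]
triadNames = map (noteNames !!) . triad

main :: IO ()
main = mapM_ (\s -> putStrLn (show (triad s) ++ " " ++ unwords (triadNames s))) [0 .. 11]
```

The numeric side is the same shape shifted by a constant each time, while the names jump between plain and sharp letters with no obvious pattern.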

By predominantly using 7 out of 12 notes, each different scale has gotten its own “flavour”. So if you use ratios on 12 half notes, you’d essentially be making music that could be said to be without a set scale, or rather it would have a certain scale, just you’d be using a lot of “black keys”. It may be that we are by now culturally conditioned to find music that sticks to subsets of notes (a scale) more pleasant than music that uses the full spectrum.

The above only explains what sounds are pleasant in isolation, but it doesn’t yet address the much bigger topic of what combinations in sequence sound good… obviously there, a dissonance can suddenly be amazing, given the right context. Hopefully this will be easier to play around with using ratios than with classical notation… but that is for a future installment :)