Main Blog Music Information Retrieval

In this post we move a step up from the Intervals Table and we describe what we call the Interval Matrix.

The Intervals Matrix has been described in our London Information Retrieval meetup, in October 2019 (here the slides): it is an intermediate data structure used in the Cover Song Detection approach we are investigating in.

We already described in a previous post what an interval is and its central role in our approach; however, it’s worth to briefly repeat: it allows to think in terms of “relativeness” without caring about absolute frequencies or pitches.

Brief recap: what is an Interval?

The music theory defines an interval as the difference in pitch between two sounds.

An interval can be horizontal (or linear or melodic), if it represents the distance between two successively sounding tones, and vertical or harmonic if it pertains to simultaneously sounding tones, such as in a chord.

Horizontal and Vertical Intervals

In Western music, an interval represent the distance between two notes of a diatonic scale. The smallest of these intervals is a semitone.

The diatonic scale is a musical scale that has

  • seven pitches per octave
  • five double semitones (a.k.a. tones) intervals
  • two semitones intervals separated by two of three tones intervals

The C-Major scale is a perfect example of a diatonic scale (1/2 = semitone, 1 = Tone):

The Diatonic Scale

Brief recap: The Intervals Table

The Intervals Table is a matrix which defines the possible intervals between each note of the chromatic scale:


Although in Music Theory each interval is denoted using a name (e.g. 0 semitones => unison) the table above uses the number of semitones as a measure for indicating the distance.

Using the Intervals Table above we can say that:

  • the distance between A and D# is 6 semitones
  • the distance between B and B is 0 semitones (unison)

The Intervals Matrix

Before introducing the central concept of this post, it’s definitely better to start from the input, the chroma features / chroma matrix, and then see the manipulation needed for getting an intervals matrix.

Chroma Features

The chroma features is a representation, along an interval composed by subsequent instants {t0 – tn}, where for each instant the audio signal is decomposed using the energy associated to 12 classes representing the 12 distinct semitones (or chroma) of the musical octave used in Western music notation. Here’s a sample matrix:

where you can see 12 * t (t=6 in this case, t6 is partially hidden) matrix where each vector represents the twelve pitches strengths for a given instant t.

The following is another example: the bass riff of “Zombie”, by Cranberries. On the upper staff you can see the symbolic notation of the riff, and then below there’s the corresponding chromagram; the coloured boxes highlight the compounding parts of the riff across the two different representations.

From Chroma Matrix to Intervals Matrix

For each Chroma Vector we keep the top k most intense pitch energies and we replace them with their corresponding ordinal position:

We will end up having a matrix of integers the with the same columns of the original matrix but only k rows. Each vector contains a ranked list of the k stronger pitch classes, sorted in descending order. Note the underlying assumption is that in order to represent and summarise a chroma vector, the stronger a class energy is, the better.

Almost done: the Interval Matrix is obtained applying the Intervals Table as a function between cells (ranks) having the same index and belonging to subsequent chroma vectors.

In this way we compute a distance between adjacent vectors (x[n] – x[n-1]) and those measures compose an output matrix that has:

  • the same number of rows of the input matrix
  • m-1 columns: where m is the number of the columns of the input matrix

The implementation

As usual, let’s first define the behaviour we want to achieve:

The Intervals Matrix

  • should use the IntervalsTable in order to compute the distance vectors
  • should be a zero/null matrix if there’s no distance between the input vectors
  • should throw an exception in case the input k parameter is greater than 12
  • should throw an exception in case the input consists of an empty chroma matrix
  • should filter out all vectors having a cardinality smaller than the input k parameter
  • given in input an m * n chroma matrix should have a dimension equals to (m – 1) * n
  • should return an interval vector, whose size is k, for each subsequent pair of chroma vectors in the input matrix

We are going to use the same BDD approach for expressing those requirements as code (full code is available here):

class IntervalsMatrixSpecs extends FlatSpec {

  "The Intervals Matrix" should "be a zero/null matrix if there's no distance between chroma vectors" in {
    val table = new IntervalsTable


  it should "throw an exception in case the input k parameter is greater than 12" in {
    val table = new IntervalsTable
    val inputMatrix = chromaMatrix(31)

    assertThrows[IllegalArgumentException] {
      IntervalsMatrix(inputMatrix, 13, table)

    assertThrows[IllegalArgumentException] {
      IntervalsMatrix(inputMatrix, 15, table)


And finally, the IntervalsMatrix is also available as a gist. As you can read, it is a basic wrapper around a matrix of integers. The constructor takes a chroma matrix and creates the internal data structure which captures the ranked intervals.

Great! Our test suite is growing:

to be continued in the next episode…


Andrea Gazzarini

Andrea Gazzarini is a curious software engineer, mainly focused on the Java language and Search technologies. With more than 15 years of experience in various software engineering areas, his adventure in the search world began in 2010, when he met Apache Solr and later Elasticsearch.

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.