NSRegularExpression capture groups: Chord Symbols

Category: Swift Language

How to use NSRegularExpression with capture groups in Swift.

Introduction


I want to parse chord symbols using a regular expression. The first step is to define a valid regular expression and the second is to determine how to use NSRegularExpression to retrieve matches from input.

The Regular Expression


Of course, we need to define a valid regular expression to use NSRegularExpression.

I want to match chord symbols using Weber’s roman numeral root chacha.

Here are a few examples:

The tonic chord in a scale: I
The dominant seventh chord in a scale: V7
A Neapolitan seventh: biimaj
Typical jazz chord: Vm7b5

There are a few things I need from each chord specification:
What is the root? Is the root altered? What is the symbol if it exists?
So let’s break that up into a few groups.

The root is going to be a Roman numeral. I don’t need all possible numerals, just the ones used in music.
The character I can occur consecutively at most 3 times, and in some numerals (V, for example) it doesn’t appear at all.
Let’s make I, or II, or III match.

I can be preceded by V.
Let’s make any of I, II, III, V, VI, VII, VIII match by allowing V zero or one time (using ?).

That’s most of them! We’re missing IV, so let’s add that using an “or” (|).

So there’s a regexp to match Roman numerals I to VIII, which are the only ones we need.
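A quick way to sanity-check the root pattern is to run a handful of numerals through it. This is a minimal sketch: the helper name matchesWholeString is made up for illustration, and the ^(?:...)$ anchors are added here only so that a numeral counts as a match when the pattern consumes the whole string.

```swift
import Foundation

// The Roman numeral root pattern: IV, or an optional V followed by zero to three I's.
let rootPattern = "IV|V?I{0,3}"

// Anchors added for this check only: the whole string must match the root pattern.
let anchored = try! NSRegularExpression(pattern: "^(?:\(rootPattern))$")

// Hypothetical helper: true if the entire string is accepted by the root pattern.
func matchesWholeString(_ s: String) -> Bool {
    let range = NSRange(s.startIndex..., in: s)
    return anchored.firstMatch(in: s, options: [], range: range) != nil
}

for numeral in ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "IIII"] {
    print(numeral, matchesWholeString(numeral))
}
```

One thing this check reveals: because both the V and the I's are optional, the root pattern also accepts an empty string.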

Capture Groups


I want to be able to alter any of those roots with a flat or sharp. So, bII or #IV would be grokked. So begin the regexp with an optional b|#. But remember what we’re going to do with the matches: get the root, then alter it (and then get the chord symbol to create a chord). So there are two separate bits of info to retrieve to get the entire root: the degree specified by the roman numeral and any alteration.

A regular expression capture group allows you to get parts of a match. Capture groups are delimited by parentheses (). For example, the (optional) root alteration group would be (b|#)?. Even if the input does not contain an alteration, the regex still numbers that group as 1. When there is no alteration, the group simply doesn’t take part in the match (in NSRegularExpression, its range has a location of NSNotFound).

Here are two groups. Note that the alteration and the root are each wrapped in parentheses. The alteration is optional, but the root is not.
let regexp = "(b|#)?(IV|V?I{0,3})"
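To see the two groups in action, here is a small sketch that runs this pattern through NSRegularExpression and reads each group back. The helper group(_:in:) is hypothetical and written only for this example. It returns nil when a group did not participate, which is what happens to the accidental group for an input like IV.

```swift
import Foundation

// The two-group pattern: group 1 is the optional accidental, group 2 the root.
let regexp = "(b|#)?(IV|V?I{0,3})"
let regex = try! NSRegularExpression(pattern: regexp)

// Hypothetical helper: returns capture group `index` from the first match,
// or nil if that group did not take part in the match.
func group(_ index: Int, in input: String) -> String? {
    let whole = NSRange(input.startIndex..., in: input)
    guard let match = regex.firstMatch(in: input, options: [], range: whole) else {
        return nil
    }
    let nsRange = match.range(at: index)
    // No alteration in the input means group 1 reports NSNotFound.
    guard nsRange.location != NSNotFound,
          let range = Range(nsRange, in: input) else { return nil }
    return String(input[range])
}

print(group(1, in: "bII") ?? "none")  // the accidental
print(group(2, in: "bII") ?? "none")  // the root
print(group(1, in: "IV") ?? "none")   // this input has no accidental
```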

That leaves the chord symbol. Yup, one more group to add. You can go nuts and try to specify actual chord symbols, or simply say “any alphanumeric character or sharp zero or more times”.
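The finished pattern isn’t spelled out here, so this is one plausible way to write it. It assumes the symbol group is [A-Za-z0-9#]*, which matches the description "any alphanumeric character or sharp zero or more times" and the character class Bruce uses in the comments below.

```swift
import Foundation

// Assumption: the symbol group is [A-Za-z0-9#]*.
// Group 1: accidental, group 2: Roman numeral root, group 3: chord symbol.
let chordPattern = "(b|#)?(IV|V?I{0,3})([A-Za-z0-9#]*)"

let chordRegex = try! NSRegularExpression(pattern: chordPattern, options: [])
print(chordRegex.numberOfCaptureGroups)  // 3
```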

So there it is. You can check it out with an online regular expression tester like the one at regex101.com. Back in the 80s when I was first learning regexps, I had to use pencil and paper – and I also had to walk to school 5 miles in the snow, uphill, both ways.

NSRegularExpression


The init function for NSRegularExpression throws an error if the pattern you specify is invalid. I use fatalError to handle this, and then go and fix the pattern, probably with a validator. The init also lets you specify options. Here I specify that matching is case insensitive, so the Roman numeral root can be written as ii as well as II. The chord symbol part of my pattern already contains [A-Za-z], so it accepts either case anyway. It’s up to you: do you want to bother with using II for major supertonic and ii for minor supertonic? I’m just going with symbols: iimaj or iimin (or IImaj, IImin).
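Here is a minimal sketch of that initialization. It assumes the full pattern uses [A-Za-z0-9#]* for the symbol group. The do/catch with fatalError mirrors the approach described above; treating an invalid pattern as a programmer error, rather than something to recover from, is a judgment call.

```swift
import Foundation

// Assumption: full pattern with [A-Za-z0-9#]* as the chord symbol group.
let pattern = "(b|#)?(IV|V?I{0,3})([A-Za-z0-9#]*)"

// init(pattern:options:) is a throwing initializer. A bad pattern is a bug
// in the code, so this sketch stops with fatalError instead of recovering.
let chordRegex: NSRegularExpression = {
    do {
        // .caseInsensitive lets the root group match "ii" as well as "II".
        return try NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    } catch {
        fatalError("Invalid chord pattern: \(error)")
    }
}()
```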

Retrieve an array of matches from the input string with the NSRegularExpression’s matches(in:options:range:) method. Here I’m passing a range that covers the entire input string.

In most cases, the first element in the matches array will contain what you’re looking for. This is another place where using an online regexp checker is very helpful; you can see the matches.
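The search step might look like this. This is a sketch that assumes the same full pattern as before, with [A-Za-z0-9#]* as the symbol group. The detail worth copying is how the whole-string range is built: NSRange counts UTF-16 code units, so it is derived from the String’s own indices rather than from input.count.

```swift
import Foundation

// Assumed full pattern, compiled case insensitive.
let chordRegex = try! NSRegularExpression(
    pattern: "(b|#)?(IV|V?I{0,3})([A-Za-z0-9#]*)",
    options: [.caseInsensitive])

let input = "bIImaj7"

// An NSRange covering the entire input, in UTF-16 units.
let wholeString = NSRange(input.startIndex..., in: input)
let matches = chordRegex.matches(in: input, options: [], range: wholeString)

// For a single chord symbol, the first match is normally the one we want.
if let first = matches.first {
    print(first.range.location, first.range.length)
}
```

Because every part of this pattern can match nothing, the pattern can also produce zero-length matches, so taking the first element is a sensible default. If you only ever want one match, firstMatch(in:options:range:) returns it directly.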

Using the first match, I access the capture groups with match.range(at: X), where X is the number of the capture group (starting from 1, not zero!). Group 0 is the entire match, the same range you get from the match’s range property. Useful sometimes, but not here.
In my chord regexp, the accidental is capture group 1, the Roman numeral root is group 2, and the symbol is group 3. For each one, retrieve the appropriate range and check whether it’s valid by comparing its location to NSNotFound. That range is an NSRange, but to create a Swift String we need a Swift Range, so we create one from the NSRange and the input string with Range(_:in:), which returns nil if the conversion fails. Then you can use that range to extract the matching substring from the input string. Since Swift 4, string[range] returns a Substring instead of a String, so to get a String you pass the Substring to String’s initializer: String(input[range]).
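Put together, the group extraction could look like the sketch below. The helper name captured(_:of:in:) is hypothetical, and the pattern again assumes [A-Za-z0-9#]* for the symbol group.

```swift
import Foundation

// Assumed pattern: accidental (group 1), root (group 2), symbol (group 3).
let regex = try! NSRegularExpression(
    pattern: "(b|#)?(IV|V?I{0,3})([A-Za-z0-9#]*)",
    options: [.caseInsensitive])

/// Hypothetical helper: returns capture group `group` of `match` as a String,
/// or nil if the group did not participate in the match.
func captured(_ group: Int, of match: NSTextCheckingResult, in input: String) -> String? {
    let nsRange = match.range(at: group)
    // A group that took no part in the match reports NSNotFound as its location.
    guard nsRange.location != NSNotFound else { return nil }
    // Range(_:in:) turns the UTF-16 based NSRange into a Range<String.Index>;
    // it returns nil if the NSRange does not fit the string.
    guard let range = Range(nsRange, in: input) else { return nil }
    // input[range] is a Substring, so copy it into a String.
    return String(input[range])
}

let input = "Vm7b5"
if let match = regex.firstMatch(in: input, options: [],
                                range: NSRange(input.startIndex..., in: input)) {
    // Group 0 is the whole match, the same range as match.range.
    print(match.range(at: 0) == match.range)
    print(captured(1, of: match, in: input) ?? "(no accidental)")
    print(captured(2, of: match, in: input) ?? "")
    print(captured(3, of: match, in: input) ?? "")
}
```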

Here’s my final func for parsing the chord symbol. I return the values of the three capture groups as a tuple for convenience. Each value is initialized to an empty string, so the caller needs to check them.
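Here is a sketch of a function along those lines. It is a reconstruction for illustration, not the original code: the name parseChordSymbol, the tuple labels, and the [A-Za-z0-9#]* symbol group are all assumptions.

```swift
import Foundation

// Hypothetical reconstruction: returns the accidental, Roman numeral root,
// and chord symbol as a tuple. Values not found in the input stay "".
func parseChordSymbol(_ input: String) -> (accidental: String, root: String, symbol: String) {
    let pattern = "(b|#)?(IV|V?I{0,3})([A-Za-z0-9#]*)"  // assumed symbol group
    let regex: NSRegularExpression
    do {
        regex = try NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    } catch {
        fatalError("Invalid chord pattern: \(error)")
    }

    // Each value starts out empty; the caller checks which ones were filled in.
    var result = (accidental: "", root: "", symbol: "")

    let whole = NSRange(input.startIndex..., in: input)
    guard let match = regex.matches(in: input, options: [], range: whole).first else {
        return result
    }

    // Extracts capture group `index` as a String, or "" if it did not participate.
    func text(_ index: Int) -> String {
        let nsRange = match.range(at: index)
        guard nsRange.location != NSNotFound,
              let range = Range(nsRange, in: input) else { return "" }
        return String(input[range])
    }

    result.accidental = text(1)
    result.root = text(2)
    result.symbol = text(3)
    return result
}

let chord = parseChordSymbol("biimaj")
print(chord.accidental, chord.root, chord.symbol)
```

Compiling the regex inside the function keeps the sketch self-contained. If you parse many symbols, creating the NSRegularExpression once and reusing it avoids recompiling the pattern on every call.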

Summary


NSRegularExpression isn’t horrible. It works. It makes sense after you’ve seen it work, but getting there is a bit of a pita.

Resources


2 thoughts on “NSRegularExpression capture groups: Chord Symbols”

  1. Your article got me going. I wanted to do something to interpret typical chord symbols for a musical instrument project I am working on, but I wanted to do the sort of symbols someone might enter from a jazz “cheat sheet” or an online chord listing, i.e. similar, but not roman numerals and I wanted to do the “note in the bass” thing, i.e. C#min/Eb, though that’s generally a horrible chord written in a weird way.

    Starting with your example, I came up with: /(([a-gA-G])(b|#)?)([A-Za-z0-9#]*)(\/(([a-gA-G])(b|#)?))?/gm
    This allows me to do these examples, if they make any sense written like this. I exported them with the “Plain Text” option:

    I think it’s going to work out.

    Thanks,
    Bruce

    1. Thanks Bruce!

      Using the letter name is probably over 90% of actual usage in the real world.

      WordPress barfed on your test input, so I put a pre tag around it.
      I’m not sure I understand it though.
      I’ll play around with your regex and make some unit tests.
      Thanks for that.

      I saved your regexp here for playing around.
      https://regex101.com/r/YI2ZqP/1

      FWIW, John Mehegan’s Jazz books use the Roman numeral thing.
      https://amzn.to/3HcA6Sw
