Regex In 10 Minutes

We'll cover everything you need to know about regular expressions and some tips to help you get the most out of them in Xcode.

Today, we'll look at how regular expressions work and how we can leverage them to improve our programming efficiency. We'll start by reviewing some regex basics and then we'll dive into some Xcode-specific use cases.

Introduction

Regular expressions, commonly referred to as regex, represent a search pattern as a sequence of special characters. They are often used to identify misspelled words, validate data, check user input, or scrape the web.

With expressions like this ^(?=(?!(.)\1)([^\DO:105-93+30])(?-1)(?<!\d(?<=(?![5-90-3])\d))).[^\WHY?]$ it's no surprise that people avoid regex whenever possible.

However, mastering regex can greatly improve our capabilities as programmers if we can make it past the awkward syntax and the learning curve. Luckily, regular expressions are supported in nearly every programming language and the core syntax is largely the same everywhere, so we mostly have to learn them once.

Similar to how a programming language consists of keywords like for, if, while, etc., regular expressions simply consist of a series of special characters used to express a variety of text patterns.

To begin, we'll examine all of the different types of characters and their respective responsibilities. While the information might seem overwhelming at first, once we look at some examples, I promise it will all make sense.

Getting Started

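In many languages and tools, a regular expression is written between / delimiters (e.g. /ab*c/). The pattern itself is built from ordinary characters and any number of the following symbols (also referred to as metacharacters).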

A metacharacter is a character that has a special meaning to a computer program, such as a shell interpreter or a regular expression (regex) engine.

We use regex to match patterns by combining these metacharacters into longer expressions.

Characters

[ ]

You can use this bracket expression to match text against the character(s) contained within the brackets:

  • [abc] would match a, b, or c
  • [a-z] would match any lowercase letter from a to z
  • [abd-j] would match a, b, or any letter from d through j (d, e, f, g, h, i, j)
  • [a-zA-Z] would match letters from a to z and from A to Z
  • [0-9] would match any digit in the range 0 through 9
  • [df]og matches dog and fog.
  • ab matches ab, but not AB
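The bracket expressions above can be tried out with a minimal sketch in Python, assuming its built-in re module (Xcode's Find field may differ in small details). The sample strings are made up for illustration.

```python
import re

# re.findall returns every non-overlapping match as a list of strings.
set_hits = re.findall(r"[abc]", "a quick cab")       # any one of a, b, c
range_hits = re.findall(r"[abd-j]", "abcdefghijk")   # a, b, or d through j
word_hits = re.findall(r"[df]og", "dog fog log")     # d or f, followed by og
case_hits = re.findall(r"ab", "ab AB")               # case-sensitive by default

print(set_hits)    # ['a', 'c', 'c', 'a', 'b']
print(range_hits)  # ['a', 'b', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
print(word_hits)   # ['dog', 'fog']
print(case_hits)   # ['ab']
```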

^

You can use this character to match the starting position of the string:

  • ^Hello will match strings that start with the word Hello

[^ ]

This combination of characters allows us to specify characters we do not want to include in our search:

  • [^abc] will match any single character except a, b, or c
  • [^a-z] matches any single character that is not a lowercase letter from a to z
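The two meanings of ^ are easy to mix up, so here is a small sketch in Python (assuming the built-in re module; the sample strings are illustrative only) that contrasts the anchor with the negated set.

```python
import re

# Outside brackets, ^ anchors the pattern to the start of the string.
starts_with_hello = re.search(r"^Hello", "Hello, world") is not None
hello_later = re.search(r"^Hello", "Oh, Hello") is not None

# Inside brackets, a leading ^ negates the set.
not_abc = re.findall(r"[^abc]", "aabbccxyz")
not_lowercase = re.findall(r"[^a-z]", "iOS 17")

print(starts_with_hello)  # True
print(hello_later)        # False
print(not_abc)            # ['x', 'y', 'z']
print(not_lowercase)      # ['O', 'S', ' ', '1', '7']
```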

.

This can be used as a wildcard character to match a single character (excluding newlines):

  • a.c would match any three-character string starting with an a and ending with a c (e.g. abc, a4c, a@c), but would not match abbc
  • Remember: inside brackets the dot is a literal character, so [a.c] would match only a, ., or c
  • a.* would match an a followed by zero or more characters (e.g. a, abc, a123)
  • view.*Appear would match all instances of viewDidAppear and viewWillAppear
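As a rough illustration of the wildcard, the following Python sketch (built-in re module assumed; the method names in the list are just sample input) checks the same patterns.

```python
import re

# fullmatch requires the whole string to match, so abbc is rejected by a.c
three_chars = [s for s in ["abc", "a4c", "a@c", "abbc"] if re.fullmatch(r"a.c", s)]

# Inside brackets the dot loses its wildcard meaning.
literal_dot = re.findall(r"[a.c]", "a.bc")

# .* lets any run of characters sit between "view" and "Appear".
names = ["viewDidLoad", "viewDidAppear", "viewWillAppear"]
appear_methods = [n for n in names if re.search(r"view.*Appear", n)]

print(three_chars)     # ['abc', 'a4c', 'a@c']
print(literal_dot)     # ['a', '.', 'c']
print(appear_methods)  # ['viewDidAppear', 'viewWillAppear']
```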

$

Matches the ending position of a string or the position just before a string-ending newline:

  • [bc]at$ matches bat and cat, but only at the end of the string or line.
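A short Python sketch of the end anchor, assuming the built-in re module. Whether $ also matches at the end of every line depends on the engine's multiline setting; in Python that is the re.MULTILINE flag.

```python
import re

end_hit = re.search(r"[bc]at$", "wombat")    # bat sits at the very end
mid_miss = re.search(r"[bc]at$", "cat nap")  # cat is not at the end

text = "a cat\na bat"
last_only = re.findall(r"[bc]at$", text)                     # end of string only
per_line = re.findall(r"[bc]at$", text, flags=re.MULTILINE)  # end of each line

print(end_hit.group())  # bat
print(mid_miss)         # None
print(last_only)        # ['bat']
print(per_line)         # ['cat', 'bat']
```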

Quantifiers

*

This character allows you to match the previous character (or bracket expression) 0 or more times:

  • /ab*c/ would match ac, abc, abbc, abbbc, etc.
  • /[abc]*/ would match a, b, c, ca, cba, abcc, and any other sequence made up of these 3 characters (including the empty string)
  • /a.*b/ would match axb, axxb, a12345b , etc.
  • [ab]*cd matches cd, acd, bcd, aacd, bacd, abcd, bbabacd, etc.

+

The + operator is quite similar to the *, but instead allows you to match the previous character 1 or more times:

  • /ab+c/ would match abc, abbc, abbbc, but would not match ac
  • /[df]+og/ would match dog, fog, ddog, dfog, etc., but would not match og

?

This operator allows us to match the previous character exactly 0 or 1 times:

  • /ab?c/ would match ac, abc, but would not match abbc
  • /ea?/ matches one e followed by an optional a
  • [bp]?at matches at, bat, and pat.
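To compare *, +, and ? side by side, here is a minimal Python sketch (built-in re module assumed). The helper full_matches is a hypothetical name introduced only for this illustration.

```python
import re

def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

words = ["ac", "abc", "abbc", "abbbc"]

star = full_matches(r"ab*c", words)      # b may appear 0 or more times
plus = full_matches(r"ab+c", words)      # b must appear at least once
optional = full_matches(r"ab?c", words)  # b may appear at most once

print(star)      # ['ac', 'abc', 'abbc', 'abbbc']
print(plus)      # ['abc', 'abbc', 'abbbc']
print(optional)  # ['ac', 'abc']
```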

\

Just like in normal programming languages, the backslash allows you to escape special characters:

  • \+ will match the + in 1+2=3 which would otherwise be treated as a metacharacter
  • \( and \) match literal parentheses, and \{ and \} match literal braces, instead of being treated as grouping or quantifier syntax
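The effect of escaping can be seen in this small Python sketch (built-in re module assumed). re.escape is a standard-library helper that adds the backslashes for you; the sample strings are illustrative.

```python
import re

plus_signs = re.findall(r"\+", "1+2=3")  # escaped: a literal plus sign
repeated = re.findall(r"1+", "11+2=3")   # unescaped: "one or more 1s"
call = re.search(r"\(\d+\)", "print(42)").group()  # literal parentheses

# Using a literal string as a pattern only works once it is escaped.
literal = "1+2=3"
unescaped_ok = re.fullmatch(literal, literal) is not None
escaped_ok = re.fullmatch(re.escape(literal), literal) is not None

print(plus_signs)    # ['+']
print(repeated)      # ['11']
print(call)          # (42)
print(unescaped_ok)  # False
print(escaped_ok)    # True
```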

{ n }

This operator allows you to match the previous character exactly n times:

  • {3} will match the previous character exactly 3 times
  • {3,} will match the previous character 3 or more times
  • {2,4} will match the previous character between 2 and 4 times (inclusive)
  • a{2} matches aa
  • a{2,3} matches aa and aaa
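A quick way to check counted repetition is the Python sketch below (built-in re module assumed; full_matches is a hypothetical helper for this example). Note that the quantifier binds only to the character immediately before it, so aa{2} means one a followed by two more.

```python
import re

def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

runs = ["a", "aa", "aaa", "aaaa"]

exactly_two = full_matches(r"a{2}", runs)
two_to_three = full_matches(r"a{2,3}", runs)
three_or_more = full_matches(r"a{3,}", runs)
binds_to_last = full_matches(r"aa{2}", runs)  # a, then a{2}

print(exactly_two)    # ['aa']
print(two_to_three)   # ['aa', 'aaa']
print(three_or_more)  # ['aaa', 'aaaa']
print(binds_to_last)  # ['aaa']
```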

Logic

|

This operator allows you to specify alternative possibilities:

  • t|The matches the string t or The explicitly
  • (t|T)he applied to The ball is over there matches both The and the the in "there"
  • seriali[sz]e matches both serialise and serialize
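Alternation can be sketched in Python as follows (built-in re module assumed). The non-capturing group (?:...) is used here only so that re.findall returns the whole match rather than the group's contents; it is not required by the patterns above.

```python
import re

# Without grouping, | splits the entire pattern: "t" or "The".
either = re.findall(r"t|The", "The cat")

# Grouping limits the alternation to the first letter.
grouped = re.findall(r"(?:t|T)he", "The ball is over there")

# A one-character alternation can be written with | or with brackets.
spellings = re.findall(r"seriali(?:s|z)e", "serialise serialize")

print(either)     # ['The', 't']
print(grouped)    # ['The', 'the']
print(spellings)  # ['serialise', 'serialize']
```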

(...)

Parentheses allow you to define what's called a capture group which lets you extract the matching text into a variable for later use.

Given the following regular expression:

(\d\d\d)-(\d\d\d)-(\d\d\d\d)

When we apply it to "123-456-7890", the captured groups break down as follows:

  • $1 is 123
  • $2 is 456
  • $3 is 7890

Now, if we wanted to remove the formatting (i.e. "1234567890"), we could concatenate the captured groups together:

$1$2$3

Note: The captured group $0 represents the entire matched text (i.e. 123-456-7890), not the expression itself.
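The same capture-and-replace idea can be sketched in Python, assuming its built-in re module. One difference to keep in mind: Python writes group references in the replacement as \1, \2, \3 rather than the $1, $2, $3 syntax used above.

```python
import re

PHONE_GROUPS = r"(\d\d\d)-(\d\d\d)-(\d\d\d\d)"

match = re.fullmatch(PHONE_GROUPS, "123-456-7890")
groups = match.groups()  # the three captured groups
whole = match.group(0)   # group 0 is the entire match

# Concatenate the captured groups to drop the hyphens.
digits_only = re.sub(PHONE_GROUPS, r"\1\2\3", "123-456-7890")

print(groups)       # ('123', '456', '7890')
print(whole)        # 123-456-7890
print(digits_only)  # 1234567890
```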

Character Classes

\w

This will match any single alphanumeric character, uppercase or lowercase, as well as "_"; it is equivalent to [a-zA-Z0-9_]:

  • \w+ applied to "hello world my name is 42" would match hello, world, my, name, is, and 42; notice that the spaces are not part of any match
  • \w{4,} matches any words 4 or more characters long
  • \w{4,5} matches any words between 4 and 5 characters in length

\W

This will match any single character that isn't a word character:

  • \W applied to "the year is 2022" would match only the three spaces between the words, since letters and digits (including 2022) are all word characters

\d

Matches a digit (i.e. [0-9]).


\D

Matches anything other than a digit (i.e. [^0-9]) including spaces.


\s

This will match a single whitespace character (a space, tab, or newline).


\S

This will match any single character that isn't whitespace.
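The shorthand classes can be compared on one sample sentence with this Python sketch (built-in re module assumed). In Python 3, \w and \d also match non-ASCII letters and digits unless the re.ASCII flag is passed, which may differ from other engines.

```python
import re

text = "the year is 2022"

words = re.findall(r"\w+", text)          # runs of word characters
non_word = re.findall(r"\W", text)        # only the spaces remain
digits = re.findall(r"\d+", text)         # runs of digits
non_digit_runs = re.findall(r"\D+", text) # everything up to the digits
long_words = re.findall(r"\w{4,}", text)  # words of 4+ characters
pieces = re.split(r"\s+", text)           # split on whitespace

print(words)           # ['the', 'year', 'is', '2022']
print(non_word)        # [' ', ' ', ' ']
print(digits)          # ['2022']
print(non_digit_runs)  # ['the year is ']
print(long_words)      # ['year', '2022']
print(pieces)          # ['the', 'year', 'is', '2022']
```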


Regular Expression Examples & Xcode

To use regular expressions in Xcode, simply select Regular Expression from the "Find" menu:

Note: Xcode's Find field expects the bare pattern, so leave off the / delimiters used in some of the examples above.

With the theory and the fundamentals out of the way, let's look at some real-world use cases for regex.


Validating An Email

When writing a new regular expression, I find it easier to work backwards from the requirements.

What do we know about an email address?

We know the first part of the email will contain a mixture of uppercase and lowercase letters along with zero or more digits.

As a reminder, the bracket syntax allows us to specify a set of valid characters to match against and the + operator allows us to look for one or more instances of the previous expression.

So, combining these together we have the first part of our email validation regex:

[a-zA-Z0-9]+

Then, we know we'll see exactly 1 @ symbol, so our updated implementation now looks like this:

[a-zA-Z0-9]+@

@ is outside of any brackets because we're looking for a single instance of that character - not a pattern or a group of characters.

Finally, we expect to see another combination of letters, digits, dots, and hyphens for the domain, followed by a literal dot (escaped as \. so it isn't treated as a wildcard) and a domain extension.

[a-zA-Z0-9]+@[a-zA-Z0-9.-]+\.[a-zA-Z]+

The results seem promising as we're only catching the valid email addresses!

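However, this is an overly simplified implementation and would fail on perfectly valid emails, such as any address with a dot, plus sign, or underscore before the @, because the first bracket expression only allows letters and digits.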

If we tried to be extremely thorough, we'd probably end up in the neighborhood of Perl's 6,500 character long regular expression, so let's mutually agree to treat this as a stopping point 😅.
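As a rough check of the simplified email pattern, here is a Python sketch assuming the built-in re module. The function name is_simple_email and the sample addresses are hypothetical, and fullmatch is used so the whole string has to fit the pattern.

```python
import re

# Simplified pattern from this section, with the dot before the
# domain extension escaped so it only matches a literal ".".
SIMPLE_EMAIL = re.compile(r"[a-zA-Z0-9]+@[a-zA-Z0-9.-]+\.[a-zA-Z]+")

def is_simple_email(candidate):
    """Return True if the whole string fits the simplified pattern."""
    return SIMPLE_EMAIL.fullmatch(candidate) is not None

print(is_simple_email("jane42@example.com"))      # True
print(is_simple_email("jane@example"))            # False (no extension)
print(is_simple_email("first.last@example.com"))  # False (dot before the @)

# With an unescaped dot, the "." acts as a wildcard and lets this through.
UNESCAPED = r"[a-zA-Z0-9]+@[a-zA-Z0-9.-]+.[a-zA-Z]+"
accepts_missing_dot = re.fullmatch(UNESCAPED, "jane@examplecom") is not None
print(accepts_missing_dot)                        # True
```

The third result shows the limitation mentioned above: the pattern is too strict for some valid addresses, so treat it as a learning exercise rather than a production validator.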


Validating A Phone Number

Phone numbers can appear in a variety of formats:

  • (555) 444-6789
  • 555-444-6789
  • 555.444.6789
  • 555 444 6789

Looking at the first grouping of 3 numbers we can see that they may be surrounded by parentheses, so our regex will start off with:

\(?\d{3}\)?

Next, we can see that we may have spaces, periods, or hyphens between groupings of 3 characters, so we'll need to handle that as well:

[-.\s]?

Now, once we've combined everything together, our expression will successfully match the area code from our list of sample phone numbers above:

\(?\d{3}\)?[-.\s]?

Now, let's add support for the middle grouping of numbers which will be very similar to the previous expression:

\d{3}[-.\s]?

Finally, we can complete our implementation by validating the final grouping of 4 numbers with \d{4}.

Here's the final regular expression:

\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
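The finished phone pattern, with both parentheses escaped, can be tested against the four sample formats using this Python sketch (built-in re module assumed; is_phone_number is a hypothetical helper name).

```python
import re

PHONE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

def is_phone_number(candidate):
    """Return True if the whole string fits the phone pattern."""
    return PHONE.fullmatch(candidate) is not None

samples = ["(555) 444-6789", "555-444-6789", "555.444.6789", "555 444 6789"]
results = [is_phone_number(s) for s in samples]

print(results)                           # [True, True, True, True]
print(is_phone_number("555-44-6789"))    # False (middle group too short)
print(is_phone_number("(555 444 6789"))  # True (parentheses are not paired)
```

The last line is a known limitation of this approach: each parenthesis is optional on its own, so the pattern cannot require that they appear together.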


Matching Whitespace

If you've ever used SwiftLint before, you're likely no stranger to linter warnings about leading and trailing whitespace. The offending whitespace can be matched with ^[ \t]+|[ \t]+$, which matches any run of spaces or tabs at the beginning or end of a line.

\t matches a single tab.
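To see what the whitespace pattern picks up, here is a Python sketch assuming the built-in re module. The re.MULTILINE flag is what makes ^ and $ apply to every line in Python, and the sample code string is made up.

```python
import re

EDGE_WHITESPACE = r"^[ \t]+|[ \t]+$"

# Two trailing spaces on line 1, a leading and a trailing tab on line 2.
code = "let x = 1  \n\tlet y = 2\t"

hits = re.findall(EDGE_WHITESPACE, code, flags=re.MULTILINE)
cleaned = re.sub(EDGE_WHITESPACE, "", code, flags=re.MULTILINE)

print(hits)           # ['  ', '\t', '\t']
print(repr(cleaned))  # 'let x = 1\nlet y = 2'
```

Keep in mind that the first half of the pattern also matches ordinary indentation, so in real code you would probably only replace the trailing half ([ \t]+$).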

Standardizing Coding Style

Let's say that your code contains a mixture of variable names declared in both camel-case (i.e. loginButton) and snake-case (i.e. login_button) and you want to standardize them.

Our typical "Find and Replace" options won't work here.

With "Find", our only option would be to find every "_" character in our codebase, which isn't particularly useful; this is a problem only regular expressions can solve.

We can use the following regular expression to find every expression written in snake-case in our codebase:

\w+?_.+?(?=[( )])

This uses an advanced regex feature called lookahead: (?=[( )]) requires the match to be followed by a parenthesis or a space, without including that character in the match itself.
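Here is a minimal Python sketch of the snake-case search, assuming the built-in re module. The sample line and the to_camel_case helper are hypothetical additions for illustration; the article itself only covers finding the names.

```python
import re

SNAKE_CASE = r"\w+?_.+?(?=[( )])"

line = "let login_button = make_button(title)"

# The lookahead stops each match before the space or parenthesis.
snake_names = re.findall(SNAKE_CASE, line)

def to_camel_case(name):
    """Hypothetical helper: turn login_button into loginButton."""
    return re.sub(r"_([a-zA-Z0-9])", lambda m: m.group(1).upper(), name)

converted = [to_camel_case(n) for n in snake_names]

print(snake_names)  # ['login_button', 'make_button']
print(converted)    # ['loginButton', 'makeButton']
```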

Making Classes Final By Default

Let's say that we want to ensure that all of the UIViewControllers in our project are final by default.

It's easy enough to apply this change to all future UIViewControllers, but how can we apply this change to our existing controllers?

We can use the following expression to find all declarations of UIViewControllers that start with class instead of final class:

^(class)\s[\w]+ViewController:\s?[A-Z]+ViewController

Great! Since we're capturing class, we can use our captured groups in the "Replace" textfield and hit "Replace All":

Remember that capture group numbering starts at 0, where $0 is the entire matched text.

Now, all of our previous UIViewController declarations are final.
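The find-and-replace step can be approximated outside Xcode with the Python sketch below (built-in re module assumed). The replacement text, "final " followed by the whole match, is an assumption about what goes in the "Replace" field; in Python the whole match is written \g<0> rather than $0. The Swift snippet in the string is sample input only.

```python
import re

CONTROLLER_DECL = r"^(class)\s[\w]+ViewController:\s?[A-Z]+ViewController"

source = (
    "class LoginViewController: UIViewController {\n"
    "final class HomeViewController: UIViewController {\n"
    "class Helper {}"
)

# Prefix every matching declaration with "final " (assumed replacement).
updated = re.sub(CONTROLLER_DECL, r"final \g<0>", source, flags=re.MULTILINE)

print(updated)
```

Only the first line changes: the second already starts with final, so the ^(class) anchor does not match it, and Helper is not a view controller.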


Regular Expression Builder In Xcode

If you're still feeling a little shaky with the syntax, don't worry!

In Xcode, we can easily create basic regular expressions without the need for this special syntax.

Start by switching the Find accessory action from Contains to Regular Expression:

Then, we can select the + and use the following menu to visually build our regular expressions:

Although this approach is limited in its ability to produce complex regular expressions, it is a good place to start as you learn the basics.

Tools for Practicing Regex

  • RegExr: an online tool to learn, build, and test regular expressions
  • regex101: a regular expression tester with syntax highlighting, explanations, and a cheat sheet for PHP/PCRE, Python, Go, JavaScript, Java, and C#/.NET


Sources

  • Regular expression - Wikipedia
  • Regex Cheat Sheet: Regular Expressions Syntax Reference, with tables showing syntax, examples and matches
  • What is a Regex (Regular Expression)?: computer dictionary definition of regular expression (regex), including related links, information, and terms
