Regular expressions are now possible in Excel thanks to a new set of functions!
Excel has long possessed formula-based ways to extract and manipulate text, such as the LEN, FIND, REPLACE, SUBSTITUTE, LEFT, RIGHT, and MID functions. In more recent times, the likes of TEXTBEFORE, TEXTAFTER, and TEXTSPLIT have arrived.
However, for complex cases, it can be a nightmare having to combine several of these to achieve a certain task.
Thankfully, the REGEX functions provide you with a flexible and versatile alternative to construct special sequences to do this.
The set includes REGEXTEST, REGEXEXTRACT, and REGEXREPLACE.
REGEX is indeed short for 'regular expression'. And depending on your location, you may pronounce it as 'REJ-EX' (like I do) or 'REG-GEX', but it really doesn't really matter!
There are so many possibilities that it's impossible to cover everything, so I'm only going to touch the surface in this post, but it'll equip you with the basics so you can immediately start taking advantage of these functions.
Note: REGEXTEST, REGEXEXTRACT, and REGEXREPLACE are only available to Microsoft 365 Insiders on the Beta Channel as of July 2024.
REGEXTEST
REGEXTEST checks if the supplied text matches a regex pattern and returns TRUE or FALSE.
REGEXTEST(text,pattern,[case_sensitivity])
text
: The cell or range containing the text string(s) to check.pattern
: The regular expression to match.[case_sensitivity]
: Determines if the match is case sensitive. (OPTIONAL)- 0 - Case-sensitive match (default)
- 1 - Case-insensitive match
Example
A list of codes are displayed in A3:A12. There are ones composed of numbers, letters, and a mixture of both.
To return TRUE for the codes that contain a number and FALSE for the ones that don't, use the following formula:
=REGEXTEST(A3:A12,"[0-9]")
The range is housed in the text
argument, and the regular expression is defined in pattern
and must be enclosed between quotation marks ("").
The square brackets ([]) define a character class and a range specifier (-) is sandwiched in between 0 and 9 to save having to put [0123456789].
Using a similar concept in C3:C12, the following returns TRUE for those that have an alphabetic character:
=REGEXTEST(A3:A12,"[a-zA-Z]")
Although there are only uppercase letters in this example, using `[a-zA-Z]
` ensures lowercase ones are covered as well for any future values.
REGEXEXTRACT
REGEXEXTRACT lets you extract one or more parts of supplied text that match a regex pattern.
REGEXEXTRACT(text,pattern,[return_mode],[case_sensitivity])
text
: The cell or range containing the text string(s).pattern
: The regular expression to match.[return_mode]
: What should be extracted? (OPTIONAL)- 0 - First match(default)
- 1 - All matches
- 2 - Capture groups of first match
[case_sensitivity]
: Determines if the match is case sensitive. (OPTIONAL)- 0 - Case-sensitive match (default)
- 1 - Case-insensitive match
Example
In A3:A12, a collection of text strings are present. Each contains an email address in a different position.
To extract the email addresses and display in their own cells, use the following formula:
=REGEXEXTRACT(A3:A12,"[\w\.-]+@[\w\.-]+\.\w+")
Yes, that pattern looks like a random assortment of characters, but it isn't. Let's break it down:
`[\w\.-]`
: Matches any alphanumeric character, underscore (_), dot (.), or hyphen (-). The `\w` is shorthand for the first two, whereas the backslash in `\.` removes the dot's special meaning to ensure it's a literal character.`+`
: This is a quantifier that means 'one or more' occurrences of any of the aforementioned characters.`@`
: Matches the literal `@` character in the email address.`[\w\.-]`
: Already explained.- `+`: Already explained.
`\.`
: Already explained.`\w`
: Matches any alphanumeric character or underscore (_).-
`+`
: Already explained.
REGEXREPLACE
REGEXREPLACE searches for a regex pattern within the text you supply and replaces it with different text.
REGEXREPLACE(text,pattern,replacement,[occurrence],[case_sensitivity])
-
text
: The cell or range containing the text string(s).
pattern
: The regular expression to match.
replacement
: Specifies what text should replace the match.
[occurrence]
: Determines which pattern instance should be replaced. (OPTIONAL)
0 - All occurrences (default)
n - nth occurrence from the start
-n - nth occurrence from the end
[case_sensitivity]
: Determines if the match is case sensitive. (OPTIONAL)
- 0 - Case-sensitive match (default)
- 1 - Case-insensitive match
Example
A3:A12 contains a set of fixed length account numbers.
To mask the first four characters with asterisks, use the following formula:
=REGEXREPLACE(A3:A12,"^(\d{4})","****")
Let's break down the pattern:
`^`
: The caret is an anchor that matches only the start of a string.`()`
:`\d`
:`{4}`
: The "****" inreplacement
specifies the characters to override the first four digits of each number.
Confused by those patterns?
You might have looked at the patterns for the REGEXTRACT and REGEXREPLACE examples and freaked out. Don't worry, though, as I have a secret to reveal…
No, I didn't happen to put them together by memorising the entire regular expression language. Heard of ChatGPT? I took advantage of its vast knowledge to generate them, and you should too.
For the REGEXEXTRACT example, I asked:
what is the regex for extracting the email address from "support@example.com is the best way to get in touch with our help desk."?
It not only gave me the correct regular expression I could easily copy and paste into Excel, but it gave a breakdown as well to aid my understanding. How good is this?
Likewise, for the REGEXREPLACE example, I asked:
using google sheets' regexreplace function, what is the regex for masking the first 4 digits from the account number 93721027?
Why did I mention Google Sheets?
The great thing about regular expressions is they are cross-compatible with other platforms like Sheets and many programming languages. That's why you haven't got to wait for ChatGPT to catch up with these new Excel functions — take advantage now!
Conclusion
It's clear that Excel's new game-changing REGEX functions are going to drastically streamline certain text extraction and manipulation tasks. You've only seen a handful of examples in this post, but the possibilities are endless.
I sense there will be less of a need for the traditional text functions as time moves forward. However, let's not get ahead of ourselves — these are still in beta after all. And even when they do reach the production version, compatibility constraints mean many users will be reluctant to use them for shared workbooks. But that is nothing new for any arrival in Excel for Microsoft 365.
The key takeaway is you don't need to spend hours and hours revising regular expressions to gain encyclopedic knowledge. As I've already alluded to, ChatGPT can do much of the heavy lifting for you. It is worth trying to understand what it generates, though, so you can prune a pattern if it is unnecessarily verbose.
On that note, if you really want to gain an appreciation of how extensive the regular expression library is, ask the following:
please give me a regex cheat sheet
What will you be using regex for?
Watch the video
For more Excel tips and tricks like this, check out our Video Tutorials page.
Conversation
As you comment, let's keep the discussion friendly and respectful to add value to the conversation.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse varius enim in eros elementum tristique. Duis cursus, mi quis viverra ornare, eros dolor interdum nulla, ut commodo diam libero vitae erat. Aenean faucibus nibh et justo cursus id rutrum lorem imperdiet. Nunc ut sem vitae risus tristique posuere. uis cursus, mi quis viverra ornare, eros dolor interdum nulla, ut commodo diam libero vitae erat. Aenean faucibus nibh et justo cursus id rutrum lorem imperdiet. Nunc ut sem vitae risus tristique posuere.