Python Regular Expressions: Examples & Reference

Last updated: 25 Jun 2023

Table of Contents

Summary
String matches regex
Whole string matches regex
String contains regex
Extract capture group
Extract capture group, anywhere in string
Extract capture group, multiple times
Extract first occurrence of regex
Extract all occurrences of regex
Extract all regex matches
Replace regex in string
Replace only the first occurrence of regex
Replace captured group
Split string by regex
Split string by word boundary
Non-capturing groups
re.match VS re.search
re.findall VS re.finditer
Case-insensitive regex

Usage examples for regular expressions in Python 3

Unless otherwise stated, examples use Python 3. All examples are also available in a Jupyter notebook.

Summary

function	where	number of matches returned
`re.match`	Match the beginning of the string	First match only
`re.search`	Match anywhere in a string	First match only
`re.findall`	Match anywhere in a string	All matches

String matches regex

The pattern must match at the beginning of the string. To match the full string, see below

Use re.match(pattern, string).

This method returns a match object in case there was a match, or None if there was no match.

Example pattern: one or more digits

import re

if re.match(r'\d+', '123foo'):
    # match because the string starts with '123'
    print('match')
else:
    # no match
    print('no match')

Whole string matches regex

In this case, there is only a match if the string fully matches the given pattern.

Example pattern: "foo" followed by one or more digits

If on Python 3.4 or newer, use re.fullmatch(pattern,string):

import re
# Python version >=3.4

# match because the string FULLY matches the pattern
re.fullmatch(r'foo\d+', 'foo123')
# >>> <re.Match object; span=(0, 6), match='foo123'>

# no match: although the pattern matches the beginning of the
# string, there are extra characters at the end ("bar")
re.fullmatch(r'foo\d+', 'foo123bar')
# >>> None

On older Python versions, wrap the pattern with ^ and $ and use re.match(pattern,string) instead:

import re
# Python version < 3.4

# match because the string FULLY matches the pattern
re.match(r'^foo\d+$', 'foo123')
#>>> <_sre.SRE_Match at 0x7f690806b648>

# no match: although the pattern matches the beginning of the
# string, there are extra characters at the end ("bar")
re.match(r'^foo\d+$', 'foo123bar')
# >>> None
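
One caveat with the `^...$` approach: `$` also matches just before a newline at the very end of the string. A minimal sketch of the difference, using the same example pattern (the variable names are only for illustration):

```python
import re

pattern = r'foo\d+'

# `$` also matches just before a trailing newline, so this still matches
dollar_match = re.match('^' + pattern + '$', 'foo123\n')

# `\Z` only matches at the very end of the string, so this does not match
z_match = re.match('^' + pattern + r'\Z', 'foo123\n')

# re.fullmatch requires the entire string, newline included, to match
full_match = re.fullmatch(pattern, 'foo123\n')

print(dollar_match is not None, z_match is not None, full_match is not None)
# prints: True False False
```

If a trailing newline should count as a mismatch, `re.fullmatch` or `\Z` is the safer choice.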

String contains regex

To know if a regex is present in a string, use re.search(pattern, string):

Example pattern: a sequence of 3 digits:

import re

# returns a match
re.search(r'\d{3}', 'foo 123 bar')
# >>> <re.Match object; span=(4, 7), match='123'>

# no match, returns None
re.search(r'\d{3}', 'foo 1 23 bar')
# >>> None

Extract capture group

To extract anywhere in the string, use re.search instead

Use re.match(pattern, string). re.match only anchors the pattern at the beginning of the string, so the example pattern ends with $ to require that the whole string matches.

Example pattern: 3 characters followed by a dash then another 3 characters

import re

# this pattern matches things like "foo-bar" or "bar-baz"
pattern = r"^(\w{3})-(\w{3})$"

string1 = "abc-def"

# returns None if there is no match
matches = re.match(pattern, string1)

if matches:
    # group indices start at 1; group(0) is the whole match
    first_group_match = matches.group(1)
    # abc

    second_group_match = matches.group(2)
    # def

    print(first_group_match + " AND " + second_group_match)

    # prints: "abc AND def"
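
When you need every group at once, the Match object can also hand them back together. A short sketch with the same pattern (variable names are illustrative):

```python
import re

m = re.match(r"^(\w{3})-(\w{3})$", "abc-def")

# group(0) is the entire matched text
whole = m.group(0)
# 'abc-def'

# groups() returns all captured groups as a tuple
all_groups = m.groups()
# ('abc', 'def')

# group() accepts several indices and returns a tuple, which unpacks nicely
left, right = m.group(1, 2)
# left == 'abc', right == 'def'
```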

Extract capture group, anywhere in string

Only the first occurrence of the capture can be extracted.

Use re.search(pattern, string) and .group().

Example: capture digits followed by "x"

import re

# one or more digits followed by 'x'
pat = r"(\d+)x"

# the second occurrence ('456') is not captured.
re.search(pat, "123x 456x").group(1)
# >>> "123"
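
Calling .group() directly on the result fails with an AttributeError when there is no match, because re.search returns None. A minimal sketch of a guard, where first_number_before_x is a hypothetical helper name:

```python
import re

pat = r"(\d+)x"

def first_number_before_x(text):
    """Return the first run of digits followed by 'x', or None if absent."""
    match = re.search(pat, text)
    return match.group(1) if match else None

found = first_number_before_x("123x 456x")
missing = first_number_before_x("no digits here")

print(found, missing)
# prints: 123 None
```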

Extract capture group, multiple times

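Use re.findall(pattern, string). When the pattern contains a single capture group, findall returns the text captured by that group for every match.

finditer also works, but you get the full Match object for each match.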

import re

pattern = r'a(\d+)'

re.findall(pattern, 'a1 b2 a32')
# >>> ['1', '32']
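
With more than one capture group, findall returns a list of tuples instead of plain strings. A small sketch, assuming a two-group variant of the pattern above:

```python
import re

# two groups: a lowercase letter, then one or more digits
pattern = r'([a-z])(\d+)'
text = 'a1 b2 a32'

pairs = re.findall(pattern, text)
# [('a', '1'), ('b', '2'), ('a', '32')]

# finditer yields Match objects, so you pick the group you need
digits = [m.group(2) for m in re.finditer(pattern, text)]
# ['1', '2', '32']
```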

Extract first occurrence of regex

This matches a pattern anywhere in the string, but only once.

Will return None if there are no matches.

Use re.search(pattern, string) rather than re.match

Example pattern: letter 'b' followed by two alphanumeric characters

import re

pattern = r'b\w{2}'

match = re.search(pattern, "foo bar baz quux")
# >>> <re.Match object; span=(4, 7), match='bar'>

# will be None if there are no matches
if match:
    match.group(0)
    # >>> 'bar'

Extract all occurrences of regex

findall is like finditer, but returns strings instead of Match objects.

Use re.findall(pattern, string) to extract multiple occurrences of pattern in string.

import re

# letter 'b' followed by two alphanumeric characters
pattern = r'b\w{2}'

re.findall(pattern,"foo bar baz quux")
# >>> ['bar', 'baz']

Extract all regex matches

finditer is like findall, but returns Match objects instead of strings

To extract all matches of a regex in a string, returning Match objects, use re.finditer(pattern, string):

import re

# letter 'b' followed by two alphanumeric characters
pattern = r'b\w{2}'
matches = re.finditer(pattern,'foo bar baz quux')

# wrap the result in a list because finditer returns an iterator
list(matches)
# >>> [<re.Match object; span=(4, 7), match='bar'>,
#      <re.Match object; span=(8, 11), match='baz'>]

Replace regex in string

This will replace all occurrences of regex in a string.

Use re.sub(pattern, replacement, string):

import re

# Example pattern: a digit
re.sub(r'\d', '#', '123foo456')
# returns '###foo###'

# Example pattern: anything between angle brackets, such as an HTML tag
re.sub(r'<[^>]+>', '', 'foo <p> bar')
# returns 'foo  bar'

Replace only the first occurrence of regex

count=N means replace only the first N occurrences

Use re.sub(pattern, replacement, string, count=1)
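
A minimal sketch of count in action, using a single digit as the example pattern (variable names are illustrative):

```python
import re

# only the first digit is replaced
first_only = re.sub(r'\d', '#', '123foo456', count=1)
# '#23foo456'

# count=2 replaces the first two occurrences
first_two = re.sub(r'\d', '#', '123foo456', count=2)
# '##3foo456'
```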

Replace captured group

Groups are referenced like this: '\1' for first group, '\2' for second group, etc.

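Note the r'' (raw string) prefix on the pattern and replacement strings. Without it, Python would treat '\1' as an escape sequence before the regex engine ever sees it.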

Groups are 1-indexed (they start at 1, not 0)

Example pattern: match file names (e.g. "file.txt")

import re

# match strings like "foo.txt" and "bar.csv"
pattern = r'(^.+)\.(.+)$'

# replace the extension with "MYEXT"
re.sub(pattern, r'\1.MYEXT',"foo.txt")
# returns foo.MYEXT

# replace the file name with "MYFILE"
re.sub(pattern, r'MYFILE.\2',"foo.txt")
# returns MYFILE.txt
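
If a group reference is immediately followed by a digit, the short form becomes ambiguous. The \g<N> syntax avoids that. A sketch under the same file-name pattern (the "2" suffix is just an example):

```python
import re

# match strings like "foo.txt" and "bar.csv"
pattern = r'(^.+)\.(.+)$'

# r'\12' would be read as a reference to group 12;
# \g<1> keeps the group number unambiguous
versioned = re.sub(pattern, r'\g<1>2.\2', 'foo.txt')
# 'foo2.txt'
```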

Split string by regex

Use re.split(pattern, string). Returns a list of the substrings found between the matches of the pattern.

Example pattern: split string by spaces or commas

import re

re.split(r'[\s,]+', 'foo,bar bar  quux')
# ['foo', 'bar', 'bar', 'quux']
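
Two variations that may come in handy: maxsplit limits the number of splits, and wrapping the pattern in a capture group keeps the separators in the result. A short sketch with the same input:

```python
import re

text = 'foo,bar bar  quux'

# split at most once; the remainder is returned untouched
head_and_rest = re.split(r'[\s,]+', text, maxsplit=1)
# ['foo', 'bar bar  quux']

# a capture group around the pattern keeps the separators in the list
with_separators = re.split(r'([\s,]+)', text)
# ['foo', ',', 'bar', ' ', 'bar', '  ', 'quux']
```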

Split string by word boundary

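On Python 3.5 and 3.6, re.split(r'\b', string) fails with ValueError: split() requires a non-empty pattern match, because \b only ever matches the empty string. Since Python 3.7 the call works, but the result also contains the text between the words (and possibly empty strings at the ends), which is rarely what you want.

To get just the words, use findall(r'\w+', string) instead:

import re

# Python 3.5 and 3.6: ValueError: split() requires a non-empty pattern match.
# Python 3.7+: returns the words together with the separators between them
re.split(r'\b','foo,bar   bar-quux')

# this is what you want:
re.findall(r'\w+','foo,bar   bar-quux')
# >>> ['foo', 'bar', 'bar', 'quux']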
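
Splitting on runs of non-word characters is another option, though it behaves differently from findall when the string starts or ends with a separator. A sketch of the difference (the padded input is a made-up example):

```python
import re

text = 'foo,bar   bar-quux'

# split on one or more non-word characters
words = re.split(r'\W+', text)
# ['foo', 'bar', 'bar', 'quux']

# leading or trailing separators produce empty strings with split...
padded = re.split(r'\W+', ', foo,bar ')
# ['', 'foo', 'bar', '']

# ...but not with findall
cleaned = re.findall(r'\w+', ', foo,bar ')
# ['foo', 'bar']
```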

Non-capturing groups

TEMPLATE: (?:PATTERN)

Use non-capturing groups to match something you don't need to capture.

Non-capturing groups can also be marginally cheaper than capturing groups, because the engine does not have to record what they matched.

Example pattern: the string "foo" with a non-word character or a line boundary on each side

import re

# Replace "foo" when it has a non-word character or a line boundary on both its left and its right
pattern = r'(?:\W|^)foo(?:\W|$)'
replacement = " FOO "

string = 'foo bar foo foofoo barfoobar foo'

re.sub(pattern,replacement,string)
# >>> ' FOO bar FOO foofoo barfoobar FOO '
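
Two related sketches. First, non-capturing groups matter with findall, which returns only the captured text as soon as the pattern has a capture group. Second, if you don't want to consume the surrounding characters, the zero-width \b assertion is an alternative to the pattern above. The sample strings are illustrative:

```python
import re

items = 'foo-1 bar-22 baz-3'

# with a capturing group, findall returns only the group
capturing = re.findall(r'(foo|bar)-\d+', items)
# ['foo', 'bar']

# with a non-capturing group, findall returns the whole match
non_capturing = re.findall(r'(?:foo|bar)-\d+', items)
# ['foo-1', 'bar-22']

# \b matches a word boundary without consuming any characters,
# so the spaces around "foo" are left untouched
string = 'foo bar foo foofoo barfoobar foo'
with_boundaries = re.sub(r'\bfoo\b', 'FOO', string)
# 'FOO bar FOO foofoo barfoobar FOO'
```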

re.match VS re.search

re.match(pattern,string)	re.search(pattern,string)
`pattern` must match the beginning of the string or nothing at all	`pattern` may be anywhere in the string, but only the first match is returned.

re.match only matches at the beginning of the string; re.search matches the pattern a single time, anywhere in the string.

import re

# returns None, no match
re.match('abc','xx abc xx')
# None

# returns a single match
re.match('abc','abc')
# <re.Match object; span=(0, 3), match='abc'>

# returns a single match
re.search('abc','xx abc xx')
# <re.Match object; span=(3, 6), match='abc'>

# still returns a single match, even though the pattern occurs more than once
re.search('abc','xx abc xx abc')
# <re.Match object; span=(3, 6), match='abc'>

re.findall VS re.finditer

Both findall(pattern, string) and finditer(pattern, string) can be used to return multiple occurrences of a pattern in a string.

re.findall(pattern, text)	re.finditer(pattern, text)
Returns a (possibly empty) list of strings, one for each place where the regex matched	Returns an iterator (possibly yielding nothing) of `Match` objects, which hold the matching text as well as the start and end positions of each match
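
A side-by-side sketch of the two, using the "b followed by two alphanumeric characters" pattern from earlier sections:

```python
import re

pattern = r'b\w{2}'
text = 'foo bar baz quux'

# findall: just the matching strings
strings = re.findall(pattern, text)
# ['bar', 'baz']

# finditer: Match objects, so positions are available too
spans = [(m.group(0), m.span()) for m in re.finditer(pattern, text)]
# [('bar', (4, 7)), ('baz', (8, 11))]
```

finditer is usually the better fit when you need positions or when the input is large, since matches are produced one at a time.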

Case-insensitive regex

To make a regular expression case-insensitive, add (?i) at the start of the pattern string.

Example pattern: match strings beginning with "foo", "FOO","Foo", etc.

import re

re.match("(?i)^foo.*","Foo")
# >>> <re.Match object; span=(0, 3), match='Foo'>

re.match("(?i)^foo.*","FOO")
# >>> <re.Match object; span=(0, 3), match='FOO'>

re.match("(?i)^foo.*","foo")
# >>> <re.Match object; span=(0, 3), match='foo'>
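
The same effect is available through the re.IGNORECASE flag (short form re.I), which keeps the pattern itself unchanged. A minimal sketch (variable names are illustrative):

```python
import re

# pass the flag to the function call...
flag_match = re.match(r'^foo.*', 'Foo', flags=re.IGNORECASE)

# ...or bake it into a compiled pattern that you reuse
compiled = re.compile(r'^foo.*', re.I)
results = [bool(compiled.match(s)) for s in ('foo', 'FOO', 'Foo', 'bar')]
# [True, True, True, False]
```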

Felipe 17 Mar 2016 25 Jun 2023 python-3 regular-expressions