Python Regular Expressions: Examples & Reference

Usage examples for regular expressions in Python 3

Unless otherwise stated, examples use Python 3. See all examples in this Jupyter notebook.

Summary

Function      Where it matches                     Number of matches returned
re.match      Match the beginning of the string    First match only
re.search     Match anywhere in a string           First match only
re.findall    Match anywhere in a string           All matches
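
As a quick illustration of the summary above, the sketch below runs the three functions against the same input. The sample string and the variable names are made up for this example.

```python
import re

text = "ab12 cd34"
pattern = r"\d+"  # one or more digits

at_start = re.match(pattern, text)     # None: the string starts with "ab", not a digit
first = re.search(pattern, text)       # Match object for the first run of digits
all_hits = re.findall(pattern, text)   # list with every run of digits

print(at_start)        # None
print(first.group(0))  # 12
print(all_hits)        # ['12', '34']
```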

String matches regex

The pattern must match at the beginning of the string. To match the full string, see the next section.

Use re.match(pattern, string).

This function returns a Match object if there was a match, or None if there was no match.

Example pattern: one or more digits

import re

if re.match(r'\d+', '123foo'):
    # match because the string starts with '123'
    print('match')
else:
    # no match
    print('no match')
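
Because re.match returns a Match object rather than True, the matched text and its position can be read from it. A minimal sketch, reusing the pattern above (the variable name m is arbitrary):

```python
import re

m = re.match(r"\d+", "123foo")

if m is not None:
    print(m.group(0))  # 123      (the matched text)
    print(m.span())    # (0, 3)   (start and end positions)
    print(m.end())     # 3
```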

Whole string matches regex

In this case, there is only a match if the string fully matches the given pattern.

Example pattern: "foo" followed by one or more digits

  • If on Python 3.4 or newer, use re.fullmatch(pattern,string):

    import re
    # Python version >=3.4
    
    # match because the string FULLY matches the pattern
    re.fullmatch(r'foo\d+','foo123')
    #>>> <_sre.SRE_Match object; span=(0, 6), match='foo123'>
    
    # no match because although the pattern matches the beginning of the
    # string, there are extra characters at the end ("bar")
    re.fullmatch(r'foo\d+','foo123bar')
    # >>>None
    
  • On older Python versions, wrap the pattern with ^ and $ and use re.match(pattern,string) instead:

    import re
    # Python version < 3.4
    
    # match because the string FULLY matches the pattern
    re.match(r'^foo\d+$','foo123')
    #>>> <_sre.SRE_Match at 0x7f690806b648>
    
    # no match because although the pattern matches the beginning of the
    # string, there are extra characters at the end ("bar")
    re.match(r'^foo\d+$','foo123bar')
    # >>>None  
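
One subtlety worth knowing when choosing between the two approaches: "$" also matches just before a trailing newline, so the ^...$ form is slightly more permissive than re.fullmatch. The sketch below shows the difference on a made-up input; "\Z" is the anchor that matches only at the very end of the string.

```python
import re

s = "foo123\n"  # note the trailing newline

anchored = re.match(r"^foo\d+$", s)    # "$" also matches just before a final "\n"
strict = re.match(r"\Afoo\d+\Z", s)    # "\Z" matches only at the very end
full = re.fullmatch(r"foo\d+", s)      # the entire string must match (Python >= 3.4)

print(anchored is not None)  # True
print(strict is not None)    # False
print(full is not None)      # False
```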
    

String contains regex

To know if a regex is present in a string, use re.search(pattern, string):

Example pattern: a sequence of 3 digits:

import re

# returns a match
re.search(r'\d{3}','foo 123 bar')
# >>> <_sre.SRE_Match object; span=(4, 7), match='123'>

# no match, returns None
re.search(r'\d{3}','foo 1 23 bar')
# >>> None

Extract capture group

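To extract anywhere in the string, use re.search instead.

Use re.match(pattern, string) and read the groups from the returned Match object. re.match only anchors the pattern at the beginning of the string; the example below adds ^ and $ so that the whole string must match.

Example pattern: 3 word characters followed by a dash, then another 3 word characters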

import re

# this pattern matches things like "foo-bar" or "bar-baz"
pattern = r"^(\w{3})-(\w{3})$"

string1 = "abc-def"

# returns None if no match
matches = re.match(pattern, string1)

if matches:
    # group indices start at 1; group(0) is the whole match
    first_group_match = matches.group(1) 
    # abc

    second_group_match = matches.group(2)
    # def

    print(first_group_match+" AND "+second_group_match) 

    # prints: "abc AND def"
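
When there are several groups, it can be convenient to fetch them all at once or to give them names. The sketch below uses the same pattern as above; the group names "left" and "right" are invented for illustration.

```python
import re

m = re.match(r"^(\w{3})-(\w{3})$", "abc-def")

if m:
    print(m.group(0))  # abc-def          (the whole match)
    print(m.groups())  # ('abc', 'def')   (all groups as a tuple)

# Same pattern with named groups (hypothetical names "left" and "right")
named = re.match(r"^(?P<left>\w{3})-(?P<right>\w{3})$", "abc-def")

if named:
    print(named.group("left"))  # abc
    print(named.groupdict())    # {'left': 'abc', 'right': 'def'}
```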

Extract capture group, anywhere in string

Only the first occurrence of the capture can be extracted.

Use re.search(pattern, string) and .group().

Example: capture digits followed by "x"

import re

# one or more digits followed by 'x'
pat = r"(\d+)x"

# the second occurrence ('456') is not captured.
re.search(pat, "123x 456x").group(1)
# >>> "123"
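
Note that re.search returns None when nothing matches, so calling .group() directly on the result raises AttributeError in that case. A minimal sketch of a guard, where first_number is a hypothetical helper written for this example:

```python
import re

pat = r"(\d+)x"

def first_number(text):
    """Return the first run of digits followed by 'x', or None if there is none."""
    m = re.search(pat, text)
    return m.group(1) if m else None

print(first_number("123x 456x"))  # 123
print(first_number("no digits"))  # None
```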

Extract capture group, multiple times

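Use re.findall(pattern, string). When the pattern contains a single capture group, findall returns the text captured by that group for every match.

re.finditer also works, but you get the full Match object for each match.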

import re

pattern = r'a(\d+)'

re.findall(pattern, 'a1 b2 a32')
# >>> ['1', '32']
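
If the pattern has more than one capture group, re.findall returns a list of tuples instead of a list of strings. The sketch below shows that case, plus the finditer equivalent of the example above; the key=value input is made up.

```python
import re

# Two capture groups: each match becomes a tuple
pairs = re.findall(r"(\w+)=(\d+)", "a=1 b=22 c=333")
print(pairs)  # [('a', '1'), ('b', '22'), ('c', '333')]

# Same result as findall with one group, but going through Match objects
numbers = [m.group(1) for m in re.finditer(r"a(\d+)", "a1 b2 a32")]
print(numbers)  # ['1', '32']
```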

Extract first occurrence of regex

This matches a pattern anywhere in the string, but only once.

Will return None if there are no matches.

Use re.search(pattern, string) rather than re.match

Example pattern: letter 'b' followed by two alphanumeric characters

import re

pattern = r'b\w{2}'

match = re.search(pattern,"foo bar baz quux")
# >>> <_sre.SRE_Match object; span=(4, 7), match='bar'>

# will be None if there are no matches
if match:
  match.group(0)
  # >>> 'bar'

Extract all occurrences of regex

findall is like finditer, but returns strings instead of Match objects.

Use re.findall(pattern, string) to extract multiple occurrences of pattern in string.

import re

# letter 'b' followed by two alphanumeric characters
pattern = r'b\w{2}'

re.findall(pattern,"foo bar baz quux")
# >>> ['bar', 'baz']

Extract all regex matches

finditer is like findall, but returns Match objects instead of strings

To extract all matches of a regex in a string, returning Match objects, use re.finditer(pattern, string):

import re

# letter 'b' followed by two alphanumeric characters
pattern = r'b\w{2}'
matches = re.finditer(pattern,'foo bar baz quux')

# wrap the result in a list because finditer returns an iterator
list(matches)
# >>> [<_sre.SRE_Match object; span=(4, 7), match='bar'>,
#      <_sre.SRE_Match object; span=(8, 11), match='baz'>]
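
In practice the iterator is often consumed directly in a loop, which gives access to the position of each match. A short sketch with the same pattern (the list name found is arbitrary):

```python
import re

found = []
for m in re.finditer(r"b\w{2}", "foo bar baz quux"):
    # each m is a Match object: text plus start and end offsets
    found.append((m.group(0), m.start(), m.end()))

print(found)  # [('bar', 4, 7), ('baz', 8, 11)]
```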

Replace regex in string

This will replace all occurrences of regex in a string.

Use re.sub(pattern, replacement, string):

import re

# Example pattern: a digit
re.sub(r'\d','#','123foo456')
# returns '###foo###'

# Example pattern: anything that looks like an HTML tag
re.sub(r'<[^>]+>','','foo <p> bar')
# returns 'foo  bar'
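
The replacement does not have to be a fixed string: re.sub also accepts a function that receives each Match object and returns the replacement text. A minimal sketch, where double is a hypothetical helper written for this example:

```python
import re

def double(match):
    # hypothetical helper: turn the matched digits into twice their value
    return str(int(match.group(0)) * 2)

doubled = re.sub(r"\d+", double, "a1 b22 c333")
print(doubled)  # a2 b44 c666
```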

Replace only the first occurrence of regex

count=N means replace only the first N occurrences

Use re.sub(pattern, replacement, string, count=1)
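
A short sketch of the count argument, reusing the digit pattern from the previous section (the variable names are arbitrary):

```python
import re

only_first = re.sub(r"\d", "#", "123foo456", count=1)
first_two = re.sub(r"\d", "#", "123foo456", count=2)

print(only_first)  # #23foo456
print(first_two)   # ##3foo456
```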

Replace captured group

Groups are referenced in the replacement like this: '\1' for the first group, '\2' for the second group, etc.

Note the use of the r'' (raw string) prefix, so that the backslashes reach the regex engine unchanged.

Groups are 1-indexed (they start at 1, not 0).

Example pattern: match file names (e.g. "file.txt")

import re

# match strings like "foo.txt" and "bar.csv"
pattern = r'(^.+)\.(.+)$'

# replace the extension with "MYEXT"
re.sub(pattern, r'\1.MYEXT',"foo.txt")
# returns foo.MYEXT

# replace the file name with "MYFILE"
re.sub(pattern, r'MYFILE.\2',"foo.txt")
# returns MYFILE.txt
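
When a group reference is immediately followed by a digit, '\1' becomes ambiguous ('\10' would be read as group 10). The \g<1> form avoids this, and the same syntax works with named groups. The sketch below uses made-up inputs, and the group names "name" and "ext" are invented for illustration.

```python
import re

# \g<1> followed by a literal "0": append a zero to the captured number
widened = re.sub(r"(\d+)px", r"\g<1>0px", "width: 12px")
print(widened)  # width: 120px

# Named groups (hypothetical names) referenced with \g<name>
pattern = r"^(?P<name>.+)\.(?P<ext>.+)$"
renamed = re.sub(pattern, r"\g<name>.bak", "foo.txt")
print(renamed)  # foo.bak
```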

Split string by regex

Use re.split(pattern, string). Returns a list of the pieces of the string found between matches of the pattern.

Example pattern: split string by spaces or commas

import re

re.split(r'[\s,]+', 'foo,bar bar  quux')
# ['foo', 'bar', 'bar', 'quux']
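
Two variations that come up often are sketched below: wrapping the pattern in a capture group keeps the separators in the result, and maxsplit limits the number of splits. The input strings are made up.

```python
import re

# A capture group around the pattern keeps the separators in the output
with_separators = re.split(r"([\s,]+)", "foo,bar baz")
print(with_separators)  # ['foo', ',', 'bar', ' ', 'baz']

# maxsplit=1 splits only at the first separator
head_and_rest = re.split(r"[\s,]+", "foo,bar baz quux", maxsplit=1)
print(head_and_rest)  # ['foo', 'bar baz quux']
```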

Split string by word boundary

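On Python 3.5 and 3.6, re.split(r'\b', string) raises ValueError: split() requires a non-empty pattern match. Since Python 3.7 the call is accepted, but the result also contains the separators between the words, which is usually not what you want.

Use re.findall(r'\w+', string) instead:

import re

# Python 3.5 and 3.6: ValueError: split() requires a non-empty pattern match.
# Python 3.7+: no error, but the separators are kept in the result.
re.split(r'\b','foo,bar   bar-quux')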

# this is what you want:
re.findall(r'\w+','foo,bar   bar-quux')
# >>> ['foo', 'bar', 'bar', 'quux']

Non-capturing groups

TEMPLATE: (?:PATTERN)

Use non-capturing groups to match something you don't need to capture.

Non-capturing groups can also be slightly cheaper than capturing groups, because the engine does not need to store the captured text.

Example pattern: the string "foo" with a non-word character or a string boundary on each side

import re

# Replace "foo" when it has a non-word character or a string boundary to the left and to the right
pattern = r'(?:\W|^)foo(?:\W|$)'
replacement = " FOO "

string = 'foo bar foo foofoo barfoobar foo'

re.sub(pattern,replacement,string)
# >>> ' FOO bar FOO foofoo barfoobar FOO '
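
The difference between the two kinds of group is easiest to see with re.findall, which returns the group contents as soon as the pattern contains a capturing group. A small sketch on a made-up input:

```python
import re

text = "abab1 ab2"

# Non-capturing group: findall returns the whole match
non_capturing = re.findall(r"(?:ab)+\d", text)
print(non_capturing)  # ['abab1', 'ab2']

# Capturing group: findall returns only what the group captured last
capturing = re.findall(r"(ab)+\d", text)
print(capturing)  # ['ab', 'ab']
```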

re.match VS re.search

re.match matches at the beginning of the string only, while re.search matches the pattern a single time anywhere in the string.

re.match(pattern, string)     The pattern must match at the beginning of the string, or nothing at all.
re.search(pattern, string)    The pattern may be anywhere in the string, but only the first match is returned.

import re

# returns None, no match
re.match('abc','xx abc xx')
# None

# returns a single MATCH
re.match('abc','abc')
# <_sre.SRE_Match object; span=(0, 3), match='abc'>

# returns a single MATCH
re.search('abc','xx abc xx')
# <_sre.SRE_Match object; span=(3, 6), match='abc'>

# still returns a single MATCH, even though pattern occurs more than once
re.search('abc','xx abc xx abc')
# <_sre.SRE_Match object; span=(3, 6), match='abc'>

re.findall VS re.finditer

Both findall(pattern, string) and finditer(pattern, string) can be used to return multiple occurrences of a pattern in a string.

re.findall(pattern, text)     Returns a (possibly empty) list of strings where the regex found a match in the string.
re.finditer(pattern, text)    Returns an iterator (possibly yielding nothing) of Match objects, containing the matching text as well as the start and end positions of each match.

Case-insensitive regex

To make a regular expression case-insensitive, add (?i) at the start of the pattern string.

Example pattern: match strings beginning with "foo", "FOO","Foo", etc.

import re

re.match("(?i)^foo.*","Foo")
# >>> <_sre.SRE_Match object; span=(0, 3), match='Foo'>

re.match("(?i)^foo.*","FOO")
# >>> <_sre.SRE_Match object; span=(0, 3), match='FOO'>

re.match("(?i)^foo.*","foo")
# >>> <_sre.SRE_Match object; span=(0, 3), match='foo'>
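
The same effect can be obtained without touching the pattern, by passing the re.IGNORECASE flag (short form re.I). A minimal sketch, with arbitrary variable names:

```python
import re

# Flag passed to the function instead of "(?i)" inside the pattern
m = re.match(r"^foo.*", "FOO", flags=re.IGNORECASE)
print(m.group(0))  # FOO

# The flag can also be baked into a compiled pattern
compiled = re.compile(r"^foo.*", re.I)
print(compiled.match("Foo") is not None)  # True
print(compiled.match("bar") is not None)  # False
```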