Week 5 Day 2 - Regex
Regular expressions (regex for short) are a deep topic in computer science and in programming. Regex is not something I’m going to be able to teach you right off the bat! It’s not even something I always remember exactly how to do. But it’s something I want to introduce you to so that when we start reading in strings from files next week, it’ll be a tool in your toolbox.
I do not recommend using these lecture notes as a reference for regular expressions. There are many, many, MANY better and more complete references out there, including the two linked below.
Here’s a link to my favorite regex checker, which is not Python-specific.
Here’s a link to another pretty good regex checker, which you can set to be Python-specific.
For most use cases, there won’t be any difference between the two.
Live Code/Lecture
Regex: General
Instead of me stumbling through an explanation of regex for you, I want to point you at a video that I think does an excellent job of explaining what a regular expression is regardless of programming language. We’ll watch the first 12 or so minutes of it, pausing when people have questions.
Using Regexes in Python
To use these regexes in Python, the process is pretty simple. First we have to import the regex module:
import re
There are three basic parts to checking something with a regular expression.
Let’s use as an example the idea at the beginning of checking if someone has a valid Lawrence email address.
1 - Compile the RegEx
First you have to tell Python what you want the regex to be, in regex terms.
email_regex = re.compile(r'[a-z\.]+@lawrence\.edu')
2 - Check if it matches
Now, email_regex stores a compiled pattern that you can compare against a string.
To check this match, try something like this:
is_match = re.match(email_regex, "acacia.ackles@lawrence.edu")
print(is_match)
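Here's a minimal sketch of what that print shows for a string that matches and one that doesn't. It uses the same pattern, except that the dot before edu is escaped so it only matches a literal period; that's my assumption about what the pattern is meant to say. The second address is made up for illustration.

```python
import re

# Pattern from the notes, with the dot before "edu" escaped (an assumption
# about the intended pattern) so it only matches a literal period.
email_regex = re.compile(r'[a-z.]+@lawrence\.edu')

good = re.match(email_regex, "acacia.ackles@lawrence.edu")
bad = re.match(email_regex, "someone@example.com")  # made-up address

print(good)          # a Match object, which counts as True in an if statement
print(bad)           # None, which counts as False in an if statement
print(good.group())  # the text that matched: acacia.ackles@lawrence.edu
```

So the result isn't literally True or False, but you can use it in an if statement as though it were.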
Now we can see how we can use this in a loop to check these emails.
3 - Use the information somehow
def email_checker(emails):
    for email in emails:
        if not re.search(email_regex, email):
            print(f"{email} is not a valid LU email!")
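If you want to try the loop on its own, here's a self-contained sketch. The helper name find_invalid_emails and the sample list are hypothetical; they're mine, not part of the exercise. It collects the bad emails in a list first, which makes the result easier to check, and then prints the same message.

```python
import re

# Same idea as the pattern in the notes; the escaped dot is an assumption.
email_regex = re.compile(r'[a-z.]+@lawrence\.edu')

def find_invalid_emails(emails):
    """Return the emails with no LU address in them (hypothetical helper)."""
    invalid = []
    for email in emails:
        if not re.search(email_regex, email):
            invalid.append(email)
    return invalid

# Made-up sample data for illustration.
sample_emails = [
    "acacia.ackles@lawrence.edu",
    "someone@example.com",
    "not an email",
]

for email in find_invalid_emails(sample_emails):
    print(f"{email} is not a valid LU email!")
```

Keep in mind that re.search() looks anywhere in the string, so a longer string that merely contains an LU address would also pass this check.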
Go to In-Class Exercise 3 for some practice writing regexes.
Other Features of the RE module
If there’s time, here are a few additional features we’ll cover:
- ignorecase: You can ignore the case of an entry by passing re.IGNORECASE, or re.I for short, as an optional third argument to re.search().
- match: re.match() has the same syntax as re.search() but only searches the beginning of a string.
- findall: re.findall() has the same syntax as re.search() but will return a list of all the matches instead of a single match object (or None).
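Here's a short sketch that tries all three features on one made-up sentence. The text and the variable names are just for illustration, and the pattern is the LU email idea from earlier with the dot before edu escaped (my assumption).

```python
import re

# Made-up sentence containing two addresses, one with capital letters.
text = "Write to Ada.Lovelace@Lawrence.edu or alan.turing@lawrence.edu"
pattern = r'[a-z.]+@lawrence\.edu'

# findall returns a list of every match in the string.
case_sensitive = re.findall(pattern, text)
case_insensitive = re.findall(pattern, text, re.IGNORECASE)
print(case_sensitive)    # ['alan.turing@lawrence.edu']
print(case_insensitive)  # ['Ada.Lovelace@Lawrence.edu', 'alan.turing@lawrence.edu']

# match only looks at the beginning of the string; search looks anywhere.
at_start = re.match(pattern, text, re.I)
anywhere = re.search(pattern, text, re.I)
print(at_start)          # None, because the string starts with "Write to"
print(anywhere.group())  # Ada.Lovelace@Lawrence.edu
```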