Using Regex

Shows how to use regular expressions in Python.

Introduction

import re

def find(pattern, text):
    """Return the first match of pattern in text, or print a notice if there is none."""
    match = re.search(pattern, text)
    if match:
        return match.group()
    print("Match not Found")
pattern = "igs"
text = "called piiig"
find(pattern, text)
Match not Found
find("iig", "called piiig")
'iig'
find("...g", "called piiig")
'iiig'
find("...g", "called pimig")
'imig'

Repeating characters

  • regex+ - one or more repetitions of preceding element
  • regex* - zero or more repetitions of preceding element
  • regex? - zero or one repetition of preceding element
  • regex{n} - exactly n repetitions of preceding element
  • regex{n,m} - from n to m repetitions of preceding element
  • regex{n,} - n or more repetitions of preceding element
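
The quantifiers listed above can be compared side by side. The following is a minimal sketch, not part of the original notebook: the sample words and the helper name matching are made up for illustration, and re.fullmatch is used so that each pattern must cover the whole word.

```python
import re

# Illustrative sample words (an assumption, not from the notebook).
samples = ["gd", "god", "good", "goood"]

def matching(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

print(matching(r"go+d"))      # one or more "o":  ['god', 'good', 'goood']
print(matching(r"go*d"))      # zero or more "o": ['gd', 'god', 'good', 'goood']
print(matching(r"go?d"))      # zero or one "o":  ['gd', 'god']
print(matching(r"go{2}d"))    # exactly two "o":  ['good']
print(matching(r"go{1,2}d"))  # one to two "o":   ['god', 'good']
print(matching(r"go{2,}d"))   # two or more "o":  ['good', 'goood']
```

In every pattern the quantifier applies only to the element directly before it, here the single letter "o".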

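In Python, a backreference can match a run of the same character. For example, (.)\1{9,} matches any character repeated ten or more times in a row:

  • (.) captures any single character (except a newline, by default) as group 1
  • \1{9,} matches nine or more further repetitions of the character captured by group 1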
find(r"(.)\1{2}g", "called piiig")  # () captures a group; \1 refers back to the first group
'iiig'
find(r"(.)\1{2}g", "called piig")
Match not Found
find(r"(.)\1{2}g", "called psssg")
'sssg'
find(r"(.)\1{2}g", "called pssssg")
'sssg'
find(r"(.)\1{1,}g", "called pssssg")
'ssssg'
find(r"(.)\1{1,}g", "called pssg")
'ssg'
find(r"(.)\1{1,}g", "called psg")
Match not Found
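
The (.)\1{9,} pattern mentioned in this section is not exercised by the cells above, so here is a small sketch of it. The helper name find_long_run and the test strings are hypothetical, chosen only to show the ten-character threshold.

```python
import re

# (.) captures one character; \1{9,} then requires at least nine more
# copies of that same character, for a run of ten or more in total.
long_run = re.compile(r"(.)\1{9,}")

def find_long_run(text):
    """Return the first run of ten or more identical characters, or None."""
    match = long_run.search(text)
    return match.group() if match else None

print(find_long_run("loading" + "." * 12))  # the run of twelve dots
print(find_long_run("a" * 9))               # None: nine is one short
```

Because the quantifier is greedy, the match extends over the whole run rather than stopping at ten characters.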

Finding Email

find(r"[\w.]+@[\w.]+", "My email is foo@bar blah blah rahul.sarafiitk@gmail.com")  # How can we ensure at least one . after the @?
'foo@bar'
find(r"[\w.]+@[\w.]+", "My email is rahul.sarafiitk@gmail.com blah blah foo@bar")
'rahul.sarafiitk@gmail.com'
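
One possible answer to the question in the first cell's comment is to spell out the part after the @ as a word followed by one or more ".word" pieces. This is a sketch under that assumption, not a full email validator. The names email_pattern and find_email are hypothetical.

```python
import re

# Assumed pattern: after the @, require word characters followed by at
# least one ".label" part, so "foo@bar" no longer qualifies.
email_pattern = r"[\w.]+@\w+(?:\.\w+)+"

def find_email(text):
    """Return the first address whose domain contains a dot, or None."""
    match = re.search(email_pattern, text)
    return match.group() if match else None

print(find_email("My email is foo@bar blah blah rahul.sarafiitk@gmail.com"))
# 'rahul.sarafiitk@gmail.com'
print(find_email("My email is foo@bar"))
# None
```

The (?:...) group is non-capturing, so it groups the ".label" part for repetition without creating a numbered group.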