Regular Expressions - Basics

Simply put, a regular expression (regex) is a pattern that describes a set of strings. It is used to search for matching text in a given dataset.

Basic Symbols Used in Regular Expressions

🌈 Asterisk (*) -> Represents zero or more occurrences of the character that precedes it

  • In the example below, h* represents zero or more occurrences of the character 'h'. img1.JPG
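As a rough sketch of the asterisk, here is how it behaves in Python's built-in re module. The pattern and sample strings are made up for illustration and are not taken from the original image.

```python
import re

# Hypothetical pattern: 'a', then zero or more 'h', then '!'
pattern = re.compile(r"ah*!")

samples = ["a!", "ah!", "ahhh!", "oh!"]

# fullmatch() succeeds only if the whole string fits the pattern
star_matches = [s for s in samples if pattern.fullmatch(s)]
print(star_matches)  # ['a!', 'ah!', 'ahhh!']
```

Note that "a!" matches because zero occurrences of 'h' are allowed.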

🌈 Dot (.) -> A wildcard symbol that represents any single character (in most regex engines it does not match a newline by default)

  • In the example below, . is a wildcard whose position can be filled by any one character. img2.JPG
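A minimal sketch of the dot wildcard in Python's re module, using made-up sample words rather than the ones in the original image:

```python
import re

# Hypothetical pattern: 'h', any one character, then 't'
pattern = re.compile(r"h.t")

words = ["hat", "hot", "h9t", "ht", "heat"]

# The dot fills exactly one position, so "ht" and "heat" are rejected
dot_matches = [w for w in words if pattern.fullmatch(w)]
print(dot_matches)  # ['hat', 'hot', 'h9t']

# By default the dot does not match a newline in Python
newline_match = pattern.fullmatch("h\nt")
print(newline_match)  # None
```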

🌈 Wildcard and Asterisk together (.*) -> Represents zero or more occurrences of the wildcard, that is, any sequence of characters (including none)

  • In the example below, the combination of wildcard and asterisk .* represents zero or more occurrences of any character. img3.JPG
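The combination can be tried out with a short Python sketch. The pattern a.*z and the sample strings are assumptions chosen for illustration only.

```python
import re

# Hypothetical pattern: starts with 'a', ends with 'z', anything in between
pattern = re.compile(r"a.*z")

samples = ["az", "abcz", "a to z", "abc"]

# "az" matches because .* may also stand for zero characters
dot_star_matches = [s for s in samples if pattern.fullmatch(s)]
print(dot_star_matches)  # ['az', 'abcz', 'a to z']
```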

🌈 \s -> Represents a whitespace character

  • \s represents a single whitespace character such as a space, tab, or newline. In the example below, \s* indicates zero or more occurrences of whitespace. img4.JPG
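Here is a small Python sketch of \s*, assuming a hypothetical "key = value" style input where the spacing around the equals sign can vary:

```python
import re

# Hypothetical pattern: optional whitespace on both sides of '='
pattern = re.compile(r"key\s*=\s*value")

samples = ["key=value", "key = value", "key\t=  value", "key-value"]

# \s covers spaces and tabs alike; \s* also allows no whitespace at all
space_matches = [s for s in samples if pattern.fullmatch(s)]
print(space_matches)  # ['key=value', 'key = value', 'key\t=  value']
```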

🌈 Character Class [abc] -> Represents a single character, which can be either 'a', 'b', or 'c'

  • [] represents a character class. Any character listed between the square brackets is valid in that position. The class stands for exactly one position, so only one character from the class is matched there. img5.JPG
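A quick Python sketch of a character class, using the [abc] class from above followed by a literal 'x' (the trailing 'x' and the samples are made up for illustration):

```python
import re

# One character from the class [abc], followed by a literal 'x'
pattern = re.compile(r"[abc]x")

samples = ["ax", "bx", "cx", "dx", "abx"]

# "abx" fails because the class stands for exactly one position
class_matches = [s for s in samples if pattern.fullmatch(s)]
print(class_matches)  # ['ax', 'bx', 'cx']
```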

🌈 Character Class with range [p-r] -> Represents a single character that falls in the range 'p-r'. It can be either 'p', 'q', or 'r'

  • Instead of listing individual characters, we can specify a range of characters in the character class. The ASCII value of the start character must not be greater than the ASCII value of the end character. Any character falling in the range given in the square brackets is considered valid. img6.JPG

  • If a few additional characters fall outside the range, they can be listed in the character class next to the range, for example immediately after it. img7.JPG

  • We can specify multiple ranges in the same character class. img8.JPG

  • [^ab] -> Negation. This represents any single character except 'a' and 'b'

    • Sometimes the list of characters in a character class grows until it becomes unmanageable. In such cases we can focus on the exclusion list instead and negate the character class.
    • The caret symbol ^ is used to negate the character class. It must be the first character after the opening square bracket.
    • With ^ at the start of the character class, every character except those listed in the square brackets is valid. img9.JPG
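The range, extra-character, multiple-range, and negation variants can be compared side by side in a short Python sketch. The helper accepted() and the specific classes other than [p-r] and [^ab] are hypothetical, chosen only for illustration.

```python
import re

in_range = re.compile(r"[p-r]")        # a single range
range_plus = re.compile(r"[p-rxz]")    # a range plus the extra characters x and z
multi_range = re.compile(r"[a-cx-z]")  # two ranges in one class
negated = re.compile(r"[^ab]")         # anything except 'a' and 'b'

def accepted(pattern, chars):
    """Return the characters of `chars` that the one-character pattern accepts."""
    return [c for c in chars if pattern.fullmatch(c)]

print(accepted(in_range, "opqrs"))        # ['p', 'q', 'r']
print(accepted(range_plus, "pqrxyz"))     # ['p', 'q', 'r', 'x', 'z']
print(accepted(multi_range, "abcdwxyz"))  # ['a', 'b', 'c', 'x', 'y', 'z']
print(accepted(negated, "abc1"))          # ['c', '1']

# A range whose start comes after its end is rejected by Python's re module
try:
    re.compile(r"[r-p]")
    reversed_range_ok = True
except re.error:
    reversed_range_ok = False
print(reversed_range_ok)  # False
```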

🌈 Line Beginning Anchor ^

  • The caret symbol ^ is used as a placeholder to indicate the beginning of a line. It is used when we want to match patterns beginning with specific characters. In the example below, we want strings beginning with a hash (#). img10.JPG

  • When used as an anchor, the caret symbol ^ is placed at the beginning of the pattern

  • The caret symbol ^ has a different meaning (negation) when used inside a character class []. Outside a character class it indicates the beginning of the line
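A minimal Python sketch of the two roles of the caret follows. The sample lines are made up, and the re.MULTILINE part is an addition specific to Python: without that flag, ^ there anchors only at the start of the whole string.

```python
import re

# Anchor: the line must begin with '#'
heading = re.compile(r"^#.*")

lines = ["# heading", "text # not a heading", "#tag"]
starts_with_hash = [line for line in lines if heading.search(line)]
print(starts_with_hash)  # ['# heading', '#tag']

# With several lines in one string, re.MULTILINE makes ^ match at each line start
text = "# first\nplain\n# second"
first_only = re.findall(r"^#.*", text)
every_line = re.findall(r"^#.*", text, flags=re.MULTILINE)
print(first_only)  # ['# first']
print(every_line)  # ['# first', '# second']

# Inside a character class, ^ means negation instead
not_hash = re.findall(r"[^#]", "#a#b")
print(not_hash)  # ['a', 'b']
```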

🌈 Line Ending Anchor $

  • The dollar symbol $ is used as a placeholder indicating the end of the line. If we want to match a pattern that ends with specific characters, we can use the $ anchor.
  • When used as an anchor, the dollar symbol $ is placed at the end of the pattern. img11.JPG
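As a sketch of the ending anchor in Python, assuming we want strings that end with "ing" (a made-up example, not the one from the original image):

```python
import re

# Anchor: the string must end with "ing"
ends_with_ing = re.compile(r"ing$")

samples = ["reading", "ingest", "sing a song"]
dollar_matches = [s for s in samples if ends_with_ing.search(s)]
print(dollar_matches)  # ['reading']

# Both anchors together constrain the start and the end of the line
whole_line = re.compile(r"^#.*ing$")
print(bool(whole_line.search("# reading")))  # True
print(bool(whole_line.search("# read")))     # False
```

"ingest" and "sing a song" both contain "ing", but not at the end, so the anchor rejects them.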

Thank you for reading!!
