April 2, 2026

Regular Expressions Tutorial for Beginners (with Examples)

Regular expressions (regex) look intimidating at first, but they follow simple rules. This tutorial takes you from zero to writing complex patterns, step by step. Try every example in our free regex tester.

What is a Regular Expression?

A regular expression is a pattern that describes a set of strings. You use it to search, match, validate, or replace text. Most mainstream programming languages support regex, including JavaScript, Python, Java, Go, and PHP, although syntax and features vary slightly between engines.

// JavaScript example
const pattern = /hello/;
pattern.test("say hello world");  // true

// Python example
import re
re.search(r"hello", "say hello world")  # Match found

1. Literal Characters

The simplest regex is just normal text. The pattern cat matches the literal string "cat" anywhere in the input.

Pattern: cat
  "the cat sat"   → matches "cat" at index 4 (counting from 0)
  "concatenate"   → matches "cat" at index 3
  "dog"           → no match

Regex is case-sensitive by default. Cat does not match "cat" unless you use the i flag.
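
To see these results outside the tester, here is a minimal Python sketch using the standard re module. The variable names are only illustrative.

```python
import re

# A literal pattern finds the text anywhere in the input.
hit_1 = re.search(r"cat", "the cat sat")
hit_2 = re.search(r"cat", "concatenate")
miss = re.search(r"cat", "dog")

print(hit_1.start())  # 4 (index counted from 0)
print(hit_2.start())  # 3
print(miss)           # None

# Matching is case-sensitive unless the ignore-case flag is set.
# re.IGNORECASE is Python's spelling of the i flag.
strict = re.search(r"Cat", "the cat sat")
relaxed = re.search(r"Cat", "the cat sat", re.IGNORECASE)
print(strict)           # None
print(relaxed.group())  # cat
```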

2. Metacharacters (Special Characters)

These characters have special meaning in regex:

.  ^  $  *  +  ?  {  }  [  ]  (  )  |  \

To match them literally, escape with a backslash:
  \.  matches a literal dot
  \$  matches a literal dollar sign
  \(  matches a literal opening parenthesis
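
A short Python sketch of why escaping matters. The sample strings are made up; re.escape is a standard-library helper that adds the backslashes for you when the text to match comes from elsewhere.

```python
import re

# Unescaped dot: matches any character, so "3x14" is accepted.
loose = re.search(r"3.14", "3x14")
# Escaped dot: only a literal "." is accepted.
strict = re.search(r"3\.14", "3x14")
exact = re.search(r"3\.14", "pi is 3.14")

print(loose is not None)  # True
print(strict)             # None
print(exact.group())      # 3.14

# re.escape() backslash-escapes the metacharacters in a plain string.
price_pattern = re.escape("$5.00 (sale)")
found = re.search(price_pattern, "Now $5.00 (sale) today")
print(found.group())      # $5.00 (sale)
```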

3. The Dot (.)

Matches any single character except newline.
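
As a quick illustration in Python (the word list is invented for this sketch), the dot stands for exactly one character and skips newlines by default:

```python
import re

words = ["cat", "cot", "cut", "coat", "ct"]
dot_hits = [w for w in words if re.search(r"c.t", w)]
print(dot_hits)  # ['cat', 'cot', 'cut']

# By default the dot refuses to match a newline character.
newline_hit = re.search(r"c.t", "c\nt")
print(newline_hit)  # None
```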

Pattern: c.t
  "cat"    → match
  "cot"    → match
  "cut"    → match
  "coat"   → no match (two characters between c and t)
  "ct"     → no match (zero characters between c and t)

4. Character Classes [ ]

Match one character from a set.

[aeiou]     — any vowel
[0-9]       — any digit (0 through 9)
[a-z]       — any lowercase letter
[A-Z]       — any uppercase letter
[a-zA-Z]    — any letter
[a-zA-Z0-9] — any alphanumeric character

Examples:
  Pattern: b[aeiou]t
  "bat" → match
  "bet" → match
  "bit" → match
  "but" → match
  "bot" → match
  "bxt" → no match

Negated character class [^ ]

A caret at the start of a bracket expression means "anything except these."

[^0-9]      — anything that is NOT a digit
[^aeiou]    — anything that is NOT a vowel
[^\s]       — anything that is NOT whitespace
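
The sketch below tries a regular class and two negated classes in Python. The inputs and variable names are assumptions chosen for illustration.

```python
import re

words = ["bat", "bet", "bit", "bot", "but", "bxt"]
vowel_hits = [w for w in words if re.search(r"b[aeiou]t", w)]
print(vowel_hits)  # ['bat', 'bet', 'bit', 'bot', 'but']

# Negated class: delete every character that is NOT a digit.
digits_only = re.sub(r"[^0-9]", "", "(555) 123-4567")
print(digits_only)  # 5551234567

# [^\s] finds the first non-whitespace character.
first_visible = re.search(r"[^\s]", "   hi").start()
print(first_visible)  # 3
```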

5. Shorthand Character Classes

\d    — digit           [0-9]
\D    — non-digit       [^0-9]
\w    — word character   [a-zA-Z0-9_]
\W    — non-word         [^a-zA-Z0-9_]
\s    — whitespace       [ \t\n\r\f]
\S    — non-whitespace   [^ \t\n\r\f]
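
A small Python sketch of the shorthand classes in use, with invented sample strings. One caveat: in Python 3, \d, \w, and \s are Unicode-aware for text patterns, so the bracket forms above describe their ASCII behaviour only.

```python
import re

phone = re.compile(r"\d\d\d-\d\d\d\d")
has_phone = bool(phone.search("555-1234"))
no_phone = bool(phone.search("abc-defg"))
print(has_phone, no_phone)  # True False

# \w+ grabs runs of word characters; punctuation and spaces split them.
tokens = re.findall(r"\w+", "user_1 logged in!")
print(tokens)  # ['user_1', 'logged', 'in']

# \s+ treats any run of spaces, tabs, or newlines as one separator.
fields = re.split(r"\s+", "a  b\tc\nd")
print(fields)  # ['a', 'b', 'c', 'd']
```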

Example:
  Pattern: \d\d\d-\d\d\d\d
  "555-1234"  → match
  "abc-defg"  → no match

6. Anchors

Anchors do not match characters — they match positions.

^     — start of string (or start of line with m flag)
$     — end of string (or end of line with m flag)
\b    — word boundary
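
Here is a hedged Python sketch of the three anchors. re.MULTILINE is Python's name for the m flag; the helper name matches is hypothetical.

```python
import re

def matches(pattern, text, flags=0):
    """Return True if pattern is found anywhere in text."""
    return re.search(pattern, text, flags) is not None

print(matches(r"^hello", "hello world"))    # True
print(matches(r"^hello", "say hello"))      # False
print(matches(r"world$", "hello world"))    # True
print(matches(r"world$", "world cup"))      # False
print(matches(r"\bcat\b", "the cat sat"))   # True
print(matches(r"\bcat\b", "concatenate"))   # False

# With the multiline flag, ^ also matches right after each newline.
line_starts = re.findall(r"^\w+", "one two\nthree four", re.MULTILINE)
print(line_starts)  # ['one', 'three']
```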

Examples:
  Pattern: ^hello
  "hello world"  → match (hello is at the start)
  "say hello"    → no match

  Pattern: world$
  "hello world"  → match (world is at the end)
  "world cup"    → no match

  Pattern: \bcat\b
  "the cat sat"  → matches "cat"
  "concatenate"  → no match (cat is not a whole word)

7. Quantifiers

Control how many times a character or group is matched.

*       — 0 or more
+       — 1 or more
?       — 0 or 1 (optional)
{3}     — exactly 3
{2,5}   — between 2 and 5
{3,}    — 3 or more
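
The quantifiers can be tried in Python with a sketch like this one. It uses re.fullmatch so the whole string must fit the pattern; the word lists are invented.

```python
import re

spellings = [w for w in ["color", "colour", "colouur"]
             if re.fullmatch(r"colou?r", w)]
print(spellings)  # ['color', 'colour']

googles = [w for w in ["ggle", "gogle", "google", "gooogle"]
           if re.fullmatch(r"go+gle", w)]
print(googles)  # ['gogle', 'google', 'gooogle']

# {3} and {4} demand exact counts.
seven_digit = bool(re.fullmatch(r"\d{3}-\d{4}", "555-1234"))
too_short = bool(re.fullmatch(r"\d{3}-\d{4}", "55-1234"))
print(seven_digit, too_short)  # True False

# {2,5} takes between 2 and 5 digits, as many as it can.
runs = re.findall(r"\d{2,5}", "1 22 333 666666")
print(runs)  # ['22', '333', '66666']
```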

Examples:
  Pattern: colou?r
  "color"    → match (u appears 0 times)
  "colour"   → match (u appears 1 time)

  Pattern: go+gle
  "gogle"    → match
  "google"   → match
  "gooogle"  → match
  "ggle"     → no match (need at least 1 'o')

  Pattern: \d{3}-\d{4}
  "555-1234"  → match
  "55-1234"   → no match (need exactly 3 digits)

Greedy vs Lazy

By default, quantifiers are greedy — they match as much as possible. Add ? after a quantifier to make it lazy (match as little as possible).

Input: "<b>bold</b> and <i>italic</i>"

Greedy:  <.*>   → matches "<b>bold</b> and <i>italic</i>"
                   (everything from first < to LAST >)

Lazy:    <.*?>  → matches "<b>", then "</b>", then "<i>", then "</i>"
                   (stops at the FIRST > each time)
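
The same comparison as a short Python sketch, using re.findall to list every match:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the end, then backs up to the LAST ">".
greedy = re.findall(r"<.*>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Lazy: .*? grows one character at a time and stops at the FIRST ">".
lazy = re.findall(r"<.*?>", html)
print(lazy)  # ['<b>', '</b>', '<i>', '</i>']
```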

8. Groups ( )

Parentheses group characters together, creating a sub-expression that can be quantified, alternated, or captured.

Capturing groups

Pattern: (\d{3})-(\d{4})
Input:   "Call 555-1234"

Group 0 (full match): "555-1234"
Group 1: "555"
Group 2: "1234"

// JavaScript
const match = "Call 555-1234".match(/(\d{3})-(\d{4})/);
// match[1] === "555"
// match[2] === "1234"
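
A rough Python equivalent of the JavaScript snippet, plus one way captured groups are commonly reused in a replacement string (the swap itself is just an illustration):

```python
import re

m = re.search(r"(\d{3})-(\d{4})", "Call 555-1234")
print(m.group(0))  # 555-1234 (full match)
print(m.group(1))  # 555
print(m.group(2))  # 1234
print(m.groups())  # ('555', '1234')

# In re.sub, \1 and \2 refer back to the captured groups.
swapped = re.sub(r"(\d{3})-(\d{4})", r"\2-\1", "Call 555-1234")
print(swapped)     # Call 1234-555
```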

Non-capturing groups

Use (?:...) when you need grouping but do not need to capture.

Pattern: (?:http|https)://\S+
— Groups http|https for alternation
— Does NOT create a capture group (more efficient)
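
In Python the difference is easy to see with re.findall, which returns the group contents whenever the pattern has a capturing group. The sample text and URLs below are made up.

```python
import re

text = "see http://a.example and https://b.example/docs"

# Capturing group: findall returns only what the group captured.
captured = re.findall(r"(http|https)://\S+", text)
print(captured)  # ['http', 'https']

# Non-capturing group: findall returns the full matches.
full = re.findall(r"(?:http|https)://\S+", text)
print(full)  # ['http://a.example', 'https://b.example/docs']
```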

9. Alternation (OR)

The pipe | means "or".

Pattern: cat|dog
  "I have a cat"  → matches "cat"
  "I have a dog"  → matches "dog"

Pattern: (Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day
  Matches any day of the week
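
A Python sketch of alternation, with invented sample sentences. The last two lines illustrate why the weekday pattern needs a group: alternation applies to everything on each side of the pipe unless parentheses limit it.

```python
import re

pets = re.findall(r"cat|dog", "I have a cat and a dog")
print(pets)  # ['cat', 'dog']

day = re.compile(r"(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day")
days = day.findall("Meet Tuesday or Friday, not Someday")
print(days)  # ['Tuesday', 'Friday']

# Without a group, ^cat|dog$ means "^cat" OR "dog$".
ungrouped = re.search(r"^cat|dog$", "catalog")
grouped = re.search(r"^(?:cat|dog)$", "catalog")
print(ungrouped.group())  # cat
print(grouped)            # None
```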

10. Lookahead & Lookbehind

Assert that something exists ahead or behind the current position without including it in the match.

Positive lookahead (?=...)

Pattern: \d+(?= dollars)
Input:   "I have 100 dollars"
Match:   "100"  (followed by " dollars", but " dollars" is not in the match)

// Useful: match a word only if followed by something
Pattern: \w+(?=\()
Input:   "call myFunction(42)"
Match:   "myFunction"

Negative lookahead (?!...)

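Pattern: \d{3}(?!-)
Input:   "555-1234 and 999"
Matches: "123" and "999" (three digits NOT followed by a hyphen)

// "555" is followed by "-", so it doesn't match.
// "123" (inside "1234") is followed by "4", not "-", so it does match.
// To match only standalone three-digit numbers, add word boundaries:
// \b\d{3}\b(?!-) matches just "999".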

Positive lookbehind (?<=...)

Pattern: (?<=\$)\d+
Input:   "Price: $50"
Match:   "50"  (preceded by $, but $ is not in the match)

Negative lookbehind (?<!...)

Pattern: (?<!\d)\d{3}(?!\d)
— Matches exactly 3 digits not surrounded by other digits
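
The four lookaround forms can be checked with a Python sketch like the following. Note that the negative lookahead reports "123" as well as "999", because "123" inside "1234" is not followed by a hyphen. The last input string is invented for illustration.

```python
import re

ahead = re.search(r"\d+(?= dollars)", "I have 100 dollars").group()
print(ahead)  # 100

func = re.search(r"\w+(?=\()", "call myFunction(42)").group()
print(func)  # myFunction

not_hyphen = re.findall(r"\d{3}(?!-)", "555-1234 and 999")
print(not_hyphen)  # ['123', '999']

behind = re.search(r"(?<=\$)\d+", "Price: $50").group()
print(behind)  # 50

# Exactly three digits with no digit on either side.
isolated = re.findall(r"(?<!\d)\d{3}(?!\d)", "12 345 6789 555-1234")
print(isolated)  # ['345', '555']
```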

11. Flags / Modifiers

i — case-insensitive   /hello/i matches "Hello", "HELLO"
g — global             find ALL matches, not just the first
m — multiline          ^ and $ match line boundaries
s — dotAll             . also matches newline characters
u — unicode            full Unicode support

// JavaScript example
const text = "Pattern and PATTERN";
const matches = [...text.matchAll(/pattern/gi)];  // two matches
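
The flags above use JavaScript's notation. As a rough Python counterpart, the sketch below passes the corresponding re constants (re.IGNORECASE, re.MULTILINE, re.DOTALL). Python has no g flag because re.findall and re.finditer always return every match. The sample text is made up.

```python
import re

text = "Hello\nhello\nHELLO"

# i flag: ignore case.
any_case = re.findall(r"hello", text, re.IGNORECASE)
print(any_case)  # ['Hello', 'hello', 'HELLO']

# m flag: ^ and $ match at every line boundary.
whole_string = re.findall(r"^hello$", text, re.IGNORECASE)
per_line = re.findall(r"^hello$", text, re.IGNORECASE | re.MULTILINE)
print(whole_string)  # []
print(per_line)      # ['Hello', 'hello', 'HELLO']

# s flag: the dot also matches a newline.
plain_dot = re.search(r"a.b", "a\nb")
dotall = re.search(r"a.b", "a\nb", re.DOTALL)
print(plain_dot)           # None
print(dotall is not None)  # True
```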

12. Common Real-World Patterns

Email (simplified)

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Breakdown:
  ^                     — start of string
  [a-zA-Z0-9._%+-]+    — one or more valid email chars (local part)
  @                     — literal @
  [a-zA-Z0-9.-]+       — domain name
  \.                    — literal dot
  [a-zA-Z]{2,}         — TLD (at least 2 letters)
  $                     — end of string
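
A minimal Python sketch that wraps the simplified email pattern in a helper. The function name looks_like_email and the sample addresses are assumptions for illustration; the pattern only checks the overall shape, not whether the address exists.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(value):
    """Return True if value has the rough shape of an email address."""
    return EMAIL.match(value) is not None

print(looks_like_email("jane.doe+news@example.co.uk"))  # True
print(looks_like_email("no-at-sign.example.com"))       # False
print(looks_like_email("user@localhost"))               # False (no dot and TLD)
```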

URL

https?://[\w.-]+(?:\.[a-zA-Z]{2,})(?:/[^\s]*)?

Breakdown:
  https?               — http or https
  ://                  — literal ://
  [\w.-]+              — domain characters
  (?:\.[a-zA-Z]{2,})   — .com, .org, etc.
  (?:/[^\s]*)?         — optional path
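
To pull URLs out of running text, the pattern can be used with re.findall, as in this Python sketch. The sentence and URLs are invented. Because every group is non-capturing, findall returns the full matches.

```python
import re

URL = re.compile(r"https?://[\w.-]+(?:\.[a-zA-Z]{2,})(?:/[^\s]*)?")

text = "Docs at https://example.com/guide?page=2 and http://sub.example.org today"
urls = URL.findall(text)
print(urls)
# ['https://example.com/guide?page=2', 'http://sub.example.org']
```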

Phone number (US format)

^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$

Matches:
  (555) 123-4567
  555-123-4567
  555.123.4567
  5551234567
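
A Python sketch that runs the pattern over the formats above plus two extra inputs added for illustration. The last one shows a limitation: the parentheses are each optional on their own, so an unbalanced "(" still passes.

```python
import re

PHONE = re.compile(r"^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$")

samples = [
    "(555) 123-4567",
    "555-123-4567",
    "555.123.4567",
    "5551234567",
    "555-1234",       # too few digits
    "(555 123-4567",  # unbalanced parenthesis
]
results = {s: bool(PHONE.match(s)) for s in samples}
for sample, ok in results.items():
    print(f"{sample!r}: {ok}")
# The first four print True, '555-1234' prints False,
# and '(555 123-4567' prints True.
```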

IP address (IPv4)

^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$

Matches: 192.168.1.1, 10.0.0.255
Rejects: 256.1.1.1, 192.168.1
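
The accepted and rejected addresses can be confirmed with a short Python sketch. The helper name is_ipv4 is hypothetical.

```python
import re

IPV4 = re.compile(
    r"^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}"
    r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$"
)

def is_ipv4(value):
    """Return True if value is four dot-separated numbers from 0 to 255."""
    return IPV4.match(value) is not None

print(is_ipv4("192.168.1.1"))  # True
print(is_ipv4("10.0.0.255"))   # True
print(is_ipv4("256.1.1.1"))    # False (256 is out of range)
print(is_ipv4("192.168.1"))    # False (only three parts)
```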

Strong password validation

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

Requirements:
  (?=.*[a-z])        — at least one lowercase
  (?=.*[A-Z])        — at least one uppercase
  (?=.*\d)           — at least one digit
  (?=.*[@$!%*?&])    — at least one special character
  {8,}               — minimum 8 characters
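
A hedged Python sketch of the password check. The function name is_strong and the sample passwords are made up. Note that the final character class also limits which characters are allowed at all, so "#" is rejected.

```python
import re

STRONG = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

def is_strong(password):
    """Return True if password satisfies every lookahead and the length rule."""
    return STRONG.match(password) is not None

print(is_strong("Passw0rd!"))   # True
print(is_strong("password1!"))  # False (no uppercase letter)
print(is_strong("Sh0rt!a"))     # False (only 7 characters)
print(is_strong("Passw0rd#"))   # False ("#" is not in the allowed set)
```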

Date (YYYY-MM-DD)

^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$

Matches: 2026-04-02, 2025-12-31
Rejects: 2026-13-01, 2026-00-15
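
The pattern checks the shape of a date, not the calendar, so it accepts 2026-02-31. One possible approach, sketched below in Python, is to follow the regex with datetime.strptime from the standard library. The helper name is_real_date is an assumption.

```python
import re
from datetime import datetime

DATE = re.compile(r"^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$")

def is_real_date(value):
    """Check the YYYY-MM-DD shape first, then the actual calendar."""
    if not DATE.match(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

print(bool(DATE.match("2026-04-02")))  # True
print(bool(DATE.match("2026-13-01")))  # False
print(bool(DATE.match("2026-02-31")))  # True (shape is fine)
print(is_real_date("2026-02-31"))      # False (February has no 31st)
```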

HTML tag

<([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>

Matches: <p>text</p>, <div class="x">content</div>
\1 is a backreference to the tag name captured in group 1
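
A small Python sketch of the backreference at work, using invented snippets. The closing tag must repeat whatever group 1 captured, so mismatched tags are rejected.

```python
import re

TAG = re.compile(r"<([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>")

m = TAG.search('<div class="x">content</div>')
print(m.group(1))  # div
print(m.group(2))  # content

# The closing tag does not match the opening tag name, so no match.
mismatch = TAG.search("<p>text</div>")
print(mismatch)  # None
```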

13. Tips & Common Pitfalls

Start simple, then refine. Get a basic pattern working before adding edge case handling.
Use a regex tester. Try your patterns interactively with our regex tester tool — it highlights matches and groups in real time.
Beware of catastrophic backtracking. Patterns like (a+)+$ can cause exponential runtime on certain inputs. Keep quantifiers unambiguous.
Don't parse HTML with regex. Use a proper HTML parser. Regex is fine for simple tag extraction but falls apart with nested structures.
Use raw strings. In Python use r"...", in JavaScript use /pattern/ — this avoids double-escaping backslashes.
Test edge cases. Empty strings, strings with only whitespace, very long inputs, and Unicode characters.
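
Two of these tips can be sketched in Python. The first part shows what raw strings save you from. The second compares the risky (a+)+$ with a+$, a simpler pattern that accepts the same strings without the nested quantifier; only short inputs are used here so the risky version stays fast.

```python
import re

# Raw strings: the same pattern without doubled backslashes.
same_pattern = r"\d+\.\d+" == "\\d+\\.\\d+"
print(same_pattern)  # True

# In a normal string "\b" is a single backspace character,
# in a raw string it is the two characters the regex engine needs.
print(len("\b"), len(r"\b"))  # 1 2

# Same verdicts, but a+$ has only one way to split a run of a's.
risky = re.compile(r"(a+)+$")
safe = re.compile(r"a+$")
agree = all(
    bool(risky.match(s)) == bool(safe.match(s))
    for s in ["a", "aaaa", "aaab", ""]
)
print(agree)  # True
```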

Regex Cheatsheet

Pattern	Meaning
`.`	Any character (except newline)
`\d`	Digit [0-9]
`\w`	Word character [a-zA-Z0-9_]
`\s`	Whitespace
`^`	Start of string
`$`	End of string
`\b`	Word boundary
`*`	0 or more
`+`	1 or more
`?`	0 or 1
`{n}`	Exactly n
`{n,m}`	Between n and m
`[abc]`	Any of a, b, or c
`[^abc]`	Not a, b, or c
`(group)`	Capture group
`a|b`	a or b
`(?=...)`	Positive lookahead
`(?!...)`	Negative lookahead

Practice Regex Right Now

Open our free regex tester and try every pattern from this tutorial. Real-time highlighting, group extraction, and flag toggles included.
