Optimizing Performance: Avoiding Catastrophic Backtracking
Regex operations are usually fast, right? Until they aren't. A phenomenon known as "Catastrophic Backtracking" can cause a regex engine to take years to evaluate a short string.
The Dangerous Pattern
It typically happens with nested quantifiers, like: (x+x+)+y.
If you feed this pattern a string of "x"s that doesn't end in "y", the engine will try every possible combination of how to potential split the "x"s between the two internal groups. This grows exponentially.
3 Rules for Speed
- Anchor your patterns: Use
^and$whenever possible to limit scan scope. - Be specific: Use
[^"]*instead of.*. The dot is ambiguous; negated character classes are strict. - Fail Fast: If your language supports it, use atomic grouping
(?>...)or possessive quantifiers++to prevent backtracking.