How to Match HTML Tags (Safely)
Executive Summary
- Regex fits one narrow job here: stripping tags from a string to count words or build a snippet. Structural parsing belongs to a DOM parser.
- Sets boundaries for the tag-stripping pattern so it is not mistaken for a parser or a sanitizer.
- Recommends fixture-driven tests in CI and verification in the target engine to reduce regressions.
In Short
Use narrowly scoped regex patterns, validate with fixture-driven tests, and verify behavior in the target engine before deployment.
Example Blocks
Input
<p>Hello <b>world</b></p>
Expected Output (every match of <[^>]+> replaced with an empty string)
Hello world
Engine Caveats
- Flag semantics vary by engine.
- Named groups and lookbehind support differ across runtimes.
- Replacement syntax is not portable across all languages.
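As one concrete instance of these caveats, the sketch below shows how Python's re module spells named groups and named replacements. The pattern and the helper names (TAG, tag_names, mark_tags, js_style_named_group_compiles) are illustrative assumptions, not part of the original guide.

```python
import re

# Python spells named groups (?P<name>...). The (?<name>...) spelling used by
# JavaScript is rejected by the re module.
TAG = re.compile(r"<(?P<name>[a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def tag_names(html: str) -> list[str]:
    """Return the lowercased names of opening tags found by the pattern."""
    return [m.group("name").lower() for m in TAG.finditer(html)]


def mark_tags(html: str) -> str:
    """Replace each opening tag with [name].

    Python references a named group in the replacement as \\g<name>;
    JavaScript's String.prototype.replace would use $<name> instead.
    """
    return TAG.sub(r"[\g<name>]", html)


def js_style_named_group_compiles() -> bool:
    """Check whether the JavaScript-style named group compiles in Python."""
    try:
        re.compile(r"<(?<name>[a-z]+)>")
    except re.error:
        return False
    return True
```

Here tag_names("<p>Hi <B class='x'>there</B></p>") returns ["p", "b"], and js_style_named_group_compiles() returns False, which is why a pattern copied from another runtime should be recompiled and retested in the engine that will actually run it.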
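The golden rule of the internet: Do not use regex to parse HTML. HTML nests to arbitrary depth, and attribute values, comments, and scripts can contain < and >, which a single pattern cannot track reliably. Use a proper DOM parser like Cheerio (Node.js) or BeautifulSoup (Python).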
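To illustrate the parser route, here is a minimal sketch that extracts visible text. It swaps in Python's standard-library html.parser to stay dependency-free. That module is an event-based parser, not a full DOM like the one BeautifulSoup builds. The names TextExtractor and extract_text are assumptions made for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Join the pieces and collapse runs of whitespace.
        return " ".join("".join(self._chunks).split())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For example, extract_text('<p title="a > b">Hello <b>world</b></p>') returns "Hello world" even though the attribute value contains a >, a case where a simple bracket-matching pattern would cut the tag short.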
The Exception to the Rule
Sometimes you just need to strip tags from a string to count words or generate a snippet.
<[^>]+>
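This matches an opening bracket, one or more characters that are not a closing bracket, and then a closing bracket. It's fast and "good enough" for non-security contexts, but it stops at the first >, so a > inside an attribute value (for example title="a>b") cuts the match short and leaves debris behind.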
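The tag-stripping use case can be sketched as follows, assuming Python's re module. The helper names (strip_tags, word_count, snippet) and the max_chars default are hypothetical choices for this example.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(html: str) -> str:
    """Remove anything that looks like a tag and collapse whitespace.

    Tags are replaced with a space so that "a</p><p>b" does not fuse into
    "ab". The trade-off is that inline markup inside a word splits it in two.
    """
    return " ".join(TAG_RE.sub(" ", html).split())


def word_count(html: str) -> int:
    return len(strip_tags(html).split())


def snippet(html: str, max_chars: int = 80) -> str:
    """Return a plain-text preview cut at a word boundary."""
    text = strip_tags(html)
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars].rsplit(" ", 1)[0]
    return cut + "..."
```

With this sketch, word_count("<p>Hello <b>world</b></p><p>Bye</p>") is 3 and snippet("<p>The quick brown fox jumps</p>", max_chars=12) is "The quick...". The output is plain text for counting or previewing. It is not safe to treat as sanitized HTML.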
Security Warning
Never use regex to sanitize input against XSS. It is trivial to bypass regex filters with malformed HTML that browsers still execute.
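A small demonstration of why this fails, under stated assumptions: naive_filter below is a hypothetical script-removing filter written only to show the bypass, and the payload is one well-known nesting trick among many.

```python
import re
from html import escape

# Hypothetical naive filter, shown only to illustrate why the approach fails.
SCRIPT_RE = re.compile(r"<script[^>]*>.*?</script>", re.IGNORECASE | re.DOTALL)


def naive_filter(html: str) -> str:
    return SCRIPT_RE.sub("", html)


# The inner <script></script> pair is removed, and the fragments on either
# side join up into a fresh <script> tag.
payload = "<scr<script></script>ipt>alert(1)</script>"
filtered = naive_filter(payload)

# If the goal is to display untrusted text, escaping it instead of filtering
# it leaves no markup for the browser to interpret.
escaped = escape(payload)
```

After the filter runs, filtered is "<script>alert(1)</script>", so the filter has assembled the very tag it was meant to remove. The escaped string contains no raw < or > at all.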
FAQ
What problem does this guide solve?
It shows when a simple tag-stripping regex is acceptable (word counts, snippets) and when to reach for a DOM parser instead.
Which regex engines should I verify?
Validate behavior in the exact runtime engines your product uses before rollout.
How do I avoid regressions?
Add explicit passing and failing fixtures in CI for every key pattern introduced in the guide.
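A minimal sketch of such fixtures for the tag-stripping pattern, written as plain test functions that a runner such as pytest would collect. The fixture lists and function names are assumptions for illustration.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(html: str) -> str:
    return TAG_RE.sub("", html)


# Passing fixtures: input and the exact output expected after stripping.
PASSING = [
    ("<p>Hello</p>", "Hello"),
    ("<br/>", ""),
    ('<a href="/x">link</a>', "link"),
]

# Failing fixtures: inputs the pattern must not match at all.
MUST_NOT_MATCH = [
    "no tags here",
    "a < b",
    "<>",
]

# Known limitations, pinned so that any change in behavior is noticed.
KNOWN_LIMITS = [
    ('<a title="x>y">link</a>', 'y">link'),
    ("1 < 2 and 3 > 2", "1  2"),
]


def test_passing():
    for src, expected in PASSING:
        assert strip_tags(src) == expected, src


def test_must_not_match():
    for src in MUST_NOT_MATCH:
        assert TAG_RE.search(src) is None, src


def test_known_limits():
    for src, expected in KNOWN_LIMITS:
        assert strip_tags(src) == expected, src
```

Pinning the known limitations alongside the passing cases means a later "improvement" to the pattern cannot silently change what happens to awkward inputs.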