"I need a regular expression to parse my HTML"

New programmers who want to extract information from an HTML document often turn to regular expressions.

This is rarely a good idea. HTML is not a regular language, so regexes are inadequate for the job. You should use an HTML parser.

This site shows you how.
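As a taste of the approach, here is a minimal sketch in Python using the standard library's `html.parser` module. The class name `LinkCollector`, the helper `extract_links`, and the sample markup are made up for illustration. It collects link targets even when the tag is split across lines, uses mixed case, or mixes quote styles, which are cases that tend to trip up a hand-written regex.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag the parser encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all <a> tags in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample: one ordinary link, one with awkward formatting.
sample = """
<p>See <a href="https://example.com/one">one</a> and
<A
   class="x"
   HREF='https://example.com/two'>two</A>.</p>
"""

print(extract_links(sample))
```

For real-world pages, a more forgiving third-party parser may be a better fit; the point here is only that the parser, not your pattern, deals with the quirks of the markup.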

To do

  • Add more languages.
  • Explain why regexes are bad.
  • Explain how fragile regexes are.

Thanks

Thanks to the following folks for their contributions:

  • M. Buettner
  • Kirk Kimmel
  • Anubhava Srivastava
  • Nathan Mahdavi
  • Jeffrey Kegler
  • Bill Ricker
  • Stuart Caie
  • and Jeana Clark