Michael Ash's Regex Blog

Regex Musings

Validating Email Revisited

First off, let me say I'm a bit over my head here. Not with the regex part, but with the host language of the regex engine.

Many moons ago I posted a blog article stating why you could not write a regex that validated an e-mail address 100%. Well, this is still true. However, in that post I also stated that the pattern was so massive that it wasn't worth using. This is also still true, but I was made aware of a flavor-specific syntax that reduces the regex from massive to very large.

This regex is for the PCRE engine. http://www.myregextester.com/?r=337

Though from what I've read this will work for PHP too. Now, I don't know Perl or PHP, or what minimum version of PCRE supports this syntax. That being the case, I also don't know how well it performs. I wrote the original version using the .Net syntax, and not only was the regex massive, which is one reason I never posted it, but the performance was terrible. Given that most people want to use this type of regex to validate a data entry field, the pattern was overkill. In fact, I recommend that you don't use this, except to learn from. The PCRE version may perform better, but I don't have the means or time to test, so use at your own risk. For simple field validation even this is still overkill. For a large text file, performance may suffer horribly. Most likely you aren't going to want to use this pattern, as it is too large for simple text and performs poorly on large text.

When I see people asking for an email regex, I point out that perfect validation is not possible. And when I see so-called email-validating regexes that are only about 50 characters long, it makes me chuckle. This pattern is probably the most compact version of an RFC 2822 address regex you'll find, and it is still huge. Ports to other regex engines not supporting the recursive syntax will easily be 4x as large as my .Net version was.
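To make the point about recursion concrete, here is a minimal Python sketch. It assumes that one place recursion helps is RFC 2822 comments, which may nest inside each other; whether the linked PCRE pattern handles comments exactly this way is an assumption. Python's built-in `re` module has no recursive syntax, so the hypothetical helper `comment_pattern` spells the nesting out by hand to a fixed depth, and the comment grammar is simplified (it ignores the control characters the RFC excludes).

```python
import re

# Hypothetical helper, not taken from the linked patterns: builds a regex
# for parenthesized comments nested at most `depth` levels deep.
def comment_pattern(depth):
    # One comment character: anything except parens and backslash,
    # or a backslash-escaped character (a "quoted pair").
    inner = r"[^()\\]|\\."
    # Depth 1: a comment with no comment nested inside it.
    pattern = r"\((?:%s)*\)" % inner
    # Each extra level wraps the previous pattern in another pair of parens.
    for _ in range(depth - 1):
        pattern = r"\((?:%s|%s)*\)" % (inner, pattern)
    return pattern

shallow = re.compile(comment_pattern(2))
deep = re.compile(comment_pattern(3))

text = "(a (b (c)))"  # three levels of nesting
print(bool(shallow.fullmatch(text)))  # False: only two levels spelled out
print(bool(deep.fullmatch(text)))     # True
# The pattern text grows with every level that has to be written out.
print(len(comment_pattern(1)), len(comment_pattern(3)))
```

An engine with recursive syntax can refer back to the comment rule from inside itself, so the rule is written once and is not capped at a fixed depth.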

The above pattern does the RFC spec up to the addr-spec, which is pretty much what people are thinking about when they say "email address".
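As a rough illustration of why a short pattern falls short of the addr-spec, here is a Python sketch. The `SHORT` pattern and the `short_ok` helper are made up for this example and are not taken from any particular library; the sample addresses rely on RFC 2822 allowing a quoted-string local part and a domain literal, and on its dot-atom rule not allowing two dots in a row.

```python
import re

# Hypothetical short pattern of the kind often passed around.
SHORT = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def short_ok(address):
    # fullmatch requires the whole string to match, like wrapping in anchors.
    return SHORT.fullmatch(address) is not None

print(short_ok("user@example.com"))        # True, the everyday case
print(short_ok('"john doe"@example.com'))  # False, though a quoted local part is allowed
print(short_ok("user@[192.168.0.1]"))      # False, though a domain literal is allowed
print(short_ok("a..b@example.com"))        # True, though consecutive dots are not a dot-atom
```

So the short pattern is wrong in both directions: it rejects some addresses the spec allows and accepts some it does not.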

It's not too hard to take it up a few more levels in the spec using this syntax

RFC 2822 mailbox : http://www.myregextester.com/?r=338

but like I said, it likely won't perform well enough to be useful. The two patterns I've linked to are wrapped in anchors, so they only match against the whole string. Searching for an address within a larger body of text, without anchors, will probably degrade performance very fast. But if any of you PHP or Perl gurus want to stress test this beast, have fun. Maybe it's not as bad as I think it may be.
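The difference between an anchored and an unanchored match can be sketched in Python. The `SIMPLE` pattern below is a small stand-in chosen for this example, not one of the linked RFC 2822 patterns, and the sketch shows behavior only, not timings.

```python
import re

# Hypothetical stand-in pattern; far simpler than the linked ones.
SIMPLE = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
text = "not an address: bob@example.com!!"

# Unanchored: the engine tries each starting position until something fits.
found = re.search(SIMPLE, text)
print(found.group())  # bob@example.com

# Anchored: the whole string must be an address, so this input is rejected.
whole = re.fullmatch(SIMPLE, text)
print(whole)  # None

# The same check written with explicit anchors around the pattern.
anchored = re.search(r"\A(?:%s)\Z" % SIMPLE, text)
print(anchored)  # None
```

For validating a data entry field the anchored form is the one you want. The unanchored search has to attempt a match at many positions in the text, which is where a very large pattern would be expected to hurt most.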

Published Sunday, September 28, 2008 12:20 AM by mash