Today I came across an example that nicely illustrates the use of a zero-width look-behind assertion. A zero-width assertion is an expression that tests whether text matches at the current position but does not consume any characters from the input string.
I wanted to extract an entire <img> tag from a chunk of text, while also verifying that the tag was self-closed. A starting pattern might look like this:
<img [^/]* />
That works, except for the occasional img tag that includes an embedded forward-slash, such as one inside a path in the src attribute.
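To see that failure concretely, here is a minimal sketch in Python's re module (my choice of engine for illustration; the sample tags and the name NAIVE are made up for this example). The readability spaces have been removed from the pattern.

```python
import re

# The starting pattern, with the readability spaces removed.
NAIVE = re.compile(r"<img[^/]*/>")

simple = "<img src='foo' />"
with_path = "<img src='images/foo.png' />"  # hypothetical tag with a slash in the path

# [^/]* stops at the first forward-slash, so that slash must be
# immediately followed by '>' for the pattern to match.
print(NAIVE.search(simple) is not None)     # True
print(NAIVE.search(with_path) is not None)  # False
```

The second tag is self-closed, but the slash in the path stops the character class too early, so the tag is missed.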
One way to correct for that while keeping the pattern nice and simple is to use a zero-width assertion. Applying a look-ahead assertion would produce:
<img [^>]* (?=/)>
But that doesn't work. When applied to the sample text:
<img src='foo' />
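the character class consumes every character up to and including the forward-slash. The assertion then 'looks ahead' for a forward-slash, finds only the greater-than, and fails. Backtracking does not help: the assertion and the literal greater-than both test the same character, and it cannot be both a forward-slash and a greater-than.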
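A quick check of the look-ahead version, sketched in Python's re module (the sample strings and the name LOOKAHEAD are assumptions for illustration), suggests that the pattern fails on every input, not just this one:

```python
import re

# Look-ahead variant, with the readability spaces removed.
LOOKAHEAD = re.compile(r"<img[^>]*(?=/)>")

samples = [
    "<img src='foo' />",
    "<img src='foo'/>",
    "<img src='foo'>",
]

# (?=/) requires the next character to be '/', while the literal '>'
# requires that same character to be '>', so nothing can match.
results = [LOOKAHEAD.search(s) for s in samples]
print(results)  # [None, None, None]
```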
The solution, then, is to look backward to find the forward-slash:
<img [^>]* (?<=/)>
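Here is a small sketch of the look-behind pattern at work, again in Python's re module with made-up sample text (the name LOOKBEHIND and the HTML fragment are assumptions, not from a real page). It should pick up the self-closed tag, embedded slash and all, and skip the tag that is not self-closed.

```python
import re

# Look-behind variant, with the readability spaces removed.
LOOKBEHIND = re.compile(r"<img[^>]*(?<=/)>")

text = "<p><img src='images/foo.png' /> and <img src='bar.png'></p>"

# [^>]* runs up to the '>', then (?<=/) checks that the character just
# consumed was a forward-slash before the literal '>' is matched.
matches = LOOKBEHIND.findall(text)
print(matches)  # ["<img src='images/foo.png' />"]
```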
Note: The regex patterns above include extra spaces in them to make them more readable; remove the spaces or apply the 'IgnorePatternWhitespace' option when using them.
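As a rough illustration of why that matters, the sketch below uses Python's re.VERBOSE flag, which I am assuming plays a role comparable to 'IgnorePatternWhitespace'. The names SPACED and sample are made up for the example.

```python
import re

# The look-behind pattern exactly as printed above, spaces included.
SPACED = r"<img [^>]* (?<=/)>"
sample = "<img src='foo' />"

# Without the flag the spaces are literal, so the pattern demands a space
# directly before the '>' and also a '/' there, which cannot both hold.
literal = re.search(SPACED, sample)

# With re.VERBOSE, unescaped whitespace outside character classes is ignored.
verbose = re.search(SPACED, sample, re.VERBOSE)

print(literal)           # None
print(verbose.group(0))  # <img src='foo' />
```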
-Wayne
Taking another look at this, I see I got carried away with using the 'cool' look-behind functionality. For the specific problem I was trying to solve (extracting an entire <img> tag from a chunk of text while also verifying that the tag is self-closed), the look-behind feature is overkill. Because the character class excludes only the greater-than, the engine can back up one character and let a literal forward-slash match, so this 'traditional' regex will do the job:
<img [^>]* />
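To compare the two approaches side by side, here is a minimal sketch in Python's re module. The sample strings, the pattern names and the extract helper are all hypothetical, chosen just for this comparison. On these inputs the traditional pattern and the look-behind pattern return the same results.

```python
import re

# Both patterns with the readability spaces removed.
LOOKBEHIND = re.compile(r"<img[^>]*(?<=/)>")
TRADITIONAL = re.compile(r"<img[^>]*/>")

samples = [
    "<img src='foo' />",
    "<img src='images/foo.png' />",
    "<img src='foo'>",
    "<p>no image here</p>",
]

def extract(pattern, text):
    """Return the first matched tag, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None

traditional = [extract(TRADITIONAL, s) for s in samples]
lookbehind = [extract(LOOKBEHIND, s) for s in samples]

print(traditional == lookbehind)  # True
```

In the traditional pattern, [^>]* first swallows the trailing slash and then backtracks one character so the literal slash can match.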