
Re: Regex for unknown number of lines - Assistance Request

  •  06-07-2008, 9:01 PM

    I have a problem which seems similar to the one documented here, so I hope somebody reading this can help me too. I have a page of PHP code which works some of the time, but not always, so I'm trying to discover the bugs in it. It includes the line 

    preg_match_all('#<!-- start content -->(.*?)<!-- end content -->#es', $file, $ar);

    This is supposed to find matches of the regular expression

    '#<!-- start content -->(.*?)<!-- end content -->#es'

    in the file $file. I don't understand the # at the beginning of the regular expression, or the #es at the end of it. I've searched high and low for documentation, but without success.

    Obviously we're looking for text bracketed between <!-- start content --> and <!-- end content -->. Because that text spans multiple lines which need to be treated like a single line, the singleline option has to be turned on, as has already been pointed out in this thread. Does the initial # turn the singleline option on? Does the ending #es stand for "end singleline", and turn the option off? If so, why is this use of the # character apparently completely undocumented? If not, what does the code really mean and what am I missing?

    The purpose of the code is to scan through Wikipedia-like wiki pages and select just the main text from each page, discarding headers and footers, by returning the text between the <!-- start content --> and the <!-- end content -->. With the # and the #es in place it works most of the time, but not always. With them removed it doesn't work at all. The # and #es are obviously crucial elements of the regex, but I don't fully understand why. Any thoughts? Thanks.
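
    In case it helps anyone reproduce this, here is a reduced test case for the line above. It is only a sketch: the sample $file string is made up (the real one is a downloaded wiki page), and I have left the e out of the trailing flags on the assumption that it is not needed here, since the only place I can find it mentioned is in connection with preg_replace. The return value is checked so that "no match" and "regex error" can be told apart, which might matter for the pages where it fails.

```php
<?php
// Made-up sample page (assumption); the real $file holds the fetched wiki HTML.
$file = "<html><body>header\n"
      . "<!-- start content -->\n"
      . "<p>Main text,\nspread over several lines.</p>\n"
      . "<!-- end content -->\n"
      . "footer</body></html>";

// Same pattern as in the original line, but with only the s flag
// (assumption: the e flag is not needed for preg_match_all).
$count = preg_match_all(
    '#<!-- start content -->(.*?)<!-- end content -->#s',
    $file,
    $ar
);

// $ar[0] holds the full matches, $ar[1] the text captured by (.*?).
if ($count === false) {
    echo "regex error, code ", preg_last_error(), "\n";
} elseif ($count === 0) {
    echo "markers not found\n";
} else {
    echo trim($ar[1][0]), "\n";
}
```

    Running the same check against one of the pages that fails should at least show whether the markers are missing from that page or whether the regex itself is giving up.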
