New to RegEx. Need help!

Last post 03-08-2010, 10:58 PM by Aussie Susan. 1 replies.
  •  03-08-2010, 1:50 PM 60492

    New to RegEx. Need help!

    I'm using WebClient in C# to fetch a web page, which is returned as a string. I would like to use RegEx to extract everything between <font face="Arial"> and </font><br>, but only if the text between those two markers contains Ber&#228;knas. The fetched string contains several snippets like the one below, and I want to search them all.

    <font face="Arial">08:45 till Stockholm<br>T&#229;g nr <a href="/TRAFIK/(r4txvq551iutle55rohjvp55)/WapPages/TrainShow.aspx?JF=1&train=20100308,164">164</a><br>Ber&#228;knas08:27<br>Sp&#229;r 1</font><br>
    Am I making any sense? I hope someone can help me.
  •  03-08-2010, 10:58 PM 60513 in reply to 60492

    Re: New to RegEx. Need help!

    Unfortunately, the only example you gave us is one where the whole text is matched by the pattern. This has therefore had only VERY limited testing: it does match your example text, but no negative testing has been performed.

    One approach is:

    <font\s+face="Arial">((?!Ber&\#228;knas|</font><br>).)*Ber&\#228;knas((?!</font><br>).)*</font><br>

    with the "singleline" and "ignore case" options set as appropriate (if you don't have the "extended" or "ignore whitespace" option set, you can remove the '\' before each '#', but leaving them in should not hurt).
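
As a rough illustration of the options mentioned above, here is a minimal Python sketch (Python rather than C#, purely as an assumption for the sake of a runnable example). The pattern is copied unchanged from the reply and the sample is the block from the question. `re.DOTALL`, `re.IGNORECASE` and `re.VERBOSE` are used as stand-ins for "singleline", "ignore case" and "extended"; the equivalent .NET options would be set on the `Regex` object instead.

```python
import re

# Pattern copied from the reply, unchanged.
PATTERN = (r'<font\s+face="Arial">((?!Ber&\#228;knas|</font><br>).)*'
           r'Ber&\#228;knas((?!</font><br>).)*</font><br>')

# Sample block from the original question.
SAMPLE = ('<font face="Arial">08:45 till Stockholm<br>T&#229;g nr '
          '<a href="/TRAFIK/(r4txvq551iutle55rohjvp55)/WapPages/TrainShow.aspx'
          '?JF=1&train=20100308,164">164</a><br>'
          'Ber&#228;knas08:27<br>Sp&#229;r 1</font><br>')

# DOTALL stands in for "singleline", IGNORECASE for "ignore case".
plain = re.compile(PATTERN, re.DOTALL | re.IGNORECASE)

# VERBOSE stands in for "extended" / "ignore whitespace".
# The escaped '#' is still treated as a literal character here.
verbose = re.compile(PATTERN, re.DOTALL | re.IGNORECASE | re.VERBOSE)

plain_match = plain.search(SAMPLE)
verbose_match = verbose.search(SAMPLE)

# With the backslashes removed, verbose mode treats '#' as the start of a
# comment, which cuts the pattern off in the middle of a group.
try:
    re.compile(PATTERN.replace(r'\#', '#'), re.VERBOSE)
    unescaped_compiles_in_verbose = True
except re.error:
    unescaped_compiles_in_verbose = False

print(plain_match.group(0) == SAMPLE)
print(verbose_match.group(0) == SAMPLE)
print(unescaped_compiles_in_verbose)
```

The last check is why leaving the backslashes in is the safer default: the escaped form behaves the same whether or not the whitespace-ignoring option is switched on.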

    This looks for the opening tag and then scans forward, one character at a time, until it reaches either the ending sequence or the "must have in the middle" sequence. If what it reaches is not the "must have in the middle" sequence, the pattern fails, because we must have found a "<font......</font><br>" block without that text, which you say you don't want. Otherwise the search continues to the end tag, knowing that we have the "middle" text.
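
The behaviour described above can be checked with a small negative test. This is only a sketch in Python, and the two-block page fragment is made up for illustration (it is not from the question). It also compares against a simpler lazy pattern, which is my own addition, to show what the lookaheads prevent.

```python
import re

# Pattern from the reply, with "singleline" and "ignore case" equivalents.
PATTERN = re.compile(
    r'<font\s+face="Arial">((?!Ber&\#228;knas|</font><br>).)*'
    r'Ber&\#228;knas((?!</font><br>).)*</font><br>',
    re.DOTALL | re.IGNORECASE,
)

# Hypothetical page fragment: two blocks, only the second has the marker text.
WITHOUT_MARKER = '<font face="Arial">08:45 till Stockholm<br>Sp&#229;r 1</font><br>'
WITH_MARKER = ('<font face="Arial">09:10 till Uppsala<br>'
               'Ber&#228;knas09:25<br>Sp&#229;r 2</font><br>')
page = WITHOUT_MARKER + '\n' + WITH_MARKER

# Search the whole page; each match is one complete block.
matches = [m.group(0) for m in PATTERN.finditer(page)]

# For comparison only: a lazy ".*?" version without the lookaheads starts at
# the first opening tag and runs across its end tag into the second block.
naive = re.search(
    r'<font\s+face="Arial">.*?Ber&\#228;knas.*?</font><br>', page, re.DOTALL
)

print(matches == [WITH_MARKER])
print(naive.group(0) == page)
```

The first block is skipped because the scan stops at its `</font><br>` before any "middle" text has been seen, so the attempt starting at that opening tag fails and the search moves on to the next one.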

    You can clean this up a bit depending on the exact sequence of characters you need in the beginning, middle and end.

    Susan
