
I need help parsing parameters from a URL

Last post 03-09-2010, 10:43 PM by Aussie Susan. 1 replies.
  •  03-09-2010, 8:48 PM 60611

    I need help parsing parameters from a URL

    I'm running in the .Net 3.0 environment, specifically VB.Net.

    http://www.somedomain.com/gifts/store/gift____gourmet-gift-sales-and-values_shop-by-occasion-all-occasion-gifts?cm_mmc=Email: Housefile-_-m100108 January Sale 2-_-44I1A2A-_-sale&cm_mmc=Email: Housefile-_-m100108 January Sale 2-_-44I1A2A-_-sale

     With the URL above I need to parse out all the strings that are past the cm_mmc= that are delimited by -_-. So the desired results would be:

    Email: Housefile

    m100108 January Sale 2

    44I1A2A

    sale

    Email: Housefile

    m100108 January Sale 2

    44I1A2A

    sale

     I tried a pattern of (?<=cm_mmc=).*(?:-_-.*) but that returns the remainder of the URL after the first cm_mmc=

    Many thanks in advance

     

  •  03-09-2010, 10:43 PM 60619 in reply to 60611

    Re: I need help parsing parameters from a URL

    The first instance of '.*' needs to be turned into '.*?' and you probably want something other than the '(?:-_-.*)' part of your pattern.

    What '.*' tells the regex engine to do is to match all characters (with the possible exception of the 'newline' character, depending on the setting of the "singleline" option, which you have not told us) from where it is within the string to the end of the string (or line). In general this is too much, and when it comes to match the '-_-' part of your pattern, it has to backtrack through the characters it has already matched until it gets to the required sequence (or has backtracked to the starting place of the '.*'). Therefore this becomes, in effect, a search from the end of the string/line back towards the start for the last instance of whatever follows the '.*'.
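To make the backtracking concrete, here is a small sketch. It is written in Python purely for illustration (not VB.Net), on the assumption that greedy and lazy quantifiers behave the same way in both engines for this pattern. The capture groups and variable names are mine, added only to show where the split lands.

```python
import re

# One cm_mmc value from the question, without the rest of the URL.
text = "cm_mmc=Email: Housefile-_-m100108 January Sale 2-_-44I1A2A-_-sale"

# Greedy: '.*' runs to the end of the string, then backtracks
# to the LAST '-_-' it can find.
greedy = re.search(r"(?<=cm_mmc=)(.*)-_-(.*)", text)
greedy_head = greedy.group(1)  # 'Email: Housefile-_-m100108 January Sale 2-_-44I1A2A'
greedy_tail = greedy.group(2)  # 'sale'

# Lazy: '.*?' grows one character at a time and stops
# at the FIRST '-_-' it reaches.
lazy = re.search(r"(?<=cm_mmc=)(.*?)-_-(.*)", text)
lazy_head = lazy.group(1)      # 'Email: Housefile'
lazy_tail = lazy.group(2)      # 'm100108 January Sale 2-_-44I1A2A-_-sale'

print(greedy_head)
print(lazy_head)
```

Either way the overall match still runs to the end of the string because of the trailing '.*', which is why the '(?:-_-.*)' part needs replacing too.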

    If I were asked to do this I would probably use one of the many URL parsing classes that are available to do the work for me. However, if I HAD to use a regex pattern and I had the .NET regex variant available to me (as you do) then I would start with the pattern:

    cm_mmc=((((?!-_-|&).)+)(-_-)?)*

    and look at the capture array for match group #2, as it will contain the items you are after. Based on your test string there will be 2 matches, and each match will have 4 captures for match group #2. (You could use match group #1 as well, but the text there will include the delimiter, except for the last capture.)
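For anyone who wants to try the idea outside .NET, here is a rough Python sketch. It is not the same technique: Python's re module keeps only the last capture of a repeated group, so there is no capture array to read. Instead the sketch uses the same "any character that does not start '-_-' or '&'" construct to grab each whole cm_mmc value, then splits that value on the delimiter. The names ITEM, PARAM and parse_cm_mmc are hypothetical, chosen just for this example.

```python
import re

# Test URL from the original question.
url = (
    "http://www.somedomain.com/gifts/store/"
    "gift____gourmet-gift-sales-and-values_shop-by-occasion-all-occasion-gifts"
    "?cm_mmc=Email: Housefile-_-m100108 January Sale 2-_-44I1A2A-_-sale"
    "&cm_mmc=Email: Housefile-_-m100108 January Sale 2-_-44I1A2A-_-sale"
)

# One item: one or more characters, none of which begins '-_-' or is '&'.
ITEM = r"(?:(?!-_-|&).)+"

# One parameter: 'cm_mmc=' followed by items, each optionally followed
# by the delimiter. Group 1 holds the whole value, delimiters included.
PARAM = re.compile(r"cm_mmc=((?:" + ITEM + r"(?:-_-)?)*)")


def parse_cm_mmc(text):
    """Return one list of items per cm_mmc parameter found in text."""
    return [m.group(1).split("-_-") for m in PARAM.finditer(text)]


for items in parse_cm_mmc(url):
    print(items)
```

With the test URL this gives two lists of four items each, which lines up with the 2 matches and 4 captures described above.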

    Susan
