
Matching URLs with more than one attached value

Last post 03-16-2010, 10:12 PM by Peadarin. 1 reply.
  •  03-04-2010, 7:56 AM 60342

    Matching URLs with more than one attached value

    Hey guys,

    I'm modifying a content management system written in PHP which uses quite unattractive URLs like this: http://domain.com/cms/website.php?id=/de/index.htm

    Using the following replacement rules, I managed to convert the addresses to something like this: http://domain.com/de/index.htm

     

    $strBuffer = preg_replace( '#([\"\'])website\.php\?id=(/.+)(?:&|&amp;)(.+)(\\1)#','\\1\\2?\\3\\4', $strBuffer );
    $strBuffer = preg_replace( '#([\"\'])website\.php\?id=(/[^\\1]+)(\\1)#U','\\1\\2\\3', $strBuffer );
    $strBuffer = preg_replace( '#([\"\'])\.\.(/.+\\1)#U','\\1\\2', $strBuffer );

    So far everything works fine. However, some features of the CMS require adding one or more parameters to the URL, for instance:

    http://domain.com/cms/website.php?id=/de/search.htm&action=search&searchterm=seo&tagsearch=1 

    Since the additional parameters (everything except id) may appear in any order, I want to leave them untouched and only change the first '&' character to a question mark so that the address remains valid:

    http://domain.com/de/search.htm?action=search&searchterm=seo&tagsearch=1 

    However, with the current expressions I get this broken URL: http://domain.com/de/search.htm&action=search&searchterm=roman?amp;tagsearch=1 
    Can anyone tell me how to fix this?

    Thank you in advance
    Michael 

     

  •  03-16-2010, 10:12 PM 61254 in reply to 60342

    Re: Matching URLs with more than one attached value

    It looks like you are using a greedy quantifier in your regular expression:

    .....       =(/.+)(?:&|&amp;)(.+)(   .....................

    (/.+) matches the longest possible string, so your rule only matches the rightmost &. You want the first (leftmost) one.
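
To make the difference concrete, here is a minimal PHP sketch. The sample link and the variable names are made up for illustration, and the link is assumed to use plain & separators. It captures the path once with the greedy (/.+) and once with a lazy (/.+?):

```php
<?php
// Sample link as it might appear in the HTML buffer (assumed for illustration).
$link = '"website.php?id=/de/search.htm&action=search&searchterm=seo&tagsearch=1"';

// Greedy: (/.+) first runs to the end of the string and then backtracks
// only as far as needed, so it stops at the LAST '&'.
preg_match('#website\.php\?id=(/.+)&(.+)"#', $link, $greedy);

// Lazy: (/.+?) grows one character at a time and stops as soon as the
// rest of the pattern can match, so it stops at the FIRST '&'.
preg_match('#website\.php\?id=(/.+?)&(.+)"#', $link, $lazy);

echo $greedy[1], "\n"; // /de/search.htm&action=search&searchterm=seo
echo $lazy[1], "\n";   // /de/search.htm
```

With the greedy version only tagsearch=1 ends up in the second group, which is why the question mark lands in front of the last parameter.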

     

    Both of the following expressions stop at the first &. The first uses a negated character class and is likely the faster of the two (less backtracking on most implementations); the second uses a lazy quantifier:

    (/[^&]+)(?:&|&amp;)

    (/.+?)(?:&|&amp;)
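
As a rough sketch, the first rule from the question could be adapted as follows. The function name rewriteCmsLinks and the sample markup are hypothetical. The sketch also assumes that the buffer separates parameters with the HTML-escaped &amp; (the ?amp; in the broken URL suggests it does). In that case the alternation has to try &amp; before a bare &, otherwise & matches first and amp; ends up right after the question mark.

```php
<?php
// Hypothetical helper; the pattern is the first rule from the question with
// the greedy (/.+) replaced by the negated character class (/[^&]+).
function rewriteCmsLinks(string $strBuffer): string
{
    // ([\"\'])     opening quote, captured so \1 can match the closing quote
    // (/[^&]+)     the path: everything up to, but not including, the first '&'
    // (?:&amp;|&)  the first separator; '&amp;' is tried before a bare '&'
    // (.+)(\1)     the remaining parameters and the closing quote
    $result = preg_replace(
        '#([\"\'])website\.php\?id=(/[^&]+)(?:&amp;|&)(.+)(\\1)#',
        '\\1\\2?\\3\\4',
        $strBuffer
    );

    // preg_replace() returns null on error; fall back to the input then.
    return $result ?? $strBuffer;
}

$html = '<a href="website.php?id=/de/search.htm&amp;action=search&amp;searchterm=seo&amp;tagsearch=1">Search</a>';
echo rewriteCmsLinks($html), "\n";
// <a href="/de/search.htm?action=search&amp;searchterm=seo&amp;tagsearch=1">Search</a>
```

Only the first separator is turned into a question mark; the remaining parameters are passed through unchanged, whatever their order. The other two rules from the question still handle links that carry no extra parameters.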

    Does that solve your problem?

     
