URL matching... it's been done to death, but what is going on with this?

Last post 05-13-2008, 4:42 PM by Lyndar. 1 reply.
  •  05-13-2008, 1:45 PM 42196

    Hi guys and gals,

    OK, some background: I'm writing a Twitter application (JavaScript), so I have to parse all kinds of URLs. This is what I have:

    regex = new RegExp("((ht|f)tp(s?)\:\/\/|~/|/)([\\w]+:[\\w]+@)?(([\\w-]+\\.)+[a-z]{2,10})(:[0-9]{1,5})?((\/?[\\w.%]+\/)+|\/)([\\w.%]+\.[\\w.%]+)?((\\?[\\w.%]+\=([\\w.%]+)?)(&[\\w.%]+\=([\\w.%]+)?)*)?(\#[\\w]*)?\\s", "gi"); 

    (It looks for a space at the end because, well, I'm a noob and that's the only way I got it to recognize a '/' at the end.)

    Matches that work as intended:

    "@garyvee was on Conan last night! Check it: http://cdevroe.com/notes/garyveetv/ - awesome job Gary! " 

    --> "http://cdevroe.com/notes/garyveetv/ "

     "@johnmorton RSS TWiT Live cal: http://tinyurl.com/5qehmk "

    --> "http://tinyurl.com/5qehmk "

    "@purpleshark Try this: http://www.google.com/calendar/ical/4qj83c651jkqcpmhvmp1pk9t58%40group.calendar.google.com/public/basic.ics " 

    --> "http://www.google.com/calendar/ical/4qj83c651jkqcpmhvmp1pk9t58%40group.calendar.google.com/public/basic.ics "

     

    Here's the problem: 

    "I've created a TWiT Live production calendar: http://snurl.com/28nng on the web or add this to your iCal or gCal: http://snurl.com/28nno "

    -->  "http://snurl.com/28nng on"

    -->  "http://snurl.com/28nno "

     

    Why is it matching the "on" after the space in the first URL? I don't get it, and I've been on this for a while. If you haven't noticed, I'm a regex noob.

     

    Thanks for any help I can get on this.

  •  05-13-2008, 4:42 PM 42209 in reply to 42196

    Re: URL matching... it's been done to death, but what is going on with this?

    Try escaping your periods in those cases where you want to match a literal period.

     i.e. \. instead of .
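
A minimal sketch of why the dot can act as a wildcard here, assuming the pattern is passed to `new RegExp` as a string literal, as in the original post. The variable names and the shortened sub-pattern (only the "file name" part of the posted regex) are illustrative, not the poster's actual code.

```javascript
// Illustrative sub-pattern only: the "file name" part of the posted regex.
// Inside a JavaScript string literal, "\." collapses to ".", so the RegExp
// constructor sees an unescaped dot, which matches any character (even a space).
const unescaped = new RegExp("[\\w.%]+\.[\\w.%]+");  // pattern source: [\w.%]+.[\w.%]+
const escaped = new RegExp("[\\w.%]+\\.[\\w.%]+");   // pattern source: [\w.%]+\.[\w.%]+
const literal = /[\w.%]+\.[\w.%]+/;                  // regex literal: a single backslash suffices

const sample = "28nng on the web";

// The wildcard dot swallows the space, so the match runs into the next word.
const leaky = unescaped.exec(sample);
console.log(leaky && leaky[0]);          // prints: 28nng on

// With a literal dot required, this text no longer matches at all.
console.log(escaped.test(sample));       // false
console.log(literal.test(sample));       // false
console.log(escaped.test("basic.ics"));  // true
```

One thing to watch for: once the dot is escaped, a last path segment without an extension (such as 28nng) no longer matches this sub-pattern. In the posted regex those segments seem to have matched only through the wildcard dot, so the path portion would probably need its own adjustment as well.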

     
