
Extract href + domain

Last post 02-23-2010, 8:18 PM by Peadarin. 1 replies.
  •  11-03-2009, 9:58 AM 57122

    Extract href + domain

    Hi guys,

     Currently I have the following regex:

    [php]
    $link = "domain.com";
    $patterns = "/\s(href)\s*=\s*\"[^\"]*".$link."\/?\"/";
    [/php]

    This extracts the href attribute together with the domain.
    It matches the following:
    href="http://www.domain.com"
    href="http://domain.com"
    href="http://live.domain.com/"

    It does NOT match when there is a folder or query string after the domain, such as:
    href="http://domain.com/site"
    href="http://domain.com?id=2"

    All of which is good.

    The only problem is that it must not match any href with a subdomain other than www,
    so it should only extract
    href="http://domain.com"
    href="http://www.domain.com"

     Can anyone help me with that?

     

  •  02-23-2010, 8:18 PM 60026 in reply to 57122

    Re: Extract href + domain

     

    Does it work the way you wish if you exclude the dot and the / from the [^\"]* rule, and add an alternative for www. ?

    $patterns = "/\s(href)\s*=\s*\"([^\/.\"]*|www.[^\/\"]*)".$link."\/?\"/";
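    One thing to watch with the pattern above: the first alternative excludes "/", so a value that starts with http:// cannot get through it as written. A minimal sketch of another way to do it follows, assuming every link uses an http:// or https:// scheme. The helper name buildHrefPattern is hypothetical and not from this thread. It spells out the scheme, makes "www." optional, and runs the domain through preg_quote so the dot in domain.com is matched literally.

```php
<?php
// Hypothetical helper (not from the thread): builds a pattern that accepts
// only the bare domain or its www. variant, with an optional trailing slash.
function buildHrefPattern(string $link): string
{
    // preg_quote escapes regex metacharacters such as the dot in
    // "domain.com", plus the "/" delimiter passed as the second argument.
    $host = preg_quote($link, '/');

    // Assumption: the href value always begins with http:// or https://.
    return '/\shref\s*=\s*"https?:\/\/(?:www\.)?' . $host . '\/?"/i';
}

$pattern = buildHrefPattern('domain.com');

// Sample values taken from the question above.
$samples = [
    '<a href="http://domain.com">',
    '<a href="http://www.domain.com/">',
    '<a href="http://live.domain.com/">',
    '<a href="http://domain.com/site">',
    '<a href="http://domain.com?id=2">',
];

foreach ($samples as $html) {
    $result = preg_match($pattern, $html) ? 'match' : 'no match';
    echo $html, ' => ', $result, PHP_EOL;
}
```

    With these samples, only the first two should match. The live. subdomain, the /site folder and the ?id=2 query string are rejected because nothing other than an optional "www." may sit between the scheme and the domain, and nothing other than an optional "/" may follow it.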

     
