
Text replacement for Javascript

Last post 10-30-2006, 2:49 PM by Sergei Z. 6 replies.
  •  10-22-2006, 5:09 PM 23566

    Text replacement for Javascript

    I would be grateful if someone can help me with this. I would like to have the following (newline character at the end of each line):

    |
    [http://www.cnn-english.com CNN in english]
    | Description line one.
    | Description line two.
    | Description line three.
    |-

    replaced by:

    {{New site|Logo-of-CNN in english.gif|CNN in english|http://www.cnn-english.com|Description line two. Description line three.}}

    This will save me from doing it manually for about 300 records. I'll be using JavaScript's string.replace function, which accepts regexes.
    Your help is much appreciated!

    thank you
    Whale

  •  10-23-2006, 10:41 AM 23576 in reply to 23566

    Re: Text replacement for Javascript

    Tested OK in Expresso.

    match with:

    \|.*?(http://(?:[\w.-]+)+\w+)\x20+([\w\x20]+)\].*?\|.*?\|\s*(.*?)\s*\|\s*(.*?)\s*\|-

    replace with

    {{New site|Logo-of-$2.gif|$2|$1|$3 $4}}

    result of the match/replace

    {{New site|Logo-of-CNN in english.gif|CNN in english|http://www.cnn-english.com|Description line two. Description line three.}}

    C# code for the Regex object (watch the OPTIONS!):

    using System.Text.RegularExpressions;

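    // Singleline lets "." match newlines, so each .*? can span the lines of a record.
    // IgnorePatternWhitespace is harmless here because the pattern writes spaces as \x20.
    Regex regex = new Regex(
        @"\|.*?(http://(?:[\w.-]+)+\w+)\x20+([\w\x20]+)\].*?\|.*?\|\s*"
        + @"(.*?)\s*\|\s*(.*?)\s*\|-",
        RegexOptions.IgnoreCase
        | RegexOptions.Singleline
        | RegexOptions.IgnorePatternWhitespace
        );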
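
    Since the question is about JavaScript's string.replace, here is a rough sketch of how the same pattern might be applied there. JavaScript's "." does not match line breaks by default, so this sketch swaps each "." for [\s\S] to get the effect of RegexOptions.Singleline. The function name convertRecord and the sample variable are made up for illustration.

```javascript
// The sample record from the first post; every line ends with "\n".
const sample = [
  "|",
  "[http://www.cnn-english.com CNN in english]",
  "| Description line one.",
  "| Description line two.",
  "| Description line three.",
  "|-",
  ""
].join("\n");

// Same pattern as the C# version. [\s\S] replaces "." so the lazy
// wildcards can cross line breaks; "g" converts every record in the
// input, "i" mirrors RegexOptions.IgnoreCase.
const recordPattern =
  /\|[\s\S]*?(http:\/\/(?:[\w.-]+)+\w+)\x20+([\w\x20]+)\][\s\S]*?\|[\s\S]*?\|\s*([\s\S]*?)\s*\|\s*([\s\S]*?)\s*\|-/gi;

// Hypothetical helper: $1 = URL, $2 = link text,
// $3 and $4 = description lines two and three.
function convertRecord(text) {
  return text.replace(recordPattern, "{{New site|Logo-of-$2.gif|$2|$1|$3 $4}}");
}

console.log(convertRecord(sample));
```

    On the sample this should print the same {{New site|...}} line as the Expresso result, followed by the newline that came after the closing |-.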

  •  10-23-2006, 2:44 PM 23586 in reply to 23576

    Re: Text replacement for Javascript

    Thanks for your help! It picked up about 60 out of the 300 cases. The ones it didn't pick up have some issues with the URLs:

    [http://www.aboutksa.com/ About KSA]

    [http://pola.org/marry.htm POLA]

    [http://members.cnn.com/alnour/index2.html Al Nour]

    [http://www.peoplesforum.com/ The Truth's Forum]

    [http://www.cnn.org/Pack/apples.html An guy's Guide to Apples]

    These are different from the original URL sample:

    [http://www.cnn-english.com CNN in english]

    This one doesn't have the trailing / and doesn't have extra characters like .html. In general, the Green (the URL) and the Red (the link text) are only separated by a space: there can be anything in the Green and Red themselves, but the space will always be there.

    The original Regex you gave was:

    \|.*?(http://(?:[\w.-]+)+\w+)\x20+([\w\x20]+)\].*?\|.*?\|\s*(.*?)\s*\|\s*(.*?)\s*\|-

    Thanks again! I also downloaded the Expresso and ran the checks in it.

  •  10-23-2006, 2:55 PM 23588 in reply to 23586

    Re: Text replacement for Javascript

    To be able to distinguish between the red and green text, you'll have to come up with a rule for where the URL ends and where the *other* text starts. Without that rule you cannot parse them [the URL and the *other* text] with a regex. White space does not always qualify as a delimiter.

    So you need to do some data analysis and come up with an exhaustive list of possible URL formats to be able to write a regex that accounts for all possibilities.
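
    One possible way to start that data analysis is a small script that reports which part of each link falls outside the character classes of the first pattern. This is only a sketch: it assumes the link is split at the first space, and the two anchored checks (urlFits, textFits) approximate the URL and text groups of the pattern rather than reusing it verbatim. All names are made up for illustration.

```javascript
// Links reported as not matching, plus the original sample for comparison.
const links = [
  "[http://www.aboutksa.com/ About KSA]",
  "[http://pola.org/marry.htm POLA]",
  "[http://members.cnn.com/alnour/index2.html Al Nour]",
  "[http://www.peoplesforum.com/ The Truth's Forum]",
  "[http://www.cnn.org/Pack/apples.html An guy's Guide to Apples]",
  "[http://www.cnn-english.com CNN in english]"
];

// Strip the brackets and split at the first space (assumed delimiter).
function splitLink(link) {
  const inner = link.slice(1, -1);
  const space = inner.indexOf(" ");
  return { url: inner.slice(0, space), text: inner.slice(space + 1) };
}

// Character classes taken from the first pattern:
// URL = word characters, dots, hyphens; text = word characters and spaces.
const urlFits = /^http:\/\/[\w.-]+$/;
const textFits = /^[\w\x20]+$/;

function diagnose(link) {
  const { url, text } = splitLink(link);
  return { link, urlOk: urlFits.test(url), textOk: textFits.test(text) };
}

const report = links.map(diagnose);
for (const r of report) {
  console.log((r.urlOk ? "url ok " : "url BAD") + " | " +
              (r.textOk ? "text ok " : "text BAD") + " | " + r.link);
}
```

    For the links listed in this thread, the URL check trips on the "/" and the text check trips on the apostrophe, which suggests those two character classes are what need widening.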

     

  •  10-23-2006, 10:47 PM 23605 in reply to 23588

    Re: Text replacement for Javascript

    Thanks. This is an example of a real URL: 

    [http://www.cnn-english.com/index.htm CNN in english]

    We could say the parsing rules are like this:

    - begins with [http

    - From the first "[", take all characters until the first space is found and make that into one group ($1)

    - The rest of the Characters (including spaces) are to be grouped into $2 until we hit a ]

    This is always the case, i.e. - the URL ends when the first Space is found. The Red text ends after that when the first ] is found.
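
    As a rough illustration, the three rules above could be written as a JavaScript pattern for the bracketed link alone. This is a sketch, not the expression used later in the thread; parseLink and linkPattern are made-up names, and the group here starts right after the "[" so that $1 holds only the URL.

```javascript
// Rule 1: the link begins with "[http".
// Rule 2: the URL is everything up to the first space   -> (http\S*)
// Rule 3: the text is everything up to the first "]"    -> ([^\]]+)
const linkPattern = /\[(http\S*)\x20+([^\]]+)\]/;

// Hypothetical helper: returns the two groups, or null if no link is found.
function parseLink(line) {
  const m = linkPattern.exec(line);
  return m ? { url: m[1], text: m[2] } : null;
}

console.log(parseLink("[http://www.cnn-english.com/index.htm CNN in english]"));
console.log(parseLink("[http://www.peoplesforum.com/ The Truth's Forum]"));
```

    Because \S stops at the first whitespace character and [^\]] stops at the first "]", neither group needs a list of allowed characters.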

    I hope that helps. So the real example would be:

    |
    [http://www.cnn-english.com/index.htm CNN in english]
    | Description line one.
    | Description line two.
    | Description line three.
    |-

    To be formatted to:

    {{New site|Logo-of-CNN in english.gif|CNN in english|http://www.cnn-english.com/index.htm|Description line two. Description line three.}}

    URLs can also end in /, but they will always be one string that starts with "http" and is terminated by the first space after it.

  •  10-28-2006, 10:24 PM 23715 in reply to 23605

    Re: Text replacement for Javascript

    Hi Sergei,

    I got it working. Using your expression as a starting point, I was able to put together a rough expression with the help of Expresso and Rad Software Regex builder, and thankfully it worked. I just wanted it to work, so you could probably simplify it, but thanks for your help!

    \|.*?(http://(?:[\w.-]+)+\w+\S+)\x20+([':",!\?&\.<>/\w-\(\)\x20]+)\].*?\|\s*(.*?)\s*\|\s*(.*?)\s*\|\s*(.*?)\s*\|-

  •  10-30-2006, 2:49 PM 23734 in reply to 23715

    Re: Text replacement for Javascript

    looks good to me; i'd only change a few things in the [....] class:

    \|.*?(http://(?:[\w.-]+)+\w+\S+)\x20+([':",!?&.<>/\w()\x20-]+)\].*?\|\s*(.*?)\s*\|\s*(.*?)\s*\|\s*(.*?)\s*\|-
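
    For completeness, a sketch of how this final expression might be used with JavaScript's string.replace over several records at once. Two things are assumptions here: "." is written as [\s\S] so it can cross line breaks, and the replacement uses $4 and $5 because this version captures all three description lines while the requested output keeps only lines two and three. The names finalPattern, convertAll and records are made up, and the second record reuses the sample description lines.

```javascript
// Two records in the source layout; the second uses one of the
// links that the first pattern could not handle.
const records = [
  "|",
  "[http://www.cnn-english.com/index.htm CNN in english]",
  "| Description line one.",
  "| Description line two.",
  "| Description line three.",
  "|-",
  "|",
  "[http://www.peoplesforum.com/ The Truth's Forum]",
  "| Description line one.",
  "| Description line two.",
  "| Description line three.",
  "|-",
  ""
].join("\n");

// Final pattern from the thread. The hyphen sits at the end of the
// character class so it is read as a literal hyphen, not a range.
const finalPattern =
  /\|[\s\S]*?(http:\/\/(?:[\w.-]+)+\w+\S+)\x20+([':",!?&.<>\/\w()\x20-]+)\][\s\S]*?\|\s*([\s\S]*?)\s*\|\s*([\s\S]*?)\s*\|\s*([\s\S]*?)\s*\|-/g;

// $1 = URL, $2 = link text, $3..$5 = description lines one to three.
function convertAll(text) {
  return text.replace(finalPattern, "{{New site|Logo-of-$2.gif|$2|$1|$4 $5}}");
}

console.log(convertAll(records));
```

    Each record should collapse to a single {{New site|...}} line. It is probably worth running this on a copy of the data first and spot-checking the result.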
