Replacing text with matched text from same string

Last post 08-15-2008, 12:54 PM by dhl. 4 replies.
  •  08-15-2008, 1:32 AM 45371

    Replacing text with matched text from same string

    Hello, I need help matching text and replacing it with a different match taken from the same string. My goal is to batch process a large set of text files that all have similar patterns and replacement criteria. Below is an example of a typical case:

    TITLE: Beautiful Losers - 8/22/2004 12:43:00 AM
    AUTHOR: mt, San Francisco
    DATE: 8/22/2004 12:43:00 AM
    -----
    BODY:
    Yerba Buena Center for the Arts
    Through October 10, 2004

    Reviewed by Clark Buckner


    ....  text of varying length here ...


    <!-- BIO>Clark Buckner works as the gallery director at Mission 17 and writes the weekly "Critic's Choice: Art" column for the <i>San Francisco Bay Guardian</i>.*review by Clark Buckner~--><!-- MENU>Beautiful Losers by Clark Buckner~-->
    -----
    --------

    What I want to do is in the line:

    AUTHOR: mt, San Francisco

     replace


    mt, San Francisco

     with:

    Clark Buckner

    to get:

    AUTHOR: Clark Buckner

    Clark Buckner is matched from the instance that follows "*review by" in this section of code at the bottom:

    <!-- BIO>Clark Buckner works as the gallery director at Mission 17 and writes the weekly "Critic's Choice: Art" column for the <i>San Francisco Bay Guardian</i>.*review by Clark Buckner~--><!-- MENU>Beautiful Losers by Clark Buckner~-->


    The match I want to capture always follows the first instance of "*", some text, and the string "by ", and it ends at the first "~".

    (I do not want to capture from the area beginning with <!-- MENU>)

    I've experimented with regex simulator tools and keep hitting a wall on how to form the expression. I'm working in an environment that supports standard Perl regex. Any help would be very much appreciated. Many thanks.

  •  08-15-2008, 1:56 AM 45374 in reply to 45371

    Re: Replacing text with matched text from same string

    Try a matching pattern of:

    (AUTHOR:\s*)([^\r\n]*)([\S\s]+)\*review\s+by\s+([^~]*)

    and replacement text of

    $1$4$3$4

    with the 'ignore case' option set (or the text altered to suit). You don't mention the regex flavor or language that you are using, so the replacement text may need to be "\1\4\3\4" or whatever is appropriate.

    Susan

    PS: Only tested with the single example you provided, and assuming there is never a "*review by" within the text before the one you are after.
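
    For anyone who wants to try this pattern outside a text editor, here is a minimal sketch in Python (my choice of language, not something the thread specifies). The sample text is abbreviated from the question, and Python's re module uses the "\1\4\3\4" replacement form rather than "$1$4$3$4".

```python
import re

# Sample record, abbreviated from the question above.
SAMPLE = (
    "TITLE: Beautiful Losers - 8/22/2004 12:43:00 AM\n"
    "AUTHOR: mt, San Francisco\n"
    "DATE: 8/22/2004 12:43:00 AM\n"
    "-----\n"
    "BODY:\n"
    "Reviewed by Clark Buckner\n"
    "<!-- BIO>Clark Buckner works as the gallery director at Mission 17."
    "*review by Clark Buckner~--><!-- MENU>Beautiful Losers by Clark Buckner~-->\n"
)

# Group 1: "AUTHOR:" label, group 2: old author text,
# group 3: everything up to "*review by", group 4: the name before "~".
PATTERN = re.compile(
    r"(AUTHOR:\s*)([^\r\n]*)([\S\s]+)\*review\s+by\s+([^~]*)",
    re.IGNORECASE,
)

# Put the captured name (group 4) where the old author text (group 2) was.
result = PATTERN.sub(r"\1\4\3\4", SAMPLE, count=1)
author_line = result.splitlines()[1]
print(author_line)
```

    On this one sample the second line comes out as "AUTHOR: Clark Buckner".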

  •  08-15-2008, 3:00 AM 45375 in reply to 45374

    Re: Replacing text with matched text from same string

    Hi Susan, thank you for the quick reply. This is very close to exactly what I need, just a couple tweaks and I think we'll have it.

    I'm working with a text processor which supports the PCRE library so the "\1\4\3\4" replace format was the one that worked.

    Your assumption that "*review by" never occurs within the text before the one I'm after is partially correct. My fault for not being more clear:

    "*review by" isn't a constant, what I meant is "*(some wildcard text) by "

    i.e. "*by", "*review by", "*a review by", "*an essay by", etc. would all be valid delimiters. If it helps, another thing to note is that this match will always be from within the block of text that begins with <!-- BIO>.

    update:

    After some experimentation, I got it working by using

    \*[\S\s]*?\s*?by

    instead of

    \*review\s+by

    like this:

    (AUTHOR:\s*)([^\r\n]*)([\S\s]+)\*[\S\s]*?\s*?by\s+([^~]*)

    So the only remaining problem is that, after running the expression, the string "*review by" is deleted from the result. This text needs to remain unaffected. Any thoughts?

    Thanks again!

    best,

    --dhl
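
    The relaxed delimiter can be checked against the variants listed above with a short Python sketch. The make_record helper and its tiny record are hypothetical, made up for illustration; only the pattern and the "\1\4\3\4" replacement come from the thread.

```python
import re

# Delimiter relaxed from "*review by" to "*<any text> by".
PATTERN = re.compile(
    r"(AUTHOR:\s*)([^\r\n]*)([\S\s]+)\*[\S\s]*?\s*?by\s+([^~]*)",
    re.IGNORECASE,
)

def make_record(delimiter: str) -> str:
    """Build a tiny hypothetical record that uses the given delimiter."""
    return (
        "AUTHOR: mt, San Francisco\n"
        "BODY:\n"
        "Some body text.\n"
        f"<!-- BIO>A short bio.{delimiter}Clark Buckner~-->\n"
    )

DELIMITERS = ("*by ", "*review by ", "*a review by ", "*an essay by ")

# Run the same replacement for every delimiter variant.
results = {
    d: PATTERN.sub(r"\1\4\3\4", make_record(d), count=1) for d in DELIMITERS
}

for delimiter, text in results.items():
    author_line = text.splitlines()[0]
    print(repr(delimiter), "->", author_line, "| delimiter kept:", delimiter in text)
```

    Every variant produces "AUTHOR: Clark Buckner", and in each case the delimiter is gone from the BIO comment afterwards, because it is matched but not captured in any group.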

     

  •  08-15-2008, 6:06 AM 45377 in reply to 45375

    Re: Replacing text with matched text from same string

    dhl,

    Add another set of parentheses around the part of the expression that is matching "*review by":

    (AUTHOR:\s*)([^\r\n]*)([\S\s]+)(\*[\S\s]*?\s*?by\s+)([^~]*)

    Modify your replacement to use the new group:

    \1\5\3\4\5

    That should do it.

     

    Jeff
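
    As a sketch of how this final pattern behaves, here it is in Python with a hypothetical fix_author helper and a made-up record that uses the "*an essay by " delimiter. Neither the helper name nor the record comes from the thread; the pattern and the "\1\5\3\4\5" replacement do.

```python
import re

# Group 4 now captures the "*... by " delimiter so it can be written back.
PATTERN = re.compile(
    r"(AUTHOR:\s*)([^\r\n]*)([\S\s]+)(\*[\S\s]*?\s*?by\s+)([^~]*)",
    re.IGNORECASE,
)

def fix_author(text: str) -> str:
    """Replace the AUTHOR value with the name found after '*... by '.

    Text without a match is returned unchanged.
    """
    return PATTERN.sub(r"\1\5\3\4\5", text, count=1)

# Hypothetical record for illustration only.
record = (
    "TITLE: Beautiful Losers\n"
    "AUTHOR: mt, San Francisco\n"
    "BODY:\n"
    "Some body text.\n"
    "<!-- BIO>A short bio.*an essay by Clark Buckner~-->"
    "<!-- MENU>Beautiful Losers by Clark Buckner~-->\n"
)

fixed = fix_author(record)
print(fixed)
```

    Only the AUTHOR line changes; the BIO and MENU comments come back exactly as they went in. One caveat: because ([\S\s]+) is greedy, a file containing more than one "*" followed by "by " would anchor on the last such "*", whereas a lazy ([\S\s]+?) would anchor on the first.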

  •  08-15-2008, 12:54 PM 45386 in reply to 45377

    Re: Replacing text with matched text from same string

    Jeff,

    Perfect! Thank you very much!

    best,

    --dhl 
