
finding selected text between matching tags

Last post 02-11-2010, 12:15 PM by bob.aldrich. 8 replies.
  •  02-07-2010, 12:47 AM 59359

    finding selected text between matching tags

    Recently upgraded a Cognos reporting environment with over 4000 report specifications stored in XML.  Certain style functions have been deprecated, and my task is to find them and replace them with the current function.  My attempt to search for specific style references is returning more data than desired.

    <(style>){1}.*?"np".*?</\1  

    It is finding all the text shown below (note: the carriage returns were added for readability; they don’t exist in the source XML file).

    I really only want to find the last line, the one containing refStyle="np".   It starts correctly, finding the first style tag, but I would like it to fail if it finds a closing “/style” tag before it finds the “refStyle”. I was hoping the {1} would limit the result to just include the reference to style> only once.

    Any suggestions on how to fix this Regular expression will be appreciated.

    <style><CSS value="border-bottom:1pt solid black;border-right:1pt solid black"/></style>

    </tableCell><tableCell><contents><textItem><dataSource><staticValue>Sum:</staticValue></dataSource></textItem></contents>

    <style><CSS value="background-color:white;font-family:Arial;font-size:10pt;font-weight:bold;vertical-align:top;text-align:center;width:90px;height:35px;padding-top:5px;padding-bottom:5px;border:1pt solid black"/></style>

    </tableCell></tableCells></tableRow><tableRow><tableCells><tableCell><contents><textItem><dataSource><staticValue>Sum:</staticValue></dataSource></textItem></contents>

    <style><CSS value="background-color:white;font-family:Arial;font-size:10pt;font-weight:bold;vertical-align:middle;text-align:left;width:90px;height:35px;padding-top:5px;padding-bottom:5px;padding-left:3px;border:1pt solid black"/></style>

    </tableCell><tableCell><contents/>

    <style><CSS value="width:90px;height:25px;border:1pt solid black"/></style>

    </tableCell></tableCells></tableRow></tableRows>

    <style><CSS value="border-collapse:collapse;text-align:center"/><defaultStyles><defaultStyle refStyle="tb"/></defaultStyles></style>

    <conditionalStyles><conditionalStyleCases refVariable="Conditonalblock2 - Var"><conditionalStyle refVariableValue="1"/></conditionalStyleCases><conditionalStyleDefault><CSS value="display:none"/></conditionalStyleDefault></conditionalStyles></table><table><tableRows><tableRow><tableCells><tableCell><contents><textItem><dataSource><dataItemValue refDataItem="Assignment"/></dataSource>

    <style><defaultStyles><defaultStyle refStyle="np"/></defaultStyles></style>
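    A minimal Python sketch of why the pattern above returns so much text. It is not from the thread: the shortened `xml` string is an assumed stand-in for the sample, and Python's `re` module is assumed to behave like the regex engine used for the search. The match starts at the first `<style>`, and the lazy `.*?` simply keeps growing until `"np"` turns up; `{1}` repeats the group exactly once, which is what an unquantified group does anyway.

```python
import re

# Shortened stand-in for the sample XML (an assumption; the real file is one long line).
xml = (
    '<style><CSS value="width:90px"/></style>'
    '</tableCell>'
    '<style><CSS value="border-collapse:collapse"/>'
    '<defaultStyles><defaultStyle refStyle="tb"/></defaultStyles></style>'
    '<table>'
    '<style><defaultStyles><defaultStyle refStyle="np"/></defaultStyles></style>'
)

# The pattern from the post: the match begins at the FIRST <style>, and the lazy
# .*? expands across closing tags until it finally reaches "np".
original = re.compile(r'<(style>){1}.*?"np".*?</\1')
overmatch = original.search(xml).group(0)

# Dropping {1} changes nothing, because a group is matched once by default.
without_quantifier = re.search(r'<(style>).*?"np".*?</\1', xml).group(0)

print(overmatch.count("<style>"))        # how many style elements were swallowed
print(overmatch == without_quantifier)   # whether {1} made any difference
```

    A lazy quantifier only prefers the shortest match from a given starting point; it does not move the starting point forward to find a shorter match overall.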

  •  02-07-2010, 2:27 AM 59361 in reply to 59359

    Re: finding selected text between matching tags

    i can suggest this one:

    <style>.*?refStyle.*?/style>

    does it help?

  •  02-07-2010, 2:56 AM 59363 in reply to 59361

    Re: finding selected text between matching tags

    Your regex, <style>.*?refStyle.*?/style>, does not help, as it pulls even more styles back.  But many thanks for trying to assist.  There has to be a way to lookbehind and see if a /style has been included before it comes to the "np".

  •  02-07-2010, 3:18 AM 59364 in reply to 59363

    Re: finding selected text between matching tags

    <style>((?!/style).)*?refStyle.*?/style>

    should make sure that string

    /style

    does not come before

    refStyle

    i'm still not clear on your requirements; can u elaborate a bit?
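    A rough Python sketch of what this pattern does on a cut-down sample. The `xml` string is an assumption, and Python's `re` is assumed to treat the pattern the same way as the poster's tool. Every style element that contains a refStyle is returned, but no match crosses a /style.

```python
import re

# Cut-down stand-in for the sample XML (an assumption, not the real report).
xml = (
    '<style><CSS value="width:90px"/></style>'
    '</tableCell>'
    '<style><CSS value="border-collapse:collapse"/>'
    '<defaultStyles><defaultStyle refStyle="tb"/></defaultStyles></style>'
    '<table>'
    '<style><defaultStyles><defaultStyle refStyle="np"/></defaultStyles></style>'
)

# ((?!/style).)*? consumes one character at a time, and only while the text
# at the current position does not begin with "/style".
tempered = re.compile(r'<style>((?!/style).)*?refStyle.*?/style>')

matches = [m.group(0) for m in tempered.finditer(xml)]
for text in matches:
    print(text)
```

    The first style element has no refStyle, so the attempt starting there fails at its closing tag and the engine moves on to the next `<style>`.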

  •  02-07-2010, 3:32 AM 59365 in reply to 59364

    Re: finding selected text between matching tags

    i'm looking at your statement:

    1) ***There has to be a way to lookbehind and see if a /style has been included before it comes to the "np" ***

    and then at the string u need to match:

    2) <style><defaultStyles><defaultStyle refStyle="np"/></defaultStyles></style>

    ...and i find them contradicting each other, because in string 2) /style comes AFTER ="np", not before as you requested.

    What am I missing?

  •  02-07-2010, 2:45 PM 59374 in reply to 59365

    Re: finding selected text between matching tags

    Your last regex returns all instances of refStyle; I am only trying to find those that are deprecated.   Modifying your last suggestion provides the results I am looking for. The revised regex is

    <style>((?!/style).)*?refStyle="np".*?/style>

    I had tried a similar statement but was unsuccessful, because I was placing the negative lookahead in the wrong spot.  Actually, I was using a negative lookbehind, placing it after the first *? like this:

    <style>.*?(?<!/style)refStyle="np".*?</style>

    I was trying to make the search fail if it encountered a /style before the refStyle. Once it failed, I assumed it would move forward in the XML until it found the next <style>.

    I am not clear on why you are nesting the lookahead in round parentheses; does that make it do the lookahead after it reads each new character?

  •  02-07-2010, 5:40 PM 59378 in reply to 59374

    Re: finding selected text between matching tags

    bob.aldrich:

    I am not clear on why you are nesting the lookahead in round parentheses; does that make it do the lookahead after it reads each new character?

    correct
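    To make the difference concrete, here is a small Python sketch comparing the two placements. It is an illustration only: the `xml` sample is an assumed, shortened version of the original, and Python's `re` is assumed to match the poster's engine. The lookbehind is checked at a single position, immediately before refStyle, while the lookahead wrapped in parentheses is re-checked before every character the repeated group consumes.

```python
import re

np_block = '<style><defaultStyles><defaultStyle refStyle="np"/></defaultStyles></style>'

# Assumed, shortened sample: two unrelated style elements, then the "np" one.
xml = (
    '<style><CSS value="width:90px"/></style>'
    '</tableCell>'
    '<style><CSS value="border-collapse:collapse"/>'
    '<defaultStyles><defaultStyle refStyle="tb"/></defaultStyles></style>'
    '<table>'
    + np_block
)

# Lookbehind version: only asks "do the 6 characters right before refStyle
# spell /style?". Closing tags earlier in the skipped text are never examined.
lookbehind = re.compile(r'<style>.*?(?<!/style)refStyle="np".*?</style>')

# Lookahead inside the repeated group: asked again at every position, so the
# group cannot step onto a "/style" and the attempt fails at the closing tag.
tempered = re.compile(r'<style>((?!/style).)*?refStyle="np".*?/style>')

lookbehind_match = lookbehind.search(xml).group(0)
tempered_match = tempered.search(xml).group(0)

print(lookbehind_match.count("<style>"))  # style elements covered by the lookbehind match
print(tempered_match)
```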

  •  02-11-2010, 11:27 AM 59661 in reply to 59359

    Re: finding selected text between matching tags

    bob.aldrich:

    > but I would like it to fail if it finds a closing “/style” tag, before it finds the “refStyle” ,

    > I was hoping the {1} would limit the result to just include the reference to style> only once.

    Here is a quick script.

    Thanks for attaching the input sample - makes it easier.

    # Script style.txt

    var str file, content

    # Read file's content into a string variable.

    cat $file > $content

    # While the string is present, keep extracting it.

    while ( { sen -r "^\&lt\;style\&gt\;&refStyle&\&lt\;/style\&gt\;^" $content } > 0 )

        stex -r "^\&lt\;style\&gt\;&refStyle&\&lt\;/style\&gt\;^" $content

     

     

     

    I tested the script like this - saved the script in file /Scripts/a.txt, copied the sample input into file /Scripts/b.txt, started biterscripting ( http://www.biterscripting.com ) and ran the following command.

    script "/Scripts/a.txt" file("/Scripts/b.txt")

    I got this output.

     

    &lt;style&gt;&lt;CSS value=&quot;border-collapse:collapse;text-align:center&quot;/&gt;&lt;defaultStyles&gt;&lt;defaultStyle refStyle=&quot;tb&quot;/&gt;&lt;/defaultStyles&gt;&lt;/style&gt;
    &lt;style&gt;&lt;defaultStyles&gt;&lt;defaultStyle refStyle=&quot;np&quot;/&gt;&lt;/defaultStyles&gt;&lt;/style&gt;

     

     

    Is this what you are looking for? Then you can translate this to any language.

    Use -c for case-insensitive.

    > I was hoping the {1} would limit the result to just include the reference to style> only once.

    There are two instances with refStyle present. Did I misunderstand?


  •  02-11-2010, 12:15 PM 59663 in reply to 59661

    Re: finding selected text between matching tags

    The regex solution <style>.*?(?<!/style)refStyle="np".*?</style> did the trick.   What I needed to accomplish was to find all the style tags that included a reference to a deprecated function; that function was refStyle="np". There are many refStyles, and I only needed to identify "np". Once found, I will compute a new style tag for those Cognos reports and then replace the whole style tag.

    What the above regex solution allows me to do is create an extract of only the style tags which include "np" (I don't want all the style tags).

    Thank you so much for taking the time to respond to my question.
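    For anyone who wants to script the extract-and-replace step described here, a hedged Python sketch follows. It uses the lookahead form of the pattern posted earlier in the thread, since that is the variant that stays inside a single style element on the sample. The function names, the sample `xml`, and the `new_tag` placeholder are all hypothetical; the real replacement tag depends on the Cognos upgrade and is not given in the thread.

```python
import re

# Lookahead form from the earlier post in this thread.
NP_STYLE = re.compile(r'<style>((?!/style).)*?refStyle="np".*?/style>')

def extract_np_styles(xml):
    """Return every style element that references refStyle="np"."""
    return [m.group(0) for m in NP_STYLE.finditer(xml)]

def replace_np_styles(xml, new_tag):
    """Swap each matching style element for new_tag.

    A function is used as the replacement so that backslashes or group
    references inside new_tag are inserted literally.
    """
    return NP_STYLE.sub(lambda m: new_tag, xml)

# Hypothetical sample and placeholder replacement (assumptions for illustration).
tb_block = '<style><defaultStyles><defaultStyle refStyle="tb"/></defaultStyles></style>'
np_block = '<style><defaultStyles><defaultStyle refStyle="np"/></defaultStyles></style>'
xml = '<style><CSS value="width:90px"/></style>' + tb_block + '<table>' + np_block
new_tag = '<style><defaultStyles><defaultStyle refStyle="PLACEHOLDER"/></defaultStyles></style>'

found = extract_np_styles(xml)
replaced = replace_np_styles(xml, new_tag)
print(found)
print(replaced)
```

    Running the extract first and reviewing it before applying the replacement keeps the change auditable across a large set of report specifications.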

     
