
Match text between two words

Last post 12-12-2007, 12:35 PM by oskhar. 10 replies.
  •  12-10-2007, 5:51 PM 37497

    Match text between two words

    Take this text:

        WORD1 blahblah1 WORD1 blahblah2 WORD2

    (Note that WORD1 appears twice here and could appear more often, and that "blahblah" stands for arbitrary, ever-changing text, matched by .*\s*)

    The problem is: I want to match when WORD1 is followed at any distance by WORD2, but with no other WORD1 in between. That is:

        "WORD1 blahblah2 WORD2" --> match

        "WORD1 blahblah1 WORD1 blahblah2 WORD2" --> no match: it also starts with WORD1 and ends with WORD2, but it has another WORD1 in the middle.

    I tried the pattern (WORD1.*\s*){1}WORD2 as a way to avoid "WORD1 blahblah" repetitions, but it doesn't work, and I don't know why.

    Thanks in advance. 

    P.S.: I'll use the TRegExpr library.
  •  12-10-2007, 6:02 PM 37500 in reply to 37497

    Re: Match text between two words

    Provide real sample data. The posting guidelines were momentarily down when you posted your question, but they ask that you do not make up sample data, as that is not helpful when constructing a regex. The guidelines are back up and at the top of this forum.

    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein
  •  12-10-2007, 6:52 PM 37501 in reply to 37500

    Re: Match text between two words

    Ok, real data (but sorry, cannot give you real person names).

    I will use TRegExpr for Delphi to process old ASCII files with person names (code 720, plus name), optionally followed by the profession (code 100, plus profession name).

    ...| 720 | John Doe | 720 | Winston Smith | 720 | Peter Parker | 100 | Journalist | 720 | Humpty Dumpty | 720 | ...

    You can see that many names are not followed by a corresponding profession; only 720|Peter Parker has a 100|Journalist.

    It seems that to match the names that have a profession defined, it's enough to find where a 720 code is immediately followed by a 100 code.

    So you must match:
        720 | Peter Parker | 100 | Journalist

    However you must not match:
        720 | Winston Smith | 720 | Peter Parker | 100 | Journalist  

    because the first 720 is not immediately followed by a 100; there is another 720 in between. Or, if you prefer, there is a "720|name" sequence repeated within the match.

    I've tried to avoid repetitions with (720.*\s*){1}100\s*.* but without luck. How do I exclude the 720|name repetition?

    Is a regex suitable for this problem? Or would it be easier to do programmatically, with Delphi code?

    Thanks again

  •  12-10-2007, 7:34 PM 37506 in reply to 37501

    Re: Match text between two words

    \|\s*([^|]*)\s*\|\s*100\s*\|\s*([^|]*)\s*

    Results in:

    Array
    (
        [0] => Array
            (
                [0] => | Peter Parker | 100 | Journalist 
            )
    
        [1] => Array
            (
                [0] => Peter Parker 
            )
    
        [2] => Array
            (
                [0] => Journalist 
            )
    
    )
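
For anyone who wants to reproduce the result above outside PHP, here is a minimal sketch in Python. It assumes the sample line from the question, with fictional names, and that Python's `re` module treats this pattern the same way. The variable names are only illustrative.

```python
import re

# Sample line from the thread (fictional names).
line = ("| 720 | John Doe | 720 | Winston Smith | 720 | Peter Parker "
        "| 100 | Journalist | 720 | Humpty Dumpty | 720 |")

# Pattern as posted: anchor on the 100 code and capture the field on each side.
pattern = re.compile(r"\|\s*([^|]*)\s*\|\s*100\s*\|\s*([^|]*)\s*")

# Each entry is (whole match, name, profession).
matches = [(m.group(0), m.group(1), m.group(2)) for m in pattern.finditer(line)]

for whole, name, profession in matches:
    # [^|]* also swallows the trailing space, so strip the captures before use.
    print(repr(whole), name.strip(), profession.strip())
```

Because `[^|]*` can never cross a `|`, each capture is confined to a single field, which is why no earlier 720 entry can leak into the match.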
    

  •  12-10-2007, 8:00 PM 37508 in reply to 37506

    Re: Match text between two words

    It works, great!!

    I'll try to learn from your magic. It seems you look around each 100 and ignore the 720. Why not? It will do perfectly.

    However, I'll never understand why (720.*\s*){1}100\s*.* didn't work...

    Thanks! 

  •  12-10-2007, 8:31 PM 37509 in reply to 37508

    Re: Match text between two words

    I started with the occupation code and worked my way left and right since that was what you wanted in the match.

    If you want to include 720 in the match:

    \s*720\s*\|\s*([^|]*)\s*\|\s*100\s*\|\s*([^|]*)\s*
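
A quick way to check that this variant respects the "must not match" case from the question is sketched below in Python. It assumes the same fictional sample line. The names `with_code` and `m` are only illustrative.

```python
import re

# Sample line from the thread (fictional names).
line = ("| 720 | John Doe | 720 | Winston Smith | 720 | Peter Parker "
        "| 100 | Journalist | 720 | Humpty Dumpty | 720 |")

# Variant that includes the 720 code in the match.
with_code = re.compile(r"\s*720\s*\|\s*([^|]*)\s*\|\s*100\s*\|\s*([^|]*)\s*")

m = with_code.search(line)

# The match covers only the 720 entry directly in front of the 100 entry.
print(m.group(0).strip())
print(m.group(1).strip(), "/", m.group(2).strip())
```

Only one field is allowed between 720 and 100, so an attempt that starts at an earlier 720 fails as soon as it meets the next 720 where 100 is required.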


  •  12-11-2007, 2:24 AM 37528 in reply to 37508

    Re: Match text between two words

    oskhar:

    It works, great!!

    I'll try to learn from your magic. It seems you look around each 100 and ignore the 720. Why not? It will do perfectly.

    However, I'll never understand why (720.*\s*){1}100\s*.* didn't work...

    Thanks! 

    It works, just not how you want.

    720.* matches everything after the first 720 (inclusive), as far as it can go, because .* is greedy.

    Since the pattern still needs a 100 after that, the regex backtracks until it finds the last occurrence of 100, and then the final .* matches everything after it. The {1} changes nothing: it only means "exactly once", which is already the default.

    If you don't know what backtracking is, you should find a regex tutorial and read up on it. It's very important in regex-land.
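
The behaviour described above can be seen in a small Python sketch. It assumes the fictional sample line from the question and uses Python's `re` module rather than TRegExpr, so treat it as an illustration of greedy matching in general, not of TRegExpr specifically.

```python
import re

# Sample line from the thread (fictional names).
line = ("| 720 | John Doe | 720 | Winston Smith | 720 | Peter Parker "
        "| 100 | Journalist | 720 | Humpty Dumpty | 720 |")

# The pattern from the question: the match starts at the FIRST 720, the greedy
# .* runs to the end, backtracks to the last 100, and the final .* takes the rest.
greedy = re.search(r"(720.*\s*){1}100\s*.*", line)
print(greedy.group(0))

# A lazy quantifier does not help either: the match still starts at the
# first 720, it just stops at the first 100 instead of the last.
lazy = re.search(r"720.*?100", line)
print(lazy.group(0))
```

In both cases the engine takes the leftmost possible starting point, so the unwanted 720 entries stay inside the match. Restricting what may appear between the two codes, as `[^|]*` does, is what excludes them.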


    Michael
  •  12-11-2007, 2:30 AM 37529 in reply to 37501

    Re: Match text between two words

    oskhar:

    Ok, real data (but sorry, cannot give you real person names).

    I understand if you can't use real names, so changing them to fictional characters is perfectly acceptable. I've suggested the same approach for sensitive data before: http://regexadvice.com/blogs/mash/archive/2007/10/01/Remember-where-you-come-from.aspx


    Michael
  •  12-11-2007, 7:11 AM 37537 in reply to 37529

    Re: Match text between two words

    mash:

    I understand if you can't use real names, so changing them to fictional characters is perfectly acceptable. I've suggested the same approach for sensitive data before: http://regexadvice.com/blogs/mash/archive/2007/10/01/Remember-where-you-come-from.aspx

    Interesting article. Neverending learning!

    Allow me a bit of philosophy about the rookies mentioned in the article: many of us will be rookies until the end, so please be patient with us...

    There is so much information out there (and here, e.g., RE is a micro-world in itself), and there are many more worlds around, full of information (programming, electronics, music, arts...), so many of us must get used to starting from scratch at every step we take. And there are 6,000,000,000 of us stepping around!

    And the world nowadays is too dynamic to specialize in (it's a rookie generator), unless you are lucky and have a stable job or hobby, like programming in the same language for many years (that must be heaven). Do you remember the old times, when blacksmith was a lifetime job?

    That's why I think we'll always be rookies; it's a sign of the times. I know you know it, and that you are all used to putting up with us, the eternal rookies, always springing up everywhere and asking around in a disorderly way, shaken by the tons of information around us, with the winds of change always blowing. Let's enjoy it while we can! (End of chatter.)

    Thanks for the lessons!

  •  12-12-2007, 3:02 AM 37579 in reply to 37537

    Re: Match text between two words

    @oskhar, I don't identify with that viewpoint at all. Paraphrased: "Some people won't try to learn, so you should help them in your free time for free forever."

  •  12-12-2007, 12:35 PM 37606 in reply to 37579

    Re: Match text between two words

    Stevezilla00:
    "Some people won't try to learn, so you should help them in your free time for free forever."

    Ooooops... sorry, the chatter was so long, tipsy, and convoluted (and I am a simple Spaniard trying to speak English...) that it is easily misunderstood. I don't identify with that paraphrase either, if that was how it came across.

    I was thinking about the information we *wish to learn*, which is so vast that our brain (mine, at least) is too small to take it all in, so I'll be a rookie in almost all of it forever, *even while learning*. This happened in the history of mathematics: not long ago, around Riemann's time, mathematics became so complex that no single person, not even a genius, could hold all of its knowledge. Since then, geniuses shine only in their respective branches. It is impossible to learn everything.

    And of course, if we come to websites like this one, it is because we want to learn... although, reading your words, I see you are referring to people who ask not in order to learn, but only to get their problem solved, and then fly away.

    To bear that in mind and still keep working in spite of it makes your work harder and more valuable.
