Match text between two words
Last post 12-12-2007, 12:35 PM by oskhar. 10 replies.
12-10-2007, 5:51 PM | oskhar | Joined on 12-11-2007 | Posts 5
Match text between two words
Take this text:
WORD1 blahblah1 WORD1 blahblah2 WORD2
(Note that WORD1 appears twice here, and could appear more often, and that "blahblah" is arbitrary, ever-changing text, matched with .*\s*)
The problem: I want to match when WORD1 is followed at any distance by WORD2, but with no other WORD1 in between. That is:
"WORD1 blahblah2 WORD2" --> match
"WORD1 blahblah1 WORD1 blahblah2 WORD2" --> no match: it also starts with WORD1 and ends with WORD2, but it has another WORD1 in the middle.
I tried the pattern (WORD1.*\s*){1}WORD2 as a way to avoid "WORD1 blahblah" repetitions, but it doesn't work, and I don't know why. Thanks in advance. P.S.: I'll be using the TRegExpr library.
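For the abstract WORD1/WORD2 form of the question, one common approach is a negative lookahead that refuses to step over another WORD1. The sketch below uses Python's re module rather than TRegExpr, and it is not a technique proposed in this thread; whether the TRegExpr version in use supports lookahead is not assumed here.

```python
import re

# WORD1 and WORD2 are the placeholder words from the question.
# (?:(?!WORD1).)*? consumes one character at a time, but only while
# the upcoming text does not start another WORD1.
PATTERN = re.compile(r"WORD1(?:(?!WORD1).)*?WORD2", re.DOTALL)

text = "WORD1 blahblah1 WORD1 blahblah2 WORD2"

# findall returns whole matches because the pattern has no capturing groups.
matches = PATTERN.findall(text)
print(matches)
```

On the sample text the attempt starting at the first WORD1 fails, because the lookahead blocks it at the second WORD1, so the only match begins at the second WORD1.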

12-10-2007, 6:02 PM | mash | Joined on 04-14-2005 | Birmingham, AL | Posts 1,577
Re: Match text between two words
Provide real sample data. The posting guidelines were momentarily down when you posted your question, but they ask that you not make up sample data, as that is not helpful when constructing a regex. The guidelines are back up and at the top of this forum.
Michael
"In theory, theory and practice are the same. In practice, they are not." Albert Einstein

12-10-2007, 6:52 PM | oskhar | Joined on 12-11-2007 | Posts 5
Re: Match text between two words
OK, real data (sorry, I cannot give you real person names). I will use TRegExpr for Delphi to process old ASCII files containing person names (code 720, plus the name), each optionally followed by the profession (code 100, plus the profession name).
...| 720 | John Doe | 720 | Winston Smith | 720 | Peter Parker | 100 | Journalist | 720 | Humpty Dumpty | 720 | ...
You can see that most names are not followed by a profession; only 720|Peter Parker has a 100|Journalist.
So to match the names that have a profession defined, it seems enough to find where a 720 code is immediately followed by a 100 code.
So you must match: 720 | Peter Parker | 100 | Journalist
However you must not match: 720 | Winston Smith | 720 | Peter Parker | 100 | Journalist
because the first 720 is not immediately followed by a 100; there is another 720 in between. Or, if you prefer, the "720|name" sequence is repeated within the match.
I tried to avoid repetitions with (720.*\s*){1}100\s*.* but had no luck. How do I exclude the 720|name repetition? Is a regex suitable for this problem, or would it be easier to do programmatically, in Delphi code?
Thanks again

12-10-2007, 7:34 PM | ddrudik | Joined on 05-24-2007 | USA | Posts 1,631
Re: Match text between two words
\|\s*([^|]*)\s*\|\s*100\s*\|\s*([^|]*)\s*
Results in:
Array
(
    [0] => Array
        (
            [0] => | Peter Parker | 100 | Journalist
        )
    [1] => Array
        (
            [0] => Peter Parker
        )
    [2] => Array
        (
            [0] => Journalist
        )
)

12-10-2007, 8:00 PM | oskhar | Joined on 12-11-2007 | Posts 5
Re: Match text between two words
It works, great!! I'll try to learn from your magic. It seems you look around each 100 and ignore the 720; why not? That will do perfectly. However, I'll never understand why (720.*\s*){1}100\s*.* didn't work... :'(
Thanks!

12-10-2007, 8:31 PM | ddrudik | Joined on 05-24-2007 | USA | Posts 1,631
Re: Match text between two words
I started with the occupation code and worked my way left and right, since that was what you wanted in the match. If you want to include 720 in the match:
\s*720\s*\|\s*([^|]*)\s*\|\s*100\s*\|\s*([^|]*)\s*
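Both patterns from this thread can be tried against the sample line. The sketch below uses Python's re module as a stand-in for TRegExpr, so small syntax differences are possible; the sample line is the one posted above with the leading and trailing "..." dropped, and the variable names are made up for illustration.

```python
import re

# Sample record line from the thread (ellipses removed).
line = ("| 720 | John Doe | 720 | Winston Smith | 720 | Peter Parker "
        "| 100 | Journalist | 720 | Humpty Dumpty | 720 |")

# First pattern: anchored on the 100 code, one field to the left and right.
around_100 = re.compile(r"\|\s*([^|]*)\s*\|\s*100\s*\|\s*([^|]*)\s*")
# Second pattern: the same, but the 720 code is part of the match.
with_720 = re.compile(r"\s*720\s*\|\s*([^|]*)\s*\|\s*100\s*\|\s*([^|]*)\s*")

# [^|]* is greedy and also takes the space before the next "|",
# so the captured fields are stripped afterwards.
loose = [tuple(g.strip() for g in groups) for groups in around_100.findall(line)]
pairs = [(m.group(1).strip(), m.group(2).strip()) for m in with_720.finditer(line)]

print(loose)
print(pairs)
```

Because [^|]* cannot cross a field separator, neither pattern can stretch from an earlier 720 over other names to reach the 100.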

12-11-2007, 2:24 AM | mash | Joined on 04-14-2005 | Birmingham, AL | Posts 1,577
Re: Match text between two words
oskhar: It works, great!! I'll try to learn from your magic. It seems you look around each 100 and ignore the 720; why not? That will do perfectly. However, I'll never understand why (720.*\s*){1}100\s*.* didn't work... :'(
Thanks!
It works, just not how you want. 720.* matches everything from the first 720 (inclusive) onward, because .* is greedy. The regex then backtracks until it finds the last occurrence of 100, and the trailing \s*.* matches everything after that. The {1} does not help: it only says "repeat the group once", and nothing stops .* from running across further 720s. If you don't know what backtracking is, you should find a regex tutorial and read up on it. It's very important in regex-land.
Michael "In theory, theory and practice are the same. In practice, they are not." Albert Einstein
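The greedy behaviour described here can be observed directly. This is a small sketch in Python's re module (not TRegExpr), run on the sample line from earlier in the thread with the "..." removed; the second pattern is only a contrast case using a negated character class, and its name is made up for illustration.

```python
import re

line = ("| 720 | John Doe | 720 | Winston Smith | 720 | Peter Parker "
        "| 100 | Journalist | 720 | Humpty Dumpty | 720 |")

# The pattern from the question: .* is greedy and may cross any number of 720s.
greedy = re.search(r"(720.*\s*){1}100\s*.*", line)
print(greedy.group(0))   # starts at the first 720 and runs to the end of the line
print(greedy.group(1))   # everything from the first 720 up to the 100

# Contrast: [^|]* cannot cross a "|", so the match cannot span several fields.
bounded = re.search(r"720\s*\|\s*[^|]*\|\s*100\s*\|\s*[^|]*", line)
print(bounded.group(0).strip())
```

The first search succeeds, but its match covers all three names before the 100, which is the "works, just not how you want" result.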

12-11-2007, 7:11 AM | oskhar | Joined on 12-11-2007 | Posts 5
Re: Match text between two words
mash:
Interesting article. Neverending learning!
Allow me a bit of philosophy about rookies, who are mentioned in the article: many of us will be rookies until the end, so please be patient with us... There is so much information out there (and here; RE is a micro-world in itself), and there are many more worlds around, full of information (programming, electronics, music, arts...), that many of us must get used to starting from scratch at every step we take. And there are 6,000,000,000 of us stepping around!
And the world nowadays is too dynamic to specialize in (it's a rookie generator), unless you are lucky and have a stable job or hobby, like programming for many years in the same language (that must be heaven). Do you remember the old times, when blacksmith was a lifetime job? That's why I think we'll always be rookies; it's a sign of the new times. I know you know it, and that you are all used to putting up with us, the eternal rookies, always multiplying everywhere and asking around in a disorderly way, shaken by the surrounding tons of information, exchanging it everywhere, with the winds of change always blowing. Let's enjoy it while we can! (End of chatter.)
Thanks for the lessons!

12-12-2007, 3:02 AM |

12-12-2007, 12:35 PM | oskhar | Joined on 12-11-2007 | Posts 5
Re: Match text between two words
Stevezilla00: "Some people won't try to learn, so you should help them in your free time for free forever."
Oops... sorry, the chatter was so long, drunken, and convoluted (and I am a simple Spaniard trying to speak English...) that it is easily misunderstood. I also don't identify with that paraphrase, if that is how it came across.
I was thinking about the information we *wish to learn*, which is so vast that our brain (mine, at least) is too small to hold it all, so I'll be a rookie in almost all of it forever, *even while learning*. This happened in the history of mathematics: not long ago, around Riemann's time, mathematics became so complex that no single person, even a genius, could master all of it. Since then, geniuses shine only in their respective branches. It is impossible to learn everything.
And of course, if we come to sites like this, it is because we want to learn... although, reading your words, I see you mean people who ask not in order to learn, but only to get their problem solved, and then fly away.
Bearing that in mind and still carrying on in spite of it makes your work harder and more valuable.