RegexAdvice

Regex for unknown number of lines - Assistance Request

Last post 06-08-2008, 10:16 PM by ddrudik. 10 replies.
  •  06-05-2008, 11:37 AM 42924

    Regex for unknown number of lines - Assistance Request

    I am trying to parse a log file in C# using regular expressions and I ran into a challenge.  The section of the log file has a distinct beginning and a distinct end, but the number of lines in between might vary.

     

    Is there a way I can use regular expressions to pull out the beginning of the section, the end of the section and everything in between, regardless of the number of lines and/or characters?

     

    Here is an example of what I am trying to parse out:

     

    ******* script runtime error *******
    type undefined is not a vector: (file 'k3/_laserguided.gsc', line 120)
     playfx(level.fx_guided_exp,trace["position"]); //,trace["normal"]); // SOMETIMES IT PLAYS WRONG DIR o_O
                                     *
    Error: started from:
    (file 'k3/_laserguided.gsc', line 91)
     wait 0.05;
     *
    Error: ************************************

     

    In some cases, more than one error is listed in the body between "******* script runtime error *******" and "Error: ************************************".

     

    Any help would be appreciated.

  •  06-05-2008, 11:58 AM 42925 in reply to 42924

    Re: Regex for unknown number of lines - Assistance Request

    If there is a common set of characters between each error, then we can separate the errors.  If not, you can still do something like this:

    { regex matching beginning of section } .*? { regex matching end of section }
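The .*? in that template is a lazy quantifier: it consumes as little as possible, so each match stops at the first end marker after its start marker. Here is a minimal C# sketch of the difference, using a made-up log with two error sections. The closing line is shortened to "Error: ***" as a placeholder for the real 36-asterisk line, and RegexOptions.Singleline is passed so that . can cross line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical log with two error sections; "Error: ***" is a shortened
// placeholder for the real closing line of 36 asterisks.
string stars = new string('*', 7);
string log =
    "noise\n" +
    stars + " script runtime error " + stars + "\nfirst error\nError: ***\n" +
    "more noise\n" +
    stars + " script runtime error " + stars + "\nsecond error\nError: ***\n";

string start = @"\*{7} script runtime error \*{7}";
string end = @"Error: \*{3}";

// Lazy .*? stops at the FIRST end marker after each start marker.
MatchCollection lazy = Regex.Matches(log, start + ".*?" + end, RegexOptions.Singleline);

// Greedy .* runs to the LAST end marker, swallowing the text between sections.
MatchCollection greedy = Regex.Matches(log, start + ".*" + end, RegexOptions.Singleline);

Console.WriteLine($"lazy: {lazy.Count} match(es), greedy: {greedy.Count} match(es)");
// prints "lazy: 2 match(es), greedy: 1 match(es)"
```

With the greedy form, the single match runs from the first start marker to the last end marker, so the unrelated "more noise" line ends up inside the result.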

     

  •  06-05-2008, 1:54 PM 42929 in reply to 42925

    Re: Regex for unknown number of lines - Assistance Request

    Hmm, I'm fine with getting the whole thing, because it is surrounded by other text that I don't need. If it pulls the whole block like the example I posted, I would be happy with that. I don't need the errors within the body of the text separated; the entire block will do nicely.

    I tried:

    (\x2A{7}\x20script runtime error\x20\x2A{7}).*?(Error:\x20\x2A{36})

    and some variations on it, but it doesn't seem to work. The first part matches the first line just fine, but it will not pick up the lines that follow.

     

    Thank you for your assistance, 

  •  06-05-2008, 2:06 PM 42932 in reply to 42929

    Re: Regex for unknown number of lines - Assistance Request

    If you are running this regex on the whole file as a single string, and you want .*? to match everything across multiple lines, you need to turn on the Singleline option. Without it, the dot matches any character except the newline character (\n), so .*? cannot get past the end of the first line.

     For .NET, this is one of the available options within the option enumeration at the end of the matching function call.
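A small sketch of what the Singleline option changes, assuming a made-up two-line input where the placeholder markers BEGIN and END sit on different lines:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: the start and end markers are on different lines.
string text = "BEGIN\nsome body text\nEND";

// Without Singleline, '.' matches any character except '\n',
// so .*? cannot cross the line break and there is no match.
bool withoutOption = Regex.IsMatch(text, "BEGIN.*?END");

// With RegexOptions.Singleline, '.' also matches '\n'.
bool withOption = Regex.IsMatch(text, "BEGIN.*?END", RegexOptions.Singleline);

Console.WriteLine($"without Singleline: {withoutOption}, with Singleline: {withOption}");
// prints "without Singleline: False, with Singleline: True"
```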

  •  06-05-2008, 3:44 PM 42933 in reply to 42932

    Re: Regex for unknown number of lines - Assistance Request

    This is what I tried:

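    using System;
    using System.Collections;
    using System.Text.RegularExpressions;

    // (?s) = Singleline: '.' also matches newlines; (?i) = IgnoreCase
    string strCurrentFormat = @"(?si)(\x2A{7}\x20script runtime error\x20\x2A{7}).*?(Error:\x20\x2A{36})";

    // 'file' holds the whole log file as a single string
    MatchCollection mc = Regex.Matches(file, strCurrentFormat);
    int X = 1;
    ArrayList AllError = new ArrayList();   // declared but not used in this snippet

    // Append each matched error block to the text box, numbered (1), (2), ...
    foreach (Match m in mc)
    {
        textBox2.Text += "(" + X + ") " + m.Value + Environment.NewLine + Environment.NewLine;
        X++;
    }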

     And it does appear to work!! (Thank you SO much.) But I'm not sure this is quite what you were talking about. I would be interested in knowing whether this is the proper way of doing it, or whether you were referring to something different when you mentioned "For .NET, this is one of the available options within the option enumeration at the end of the matching function call."

    The Singleline option is definitely the key. I am very new to C#, so please forgive my ignorance, but I really like to know the proper way of doing things.

  •  06-05-2008, 5:47 PM 42935 in reply to 42933

    Re: Regex for unknown number of lines - Assistance Request

    The (?si) is what I was referring to.  You used it within the regex itself.  You can also set options at the end of the match statement.  Something like: Regex.Matches(file, strCurrentFormat, options.SingleLine)   (from memory, so that may not be quite right).

     Either way works.

  •  06-06-2008, 1:54 AM 42940 in reply to 42935

    Re: Regex for unknown number of lines - Assistance Request

    Lyndar wrote: "You can also set options at the end of the match statement. Something like: Regex.Matches(file, strCurrentFormat, options.SingleLine)"

    options.SingleLine would be RegexOptions.Singleline (with a lowercase "l"), for example: Regex.Matches(file, strCurrentFormat, RegexOptions.Singleline).


    Michael
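To illustrate that the two approaches are interchangeable, this sketch runs the pattern from this thread once with the inline (?si) prefix and once with RegexOptions.Singleline | RegexOptions.IgnoreCase passed as the third argument to Regex.Matches. The input is a hypothetical string modeled on the log excerpt at the top of the thread. Only the s (Singleline) part matters for crossing lines; i (IgnoreCase) is not really needed for this pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// The thread's pattern, without the inline (?si) prefix.
string pattern = @"(\x2A{7}\x20script runtime error\x20\x2A{7}).*?(Error:\x20\x2A{36})";

// Hypothetical input modeled on the log excerpt, surrounded by other text.
string stars7 = new string('*', 7);
string stars36 = new string('*', 36);
string file =
    "other text\n" +
    stars7 + " script runtime error " + stars7 + "\n" +
    "type undefined is not a vector: (file 'k3/_laserguided.gsc', line 120)\n" +
    "Error: started from:\n" +
    "Error: " + stars36 + "\n" +
    "more text\n";

// Option 1: inline modifiers at the start of the pattern.
MatchCollection inline = Regex.Matches(file, "(?si)" + pattern);

// Option 2: the same options passed as a RegexOptions argument.
MatchCollection viaArgument = Regex.Matches(
    file, pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);

Console.WriteLine($"{inline.Count} {viaArgument.Count} {inline[0].Value == viaArgument[0].Value}");
// prints "1 1 True"
```

Note that "Error: started from:" does not end the match, because the closing part of the pattern requires "Error: " followed by exactly 36 asterisks.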
  •  06-06-2008, 9:30 AM 42944 in reply to 42940

    Re: Regex for unknown number of lines - Assistance Request

    Thanks so much... you guys have been an incredible help.  I'm going to go put my bottle of Advil away now.
  •  06-07-2008, 9:01 PM 42979 in reply to 42924

    Re: Regex for unknown number of lines - Assistance Request

    I have a problem which seems similar to the one documented here, so I hope somebody reading this can help me too. I have a page of PHP code which works some of the time, but not always, so I'm trying to discover the bugs in it. It includes the line 

    preg_match_all('#<!-- start content -->(.*?)<!-- end content -->#es', $file, $ar);

    This is supposed to find matches for the regular expression

    '#<!-- start content -->(.*?)<!-- end content -->#es'

    in the file $file. I don't understand the # at the beginning of the regular expression, or the #es at the end of it. I've searched far and wide for documentation, but without success.

    Obviously we're looking for text which is bracketed between  <!-- start content --> and <!-- end content -->; and because it is multiple lines which need to be treated like a single line we need the singleline option to be turned on, as has already been pointed out in this thread. Does the initial # turn the singleline option on? Does the ending #es stand for "end singleline", and turn the option off? If so, why is this use of the # character apparently completely undocumented? If not, what does the code really mean and what am I missing?

    The purpose of the code is to scan through Wikipedia-like wiki pages and select just the main text from each page, discarding headers and footers, by returning the text between <!-- start content --> and <!-- end content -->. With the # and the #es in place it works most of the time, but not always. With them removed it doesn't work at all. The # and #es are obviously crucial elements of the regex, but I don't fully understand why. Any thoughts? Thanks.

  •  06-08-2008, 9:31 AM 42980 in reply to 42979

    Re: Regex for unknown number of lines - Assistance Request

    It's OK, I think I've found the answer to my own question. The # signs delimit the actual regular expression, and e and s are modifiers: e is equivalent to PREG_REPLACE_EVAL and s is equivalent to PCRE_DOTALL. Maybe these things are so familiar to the experienced players that they almost go without saying, but it's amazing how tricky they are to find out by searching through the PHP help file or by Googling. Queries like

    PHP preg_match_all pound

    returned nothing useful, at least not in the first few hits. I finally found a helpful page by Googling

    PHP pcre_dotall pound

    but I'd never have thought to search for such a combination of terms if I didn't almost have the answer already. 
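For comparison with the C# part of this thread: .NET patterns are plain strings with no delimiters, and PHP's s modifier (PCRE_DOTALL) corresponds to RegexOptions.Singleline or an inline (?s). The e modifier has no counterpart here; in PHP it only ever had an effect in preg_replace(). A sketch of the same start/end content extraction in C#, using a made-up page string:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical page text with a header and footer around the marked content.
string page =
    "<html>header\n" +
    "<!-- start content -->\nFirst paragraph.\nSecond paragraph.\n<!-- end content -->\n" +
    "footer</html>";

// No delimiters in .NET; Singleline plays the role of PHP's 's' modifier.
var contentRegex = new Regex(
    "<!-- start content -->(.*?)<!-- end content -->", RegexOptions.Singleline);

// Like preg_match_all, collect capture group 1 of every match.
var sections = new List<string>();
foreach (Match m in contentRegex.Matches(page))
{
    sections.Add(m.Groups[1].Value);
}

Console.WriteLine(sections.Count);
Console.WriteLine(sections[0].Trim());
```

The captured text keeps the line breaks that follow the start marker and precede the end marker, hence the Trim() before printing.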

  •  06-08-2008, 10:16 PM 42988 in reply to 42980

    Re: Regex for unknown number of lines - Assistance Request

    mtgradwell, it's best practice to start a new thread.

    The # in the expression is used as a delimiter: it just bounds the pattern and separates it from the modifiers e and s.

    Any non-alphanumeric, non-backslash, non-whitespace character can be used as the delimiter. If the delimiter character also appears within the pattern itself, escape it there with a \, such as:

    preg_match('~joe\~user~',$src,$arr);

    It's easiest to avoid escaping delimiter characters by choosing a delimiter that does not appear in the pattern, such as:

    preg_match('/joe~user/',$src,$arr);