RegexAdvice

How to find sentences?

Last post 09-21-2006, 11:50 AM by mash. 2 replies.
  •  09-21-2006, 10:56 AM 22711

    How to find sentences?

    Dear all, I have been given a task: learn to use regular expressions to split a long string into sentences and save them.

    A sentence can end with a full stop ".", a question mark "?" or an exclamation mark "!". (For now, I don't need to worry about line breaks.)

    I am using VB.NET and wrote the expression "([^\.\?\!]*)[\.\?\!]", then did some tests.

    The input string for testing is "I love icecream. The icecream is £5.99? I found it in www.yahoo.com! "

    The result was these 6 "sentences":

    sentence 1,---------I love icecream.
    sentence 2,---------The icecream is £5.
    sentence 3,---------99?
    sentence 4,---------I found it in www.
    sentence 5,---------yahoo.
    sentence 6,---------com!

    The problem is that every period is treated as a sentence end, even one in the middle of a sentence. So I changed my expression to "([^\.\?\!]*)[\.\?\!]\s", trying to split on a stop mark followed by a space.

    This time I got 3 matches (the right number):

    sentence 1,---------I love icecream.

    sentence 2,---------99? 

    sentence 3,---------com! 

    But two of them are not whole sentences: "99?" and "com!" are only the tails of sentences 2 and 3.
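    To see why both attempts behave this way, here is a minimal VB.NET sketch that runs the two patterns from the question over the same sample text with Regex.Matches and prints each match trimmed of surrounding spaces. The module and function names (QuestionPatternsDemo, TrimmedMatches) are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module QuestionPatternsDemo
    ' Sample text from the question (note the trailing space after "com!").
    Public Const SampleText As String = "I love icecream. The icecream is £5.99? I found it in www.yahoo.com! "

    ' Returns the value of every match of pattern in text, trimmed of leading/trailing whitespace.
    Public Function TrimmedMatches(pattern As String, text As String) As List(Of String)
        Dim results As New List(Of String)()
        For Each m As Match In Regex.Matches(text, pattern)
            results.Add(m.Value.Trim())
        Next
        Return results
    End Function

    Sub Main()
        ' First attempt: a run of non-terminators followed by a terminator.
        For Each s As String In TrimmedMatches("([^\.\?\!]*)[\.\?\!]", SampleText)
            Console.WriteLine(s)
        Next
        Console.WriteLine("----")
        ' Second attempt: the terminator must also be followed by a whitespace character.
        For Each s As String In TrimmedMatches("([^\.\?\!]*)[\.\?\!]\s", SampleText)
            Console.WriteLine(s)
        Next
    End Sub
End Module
```

    The first pattern stops at every "." because [^\.\?\!]* can never cross one, so "5.99" and "www.yahoo.com" are cut apart. The second pattern rejects the candidates ending in "5." and "www." (no whitespace follows), but the regex engine then retries from each later start position, so it still finds the tails "99? " and "com! " instead of the whole sentences.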

    Can anyone help me?

     Thanks

  •  09-21-2006, 11:23 AM 22713 in reply to 22711

    Re: How to find sentences?

    hi, Jane

    the logic you could use for splitting: *find the spaces PRECEDED by [exclamation mark OR period OR question mark]; split the input at those spaces*

    the regex for split:

    (?<=[.!?])\x20+

    VB code for the Regex object:

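    Imports System.Text.RegularExpressions

    ' Split on one or more spaces (\x20+) that immediately follow ".", "!" or "?".
    ' The lookbehind (?<=...) is zero-width, so the punctuation stays with its sentence.
    ' IgnoreCase/Singleline are not needed: the pattern has no letters and no "." wildcard.
    Dim sentenceBreak As New Regex("(?<=[.!?])\x20+")

    ' input is the String to be split into sentences.
    Dim sentences As String() = sentenceBreak.Split(input)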

    when run against the input:

    I love icecream. The icecream is £5.99? I found it in www.yahoo.com! I love even more icecream.

    it'd split it into four sentences according to your rules.
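    A fuller sketch of the same lookbehind split, run on the question's original input, which ends with "! " plus a trailing space. In that case Regex.Split also returns an empty last element, because the final space is itself a split point; the hypothetical SplitSentences helper below simply drops empty pieces. Module and function names are illustrative assumptions.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LookbehindSplitDemo
    ' (?<=[.!?]) checks, without consuming, that a terminator precedes the split point;
    ' \x20+ is the run of spaces that Split consumes and discards.
    Private ReadOnly SentenceBreak As New Regex("(?<=[.!?])\x20+")

    ' Splits text into sentences and drops empty pieces, such as the one produced
    ' when the input ends with a terminator followed by a space.
    Public Function SplitSentences(text As String) As List(Of String)
        Dim sentences As New List(Of String)()
        For Each piece As String In SentenceBreak.Split(text)
            If piece.Length > 0 Then sentences.Add(piece)
        Next
        Return sentences
    End Function

    Sub Main()
        Dim input As String = "I love icecream. The icecream is £5.99? I found it in www.yahoo.com! "
        For Each sentence As String In SplitSentences(input)
            Console.WriteLine(sentence)
        Next
    End Sub
End Module
```

    Because the split requires a space right after the punctuation, the periods inside "£5.99" and "www.yahoo.com" are left alone.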

     

    //hope it helps

  •  09-21-2006, 11:50 AM 22719 in reply to 22713

    Re: How to find sentences?

    Unless your sentence breaks follow a hard-and-fast format, such as two spaces after ending punctuation, you'll never have a perfect solution.

    The best you'll be able to do is an "in most cases" regex solution.

    Quoted text, abbreviations and initials will cause problems. For example: She said "How are you?"  I said "Fine." Then Mr. J. Smith walked in.

    If the formatting is consistent, you should be able to modify Sergei's regex.
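    As a minimal "in most cases" sketch of such modifications (assumptions, not a complete solution): HeuristicBreak also accepts a closing double quote after the punctuation and refuses to split after a short, illustrative list of abbreviations (Mr, Mrs, Ms, Dr) or a single capital initial, using .NET's variable-length negative lookbehind; TwoSpaceBreak only splits where two or more spaces follow the punctuation. The module name and the abbreviation list are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HeuristicSplitDemo
    ' Split on spaces after ".", "!" or "?" (optionally followed by a closing double quote),
    ' unless the text just before the spaces ends with a listed abbreviation or a capital initial.
    ' The abbreviation list is illustrative only.
    Public ReadOnly HeuristicBreak As New Regex("(?<!\b(?:Mr|Mrs|Ms|Dr|[A-Z])\.)(?<=[.!?]""?)\x20+")

    ' Stricter rule for text that always uses two or more spaces between sentences.
    Public ReadOnly TwoSpaceBreak As New Regex("(?<=[.!?]""?)\x20{2,}")

    Sub Main()
        Dim text As String = "She said ""How are you?""  I said ""Fine."" Then Mr. J. Smith walked in."
        For Each sentence As String In HeuristicBreak.Split(text)
            Console.WriteLine(sentence)
        Next
    End Sub
End Module
```

    On Michael's example, HeuristicBreak keeps "Mr. J. Smith" together and yields three sentences, while TwoSpaceBreak only breaks after "How are you?" because that is the only place the example uses two spaces. The initial rule will also wrongly glue together sentences that really end in a capital letter (for example "... plan B. Then ..."), which is the kind of case no simple regex can settle.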

    Michael 



    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein