[VBS] Names regex needs tweaking

Last post 06-28-2008, 1:19 AM by Aussie Susan. 3 replies.
  •  06-26-2008, 2:20 PM 43510

    [VBS] Names regex needs tweaking

    Hi Everyone,

     I'm new to this whole regex thing, and thoroughly confused. I need a regex that can pick out proper names from a document, which translates into a regex looking for 2 or more consecutive words that begin with a capital letter. Ideally this would include titles (Dr., Mr. etc.) too. The names will reliably be in the proper format; the purpose of the script is just to count the occurrences of different names.

     

    What I have so far is:

    [A-Z]+(([\'\,\.\- ][A-Z ])?[a-zA-Z]*)*

    This works, but it also picks up single capitalized words, so any word at the beginning of a sentence will give a match. For example, the string "Hi, my name is John Smith" will return a match for "Hi" and "John Smith".
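A minimal sketch of why single words get through, written in Python rather than VBScript purely so it is easy to run (that choice, and the helper name `matches_whole`, are assumptions and not part of the original post). Everything after the leading `[A-Z]+` sits inside a `(...)*` group whose own parts are optional, so the pattern never actually requires a second word:

```python
import re

# The pattern exactly as posted above.
ORIGINAL = re.compile(r"[A-Z]+(([\'\,\.\- ][A-Z ])?[a-zA-Z]*)*")

def matches_whole(text):
    """Return True if the posted pattern accepts the entire string."""
    return ORIGINAL.fullmatch(text) is not None

# A lone capitalized word is accepted just like a two-word name,
# because nothing after [A-Z]+ is mandatory.
print(matches_whole("Hi"))
print(matches_whole("John Smith"))
```

Both calls report a match, which suggests the fix is to make at least one "space plus capitalized word" repetition mandatory rather than optional.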

    I feel like I'm really close with this, so needless to say it's driving me nuts.

     Please save me from this regex!!!

     -swish

     

  •  06-26-2008, 3:53 PM 43512 in reply to 43510

    Re: [VBS] Names regex needs tweaking

    Parsing out people's names is virtually impossible, because there are no hard and fast rules as to what someone can be named. Your criteria are a bit overly simplistic, but if your real data is really that simple you can try

    [A-Z][a-z]+(\x20[A-Z][a-z]+)+

    Which will work against your sample but would fail quite easily against many real names.
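As a rough illustration of both halves of that statement, here is a small sketch in Python (an assumption made for easy testing; the pattern itself is unchanged and should port to the VBScript RegExp object). The helper `find_names` and the two extra test names are hypothetical and not from the thread:

```python
import re

# The suggested pattern: a capitalized word followed by one or more
# further capitalized words, each separated by a single space (\x20).
NAME = re.compile(r"[A-Z][a-z]+(\x20[A-Z][a-z]+)+")

def find_names(text):
    """Return every full match of the pattern in text."""
    # finditer + group(0) is used because findall would return only
    # the contents of the capturing group, not the whole match.
    return [m.group(0) for m in NAME.finditer(text)]

print(find_names("Hi, my name is John Smith"))  # "Hi" is no longer matched
print(find_names("Mary O'Brien"))               # apostrophe breaks the match
print(find_names("Jean-Luc Picard"))            # hyphen truncates the name
```

The sample sentence yields only "John Smith", while the apostrophe and hyphen cases come back empty or truncated, since `[a-z]+` allows nothing but lowercase letters after the initial capital.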


    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein
  •  06-27-2008, 9:26 AM 43536 in reply to 43512

    Re: [VBS] Names regex needs tweaking

    Thanks Mash!

     Any suggestions on making this slightly more robust? Essentially I'm using it to count the occurrences of a doctor’s name in a document, so this will vary between “Dr. Bob”, “Dr. Bob Thomson”, “Dr. Thomson” and “Bob Thomson”, for example. I’m using the Scripting Dictionary in VBScript to store and count the results, which are then displayed to the user to confirm the right number of occurrences.


    So ideally I’m looking for a regex that matches a string of consecutive words beginning with capital letters, including Dr. or Mr.

     

    Thanks!

  •  06-28-2008, 1:19 AM 43558 in reply to 43536

    Re: [VBS] Names regex needs tweaking

    I don't mean to be a pain, but how are you going to handle "Mr Paul van der Mere" or "Dr. deBuin"?

    Susan 
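One possible way to cover the four forms asked about earlier in the thread ("Dr. Bob", "Dr. Bob Thomson", "Dr. Thomson", "Bob Thomson") is sketched below. It is written in Python with `collections.Counter` standing in for the VBScript Scripting Dictionary, which is an assumption made for testability. The pattern, the helper `find_names`, and the sample text are hypothetical, not something posted in the thread. The idea is two alternatives: a title followed by at least one capitalized word, or at least two capitalized words with no title.

```python
import re
from collections import Counter

# Alternative 1: "Dr" or "Mr", optional dot, then 1+ capitalized words.
# Alternative 2: no title, so require 2+ capitalized words.
NAME = re.compile(
    r"(?:Dr|Mr)\.?\x20[A-Z][a-z]+(?:\x20[A-Z][a-z]+)*"
    r"|[A-Z][a-z]+(?:\x20[A-Z][a-z]+)+"
)

def find_names(text):
    """Return every full match of the pattern in text."""
    return [m.group(0) for m in NAME.finditer(text)]

text = ("Dr. Bob Thomson saw the patient. She asked for Dr. Thomson again, "
        "but Bob Thomson was away, so Dr. Bob called back. Dr. Thomson agreed.")

# Counter plays the role of the dictionary: key = name, value = count.
counts = Counter(find_names(text))
for name, n in counts.items():
    print(name, n)

# The two names raised in the post above still defeat the pattern:
print(find_names("Mr Paul van der Mere"))  # only the first part is found
print(find_names("Dr. deBuin"))            # nothing is found
```

On this sample the counts come out as one "Dr. Bob Thomson", two "Dr. Thomson", one "Bob Thomson" and one "Dr. Bob". Lowercase particles ("van der") and internal capitals ("deBuin") are still not handled, and a capitalized sentence opener directly before a title (for example "Later Dr. Thomson") would be matched as "Later Dr", so this is only suitable if the data really is as regular as described.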
