
IRegex

Roy Osherove's (?<thoughts>.*)

  • The Regulator now hosted at Regex.osherove.com

    FYI:

    The Regulator will be hosted from now on at http://regex.osherove.com/ 

    No longer will the fact that my home machine is down mean that The Regulator's site will be offline.

    Phew.

    And for the record, I'm working on a new version.

  • Q&A - Greedy matching in regular expressions

    This came in the mail, thought other folks might be interested.

    Hi Roy. I need to check a line of html and make the value of the style attribute lowercase.  I've tried to come up with a regex that will work but I keep making the entire line of html lowercase instead of just the stuff in the style value.  I can't get the match to end with the correct quote, instead it goes to the last quote on the line.  So something like this:

    [Tag style="WIDTH:20px; color:blue;" href="blah.com/PageTWO"] I want to change to this:
    [Tag style="width:20px; color:blue;" href="blah.com/PageTWO"]

    But instead I get this:
    [Tag style="width:20px; color:blue;" href="blah.com/pagetwo"]

    Because the match ends with the end quote of the href.


    If you can point me in the right direction (or have something like this lying around), I would GREATLY appreciate it.

    Answer:

    This is called "greedy matching": by default a quantifier such as '*' grabs as much text as it can, so the match runs up to the *last* closing quote on the line.
    Try adding a "?" after the quantifier (probably '*'). That makes it "lazy", so the match ends at the *first* place it can.

    For example, given the following string as input:
    "abcdfgdrbdtargd"
    The following regex (greedy by default) will match up until the last 'd':
    (.*d)

    However, this regex will find several matches, the first of which is "abcd":
    (.*?d)
    (you can do without the parentheses if you want).
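
    To make the difference concrete, here is a minimal C# sketch of the greedy and lazy patterns above, applied to the sample string and to the style attribute from the question. The class and method names (GreedyVsLazyDemo, FirstMatch, LowercaseStyle) are made up for illustration, and the style pattern is only one plausible way to write it, not necessarily what the person asking ended up with.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // Returns the text of the first match of the pattern in the input.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    // Lowercases only the style attribute. The lazy ".*?" stops at the
    // first closing quote instead of running on to the last one on the line.
    public static string LowercaseStyle(string html)
    {
        return Regex.Replace(html, "style=\".*?\"",
            m => m.Value.ToLowerInvariant());
    }

    public static void Main()
    {
        string input = "abcdfgdrbdtargd";
        Console.WriteLine(FirstMatch(input, ".*d"));   // greedy: abcdfgdrbdtargd
        Console.WriteLine(FirstMatch(input, ".*?d"));  // lazy:   abcd

        string html =
            "[Tag style=\"WIDTH:20px; color:blue;\" href=\"blah.com/PageTWO\"]";
        // The href value keeps its original casing.
        Console.WriteLine(LowercaseStyle(html));
    }
}
```

    With the greedy form style=".*" the same Replace call would also lowercase the href value, which is exactly the problem described in the question.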

    I'd also suggest adding two good regex mailing lists to your arsenal instead of sending help messages to various people:
    http://groups.yahoo.com/group/dotnetregex/
    http://lists.aspadvice.com/SignUp/list.aspx?l=68&c=16

    There are people there that know a whole lot more than me on regular expressions.

     

  • Full source for The Regulator released

    I finally got around to releasing the full source code for The Regulator. Up until now the source code was partial and included mostly just the plugins for it. Now you get to see what a sloppy coder I really was when I dreamed up this thing, and the stages I went through in taking it from a simple GUI to a pluggable architecture of GUI and logical plugins.
    There's lots of code there that I would do differently today. For example, I would change the whole threading thing in the GUI to use the BackgroundWorker that Juval Lowy wrote. Many more things like that.
    In any case, have fun mocking me :)
     
    Note that I use a lot of 3rd party libraries such as ComponentOne's Toolbars and Syncfusion's Edit control. You'll need those to be able to compile. (C1's components were free as part of the VB resource kit you can download. Syncfusion donated their control; it's not free.)
    I'm trying to find time to make a new version. Hope that'll happen soon.
     
    Why did I decide to release it now? Mainly because I got an angry email from one of the SourceForge.Net admins asking whether a complaint made to the site by an end-user was true: that the project was released without source code. I explained that it comes with *partial* source, but if that's what's going to make them happy, I'll add the source now. Since I was planning to do so anyway (I just didn't feel very confident about the way the code looks), this wasn't an issue. In fact, I welcome this catalyst as an opportunity to give more to the community. Sorry it took so long in the first place.
  • Using Regex to return the first N words in a string

    Jeff Perrin needed a function to return the first N words in a string (to create a small summary or a snippet thingy). He did it using the awkward method of parsing the string by hand. That method is more error prone and usually makes for less readable code. Fortunately, you can use regular expressions here quite nicely. Here's a test that makes sure that we get the first 4 words in a string, and the function "FindFirstWords" that does this very easily using a simple regular expression.

    What I'm doing here is using the expression to find the first 4 occurrences of text that is composed of alphanumeric characters with one or more whitespace characters after it. Then I simply iterate over the captures of the repeated group in the match I found. That group should contain 4 captures, one for each "word" that was found (the match itself holds only a single capture: the whole matched text).

    It's not fully tested, as you can see. I only wrote one test to see that it works on this sort of sentence. More tests could and should be added to cover other cases. In fact, if this were real TDD, I would have started with a test of an empty string, and continued on to test getting only one word, and then two and so on.

    using System.Text;
    using System.Text.RegularExpressions;
    using NUnit.Framework;

    [Test]
    public void TestRegexFindFirstNWords()
    {
          const string INPUT =
              "this is word four five six seven eight nine ten eleven twelve thirteen!";
          const int NUM_WORDS_TO_RETURN = 4;

          string output = FindFirstWords(INPUT, NUM_WORDS_TO_RETURN);

          string expectedOutput = "this is word four ";
          Assert.AreEqual(expectedOutput, output);
    }

    private string FindFirstWords(string input, int howManyToFind)
    {
          // Group 1 is repeated howManyToFind times; each repetition
          // (a word plus its trailing whitespace) is one capture of that group.
          string REGEX = @"([\w]+\s+){" + howManyToFind + "}";
          StringBuilder output = new StringBuilder();
          foreach (Capture capture in Regex.Match(input, REGEX).Groups[1].Captures)
          {
                output.Append(capture.Value);
          }
          return output.ToString();
    }
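
    The extra cases mentioned above (empty string, one word, two words) can be sketched as a small console program. This is only an illustration under my own assumptions: the class name FirstWordsSketch is hypothetical, the function is made public and static so it can run without NUnit, and an explicit Success check is added so a failed match returns an empty string.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class FirstWordsSketch
{
    // Same idea as FindFirstWords: repeat "word + whitespace" N times and
    // join the captures of the repeated group.
    public static string FindFirstWords(string input, int howManyToFind)
    {
        string pattern = @"(\w+\s+){" + howManyToFind + "}";
        Match match = Regex.Match(input, pattern);
        if (!match.Success)
        {
            return string.Empty; // not enough "word + whitespace" units
        }

        StringBuilder output = new StringBuilder();
        foreach (Capture capture in match.Groups[1].Captures)
        {
            output.Append(capture.Value);
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Brackets make trailing spaces and empty results visible.
        Console.WriteLine("[" + FindFirstWords("", 4) + "]");
        Console.WriteLine("[" + FindFirstWords("one two three", 1) + "]");
        Console.WriteLine("[" + FindFirstWords("one two three", 2) + "]");
        Console.WriteLine("[" + FindFirstWords("only three words", 4) + "]");
    }
}
```

    One thing these cases bring out: because every word must be followed by whitespace, asking for more words than the input has (or for a last word with nothing after it) yields no match at all rather than a shorter result. Whether that is the behavior you want is a design decision the tests should pin down.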

  • [OT] Some free time coming soon

    I'm finally gonna get some more free time in the near future, so I'll be starting work on a newer version of The Regulator. Stay tuned for more. I'm listening for any cool feature requests BTW.
  • The Regulator 2.0.3 - *finally* with a help file!

    It's finally done. I took the time and made a Help file for The Regulator. It's now bundled along with the new version up on SourceForge - 2.0.3. Go get it. You can also download just the Help file as a separate zip file if you really want to.
    The help file also lists some of my Regex articles and other resources. Do you have an article you want to contribute to the Regulator Help file? Drop me a line and I'll see if the article fits in content-wise. If so, I'll put it in the next version along with a nice credit in the help file. How's that?
     
    Mind you - the help integration was the only change in this version. So if you have version 2.02 then you already have the same functionality - but none of the cool help file tips & tricks! (including keyboard shortcuts...) 
    :)
  • Recommended C# Bloggers

    If you've read my blog for a while you'd know that I'm not much for a popularity contest. However, I am most honored to be listed along with these great people in the same place, in what Eric Gunnerson posted as a list of Recommended C# bloggers.
    At least half of these folks are inside my aggregator, and 3 names here are not familiar to me. Looks like I need to add some subscriptions! For those I do know, I can attest that they certainly deserve to be listed as a good source of knowledge.
     
    The weird thing is that my Regex blog is the one listed over there, and not my primary weblog. Quite weird, because I have less than 20 posts on this blog. Still, this is very cool. I know I haven't been posting much tech stuff in the past few weeks but I promise to get back to that as soon as I have something worth saying.
  • New version for Regulator

    Yeah. I haven't been a good blogger and have not been paying much attention to this blog. It's pretty hard keeping up with two blogs. Harder than I thought. I'll try a little more and see where it takes me.
    The big news about The Regulator is that:
    1) it moved to SourceForge
    2) A new version has been released that includes Intellisense, among other things.
     
  • Matching a value inside XML tags with an attribute

    I was asked how to parse for the value inside an XML tag that carries an attribute of some kind. Here's a simple way.
    Say you have this as an input:
     
    <Customer VIP="false">Danny Glover</Customer>
    <Customer VIP="true">John Roberts</Customer>
    <Customer VIP="false">Eva Gardner</Customer>
     
    The value we eventually want is the text between the opening and closing tags (the customer's name).
    It's fairly easy to set up an expression like this:
    Fire up The Regulator and put the input text in the "input" field.
    Then start out by writing an expression to match the opening tag of a Customer element.
    In Regex, each letter usually matches itself, so if we wanted to match the text "roy" in an input, we could simply write "roy" in the regex and it would match. But some chars are "special" chars, part of the regex syntax, so, like in any good language, we need to "escape" them using a backslash. I escape the "<" and ">" chars here as well: to match "<" I write "\<", and the same goes for the ">" char. (Strictly speaking, .Net only treats them specially inside constructs such as "(?<name>...)", so the backslash is optional here, but it is harmless.) So, our expression could start out like this:
    • \<Customer\sVIP=".*"\>
    Some explanation:
    • \s = any single whitespace character (a space or a tab, for example)
    • .* = actually two things: "." means "any character", and "*" means "0 or more"
    • So, what I'm actually writing here is: match anything that starts with <Customer VIP="[anything at any length]">
    Pressing F5 and running this would match the starting node of every XML element (you can see the results in the match tree of The Regulator).
    Now we are at the important part: whatever comes after this match and before the closing node is the actual text we want. We don't really care what it is as long as we can access this information easily. Since we want to match anything at any length inside these tags, we can use ".*" in there. But if we want to get back the value in those tags easily in code, we need to put it in a "group". A group is a logical construct that we wrap around parts of our matched text, so that we can later refer to it in code using the Regex object model exposed by the .Net framework.
    To wrap something in a group we simply put parentheses around it:
    • \<Customer\sVIP=".*"\>(.*)\</Customer\>
    Explanation:
    • The part in parentheses, (.*), is the group I've just created.
    • The expression after the group simply matches the closing tag of the Customer node.
    • To get to the "meat", the group we created, our code might look like this (if we were parsing one line at a time):
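    // requires: using System.Text.RegularExpressions;
    string pattern = @"\<Customer\sVIP="".*""\>(.*)\</Customer\>";
    string input = @"<Customer VIP=""true"">John Roberts</Customer>";
    Match match = Regex.Match(input, pattern);
    if (match.Success)
    {
        // Groups[0] is the whole match; Groups[1] is the first parenthesized group,
        // so valueInside should now contain the value "John Roberts"
        string valueInside = match.Groups[1].Value;
    }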
     
    It's that easy. If we really wanted to, we could get the group by name. To give the group a name in the expression, put this text inside the parentheses, before the expression in them: "?<MyGroupName>":
    • \<Customer\sVIP=".*"\>(?<InnerText>.*)\</Customer\>
    Now we can access the group like so:
        string valueInside = match.Groups["InnerText"].Value;
     
    Can you see the power in this? I can now divide my text up into as many groups as I want and simply refer to each part I want by name, without having to parse a single string using the conventional methods. Groups can even nest inside other groups, but you still get them all by name in just the same way.
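
    Putting the pieces together, here is a minimal self-contained sketch that pulls every customer name out of the three-line sample input using the named group. The class and method names (CustomerNamesSketch, ExtractNames) are hypothetical. Note two deliberate departures from the expression above, both my own choices: the attribute value uses [^"]* and the inner text uses the lazy .*?, so the pattern keeps working if several elements ever end up on one line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CustomerNamesSketch
{
    // Returns the inner text of every <Customer VIP="..."> element found.
    public static List<string> ExtractNames(string xml)
    {
        // Verbatim string: "" inside it stands for one literal quote.
        string pattern =
            @"<Customer\sVIP=""[^""]*"">(?<InnerText>.*?)</Customer>";

        List<string> names = new List<string>();
        foreach (Match match in Regex.Matches(xml, pattern))
        {
            // Look the group up by the name given in the expression.
            names.Add(match.Groups["InnerText"].Value);
        }
        return names;
    }

    public static void Main()
    {
        string xml =
            "<Customer VIP=\"false\">Danny Glover</Customer>\n" +
            "<Customer VIP=\"true\">John Roberts</Customer>\n" +
            "<Customer VIP=\"false\">Eva Gardner</Customer>";

        foreach (string name in ExtractNames(xml))
        {
            Console.WriteLine(name);
        }
    }
}
```

    For anything beyond a quick extraction like this, a real XML parser is usually the safer tool, but for line-oriented input the group approach stays short and readable.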
  • Running The Regulator on .Net 1.0

    Up until now, The Regulator could only run on .Net 1.1. Now you can run it against .Net 1.0 using this config file:
    Just put it alongside the regulator.exe file and run.
    I took the config settings from Jeff Key's Snippet Compiler config file, since he was having the same problem with his users. So, thanks Jeff.
     
     
    The config file actually redirects all the main .Net assemblies in v1.1 to work on the v1.0 assemblies. Pretty cool.
    If you want to learn more about how this works, you can check out this cool article from early & adopter about running the two frameworks side by side, and redirecting between them:
     
    It's really quite a funny article, but lots of great info. Recommended.
  • The Regulator featured on Regexlib.com

    The Regulator, my Regex testing tool, has been named by Regexlib.com as their “recommended tool for testing regular expressions” and has been put up there in the front page. They also put up a resource page for many other regex testing tools, books, articles and links.  I'm totally flattered.
    Regexlib.com is a website dedicated to collecting and sharing regular expressions to parse almost anything. Anyone can register and post their own regular expressions that they've found interesting and helpful. You can too.
     
    The Regulator has tight integration with Regexlib.com's web service, which allows the user to search for regular expressions in regexlib.com's database from within the tool, and with a click import them and test them inside the program.
    It also features a "publish" wizard which lets you post regular expressions onto regexlib.com directly from within The Regulator. Did I mention it's totally free?
  • The Regulator Wiki

    I've decided to create a Wiki site for The Regulator. Since I can't find the time to create all the documentation myself, I thought if I make this a public thing then people who use this tool can put their knowledge on this Wiki for others to benefit from.

    Right now there's not much to see there (I just put it up a few days ago and have not really touched it), but I'm hoping that with time we'll get it into a workable stage.

    The Regulator Wiki

  • free sample chapter on Regular Expressions

    The long-awaited 'phantom' C# Cookbook from O'Reilly will be available in Jan '04. It has a sample chapter on 'Regular Expressions' available for download: something to read during the cold Norwegian evenings ;-)

    [via SBC]

  • Makes me feel great

    It's posts like this one that make me really happy I took the time and created The Regulator. I promise to keep working on it and make it even better.
  • The Regulator version 1.02 beta 2 is out

    Get it here.

    What's new :

    ---------------
    -added: split functionality. You have a new tab in the output tabs pane (bottom left) called "splits".
       pressing F7 will call Regex.Split(input text) using the specified options and show them in the new tab
       showing index and text columns for the results array.
    -added: Split icon(F7) in main toolbar
    -added: Load input from a file on disk.
       In the input text area there is a new toolbar that allows you to select an input file from which to load the text.
       You can refresh the text from the file using the new "refresh" button in that toolbar.
    -added: "Window" menu to navigate between open documents (known bug: empty menu item always exists there)
    -added: Shortcut keys to set focus on each document part:
       CTRL+1: Regex text
       CTRL+2: input text
       CTRL+3: replace text
       CTRL+4: matches tree
       CTRL+5: replace output
       CTRL+6: split output


    -fixed: snippets.xml was readonly in the distro. It should not be or you won't be able to save your snippets from the "snippets" pane.
    -fixed: toolbar is now disabled when no document is open
    -fixed: you could open the same file multiple times.
       Now you'll get an alert and the existing document will be activated.

    -modified: Made input text and replace text fonts bigger.
    -modified: changed form font to Verdana for easier reading
    -modified: changed replace, match and cancel icons for easier understanding

    -removed: removed references for magic library from the project
