
Syntax Help

Last post 09-26-2005, 11:45 AM by SlideGuitar. 6 replies.
  •  09-23-2005, 1:45 PM 12862

    Syntax Help

    I'm writing an ASP.NET user control (using the System.Text.RegularExpressions namespace) that will allow users to use limited text formatting capabilities: bold, italic, underline, 8pt-14pt font sizes, etc. To do this, I am using bb-style formatting code ([B] for bold, [I] for italic, etc.).

    I'm having problems writing an expression to handle the font size and font color. The code I am using would look like:
    [font color="black"]text[/font] or [font size="10"]text[/font]

    What I've gotten so far is this:
    (\[font ((size)="(8|10|12|14)"|(color)="(black|gray|silver)")\])([\S\s]*)(\[\/font\])

    It works great on a single instance of one of those strings, like this:
    [font color="black"]text[/font]

    And I get these match groups, which is correct:
    [font color="black"]text text text[/font]
    [font color="black"]
    color="black"


    color
    black
    text text text
    [/font]

    But if there's more than one, like this:
    [font color="black"]text[/font][font color="silver"]text[/font]

    Then the results get all screwed up:
    [font color="black"]text text text[/font][font color="silver"]text text text[/font]
    [font color="black"]
    color="black"


    color
    black
    text text text[/font][font color="silver"]text text text
    [/font]
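
The result above is what a greedy quantifier produces: `[\S\s]*` consumes as much as it can and only backs off to the last `[/font]` in the input. A minimal sketch of the difference, written with Python's `re` module rather than .NET (the quantifier behaves the same way in both); the sample string and the simplified `\w+` attribute value are assumptions made for illustration:

```python
import re

text = '[font color="black"]one[/font][font color="silver"]two[/font]'

# Greedy inner group, as in the pattern above: runs to the LAST [/font].
greedy = re.compile(r'\[font (?:size|color)="\w+"\]([\S\s]*)\[/font\]')

# Lazy inner group (*?): stops at the FIRST [/font] that completes a match.
lazy = re.compile(r'\[font (?:size|color)="\w+"\]([\S\s]*?)\[/font\]')

greedy_inner = greedy.findall(text)
lazy_inner = lazy.findall(text)

print(greedy_inner)  # ['one[/font][font color="silver"]two']
print(lazy_inner)    # ['one', 'two']
```

With the lazy form, each tag pair becomes its own match instead of one match spanning both.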

    Also, I would REALLY like to slim down the number of groups I get.

    Currently, I think I have a group each for:

    match
    start tag
    size
    size value
    color
    color value
    inner text
    end tag

    I would like to slim it down to:
    match
    start tag
    attribute
    value
    inner text
    end tag

    However, it shouldn't return a match if the attribute is set to an invalid value. For example, "black" and "10" are both valid values, but a match should only be returned if "size" is set to "10", and no match if "size" is set to "black".

    Is this possible? Thanks!

  •  09-25-2005, 8:36 PM 12877 in reply to 12862

    Re: Syntax Help

    You can keep a group from being captured by starting it with ?:, as in (?:...), i.e., "Match this, but don't capture it, since I have no use for it."


    "Some day, and that day may never come, I will call upon you to do a service for me." — Don Vito Corleone
  •  09-25-2005, 8:37 PM 12878 in reply to 12877

    Re: Syntax Help

    Sorry if that was inadequate: you can precede the entire regex with ?:, because you're not interested in the match as a whole. However, any subexpressions (i.e., those enclosed in parens) will still be captured.
  •  09-25-2005, 8:38 PM 12879 in reply to 12878

    Re: Syntax Help

    I should have also pointed out that the beginning and end of the tag are part of the regex, but you probably don't want to capture them, so you'd precede them as well with ?:.
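
As a rough illustration of the `?:` advice, the sketch below compares a pattern where every parenthesised part captures with one where the tag delimiters are non-capturing. It uses Python's `re` module instead of .NET (the `(?:...)` syntax is the same in both); the sample string and the simplified `\w+` value are assumptions, not the original poster's pattern:

```python
import re

sample = '[font size="10"]text[/font]'

# Every parenthesised part is a capturing group.
capturing = re.compile(r'(\[font )(size|color)="(\w+)"(\])(.*?)(\[/font\])')

# Tag delimiters wrapped in (?:...) so they match but are not captured.
non_capturing = re.compile(
    r'(?:\[font )(size|color)="(\w+)"(?:\])(.*?)(?:\[/font\])'
)

all_groups = capturing.search(sample).groups()
kept_groups = non_capturing.search(sample).groups()

print(all_groups)   # ('[font ', 'size', '10', ']', 'text', '[/font]')
print(kept_groups)  # ('size', '10', 'text')
```

The overall match is still available as group 0 in either case; only the numbered sub-groups are reduced.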


  •  09-26-2005, 12:53 AM 12884 in reply to 12879

    Re: Syntax Help

    I don't see any way of doing this part without ruining the entire match:

        However, it shouldn't return a match if the attribute is set to an invalid value. For example, "black" and "10" are both valid values, but a match should only be returned if "size" is set to "10", and no match if "size" is set to "black".

    I'm not going to claim there's no way to do it, but it seems much easier to validate the attribute values after you've extracted them. If you don't like them, toss them out.
  •  09-26-2005, 1:44 AM 12886 in reply to 12884

    Re: Syntax Help

    After playing with this for about an hour, I don't see any way to do this in one fell swoop. You're really trying to do three things: find blocks of text surrounded by tags, split up the attributes within the tags and validate the attribute values, and then, finally, apply them to the text. This:

    (?:\[font\s*)(.*?)\](.*?)(?:\[\/font\])

    ...will find every such block. It won't capture the beginning and end tags (because of the ?:), but it will capture the attribute list (the first capturing group) and the text (the second capturing group). You should then split the captured attribute list on \s+, then split each name/value pair on =. Throw out the values you don't like. If you try to use the regex to validate values, you'll likely lose valid values for some other attribute within the same tag, or lose valid attributes because some other attribute name is wrong (or not backward-compatible). I just wouldn't use a regex to match, parse, and validate all at once.
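
The extract-then-validate approach described above could be sketched roughly as follows. This is Python rather than .NET, and the names `ALLOWED`, `BLOCK`, and `parse_font_blocks` are hypothetical; the whitelist values are taken from the original question. It assumes attribute values contain no whitespace, since the attribute list is split on whitespace:

```python
import re

# Hypothetical whitelist, using the values listed in the original question.
ALLOWED = {
    "size": {"8", "10", "12", "14"},
    "color": {"black", "gray", "silver"},
}

# Group 1: raw attribute list; group 2: inner text (both lazy).
BLOCK = re.compile(r'\[font\s*(.*?)\](.*?)\[/font\]', re.DOTALL)


def parse_font_blocks(source):
    """Return (valid_attributes, inner_text) for each [font ...]...[/font] block."""
    results = []
    for attr_list, inner in BLOCK.findall(source):
        valid = {}
        for pair in attr_list.split():              # split on whitespace
            name, sep, value = pair.partition("=")  # split name from value
            value = value.strip('"')
            # Keep the pair only if the name is known and the value is allowed.
            if sep and value in ALLOWED.get(name, ()):
                valid[name] = value
        results.append((valid, inner))
    return results


blocks = parse_font_blocks(
    '[font color="black"]a[/font][font size="black"]b[/font]'
)
print(blocks)  # [({'color': 'black'}, 'a'), ({}, 'b')]
```

The second block is still found, but its invalid `size="black"` pair is discarded, so the caller can decide whether to render the text unformatted or reject it.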


  •  09-26-2005, 11:45 AM 12891 in reply to 12886

    Re: Syntax Help

    I'm going to assume, though you haven't said it, that a tag may not include more than one attribute. If it may, then I wouldn't recommend trying to force a regex to parse this syntax. However, if the syntax is so simple that it does in fact fall into a simple pattern (i.e., a font tag with one name-value pair inside), then this regex will extract the name-value pair and the text (.NET flava):

    (?:\[font\s*)(?:(?<attr>color|size)="(?<value>[^"]*)")\](?<text>.*?)(?:\[\/font\])

    How are you going to handle nested tags? Do you want the regex to be case-insensitive? How are you going to handle badly-formed tags? My previous suggestion addressed this problem: you'll extract the tag, then throw it out because the stuff inside is badly formed. The current suggestion will leave badly-formed tags as part of the text!
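
For readers not on .NET, here is a rough Python port of the single-attribute pattern. Python spells named groups `(?P<name>...)` where .NET uses `(?<name>...)`. The sketch includes an explicit `\]` after the closing quote so that the bracket does not end up in the text group; the constant name `FONT` and the sample strings are assumptions for illustration:

```python
import re

# One attribute (color or size), its quoted value, then the inner text.
FONT = re.compile(
    r'\[font\s*(?P<attr>color|size)="(?P<value>[^"]*)"\]'
    r'(?P<text>.*?)\[/font\]',
    re.DOTALL,
)

match = FONT.search('[font size="10"]hello[/font]')
parts = match.groupdict()
print(parts)  # {'attr': 'size', 'value': '10', 'text': 'hello'}

# A tag with an unknown attribute name does not match at all,
# so it would be left in the output as ordinary text.
unmatched = FONT.search('[font face="arial"]hello[/font]')
print(unmatched)  # None
```

Note that the value is not validated here (`[^"]*` accepts anything between the quotes), so a check against the allowed values would still be needed after extraction.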

