
Problem with non-matching prefix in VB.net

Last post 04-14-2006, 11:26 AM by mash. 8 replies.
  •  04-12-2006, 3:49 PM 16408

    Problem with non-matching prefix in VB.net

    I have a regex:
    (?<!value=)(0-8311-)(?=\d{4}-\d|X)

    It is meant to match ISBN values that generally look like 0-8311-dddd-d or 0-8311-dddd-X. I need to match just the 0-8311- portion of the string in a web page so that I can enclose that substring in parentheses.

    The regex above works fine in general, but there are cases where the ISBN appears in an INPUT element and I don't want it matched there (hence the non-matching prefix).

    The regex behaves as expected in every regex tool I have tried, but when the code actually runs, it doesn't exclude the ISBNs in the text boxes.

    The VB.Net code I use for performing the match looks like this.

    FilteredHtml = Regex.Replace(FilteredHtml, OutputFilter.LookFor, AddressOf ApplyFilter)

    The OutputFilter.LookFor property is the regex shown above, and the ApplyFilter method just replaces the match (in this case 0-8311-) with "(0-8311-)".

    I am not sure why this doesn't work correctly. Any thoughts?
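
    For reference, the setup described above can be sketched in C# roughly as follows. This is only an illustration: the class and method names are hypothetical stand-ins for OutputFilter.LookFor and ApplyFilter, and the two test strings are made up rather than taken from the real page.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IsbnFilterSketch
{
    // The pattern from the post; the surrounding names are hypothetical.
    public const string LookFor = @"(?<!value=)(0-8311-)(?=\d{4}-\d|X)";

    // Stand-in for ApplyFilter: wraps the matched publisher prefix in parentheses.
    public static string ApplyFilter(Match m)
    {
        return "(" + m.Value + ")";
    }

    public static string Filter(string html)
    {
        return Regex.Replace(html, LookFor, new MatchEvaluator(ApplyFilter));
    }

    public static void Main()
    {
        // Plain text: the prefix is wrapped.
        Console.WriteLine(Filter("ISBN 0-8311-3169-1"));
        // prints: ISBN (0-8311-)3169-1

        // Unquoted attribute: "value=" sits directly before the prefix,
        // so the negative lookbehind rejects the match and nothing changes.
        Console.WriteLine(Filter("<INPUT size=28 value=0-8311-3169-1>"));
        // prints: <INPUT size=28 value=0-8311-3169-1>
    }
}
```

    With these made-up inputs the lookbehind does its job, which is consistent with what the regex tools report.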
  •  04-13-2006, 11:43 AM 16433 in reply to 16408

    Re: Problem with non-matching prefix in VB.net

    It does work correctly. It works exactly as you told it to work. Nothing is wrong with the regex; it's your logic that's incorrect.

    I'll assume your regex got corrupted when you posted it and that you are doing a negative lookbehind (?<!) where your desired result is not preceded by a value attribute. If this is the case, your logic error is the assumption that the value attribute is mandatory to give an input variable a value. It is not; in most cases it is an optional way of defining a value. While it is commonly used when defining radio buttons and check boxes, most web developers or IDEs aren't going to set the value of a text box in that manner. It is only required to set the value with the self-closing tag syntax. If the open tag / close tag syntax is used, which is the way textboxes are typically coded, the value is simply the text between those tags, and the attribute doesn't need to be included. If you look at your HTML source code, I doubt you'll find a value attribute on any of your textboxes; you'll find the value between an opening and closing INPUT tag instead.

    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein
  •  04-13-2006, 3:35 PM 16449 in reply to 16433

    Re: Problem with non-matching prefix in VB.net

    Here is the HTML (obtained using Instant Source) for the textbox in question &lt;INPUT class=PortalStore_NormalTextBox id=dnn_ctr432_DNNDispatch_ctlMain__ctl0__ctl4_ctlPackageAddEdit_subPackageDetail_txtSKU_UPC style="WIDTH: 200px" size=28 value=(0-8311-)3169-1 name=dnn:ctr432:DNNDispatch:ctlMain:_ctl0:_ctl4:ctlPackageAddEdit:subPackageDetail:txtSKU_UPC&gt;
    (html entities munged to get past the filters)

    In this case you can see that the 0-8311- was matched and the replacement applied, even though it clearly should not have been because of the negative lookbehind. I have tested this regex in several regex tools and it never matches in this case, and yet there it is.

    The only thing I can think is that the match evaluator may be in some weird mode where the negative lookbehind is ignored. I have debugged and it always matches in this case. That is why I am stumped.
  •  04-13-2006, 6:16 PM 16456 in reply to 16449

    Re: Problem with non-matching prefix in VB.net

    Try this code in C#. It works the way you want: it does not match

    0-8311-3169-1

    when it's preceded by 'value='.

    using System;
    using System.Text.RegularExpressions;

    public class Regex_Match
    {
        public static void Main()
        {
            // Scan an input string for matches and print them
            int count = 0; // match count

            // The input text to search
            string InputString = @"INPUT class=PortalStore_NormalTextBox id=dnn_ctr432_DNNDispatch_ctlMain__ctl0__ctl4_ctlPackageAddEdit_subPackageDetail_txtSKU_UPC style=WIDTH: 200px size=28 value=(0-8311-3169-1 name=dnn:ctr432:DNNDispatch:ctlMain:_ctl0:_ctl4:ctlPackageAddEdit:subPackageDetail:txtSKU_UPC&gt; ";

            // The regex pattern
            // string pattern = @"\(0-8311-\d{4}-[\dx]";            // ver1: MATCH
            string pattern = @"(?<!value=)\(0-8311-\d{4}-[\dx]";    // ver2: NO MATCH

            // Create the regex object
            Regex r = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);

            // Find all matches in the input string using the Matches method
            MatchCollection mc = r.Matches(InputString);

            // Print the matches
            foreach (Match m in mc)
            {
                count++;
                Console.WriteLine("the match #" + count.ToString() + " is: " + m.Value);
            }

            // If there were no matches
            if (count == 0) { Console.WriteLine("NO matches found"); }

            Console.WriteLine("\nPress Enter to Exit");
            Console.ReadLine();
        }
    }

  •  04-13-2006, 10:58 PM 16467 in reply to 16449

    Re: Problem with non-matching prefix in VB.net

     jbrinkman wrote:
    Here is the HTML (obtained using Instant Source) for the textbox in question <INPUT class=PortalStore_NormalTextBox id=dnn_ctr432_DNNDispatch_ctlMain__ctl0__ctl4_ctlPackageAddEdit_subPackageDetail_txtSKU_UPC style="WIDTH: 200px" size=28 value=(0-8311-)3169-1 name=dnn:ctr432:DNNDispatch:ctlMain:_ctl0:_ctl4:ctlPackageAddEdit:subPackageDetail:txtSKU_UPC> (html entities munged to get past the filters) In this case you can see that the 0-8311- was matched and the replacement applied, even though it clearly should not because of the negative look behind. I have tested this regex in several regex tools and it never matches in this case, and yet there it is. The only thing I can think is that the match evaluator may be in some weird mode where the negative lookbehind is ignored. I have debugged and it always matches in this case. That is why I am stumped.


    No, the regex that you tried to post would and should match that HTML if you typed it as is.
    Your regex says: match 0-8311- if it is not preceded by "value=". In your HTML, 0-8311- is preceded by "value=(". The open parenthesis sits between "value=" and the digits, so "value=" is not found immediately before 0-8311- and the negative lookbehind lets the match through.
    The parentheses in your regex are grouping constructs and not part of the data to be matched.
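
    Both points can be checked with a few lines of C#. This is a rough sketch, not a reproduction of the real page: the class name is hypothetical, and the input is a shortened string in which only an opening parenthesis precedes the prefix (an assumption made for illustration).

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindSketch
{
    // The pattern as posted at the top of the thread.
    public const string Pattern = @"(?<!value=)(0-8311-)(?=\d{4}-\d|X)";

    public static void Main()
    {
        // "value=" is immediately before the prefix: the lookbehind rejects it.
        Console.WriteLine(Regex.IsMatch("value=0-8311-3169-1", Pattern));   // False

        // A "(" sits in between, so the six characters before the prefix
        // are "alue=(" and the negative lookbehind no longer blocks the match.
        Console.WriteLine(Regex.IsMatch("value=(0-8311-3169-1", Pattern));  // True

        // Unescaped parentheses only create capture group 1;
        // they do not consume any characters from the input.
        Match m = Regex.Match("value=(0-8311-3169-1", Pattern);
        Console.WriteLine(m.Value);            // 0-8311-
        Console.WriteLine(m.Groups[1].Value);  // 0-8311-
    }
}
```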

    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein
  •  04-14-2006, 11:11 AM 16487 in reply to 16467

    Re: Problem with non-matching prefix in VB.net

    As I tried to indicate (apparently not very successfully), the parens in my previous post were caused by my web filter triggering the regex when it shouldn't have. I am trying to prevent those parens from being inserted in this case.

    You previously indicated that INPUT might not put the data in the value attrib, but clearly in this HTML it is coming through in the value attribute.

    So I am back to the same problem with my regex not working, although everyone seems to agree that the regex string itself is ok.

    Now that I am pretty confident that the regex seems to be correct, I can go back to the .Net community forums and ask why this particular API does not seem to be behaving correctly.
  •  04-14-2006, 11:17 AM 16489 in reply to 16487

    Re: Problem with non-matching prefix in VB.net

    Your regex in the form

    (?<!value=)(0-8311-)(?=\d{4}-\d|X)

    is not 100% valid. I'd rather escape the parens:

    (?<!value=)\(0-8311-\)(?=\d{4}-[\dx])
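
    To see how the two patterns differ, here is a small comparison in C#. Treat it as a sketch: the class, the helper name, and the test strings are all made up for illustration, and RegexOptions.IgnoreCase is my own choice so that [\dx] also accepts an upper-case X.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedParensSketch
{
    public const string Original = @"(?<!value=)(0-8311-)(?=\d{4}-\d|X)";
    public const string Escaped  = @"(?<!value=)\(0-8311-\)(?=\d{4}-[\dx])";

    // Hypothetical helper: true if the pattern matches anywhere in the input.
    public static bool Hit(string pattern, string input)
    {
        return Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Raw ISBN with no parentheses in the text.
        Console.WriteLine(Hit(Original, "ISBN 0-8311-3169-1"));     // True
        Console.WriteLine(Hit(Escaped,  "ISBN 0-8311-3169-1"));     // False

        // Text that already contains literal parentheses.
        Console.WriteLine(Hit(Escaped,  "ISBN (0-8311-)3169-1"));   // True
        Console.WriteLine(Hit(Escaped,  "value=(0-8311-)3169-1"));  // False

        // Alternation precedence: "\d{4}-\d|X" means "\d{4}-\d" OR a bare "X",
        // whereas "\d{4}-[\dx]" always requires the four digits and the hyphen.
        Console.WriteLine(Hit(Original, "ISBN 0-8311-X"));          // True
        Console.WriteLine(Hit(Escaped,  "ISBN (0-8311-)X"));        // False
    }
}
```

    In other words, the escaped form only applies to text in which the parentheses are already present.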

  •  04-14-2006, 11:18 AM 16490 in reply to 16489

    Re: Problem with non-matching prefix in VB.net

    The above regex is based on the input you listed (see bold):

    Here is the HTML (obtained using Instant Source) for the textbox in question &lt;INPUT class=PortalStore_NormalTextBox id=dnn_ctr432_DNNDispatch_ctlMain__ctl0__ctl4_ctlPackageAddEdit_subPackageDetail_txtSKU_UPC style="WIDTH: 200px" size=28 value=(0-8311-)3169-1 name=dnn:ctr432:DNNDispatch:ctlMain:_ctl0:_ctl4:ctlPackageAddEdit:subPackageDetail:txtSKU_UPC&gt;
    (html entities munged to get past the filters)

  •  04-14-2006, 11:26 AM 16491 in reply to 16487

    Re: Problem with non-matching prefix in VB.net

     jbrinkman wrote:
    As I tried to indicate (apparently not very successfully) the parens in my previous post were caused by the my web filter triggering the regex when it shouldn't have. I am trying to prevent those parens from being inserted in this case. You previously indicated that INPUT might not put the data in the value attrib, but clearly in this HTML it is coming through in the value attribute. So I am back to the same problem with my regex not working, although everyone seems to agree that the regex string itself is ok.


    OK, it would be more helpful if you provided your input, so we can determine whether your regex is working, rather than your output after you've applied the changes.
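
    One way to check this is to run the filter over candidate pre-filter inputs and compare the results. A minimal sketch follows. Both input strings are hypothetical, since the thread never shows the pre-filter HTML, and the quoted-attribute variant is only a guess at what the server might emit (a source viewer that works from the rendered page may show attributes without their quotes).

```csharp
using System;
using System.Text.RegularExpressions;

public static class RawInputSketch
{
    // The pattern as posted at the top of the thread.
    public const string LookFor = @"(?<!value=)(0-8311-)(?=\d{4}-\d|X)";

    public static string Filter(string html)
    {
        // Wrap each matched prefix in parentheses, as described in the first post.
        return Regex.Replace(html, LookFor, m => "(" + m.Value + ")");
    }

    public static void Main()
    {
        // Hypothetical candidates for what the filter might actually receive.
        string[] candidates =
        {
            "<INPUT size=28 value=0-8311-3169-1>",
            "<INPUT size=\"28\" value=\"0-8311-3169-1\">",
        };

        foreach (string html in candidates)
        {
            Console.WriteLine(html + "  ->  " + Filter(html));
        }
    }
}
```

    With the unquoted attribute the text comes back unchanged. With the quoted one, the quote character sits between "value=" and the prefix, so the lookbehind does not block the match and the parentheses are inserted.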

    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein