
Regex and Html-- please help!

Last post 07-10-2008, 7:14 PM by Aussie Susan. 4 replies.
  •  07-08-2008, 6:21 PM 43915

    Regex and Html-- please help!

    Hi, I'm very new at regular expressions and very confused on how to use them.

    Right now I'm using ActionScript, and so far I've managed to get the source of a page back as a string. The source contains an HTML table, and I need the numbers in that table to build a table of my own. I used a regular expression to pull out everything between the two <TABLE> tags: var pattern:RegExp = /<TABLE>(.*?)<\/TABLE>/;

    But now, I'm trying to create a 2-d array and put the columns and rows from the html table into that. So does anyone know how I can do that? What code to use?

    Otherwise, does anyone know how to search through a string in code to count how many times a particular word occurs (namely the <TR> tag)? Would I use regular expressions, and if so, how?

     Please let me know if I need to explain more clearly... thanks!
     
  •  07-08-2008, 7:43 PM 43919 in reply to 43915

    Re: Regex and Html-- please help!

    Terry,

    I have done similar things with HTML pages but it is MUCH simpler to use the HTML DOM. You simply locate the table, get a collection of all rows (search for 'tr') - that's one dimension - and then for each row, get a collection of all columns (search for 'td') - that's the second dimension. All of this takes about 10-15 lines of code, depending on your programming language.

    If you really do need to use a regex, then you may need to do it in 3 passes:
    - locate the text between the  "<table>" and "</table>" tags - in much the way you have done
    - rescan the text and locate the text between each of the "<tr>" and "</tr>" tags - use the same approach as for the table
    - rescan that text and locate the text between each of the  "<td>" and "</td>" tags

    The first pass will return a single match (if there is only one table - if not, you will get multiple matches and you will need to select which one is the table you are wanting). Each of the other two scans will return multiple matches, one match for each row or column.
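
    As a rough sketch of the three passes above: the code below is written in TypeScript rather than ActionScript, on the assumption that the regex syntax carries over with only minor changes. The names innerTexts and tableToArray are made up for illustration. It uses [\s\S] instead of . so that a match can span several lines, ignores case, and allows attributes inside the opening tag. It only looks at <td> cells and takes the first table, so <th> cells and nested tables would need extra handling.

```typescript
// Hypothetical helper: returns the text between every <tag ...> and </tag> in `text`.
function innerTexts(tag: string, text: string): string[] {
  // "g" lets exec() step through successive matches, "i" ignores case.
  // \b stops "tr" from matching a longer tag name; [^>]* allows attributes.
  const pattern = new RegExp(
    "<" + tag + "\\b[^>]*>([\\s\\S]*?)</" + tag + ">",
    "gi"
  );
  const results: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    results.push(match[1]); // capture group 1 is the text between the tags
  }
  return results;
}

// Hypothetical helper: turns the first table in `html` into a 2-D array of cell text.
function tableToArray(html: string): string[][] {
  const tables = innerTexts("table", html); // pass 1: the table(s)
  if (tables.length === 0) {
    return [];
  }
  return innerTexts("tr", tables[0]) // pass 2: one entry per row
    .map((row) =>
      innerTexts("td", row) // pass 3: one entry per cell in that row
        .map((cell) => cell.trim())
    );
}

const sample =
  "<TABLE><TR><TD>1</TD><TD>2</TD></TR>\n<TR><TD>3</TD><TD>4</TD></TR></TABLE>";
const grid = tableToArray(sample);
console.log(grid);        // [ [ '1', '2' ], [ '3', '4' ] ]
console.log(grid.length); // 2 (the number of rows)
```

    The length of the outer array is the number of <TR> rows, which also covers the question about counting tags.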

    Susan 

  •  07-10-2008, 1:03 PM 43996 in reply to 43919

    Re: Regex and Html-- please help!

    Thanks, Susan.

    I think I sort of understand what you're saying...

    But what do you mean by rescan?

    And also, what do I do after I have the text between the row and column tags?
     

  •  07-10-2008, 4:47 PM 44004 in reply to 43996

    Re: Regex and Html-- please help!

    Also, I looked into the HTML DOM thing you were talking about, and it does seem easier.

    And there's this one line that seems to make sense for what I want:

    node.getElementsByTagName("tagname");
    I'm using ActionScript (I don't know if you have any background in that), so I typed this in: 

    var table1:String = str.getElementsByTagName("TABLE");

    But the line above shows an error for me. Now, am I not supposed to be searching through a string? Is that where I'm going wrong?

    Or is the place that I'm trying to put the information that's between the table tags wrong? (I'm trying to put it into a string)

    Thank you!

    T
     

  •  07-10-2008, 7:14 PM 44007 in reply to 44004

    Re: Regex and Html-- please help!

    Terry,

    By 'rescan' I mean that you will need to take the text from the match(es) you obtained previously and then use them as the text for a (nested?) call to the regex with a different pattern.
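
    To make the nesting concrete, here is a minimal sketch, again in TypeScript on the assumption that the same patterns work in ActionScript with small changes. The function name rescanRows is hypothetical. The outer loop finds each row; the text captured by that match is then fed to a second pattern in the inner loop, and each cell found is pushed into an array for that row.

```typescript
// Hypothetical function: `tableText` is the text already captured between the table tags.
function rescanRows(tableText: string): string[][] {
  const rowPattern = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
  const rows: string[][] = [];
  let rowMatch: RegExpExecArray | null;

  while ((rowMatch = rowPattern.exec(tableText)) !== null) {
    // The "rescan": the row's captured text becomes the input for the cell pattern.
    const cellPattern = /<td\b[^>]*>([\s\S]*?)<\/td>/gi;
    const cells: string[] = [];
    let cellMatch: RegExpExecArray | null;

    while ((cellMatch = cellPattern.exec(rowMatch[1])) !== null) {
      cells.push(cellMatch[1]); // store the text between <td> and </td>
    }
    rows.push(cells); // one inner array per row
  }
  return rows;
}

const rows = rescanRows(
  "<tr><td>10</td><td>20</td></tr><tr><td>30</td><td>40</td></tr>"
);
console.log(rows.length); // 2
console.log(rows[1][0]);  // 30
```

    Once the loops finish, rows[r][c] holds the text of the cell at row r and column c, which is the 2-D array asked about.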

    The use of the DOM is really off topic for this forum, so I have sent you a 'personal message'. You should find it in the inbox for your account.

    Susan 
