
table parsing

Last post 04-28-2008, 11:33 PM by BubikolRamios. 2 replies.
  •  04-23-2008, 9:01 PM 41604

    table parsing

    Hi everyone. I want to parse the product name and prices out of a web page's source code using C#. For example, from the code below I want to extract: PANASONIC BB-HCM515CE CCD MEGAPIKSEL POE VIDEO; 838,8 € + KDV; 2.081,52 YTL; and Havaleye %3 İndirim: with 2.019,07 YTL. The input is not just the part between <table> and </table>, though; the table sits inside the full source code of the website. Thanks a lot.

     

    <table border="0" width="200" height="95" cellspacing="0" cellpadding="0">
     <tr>
      <td width="200" height=30 colspan="2" class=urunadi align=center><a class=urunadi href="detay.asp?kod=463250004">
      PANASONIC BB-HCM515CE CCD MEGAPIKSEL POE VIDEO</a></td>
     </tr>
     <tr>
      <td width="97" height=90 align=center rowspan="2"><a href="detay.asp?kod=463250004"><img border="0" src="rsm.asp?p=463250004&w=90&h=90" OnError="this.src='urunresimleri/TN_fotoyok.jpg'"></a></td>
      <td width="100" height=18 class=koyukirmizi align=right valign=bottom></td>
     </tr>
     <tr>
      <td width="100" height=80 class=koyukirmizi align=center><a href="BLOCKED SCRIPT;">
      <img border="0" src="resimler/sepetekle.gif" onclick="parent.sepet.location.href='minisepet.asp?durum=sepete_ekle&kod=463250004'" align="right"></a></td>
     </tr>
     
     <tr>
      <td width="200" height=5 colspan="2" class=fiyat1 valign=top>Fiyat : <a class=fiyat2>838,8 € + KDV</a></td>
     </tr>
     <tr>
      <td width="200" height=5 colspan="2" class=fiyat1>KDV Dahil : <a class=fiyat2>2.081,52 YTL</a> </td>
     </tr>
     <tr>
      <td width="200" height=5 colspan="2" class=fiyat1>Havaleye %3 İndirim: <a class="koyukirmizi">2.019,07 YTL</a></td>
     </tr>
     
    </table>

  •  04-28-2008, 10:38 PM 41711 in reply to 41604

    Re: table parsing

    I would be very tempted to use the HTML or XML DOM to access the information. Depending on how you want to extract the text or what else is in the original source text, you could find all <table> tags, then within that find all <td> tags and look for the 'innerText' value.
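
    A minimal sketch of that parser-based approach, written in Python with the standard-library html.parser (the original question is about C#, so treat this as an illustration of the idea rather than a drop-in answer). The names CellText, cell_texts and SAMPLE are made up for this example, and SAMPLE is a trimmed copy of the posted table.

```python
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Collect the text of every <td> found inside a <table>."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0   # > 0 while inside <table>...</table>
        self.in_cell = False   # True while inside <td>...</td>
        self.cells = []        # one string per <td>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "td" and self.table_depth:
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth:
            self.table_depth -= 1
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Text inside nested tags such as <a> is still part of the cell.
        if self.in_cell:
            self.cells[-1] += data


def cell_texts(html):
    parser = CellText()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace and drop cells that contain no text.
    texts = [" ".join(cell.split()) for cell in parser.cells]
    return [t for t in texts if t]


# Trimmed copy of the table from the question (image rows left out).
SAMPLE = """
<table border="0" width="200">
 <tr>
  <td class=urunadi align=center><a class=urunadi href="detay.asp?kod=463250004">
  PANASONIC BB-HCM515CE CCD MEGAPIKSEL POE VIDEO</a></td>
 </tr>
 <tr>
  <td class=koyukirmizi align=right valign=bottom></td>
 </tr>
 <tr>
  <td class=fiyat1 valign=top>Fiyat : <a class=fiyat2>838,8 € + KDV</a></td>
 </tr>
 <tr>
  <td class=fiyat1>KDV Dahil : <a class=fiyat2>2.081,52 YTL</a> </td>
 </tr>
 <tr>
  <td class=fiyat1>Havaleye %3 İndirim: <a class="koyukirmizi">2.019,07 YTL</a></td>
 </tr>
</table>
"""

if __name__ == "__main__":
    for text in cell_texts(SAMPLE):
        print(text)
```

    Each cell comes back as one string (label and value together), so the product name and the three price lines can be picked out by position or by their label.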

    If you really need a regex solution, you could try

    <[^>]*>|\s+|([^<]*)

    which will find all text that is not inside a tag. You will actually get a lot of matches, but if you look at match group #1, any non-null value will be the text you are after. If you need to limit the search to only be within the <table> tags, then use

    <table[^>]*>(<[^>]*>|\s+|([^<]*))+</table>

    and look at match group #2. Again, depending on the actual structure of the source text, you may need to ignore any null text matches in this group, but there should only be a few.
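
    A minimal sketch of how the first pattern behaves, in Python rather than C#; the pattern itself is unchanged. One caveat: Python's re module keeps only the last capture of a repeated group, so group #2 of the second pattern cannot be read back match by match there. This sketch therefore swaps in a two-step approach: a separate, made-up TABLE pattern isolates each table, then the first pattern runs inside it. The function names and the SAMPLE text (including the paragraph outside the table) are assumptions for illustration.

```python
import re

# First pattern from the post: a tag, a run of whitespace, or (group 1) text.
TEXT_OUTSIDE_TAGS = re.compile(r"<[^>]*>|\s+|([^<]*)")

# Hypothetical helper pattern (not from the post): one whole <table> element.
# It does not handle nested tables.
TABLE = re.compile(r"<table[^>]*>.*?</table>", re.IGNORECASE | re.DOTALL)


def text_outside_tags(html):
    """Return every non-empty group-1 match, with trailing spaces removed."""
    found = []
    for match in TEXT_OUTSIDE_TAGS.finditer(html):
        text = match.group(1)  # None when a tag or whitespace matched
        if text and text.strip():
            found.append(text.strip())
    return found


def text_in_tables(html):
    """Apply the pattern only to the text between <table> and </table>."""
    texts = []
    for table in TABLE.finditer(html):
        texts.extend(text_outside_tags(table.group(0)))
    return texts


# Cut-down version of the posted table, plus one made-up line outside it.
SAMPLE = """
<p>outside the table</p>
<table border="0">
 <tr><td class=urunadi><a class=urunadi href="detay.asp?kod=463250004">
  PANASONIC BB-HCM515CE CCD MEGAPIKSEL POE VIDEO</a></td></tr>
 <tr><td class=fiyat1>Fiyat : <a class=fiyat2>838,8 € + KDV</a></td></tr>
 <tr><td class=fiyat1>KDV Dahil : <a class=fiyat2>2.081,52 YTL</a> </td></tr>
</table>
"""

if __name__ == "__main__":
    for text in text_in_tables(SAMPLE):
        print(text)
```

    Note that the label and the value arrive as separate matches here ("Fiyat :" and then "838,8 € + KDV"), because the <a> tag sits between them.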

    Susan 

  •  04-28-2008, 11:33 PM 41715 in reply to 41711

    Re: table parsing

    Regex comes in at the end of the job, and even then only maybe. Google for org 'apache html parser'. It is in Java, but the syntax is C-like, so you will get the idea.
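
    To illustrate "regex at the end of the job": once a parser has handed back plain strings such as "KDV Dahil : 2.081,52 YTL", a small pattern can split out the amount and the currency. This is only a sketch in Python; PRICE and parse_price are made-up names, and the pattern assumes the number format seen in the posted page ('.' for thousands, ',' for decimals, followed by € or YTL).

```python
import re
from decimal import Decimal

# Amount with optional '.' thousands groups and an optional ',' decimal part,
# followed by one of the two currency tokens seen in the posted page.
PRICE = re.compile(r"(\d{1,3}(?:\.\d{3})*(?:,\d+)?)\s*(€|YTL)")


def parse_price(text):
    """Return (amount, currency), or None if the text holds no price."""
    match = PRICE.search(text)
    if match is None:
        return None
    # Drop the thousands separators, then switch the decimal comma to a point.
    amount = match.group(1).replace(".", "").replace(",", ".")
    return Decimal(amount), match.group(2)


if __name__ == "__main__":
    for line in ("Fiyat : 838,8 € + KDV",
                 "KDV Dahil : 2.081,52 YTL",
                 "Havaleye %3 İndirim: 2.019,07 YTL"):
        print(parse_price(line))
```

    The "%3" in the last line is skipped because it is not followed by a currency token.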