My script parses a game website for information about players in clans. My pattern is not pretty, but it generally works. The problem is that the HTML I am matching against is sometimes inconsistent.
$pattern = '|<span[^>]*><li>\s*([^:]+):\s*</span><a[^>]*>([a-zA-Z]+)\s*\((.+?)\)\s*"([^"]*)"</a><img\s+.*?alt\s*=\s*"([^"]*)"[^>]*>( )*|si';
The following is a sample of what I am matching and which parts I am trying to pull. In line 4 of the sample there is no match4 (the quoted text), so the match runs on into the next line. As a result, that match row is made up of half of line 4 and half of line 5. I would like to either pull match4 or get null in its place, but I haven't found anything that can help me do this. Any help would be greatly appreciated.
<span class="big"><LI> match1: </span><a class="link" href="http://users.nexustk.com/?name=jelia" target="_new">match2 (match3) "match4"</a><IMG SRC="buttongreen.gif" WIDTH="13" HEIGHT="13" ALT="match5">match6<BR>
<span class="big"><LI> Kindred: </span><a class="link" href="http://users.nexustk.com/?name=jelia" target="_new">Jelia (Swift - Ee San) "Paladin"</a><IMG SRC="buttongreen.gif" WIDTH="13" HEIGHT="13" ALT="Active"> <BR>
<span class="big"><LI> Kindred: </span><a class="link" href="http://users.nexustk.com/?name=kailee" target="_new">KaiLee (Chung Ryong - Level 99) "Ascendant"</a><IMG SRC="buttongreen.gif" WIDTH="13" HEIGHT="13" ALT="Active"> <BR>
<span class="big"><LI> Kindred: </span><a class="link" href="http://users.nexustk.com/?name=kaufman" target="_new">Kaufman (Rogue - Level 66) </a><IMG SRC="buttonyellow.gif" WIDTH="13" HEIGHT="13" ALT="Inactive"><IMG SRC="Notreg.gif" WIDTH="15" HEIGHT="12" ALT="Unregistered"><BR>
<span class="big"><LI> Kindred: </span><a class="link" href="http://users.nexustk.com/?name=kawakami" target="_new">Kawakami (Baekho - Level 99) "Apprentice"</a><IMG SRC="buttonred.gif" WIDTH="13" HEIGHT="13" ALT="Absent"><IMG SRC="Notreg.gif" WIDTH="15" HEIGHT="12" ALT="Unregistered"><BR>
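To make the failure concrete, here is a minimal sketch that should reproduce it. The pattern is the one above, unchanged. The $html variable, the $rows variable, and the choice of just three sample rows are assumptions for illustration and are not taken from my real script.

```php
<?php
// Pattern copied unchanged from above.
$pattern = '|<span[^>]*><li>\s*([^:]+):\s*</span><a[^>]*>([a-zA-Z]+)\s*\((.+?)\)\s*"([^"]*)"</a><img\s+.*?alt\s*=\s*"([^"]*)"[^>]*>( )*|si';

// Three of the sample rows; the Kaufman row has no quoted text before </a>.
// $html is a hypothetical stand-in for the fetched page.
$html = <<<'HTML'
<span class="big"><LI> Kindred: </span><a class="link" href="http://users.nexustk.com/?name=kailee" target="_new">KaiLee (Chung Ryong - Level 99) "Ascendant"</a><IMG SRC="buttongreen.gif" WIDTH="13" HEIGHT="13" ALT="Active"> <BR>
<span class="big"><LI> Kindred: </span><a class="link" href="http://users.nexustk.com/?name=kaufman" target="_new">Kaufman (Rogue - Level 66) </a><IMG SRC="buttonyellow.gif" WIDTH="13" HEIGHT="13" ALT="Inactive"><IMG SRC="Notreg.gif" WIDTH="15" HEIGHT="12" ALT="Unregistered"><BR>
<span class="big"><LI> Kindred: </span><a class="link" href="http://users.nexustk.com/?name=kawakami" target="_new">Kawakami (Baekho - Level 99) "Apprentice"</a><IMG SRC="buttonred.gif" WIDTH="13" HEIGHT="13" ALT="Absent"><IMG SRC="Notreg.gif" WIDTH="15" HEIGHT="12" ALT="Unregistered"><BR>
HTML;

// PREG_SET_ORDER gives one array per matched row: [0] is the full match,
// [1]..[5] are match1..match5.
preg_match_all($pattern, $html, $rows, PREG_SET_ORDER);

foreach ($rows as $i => $row) {
    echo "Row $i\n";
    echo "  match2: {$row[2]}\n";
    echo "  match3: {$row[3]}\n";
    echo "  match4: {$row[4]}\n";
    echo "  match5: {$row[5]}\n";
}
```

With these three rows I get two match rows rather than three. The second one starts at Kaufman, but because the "s" flag lets (.+?) cross the line break, match3 keeps growing until it reaches the quoted text on the Kawakami line, so match4 and match5 come from Kawakami instead.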