
PHP Substring Equivalent in Regex

Last post 12-04-2007, 2:20 AM by ccisystems. 13 replies.
  •  12-02-2007, 3:09 PM 37200

    PHP Substring Equivalent in Regex

    Hello, I am building some custom urls for finding images online. These are like:

     www.website.com/images/123456789.jpg

     Usually these are isbns or upc codes for products or books.

    I am currently doing string replacement in PHP: the URL holds a placeholder in place of the number, and when I do a search I replace that placeholder with the actual number.

    For example

    www.website.com/images/{REPLACEMENT}.jpg

     and then I replace this with the actual product number.

    This works fine but some sites store the numbers as such:

    www.website.com/images/978/9789731234.jpg

    or

    www.website.com/images/1/2/3/123456789.jpg

    In this case, I must create the query with the first, second, third character of the search number and THEN the number + .jpg

    This works fine with PHP, and there are multiple possible combinations. Since I want to make this script database-driven, I obviously can't store in the database the substr function call that I use in PHP. How can I write a regex to help with this?

    So, for example on this website:

    http://isbn.abebooks.com/mz/39/67/0679728139.jpg

    this book has the last two numbers, then the first two, without the 0 and then the isbn itself.

    In the database I want to store something like:

    http://isbn.abebooks.com/mz/[substr|-2|2]/[substr|1|2]/[str].jpg

    and then my code would replace these things between [ and ] with the appropriate characters in a query.
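    A minimal sketch of how such bracketed placeholders could be expanded, assuming the [str] and [substr|start|length] syntax proposed above. The function name expandTemplate is hypothetical, and the arguments are passed straight to PHP's substr, so a negative start counts from the end of the string:

```php
<?php
// Expand [str] and [substr|start|length] placeholders in a stored URL template.
// The placeholder syntax follows the post above; expandTemplate is a hypothetical name.
function expandTemplate(string $template, string $code): string
{
    return preg_replace_callback(
        '/\[(str|substr\|(-?\d+)\|(\d+))\]/',
        function (array $m) use ($code): string {
            if ($m[1] === 'str') {
                return $code; // [str] stands for the whole code
            }
            // [substr|start|length] maps directly onto substr()
            return substr($code, (int) $m[2], (int) $m[3]);
        },
        $template
    );
}

$url = expandTemplate(
    'http://isbn.abebooks.com/mz/[substr|-2|2]/[substr|1|2]/[str].jpg',
    '0679728139'
);
echo $url, "\n"; // prints http://isbn.abebooks.com/mz/39/67/0679728139.jpg
```

    Here the regex only locates the placeholders; the substring work itself is still done by substr in the callback.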

    Any help would be so much appreciated.

    Thanks

  •  12-02-2007, 8:00 PM 37208 in reply to 37200

    Re: PHP Substring Equivalent in Regex

    If I understand correctly you want to start out with string such as:

    www.website.com/images/978/9789731234.jpg

    And, without additional code, you would like a regex process to determine what substrings of the filename, if any, are in the folder pathnames above the filename?  That is not possible without additional code, and even with additional code you will have problems when you attempt to decode strings such as:

    www.website.com/images/9/7/9789731234.jpg

    Notice that there are two 97's found in the filename, which would cause a problem for code attempting to do this in an automated fashion.  You would need to discard any such matches and only attempt to decipher matches without path name duplicates in the filename.  Still, additional code would be required for all of this.


  •  12-02-2007, 8:23 PM 37210 in reply to 37200

    Re: PHP Substring Equivalent in Regex

    This is NOT an attempt to solve the overall problem (I don't think that it has been properly specified in your post) but as a first step in the right direction, try:

    0*(\d+)(\d{2})(\d{2})

    when applied to the source text

    123456789
    0679728139
    9789731234

    and with the replacement text

    http://isbn.abebooks.com/mz/$3/$2/$0

    results in

    http://isbn.abebooks.com/mz/89/67/123456789
    http://isbn.abebooks.com/mz/39/81/0679728139
    http://isbn.abebooks.com/mz/34/12/9789731234

    I have used Expresso (a .NET based rather than php based regex engine) for this, but I think my pattern would be "php-compatible".
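    As a quick check of the "php-compatible" guess, here is a sketch of the same pattern and replacement text run through PHP's preg_replace, where $0 also refers to the whole match. The variable names are only illustrative:

```php
<?php
// Susan's pattern and replacement text from the post above, applied with PCRE.
$pattern     = '/0*(\d+)(\d{2})(\d{2})/';
$replacement = 'http://isbn.abebooks.com/mz/$3/$2/$0';

$results = [];
foreach (['123456789', '0679728139', '9789731234'] as $code) {
    // \d+ is greedy but gives back four digits so that groups 2 and 3 can match.
    $results[] = preg_replace($pattern, $replacement, $code);
}
print_r($results);
```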

    As to how you would store this in your database I'm not sure.

    Are the same ISBN's (I assume that's what the source numbers are) applied to multiple stores with each store using a different format? (What are the rules in forming an ISBN)

     

    Susan

     

  •  12-02-2007, 8:49 PM 37213 in reply to 37210

    Re: PHP Substring Equivalent in Regex

    Aussie Susan, I agree that more information is needed, but the asker did say:

    So, for example on this website:

    http://isbn.abebooks.com/mz/39/67/0679728139.jpg

    this book has the last two numbers, then the first two, without the 0 and then the isbn itself.

    matching with:

    $pattern = '/0*(\d{2})\d*(\d{2})/';

    replacing with:

    $repl = 'http://isbn.abebooks.com/mz/$2/$1/$0.jpg';

    results in:

    http://isbn.abebooks.com/mz/39/67/0679728139.jpg

    But the bigger issue is how you dynamically construct the pattern to match with, and subsequently the pattern to replace with; you would have to use substrings in code, or possibly DFA regex matching (with which I am not familiar), to accomplish that.


  •  12-02-2007, 9:35 PM 37214 in reply to 37213

    Re: PHP Substring Equivalent in Regex

    ...and they also said

    This works fine but some sites store the numbers as such:

    www.website.com/images/978/9789731234.jpg

    or

    www.website.com/images/1/2/3/123456789.jpg

    which would need a different pattern and replacement text.

    However, I'm not clear from the original question if the source value is only the ISBN and how it is related to the various template URL's etc etc etc.

    I was really trying to get the overall 'conversation' going so that we can get the full requirements (the business analyst is coming out in  me!)

    As for DFA vs NFA matching, they are really the same thing except that, with a DFA, you always have a single option to move to the next state (or there is an error in the source you are trying to match) and you therefore never have to backtrack. With an NFA, there could be more than one valid option leaving one state to another, and if the one you initially selected proves to be wrong, then you need to backtrack so that you can try the next. Therefore I feel that the "the bigger issue" (to use your phrase) is what is the OP trying to achieve, and then the 'how' (string substitutions, regex's [NFA or DFA] or whatever else) will follow.

    Susan

     

  •  12-02-2007, 9:45 PM 37215 in reply to 37214

    Re: PHP Substring Equivalent in Regex

    I guess I read the question to be how one could use substring-like regex functions (that do not exist) to find substrings of an ISBN-based image filename that would match paths in that file path without having to use code outside of regex.  But that could have been a bad read of the question; having the asker explain their actual intention further is warranted.
  •  12-03-2007, 1:13 AM 37218 in reply to 37215

    Re: PHP Substring Equivalent in Regex

    Dear Friends,

     Thanks for all your feedback. I am grateful for the attempts, but the problem is the other way around. You all seem to be trying to match the number IN the url, whereas in my case I already KNOW the number, and I want to properly construct the URL to fetch the image for that ISBN.

     This is what I am trying to achieve:

    I have built a few small websites that sell books. The book databases that feed these websites are coming from regular POS systems, and therefore contain product information such as titles, barcodes, isbn numbers (most of them) and price info. However, that is not sufficient for presentation on a website. While the website owners will be able to add text descriptions and other info, it takes a long time to scan images or hunt them down on the web and add them to the website.

    So, I decided to give them a hand and I have done some research and have come up with about 60 different book-selling websites that save/construct urls for their images involving the ISBN of the book.

    Currently I have a programmatic solution, in which I have these pre-defined urls with their substitution rules (str_replace and substring), like so:

    $urls[] = array("http://ec2.images-amazon.com/images/P/".$against.".02._SS500_SCLZZZZZZZ_V58694805_.jpg", "AmazonEC2");
    $urls[] = array("http://www.cokesbury.com/products/5.0/".$against.".jpg", "Cokesbury");
    $urls[] = array("http://www.inspire4less.com//productimages/".substr($against, 0, 2)."/".$against.".JPG", "Inspire4Less");
    $urls[] = array("http://www.bookfinder4u.com/noindex/show_images_isbnsearch.aspx?size=l&isbn={$against}", "BookFinder4U");
    $urls[] = array("http://img.textbookx.com/images/large/".substr($against, -2, 2)."/".$against.".jpg", "TextBookX");

    So, I have about 60 elements that get added to the $urls array, all with their respective rules and substring replacements. $against is my query (isbn) number.

    In my program, when I want to find the cover of a book, I run this method and pass the ISBN of the particular book as parameter. Then, I loop through the elements of the array, and for each iteration, I replace that $against in the url with the actual ISBN (or fragments of it). Then I try that url to see if the image does exist and fetch it.
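    The loop described above might look roughly like the following sketch. The function name findCover and the injected $exists check are assumptions made for illustration; the real existence test (for example an HTTP request) is not shown in the post, so a stand-in is used here:

```php
<?php
// Return the first candidate whose image exists, or null if none does.
// $exists is a hypothetical injected check; in practice it might issue an HTTP request.
function findCover(array $urls, callable $exists): ?array
{
    foreach ($urls as [$url, $source]) {
        if ($exists($url)) {
            return ['url' => $url, 'source' => $source];
        }
    }
    return null;
}

$against = '0679728139';
$urls = [];
$urls[] = ["http://www.cokesbury.com/products/5.0/" . $against . ".jpg", "Cokesbury"];
$urls[] = ["http://img.textbookx.com/images/large/" . substr($against, -2, 2) . "/" . $against . ".jpg", "TextBookX"];

// Stand-in check for demonstration only: pretend that just the TextBookX image exists.
$fakeExists = function (string $url): bool {
    return strpos($url, 'textbookx') !== false;
};

$found = findCover($urls, $fakeExists);
echo $found['source'], ': ', $found['url'], "\n";
```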

    Now, this works perfectly programmatically, but I would like to move those urls (before construction) IN the database, so that I can easily add new ones, edit them and so on.

    While this is very simple for urls that only contain the WHOLE ISBN in the replacement, I do not know how to do it for urls that must also manipulate that isbn and use only two characters or one character and so on.

     So, while it is easy, in the database, to put the following:

    http://www.bookfinder4u.com/noindex/show_images_isbnsearch.aspx?size=l&isbn={$against}

    in some form like this:

    http://www.bookfinder4u.com/noindex/show_images_isbnsearch.aspx?size=l&isbn=[ISBN]

    and at runtime replace [ISBN] with the actual number I am searching for, 

    I do not know how to also construct a substitutable variant for the PHP substring equivalents, such as this one:

    http://img.textbookx.com/images/large/".substr($against, -2, 2)."/".$against.".jpg

    Here, the ISBN is used in its entirety, but also, a substring of it, namely the last two characters, so two replacements must be done, one with the whole isbn and one with a fragment.

    I do not know how to construct a rule for these (I suppose something very similar to the substr function) and, furthermore, how to write a regex to replace these at run time, so that when I need to add a new url for parsing or edit one, I would not have to edit the source code of my program.

    For example I am thinking I could do this, in the DB:

    http://img.textbookx.com/images/large/[substr($against, -2, 2)]/[$against].jpg

    and at runtime, use the regex to match and replace [substr($against,-2,2)] with the last characters and [$against] with the entire number. There can be more variants, so there must be a standard way, so that while I maintain the same "syntax", the regex would match the correct characters.

    I hope this clarifies things. Hope I did not confuse even more.

    [EDIT]: I guess now that I think of it, and as has been mentioned, maybe this is not even possible without extra programmatic manipulation. Maybe there can be a way to bring it closer. But I think as it stands right now, I guess my question really is: how to construct a substr(STRING, Start, CharNum) function equivalent AND the appropriate rules for it. My examples above include something like [$against] and [substr($against, 1, 1)] but I don't necessarily have to use the [ and ] characters. Any way this can be made so that it works, would help. The PHP substr function is explained here:

    http://www.php.net/substr

    and the str_replace, here:

    http://www.php.net/str_replace

     Note that substr uses - (minus) to denote characters starting at the end of the string, and this must be supported since some of my examples use this also.
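    For reference, a small sketch of the substr behaviour that any stored rule would need to reproduce, using the ISBN from the earlier abebooks example. The variable names are only illustrative:

```php
<?php
$isbn = '0679728139';

$firstTwo  = substr($isbn, 0, 2);   // "06": offsets start at 0
$afterZero = substr($isbn, 1, 2);   // "67": skip the leading 0
$lastTwo   = substr($isbn, -2, 2);  // "39": a negative start counts from the end
$thirdLast = substr($isbn, -3, 1);  // "1":  one character, three from the end

echo "$firstTwo $afterZero $lastTwo $thirdLast\n";
```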

  •  12-03-2007, 12:33 PM 37235 in reply to 37218

    Re: PHP Substring Equivalent in Regex

    Let's say you know the ISBN is:

    0123456789

    match that with:

    $pattern = '/0*(\d{2})\d*(\d{2})/';

    replacing with:

    $repl = 'http://isbn.abebooks.com/mz/$2/$1/$0.jpg';

    Each website would require a unique $pattern/$repl pair.
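    A sketch of what per-site pairs could look like once they are stored as data. The array below stands in for database rows and its layout is hypothetical; the TextBookX and Inspire4Less patterns are my own guesses at regex equivalents of the substr calls quoted earlier in the thread, not patterns taken from those sites:

```php
<?php
// Each entry stands in for a database row: [pattern, replacement].
$sites = [
    'AbeBooks'     => ['/0*(\d{2})\d*(\d{2})/', 'http://isbn.abebooks.com/mz/$2/$1/$0.jpg'],
    // Last two characters, like substr($against, -2, 2)
    'TextBookX'    => ['/^.*(..)$/', 'http://img.textbookx.com/images/large/$1/$0.jpg'],
    // First two characters, like substr($against, 0, 2)
    'Inspire4Less' => ['/^(..).*$/', 'http://www.inspire4less.com//productimages/$1/$0.JPG'],
];

$isbn  = '0679728139';
$built = [];
foreach ($sites as $name => [$pattern, $repl]) {
    // $0 in the replacement is the whole match, i.e. the full code.
    $built[$name] = preg_replace($pattern, $repl, $isbn);
}
print_r($built);
```

    Adding a site then means adding a row rather than editing code, at the cost of writing one regex per URL layout.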


  •  12-03-2007, 1:02 PM 37236 in reply to 37235

    Re: PHP Substring Equivalent in Regex

    ddrudik:

    Let's say you know the ISBN is:

    0123456789

    match that with:

    $pattern = '/0*(\d{2})\d*(\d{2})/';

    replacing with:

    $repl = 'http://isbn.abebooks.com/mz/$2/$1/$0.jpg';

    Each website would require a unique $pattern/$repl pair.

    I don't have time to really get into this discussion right now, but just FYI: I know that is just a sample, but that pattern doesn't really handle all ISBNs. How strict or relaxed you make a more accurate pattern can affect the grouping, but the general idea is sound.

    Second, to ccisystems: does all this account for the new ISBN format, or just the old, or both?

     


    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein
  •  12-03-2007, 1:17 PM 37237 in reply to 37236

    Re: PHP Substring Equivalent in Regex

    // does all this account for the new ISBN format or just the old? or both?

    Well, the isbn was just an example, and the rules should not care about whether the string to be added is ISBN, Barcode (UPC, etc.) or simply just another string. There are websites that store images with the code (proprietary) which is public, so I do have access to it, such as: KMCD12345 or something.

    The rules should just help replace string according to the rules. This is tricky, and like I said, I don't know whether it's actually possible without extra processing.

    Thanks for the feedback. 

  •  12-03-2007, 1:46 PM 37240 in reply to 37237

    Re: PHP Substring Equivalent in Regex

    I haven't had time to look at this too closely, but it sounds like a simple regex replace is all you really need.  If your string is simply alphanumerics and you want parts of the match as well as the whole, that is fairly simple.  But if your substring parts vary in length by string type, you'd likely need either different patterns or different replacement strings; it all depends on what you are trying to return.  If I have some time later I'll look at this more closely, but I think the others have supplied you with viable approaches to your problem.

    Michael

  •  12-03-2007, 5:31 PM 37248 in reply to 37218

    Re: PHP Substring Equivalent in Regex

    I'm not sure that we DO have the problem the wrong way around, but we are dealing with the limited information (now expanded - thanks) about your problem and so we were all trying to indicate possible directions to take, leaving you to take the ideas and expand on them.

    I admit that I don't know PHP, but according to http://php.net/eval there is an 'eval' statement that takes a string and interprets it as PHP code. From what you have said in your explanation, you can do what you need to with the various string substitution commands - why not store those commands in the database and then 'eval' them as required?
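    A minimal sketch of that idea, assuming the database column holds a PHP expression that refers to $against. The function name buildUrlWithEval is hypothetical. Note that eval runs whatever code the string contains, so this only makes sense if everything stored in the database is fully trusted:

```php
<?php
// Evaluate a stored PHP expression with $against in scope and return the resulting URL.
// buildUrlWithEval is a hypothetical helper; eval() executes arbitrary code from the string.
function buildUrlWithEval(string $expression, string $against): string
{
    return eval('return ' . $expression . ';');
}

// What a stored row might contain (single-quoted here so nothing is interpolated early).
$stored = '"http://img.textbookx.com/images/large/" . substr($against, -2, 2) . "/" . $against . ".jpg"';

$evalUrl = buildUrlWithEval($stored, '0679728139');
echo $evalUrl, "\n"; // prints http://img.textbookx.com/images/large/39/0679728139.jpg
```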

    Just a thought!

    Susan 

    PS: are there copyright implications in taking book cover images from commercial sites for your own commercial use? 

  •  12-03-2007, 6:52 PM 37249 in reply to 37248

    Re: PHP Substring Equivalent in Regex

    OK, after reading this a little more closely, I do think this is just a simple regex replace. The replacement string would vary, but they could all use the same regex.  You would just have to figure out which replacement string to apply to which input.  ddrudik's recommendation is basically what you'd want to do regex-wise.

    Michael

  •  12-04-2007, 2:20 AM 37259 in reply to 37248

    Re: PHP Substring Equivalent in Regex

    Thank you all for your great feedback. I must admit I have not thought about the eval possibility. I will try that and see what happens. To me this seems the best approach, if it works, since I have the full power of the PHP function substr and if need be I can add more processing later. Also it is easier since I have very little experience with regex, but no doubt a regex solution would have been a "cleaner", "smarter" one, but in this case it's not worth persisting too much on it.

    As far as copyright, my opinion is that a book cover, used for promotional use is what people call "fair-use" - whether I scan the book in hand or save an image from whatever website, search engine, etc., is really the same thing. 
