lookbehind simulation for app with no +ve lookbehind support

Last post 11-25-2007, 2:20 AM by scubamut. 8 replies.
  •  11-22-2007, 10:56 AM 36786

    lookbehind simulation for app with no +ve lookbehind support

    I'm using WOA pdf-Excel to extract values from a table. An example of a line in the table would be, say:

    itemname  12,123  23,456  34,789

    With lookbehind, I can easily extract the numbers using the regex (?<=itemname\s+(\d+,\d+\s+){n})\d+,\d+ where n = 0, 1 or 2 selects the first, second or third number.

    But, there's no lookbehind support in WOA pdf-Excel, so (as a very inexperienced newbie) I need some expert advice as to how to do this - I've been hugely unsuccessful to date!

    Thanks to anyone who can assist.
     

  •  11-22-2007, 1:18 PM 36790 in reply to 36786

    Re: lookbehind simulation for app with no +ve lookbehind support

    What do you need the lookbehind for? The regex

    \d+,\d+

    will extract numbers like

    12,123

    from input 

    12,123  23,456  34,789

    just fine. Please explain again what the problem is.

  •  11-23-2007, 10:47 AM 36817 in reply to 36790

    Re: lookbehind simulation for app with no +ve lookbehind support

    Sorry, obviously didn't explain well enough.

    The PDF table consists of many lines of data like the one described. The first column says what the data is ("itemname" in my example), and each row has a different itemname. I am only interested in extracting the data from the single row with the itemname I need. I also need to extract that same row from the same table in about 100 PDFs in a batch, so extracting everything and sorting it out in Excel would be messy. I need to do this every month, so it's very important that the process is automated.

    In short, I want a regex that extracts just the row of data named "itemname", with each number in the row going into a separate Excel column.

    I hope this makes it clearer, and that someone can help!
     

  •  11-23-2007, 11:47 AM 36819 in reply to 36817

    Re: lookbehind simulation for app with no +ve lookbehind support

    I'm not familiar with the regex engine you are using, so I'll show you a generic approach to the problem as I see it. To extract N numbers after the field name *itemname* you can use

    (?<=itemname)(\s+\d+,\d+)+

    against your sample input:

    itemname  12,123  23,456  34,789

    The numbers will be captured as N captures of the group in your match (this assumes an engine, such as .NET, that keeps every capture of a repeated group):

      12,123  23,456  34,789

    You'll have to access all the captures in your Match object programmatically, then Trim() the leading white space from the strings and cast them to a numeric data type:

    12,123 

    23,456 

    34,789

    If you don't want to Trim(), you can use this regex, which captures the numbers directly:

    (?<=itemname)(?:\s+(\d+,\d+))+

    For that to work, your regex engine must support non-capturing groups, (?: ... ).

  •  11-23-2007, 12:25 PM 36820 in reply to 36819

    Re: lookbehind simulation for app with no +ve lookbehind support

    You can try something like this:

     itemname\s+(?:(\d+,\d+)\s*)*

     I don't know what degree of control you have on your groups or matches afterwards.
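
One caveat for the repeated-group patterns above: engines differ in what a repeated capturing group returns. .NET keeps every iteration in the group's capture collection, but many engines keep only the last one. The sketch below uses Python purely for illustration (its `re` module stands in for whatever engine the add-in embeds, which is an assumption) and contrasts a repeated group with one explicit group per column:

```python
import re

line = "itemname  12,123  23,456  34,789"

# A repeated capturing group: Python's re keeps only the last iteration.
repeated = re.search(r"itemname\s+(?:(\d+,\d+)\s*)*", line)
last_only = repeated.group(1)

# One explicit group per column returns every number, with no lookbehind.
explicit = re.search(r"itemname\s+(\d+,\d+)\s+(\d+,\d+)\s+(\d+,\d+)", line)
all_three = explicit.groups()

print(last_only)   # 34,789
print(all_three)   # ('12,123', '23,456', '34,789')
```

If the tool only exposes numbered groups, the explicit form is likely the safer choice when the number of columns is fixed.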

  •  11-23-2007, 7:29 PM 36824 in reply to 36817

    Re: lookbehind simulation for app with no +ve lookbehind support

    That add-in does not support (?<= but seems to support other pattern constructs. You might do best to ask the author how to use it: dwilkie@gmail.com


  •  11-24-2007, 12:58 AM 36826 in reply to 36819

    Re: lookbehind simulation for app with no +ve lookbehind support

    This is exactly my problem: the regex engine does not support the (?<=....) construct. I need a way to simulate it.
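
A common way to simulate a lookbehind in an engine that lacks one is to let the pattern consume the prefix and wrap only the wanted part in a capturing group, then read that group instead of the whole match. The following is a minimal sketch in Python, for illustration only. The helper name `nth_number` is hypothetical, and whether WOA pdf-Excel exposes capturing groups in its output is an assumption that this thread does not confirm.

```python
import re

def nth_number(line, itemname, n):
    # Consume the row label and the n numbers to skip (non-capturing),
    # then capture only the wanted number in group 1.
    pattern = re.escape(itemname) + r"\s+(?:\d+,\d+\s+){%d}(\d+,\d+)" % n
    match = re.search(pattern, line)
    return match.group(1) if match else None

row = "itemname  12,123  23,456  34,789"
numbers = [nth_number(row, "itemname", n) for n in range(3)]
print(numbers)   # ['12,123', '23,456', '34,789']
```

The overall match now includes the label and the skipped numbers, so this only helps if the tool lets you pick group 1 rather than the full match.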
  •  11-24-2007, 6:54 PM 36830 in reply to 36826

    Re: lookbehind simulation for app with no +ve lookbehind support

    I would then recommend that you just use the app to get the lines of data, and use regex within Excel to split each line (held in a single column) into multiple columns.

    If, in your app, you use a pattern such as:

    itemname.*?(?=\r|$)

    you should end up with a column A whose cells look like:

    itemname    10,000  12,222  8,999

    Then you could construct a macro to select column A that would read the cells and run this regex routine against them to split them into B,C,D, etc.:

    Sub RegExp_Late()
    'Late binding
    'Dimension the RegExp objects
        Dim RegEx As Object, RegMatchCollection As Object, RegMatch As Object
        Dim C As Range, i As Long

        ' create the RegExp Object with late binding
        Set RegEx = CreateObject("vbscript.regexp")

        ' set the RegExp parameters
        With RegEx
            'look for global matches
            .Global = True
            .Pattern = "[^ ]+"
        End With

        ' loop over each selected cell and write its matches into the columns to its right
        For Each C In Selection
            i = 0
            Set RegMatchCollection = RegEx.Execute(C.Value)
            For Each RegMatch In RegMatchCollection
                i = i + 1
                C.Offset(0, i).Value = RegMatch.Value
            Next
        Next

        Set RegMatchCollection = Nothing
        Set RegEx = Nothing
    End Sub

    That should result in:

    A1:itemname    10,000  12,222  8,999

    B1:itemname

    C1:10,000

    D1:12,222

    E1:8,999

    Further help with Excel macros would be outside the scope of this forum; I provided the macro code as an example of what I would do.
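
The two-stage approach above can also be sketched outside Excel. The Python below is illustrative only and is not part of the add-in or the macro; the `extract_row` helper and the sample text are hypothetical. Stage one selects the line that starts with the label, and stage two splits that line on whitespace.

```python
import re

def extract_row(text, itemname):
    # Stage 1: grab the whole line that starts with the label (no lookbehind needed).
    match = re.search(r"^" + re.escape(itemname) + r"\b.*$", text, re.MULTILINE)
    if match is None:
        return None
    # Stage 2: split the line into cells on runs of whitespace.
    return match.group(0).split()

page = "otheritem  1,000  2,000  3,000\nitemname    10,000  12,222  8,999\n"
cells = extract_row(page, "itemname")
print(cells)   # ['itemname', '10,000', '12,222', '8,999']
```

Unlike the macro's `[^ ]+` pattern, `split()` here also treats tabs as separators, which may or may not match how the PDF text is extracted.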


  •  11-25-2007, 2:20 AM 36842 in reply to 36830

    Re: lookbehind simulation for app with no +ve lookbehind support

    Many thanks for this. It looks like I'll need to do it in two stages, as suggested.