
finding substring from index a to b

  •  09-27-2007, 4:10 PM


    Hi, I tried searching the forums but could not find anything on this topic.

     

    What I want to do is simple: given two indices i and j, return the substring that lies between them.

     

    My sample input.txt file is:

     

    ---------------------------------------------------------------------

    >chromosome_1..............1,000,000-2,000,000

    AGGCATAGAATTAGACATAAGGTCTCTTCT

    AGAAGGGGTTATCCTCAGGCTTGAGGCAT

    --------------------------------------------------------------------- 

     

    Assume that each of the 3 lines in the above file is 30 characters long. Except for the first (header) line, which starts with a ">", every line consists of characters randomly chosen from the set {A,G,C,T}.

     

     

    So if I ask for substring (40,70), I should get back only the underlined characters below, as a single line, i.e. without the \n at the end of the second line.

     

    ---------------------------------------------------------------------

    >chromosome_1..............1,000,000-2,000,000

    AGGCATAGAATTAGACATAAGGTCTCTTCT

    AGAAGGGGTTATCCTCAGGCTTGAGGCAT

    ---------------------------------------------------------------------

     

    On the other hand, if I ask for substring (70,200), I should get just the last 20 characters, and if I ask for substring (300,400), I should get nothing.
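
    To make the expected behaviour concrete, here is a rough sketch in Python (not the perl/grep expression I am looking for). It assumes that indices are 0-based with the end index excluded, that the header line counts towards the index, and that newlines are not counted. The function name substring_between and the sample data are made up for illustration; the sample uses lines that really are 30 characters long.

```python
def substring_between(text: str, start: int, end: int) -> str:
    """Return characters start..end-1 of text, ignoring line breaks."""
    # Join all lines (header included) into one string without newlines.
    flat = "".join(text.splitlines())
    # Python slices clamp out-of-range indices, so a range that runs past
    # the end yields a shorter (or empty) string instead of an error.
    return flat[start:end]


# Hypothetical input: a header plus two sequence lines, 30 characters each.
sample = "\n".join([
    ">" + "h" * 29,  # header line
    "A" * 30,
    "C" * 30,
]) + "\n"

print(substring_between(sample, 40, 70))         # 20 x "A" followed by 10 x "C"
print(len(substring_between(sample, 70, 200)))   # 20
print(repr(substring_between(sample, 300, 400))) # ''
```

    With these assumptions, (40,70) spans the end of the second line and the start of the third, (70,200) returns the last 20 characters, and (300,400) returns an empty string.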

     

     

    I am fine with using perl -pe, grep, or egrep for this task; however, I have not been able to construct the right expression for it.

    I have tried things along the lines of extracting only the first match and then using a backreference to exclude it from the string. But I am not clear on the concept, and my patterns are not working. I also think it could be done in a simpler way.

     

    Any help would be greatly appreciated.

    PS: I am a newbie, so please include any pointers or a few sentences of explanation with the solution.

      
