[BioRuby] removing primers and corresponding quality data from sequences

George Githinji georgkam at gmail.com
Fri Feb 12 08:57:54 UTC 2010


Hi

I would like to remove both the primers and the flanking sequence: the
portion before the 5' primer and the portion after the 3' primer.

require 'bio'

# Candidate primer patterns, written as regular expressions.
# Inside a character class "|" is a literal character, so [A|C] is
# written as [AC] here.
def primers
  ['G*CACG[AC]AGTTT[CT]GC', 'GC[GA]AAACT[TG]CGTGC',
   'G*CCCATTC[GC]TCGAACCA', 'TGGTTCGA[CG]GAATGGGC']
end

# Read every entry from the reads file into an array.
def bioentries(reads_file)
  Bio::FlatFile.auto(reads_file) { |f| f.map { |entry| entry } }
end

# Print each sequence with every match of the first primer deleted.
def remove_primers(file_name)
  reg1 = Regexp.new(primers[0])
  bioentries(file_name).each do |entry|
    puts entry.seq.gsub(reg1, '')
  end
end

This removes the primers but not the portion before the 5' end.
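
One possible way to drop the leading portion as well is to locate the primer
with Regexp#match and keep only what follows the match, instead of deleting
the match with gsub. A minimal sketch, assuming the 5' primer is the first
match in the read; strip_five_prime and the sample read are made up for
illustration:

```ruby
# Hypothetical helper: returns the read with the first primer match and
# everything upstream of it removed, or the read unchanged if nothing matches.
def strip_five_prime(seq, primer_regexp)
  m = primer_regexp.match(seq)
  m ? m.post_match : seq
end

primer = /G*CACG[AC]AGTTT[CT]GC/
# Invented read: 5' flank, then the primer, then the insert.
read = 'TTAG' + 'GCACGAAGTTTCGC' + 'ACGTACGT'
puts strip_five_prime(read, primer)
```

The 3' end could be handled the same way with pre_match on the match of the
3' primer.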

Secondly, it does not give me the coordinates of the removed portion, so I
cannot remove the associated quality data for the removed bases.
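
The coordinates should be available from the MatchData object: begin(0) and
end(0) give the start and end offsets of a match, and the same offsets can
slice the quality values. A rough sketch, assuming the quality scores are
already parsed into an array of integers with one score per base, and that
any of the four patterns may sit at either end of a read; primer_sites,
trim_read and the sample data are hypothetical:

```ruby
# The four primer patterns combined into one regular expression.
PRIMERS = Regexp.union(
  /G*CACG[AC]AGTTT[CT]GC/, /GC[GA]AAACT[TG]CGTGC/,
  /G*CCCATTC[GC]TCGAACCA/, /TGGTTCGA[CG]GAATGGGC/
)

# Hypothetical helper: [start, stop] offsets of every primer match,
# scanning left to right without overlaps.
def primer_sites(seq)
  sites = []
  pos = 0
  while (m = PRIMERS.match(seq, pos))
    sites << [m.begin(0), m.end(0)]
    pos = m.end(0)
  end
  sites
end

# Hypothetical helper: keeps what lies between the first and the last
# primer match. Returns [sequence, qualities, start, stop], where
# start...stop is the range kept from the original read, or nil when
# fewer than two primer sites are found.
def trim_read(seq, quals)
  sites = primer_sites(seq)
  return nil if sites.size < 2
  start = sites.first[1] # first base after the 5' primer
  stop  = sites.last[0]  # first base of the 3' primer
  [seq[start...stop], quals[start...stop], start, stop]
end

# Invented read: flank, primer, insert, primer, flank.
seq   = 'TT' + 'GCACGAAGTTTCGC' + 'ACGTACGT' + 'TGGTTCGACGAATGGGC' + 'AA'
quals = Array.new(seq.length) { |i| 10 + i } # made-up scores
p trim_read(seq, quals)
```

Returning nil for reads without two primer sites leaves the caller to decide
whether to skip or report them.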

Third, the approach seems 'dirty'.

> On Fri, Feb 12, 2010 at 11:46 AM, Andrew Grimm <andrew.j.grimm at gmail.com> wrote:
>> I can't really help, but is it primers that you want removed, or the
>> portion of sequence that's before the 5' primer or after the 3'
>> primer?
>>
>> Andrew
>>
>> On Fri, Feb 12, 2010 at 7:35 PM, George Githinji <georgkam at gmail.com> wrote:
>>> Hi All,
>>> I have a list of sequences and corresponding quality files for the
>>> same data. I would like to remove the primers as well as the
>>> corresponding quality information.
>>> The approach that i am using is proving to be dirty and buggy,
>>>
>>> For example given:
>>> 1.A list of sequences in fasta file format
>>> 2.A list of 4 possible primer patterns. (no idea which sequence might
>>> contain which primer)
>>> 3.A list of quality data in phred format for each sequence,
>>>
>>> The task is to remove the possible primers from the sequences and
>>> anything before or after the primer.
>>> Each sequence has at least 2 combination of primes. one on the 5' and
>>> the other on the 3' end.
>>>
>>> Return a list of sequences with primer ends removed and the
>>> corresponding quality data for the primers removed.
>>>
>>> What would be a nice way to approach this problem.
>>>
>>>
>>>
>>>
>>> --
>>> ---------------
>>> Sincerely
>>> George
>>> PhD Student
>>> KEMRI/Wellcome-Trust Research Program
>>> Skype: george_g2
>>> Blog: http://biorelated.wordpress.com/
>>> _______________________________________________
>>> BioRuby Project - http://www.bioruby.org/
>>> BioRuby mailing list
>>> BioRuby at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioruby
>>>
>>
>



-- 
---------------
Sincerely
George
PhD Student
KEMRI/Wellcome-Trust Research Program
Skype: george_g2
Blog: http://biorelated.wordpress.com/