[Bioperl-l] match patterns to retrieve seqs

Philipp Schiffer philipp.schiffer at googlemail.com
Mon Mar 19 19:23:10 UTC 2012


Thanks Jason! I guess it's better to try around (and fail in my case) a 
bit before getting to know this. ;-)

Am 19.03.12 17:40, schrieb Jason Stajich:
> And take a look at the script in the bioperl distribution
>
> index/bp_seqret.pl
>
> which will do exactly this.
>
> Jason
> On Mar 18, 2012, at 9:49 AM, Philipp Schiffer wrote:
>
>> Hi!
>>
>> I am trying to write a small script to retrieve such sequences
>> (including headers) from a fasta file that match the headers I listed
>> in another file.
>> The fasta file looks like:
>> >maker-scaffold15249_size4120-snap-gene-0.2-mRNA-1 protein
>> AED:0.0448532948532949 eAED:0.0448532948532949 QI:42|-1|0|1|-1|0|1|1|191
>> MDILSLPLSFVCLMTIAWVVLAAALFLLWENGWSYFNSFYFTVVSFSTVGLGDMTPDYTR.............
>> >snap_masked-scaffold15249_size4120-abinit-gene-0.1-mRNA-1 protein
>> AED:1 eAED:1 QI:0|0|0|0|1|1|4|0|108
>> MHAMWMIRKFRLQWRWCMHGRPRDEQPEHHCVMGFLAVTHANRAAYNTDTLPTMLTERRK............
>> >snap_masked-scaffold15249_size4120-abinit-gene-0.0-mRNA-1 protein
>> AED:1 eAED:1 QI:55|0|0|0|1|1|2|227|5
>> MLHHR.......
>>
>> while the list file is:
>> maker-scaffold539_size40162-snap-gene-0.4-mRNA-1
>> snap_masked-scaffold2197_size27871-abinit-gene-0.3-mRNA-1
>> maker-scaffold1000_size34843-snap-gene-0.7-mRNA-1
>> maker-scaffold10087_size13457-snap-gene-0.3-mRNA-1
>>
>> My perl so far would be:
>>
>> #! /usr/bin/perl -w
>>
>> use Bio::SeqIO;
>>
>> my $usage = "list_from_fasta.pl fastafile format headerlistfile\n";
>> my $file = shift or die $usage;
>> my $format = shift or die $usage;
>> my $headerlist = shift or die $usage;
>>
>> open LIST, "<$headerlist" or die $!;
>> my $queries = <LIST>; #might be a hash as well, don't know what's better
>>
>> my $fastafile = Bio::SeqIO->new(-file => "<$file", -format => $format);
>> while (my $fastaseq = $fastafile->next_seq){
>>
>> if ($fastaseq ->id =~ /(@queries)/) # here is the problem, after much
>> trying it seems actually to look for matches, but does not find any,
>> still there must be
>>
>> {print $fastaseq-> id,"\n" and print $fastaseq->seq,"\n";}
>> ;
>> };
>>
>> So I am wondering, is this the right way to tackle the problem. Will
>> it work? Should I use another bioPerl module? My thinking is, that the
>> pattern matching operation after the 'if' statement is wrong.
>>
>> Any help highly appreciated!
>>
>> Thanks a lot.
>>
>> Philipp
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Jason Stajich
> jason.stajich at gmail.com <mailto:jason.stajich at gmail.com>
> jason at bioperl.org <mailto:jason at bioperl.org>
>



More information about the Bioperl-l mailing list