[Bioperl-l] match patterns to retrieve seqs

Philipp Schiffer philipp.schiffer at googlemail.com
Sun Mar 18 20:36:18 UTC 2012


Sorry,

at least my $queries should read my queries at .


Am 18.03.12 17:49, schrieb Philipp Schiffer:
> Hi!
>
> I am trying to write a small script to retrieve such sequences
> (including headers) from a fasta file that match the headers I listed in
> another file.
> The fasta file looks like:
>  >maker-scaffold15249_size4120-snap-gene-0.2-mRNA-1 protein
> AED:0.0448532948532949 eAED:0.0448532948532949 QI:42|-1|0|1|-1|0|1|1|191
> MDILSLPLSFVCLMTIAWVVLAAALFLLWENGWSYFNSFYFTVVSFSTVGLGDMTPDYTR.............
>  >snap_masked-scaffold15249_size4120-abinit-gene-0.1-mRNA-1 protein
> AED:1 eAED:1 QI:0|0|0|0|1|1|4|0|108
> MHAMWMIRKFRLQWRWCMHGRPRDEQPEHHCVMGFLAVTHANRAAYNTDTLPTMLTERRK............
>  >snap_masked-scaffold15249_size4120-abinit-gene-0.0-mRNA-1 protein
> AED:1 eAED:1 QI:55|0|0|0|1|1|2|227|5
> MLHHR.......
>
> while the list file is:
> maker-scaffold539_size40162-snap-gene-0.4-mRNA-1
> snap_masked-scaffold2197_size27871-abinit-gene-0.3-mRNA-1
> maker-scaffold1000_size34843-snap-gene-0.7-mRNA-1
> maker-scaffold10087_size13457-snap-gene-0.3-mRNA-1
>
> My perl so far would be:
>
> #! /usr/bin/perl -w
>
> use Bio::SeqIO;
>
> my $usage = "list_from_fasta.pl fastafile format headerlistfile\n";
> my $file = shift or die $usage;
> my $format = shift or die $usage;
> my $headerlist = shift or die $usage;
>
> open LIST, "<$headerlist" or die $!;
> my $queries = <LIST>; #might be a hash as well, don't know what's better
>
> my $fastafile = Bio::SeqIO->new(-file => "<$file", -format => $format);
> while (my $fastaseq = $fastafile->next_seq){
>
> if ($fastaseq ->id =~ /(@queries)/) # here is the problem, after much
> trying it seems actually to look for matches, but does not find any,
> still there must be
>
> {print $fastaseq-> id,"\n" and print $fastaseq->seq,"\n";}
> ;
> };
>
> So I am wondering, is this the right way to tackle the problem. Will it
> work? Should I use another bioPerl module? My thinking is, that the
> pattern matching operation after the 'if' statement is wrong.
>
> Any help highly appreciated!
>
> Thanks a lot.
>
> Philipp
>



More information about the Bioperl-l mailing list