[Bioperl-l] retrieving sequences by ID

khoueiry khoueiry at ibdm.univ-mrs.fr
Wed Apr 27 11:20:26 EDT 2005


Hi,

First of all try to index your fasta file...

use Bio::Index::Fasta;
#indexing fasta file
my $type = $ENV{'BIOPER_INDEX_TYPE'};  
if( $type ) {
   $Bio::Index::Abstract::USE_DBM_TYPE = $type;
} 

my $index = Bio::Index::Fasta->new($tmp_dir."index.idx", 'WRITE'); #the
name you will give to the index
$index->make_index($yourfastafile.fasta);
#the fasta file you want to index


Loop here{
...
my $seq = $index->fetch($id);
#fetching the fasta file searching for a specified ID
$seq_out->write_seq($seq);
...
}



Le mercredi 27 avril 2005 à 15:33 +0100, Sean O'Keeffe a écrit : 

> Hi all,
> This is probably an easy problem for ye, but one I'm having difficulty
> with none the
> less.
> I'm trying to extract only sequences from a fasta file (containing
> ~38,000 sequences)
> containing a specific ID in the header line e.g.
> return only the sequence for header containing 'ABCD12346' from:
> >ABCD12345|followed by the rest of the description
> acgtacgtgttttgggccctttaaa.....
> >ABCD12346|description
> acgtacgtgttttgggccctttaaa.....
> >ABCD12347|description
> acgtacgtgttttgggccctttaaa.....
> ...
> 
> The specific ID's are contained in a list(~20,000) which I want to loop through.
> This is what I have done so far w/out any luck:
> 
> ############################
> 
> use strict;
> use lib "/usr/lib/perl5/site_perl/5.8.1/";
> use Bio::SeqIO;
> 
> my (@ids)=@_
> my $seq_in = Bio::SeqIO->new( '-format' => "fasta",
>                               '-file' => "$fastafile");
> my $seq_out = Bio::SeqIO->new( '-format' => "fasta",
>                               '-file' => "$outfile");
> for ($i=0;$i<=scalar(@ids);$i++){
> while ($sequence = $seq_in->next_seq){
>         if ($sequence->display_id =~ /^>$ids[$i](.*$)/){
>                 $seq_out->write_seq($sequence);
>         }
> }
> }
> exit;
> 
> #############################
> 
> This has so far returned a list of 3 fasta headers and the programs
> then finishes without errors.
> I'd like to know where I'm going wrong and if possible, how I could
> improve on things to prevent memory usage/speed it up.
> 
> Thanks in advance,
> Sean O'Keeffe.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l


More information about the Bioperl-l mailing list