[Bioperl-l] retrieving sequences by ID

Brian Osborne brian_osborne at cognia.com
Wed Apr 27 20:54:29 EDT 2005


Sean,

Definitely take a look at indexing, it's described in a few places including
the Beginner's HOWTO (http://bioperl.org/HOWTOs/Beginners/indexing.html).
Easy and much faster than SeqIO as a means of retrieving sequences.

Brian O.

-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Sean O'Keeffe
Sent: Wednesday, April 27, 2005 10:33 AM
To: bioperl-l at portal.open-bio.org
Subject: [Bioperl-l] retrieving sequences by ID


Hi all,
This is probably an easy problem for ye, but one I'm having difficulty
with none the
less.
I'm trying to extract only sequences from a fasta file (containing
~38,000 sequences)
containing a specific ID in the header line e.g.
return only the sequence for header containing 'ABCD12346' from:
>ABCD12345|followed by the rest of the description
acgtacgtgttttgggccctttaaa.....
>ABCD12346|description
acgtacgtgttttgggccctttaaa.....
>ABCD12347|description
acgtacgtgttttgggccctttaaa.....
...

The specific ID's are contained in a list(~20,000) which I want to loop
through.
This is what I have done so far w/out any luck:

############################

use strict;
use lib "/usr/lib/perl5/site_perl/5.8.1/";
use Bio::SeqIO;

my (@ids)=@_
my $seq_in = Bio::SeqIO->new( '-format' => "fasta",
                              '-file' => "$fastafile");
my $seq_out = Bio::SeqIO->new( '-format' => "fasta",
                              '-file' => "$outfile");
for ($i=0;$i<=scalar(@ids);$i++){
while ($sequence = $seq_in->next_seq){
        if ($sequence->display_id =~ /^>$ids[$i](.*$)/){
                $seq_out->write_seq($sequence);
        }
}
}
exit;

#############################

This has so far returned a list of 3 fasta headers and the programs
then finishes without errors.
I'd like to know where I'm going wrong and if possible, how I could
improve on things to prevent memory usage/speed it up.

Thanks in advance,
Sean O'Keeffe.

_______________________________________________
Bioperl-l mailing list
Bioperl-l at portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l





More information about the Bioperl-l mailing list