[Bioperl-l] retrieving sequences by ID

Devin Scannell scannedr at tcd.ie
Thu Apr 28 05:53:22 EDT 2005


Hi Sean,

an easy way to do this is to (literally) use Bio::DB::Fasta

best,

Devin

#!/usr/bin/perl

use Bio::DB::Fasta;
use Bio::SeqIO;

my $db = Bio::DB::Fasta->new('your_fasta_file');
my $seq_out = Bio::SeqIO->new(-format => 'fasta', -file => 'new_file')

@names = @_;

foreach my $name (@names) {

	my $seq_obj = $db->get_Seq_by_id($name);

	$seq_out->write_seq($sequence) if $seq->display_id =~ 
/^>$ids[$i](.*$)/;
}



On 26 Apr 2005, at 18:50, sokeeff at tcd.ie wrote:

> Hi all,
> Probably an easy problem for ye, but one I'm having difficulty with 
> none the
> less.
> I'm trying to extract only sequences from a fasta file(~38,000 
> sequences)
> containing a specific ID in the header files e.g.
> return only the sequence for header containing 'ABCD12346' from:
>> ABCD12345|description
> acgtacgtgttttgggccctttaaa.....
>> ABCD12346|description
> acgtacgtgttttgggccctttaaa.....
>> ABCD12347|description
> acgtacgtgttttgggccctttaaa.....
> ...
>
> The ID's are contained in a list(~20,000) which I want to loop through.
> This is what I have done so far w/out any luck:
>
> ############################
>
> use strict;
> use lib "/usr/lib/perl5/site_perl/5.8.1/";
> use Bio::SeqIO;
>
> my (@ids)=@_
> my $seq_in = Bio::SeqIO->new( '-format' => "fasta",
> 			      '-file' => "$fastafile");
> my $seq_out = Bio::SeqIO->new( '-format' => "fasta",
> 			      '-file' => "$outfile");
> for ($i=0;$i<=scalar(@ids);$i++){
> while ($sequence = $seq_in->next_seq){
> 	if ($sequence->display_id =~ /^>$ids[$i](.*$)/){
> 		$seq_out->write_seq($sequence);
> 	}
> }
> }
> exit;
>
> #############################
>
> This returns a small list of fasta headers. I'd like to know if it's 
> the right
> way to go about the task, where I'm going wrong and if possible, how I 
> could
> improve on things to prevent memory usage/speed it up.
>
> Thanks in advance,
> Sean O'Keeffe.
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l



More information about the Bioperl-l mailing list