NCBI fasta format [was: Re: [Bioperl-l] loading data into
bioperl-db]
Hilmar Lapp
hlapp at gnf.org
Fri Jun 6 15:33:46 EDT 2003
> -----Original Message-----
> From: Aaron J Mackey [mailto:ajm6q at virginia.edu]
> Sent: Friday, June 06, 2003 1:07 PM
> To: Bioperl
> Subject: NCBI fasta format [was: Re: [Bioperl-l] loading data
> into bioperl-db]
>
[...]
>
> It should make loading up biosql databases from flatfiles a
> bit easier, too.
>
> Any lurkers want to write Bio::SeqIO::fasta_ncbi.pm (inheriting from
> Bio::SeqIO::fasta) ?? I guess we'd have to agree on where
> the "db" and any secondary accession/names would be stored in
> which Seq model ...
>
Or, as I pointed out earlier, you'd write a Bio::Seq::BaseSeqProcessor-derived
module:
package MySeqProcessor;
use strict;
use vars qw(@ISA);
use Bio::Seq::BaseSeqProcessor;
@ISA = qw(Bio::Seq::BaseSeqProcessor);

# this is the only method you need to override
sub process_seq {
    my $self = shift;
    my $seq  = shift;
    # split the '|'-delimited NCBI-style display ID into its fields
    my @idflds = split(/\|/, $seq->display_id);
    if (@idflds > 1) {
        # field 1 is taken as the namespace; adjust the index if your IDs differ
        $seq->namespace($idflds[1]);
        # the last field is the accession, optionally with a .version suffix
        my ($acc, $v) = ($idflds[-1]);
        if ($acc =~ /^(.*)\.(\d{1,2})$/) { $acc = $1; $v = $2; }
        $seq->accession_number($acc);
        $seq->version($v);
    }
    # I could massage many more things here
    # when done, return it
    return $seq;
}
1;
__END__
And then you'd do
my $seqio = <open your SeqIO here as you otherwise would>;
my $pipe = MySeqProcessor->new(-source_stream => $seqio);
# treat $pipe as if it were a SeqIO stream
while (my $seq = $pipe->next_seq()) {
    # whatever
}
$pipe->close; # cascades
Or, to load via load_seqdatabase.pl:
$ load_seqdatabase.pl <your normal options here> \
--pipeline "MySeqProcessor"
The advantage is you can modify and tweak it easily at any time and plug
it back in (no make / install or messing with perl libraries), and you
can use it for any format, not just fasta.
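If you want to check what the split and the regex in process_seq will do
to your IDs before plugging the module in, here is a minimal sketch that
runs the same steps on a plain string, without BioPerl. The helper name
parse_ncbi_id and the sample ID are made up for illustration; they are not
part of BioPerl or of the module above.

```perl
use strict;
use warnings;

# Hypothetical helper: repeats the split/regex steps of process_seq
# on a plain string so the result can be inspected without BioPerl.
sub parse_ncbi_id {
    my ($display_id) = @_;
    my @idflds = split(/\|/, $display_id);
    return unless @idflds > 1;    # not a '|'-delimited ID, nothing to do

    # the last field is the accession, optionally with a .version suffix
    my ($acc, $v) = ($idflds[-1]);
    if ($acc =~ /^(.*)\.(\d{1,2})$/) { ($acc, $v) = ($1, $2); }

    return +{
        namespace => $idflds[1],  # same index as in process_seq
        accession => $acc,
        version   => $v,          # undef when there is no version suffix
    };
}

# Made-up sample ID in the NCBI '|'-delimited style
my $parsed = parse_ncbi_id('gi|12345|gb|AAA12345.1|');
for my $key (qw(namespace accession version)) {
    my $val = defined $parsed->{$key} ? $parsed->{$key} : '(undef)';
    print "$key: $val\n";
}
```

Printing the fields for a few of your own IDs shows quickly which field
ends up as the namespace, so you can adjust the index in process_seq if
your headers are laid out differently.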
-hilmar
> -Aaron
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>