[Bioperl-l] How do you create a genbank file?
Jason Stajich
jason at cgt.duhs.duke.edu
Fri Oct 17 08:58:08 EDT 2003
The should all be captured when you parse a genbank file (and be
available when you write a genbank file) that is how we can round trip
the format and have it be 99% identical.
gi number goes to primary_id()
accession number goes to accession_number()
version goes to seq_version() and version() (not sure why we have 2 methods here???)
# From Bio::SeqIO::genbank
if( /^ACCESSION\s+(\S.*\S)/ ) {
push(@acc, split(/\s+/,$1));
while( defined($_ = $self->_readline) ) {
/^\s+(.*)/ && do { push (@acc, split(/\s+/,$1)); next };
last;
}
$buffer = $_;
next;
}
....
elsif( /^VERSION\s+(.+)$/ ) {
my ($acc,$gi) = split(' ',$1);
if($acc =~ /^\w+\.(\d+)/) {
$params{'-version'} = $1;
$params{'-seq_version'} = $1;
}
if($gi && (index($gi,"GI:") == 0)) {
$params{'-primary_id'} = substr($gi,3);
}
}
the first item in @acc is passed to the appropriate accession_number
slot and the rest are passed to secondard accessions slot.
The %params hash gets passed to the appropriate Seq object
through the factory. The arguments are passed through the hierarchy
SEQ_VERSION is consumed by Seq::RichSeq
VERSION is consumed by Seq::PrimarySeq
PRIMARY_ID is consumed by Seq::PrimarySeq
-jason
On Fri, 17 Oct 2003, Marc Logghe wrote:
>
> > You can set GI number with
> > $seq->primary_id($ginumber);
>
> True. You can set it up in the object but it is not shown in the genbank
> dump. And, also the way around, when a genbank record is parsed
> containing the line 'VERSION AB050006.1 GI:26453358' it does not get
> into the Bio::Seq::RichSeq object. As far as I can see.
>
> >
> > If you want to be creating RichSeq instead of Bio::Seq objects (in the
> > event you want to set some fields which are only available in RichSeq
> > objects, initialize your Bio::SeqIO fasta parser like this:
> >
> > use Bio::SeqIO;
> > use Bio::Seq::SeqFactory;
> > my $seqio = new Bio::SeqIO(-format => 'fasta',
> > -file => $file,
> > -seqfactory => new Bio::Seq::SeqFactory
> > ( -type => 'Bio::Seq::RichSeq'));
> That is a really cool one. Never used it like this. Thanks
> ML
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu
More information about the Bioperl-l
mailing list