[Bioperl-l] How do you create a genbank file?

Marc Logghe Marc.Logghe at devgen.com
Fri Oct 17 09:23:42 EDT 2003


I must have had huge flies in my eyes: the GI is there indeed, in the object *and* in the genbank dump.
Thanks for your patience, Jason !!
Cheers,
Marc

> -----Original Message-----
> From: Jason Stajich [mailto:jason at cgt.duhs.duke.edu]
> Sent: Friday, October 17, 2003 2:58 PM
> To: Marc Logghe
> Cc: Wes Barris; Bioperl Mailing List
> Subject: RE: [Bioperl-l] How do you create a genbank file?
> 
> 
> The should all be captured when you parse a genbank file (and be
> available when you write a genbank file) that is how we can round trip
> the format and have it be 99% identical.
> 
> gi number goes to primary_id()
> accession number goes to accession_number()
> version goes to seq_version() and version() (not sure why we 
> have 2 methods here???)
> 
>   # From Bio::SeqIO::genbank
>   if( /^ACCESSION\s+(\S.*\S)/ ) {
> 	      push(@acc, split(/\s+/,$1));
> 	      while( defined($_ = $self->_readline) ) {
> 		  /^\s+(.*)/ && do { push (@acc, 
> split(/\s+/,$1)); next };
> 		  last;
> 	      }
> 	      $buffer = $_;
> 	      next;
> 	  }
>   ....
>   elsif( /^VERSION\s+(.+)$/ ) {
> 	      my ($acc,$gi) = split(' ',$1);
> 	      if($acc =~ /^\w+\.(\d+)/) {
> 		  $params{'-version'} = $1;
> 		  $params{'-seq_version'} = $1;
> 	      }
> 	      if($gi && (index($gi,"GI:") == 0)) {
> 		  $params{'-primary_id'} = substr($gi,3);
> 	      }
> 	  }
> 
>    the first item in @acc is passed to the appropriate 
> accession_number
>    slot and the rest are passed to secondard accessions slot.
> 
>    The %params hash gets passed to the appropriate Seq object
> through the factory.  The arguments are passed through the hierarchy
> 
>    SEQ_VERSION is consumed by Seq::RichSeq
>    VERSION     is consumed by Seq::PrimarySeq
>    PRIMARY_ID  is consumed by Seq::PrimarySeq
> 
> -jason
> 
> On Fri, 17 Oct 2003, Marc Logghe wrote:
> 
> >
> > > You can set GI number with
> > > $seq->primary_id($ginumber);
> >
> > True. You can set it up in the object but it is not shown 
> in the genbank
> > dump. And, also the way around, when a genbank record is parsed
> > containing the line 'VERSION AB050006.1 GI:26453358' it does not get
> > into the Bio::Seq::RichSeq object. As far as I can see.
> >
> > >
> > > If you want to be creating RichSeq instead of Bio::Seq 
> objects (in the
> > > event you want to set some fields which are only 
> available in RichSeq
> > > objects, initialize your  Bio::SeqIO fasta parser like this:
> > >
> > > use Bio::SeqIO;
> > > use Bio::Seq::SeqFactory;
> > > my $seqio = new Bio::SeqIO(-format => 'fasta',
> > > 			   -file   => $file,
> > > 			   -seqfactory => new Bio::Seq::SeqFactory
> > > 			( -type => 'Bio::Seq::RichSeq'));
> > That is a really cool one. Never used it like this. Thanks
> > ML
> >
> 
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
> 



More information about the Bioperl-l mailing list