[Bioperl-l] How do you create a genbank file?

Jason Stajich jason at cgt.duhs.duke.edu
Fri Oct 17 08:58:08 EDT 2003


The should all be captured when you parse a genbank file (and be
available when you write a genbank file) that is how we can round trip
the format and have it be 99% identical.

gi number goes to primary_id()
accession number goes to accession_number()
version goes to seq_version() and version() (not sure why we have 2 methods here???)

  # From Bio::SeqIO::genbank
  if( /^ACCESSION\s+(\S.*\S)/ ) {
	      push(@acc, split(/\s+/,$1));
	      while( defined($_ = $self->_readline) ) {
		  /^\s+(.*)/ && do { push (@acc, split(/\s+/,$1)); next };
		  last;
	      }
	      $buffer = $_;
	      next;
	  }
  ....
  elsif( /^VERSION\s+(.+)$/ ) {
	      my ($acc,$gi) = split(' ',$1);
	      if($acc =~ /^\w+\.(\d+)/) {
		  $params{'-version'} = $1;
		  $params{'-seq_version'} = $1;
	      }
	      if($gi && (index($gi,"GI:") == 0)) {
		  $params{'-primary_id'} = substr($gi,3);
	      }
	  }

   the first item in @acc is passed to the appropriate accession_number
   slot and the rest are passed to secondard accessions slot.

   The %params hash gets passed to the appropriate Seq object
through the factory.  The arguments are passed through the hierarchy

   SEQ_VERSION is consumed by Seq::RichSeq
   VERSION     is consumed by Seq::PrimarySeq
   PRIMARY_ID  is consumed by Seq::PrimarySeq

-jason

On Fri, 17 Oct 2003, Marc Logghe wrote:

>
> > You can set GI number with
> > $seq->primary_id($ginumber);
>
> True. You can set it up in the object but it is not shown in the genbank
> dump. And, also the way around, when a genbank record is parsed
> containing the line 'VERSION AB050006.1 GI:26453358' it does not get
> into the Bio::Seq::RichSeq object. As far as I can see.
>
> >
> > If you want to be creating RichSeq instead of Bio::Seq objects (in the
> > event you want to set some fields which are only available in RichSeq
> > objects, initialize your  Bio::SeqIO fasta parser like this:
> >
> > use Bio::SeqIO;
> > use Bio::Seq::SeqFactory;
> > my $seqio = new Bio::SeqIO(-format => 'fasta',
> > 			   -file   => $file,
> > 			   -seqfactory => new Bio::Seq::SeqFactory
> > 			( -type => 'Bio::Seq::RichSeq'));
> That is a really cool one. Never used it like this. Thanks
> ML
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu


More information about the Bioperl-l mailing list