[Bioperl-l] How do you create a genbank file?
Marc Logghe
Marc.Logghe at devgen.com
Fri Oct 17 09:23:42 EDT 2003
I must have had huge flies in my eyes: the GI is there indeed, in the object *and* in the genbank dump.
Thanks for your patience, Jason !!
Cheers,
Marc
> -----Original Message-----
> From: Jason Stajich [mailto:jason at cgt.duhs.duke.edu]
> Sent: Friday, October 17, 2003 2:58 PM
> To: Marc Logghe
> Cc: Wes Barris; Bioperl Mailing List
> Subject: RE: [Bioperl-l] How do you create a genbank file?
>
>
> The should all be captured when you parse a genbank file (and be
> available when you write a genbank file) that is how we can round trip
> the format and have it be 99% identical.
>
> gi number goes to primary_id()
> accession number goes to accession_number()
> version goes to seq_version() and version() (not sure why we
> have 2 methods here???)
>
> # From Bio::SeqIO::genbank
> if( /^ACCESSION\s+(\S.*\S)/ ) {
> push(@acc, split(/\s+/,$1));
> while( defined($_ = $self->_readline) ) {
> /^\s+(.*)/ && do { push (@acc,
> split(/\s+/,$1)); next };
> last;
> }
> $buffer = $_;
> next;
> }
> ....
> elsif( /^VERSION\s+(.+)$/ ) {
> my ($acc,$gi) = split(' ',$1);
> if($acc =~ /^\w+\.(\d+)/) {
> $params{'-version'} = $1;
> $params{'-seq_version'} = $1;
> }
> if($gi && (index($gi,"GI:") == 0)) {
> $params{'-primary_id'} = substr($gi,3);
> }
> }
>
> the first item in @acc is passed to the appropriate
> accession_number
> slot and the rest are passed to secondard accessions slot.
>
> The %params hash gets passed to the appropriate Seq object
> through the factory. The arguments are passed through the hierarchy
>
> SEQ_VERSION is consumed by Seq::RichSeq
> VERSION is consumed by Seq::PrimarySeq
> PRIMARY_ID is consumed by Seq::PrimarySeq
>
> -jason
>
> On Fri, 17 Oct 2003, Marc Logghe wrote:
>
> >
> > > You can set GI number with
> > > $seq->primary_id($ginumber);
> >
> > True. You can set it up in the object but it is not shown
> in the genbank
> > dump. And, also the way around, when a genbank record is parsed
> > containing the line 'VERSION AB050006.1 GI:26453358' it does not get
> > into the Bio::Seq::RichSeq object. As far as I can see.
> >
> > >
> > > If you want to be creating RichSeq instead of Bio::Seq
> objects (in the
> > > event you want to set some fields which are only
> available in RichSeq
> > > objects, initialize your Bio::SeqIO fasta parser like this:
> > >
> > > use Bio::SeqIO;
> > > use Bio::Seq::SeqFactory;
> > > my $seqio = new Bio::SeqIO(-format => 'fasta',
> > > -file => $file,
> > > -seqfactory => new Bio::Seq::SeqFactory
> > > ( -type => 'Bio::Seq::RichSeq'));
> > That is a really cool one. Never used it like this. Thanks
> > ML
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>
More information about the Bioperl-l
mailing list