[Bioperl-l] [Structure of remote GenBank files]

Jason Stajich jason at cgt.duhs.duke.edu
Fri Apr 23 03:23:24 EDT 2004


I think this particular behavior was a bug. I fixed it on the
main trunk, but I'm not sure the fix has been migrated to the branch.

http://bugzilla.open-bio.org/show_bug.cgi?id=1588
-jason

On Fri, 23 Apr 2004, Sebastien Moretti wrote:

> > Sebastien,
> > I ran your script with BioPerl 1.4 and ActiveState Perl 5.8 on Windows
> > XP, and it works fine for me.  I don't know what's causing your problem.
> > Telling us more about your system might help.  This doesn't have
> > anything to do with your file format problems, but I thought I'd mention
> > that since your script takes accession numbers as input, you could skip
> > the query and call $gb->get_Stream_by_id(\@accession) on an array of
> > accessions, or $gb->get_Seq_by_acc($acc) on a scalar.
> >
> > Barry
>
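
For reference, Barry's suggestion could look roughly like the sketch
below: it skips Bio::DB::Query::GenBank and passes the accessions
straight to get_Stream_by_id(). The clean_accessions() helper and its
accession pattern are assumptions made for illustration only; they are
not part of BioPerl and the pattern is not an NCBI-defined rule.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keep only arguments that look like accession numbers (optionally
# versioned). The pattern is an illustrative assumption, not an NCBI rule.
sub clean_accessions {
    my @args = @_;
    return grep { /^[A-Z]{1,2}_?\d+(?:\.\d+)?$/ } map { uc($_) } @args;
}

# Fetch the records and write them to STDOUT in GenBank flat-file format.
sub fetch_genbank {
    my @accessions = @_;

    # Loaded lazily so the helper above can be used without BioPerl.
    require Bio::DB::GenBank;
    require Bio::SeqIO;

    my $gb     = Bio::DB::GenBank->new;
    my $stream = $gb->get_Stream_by_id(\@accessions);
    my $out    = Bio::SeqIO->new(-format => 'genbank', -fh => \*STDOUT);

    while (my $seq = $stream->next_seq) {
        $out->write_seq($seq);    # write_seq prints the record itself
    }
    return;
}

# Only contact NCBI when accessions were given on the command line.
if (@ARGV) {
    my @accessions = clean_accessions(@ARGV)
        or die "No valid accession numbers given\n";
    fetch_genbank(@accessions);
}
```

This only simplifies the retrieval. As far as I can tell, the PUBMED and
COMMENT layout discussed in this thread comes from the GenBank writer in
Bio::SeqIO, so it is the same whichever retrieval call is used.
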
> Hello Barry,
> I use SuSE Linux 9.0 and 8.2 with BioPerl 1.4.
> I tried to get GenBank and RefSeq files with
> '$gb->get_Stream_by_id(\@accession)' and I still have the same problems
> (with NM_178432, BC032122 or NM_000559 as accession number):
> 	PUBMED fields are not on their own lines but appended to JOURNAL fields
> 	COMMENT fields are compact, without blank lines or line breaks
>
> Do you think it comes from the Linux system?
> It might for the blank lines, but why for the PUBMED fields?
>
> When a MEDLINE field is present (e.g. NM_169678 or NM_079645), the PUBMED
> and MEDLINE fields are placed correctly.
> The COMMENT fields are still compact, without blank lines or line breaks.
>
> Thanks
>
> > >Hello
> > >I use a BioPerl script to get GenBank and RefSeq files in GenBank flat
> > > file format.
> > >
> > >	#!/usr/bin/perl -w
> > >
> > >	use strict;
> > >	use Bio::DB::GenBank;
> > >	use Bio::DB::Query::GenBank;
> > >	use Bio::SeqIO;
> > >	my $acc = $ARGV[0]
> > >	    or die "\n\tThe accession number you are looking for is missing.\n"
> > >	         . "\tTry something like: ./update_estCDK.pl NM_178432\n\n";
> > >
> > >	$acc=$acc."[Accession]";
> > >
> > >	my $query_string = "$acc";
> > >	my $query = Bio::DB::Query::GenBank->new(-db=>'nucleotide',
> > >	                                         -query=>$query_string);
> > >
> > >	my $gb = Bio::DB::GenBank->new;
> > >	my $stream = $gb->get_Stream_by_query($query);
> > >
> > >	my $out = Bio::SeqIO->new(-format => 'genbank');
> > >	my $seq = $stream->next_seq()
> > >	    or die "No sequence found for $query_string\n";
> > >
> > >	# write_seq() prints the record to STDOUT itself and returns a
> > >	# true value, so there is no need to capture and print its result.
> > >	$out->write_seq($seq);
> > >
> > >	exit;
> > >
> > >It works fine, but I have two structure problems in my files:
> > >	- the PUBMED fields are appended to the JOURNAL field line above:
> > >  JOURNAL   J. Biol. Chem. 278 (42), 40815-40828 (2003) PUBMED   12912980
> > >or
> > >  JOURNAL   J. Cancer Res. Clin. Oncol. 129 (9), 498-502 (2003) PUBMED
> > >            12884029
> > >or
> > >  JOURNAL   Am. J. Physiol. Heart Circ. Physiol. 284 (6), H1917-H1923
> > > (2003) PUBMED   12742823
> > >
> > >	- the COMMENT fields have no blank lines or line breaks, so they look
> > >	   compact:
> > >COMMENT     REVIEWED REFSEQ: This record has been curated by NCBI staff.
> > > The reference sequence was derived from Y00272.1 and BC014563.1. On Oct
> > > 22, 2001 this sequence version replaced gi:4502708. Summary: The protein
> > > encoded by this gene is a member of the Ser/Thr protein kinase family.
> > > This protein is a catalytic subunit of the highly conserved protein
> > > kinase complex known as M-phase promoting factor (MPF), which is
> > > essential for G1/S and G2/M phase transitions of eukaryotic cell cycle.
> > > Mitotic cyclins stably associate with this protein and function as
> > > regulatory subunits. The kinase activity of this protein is controlled by
> > > cyclin accumulation and destruction through the cell cycle. The
> > > phosphorylation and dephosphorylation of this protein also play important
> > > regulatory roles in cell cycle control. Transcript Variant: This variant
> > > (1) encodes the full length isoform. COMPLETENESS: complete on the 3'
> > > end.
> > >
> > >Does it come from my script?
> > >Do you see the same thing?
> > >Thanks
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

