[Bioperl-l] parsing GenBank file
shalabh sharma
shalabh.sharma7 at gmail.com
Wed May 5 14:46:19 UTC 2010
Hi Torsten,
Thanks for pointing that out. But that is just a warning; it will not
break the script. I found the point where the script breaks.
It dies with this message:
Can't call method "classification" on an undefined value at parseGB.pl line 9, <GEN0> line 10067733.
So the script breaks when it reaches this record:
LOCUS       S001198291  1521 bp    rRNA    linear   BCT 02-Feb-2009
DEFINITION  Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2.
ACCESSION   AP010656 REGION: 61786..63306
PROJECT     GenomeProject:29025
SOURCE      Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
  ORGANISM  Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
            Root; Bacteria; "Bacteroidetes"; "Bacteroidia"; "Bacteroidales";
            "Porphyromonadaceae"; unclassified_"Porphyromonadaceae".
REFERENCE   1  (bases 1 to 1521)
  AUTHORS   Toyoda A., Hongoh Y., Toh H., Hattori M., Ohkuma M., Sakaki Y.;
  TITLE     ;
  JOURNAL   Submitted (21-MAR-2008) to the EMBL/GenBank/DDBJ databases.
            Contact:Atsushi Toyoda National Institute of Genetics, Comparative
            Genomics Laboratory; Yata 1111, Mishima, Shizuoka 411-8540, Japan
REFERENCE   2
  AUTHORS   Hongoh Y., Sharma V.K., Prakash T., Noda S., Toh H., Taylor T.D.,
            Kudo T., Sakaki Y., Toyoda A., Hattori M., Ohkuma M.;
The script is unable to parse this record, but I don't understand why. The
only reason I can think of is the organism's name, which is very long
compared to the others.
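The error means `classification` was called on something that was never built for this record; in a typical BioPerl loop that object is the one returned by `$seq->species` (line 9 of parseGB.pl is not shown, so this is an assumption). Whatever the underlying cause, checking that the species object is defined and skipping the record with `next` otherwise keeps one odd entry from killing a run over the whole file. The Python below is a library-free sketch of the same idea applied straight to the GenBank text, not BioPerl's parser: it pulls the LOCUS name and the ORGANISM lineage, prints them in the order shown in the original question, and collects the LOCUS names of records it could not parse instead of dying. It assumes the organism name fits on the ORGANISM line and that the lineage follows on lines indented 12 spaces; the second sample record (`TEST_NO_ORGANISM`) is made up for illustration.

```python
from typing import Iterator, List, Optional, Tuple


def iter_records(text: str) -> Iterator[str]:
    """Split GenBank-formatted text into records on the '//' terminator line."""
    buf: List[str] = []
    for line in text.splitlines():
        if line.strip() == "//":
            yield "\n".join(buf)
            buf = []
        else:
            buf.append(line)
    if any(ln.strip() for ln in buf):
        yield "\n".join(buf)  # last record had no '//' terminator


def locus_and_lineage(record: str) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Return (LOCUS name, ORGANISM name, lineage list); missing fields are None or []."""
    locus: Optional[str] = None
    organism: Optional[str] = None
    lineage_lines: List[str] = []
    in_organism = False
    for line in record.splitlines():
        if line.startswith("LOCUS"):
            parts = line.split()
            locus = parts[1] if len(parts) > 1 else None
            in_organism = False
        elif line.lstrip().startswith("ORGANISM"):
            parts = line.split(None, 1)
            organism = parts[1].strip() if len(parts) > 1 else None
            in_organism = True
        elif in_organism and line.startswith(" " * 12):
            lineage_lines.append(line.strip())  # lineage continuation line
        else:
            in_organism = False  # any other keyword ends the ORGANISM block
    joined = " ".join(lineage_lines).rstrip(".")
    lineage = [taxon.strip() for taxon in joined.split(";") if taxon.strip()]
    return locus, organism, lineage


def summarize(text: str) -> Tuple[List[str], List[str]]:
    """Return (output rows, LOCUS names of records that could not be parsed)."""
    rows: List[str] = []
    skipped: List[str] = []
    for record in iter_records(text):
        locus, organism, lineage = locus_and_lineage(record)
        if locus is None or organism is None or not lineage:
            skipped.append(locus or "<no LOCUS>")  # report and keep going
            continue
        # Same layout as the wanted output: ID, organism, then the lineage
        # reversed (most specific taxon first, Root last).
        rows.append(" ".join([locus, organism] + lineage[::-1]))
    return rows, skipped


SAMPLE = """\
LOCUS       S001198291  1521 bp    rRNA    linear   BCT 02-Feb-2009
  ORGANISM  Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
            Root; Bacteria; "Bacteroidetes"; "Bacteroidia"; "Bacteroidales";
            "Porphyromonadaceae"; unclassified_"Porphyromonadaceae".
REFERENCE   1  (bases 1 to 1521)
//
LOCUS       TEST_NO_ORGANISM  100 bp    rRNA    linear   BCT 01-Jan-2009
//
"""

if __name__ == "__main__":
    rows, skipped = summarize(SAMPLE)
    for row in rows:
        print(row)
    print("skipped:", skipped)
```

Printing `skipped` gives a list of LOCUS names to inspect by hand, which may help pin down what the failing records have in common.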
Thanks
Shalabh
On Wed, May 5, 2010 at 3:48 AM, Torsten Seemann <torsten.seemann at infotech.monash.edu.au> wrote:
> > I have a huge GenBank file (downloaded from RDP, containing all
> > bacterial 16S sequences). I just want to parse the RDP ID (in LOCUS) and
> > the organism's lineage (in ORGANISM).
> > I am getting output like:
> > S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae Holophagales Holophagae "Acidobacteria" Bacteria Root
> > This is exactly the output I want, but I am missing a lot of records (they
> > are in the GenBank file but not in my output).
> > I also got a warning during parsing:
> > --------------------- WARNING ---------------------
> > MSG: Unbalanced quote in:
> > /db_xref="taxon:35783" /germline"
> > /mol_type="genomic DNA"
> > /organism="Enterococcus sp."
> > /strain="LMG12316"No further qualifiers will be added for this feature
> > ---------------------------------------------------
> > So I was wondering: is this warning message causing the problem, or
> > am I doing something wrong?
>
> "Unbalanced quote" means there is not an even number (a multiple of 2) of
> double-quote (") characters around the tag's value. I can see that the
> first line (below) looks problematic:
>
> YOU HAVE:
>
> /db_xref="taxon:35783" /germline"
>
> SHOULD BE:
>
> /db_xref="taxon:35783"
> /germline
>
> I suspect there is a problem either with RDP's GenBank producer, or
> Bioperl is having a problem with the "germline" qualifier, which is a
> 'null-valued' qualifier like /pseudo: it takes no ="value" string. (I
> think in Bioperl this is handled by setting the value to "_no_value"?)
>
> http://www.ncbi.nlm.nih.gov/collab/FT/
>
> Qualifier       /germline
> Definition      the sequence presented in the entry has not undergone somatic
>                 rearrangement as part of an adaptive immune response; it is the
>                 unrearranged sequence that was inherited from the parental
>                 germline
> Value format    none
> Example         /germline
> Comment         /germline should not be used to indicate that the source of
>                 the sequence is a gamete or germ cell;
>                 /germline and /rearranged cannot be used in the same source
>                 feature;
>                 /germline and /rearranged should only be used for molecules that
>                 can undergo somatic rearrangements as part of an adaptive immune
>                 response; these are the T-cell receptor (TCR) and immunoglobulin
>                 loci in the jawed vertebrates, and the unrelated variable
>                 lymphocyte receptor (VLR) locus in the jawless fish (lampreys
>                 and hagfish);
>                 /germline and /rearranged should not be used outside of the
>                 Craniata (taxid=89593)
>
>
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
> University, AUSTRALIA
>
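Torsten's two points can be illustrated with a small sketch: a quote-balance check (in the feature table format a literal double quote inside a value is written as two double quotes, so a well-formed qualifier always has an even count), and a qualifier parser that returns no value for null-valued qualifiers such as /germline. This is hypothetical Python, not BioPerl code; it uses `None` where Torsten recalls BioPerl storing "_no_value", and it assumes the qualifier lines have already been gathered into one string.

```python
import re
from typing import List, Optional, Tuple

# /name="quoted value", /name=bare_value, or a bare /name (null-valued,
# e.g. /germline or /pseudo). Inside quotes, "" stands for one literal ".
QUALIFIER = re.compile(r'/(\w+)(?:=("(?:[^"]|"")*"|\S+))?')


def quotes_balanced(text: str) -> bool:
    """True if the text has an even number of double-quote characters."""
    return text.count('"') % 2 == 0


def parse_qualifiers(text: str) -> List[Tuple[str, Optional[str]]]:
    """Split qualifier text into (name, value) pairs; value is None when null-valued."""
    if not quotes_balanced(text):
        raise ValueError(f"Unbalanced quote in: {text!r}")
    pairs: List[Tuple[str, Optional[str]]] = []
    for name, raw in QUALIFIER.findall(text):
        if raw == "":
            pairs.append((name, None))  # null-valued qualifier such as /germline
        elif raw.startswith('"'):
            # Strip the outer quotes and unescape "" to ".
            pairs.append((name, raw[1:-1].replace('""', '"')))
        else:
            pairs.append((name, raw))  # unquoted value, e.g. /codon_start=1
    return pairs


if __name__ == "__main__":
    bad = '/db_xref="taxon:35783" /germline"'
    good = '/db_xref="taxon:35783"\n/germline\n/mol_type="genomic DNA"'
    print(quotes_balanced(bad))  # False: three quote characters
    print(parse_qualifiers(good))
```

Run on the "SHOULD BE" form from the quoted reply, the parser yields ("germline", None) alongside the quoted values, while the "YOU HAVE" line is rejected before any pairs are produced.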