[Bioperl-l] parsing GenBank file
Torsten Seemann
torsten.seemann at infotech.monash.edu.au
Wed May 5 07:48:55 UTC 2010
> I have a huge GenBank file (downloaded from RDP, containing all
> bacterial 16S). I just want to parse the RDP id (in LOCUS) and the organism's lineage (in ORGANISM).
> I am getting the output like:
> S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae
> Holophagales Holophagae "Acidobacteria" Bacteria Root
> This is the exact output I want, but I am missing a lot of records (they are
> there in the GenBank file but not in my output).
> I also got a warning during parsing:
> --------------------- WARNING ---------------------
> MSG: Unbalanced quote in:
> /db_xref="taxon:35783" /germline"
> /mol_type="genomic DNA"
> /organism="Enterococcus sp."
> /strain="LMG12316"
> No further qualifiers will be added for this feature
> ---------------------------------------------------
> So I was just wondering: is this warning message causing that problem, or
> am I doing something wrong?
"Unbalanced quote" means the tag's value does not contain an even number
of double-quote (") symbols, so an opening quote has no matching closing
quote. I can see that the first line (below) looks problematic:
YOU HAVE:
/db_xref="taxon:35783" /germline"
SHOULD BE:
/db_xref="taxon:35783"
/germline
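To find out how many records are affected, something like the following rough Python sketch could scan the file before it goes near Bioperl. It is not part of Bioperl; the function name find_unbalanced_qualifiers is made up for illustration, and it assumes the standard GenBank flat-file layout in which qualifiers start at column 22 (21 leading spaces) and embedded quotes are doubled. A wrapped value whose continuation line happens to begin with "/" would be misread as a new qualifier.

```python
import re

# Assumption: a qualifier starts with exactly 21 spaces followed by "/name".
QUALIFIER_START = re.compile(r"^\s{21}/\w+")


def find_unbalanced_qualifiers(lines):
    """Return (line_number, text) for each qualifier holding an odd number of '"'."""
    problems = []
    in_features = False
    current = None  # [line number where the qualifier starts, accumulated text]

    def flush():
        # Record the qualifier collected so far if its quote count is odd.
        if current is not None and current[1].count('"') % 2 == 1:
            problems.append((current[0], current[1]))

    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        if QUALIFIER_START.match(line):
            flush()
            current = [number, line.strip()]
        elif line.startswith(" " * 21) and current is not None:
            # Continuation of a value wrapped over several lines.
            current[1] += " " + line.strip()
        else:
            # Feature key line, ORIGIN or "//": the current qualifier ends here.
            flush()
            current = None
            if not line.startswith(" "):
                in_features = False
    flush()
    return problems
```

Calling it on an open file handle for the RDP download should list each suspect qualifier together with the line number where it starts.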
I suspect the problem is either with RDP's GenBank producer, or with how
Bioperl handles the "germline" qualifier, which is a 'null valued'
qualifier like /pseudo: it takes no ="value" string. (I think Bioperl
handles this by setting the value to "_no_value"?)
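If the RDP export turns out to be at fault, one possible workaround is to rewrite the malformed line into the two-line SHOULD BE form above before parsing. The Python sketch below does only that; the name split_fused_qualifier and the regular expression are my own illustration, and they assume the damage always looks exactly like the line in the warning (a complete quoted qualifier, whitespace, then a bare qualifier name with a stray trailing quote).

```python
import re

# Matches e.g.:   /db_xref="taxon:35783" /germline"
# Assumption: the first value contains no embedded double quotes.
FUSED = re.compile(
    r'^(?P<indent>\s*)(?P<first>/\w+="[^"]*")\s+(?P<second>/\w+)"\s*$'
)


def split_fused_qualifier(line):
    """Return the line as a list of one or two lines, splitting a fused qualifier."""
    text = line.rstrip("\n")
    match = FUSED.match(text)
    if match is None:
        return [text]  # nothing to repair
    indent = match.group("indent")
    # Keep the original indentation for both lines and drop the stray quote.
    return [indent + match.group("first"), indent + match.group("second")]
```

Applying this to every line and writing the result to a new file leaves the original download untouched, which makes it easy to compare record counts before and after.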
http://www.ncbi.nlm.nih.gov/collab/FT/
Qualifier       /germline
Definition      the sequence presented in the entry has not undergone somatic
                rearrangement as part of an adaptive immune response; it is the
                unrearranged sequence that was inherited from the parental
                germline
Value format    none
Example         /germline
Comment         /germline should not be used to indicate that the source of
                the sequence is a gamete or germ cell;
                /germline and /rearranged cannot be used in the same source
                feature;
                /germline and /rearranged should only be used for molecules that
                can undergo somatic rearrangements as part of an adaptive immune
                response; these are the T-cell receptor (TCR) and immunoglobulin
                loci in the jawed vertebrates, and the unrelated variable
                lymphocyte receptor (VLR) locus in the jawless fish (lampreys
                and hagfish);
                /germline and /rearranged should not be used outside of the
                Craniata (taxid=89593)
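To make "Value format none" concrete, here is a small Python sketch of how a parser might treat qualifiers with and without a value. It is only an illustration, not Bioperl's code: parse_qualifier is a hypothetical name, and the "_no_value" placeholder simply mirrors my guess above rather than confirmed Bioperl behaviour.

```python
def parse_qualifier(text, placeholder="_no_value"):
    """Split '/name="value"' or a bare '/name' into a (name, value) pair."""
    body = text.strip()
    if not body.startswith("/"):
        raise ValueError(f"not a qualifier: {text!r}")
    name, sep, value = body[1:].partition("=")
    if not sep:
        # Null-valued qualifier such as /germline or /pseudo.
        return name, placeholder
    # Remove one pair of enclosing double quotes, if present.
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        value = value[1:-1]
    return name, value
```

With this, /germline yields the placeholder while /organism="Enterococcus sp." yields its quoted text, so a well-formed /germline line on its own is no problem for such a parser.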
--Torsten Seemann
--Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
University, AUSTRALIA