[Bioperl-l] using default string values for undef/empty, was Re: parsing GenBank file
Chris Fields
cjfields at illinois.edu
Wed May 5 12:30:30 UTC 2010
On May 5, 2010, at 2:48 AM, Torsten Seemann wrote:
>> I have a huge GenBank file (downloaded from RDP, containing all
>> bacterial 16S sequences). I just want to parse the RDP ID (in LOCUS) and the organism's lineage (in ORGANISM).
>> I am getting output like this:
>> S000107505 uncultured Acidobacteria bacterium Geothrix Holophagaceae
>> Holophagales Holophagae "Acidobacteria" Bacteria Root
>> This is exactly the output I want, but I am missing a lot of records (they are
>> in the GenBank file but not in my output).
>> I also got a warning during parsing:
>> --------------------- WARNING ---------------------
>> MSG: Unbalanced quote in:
>> /db_xref="taxon:35783" /germline"
>> /mol_type="genomic DNA"
>> /organism="Enterococcus sp."
>> /strain="LMG12316"
>> No further qualifiers will be added for this feature
>> ---------------------------------------------------
>> So I was wondering: is this warning message causing the problem, or
>> am I doing something wrong?
>
> "Unbalanced quote" means there is not an even number (multiple of 2)
> of double-quote (") symbols around the tag's value. I can see that the
> first line (below) looks problematic:
>
> YOU HAVE:
>
> /db_xref="taxon:35783" /germline"
>
> SHOULD BE:
>
> /db_xref="taxon:35783"
> /germline
>
> I suspect there is a problem either with RDP's GenBank producer, or
> Bioperl is having a problem with the "germline" qualifier, which is a
> 'null valued' qualifier like /pseudo; it takes no ="value" string. (I
> think in Bioperl this is handled by setting the value to "_no_value"
> ?)
> ...
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
> University, AUSTRALIA
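A minimal sketch of the check Torsten describes, written in Python for illustration (BioPerl itself is Perl, and this is not its parser code). It assumes the qualifier text is available as a plain string; `quotes_balanced` is a hypothetical helper. Because GenBank writes an embedded double quote as two quotes (""), a well-formed quoted value always contains an even number of them.

```python
def quotes_balanced(qualifier_text: str) -> bool:
    """Return True if the text contains an even number of double quotes.

    Embedded quotes in GenBank qualifier values are doubled (""), so a
    well-formed quoted value keeps an even total count.
    """
    return qualifier_text.count('"') % 2 == 0


good = '/db_xref="taxon:35783"'
bad = '/db_xref="taxon:35783" /germline"'   # the line from the warning above
print(quotes_balanced(good))  # True
print(quotes_balanced(bad))   # False: three quotes
```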
Ugh, I didn't notice the '_no_value' bit. It's probably just my opinion, but I don't like stubs like that; they tend to be brittle and run into issues (like this one). I would prefer we just leave the value as undef and only quote defined values (with the exceptions listed in %FTQUAL_NO_QUOTE).
Is there any reason for this behavior (is it related to ORM stuff like bioperl-db)? Can we change it to something a bit more realistic?
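A rough Python sketch of the behavior proposed above, assuming a qualifier with no value is stored as None (the analogue of Perl's undef) rather than a placeholder string such as '_no_value'. `format_qualifier` and `NO_QUOTE` are hypothetical names, not BioPerl API, and `NO_QUOTE` holds only an illustrative subset standing in for %FTQUAL_NO_QUOTE, not its actual contents.

```python
from typing import Optional

# Illustrative subset of qualifiers written without quotes (e.g. /codon_start=1).
# Assumption: stands in for BioPerl's %FTQUAL_NO_QUOTE; not its real contents.
NO_QUOTE = {"codon_start", "transl_table"}


def format_qualifier(tag: str, value: Optional[str]) -> str:
    """Render one feature qualifier, treating None as 'no value'."""
    if value is None:
        # Null-valued qualifiers such as /germline or /pseudo take no ="value".
        return f"/{tag}"
    if tag in NO_QUOTE:
        return f"/{tag}={value}"
    # Escape embedded double quotes by doubling them, then wrap in quotes.
    escaped = value.replace('"', '""')
    return f'/{tag}="{escaped}"'


print(format_qualifier("germline", None))          # /germline
print(format_qualifier("db_xref", "taxon:35783"))  # /db_xref="taxon:35783"
print(format_qualifier("codon_start", "1"))        # /codon_start=1
```

With no sentinel string in play, there is nothing for a later stage to accidentally quote or glue onto the next qualifier.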
chris