[BioSQL-l] redundant qualifier values & Bio::SeqFeature::SimilarityPair

Hilmar Lapp hlapp at gnf.org
Mon Apr 5 14:01:14 EDT 2004


Without having investigated when this actually happens, what I
can tell you is that this sort of duplication is not supposed to happen
when you create (insert) a fresh sequence into the database. Try to
check whether this is what is happening in your situation (I doubt it).

The situation in which this would happen naturally is when you update a
previously looked-up sequence after merging it with new annotation
without carefully reconciling the new annotation with the existing one
to avoid duplication. This reconciliation step is non-trivial: it
requires you to make some decisions (is an annotation that is on the
existing but not on the new sequence record stale [to be removed] or
non-public [to be kept]?) and to find a good measure of when two
annotations are identical.
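
To make the two decisions concrete, here is a minimal sketch of such a
reconciliation step. It is written in Python with plain (tag, value)
tuples standing in for annotation objects, so it is not bioperl-db code.
The names reconcile, annotation_key and keep_missing are made up for
illustration, and "same tag and same whitespace-normalized value" is
only one possible measure of identity.

```python
from typing import Callable, Hashable, Iterable, List, Tuple

# Stand-in for a real annotation object: (tag, value). This is an assumption.
Annotation = Tuple[str, str]


def annotation_key(ann: Annotation) -> Hashable:
    """One possible identity measure: same tag, same value modulo whitespace."""
    tag, value = ann
    return (tag, " ".join(value.split()))


def reconcile(
    existing: Iterable[Annotation],
    incoming: Iterable[Annotation],
    keep_missing: bool = True,
    key: Callable[[Annotation], Hashable] = annotation_key,
) -> List[Annotation]:
    """Merge incoming annotations into existing ones without duplicates.

    keep_missing=True treats annotations found only on the existing record
    as non-public (kept); keep_missing=False treats them as stale (dropped).
    """
    incoming = list(incoming)
    incoming_keys = {key(a) for a in incoming}
    seen = set()
    merged = []
    for ann in existing:
        k = key(ann)
        if k in seen:
            continue  # duplicate already present on the existing record
        if k not in incoming_keys and not keep_missing:
            continue  # only on the existing record: considered stale
        seen.add(k)
        merged.append(ann)
    for ann in incoming:
        k = key(ann)
        if k not in seen:  # add only annotations that are really new
            seen.add(k)
            merged.append(ann)
    return merged
```

Whether keep_missing should default to keeping or dropping depends on
where the existing annotation came from, which is exactly the decision
that cannot be made generically.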

I believe the merge function examples that come with bioperl-db will  
attempt to reconcile annotations for bioentries, but probably not for  
their features (BTW updating features themselves suffers from the same  
problem). I'd have to check, but that's my guess.
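
To see which features are affected, one could look for qualifier values
that are stored more than once under different ranks. The following is
a rough sketch in Python that assumes the rows of
seqfeature_qualifier_value have already been fetched as
(seqfeature_id, term_id, rank, value) tuples. The function name
find_redundant_values is hypothetical, and fetching the rows is left
out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (seqfeature_id, term_id, rank, value), as in seqfeature_qualifier_value
Row = Tuple[int, int, int, str]


def find_redundant_values(rows: Iterable[Row]) -> Dict[Tuple[int, int, str], List[int]]:
    """Return the values that occur under more than one rank.

    Keys are (seqfeature_id, term_id, value); each maps to the sorted
    list of ranks under which that identical value was stored.
    """
    ranks = defaultdict(list)
    for seqfeature_id, term_id, rank, value in rows:
        ranks[(seqfeature_id, term_id, value)].append(rank)
    return {k: sorted(v) for k, v in ranks.items() if len(v) > 1}


if __name__ == "__main__":
    sample = [
        (1, 5, 1, "pphb37e20"),
        (1, 5, 2, "pphb37e20"),
        (1, 5, 3, "pphb37e20"),
    ]
    for (feature_id, term_id, value), dup_ranks in find_redundant_values(sample).items():
        print(feature_id, term_id, value, dup_ranks)
```

If a feature shows the same value under ranks 1, 2 and 3, that would be
consistent with the same object having been stored three times without
reconciliation.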

	-hilmar

On Monday, April 5, 2004, at 10:05  AM, Daniel Lang wrote:

> Hi,
> I am trying to populate a biosql database (postgres) with "home-made"
> RichSeq objects. At first sight, it looks like everything is OK, but
> when I retrieve the sequences and write them e.g. in genbank format,
> there are some sequences, where the qualifiers are doubled or even  
> tripled:
> ...
> FEATURES             Location/Qualifiers
>      source          1..551
>                      /tissue_type="mixture of chloronemata, caulonemata and
>                      malformed buds"
>                      /tissue_type="mixture of chloronemata, caulonemata and
>                      malformed buds"
>                      /tissue_type="mixture of chloronemata, caulonemata and
>                      malformed buds"
>                      /clone_lib="normalized full length cDNA library,
>                      chloronemata, caulonemata and malformed buds"
>                      /clone_lib="normalized full length cDNA library,
>                      chloronemata, caulonemata and malformed buds"
>                      /clone_lib="normalized full length cDNA library,
>                      chloronemata, caulonemata and malformed buds"
>                      /sub_species="patens"
>                      /sub_species="patens"
>                      /sub_species="patens"
>                      /clone="pphb37e20"
>                      /clone="pphb37e20"
>                      /clone="pphb37e20"
>                      /organism="Physcomitrella patens subsp. patens"
>                      /organism="Physcomitrella patens subsp. patens"
>                      /organism="Physcomitrella patens subsp. patens"
>                      /mol_type="mRNA"
>                      /mol_type="mRNA"
>                      /mol_type="mRNA"
>                      /db_xref="taxon:145481"
>                      /db_xref="taxon:145481"
>                      /db_xref="taxon:145481"
>      ATAIL           1..28
>                      /ACTION="annot"
>                      /ACTION="annot"
>                      /ACTION="annot"
>                      /ORIENT="REVERSE"
>                      /ORIENT="REVERSE"
>                      /ORIENT="REVERSE"
> ...
>
> Of course, the underlying data is non-redundant...
> Correspondingly, these qualifier_values are stored redundantly in the  
> seqfeature_qualifier_value table with a different rank:
>
>  seqfeature_id | term_id | rank | value
> ---------------+---------+------+-----------------------------------------------------------------------------------
>              1 |       5 |    1 | pphb37e20
>              1 |       5 |    2 | pphb37e20
>              1 |       5 |    3 | pphb37e20
>              1 |       6 |    1 | normalized full length cDNA library, chloronemata, caulonemata and malformed buds
>              1 |       6 |    2 | normalized full length cDNA library, chloronemata, caulonemata and malformed buds
>              1 |       6 |    3 | normalized full length cDNA library, chloronemata, caulonemata and malformed buds
>
>
> But when I write the constructed object in genbank or embl format in  
> the first place, the qualifiers are correct?!
> Bio::Annotation::Reference s are also affected...
>
> Additionally not all data I intend to insert is integrated...
>
> Here is a code snippet:
>     if ($seq->isa("Bio::AnnotatableI")) {
>         flatten_annotations($seq->annotation);
>     }
>     $adp = $db->get_object_adaptor($seq);
>
>     my $pseq = $db->create_persistent($seq)
>         unless $seq->isa("Bio::DB::PersistentObjectI");
>     $pseq->namespace($namespace);
>     $pseq->store();
>     $adp->commit();
>
> Any ideas what is happening?
>
> Additionally, I'd like to know whether there is an adaptor or some
> other way to store Bio::SeqFeature::SimilarityPair objects in the
> schema yet?
>
> Thanks in advance,
> Daniel
>
> -- 
>
> Daniel Lang
> University of Freiburg, Plant Biotechnology
> Sonnenstr. 5, D-79104 Freiburg
> phone: +49 761 203 6988
> homepage:  http://www.plant-biotech.net/
> e-mail: daniel.lang at biologie.uni-freiburg.de
>
> #################################################
> >REALITY.SYS corrupted: Reboot universe? (Y/N/A)
> #################################################
>
> Join MOSS 2004 in Freiburg, Germany from September 12th - 15th:
> registration and information @ http://www.plant-biotech.net/moss2004
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------




