[Bioperl-l] basic problems with bioperl-db/biosql
Hilmar Lapp
hlapp at gnf.org
Mon Oct 18 15:41:16 EDT 2004
On Oct 18, 2004, at 7:11 AM, Mikko Arvas wrote:
>
> However if I have in BioSQL something like this in bioentry and
> correspondingly
> in biosequence:
> +-------------+----------------+----------+-----------+------------+
> | bioentry_id | biodatabase_id | name | accession | identifier |
> +-------------+----------------+----------+-----------+------------+
> | 9 | 9 | YAL001C | unknown | YAL001C |
> | 12 | 9 | XX0115.2 | ma00001 | XX0115.2 |
> +-------------+----------------+----------+-----------+------------+
>
> and I do this:
>
> #!/usr/bin/perl -w
> use strict;
> use warnings;
> use Bio::Seq;
> use Bio::Seq::SeqFactory;
> use Bio::DB::BioDB;
> my $db = Bio::DB::BioDB->new( -database => 'biosql', -user => 'root',
> -dbname => 'biosql', -host => 'localhost', -driver => 'mysql');
> my $seq =Bio::Seq->new(-primary_id => "XX0115.2",
> -namespace => "bioperl");
> my $seqfact = Bio::Seq::SeqFactory->new(-type=>"Bio::Seq");
> my $adp = $db->get_object_adaptor($seq);
> my $dbseq=$adp->find_by_unique_key($seq, -obj_factory => $seqfact);
>
> I get the XX0115.2, but if that doesn't exist in the database I get
> YAL001C instead,
> which is a little bit funny.
It is due to the (stupid IMO) rule in bioperl that the default value
for the accession number is 'unknown'. Also, there are multiple unique
keys on bioentry, and the adaptor will search all of them until it
finds a match. So if the lookup by the identifier (primary_id) you set
fails, it will try the accession number ('unknown') and version
(default 0), which will match the YAL001C entry.
That's why it is almost never a good idea to let sequences with
accession number 'unknown' into your database ... (apart from the fact
that you can have only one per namespace anyway unless you increment
versions ...).
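
To make the fall-through concrete, here is a toy model in plain Perl. It is a sketch only, not the actual bioperl-db code: make_query and find_toy are hypothetical names, the rows are copied from the table above, the version column (not shown in the table) is assumed to be 0, and the namespace part of the keys is left out for brevity.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy stand-in for the bioentry table; version => 0 is an assumption.
my @bioentry = (
    { bioentry_id => 9,  name => 'YAL001C',  accession => 'unknown',
      identifier  => 'YAL001C',  version => 0 },
    { bioentry_id => 12, name => 'XX0115.2', accession => 'ma00001',
      identifier  => 'XX0115.2', version => 0 },
);

# Build a query the way a fresh sequence object would look: anything
# not set explicitly falls back to the defaults ('unknown', version 0).
sub make_query {
    my (%args) = @_;
    return {
        identifier => $args{identifier},
        accession  => defined $args{accession} ? $args{accession} : 'unknown',
        version    => defined $args{version}   ? $args{version}   : 0,
    };
}

# Try each unique key in turn and return the first row that matches.
sub find_toy {
    my ($query, $rows) = @_;

    # Key 1: identifier (primary_id), only if one was set.
    if (defined $query->{identifier}) {
        for my $row (@$rows) {
            return $row if $row->{identifier} eq $query->{identifier};
        }
    }

    # Key 2: accession plus version. The defaults take part in the
    # comparison, which is how 'unknown'/0 can hit an unrelated row.
    for my $row (@$rows) {
        return $row
            if $row->{accession} eq $query->{accession}
            && $row->{version} == $query->{version};
    }

    return;    # no key matched
}

my $hit  = find_toy(make_query(identifier => 'XX0115.2'),   \@bioentry);
my $miss = find_toy(make_query(identifier => 'NO_SUCH_ID'), \@bioentry);
print "existing id -> $hit->{name}\n";     # XX0115.2
print "missing id  -> $miss->{name}\n";    # YAL001C, via 'unknown'/0
```

If the query carries a real accession that is not in the table, the second key no longer matches anything and the lookup comes back empty, which is the behaviour you would expect.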
> If I use instead:
> my $seq =Bio::Seq->new(-accession => "XX0115.2", -namespace =>
> "bioperl");
> it works as it should.
>
> Is this due to the fact that YAL001C doesn't have an accession, so I
> should make sure that there is always an accession
Right. Accession is required (NOT NULL) in biosql, and it's not a good
idea to leave it at a non-meaningful default.
Primary_id (or identifier in biosql) is rather meant for 'internal'
identifiers, e.g. NCBI's GI number, or the source's primary key if you
imported seqs from somewhere. Almost all 'identifiers' you encounter
will be accessions, not primary_ids in the sense of bioperl.
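
One way to keep 'unknown' accessions out of the database is a small check before storing. The following is only a sketch: has_meaningful_accession is a hypothetical helper, not part of bioperl or bioperl-db, and ToySeq is a stand-in class so the example runs without bioperl installed. The only thing it relies on is an accession_number method, which bioperl sequence objects provide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true only if the object reports an accession
# that is set and is not the 'unknown' default.
sub has_meaningful_accession {
    my ($seq) = @_;
    my $acc = $seq->accession_number;
    return (defined $acc && $acc ne '' && $acc ne 'unknown') ? 1 : 0;
}

# Minimal stand-in class mimicking the 'unknown' default.
package ToySeq;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub accession_number {
    my ($self) = @_;
    return defined $self->{accession} ? $self->{accession} : 'unknown';
}

package main;

for my $seq (ToySeq->new(accession => 'ma00001'), ToySeq->new()) {
    printf "%s: %s\n", $seq->accession_number,
        has_meaningful_accession($seq)
            ? 'ok to store'
            : 'skip, set an accession first';
}
```

In a real loading script the same test would sit just before the sequence is made persistent.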
> or is the -primary_id search somehow problematic and should be
> avoided?
No, not at all, it's perfectly fine ...
Great if you find the software useful. That's the goal :-)
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------