[Bioperl-l] Not picking up Dbxrefs EMBL records
Hilmar Lapp
hlapp at gnf.org
Tue Aug 9 12:40:12 EDT 2005
NM_214434 is a RefSeq accession. In GenBank format the db_xrefs you see are
qualifiers on features in the feature table, not top-level db_xrefs (i.e.,
for the entry itself), although semantically of course that's what they
are. Bioperl (i.e., the Bioperl SeqIO parser for genbank format)
doesn't interpret them that way, however, and leaves them where they are,
namely as tags on the features. The single exception is the taxon ID: the
parser does look for it in the feature table and sets the
$seq->species->ncbi_taxid property accordingly.
GenBank format doesn't have top-level db_xrefs at all. You will need
EMBL format for that. As I said before, the PUBMED line is not a
db_xref for the entry either but the db_xref for the reference,
so you will need to retrieve the references
($seq->annotation->get_Annotations('reference')) and use each one's
$ref->pubmed or $ref->medline property.
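To illustrate the reference lookup just described, here is a minimal sketch. It assumes BioPerl is installed; the sub name reference_ids and the script/file names in the usage comment are made up for illustration, while get_Annotations('reference'), pubmed and medline are the calls named above.

```perl
use strict;
use warnings;

# Collect the PubMed/MEDLINE IDs attached to a sequence's references.
# $seq is expected to be a Bio::SeqI object as returned by Bio::SeqIO.
# (reference_ids is a hypothetical helper name, not part of BioPerl.)
sub reference_ids {
    my ($seq) = @_;
    my @ids;
    for my $ref ($seq->annotation->get_Annotations('reference')) {
        # scalar() keeps the hash well-formed even if an accessor
        # returns an empty list for a missing ID.
        push @ids, {
            pubmed  => scalar $ref->pubmed,
            medline => scalar $ref->medline,
        };
    }
    return @ids;
}

# Usage: perl list_refs.pl record.gb   (both names are hypothetical)
if (@ARGV) {
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        for my $id (reference_ids($seq)) {
            printf "%s\tPUBMED=%s\tMEDLINE=%s\n",
                $seq->accession_number,
                defined $id->{pubmed}  ? $id->{pubmed}  : '-',
                defined $id->{medline} ? $id->{medline} : '-';
        }
    }
}
```

The IDs come from the reference annotation objects, not from any db_xref on the entry, which is why nothing shows up as an entry-level dbxref.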
BTW this will still hold true if you first load the sequences into
bioperl-db and then retrieve them; there isn't really any magic being
applied that would transform db_xrefs into a common unified picture.
I use a SequenceProcessor (see Bio::Seq::BaseSeqProcessor and the
--pipeline option to load_seqdatabase.pl) to promote db_xref tags found
in the feature table of genbank records to Bio::Annotation::DBLink
annotation on the sequence object. This is very easy to implement, and you
stay in total control of the annotation structure.
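A rough sketch of what such a promotion step could look like follows. It is not my actual processor: the sub names feature_dbxrefs and promote_dbxrefs are hypothetical, and splitting each value on the first colon assumes db_xrefs of the form "taxon:9823" or "GeneID:404088". The BioPerl calls used (get_SeqFeatures, has_tag, get_tag_values, add_Annotation, Bio::Annotation::DBLink->new) are existing ones and assume BioPerl is installed.

```perl
use strict;
use warnings;

# Return the unique [database, id] pairs found in db_xref tags of the
# features of $seq, in order of first appearance.
# (feature_dbxrefs is a hypothetical helper name.)
sub feature_dbxrefs {
    my ($seq) = @_;
    my (%seen, @pairs);
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->has_tag('db_xref');
        for my $xref ($feat->get_tag_values('db_xref')) {
            # Assumption: values look like "database:identifier".
            my ($db, $id) = split /:/, $xref, 2;
            next unless defined $id && length $id;
            next if $seen{$xref}++;
            push @pairs, [$db, $id];
        }
    }
    return @pairs;
}

# Attach each pair to the sequence as a Bio::Annotation::DBLink.
# (promote_dbxrefs is a hypothetical helper name.)
sub promote_dbxrefs {
    my ($seq) = @_;
    require Bio::Annotation::DBLink;
    for my $pair (feature_dbxrefs($seq)) {
        my $link = Bio::Annotation::DBLink->new(
            -database   => $pair->[0],
            -primary_id => $pair->[1],
        );
        $seq->annotation->add_Annotation('dblink', $link);
    }
    return $seq;
}
```

In a processor derived from Bio::Seq::BaseSeqProcessor, the overridden process_seq method would presumably just call promote_dbxrefs($seq) and return the sequence. Whether you also want taxon promoted, or only selected databases, is up to you.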
-hilmar
On Aug 9, 2005, at 9:21 AM, SG Edwards wrote:
> Hi,
>
> My installation does not pick up ANY dbxrefs for gene records, e.g.
> Pubmed or MEDLINE (in either EMBL or Genbank format). When I load them
> into the database they go in fine but no dbxref_ids are mapped to the
> bioentry_id in the bioentry_dbxref table. Therefore, nothing appears in
> the dbxref table either!
>
> Loading UniProt protein entries into the database works fine. I am
> currently installing BioPerl v 1.5 to see if this resolves the problem.
>
> An example: NM_214434 from Genbank which has the dbxrefs:
>
> Pubmed 1503277
> Taxon 9823
> GeneID 404088
>
> Quoting Hilmar Lapp <hlapp at gnf.org>:
>
>> Are you referring to references and their PMID? These you would find
>> in the Reference table, which has a foreign key to dbxref, which would
>> only store the PUBMED or MEDLINE ID (not both at this time). Can you
>> give an example accession that's giving you grief?
>>
>> -hilmar
>>
>> On Aug 8, 2005, at 1:17 AM, SG Edwards wrote:
>>
>>> Hi folks,
>>>
>>> I have a BioSQL database (PostgreSQL 7.4.3, BioPerl 1.4, bioperl-db
>>> 1.2) set up containing protein and gene data. However, when I load
>>> gene sequence records (EMBL or Genbank) using:
>>>
>>> perl load_seqdatabase.pl -driver Pg -safe -lookup -dbname milk \
>>>     -dbuser s0460205 -dbpass password -format embl \
>>>     /home/s0460205/file_name.txt
>>>
>>> from bioperl-db it does not pick up any dbxrefs, i.e. there is no
>>> dbxref_id for MEDLINE etc.
>>>
>>> Has anyone else come across this problem and is there a fix?
>>>
>>> Cheers,
>>>
>>> Stephen
>>>
>> --
>> -------------------------------------------------------------
>> Hilmar Lapp email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
>> -------------------------------------------------------------