[Bioperl-l] DBSOURCE parsing
Chris Fields
cjfields at uiuc.edu
Mon Nov 27 21:47:12 UTC 2006
Jason,
I am working on Stockholm and GenPept format parsing, both of which use
DBLink objects. I have a couple of questions. First (not really a huge
issue, more of a curiosity): is it possible to pass a callback to
Annotation objects for the overloaded operators? I'm thinking of
situations where the same data is displayed differently in other
formats (like Stockholm).
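A minimal sketch of the callback idea, written in Python rather than Perl purely for illustration: `__str__` stands in for the overloaded stringification operator, and the `DBLink` class, its `formatter` attribute and the `stockholm_dr` helper are all hypothetical, not BioPerl's actual API. The Stockholm rendering assumes a `#=GF DR` cross-reference line.

```python
from typing import Callable, Optional


class DBLink:
    """Hypothetical cross-reference annotation with a pluggable string form."""

    def __init__(self, database: str, primary_id: str,
                 formatter: Optional[Callable[["DBLink"], str]] = None) -> None:
        self.database = database
        self.primary_id = primary_id
        # Optional callback used when the object is converted to a string
        self.formatter = formatter

    def __str__(self) -> str:
        # Delegate to the callback if one was supplied, else use a default form
        if self.formatter is not None:
            return self.formatter(self)
        return f"{self.database}:{self.primary_id}"


def stockholm_dr(link: DBLink) -> str:
    """Hypothetical formatter rendering the link as a Stockholm '#=GF DR' line."""
    return f"#=GF DR   {link.database}; {link.primary_id};"


link = DBLink("Pfam", "PF00533")
print(link)                      # Pfam:PF00533
link.formatter = stockholm_dr
print(link)                      # #=GF DR   Pfam; PF00533;
```

Swapping the callback per output format changes only how the link is printed, not the data it holds.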
Also, would it be feasible for DBLink objects to contain annotations of
their own (comments, other DBLink objects, etc.) for more complex data?
This applies in particular to GenPept records, as in the following
examples:
DBSOURCE    swissprot: locus BRCA1_HUMAN, accession P38398;
            class: standard.
            created: Oct 1, 1994.
            sequence updated: Feb 1, 1995.
            annotation updated: Nov 14, 2006.
            xrefs: U14680.1, AAA73985.1, L78833.1, AAC37594.1, AY273801.1,
            AAP12647.1, A58881, 1JM7A, 1JNXX, 1N5OX, 1OQAA, 1T15A, 1T29A,
            1T2UA, 1T2VA, 1T2VB, 1T2VC, 1T2VD, 1T2VE, 1Y98A
            xrefs (non-sequence databases): UniGene:Hs.194143, IntAct:P38398,
            TRANSFAC:T04074, Ensembl:ENSG00000012048, KEGG:hsa:672, HGNC:1100,
            MIM:113705, MIM:114480, Reactome:P38398, ArrayExpress:P38398,
            GO:0031436, GO:0008274, GO:0005634, GO:0000151, GO:0050681,
            GO:0003677, GO:0019899, GO:0003713, GO:0015631, GO:0008270,
            GO:0030521, GO:0007059, GO:0006978, GO:0008630, GO:0042759,
            GO:0046600, GO:0016481, GO:0045739, GO:0031398, GO:0045893,
            GO:0016567, GO:0042981, GO:0042127, GO:0006357, GO:0006359,
            InterPro:IPR011364, InterPro:IPR001357, InterPro:IPR002378,
            InterPro:IPR001841, PANTHER:PTHR13763, Pfam:PF00533, Pfam:PF00097,
            PIRSF:PIRSF001734, PRINTS:PR00493, SMART:SM00292, SMART:SM00184,
            PROSITE:PS50172, PROSITE:PS00518, PROSITE:PS50089
...
DBSOURCE    pdb: molecule 1T2U, chain 65, release Apr 22, 2004;
            deposition: Apr 22, 2004;
            class: Antitumor Protein;
            source: Mol_id: 1; Organism_scientific: Homo Sapiens;
            Organism_common: Human; Gene: Brca1;
            Expression_system: Escherichia Coli;
            Expression_system_common: Bacteria;
            Exp. method: X-Ray Diffraction.
...
DBSOURCE    pir: locus I49350;
            summary: #length 1812 #molecular-weight 198788 #checksum 8813
            ;
            genetic: #gene Brca1
            ;
            superfamily: transcriptional regulator, BRCA1 type; RING finger homology
            ;
            PIR dates: 02-Jul-1996 #sequence_revision 02-Jul-1996 #text_change 09-May-2004
            .
...
DBSOURCE    prf: locus 2202221A;
            state: hepatoma/colonic tumor;
            taxonomy: Mammalia.
My thought is that the first line would supply the data for the main
DBLink object, and all subsequent lines would become annotation objects
(comments, DBLinks, etc.) in an annotation collection contained within
that DBLink object. I don't think there would be any danger of circular
references, provided nested annotations never hold a reference back to
the DBLink that contains them.
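A minimal sketch of the proposed structure, in Python for illustration only: the first DBSOURCE line becomes the main link, each later field becomes a nested comment, and the "xrefs (non-sequence databases)" entries become nested links. The `Comment` and `DBLink` classes, `build_dblink`, and the choice of the first comma-separated item (e.g. `locus BRCA1_HUMAN` rather than the accession) as the primary ID are all hypothetical. The sketch also assumes continuation lines have already been joined into their `(tag, text)` field.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union


@dataclass
class Comment:
    """Hypothetical tagged comment annotation, e.g. ('class', 'standard.')."""
    tag: str
    text: str


@dataclass
class DBLink:
    """Hypothetical DBLink that owns its own annotation collection."""
    database: str
    primary_id: str
    # Children only; no child stores a reference back to its parent.
    annotations: List[Union["DBLink", Comment]] = field(default_factory=list)

    def walk(self) -> Iterator[Union["DBLink", Comment]]:
        """Depth-first traversal of all nested annotations."""
        for ann in self.annotations:
            yield ann
            if isinstance(ann, DBLink):
                yield from ann.walk()


def build_dblink(first_line: str, fields: List[Tuple[str, str]]) -> DBLink:
    """Build the main DBLink from the first DBSOURCE line (keyword stripped),
    then attach each later (tag, text) field as a nested annotation."""
    database, _, rest = first_line.partition(":")
    # Identifier taken from the first comma-separated item, e.g. 'locus I49350'
    first_item = rest.strip().rstrip(";").split(",")[0]
    link = DBLink(database.strip(), first_item.split()[-1])
    for tag, text in fields:
        if tag == "xrefs (non-sequence databases)":
            # Each 'DB:ID' entry becomes a nested DBLink; split at the first
            # colon only, so 'KEGG:hsa:672' keeps 'hsa:672' as its ID.
            for token in text.split(","):
                token = token.strip()
                if token:
                    db, _, ident = token.partition(":")
                    link.annotations.append(DBLink(db, ident))
        else:
            link.annotations.append(Comment(tag, text))
    return link


link = build_dblink(
    "swissprot: locus BRCA1_HUMAN, accession P38398;",
    [("class", "standard."),
     ("xrefs (non-sequence databases)", "UniGene:Hs.194143, KEGG:hsa:672")],
)
print(link.database, link.primary_id)   # swissprot BRCA1_HUMAN
print(len(list(link.walk())))           # 3
```

Since every annotation is reachable only from the link that owns it, the result is a tree and `walk()` terminates; a cycle would require adding a link to one of its own descendants.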
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign