[Biopython] The problem of using Bio.SwissProt
Peter Cock
p.j.a.cock at googlemail.com
Tue Sep 10 16:26:11 UTC 2019
Hello Dechang,
It is entirely possible that the file format has changed a little
since the last major work on Bio.SwissProt.KeyWList back in 2008:
https://github.com/biopython/biopython/blob/master/Bio/SwissProt/KeyWList.py
I suggest you open an issue on our GitHub repository with a specific
example (UniProt URL), showing the mismatch in fields. If you want to
work on a pull request to cope with the changes, even better :)
If you are familiar with SwissProt / UniProt and can find a relevant
announcement about these changes, that would also be very helpful.
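To make the field mismatch concrete for such an issue report, it can help to tally the two-letter line codes in a record and compare them with the codes documented in the KeyWList docstring quoted below. The following is a minimal stdlib-only sketch, not part of Biopython. The names `line_codes`, `unexpected_codes` and `KEYWLIST_CODES` are hypothetical, and the sketch assumes the usual flat-file layout with the line code in the first two columns. The sample lines are copied from the entry in the message below.

```python
from collections import Counter

# Line codes documented for keyword-list entries in the KeyWList docstring
# (hypothetical constant, copied from the help() output quoted below).
KEYWLIST_CODES = {"ID", "IC", "AC", "DE", "SY", "GO", "HI", "WW", "CA"}


def line_codes(entry_text):
    """Count the two-letter line codes in a Swiss-Prot style flat-file record.

    Assumes the code sits in the first two columns of each line; blank
    lines and the '//' record terminator are skipped.
    """
    counts = Counter()
    for line in entry_text.splitlines():
        code = line[:2]
        if len(code) == 2 and code.isalpha() and code.isupper():
            counts[code] += 1
    return counts


def unexpected_codes(entry_text, known=KEYWLIST_CODES):
    """Return the sorted line codes in entry_text that are not in `known`."""
    return sorted(set(line_codes(entry_text)) - set(known))


sample = """\
RX   PubMed=24154973; DOI=10.1002/ijc.28557;
CC   -!- TISSUE SPECIFICITY: Expressed in gastrointestinal and immune
DR   GeneID; 399948; -.
DR   HGNC; HGNC:33789; COLCA1.
PE   2: Evidence at transcript level;
KW   Complete proteome; Membrane; Reference proteome; Transmembrane;
KW   Transmembrane helix.
FT   CHAIN 1 124 Colorectal cancer-associated protein 1.
//
"""

print(line_codes(sample))
print(unexpected_codes(sample))  # ['CC', 'DR', 'FT', 'KW', 'PE', 'RX']
```

For this sample every code is flagged. Those codes (RX, CC, DR, PE, KW, FT) are the ones used in UniProtKB/Swiss-Prot protein entries, which Biopython reads with `Bio.SwissProt.parse` or `Bio.SwissProt.read` rather than with `Bio.SwissProt.KeyWList`, so it may also be worth checking which file is being parsed.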
Thank you,
Peter
On Tue, Sep 10, 2019 at 4:25 PM De-Chang Yang
<yangdc at mail.cbi.pku.edu.cn> wrote:
>
> Dear Biopython team,
> Hi, this is Dechang Yang.
> I want to retrieve some information from the Swiss-Prot database using Biopython, and I found the tutorial at http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc139.
> However, to my surprise, the KeyWList module of Bio.SwissProt seems to be out of date.
> When I type help(KeyWList), I get the following class information:
>
> | --------- --------------------------- ----------------------
> | Line code Content                     Occurrence in an entry
> | --------- --------------------------- ----------------------
> | ID        Identifier (keyword)        Once; starts a keyword entry
> | IC        Identifier (category)       Once; starts a category entry
> | AC        Accession (KW-xxxx)         Once
> | DE        Definition                  Once or more
> | SY        Synonyms                    Optional; once or more
> | GO        Gene ontology (GO) mapping  Optional; once or more
> | HI        Hierarchy                   Optional; once or more
> | WW        Relevant WWW site           Optional; once or more
> | CA        Category                    Once per keyword entry; absent
> |                                       in category entries
>
> As you can see, the docstring lists a set of line codes, but these codes are inconsistent with the latest Swiss-Prot KeyWList file, which looks like the content below (the DR, CC and RX lines, and most of the others, would be ignored by the KeyWList module):
>
>
> RP TISSUE SPECIFICITY, AND SUBCELLULAR LOCATION.
> RX PubMed=24154973; DOI=10.1002/ijc.28557;
> RA Peltekova V.D., Lemire M., Qazi A.M., Zaidi S.H., Trinh Q.M.,
> RA Bielecki R., Rogers M., Hodgson L., Wang M., D'Souza D.J., Zandi S.,
> RA Chong T., Kwan J.Y., Kozak K., De Borja R., Timms L., Rangrej J.,
> RA Volar M., Chan-Seng-Yue M., Beck T., Ash C., Lee S., Wang J.,
> RA Boutros P.C., Stein L.D., Dick J.E., Gryfe R., McPherson J.D.,
> RA Zanke B.W., Pollett A., Gallinger S., Hudson T.J.;
> RT "Identification of genes expressed by immune cells of the colon that
> RT are regulated by colorectal cancer-associated variants.";
> RL Int. J. Cancer 134:2330-2341(2014).
> CC -!- SUBCELLULAR LOCATION: Membrane {ECO:0000269|PubMed:24154973};
> CC Single-pass membrane protein {ECO:0000269|PubMed:24154973}.
> CC Note=Co-localizes with crystalloid granules of eosinophils and
> CC granular organelles of mast cells, neutrophils, macrophages and
> CC dendritic cells.
> CC -!- TISSUE SPECIFICITY: Expressed in gastrointestinal and immune
> CC tissue, as well as prostate, testis and ovary. Expressed in lamina
> CC propria and eosinophils but not in epithelial cells. Expression is
> CC greater in benign adjacent tissues than in colon tumors.
> CC {ECO:0000269|PubMed:24154973}.
> CC -----------------------------------------------------------------------
> CC Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms
> CC Distributed under the Creative Commons Attribution (CC BY 4.0) License
> CC -----------------------------------------------------------------------
> DR EMBL; AK127703; -; NOT_ANNOTATED_CDS; mRNA.
> DR EMBL; AP002448; -; NOT_ANNOTATED_CDS; Genomic_DNA.
> DR RefSeq; NP_001289573.1; NM_001302644.1.
> DR RefSeq; NP_001289574.1; NM_001302645.1.
> DR RefSeq; NP_001289575.1; NM_001302646.1.
> DR RefSeq; NP_001289576.1; NM_001302647.1.
> DR RefSeq; NP_001289577.1; NM_001302648.1.
> DR RefSeq; NP_997312.1; NM_207429.3.
> DR BioMuta; HGNC:33789; -.
> DR DMDM; 74711342; -.
> DR PaxDb; Q6ZS62; -.
> DR PRIDE; Q6ZS62; -.
> DR ProteomicsDB; 68193; -.
> DR GeneID; 399948; -.
> DR KEGG; hsa:399948; -.
> DR CTD; 399948; -.
> DR DisGeNET; 399948; -.
> DR GeneCards; COLCA1; -.
> DR HGNC; HGNC:33789; COLCA1.
> DR MIM; 615693; gene.
> DR neXtProt; NX_Q6ZS62; -.
> DR PharmGKB; PA164716768; -.
> DR eggNOG; ENOG410JDIH; Eukaryota.
> DR eggNOG; ENOG4111630; LUCA.
> DR HOGENOM; HOG000111748; -.
> DR InParanoid; Q6ZS62; -.
> DR OrthoDB; 1566774at2759; -.
> DR PhylomeDB; Q6ZS62; -.
> DR TreeFam; TF354066; -.
> DR ChiTaRS; COLCA1; human.
> DR GenomeRNAi; 399948; -.
> DR PRO; PR:Q6ZS62; -.
> DR Proteomes; UP000005640; Unplaced.
> DR GO; GO:0016021; C:integral component of membrane; IEA:UniProtKB-KW.
> DR GO; GO:0016020; C:membrane; IDA:UniProtKB.
> PE 2: Evidence at transcript level;
> KW Complete proteome; Membrane; Reference proteome; Transmembrane;
> KW Transmembrane helix.
> FT CHAIN 1 124 Colorectal cancer-associated protein 1.
> FT /FTId=PRO_0000340692.
>
> Could you please help me find out whether I have made any mistakes?
>
> Best Regards,
> Dechang
>
> _______________________________________________
> Biopython mailing list - Biopython at mailman.open-bio.org
> https://mailman.open-bio.org/mailman/listinfo/biopython