[Biopython] Regarding blast record report

Ahmad Khalifa underoath006 at gmail.com
Thu Nov 8 16:57:09 UTC 2018


Hello,

I want to extract specific information from Biopython's BLAST output.

In the hit headers, the title often contains a variable amount of information,
for example:

gi|1335041855|gb|PNW76469.1| hypothetical protein CHLRE_11g467616v5
[Chlamydomonas reinhardtii]

gi|159481404|ref|XP_001698769.1| predicted protein [Chlamydomonas
reinhardtii] >gi|745998015|sp|A8JA42.1|IFT56_CHLRE RecName:
Full=Intraflagellar transport protein 56; AltName: Full=Abnormal dye
filling protein 13; AltName: Full=Tetratricopeptide repeat protein 26
homolog; Short=TPR repeat protein 26 homolog

gi|1335043717|gb|PNW78329.1| hypothetical protein CHLRE_09g401700v5
[Chlamydomonas reinhardtii]


What exactly does this output contain, and what do "gi" and "gb" mean? Why
do some hits include a RefSeq or UniProt accession while others do not? The
same information is not consistently present, which makes it very difficult
to mine. Is it possible to retrieve a UniProt accession for my hits, or a
gene name that I can map to an accession using UniProt's API?
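
For reference, "gi" is NCBI's GenInfo Identifier (a numeric sequence ID),
and the tag after it names the source database: "gb" is GenBank, "ref" is
RefSeq, "sp" is UniProtKB/Swiss-Prot, and so on. Below is a minimal sketch,
assuming titles follow the gi|<number>|<db>|<accession>| layout seen above,
that groups the accessions in a title by database and strips the version
suffix from Swiss-Prot/TrEMBL accessions. The names DB_NAMES,
accessions_by_db and uniprot_accessions are hypothetical, not Biopython API.

```python
import re

# Database tags that appear in NCBI FASTA-style identifiers (partial list).
DB_NAMES = {
    "gb": "GenBank",
    "emb": "EMBL",
    "dbj": "DDBJ",
    "ref": "RefSeq",
    "sp": "UniProtKB/Swiss-Prot",
    "tr": "UniProtKB/TrEMBL",
    "pdb": "PDB",
}

# Matches "gi|<number>|<db tag>|<accession>" anywhere in a title.
ID_RE = re.compile(r"gi\|(\d+)\|([a-z]+)\|([^|\s]+)")


def accessions_by_db(title):
    """Group every accession found in a BLAST hit title by its database tag."""
    found = {}
    for _gi, db, accession in ID_RE.findall(title):
        found.setdefault(db, []).append(accession)
    return found


def uniprot_accessions(title):
    """Return sp/tr accessions with the version suffix (e.g. ".1") removed.

    Assumption: the part before the dot is the UniProt accession.
    """
    groups = accessions_by_db(title)
    return [acc.split(".")[0] for db in ("sp", "tr") for acc in groups.get(db, [])]


example = (
    "gi|159481404|ref|XP_001698769.1| predicted protein [Chlamydomonas "
    "reinhardtii] >gi|745998015|sp|A8JA42.1|IFT56_CHLRE RecName: "
    "Full=Intraflagellar transport protein 56; AltName: Full=Abnormal dye "
    "filling protein 13; AltName: Full=Tetratricopeptide repeat protein 26 "
    "homolog; Short=TPR repeat protein 26 homolog"
)

for db, accessions in accessions_by_db(example).items():
    print(DB_NAMES.get(db, db), accessions)
print(uniprot_accessions(example))  # ['A8JA42']
```

Hits that only carry gb or ref accessions would still need a separate
lookup, for example through UniProt's ID mapping service; that step is not
shown here.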

What I really want is to mine the title and get every piece of information
separately (where it exists, of course). Are there parsers that do that?
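
A minimal sketch of such a parser, assuming (as in the examples above) that
identical sequences merged into one hit are joined by " >gi|..." and that a
trailing "[...]" holds the organism name. It works on the plain title string,
e.g. the alignment.title of a record read with Bio.Blast.NCBIXML; the
parse_hit_title function and its field names are hypothetical. Titles
without a gi| prefix fall back to description and organism only.

```python
import re

# One identifier block followed by free text, e.g.
# "gi|745998015|sp|A8JA42.1|IFT56_CHLRE RecName: Full=..."
ENTRY_RE = re.compile(
    r"^gi\|(?P<gi>\d+)\|(?P<db>[a-z]+)\|(?P<accession>[^|\s]+)\|"
    r"(?P<name>\S*)\s*(?P<text>.*)$"
)
# Optional trailing "[Organism name]" at the end of the free text.
ORGANISM_RE = re.compile(r"^(?P<desc>.*?)\s*\[(?P<organism>[^\[\]]+)\]\s*$")
# Assumed separator between merged entries: whitespace, ">", then "gi|".
SEPARATOR_RE = re.compile(r"\s+>(?=gi\|)")


def parse_hit_title(title):
    """Split a BLAST hit title into one dict per merged database entry."""
    entries = []
    for chunk in SEPARATOR_RE.split(title.strip()):
        entry = {"gi": None, "db": None, "accession": None, "name": None,
                 "description": chunk, "organism": None}
        match = ENTRY_RE.match(chunk)
        if match:
            entry.update(gi=match["gi"], db=match["db"],
                         accession=match["accession"],
                         name=match["name"] or None,  # empty name -> None
                         description=match["text"])
        org = ORGANISM_RE.match(entry["description"])
        if org:
            entry["description"] = org["desc"]
            entry["organism"] = org["organism"]
        entries.append(entry)
    return entries


title = (
    "gi|159481404|ref|XP_001698769.1| predicted protein [Chlamydomonas "
    "reinhardtii] >gi|745998015|sp|A8JA42.1|IFT56_CHLRE RecName: "
    "Full=Intraflagellar transport protein 56; AltName: Full=Abnormal dye "
    "filling protein 13; AltName: Full=Tetratricopeptide repeat protein 26 "
    "homolog; Short=TPR repeat protein 26 homolog"
)
for entry in parse_hit_title(title):
    print(entry["db"], entry["accession"], entry["organism"], entry["description"])
```

Swiss-Prot descriptions such as "RecName: Full=...; AltName: ..." could be
split further on "; " if the individual names are needed.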

Best regards.