<div dir="ltr"><div dir="ltr">Hello,<div><br></div><div>I want to extract specific information from the Biopython BLAST output.</div><div><br></div><div>The hit titles contain a variable amount of information, for example:</div><div><br></div><div>gi|1335041855|gb|PNW76469.1| hypothetical protein CHLRE_11g467616v5 [Chlamydomonas reinhardtii]</div><div><br></div><div>gi|159481404|ref|XP_001698769.1| predicted protein [Chlamydomonas reinhardtii] >gi|745998015|sp|A8JA42.1|IFT56_CHLRE RecName: Full=Intraflagellar transport protein 56; AltName: Full=Abnormal dye filling protein 13; AltName: Full=Tetratricopeptide repeat protein 26 homolog; Short=TPR repeat protein 26 homolog</div><div><br></div><div>gi|1335043717|gb|PNW78329.1| hypothetical protein CHLRE_09g401700v5 [Chlamydomonas reinhardtii]</div><div><br></div><div><br></div><div>What exactly is contained in these titles, and what do gi and gb stand for? Why do some hits include a RefSeq or UniProt accession while others do not? Because the same information is not consistently present, the titles are very difficult to mine. Is it possible to retrieve a UniProt accession for each hit, or a gene name that I can map to an accession using UniProt's API?</div><div><br></div><div>What I really want is to split each title into its separate pieces of information (where present). Are there parsers that do that?</div><div><br></div><div>Best regards.</div></div></div>
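<div dir="ltr">A minimal sketch of splitting such a title, assuming it follows the NCBI FASTA defline layout seen in the examples: gi|GI number|database tag|accession|optional locus, then a free-text description and an optional organism in square brackets, with merged redundant entries separated by a space and a greater-than sign. In these identifiers, gi is the NCBI GenInfo identifier, gb marks a GenBank accession, ref a RefSeq accession and sp a Swiss-Prot (UniProtKB) accession. The DeflineEntry class and parse_title function are hypothetical helpers, not part of Biopython; the input string could come, for example, from the title attribute of an alignment parsed with Bio.Blast.NCBIXML.</div>

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Organism names are appended in square brackets at the end of a description.
ORGANISM_RE = re.compile(r"\s*\[([^\[\]]+)\]\s*$")


@dataclass
class DeflineEntry:
    """One defline from a (possibly merged) BLAST hit title (hypothetical helper)."""
    gi: Optional[str]
    database: Optional[str]   # e.g. "gb", "ref", "sp"
    accession: Optional[str]  # versioned accession, e.g. "PNW76469.1"
    locus: Optional[str]      # e.g. "IFT56_CHLRE" for Swiss-Prot entries
    description: str
    organism: Optional[str]


def parse_title(title: str) -> List[DeflineEntry]:
    """Split a merged hit title into its individual deflines and fields."""
    entries = []
    # Assumption: merged redundant entries are joined with " >".
    for chunk in title.split(" >"):
        chunk = chunk.strip().lstrip(">")
        # The identifier block ends at the first space; the rest is the description.
        id_part, _, desc = chunk.partition(" ")
        fields = id_part.split("|")
        gi = database = accession = locus = None
        if len(fields) >= 4 and fields[0] == "gi":
            gi, database, accession = fields[1], fields[2], fields[3]
            # The locus slot is empty for gb/ref entries ("...|PNW76469.1|").
            locus = fields[4] if len(fields) > 4 and fields[4] else None
        else:
            # Not in gi|...|db|acc| form: keep the raw identifier as the accession.
            accession = id_part
        organism = None
        match = ORGANISM_RE.search(desc)
        if match:
            organism = match.group(1)
            desc = desc[:match.start()]
        entries.append(
            DeflineEntry(gi, database, accession, locus, desc.strip(), organism)
        )
    return entries


if __name__ == "__main__":
    example = (
        "gi|159481404|ref|XP_001698769.1| predicted protein "
        "[Chlamydomonas reinhardtii] >gi|745998015|sp|A8JA42.1|IFT56_CHLRE "
        "RecName: Full=Intraflagellar transport protein 56"
    )
    for entry in parse_title(example):
        print(entry)
```

<div dir="ltr">Entries tagged sp already carry a UniProtKB accession; for gb or ref hits, the accession would still have to be mapped separately, for example with UniProt's ID mapping service.</div>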