[Biopython] Tutorial Question 7.4 alignment.title
Ara Kooser
akooser at unm.edu
Fri Oct 8 15:45:31 UTC 2010
Peter,
Thanks for your reply. I started to fiddle around with parsing the
string last night but haven't made much progress.
At the moment the output looks like this:
****Alignment****
sequence: gi|302529614|ref|ZP_07281956.1| predicted protein
[Streptomyces sp. AA4] >gi|302438509|gb|EFL10325.1| predicted protein
[Streptomyces sp. AA4]
e value: 1.89229e-46
length: 1109
start: 7
end: 414
So what I want from the sequence string is the following:
[Streptomyces sp. AA4]
ZP_07281956.1
printed out as separate lines like the rest of the output.
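
A minimal sketch of one way to pull those two pieces out of the title string. It assumes the title follows the NCBI layout shown above: a gi|number|db|accession| identifier, the organism in square brackets at the end of the description, and merged entries joined by " >". The helper name split_title is made up for illustration and is not part of Biopython. It returns the organism without the surrounding brackets.

```python
import re


def split_title(title):
    """Return (organism, accession) from an NCBI-style BLAST hit title.

    Hypothetical helper, not a Biopython function. Either value is None
    when the title does not follow the assumed NCBI convention.
    """
    # Merged database entries are joined with " >"; keep only the first one.
    first = title.split(" >")[0]

    # Assumed identifier layout: gi|<number>|<db>|<accession>|
    id_match = re.match(r"gi\|\d+\|\w+\|([^|\s]+)\|", first)
    accession = id_match.group(1) if id_match else None

    # By NCBI convention the organism sits in trailing square brackets.
    org_match = re.search(r"\[([^\[\]]+)\]\s*$", first)
    organism = org_match.group(1) if org_match else None

    return organism, accession


title = ("gi|302529614|ref|ZP_07281956.1| predicted protein "
         "[Streptomyces sp. AA4] >gi|302438509|gb|EFL10325.1| "
         "predicted protein [Streptomyces sp. AA4]")
organism, accession = split_title(title)
print(organism)
print(accession)
```

In the loop from the tutorial, the same call would be split_title(alignment.title). Titles from databases that do not use the NCBI conventions will give None for one or both values.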
Once that is figured out, I want to put all the information in columns
so it can be read into an OpenOffice (OO) spreadsheet, looking like this:
Name Locus # E_value Length Start End
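
A rough sketch of how such a table could be written as tab-separated text with Python's standard csv module, which a spreadsheet program should be able to import. The function name write_hits and the sample row are illustrative only; the values are copied from the output shown earlier in this message.

```python
import csv
import io

HEADER = ["Name", "Locus #", "E_value", "Length", "Start", "End"]


def write_hits(rows, handle):
    """Write a header line and one tab-separated line per hit to handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)


# One row per HSP: (name, locus, e-value, length, start, end).
rows = [("Streptomyces sp. AA4", "ZP_07281956.1", 1.89229e-46, 1109, 7, 414)]

# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
write_hits(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open("hits.tsv", "w", newline="") (the file name is only an example). Tabs are used as the delimiter because organism names can contain spaces and commas.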
Regards,
Ara
On Oct 8, 2010, at 3:30 AM, Peter wrote:
> On Fri, Oct 8, 2010 at 4:06 AM, Ara Kooser <akooser at unm.edu> wrote:
>> Hello all,
>>
>> I am a new user of Biopython. I've been working my way through the
>> tutorial. I have a question about how alignment.title works in the
>> example given in section 7.4 of the tutorial. I wrote the following code:
>>
>> from Bio.Blast import NCBIXML
>>
>> E_VALUE_THRESH = 1e-30
>>
>> result_handle = open("test.xml")
>> blast_records = NCBIXML.parse(result_handle)
>> blast_record = blast_records.next()
>>
>> for alignment in blast_record.alignments:
>>     for hsp in alignment.hsps:
>>         if hsp.expect < E_VALUE_THRESH:
>>             print '****Alignment****'
>>             print 'sequence:', alignment.title
>>             print 'e value:', hsp.expect
>>             print 'length:', alignment.length
>>             print 'start:', hsp.query_start
>>             print 'end:', hsp.query_end
>>
>> The code looks at an .xml file that was produced by BLAST. I was wondering
>> if there was a way to break up the string produced by:
>>
>> print 'sequence:', alignment.title
>>
>> Basically I would like the organism's name first, followed by the locus
>> number. I wasn't sure how to split up the print command.
>>
>> I looked at the docs over at http://biopython.org/DIST/docs/api/ to see if
>> there was a tag specifically for the locus number and organism name.
>>
>> Thank you for your time and help.
>>
>> Regards,
>> Ara
>
> Hi Ara,
>
> An example of the output you are getting and what you want
> would help, but I think this isn't possible in general.
>
> As I recall, the locus number and organism name information is
> just part of the original identifier and/or description in the FASTA
> file used to build the BLAST database. The NCBI tend to include
> the species in the description within square brackets - but this is
> just their convention, it is not a nicely tagged part of the BLAST
> output which the parser could spot.
>
> Basically I think you will have to parse the string yourself.
>
> Peter
>
> P.S. Alternatively if you want the organism name and have the
> GI number (or similar) this can be mapped to the organism via
> the NCBI taxonomy database (either online via Entrez or
> by parsing a downloaded copy of the mapping).