[Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names

bugzilla-daemon at portal.open-bio.org
Mon Jun 30 14:21:41 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2531





------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-06-30 10:21 EST -------
Can I repeat my request that you upload example FASTA and NEXUS files (as
attachments to this bug) that don't work for you?

Here is a small Nexus file I just created by hand, with a repeated taxon,
CYS1_DICDI (the two entries have almost the same sequence), followed by some
example code using Bio.Nexus to parse it.

==================================
#NEXUS
[TITLE: NoName]

begin data;
dimensions ntax=4 nchar=50;
format interleave datatype=protein   gap=- symbols="FSTNKEYVQMCLAWPHDRIG";

matrix
CYS1_DICDI          -----MKVIL LFVLAVFTVF VSS------- --------RG IPPEEQ---- 
ALEU_HORVU          MAHARVLLLA LAVLATAAVA VASSSSFADS NPIRPVTDRA ASTLESAVLG 
CATH_HUMAN          ------MWAT LPLLCAGAWL LGV------- -PVCGAAELS VNSLEK----
CYS1_DICDI          -----MKVIL LFVLAVFTVF VSS------- --------RG IPPEEQ---X
;
end; 
==================================

Then in Python:
>>> filename = ...
>>> handle = open(filename)
>>> from Bio.Nexus import Nexus
>>> n = Nexus.Nexus(handle)
>>> print n.matrix.keys()
['CATH_HUMAN', 'CYS1_DICDI', 'CYS1_DICDI.copy', 'ALEU_HORVU']
>>> n.matrix['CYS1_DICDI']
Seq('-----MKVILLFVLAVFTVFVSS---------------RGIPPEEQ----', IUPACProtein())
>>> n.matrix['CYS1_DICDI.copy']
Seq('-----MKVILLFVLAVFTVFVSS---------------RGIPPEEQ---X', IUPACProtein())

Note that Bio.Nexus has automatically renamed the duplicate entry to
'CYS1_DICDI.copy', and that the two different sequences have both been
loaded correctly.
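
To make the renaming behaviour concrete, here is a minimal sketch of how
repeated taxon names could be made unique with a '.copy' suffix. This is an
illustration only, not the actual Bio.Nexus code: rename_duplicates is a
hypothetical helper, and the '.copy2', '.copy3' numbering for a third or
later repeat is an assumption (the session above only shows the first
repeat becoming '.copy').

```python
def rename_duplicates(labels):
    """Return the labels in order, with repeated names made unique.

    Hypothetical helper for illustration; not the Bio.Nexus implementation.
    """
    seen = set()
    unique = []
    for label in labels:
        candidate = label
        count = 0
        while candidate in seen:
            count += 1
            # First repeat becomes 'NAME.copy'. Later repeats become
            # 'NAME.copy2', 'NAME.copy3', ... (an assumption of this sketch).
            suffix = ".copy" if count == 1 else ".copy%d" % count
            candidate = label + suffix
        seen.add(candidate)
        unique.append(candidate)
    return unique


# Taxon names in the order they appear in the example matrix above.
taxa = ["CYS1_DICDI", "ALEU_HORVU", "CATH_HUMAN", "CYS1_DICDI"]
unique_taxa = rename_duplicates(taxa)
```

With the four names from the example file, the second CYS1_DICDI comes back
as 'CYS1_DICDI.copy', so each sequence can be stored under its own key.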


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


