[Biopython-dev] [BioPython] Next release plans; was: what to use for working with fasta sequences and alignments?

Michiel Jan Laurens de Hoon mdehoon at c2b2.columbia.edu
Tue Jan 16 17:51:23 UTC 2007


Peter wrote:
> Regarding the fix checked in on bug 1970 I still would prefer we call 
> the new XML iterator NCBIXML.Iterator(handle) rather than 
> NCBIXML.parse(handle) but I'll live ;)
> 
I chose "parse" because it is used in the old (Biopython release 1.42) 
Blast XML parser:

Old:
 >>> from Bio.Blast import NCBIXML
 >>> b_parser = NCBIXML.BlastParser()
 >>> b_record = b_parser.parse(blast_out)

New:
 >>> from Bio.Blast import NCBIXML
 >>> b_records = NCBIXML.parse(blast_out)
 >>> b_record = b_records.next() # Repeat to get subsequent Blast records

While I am not dead set on "parse", it follows the conventions of 
comparable functions in Python:
1) The function name is a verb, not a noun.
2) The function name describes what the function does, not what it 
returns.
3) The function name is short and starts with a lower-case letter.

For example, to read a file line-by-line in Python:
 >>> inputfile = open("somefunnyfile")
# "open"; not "Iterator", nor "FileToLineIterator",
# even though "open" returns an iterator:
 >>> for line in inputfile:
...     print line
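
A minimal sketch of the point made in the comment above, written for 
current Python 3 and using an in-memory io.StringIO handle as a stand-in 
for a real file (the sample text is invented): a file-like object is its 
own iterator, so the function that creates it does not need to say so in 
its name.

```python
import io

# In-memory stand-in for open("somefunnyfile"); the contents are invented.
inputfile = io.StringIO("first line\nsecond line\n")

# A file-like object is its own iterator: iter() hands back the same object.
is_own_iterator = iter(inputfile) is inputfile

# next() pulls one line at a time; a for loop consumes whatever is left.
first = next(inputfile)
rest = [line for line in inputfile]

print(is_own_iterator, repr(first), rest)
inputfile.close()
```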

To read an image file with the Python Imaging Library:
 >>> import Image
 >>> im = Image.open("lena.ppm")
# "open"; not "Image", nor "FileNameToImage".

To read a Python object from a pickled file:
 >>> import pickle
 >>> inputfile = open("somepickledfile", "rb") # binary mode for pickles
 >>> myobject = pickle.load(inputfile)
# "load"; not "FileToObject".
 >>> inputfile.close()

To parse an XML file with the sax parser framework in Python:
 >>> from xml.sax.handler import ContentHandler
 >>> from xml import sax
 >>> handler = SomeSubclassOfContentHandler()
 >>> inputfile = open("myxmlfile.xml")
 >>> sax.parse(inputfile, handler)
# "parse", same as in the new Bio.Blast.NCBIXML
 >>> inputfile.close()

So, for Bio.Blast.NCBIXML, good names would be "load", "read", "parse", 
or something similar. "Iterator" would not be consistent with these; 
besides, until recently I didn't know what an iterator was, so I doubt 
that new users would know either.

What we could do is have two functions in Bio.Blast.NCBIXML, perhaps 
one called "read" and the other "iterate": the former would return a 
single Blast record (for an XML file containing only one Blast result), 
and the latter an iterator over multiple Blast records.
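
The two-function idea could look roughly like the sketch below. This is 
not the Bio.Blast.NCBIXML implementation: the record format (blocks of 
lines terminated by a line containing only "//") and the function bodies 
are invented purely for illustration, and the code is written for current 
Python 3. Only the names "read" and "iterate" come from the proposal 
above.

```python
import io


def iterate(handle):
    """Yield records one at a time from a handle (hypothetical toy format).

    Records are assumed to be blocks of lines terminated by a line
    containing only "//"; this is NOT the Blast XML format.
    """
    lines = []
    for line in handle:
        line = line.rstrip("\n")
        if line == "//":
            yield lines
            lines = []
        else:
            lines.append(line)
    if lines:  # trailing record without a terminator
        yield lines


def read(handle):
    """Return the single record in handle; raise ValueError otherwise."""
    records = iterate(handle)
    try:
        record = next(records)
    except StopIteration:
        raise ValueError("No records found in handle")
    # Ask for a second record only to confirm that there is none.
    for _ in records:
        raise ValueError("More than one record found in handle")
    return record


single = io.StringIO("query A\nhit 1\n//\n")
multiple = io.StringIO("query A\nhit 1\n//\nquery B\nhit 2\n//\n")

one_record = read(single)
all_records = list(iterate(multiple))
```

With this split, "read" fails loudly when the input does not hold exactly 
one record, while "iterate" never loads more than one record at a time.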

--Michiel.


-- 
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032


