[Biopython-dev] Getting raw unparsed records with SeqIO?

Brad Chapman chapmanb at 50mail.com
Wed Feb 3 12:55:52 UTC 2010


Hi Peter;

> Another solution to this task (extracting the raw GenBank
> records from a large file) would seem to be to extend the
> Bio.SeqIO.index functionality. The patch I'm about to
> attach to Bug 3000 adds a new "get_raw" method to the
> dictionary-like object we return. Unlike the __getitem__
> and get methods, which return a SeqRecord, this just gives
> the raw string.
[...]
> >>> from Bio import SeqIO
> >>> data = SeqIO.index("cor6_6.gb", "gb")
> >>> data.keys()
> ['L31939.1', 'AJ237582.1', 'X62281.1', 'AF297471.1', 'X55053.1', 'M81224.1']
> >>> print data.get_raw("X62281.1")
> LOCUS       ATKIN2        880 bp    DNA             PLN       23-JUL-1992
> DEFINITION  A.thaliana kin2 gene.
> ACCESSION   X62281
> ...
> //
> 
> What are people's thoughts on this?

Not much to add, but a +1 from me. This sounds like a solid solution
and makes sense for the use case I can think of, which is picking
out records of interest from a large file and re-writing them in a
smaller file.
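
As a rough sketch of that use case (not taken from the patch itself),
something along these lines should work, assuming get_raw behaves as
described above. The helper name copy_raw_records and the output file
name subset.gb are made up for illustration.

```python
def copy_raw_records(index, wanted_ids, out_path):
    """Copy the raw text of selected records into a smaller file.

    `index` is assumed to be the dictionary-like object returned by
    Bio.SeqIO.index(), i.e. it supports `in` and has a get_raw() method.
    Returns the list of requested ids that were not found.
    """
    missing = []
    with open(out_path, "wb") as out_handle:
        for record_id in wanted_ids:
            if record_id not in index:
                # Remember ids we could not find instead of failing.
                missing.append(record_id)
                continue
            raw = index.get_raw(record_id)
            # get_raw may hand back bytes or str depending on the
            # Python/Biopython version, so normalise to bytes.
            if isinstance(raw, str):
                raw = raw.encode()
            out_handle.write(raw)
    return missing


# Hypothetical usage with the example file from the quoted message:
#
#   from Bio import SeqIO
#   data = SeqIO.index("cor6_6.gb", "gb")
#   copy_raw_records(data, ["X62281.1", "M81224.1"], "subset.gb")
```

Since the records are never parsed into SeqRecord objects, they land in
the output file exactly as they appeared in the original.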

Brad


