[Biopython] EMBL DNA records with locations referencing other sequences

Adam Sjøgren asjo at koldfront.dk
Fri Oct 4 17:02:41 UTC 2019


Peter writes:

> That's more or less exactly what I had in mind. Do you have this on a
> branch of a public git repository?

I will make one - I thought I'd get some feedback first :-)

> One tweak I was considering is accepting a dictionary-like object where
> the values could be SeqRecord rather than Seq-like objects. The reason
> being that those are easy to get via Bio.SeqIO.index(...) or
> Bio.SeqIO.index_db(...), and should be perfect for when you have already
> downloaded the referenced accessions (e.g. a folder of GenBank files).

If parent_sequence is a Seq object, the values in references must be Seq
objects as well. If parent_sequence is a SeqRecord, they must be
SeqRecords too [I think].

At least those are the combinations I test, because I found that
extract() returns a Seq if you give it a Seq, and a SeqRecord if you give
it a SeqRecord:

  >>> from Bio import Seq, SeqRecord
  >>> from Bio.SeqFeature import FeatureLocation
  >>> location = FeatureLocation(1, 2)
  >>> location.extract(Seq.Seq("actg"))
  Seq('c')
  >>> location.extract(SeqRecord.SeqRecord(Seq.Seq("actg")))
  SeqRecord(seq=Seq('c'), id='<unknown id>', name='<unknown name>', description='<unknown description>', dbxrefs=[])
  >>> 

so I thought it would make sense for references to follow the same
pattern and contain the same type of object as parent_sequence.
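A minimal sketch of that "same type" pattern, assuming Biopython is
installed. extract_with_references() is a hypothetical helper written for
illustration, not the extract() signature under discussion; it only
handles simple (non-compound) locations, takes the sequence from
references[location.ref] when location.ref is set, and refuses to mix Seq
and SeqRecord objects. REF_A is a placeholder accession.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation
from Bio.SeqRecord import SeqRecord


def extract_with_references(location, parent_sequence, references=None):
    """Hypothetical helper: extract a simple (non-compound) location.

    If location.ref is set, slice references[location.ref] instead of
    parent_sequence. The referenced object must be the same kind (Seq or
    SeqRecord) as parent_sequence, mirroring how extract() returns a Seq
    for a Seq and a SeqRecord for a SeqRecord.
    """
    if location.ref is None:
        source = parent_sequence
    else:
        if references is None or location.ref not in references:
            raise ValueError(f"No sequence supplied for reference {location.ref!r}")
        source = references[location.ref]
        # Enforce the "same type as parent_sequence" rule.
        if isinstance(source, SeqRecord) != isinstance(parent_sequence, SeqRecord):
            raise TypeError(
                "references must hold the same type (Seq or SeqRecord) "
                "as parent_sequence"
            )
    fragment = source[int(location.start):int(location.end)]
    if location.strand == -1:
        # Minus-strand features are reported as the reverse complement.
        fragment = fragment.reverse_complement()
    return fragment


parent = Seq("actg")
references = {"REF_A": Seq("ggggccccaaaa")}  # placeholder accession
print(extract_with_references(FeatureLocation(1, 2), parent, references))  # c
print(extract_with_references(
    FeatureLocation(4, 8, strand=-1, ref="REF_A"), parent, references))  # gggg
```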

Would you prefer for extract() to always take SeqRecord objects in the
references dictionary, and "convert" them to Seq objects when necessary?
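For comparison, a sketch of that alternative, assuming references may
hold SeqRecord objects that are converted to Seq on demand.
resolve_reference() is a hypothetical helper and REF_A a placeholder ID.
To keep it self-contained it builds the dictionary with
Bio.SeqIO.to_dict() on an in-memory FASTA string rather than
Bio.SeqIO.index() on files; both give a mapping of record IDs to
SeqRecord objects.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def resolve_reference(parent_sequence, references, ref):
    """Hypothetical helper: return references[ref] as the same type as
    parent_sequence.

    Values may be SeqRecord objects (as produced by Bio.SeqIO.index() or
    Bio.SeqIO.to_dict()) or bare Seq objects.
    """
    value = references[ref]
    if isinstance(parent_sequence, SeqRecord):
        # SeqRecord output wanted: wrap a bare Seq (an assumed design choice).
        return value if isinstance(value, SeqRecord) else SeqRecord(value, id=ref)
    # Seq output wanted: unwrap a SeqRecord to its .seq.
    return value.seq if isinstance(value, SeqRecord) else value


# In-memory stand-in for a folder of downloaded records.
fasta = StringIO(">REF_A\nggggccccaaaa\n")
references = SeqIO.to_dict(SeqIO.parse(fasta, "fasta"))

seq = resolve_reference(Seq("actg"), references, "REF_A")
print(type(seq).__name__, seq[4:8])  # Seq cccc
```

Accepting both kinds of value keeps the Seq-only case working while
letting a Bio.SeqIO.index() result be passed in directly.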

(I'm coming from a narrow use-case, so my "overview" of what a better
API would be is limited.)


  Best regards,

    Adam

-- 
 "There is no mail software that decodes it into human-       Adam Sjøgren
  readable one."                                         asjo at koldfront.dk


