[BioPython] I don't understand why SeqRecord.feature is a list
Giovanni Marco Dall'Olio
dalloliogm at gmail.com
Thu Jul 12 15:00:03 UTC 2007
Yes, it's true: it is similar to the way SeqFeature is supposed to work.
But I still don't get how to represent my genes in Biopython :(
You know, I've printed the UML diagram of the Bio module from here:
http://www.pasteur.fr/recherche/unites/sis/formation/python/images/seq_class.png
and put it on the wall above my computer monitor, like a
poster.
So every day, when I come to work, I see the Bio module UML diagram and
ask myself why SeqRecord.features is a list instead of a dictionary :)
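One property that favours a list here: it keeps features in order and can hold several features with the same type (a gene usually has many exons), which a dictionary keyed on the type cannot. If lookup by type or name is needed, an index dictionary can be built next to the list. This is a minimal sketch, assuming a hypothetical `Feature` namedtuple (not Biopython's SeqFeature) and an illustrative `build_index` helper that is not part of Biopython:

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a feature: a type plus start/end positions.
# Not Biopython's SeqFeature; just enough to show the data-structure choice.
Feature = namedtuple("Feature", ["type", "start", "end"])

# A list keeps the original order and allows repeated types (two exons here).
features = [
    Feature("exon", 0, 120),
    Feature("intron", 120, 480),
    Feature("exon", 480, 610),
]

# A plain dict keyed on type silently keeps only the last exon.
by_type_dict = {f.type: f for f in features}


def build_index(feature_list):
    """Map each feature type to the list of features of that type, in order."""
    index = defaultdict(list)
    for feature in feature_list:
        index[feature.type].append(feature)
    return dict(index)


index = build_index(features)
print(by_type_dict["exon"])                        # only the second exon survives
print([(f.start, f.end) for f in index["exon"]])   # [(0, 120), (480, 610)]
```

The list stays the single source of truth; the index is derived from it and can be rebuilt whenever the list changes.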
2007/7/5, Peter <biopython at maubp.freeserve.co.uk>:
> Giovanni Marco Dall'Olio wrote:
> > Let's have a look at your example:
> > - we have a list of features like this:
> > list_features = ['GTAAGT', 'TACTAAC', 'TGT']
> >
> > - then we specify the meaning of these features in separate variables:
> > splicesignal5 = list_features[0]
> > polypirimidinetract = list_features[1]
> > splicesignal3 = list_features[2]
> >
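> > in Python, splicesignal5 = list_features[0] makes the name refer to the
> > string currently stored in the list; it is not a live link to that list
> > position. This means that if you replace one of the values in the
> > list_features list, you have to update all the variables which refer to
> > it manually.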
> >
> >>>> list_features = ['GTAAGT', 'TACTAAC', 'TGT']
> >>>> splicesignal5 = list_features[0]
> >>>> print(splicesignal5)
> > GTAAGT
> >>>> list_features[0] = 'TTTTTTT'
> >>>> print(splicesignal5)
> > GTAAGT  # wrong!
> >>>> # have to update all the variables which refer to list_features manually
> >>>> splicesignal5 = list_features[0]
> >>>> print(splicesignal5)
> > TTTTTTT
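What happens in the session above is rebinding, not copying: `splicesignal5 = list_features[0]` makes the name refer to the same string object held in the list, and `list_features[0] = 'TTTTTTT'` then points the list slot at a new object while the name keeps the old one. A quick check with `is` (object identity) shows this:

```python
list_features = ['GTAAGT', 'TACTAAC', 'TGT']
splicesignal5 = list_features[0]

# Same object: the assignment did not make a copy.
same_before = splicesignal5 is list_features[0]

# Rebind the list slot to a different string object.
list_features[0] = 'TTTTTTT'

# The name still refers to the original object; only the slot changed.
same_after = splicesignal5 is list_features[0]

print(same_before, same_after)  # True False
print(splicesignal5)            # GTAAGT
```

Since Python strings are immutable, 'GTAAGT' can never be changed in place; for an edit to be seen through every name, the shared object has to be mutable, which is what Peter's feature objects provide.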
> >
> > This is why I prefer to save the positions of the features instead of
> > their values:
> >>>> list_features = ['GTAAGT', 'TACTAAC', 'TGT']
> >>>> dict_aliases = {'splicesignal5': 0, 'polypirimidinetract': 1,
> > ...                 'splicesignal3': 2}
> >>>> def get_feature(feature_name):
> > ...     return list_features[dict_aliases[feature_name]]
> > ...
> >>>> print(get_feature('splicesignal5'))
> > GTAAGT
>
> ...
>
> > Another option could be to use references to memory locations instead
> > of dictionary keys, but I don't know how to implement this in Python,
> > and I'm not sure it would be computationally efficient.
>
> Have you considered making "feature objects", where each object can hold
> multiple pieces of information such as a name, alias and type, as well as
> the sequence data itself? You may wish to create your own class here, or
> try to use the existing Biopython SeqFeature object.
>
> You could then use a list to hold your feature objects, or a dictionary
> keyed on the alias perhaps. Or both.
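A minimal sketch of the "or both" option, assuming a simple hypothetical `Feature` class with `alias` and `seq` attributes (not Biopython's SeqFeature): the list and the dictionary hold references to the same objects, so an edit made through the alias is visible in the list too.

```python
class Feature:
    """Hypothetical minimal feature: an alias plus its sequence string."""

    def __init__(self, alias, seq):
        self.alias = alias
        self.seq = seq


# The list keeps the order; the dict gives lookup by alias.
list_features = [
    Feature('splicesignal5', 'GTAAGT'),
    Feature('polypirimidinetract', 'TACTAAC'),
    Feature('splicesignal3', 'TGT'),
]
features_by_alias = {f.alias: f for f in list_features}

# Both containers refer to the very same objects; nothing is copied.
print(features_by_alias['splicesignal5'] is list_features[0])  # True

# Editing through the dict changes the object that the list also holds.
features_by_alias['splicesignal5'].seq = 'TTTTTTT'
print(list_features[0].seq)  # TTTTTTT
```

Replacing an object (rather than editing it) would still require updating both containers.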
>
> e.g.
>
> class Feature:
>     # Very simple class which could be extended
>     def __init__(self, seq_string):
>         self.seq = seq_string
>
>     def __repr__(self):
>         # Use id(self) to show the object's identity (its memory address
>         # in CPython, in hex), just to show the difference between two
>         # instances with the same seq
>         return "Feature(%s) instance at %s" \
>                % (self.seq, hex(id(self)))
>
>
> list_features = [Feature('GTAAGT'),
>                  Feature('TACTAAC'),
>                  Feature('TGT')]
>
> splicesignal5 = list_features[0]
> print(splicesignal5)
> print(list_features[0])
>
> print("EDITING first object in the list:")
> list_features[0].seq = 'TTTTTTT'
>
> print(splicesignal5)  # changed, now TTTTTTT
> print(list_features[0])
>
> print("REPLACING first object in the list:")
> list_features[0] = Feature('GGGGGG')
>
> print(splicesignal5)  # still points to old object, TTTTTTT
> print(list_features[0])
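The last step shows that replacing the object breaks the alias. One way to keep aliases valid across replacement is to combine the alias-to-position idea from my earlier message with feature objects. This is a sketch only; `FeatureTable` and its methods are hypothetical names, not Biopython API:

```python
class Feature:
    """Hypothetical minimal feature holding a sequence string."""

    def __init__(self, seq):
        self.seq = seq


class FeatureTable:
    """Hypothetical container: ordered list plus alias -> position lookup.

    Storing the position (not the object) under each alias means that a
    feature replaced in the list is automatically seen through its alias.
    """

    def __init__(self):
        self._features = []
        self._index_by_alias = {}

    def add(self, alias, feature):
        self._index_by_alias[alias] = len(self._features)
        self._features.append(feature)

    def __getitem__(self, alias):
        return self._features[self._index_by_alias[alias]]

    def replace(self, alias, feature):
        self._features[self._index_by_alias[alias]] = feature

    def __iter__(self):
        return iter(self._features)


table = FeatureTable()
table.add('splicesignal5', Feature('GTAAGT'))
table.add('polypirimidinetract', Feature('TACTAAC'))
table.add('splicesignal3', Feature('TGT'))

table['splicesignal5'].seq = 'TTTTTTT'             # edit in place
table.replace('splicesignal5', Feature('GGGGGG'))  # swap the object

print(table['splicesignal5'].seq)  # GGGGGG
print([f.seq for f in table])      # ['GGGGGG', 'TACTAAC', 'TGT']
```

The catch is that deleting a feature from the middle of the list shifts the positions after it, so the alias dictionary would have to be rebuilt.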
>
> --
>
> I'm not sure if that is closer to what you wanted, or not.
>
> Peter
>
>
--
-----------------------------------------------------------
My Blog on Bioinformatics (Italian): http://dalloliogm.wordpress.com