[Biopython] removing redundant sequence
Brad Chapman
chapmanb at 50mail.com
Thu Apr 22 12:18:10 UTC 2010
Bala,
> > I created a sample FASTA
> > file with two redundant sequences. But when I use the seguid checksum to spot
> > the redundancies, it spots only the first one.
> What you should do is loop over the records and keep a record
> of the checksums you have already seen, and use that to ignore duplicates.
> I would use a Python set rather than a Python list for speed.
>
> You could do this with a for loop. However, I would probably use an
> iterator based approach with a generator function - I think it is more
> elegant but perhaps not so easy for a beginner:
[... Nice code example from Peter ..]
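For anyone reading this without the rest of the thread: the sketch below is
not Peter's code, just a minimal illustration of the approach he describes
(a generator that remembers checksums in a set and skips repeats). The names
seguid_like and remove_duplicates are hypothetical, and the records are plain
(name, sequence) tuples so the example runs without Biopython. The stand-in
checksum assumes SEGUID is the base64-encoded SHA-1 digest of the upper-cased
sequence with the padding stripped; with Biopython installed you would
presumably pass Bio.SeqUtils.CheckSum.seguid and records from Bio.SeqIO.parse
instead.

```python
import base64
import hashlib


def seguid_like(seq):
    """Stand-in checksum (assumption): base64 of the SHA-1 digest of the
    upper-cased sequence, with trailing '=' padding removed."""
    digest = hashlib.sha1(str(seq).upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def remove_duplicates(records, checksum=seguid_like, get_seq=lambda rec: rec[1]):
    """Yield only the first record seen for each distinct sequence checksum."""
    seen = set()  # set membership tests are fast, unlike scanning a list
    for rec in records:
        key = checksum(get_seq(rec))
        if key in seen:
            continue  # duplicate sequence: skip it
        seen.add(key)
        yield rec


# Toy data: seq3 repeats seq1, and seq4 repeats seq2 in lower case.
records = [
    ("seq1", "ACGTACGT"),
    ("seq2", "TTTTGGGG"),
    ("seq3", "ACGTACGT"),
    ("seq4", "ttttgggg"),
]
unique = list(remove_duplicates(records))
```

Because the function is a generator, it only holds the set of checksums in
memory, not the records themselves, so it should cope with large files. With
SeqRecord objects you would pass get_seq=lambda rec: rec.seq and hand the
generator straight to Bio.SeqIO.write.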
This is a nice example problem and discussion. Bala, it sounds like
Peter provided some useful example code to solve this. Once you have
used it to put together a program that solves your problem, it would be
very helpful if you could write it up as a Cookbook entry:
http://biopython.org/wiki/Category:Cookbook
That would help others in the future who will be tackling similar
issues. Thanks much,
Brad