[Biopython] remove list redundancy
ferreirafm at usp.br
Fri Mar 23 21:55:27 UTC 2012
Hi Biopy users,
I have a multi-sequence fasta file which I've read as a list. Is there
a clever way/method to remove redundant sequences?
Thanks in advance,
Fred
### CODE:
def redundancy(fastafile):
    f=open(fastafile, 'r')
    record = list(SeqIO.parse(f,"fasta"))
    new_rec = record
    f.close
    print len(record)
    for i in range(len(record)):
        for j in range(len(record)):
            if i < j:
                if record[i].seq == record[j].seq:
                    del new_rec[j]
    print len(new_rec)
### RESULTS:
$ redundancy.py -run all_emm_fake.fasta
823
/usr/lib64/python2.7/site-packages/Bio/Seq.py:197: FutureWarning: In
future comparing Seq objects will use string comparison (not object
comparison). Incompatible alphabets will trigger a warning (not an
exception). In the interim please use id(seq1)==id(seq2) or
str(seq1)==str(seq2) to make your code explicit and to avoid this
warning.
"and to avoid this warning.", FutureWarning)
823
### EXPECTING:
Worse, the function above is not working. I was expecting 823 before
and 822 after running it.
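
A possible direction, as a minimal sketch only: the FutureWarning above says Seq objects are currently compared as objects, not as strings, so record[i].seq == record[j].seq is never true for two different records. In addition, new_rec = record only gives the same list a second name, so deleting from new_rec while indexing over record is unsafe. The sketch below compares str(rec.seq) and collects the first occurrence of each sequence in a new list. The names unique_by_sequence and FakeRecord are hypothetical; FakeRecord is only a stand-in so the example runs without Biopython.

```python
from collections import namedtuple


def unique_by_sequence(records):
    """Return the records whose sequence text was not seen earlier, in order."""
    seen = set()
    unique = []
    for rec in records:
        key = str(rec.seq)  # compare sequences as plain text, as the warning suggests
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical stand-in for a SeqRecord (anything with a .seq attribute works).
FakeRecord = namedtuple("FakeRecord", ["id", "seq"])

demo = [
    FakeRecord("a", "ATGC"),
    FakeRecord("b", "GGCC"),
    FakeRecord("c", "ATGC"),  # same sequence as "a", so it is dropped
]
kept = unique_by_sequence(demo)
print(len(demo), len(kept))
```

With Biopython, the same function could be applied to list(SeqIO.parse(fastafile, "fasta")). The comparison here is case-sensitive and treats only exact matches as redundant; it makes one pass over the list instead of comparing every pair.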