[Bioperl-l] Sequence Validation
Matthew Laird
lairdm at sfu.ca
Wed Jun 11 11:05:00 EDT 2003
Hello, I hope this is the correct place to ask this...
I've been looking through the BioPerl documentation and the mailing list
archives and am wondering if there is anything built to do sequence
validation.
What I mean is this: as far as I can see there are functions to read in
FASTA files (Bio::SeqIO), but how would one test whether a file is
valid? We're attempting to create a web interface where people can submit
sequences for analysis; however, people could submit badly formatted
files. Example:
>
BRKISLIGLATMSMLAFNTSAFALGTASSNSGASGKHWSVVGGAALVQPK
NGKNAAQNTVKFGGDVAPTLSVTYYINDNVGFELWGITKKLSYTAKTDAS
Bio::SeqIO doesn't throw any error on this. Instead, it treats the line
starting with "NGKN" as the beginning of the sequence. Yes, this
sequence violates the FASTA format, but in web interfaces you can't be
sure people will submit a perfectly formatted file.
Can anyone point me in the direction of a module which will validate the
file as it's read, checking both the format and that only allowed sequence
letters are included? Or is this something which needs to be written? Ideally
this should work for multiple formats as well.
If such a module doesn't exist I suppose I'll begin working on one and
submit the results to the collective since this seems like such a useful
tool.
Thanks.
--
Matthew Laird
SysAdmin/Web Developer, Brinkman Laboratory, MBB Dept.
Simon Fraser University