[Bioperl-l] Bio::SeqIO fasta

Jason Stajich jason@chg.mc.duke.edu
Fri, 29 Jun 2001 10:06:41 -0400 (EDT)


Stephen - the Bio::SeqIO system does expect you to get the format right.
At least we don't make you decide if it should be protein or dna! =)
One thing you can do is write a format_guesser method that looks at the
first line of the input: if it starts with '>' it is probably fasta, if it
starts with LOCUS it is probably GenBank, if it starts with ID it is
probably EMBL, and otherwise you can default to 'raw' (note that only one
sequence at a time would then be allowed).
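
A minimal sketch of such a guesser, using only core Perl. The name
guess_format is hypothetical (nothing by that name ships with Bioperl), and
the choice to skip leading blank lines and to require whitespace after ID
are assumptions of this sketch, not something Bio::SeqIO does for you. The
returned strings are the format names Bio::SeqIO accepts.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bioperl: guess a Bio::SeqIO format
# name from the first non-blank line of a pasted sequence string.
sub guess_format {
    my ($text) = @_;
    return 'raw' unless defined $text;

    # Take the first line that contains something other than whitespace.
    my ($first) = grep { /\S/ } split /\r?\n/, $text;
    return 'raw' unless defined $first;

    return 'fasta'   if $first =~ /^>/;        # definition line
    return 'genbank' if $first =~ /^LOCUS\b/;  # GenBank header line
    return 'embl'    if $first =~ /^ID\s/;     # EMBL ID line (assumes trailing space)
    return 'raw';    # fallback: treat the input as one bare sequence
}

# Example: print the guess for a few pasted inputs.
for my $input (">seq1 test\nACGT\n", "ACGTACGT\n", "ID   X56734; SV 1;\n") {
    print guess_format($input), "\n";
}
```

The guess could then be passed as the -format argument to Bio::SeqIO->new.
It is only a heuristic: a bare sequence pasted without a '>' line still
falls through to 'raw', so only one sequence is read.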

This might be a useful method that could be part of a utility or in the
examples directory?  

I assume you can figure out how to use IO::String to wrap your sequence
string into a filehandle that is then passed to SeqIO?

Let us know if you need more help.

On Thu, 28 Jun 2001, Stephen Baird wrote:

> Hi,
>   I was using Bio::SeqIO in a CGI script taking in some data that's
> supposed to be in fasta format from a text area. I noticed that if there
> was no definition line (i.e. one that starts with a >) it would take the
> first line of sequence and use that as the defline.
>   I tried "the simplest ever reformatter" to recheck it and also noticed
> that putting raw in as the format makes the IO.pm module fail.
> 
> I was going to use Bio::SeqIO to help clean up users' sequence data pasted
> in the wrong format. Should I just clean up the input using perl,
> or is there something in Bioperl that will test the sequence's
> format, sort of like Don Gilbert's 'readseq'?
> 
> Thanks for all the hard work,
> 
> |--------------------------------------------------------------------|
> | Stephen Baird                        sbaird@mgcheo.med.uottawa.ca  |
> | Molecular Genetics                       tel: 613-738-3925         |
> | Children's Hospital of Eastern Ontario   fax: 613-738-4833         |
> | 401 Smyth Rd.                                                      |
> | Ottawa, Ontario                                                    |
> | Canada                                                             |
> | K1H 8L1                                                            |
> |--------------------------------------------------------------------|

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/