[Bioperl-l] fasta format

Lincoln Stein lstein@cshl.org
Mon, 26 Aug 2002 11:09:05 -0400


Sloppy of me.  The way to parse the header using Bill's rules is:

	($id,$description) = $header =~ /^>(\S*)\s*(.*)$/;

In practice, one will need to do further parsing of the description.

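This gives empty strings for both the id and the description when the header
line is blank, or when it contains nothing but whitespace after the ">". If
text follows that whitespace, the id is empty and the text becomes the
description.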
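
For anyone who wants to check the edge cases, here is a minimal sketch that
runs the expression above against a few headers. The parse_fasta_header name
is only an illustrative helper, not an existing bioperl function, and it
assumes each header line starts with ">".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl): split a FASTA header line
# into (id, description) with the regular expression given above.
# Assumes the line starts with ">"; otherwise both values are undef.
sub parse_fasta_header {
    my ($header) = @_;
    my ($id, $description) = $header =~ /^>(\S*)\s*(.*)$/;
    return ($id, $description);
}

# A blank header, a whitespace-only header, a normal header, and the
# header from Mat's example, where the id should stay empty.
for my $header ('>', '>   ', '>seq1 first sequence',
                '>   Hi I am the header description') {
    my ($id, $description) = parse_fasta_header($header);
    printf "%-40s id=[%s] desc=[%s]\n", "[$header]", $id, $description;
}
```

With the last header the id comes back empty and the whole string "Hi I am
the header description" is the description, which is the behavior Mat asked
for.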

Lincoln

On Monday 26 August 2002 11:02 am, Lincoln Stein wrote:
> I apologize for the previous message; I didn't see Bill's response before I
> sent it.
>
> As I understand it, FASTA does make a distinction between the ID and the
> description (Bill, please confirm). The regular expression to match the two
> is:
>
> 	/^>(\S+)\s+(.*)$/
>
> So, given that Bill has confirmed that empty IDs are valid, if there is a
> space after the ">", then what comes afterward should be interpreted as the
> description, not the ID.
>
> Lincoln
>
> On Friday 23 August 2002 05:38 pm, Wiepert, Mathieu wrote:
> > > I have seen many people use the perfectly acceptable
> > >
> > > 	>    [blanks] description 1
> > >
> > > 	asdf
> > >
> > > 	>    description2
> > >
> > > 	qwerty
> >
> > Thanks for the explanation, that sounds very reasonable, and I think is
> > what should be implemented.  If there is a space, I would not expect the
> > first word of the description to become my id.  For instance, given a
> > header like this
> >
> > >   Hi I am the header description
> >
> > asdf
> >
> > bioperl makes 'Hi' the id.  This is because
> >
> > my ($id,$fulldesc) = $top =~ /^\s*(\S+)\s*(.*)/
> >
> > parses "   Hi I am the header description" that way.  I would not expect
> > that behavior.
> >
> > If anyone can recall why this might be, let me know.  I saw some threads
> > on what to do with a blank sequence, nothing with a blank header, or a
> > header missing an id.  If people like it the way it is, I can put a
> > comment in the code to that effect.  However, I would hate not to touch
> > it just because people can't remember why it is the way it is.
> >
> > I'll mess around and execute the test scripts, see if those break with
> > any of the changes I was testing.
> >
> > -Mat
> >
> > > Bill Pearson
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@bioperl.org
> > > http://bioperl.org/mailman/listinfo/bioperl-l

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org                                   Cold Spring Harbor, NY
========================================================================