[Bioperl-l] Output a subset of FASTA data from a single large file

Chris Fields cjfields at uiuc.edu
Fri Jun 9 17:59:18 UTC 2006


No; I saw the same thing here.  It's not FASTA in the traditional sense:

http://www.bioperl.org/wiki/FASTA_sequence_format

though he did get it to build a database successfully.  Well, 'success' in
the sense that no errors were thrown.  I've learned that the absence of error
messages does not necessarily mean that everything went as planned; it
depends on how much error handling has been added to the module by the
submitting author.

It's possible that the second annotation line was ignored completely.  I
suppose it's also possible that two sequences were entered into the database:
an empty sequence for the first '>' line and the full sequence for the
second.  It all depends on how the parser handles a second '>' line.
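
To make the second scenario concrete, here is a minimal sketch of a naive
parser that treats every line starting with '>' as the start of a new record.
It is written in plain Python, not BioPerl, and parse_fasta is a hypothetical
helper made up for illustration; it says nothing about what the actual
indexing module does with this file.

```python
def parse_fasta(lines):
    """Naive FASTA parser: every line starting with '>' opens a new record.

    Hypothetical helper for illustration only; not the BioPerl parser.
    """
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Close the previous record, even if it collected no sequence.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# The entry exactly as it appeared in the original post.
entry = [
    ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;",
    ">Antisense;",
    "TGGCTCCTGCTGAGGTCCCCTTTCC",
]

records = parse_fasta(entry)
for header, seq in records:
    print(repr(header), len(seq))
```

Under that assumption the entry comes out as two records: the first carries
the probe ID and an empty sequence, and the second carries 'Antisense;' and
the 25-base probe.  A parser that behaves this way would index the real
sequence under the wrong ID without raising any error.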

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of M Senthil Kumar
> Sent: Friday, June 09, 2006 5:21 PM
> To: simon andrews (BI)
> Cc: bioperl-l at lists.open-bio.org; Michael Oldham
> Subject: Re: [Bioperl-l] Output a subset of FASTA data from a single large
> file
> 
> 
> 
> On Fri, 9 Jun 2006, simon andrews (BI) wrote:
> |
> |
> |> -----Original Message-----
> |> From: bioperl-l-bounces at lists.open-bio.org
> |> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> |> Michael Oldham
> |> Sent: 09 June 2006 03:08
> |> To: bioperl-l at lists.open-bio.org
> |> Subject: [Bioperl-l] Output a subset of FASTA data from a
> |> single large file
> |>
> |> Dear all,
> |>
> |> I am a total Bioperl newbie struggling to accomplish a
> |> conceptually simple task.  I have a single large fasta file
> |> containing about 200,000 probe sequences (from an Affymetrix
> |> microarray), each of which looks like this:
> |>
> |> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
> |> >Antisense;
> |> TGGCTCCTGCTGAGGTCCCCTTTCC
> |
> |Unfortunately that's not Fasta format (which only has a single header
> |line starting with a '>').  I'd imagine that most programs which deal
> |with fasta which read that entry would see it as two sequences, the
> |first of which is empty.
> |
> 
> [snipped]
> 
> hi,
> 
> I think the file is in fasta format, and you probably saw it
> differently because of your mail transport agent.
> 
> Senthil
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



