[Bioperl-l] Output a subset of FASTA data from a single large file

Fri Jun 9 22:21:11 UTC 2006

On Fri, 9 Jun 2006, simon andrews (BI) wrote:
|
|
|> -----Original Message-----
|> From: bioperl-l-bounces at lists.open-bio.org
|> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
|> Michael Oldham
|> Sent: 09 June 2006 03:08
|> To: bioperl-l at lists.open-bio.org
|> Subject: [Bioperl-l] Output a subset of FASTA data from a
|> single large file
|>
|> Dear all,
|>
|> I am a total Bioperl newbie struggling to accomplish a
|> conceptually simple task.  I have a single large fasta file
|> containing about 200,000 probe sequences (from an Affymetrix
|> microarray), each of which looks like this:
|>
|> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
|> >Antisense;
|> TGGCTCCTGCTGAGGTCCCCTTTCC
|
|Unfortunately that's not Fasta format, which has only a single header
|line starting with a '>'.  I'd imagine that most programs that read
|fasta would see that entry as two sequences, the first of which is
|empty.
|

[snipped]

Hi,

I think the file is in FASTA format, and you probably saw it differently
because of your mail transport agent (it may have wrapped the long
header line).
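
As a rough illustration (my own sketch in plain Python, not Bioperl's
Bio::SeqIO and not anything from the original poster), a naive reader
that starts a new record at every line beginning with '>' shows both
readings of the entry. The function name parse_fasta is made up for this
example, and the unwrapped header is only an assumption about what the
line looked like before the mail agent touched it.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text.

    Hypothetical helper for illustration: every line starting with '>'
    opens a new record, and all following lines are its sequence.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Close the previous record (if any) before starting a new one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# The entry as it appeared in the quoted mail (header split over two lines).
wrapped = (
    ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;\n"
    ">Antisense;\n"
    "TGGCTCCTGCTGAGGTCCCCTTTCC\n"
)

# Assumed original form: one header line (the exact spacing is a guess).
unwrapped = (
    ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n"
    "TGGCTCCTGCTGAGGTCCCCTTTCC\n"
)

for label, text in (("wrapped", wrapped), ("unwrapped", unwrapped)):
    records = parse_fasta(text)
    print(label, len(records), "record(s)")
    for name, seq in records:
        print("  header:", repr(name), "sequence length:", len(seq))
```

With the wrapped text this gives two records, the first with an empty
sequence; with the header on one line it gives a single 25-base record.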

Senthil