[BioPython] Sequence from Fasta
Giovanni Marco Dall'Olio
dalloliogm at gmail.com
Tue Jul 1 09:37:53 UTC 2008
On Tue, Jul 1, 2008 at 10:04 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Mon, Jun 30, 2008 at 10:40 AM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>
>>> But I'm looking for something like this:
>>>
>>> Name Sequence without linebreak
>>>
>>> Example:
>>>
>>> MySequence atgcgcgctcggcgcgctcgfcgcgccccccatggctcgcgcactacagcg
>>> MySequence2 atgcgctctgcgcgctcgatgtagaatatgagatctctatgagatcagcatca
>>
>> Bioperl's SeqIO has support for a 'tab sequence format' which is
>> similar to this[1].
>> Maybe it could be useful in the future to add support for such a
>> format in biopython.
>>
>> [1] http://www.bioperl.org/wiki/Tab_sequence_format
>>
>
> That does look fairly straightforward.
>
> Do you happen to know how BioPerl reacts when the first field has spaces?
> I would suggest treating the first field like the ">" line in a FASTA file and
> taking the first word as the id/name and the whole field as the description.
>
BioPerl ignores everything after the first space in the header, so only
the first word ends up as the name in the tab output.
For example:
$ cat >seq1.fasta
>seq1 field2 field3
acatcgatgcatgctagctactgtacgac
$ cat > fasta2tab.pl
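use strict;
use warnings;
use Bio::SeqIO;

# Read FASTA records from seq1.fasta and write them to STDOUT in "tab" format.
my $seqin  = Bio::SeqIO->newFh("-file" => "seq1.fasta", "-format" => "fasta");
my $seqout = Bio::SeqIO->newFh("-fh" => \*STDOUT, "-format" => "tab");
while (<$seqin>)
{
    print $seqout $_;
}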
$ perl fasta2tab.pl
seq1 acatcgatgcatgctagctactgtacgac
Do you need any help implementing this?
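For comparison, here is a rough sketch of the same FASTA-to-tab conversion in plain Python. It does not use any Biopython API; the function name fasta_to_tab is only a placeholder I made up, and I am assuming the output is simply "name<TAB>sequence" with the name cut at the first whitespace, as in the BioPerl run above.

```python
import io


def fasta_to_tab(handle):
    """Yield 'name<TAB>sequence' strings, one per FASTA record in handle.

    Hypothetical helper, not part of Biopython. Only the first word of
    each '>' header is kept, mirroring the BioPerl output shown above.
    """
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield f"{name}\t{''.join(chunks)}"
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            # Sequence lines are joined without line breaks.
            chunks.append(line)
    if name is not None:
        yield f"{name}\t{''.join(chunks)}"


# Same record as seq1.fasta above, deliberately wrapped over two lines.
example = io.StringIO(">seq1 field2 field3\nacatcgatgcat\ngctagctactgtacgac\n")
for record in fasta_to_tab(example):
    print(record)
```

A real parser for Bio.SeqIO would presumably return SeqRecord objects rather than strings; this is only meant to show the header handling.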
> This format could be handy for some people who use the command line. By
> converting between FASTA and the tab format (which can be done with sed
> or awk), each sequence is on one line, so you can use tools like grep to filter your
> records. Then convert back to fasta. There's a nice blog page I found some
> time ago where the author describes his workflow for this.
>
> Peter
>
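The filter-then-convert-back workflow described above could be sketched in plain Python along these lines. Everything here is an assumption for illustration: tab_to_fasta is a made-up name, the records and the motif are invented, and the 60-column wrapping is just a common convention. Following the suggestion earlier in the thread, the whole first field is written back as the FASTA title line.

```python
def tab_to_fasta(lines, width=60):
    """Yield FASTA lines for each 'name<TAB>sequence' line in lines.

    Hypothetical helper, not part of Biopython. The whole first field
    becomes the '>' title line; the sequence is wrapped at `width`.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        title, _, sequence = line.partition("\t")
        sequence = sequence.strip()
        yield f">{title.strip()}"
        for start in range(0, len(sequence), width):
            yield sequence[start:start + width]


# Invented records; one sequence per line makes grep-style filtering trivial.
tab_lines = [
    "MySequence\tatgcgcgctcggcgcg\n",
    "MySequence2\tatgcgctctgcgcgct\n",
]
kept = [line for line in tab_lines if "ctctg" in line]  # like: grep ctctg
print("\n".join(tab_to_fasta(kept)))
```

Note that a plain substring test matches the name as well as the sequence, exactly as grep would on the tab file.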
--
-----------------------------------------------------------
My Blog on Bioinformatics (italian): http://bioinfoblog.it