[Bioperl-l] Help Parsing FASTA Sequence File

Fahmida fahmidaa120 at gmail.com
Thu Dec 9 12:50:13 UTC 2010


Hi,

I have several input 'score' files and their corresponding 'data' files, like:
score1.txt data1.txt
score2.txt data2.txt
....
....

score1.txt

contig00002 length=671 numreads=17 1207 0.0
contig00003 length=637 numreads=26 1205 0.0
contig00052 length=535 numreads=10 607 e-176
contig00072 length=472 numreads=46 571 e-165
contig00019 length=667 numreads=5 474 e-136

This file has several rows and five columns. Columns 1-3 are
names/descriptions; column 4 (1207, 1205, etc.) and column 5 (0.0, 0.0,
e-176, etc.) contain the scores. I want to make a list of the TOP 2 names
based on the column 4 score, considering only rows whose column 5 score is
not '0.0'. For example, for the above data the output list would be:

contig00052 length=535 numreads=10
contig00072 length=472 numreads=46
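
The selection rule described above can be sketched as follows. This is only an illustration in Python, not BioPerl code; the function name top_hits and its parameter n are made up for this example, and it assumes every useful row has exactly five whitespace-separated columns.

```python
def top_hits(score_lines, n=2):
    """Return the n best names by column 4, ignoring rows whose column 5 is '0.0'."""
    rows = []
    for line in score_lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # assumption: skip blank or malformed rows
        name = " ".join(fields[:3])   # columns 1-3: name/description
        score = float(fields[3])      # column 4: score used for ranking
        if fields[4] == "0.0":        # column 5: literal '0.0' rows are excluded
            continue
        rows.append((score, name))
    # Highest column 4 score first; the sort is stable, so ties keep file order.
    rows.sort(key=lambda row: row[0], reverse=True)
    return [name for _, name in rows[:n]]


sample = """\
contig00002 length=671 numreads=17 1207 0.0
contig00003 length=637 numreads=26 1205 0.0
contig00052 length=535 numreads=10 607 e-176
contig00072 length=472 numreads=46 571 e-165
contig00019 length=667 numreads=5 474 e-136
"""

for name in top_hits(sample.splitlines()):
    print(name)
```

Column 5 is compared as text rather than parsed as a number, because values such as 'e-176' are not valid floating-point literals as written.
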

I then want to use the above list to extract the matching records from 'data1.txt':

data1.txt

>contig00001 length=567 numreads=35
GGGCTGACGTGGCCGCTAATACGACTCACTATAGGGAGAGAAAaCCAAGGGAGAAaGAAa
CTACACTACTAATGGAAAaGATCTACATGCTAGAAAAa
>contig00002 length=671 numreads=17
GGGgCTGACGTGgCcGCTAATACGACTCACTATAGGgAGAGTTACTGTGGAGGGAGAGGC
TTGCTCAAaTCCGCGTTCAAGGATTTCCAGATTGGTAAGAACTTCAGATT
>contig00052 length=535 numreads=10
GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA
CCCAGGTGCCGTTAGCCA
>contig00003 length=637 numreads=26
GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA
CCCAGGTGCCGTTAGCCAGAGCTG
>contig00072 length=472 numreads=46
GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGTTTtCCCCAGGACCCTGGGA
GGACCATGCCGTATGGGTGTCTAGTAAGTACAAaGCCATAATTCACATAAGTGAAATATT
CTCAAGcACTAGGATC
>contig00019 length=504 numreads=5
GGGCTGACGTGGCCGCTAATACGACTCACTATAGGgAGAGATCTCACTAAAAAACTGGGG
ATAACGCCT


Example Output file:

>contig00052 length=535 numreads=10
GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA
CCCAGGTGCCGTTAGCCA
>contig00072 length=472 numreads=46
GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGTTTtCCCCAGGACCCTGGGA
GGACCATGCCGTATGGGTGTCTAGTAAGTACAAaGCCATAATTCACATAAGTGAAATATT
CTCAAGcACTAGGATC
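
One possible way to pull the selected records out of the FASTA file is sketched below, again in Python rather than BioPerl. The function name extract_records is hypothetical. It assumes records can be matched on the contig ID alone (the first word after '>'), and it writes records in the order they appear in the data file, not in score order.

```python
def extract_records(fasta_lines, wanted_ids):
    """Return the lines of every FASTA record whose ID is in wanted_ids."""
    wanted = set(wanted_ids)
    kept = []
    keep = False
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Header line: decide whether this record is wanted.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted
        if keep and line:
            kept.append(line)  # header and sequence lines of a wanted record
    return kept


fasta = """\
>contig00002 length=671 numreads=17
GGGgCTGACGTGgCcGCTAATACGACTCACTATAGGgAGAGTTACTGTGGAGGGAGAGGC
TTGCTCAAaTCCGCGTTCAAGGATTTCCAGATTGGTAAGAACTTCAGATT
>contig00052 length=535 numreads=10
GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA
CCCAGGTGCCGTTAGCCA
>contig00003 length=637 numreads=26
GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA
CCCAGGTGCCGTTAGCCAGAGCTG
"""

for line in extract_records(fasta.splitlines(), ["contig00052", "contig00072"]):
    print(line)
```

Matching on the ID alone, rather than on the whole header, keeps the lookup working even if the length or numreads text differs between the score file and the data file.
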

Any reply would be greatly appreciated.




