[Bioperl-l] Help Parsing FASTA Sequence File
Fahmida
fahmidaa120 at gmail.com
Thu Dec 9 12:50:13 UTC 2010
Hi,
I have several input 'score' files and their corresponding 'data' files, like:
score1.txt data1.txt
score2.txt data2.txt
....
....
score1.txt
contig00002 length=671 numreads=17 1207 0.0
contig00003 length=637 numreads=26 1205 0.0
contig00052 length=535 numreads=10 607 e-176
contig00072 length=472 numreads=46 571 e-165
contig00019 length=667 numreads=5 474 e-136
This file has several rows and five columns. Columns 1-3 are
names/descriptions, and column 4 (1207, 1205, etc.) and column 5 (0.0, 0.0,
e-176, etc.) contain the scores. I want to make a list of the TOP 2 names
ranked by the column 4 score, counting only rows whose column 5 score is not
'0.0'. For example, for the above data the output list would be:
contig00052 length=535 numreads=10
contig00072 length=472 numreads=46
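The selection step described above could be sketched roughly as follows. This is plain Python rather than BioPerl, and it assumes the columns are whitespace-separated, that the last two fields on each line are the column 4 and column 5 scores, and that '0.0' appears literally in column 5. The function name top_names and its parameters are hypothetical, chosen only for illustration.

```python
def top_names(score_lines, n=2, skip_value="0.0"):
    """Return the n names with the highest column 4 score,
    ignoring rows whose column 5 equals skip_value."""
    rows = []
    for line in score_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Everything before the last two fields is the name/description.
        *name_parts, score, last_col = fields
        if last_col == skip_value:
            continue  # excluded by the column 5 rule
        rows.append((float(score), " ".join(name_parts)))
    # Highest column 4 score first; ties keep their original file order.
    rows.sort(key=lambda row: row[0], reverse=True)
    return [name for _, name in rows[:n]]


# The sample rows from score1.txt above.
sample = [
    "contig00002 length=671 numreads=17 1207 0.0",
    "contig00003 length=637 numreads=26 1205 0.0",
    "contig00052 length=535 numreads=10 607 e-176",
    "contig00072 length=472 numreads=46 571 e-165",
    "contig00019 length=667 numreads=5 474 e-136",
]
selected = top_names(sample)
for name in selected:
    print(name)
```

For a real file, the same function can be fed an open file handle, e.g. `top_names(open("score1.txt"))`, since it only iterates over lines.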
I then want to use the above list to extract the matching records from 'data1.txt':
data1.txt
>contig00001 length=567 numreads=35
GGGCTGACGTGGCCGCTAATACGACTCACTATAGGGAGAGAAAaCCAAGGGAGAAaGAAa
CTACACTACTAATGGAAAaGATCTACATGCTAGAAAAa
>contig00002 length=671 numreads=17
GGGgCTGACGTGgCcGCTAATACGACTCACTATAGGgAGAGTTACTGTGGAGGGAGAGGC
TTGCTCAAaTCCGCGTTCAAGGATTTCCAGATTGGTAAGAACTTCAGATT
>contig00052 length=535 numreads=10
GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA
CCCAGGTGCCGTTAGCCA
>contig00003 length=637 numreads=26
GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA
CCCAGGTGCCGTTAGCCAGAGCTG
>contig00072 length=472 numreads=46
GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGTTTtCCCCAGGACCCTGGGA
GGACCATGCCGTATGGGTGTCTAGTAAGTACAAaGCCATAATTCACATAAGTGAAATATT
CTCAAGcACTAGGATC
>contig00019 length=504 numreads=5
GGGCTGACGTGGCCGCTAATACGACTCACTATAGGgAGAGATCTCACTAAAAAACTGGGG
ATAACGCCT
Example Output file:
>contig00052 length=535 numreads=10
GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA
CCCAGGTGCCGTTAGCCA
>contig00072 length=472 numreads=46
GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGTTTtCCCCAGGACCCTGGGA
GGACCATGCCGTATGGGTGTCTAGTAAGTACAAaGCCATAATTCACATAAGTGAAATATT
CTCAAGcACTAGGATC
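The extraction step could be sketched along these lines, again in plain Python rather than BioPerl. It assumes each record starts with a '>' header line and that the first word of the header (the contig ID) is unique within the file. Matching is done on the contig ID alone rather than the full header, because the sample files do not always agree on the rest of the line (contig00019 is length=667 in score1.txt but length=504 in data1.txt). The names read_fasta and extract_records are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence_lines) pairs from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line[1:].strip(), []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, seq


def extract_records(fasta_lines, wanted_names):
    """Return the lines of every record whose contig ID is in wanted_names."""
    # Assumption: the first whitespace-delimited word is the contig ID.
    wanted_ids = {name.split()[0] for name in wanted_names if name.strip()}
    out = []
    for header, seq in read_fasta(fasta_lines):
        if header and header.split()[0] in wanted_ids:
            out.append(">" + header)
            out.extend(seq)  # sequence lines are kept exactly as read
    return out


# A shortened version of data1.txt from above.
data = [
    ">contig00002 length=671 numreads=17",
    "GGGgCTGACGTGgCcGCTAATACGACTCACTATAGGgAGAGTTACTGTGGAGGGAGAGGC",
    ">contig00052 length=535 numreads=10",
    "GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA",
    "CCCAGGTGCCGTTAGCCA",
    ">contig00072 length=472 numreads=46",
    "GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGTTTtCCCCAGGACCCTGGGA",
]
wanted = [
    "contig00052 length=535 numreads=10",
    "contig00072 length=472 numreads=46",
]
result = extract_records(data, wanted)
print("\n".join(result))
```

Records come out in the order they appear in the data file, not in score order; for the sample data the two orders happen to coincide. Repeating this for each score/data pair (score1.txt with data1.txt, score2.txt with data2.txt, and so on) would be a simple loop over the file names.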
Any reply would be greatly appreciated.