[Bioperl-l] modify sequence name
Jason Stajich
jason.stajich at gmail.com
Sun Mar 11 01:46:46 UTC 2012
Since this is a BioPerl list, I would suggest a more BioPerl-style solution that doesn't require you to do the parsing or splitting yourself: read the sequences in with Bio::SeqIO and change each ID, which you get and set with the $seq->display_id method.
Did you look at the SeqIO HOWTO on the bioperl website?
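
Something along these lines might work as a starting point. It is only a sketch: it assumes BioPerl is installed, that file 1 is plain FASTA, and that file 2 really is tab-separated. The sub names (load_taxa, new_id, rename_fasta) and the script name below are made up for illustration. IDs that are missing from the table are left unchanged.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read the tab-separated "DNA number <TAB> taxon" table (file 2) into a hash.
sub load_taxa {
    my ($path) = @_;
    my %taxon_for;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;             # tolerate Windows line endings
        next unless $line =~ /\S/;    # skip blank lines
        my ($id, $taxon) = split /\t/, $line, 2;
        next unless defined $taxon;   # skip lines without a tab
        $taxon_for{$id} = $taxon;
    }
    close $fh;
    return \%taxon_for;
}

# Build "Taxon-ID"; an ID that is not in the table is returned as-is.
sub new_id {
    my ($id, $taxon_for) = @_;
    return exists $taxon_for->{$id} ? "$taxon_for->{$id}-$id" : $id;
}

# Let Bio::SeqIO parse the FASTA (file 1) and rewrite each display_id.
sub rename_fasta {
    my ($fasta_in, $table, $fasta_out) = @_;
    require Bio::SeqIO;   # loaded at run time; the helpers above need only core Perl
    my $taxon_for = load_taxa($table);
    my $in  = Bio::SeqIO->new(-file => $fasta_in,     -format => 'fasta');
    my $out = Bio::SeqIO->new(-file => ">$fasta_out", -format => 'fasta');
    while (my $seq = $in->next_seq) {
        $seq->display_id(new_id($seq->display_id, $taxon_for));
        $out->write_seq($seq);
    }
}

# Only run when called with exactly three file names.
rename_fasta(@ARGV) if @ARGV == 3;
```

Saved as, say, rename_ids.pl, it would be run as "perl rename_ids.pl file1.txt file2.txt renamed.fasta". Note that the FASTA writer may re-wrap the sequence lines, so the line lengths in the output need not match the input.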
Jason
On Mar 9, 2012, at 1:50 PM, Yifei Huang wrote:
> Hi Yang,
>
> It is fairly easy to do that in Perl. You could write a Perl script like this:
>
> Step 1: read file 2 line by line and use the function 'split' to separate
> sequence Ids and taxon names. Then construct a hash table in which keys are
> sequence ids and values are taxon names.
>
> Step 2: read file 1 line by line. For each line that starts with '>', use
> a regular expression to extract its sequence id and look up the
> corresponding taxon name in the hash table. Then reformat the sequence id
> and print the new id (with the initial '>'). For each line that does not
> start with '>', just print it unchanged.
>
> If you are not very familiar with Perl, I suggest learning it on your
> own. Beginning Perl for Bioinformatics is a good book for biologists.
>
> Best,
>
> Yifei
>
> On Fri, Mar 9, 2012 at 2:25 PM, yang liu <yang.liu0508 at gmail.com> wrote:
>
>> Dear colleagues,
>>
>>
>>
>> When I do Sanger sequencing, I get hundreds of sequences, for several
>> genes, named by DNA Number. I need to add the taxon name manually for each
>> sequence. I wonder whether there is a way to change the names automatically.
>>
>>
>> I have two .txt files.
>>
>> file 1, with sequences named by DNA Number:
>> >2863
>> AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT
>> ATGCAATTAAAGAAACTGATGTATTAGCATTATTTC
>>
>> >2864
>> AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT
>> ATGCAATTAAAGAAACTGATGTATTAGCATTATTTCGTATTACTCCACAACCAGGTGTAGAT
>> ........
>>
>>
>> file 2, with DNA Number and taxon names, separated by tabs:
>> 2863 Gelidium
>> 2864 Poa
>> ........
>>
>> I would like the final file to look like this:
>> >Gelidium-2863
>> AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT
>> ATGCAATTAAAGAAACTGATGTATTAGCATTATTTC
>>
>> >Poa-2864
>> AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT
>> ATGCAATTAAAGAAACTGATGTATTAGCATTATTTCGTATTACTCCACAACCAGGTGTAGAT
>> Any ideas? Any help would be appreciated.
>>
>> Yang.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>
> --
> Yifei Huang
> Department of Biology
> McMaster University
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org