[Bioperl-l] modify sequence name

Sun Mar 11 01:46:46 UTC 2012

Since this is a BioPerl list, I would suggest a more BioPerl-style solution that doesn't require you to parse or split the FASTA records yourself. Just read the sequences in with Bio::SeqIO and change each ID, which you get and set with the $seq->display_id method.
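
As a rough sketch of that approach (an illustration under stated assumptions, not a tested, definitive script), something like the following could work. It assumes file 2 holds one DNA number and one taxon name per line, separated by whitespace, and that DNA numbers with no entry in the table should keep their original ID. The sub names read_taxa, new_id and rename_fasta are made up for this example; Bio::SeqIO->new, next_seq, write_seq and display_id are the usual BioPerl calls.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read the table (file 2) into a hash: DNA number => taxon name.
# Assumption: one "number<whitespace>taxon" pair per line.
sub read_taxa {
    my ($table_file) = @_;
    my %taxon_for;
    open my $fh, '<', $table_file or die "Cannot open $table_file: $!";
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;          # trim, including the line ending
        next if $line eq '';
        my ($id, $taxon) = split /\s+/, $line, 2;
        next unless defined $taxon;       # skip lines without a taxon name
        $taxon_for{$id} = $taxon;
    }
    close $fh;
    return \%taxon_for;
}

# Build the new ID, e.g. "2863" becomes "Gelidium-2863".
# IDs that are not in the table are returned unchanged.
sub new_id {
    my ($id, $taxon_for) = @_;
    return exists $taxon_for->{$id} ? "$taxon_for->{$id}-$id" : $id;
}

# Copy a FASTA file, replacing each display_id with the taxon-prefixed one.
sub rename_fasta {
    my ($in_file, $table_file, $out_file) = @_;
    require Bio::SeqIO;                   # loaded here; needs BioPerl installed
    my $taxon_for = read_taxa($table_file);
    my $in  = Bio::SeqIO->new(-file => $in_file,     -format => 'fasta');
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => 'fasta');
    while (my $seq = $in->next_seq) {
        $seq->display_id(new_id($seq->display_id, $taxon_for));
        $out->write_seq($seq);
    }
    return;
}

# Run only when called with: input FASTA, table file, output FASTA.
rename_fasta(@ARGV) if @ARGV == 3;
```

Saved under a hypothetical name such as rename_ids.pl, it would be run as "perl rename_ids.pl file1.txt file2.txt renamed.fasta". Note that the FASTA writer may re-wrap the sequence lines, so the output line lengths can differ from the input.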

Did you look at the SeqIO HOWTO on the bioperl website?

Jason 
On Mar 9, 2012, at 1:50 PM, Yifei Huang wrote:

> Hi Yang,
> 
> It is fairly easy to do that in Perl. You could write a Perl script like this:
> 
> Step 1: read file 2 line by line and use the function 'split' to separate
> sequence IDs and taxon names. Then construct a hash table in which keys are
> sequence IDs and values are taxon names.
> 
> Step 2: read file 1 line by line. For each line with an initial '>', use a
> regular expression to extract its sequence ID and look up the corresponding
> taxon name in the hash table. Then reformat the sequence ID and print the new
> ID (with the initial '>'). Print every line without an initial '>'
> unchanged.
> 
> If you are not very familiar with Perl, I suggest learning it on your own.
> Beginning Perl for Bioinformatics is a good book for biologists.
> 
> Best,
> 
> Yifei
> 
> On Fri, Mar 9, 2012 at 2:25 PM, yang liu <yang.liu0508 at gmail.com> wrote:
> 
>> Dear colleagues,
>> 
>> 
>> 
>> When I do Sanger sequencing, I get hundreds of sequences named by DNA
>> number, for several genes. I need to add the taxon name manually to each
>> sequence. Is there a way to change the names automatically?
>> 
>> 
>> I have two .txt files.
>> 
>> file 1, with sequences named by DNA number:
>> >2863
>> AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT
>> ATGCAATTAAAGAAACTGATGTATTAGCATTATTTC
>> 
>> >2864
>> AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT
>> ATGCAATTAAAGAAACTGATGTATTAGCATTATTTCGTATTACTCCACAACCAGGTGTAGAT
>> ........
>> 
>> 
>> file 2, with DNA numbers and taxon names, separated by tabs:
>> 2863 Gelidium
>> 2864 Poa
>> ........
>> 
>> I would like the final file to look like this:
>> >Gelidium-2863
>> AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT
>> ATGCAATTAAAGAAACTGATGTATTAGCATTATTTC
>> 
>> >Poa-2864
>> AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT
>> ATGCAATTAAAGAAACTGATGTATTAGCATTATTTCGTATTACTCCACAACCAGGTGTAGAT
>> Any ideas? Any help would be appreciated.
>> 
>> Yang.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> 
> 
> -- 
> Yifei Huang
> Department of Biology
> McMaster University

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org