[Bioperl-l] modify sequence name
Yifei Huang
huangyifeicmb at gmail.com
Fri Mar 9 21:50:09 UTC 2012
Hi Yang,
This is fairly easy to do in Perl. You could write a script along these lines:
Step 1: read file 2 line by line and use the function 'split' to separate
sequence IDs and taxon names. Then build a hash in which the keys are
sequence IDs and the values are taxon names.
Step 2: read file 1 line by line. For each line that starts with '>', use a
regular expression to extract its sequence ID and look up the corresponding
taxon name in the hash. Then reformat the sequence ID and print the new
ID (with the leading '>'). For each line that does not start with '>', just
print it unchanged.
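The two steps above can be sketched roughly as follows. This is only a minimal sketch, assuming file 2 really is tab-separated with one ID per line and that each header in file 1 is just '>' followed by the DNA number. The sub names (load_taxa, rename_line) and the file names in the usage message are made up for illustration, and leaving headers with unknown IDs unchanged is my own choice, not something from your description.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Step 1: build a hash of sequence ID => taxon name from the tab-separated table.
sub load_taxa {
    my ($fh) = @_;
    my %taxon_for;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;                 # strip Unix or Windows line endings
        my ($id, $taxon) = split /\t/, $line, 2;
        next unless defined $id && defined $taxon;   # skip blank or malformed lines
        $taxon_for{$id} = $taxon;
    }
    return \%taxon_for;
}

# Step 2: rewrite one line of file 1. Header lines become ">Taxon-ID";
# sequence lines (and headers with unknown IDs) are returned unchanged.
sub rename_line {
    my ($line, $taxon_for) = @_;
    if ($line =~ /^>\s*(\S+)/) {
        my $id = $1;
        if (exists $taxon_for->{$id}) {
            return sprintf(">%s-%s\n", $taxon_for->{$id}, $id);
        }
        warn "No taxon name found for ID '$id'; header left unchanged\n";
    }
    return $line;
}

sub main {
    my ($fasta_path, $table_path) = @_;

    open my $table_fh, '<', $table_path or die "Cannot open $table_path: $!";
    my $taxon_for = load_taxa($table_fh);
    close $table_fh;

    open my $fasta_fh, '<', $fasta_path or die "Cannot open $fasta_path: $!";
    while (my $line = <$fasta_fh>) {
        print rename_line($line, $taxon_for);
    }
    close $fasta_fh;
}

# Hypothetical file names; the renamed sequences go to standard output.
if (@ARGV == 2) {
    main(@ARGV);
}
else {
    warn "Usage: $0 file1.txt file2.txt > renamed.txt\n";
}
```

Saved under some name such as rename_seqs.pl (again, just an example name), it would be run as `perl rename_seqs.pl file1.txt file2.txt > renamed.txt`, with file 1 being the sequences and file 2 the ID/taxon table.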
If you are not very familiar with Perl, I suggest learning it on your
own. Beginning Perl for Bioinformatics is a good book for biologists.
Best,
Yifei
On Fri, Mar 9, 2012 at 2:25 PM, yang liu <yang.liu0508 at gmail.com> wrote:
> Dear colleagues,
>
>
>
> When I do Sanger sequencing, I get hundreds of sequences for several
> genes, each named by its DNA Number. I need to add the taxon name manually
> to each sequence. I wonder whether there is a way to change the names
> automatically?
>
>
> I have two .txt files.
>
> file 1, with sequences named by DNA Number:
> >2863
> AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT
> ATGCAATTAAAGAAACTGATGTATTAGCATTATTTC
>
> >2864
> AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT
> ATGCAATTAAAGAAACTGATGTATTAGCATTATTTCGTATTACTCCACAACCAGGTGTAGAT
> ........
>
>
> file 2, with DNA Numbers and taxon names, separated by tabs:
> 2863 Gelidium
> 2864 Poa
> ........
>
> I would like the final file to look like this:
> >Gelidium-2863
> AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT
> ATGCAATTAAAGAAACTGATGTATTAGCATTATTTC
>
> >Poa-2864
> AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT
> ATGCAATTAAAGAAACTGATGTATTAGCATTATTTCGTATTACTCCACAACCAGGTGTAGAT
> Any ideas? Any help would be appreciated.
>
> Yang.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
--
Yifei Huang
Department of Biology
McMaster University