[Bioperl-l] Genbank parsers
Heikki Lehvaslaiho
heikki@ebi.ac.uk
Wed, 27 Mar 2002 08:31:12 +0000
Hong Qin wrote:
>
> Hi all,
>
> You can tell my laziness from this question. Could someone suggest a good
> parser to take CDS sequences from genbank formatted files. (The *.gbff file
> from NCBI FTP site). If the output file is FASTA, it would be great.
I think this should be in the FAQ. Elia's answer gives the right pointers on
how to do it.
However, the problem is a bit more complex than that. Quite often
the CDS feature contains a join statement:
FT   CDS             join(U21925.1:818..987,U21926.1:258..420,
FT                   U21927.1:428..520,U21928.1:196..336,U21929.1:279..415,
FT                   U21930.1:895..1014,516..708)
and unless you are able to fetch each of the referenced entries from a
random-access data store, you cannot assemble the complete coding sequence.
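To make the problem concrete, here is a minimal sketch (plain Python, standard
library only, not BioPerl code) that splits the join statement above into its
segments and lists which other entries would have to be fetched. The function
name parse_join and the regular expression are illustrative assumptions; the
sketch handles only simple join(...) locations with exact start..end ranges,
not complement() or fuzzy ends such as <1..200.

```python
import re

# One segment: an optional "ACCESSION.VERSION:" prefix, then start..end.
SEGMENT = re.compile(
    r"(?:(?P<acc>[A-Za-z0-9_]+(?:\.\d+)?):)?(?P<start>\d+)\.\.(?P<end>\d+)"
)


def parse_join(location):
    """Return (accession_or_None, start, end) tuples for a join(...) location."""
    text = "".join(location.split())  # drop whitespace and line breaks
    if not (text.startswith("join(") and text.endswith(")")):
        raise ValueError("not a simple join(...) location")
    segments = []
    for part in text[len("join("):-1].split(","):
        match = SEGMENT.fullmatch(part)
        if match is None:
            raise ValueError(f"unsupported segment: {part!r}")
        segments.append(
            (match.group("acc"), int(match.group("start")), int(match.group("end")))
        )
    return segments


location = """join(U21925.1:818..987,U21926.1:258..420,
    U21927.1:428..520,U21928.1:196..336,U21929.1:279..415,
    U21930.1:895..1014,516..708)"""

segments = parse_join(location)
# Segments with an accession live in other entries; the rest are local.
remote = sorted({acc for acc, _, _ in segments if acc is not None})
local = [(start, end) for acc, start, end in segments if acc is None]

print("entries to fetch:", ", ".join(remote))
print("local segments:", local)
```

Six of the seven segments point into other entries, so a parser that only
sees the current record can recover just the last exon.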
This would be a nice task for someone wanting to start programming in
bioperl: a Bio::Tools::CDSExtractor that would use Bio::SeqIO and
Bio::DB::BioFetch (using the Registry would be even better). A more generic
module would be Bio::Tools::SeqFeatureExtractor.
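The core logic such an extractor would need is small. The following is a
language-neutral sketch written in plain Python, not BioPerl: splice_cds,
to_fasta, and the toy sequences are all hypothetical, and the fetch argument
stands in for whatever random-access store (BioFetch, a local index) is
available. It assumes forward-strand segments with 1-based inclusive
coordinates.

```python
def splice_cds(segments, local_seq, fetch):
    """Concatenate segments given as (accession_or_None, start, end) tuples.

    Coordinates are 1-based and inclusive. `fetch` maps an accession to its
    full sequence and should raise if the entry is unavailable.
    """
    parts = []
    for acc, start, end in segments:
        seq = local_seq if acc is None else fetch(acc)
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"segment {start}..{end} is outside {acc or 'local entry'}")
        parts.append(seq[start - 1:end])
    return "".join(parts)


def to_fasta(name, seq, width=60):
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">{name}"] + lines)


# Toy data standing in for a sequence database (made-up sequences).
store = {"X1.1": "AAACCCGGG", "X2.1": "TTTTGGGG"}
local_seq = "ATGCATGCAT"
segments = [("X1.1", 4, 6), ("X2.1", 5, 8), (None, 1, 3)]

cds = splice_cds(segments, local_seq, store.__getitem__)
print(to_fasta("toy_cds", cds))
```

If any referenced entry is missing from the store, the lookup fails and no
sequence is produced, which is exactly the limitation described above.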
The EMBOSS program that can do this (provided the sequence database is
populated) is coderet:
http://www.uk.embnet.org/Software/EMBOSS/Apps/coderet.html
-heikki
> Thanks a lot,
>
> Hong
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________