[Bioperl-l] problem with bp_genbank2gff.pl

sophiezhouli at gmail.com sophiezhouli at gmail.com
Thu Nov 28 07:15:33 UTC 2013


Dear all,
I am using BioPerl 1.6.1 on Mac OS X 10.8.5.
I am trying to convert a local GenBank file to a GFF file with 
bp_genbank2gff.pl, using the following command:
$ bp_genbank2gff.pl M21017.gb --stdout > M21017.gff3
I got the following messages, and I am not sure whether they are errors:
Replacement list is longer than search list at 
/Users/zhouli/perl5/perlbrew/perls/perl-5.18.1/lib/site_perl/5.18.1/Bio/Range.pm 
line 251.
UNIVERSAL->import is deprecated and will be removed in a future perl at 
/Users/zhouli/perl5/perlbrew/perls/perl-5.18.1/lib/site_perl/5.18.1/Bio/Tree/TreeFunctionsI.pm 
line 94.
# working on region:M21017, Drosophila melanogaster, 09-MAY-1994, 
D.melanogaster 18S, 5.8S 2S and 28S rRNA genes, complete, and 18S rRNA 
gene, 5' end, clone pDm238.

***************************************************************************
And the output file M21017.gff3 is attached.

$ head M21017.gff3
##gff-version 3
M21017    Genbank    region    1    12026    .    .    .    
ID=M21017;Note=D.melanogaster%2018S%2C%205.8S%202S%20and%2028S%20rRNA%20genes%2C%20complete%2C%20and%2018S%20rRNA%20gene%2C%205%27%20end%2C%20clone%20pDm238.;Alias=M29800
M21017    Genbank    region    1    12026    .    +    .    
ID=Drosophila%20melanogaster;db_xref=taxon%3A7227;mol_type=genomic%20DNA
M21017    Genbank    gene    1    12026    .    +    .    ID=18S%20rRNA
M21017    Genbank    RNA    1    7232    .    +    .    
ID=18S%20rRNA;note=rRNA%20primary%20transcript
M21017    Genbank    rRNA    1    1995    .    +    .    
ID=18S%20rRNA;product=18S%20ribosomal%20RNA
M21017    Genbank    gene    2722    2844    .    +    .    ID=5.8S%20rRNA
M21017    Genbank    rRNA    2722    2844    .    +    .    
ID=5.8S%20rRNA;product=5.8S%20ribosomal%20RNA
M21017    Genbank    gene    2873    2902    .    +    .    ID=2S%20rRNA
M21017    Genbank    rRNA    2873    2902    .    +    .    
ID=2S%20rRNA;product=2S%20ribosomal%20RNA


When I tested another GenBank file:
$ bp_genbank2gff.pl WSSV-AF369029-GenBank.gb --stdout > WSSV-AF369029-GenBank.gff3
I got the same messages:
Replacement list is longer than search list at 
/Users/zhouli/perl5/perlbrew/perls/perl-5.18.1/lib/site_perl/5.18.1/Bio/Range.pm 
line 251.
UNIVERSAL->import is deprecated and will be removed in a future perl at 
/Users/zhouli/perl5/perlbrew/perls/perl-5.18.1/lib/site_perl/5.18.1/Bio/Tree/TreeFunctionsI.pm 
line 94.
$ head WSSV-AF369029-GenBank.gff3
##gff-version 3
AF369029    Genbank    region    1    292967    .    .    .    
ID=AF369029;Alias=AY864671;Note=White%20spot%20syndrome%20virus%2C%20complete%20genome.
AF369029    Genbank    region    1    292967    .    +    .    
ID=White%20spot%20syndrome%20virus;mol_type=genomic%20DNA;isolate=WSSV-TH;country=Thailand;db_xref=taxon%3A342409
AF369029    Genbank    gene    1    615    .    +    .    
ID=VP28;experiment=experimental%20evidence%2C%20no%20additional%20details%20recorded;note=envelope%20protein
AF369029    Genbank    CDS    1    615    .    +    .    
Parent=VP28.t00;translation=MDLSFTLSVVSAILAITAVIAVFIVIFRYHNTVTKTIETHTDNIETNMDENLRIPVTAEVGSGYFKMTDVSFDSDTLGKIKIRNGKSDAQMKEEDADLVITPVEGRALEVTVGQNLTFEGTFKVWNNTSRKINITGMQMVPKINPSKAFVGSSNTSSFTPVSIDEDEVGTFVCGTTFGAPIAATAGGNLFDMYVHVTYSGTETE;db_xref=GI%3A15021393;protein_id=AAK77670.1;product=ORF1%2C%20VP28%2C%20gene%20family%201;note=envelope%20protein;codon_start=1
AF369029    Genbank    CDS    710    2902    .    -    .    
Parent=AAK77671.1.t00;translation=MEGGDQRTKLTPATVMGLYQSKTPGEGEGGEGGGQFKIPSAIAVKSCCSKNATRRSPPSDSPYSLRPMKRLKKNNGEVGGKAPPPVTLRLREDYESTPYNFNRNKKKRPITIDENQFATLNPTYATDIIKKQQLPSVSAASVLRKHRANADTQYRKRFSHPNCAKFSTVNLKARDYTPLSVLRSHVKGPKHLKSSCDTVTETNVVKRNFSSIDKWVKLEKPPCYFAVAEADTNIAAGLESPFHLIRQAAKLGLISDVQDVSSNYETIKQSCIDAKEKASKFLWSNNRTKQPPSSWWPVGFGSKNLSVLDTSPLLNWNRLCKNNGKGWIKTMSIDHMAKNVFKLSPGACESILEKKTTLLGEVTAQCKKWESYRRNIPVPAHVQPEYASQVVMIGPSELYLEVKVGVYYMLETGKVIKFMTDKEMYCEFVFETVFSHALEGRMKGAVGVRKMCVEGFCVEMDFAGISVIDVLNGDLKCKMDENVVQQPNPSTTSSKPAAELMQDHGSLCRMRDTLYGVRMLQATGRLPEGLQSKCKKPITDSISAIAIVGKMRERMLNQLPFVLVEIVNIVTRLSQQGLVNPDIKSDNIVIDGITGQPKMIDFGLIVPCKKYYNFKCWGTDERFFSNHPHTAPEFINSELCSETAMTFGLAYLLIDMLSILIKRTADLSANSIYTNIPFLSIVSKMYDQEKTNRPRAYEIAPVIGACFPFKDNIAKLFQSPKHSLYSKKVK;db_xref=GI%3A15021394;codon_start=1;product=ORF2%2C%20putative%20serine%2Fthreonine%20protein%20kinase%20%28PK1%29%2C%20gene%20family%202
AF369029    Genbank    CDS    3118    4989    .    -    .    
Parent=AAK77672.1.t00;codon_start=1;product=ORF3;db_xref=GI%3A15021395;translation=MAWTVMALKDAFTERLVVNKVGSGTDMAPVVEDDRQKSLFQKVENLYRVLVVEQKNSAITLSGNKNTNKRQCRQVEEDKVIFEGEDRTVSNLPQAVKETIAANAESILDYWYKNVIPLLDTKKERSGKSDTFLRTAVICLVRCCVSYKDMKTCSLIYEFEHKILNKSTLDPLLKDILDNKQELLHMDSKYGSKTTSPELAKETIEALYTTVYNHWTNAFKLYQASLTHKPVTGKKYASVIHFIRTWRKIVKAYVSKHNNVERDLSLKNIMKNESADNANVLTIEKMYKKIGNSVKNTNNNSAHQMSDSEDDDDDDDDDCEGMDVCDEASEREKKHQESLYPINTPVTTITGDYIFKVLLELVLSPHIHPEWKIPMCDFVNRNIPKLMKAMETDISNAVIEVRASKVNPVQILPIAANFWDFCKSGKPPSDVKFCMMFNEPSSNETLSSGAGVFGRFIGGPFSHKSKELDIISNCLRSLLLNKEADNLSTRIWREGGSVVCFNYCPITARGAVLGYGEQLSERSIKALWAKKIQDAVTESVKRQRNAADKNSRNCDLLGDEGVVSMKTVTFGCANMLKTQNGMGKFNVVVSFEDSIQANKEGAARQYMSQQVFTHSFPALDQGK

The output file is very verbose because every CDS line includes the full 
protein translation, which I do not need.
1. Is there any way to make the output more succinct by leaving out the 
translation?
2. Is there any way to split the output into two files, a GFF3 file and a 
DNA FASTA sequence file?
3. When I import WSSV-AF369029-GenBank.gff3 into IGV, features without a 
gene name are labeled with the protein ID: "CDS" features display the 
protein ID, and "gene" features display the gene ID. Is this the expected 
behavior? I would like to display the ORF ID instead; what should I do?
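
The three requests above could also be handled by post-processing the GFF3 
file, independently of any bp_genbank2gff.pl option. What follows is only a 
minimal Python sketch under these assumptions: the file on disk is 
tab-separated with nine columns (the listing above is re-wrapped by the 
mail archive); any sequence data follows a ##FASTA directive, as the GFF3 
format allows (whether this script emits one is not shown above); IGV takes 
feature labels from a Name attribute; and ORF IDs appear in the product 
attribute as ORF1, ORF2, and so on, as in the WSSV output. The function 
names and the output file names are hypothetical.

```python
import re
from urllib.parse import unquote


def rewrite_feature_line(line, drop=("translation",), name_pattern=r"ORF\d+"):
    """Drop unwanted attributes and add Name=<ORF ID> found in 'product'.

    Comment/directive lines and lines without nine tab-separated columns
    are returned unchanged.
    """
    body = line.rstrip("\n")
    ending = line[len(body):]
    if body.startswith("#") or not body.strip():
        return line
    columns = body.split("\t")
    if len(columns) != 9:
        return line

    # Column 9 holds 'key=value' pairs separated by ';'.
    pairs = []
    for field in columns[8].split(";"):
        if field:
            key, _, value = field.partition("=")
            if key not in drop:
                pairs.append((key, value))

    # Assumption: the ORF ID is the first 'ORF<number>' inside 'product'.
    product = dict(pairs).get("product", "")
    match = re.search(name_pattern, unquote(product))
    if match and all(key != "Name" for key, _ in pairs):
        pairs.append(("Name", match.group(0)))

    columns[8] = ";".join(f"{key}={value}" for key, value in pairs)
    return "\t".join(columns) + ending


def split_gff3(lines):
    """Return (annotation_lines, fasta_lines), split at the ##FASTA directive.

    Annotation lines are passed through rewrite_feature_line; the ##FASTA
    line itself is not kept in either list.
    """
    annotation, fasta = [], []
    in_fasta = False
    for line in lines:
        if not in_fasta and line.strip() == "##FASTA":
            in_fasta = True
        elif in_fasta:
            fasta.append(line)
        else:
            annotation.append(rewrite_feature_line(line))
    return annotation, fasta


def write_split(in_path, gff_path, fasta_path):
    """Read one GFF3 file and write the cleaned GFF3 and (if any) FASTA."""
    with open(in_path) as handle:
        annotation, fasta = split_gff3(handle)
    with open(gff_path, "w") as handle:
        handle.writelines(annotation)
    if fasta:  # only create the FASTA file when sequence data was present
        with open(fasta_path, "w") as handle:
            handle.writelines(fasta)
```

For example, write_split("WSSV-AF369029-GenBank.gff3", "WSSV.clean.gff3", 
"WSSV.fasta") would write a GFF3 file without translation attributes and, 
if the input has a ##FASTA section, a separate FASTA file. Features whose 
product contains no ORF ID are left without a Name attribute.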
 


Your help is greatly appreciated.
Thank you very much!
Regards,
Zhou Li
-------------- next part --------------
A non-text attachment was scrubbed...
Name: M21017.gff3
Type: application/octet-stream
Size: 13528 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20131127/08c30d80/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: WSSV-AF369029-GenBank.gff3
Type: application/octet-stream
Size: 413987 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20131127/08c30d80/attachment-0005.obj>

