[Bioperl-l] question about GenPept.pm

Dimitar Kenanov dimitark at bii.a-star.edu.sg
Thu Nov 25 02:20:28 UTC 2010


Hi guys,
I want to download some genomes and proteomes from NCBI in FASTA format. I
found I have to use 'download_query_genbank.pl' for that. It works, but
not quite as I would like. It uses the GenPept and GenBank modules, which
retrieve the data in FASTA, but with a different header format than I want.

Example:
a) I want the FASTA header to look like this:
>gi|5834889|ref|NP_006959.1|COX3_10021 cytochrome c oxidase subunit III [Caenorhabditis elegans]
sequence here...

b) but it comes out like this:
>COX3_10021 cytochrome c oxidase subunit III [Caenorhabditis elegans]
sequence here...

But I need the gi number and the NP accession as well. So I dug in a bit,
and after playing with 'download_query_genbank.pl' I managed to get
GenBank to return the FASTA sequences in the format I want.
I made the following changes:
1. added a $retformat option for Getopt
2. modified this section:
if( $options{'-db'} eq 'protein' ) {
    ### DIMITAR ###
    if( $retformat eq 'fasta' ) {
        $dbh = Bio::DB::GenPept->new(-verbose => $debug,
                                     -format  => 'Fasta');
    } else {
        $dbh = Bio::DB::GenPept->new(-verbose => $debug);
    }
    ### END DIMITAR ###
} else {
    ### DIMITAR ###
    if( $retformat eq 'fasta' ) {
        $dbh = Bio::DB::GenBank->new(-verbose => $debug,
                                     -format  => 'Fasta');
    } else {
        $dbh = Bio::DB::GenBank->new(-verbose => $debug);
    }
    ### END DIMITAR ###
}

But I got a problem with GenPept: I still can't get the sequences with the
full FASTA header described above. It's interesting, because the GenPept
and GenBank modules are almost identical, except that GenBank uses the
new() method inherited from NCBIHelper, while GenPept defines its own
new(), which in turn still calls NCBIHelper's.

With my modification I pass the format I want, but then somehow it
reverts to the default set in GenPept, which is 'gp', whereas I need it to
be 'fasta'.

If I change the default format in GenPept to 'fasta' it works, but that's
just getting the job done without adding the needed flexibility.
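A minimal, self-contained Perl sketch of the symptom described above. It assumes (this is not confirmed in the post) that GenPept's own new() calls the NCBIHelper constructor and then unconditionally resets the request format to its default, which would silently discard a caller's -format. The packages HelperBase, ClobberingDB and RespectfulDB and their methods are hypothetical stand-ins, not the real BioPerl classes; the second subclass shows the usual pattern of falling back to the default only when no format was requested.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical base class (a stand-in for a shared helper such as NCBIHelper):
# it stores whatever -format the caller passes to the constructor.
package HelperBase;

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->{format} = $args{-format} if defined $args{-format};
    return $self;
}

# Getter/setter: sets the format when an argument is given, always returns it.
sub request_format {
    my ($self, $fmt) = @_;
    $self->{format} = $fmt if defined $fmt;
    return $self->{format};
}

# Hypothetical subclass that resets the format unconditionally,
# so the caller's -format is lost.
package ClobberingDB;
use parent -norequire, 'HelperBase';

sub default_format { return 'gp' }

sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);
    $self->request_format($self->default_format);    # overwrites -format
    return $self;
}

# Hypothetical subclass that uses its default only when the caller
# did not ask for a format.
package RespectfulDB;
use parent -norequire, 'HelperBase';

sub default_format { return 'gp' }

sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);
    $self->request_format($self->default_format)
        unless defined $self->request_format;
    return $self;
}

package main;

for my $class (qw(ClobberingDB RespectfulDB)) {
    my $db = $class->new(-format => 'fasta');
    printf "%-12s -format => 'fasta' gives '%s'\n", $class, $db->request_format;
}
```

Run as-is, the first class reports 'gp' and the second 'fasta', while RespectfulDB->new() with no arguments still ends up with 'gp'. This only models the suspected cause; whether GenPept.pm really resets the format this way would need checking against its source.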

Any help would be appreciated. I will try to find a solution as well.
Cheers

PS: I have attached the modified 'download_query_genbank.pl'.
-------------- next part --------------
Name: download_query_genbank.pl
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20101125/9645c803/attachment-0004.pl>

