[Bioperl-l] genbank

Jason Stajich jason at bioperl.org
Tue Nov 30 17:06:08 UTC 2010


Great. The whole point of the scripts is to serve as examples. You don't 
need to send patches back showing everything you modified; the idea is 
that you adapt them, using the modules and code, to do whatever special 
thing you want. The hope is that the modules are flexible enough that 
you can write a script to accomplish your goal.

BTW, the one thing you can't recover from the GBK version of the file 
is the source database of the accession number. You have hardcoded 
'ref', but it can be 'gb', 'emb', 'sp', etc. Unfortunately this field 
isn't part of the GenBank record. One could come up with a pattern based 
on knowledge of accession number formats, but I don't know that anyone 
has been worried enough about it to try and write something for it.
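
To illustrate the pattern idea, here is a rough sketch in plain Perl. It is 
not part of BioPerl: the name guess_db_tag and the 'unknown' fallback are 
made up for this example, and the regexes only cover the classic, commonly 
documented accession shapes (RefSeq, UniProt, and older INSDC-style 
accessions). The shapes overlap in places (for example an O, P or Q followed 
by five digits fits both the UniProt and INSDC patterns), and the shape alone 
cannot tell 'gb' from 'emb' or 'dbj', or 'sp' from 'tr', so treat the result 
as a guess only.

```perl
use strict;
use warnings;

# Hypothetical helper: guess a FASTA database tag from the shape of an
# accession number. Purely pattern-based, so it can be wrong.
sub guess_db_tag {
    my ($acc) = @_;
    return 'unknown' unless defined $acc && length $acc;

    # Uppercase and drop a trailing version such as ".1"
    (my $base = uc $acc) =~ s/\.\d+$//;

    # RefSeq: two letters, an underscore, then digits (NP_, NM_, NC_, ...)
    return 'ref' if $base =~ /^[A-Z]{2}_\d+$/;

    # UniProt-style accessions (6 or 10 characters); checked before INSDC
    # because a few shapes overlap. Reported as 'sp' by assumption.
    return 'sp'
        if $base =~ /^(?:[OPQ]\d[A-Z0-9]{3}\d|[A-NR-Z]\d(?:[A-Z][A-Z0-9]{2}\d){1,2})$/;

    # Classic INSDC shapes: 1 letter + 5 digits or 2 letters + 6 digits
    # (nucleotide), 3 letters + 5 digits (protein). Assumption: report
    # 'gb', although the record may come from EMBL or DDBJ instead.
    return 'gb' if $base =~ /^(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{3}\d{5})$/;

    return 'unknown';
}

for my $acc (qw(NP_000001.1 P12345 AAA12345 U49845)) {
    printf "%-12s => %s\n", $acc, guess_db_tag($acc);
}
```

In the loop from your script, the hardcoded 'ref' in the header could then be 
replaced by the value returned for $seq->accession_number, with a fallback of 
your choice when the guess comes back as 'unknown'.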

>>
> Hi again,
> I managed to solve my problem. It may be dirty but it works the way I 
> want :)
> I reworked 'download_query_genbank.pl' (attached). Now I can get 
> the seqs in full FASTA for proteomes and genomes, and the GenPept 
> report files for the proteomes.
> For the DB handle I only use GenPept now because it gives me a stream 
> which I can track with Term::ProgressBar.
>
> For the output I use 2 cases:
> -----------------
> while( my $seq = $stream->next_seq ) {
>     #DIMITAR
>     my($gi,$locus,$refnum,$desc,$seqstr);
>     if($retformat eq 'fasta'){    # for the fasta as i want it
>         check_progress($prgs,$seqnum,$count);
>         $locus=$seq->display_id;
>         $refnum=$seq->accession_number;
>         $gi=$seq->primary_id;
>         $desc=$seq->desc;
>         $desc=~s/\.$//;
>         $seqstr=$seq->seq;
>         print $fhout ">gi\|$gi\|ref\|$refnum\|$locus $desc\n$seqstr\n";
>     }else{
>         check_progress($prgs,$seqnum,$count);
>         $out->write_seq($seq);    # for the genbank reports
>     }
>     $seqnum++;
>     #DIMITAR
>
> #    $out->write_seq($seq);#original
>
> }
> ------------------
>
> Thank you for your help and time.
>
> Cheers
> Dimitar
>

-- 
Jason Stajich
jason at bioperl.org



