[Bioperl-l] Genbank seq CODE

Melissa L. Kimball mkimball@med.unc.edu
Mon, 10 Jun 2002 14:22:40 -0400


I think I am doing it right, but maybe not. Since I opened an "FTP" file handle,
do I have to specify STDOUT before I use write_seq()?

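#!/usr/bin/perl -w

use strict;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::Seq;
use Bio::DB::NCBIHelper;
use Bio::Annotation::Collection;
use diagnostics;

my $ftp = "/usr/bin/ftp";
my $tmp = "genbankflatfile.txt";    # the ftp session log ends up here
my $remotefile = "gbcu.flat.gz";
my $localfile = "gbcu.flat.gz";
my $host = "ftp.ncbi.nih.gov";
my $dir = "/genbank/daily";

# Feed commands to the command-line ftp client through a pipe.
open(FTP, "| $ftp -n -v $host > $tmp") or die "Cannot run $ftp: $!";

print FTP "user anonymous mkimball\@med.unc.edu\n";
print FTP "cd $dir\n";
print FTP "binary\n";
print FTP "get $remotefile $localfile\n";
print FTP "quit\n";

# Closing the pipe waits for ftp to exit, so the download has finished
# before the file is unpacked and read.
close(FTP) or warn "ftp exited with status $?\n";

# Unpack gbcu.flat.gz into gbcu.flat (-f overwrites an older gbcu.flat).
system("gzip", "-d", "-f", $localfile) == 0
    or die "gzip -d $localfile failed with status $?\n";

my $genbankfile = Bio::SeqIO->new('-file' => "gbcu.flat", '-format' => 'genbank');
# The leading '>' opens gbcu.fsa for writing; a bare file name is opened for reading.
my $fastafile = Bio::SeqIO->new('-file' => ">gbcu.fsa", '-format' => 'Fasta');

while (my $sequence = $genbankfile->next_seq())
{
        my $thespecies = $sequence->species();   # Your way is much better!
        next unless defined $thespecies;          # skip records without an organism
        my $specsci = $thespecies->species();

        # chop() always removes the last character, so "gonorrhoeae" becomes "gonorrhoea".
        chop($specsci);

        if ($specsci =~ /^gonorrhoea\b/i) {
                print "$specsci\n\n";
                $fastafile->write_seq($sequence);
        }
}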
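As an alternative to piping commands into /usr/bin/ftp and shelling out to gzip, the download and decompression can be done from Perl itself with Net::FTP and IO::Uncompress::Gunzip, both bundled with modern Perl distributions. This is a sketch, not a tested drop-in replacement: the subroutine names download_gz and gunzip_to are hypothetical, and the host, directory and file names are simply the ones used in the script above.

```perl
use strict;
use warnings;
use Net::FTP;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: fetch one file by anonymous FTP in binary mode.
# Dies with the server's message on any failure.
sub download_gz {
    my (%arg) = @_;
    my $ftp = Net::FTP->new($arg{host}, Passive => 1)
        or die "Cannot connect to $arg{host}: $@";
    $ftp->login('anonymous', $arg{email})
        or die 'Login failed: ', $ftp->message;
    $ftp->cwd($arg{dir})
        or die "Cannot cd to $arg{dir}: ", $ftp->message;
    $ftp->binary;                      # .gz files must be fetched in binary mode
    $ftp->get($arg{remote}, $arg{local})
        or die "Download of $arg{remote} failed: ", $ftp->message;
    $ftp->quit;
    return $arg{local};
}

# Hypothetical helper: decompress $in (e.g. gbcu.flat.gz) into $out
# (e.g. gbcu.flat). Unlike `gzip -d`, the .gz file is kept.
sub gunzip_to {
    my ($in, $out) = @_;
    gunzip($in => $out) or die "gunzip $in failed: $GunzipError";
    return $out;
}
```

For example: download_gz(host => 'ftp.ncbi.nih.gov', dir => '/genbank/daily', remote => 'gbcu.flat.gz', local => 'gbcu.flat.gz', email => 'mkimball@med.unc.edu'); followed by gunzip_to('gbcu.flat.gz', 'gbcu.flat'). Because each step dies on failure, there is no need to inspect the ftp log in genbankflatfile.txt before reading the file with Bio::SeqIO.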


In the condition I check for all entries whose species is "gonorrhoea". When
I actually look at a *.seq file, it is spelled "gonorrhoeae", which is also how
all other scientific literature spells it. Strange.
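A likely explanation for the odd spelling, assuming species() returns the epithet "gonorrhoeae" for Neisseria gonorrhoeae: chop() unconditionally removes the last character of a string, so the script ends up comparing against "gonorrhoea". chomp() would remove only a trailing newline, and the species() value should not have one, so neither call is needed. The core-Perl sketch below shows the effect, plus an exact, case-insensitive test on the untouched value; is_gonorrhoeae is a hypothetical helper name.

```perl
use strict;
use warnings;

# Assumption: this is what $thespecies->species() returns for this record.
my $epithet = 'gonorrhoeae';

my $chopped = $epithet;
chop($chopped);                  # removes the last character, whatever it is
print "$chopped\n";              # prints "gonorrhoea"

# Hypothetical helper: whole-string, case-insensitive match on the epithet.
sub is_gonorrhoeae {
    my ($name) = @_;
    return $name =~ /^gonorrhoeae$/i ? 1 : 0;
}

print is_gonorrhoeae($epithet), "\n";   # prints 1
print is_gonorrhoeae($chopped), "\n";   # prints 0
```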

Here is a chunk of annotation. I will definitely need the DEFINITION, SOURCE,
and ORGANISM lines, and possibly KEYWORDS, TITLE, and FEATURES.
The query would be on the string "gonorrhoeae":


LOCUS       AB032563                1407 bp    DNA     linear   BCT 23-SEP-2000
DEFINITION  Neisseria gonorrhoeae gene for efflux transporter membrane
            protein AgrA, complete cds.
ACCESSION   AB032563
VERSION     AB032563.1  GI:10280997
KEYWORDS    AgrA.
SOURCE      Neisseria gonorrhoeae (strain:ATCC19424) DNA.
  ORGANISM  Neisseria gonorrhoeae
            Bacteria; Proteobacteria; beta subdivision; Neisseriaceae;
            Neisseria.
REFERENCE   1  (bases 1 to 1407)
  AUTHORS   Murata,T., Gotoh,N., Sakota,E., Otsuki,M. and Nishino,T.
  TITLE     agrA gene involving to aminoglycoside resistance in Neisseria
            gonorrhoeae
  JOURNAL   Published Only in DataBase (2000) In press
REFERENCE   2  (bases 1 to 1407)
  AUTHORS   Murata,T., Gotoh,N., Sakota,E., Otsuki,M. and Nishino,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (20-SEP-1999) Takeshi Murata, Kyoto Phamaceutical
            University, Microbiology; Misasagi Yamashina, Kyoto, Kyoto
            607-8414, Japan (E-mail:murata@mb.kyoto-phu.ac.jp,
            Tel:81-75-595-4642)
FEATURES             Location/Qualifiers
     source          1..1407
                     /organism="Neisseria gonorrhoeae"
                     /strain="ATCC19424"
                     /db_xref="taxon:485"
     gene            1..1407
                     /gene="agrA"
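BioPerl's genbank parser already exposes most of these fields through accessors (for example, desc() holds the DEFINITION text and species() the parsed organism, as the script above uses); the Bio::Seq::RichSeq documentation covers keywords and references. To show how the fields are laid out in the flat file itself, here is a plain-Perl sketch that does not use BioPerl. It assumes the layout seen in the record above: keyword names sit within the first 12 columns, and continuation lines are indented exactly 12 spaces. The helper name parse_header_fields is hypothetical, and the body of the FEATURES table is deliberately ignored.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl). Returns a hash ref mapping each
# header keyword to a list of values, because keywords such as TITLE repeat.
#   - a keyword starts within the first three columns (ORGANISM, TITLE, ...
#     are indented 2 spaces)
#   - a line indented exactly 12 spaces continues the previous keyword
#   - FEATURES table lines (indented 5 or 21 spaces) match neither rule
sub parse_header_fields {
    my ($record) = @_;
    my %fields;
    my $current;                             # ref to the value being extended
    for my $line (split /\n/, $record) {
        last if $line =~ m{^(?:ORIGIN|//)};  # sequence data or end of entry
        if ($line =~ /^ {0,2}([A-Z]+) +(.*)$/) {
            push @{ $fields{$1} }, $2;
            $current = \$fields{$1}[-1];
        }
        elsif (defined $current && $line =~ /^ {12}(\S.*)$/) {
            $$current .= " $1";
        }
    }
    return \%fields;
}

# Demo on part of the record quoted above.
my $record = <<'END';
DEFINITION  Neisseria gonorrhoeae gene for efflux transporter membrane
            protein AgrA, complete cds.
SOURCE      Neisseria gonorrhoeae (strain:ATCC19424) DNA.
  ORGANISM  Neisseria gonorrhoeae
            Bacteria; Proteobacteria; beta subdivision; Neisseriaceae;
            Neisseria.
END

my $f = parse_header_fields($record);
for my $key (qw(DEFINITION SOURCE ORGANISM)) {
    print "$key: $_\n" for @{ $f->{$key} };
}
```

Note that the ORGANISM value includes the taxonomy lines that follow it. A query on the string "gonorrhoeae" can then be a plain pattern match on any of these values.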



Thank you! Thank you! Thank you!