[Bioperl-l] Genbank seq CODE
Jason Stajich
jason@cgt.mc.duke.edu
Mon, 10 Jun 2002 14:27:38 -0400 (EDT)
[Please try not to use all caps in messages; it feels as if one is being
shouted at.]
On Mon, 10 Jun 2002, Melissa L. Kimball wrote:
> I THINK I AM DOING IT RIGHT????? MAYBE SINCE I OPENED AN "FTP" FILE HANDLE,
> DO I HAVE TO SPECIFY "STDOUT" BEFORE I USE write->seq() ?
>
> #!/usr/bin/perl -v
>
> use Bio::SeqIO;
> use Bio::DB::GenBank;
> use Bio::Seq;
> use Bio::DB::NCBIHelper;
> use Bio::Annotation::Collection;
> use diagnostics;
>
> my $ftp = "/usr/bin/ftp";
> my $tmp = "genbankflatfile.txt";
> my $remotefile = "gbcu.flat.gz";
> my $localfile = "gbcu.flat.gz";
> my $host = "ftp.ncbi.nih.gov";
> my $dir = "/genbank/daily";
>
> open(FTP,"| $ftp -n -v $host > $tmp");
>
> print FTP "user anonymous mkimball\@med.unc.edu\n";
> print FTP "cd $dir\n";
> print FTP "binary\n";
> print FTP "get $remotefile $localfile\n";
> print FTP "quit\n";
>
> #close(FTP);
>
> #`gzip -d gbcu.flat.gz`
>
> $genbankfile = Bio::SeqIO->new('-file' => "gbcu.flat", '-format' => 'genbank');
> $fastafile = Bio::SeqIO->new('-file' => "gbcu.fsa", '-format' => 'Fasta');
                                          ^^^^^^^^^^
                                          ">gbcu.fsa"
One correction: this is why you aren't able to write. You haven't told
the program that you want to open a writeable filehandle. The filename
needs a leading ">", as in ">filename.fsa".
>
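The read/write distinction above can be sketched as a small round trip.
This is only an illustration under stated assumptions: the file name
demo.fsa, the id demo1 and the sequence string are made up for the demo
and are not part of the original script. A bare filename opens for
reading; a leading ">" opens for writing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;

# Hypothetical demo values, not taken from the original script.
my $path = "demo.fsa";
my $seq  = Bio::Seq->new(-display_id => 'demo1', -seq => 'ATGCATGC');

# Leading ">" in -file asks SeqIO for a writeable filehandle.
my $out = Bio::SeqIO->new(-file => ">$path", -format => 'fasta');
$out->write_seq($seq);
$out->close();    # flush the output before reading it back

# No prefix in -file: the file is opened for reading.
my $in   = Bio::SeqIO->new(-file => $path, -format => 'fasta');
my $back = $in->next_seq();
print $back->display_id, " ", $back->seq, "\n";
```

In the script quoted here, the equivalent change would be passing
">gbcu.fsa" as the '-file' value for $fastafile.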
> while (my $sequence = $genbankfile->next_seq())
> {
>     my $thespecies = $sequence->species(); //YOUR WAY IS MUCH BETTER!!
>     my $specsci = $thespecies->species();
>
>     chop($specsci);
>
>     if ($specsci =~ /^gonorrhoea\b/i) {
>
>         print "$specsci\n\n";
>
>         $fastafile->write_seq($sequence);
>     }
> }
>
>
> IN THE CONDITION, I CHECK FOR ALL THOSE ENTRIES THAT ARE "gonorrhoea." WHEN
> I ACTUALLY LOOK AT A *.seq FILE IT IS SPELLED "gonorrhoeae." ALL OTHER
> SCIENTIFIC LITERATURE SPELLS IT THIS WAY. STRANGE.
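A note on the spelling question, as a hedged sketch in plain Perl: the
quoted script calls chop(), which removes the last character, so
"gonorrhoeae" becomes "gonorrhoea" before the match. The example assumes
the species string is the epithet as it appears on the ORGANISM line;
is_gonorrhoeae is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the epithet is exactly "gonorrhoeae"
# (case-insensitive), with no chop() needed.
sub is_gonorrhoeae {
    my ($epithet) = @_;
    return $epithet =~ /^gonorrhoeae$/i ? 1 : 0;
}

my $epithet = "gonorrhoeae";

# What the quoted script does: chop() drops the final "e" first.
my $chopped = $epithet;
chop($chopped);    # now "gonorrhoea"
my $old_way = $chopped =~ /^gonorrhoea\b/i ? 1 : 0;

# Without chop(), \b cannot match between "a" and "e",
# so the same pattern fails on the full word.
my $old_no_chop = $epithet =~ /^gonorrhoea\b/i ? 1 : 0;

print "chop + old pattern:   $old_way\n";
print "old pattern, no chop: $old_no_chop\n";
print "full-word pattern:    ", is_gonorrhoeae($epithet), "\n";
```

Matching the full word directly avoids depending on chop(), which would
silently mangle any other species name as well.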
>
> HERE IS A CHUNK OF ANNOTATION. I WILL DEFINITELY NEED THE DEFINITION LINE,
> SOURCE LINE, AND ORGANISM LINE. POSSIBLY KEYWORDS, TITLE, AND FEATURES.
> THE QUERY WOULD BE ON THE STRING "gonorrhoeae":
>
>
> LOCUS       AB032563     1407 bp    DNA     linear   BCT 23-SEP-2000
> DEFINITION  Neisseria gonorrhoeae gene for efflux transporter membrane protein
>             AgrA, complete cds.
> ACCESSION   AB032563
> VERSION     AB032563.1  GI:10280997
> KEYWORDS    AgrA.
> SOURCE      Neisseria gonorrhoeae (strain:ATCC19424) DNA.
>   ORGANISM  Neisseria gonorrhoeae
>             Bacteria; Proteobacteria; beta subdivision; Neisseriaceae;
>             Neisseria.
> REFERENCE   1  (bases 1 to 1407)
>   AUTHORS   Murata,T., Gotoh,N., Sakota,E., Otsuki,M. and Nishino,T.
>   TITLE     agrA gene involving to aminoglycoside resistance in Neisseria
>             gonorrhoeae
>   JOURNAL   Published Only in DataBase (2000) In press
> REFERENCE   2  (bases 1 to 1407)
>   AUTHORS   Murata,T., Gotoh,N., Sakota,E., Otsuki,M. and Nishino,T.
>   TITLE     Direct Submission
>   JOURNAL   Submitted (20-SEP-1999) Takeshi Murata, Kyoto Phamaceutical
>             University, Microbiology; Misasagi Yamashina, Kyoto, Kyoto
>             607-8414, Japan (E-mail:murata@mb.kyoto-phu.ac.jp,
>             Tel:81-75-595-4642)
> FEATURES             Location/Qualifiers
>      source          1..1407
>                      /organism="Neisseria gonorrhoeae"
>                      /strain="ATCC19424"
>                      /db_xref="taxon:485"
>      gene            1..1407
>                      /gene="agrA"
>
>
>
> THANK YOU! THANK YOU! THANK YOU!
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu