[Bioperl-l] Genbank seq CODE
Melissa L. Kimball
mkimball@med.unc.edu
Mon, 10 Jun 2002 14:22:40 -0400
I THINK I AM DOING IT RIGHT? SINCE I OPENED AN "FTP" FILE HANDLE,
MAYBE I HAVE TO SPECIFY "STDOUT" BEFORE I USE write_seq()?
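On the filehandle question: opening a second handle does not change the default output handle; only select() does that. A Bio::SeqIO object writes to the file named in its own -file argument, and that name needs a leading ">" to be opened for output. A minimal core-Perl sketch, with an in-memory buffer standing in for the ftp pipe (the buffer and variable names are illustrative only):

```perl
use strict;
use warnings;

# Handle that print uses by default, before anything else is opened.
my $before = select();

# A second handle; an in-memory buffer stands in for the ftp pipe here.
my $buffer = '';
open(my $other, '>', \$buffer) or die "cannot open in-memory handle: $!";
print {$other} "cd /genbank/daily\n";   # goes to $other only
close($other);

# Opening (and printing to) $other did not change the default handle.
my $after = select();
print "default output handle: $after\n";
```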
#!/usr/bin/perl -w
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::Seq;
use Bio::DB::NCBIHelper;
use Bio::Annotation::Collection;
use diagnostics;
my $ftp = "/usr/bin/ftp";
my $tmp = "genbankflatfile.txt";
my $remotefile = "gbcu.flat.gz";
my $localfile = "gbcu.flat.gz";
my $host = "ftp.ncbi.nih.gov";
my $dir = "/genbank/daily";
open(FTP, "| $ftp -n -v $host > $tmp") or die "Cannot start $ftp: $!";
print FTP "user anonymous mkimball\@med.unc.edu\n";
print FTP "cd $dir\n";
print FTP "binary\n";
print FTP "get $remotefile $localfile\n";
print FTP "quit\n";
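# Close the pipe so ftp finishes the transfer before the file is read.
close(FTP) or warn "ftp did not exit cleanly: $! $?";
# Decompress the download to produce gbcu.flat.
system("gzip", "-d", "-f", $localfile) == 0 or die "gzip failed: $?";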
my $genbankfile = Bio::SeqIO->new('-file' => "gbcu.flat", '-format' => 'genbank');
# The leading ">" opens the FASTA file for writing; without it SeqIO opens it for reading.
my $fastafile = Bio::SeqIO->new('-file' => ">gbcu.fsa", '-format' => 'Fasta');
while (my $sequence = $genbankfile->next_seq()) {
    my $thespecies = $sequence->species();    # YOUR WAY IS MUCH BETTER!!
    my $specsci = $thespecies->species();
    # Note: chop() removes the last character unconditionally,
    # so a value of "gonorrhoeae" would become "gonorrhoea" here.
    chop($specsci);
    if ($specsci =~ /^gonorrhoea\b/i) {
        print "$specsci\n\n";
        $fastafile->write_seq($sequence);
    }
}
IN THE CONDITION, I CHECK FOR ALL THOSE ENTRIES THAT ARE "gonorrhoea". WHEN
I ACTUALLY LOOK AT A *.seq FILE, IT IS SPELLED "gonorrhoeae", AND ALL OTHER
SCIENTIFIC LITERATURE SPELLS IT "gonorrhoeae" TOO. STRANGE.
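One possible explanation for the spelling difference is the chop() call in the script: chop() always removes the last character, whereas chomp() removes only a trailing newline. A small sketch, assuming species() returns "gonorrhoeae" (that value is an assumption, not something confirmed in this post):

```perl
use strict;
use warnings;

my $name = "gonorrhoeae";    # assumed value returned by species()

my $chopped = $name;
chop($chopped);              # removes the last character, whatever it is

my $chomped = "$name\n";
chomp($chomped);             # removes only a trailing newline

# The pattern from the script matches the chopped form only,
# because \b needs a word boundary right after "gonorrhoea".
my $chopped_matches = ($chopped =~ /^gonorrhoea\b/i) ? 1 : 0;
my $full_matches    = ($name    =~ /^gonorrhoea\b/i) ? 1 : 0;

print "chop:  $chopped (matches: $chopped_matches)\n";
print "chomp: $chomped (full name matches: $full_matches)\n";
```

If that assumption holds, dropping chop() and matching on /^gonorrhoeae\b/i would keep the spelling used in the flat file.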
HERE IS A CHUNK OF ANNOTATION. I WILL DEFINITELY NEED THE DEFINITION LINE,
SOURCE LINE, AND ORGANISM LINE. POSSIBLY KEYWORDS, TITLE, AND FEATURES.
THE QUERY WOULD BE ON THE STRING "gonorrhoeae":
LOCUS       AB032563     1407 bp    DNA             linear   BCT 23-SEP-2000
DEFINITION  Neisseria gonorrhoeae gene for efflux transporter membrane protein
            AgrA, complete cds.
ACCESSION   AB032563
VERSION     AB032563.1  GI:10280997
KEYWORDS    AgrA.
SOURCE      Neisseria gonorrhoeae (strain:ATCC19424) DNA.
  ORGANISM  Neisseria gonorrhoeae
            Bacteria; Proteobacteria; beta subdivision; Neisseriaceae;
            Neisseria.
REFERENCE   1  (bases 1 to 1407)
  AUTHORS   Murata,T., Gotoh,N., Sakota,E., Otsuki,M. and Nishino,T.
  TITLE     agrA gene involving to aminoglycoside resistance in Neisseria
            gonorrhoeae
  JOURNAL   Published Only in DataBase (2000) In press
REFERENCE   2  (bases 1 to 1407)
  AUTHORS   Murata,T., Gotoh,N., Sakota,E., Otsuki,M. and Nishino,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (20-SEP-1999) Takeshi Murata, Kyoto Phamaceutical
            University, Microbiology; Misasagi Yamashina, Kyoto, Kyoto
            607-8414, Japan (E-mail:murata@mb.kyoto-phu.ac.jp,
            Tel:81-75-595-4642)
FEATURES             Location/Qualifiers
     source          1..1407
                     /organism="Neisseria gonorrhoeae"
                     /strain="ATCC19424"
                     /db_xref="taxon:485"
     gene            1..1407
                     /gene="agrA"
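To illustrate the fields listed above, here is a rough plain-Perl sketch that pulls DEFINITION, SOURCE, ORGANISM, KEYWORDS and TITLE out of a record and applies the "gonorrhoeae" query. It is a stand-in for illustration, not BioPerl's parser. The helper name parse_header is hypothetical, and the sketch assumes the usual flat-file layout (keywords in the first columns, continuation lines indented), which mail wrapping can disturb.

```perl
use strict;
use warnings;

# Abbreviated copy of the record from the post, in GenBank flat-file layout.
my $record = <<'END';
LOCUS       AB032563     1407 bp    DNA             linear   BCT 23-SEP-2000
DEFINITION  Neisseria gonorrhoeae gene for efflux transporter membrane protein
            AgrA, complete cds.
ACCESSION   AB032563
KEYWORDS    AgrA.
SOURCE      Neisseria gonorrhoeae (strain:ATCC19424) DNA.
  ORGANISM  Neisseria gonorrhoeae
            Bacteria; Proteobacteria; beta subdivision; Neisseriaceae;
            Neisseria.
REFERENCE   1  (bases 1 to 1407)
  TITLE     agrA gene involving to aminoglycoside resistance in Neisseria
            gonorrhoeae
FEATURES             Location/Qualifiers
END

# Hypothetical helper: collect header fields up to the FEATURES table.
# Each keyword maps to a list, because REFERENCE and TITLE can repeat.
sub parse_header {
    my ($text) = @_;
    my (%field, $current);
    for my $line (split /\n/, $text) {
        last if $line =~ /^FEATURES/;
        if ($line =~ /^\s{0,3}([A-Z]+)\s+(.*?)\s*$/) {
            # A keyword in the first columns starts a new field.
            $current = $1;
            push @{ $field{$current} }, $2;
        }
        elsif (defined $current && $line =~ /^\s{4,}(\S.*?)\s*$/) {
            # Deeper indentation continues the previous field
            # (for ORGANISM this appends the lineage to the name).
            $field{$current}[-1] .= " $1";
        }
    }
    return \%field;
}

my $field = parse_header($record);

# The query from the post: keep the record only if ORGANISM mentions the species.
my $is_hit = grep { /\bgonorrhoeae\b/i } @{ $field->{ORGANISM} || [] };
if ($is_hit) {
    for my $key (qw(DEFINITION SOURCE ORGANISM KEYWORDS TITLE)) {
        print "$key: $_\n" for @{ $field->{$key} || [] };
    }
}
```

The FEATURES table is skipped here; its qualifiers (/organism, /strain, /db_xref) would need a separate pass, or the feature objects BioPerl builds from the record.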
THANK YOU! THANK YOU! THANK YOU!