[Bioperl-l] Genbank seq CODE
Peter Kos
kos@rite.or.jp
Tue, 11 Jun 2002 21:46:04 +0900
Hi Melissa,
Strange message.
I agree with Jason on the matter of the capital letters.
So what is the question? It is tough to troubleshoot if there is no
trouble.
Does this work or not? Possibly not, but why don't you just try it and
read the error messages?
STDOUT should not have anything to do with write_seq() if you want to
write to the file gbcu.fsa.
However, you may want to insert a > sign in front of the output file
name, so that the file is opened for writing, like this:
$fastafile = Bio::SeqIO->new('-file' => ">gbcu.fsa", '-format' => 'Fasta');
You can wrap the gzip -d ... command in system().
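
A minimal sketch of what calling gzip through system() could look like. The helper name run_command is made up for illustration, the file name is the one from the quoted script, and the download is assumed to have finished already (for example by closing the FTP pipe first):

```perl
use strict;
use warnings;

# Run an external command in list form (with more than one
# argument no shell is involved) and return 1 on success,
# 0 on failure. run_command is a hypothetical helper name.
sub run_command {
    my @cmd = @_;
    my $status = system(@cmd);

    if ($status == -1) {
        # The command could not be started at all.
        warn "Could not start '@cmd': $!\n";
        return 0;
    }
    if ($status != 0) {
        # Non-zero wait status: the command ran but did not succeed.
        warn "'@cmd' failed (wait status $status)\n";
        return 0;
    }
    return 1;
}

# File name taken from the quoted script; only attempt the
# decompression if the download actually produced the file.
my $archive = 'gbcu.flat.gz';
if (-e $archive) {
    run_command('gzip', '-d', $archive)
        or die "Could not decompress $archive\n";
}
```

Checking the return value means the script stops before Bio::SeqIO tries to open a gbcu.flat that was never created.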
If you have doubts about the spelling of gonorrhoeae, why insist on
being strict about the ending? There is unlikely to be anything called
gonorrhoeaplix or similar, so you may as well just search for
gonorrhoea. Moreover, you may not need to chop off the last character
of gonorrhoeae at all, in which case you can search for the full word.
Or use a regex. There can even be misspellings sometimes.
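
To illustrate the point about the ending, here is a small sketch comparing the pattern from the quoted script with a looser prefix match. The two helper names are hypothetical, and the species strings are typed in by hand rather than read from a GenBank file:

```perl
use strict;
use warnings;

# Pattern from the quoted script: the \b requires a word boundary
# right after "gonorrhoea", so the full name "gonorrhoeae" only
# matches once its last character has been chopped off.
sub matches_strict {
    my ($name) = @_;
    return $name =~ /^gonorrhoea\b/i ? 1 : 0;
}

# Looser alternative: accept anything that starts with "gonorrhoea",
# so no chop() is needed and the ending does not matter.
sub matches_prefix {
    my ($name) = @_;
    return $name =~ /^gonorrhoea/i ? 1 : 0;
}

for my $name ('gonorrhoeae', 'gonorrhoea', 'meningitidis') {
    printf "%-13s strict=%d prefix=%d\n",
        $name, matches_strict($name), matches_prefix($name);
}
```

Neither pattern catches misspellings in the stem; that would need a more forgiving regex.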
I hope it helps (if you needed help at all).
Peter
On Tuesday, June 11, 2002 3:23 AM, Melissa L. Kimball
[SMTP:mkimball@med.unc.edu] wrote:
> I THINK I AM DOING IT RIGHT????? MAYBE SINCE I OPENED AN "FTP" FILE HANDLE,
> DO I HAVE TO SPECIFY "STDOUT" BEFORE I USE write->seq() ?
>
> #!/usr/bin/perl -v
>
> use Bio::SeqIO;
> use Bio::DB::GenBank;
> use Bio::Seq;
> use Bio::DB::NCBIHelper;
> use Bio::Annotation::Collection;
> use diagnostics;
>
> my $ftp = "/usr/bin/ftp";
> my $tmp = "genbankflatfile.txt";
> my $remotefile = "gbcu.flat.gz";
> my $localfile = "gbcu.flat.gz";
> my $host = "ftp.ncbi.nih.gov";
> my $dir = "/genbank/daily";
>
> open(FTP,"| $ftp -n -v $host > $tmp");
>
> print FTP "user anonymous mkimball\@med.unc.edu\n";
> print FTP "cd $dir\n";
> print FTP "binary\n";
> print FTP "get $remotefile $localfile\n";
> print FTP "quit\n";
>
> #close(FTP);
>
> #`gzip -d gbcu.flat.gz`
>
> $genbankfile = Bio::SeqIO->new('-file' => "gbcu.flat", '-format' => 'genbank');
> $fastafile = Bio::SeqIO->new('-file' => "gbcu.fsa", '-format' => 'Fasta');
>
> while (my $sequence = $genbankfile->next_seq())
> {
>     my $thespecies = $sequence->species(); //YOUR WAY IS MUCH BETTER!!
>     my $specsci = $thespecies->species();
>
>     chop($specsci);
>
>     if ($specsci =~ /^gonorrhoea\b/i) {
>
>         print "$specsci\n\n";
>
>         $fastafile->write_seq($sequence);
>     }
> }
>
>
> IN THE CONDITION, I CHECK FOR ALL THOSE ENTRIES THAT ARE "gonorrhoea." WHEN
> I ACTUALLY LOOK AT A *.seq FILE IT IS SPELLED "gonorrhoeae." ALL OTHER
> SCIENTIFIC LITERATURE SPELLS IT THIS WAY. STRANGE.
>
> HERE IS A CHUNK OF ANNOTATION. I WILL DEFINITELY NEED THE DEFINITION LINE,
> SOURCE LINE, AND ORGANISM LINE. POSSIBLY KEYWORDS, TITLE, AND FEATURES.
> THE QUERY WOULD BE ON THE STRING "gonorrhoeae":
>
>
> LOCUS       AB032563     1407 bp    DNA     linear   BCT 23-SEP-2000
> DEFINITION  Neisseria gonorrhoeae gene for efflux transporter membrane
>             protein AgrA, complete cds.
> ACCESSION   AB032563
> VERSION     AB032563.1  GI:10280997
> KEYWORDS    AgrA.
> SOURCE      Neisseria gonorrhoeae (strain:ATCC19424) DNA.
>   ORGANISM  Neisseria gonorrhoeae
>             Bacteria; Proteobacteria; beta subdivision; Neisseriaceae;
>             Neisseria.
> REFERENCE   1  (bases 1 to 1407)
>   AUTHORS   Murata,T., Gotoh,N., Sakota,E., Otsuki,M. and Nishino,T.
>   TITLE     agrA gene involving to aminoglycoside resistance in Neisseria
>             gonorrhoeae
>   JOURNAL   Published Only in DataBase (2000) In press
> REFERENCE   2  (bases 1 to 1407)
>   AUTHORS   Murata,T., Gotoh,N., Sakota,E., Otsuki,M. and Nishino,T.
>   TITLE     Direct Submission
>   JOURNAL   Submitted (20-SEP-1999) Takeshi Murata, Kyoto Phamaceutical
>             University, Microbiology; Misasagi Yamashina, Kyoto, Kyoto
>             607-8414, Japan (E-mail:murata@mb.kyoto-phu.ac.jp,
>             Tel:81-75-595-4642)
> FEATURES             Location/Qualifiers
>      source          1..1407
>                      /organism="Neisseria gonorrhoeae"
>                      /strain="ATCC19424"
>                      /db_xref="taxon:485"
>      gene            1..1407
>                      /gene="agrA"
>
>
> THANK YOU! THANK YOU! THANK YOU!
>