[Bioperl-l] problem with swissprot parsin

Siddhartha Basu basu at pharm.sunysb.edu
Fri Oct 15 11:34:13 EDT 2004


Hi Brian,
OK, I have installed bioperl-live. However, it now produces another set
of warnings.
I used the same code as before.

#!/usr/bin/perl -w
#
use strict;
use Bio::DB::Flat;
#
die "no files\n" unless @ARGV;
my $LOCATION = ".";
#
my $db = Bio::DB::Flat->new( -directory => $LOCATION,
                     -dbname => "swissall",
                     -format => "swiss",
                     -index => "bdb",
                     -write_flag => 1,
                   ) or die "can't create BioFlat indexes\n";

$db->build_index(@ARGV);
#print "Done indexing\n";

my $seq = $db->get_Seq_by_acc("Q09543");
print $seq->seq,"\n";
exit;

I have also attached the test file.
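
To see which entries in a test file lack RA lines (the suspected trigger mentioned in the quoted message below), here is a minimal sketch that does not depend on BioPerl. It only scans the flat file for reference blocks (RN ... ) without an RA line. The function name refs_without_ra and the script name check_ra.pl are made up for illustration; this is not how swiss.pm parses references.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a list of "ENTRY_ID [ref number]" strings, one for every
# reference block (started by an RN line) that contains no RA line.
# Hypothetical helper for illustration; not part of BioPerl.
sub refs_without_ra {
    my ($fh) = @_;
    my $id = '?';               # ID of the entry currently being read
    my ($rn, $has_ra, @missing);

    # Close the reference block currently being read, if any.
    my $flush = sub {
        push @missing, "$id [$rn]" if defined $rn && !$has_ra;
        undef $rn;
    };

    while (my $line = <$fh>) {
        if ($line =~ /^ID\s+(\S+)/) {            # new entry
            $flush->();
            $id = $1;
        }
        elsif ($line =~ /^RN\s+\[(\d+)\]/) {     # new reference block
            $flush->();
            ($rn, $has_ra) = ($1, 0);
        }
        elsif ($line =~ /^RA\s/) {               # author line present
            $has_ra = 1;
        }
        elsif ($line =~ m{^//}) {                # end of entry
            $flush->();
        }
    }
    $flush->();
    return @missing;
}

# Usage: perl check_ra.pl swiss.test
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "can't open $ARGV[0]: $!\n";
    print "$_\n" for refs_without_ra($fh);
    close $fh;
}
```

On the attached file this should list reference [1] of AATC_CAEEL and of 2AAA_CAEEL, both of which carry an RG line instead of an RA line.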


-siddhartha


Brian Osborne wrote:
> Siddhartha,
> 
> bioperl-live, the latest. Instructions on how to download this are at
> http://cvs.open-bio.org/.
> 
> Brian O.
> 
> -----Original Message-----
> From: Siddhartha Basu [mailto:basu at pharm.sunysb.edu]
> Sent: Friday, October 15, 2004 9:55 AM
> To: Brian Osborne
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] problem with swissprot parsin
> 
> Hi Brian,
> I retested it. The good part is that I can now fetch the seq object and
> the sequence. The warnings are still there. I will try to index the
> entire swissprot data file and see what happens.
> 
> As mentioned in an earlier mail on the list, this has to do with the
> absence of RA records in recent swissprot files: the swiss.pm module
> emits warnings when the variable $au gets no data. Since you are not
> even getting the warnings, which version of bioperl are you running?
> 
> -siddhartha
> 
> 
> 
> 
> 
> Brian Osborne wrote:
> 
>>Siddhartha,
>>
>>Changing @files to @ARGV makes your script index without warnings on my
>>machine, using your Swissprot file or mine. It also retrieves a sequence.
>>Below...
>>
>>Brian O.
>>
>>
>>#!/usr/bin/perl -w
>>
>>use strict;
>>use Bio::DB::Flat;
>>
>>die "no files\n" unless @ARGV;
>>my $LOCATION = ".";
>>
>>my $db = Bio::DB::Flat->new( -directory => $LOCATION,
>>                             -dbname => "swissall",
>>                             -format => "swiss",
>>                             -index => "bdb",
>>                             -write_flag => 1,
>>                           ) or die "can't create BioFlat indexes\n";
>>
>>$db->build_index(@ARGV);
>>print "Done indexing\n";
>>
>>my $seq = $db->get_Seq_by_acc("P41932");
>>print $seq->seq;
>>
>>
>>
>>-----Original Message-----
>>From: bioperl-l-bounces at portal.open-bio.org
>>[mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Siddhartha Basu
>>Sent: Thursday, October 14, 2004 4:54 PM
>>To: Brian Osborne
>>Cc: bioperl-l at bioperl.org
>>Subject: Re: [Bioperl-l] problem with swissprot parsin
>>
>>Hi Brian,
>>Changed it, problem persists.
>>
>>-siddhartha
>>
>>Brian Osborne wrote:
>>
>>
>>>Siddhartha,
>>>
>>>Change @files to @ARGV in the build_index line. Does that fix it?
>>>
>>>Brian O.
>>>
>>>-----Original Message-----
>>>From: Siddhartha Basu [mailto:basu at pharm.sunysb.edu]
>>>Sent: Thursday, October 14, 2004 4:15 PM
>>>To: Brian Osborne
>>>Cc: bioperl-l at bioperl.org
>>>Subject: Re: [Bioperl-l] problem with swissprot parsin
>>>
>>>Hi Brian,
>>>Here is the code that started to give the following error. I presume I
>>>am using Bio::DB::Flat::BDB, though I haven't called it directly. I am
>>>trying to index swissprot/trembl files here.
>>>
>>>#!/usr/bin/perl -w
>>>use strict;
>>>use Bio::DB::Flat;
>>>
>>>die "no files\n" unless @ARGV;
>>>my $LOCATION = "/home/basu/odbaindex";
>>>
>>>my $db = Bio::DB::Flat->new( -directory => $LOCATION,
>>>                                -dbname => "swissall",
>>>                                -format => "swiss",
>>>                                -index => "bdb",
>>>                                -write_flag => 1,
>>>                             ) or die "can't create BioFlat indexes\n";
>>>$db->build_index(@files);
>>>print "Done indexing\n";
>>>
>>>exit;
>>>
>>>
>>>I get the following warnings.
>>> ======================================================================
>>>Use of uninitialized value in substitution (s///) at
>>>/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
>>> 18676877.
>>> Use of uninitialized value in substitution (s///) at
>>> /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
>>> 18676916.
>>> Use of uninitialized value in substitution (s///) at
>>> /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
>>> 18676956.
>>> Use of uninitialized value in substitution (s///) at
>>> /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
>>> 18677002.
>>> Use of uninitialized value in substitution (s///) at
>>> /usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
>>>=========================================================================
>>>
>>>I have done a small test with the Bio::SeqIO module using a small test
>>>file (swiss.test). Here is the code.
>>>
>>>#!/usr/bin/perl -w
>>>#
>>>use strict;
>>>use Bio::SeqIO;
>>>
>>>my $seq = Bio::SeqIO->new(-file => $ARGV[0], -format => "swiss");
>>>
>>>while (my $in = $seq->next_seq) {
>>>   print $in->id,"\n";
>>>}
>>>
>>>exit;
>>>
>>>
>>>It gives the same error
>>>Use of uninitialized value in substitution (s///) at
>>>/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN0> line 28.
>>>1433_CAEEL
>>>Use of uninitialized value in substitution (s///) at
>>>/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN0> line 87.
>>>A4_CAEEL
>>>Use of uninitialized value in substitution (s///) at
>>>/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN0> line 171.
>>>AATC_CAEEL
>>>
>>>I have also attached the test file.
>>>
>>>Hope this will give some clue for the problem.
>>>Thanks for the response.
>>>
>>>siddhartha
>>>
-------------- next part --------------
ID   AATC_CAEEL     STANDARD;      PRT;   408 AA.
AC   Q22067;
DT   01-NOV-1997 (Rel. 35, Created)
DT   01-NOV-1997 (Rel. 35, Last sequence update)
DT   01-OCT-2004 (Rel. 45, Last annotation update)
DE   Probable aspartate aminotransferase, cytoplasmic (EC 2.6.1.1)
DE   (Transaminase A) (Glutamate oxaloacetate transaminase-1).
GN   ORFNames=T01C8.5;
OS   Caenorhabditis elegans.
OC   Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida; Rhabditoidea;
OC   Rhabditidae; Peloderinae; Caenorhabditis.
OX   NCBI_TaxID=6239;
RN   [1]
RP   SEQUENCE FROM N.A.
RC   STRAIN=Bristol N2;
RX   MEDLINE=99069613; PubMed=9851916;
RG   THE C. ELEGANS SEQUENCING CONSORTIUM;
RT   "Genome sequence of the nematode C. elegans: a platform for
RT   investigating biology.";
RL   Science 282:2012-2018(1998).
CC   -!- CATALYTIC ACTIVITY: L-aspartate + 2-oxoglutarate = oxaloacetate +
CC       L-glutamate.
CC   -!- COFACTOR: Pyridoxal phosphate (By similarity).
CC   -!- SUBUNIT: Homodimer (By similarity).
CC   -!- SUBCELLULAR LOCATION: Cytoplasmic (Potential).
CC   -!- MISCELLANEOUS: In eukaryotes there are cytoplasmic, mitochondrial
CC       and chloroplastic isozymes.
CC   -!- SIMILARITY: Belongs to the class-I pyridoxal-phosphate-dependent
CC       aminotransferase family.
CC   --------------------------------------------------------------------------
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between  the Swiss Institute of Bioinformatics  and the  EMBL outstation -
CC   the European Bioinformatics Institute.  There are no  restrictions on  its
CC   use  by  non-profit  institutions as long  as its content  is  in  no  way
CC   modified and this statement is not removed.  Usage  by  and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license at isb-sib.ch).
CC   --------------------------------------------------------------------------
DR   EMBL; U58726; AAB00578.1; -.
DR   PIR; T29857; T29857.
DR   HSSP; P00503; 1AJS.
DR   WormPep; T01C8.5; CE07462.
DR   InterPro; IPR004839; Aminotrans_I/II.
DR   InterPro; IPR000796; Asp_trans.
DR   InterPro; IPR004838; NHtransf_1_BS.
DR   Pfam; PF00155; Aminotran_1_2; 1.
DR   PRINTS; PR00799; TRANSAMINASE.
DR   PROSITE; PS00105; AA_TRANSFER_CLASS_1; 1.
KW   Aminotransferase; Pyridoxal phosphate; Transferase.
FT   BINDING     251    251       Pyridoxal phosphate (By similarity).
SQ   SEQUENCE   408 AA;  45493 MW;  A4DDCBCB8C0EFD83 CRC64;
     MSFFDGIPVA PPIEVFHKNK MYLDETAPVK VNLTIGAYRT EEGQPWVLPV VHETEVEIAN
     DTSLNHEYLP VLGHEGFRKA ATELVLGAES PAIKEERSFG VQCLSGTGAL RAGAEFLASV
     CNMKTVYVSN PTWGNHKLVF KKAGFTTVAD YTFWDYDNKR VHIEKFLSDL ESAPEKSVII
     LHGCAHNPTG MDPTQEQWKL VAEVIKRKNL FTFFDIAYQG FASGDPAADA WAIRYFVDQG
     MEMVVSQSFA KNFGLYNERV GNLTVVVNNP AVIAGFQSQM SLVIRANWSN PPAHGARIVH
     KVLTTPARRE QWNQSIQAMS SRIKQMRAAL LRHLMDLGTP GTWDHIIQQI GMFSYTGLTS
     AQVDHLIANH KVFLLRDGRI NICGLNTKNV EYVAKAIDET VRAVKSNI
//
ID   2AAA_CAEEL     STANDARD;      PRT;   590 AA.
AC   Q09543;
DT   01-NOV-1997 (Rel. 35, Created)
DT   10-OCT-2003 (Rel. 42, Last sequence update)
DT   01-OCT-2004 (Rel. 45, Last annotation update)
DE   Probable protein phosphatase PP2A regulatory subunit (Protein
DE   phosphatase PP2A regulatory subunit A).
GN   ORFNames=F48E8.5;
OS   Caenorhabditis elegans.
OC   Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida; Rhabditoidea;
OC   Rhabditidae; Peloderinae; Caenorhabditis.
OX   NCBI_TaxID=6239;
RN   [1]
RP   SEQUENCE FROM N.A.
RC   STRAIN=Bristol N2;
RX   MEDLINE=99069613; PubMed=9851916;
RG   THE C. ELEGANS SEQUENCING CONSORTIUM;
RT   "Genome sequence of the nematode C. elegans: a platform for
RT   investigating biology.";
RL   Science 282:2012-2018(1998).
RN   [2]
RP   REVISIONS.
RA   Waterston R.;
RL   Submitted (JUN-2002) to the EMBL/GenBank/DDBJ databases.
CC   -!- FUNCTION: The PR65 subunit of protein phosphatase 2A serves as a
CC       scaffolding molecule to coordinate the assembly of the catalytic
CC       subunit and a variable regulatory B subunit (By similarity).
CC   -!- SUBUNIT: PP2A exists in several trimeric forms, all of which
CC       consist of a core composed of a catalytic subunit associated with
CC       a 65 kDa regulatory subunit (PR65) (subunit A). The core complex
CC       associates with a third, variable subunit (subunit B), which
CC       confers distinct properties to the holoenzyme (By similarity).
CC   -!- DOMAIN: Each HEAT repeat appears to consist of two alpha helices
CC       joined by a hydrophilic region, the intrarepeat loop. The repeat
CC       units may be arranged laterally to form a rod-like structure.
CC   -!- SIMILARITY: Belongs to the phosphatase 2A regulatory subunit A
CC       family.
CC   -!- SIMILARITY: Contains 14 HEAT repeats.
CC   --------------------------------------------------------------------------
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between  the Swiss Institute of Bioinformatics  and the  EMBL outstation -
CC   the European Bioinformatics Institute.  There are no  restrictions on  its
CC   use  by  non-profit  institutions as long  as its content  is  in  no  way
CC   modified and this statement is not removed.  Usage  by  and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license at isb-sib.ch).
CC   --------------------------------------------------------------------------
DR   EMBL; U23514; AAC46541.2; -.
DR   PIR; T16411; T16411.
DR   HSSP; P30153; 1B3U.
DR   WormPep; F48E8.5; CE30997.
DR   InterPro; IPR008938; ARM.
DR   InterPro; IPR000357; HEAT.
DR   Pfam; PF02985; HEAT; 12.
DR   PROSITE; PS50077; HEAT_REPEAT; 8.
KW   Hypothetical protein; Protein phosphatase; Repeat.
FT   REPEAT       37     73       HEAT 1.
FT   REPEAT       74    111       HEAT 2.
FT   REPEAT      113    150       HEAT 3.
FT   REPEAT      151    188       HEAT 4.
FT   REPEAT      189    227       HEAT 5.
FT   REPEAT      228    266       HEAT 6.
FT   REPEAT      267    305       HEAT 7.
FT   REPEAT      306    344       HEAT 8.
FT   REPEAT      349    387       HEAT 9.
FT   REPEAT      388    426       HEAT 10.
FT   REPEAT      427    465       HEAT 11.
FT   REPEAT      466    504       HEAT 12.
FT   REPEAT      505    543       HEAT 13.
FT   REPEAT      544    582       HEAT 14.
FT   DOMAIN      186    189       Poly-Ala.
SQ   SEQUENCE   590 AA;  66148 MW;  E9B6F7DFFEB973E2 CRC64;
     MSVVEEATDD ALYPIAVLID ELRNEDVTLR LNSIRKLSTI ALALGVERTR NELIQFLTDT
     IYDEDEVLLV LAEQLGNFTP LVGGPDHVHC LLLPLENLAT VEETVVRDKA VESLRKIADK
     HSSASLEEHF VPMLRRLATG DWFTSRTSAC GLFSVVYPRV SPAIKSELKS MFRTLCRDDT
     PMVRRAAAAK LGEFAKVFEK TAVIEGLHSS LTDLHVDEQD SVRLLTVESA IAFGTLLDKA
     NKKKLIEPIL IELFDDKSWR VRYMVAEKLI EIQNVLGEDM DTTHLVNMYT NLLKDPEGEV
     RCAATQRLQE FALNLPEDKR QNIICNSLLN VAKELVTDGN QLVKSELAGV IMGLAPLIGK
     EQTVSELLPI YMQLLNDQTP EVRLNIISSL DKVNEVIGAA QLSTSLLPAI VGLAEDGKWR
     VRLAIVQFMP LLASQLGQEF FDEKLLPLCL NWLTDHVFSI REASTLIMKE LTQKFGGQWA
     STNIVPKMQK LQKDTNYLQR MTCLFCLNTL SEAMTQEQIL KEIMPIVKDL VEDDVPNVRF
     NAAKSLKRIG KNLTPSTLTS EVKPLLEKLG KDSDFDVRYF SEEAKNSLGL
//
ID   Q6ITX2      PRELIMINARY;      PRT;   207 AA.
AC   Q6ITX2;
DT   05-JUL-2004 (TrEMBLrel. 27, Created)
DT   05-JUL-2004 (TrEMBLrel. 27, Last sequence update)
DT   05-JUL-2004 (TrEMBLrel. 27, Last annotation update)
DE   Methyl-coenzyme M reductase alpha subunit (Fragment).
GN   Name=mcrA;
OS   uncultured Methanomicrobiales archaeon.
OC   Archaea; Euryarchaeota; Methanomicrobia; Methanomicrobiales;
OC   environmental samples.
OX   NCBI_TaxID=183760;
RN   [1]
RP   SEQUENCE FROM N.A.
RA   Banning N., Brock F., Parkes R.J., Fry J.C., Weightman A.J.;
RL   Submitted (MAY-2004) to the EMBL/GenBank/DDBJ databases.
DR   EMBL; AY625595; AAT45717.1; -.
DR   InterPro; IPR008924; MCR_alpha_beta_C.
DR   InterPro; IPR009047; MCR_alpha_C.
DR   InterPro; IPR009024; MCR_fer_like.
DR   Pfam; PF02249; MCR_alpha; 1.
FT   NON_TER       1      1
FT   NON_TER     207    207
SQ   SEQUENCE   207 AA;  22849 MW;  5309DD59248C9038 CRC64;
     WHSLAKHAGV IQMGDILPAR RARGPNEPGG IKFGHFADMV QTDRKYPNDP ARASLEVVGA
     GTMLFDQIWL GSYMSGGVGF TQYATAAYTD NILDDYTYYG MDYIKSKYKV NWQSPSEKDK
     VKATQDVVND IATEVNLYGM EQYEQYPTAL EDHFGGSQRA SVLAAASGLS VSIATGNSNA
     GLNGWYLSML MHKEGWSRLG FFGYDLQ
//
ID   Q6L781      PRELIMINARY;      PRT;   216 AA.
AC   Q6L781;
DT   05-JUL-2004 (TrEMBLrel. 27, Created)
DT   05-JUL-2004 (TrEMBLrel. 27, Last sequence update)
DT   05-JUL-2004 (TrEMBLrel. 27, Last annotation update)
DE   Methyl-coenzyme M reductase alpha subunit (Fragment).
GN   Name=mcrA;
OS   uncultured Methanomicrobiales archaeon.
OC   Archaea; Euryarchaeota; Methanomicrobia; Methanomicrobiales;
OC   environmental samples.
OX   NCBI_TaxID=183760;
RN   [1]
RP   SEQUENCE FROM N.A.
RX   PubMed=15240282;
RA   Shigematsu T., Tang Y., Kobayashi T., Kawaguchi H., Morimura S.,
RA   Kida K.;
RT   "Effect of Dilution Rate on Metabolic Pathway Shift between
RT   Aceticlastic and Nonaceticlastic Methanogenesis in Chemostat
RT   Cultivation.";
RL   Appl. Environ. Microbiol. 70:4048-4052(2004).
DR   EMBL; AB158527; BAD21106.1; -.
DR   InterPro; IPR008924; MCR_alpha_beta_C.
DR   InterPro; IPR009047; MCR_alpha_C.
DR   InterPro; IPR009024; MCR_fer_like.
DR   Pfam; PF02249; MCR_alpha; 1.
FT   NON_TER       1      1
FT   NON_TER     216    216
SQ   SEQUENCE   216 AA;  23541 MW;  5265242473ED9CFE CRC64;
     FIGAYRMCAG EAAVADLAFA AKHAGVVQMA THLPARRARG PNEPGGLAFG LFSDIIQGNR
     KYPHDPAKAS FEVVGAGTML YDQIWLGSYM SGGVGFTQYA TAAYTDNILD EFTYYGMDYI
     KDKYKVDWKN PSPKDRVKPT QEIVNDIITE VALNAMEQYE QYPTMMEDHF GGSQRAGVIA
     AACGLSTSIA TGNSNAGLNG WYLSMLLHKE GWSRLG
//
ID   Q6SEI7      PRELIMINARY;      PRT;   162 AA.
AC   Q6SEI7;
DT   05-JUL-2004 (TrEMBLrel. 27, Created)
DT   05-JUL-2004 (TrEMBLrel. 27, Last sequence update)
DT   05-JUL-2004 (TrEMBLrel. 27, Last annotation update)
DE   Methyl-coenzyme M reductase alpha subunit (Fragment).
GN   Name=mcrA;
OS   uncultured euryarchaeote.
OC   Archaea; Euryarchaeota; environmental samples.
OX   NCBI_TaxID=114243;
RN   [1]
RP   SEQUENCE FROM N.A.
RA   Castro H., Reddy K.R., Ogram A.;
RL   Submitted (NOV-2003) to the EMBL/GenBank/DDBJ databases.
DR   EMBL; AY459319; AAR24561.1; -.
DR   InterPro; IPR008924; MCR_alpha_beta_C.
DR   InterPro; IPR009047; MCR_alpha_C.
DR   Pfam; PF02249; MCR_alpha; 1.
FT   NON_TER       1      1
FT   NON_TER     162    162
SQ   SEQUENCE   162 AA;  18115 MW;  3FF9CD3C09CBC441 CRC64;
     WCRFTQYATA AYTDNILDEY TYYGMDYIKD KYKVDWKNPN DKDKVKPTQD IANDMATEVA
     LNGMEQYEQF PTLMEDHFGG SQRAGVLAAA CGLTASIATG NSNAGLNAWY LCMLLHKEGW
     SRLGFFGYDL QDQCGSANSL AIRPDEGAIG ELRGPNYPNY AM
//


