[Bioperl-l] problem with swissprot parsin
Siddhartha Basu
basu at pharm.sunysb.edu
Fri Oct 15 11:34:13 EDT 2004
Hi Brian,
OK, I have installed bioperl-live. However, it is now producing
another set of warnings.
I have used the same code as I did earlier.
#!/usr/bin/perl -w
#
use strict;
use Bio::DB::Flat;
#
die "no files\n" unless @ARGV;
my $LOCATION = ".";
#
my $db = Bio::DB::Flat->new( -directory => $LOCATION,
-dbname => "swissall",
-format => "swiss",
-index => "bdb",
-write_flag => 1,
) or die "can't create BioFlat indexes\n";
$db->build_index(@ARGV);
#print "Done indexing\n";
my $seq = $db->get_Seq_by_acc("Q09543");
print $seq->seq,"\n";
exit;
I have also attached the test file.
-siddhartha
Brian Osborne wrote:
> Siddhartha,
>
> bioperl-live, the latest. Instructions on how to download this are at
> http://cvs.open-bio.org/.
>
> Brian O.
>
> -----Original Message-----
> From: Siddhartha Basu [mailto:basu at pharm.sunysb.edu]
> Sent: Friday, October 15, 2004 9:55 AM
> To: Brian Osborne
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] problem with swissprot parsin
>
> Hi Brian,
> I retested it. The good part is that I can now fetch the Seq object and
> the sequence. The warnings are still there. I will try to index the
> entire Swiss-Prot data file and see what happens.
>
> As mentioned in an earlier message on the list, this has to do
> with the absence of RA records in recent Swiss-Prot files: the
> swiss.pm module spews out warnings when the variable $au does not
> get any data. Since you are not getting the warnings, which version
> of BioPerl are you running?
>
> -siddhartha
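The mechanism described above (a warning whenever $au gets no data) can be reproduced in a few lines of plain Perl. This is a minimal sketch, not the actual swiss.pm code: the variable name $au comes from the message above, but the substitution pattern and the idea that $au is only ever filled from RA lines are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect warnings instead of printing them, so we can see what fires.
my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };

# Assumption: the author string is only set from RA lines, so a reference
# that has an RG (group) line but no RA lines leaves it undef.
my $au;

# Unguarded: a substitution on an undef value triggers
# "Use of uninitialized value ... in substitution (s///)".
$au =~ s/;\s*$//;

# Guarded: only clean up the string when there is something to clean.
my $au_guarded;
$au_guarded =~ s/;\s*$// if defined $au_guarded;

print scalar(@caught), " warning(s) caught\n";
print "First: $caught[0]" if @caught;
```

Only the unguarded substitution warns; the `defined` check is the usual way to keep such a substitution quiet. Whether line 855 of swiss.pm uses exactly this pattern is not confirmed here.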
>
> Brian Osborne wrote:
>
>>Siddhartha,
>>
>>Changing @files to @ARGV makes your script index without warnings on my
>>machine, using your Swissprot file or mine. It also retrieves a sequence.
>>Below...
>>
>>Brian O.
>>
>>
>>#!/usr/bin/perl -w
>>
>>use strict;
>>use Bio::DB::Flat;
>>
>>die "no files\n" unless @ARGV;
>>my $LOCATION = ".";
>>
>>my $db = Bio::DB::Flat->new( -directory => $LOCATION,
>> -dbname => "swissall",
>> -format => "swiss",
>> -index => "bdb",
>> -write_flag => 1,
>> ) or die "can't create BioFlat indexes\n";
>>
>>$db->build_index(@ARGV);
>>print "Done indexing\n";
>>
>>my $seq = $db->get_Seq_by_acc("P41932");
>>print $seq->seq;
>>
>>
>>
>>-----Original Message-----
>>From: bioperl-l-bounces at portal.open-bio.org
>>[mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Siddhartha Basu
>>Sent: Thursday, October 14, 2004 4:54 PM
>>To: Brian Osborne
>>Cc: bioperl-l at bioperl.org
>>Subject: Re: [Bioperl-l] problem with swissprot parsin
>>
>>Hi Brian,
>>I changed it, but the problem persists.
>>
>>-siddhartha
>>
>>Brian Osborne wrote:
>>
>>
>>>Siddhartha,
>>>
>>>Change @files to @ARGV in the build_index line. Does that fix it?
>>>
>>>Brian O.
>>>
>>>-----Original Message-----
>>>From: Siddhartha Basu [mailto:basu at pharm.sunysb.edu]
>>>Sent: Thursday, October 14, 2004 4:15 PM
>>>To: Brian Osborne
>>>Cc: bioperl-l at bioperl.org
>>>Subject: Re: [Bioperl-l] problem with swissprot parsin
>>>
>>>Hi Brian,
>>>Here is the code that started producing the following warnings. I presume I
>>>am using Bio::DB::Flat::BDB, though I haven't called it directly. I am
>>>trying to index Swiss-Prot/TrEMBL files here.
>>>
>>>#!/usr/bin/perl -w
>>>use strict;
>>>use Bio::DB::Flat;
>>>
>>>die "no files\n" unless @ARGV;
>>>my $LOCATION = "/home/basu/odbaindex";
>>>
>>>my $db = Bio::DB::Flat->new( -directory => $LOCATION,
>>> -dbname => "swissall",
>>> -format => "swiss",
>>> -index => "bdb",
>>> -write_flag => 1,
>>> ) or die "can't create BioFlat indexes\n";
>>>$db->build_index(@files);
>>>print "Done indexing\n";
>>>
>>>exit;
>>>
>>>
>>>I get the following warnings.
>>>======================================================================
>>>Use of uninitialized value in substitution (s///) at
>>>/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line 18676877.
>>>Use of uninitialized value in substitution (s///) at
>>>/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line 18676916.
>>>Use of uninitialized value in substitution (s///) at
>>>/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line 18676956.
>>>Use of uninitialized value in substitution (s///) at
>>>/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line 18677002.
>>>Use of uninitialized value in substitution (s///) at
>>>/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN1> line
>>>=========================================================================
>>>
>>>I have also run a small test of the Bio::SeqIO module using a small test
>>>file (swiss.test). Here is the code.
>>>
>>>#!/usr/bin/perl -w
>>>#
>>>use strict;
>>>use Bio::SeqIO;
>>>
>>>my $seq = Bio::SeqIO->new(-file => $ARGV[0], -format => "swiss");
>>>
>>>while (my $in = $seq->next_seq) {
>>> print $in->id,"\n";
>>>}
>>>
>>>exit;
>>>
>>>
>>>It gives the same error:
>>>Use of uninitialized value in substitution (s///) at
>>>/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN0> line 28.
>>>1433_CAEEL
>>>Use of uninitialized value in substitution (s///) at
>>>/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN0> line 87.
>>>A4_CAEEL
>>>Use of uninitialized value in substitution (s///) at
>>>/usr/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line 855, <GEN0> line 171.
>>>AATC_CAEEL
>>>
>>>I have also attached the test file.
>>>
>>>I hope this gives some clue to the problem.
>>>Thanks for the response.
>>>
>>>siddhartha
>>>
-------------- next part --------------
ID AATC_CAEEL STANDARD; PRT; 408 AA.
AC Q22067;
DT 01-NOV-1997 (Rel. 35, Created)
DT 01-NOV-1997 (Rel. 35, Last sequence update)
DT 01-OCT-2004 (Rel. 45, Last annotation update)
DE Probable aspartate aminotransferase, cytoplasmic (EC 2.6.1.1)
DE (Transaminase A) (Glutamate oxaloacetate transaminase-1).
GN ORFNames=T01C8.5;
OS Caenorhabditis elegans.
OC Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida; Rhabditoidea;
OC Rhabditidae; Peloderinae; Caenorhabditis.
OX NCBI_TaxID=6239;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=Bristol N2;
RX MEDLINE=99069613; PubMed=9851916;
RG THE C. ELEGANS SEQUENCING CONSORTIUM;
RT "Genome sequence of the nematode C. elegans: a platform for
RT investigating biology.";
RL Science 282:2012-2018(1998).
CC -!- CATALYTIC ACTIVITY: L-aspartate + 2-oxoglutarate = oxaloacetate +
CC L-glutamate.
CC -!- COFACTOR: Pyridoxal phosphate (By similarity).
CC -!- SUBUNIT: Homodimer (By similarity).
CC -!- SUBCELLULAR LOCATION: Cytoplasmic (Potential).
CC -!- MISCELLANEOUS: In eukaryotes there are cytoplasmic, mitochondrial
CC and chloroplastic isozymes.
CC -!- SIMILARITY: Belongs to the class-I pyridoxal-phosphate-dependent
CC aminotransferase family.
CC --------------------------------------------------------------------------
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license at isb-sib.ch).
CC --------------------------------------------------------------------------
DR EMBL; U58726; AAB00578.1; -.
DR PIR; T29857; T29857.
DR HSSP; P00503; 1AJS.
DR WormPep; T01C8.5; CE07462.
DR InterPro; IPR004839; Aminotrans_I/II.
DR InterPro; IPR000796; Asp_trans.
DR InterPro; IPR004838; NHtransf_1_BS.
DR Pfam; PF00155; Aminotran_1_2; 1.
DR PRINTS; PR00799; TRANSAMINASE.
DR PROSITE; PS00105; AA_TRANSFER_CLASS_1; 1.
KW Aminotransferase; Pyridoxal phosphate; Transferase.
FT BINDING 251 251 Pyridoxal phosphate (By similarity).
SQ SEQUENCE 408 AA; 45493 MW; A4DDCBCB8C0EFD83 CRC64;
MSFFDGIPVA PPIEVFHKNK MYLDETAPVK VNLTIGAYRT EEGQPWVLPV VHETEVEIAN
DTSLNHEYLP VLGHEGFRKA ATELVLGAES PAIKEERSFG VQCLSGTGAL RAGAEFLASV
CNMKTVYVSN PTWGNHKLVF KKAGFTTVAD YTFWDYDNKR VHIEKFLSDL ESAPEKSVII
LHGCAHNPTG MDPTQEQWKL VAEVIKRKNL FTFFDIAYQG FASGDPAADA WAIRYFVDQG
MEMVVSQSFA KNFGLYNERV GNLTVVVNNP AVIAGFQSQM SLVIRANWSN PPAHGARIVH
KVLTTPARRE QWNQSIQAMS SRIKQMRAAL LRHLMDLGTP GTWDHIIQQI GMFSYTGLTS
AQVDHLIANH KVFLLRDGRI NICGLNTKNV EYVAKAIDET VRAVKSNI
//
ID 2AAA_CAEEL STANDARD; PRT; 590 AA.
AC Q09543;
DT 01-NOV-1997 (Rel. 35, Created)
DT 10-OCT-2003 (Rel. 42, Last sequence update)
DT 01-OCT-2004 (Rel. 45, Last annotation update)
DE Probable protein phosphatase PP2A regulatory subunit (Protein
DE phosphatase PP2A regulatory subunit A).
GN ORFNames=F48E8.5;
OS Caenorhabditis elegans.
OC Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida; Rhabditoidea;
OC Rhabditidae; Peloderinae; Caenorhabditis.
OX NCBI_TaxID=6239;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=Bristol N2;
RX MEDLINE=99069613; PubMed=9851916;
RG THE C. ELEGANS SEQUENCING CONSORTIUM;
RT "Genome sequence of the nematode C. elegans: a platform for
RT investigating biology.";
RL Science 282:2012-2018(1998).
RN [2]
RP REVISIONS.
RA Waterston R.;
RL Submitted (JUN-2002) to the EMBL/GenBank/DDBJ databases.
CC -!- FUNCTION: The PR65 subunit of protein phosphatase 2A serves as a
CC scaffolding molecule to coordinate the assembly of the catalytic
CC subunit and a variable regulatory B subunit (By similarity).
CC -!- SUBUNIT: PP2A exists in several trimeric forms, all of which
CC consist of a core composed of a catalytic subunit associated with
CC a 65 kDa regulatory subunit (PR65) (subunit A). The core complex
CC associates with a third, variable subunit (subunit B), which
CC confers distinct properties to the holoenzyme (By similarity).
CC -!- DOMAIN: Each HEAT repeat appears to consist of two alpha helices
CC joined by a hydrophilic region, the intrarepeat loop. The repeat
CC units may be arranged laterally to form a rod-like structure.
CC -!- SIMILARITY: Belongs to the phosphatase 2A regulatory subunit A
CC family.
CC -!- SIMILARITY: Contains 14 HEAT repeats.
CC --------------------------------------------------------------------------
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license at isb-sib.ch).
CC --------------------------------------------------------------------------
DR EMBL; U23514; AAC46541.2; -.
DR PIR; T16411; T16411.
DR HSSP; P30153; 1B3U.
DR WormPep; F48E8.5; CE30997.
DR InterPro; IPR008938; ARM.
DR InterPro; IPR000357; HEAT.
DR Pfam; PF02985; HEAT; 12.
DR PROSITE; PS50077; HEAT_REPEAT; 8.
KW Hypothetical protein; Protein phosphatase; Repeat.
FT REPEAT 37 73 HEAT 1.
FT REPEAT 74 111 HEAT 2.
FT REPEAT 113 150 HEAT 3.
FT REPEAT 151 188 HEAT 4.
FT REPEAT 189 227 HEAT 5.
FT REPEAT 228 266 HEAT 6.
FT REPEAT 267 305 HEAT 7.
FT REPEAT 306 344 HEAT 8.
FT REPEAT 349 387 HEAT 9.
FT REPEAT 388 426 HEAT 10.
FT REPEAT 427 465 HEAT 11.
FT REPEAT 466 504 HEAT 12.
FT REPEAT 505 543 HEAT 13.
FT REPEAT 544 582 HEAT 14.
FT DOMAIN 186 189 Poly-Ala.
SQ SEQUENCE 590 AA; 66148 MW; E9B6F7DFFEB973E2 CRC64;
MSVVEEATDD ALYPIAVLID ELRNEDVTLR LNSIRKLSTI ALALGVERTR NELIQFLTDT
IYDEDEVLLV LAEQLGNFTP LVGGPDHVHC LLLPLENLAT VEETVVRDKA VESLRKIADK
HSSASLEEHF VPMLRRLATG DWFTSRTSAC GLFSVVYPRV SPAIKSELKS MFRTLCRDDT
PMVRRAAAAK LGEFAKVFEK TAVIEGLHSS LTDLHVDEQD SVRLLTVESA IAFGTLLDKA
NKKKLIEPIL IELFDDKSWR VRYMVAEKLI EIQNVLGEDM DTTHLVNMYT NLLKDPEGEV
RCAATQRLQE FALNLPEDKR QNIICNSLLN VAKELVTDGN QLVKSELAGV IMGLAPLIGK
EQTVSELLPI YMQLLNDQTP EVRLNIISSL DKVNEVIGAA QLSTSLLPAI VGLAEDGKWR
VRLAIVQFMP LLASQLGQEF FDEKLLPLCL NWLTDHVFSI REASTLIMKE LTQKFGGQWA
STNIVPKMQK LQKDTNYLQR MTCLFCLNTL SEAMTQEQIL KEIMPIVKDL VEDDVPNVRF
NAAKSLKRIG KNLTPSTLTS EVKPLLEKLG KDSDFDVRYF SEEAKNSLGL
//
ID Q6ITX2 PRELIMINARY; PRT; 207 AA.
AC Q6ITX2;
DT 05-JUL-2004 (TrEMBLrel. 27, Created)
DT 05-JUL-2004 (TrEMBLrel. 27, Last sequence update)
DT 05-JUL-2004 (TrEMBLrel. 27, Last annotation update)
DE Methyl-coenzyme M reductase alpha subunit (Fragment).
GN Name=mcrA;
OS uncultured Methanomicrobiales archaeon.
OC Archaea; Euryarchaeota; Methanomicrobia; Methanomicrobiales;
OC environmental samples.
OX NCBI_TaxID=183760;
RN [1]
RP SEQUENCE FROM N.A.
RA Banning N., Brock F., Parkes R.J., Fry J.C., Weightman A.J.;
RL Submitted (MAY-2004) to the EMBL/GenBank/DDBJ databases.
DR EMBL; AY625595; AAT45717.1; -.
DR InterPro; IPR008924; MCR_alpha_beta_C.
DR InterPro; IPR009047; MCR_alpha_C.
DR InterPro; IPR009024; MCR_fer_like.
DR Pfam; PF02249; MCR_alpha; 1.
FT NON_TER 1 1
FT NON_TER 207 207
SQ SEQUENCE 207 AA; 22849 MW; 5309DD59248C9038 CRC64;
WHSLAKHAGV IQMGDILPAR RARGPNEPGG IKFGHFADMV QTDRKYPNDP ARASLEVVGA
GTMLFDQIWL GSYMSGGVGF TQYATAAYTD NILDDYTYYG MDYIKSKYKV NWQSPSEKDK
VKATQDVVND IATEVNLYGM EQYEQYPTAL EDHFGGSQRA SVLAAASGLS VSIATGNSNA
GLNGWYLSML MHKEGWSRLG FFGYDLQ
//
ID Q6L781 PRELIMINARY; PRT; 216 AA.
AC Q6L781;
DT 05-JUL-2004 (TrEMBLrel. 27, Created)
DT 05-JUL-2004 (TrEMBLrel. 27, Last sequence update)
DT 05-JUL-2004 (TrEMBLrel. 27, Last annotation update)
DE Methyl-coenzyme M reductase alpha subunit (Fragment).
GN Name=mcrA;
OS uncultured Methanomicrobiales archaeon.
OC Archaea; Euryarchaeota; Methanomicrobia; Methanomicrobiales;
OC environmental samples.
OX NCBI_TaxID=183760;
RN [1]
RP SEQUENCE FROM N.A.
RX PubMed=15240282;
RA Shigematsu T., Tang Y., Kobayashi T., Kawaguchi H., Morimura S.,
RA Kida K.;
RT "Effect of Dilution Rate on Metabolic Pathway Shift between
RT Aceticlastic and Nonaceticlastic Methanogenesis in Chemostat
RT Cultivation.";
RL Appl. Environ. Microbiol. 70:4048-4052(2004).
DR EMBL; AB158527; BAD21106.1; -.
DR InterPro; IPR008924; MCR_alpha_beta_C.
DR InterPro; IPR009047; MCR_alpha_C.
DR InterPro; IPR009024; MCR_fer_like.
DR Pfam; PF02249; MCR_alpha; 1.
FT NON_TER 1 1
FT NON_TER 216 216
SQ SEQUENCE 216 AA; 23541 MW; 5265242473ED9CFE CRC64;
FIGAYRMCAG EAAVADLAFA AKHAGVVQMA THLPARRARG PNEPGGLAFG LFSDIIQGNR
KYPHDPAKAS FEVVGAGTML YDQIWLGSYM SGGVGFTQYA TAAYTDNILD EFTYYGMDYI
KDKYKVDWKN PSPKDRVKPT QEIVNDIITE VALNAMEQYE QYPTMMEDHF GGSQRAGVIA
AACGLSTSIA TGNSNAGLNG WYLSMLLHKE GWSRLG
//
ID Q6SEI7 PRELIMINARY; PRT; 162 AA.
AC Q6SEI7;
DT 05-JUL-2004 (TrEMBLrel. 27, Created)
DT 05-JUL-2004 (TrEMBLrel. 27, Last sequence update)
DT 05-JUL-2004 (TrEMBLrel. 27, Last annotation update)
DE Methyl-coenzyme M reductase alpha subunit (Fragment).
GN Name=mcrA;
OS uncultured euryarchaeote.
OC Archaea; Euryarchaeota; environmental samples.
OX NCBI_TaxID=114243;
RN [1]
RP SEQUENCE FROM N.A.
RA Castro H., Reddy K.R., Ogram A.;
RL Submitted (NOV-2003) to the EMBL/GenBank/DDBJ databases.
DR EMBL; AY459319; AAR24561.1; -.
DR InterPro; IPR008924; MCR_alpha_beta_C.
DR InterPro; IPR009047; MCR_alpha_C.
DR Pfam; PF02249; MCR_alpha; 1.
FT NON_TER 1 1
FT NON_TER 162 162
SQ SEQUENCE 162 AA; 18115 MW; 3FF9CD3C09CBC441 CRC64;
WCRFTQYATA AYTDNILDEY TYYGMDYIKD KYKVDWKNPN DKDKVKPTQD IANDMATEVA
LNGMEQYEQF PTLMEDHFGG SQRAGVLAAA CGLTASIATG NSNAGLNAWY LCMLLHKEGW
SRLGFFGYDL QDQCGSANSL AIRPDEGAIG ELRGPNYPNY AM
//
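To see which entries in a file are affected, the reference blocks can be scanned for RN blocks that carry an RG (reference group) line but no RA (author) line, as the consortium references above do. This is hypothetical helper code, not part of BioPerl: the function name `refs_without_ra` is illustrative, and the inline sample is an excerpt of the attached file trimmed to its ID, RN, RG/RA and `//` lines. It reads the file named on the command line, or the sample when none is given.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return { entry_id => [ref labels without RA lines] }.
sub refs_without_ra {
    my ($fh) = @_;
    my (%result, $id, @refs, %has_ra);
    while (my $line = <$fh>) {
        chomp $line;
        # Two-character line code, then the rest of the line.
        my ($tag, $rest) = $line =~ /^(\S\S)\s*(.*)$/ or next;
        if ($tag eq 'ID') {            # new entry: reset per-entry state
            ($id) = split ' ', $rest;
            @refs   = ();
            %has_ra = ();
        } elsif ($tag eq 'RN') {       # new reference block, e.g. "[1]"
            push @refs, $rest;
            $has_ra{$rest} = 0;
        } elsif ($tag eq 'RA' && @refs) {
            $has_ra{ $refs[-1] } = 1;  # current reference has authors
        } elsif ($tag eq '//') {       # end of entry: record any gaps
            my @missing = grep { !$has_ra{$_} } @refs;
            $result{$id} = \@missing if @missing;
        }
    }
    return \%result;
}

# Excerpt of the attached test file (only the lines this check needs).
my $sample = <<'END';
ID AATC_CAEEL STANDARD; PRT; 408 AA.
RN [1]
RG THE C. ELEGANS SEQUENCING CONSORTIUM;
//
ID 2AAA_CAEEL STANDARD; PRT; 590 AA.
RN [1]
RG THE C. ELEGANS SEQUENCING CONSORTIUM;
RN [2]
RA Waterston R.;
//
ID Q6ITX2 PRELIMINARY; PRT; 207 AA.
RN [1]
RA Banning N., Brock F., Parkes R.J., Fry J.C., Weightman A.J.;
//
END

my $fh;
if (@ARGV) {
    open $fh, '<', $ARGV[0] or die "can't open $ARGV[0]: $!\n";
} else {
    open $fh, '<', \$sample or die "can't open sample: $!\n";
}

my $missing = refs_without_ra($fh);
for my $id (sort keys %$missing) {
    print "$id: no RA lines in reference(s) @{ $missing->{$id} }\n";
}
```

On the excerpt, AATC_CAEEL and 2AAA_CAEEL are reported for reference [1], while Q6ITX2, whose only reference has an RA line, is not.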