[Bioperl-l] loading sprot.dat
Brian Osborne
brian_osborne at cognia.com
Thu Jan 8 16:12:16 EST 2004
Peter,
I think that the data you're after is probably in the Annotation objects. I
know that for Swissprot there can be a good number of these objects created.
Take a look at the code a bit further down the HOWTO page...
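For reference, a minimal sketch of what walking those Annotation objects might look like. It assumes that annotation(), get_all_annotation_keys(), get_Annotations() and as_text() behave as documented for Bio::Annotation::Collection and the annotation classes. The annotation_lines() helper and the script layout are made up for illustration, and which keys the swiss parser assigns to each line type is not shown here and may depend on the BioPerl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: return one "key: text" line for every
# Annotation object attached to a sequence object.
sub annotation_lines {
    my ($seq) = @_;
    my $collection = $seq->annotation();    # a Bio::Annotation::Collection
    my @lines;
    foreach my $key ($collection->get_all_annotation_keys()) {
        foreach my $annotation ($collection->get_Annotations($key)) {
            push @lines, "$key: " . $annotation->as_text();
        }
    }
    return @lines;
}

# Only read a file when one is named on the command line.
if (@ARGV) {
    my $seqio = Bio::SeqIO->new(-file => $ARGV[0], -format => 'swiss');
    my $seq   = $seqio->next_seq();

    # Simple getters on the sequence object itself.
    print "ID: ", $seq->display_id(),       "\n";
    print "AC: ", $seq->accession_number(), "\n";
    print "DE: ", $seq->desc(),             "\n";

    # Everything that was parsed into the annotation collection.
    print "$_\n" for annotation_lines($seq);
}
```

Run it as "perl annotest.pl sprot42.dat" (the script name is hypothetical). The display_id(), accession_number() and desc() calls should cover the ID, AC and DE lines, while the loop prints whatever ended up in the annotation collection.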
Brian O.
-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of peter robinson
Sent: Friday, January 09, 2004 2:53 PM
To: Brian Osborne
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] loading sprot.dat
On Thursday 08 January 2004 16:55, Brian Osborne wrote:
(...)
Thank you. However, I have to admit I am not quite sure how to go about
this.
The following snippet (copied from your HOWTO) does not really capture the
annotations of the Swiss-Prot entry (see below; it is the first entry in
sprot42.dat).
I would have naively assumed that there would be getter functions for each of
the Swiss-Prot ID types (e.g., in Bio::SeqIO::swiss). However, it has not
been obvious to me where to look for these if they exist.
Ideas?
Thanks!!
Peter
#!/usr/bin/perl
use strict;
use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-file   => "sprot42.dat",
                            -format => "swiss");
my $seqOb = $seqio->next_seq();

foreach my $featOb ($seqOb->get_SeqFeatures()) {
    print "primary tag: ", $featOb->primary_tag(), "\n";
    foreach my $tag ($featOb->get_all_tags()) {
        print "tag: $tag \n";
        foreach my $value ($featOb->get_tag_values($tag)) {
            print "values $value\n";
        }
    }
}
peter at pluto:~/Swiss> perl sptest.pl
primary tag: DOMAIN
tag: description
values HYDROPHOBIC
primary tag: DOMAIN
tag: description
values HYDROPHOBIC
peter at pluto:~/Swiss> head sprot42.dat --lines=40
ID 104K_THEPA STANDARD; PRT; 924 AA.
AC P15711;
DT 01-APR-1990 (Rel. 14, Created)
DT 01-APR-1990 (Rel. 14, Last sequence update)
DT 01-AUG-1992 (Rel. 23, Last annotation update)
DE 104 kDa microneme-rhoptry antigen.
OS Theileria parva.
OC Eukaryota; Alveolata; Apicomplexa; Piroplasmida; Theileriidae;
OC Theileria.
OX NCBI_TaxID=5875;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=Muguga;
RX MEDLINE=90158697; PubMed=1689460;
RA Iams K.P., Young J.R., Nene V., Desai J., Webster P.,
RA Ole-Moiyoi O.K., Musoke A.J.;
RT "Characterisation of the gene encoding a 104-kilodalton microneme-
RT rhoptry protein of Theileria parva.";
RL Mol. Biochem. Parasitol. 39:47-60(1990).
CC -!- SUBCELLULAR LOCATION: IN MICRONEME/RHOPTRY COMPLEXES.
CC -!- DEVELOPMENTAL STAGE: SPOROZOITE ANTIGEN.
CC --------------------------------------------------------------------------
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license at isb-sib.ch).
CC --------------------------------------------------------------------------
DR EMBL; M29954; AAA18217.1; -.
DR PIR; A44945; A44945.
KW Antigen; Sporozoite; Repeat.
FT DOMAIN 1 19 HYDROPHOBIC.
FT DOMAIN 905 924 HYDROPHOBIC.
SQ SEQUENCE 924 AA; 103625 MW; 289B4B554A61870E CRC64;
MKFLILLFNI LCLFPVLAAD NHGVGPQGAS GVDPITFDIN SNQTGPAFLT AVEMAGVKYL
QVQHGSNVNI HRLVEGNVVI WENASTPLYT GAIVTNNDGP YMAYVEVLGD PNLQFFIKSG
DAWVTLSEHE YLAKLQEIRQ AVHIESVFSL NMAFQLENNK YEVETHAKNG ANMVTFIPRN
GHICKMVYHK NVRIYKATGN DTVTSVVGFF RGLRLLLINV FSIDDNGMMS NRYFQHVDDK
peter at pluto:~/Swiss> exit
> Peter,
>
> Some of the data in a SwissProt file is parsed into Features, some of it
> into Annotations (Swissprot entries are RichSeq objects, like Genbank and
> EMBL entries). So probably you'll want to look at the Feature-Annotation
> HOWTO. Unfortunately there are no Swissprot examples there but the logic
> will be the same, open the file with SeqIO and print out specific entries,
> or portions of entries, with the desired characteristics. Perhaps the
> SeqIO HOWTO would be useful as well...
>
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org
> [mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Robinson, Peter
> Sent: Thursday, January 08, 2004 10:18 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] loading sprot.dat
>
> Hi all,
>
> I would like to load the contents of the Swiss-Prot file sprot.dat into a
> MySQL database. Since I have a relatively specific question, I would like
> to use a database of my own design and just extract portions of the
> relevant entries from sprot.dat rather than the whole thing.
> I would appreciate it if someone could point me to an example script for
> this kind of thing. I did not see anything like this in my perusals of the
> archives or google, please forgive me if this question has been asked
> before!
>
> Thanks
>
> Peter
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l