[Bioperl-l] loading sprot.dat

Brian Osborne brian_osborne at cognia.com
Thu Jan 8 16:12:16 EST 2004


Peter,

I think that the data you're after is probably in the Annotation objects. I
know that for Swissprot there can be a good number of these objects created.

Take a look at the code a bit further down the HOWTO page...
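
A minimal sketch of walking the Annotation objects of the first entry, in the
spirit of the HOWTO code. It assumes the file is named sprot42.dat as in the
message quoted below, and it does not assume which annotation keys the
Swiss-Prot parser creates; it prints whatever keys the collection reports.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;

# Assumption: sprot42.dat is in the current directory.
my $seqio = Bio::SeqIO->new(-file   => "sprot42.dat",
                            -format => "swiss");

my $seq = $seqio->next_seq();

# The annotation collection holds the non-feature data of the entry.
my $ac = $seq->annotation();

# List every key in the collection, then every annotation stored under it.
foreach my $key ($ac->get_all_annotation_keys()) {
  foreach my $ann ($ac->get_Annotations($key)) {
    print "annotation key: $key\n";
    print "  class: ", ref($ann), "\n";
    print "  text:  ", $ann->as_text(), "\n";
  }
}
```

Printing the class of each annotation shows which kind of object (comment,
database link, reference, ...) sits under each key, which is a reasonable
first step before picking out individual fields.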

Brian O.


-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of peter robinson
Sent: Friday, January 09, 2004 2:53 PM
To: Brian Osborne
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] loading sprot.dat

On Thursday 08 January 2004 16:55, Brian Osborne wrote:
(...)


Thank you. However, I have to admit I am not quite sure how to go about this.
The following snippet (copied from your HOWTO) does not really capture the
annotations of the Swiss-Prot entry (see below, it is the first entry in
sprot42.dat).

I would have naively assumed that there would be getter functions for each of
the Swiss-Prot ID types (e.g., in Bio::SeqIO::swiss). However, it has not
been obvious to me where to look for these if they exist.

Ideas?

Thanks!!

Peter

#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;

# Open the Swiss-Prot flat file.
my $seqio = Bio::SeqIO->new(-file   => "sprot42.dat",
                            -format => "swiss");

# Read only the first entry.
my $seqOb = $seqio->next_seq();

# Print every feature (FT lines) with its tags and tag values.
foreach my $featOb ($seqOb->get_SeqFeatures()) {
  print "primary tag: ", $featOb->primary_tag(), "\n";
  foreach my $tag ($featOb->get_all_tags()) {
    print "tag: $tag \n";
    foreach my $value ($featOb->get_tag_values($tag)) {
      print "values $value\n";
    }
  }
}




peter at pluto:~/Swiss> perl sptest.pl
primary tag: DOMAIN
tag: description
values HYDROPHOBIC
primary tag: DOMAIN
tag: description
values HYDROPHOBIC
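
The output above covers only the FT lines. As a hedged sketch of where the
remaining fields might be found: the sequence object itself has getters for
some line types, and the annotation collection holds others. The mapping of
Swiss-Prot line codes to methods in the comments, and the annotation keys
'comment' and 'dblink', are assumptions to check against the installed
BioPerl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-file   => "sprot42.dat",
                            -format => "swiss");
my $seq = $seqio->next_seq();

# Getters on the sequence object (assumed mapping to ID, AC and DE lines).
print "ID: ", $seq->display_id(), "\n";
print "AC: ", $seq->accession_number(), "\n";
print "DE: ", $seq->desc(), "\n";

# RichSeq-specific getters (assumed mapping to DT and KW lines).
if ($seq->isa('Bio::Seq::RichSeq')) {
  print "DT: $_\n" foreach $seq->get_dates();
  print "KW: ", join("; ", $seq->get_keywords()), "\n";
}

# Species object, if the parser created one (OS line).
my $species = $seq->species();
print "OS: ", $species->binomial(), "\n" if defined $species;

my $ac = $seq->annotation();

# Free-text comments (assumed to hold the CC lines).
foreach my $comment ($ac->get_Annotations('comment')) {
  print "CC: ", $comment->text(), "\n";
}

# Database cross-references (assumed to hold the DR lines).
foreach my $link ($ac->get_Annotations('dblink')) {
  print "DR: ", $link->database(), " ", $link->primary_id(), "\n";
}
```

Wrapping the body in a while loop over next_seq() would visit every entry in
the file instead of only the first.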


peter at pluto:~/Swiss> head sprot42.dat --lines=40
ID   104K_THEPA     STANDARD;      PRT;   924 AA.
AC   P15711;
DT   01-APR-1990 (Rel. 14, Created)
DT   01-APR-1990 (Rel. 14, Last sequence update)
DT   01-AUG-1992 (Rel. 23, Last annotation update)
DE   104 kDa microneme-rhoptry antigen.
OS   Theileria parva.
OC   Eukaryota; Alveolata; Apicomplexa; Piroplasmida; Theileriidae;
OC   Theileria.
OX   NCBI_TaxID=5875;
RN   [1]
RP   SEQUENCE FROM N.A.
RC   STRAIN=Muguga;
RX   MEDLINE=90158697; PubMed=1689460;
RA   Iams K.P., Young J.R., Nene V., Desai J., Webster P.,
RA   Ole-Moiyoi O.K., Musoke A.J.;
RT   "Characterisation of the gene encoding a 104-kilodalton microneme-
RT   rhoptry protein of Theileria parva.";
RL   Mol. Biochem. Parasitol. 39:47-60(1990).
CC   -!- SUBCELLULAR LOCATION: IN MICRONEME/RHOPTRY COMPLEXES.
CC   -!- DEVELOPMENTAL STAGE: SPOROZOITE ANTIGEN.
CC   --------------------------------------------------------------------------
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between  the Swiss Institute of Bioinformatics  and the  EMBL outstation -
CC   the European Bioinformatics Institute.  There are no  restrictions on its
CC   use  by  non-profit  institutions as long  as its content  is  in  no way
CC   modified and this statement is not removed.  Usage  by  and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license at isb-sib.ch).
CC   --------------------------------------------------------------------------
DR   EMBL; M29954; AAA18217.1; -.
DR   PIR; A44945; A44945.
KW   Antigen; Sporozoite; Repeat.
FT   DOMAIN        1     19       HYDROPHOBIC.
FT   DOMAIN      905    924       HYDROPHOBIC.
SQ   SEQUENCE   924 AA;  103625 MW;  289B4B554A61870E CRC64;
     MKFLILLFNI LCLFPVLAAD NHGVGPQGAS GVDPITFDIN SNQTGPAFLT AVEMAGVKYL
     QVQHGSNVNI HRLVEGNVVI WENASTPLYT GAIVTNNDGP YMAYVEVLGD PNLQFFIKSG
     DAWVTLSEHE YLAKLQEIRQ AVHIESVFSL NMAFQLENNK YEVETHAKNG ANMVTFIPRN
     GHICKMVYHK NVRIYKATGN DTVTSVVGFF RGLRLLLINV FSIDDNGMMS NRYFQHVDDK
peter at pluto:~/Swiss> exit







> Peter,
>
> Some of the data in a SwissProt file is parsed into Features, some of it
> into Annotations (Swissprot entries are RichSeq objects, like Genbank and
> EMBL entries). So probably you'll want to look at the Feature-Annotation
> HOWTO. Unfortunately there are no Swissprot examples there but the logic
> will be the same, open the file with SeqIO and print out specific entries,
> or portions of entries, with the desired characteristics. Perhaps the SeqIO
> HOWTO would be useful as well...
>
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org
> [mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Robinson, Peter
> Sent: Thursday, January 08, 2004 10:18 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] loading sprot.dat
>
> Hi all,
>
> I would like to load the contents of the Swiss-Prot file sprot.dat into a
> mysql database. Since I have a relatively specific question, I would like
> to use a database of my own design and just extract portions of the relevant
> entries from sprot.dat rather than the whole thing.
> I would appreciate it if someone could point me to an example script for
> this kind of thing. I did not see anything like this in my perusals of the
> archives or google, please forgive me if this question has been asked
> before!
>
> Thanks
>
> Peter
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

