[Bioperl-l] pir.pm => bug
Hilmar Lapp
hlapp at gmx.net
Mon Jun 28 13:54:31 EDT 2004
Note also that PIR is obsolete. PIR merged with Swissprot to form
Uniprot. The Uniprot format is basically the Swissprot format.
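
To illustrate what "basically the Swissprot format" means in practice,
here is a minimal pure-Perl sketch that pulls a few header fields out of
one record by its two-letter line codes (ID, AC, DE, OS, with // closing
the record). The sub name parse_swiss_header, the hash keys and the sample
record are made up for illustration; within BioPerl one would normally
go through Bio::SeqIO with -format => 'swiss' rather than hand-roll this.

```perl
use strict;
use warnings;

# Collect a few header fields from one Swissprot/Uniprot style record.
# Each annotation line starts with a two-letter code; '//' ends the record.
# (Sub name and hash keys are illustrative, not part of BioPerl.)
sub parse_swiss_header {
    my ($text) = @_;
    my %rec = (accessions => [], description => '', organism => '');
    for my $line (split /\n/, $text) {
        last if $line =~ m{^//};                      # end of record
        next unless $line =~ /^([A-Z]{2})\s+(.*\S)/;  # line code + payload
        my ($code, $value) = ($1, $2);
        if ($code eq 'ID') {
            ($rec{id}) = split ' ', $value;           # first token is the entry name
        }
        elsif ($code eq 'AC') {
            # accessions are separated by semicolons
            push @{ $rec{accessions} }, grep { length } split /;\s*/, $value;
        }
        elsif ($code eq 'DE') {
            $rec{description} .= ($rec{description} ? ' ' : '') . $value;
        }
        elsif ($code eq 'OS') {
            $rec{organism} .= ($rec{organism} ? ' ' : '') . $value;
        }
    }
    return \%rec;
}

# Made-up record, for illustration only (not copied from a real release).
my $example = <<'END';
ID   EXAMPLE_TEST   STANDARD;   PRT;   12 AA.
AC   X00001; X00002;
DE   Hypothetical example protein.
OS   Neurospora crassa.
SQ   SEQUENCE   12 AA;
     MLARTCLRST RT
//
END

my $rec = parse_swiss_header($example);
print "$rec->{id}\t@{ $rec->{accessions} }\t$rec->{organism}\n";
```

Sequence lines carry no line code, so the sketch skips them; it only
shows how the header is laid out and is not a replacement for a real
parser.
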
-hilmar
On Monday, June 28, 2004, at 10:46 AM, Heikki Lehvaslaiho wrote:
> Laure,
>
> Thanks for the fix.
>
> pir.pm has not been updated for a long time. Not many people work with
> the format.
>
> Before I apply your changes to the file, I'll summarise the major
> changes here so that others can comment:
>
> - uses Bio::Species and Bio::Annotation::Collection
> - uses Bio::Seq::RichSeq rather than Bio::Seq
> - parses TITLE, ORGANISM, DATE, ACCESSIONS lines
> - comments out method write_seq()
>
> I do not know whether write_seq() is needed in the module, nor whether
> its removal is intentional.
>
> -Heikki
>
>
>
> On Friday 25 Jun 2004 06:05, Laure.Durufle at serono.com wrote:
>> Hi,
>>
>>
>> I modified the package pir.pm. Given a pir*.dat file, the new pir.pm
>> can parse it.
>>
>> The file format looks like this:
>>
>>
>> P R O T E I N S E Q U E N C E D A T A B A S E
>> of PIR-International
>>
>> Section 1. Fully Classified Entries
>> Release 79.01, April 04, 2004
>> 20685 sequences, 8103841 residues
>>
>> Protein Information Resource (PIR)*
>> National Biomedical Research Foundation
>> 3900 Reservoir Road, N.W.,
>> Washington, DC 20007, USA
>>
>> Japan International Protein Munich Information Center for
>> Information Database (JIPID) Protein Sequences (MIPS)
>> Amakubo 1-16-1 GSF-Forschungszentrum f. Umwelt und
>> Gesundheit
>> Tsukuba 305-0005, Japan am Max-Planck-Institut f.
>> Biochemie
>> Am Klopferspitz 18, D-82152
>> Martinsried,
>> FRG
>>
>> This database may be redistributed without prior consent, provided
>> that
>> this notice be given to each user and that the words "Derived from"
>> shall
>> precede this notice if the database has been altered by the
>> redistributor.
>>
>> Copyright 2000, PIR-International.
>>
>> *PIR is a registered mark of NBRF.
>> \\\
>> ENTRY A27187 #type complete
>> TITLE ubiquinol-cytochrome-c reductase (EC 1.10.2.2)
>> cytochrome
>> c1
>> precursor - Neurospora crassa
>> ALTERNATE_NAMES bc1 complex cytochrome c1; complex III cytochrome c1;
>> cytochrome c1 heme protein
>> ORGANISM #formal_name Neurospora crassa
>> DATE 05-Oct-1988 #sequence_revision 15-Oct-1994
>> #text_change
>> 03-Jun-2002
>> ACCESSIONS A27187
>> REFERENCE A27187
>> #authors Roemisch, J.; Tropschug, M.; Sebald, W.; Weiss, H.
>> #journal Eur. J. Biochem. (1987) 164:111-115
>> #title The primary structure of cytochrome c-1 from
>> Neurospora
>> crassa.
>> #cross-references MUID:87161871; PMID:3030747
>> #accession A27187
>> ##molecule_type mRNA
>> ##residues 1-332 ##label ROE
>> ##cross-references GB:X05235; NID:g3005; PIDN:CAA28860.1;
>> PID:g3006
>> ##note the authors translated the codon AGT for residue 316 as
>> Arg
>> CLASSIFICATION #superfamily cytochrome c1 heme protein; cytochrome
>> c1 heme
>> protein homology
>> KEYWORDS chromoprotein; electron transfer; heme; iron;
>> metalloprotein; mitochondrion; oxidative
>> phosphorylation;
>> oxidoreductase; respiratory chain; transmembrane
>> protein
>> FEATURE
>> 1-70 #domain transit peptide (mitochondrion) #status
>> predicted #label TNP\
>> 71-332 #product cytochrome c1 #status predicted
>> #label MAT\
>> 79-305 #domain cytochrome c1 heme protein homology
>> #label
>> C1H\
>> 278-296 #domain transmembrane #status predicted #label
>> TMM\
>> 110,113 #binding_site heme (Cys) (covalent) #status
>> predicted\
>> 114,234 #binding_site heme iron (His, Met) (axial
>> ligands)
>> #status predicted
>> SUMMARY #length 332 #molecular-weight 36456 #checksum 1753
>> SEQUENCE
>> 5 10 15 20 25 30
>> 1 M L A R T C L R S T R T F A S A K N G A F K F A K R S A S T
>> 31 Q S S G A A A E S P L R L N I A A A A A T A V A A G S I A W
>> 61 Y Y H L Y G F A S A M T P A E E G L H A T K Y P W V H E Q W
>> 91 L K T F D H Q A L R R G F Q V Y R E V C A S C H S L S R V P
>> 121 Y R A L V G T I L T V D E A K A L A E E N E Y D T E P N D Q
>> 151 G E I E K R P G K L S D Y L P D P Y K N D E A A R F A N N G
>> 181 A L P P D L S L I V K A R H G G C D Y I F S L L T G Y P D E
>> 211 P P A G A S V G A G L N F N P Y F P G T G I A M A R V L Y D
>> 241 G L V D Y E D G T P A S T S Q M A K D V V E F L N W A A E P
>> 271 E M D D R K R M G M K V L V V T S V L F A L S V Y V K R Y K
>> 301 W A W L K S R K I V Y D P P K S P P P A T N L A L P Q Q R A
>> 331 K S
>> ///
>>
>>
>> the package is that :
>> # $Id: pir.pm,v 1.4 2004/06/25 09:51:14 ldurufle Exp $
>> #
>> # BioPerl module for Bio::SeqIO::PIR
>> #
>> # Cared for by Aaron Mackey <amackey at virginia.edu>
>> #
>> # Copyright Aaron Mackey
>> #
>> # You may distribute this module under the same terms as perl itself
>> #
>> # _history
>> # October 18, 1999 Largely rewritten by Lincoln Stein
>>
>> # POD documentation - main docs before the code
>>
>> =head1 NAME
>>
>> Bio::SeqIO::pir - PIR sequence input/output stream
>>
>> =head1 SYNOPSIS
>>
>> Do not use this module directly. Use it via the Bio::SeqIO class.
>>
>> =head1 DESCRIPTION
>>
>> This object can transform Bio::Seq objects to and from pir flat
>> file databases.
>>
>> Note: This does not completely preserve the PIR format - quality
>> information about sequence is currently discarded since bioperl
>> does not have a mechanism for handling these encodings in sequence
>> data.
>>
>> =head1 FEEDBACK
>>
>> =head2 Mailing Lists
>>
>> User feedback is an integral part of the evolution of this and other
>> Bioperl modules. Send your comments and suggestions preferably to one
>> of the Bioperl mailing lists. Your participation is much appreciated.
>>
>> bioperl-l at bioperl.org - General discussion
>> http://www.bioperl.org/MailList.shtml - About the mailing lists
>>
>> =head2 Reporting Bugs
>>
>> Report bugs to the Bioperl bug tracking system to help us keep track
>> of the bugs and their resolution.
>> Bug reports can be submitted via email or the web:
>>
>> bioperl-bugs at bio.perl.org
>> http://bugzilla.bioperl.org/
>>
>> =head1 AUTHORS
>>
>> Aaron Mackey E<lt>amackey at virginia.eduE<gt>
>> Lincoln Stein E<lt>lstein at cshl.orgE<gt>
>> Jason Stajich E<lt>jason at bioperl.orgE<gt>
>>
>> =head1 APPENDIX
>>
>> The rest of the documentation details each of the object
>> methods. Internal methods are usually preceded with a _
>>
>> =cut
>>
>> # Let the code begin...
>>
>> package Bio::SeqIO::pir;
>> use vars qw(@ISA);
>> use strict;
>>
>> use Bio::SeqIO;
>> use Bio::Seq::SeqFactory;
>> use Bio::Species;
>> use Bio::Annotation::Collection;
>>
>> @ISA = qw(Bio::SeqIO);
>>
>> sub _initialize {
>>     my($self,@args) = @_;
>>     $self->SUPER::_initialize(@args);
>>     if( ! defined $self->sequence_factory ) {
>>         $self->sequence_factory(new Bio::Seq::SeqFactory
>>                                 (-verbose => $self->verbose(),
>>                                  -type => 'Bio::Seq::RichSeq'));
>>     }
>> }
>>
>> =head2 next_seq
>>
>> Title : next_seq
>> Usage : $seq = $stream->next_seq()
>> Function: returns the next sequence in the stream
>> Returns : Bio::Seq::RichSeq object
>> Args : NONE
>>
>> =cut
>>
>> sub next_seq {
>>     my ($self) = @_;
>>     #local($/)= "\n";
>>     my $line;
>>     my ($desc,$seq,$id,$org,$date,$acc_string,@sec,$acc);
>>     my ($annotation, %params, @features) =
>>         ( new Bio::Annotation::Collection );
>>
>>     while(defined($line = $self->_readline())) {
>>         last if index($line,'ENTRY ') == 0;
>>     }
>>     return undef if( !defined $line ); # end of file
>>
>>     $line =~ /^ENTRY\s+(\S+)\s+/ ||
>>         $self->throw("Pir stream with bad ENTRY line. Not Pir in my book.");
>>     $id = $1;
>>     $params{'-display_id'} = $id;
>>
>>     # read header lines up to SEQUENCE; also stop at end of file,
>>     # otherwise a truncated entry would loop forever
>>     while( defined($line) && $line !~ /^SEQUENCE/ ) {
>>
>>         # Description line(s)
>>         if ($line=~/^TITLE\s+(.*)/) {
>>             $desc = $1;
>>         }
>>         # organism line(s)
>>         if ($line=~/^ORGANISM\s+\#formal_name\s+(.*)/) {
>>             $org = $1;
>>             my @class =($org);
>>             my $make = Bio::Species->new();
>>             # no name validation please
>>             $make->classification(\@class,"FORCE");
>>             $params{'-species'}= $make;
>>         }
>>         # date line
>>         if($line=~/^DATE\s+(\d\d-\w\w\w-\d\d\d\d).*/) {
>>             $date = $1;
>>             $date =~ s/\;//;
>>             $date =~ s/\s+$//;
>>             push @{$params{'-dates'}}, $date;
>>         }
>>         #accession
>>         if($line=~/^ACCESSIONS\s+(.*)/) {
>>             $seq = "";
>>             $acc_string =$1;
>>             $acc_string =~ s/\;\s*/ /g;
>>             ($acc,@sec) = split " ",$acc_string;
>>         }
>>
>>         $line = $self->_readline();
>>
>>     }
>>     my ($seqc,$seqn) = ("","");
>>     my $nb=0;
>>     while( defined ($line = $self->_readline) ) {
>>         if ($line=~/^\/\/\//) {last};
>>         if ($line=~/^\s+\d+\s+\d+/) {next};
>>         if ($line=~/^\s+\d+(.*)/) {
>>             $line=$1;
>>         }
>>         $seq = uc($line);
>>         $seqc .= $seq;
>>     }
>>
>>     # P - indicates complete protein
>>     # F - indicates protein fragment
>>     # not sure how to stuff these into a Bio object
>>     # suitable for writing out.
>>     $seqc =~ s/\*//g;
>>     $seqc =~ s/[\(\)\.\/\=\,]//g;
>>     $seqc =~ s/\s+//g; # get rid of whitespace
>>     $params{'-seq_version'} = '';
>>
>>     my ($alphabet) = ('protein');
>>     # TODO - not processing SFS data
>>     my $entry = $self->sequence_factory->create
>>         (-verbose => $self->verbose,
>>          %params,
>>          -seq => $seqc,
>>          -primary_id => $id,
>>          -id => $id,
>>          -desc => $desc,
>>          -alphabet => $alphabet,
>>          -accession_number => $acc,
>>          -secondary_accessions => \@sec,
>>         );
>>
>>     return $entry;
>> }
>>
>>
>> =head2 write_seq
>>
>> Title : write_seq
>> Usage : $stream->write_seq(@seq)
>> Function: writes the $seq object into the stream
>> Returns : 1 for success and 0 for error
>> Args : Array of Bio::PrimarySeqI objects
>>
>>
>> =cut
>>
>> #sub write_seq {
>> # my ($self, @seq) = @_;
>> # for my $seq (@seq) {
>> # $self->throw("Did not provide a valid Bio::PrimarySeqI object")
>> # unless defined $seq && ref($seq) && $seq->isa('Bio::PrimarySeqI');
>> # my $str = $seq->seq();
>> # return unless $self->_print(">".$seq->id(),
>> # "\n", $seq->desc(), "\n",
>> # $str, "*\n");
>> # }
>>
>> # $self->flush if $self->_flush_on_write && defined $self->_fh;
>> # return 1;
>> #}
>>
>> 1;
>>
>>
>>
>> Laure Durufle
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ______ _/ _/_____________________________________________________
> _/ _/ http://www.ebi.ac.uk/mutations/
> _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk
> _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
> _/ _/ _/ Wellcome Trust Genome Campus, Hinxton
> _/ _/ _/ Cambridge, CB10 1SD, United Kingdom
> _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
> ___ _/_/_/_/_/________________________________________________________
>
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------