[Bioperl-l] pir.pm => bug

Mon Jun 28 13:54:31 EDT 2004

Note also that PIR is obsolete. PIR merged with Swissprot to form  
Uniprot. The Uniprot format is basically the swissprot format.

	-hilmar

On Monday, June 28, 2004, at 10:46  AM, Heikki Lehvaslaiho wrote:

> Laure,
>
> Thanks for the fix.
>
> pir.pm has not been updated for a long time. Not many people work with
> the format.
>
> Before I apply your changes to the file, I'll summarise the major
> changes here so that others can comment:
>
> - uses Bio::Species and Bio::Annotation::Collection
> - uses Bio::Seq::RichSeq rather than Bio::Seq
> - parses TITLE, ORGANISM, DATE, ACCESSIONS lines
> - comments out method write_seq()
>
> I do not know whether write_seq() is needed in the module, nor whether
> its removal is intentional.
>
> 	-Heikki
>
>
>
> On Friday 25 Jun 2004 06:05, Laure.Durufle at serono.com wrote:
>> Hi,
>>
>>
>> I modified the package pir.pm. Given such a file, pir.pm can now parse
>> pir*.dat files in this format:
>>
>>
>>                 P R O T E I N  S E Q U E N C E  D A T A B A S E
>>                              of PIR-International
>>
>>                       Section 1. Fully Classified Entries
>>                          Release 79.01, April 04, 2004
>>                        20685 sequences, 8103841 residues
>>
>>                        Protein Information Resource (PIR)*
>>                     National Biomedical Research Foundation
>>                           3900 Reservoir Road, N.W.,
>>                           Washington, DC  20007, USA
>>
>>    Japan International Protein           Munich Information Center for
>>    Information Database (JIPID)             Protein Sequences (MIPS)
>>          Amakubo 1-16-1          GSF-Forschungszentrum f. Umwelt und Gesundheit
>>     Tsukuba 305-0005, Japan            am Max-Planck-Instut f. Biochemie
>>                                   Am Klopferspitz 18, D-82152 Martinsried, FRG
>>
>>    This database may be redistributed without prior consent, provided that
>>    this notice be given to each user and that the words "Derived from" shall
>>    precede this notice if the database has been altered by the redistributor.
>>
>>                        Copyright 2000, PIR-International.
>>
>>                        *PIR is a registered mark of NBRF.
>> \\\
>> ENTRY           A27187  #type complete
>> TITLE           ubiquinol-cytochrome-c reductase (EC 1.10.2.2) cytochrome c1
>>                 precursor - Neurospora crassa
>> ALTERNATE_NAMES bc1 complex cytochrome c1; complex III cytochrome c1;
>>                 cytochrome c1 heme protein
>> ORGANISM        #formal_name Neurospora crassa
>> DATE            05-Oct-1988 #sequence_revision 15-Oct-1994 #text_change
>>                 03-Jun-2002
>> ACCESSIONS      A27187
>> REFERENCE       A27187
>>    #authors     Roemisch, J.; Tropschug, M.; Sebald, W.; Weiss, H.
>>    #journal     Eur. J. Biochem. (1987) 164:111-115
>>    #title       The primary structure of cytochrome c-1 from Neurospora
>>                 crassa.
>>    #cross-references MUID:87161871; PMID:3030747
>>    #accession   A27187
>>       ##molecule_type mRNA
>>       ##residues 1-332 ##label ROE
>>       ##cross-references GB:X05235; NID:g3005; PIDN:CAA28860.1; PID:g3006
>>       ##note the authors translated the codon AGT for residue 316 as Arg
>> CLASSIFICATION  #superfamily cytochrome c1 heme protein; cytochrome c1 heme
>>                 protein homology
>> KEYWORDS        chromoprotein; electron transfer; heme; iron;
>>                 metalloprotein; mitochondrion; oxidative phosphorylation;
>>                 oxidoreductase; respiratory chain; transmembrane protein
>> FEATURE
>>    1-70                #domain transit peptide (mitochondrion) #status
>>                        predicted #label TNP\
>>    71-332              #product cytochrome c1 #status predicted #label MAT\
>>    79-305              #domain cytochrome c1 heme protein homology #label
>>                        C1H\
>>    278-296             #domain transmembrane #status predicted #label TMM\
>>    110,113             #binding_site heme (Cys) (covalent) #status
>>                        predicted\
>>    114,234             #binding_site heme iron (His, Met) (axial ligands)
>>                        #status predicted
>> SUMMARY         #length 332  #molecular-weight 36456  #checksum 1753
>> SEQUENCE
>>                  5        10        15        20        25        30
>>        1 M L A R T C L R S T R T F A S A K N G A F K F A K R S A S T
>>       31 Q S S G A A A E S P L R L N I A A A A A T A V A A G S I A W
>>       61 Y Y H L Y G F A S A M T P A E E G L H A T K Y P W V H E Q W
>>       91 L K T F D H Q A L R R G F Q V Y R E V C A S C H S L S R V P
>>      121 Y R A L V G T I L T V D E A K A L A E E N E Y D T E P N D Q
>>      151 G E I E K R P G K L S D Y L P D P Y K N D E A A R F A N N G
>>      181 A L P P D L S L I V K A R H G G C D Y I F S L L T G Y P D E
>>      211 P P A G A S V G A G L N F N P Y F P G T G I A M A R V L Y D
>>      241 G L V D Y E D G T P A S T S Q M A K D V V E F L N W A A E P
>>      271 E M D D R K R M G M K V L V V T S V L F A L S V Y V K R Y K
>>      301 W A W L K S R K I V Y D P P K S P P P A T N L A L P Q Q R A
>>      331 K S
>> ///
>>
>>
>> The package is as follows:
>> # $Id: pir.pm,v 1.4 2004/06/25 09:51:14 ldurufle Exp $
>> #
>> # BioPerl module for Bio::SeqIO::PIR
>> #
>> # Cared for by Aaron Mackey <amackey at virginia.edu>
>> #
>> # Copyright Aaron Mackey
>> #
>> # You may distribute this module under the same terms as perl itself
>> #
>> # _history
>> # October 18, 1999  Largely rewritten by Lincoln Stein
>>
>> # POD documentation - main docs before the code
>>
>> =head1 NAME
>>
>> Bio::SeqIO::pir - PIR sequence input/output stream
>>
>> =head1 SYNOPSIS
>>
>> Do not use this module directly.  Use it via the Bio::SeqIO class.
>>
>> =head1 DESCRIPTION
>>
>> This object can transform Bio::Seq objects to and from pir flat
>> file databases.
>>
>> Note: This does not completely preserve the PIR format - quality
>> information about sequence is currently discarded since bioperl
>> does not have a mechanism for handling these encodings in sequence
>> data.
>>
>> =head1 FEEDBACK
>>
>> =head2 Mailing Lists
>>
>> User feedback is an integral part of the evolution of this and other
>> Bioperl modules. Send your comments and suggestions preferably to one
>> of the Bioperl mailing lists.  Your participation is much appreciated.
>>
>>   bioperl-l at bioperl.org                 - General discussion
>>   http://www.bioperl.org/MailList.shtml - About the mailing lists
>>
>> =head2 Reporting Bugs
>>
>> Report bugs to the Bioperl bug tracking system to help us keep track
>> of the bugs and their resolution.
>> Bug reports can be submitted via email or the web:
>>
>>   bioperl-bugs at bio.perl.org
>>   http://bugzilla.bioperl.org/
>>
>> =head1 AUTHORS
>>
>> Aaron Mackey E<lt>amackey at virginia.eduE<gt>
>> Lincoln Stein E<lt>lstein at cshl.orgE<gt>
>> Jason Stajich E<lt>jason at bioperl.orgE<gt>
>>
>> =head1 APPENDIX
>>
>> The rest of the documentation details each of the object
>> methods. Internal methods are usually preceded with a _
>>
>> =cut
>>
>> # Let the code begin...
>>
>> package Bio::SeqIO::pir;
>> use vars qw(@ISA);
>> use strict;
>>
>> use Bio::SeqIO;
>> use Bio::Seq::SeqFactory;
>> use Bio::Species;
>> use Bio::Annotation::Collection;
>>
>> @ISA = qw(Bio::SeqIO);
>>
>> sub _initialize {
>>   my($self,@args) = @_;
>>   $self->SUPER::_initialize(@args);
>>   if( ! defined $self->sequence_factory ) {
>>       $self->sequence_factory(new Bio::Seq::SeqFactory
>>                         (-verbose => $self->verbose(),
>>                          -type => 'Bio::Seq::RichSeq'));
>>   }
>> }
>>
>> =head2 next_seq
>>
>>  Title   : next_seq
>>  Usage   : $seq = $stream->next_seq()
>>  Function: returns the next sequence in the stream
>>  Returns : Bio::Seq::RichSeq object
>>  Args    : NONE
>>
>> =cut
>>
>> sub next_seq {
>>     my ($self) = @_;
>>     #local($/)= "\n";
>>     my $line;
>>     my ($desc,$seq,$id,$org,$date,$acc_string,@sec,$acc);
>>     my ($annotation, %params, @features) =
>>         ( new Bio::Annotation::Collection );
>>
>>     while(defined($line = $self->_readline())) {
>>       last if index($line,'ENTRY       ') == 0;
>>     }
>>     return undef if( !defined $line ); # end of file
>>
>>     $line =~ /^ENTRY\s+(\S+)\s+/ ||
>>         $self->throw("Pir stream with bad ENTRY line. Not Pir in my book.");
>>     $id = $1;
>>     $params{'-display_id'} = $id;
>>
>>     until(defined ($line) && ($line =~ /^SEQUENCE/) ) {
>>
>>     # Description line(s)
>>       if ($line=~/^TITLE\s+(.*)/) {
>>       $desc = $1;
>>       }
>>       # organism line(s)
>>       if ($line=~/^ORGANISM\s+\#formal_name\s+(.*)/) {
>>       $org = $1;
>>       my @class =($org);
>>       my $make = Bio::Species->new();
>>       $make->classification(\@class,"FORCE"); # no name validation please
>>       $params{'-species'}= $make;
>>       }
>>       # date line
>>       if($line=~/^DATE\s+(\d\d-\w\w\w-\d\d\d\d).*/) {
>>       $date = $1;
>>       $date =~ s/\;//;
>>       $date =~ s/\s+$//;
>>       push @{$params{'-dates'}}, $date;
>>       }
>>       #accession
>>       if($line=~/^ACCESSIONS\s+(.*)/) {
>>       $seq = "";
>>       $acc_string =$1;
>>       $acc_string =~ s/\;\s*/ /g;
>>       ($acc,@sec) = split " ",$acc_string;
>>       }
>>
>>       $line = $self->_readline();
>>
>>     }
>>     my ($seqc,$seqn) = ("","");
>>     my $nb=0;
>>     while( defined ($line = $self->_readline) ) {
>>       if ($line=~/^\/\/\//) {last};
>>       if ($line=~/^\s+\d+\s+\d+/) {next};
>>       if ($line=~/^\s+\d+(.*)/) {
>>       $line=$1;
>>       }
>>       $seq   = uc($line);
>>       $seqc .= $seq;
>>     }
>>
>>     # P - indicates complete protein
>>     # F - indicates protein fragment
>>     # not sure how to stuff these into a Bio object
>>     # suitable for writing out.
>>     $seqc =~ s/\*//g;
>>     $seqc =~ s/[\(\)\.\/\=\,]//g;
>>     $seqc =~ s/\s+//g;        # get rid of whitespace
>>     $params{'-seq_version'} = '';
>>
>>     my ($alphabet) = ('protein');
>>     # TODO - not processing SFS data
>>     my $entry = $self->sequence_factory->create
>>       (-verbose  => $self->verbose,
>>        %params,
>>        -seq        => $seqc,
>>        -primary_id => $id,
>>        -id         => $id,
>>        -desc       => $desc,
>>        -alphabet    => $alphabet,
>>        -accession_number => $acc,
>>        -secondary_accessions => \@sec,
>>        );
>>
>>    return $entry;
>> }
>>
>>
>> =head2 write_seq
>>
>>  Title   : write_seq
>>  Usage   : $stream->write_seq(@seq)
>>  Function: writes the $seq object into the stream
>>  Returns : 1 for success and 0 for error
>>  Args    : Array of Bio::PrimarySeqI objects
>>
>>
>> =cut
>>
>> #sub write_seq {
>> #    my ($self, @seq) = @_;
>> #    for my $seq (@seq) {
>> #     $self->throw("Did not provide a valid Bio::PrimarySeqI object")
>> #         unless defined $seq && ref($seq) &&
>> #                $seq->isa('Bio::PrimarySeqI');
>> #     my $str = $seq->seq();
>> #     return unless $self->_print(">".$seq->id(),
>> #                           "\n", $seq->desc(), "\n",
>> #                           $str, "*\n");
>> #    }
>>
>> #    $self->flush if $self->_flush_on_write && defined $self->_fh;
>> #    return 1;
>> #}
>>
>> 1;
>>
>>
>>
>> Laure Durufle
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/                      http://www.ebi.ac.uk/mutations/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
>     _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
>    _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
>   _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
>      _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
> ___ _/_/_/_/_/________________________________________________________
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------
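
For readers following the thread, the line-by-line parsing that the quoted next_seq() performs can be sketched in plain Perl without BioPerl. This is only a rough illustration under stated assumptions: parse_pir_entry() and the hash keys it returns are hypothetical names, not part of Bio::SeqIO::pir, and like the quoted module it reads only the first line of each field and the first date on the DATE line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): parse the lines of one PIR
# entry and return a hash reference with the fields named in the thread.
sub parse_pir_entry {
    my @lines = @_;
    my %entry = (seq => '');
    my $in_seq = 0;

    for my $line (@lines) {
        last if $line =~ m{^///};                 # end-of-entry marker
        if ($in_seq) {
            next if $line =~ /^\s+\d+\s+\d+/;     # column ruler: 5 10 15 ...
            $line =~ s/^\s+\d+//;                 # drop leading residue number
            $line =~ s/\s+//g;                    # drop spaces between residues
            $entry{seq} .= uc $line;
        }
        elsif ($line =~ /^ENTRY\s+(\S+)/) {
            $entry{id} = $1;
        }
        elsif ($line =~ /^TITLE\s+(.*\S)/) {
            $entry{desc} = $1;                    # continuation lines ignored
        }
        elsif ($line =~ /^ORGANISM\s+#formal_name\s+(.*\S)/) {
            $entry{organism} = $1;
        }
        elsif ($line =~ /^DATE\s+(\d\d-\w\w\w-\d{4})/) {
            $entry{date} = $1;                    # first date only
        }
        elsif ($line =~ /^ACCESSIONS\s+(.*\S)/) {
            # first accession is primary, any others are secondary
            my ($acc, @sec) = split /[;\s]+/, $1;
            @entry{qw(accession secondary)} = ($acc, \@sec);
        }
        elsif ($line =~ /^SEQUENCE/) {
            $in_seq = 1;
        }
    }
    return \%entry;
}

# Shortened copy of the A27187 entry quoted in the thread.
my @sample = (
    'ENTRY           A27187  #type complete',
    'TITLE           ubiquinol-cytochrome-c reductase (EC 1.10.2.2) cytochrome c1',
    'ORGANISM        #formal_name Neurospora crassa',
    'DATE            05-Oct-1988 #sequence_revision 15-Oct-1994 #text_change',
    'ACCESSIONS      A27187',
    'SEQUENCE',
    '                 5        10        15        20        25        30',
    '       1 M L A R T C L R S T R T F A S A K N G A F K F A K R S A S T',
    '     331 K S',
    '///',
);

my $e = parse_pir_entry(@sample);
print "$e->{id} ($e->{organism}): ", length($e->{seq}), " residues\n";
```

In the module itself these values are passed to the sequence factory as -display_id, -desc, -species, -dates, -accession_number and -secondary_accessions rather than stored in a plain hash.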