[Bioperl-l] pir.pm => bug

Mon Jun 28 12:46:42 EDT 2004

Laure,

Thanks for the fix.

pir.pm has not been updated for a long time. Not many people work with the 
format. 

Before I apply your changes to the file, I'll summarise the major 
changes here so that others can comment:

- uses Bio::Species and Bio::Annotation::Collection
- uses Bio::Seq::RichSeq rather than Bio::Seq
- parses TITLE, ORGANISM, DATE, ACCESSIONS lines
- comments out method write_seq()

I do not know whether write_seq() is needed in the module, nor whether its 
removal is intentional.
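
For anyone commenting, the header parsing listed above can be illustrated 
with a minimal standalone sketch. It is modelled on the regular expressions 
in the quoted patch but does not use BioPerl at all; the function name 
parse_pir_header and the layout of the returned hash are made up for 
illustration and are not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): pull a few header fields out of
# the lines of one PIR entry, stopping at the SEQUENCE block.
sub parse_pir_header {
    my @lines = @_;
    my %entry = (dates => [], secondary => []);

    for my $line (@lines) {
        last if $line =~ /^SEQUENCE/;

        if ($line =~ /^ENTRY\s+(\S+)/) {
            $entry{id} = $1;
        }
        elsif ($line =~ /^TITLE\s+(.*)/) {
            $entry{desc} = $1;
        }
        elsif ($line =~ /^ORGANISM\s+\#formal_name\s+(.*)/) {
            $entry{organism} = $1;
        }
        elsif ($line =~ /^DATE\s+(\d\d-\w\w\w-\d\d\d\d)/) {
            # Only the first date on the line (the creation date) is kept.
            push @{ $entry{dates} }, $1;
        }
        elsif ($line =~ /^ACCESSIONS\s+(.*)/) {
            # Accessions are separated by ';': the first is primary,
            # any others are secondary.
            (my $acc_string = $1) =~ s/;\s*/ /g;
            my ($acc, @sec) = split ' ', $acc_string;
            $entry{accession} = $acc;
            $entry{secondary} = \@sec;
        }
    }
    return \%entry;
}

my $entry = parse_pir_header(
    "ENTRY           A27187  #type complete",
    "TITLE           ubiquinol-cytochrome-c reductase (EC 1.10.2.2) cytochrome c1",
    "ORGANISM        #formal_name Neurospora crassa",
    "DATE            05-Oct-1988 #sequence_revision 15-Oct-1994 #text_change",
    "ACCESSIONS      A27187",
    "SEQUENCE",
);
print "$entry->{id}: $entry->{organism} ($entry->{dates}[0])\n";
```

Like the patch, this sketch ignores continuation lines (a TITLE that runs 
over two lines keeps only its first line), so it shows the approach rather 
than a complete parser.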

	-Heikki   

On Friday 25 Jun 2004 06:05, Laure.Durufle at serono.com wrote:
> Hi,
>
>
> I modified the package pir.pm. Given a pir*.dat file, pir.pm can now
> parse it.
>
> The file looks like this:
>
>
>                 P R O T E I N  S E Q U E N C E  D A T A B A S E
>                              of PIR-International
>
>                       Section 1. Fully Classified Entries
>                          Release 79.01, April 04, 2004
>                        20685 sequences, 8103841 residues
>
>                        Protein Information Resource (PIR)*
>                     National Biomedical Research Foundation
>                           3900 Reservoir Road, N.W.,
>                           Washington, DC  20007, USA
>
>    Japan International Protein           Munich Information Center for
>    Information Database (JIPID)             Protein Sequences (MIPS)
>          Amakubo 1-16-1          GSF-Forschungszentrum f. Umwelt und Gesundheit
>     Tsukuba 305-0005, Japan            am Max-Planck-Institut f. Biochemie
>                                   Am Klopferspitz 18, D-82152 Martinsried, FRG
>
>    This database may be redistributed without prior consent, provided that
>    this notice be given to each user and that the words "Derived from" shall
>    precede this notice if the database has been altered by the redistributor.
>
>                        Copyright 2000, PIR-International.
>
>                        *PIR is a registered mark of NBRF.
> \\\
> ENTRY           A27187  #type complete
> TITLE           ubiquinol-cytochrome-c reductase (EC 1.10.2.2) cytochrome c1
>                 precursor - Neurospora crassa
> ALTERNATE_NAMES bc1 complex cytochrome c1; complex III cytochrome c1;
>                 cytochrome c1 heme protein
> ORGANISM        #formal_name Neurospora crassa
> DATE            05-Oct-1988 #sequence_revision 15-Oct-1994 #text_change
>                 03-Jun-2002
> ACCESSIONS      A27187
> REFERENCE       A27187
>    #authors     Roemisch, J.; Tropschug, M.; Sebald, W.; Weiss, H.
>    #journal     Eur. J. Biochem. (1987) 164:111-115
>    #title       The primary structure of cytochrome c-1 from Neurospora
>                 crassa.
>    #cross-references MUID:87161871; PMID:3030747
>    #accession   A27187
>       ##molecule_type mRNA
>       ##residues 1-332 ##label ROE
>       ##cross-references GB:X05235; NID:g3005; PIDN:CAA28860.1; PID:g3006
>       ##note the authors translated the codon AGT for residue 316 as Arg
> CLASSIFICATION  #superfamily cytochrome c1 heme protein; cytochrome c1 heme
>                 protein homology
> KEYWORDS        chromoprotein; electron transfer; heme; iron;
>                 metalloprotein; mitochondrion; oxidative phosphorylation;
>                 oxidoreductase; respiratory chain; transmembrane protein
> FEATURE
>    1-70                #domain transit peptide (mitochondrion) #status
>                        predicted #label TNP\
>    71-332              #product cytochrome c1 #status predicted #label MAT\
>    79-305              #domain cytochrome c1 heme protein homology #label
>                        C1H\
>    278-296             #domain transmembrane #status predicted #label TMM\
>    110,113             #binding_site heme (Cys) (covalent) #status
>                        predicted\
>    114,234             #binding_site heme iron (His, Met) (axial ligands)
>                        #status predicted
> SUMMARY         #length 332  #molecular-weight 36456  #checksum 1753
> SEQUENCE
>                  5        10        15        20        25        30
>        1 M L A R T C L R S T R T F A S A K N G A F K F A K R S A S T
>       31 Q S S G A A A E S P L R L N I A A A A A T A V A A G S I A W
>       61 Y Y H L Y G F A S A M T P A E E G L H A T K Y P W V H E Q W
>       91 L K T F D H Q A L R R G F Q V Y R E V C A S C H S L S R V P
>      121 Y R A L V G T I L T V D E A K A L A E E N E Y D T E P N D Q
>      151 G E I E K R P G K L S D Y L P D P Y K N D E A A R F A N N G
>      181 A L P P D L S L I V K A R H G G C D Y I F S L L T G Y P D E
>      211 P P A G A S V G A G L N F N P Y F P G T G I A M A R V L Y D
>      241 G L V D Y E D G T P A S T S Q M A K D V V E F L N W A A E P
>      271 E M D D R K R M G M K V L V V T S V L F A L S V Y V K R Y K
>      301 W A W L K S R K I V Y D P P K S P P P A T N L A L P Q Q R A
>      331 K S
> ///
>
>
> The modified package is:
> # $Id: pir.pm,v 1.4 2004/06/25 09:51:14 ldurufle Exp $
> #
> # BioPerl module for Bio::SeqIO::PIR
> #
> # Cared for by Aaron Mackey <amackey at virginia.edu>
> #
> # Copyright Aaron Mackey
> #
> # You may distribute this module under the same terms as perl itself
> #
> # _history
> # October 18, 1999  Largely rewritten by Lincoln Stein
>
> # POD documentation - main docs before the code
>
> =head1 NAME
>
> Bio::SeqIO::pir - PIR sequence input/output stream
>
> =head1 SYNOPSIS
>
> Do not use this module directly.  Use it via the Bio::SeqIO class.
>
> =head1 DESCRIPTION
>
> This object can transform Bio::Seq objects to and from pir flat
> file databases.
>
> Note: This does not completely preserve the PIR format - quality
> information about sequence is currently discarded since bioperl
> does not have a mechanism for handling these encodings in sequence
> data.
>
> =head1 FEEDBACK
>
> =head2 Mailing Lists
>
> User feedback is an integral part of the evolution of this and other
> Bioperl modules. Send your comments and suggestions preferably to one
> of the Bioperl mailing lists.  Your participation is much appreciated.
>
>   bioperl-l at bioperl.org                 - General discussion
>   http://www.bioperl.org/MailList.shtml - About the mailing lists
>
> =head2 Reporting Bugs
>
> Report bugs to the Bioperl bug tracking system to help us keep track
> of the bugs and their resolution.
> Bug reports can be submitted via email or the web:
>
>   bioperl-bugs at bio.perl.org
>   http://bugzilla.bioperl.org/
>
> =head1 AUTHORS
>
> Aaron Mackey E<lt>amackey at virginia.eduE<gt>
> Lincoln Stein E<lt>lstein at cshl.orgE<gt>
> Jason Stajich E<lt>jason at bioperl.orgE<gt>
>
> =head1 APPENDIX
>
> The rest of the documentation details each of the object
> methods. Internal methods are usually preceded with a _
>
> =cut
>
> # Let the code begin...
>
> package Bio::SeqIO::pir;
> use vars qw(@ISA);
> use strict;
>
> use Bio::SeqIO;
> use Bio::Seq::SeqFactory;
> use Bio::Species;
> use Bio::Annotation::Collection;
>
> @ISA = qw(Bio::SeqIO);
>
> sub _initialize {
>   my($self,@args) = @_;
>   $self->SUPER::_initialize(@args);
>   if( ! defined $self->sequence_factory ) {
>       $self->sequence_factory(new Bio::Seq::SeqFactory
>                         (-verbose => $self->verbose(),
>                          -type => 'Bio::Seq::RichSeq'));
>   }
> }
>
> =head2 next_seq
>
>  Title   : next_seq
>  Usage   : $seq = $stream->next_seq()
>  Function: returns the next sequence in the stream
>  Returns : Bio::Seq object
>  Args    : NONE
>
> =cut
>
> sub next_seq {
>     my ($self) = @_;
>     #local($/)= "\n";
>     my $line;
>     my ($desc,$seq,$id,$org,$date,$acc_string,@sec,$acc);
>     my ($annotation, %params, @features) = ( new Bio::Annotation::Collection);
>
>     while(defined($line = $self->_readline())) {
>       last if index($line,'ENTRY       ') == 0;
>     }
>     return undef if( !defined $line ); # end of file
>
>     $line =~ /^ENTRY\s+(\S+)\s+/ ||
>         $self->throw("Pir stream with bad ENTRY line. Not Pir in my book.");
>     $id = $1;
>     $params{'-display_id'} = $id;
>
>     # Read header lines until the SEQUENCE block; also stop at end of
>     # input so a truncated entry cannot loop forever.
>     while( defined($line) && $line !~ /^SEQUENCE/ ) {
>
>       # Description line(s)
>       if ($line=~/^TITLE\s+(.*)/) {
>           $desc = $1;
>       }
>       # organism line(s)
>       if ($line=~/^ORGANISM\s+\#formal_name\s+(.*)/) {
>           $org = $1;
>           my @class =($org);
>           my $make = Bio::Species->new();
>           $make->classification(\@class,"FORCE"); # no name validation please
>           $params{'-species'}= $make;
>       }
>       # date line
>       if($line=~/^DATE\s+(\d\d-\w\w\w-\d\d\d\d).*/) {
>           $date = $1;
>           $date =~ s/\;//;
>           $date =~ s/\s+$//;
>           push @{$params{'-dates'}}, $date;
>       }
>       # accession
>       if($line=~/^ACCESSIONS\s+(.*)/) {
>           $seq = "";
>           $acc_string =$1;
>           $acc_string =~ s/\;\s*/ /g;
>           ($acc,@sec) = split " ",$acc_string;
>       }
>
>       $line = $self->_readline();
>
>     }
>     my ($seqc,$seqn) = ("","");
>     my $nb=0;
>     while( defined ($line = $self->_readline) ) {
>       if ($line=~/^\/\/\//) {last};
>       if ($line=~/^\s+\d+\s+\d+/) {next};
>       if ($line=~/^\s+\d+(.*)/) {
>       $line=$1;
>       }
>       $seq   = uc($line);
>       $seqc .= $seq;
>     }
>
>     # P - indicates complete protein
>     # F - indicates protein fragment
>     # not sure how to stuff these into a Bio object
>     # suitable for writing out.
>     $seqc =~ s/\*//g;
>     $seqc =~ s/[\(\)\.\/\=\,]//g;
>     $seqc =~ s/\s+//g;        # get rid of whitespace
>     $params{'-seq_version'} = '';
>
>     my ($alphabet) = ('protein');
>     # TODO - not processing SFS data
>     my $entry = $self->sequence_factory->create
>       (-verbose  => $self->verbose,
>        %params,
>        -seq        => $seqc,
>        -primary_id => $id,
>        -id         => $id,
>        -desc       => $desc,
>        -alphabet    => $alphabet,
>        -accession_number => $acc,
>        -secondary_accessions => \@sec,
>        );
>
>    return $entry;
> }
>
>
> =head2 write_seq
>
>  Title   : write_seq
>  Usage   : $stream->write_seq(@seq)
>  Function: writes the $seq object into the stream
>  Returns : 1 for success and 0 for error
>  Args    : Array of Bio::PrimarySeqI objects
>
>
> =cut
>
> #sub write_seq {
> #    my ($self, @seq) = @_;
> #    for my $seq (@seq) {
> #     $self->throw("Did not provide a valid Bio::PrimarySeqI object")
> #         unless defined $seq && ref($seq) && $seq->isa('Bio::PrimarySeqI');
> #     my $str = $seq->seq();
> #     return unless $self->_print(">".$seq->id(),
> #                           "\n", $seq->desc(), "\n",
> #                           $str, "*\n");
> #    }
>
> #    $self->flush if $self->_flush_on_write && defined $self->_fh;
> #    return 1;
> #}
>
> 1;
>
>
>
> Laure Durufle

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________