[Bioperl-l] pir.pm => bug
Heikki Lehvaslaiho
heikki at ebi.ac.uk
Mon Jun 28 12:46:42 EDT 2004
Laure,
Thanks for the fix.
pir.pm has not been updated for a long time. Not many people work with the
format.
Before I apply your changes to the file, I'll summarise the major
changes here so that others can comment:
- uses Bio::Species and Bio::Annotation::Collection
- uses Bio::Seq::RichSeq rather than Bio::Seq
- parses TITLE, ORGANISM, DATE, ACCESSIONS lines
- comments out method write_seq()
I do not know whether write_seq() is needed in the module, nor whether its
removal is intentional.
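
For readers who do not know the format, the header parsing summarised in the
list above can be sketched in plain Perl. This is only an illustration under
stated assumptions, not the code from the patch: the helper name
parse_pir_header and the hash keys are hypothetical, only the first TITLE line
and the first date on the DATE line are kept, and no BioPerl objects are built.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::SeqIO::pir: extract the header
# fields of one PIR entry that is passed in as a single string.
sub parse_pir_header {
    my ($text) = @_;
    my %rec = (dates => [], secondary => []);
    for my $line (split /\n/, $text) {
        last if $line =~ /^SEQUENCE/;          # the header ends here
        if ($line =~ /^ENTRY\s+(\S+)/) {
            $rec{id} = $1;
        }
        elsif ($line =~ /^TITLE\s+(.*\S)/) {
            $rec{title} = $1;                  # first TITLE line only
        }
        elsif ($line =~ /^ORGANISM\s+#formal_name\s+(.*\S)/) {
            $rec{organism} = $1;
        }
        elsif ($line =~ /^DATE\s+(\d\d-\w\w\w-\d\d\d\d)/) {
            push @{ $rec{dates} }, $1;         # first date on the line only
        }
        elsif ($line =~ /^ACCESSIONS\s+(.*\S)/) {
            my $acc_string = $1;               # copy before the next regex runs
            my ($primary, @rest) = split /;\s*|\s+/, $acc_string;
            $rec{accession} = $primary;
            $rec{secondary} = \@rest;
        }
    }
    return \%rec;
}

# Header lines taken from the A27187 entry quoted below.
my $entry = join "\n",
    'ENTRY            A27187     #type complete',
    'TITLE            ubiquinol-cytochrome-c reductase (EC 1.10.2.2) cytochrome c1',
    'ORGANISM         #formal_name Neurospora crassa',
    'DATE             05-Oct-1988 #sequence_revision 15-Oct-1994',
    'ACCESSIONS       A27187',
    'SEQUENCE';

my $rec = parse_pir_header($entry);
print "$rec->{id}\t$rec->{organism}\t$rec->{dates}[0]\n";
```

Within BioPerl the module itself is not called directly but reached through
Bio::SeqIO, for example Bio::SeqIO->new(-file => 'pir1.dat', -format => 'pir'),
where the file name is only a placeholder.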
-Heikki
On Friday 25 Jun 2004 06:05, Laure.Durufle at serono.com wrote:
> Hi,
>
>
> I modified the package pir.pm. Given a file such as pir*.dat, pir.pm can
> now parse it.
>
> The file has this format:
>
>
> P R O T E I N S E Q U E N C E D A T A B A S E
> of PIR-International
>
> Section 1. Fully Classified Entries
> Release 79.01, April 04, 2004
> 20685 sequences, 8103841 residues
>
> Protein Information Resource (PIR)*
> National Biomedical Research Foundation
> 3900 Reservoir Road, N.W.,
> Washington, DC 20007, USA
>
> Japan International Protein Information Database (JIPID)
> Amakubo 1-16-1
> Tsukuba 305-0005, Japan
>
> Munich Information Center for Protein Sequences (MIPS)
> GSF-Forschungszentrum f. Umwelt und Gesundheit
> am Max-Planck-Instut f. Biochemie
> Am Klopferspitz 18, D-82152 Martinsried, FRG
>
> This database may be redistributed without prior consent, provided that
> this notice be given to each user and that the words "Derived from" shall
> precede this notice if the database has been altered by the redistributor.
>
> Copyright 2000, PIR-International.
>
> *PIR is a registered mark of NBRF.
> \\\
> ENTRY A27187 #type complete
> TITLE ubiquinol-cytochrome-c reductase (EC 1.10.2.2) cytochrome
> c1
> precursor - Neurospora crassa
> ALTERNATE_NAMES bc1 complex cytochrome c1; complex III cytochrome c1;
> cytochrome c1 heme protein
> ORGANISM #formal_name Neurospora crassa
> DATE 05-Oct-1988 #sequence_revision 15-Oct-1994 #text_change
> 03-Jun-2002
> ACCESSIONS A27187
> REFERENCE A27187
> #authors Roemisch, J.; Tropschug, M.; Sebald, W.; Weiss, H.
> #journal Eur. J. Biochem. (1987) 164:111-115
> #title The primary structure of cytochrome c-1 from Neurospora
> crassa.
> #cross-references MUID:87161871; PMID:3030747
> #accession A27187
> ##molecule_type mRNA
> ##residues 1-332 ##label ROE
> ##cross-references GB:X05235; NID:g3005; PIDN:CAA28860.1; PID:g3006
> ##note the authors translated the codon AGT for residue 316 as Arg
> CLASSIFICATION #superfamily cytochrome c1 heme protein; cytochrome c1 heme
> protein homology
> KEYWORDS chromoprotein; electron transfer; heme; iron;
> metalloprotein; mitochondrion; oxidative phosphorylation;
> oxidoreductase; respiratory chain; transmembrane protein
> FEATURE
> 1-70 #domain transit peptide (mitochondrion) #status
> predicted #label TNP\
> 71-332 #product cytochrome c1 #status predicted #label MAT\
> 79-305 #domain cytochrome c1 heme protein homology #label
> C1H\
> 278-296 #domain transmembrane #status predicted #label TMM\
> 110,113 #binding_site heme (Cys) (covalent) #status
> predicted\
> 114,234 #binding_site heme iron (His, Met) (axial ligands)
> #status predicted
> SUMMARY #length 332 #molecular-weight 36456 #checksum 1753
> SEQUENCE
> 5 10 15 20 25 30
> 1 M L A R T C L R S T R T F A S A K N G A F K F A K R S A S T
> 31 Q S S G A A A E S P L R L N I A A A A A T A V A A G S I A W
> 61 Y Y H L Y G F A S A M T P A E E G L H A T K Y P W V H E Q W
> 91 L K T F D H Q A L R R G F Q V Y R E V C A S C H S L S R V P
> 121 Y R A L V G T I L T V D E A K A L A E E N E Y D T E P N D Q
> 151 G E I E K R P G K L S D Y L P D P Y K N D E A A R F A N N G
> 181 A L P P D L S L I V K A R H G G C D Y I F S L L T G Y P D E
> 211 P P A G A S V G A G L N F N P Y F P G T G I A M A R V L Y D
> 241 G L V D Y E D G T P A S T S Q M A K D V V E F L N W A A E P
> 271 E M D D R K R M G M K V L V V T S V L F A L S V Y V K R Y K
> 301 W A W L K S R K I V Y D P P K S P P P A T N L A L P Q Q R A
> 331 K S
> ///
>
>
> The package is as follows:
> # $Id: pir.pm,v 1.4 2004/06/25 09:51:14 ldurufle Exp $
> #
> # BioPerl module for Bio::SeqIO::PIR
> #
> # Cared for by Aaron Mackey <amackey at virginia.edu>
> #
> # Copyright Aaron Mackey
> #
> # You may distribute this module under the same terms as perl itself
> #
> # _history
> # October 18, 1999 Largely rewritten by Lincoln Stein
>
> # POD documentation - main docs before the code
>
> =head1 NAME
>
> Bio::SeqIO::pir - PIR sequence input/output stream
>
> =head1 SYNOPSIS
>
> Do not use this module directly. Use it via the Bio::SeqIO class.
>
> =head1 DESCRIPTION
>
> This object can transform Bio::Seq objects to and from pir flat
> file databases.
>
> Note: This does not completely preserve the PIR format - quality
> information about sequence is currently discarded since bioperl
> does not have a mechanism for handling these encodings in sequence
> data.
>
> =head1 FEEDBACK
>
> =head2 Mailing Lists
>
> User feedback is an integral part of the evolution of this and other
> Bioperl modules. Send your comments and suggestions preferably to one
> of the Bioperl mailing lists. Your participation is much appreciated.
>
> bioperl-l at bioperl.org - General discussion
> http://www.bioperl.org/MailList.shtml - About the mailing lists
>
> =head2 Reporting Bugs
>
> Report bugs to the Bioperl bug tracking system to help us keep track of
> the bugs and their resolution.
> Bug reports can be submitted via email or the web:
>
> bioperl-bugs at bio.perl.org
> http://bugzilla.bioperl.org/
>
> =head1 AUTHORS
>
> Aaron Mackey E<lt>amackey at virginia.eduE<gt>
> Lincoln Stein E<lt>lstein at cshl.orgE<gt>
> Jason Stajich E<lt>jason at bioperl.orgE<gt>
>
> =head1 APPENDIX
>
> The rest of the documentation details each of the object
> methods. Internal methods are usually preceded with a _
>
> =cut
>
> # Let the code begin...
>
> package Bio::SeqIO::pir;
> use vars qw(@ISA);
> use strict;
>
> use Bio::SeqIO;
> use Bio::Seq::SeqFactory;
> use Bio::Species;
> use Bio::Annotation::Collection;
>
> @ISA = qw(Bio::SeqIO);
>
> sub _initialize {
> my($self,@args) = @_;
> $self->SUPER::_initialize(@args);
> if( ! defined $self->sequence_factory ) {
> $self->sequence_factory(new Bio::Seq::SeqFactory
> (-verbose => $self->verbose(),
> -type => 'Bio::Seq::RichSeq'));
> }
> }
>
> =head2 next_seq
>
> Title : next_seq
> Usage : $seq = $stream->next_seq()
> Function: returns the next sequence in the stream
> Returns : Bio::Seq::RichSeq object
> Args : NONE
>
> =cut
>
> sub next_seq {
> my ($self) = @_;
> #local($/)= "\n";
> my $line;
> my ($desc,$seq,$id,$org,$date,$acc_string,@sec,$acc);
> my ($annotation, %params, @features) = ( new Bio::Annotation::Collection);
>
> while(defined($line = $self->_readline())) {
> last if index($line,'ENTRY ') == 0;
> }
> return undef if( !defined $line ); # end of file
>
> $line =~ /^ENTRY\s+(\S+)\s+/ ||
> $self->throw("Pir stream with bad ENTRY line. Not Pir in my book.");
> $id = $1;
> $params{'-display_id'} = $id;
>
> # stop at the SEQUENCE line, or at end of file instead of looping forever
> while( defined($line) && $line !~ /^SEQUENCE/ ) {
>
> # Description line(s)
> if ($line=~/^TITLE\s+(.*)/) {
> $desc = $1;
> }
> # organism line(s)
> if ($line=~/^ORGANISM\s+\#formal_name\s+(.*)/) {
> $org = $1;
> my @class =($org);
> my $make = Bio::Species->new();
> $make->classification(\@class,"FORCE"); # no name validation please
> $params{'-species'}= $make;
> }
> # date line
> if($line=~/^DATE\s+(\d\d-\w\w\w-\d\d\d\d).*/) {
> $date = $1;
> $date =~ s/\;//;
> $date =~ s/\s+$//;
> push @{$params{'-dates'}}, $date;
> }
> #accession
> if($line=~/^ACCESSIONS\s+(.*)/) {
> $seq = "";
> $acc_string =$1;
> $acc_string =~ s/\;\s*/ /g;
> ($acc,@sec) = split " ",$acc_string;
> }
>
> $line = $self->_readline();
>
> }
> my ($seqc,$seqn) = ("","");
> my $nb=0;
> while( defined ($line = $self->_readline) ) {
> if ($line=~/^\/\/\//) {last};
> if ($line=~/^\s+\d+\s+\d+/) {next};
> if ($line=~/^\s+\d+(.*)/) {
> $line=$1;
> }
> $seq = uc($line);
> $seqc .= $seq;
> }
>
> # P - indicates complete protein
> # F - indicates protein fragment
> # not sure how to stuff these into a Bio object
> # suitable for writing out.
> $seqc =~ s/\*//g;
> $seqc =~ s/[\(\)\.\/\=\,]//g;
> $seqc =~ s/\s+//g; # get rid of whitespace
> $params{'-seq_version'} = '';
>
> my ($alphabet) = ('protein');
> # TODO - not processing SFS data
> my $entry = $self->sequence_factory->create
> (-verbose => $self->verbose,
> %params,
> -seq => $seqc,
> -primary_id => $id,
> -id => $id,
> -desc => $desc,
> -alphabet => $alphabet,
> -accession_number => $acc,
> -secondary_accessions => \@sec,
> );
>
> return $entry;
> }
>
>
> =head2 write_seq
>
> Title : write_seq
> Usage : $stream->write_seq(@seq)
> Function: writes the $seq object into the stream
> Returns : 1 for success and 0 for error
> Args : Array of Bio::PrimarySeqI objects
>
>
> =cut
>
> #sub write_seq {
> # my ($self, @seq) = @_;
> # for my $seq (@seq) {
> # $self->throw("Did not provide a valid Bio::PrimarySeqI object")
> # unless defined $seq && ref($seq) && $seq->isa('Bio::PrimarySeqI');
> # my $str = $seq->seq();
> # return unless $self->_print(">".$seq->id(),
> # "\n", $seq->desc(), "\n",
> # $str, "*\n");
> # }
>
> # $self->flush if $self->_flush_on_write && defined $self->_fh;
> # return 1;
> #}
>
> 1;
>
>
>
> Laure Durufle
>
>
>
>
>
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambridge, CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________