[Bioperl-l] multiple species in embl

Heikki Lehvaslaiho heikki at ebi.ac.uk
Tue Jul 13 09:34:12 EDT 2004


Laure,

By two species, do you mean hybrid animals?  That is the only case where there 
should be more than one species in an EMBL entry:

http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3.4.7

Even in that case, the OC line is present only for the first species. 

I am not quite sure what bioperl should return in that case. Returning two 
species objects sounds a bit excessive when the second one is not fully 
populated ...

It is a long-known problem that the SWISS-PROT format allows multiple species 
per entry. Bioperl has been taking in only one; the first, I think.
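
To make the difference concrete, here is a minimal sketch in plain Perl of the 
change described in the quoted message below. It is not the actual embl.pm 
parser code: the organism names and the %overwrite / %collect hashes are 
made up for illustration, and only the '-species' key comes from the report.

```perl
use strict;
use warnings;

# Hypothetical organism names, as if read from two OS lines of one entry.
my @os_lines = ('Mus musculus', 'Rattus norvegicus');

# Plain assignment: each OS line overwrites the previous value,
# so only the last organism survives.
my %overwrite;
$overwrite{'-species'} = $_ for @os_lines;

# push onto an array reference: every organism is kept, in file order.
my %collect;
push @{ $collect{'-species'} }, $_ for @os_lines;

print "assignment keeps: $overwrite{'-species'}\n";
print "push keeps:       ", join(', ', @{ $collect{'-species'} }), "\n";
```

With the push variant the value under '-species' becomes an array reference, 
so whatever consumes it has to accept a list rather than a single species.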

Could you send us some EMBL accession numbers with two species, please, so 
that we can have a look?

	-Heikki

P.S. Long bug reports and file attachments like these go best into the bioperl 
bugzilla: http://bugzilla.open-bio.org/. They are easier to manage there. 

Thanks,

	-H

On Monday 12 Jul 2004 18:00, Laure.Durufle at serono.com wrote:
> Hi,
>
> I noticed something: in the package embl.pm, the method species
> returns only the last organism, but in EMBL, one entry can belong to 2
> organisms.
> I wrote a method get_species in RichSeq.pm to obtain all organisms, and in
> embl.pm we use  push @{$params{'-species'}},$species ;   instead of
> $params{'-species'} = $species ;
>
> # $Id: RichSeq.pm,v 1.9 2002/11/11 18:16:31 lapp Exp $
> #
> # BioPerl module for Bio::Seq::RichSeq
> #
> # Cared for by Ewan Birney <birney at ebi.ac.uk>
> #
> # Copyright Ewan Birney
> #
> # You may distribute this module under the same terms as perl itself
>
> # POD documentation - main docs before the code
>
> =head1 NAME
>
> Bio::Seq::RichSeq - Module implementing a sequence created from a rich
> sequence database entry
>
> =head1 SYNOPSIS
>
> See Bio::Seq::RichSeqI and documentation of methods.
>
> =head1 DESCRIPTION
>
> This module implements Bio::Seq::RichSeqI, an interface for sequences
> created from or created for entries from/of rich sequence databanks,
> like EMBL, GenBank, and SwissProt. Methods added to the Bio::SeqI
> interface therefore focus on databank-specific information. Note that
> not every rich databank format may use all of the properties provided.
>
> =head1 Implemented Interfaces
>
> This class implements the following interfaces.
>
> =over 4
>
> =item Bio::Seq::RichSeqI
>
> Note that this includes implementing Bio::PrimarySeqI and Bio::SeqI.
>
> =item Bio::IdentifiableI
>
> =item Bio::DescribableI
>
> =item Bio::AnnotatableI
>
> =back
>
> =head1 FEEDBACK
>
> =head2 Mailing Lists
>
> User feedback is an integral part of the evolution of this
> and other Bioperl modules. Send your comments and suggestions preferably
>  to one of the Bioperl mailing lists.
> Your participation is much appreciated.
>
>   bioperl-l at bioperl.org                 - General discussion
>   http://bio.perl.org/MailList.html             - About the mailing lists
>
> =head2 Reporting Bugs
>
> Report bugs to the Bioperl bug tracking system to help us keep track
>  the bugs and their resolution.
>  Bug reports can be submitted via email or the web:
>
>   bioperl-bugs at bio.perl.org
>   http://bugzilla.bioperl.org/
>
> =head1 AUTHOR - Ewan Birney
>
> Email birney at ebi.ac.uk
>
> Describe contact details here
>
> =head1 APPENDIX
>
> The rest of the documentation details each of the object methods. Internal
> methods are usually preceded with a _
>
> =cut
>
>
> # Let the code begin...
>
>
> package Bio::Seq::RichSeq;
> use vars qw($AUTOLOAD @ISA);
> use strict;
>
> # Object preamble - inherits from Bio::Root::Object
>
> use Bio::Seq;
> use Bio::Seq::RichSeqI;
> use Data::Denter;
>
> @ISA = qw(Bio::Seq Bio::Seq::RichSeqI);
>
>
> =head2 new
>
>  Title   : new
>  Usage   : $seq    = Bio::Seq::RichSeq->new( -seq => 'ATGGGGGTGGTGGTACCCT',
>                                              -id  => 'human_id',
>                                      -accession_number => 'AL000012',
>                                     );
>
>  Function: Returns a new seq object from
>            basic constructors, being a string for the sequence
>            and strings for id and accession_number
>  Returns : a new Bio::Seq::RichSeq object
>
> =cut
>
> sub new {
>     # standard new call..
>     my($caller,@args) = @_;
>     my $self = $caller->SUPER::new(@args);
>
>     $self->{'_dates'} = [];
>     $self->{'_secondary_accession'} = [];
>     $self->{'_species'} = [];
>
>     my ($dates, $xtra, $sv,
>       $keywords, $pid, $mol,
>       $division,$species ) = $self->_rearrange([qw(DATES
>                                  SECONDARY_ACCESSIONS
>                                  SEQ_VERSION
>                                  KEYWORDS
>                                  PID
>                                  MOLECULE
>                                  DIVISION
>                                  SPECIES
>                                  )],
>                            @args);
>     defined $division && $self->division($division);
>     defined $mol && $self->molecule($mol);
>     defined $keywords && $self->keywords($keywords);
>     defined $sv && $self->seq_version($sv);
>     defined $pid && $self->pid($pid);
>     #defined $pid && $self->species($pid);
>
>     if( defined $dates ) {
>       if( ref($dates) =~ /array/i ) {
>           foreach ( @$dates) {
>             $self->add_date($_);
>           }
>       } else {
>           $self->add_date($dates);
>       }
>     }
>
>     if( defined $species ) {
>       if( ref($species) =~ /array/i ) {
>           foreach ( @$species) {
>             $self->add_species($_);
>           }
>       } else {
>           $self->add_species($species);
>       }
>     }
>
>
>     if( defined $xtra ) {
>       if( ref($xtra) =~ /array/i ) {
>           foreach ( @$xtra) {
>             $self->add_secondary_accession($_);
>           }
>       } else {
>           $self->add_secondary_accession($xtra);
>       }
>     }
>
>     return $self;
> }
>
>
> =head2 division
>
>  Title   : division
>  Usage   : $obj->division($newval)
>  Function:
>  Returns : value of division
>  Args    : newvalue (optional)
>
>
> =cut
>
> sub division {
>    my $obj = shift;
>    if( @_ ) {
>       my $value = shift;
>       $obj->{'_division'} = $value;
>     }
>     return $obj->{'_division'};
>
> }
>
> =head2 molecule
>
>  Title   : molecule
>  Usage   : $obj->molecule($newval)
>  Function:
>  Returns : type of molecule (DNA, mRNA)
>  Args    : newvalue (optional)
>
>
> =cut
>
> sub molecule {
>    my $obj = shift;
>    if( @_ ) {
>       my $value = shift;
>       $obj->{'_molecule'} = $value;
>     }
>     return $obj->{'_molecule'};
>
> }
>
>
> =head2 add_species
>
>  Title   : add_species
>  Usage   : $self->add_species($species)
>  Function: adds a species
>  Example :
>  Returns :  an array of such strings
>  Args    :
>
>
> =cut
>
> sub add_species {
>    my ($self,@species) = @_;
>    foreach my $dt ( @species ) {
>        push(@{$self->{'_species'}},$dt);
>    }
> }
>
> =head2 get_species
>
>  Title   : get_species
>  Usage   :
>  Function:
>  Example :
>  Returns : an array of strings
>  Args    :
>
>
> =cut
>
> sub get_species{
>    my ($self) = @_;
>    return @{$self->{'_species'}};
> }
>
>
> =head2 add_date
>
>  Title   : add_date
>  Usage   : $self->add_date($datestr)
>  Function: adds a date
>  Example :
>  Returns : a date string or an array of such strings
>  Args    :
>
>
> =cut
>
>
>
> sub add_date {
>    my ($self,@dates) = @_;
>    foreach my $dt ( @dates ) {
>        push(@{$self->{'_dates'}},$dt);
>    }
> }
>
> =head2 get_dates
>
>  Title   : get_dates
>  Usage   :
>  Function:
>  Example :
>  Returns : an array of date strings
>  Args    :
>
>
> =cut
>
> sub get_dates{
>    my ($self) = @_;
>    return @{$self->{'_dates'}};
> }
>
>
> =head2 pid
>
>  Title   : pid
>  Usage   :
>  Function: Get (and set, depending on the implementation) the PID property
>            for the sequence.
>  Example :
>  Returns : a string
>  Args    :
>
>
> =cut
>
> sub pid {
>     my ($self,$pid) = @_;
>
>     if(defined($pid)) {
>       $self->{'_pid'} = $pid;
>     }
>     return $self->{'_pid'};
> }
>
>
> =head2 accession
>
>  Title   : accession
>  Usage   : $obj->accession($newval)
>  Function: The underlying sequence object does not
>            have an accession, so we need one here.
>
>            In this implementation this is merely a synonym for
>            accession_number().
>  Example :
>  Returns : value of accession
>  Args    : newvalue (optional)
>
>
> =cut
>
> sub accession {
>    my ($obj,@args) = @_;
>    return $obj->accession_number(@args);
> }
>
> =head2 add_secondary_accession
>
>  Title   : add_secondary_accession
>  Usage   : $self->add_secondary_accession($ref)
>  Function: adds a secondary_accession
>  Example :
>  Returns :
>  Args    : a string or an array of strings
>
>
> =cut
>
> sub add_secondary_accession {
>    my ($self) = shift;
>    foreach my $dt ( @_ ) {
>        push(@{$self->{'_secondary_accession'}},$dt);
>    }
> }
>
> =head2 get_secondary_accessions
>
>  Title   : get_secondary_accessions
>  Usage   :
>  Function:
>  Example :
>  Returns : An array of strings
>  Args    :
>
>
> =cut
>
> sub get_secondary_accessions{
>    my ($self,@args) = @_;
>    return @{$self->{'_secondary_accession'}};
> }
>
> =head2 seq_version
>
>  Title   : seq_version
>  Usage   : $obj->seq_version($newval)
>  Function:
>  Example :
>  Returns : value of seq_version
>  Args    : newvalue (optional)
>
>
> =cut
>
> sub seq_version{
>    my ($obj,$value) = @_;
>    if( defined $value) {
>       $obj->{'_seq_version'} = $value;
>     }
>     return $obj->{'_seq_version'};
>
> }
>
>
> =head2 keywords
>
>  Title   : keywords
>  Usage   : $obj->keywords($newval)
>  Function:
>  Returns : value of keywords (a string)
>  Args    : newvalue (optional) (a string)
>
>
> =cut
>
> sub keywords {
>    my $obj = shift;
>    if( @_ ) {
>       my $value = shift;
>       $obj->{'_keywords'} = $value;
>     }
>     return $obj->{'_keywords'};
>
> }
>
> #
> ##
> ### Deprecated methods kept for ease of transition
> ##
> #
>
> sub each_date {
>    my ($self) = @_;
>    $self->warn("Deprecated method... please use get_dates");
>    return $self->get_dates;
> }
>
>
> sub each_secondary_accession {
>    my ($self) = @_;
>    $self->warn("each_secondary_accession - deprecated method. use get_secondary_accessions");
>    return $self->get_secondary_accessions;
>
> }
>
> sub sv {
>    my ($obj,$value) = @_;
>    $obj->warn("sv - deprecated method. use seq_version");
>    $obj->seq_version($value);
> }
>
>
> 1;
>
>
>
>
> Best regards
>
> Laure Durufle

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

