[Bioperl-l] Bio::Seq::RichSeq error

Heikki Lehvaslaiho heikki at nildram.co.uk
Thu Dec 11 05:45:33 EST 2003


Fang,

It is not a bug but a feature. In EMBL, GenBank and Swiss-Prot parsers you'll 
find these lines:

    # Don't make a species object if it's empty or "Unknown" or "None"
    return unless $genus and  $genus !~ /^(Unknown|None)$/oi;
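
To make the effect of that guard concrete, here is a minimal sketch that applies the same test to a few first words. The helper name would_make_species is hypothetical and is not part of BioPerl; it only mirrors the condition quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl function: returns 1 when the guard
# quoted above would let the parser build a species object, 0 otherwise.
sub would_make_species {
    my ($genus) = @_;
    return 0 unless $genus and $genus !~ /^(Unknown|None)$/i;
    return 1;
}

# 'unknown' is the first word of the ORGANISM line in the test record.
for my $genus ('Homo', 'unknown', 'None', '') {
    printf "%-10s %s\n", "'$genus'",
        would_make_species($genus) ? 'species object' : 'no species object';
}
```

Because the match is case-insensitive, the lower-case "unknown" in the test record is rejected just like "Unknown", which is why species() comes back undefined.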

There are 58 entries with Unknown as the first word in the OS line in the 
current EMBL databank. It would not be too difficult to modify the parsers to 
include these, but would it be useful, and how should it be done?

binomial() should return a valid scientific name, so we should not use 
species, I guess. Higher taxa might be of some use. We already have one 
exception written in for Viri, but these unknown species are even fuzzier.

If someone can come up with a plan, I am happy to code it in.
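
In the meantime the caller can check for the undefined value itself. This is only a sketch under assumptions: species_label is a hypothetical helper, the 'unknown species' fallback string is my own choice, and the file name is taken from the command line rather than hard-coded.

```perl
use strict;
use warnings;

# Hypothetical helper: return the binomial name when the parser built a
# species object, and a placeholder string when species() is undefined.
sub species_label {
    my ($seq) = @_;
    my $species = $seq->species;
    return defined $species ? $species->binomial : 'unknown species';
}

# Only runs when a GenBank file is given, e.g.  perl label.pl test.gbk
if (@ARGV) {
    require Bio::SeqIO;    # loaded lazily; needs BioPerl installed
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        print $seq->display_id, "\t", species_label($seq), "\n";
    }
}
```

This does not recover the organism name from the record; it only avoids calling a method on an undefined value.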

	-Heikki


On Thursday 11 Dec 2003 6:59 am, Magic Fang wrote:
> test file:
> LOCUS       AY007677                1433 bp    DNA     linear   BCT 29-OCT-2001
> DEFINITION  Unknown marine alpha proteobacterium JP66.1 16S ribosomal RNA,
>             partial sequence.
> ACCESSION   AY007677
> VERSION     AY007677.1  GI:12000363
> KEYWORDS    .
> SOURCE      unknown marine alpha proteobacterium JP66.1
>   ORGANISM  unknown marine alpha proteobacterium JP66.1
>             Bacteria; Proteobacteria; Alphaproteobacteria.
> REFERENCE   1  (bases 1 to 1433)
>   AUTHORS   Eilers,H., Pernthaler,J., Peplies,J., Glockner,F.O., Gerdts,G.
>             and Amann,R.
>   TITLE     Isolation of novel pelagic bacteria from the German bight and
>             their seasonal contributions to surface picoplankton
>   JOURNAL   Appl. Environ. Microbiol. 67 (11), 5134-5142 (2001)
>   MEDLINE   21536174
>    PUBMED   11679337
> REFERENCE   2  (bases 1 to 1433)
>   AUTHORS   Eilers,H., Pernthaler,J., Peplies,J., Gloeckner,F.O.,
>             Gerdts,G., Schuett,C. and Amann,R.
>   TITLE     Identification and seasonal dominance of culturable marine
>             bacteria
>   JOURNAL   Unpublished
> REFERENCE   3  (bases 1 to 1433)
>   AUTHORS   Eilers,H., Pernthaler,J., Peplies,J., Gloeckner,F.O.,
>             Gerdts,G., Schuett,C. and Amann,R.
>   TITLE     Direct Submission
>   JOURNAL   Submitted (30-AUG-2000) Molecular Ecology,
>             Max-Planck-Institute, Celsiusstrasse 1, Bremen 28359, Germany
> FEATURES             Location/Qualifiers
>      source          1..1433
>                      /organism="unknown marine alpha proteobacterium
>                      JP66.1"
>                      /mol_type="genomic DNA"
>                      /db_xref="taxon:145652"
>      rRNA            <1..>1433
>                      /product="16S ribosomal RNA"
> ORIGIN
>         1 tcatggctca gaacgaacgc tggcggcagg cttaacacat gcaagtcgaa cgatctcttc
>        61 ggagatagtg gcagacgggt gagtaacgcg tgggaaccta ccttattcta cggaataaca
>       121 gttagaaatg actgctaata ccgtatacgc ccttcggggg aaagatttat cggagtagga
>       181 tgggcccgcg ttggattagc tagttggtgg ggtaatggcc taccaaggcg acgatctata
>       241 gctggtctga gaggatgatc agccacactg gaactgagac acggtccaga ctcctacggg
>       301 aggcagcagt ggggaatatt ggacaatggg cgcaagcctg atccagccat gccgcctgag
>       361 tgatgaaggc cttagggttg taaagctctt tcaacggtga agataatgac ggtaaccgta
>       421 gaagaagccc cggctaactt cgtgccagca gccgcggtaa tacgaagggg gctagcgttg
>       481 ttcggaatta ctgggcgtaa agcgtacgta ggcggattag aaagttaggg gtgaaatccc
>       541 agggctcaac cctggaactg cctctaaaac tcctaatctt gagttcgaga gaggtgagtg
>       601 gaattccgag tgtagaggtg aaattcgtag atattcggag gaacaccagt ggcgaaggcg
>       661 gctcactggc tcgatactga cgctgaggta cgaaagcgtg gggagcaaac aggattagat
>       721 accctggtag tccacgccgt aaacgatgaa tgttagccgt cgggcagtat actgttcggt
>       781 ggcgcagcta acgcattaaa cattccgcct ggggagtacg gtcgcaagat taaaactcaa
>       841 aggaattgac gggggcccgc acaagcggtg gagcatgtgg tttaattcga agcaacgcgc
>       901 agaaccttac cagcccttga cataccaatc gcggttagtg gagacacttt ccttcagttc
>       961 ggctggattg gatacaggtg ctgcatggct gtcgtcagct cgtgtcgtga gatgttgggt
>      1021 taagtcccgc aacgagcgca accctcgcct ttagttgcca gcatttagtt gggcactcta
>      1081 gagggactgc cggtgataag ccggaggaag gtggggatga cgtcaagtcc tcatggccct
>      1141 tacgggctgg gctacacacg tgctacaatg gtggtgacag tgggcagcga gacggcaacg
>      1201 tcgagctaat ctccaaaaac catctcartt cggattgggg tctgcaactc gacccccatg
>      1261 aagttggaat cgctagtaat cgcggatcag catgccgcgg tgaatacgtt cccgggcctt
>      1321 gtacacaccg cccgtcacac catgggagtt ggtcttaccc gaaggcgatg cgctaaccag
>      1381 caatggaggc agtcgaccac ggtagggtca gcgactgggg tgaagtcgta aca
> //
>
> test command:
> $ perl -e 'use Bio::SeqIO;$in=Bio::SeqIO->new(-file=>"test.gbk",
> -format=>"genbank");$seq=$in->next_seq;print $seq->display_id, "\t",
> $seq->species->species, "\n";'
>
> error message:
> Can't call method "species" on an undefined value at -e line 1, <GEN0> line
> 61.
>
> bioperl version 1.3.01
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki_at_ebi ac uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

