[Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines

Mark A. Miller mamillerpa at yahoo.com
Tue May 2 11:41:01 UTC 2006


Hello all.

I have a recently downloaded UniProt/TrEMBL flat file.  I am trying to
make FASTA subset files for some bacterial strains.  I haven't been
able to parse out the strain information from the OS or RC lines. 
These lines typically look like:

OS Somegenus somespecies subsp. somesubspecies strain ABC123.
RC STRAIN=ABC123.
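
For reference, lines of this shape can also be matched with plain regular
expressions, outside of BioPerl's object model.  The following is only a
minimal sketch: strain_from_line is a hypothetical helper (not a BioPerl
function), and the patterns assume the strain is a single token that follows
"STRAIN=" on an RC line or the word "strain" on an OS line, as in the two
example lines above.  Real entries may need looser patterns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the strain named on an OS or RC line,
# or nothing if the line does not match either assumed pattern.
sub strain_from_line {
    my ($line) = @_;

    # RC line: STRAIN=value, possibly preceded by other TOKEN=value; pairs.
    # Assumption: the value ends at the first ';' or '.'.
    if ($line =~ /^RC\s+(?:.*;\s*)?STRAIN=([^;.]+)/) {
        return $1;
    }

    # OS line: the word "strain" followed by a single token.
    # Assumption: the strain name contains no spaces, dots, commas or parentheses.
    if ($line =~ /^OS\s+.*\bstrain\s+([^\s.,()]+)/i) {
        return $1;
    }

    return;
}

for my $line (
    'OS Somegenus somespecies subsp. somesubspecies strain ABC123.',
    'RC STRAIN=ABC123.',
) {
    my $strain = strain_from_line($line);
    print defined $strain ? "$strain\n" : "no strain found\n";
}
```

Run on the two example lines, this prints ABC123 twice.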

I'm not especially good with Perl, and I'm definitely weak when it
comes to OOP.

I have included some code I pasted together from various pages on the
bioperl wiki.  In addition to the wiki, I have been making use of 
www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html

The code I have so far reports the species but not the subspecies or
variant.  I have also tried to walk through all of the feature,
annotation and reference objects but I still can't seem to parse out
the information I need.  (For brevity, the example I'm including below
only lists the code I used for the annotation objects.)  Also, this
code only prints the information...  I know that I'll have to write a
FASTA sequence object separately.

Any suggestions?

Thanks,
Mark

---   ---   ---


#!/usr/bin/perl

use strict;
use warnings;

use Bio::SeqIO;

my $usage = "getaccs.pl file format\n";
my $file = shift or die $usage;
my $format = shift or die $usage;

# Open the flat file in the format given on the command line (e.g. swiss or embl)
my $inseq = Bio::SeqIO->new(-file   => "<$file",
                            -format => $format );

while (my $seq = $inseq->next_seq) {

  # Organism details as parsed by BioPerl from the OS/OC lines
  my $species_object = $seq->species;
  my $species_string = $species_object->species;
  my $variant_string = $species_object->variant;
  my $common_string = $species_object->common_name;
  my $sub_string = $species_object->sub_species;
  my $binomial = $species_object->binomial('FULL');

  print "display   ",$seq->display_id,"\n";
  print "accession ",$seq->accession_number,"\n";
  print "desc      ",$seq->desc,"\n";

  # Any field left undefined prints as empty (with a warning under "use warnings")
  print "species   ",$species_string,"\n";
  print "variant   ",$variant_string,"\n";
  print "common    ",$common_string,"\n";
  print "sub       ",$sub_string,"\n";
  print "binomial  ",$binomial,"\n";

  print $seq->seq,"\n";

  # Walk through every annotation attached to the entry
  my $anno_collection = $seq->annotation;
  for my $key ( $anno_collection->get_all_annotation_keys ) {
    my @annotations = $anno_collection->get_Annotations($key);
    for my $value ( @annotations ) {
      print "tagname : ", $value->tagname, "\n";
      # $value is a Bio::Annotation object and has an "as_text" method
      print "  annotation value: ", $value->as_text, "\n";

      # For references, also dump the individual fields
      if ($value->tagname eq "reference") {
        my $hash_ref = $value->hash_tree;
        for my $field (keys %{$hash_ref}) {
          print $field,": ",$hash_ref->{$field},"\n";
        }
      }
    }
  }
  print "\n";
}

exit;





---   ---   ---   ---   ---   ---   ---   ---

Mark A. Miller
