[Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines

Brian Osborne osborne1 at optonline.net
Wed May 3 15:22:27 UTC 2006


Mark,

So you're trying to get the information in the RC line from a Swissprot
format file?
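
If so, one stopgap is to skip the objects and read the strain straight off the raw RC lines. The following is only a minimal sketch, assuming the RC lines look like your example (RC STRAIN=ABC123.) or use ';'-separated TOKEN=value pairs. strains_from_entry is a made-up helper name, not a BioPerl method, and an RC line that lists several strains after a single STRAIN= would come back as one string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): scan the raw text of one
# Swiss-Prot/TrEMBL entry and return every value given as STRAIN=... on
# an RC line.
sub strains_from_entry {
    my ($entry_text) = @_;
    my @strains;
    for my $line (split /\n/, $entry_text) {
        # Only reference-comment lines are of interest here.
        next unless $line =~ /^RC\s+(.*)/;
        my $comment = $1;
        # Assumption: tokens on an RC line are separated by ';'.
        for my $token (split /;\s*/, $comment) {
            # Drop an optional trailing period, as in "STRAIN=ABC123."
            if ($token =~ /^STRAIN=(.+?)\.?\s*$/i) {
                push @strains, $1;
            }
        }
    }
    return @strains;
}

# Sample lines modelled on the example in your message.
my $entry = join "\n",
    'OS   Somegenus somespecies subsp. somesubspecies strain ABC123.',
    'RC   STRAIN=ABC123.',
    '//';

print join(',', strains_from_entry($entry)), "\n";
```

Run as is, this prints ABC123. To use it on a real file you would feed it one entry at a time, for example by reading with $/ set to "//\n".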

Brian O.


On 5/2/06 7:41 AM, "Mark A. Miller" <mamillerpa at yahoo.com> wrote:

> Hello all.
> 
> I have a recently downloaded UniProt/TrEMBL flat file.  I am trying to
> make FASTA subset files for some bacterial strains.  I haven't been
> able to parse out the strain information from the OS or RC lines.
> These lines typically look like:
> 
> OS Somegenus somespecies subsp. somesubspecies strain ABC123.
> RC STRAIN=ABC123.
> 
> I'm not especially good with Perl, and I'm definitely weak when it
> comes to OOP.
> 
> I have included some code I pasted together from various pages on the
> bioperl wiki.  In addition to the wiki, I have been making use of
> www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html
> 
> The code I have so far reports the species but not the subspecies or
> variant.  I have also tried to walk through all of the feature,
> annotation and reference objects but I still can't seem to parse out
> the information I need.  (For brevity, the example I'm including below
> only lists the code I used for the annotation objects.)  Also, this
> code only prints the information...  I know that I'll have to write a
> FASTA sequence object separately.
> 
> Any suggestions?
> 
> Thanks,
> Mark
> 
> ---   ---   ---
> 
> 
> #!/usr/bin/perl
> 
> use strict;
> use Bio::SeqIO;
> 
> my $usage  = "getaccs.pl file format\n";
> my $file   = shift or die $usage;
> my $format = shift or die $usage;
> 
> my $inseq = Bio::SeqIO->new(-file   => "<$file",
>                             -format => $format );
> 
> while (my $seq = $inseq->next_seq) {
> 
>   # taxonomy information for this entry
>   my $species_object = $seq->species;
>   my $species_string = $species_object->species;
>   my $variant_string = $species_object->variant;
>   my $common_string  = $species_object->common_name;
>   my $sub_string     = $species_object->sub_species;
>   my $binomial       = $species_object->binomial('FULL');
> 
>   print "display   ",$seq->display_id,"\n";
>   print "accession ",$seq->accession_number,"\n";
>   print "desc      ",$seq->desc,"\n";
> 
>   print "species   ",$species_string,"\n";
>   print "variant   ",$variant_string,"\n";
>   print "common    ",$common_string,"\n";
>   print "sub       ",$sub_string,"\n";
>   print "binomial  ",$binomial,"\n";
> 
>   print $seq->seq,"\n";
> 
>   # walk through every annotation attached to the sequence
>   my $anno_collection = $seq->annotation;
>   for my $key ( $anno_collection->get_all_annotation_keys ) {
>     my @annotations = $anno_collection->get_Annotations($key);
>     for my $value ( @annotations ) {
>       print "tagname : ", $value->tagname, "\n";
>       # $value is a Bio::Annotation, and has an "as_text" method
>       print "  annotation value: ", $value->as_text, "\n";
> 
>       # for references, also dump each field of the hash tree
>       if ($value->tagname eq "reference") {
>         my $hash_ref = $value->hash_tree;
>         for my $field (keys %{$hash_ref}) {
>           print $field,": ",$hash_ref->{$field},"\n";
>         }
>       }
>     }
>   }
>   print "\n";
> }
> 
> exit;
> 
> 
> 
> 
> 
> ---   ---   ---   ---   ---   ---   ---   ---
> 
> Mark A. Miller
> 