[Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
Jason Stajich
jason.stajich at duke.edu
Tue May 2 18:36:08 UTC 2006
This is really a limitation of the EMBL/GenBank format.
See this thread:
http://lists.open-bio.org/pipermail/bioperl-l/2006-March/021068.html
or on GMANE
http://comments.gmane.org/gmane.comp.lang.perl.bio.general/10557
I don't know whether any of this has really been resolved, so hopefully
James will speak up if he's implemented anything.
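
In the meantime, one possible workaround is to read the strain straight
from the raw RC lines rather than from the species object. The following
is only a minimal sketch, assuming each entry starts with an ID line,
ends with a // line, and names the strain as STRAIN=... on an RC line.
strains_by_id is a hypothetical helper written for this example, not a
BioPerl method, and the sample record is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): scan raw EMBL/UniProt
# flat-file text and return a hash reference mapping each entry's ID
# to the strain named in the first RC STRAIN= token of that entry.
sub strains_by_id {
    my ($fh) = @_;
    my (%strain_for, $id);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^ID\s+(\S+)/) {
            $id = $1;                      # start of a new entry
        }
        elsif (defined $id && $line =~ /^RC\s+.*?\bSTRAIN=([^;]+)/) {
            my $strain = $1;
            $strain =~ s/[\s.]+$//;        # drop trailing period/whitespace
            # keep only the first strain seen for this entry
            $strain_for{$id} = $strain unless exists $strain_for{$id};
        }
        elsif ($line =~ m{^//}) {
            undef $id;                     # end of entry
        }
    }
    return \%strain_for;
}

# Usage example on an in-memory, made-up record
my $sample = <<'END';
ID   TEST1_SOMSO   PRELIMINARY;   PRT;   100 AA.
AC   Q00001;
OS   Somegenus somespecies subsp. somesubspecies strain ABC123.
RC   STRAIN=ABC123.
//
END

open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my $strains = strains_by_id($fh);
close $fh;

print "$_\t$strains->{$_}\n" for sort keys %$strains;
```

The resulting hash could then be consulted by display_id inside the
next_seq loop to decide which sequences go into which FASTA subset,
assuming the display_id reported by SeqIO matches the ID line token.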
-jason
On May 2, 2006, at 7:41 AM, Mark A. Miller wrote:
> Hello all.
>
> I have a recently downloaded UniProt/TrEMBL flat file. I am trying to
> make FASTA subset files for some bacterial strains. I haven't been
> able to parse out the strain information from the OS or RC lines.
> These lines typically look like:
>
> OS Somegenus somespecies subsp. somesubspecies strain ABC123.
> RC STRAIN=ABC123.
>
> I'm not especially good with Perl, and I'm definitely weak when it
> comes to OOP.
>
> I have included some code I pasted together from various pages on the
> bioperl wiki. In addition to the wiki, I have been making use of
> www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html
>
> The code I have so far reports the species but not the subspecies or
> variant. I have also tried to walk through all of the feature,
> annotation and reference objects but I still can't seem to parse out
> the information I need. (For brevity, the example I'm including below
> only lists the code I used for the annotation objects.) Also, this
> code only prints the information... I know that I'll have to write a
> FASTA sequence object separately.
>
> Any suggestions?
>
> Thanks,
> Mark
>
> --- --- ---
>
> #!/usr/bin/perl
>
> use Bio::SeqIO;
>
> my $usage = "getaccs.pl file format\n";
> my $file = shift or die $usage;
> my $format = shift or die $usage;
>
> my $inseq = Bio::SeqIO->new(-file   => "<$file",
>                             -format => $format );
>
> while (my $seq = $inseq->next_seq) {
>
>     my $species_object = $seq->species;
>     my $species_string = $species_object->species;
>     my $variant_string = $species_object->variant;
>     my $common_string = $species_object->common_name;
>     my $sub_string = $species_object->sub_species;
>     my $binomial = $species_object->binomial('FULL');
>
>     print "display ",$seq->display_id,"\n";
>     print "accession ",$seq->accession_number,"\n";
>     print "desc ",$seq->desc,"\n";
>
>     print "species ",$species_string,"\n";
>     print "variant ",$variant_string,"\n";
>     print "common ",$common_string,"\n";
>     print "sub ",$sub_string,"\n";
>     print "binomial ",$binomial,"\n";
>
>     print $seq->seq,"\n";
>
>     my $anno_collection = $seq->annotation;
>     for my $key ( $anno_collection->get_all_annotation_keys ) {
>         my @annotations = $anno_collection->get_Annotations($key);
>         for my $value ( @annotations ) {
>             print "tagname : ", $value->tagname, "\n";
>             # $value is a Bio::Annotation object and has an "as_text" method
>             print " annotation value: ", $value->as_text, "\n";
>
>             if ($value->tagname eq "reference") {
>                 my $hash_ref = $value->hash_tree;
>                 for my $key (keys %{$hash_ref}) {
>                     print $key,": ",$hash_ref->{$key},"\n";
>                 }
>             }
>         }
>     }
>     print "\n";
> }
>
> exit;
>
> --- --- --- --- --- --- --- ---
>
> Mark A. Miller
--
Jason Stajich
Duke University
http://www.duke.edu/~jes12