[Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
Mark A. Miller
mamillerpa at yahoo.com
Tue May 2 11:41:01 UTC 2006
Hello all.
I have a recently downloaded UniProt/TrEMBL flat file. I am trying to
make FASTA subset files for some bacterial strains. I haven't been
able to parse out the strain information from the OS or RC lines.
These lines typically look like:
OS Somegenus somespecies subsp. somesubspecies strain ABC123.
RC STRAIN=ABC123.
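To make the goal concrete, here is a rough plain-regex sketch of what I would like to get out of those two line types. It does not use BioPerl at all, the helper name strain_from_line is made up for illustration, and it assumes the lines always look like the two examples above (real entries may well be messier):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): pull the strain name out of a
# single OS or RC line, assuming the layouts shown in the examples above.
sub strain_from_line {
    my ($line) = @_;

    # RC line, e.g. "RC STRAIN=ABC123." : take everything after "STRAIN="
    # up to the first ";" or "."
    if ($line =~ /^RC\s+.*?\bSTRAIN=([^;.]+)/) {
        return $1;
    }

    # OS line, e.g. "OS Genus species ... strain ABC123." : take whatever
    # follows the last "strain " on the line, minus a trailing "."
    if ($line =~ /^OS\s+.*\bstrain\s+(\S.*?)\.?\s*$/) {
        return $1;
    }

    return;    # no strain information found on this line
}

# Try it on the two example lines from this message.
for my $line ('OS Somegenus somespecies subsp. somesubspecies strain ABC123.',
              'RC STRAIN=ABC123.') {
    my $strain = strain_from_line($line);
    print defined $strain ? "strain: $strain\n" : "no strain found\n";
}
```

Both example lines should yield ABC123 with this sketch, but I would much rather get the value from the BioPerl objects than re-parse the raw lines myself.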
I'm not especially good with Perl, and I'm definitely weak when it
comes to OOP.
I have included some code I pasted together from various pages on the
bioperl wiki. In addition to the wiki, I have been making use of
www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html
The code I have so far reports the species but not the subspecies or
variant. I have also tried to walk through all of the feature,
annotation and reference objects but I still can't seem to parse out
the information I need. (For brevity, the example I'm including below
only lists the code I used for the annotation objects.) Also, this
code only prints the information... I know that I'll have to write a
FASTA sequence object separately.
Any suggestions?
Thanks,
Mark
--- --- ---
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

my $usage  = "getaccs.pl file format\n";
my $file   = shift or die $usage;
my $format = shift or die $usage;

# Open the flat file for reading in the given format (e.g. embl, swiss)
my $inseq = Bio::SeqIO->new(-file   => "<$file",
                            -format => $format );

while (my $seq = $inseq->next_seq) {
    # Species information for this entry.
    # Some of these accessors may return undef, which prints as an
    # empty string (with a warning under "use warnings").
    my $species_object = $seq->species;
    my $species_string = $species_object->species;
    my $variant_string = $species_object->variant;
    my $common_string  = $species_object->common_name;
    my $sub_string     = $species_object->sub_species;
    my $binomial       = $species_object->binomial('FULL');

    print "display ",   $seq->display_id,       "\n";
    print "accession ", $seq->accession_number, "\n";
    print "desc ",      $seq->desc,             "\n";
    print "species ",   $species_string,        "\n";
    print "variant ",   $variant_string,        "\n";
    print "common ",    $common_string,         "\n";
    print "sub ",       $sub_string,            "\n";
    print "binomial ",  $binomial,              "\n";
    print $seq->seq, "\n";

    # Walk every annotation attached to the sequence
    my $anno_collection = $seq->annotation;
    for my $key ( $anno_collection->get_all_annotation_keys ) {
        my @annotations = $anno_collection->get_Annotations($key);
        for my $value ( @annotations ) {
            print "tagname : ", $value->tagname, "\n";
            # $value is a Bio::Annotation object and has an "as_text" method
            print " annotation value: ", $value->as_text, "\n";
            if ($value->tagname eq "reference") {
                # Dump every field of the reference annotation
                my $hash_ref = $value->hash_tree;
                for my $ref_key (keys %{$hash_ref}) {
                    print $ref_key, ": ", $hash_ref->{$ref_key}, "\n";
                }
            }
        }
    }
    print "\n";
}
exit;
--- --- --- --- --- --- --- ---
Mark A. Miller