[Bioperl-l] Bio::DB::SeqFeature sequences with no identifier?

Wed May 14 09:22:23 UTC 2014

Hi all BioPerlers!

I'm confused by something.  In the scenario below I have a Fasta file 
and a GFF file:

=========
File:  a.fas

>SEQ1
AAAATTTTCCCCGGGG

=========
File:  b.gff

SEQ1    hit1    match_part    1    5    .    .    .    .
SEQ1    hit2    match_part    6    10    .    .    .    .
=========

I load them into a seqfeature DB:

bp_seqfeature_load.pl -d dbi:mysql:seqdb -c -u root -p pass  a.fas b.gff

I then explore the data as follows:

use strict;
use warnings;
use Bio::DB::SeqFeature::Store;

my $db = Bio::DB::SeqFeature::Store->new(
     -adaptor  => 'DBI::mysql',
     -dsn      => 'dbi:mysql:seqdb',
     -user     => 'root',
     -password => 'pass');

my $iterator = $db->get_seq_stream();
while (my $feature = $iterator->next_seq){
     print $feature->seq->seq;
     # THE SEQUENCE IS PRINTED
     print " comes from sequence named ";
     print $feature->seq->id;
     #  THE METHOD ABOVE RETURNS UNDEF
}

my $seq = $db->segment('SEQ1');
      # $seq is undef, NOTHING IS RETURNED!?!?

============

This is all very confusing.  It seems that the feature knows what 
sequence it is attached to, because it gives me the correct string of 
letters, but it doesn't know the name of that sequence: seq->id returns 
undef, and asking the database for the sequence by name with 
$db->segment('SEQ1') also returns undef.
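
To narrow down where the name is lost, here is a minimal diagnostic sketch. It assumes the same database and credentials as above and has not been run against this data. It asks each feature for seq_id (the reference sequence name from column 1 of the GFF, as defined by Bio::SeqFeatureI) instead of going through seq->id, and it requests raw sequence by name with fetch_sequence instead of segment. The printf layout and the variable names are only illustrative.

```perl
use strict;
use warnings;
use Bio::DB::SeqFeature::Store;

# Same connection settings as in the script above (assumed unchanged).
my $db = Bio::DB::SeqFeature::Store->new(
    -adaptor  => 'DBI::mysql',
    -dsn      => 'dbi:mysql:seqdb',
    -user     => 'root',
    -password => 'pass',
);

my $iterator = $db->get_seq_stream();
while ( my $feature = $iterator->next_seq ) {
    # seq_id() is the name of the reference sequence the feature lies on,
    # which is a different thing from the id of the object returned by seq().
    my $ref_name = $feature->seq_id;
    printf "%s %d..%d lies on %s\n",
        $feature->primary_tag, $feature->start, $feature->end,
        defined $ref_name ? $ref_name : '(undef)';
}

# Ask the store for raw sequence by reference name, without using segment().
my $dna = $db->fetch_sequence( -seq_id => 'SEQ1', -start => 1, -end => 5 );
print defined $dna ? "$dna\n" : "no sequence stored under SEQ1\n";
```

If seq_id reports SEQ1 while seq->id stays undefined, that would suggest the two methods are naming different things (the reference sequence versus the feature itself) rather than the link to the sequence being lost.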

Is this a bug, or is there a reason for this "disconnect" between a 
sequence and its name?

Help appreciated!

Mark