[BioSQL-l] Problem with Bio::DB::BioSQL::PrimarySeqAdapter

Hilmar Lapp hlapp at gnf.org
Thu May 26 19:20:38 EDT 2005


It's not immediately obvious what's going on, but my suspicion is that
the sequence retrieval optimization is playing a role here. The
sequence of an entry retrieved from the database is lazy-loaded, i.e.,
fetched only on demand. In theory, though, truncating or
reverse-complementing the sequence should itself trigger that demand ...
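
To illustrate the kind of failure I have in mind, here is a toy sketch in plain Perl. It is not the actual Bio::DB::BioSQL code: the LazySeq class, its loader callback, and both trunc methods are made up for illustration, and whether the real adaptor behaves this way is only a guess at this point. It shows how a lazily loaded sequence can come out empty if a derived operation reads the internal slot instead of going through the accessor that triggers the load.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy class, NOT the real BioSQL adaptor code.
package LazySeq;

sub new {
    my ($class, %args) = @_;
    # 'loader' is a made-up callback standing in for the database fetch.
    return bless { loader => $args{loader}, seq => undef }, $class;
}

# Accessor: fetches the sequence on first demand, then caches it.
sub seq {
    my $self = shift;
    $self->{seq} = $self->{loader}->() unless defined $self->{seq};
    return $self->{seq};
}

# Correct: goes through the accessor, so the lazy load is triggered.
sub trunc_via_accessor {
    my ($self, $start, $end) = @_;
    return substr($self->seq, $start - 1, $end - $start + 1);
}

# Buggy: reads the hash slot directly and sees the unloaded (empty) value.
sub trunc_via_slot {
    my ($self, $start, $end) = @_;
    my $raw = defined $self->{seq} ? $self->{seq} : '';
    return $raw eq '' ? '' : substr($raw, $start - 1, $end - $start + 1);
}

package main;

my $lazy = LazySeq->new(loader => sub { 'ATGATGATGATGATG' });
print "slot:     '", $lazy->trunc_via_slot(2, 5), "'\n";
print "accessor: '", $lazy->trunc_via_accessor(2, 5), "'\n";
```

Run in this order, the slot-based truncation prints an empty string because nothing has asked for the sequence yet, while the accessor-based one prints TGAT. Once the accessor has been called, the slot is populated and both agree, which is why printing the object first is a useful test.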

Can you try in your test script to print out $pseq before you truncate 
and revcom it? I.e.,

	my $pseq=$objadap->find_by_query($query)->next_object;
	print "\$pseq isa Bio::PrimarySeq\n" if $pseq->isa('Bio::PrimarySeq');
	print $out $pseq;
	my $ptrunc=$pseq->trunc(100,120);
	my $prc=$pseq->revcom;
	print $out $ptrunc, $prc;

Does this yield a different result?

	-hilmar

On May 26, 2005, at 9:26 AM, Roy Chaudhuri wrote:

> Hi.
>
> [Wasn't sure which list to post to, apologies if this is more
> appropriate for the BioPerl list]
>
> I'm having problems using the PrimarySeqAdapter to get a Bio::PrimarySeq
> object from a BioSQL database. The object appears to work okay, and
> will print out fine using SeqIO, but if I trunc() or revcom() the
> sequence information disappears. I can work around this by using the
> Bio::SeqI adapter instead of Bio::PrimarySeqI, but this is slow as I'm
> working with whole bacterial genome GenBank entries with lots of
> features. The problem isn't with PrimarySeq objects generally, as if I
> define one from scratch it will trunc and revcom correctly.
>
> Here's a test script that demonstrates the problem:
> #!/usr/bin/perl
> use warnings;
> use strict;
> use Bio::PrimarySeq;
> use Bio::SeqIO;
> use Bio::DB::Query::BioQuery;
> use Bio::DB::BioDB;
> my $out=Bio::SeqIO->newFh(-format=>'fasta');
> my $tinyseq=Bio::PrimarySeq->new(-seq=>'ATGATGATGATGATG',
>                                  -display_id=>'test');
> my $tinytrunc=$tinyseq->trunc(2,5);
> my $tinyrc=$tinyseq->revcom;
> print "\$tinyseq isa Bio::PrimarySeq\n" if $tinyseq->isa('Bio::PrimarySeq');
> print $out $tinyseq, $tinytrunc, $tinyrc;
>
> my $dbadap= Bio::DB::BioDB->new(-database => 'biosql',
>                                 -dbname => 'biosql',
>                                 -user => 'username',
>                                 -pass => 'password',
>                                 -driver => 'mysql');
> my $query = Bio::DB::Query::BioQuery->new(
>     -datacollections => ["Bio::PrimarySeqI entry"],
>     -where           => ["entry.accession_number='AE003850'"],
> );
>
> my $objadap = $dbadap->get_object_adaptor('Bio::PrimarySeqI');
> my $pseq=$objadap->find_by_query($query)->next_object;
> print "\$pseq isa Bio::PrimarySeq\n" if $pseq->isa('Bio::PrimarySeq');
> my $ptrunc=$pseq->trunc(100,120);
> my $prc=$pseq->revcom;
> print $out $pseq, $ptrunc, $prc;
>
> $objadap = $dbadap->get_object_adaptor('Bio::SeqI');
> my $seq=$objadap->find_by_query($query)->next_object;
> print "\$seq isa Bio::Seq\n" if $seq->isa('Bio::Seq');
> my $trunc=$seq->trunc(100,120);
> my $rc=$seq->revcom;
> print $out $seq, $trunc, $rc;
>
>
>
> This gives the following output:
> $tinyseq isa Bio::PrimarySeq
> >test
> ATGATGATGATGATG
> >test
> TGAT
> >test
> CATCATCATCATCAT
> $pseq isa Bio::PrimarySeq
> >AE003850 Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence.
> GGTACCCCCCACACCCCCCTACTCGCTCGTAACTGAGTACCCACGACCGGCTAGGTTCGC
> GCAAAAGGCCAACATGACCTCTAGGGGAACCCACTCCATGAAGCCAATGGCACGAGAACG
> GGAGGTATCGCTACAGGTGAGCATCCTACGAGCACTACGGAGCCGATAACGATCACCCGA
> GCTGCGAGCGTCTGAGACGCGCCAGGAGCGCACCAAACGGCGATAAGCGAAATACCCCCC
> ATCACCACGCTCACGATGATCCTGTAGATCGATACGAAGGGCATCAGACACAGGCCAATA
> GCCACCCTTACCCCAAACACGGCCCGTAAGCCCTTTCCAGCCTTCAGGGAGATTCTCAGA
> ACAACGCTGGTAATGGCGCACGCCTCGGGCGGCGTGCTTGCTCACGTACTGAAACCATCC
> GACAACCCCATCAATAATCCGACCATGCTGCCCACGCAGACCAGCACCACAGGAGGACGC
> AACAGCCAACCACGCATCGACGCATAGAAGCACATCGTAAACAGTGCCAGAAAACCAGAT
> AGCACAATGCAAATGCGGGACACCTCGACGCTGCCACTCCGTCACCCAGTGAACCCTGAT
> CATACCAGCACGCCTCATGCGAGCTTCCCACGCACGCCTGATTTTCTGCCACTCCTGAGC
> AGTAGGAGGGCAATCACGAACGGTAAGGGTCAAAGCGAGACCAGCGCCCGTTAACTGATC
> CTCACGAACGGACATGAGGAACTCTGTATTGCGACGGACAGCCCCAGGAGACCACCCCTG
> AACCTCGCCTCGTGGCGTCCTGATATGTGATGAGTTCATGGGAGCAACACCACCTTTTCC
> CCCATGACGGTAAACTGTAATTACTGGCATCGGCCTCTCCGATAGCTGGTCACGACCCCG
> GGTGCTCGTAACACCGCGGGGTTATTTTTTTGCCGCATGCAGGAAGGAGGAAAAACCCCC
> AACCTTAACAAAACGTACAGATATGTAACCACTAATCAAGGGAGGATGGAAATCCCCCCC
> GTTTCGCACTCGCTTCGCTCGCTCAAAAGCGGGGGAGATTTCTATTCCCCAATGACAATT
> TGTCAAGCAATCACTTGACGTTAAATCCAAGGGGGTTGAACTGAATGTCATCCAATTGGA
> GACCACTGGAAACCTAGATTTCCACCCAGGGGACACAGGGCGTAAAAACGGTTATCCGTG
> AAATAGATCAGGGCTTCGTGTTGGGGGTCATTTGGCCCCCACATAACGGACCGAAGGAGA
> GGGCGTAAAAGCGCCTCCGCAGGGGN
> >AE003850 Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence.
>
> >AE003850 Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence.
>
> $seq isa Bio::Seq
> >AE003850 Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence.
> GGTACCCCCCACACCCCCCTACTCGCTCGTAACTGAGTACCCACGACCGGCTAGGTTCGC
> GCAAAAGGCCAACATGACCTCTAGGGGAACCCACTCCATGAAGCCAATGGCACGAGAACG
> GGAGGTATCGCTACAGGTGAGCATCCTACGAGCACTACGGAGCCGATAACGATCACCCGA
> GCTGCGAGCGTCTGAGACGCGCCAGGAGCGCACCAAACGGCGATAAGCGAAATACCCCCC
> ATCACCACGCTCACGATGATCCTGTAGATCGATACGAAGGGCATCAGACACAGGCCAATA
> GCCACCCTTACCCCAAACACGGCCCGTAAGCCCTTTCCAGCCTTCAGGGAGATTCTCAGA
> ACAACGCTGGTAATGGCGCACGCCTCGGGCGGCGTGCTTGCTCACGTACTGAAACCATCC
> GACAACCCCATCAATAATCCGACCATGCTGCCCACGCAGACCAGCACCACAGGAGGACGC
> AACAGCCAACCACGCATCGACGCATAGAAGCACATCGTAAACAGTGCCAGAAAACCAGAT
> AGCACAATGCAAATGCGGGACACCTCGACGCTGCCACTCCGTCACCCAGTGAACCCTGAT
> CATACCAGCACGCCTCATGCGAGCTTCCCACGCACGCCTGATTTTCTGCCACTCCTGAGC
> AGTAGGAGGGCAATCACGAACGGTAAGGGTCAAAGCGAGACCAGCGCCCGTTAACTGATC
> CTCACGAACGGACATGAGGAACTCTGTATTGCGACGGACAGCCCCAGGAGACCACCCCTG
> AACCTCGCCTCGTGGCGTCCTGATATGTGATGAGTTCATGGGAGCAACACCACCTTTTCC
> CCCATGACGGTAAACTGTAATTACTGGCATCGGCCTCTCCGATAGCTGGTCACGACCCCG
> GGTGCTCGTAACACCGCGGGGTTATTTTTTTGCCGCATGCAGGAAGGAGGAAAAACCCCC
> AACCTTAACAAAACGTACAGATATGTAACCACTAATCAAGGGAGGATGGAAATCCCCCCC
> GTTTCGCACTCGCTTCGCTCGCTCAAAAGCGGGGGAGATTTCTATTCCCCAATGACAATT
> TGTCAAGCAATCACTTGACGTTAAATCCAAGGGGGTTGAACTGAATGTCATCCAATTGGA
> GACCACTGGAAACCTAGATTTCCACCCAGGGGACACAGGGCGTAAAAACGGTTATCCGTG
> AAATAGATCAGGGCTTCGTGTTGGGGGTCATTTGGCCCCCACATAACGGACCGAAGGAGA
> GGGCGTAAAAGCGCCTCCGCAGGGGN
> >AE003850 Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence.
> GAAGCCAATGGCACGAGAACG
> >AE003850 Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence.
> NCCCCTGCGGAGGCGCTTTTACGCCCTCTCCTTCGGTCCGTTATGTGGGGGCCAAATGAC
> CCCCAACACGAAGCCCTGATCTATTTCACGGATAACCGTTTTTACGCCCTGTGTCCCCTG
> GGTGGAAATCTAGGTTTCCAGTGGTCTCCAATTGGATGACATTCAGTTCAACCCCCTTGG
> ATTTAACGTCAAGTGATTGCTTGACAAATTGTCATTGGGGAATAGAAATCTCCCCCGCTT
> TTGAGCGAGCGAAGCGAGTGCGAAACGGGGGGGATTTCCATCCTCCCTTGATTAGTGGTT
> ACATATCTGTACGTTTTGTTAAGGTTGGGGGTTTTTCCTCCTTCCTGCATGCGGCAAAAA
> AATAACCCCGCGGTGTTACGAGCACCCGGGGTCGTGACCAGCTATCGGAGAGGCCGATGC
> CAGTAATTACAGTTTACCGTCATGGGGGAAAAGGTGGTGTTGCTCCCATGAACTCATCAC
> ATATCAGGACGCCACGAGGCGAGGTTCAGGGGTGGTCTCCTGGGGCTGTCCGTCGCAATA
> CAGAGTTCCTCATGTCCGTTCGTGAGGATCAGTTAACGGGCGCTGGTCTCGCTTTGACCC
> TTACCGTTCGTGATTGCCCTCCTACTGCTCAGGAGTGGCAGAAAATCAGGCGTGCGTGGG
> AAGCTCGCATGAGGCGTGCTGGTATGATCAGGGTTCACTGGGTGACGGAGTGGCAGCGTC
> GAGGTGTCCCGCATTTGCATTGTGCTATCTGGTTTTCTGGCACTGTTTACGATGTGCTTC
> TATGCGTCGATGCGTGGTTGGCTGTTGCGTCCTCCTGTGGTGCTGGTCTGCGTGGGCAGC
> ATGGTCGGATTATTGATGGGGTTGTCGGATGGTTTCAGTACGTGAGCAAGCACGCCGCCC
> GAGGCGTGCGCCATTACCAGCGTTGTTCTGAGAATCTCCCTGAAGGCTGGAAAGGGCTTA
> CGGGCCGTGTTTGGGGTAAGGGTGGCTATTGGCCTGTGTCTGATGCCCTTCGTATCGATC
> TACAGGATCATCGTGAGCGTGGTGATGGGGGGTATTTCGCTTATCGCCGTTTGGTGCGCT
> CCTGGCGCGTCTCAGACGCTCGCAGCTCGGGTGATCGTTATCGGCTCCGTAGTGCTCGTA
> GGATGCTCACCTGTAGCGATACCTCCCGTTCTCGTGCCATTGGCTTCATGGAGTGGGTTC
> CCCTAGAGGTCATGTTGGCCTTTTGCGCGAACCTAGCCGGTCGTGGGTACTCAGTTACGA
> GCGAGTAGGGGGGTGTGGGGGGTACC
>
> Any idea what's going on?
> Thanks.
> Roy.
>
> --
> Dr. Roy Chaudhuri
> Bioinformatics Research Fellow
> Division of Immunity and Infection
> University of Birmingham, UK
>
> http://colibase.bham.ac.uk
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


