[Bioperl-l] Problem with load_seqdatabase -> Redhat9 problem

avilella avilella at lycos.es
Fri Aug 29 16:49:51 EDT 2003


Hi,

I finally tracked down the cause of the strange swissprot parsing
problem that I was having on a Redhat9 box, and which wasn't
reproducible on a different (Mandrake9.1) linux box:

It's due to Redhat9's broken UTF-8 handling:

Michael G Schwern says:

RedHat 9 shipped with a prerelease version of Perl 5.8.1 with broken
UTF-8 handling.  If you set your LANG environment variable to something
which is not UTF8 (de_DE should work, or C) things should start working
again.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=87682

I solved the problem by setting LANG to a non-UTF-8 locale, for example:

export LANG=en_US

and the swissprot problem disappeared...
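
In case it is useful, here is a minimal sketch of how one might check
whether a UTF-8 locale is in effect before applying the workaround. It
assumes a POSIX shell; is_utf8_locale is a hypothetical helper name of
my own, and the LC_ALL > LC_CTYPE > LANG precedence is the standard
locale lookup order, not something specific to Redhat9:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given locale name looks like a
# UTF-8 locale (e.g. en_US.UTF-8 or de_DE.utf8), fails otherwise.
is_utf8_locale() {
    case "$1" in
        *[Uu][Tt][Ff]-8*|*[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Effective character locale: LC_ALL overrides LC_CTYPE, which overrides LANG.
effective="${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"

if is_utf8_locale "$effective"; then
    echo "UTF-8 locale in effect ($effective); switching LANG to en_US"
    # Clear the overriding variables so that LANG actually takes effect.
    unset LC_ALL LC_CTYPE
    LANG=en_US
    export LANG
else
    echo "Non-UTF-8 locale in effect ($effective); nothing to change"
fi
```

To change the locale for a single command only, the assignment can be
put in front of it instead of being exported (LANG=en_US followed by
the load_seqdatabase.pl command line quoted below); this only helps if
LC_ALL and LC_CTYPE are not set to a UTF-8 locale themselves.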

I hope it helps somebody,

Best regards,

Albert Vilella

P.S.: this problem affects other perl module installations too, so be
aware of that...


On Fri, 2003-06-06 at 01:48, Hilmar Lapp wrote:
> Strange. Why should there be a difference I'm wondering, since they 
> both use the same module for parsing. I've downloaded sprot41 and will 
> investigate as soon as I get to it. I think there was a similar report 
> not long ago that was then resolved somehow.
> 
> 	-hilmar
> 
> On Wednesday, June 4, 2003, at 07:53  AM, albert vilella wrote:
> 
> > Hi,
> >
> > I've been trying to load a swissprot dataset into a biosql database
> > using load_seqdatabase.pl, but I get an error:
> >
> > ./load_seqdatabase.pl -host localhost -dbname biosql -dbuser root
> > -dbpass '*******' -namespace bioperl -format swiss
> > /data/database/sprot41.dat
> >
> > ------------- EXCEPTION  -------------
> > MSG: swissprot stream with no ID. Not swissprot in my book
> > STACK Bio::SeqIO::swiss::next_seq
> > /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/swiss.pm:180
> > STACK toplevel ./load_seqdatabase.pl:386
> >
> > --------------------------------------
> >
> > Apparently, the next_seq subroutine gets stuck in the first entry while
> > parsing the ID field:
> >
> > swiss.pm
> > ----------------------------------------------------------------------
> >
> > $line =~ /^ID\s+([^\s_]+)(_([^\s_]+))?\s+([^\s;]+);\s+([^\s;]+);/ ||
> > $self->throw("swissprot stream with no ID. Not swissprot in my book");
> >
> > ----------------------------------------------------------------------
> >
> > This is strange because I can read the same entry in the same file with:
> >
> > #! /usr/bin/perl -w
> >
> > use strict;
> > use Bio::SeqIO;
> > use Bio::Seq;
> >
> > my $file = shift @ARGV;
> > my $in = Bio::SeqIO->new ( -file => $file,
> > 			   -format => 'swiss');
> > my $seq = $in->next_seq();
> > print "Seq: ", $seq->accession_number(), " -- ", $seq->desc(), "\n\n";
> >
> > Anybody experiencing similar problems? Any guess of what is happening?
> >
> > Thanks in advance,
> >
> > Albert Vilella
> > Molecular Evolution - Dept. Genetics
> > Universitat de Barcelona


