[Bioperl-l] Trying to get a mysql DB from genbank flat files

Raphael LaFrance rafe@scinq.org
Thu, 15 Nov 2001 13:42:10 -0500


Sorry for delay in replying... meetings all morning.

Yup, crazy chars abound in the Genbank data. That quote() function is
sorely missing in my program & a much better solution than my 1st try,
which was rather brute force-ish.  I also like to strip mutil-whitespace
chars down to a single space char. 

Oh yea, you probably already knew this but... on quoted data like in the
feature's qualifiers you cannot depend on the "/" being a reliable
delimiter you have to look for the closing quote via a strip of ending
white space & a check of the last char found, well last two if you think
there might be a \" sequence in the data itself, which I didn't find in
the 60 or so files I stripped but you never know.

Maybe new bugs will be by using the substr function, I've seen some odd
things in the data set. This, however, is a peculiar record where format
is the same but the fields may be missing & there are no easy delimiters
to go after. I also didn't post that the date was thrown off too.  Same
problem tho.

I'm of course I am willing to send the entire rippers but I'm not sure
how much use they'll be (aside from comic relief :D) given that I'm a
total newbie.

Apologies. I'm trying to track down the nasty record but the obvious n+1
isn't crashing it. More as soon as I get out of the next set of
meetings.

rafe

"Osborne, Brian" wrote:
> 
> Wilfred,
> 
> Yes. And there is also the quote() method, which escapes offending
> characters. However, I don't know what the complete offending character set
> is and how it overlaps with the set of annoying characters in the
> description, it might not help here.
> 
> quote
> $sql = $dbh->quote($string);
> This method escapes special characters (quotation marks, etc.) from strings
> and adds the required outer quotation marks. May not be able to handle all
> types of input (i.e binary data).
> 
> Example using the above methods:
> #!/usr/bin/perl -w
> use DBI;
> use strict;
> my $dbh = DBI->connect("DBI:mysql:contacts",undef,undef)
> or die "Unable to connect to contacts Database: $dbh->errstr\n";
> my $sth = $dbh->prepare("SELECT uid FROM contact WHERE last_name =
> 'Flaherty'");
> $sth->execute or die "Unable to execute query: $dbh->errstr\n";
> my $row = $sth->fetchrow_arrayref;
> my $uid = $row->[0];
> $sth->finish;
> my $newname = $dbh->quote("The Flahertys'");
> my $statement =qq(UPDATE contact SET last_name = '$newname'
>                   WHERE uid = $uid);
> my $rc = $dbh->do($statement) or die "Unable to prepare/execute $statement:
> $dbh->errstr\n";
> print "$rc rows were updated\n";
> $dbh->disconnect;
> exit;
> 
> Brian O.
> 
>  -----Original Message-----
> From:   Wilfred Li, Ph.D. [mailto:wilfred@sdsc.edu]
> Sent:   Thursday, November 15, 2001 12:41 PM
> To:     bioperl-l@bioperl.org
> Subject:        RE: [Bioperl-l] Trying to get a mysql DB from genbank flat
> files
> 
> >thanks for your mail, you have magically bumped into our daily nightmare,
> >that is that very often people put crazy characters in the description
> >lines, and we have to find ways of backslashing all of them otherwise
> >they will break mysql statements. Could you mail me the offending record?
> Hi,
> 
> If bind variables are used in place of a plain insert statement, many of
> the special characters will be taken care of by perl DBI. e.g.
> 
> $sth->prepare("insert ... values (?, ..., ?)");
> $sth->execute($id, ..., $kw);
> 
> or use
> 
> $dbh->do("insert ... values (?, ..., ?)", $id, ..., $kw);
> 
> to combine the two steps into one.
> 
> I had the problem with SeqAdaptor.pm when parsing SwissProt.
> 
> Wilfred
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l