[Bioperl-l] Trying to get a mysql DB from genbank flat files
Raphael LaFrance
rafe@scinq.org
Thu, 15 Nov 2001 13:42:10 -0500
Sorry for delay in replying... meetings all morning.
Yup, crazy chars abound in the Genbank data. That quote() function is
sorely missing in my program & a much better solution than my 1st try,
which was rather brute force-ish. I also like to strip mutil-whitespace
chars down to a single space char.
Oh yea, you probably already knew this but... on quoted data like in the
feature's qualifiers you cannot depend on the "/" being a reliable
delimiter you have to look for the closing quote via a strip of ending
white space & a check of the last char found, well last two if you think
there might be a \" sequence in the data itself, which I didn't find in
the 60 or so files I stripped but you never know.
Maybe new bugs will be by using the substr function, I've seen some odd
things in the data set. This, however, is a peculiar record where format
is the same but the fields may be missing & there are no easy delimiters
to go after. I also didn't post that the date was thrown off too. Same
problem tho.
I'm of course I am willing to send the entire rippers but I'm not sure
how much use they'll be (aside from comic relief :D) given that I'm a
total newbie.
Apologies. I'm trying to track down the nasty record but the obvious n+1
isn't crashing it. More as soon as I get out of the next set of
meetings.
rafe
"Osborne, Brian" wrote:
>
> Wilfred,
>
> Yes. And there is also the quote() method, which escapes offending
> characters. However, I don't know what the complete offending character set
> is and how it overlaps with the set of annoying characters in the
> description, it might not help here.
>
> quote
> $sql = $dbh->quote($string);
> This method escapes special characters (quotation marks, etc.) from strings
> and adds the required outer quotation marks. May not be able to handle all
> types of input (i.e binary data).
>
> Example using the above methods:
> #!/usr/bin/perl -w
> use DBI;
> use strict;
> my $dbh = DBI->connect("DBI:mysql:contacts",undef,undef)
> or die "Unable to connect to contacts Database: $dbh->errstr\n";
> my $sth = $dbh->prepare("SELECT uid FROM contact WHERE last_name =
> 'Flaherty'");
> $sth->execute or die "Unable to execute query: $dbh->errstr\n";
> my $row = $sth->fetchrow_arrayref;
> my $uid = $row->[0];
> $sth->finish;
> my $newname = $dbh->quote("The Flahertys'");
> my $statement =qq(UPDATE contact SET last_name = '$newname'
> WHERE uid = $uid);
> my $rc = $dbh->do($statement) or die "Unable to prepare/execute $statement:
> $dbh->errstr\n";
> print "$rc rows were updated\n";
> $dbh->disconnect;
> exit;
>
> Brian O.
>
> -----Original Message-----
> From: Wilfred Li, Ph.D. [mailto:wilfred@sdsc.edu]
> Sent: Thursday, November 15, 2001 12:41 PM
> To: bioperl-l@bioperl.org
> Subject: RE: [Bioperl-l] Trying to get a mysql DB from genbank flat
> files
>
> >thanks for your mail, you have magically bumped into our daily nightmare,
> >that is that very often people put crazy characters in the description
> >lines, and we have to find ways of backslashing all of them otherwise
> >they will break mysql statements. Could you mail me the offending record?
> Hi,
>
> If bind variables are used in place of a plain insert statement, many of
> the special characters will be taken care of by perl DBI. e.g.
>
> $sth->prepare("insert ... values (?, ..., ?)");
> $sth->execute($id, ..., $kw);
>
> or use
>
> $dbh->do("insert ... values (?, ..., ?)", $id, ..., $kw);
>
> to combine the two steps into one.
>
> I had the problem with SeqAdaptor.pm when parsing SwissProt.
>
> Wilfred
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l