[BioSQL-l] Can I just consider import HGI.101403 to biosequence table?

Josh Lauricha laurichj at bioinfo.ucr.edu
Wed Dec 17 12:26:35 EST 2003


On Wed 12/17/03 08:23, Hilmar Lapp wrote:
> 
> On Wednesday, December 17, 2003, at 07:15  AM, Jim Liu wrote:
> 
> >Dear All,
> >
> >I went to ftp://ftp.tigr.org/pub/data/tgi/Homo_sapiens/
> >and downloaded HGI.release_13.zip.
> >There are 4 data files:
> >
> >479,047,691 HGI.101403
> >  57,653,798 HGI.TC_EST.101403
> >  12,629,591 HGI.TCs.101403
> >  10,303,001 HGI.GO.101403
> >
> >I would like to import to MySQL database.
> >I already applied BioSQL schema on MySQL.
> >Can I just consider import HGI.101403 to biosequence table?
> 
> You'd also want to populate other tables. Generally I'd think you 
> should be able to import those data into biosql; whether or not you can 
> use the supplied tools to do it basically depends on whether there is a 
> bioperl SeqIO parser for the format. So, check whether the 
> SeqIO/tigr.pm XML parser is the one you need here; otherwise you could 
> still write your own.
> 

Wanting to test SeqIO/tigr.pm on more files, I took a look at them...

These files are in various formats; some appear to be search results.
Those can't be imported into BioSQL, as far as I know. The only file
that could be is HGI.101403, which is in FASTA format. So the best bet
for this is just to use an index such as Bio::Index::Fasta, unless you
want to search on the descriptions. Even then, it wouldn't be too hard
to write a quick and dirty DB_File index for those.
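
To illustrate the index idea, here is a minimal sketch in Python. It is
not Bio::Index::Fasta or DB_File, just the same technique in miniature:
record the byte offset of every header line once, then seek straight to
a record instead of rescanning the file. The function names and the
sample records are hypothetical, and the index lives in a plain
in-memory dict rather than an on-disk DB file.

```python
import io


def build_fasta_index(handle):
    """Map each record ID to (byte offset of its header line, description).

    `handle` must be a seekable file opened in binary mode.
    """
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            header = line[1:].decode("ascii", "replace").strip()
            # The ID is everything up to the first space; the rest is
            # treated as the free-text description.
            seq_id, _, description = header.partition(" ")
            index[seq_id] = (offset, description)
    return index


def fetch_sequence(handle, index, seq_id):
    """Seek to one record and return its sequence as a single string."""
    offset, _ = index[seq_id]
    handle.seek(offset)
    handle.readline()  # skip the header line
    parts = []
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break  # reached the next record
        parts.append(line.strip())
    return b"".join(parts).decode("ascii")


def search_descriptions(index, word):
    """Return the sorted IDs whose description contains `word`."""
    word = word.lower()
    return sorted(
        seq_id
        for seq_id, (_, description) in index.items()
        if word in description.lower()
    )


if __name__ == "__main__":
    # Hypothetical records, standing in for a real FASTA file on disk.
    sample = io.BytesIO(
        b">TC1 hypothetical kinase\nACGT\nTTGA\n"
        b">TC2 example transporter\nGGCC\n"
    )
    idx = build_fasta_index(sample)
    print(fetch_sequence(sample, idx, "TC1"))
    print(search_descriptions(idx, "transporter"))
```

For a file the size of HGI.101403 you would open it with
open(path, "rb") and keep the index on disk rather than in memory, which
is the job DB_File does on the Perl side.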

On a side note, is there anything that's more efficient at getting a
sequence from BioSQL than Bio::DB::BioDB? All the transformations done
by that interface make it too slow to use on a web page.

> >How about the other data files?
> >How to find the detail definition of these data files?
> >
> >Thanks.
> >
> >Jim Liu
> >
> >_______________________________________________
> >BioSQL-l mailing list
> >BioSQL-l at open-bio.org
> >http://open-bio.org/mailman/listinfo/biosql-l
> >
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> 

-- 

----------------------------
| Josh Lauricha            |
| laurichj at bioinfo.ucr.edu |
| Bioinformatics, UCR      |
|--------------------------|

