[BioPython] UniGene parser
Sagar Damle
sagar@caltech.edu
Wed, 17 Jul 2002 07:34:25 -0700
Hi Peter,
> For accessing LocusLink and maybe also for UniGene I would
> recommend to download the whole database in ASCII flatfile
> format, and then parsing the flat files. In my opinion
> it is much easier to write parsers for these
> flatfiles than for any HTML generated primarily for human
> readers.
This seems like a good idea, but my own attempt at parsing the UniGene/LocusLink flatfiles (like LL_tmpl) makes me worry that these files are just too large to parse every time I need information. Might it be even better to parse them once and store the results in a local searchable database? I think the people at the GO Consortium have done this with their GO annotations, but I have never seen anything similar made available for UniGene/LocusLink at NCBI. Going to the website seemed to be the shortest path to a solution.
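A minimal sketch of the "parse once, then query locally" idea, using Python's built-in sqlite3 module. Everything about the file layout here is an assumption made for illustration: records are taken to start with a line beginning with ">>" and to contain "KEY: value" lines, and the field names LOCUSID, OFFICIAL_SYMBOL and OFFICIAL_GENE_NAME are hypothetical stand-ins for the locusID, symbol and description fields. The real layout should be checked against the README on the FTP site before relying on any of this.

```python
import sqlite3


def iter_records(lines):
    """Yield one dict per record from an iterable of text lines.

    Assumed layout (hypothetical): a line starting with '>>' opens a
    new record, and each following 'KEY: value' line is one field.
    """
    record = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">>"):
            if record is not None:
                yield record
            record = {}
        elif record is not None and ": " in line:
            key, value = line.split(": ", 1)
            # Keep only the first value seen for a repeated key.
            record.setdefault(key, value)
    if record is not None:
        yield record


def build_index(lines, db_path=":memory:"):
    """Parse the flatfile once and load it into a SQLite table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS locus ("
        "locus_id INTEGER PRIMARY KEY, symbol TEXT, description TEXT)"
    )
    # Field names below are assumptions, not the confirmed format.
    rows = (
        (int(r["LOCUSID"]), r.get("OFFICIAL_SYMBOL"), r.get("OFFICIAL_GENE_NAME"))
        for r in iter_records(lines)
        if "LOCUSID" in r
    )
    conn.executemany("INSERT OR REPLACE INTO locus VALUES (?, ?, ?)", rows)
    # Index the symbol column so lookups do not scan the whole table.
    conn.execute("CREATE INDEX IF NOT EXISTS locus_symbol ON locus (symbol)")
    conn.commit()
    return conn


def lookup_symbol(conn, symbol):
    """Return (locus_id, symbol, description) rows matching a symbol."""
    cur = conn.execute(
        "SELECT locus_id, symbol, description FROM locus WHERE symbol = ?",
        (symbol,),
    )
    return cur.fetchall()
```

With a file on disk, this would be run once as something like build_index(open("LL_tmpl"), "locuslink.db"). Later scripts would then open locuslink.db and call lookup_symbol() without re-reading the flatfile.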
Thoughts, anyone? I'm not really a programmer, just a scripter, so I may be way off base here.
sagar
On Wed, 17 Jul 2002 10:19:56 +0200
Peter Slickers <piet@clondiag.com> wrote:
> Cayte wrote:
> >
> > I just did some experiments with LocusLink files, and when I strip out the
> > HTML tags, very little information is left.
>
> Indeed, a LocusLink record contains only a few data fields, namely
>
> locusID (Number)
> symbol (alphanumerical code, => genecards )
> description (text)
>
> Furthermore, there is a list of related GenBank accessions for each LocusLink record.
>
>
> > For this reason I think I should use the same approach as UniGene. Have you
> > checked out Record in UniGene? Is this what you want?
> >
>
> For accessing LocusLink and maybe also for UniGene I would
> recommend to download the whole database in ASCII flatfile
> format, and then parsing the flat files. In my opinion
> it is much easier to write parsers for these
> flatfiles than for any HTML generated primarily for human
> readers.
>
>
> UniGene by FTP:
>
> ftp://ftp.ncbi.nih.gov/repository/UniGene/
> ftp://ftp.ncbi.nih.gov/repository/UniGene/README
>
>
>
> LocusLink by FTP:
>
> ftp://ftp.ncbi.nih.gov/refseq/LocusLink/
> ftp://ftp.ncbi.nih.gov/refseq/LocusLink/README
>
>
>
> Peter
> -------------------------------------------------------------------
> Peter Slickers piet@clondiag.com
> Clondiag Chip Technologies http://www.clondiag.com/
> Löbstedter Str. 105
> 07749 Jena
> Germany
>
> Fon: 03641/5947-65 Fax: 03641/5947-20
> -------------------------------------------------------------------