[Bioperl-l] arabidopsis + load_seqdatabase.pl

Mon Dec 19 14:27:01 EST 2005

On 12/19/05 2:10 PM, "Angshu Kar" <angshu96 at gmail.com> wrote:

> Hi Sean,
>  
> What I need is precisely the latest arabidopsis files (peptide as well as dna)
> that has loaded the database successfully when used with the
> load_seqdatabase.pl script.
> I've tried some other files but they don't load all the tables correctly
> (e.g. the loader cannot distinguish between accession #, name and identifier
> etc. and loads the same data in all 3 columns).

I would approach this differently.  First find the file or files that
contain all the information you want to store; that may be the hard part
here.  If the data come from TAIR (which looks like a good source of genome
information for arabidopsis), then you need to learn what files are there,
what format they are in, what each one contains, and what it doesn't.  Then,
and only then, should you try to load the data into a database, because only
then can you determine what the problem is (if there is one) with loading
data into bioperl-db.  Imagine, for example, that the data file you are
trying to load includes only an accession.  In that case, bioperl-db can't
load any other information, because there isn't any to load.  So I think you
need to diagnose the problem yourself: determine what is in the files you
have, and why the database ends up in the state it is in.

So, what file format do you have right now, and does bioperl support it?
What is expected to be in that file?  Is everything you need in the files
you have?  (To answer that, look at the files themselves and understand
them, not at the bioperl parsing of them.)
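
As one way to look at the files directly, here is a minimal sketch in plain
Python (no bioperl involved) that assumes the input is FASTA and that any
extra header fields are separated by "|".  Both of those are assumptions
about your files, and the function names and the sample header used with
them are made up for illustration.  It only reports how many description
fields follow the first token on each header line.

```python
from collections import Counter


def parse_header(line):
    """Split one FASTA header line into (first_token, description_fields).

    Assumes a '>' prefix and '|' as the field separator; both are
    assumptions about the file, not something any loader guarantees.
    """
    text = line[1:].strip()
    tokens = text.split(None, 1)          # first token, then the rest
    seq_id = tokens[0] if tokens else ""
    rest = tokens[1] if len(tokens) > 1 else ""
    fields = [f.strip() for f in rest.split("|") if f.strip()]
    return seq_id, fields


def field_count_summary(lines):
    """Count how many headers carry 0, 1, 2, ... description fields."""
    return Counter(
        len(parse_header(line)[1]) for line in lines if line.startswith(">")
    )
```

To try it, open the file and pass the handle in, for example
field_count_summary(open(path)).  If nearly every header reports zero
fields, the file probably carries only one identifier per record, which
would be consistent with the same value landing in all three columns.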

Sean