[Bioperl-l] querying LocusLink

Hilmar Lapp hlapp at gnf.org
Wed May 21 15:47:00 EDT 2003


As for locally indexing LL, a good starting point is looking at the 
modules in Bio/Index. I'm not the best person to direct you here, but 
looking at an implementation in that directory and possibly looking at 
the scripts in scripts/index which take advantage of those should help.
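
To make the idea of a local index concrete, here is a minimal sketch in plain Perl. It is not a Bio::Index module and does not use the Bio::Index API. It assumes that each record in the downloaded flat file starts with a line of the form ">>ID" (believed to be the LL_tmpl layout, but check your own download), and the names build_index and fetch_record are made up for illustration.

```perl
use strict;
use warnings;

# Map each record ID to the byte offset at which its record starts.
# ASSUMPTION: every record begins with a line of the form ">>ID".
sub build_index {
    my ($path) = @_;
    my %offset_of;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    while (1) {
        my $pos  = tell($fh);      # offset before the line is read
        my $line = <$fh>;
        last unless defined $line;
        $offset_of{$1} = $pos if $line =~ /^>>(\S+)/;
    }
    close($fh);
    return \%offset_of;
}

# Seek straight to one record and return its text, up to (but not
# including) the next ">>" line or end of file. An unknown ID yields
# undef in scalar context.
sub fetch_record {
    my ($path, $index, $id) = @_;
    return unless exists $index->{$id};
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    seek($fh, $index->{$id}, 0) or die "cannot seek in $path: $!";
    my $record = <$fh>;            # the ">>ID" header line itself
    while (defined(my $line = <$fh>)) {
        last if $line =~ /^>>/;
        $record .= $line;
    }
    close($fh);
    return $record;
}
```

Calling build_index('LL_tmpl') once and then fetch_record('LL_tmpl', $index, $id) for each ID of interest avoids rescanning the whole file per lookup. A real Bio::Index module would additionally persist the ID-to-offset map on disk rather than keep it in memory.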

As for querying NCBI LL directly over the network, the modules to look 
at would be those that implement the same for GenBank, e.g. 
Bio/DB/GenBank.pm. Lincoln also recently factored out some of the 
basic framework into modules in Bio/DB/Query/*.pm.

Lincoln and Ewan wrote many of the pertinent modules, so they might be 
able to get you started better.

As for how to write a bioperl module, there is a developer's guide in 
biodesign.pod (does anyone know of better ones? Ewan recently posted a 
little guide, which should be in the News section on the website).

	-hilmar

On Monday, May 19, 2003, at 09:53  AM, Eric Wang wrote:

> Dear Hilmar,
> Thanks for replying.  I'd love to write this module.  As a user/supporter
> of the bioperl/open source bioinformatics community, I'll be happy to
> contribute some of my time for development.  But I'd like some pointers on
> which sources to open for inheritance and how to make a connection to
> LocusLink.
> Also, where do I look up how to make a module?
> Thanks for all your help,
>
> Eric
>
>  On Sat, 17 May 2003, Hilmar Lapp wrote:
>
>> In the absence of any better answer from someone else, here is mine:
>> you can't get the LocusLink entry through DB::GenBank. In fact, there
>> is no remote DB interface for retrieving LL entries yet - you or
>> anybody else is welcome to write that though (the framework for setting
>> up the connection etc should all be there to inherit from).
>>
>> Instead, download LL to your computer, then ... uuhhm, well I'm afraid
>> Bio::Index::* doesn't support LL either.
>>
>> So, either you write the Bio::Index module that supports LL, or you dump
>> LL into biosql and then write a tool that extracts it based on protein
>> dbxref.
>>
>> Bottom line: I wish there was but I'm afraid there isn't an easy way to
>> do this yet in Bioperl.
>>
>> 	-hilmar
>>
>> On Thursday, May 15, 2003, at 11:48  AM, Eric Wang wrote:
>>
>>> Dear all,
>>>
>>> I have a protein name on hand and I want to access the LocusLink
>>> information and the NT_ contig information.
>>> I tried several times with Bio::DB::GenBank objects, but it seems I
>>> could only obtain the sequence and not the LL data.  So I guess my
>>> question is: how would I go about finding the LocusLink information
>>> for this protein?
>>>
>>> Thanks for all the help!!
>>>
>>> Eric
>>>
>>>  On Wed, 14 May 2003, Hilmar Lapp wrote:
>>>
>>>> Are you talking about querying Genbank at NCBI through a
>>>> Bio::DB::GenPept object? I don't understand what exactly you're trying
>>>> to do. The LL parser in bioperl is a stream parser, not a random
>>>> access interface.
>>>>
>>>> My recommendation is to state what you are trying to achieve, paste in
>>>> the script or the part that supposedly does this, and send that to the
>>>> list. There are many more very knowledgeable people there ... who may
>>>> have even tried to do the same as you before.
>>>>
>>>> 	-hilmar
>>>>
>>>> On Wednesday, May 14, 2003, at 10:46  PM, Eric Wang wrote:
>>>>
>>>>> Thanks for your help.
>>>>> I am still confused about how to query LocusLink for the tags.
>>>>> I have on hand just the protein name and I basically want to find
>>>>> the intron/exon boundaries.
>>>>> But it seems that when I use get_Stream_by_query('prot_name'), I
>>>>> get nothing... do you know a better way to approach the problem?
>>>>>
>>>>> Thanks
>>>>> Eric
>>>>>
>>>>> On Wed, 14 May 2003, Hilmar Lapp wrote:
>>>>>
>>>>>> Theoretically yes, but be aware that these might be huge. If you
>>>>>> fetch them remotely, you don't do anyone a favor, neither NCBI nor
>>>>>> yourself. You want to have a local indexed flat file download for
>>>>>> that.
>>>>>>
>>>>>> 	-hilmar
>>>>>>
>>>>>> On Wednesday, May 14, 2003, at 08:40  PM, Eric Wang wrote:
>>>>>>
>>>>>>> Thanks for your help!
>>>>>>> Yeah, I noticed there aren't any top_features, but
>>>>>>> will the annotations contain the reference sequence (NT_ contigs)?
>>>>>>> If that's the case, I would just use get_Seq_by_id to retrieve it,
>>>>>>> right?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks again
>>>>>>>
>>>>>>> Eric
>>>>>>> On Wed, 14 May 2003, Hilmar Lapp wrote:
>>>>>>>
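>>>>>>>> use Bio::SeqIO;
>>>>>>>> # no -file or -fh is given here, so input is read from STDIN
>>>>>>>> my $seqio = Bio::SeqIO->new(-format => "locuslink");
>>>>>>>> # each call to next_seq() parses one LL record from the stream
>>>>>>>> while(my $ll = $seqio->next_seq()) {
>>>>>>>> 	# whatever
>>>>>>>> }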
>>>>>>>>
>>>>>>>> There won't be any sequence features though. LL essentially is lots
>>>>>>>> of tag/value pairs, and that is how it ends up (i.e.,
>>>>>>>> $ll->annotation->get_Annotations()).
>>>>>>>>
>>>>>>>> The locus-contains-splice variants thing in LL isn't properly
>>>>>>>> reflected in the resulting SeqI object, which is a problem of the
>>>>>>>> Bioperl object model in the first place (there is no good object
>>>>>>>> right now that would properly reflect an LL object wrt the
>>>>>>>> datatype). What you'd want is a gene model. Maybe it's worth
>>>>>>>> thinking whether the feature-driven SeqFeature::Gene::* objects
>>>>>>>> fit the bill here. I'm not convinced yet though.
>>>>>>>>
>>>>>>>> 	-hilmar
>>>>>>>>
>>>>>>>> On Wednesday, May 14, 2003, at 05:44  PM, Eric Wang wrote:
>>>>>>>>
>>>>>>>>> I have a question regarding how to retrieve locuslink objects.
>>>>>>>>> I searched the documentation everywhere and couldn't find it.
>>>>>>>>> Can anybody give me some pointers on how to retrieve locuslink
>>>>>>>>> objects, and can I use top_SeqFeatures() for this object?
>>>>>>>>>
>>>>>>>>> Thanks in advance
>>>>>>>>>
>>>>>>>>> Eric
>>>>>>>>>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


