[Biojava-l] Re: [Biojava-dev] reading pdb format or using tagvalue?

Matthew Pocock matthew_pocock@yahoo.co.uk
Fri, 17 Jan 2003 11:56:40 +0000


Hi Russell,

The tag-value stuff assumes that each line can be broken into a single 
tag with a value. Things like pdb don't look quite like this (multiple 
types of values on some lines), but I recently added some handlers to 
fool the system. You will need the 1.3 snapshot and a Java 1.4 or 
later VM.

Start off by creating a LineSplitParser instance. You will then have to 
configure it to match PDB. For example, each record seems to have a 6 
char tag, so you need to call lsp.setSplitOffset(6). Also, every line is 
a new piece of data (unlike embl where multiple lines with the same tag 
are part of the same entry), so you need to call 
lsp.setMergeSameTag(false). Continue in this vein until you think you 
have something that should process the skeleton of the file.
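
To make the two settings above concrete, here is a minimal plain-Java 
sketch of what they amount to. It does not use the BioJava classes at 
all; the class PdbLineSplit and its methods are hypothetical names 
invented for this illustration, and the sample lines are made up. It 
only assumes, as stated above, that the tag occupies the first 6 
characters and that every line is its own piece of data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: this is NOT the BioJava LineSplitParser.
final class PdbLineSplit {
    // Assumption taken from the text: the tag is the first 6 characters.
    static final int SPLIT_OFFSET = 6;

    /** Splits one line into {tag, value}. The tag is trimmed, the value is left as-is. */
    static String[] split(String line) {
        if (line.length() <= SPLIT_OFFSET) {
            // Short lines carry a tag and no value.
            return new String[] { line.trim(), "" };
        }
        return new String[] {
            line.substring(0, SPLIT_OFFSET).trim(),
            line.substring(SPLIT_OFFSET)
        };
    }

    /** One {tag, value} pair per input line; repeated tags are never merged. */
    static List<String[]> splitAll(List<String> lines) {
        List<String[]> records = new ArrayList<String[]>();
        for (String line : lines) {
            records.add(split(line));
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not taken from a real PDB entry.
        List<String> lines = Arrays.asList(
            "HEADER" + "    SOME CLASSIFICATION",
            "ATOM  " + "    1  N",
            "ATOM  " + "    2  CA",
            "END");
        for (String[] record : splitAll(lines)) {
            System.out.println("[" + record[0] + "] -> [" + record[1] + "]");
        }
    }
}
```

Note that the two ATOM lines come out as two separate records, which is 
the behaviour you want for PDB and the opposite of what you want for 
embl.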

Then, look at the demo code under demos-1.4/unigene/ParseUnigene.java 
for a simple skeleton for hooking your customized parser to some debug 
output. Once this is done, you should be able to see what kind of job 
it's made of the pdb entries.

Now comes the fun bit. The values so far will be single strings for the 
entire bit of the line that's not a tag. This is next to useless. You 
really need to tokenize each line. You do this using a combination of 
TagDelegator and RegexFieldFinder. Let's call the instance of 
TagDelegator td. Now, for example, call td.setListener("HEADER", 
headerHandler). You can make headerHandler an instance of 
RegexFieldFinder, configure it with a regex to match the name and date 
and ID, and name them sanely. Don't forget to pass in your debug 
listener as the delegate for headerHandler - that way the events will 
get dumped out. For entries like AUTHOR that are lists, you can 
associate a listener that splits the output up. Use ChangeTable, 
RegexSplitter and ValueChanger to describe the process.
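
As a rough idea of the tokenizing step, here is a plain-Java sketch 
using only java.util.regex. It is not the RegexFieldFinder or 
RegexSplitter API; the class PdbFieldSketch, its method names, the 
field names "name", "date" and "id", and the regex itself are all 
assumptions made for illustration. The regex assumes the HEADER value 
is a name, then a DD-MMM-YY date, then a 4-character ID, separated by 
whitespace, and that AUTHOR values are comma-separated.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: this is NOT the BioJava tagvalue API.
final class PdbFieldSketch {
    // Assumed layout of a HEADER value: name, date (DD-MMM-YY), 4-character ID.
    static final Pattern HEADER_VALUE = Pattern.compile(
        "^\\s*(.+?)\\s+(\\d{2}-[A-Z]{3}-\\d{2})\\s+(\\w{4})\\s*$");

    /** Breaks the value part of a HEADER line into named fields. */
    static Map<String, String> parseHeader(String value) {
        Matcher m = HEADER_VALUE.matcher(value);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a HEADER value: " + value);
        }
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("name", m.group(1));
        fields.put("date", m.group(2));
        fields.put("id", m.group(3));
        return fields;
    }

    /** Splits the value part of an AUTHOR line into individual names. */
    static List<String> splitAuthors(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return Collections.<String>emptyList();
        }
        return Arrays.asList(trimmed.split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        // Made-up values, i.e. what is left of each line after the 6-char tag.
        System.out.println(parseHeader("    SOME PROTEIN    01-JAN-03   1ABC"));
        System.out.println(splitAuthors("    A.N.OTHER,J.DOE"));
    }
}
```

In the real setup each of these would be a listener hanging off the 
TagDelegator for its tag, with the debug listener as the delegate, so 
treat the above only as a picture of what comes out the other end.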

Sorry, this has got too long already. See how far you can get on your 
own and then pester me. It's not that hard to write these things once 
you're up to speed, but there's a steep learning curve.

Matthew

Russell Smithies wrote:
> 
> Hi,
> Has anyone got an example of how to use Matthew's new 
> biojava\bio\program\tagvalue package?
> 
> I want to read 'tags' off .pdb files and get the property (atom x,y,z 
> coords) back, and to do this for many files (maybe everything 
> Brookhaven/RCSB has?), so converting to xml first is probably a bit 
> time/resource-consuming.
> 
> Maybe creating new Annotations is the better way to do it?
> Or can I trick SeqIOTools.readEmbl() to do it?
> 
> Any ideas?
> 
> thanx
> Russell
> 
> _______________________________________________
> biojava-dev mailing list
> biojava-dev@biojava.org
> http://biojava.org/mailman/listinfo/biojava-dev
> 


-- 
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk