[Biojava-l] how to cancel download chemcomp when parser a PDB file

Andreas Prlic andreas at sdsc.edu
Tue Dec 21 23:55:39 UTC 2010


Hi Fico,

- You are right, this was a bug (an index was off). I committed a
patch for it to SVN.
- I also added new behaviour for downloading chem comp files: the
default chem comp provider will fetch the components.cif.gz file and
extract all definitions into small files, which will be used from then
on.
- I am not sure about your last question. I believe that is already
possible: you can use the getChemComp method to get the exact
definition for a group.
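
For the last point, here is a minimal sketch of building a one-letter
sequence that keeps modified residues. It is not BioJava code: the class
name, the toOneLetter helper, and the small hand-written parent table
(MSE, SEP, TPO, PTR) are assumptions for illustration only. In a real
program the residue names would come from looping over the groups of a
chain, and the parent residue would come from the chem comp definition
of each group rather than from a hard-coded map.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: one-letter sequence that maps modified residues to their parent. */
class ModifiedResidueSequence {

    // Standard amino acids, three-letter code to one-letter code.
    private static final Map<String, Character> STANDARD = new HashMap<>();

    // ASSUMPTION: a tiny hand-written table of modified residues and their
    // standard parents, standing in for the chem comp definitions.
    private static final Map<String, String> PARENT = new HashMap<>();

    static {
        String[] three = {
            "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
            "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"
        };
        String one = "ARNDCQEGHILKMFPSTWYV";
        for (int i = 0; i < three.length; i++) {
            STANDARD.put(three[i], one.charAt(i));
        }
        PARENT.put("MSE", "MET"); // selenomethionine
        PARENT.put("SEP", "SER"); // phosphoserine
        PARENT.put("TPO", "THR"); // phosphothreonine
        PARENT.put("PTR", "TYR"); // phosphotyrosine
    }

    /** Returns the one-letter sequence; unknown residues become 'X'. */
    static String toOneLetter(List<String> residueNames) {
        StringBuilder sb = new StringBuilder();
        for (String name : residueNames) {
            // Replace a modified residue by its parent, if one is known.
            String key = PARENT.getOrDefault(name, name);
            sb.append(STANDARD.getOrDefault(key, 'X'));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("PHE", "LEU", "MSE", "ARG", "SEP", "XYZ");
        System.out.println(toOneLetter(names)); // prints FLMRSX
    }
}
```

Mapping unknown residues to 'X' instead of skipping them keeps the
sequence length equal to the number of groups, so positions still line up.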

Andreas
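
The "extract all definitions into small files" step mentioned above can
be pictured with the following sketch. It is not the actual chem comp
provider code; the class and method names are hypothetical. It relies
only on the fact that each component in a CIF file starts with a
"data_<ID>" line, and it works on text that has already been read into
memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: split concatenated CIF text into one block per component. */
class ChemCompSplitter {

    /** Returns a map from component ID to the text of its "data_<ID>" block. */
    static Map<String, String> split(String cifText) {
        Map<String, String> blocks = new LinkedHashMap<>();
        String currentId = null;
        StringBuilder current = new StringBuilder();
        for (String line : cifText.split("\r?\n")) {
            if (line.startsWith("data_")) {
                // A new block begins: store the previous one, if any.
                if (currentId != null) {
                    blocks.put(currentId, current.toString());
                }
                currentId = line.substring("data_".length()).trim();
                current = new StringBuilder();
            }
            // Lines before the first "data_" line are ignored.
            if (currentId != null) {
                current.append(line).append('\n');
            }
        }
        if (currentId != null) {
            blocks.put(currentId, current.toString());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String sample = "data_35G\n_chem_comp.id 35G\ndata_GDP\n_chem_comp.id GDP\n";
        System.out.println(split(sample).keySet()); // prints [35G, GDP]
    }
}
```

To process the real download, one would read components.cif.gz through
java.util.zip.GZIPInputStream and write each block to its own file (for
example <ID>.cif). For a file of that size, streaming line by line would
be preferable to holding the whole text in a String.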

On Sun, Dec 19, 2010 at 9:31 PM, Fico <wuuter at gmail.com> wrote:
> The ChemComp download question is resolved now, but I found a new problem when
> testing biojava3-Beta4. My program fragment:
>
>         FileParsingParameters params = new FileParsingParameters();
>         params.setLoadChemCompInfo(false);
>         params.setHeaderOnly(false);
>         // params.setParseCAOnly(true);
>         params.setAlignSeqRes(true);
>         params.setParseSecStruc(false);
>
>         // loop file
>         for (String file : getPdbFiles()) {
>
>             PDBFileReader pdbreader = new PDBFileReader();
>             pdbreader.setAutoFetch(false);
>             pdbreader.setPath(getPdbDir());
>
>             pdbreader.setFileParsingParameters(params);
>
>             // pdbreader.setLoadChemCompInfo(true);
>             Structure struc = null;
>             try {
>                 struc = pdbreader.getStructure(getPdbDir() + "\\" + file);
>             } catch (IOException e) {
>                 e.printStackTrace();
>                 continue; // skip this file; struc would still be null below
>             }
>
>             String pdbid = struc.getPDBCode();
>
>             for (int i = 0; i < struc.nrModels(); i++) {
>
>                 // loop chain
>                 for (Chain ch : struc.getModel(i)) {
>                     System.out.println(pdbid + ">>>" + ch.getChainID() + ">>>"
>                             + ch.getAtomSequence());
>                     System.out.println(pdbid + ">>>" + ch.getChainID() + ">>>"
>                             + ch.getSeqResSequence());
>                     // Test the getAtomGroups() and getSeqResGroups() method
>                     // List<Group> group = ch.getAtomGroups();
>                     List<Group> group = ch.getSeqResGroups();
>                     for (Group gp : group) {
>                         System.out.println(gp.getResidueNumber() + ":"
>                                 + gp.getPDBName());
>                     }
>                 }
>             }
>         }
>
> My test PDB file is 1O1G.pdb, which has 45 modified residues in chain A.
> With .getAtomGroups() I get the atom information for all residues, such
> as ResidueNumber and PDBName:
> 797:PHE
> 798:LEU
> 799:MET
> 800:ARG
> 801:VAL
> 802:GLU
> ......
> 840:PRO
> 841:LEU
> 842:LEU
> 843:LYS
>
> But with .getSeqResGroups(), the last 45 residues are missing some information,
> such as ResidueNumber and atom coordinates. The output of the program is:
> 797:PHE
> 798:LEU
> null:MET
> null:ARG
> null:VAL
> null:GLU
> ......
> null:PRO
> null:LEU
> null:LEU
> null:LYS
>
> In biojava3-Beta1 the two methods produce the same result, matching the output of
> .getAtomGroups() in Beta4. So is this a bug?
>
> P.S.
>     Could we add a new method that returns the full amino acid sequence, including
> modified residues, directly? Currently neither getAtomSequence() nor
> getSeqResSequence() can do this. To get the amino acid sequence with modified
> residues, I have to call .getAtomGroups() or .getSeqResGroups() first and then
> loop over each residue to build the one-letter amino acid sequence.
>
>
>
>
>
> 2010/12/17 Andreas Prlic <andreas at sdsc.edu>
>>
>> ok that behavior is fixed in SVN now. Now you can have setAlignSeqRes
>> set to true and it will not download chemical components if
>> loadChemComp is false. The drawback is that the data representation
>> will not be as precise.
>>
>> Andreas
>>
>>
>>
>> On Thu, Dec 16, 2010 at 8:26 AM, Steve Darnell <darnells at dnastar.com>
>> wrote:
>> > The SeqRes to Atom record alignment forces the use of chemical
>> > components to translate non-standard residues to their closest standard
>> > counterpart for the sequence alignment.  I have to disable
>> > setLoadChemCompInfo and setAlignSeqRes when I don't want to download
>> > chemical component files from RCSB when parsing a PDB file.
>> >
>> > Regards,
>> > Steve
>> >
>> > -----Original Message-----
>> > From: biojava-l-bounces at lists.open-bio.org
>> > [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Fico
>> > Sent: Wednesday, December 15, 2010 8:46 PM
>> > To: Biojava-l at lists.open-bio.org
>> > Subject: [Biojava-l] how to cancel download chemcomp when parser a PDB
>> > file
>> >
>> > Hi all,
>> >
>> > I have recently been using biojava3 beta1 to parse PDB files. My program is:
>> >
>> >            PDBFileReader pdbreader = new PDBFileReader();
>> >            pdbreader.setAutoFetch(false);
>> >            pdbreader.setPath(pdbDirPath);
>> >
>> >            FileParsingParameters params = new FileParsingParameters();
>> >            params.setLoadChemCompInfo(false);
>> >            params.setHeaderOnly(false);
>> >            params.setAlignSeqRes(true);
>> >            params.setParseSecStruc(false);
>> >            pdbreader.setFileParsingParameters(params);
>> >
>> >            Structure structure = null;
>> >            try {
>> >                structure = pdbreader.getStructure(pdbDirPath + "\\" + file);
>> >            } catch (IOException e) {
>> >                e.printStackTrace();
>> >            }
>> >
>> > When I run this program, it downloads files such as:
>> >
>> > creating directory D:\MyWorkspace\TestFiles\pdbFiles\chemcomp
>> > downloading http://www.rcsb.org/pdb/files/ligand/35G.cif
>> > downloading http://www.rcsb.org/pdb/files/ligand/GDP.cif
>> >
>> > But I do not want to download these files. How can I disable this?
>> > Thanks.
>> > _______________________________________________
>> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>> >
>>
>>
>>
>> --
>> -----------------------------------------------------------------------
>> Dr. Andreas Prlic
>> Senior Scientist, RCSB PDB Protein Data Bank
>> University of California, San Diego
>> (+1) 858.246.0526
>> -----------------------------------------------------------------------
>
>






