FW: [Biojava-l] Parsing SwissProt flat file through BioJava

Santosh Kumar santosh@molecularconnections.com
Fri, 5 Jul 2002 09:00:07 +0530


> Hi all,
> 
> Thanks for your quick reply. I can now parse the sequence and features,
> but I have some more questions related to it. We are parsing the data from
> SwissProt flat files and loading it into our own XML format, and we intend
> to build an automated application to do the same. For that we need all the
> data related to the FT and SQ lines. Please point us to a URL that covers
> everything related to our query (this will avoid disturbing you every
> time). I have visited http://biojava.org and have the BioJava Javadoc.
> 
> FT   PROPEP       27     28       ACTIVATION PEPTIDE.
> FT   CHAIN           29    262      GRANZYME A.
> ....
> ....
> SQ   SEQUENCE   262 AA;  28968 MW;  DA87363A0D92BAF4 CRC64;
>         MRNSYRFLAS SLSVVVSLLL IPEDVCEKII GGNEVTPHSR PYMVLLSLDR KTICAGALIA
> ....
> ....
> 
> Our Java program is not able to parse the strings shown in red and bold in
> the excerpt above.
> 
> e-regards,
> Santosh Kumar
> Molecular Connections Pvt. Ltd.,
> Voice: 5598502/5327919 Ext: 42,
> "Never let yesterday's disappointments, overshadow tomorrow's dreams..."
> 
> ****************************************************************************
> 
> <http://www.molecularconnections.com>
> 
> "Accelerate Life Sciences Research with IT"
> 
> ****************************************************************************
> The contents of this communication are intended only for the addressee and
> may contain confidential and/or privileged material. If you are not the
> intended recipient, please do not read, copy, use or disclose this
> communication and notify the sender.  Opinions, conclusions and other
> information in this communication that do not relate to the official
> business of my company shall be understood as neither given nor endorsed
> by it.
> ****************************************************************************
> 
> 
> 
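For reference, the FT and SQ lines quoted above follow a simple whitespace-separated layout. The following is a minimal plain-Java sketch, not BioJava's own parser, of how those particular lines could be split into fields. The class `SwissProtLines` and its methods are hypothetical names for illustration. It assumes single-line FT entries (a description that wraps onto a following FT continuation line is not handled) and keeps positions as strings, because SwissProt positions are not always plain integers (they may carry markers such as `<` or `>`).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical plain-Java helper (not BioJava) for SwissProt FT and SQ lines.
final class SwissProtLines {

    // One feature-table entry, e.g. "FT   CHAIN   29   262   GRANZYME A."
    static final class FtEntry {
        final String key;
        final String from;        // kept as text: may carry '<' or '>' markers
        final String to;
        final String description;

        FtEntry(String key, String from, String to, String description) {
            this.key = key;
            this.from = from;
            this.to = to;
            this.description = description;
        }
    }

    // Matches "SQ   SEQUENCE   262 AA;  28968 MW;  DA87363A0D92BAF4 CRC64;"
    private static final Pattern SQ_HEADER = Pattern.compile(
            "^SQ\\s+SEQUENCE\\s+(\\d+)\\s+AA;\\s+(\\d+)\\s+MW;\\s+([0-9A-F]+)\\s+CRC64;");

    // Splits an FT line into key, start, end and the free-text description.
    // Assumes the whole entry fits on this one line.
    static FtEntry parseFt(String line) {
        if (!line.startsWith("FT ")) {
            throw new IllegalArgumentException("not an FT line: " + line);
        }
        // Drop the "FT" code, then split into at most 4 whitespace-separated parts;
        // the 4th part keeps its internal spaces ("GRANZYME A.").
        String[] parts = line.substring(2).trim().split("\\s+", 4);
        if (parts.length < 3) {
            throw new IllegalArgumentException("too few FT fields: " + line);
        }
        String desc = parts.length == 4 ? parts[3].trim() : "";
        return new FtEntry(parts[0], parts[1], parts[2], desc);
    }

    // Returns {length in AA, molecular weight, CRC64} from the SQ header line.
    static String[] parseSqHeader(String line) {
        Matcher m = SQ_HEADER.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not an SQ header: " + line);
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    // Sequence data lines are indented blocks of residues; remove all whitespace.
    static String residues(String line) {
        return line.replaceAll("\\s+", "");
    }
}
```

For example, `parseFt("FT   CHAIN           29    262      GRANZYME A.")` yields key `CHAIN`, range `29`..`262` and description `GRANZYME A.`; concatenating `residues(...)` over all lines after the SQ header gives the full sequence string.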
> -----Original Message-----
> From: Roy Park [mailto:RPark@lexgen.com]
> Sent: Wednesday, July 03, 2002 8:03 PM
> To: 'Santosh Kumar'
> Subject: RE: [Biojava-l] Parsing SwissProt flat file through BioJava
> 
> 
> Santosh,
> 
> For features and sequence, you need to use different methods (believe it
> or not!). You have to call Sequence.getFeatureHolder() to get the features,
> and call Sequence.seqString() to get the sequence.
> 
> Roy
> 
> 
> -----Original Message-----
> From: Santosh Kumar [mailto:santosh@molecularconnections.com]
> Sent: Wednesday, July 03, 2002 12:01 AM
> To: Roy Park
> Subject: RE: [Biojava-l] Parsing SwissProt flat file through BioJava
> 
> 
> Hey,
> 
> I am able to retrieve the data from SwissProt flat files, but was surprised
> when I didn't get the data corresponding to FT and SQ from the same flat
> file. I don't know the reason.
> 
> Please let me know if I am missing some points.
> 
> regards
> 
> Santosh
> 
> -----Original Message-----
> From: Roy Park [mailto:RPark@lexgen.com]
> Sent: Tuesday, June 25, 2002 8:08 PM
> To: 'Rahul Deshpande'; biojava-l@biojava.org
> Cc: Santosh Kumar
> Subject: RE: [Biojava-l] Parsing SwissProt flat file through BioJava
> 
> 
> The following code should give you some ideas:
> 
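> import java.io.BufferedReader;
> import java.io.FileReader;
> import org.biojava.bio.Annotation;
> import org.biojava.bio.seq.Sequence;
> import org.biojava.bio.seq.SequenceIterator;
> import org.biojava.bio.seq.io.SeqIOTools;
> 
> // Inside a method declared 'throws Exception'; fileName is your SwissProt file
> BufferedReader br = new BufferedReader(new FileReader(fileName));
> SequenceIterator seqIter = SeqIOTools.readSwissprot(br);
> while (seqIter.hasNext()) {
> 	Sequence thisSeq = seqIter.nextSequence();
> 	Annotation thisAnnot = thisSeq.getAnnotation();
> 	// Get species info; getProperty() returns Object (may be a List, see below)
> 	String species = (String) thisAnnot.getProperty("OS");
> 	// ... read other properties the same way ...
> }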
> 
> etc.
> 
> The Annotation.getProperty() call may end up returning an ArrayList of
> Strings instead of a String object if there are multiple XX entries,
> i.e. it could very well be:
> 
> 	ArrayList species = (ArrayList) thisAnnot.getProperty("XXXXXX");
> 
> for SOME of the properties. Personally, I handle this by writing a utility
> static method that can take either an ArrayList (of String) object or a
> String object, and returns a String object.
> 
> Roy Park
> Bioinformatics Data Analyst
> Lexicon Genetics Incorporated
> The Woodlands, TX
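The utility static method Roy mentions might look something like the following minimal sketch. The class name `AnnotationValues`, the method name `asString`, and the choice of joining list elements with a single space are assumptions for illustration, not part of BioJava. It accepts any `java.util.List`, which covers the `ArrayList` case.

```java
import java.util.List;

// Hypothetical helper: normalises a getProperty() value to a single String.
final class AnnotationValues {

    private AnnotationValues() {}

    // Accepts a String, a List of Strings, or null.
    // List elements are joined with one space (an assumed separator).
    static String asString(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof String) {
            return (String) value;
        }
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder();
            boolean first = true;
            for (Object item : (List<?>) value) {
                if (!first) {
                    sb.append(' ');
                }
                sb.append(item);
                first = false;
            }
            return sb.toString();
        }
        // Any other value type: fall back to its toString().
        return value.toString();
    }
}
```

Usage inside the loop would then be `String species = AnnotationValues.asString(thisAnnot.getProperty("OS"));`, which works whether the property came back as a String or as a list.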
> 
> -----Original Message-----
> From: Rahul Deshpande [mailto:rahul@molecularconnections.com]
> Sent: Tuesday, June 25, 2002 1:21 AM
> To: biojava-l@biojava.org
> Cc: Santosh Kumar
> Subject: [Biojava-l] Parsing SwissProt flat file through BioJava
> 
> 
> Hello,
> 
> I am Rahul Deshpande. I work for a bioinformatics company. Recently I came
> across BioJava, and I have a question about it.
> 
> How can we read SwissProt flat files using BioJava? The closest we could
> come was the org.biojava.bio.seq.io.EmblLikeFormat class, which has methods
> to return the two-letter code and the remaining data in the line. Can we
> retrieve the data for each protein in the SwissProt flat file instead of
> reading it line by line? Are there any methods like that?
> 
> Regards,
> Rahul Deshpande
> Molecularconnections Pvt. Ltd.
> A203, Blue Cross Chambers,
> Infantry Road Cross,
> Bangalore 560001
> Ph: +(91)-80-5598502, 5327919
> 
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
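Rahul's question, reading one entry at a time instead of line by line, is what Roy's `SeqIOTools.readSwissprot` reply in this thread addresses. For comparison only, here is a minimal plain-Java sketch that does not use BioJava at all; it relies solely on the SwissProt convention that each entry ends with a line containing `//`. The class and method names (`SwissProtRecords`, `readEntries`) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java sketch (not BioJava): one list of lines per entry.
final class SwissProtRecords {

    static List<List<String>> readEntries(BufferedReader in) throws IOException {
        List<List<String>> entries = new ArrayList<>();
        List<String> current = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().equals("//")) {
                // The "//" terminator closes the current entry.
                if (!current.isEmpty()) {
                    entries.add(current);
                }
                current = new ArrayList<>();
            } else if (!line.isEmpty()) {
                current.add(line);
            }
        }
        // Tolerate a file whose last entry lacks the final "//".
        if (!current.isEmpty()) {
            entries.add(current);
        }
        return entries;
    }
}
```

Each returned list holds one entry's lines (ID, AC, FT, SQ and the sequence lines), which can then be dispatched on their two-letter line codes. Note that this reads the whole file into memory; BioJava's iterator-based approach processes one entry at a time.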
> 