FW: [Biojava-l] Parsing SwissProt flat file through BioJava
Santosh Kumar
santosh@molecularconnections.com
Fri, 5 Jul 2002 09:00:07 +0530
> Hi all,
>
> Thanks for your prompt reply. I can now parse the sequence and features.
> I have some more questions related to this. We are parsing data from
> SwissProt flat files and populating our own XML format with it. We intend
> to build an automated application to do the same, and for that we need all
> the data related to the FT and SQ lines. Please forward a URL that
> documents everything related to our query (this will avoid disturbing you
> unnecessarily every time). I have visited http://biojava.org and have its
> Javadoc.
>
> FT PROPEP 27 28 ACTIVATION PEPTIDE.
> FT CHAIN 29 262 GRANZYME A.
> ....
> ....
> SQ SEQUENCE 262 AA; 28968 MW; DA87363A0D92BAF4 CRC64;
> MRNSYRFLAS SLSVVVSLLL IPEDVCEKII GGNEVTPHSR PYMVLLSLDR KTICAGALIA
> ....
> ....
>
> Our Java program is not able to parse the string that is shown in red and
> bold in the format above.
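For the FT and SQ lines quoted above, here is a minimal sketch of pulling the fields out with plain regular expressions, independent of BioJava. It assumes the whitespace-separated layout exactly as it appears in this message; real SwissProt entries use fixed columns, and feature positions can carry markers such as "<" or "?" that this sketch does not handle. The class and method names (FlatFileLines, parseFeature, parseSequenceHeader, joinResidues) are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FlatFileLines {
    // Assumed layout: FT <key> <from> <to> [description]
    private static final Pattern FT =
        Pattern.compile("^FT\\s+(\\S+)\\s+(\\d+)\\s+(\\d+)(?:\\s+(.*))?$");

    // Assumed layout: SQ SEQUENCE <length> AA; <weight> MW; <checksum> CRC64;
    private static final Pattern SQ = Pattern.compile(
        "^SQ\\s+SEQUENCE\\s+(\\d+)\\s+AA;\\s+(\\d+)\\s+MW;\\s+(\\S+)\\s+CRC64;\\s*$");

    private FlatFileLines() {
    }

    /** Returns {key, from, to, description}, or null if the line is not an FT line. */
    static String[] parseFeature(String line) {
        Matcher m = FT.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String description = m.group(4) == null ? "" : m.group(4).trim();
        return new String[] { m.group(1), m.group(2), m.group(3), description };
    }

    /** Returns {length, molecular weight, CRC64}, or null if the line is not an SQ line. */
    static String[] parseSequenceHeader(String line) {
        Matcher m = SQ.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    /** Removes the blanks that split a sequence data line into blocks of ten residues. */
    static String joinResidues(String line) {
        return line.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        String[] ft = parseFeature("FT PROPEP 27 28 ACTIVATION PEPTIDE.");
        System.out.println(ft[0] + " " + ft[1] + ".." + ft[2] + " " + ft[3]);
        String[] sq = parseSequenceHeader(
            "SQ SEQUENCE 262 AA; 28968 MW; DA87363A0D92BAF4 CRC64;");
        System.out.println(sq[0] + " residues, CRC64 " + sq[2]);
        System.out.println(joinResidues("MRNSYRFLAS SLSVVVSLLL"));
    }
}
```

Each method returns null for a line it does not recognise, so a caller can try the patterns in turn while reading the file line by line.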
>
> e-regards,
> Santosh Kumar
> Molecular Connections Pvt. Ltd.,
> Voice: 5598502/5327919 Ext: 42,
> "Never let yesterday's disappointments, overshadow tomorrow's dreams..."
>
> **************************************************************************
> **********************
>
> <http://www.molecularconnections.com>
>
> "Accelerate Life Sciences Research with IT"
>
> **************************************************************************
> ***********************
> The contents of this communication are intended only for the addressee and
> may contain confidential and/or privileged material. If you are not the
> intended recipient, please do not read, copy, use or disclose this
> communication and notify the sender. Opinions, conclusions and other
> information in this communication that do not relate to the official
> business of my company shall be understood as neither given nor endorsed
> by it.
> **************************************************************************
> ***********************
>
>
>
> -----Original Message-----
> From: Roy Park [mailto:RPark@lexgen.com]
> Sent: Wednesday, July 03, 2002 8:03 PM
> To: 'Santosh Kumar'
> Subject: RE: [Biojava-l] Parsing SwissProt flat file through BioJava
>
>
> Santosh,
>
> For features and sequence, you need to use different methods (believe it
> or not!). You have to call Sequence.getFeatureHolder() to get the features,
> and call Sequence.seqString() to get the sequence.
>
> Roy
>
>
> -----Original Message-----
> From: Santosh Kumar [mailto:santosh@molecularconnections.com]
> Sent: Wednesday, July 03, 2002 12:01 AM
> To: Roy Park
> Subject: RE: [Biojava-l] Parsing SwissProt flat file through BioJava
>
>
> Hi,
>
> I am able to retrieve data from SwissProt flat files, but I was surprised
> that I didn't get the data corresponding to the FT and SQ lines of the
> same flat file. I don't know the reason.
>
> Please let me know if I am missing some points.
>
> Regards,
>
> Santosh
>
> -----Original Message-----
> From: Roy Park [mailto:RPark@lexgen.com]
> Sent: Tuesday, June 25, 2002 8:08 PM
> To: 'Rahul Deshpande'; biojava-l@biojava.org
> Cc: Santosh Kumar
> Subject: RE: [Biojava-l] Parsing SwissProt flat file through BioJava
>
>
> The following code should give you some ideas:
>
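> import org.biojava.bio.Annotation;
> import org.biojava.bio.seq.Sequence;
> import org.biojava.bio.seq.SequenceIterator;
> import org.biojava.bio.seq.io.SeqIOTools;
>
> // br is a java.io.BufferedReader opened on the SwissProt flat file.
> SequenceIterator seqIter = SeqIOTools.readSwissprot(br);
> while (seqIter.hasNext()) {
>     // nextSequence() throws BioException if an entry cannot be parsed.
>     Sequence thisSeq = seqIter.nextSequence();
>     Annotation thisAnnot = thisSeq.getAnnotation();
>     // Get species info. getProperty() returns Object, so check or cast
>     // the value before using it as a String.
>     Object species = thisAnnot.getProperty("OS");
>     .
>     .
> }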
>
> etc.
>
> The Annotation.getProperty() call may end up returning an ArrayList of
> Strings instead of a single String object if the entry has multiple XX
> lines. That is, for SOME of the properties, the value returned by
>
> thisAnnot.getProperty("XXXXXX");
>
> may well be an ArrayList rather than a String. Personally, I handle this
> by writing a static utility method that takes either an ArrayList (of
> String) object or a String object and returns a String object.
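A minimal sketch of the kind of utility method described above, using only the Java standard library. The class name AnnotationValues, the method name asString, and the separator parameter are hypothetical; Roy's actual helper is not shown in this thread, and joining multiple lines with a separator is an assumption about how they should be combined.

```java
import java.util.ArrayList;
import java.util.List;

final class AnnotationValues {
    private AnnotationValues() {
    }

    /**
     * Collapses an annotation value into a single String.
     * A List is joined with the given separator, any other non-null
     * value is converted with toString(), and null becomes "".
     */
    static String asString(Object value, String separator) {
        if (value == null) {
            return "";
        }
        if (value instanceof List<?>) {
            StringBuilder joined = new StringBuilder();
            boolean first = true;
            for (Object item : (List<?>) value) {
                if (!first) {
                    joined.append(separator);
                }
                joined.append(String.valueOf(item));
                first = false;
            }
            return joined.toString();
        }
        return value.toString();
    }

    public static void main(String[] args) {
        // Single-line property: passed through unchanged.
        System.out.println(asString("Mus musculus (Mouse).", " "));

        // Multi-line property: the pieces are joined with a space.
        List<String> lines = new ArrayList<>();
        lines.add("Homo sapiens");
        lines.add("(Human).");
        System.out.println(asString(lines, " "));
    }
}
```

With such a helper, the value returned by getProperty() can be passed straight in without the caller checking its type first.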
>
> Roy Park
> Bioinformatics Data Analyst
> Lexicon Genetics Incorporated
> The Woodlands, TX
>
> -----Original Message-----
> From: Rahul Deshpande [mailto:rahul@molecularconnections.com]
> Sent: Tuesday, June 25, 2002 1:21 AM
> To: biojava-l@biojava.org
> Cc: Santosh Kumar
> Subject: [Biojava-l] Parsing SwissProt flat file through BioJava
>
>
> Hello,
>
> I am Rahul Deshpande. I work for a bioinformatics company, and I recently
> came across BioJava. I have a question about it.
>
> How can we read SwissProt flat files using BioJava? The closest we could
> come was the org.biojava.bio.seq.io.EmblLikeFormat class, which has
> methods to return the two-letter code and the remaining data in each
> line. Can we retrieve the data for each protein in the SwissProt flat
> file instead of reading it line by line? Are there any methods like that?
>
> Regards,
> Rahul Deshpande
> Molecularconnections Pvt. Ltd.
> A203, Blue Cross Chambers,
> Infantry Road Cross,
> Bangalore 560001
> Ph: +(91)-80-5598502, 5327919
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
>
>