[Biojava-dev] Accession defaults for GenbankFormat
mark.schreiber at novartis.com
Tue Jul 4 02:19:21 UTC 2006
This seems reasonable. Can you foresee any problems, Richard?
- Mark
Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910
"Bubba Puryear" <bubba.puryear at gmail.com>
Sent by: biojava-dev-bounces at lists.open-bio.org
07/03/2006 11:40 PM
To: biojava-dev at lists.open-bio.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-dev] Accession defaults for GenbankFormat
Hey all,
I'm using BioJava for an internal app for my client, which has about 5000
internally developed GenBank records. Most of these records do not have
ACCESSION fields, since they didn't come from a public data source.
(Many of these were created using Invitrogen's Vector NTI and saved as
files.)
Because there is no accession number for these records, I run into problems
when I try to use RichSequence and friends with this data. I've made a patch
for GenbankFormat.java that sets the accession to the locus name of the
record during parsing. If/when the ACCESSION field is parsed, this value is
overwritten, so I think it should be OK in general. I also have a test case
and a test data file.
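A minimal, self-contained sketch of the defaulting rule described above, assuming a simplified line-by-line reader: the accession starts out as the LOCUS name and is replaced only if an ACCESSION line appears later in the record. This is not BioJava code; the class `AccessionDefaultSketch`, the method `parseAccession`, the simplified regexes and the sample record text are all hypothetical, for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not BioJava's GenbankFormat): default the accession
 * to the LOCUS name, and let a later ACCESSION line overwrite it.
 */
public class AccessionDefaultSketch {

    // Simplified patterns; the real GenbankFormat LOCUS regex is more detailed.
    private static final Pattern LOCUS = Pattern.compile("^LOCUS\\s+(\\S+).*");
    private static final Pattern ACCESSION = Pattern.compile("^ACCESSION\\s+(\\S+).*");

    /** Returns the accession for one GenBank-style record, or null if there is no LOCUS line. */
    public static String parseAccession(String record) {
        String accession = null;
        for (String line : record.split("\\R")) {
            Matcher m = LOCUS.matcher(line);
            if (m.matches()) {
                // Default: use the locus name unless an ACCESSION line turns up.
                accession = m.group(1);
                continue;
            }
            m = ACCESSION.matcher(line);
            if (m.matches()) {
                // A real ACCESSION line overwrites the locus-name default.
                accession = m.group(1);
            }
        }
        return accession;
    }

    public static void main(String[] args) {
        // Made-up record text: one without ACCESSION, one with it.
        String inHouse = "LOCUS       pExample1   2686 bp    DNA   circular SYN 01-JUL-2006\n"
                + "ORIGIN\n//\n";
        String withAccession = "LOCUS       SEQ0001   5028 bp    DNA   PLN 21-JUN-1999\n"
                + "ACCESSION   AB000001\n//\n";
        System.out.println(parseAccession(inHouse));       // pExample1
        System.out.println(parseAccession(withAccession)); // AB000001
    }
}
```

Because LOCUS is the first line of a GenBank record, setting the default there and letting ACCESSION overwrite it means records that do carry an accession behave exactly as before.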
The list registration page discourages attachments, so how should I provide
these files?
Thanks in advance,
Bubba
P.S. The patch is small, so I can inline it here:
Index: src/org/biojavax/bio/seq/io/GenbankFormat.java
===================================================================
RCS file: /home/repository/biojava/biojava-live/src/org/biojavax/bio/seq/io/GenbankFormat.java,v
retrieving revision 1.63
diff -u -r1.63 GenbankFormat.java
--- src/org/biojavax/bio/seq/io/GenbankFormat.java 28 Jun 2006 17:02:47 -0000 1.63
+++ src/org/biojavax/bio/seq/io/GenbankFormat.java 1 Jul 2006 20:34:48 -0000
@@ -274,6 +274,9 @@
 Matcher m = lp.matcher(loc);
 if (m.matches()) {
 rlistener.setName(m.group(1));
+ // default accession to locus name for sources that do not have accessions proper.
+ accession = m.group(1);
+ rlistener.setAccession(accession);
 rlistener.setDivision(m.group(5));
 rlistener.addSequenceProperty(Terms.getMolTypeTerm(),m.group(3));
 rlistener.addSequenceProperty(Terms.getDateUpdatedTerm(),m.group(6));
_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev