[Biojava-dev] Accession defaults for GenbankFormat

mark.schreiber at novartis.com mark.schreiber at novartis.com
Tue Jul 4 02:19:21 UTC 2006


This seems reasonable. Can you foresee any problems, Richard?

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910





"Bubba Puryear" <bubba.puryear at gmail.com>
Sent by: biojava-dev-bounces at lists.open-bio.org
07/03/2006 11:40 PM

 
        To:     biojava-dev at lists.open-bio.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-dev] Accession defaults for GenbankFormat


Hey all,

   I'm using BioJava for an internal app for my client, who has about 5000
internally developed GenBank records. Most of these records do not have
ACCESSION fields, since they didn't come from a public data source. (Many
were created with Invitrogen's Vector NTI and saved as files.)

  Because these records have no accession numbers, I run into problems
when I try to use RichSequence and friends with this data. I've made a
patch for GenbankFormat.java that sets the accession to the record's locus
name during parsing. If and when an ACCESSION field is parsed, this value
is overwritten, so I think it should be OK in general. I also have a test
case and a test data file.
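
The defaulting rule described above can be illustrated without BioJava at
all. This is a minimal, hypothetical sketch: the Record class and parse()
method are invented for illustration and are not part of BioJava's API. It
uses the LOCUS name as the accession until an ACCESSION line, if present,
overwrites it, and it assumes (as in GenBank flat files) that ACCESSION
comes after LOCUS.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, BioJava-free sketch of "default accession = locus name". */
public class AccessionDefaultSketch {

    /** Stand-in for the values a parser would hand to its listener. */
    static final class Record {
        String name;
        String accession;
    }

    /** Reads the LOCUS and ACCESSION lines of a single flat-file record. */
    static Record parse(List<String> lines) {
        Record r = new Record();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            if (line.startsWith("LOCUS") && tokens.length > 1) {
                r.name = tokens[1];      // first token after the keyword
                r.accession = r.name;    // default the accession to the locus name
            } else if (line.startsWith("ACCESSION") && tokens.length > 1) {
                r.accession = tokens[1]; // a real ACCESSION value overwrites the default
            }
            // An ACCESSION line with no value leaves the default in place.
        }
        return r;
    }

    public static void main(String[] args) {
        Record inHouse = parse(Arrays.asList(
            "LOCUS       pMyVector    4361 bp    DNA     circular SYN 01-JUL-2006"));
        Record publicRec = parse(Arrays.asList(
            "LOCUS       SCU49845     5028 bp    DNA             PLN 21-JUN-1999",
            "ACCESSION   U49845"));
        System.out.println(inHouse.name + " -> " + inHouse.accession);
        System.out.println(publicRec.name + " -> " + publicRec.accession);
    }
}
```

Running main prints "pMyVector -> pMyVector" for the in-house record and
"SCU49845 -> U49845" for the record that has an ACCESSION line. The sample
LOCUS lines are illustrative only.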

  The list's registration page discourages attachments, so how should I
provide these files? Thanks in advance,
Bubba

P.S. The patch is small, so I can inline it here:

Index: src/org/biojavax/bio/seq/io/GenbankFormat.java
===================================================================
RCS file: /home/repository/biojava/biojava-live/src/org/biojavax/bio/seq/io/GenbankFormat.java,v
retrieving revision 1.63
diff -u -r1.63 GenbankFormat.java
--- src/org/biojavax/bio/seq/io/GenbankFormat.java    28 Jun 2006 17:02:47 -0000    1.63
+++ src/org/biojavax/bio/seq/io/GenbankFormat.java    1 Jul 2006 20:34:48 -0000
@@ -274,6 +274,9 @@
                 Matcher m = lp.matcher(loc);
                 if (m.matches()) {
                     rlistener.setName(m.group(1));
+                    // default accession to locus name for sources that do not have accessions proper.
+                    accession = m.group(1);
+                    rlistener.setAccession(accession);
                     rlistener.setDivision(m.group(5));
                     rlistener.addSequenceProperty(Terms.getMolTypeTerm(),m.group(3));
                     rlistener.addSequenceProperty(Terms.getDateUpdatedTerm(),m.group(6));
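
The patch reuses capture group 1 of the existing LOCUS pattern lp as the
default accession. The pattern below is a hypothetical stand-in, not
BioJava's actual lp: its groups are numbered to line up with the ones the
patch uses (1 = name, 3 = molecule type, 5 = division, 6 = date), and it
assumes loc holds the LOCUS line with the "LOCUS" keyword already stripped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LocusGroupsSketch {

    // Hypothetical LOCUS pattern (NOT BioJava's "lp"). Groups:
    // 1=name, 2=length, 3=molecule type, 4=topology (optional),
    // 5=division, 6=date.
    static final Pattern LOCUS = Pattern.compile(
        "^(\\S+)\\s+(\\d+)\\s+(?:bp|aa)\\s+(\\S+)\\s+(?:(linear|circular)\\s+)?(\\S+)\\s+(\\S+)$");

    public static void main(String[] args) {
        // Assumed input: LOCUS line text after the keyword.
        String loc = "pMyVector    4361 bp    DNA     circular SYN 01-JUL-2006";
        Matcher m = LOCUS.matcher(loc);
        if (m.matches()) {
            String name = m.group(1);
            String accession = name; // mirrors the patch: accession defaults to the locus name
            System.out.println("name=" + name + " accession=" + accession
                + " molType=" + m.group(3) + " division=" + m.group(5)
                + " date=" + m.group(6));
        }
    }
}
```

For the sample line this prints
"name=pMyVector accession=pMyVector molType=DNA division=SYN date=01-JUL-2006".
Because the default is just a second read of group 1, a record without an
ACCESSION line ends up with identical name and accession.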
_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev





