[Biojava-dev] Accession defaults for GenbankFormat
Richard Holland
richard.holland at ebi.ac.uk
Tue Jul 4 08:05:52 UTC 2006
That seems like a good idea to me. I've made the change in CVS.
cheers,
Richard
On Mon, 2006-07-03 at 11:40 -0400, Bubba Puryear wrote:
> Hey all,
>
> I'm using biojava for an internal app for my client that has about 5000
> internally developed genbank records. The majority of these records do not
> have ACCESSION fields, since they didn't come from a public data source.
> (Many of these were created using Invitrogen's Vector NTI and saved as
> files)
>
> Because there is no accession number for these records, I get problems
> when I try to use RichSequence and friends with this data. I've made a patch
> for GenbankFormat.java that sets the accession to the locus name of the
> record during parsing. If and when the accession field is parsed, this value is
> overwritten, so I think it should be OK generally. I also have a test case
> and test data file.
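
The defaulting behaviour described above can be sketched in isolation. This is a minimal, standalone illustration and not the BioJava code itself: the class name AccessionDefaultSketch, the method resolveAccession, and the sample records are all hypothetical, and the whitespace tokenising is a simplification of what GenbankFormat's regular expressions do.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: default the accession to the LOCUS name, and let a
// later ACCESSION line overwrite that default. Not the real GenbankFormat.
class AccessionDefaultSketch {

    /** Returns the accession for one record, falling back to the LOCUS name. */
    static String resolveAccession(List<String> recordLines) {
        String accession = null;
        for (String line : recordLines) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length < 2) {
                continue; // blank line, or a keyword with no value
            }
            if (tokens[0].equals("LOCUS")) {
                // Default: use the locus name until a real accession is seen.
                accession = tokens[1];
            } else if (tokens[0].equals("ACCESSION")) {
                // A real ACCESSION line overwrites the default.
                accession = tokens[1];
            }
        }
        return accession;
    }

    public static void main(String[] args) {
        // Made-up records for illustration only.
        List<String> withoutAccession = Arrays.asList(
            "LOCUS       pMyVector   5000 bp   DNA   circular   01-JUL-2006",
            "DEFINITION  Internal construct without an ACCESSION line.");
        List<String> withAccession = Arrays.asList(
            "LOCUS       DemoLocus   1200 bp   DNA   linear   01-JUL-2006",
            "ACCESSION   DEMO0001");
        System.out.println(resolveAccession(withoutAccession)); // pMyVector
        System.out.println(resolveAccession(withAccession));    // DEMO0001
    }
}
```

Because the LOCUS line comes first in a record, the fallback is always in place before any ACCESSION line is reached, so records that do carry an accession are unaffected.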
>
> The list's registration page discouraged attachments -- how
> should I provide these files? Thanks in advance,
> Bubba
>
> ps - The patch is small, I can inline it here:
>
> Index: src/org/biojavax/bio/seq/io/GenbankFormat.java
> ===================================================================
> RCS file: /home/repository/biojava/biojava-live/src/org/biojavax/bio/seq/io/GenbankFormat.java,v
> retrieving revision 1.63
> diff -u -r1.63 GenbankFormat.java
> --- src/org/biojavax/bio/seq/io/GenbankFormat.java 28 Jun 2006 17:02:47 -0000 1.63
> +++ src/org/biojavax/bio/seq/io/GenbankFormat.java 1 Jul 2006 20:34:48 -0000
> @@ -274,6 +274,9 @@
>      Matcher m = lp.matcher(loc);
>      if (m.matches()) {
>          rlistener.setName(m.group(1));
> +        // default accession to locus name for sources that do not have accessions proper.
> +        accession = m.group(1);
> +        rlistener.setAccession(accession);
>          rlistener.setDivision(m.group(5));
>          rlistener.addSequenceProperty(Terms.getMolTypeTerm(), m.group(3));
>          rlistener.addSequenceProperty(Terms.getDateUpdatedTerm(), m.group(6));
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416