[Biojava-dev] Accession defaults for GenbankFormat

Richard Holland richard.holland at ebi.ac.uk
Tue Jul 4 08:05:52 UTC 2006


That seems like a good idea to me. I've made the change in CVS.
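
For anyone skimming the archive, the fallback described in the quoted message below can be sketched roughly as follows. This is only an illustration and not the code in CVS: the class name AccessionDefaultSketch, the method parseAccession and both simplified regular expressions are made up for the example, and it does not use the BioJava API at all. It just shows the ordering idea: the LOCUS name is taken as a provisional accession, and a later ACCESSION line replaces it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; not part of BioJava.
public class AccessionDefaultSketch {
    // Simplified patterns (assumption): the real LOCUS pattern captures more fields.
    private static final Pattern LOCUS = Pattern.compile("^LOCUS\\s+(\\S+).*$");
    private static final Pattern ACCESSION = Pattern.compile("^ACCESSION\\s+(\\S+).*$");

    /** Returns the accession for one record, or null if neither line is present. */
    public static String parseAccession(String record) {
        String accession = null;
        for (String line : record.split("\\r?\\n")) {
            Matcher locus = LOCUS.matcher(line);
            if (locus.matches()) {
                // Default the accession to the locus name.
                accession = locus.group(1);
                continue;
            }
            Matcher acc = ACCESSION.matcher(line);
            if (acc.matches()) {
                // A real ACCESSION field overwrites the default.
                accession = acc.group(1);
            }
        }
        return accession;
    }

    public static void main(String[] args) {
        String withoutAccession =
            "LOCUS       pVEC1   5000 bp    DNA   circular\n"
            + "DEFINITION  in-house vector\n";
        String withAccession =
            "LOCUS       AB000001   1200 bp   DNA   linear\n"
            + "ACCESSION   AB000001X\n";
        System.out.println(parseAccession(withoutAccession));
        System.out.println(parseAccession(withAccession));
    }
}
```

With the two made-up records in main, the first falls back to the locus name and the second uses the value from its ACCESSION line.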

cheers,
Richard

On Mon, 2006-07-03 at 11:40 -0400, Bubba Puryear wrote:
> Hey all,
> 
>    I'm using BioJava for an internal app for my client that has about 5000
> internally developed GenBank records. The majority of these records do not
> have ACCESSION fields, since they didn't come from a public data source.
> (Many of these were created using Invitrogen's Vector NTI and saved as
> files.)
> 
>   Because there is no accession number for these records, I get problems
> when I try to use RichSequence and friends with this data. I've made a patch
> for GenbankFormat.java that sets the accession to the locus name of the
> record during parsing. If/when the ACCESSION field is parsed, this value is
> overwritten, so I think it should be OK generally. I also have a test case
> and test data file.
> 
>   The list registration page discouraged attachments -- how should I
> provide these files? Thanks in advance,
> Bubba
> 
> ps - The patch is small, I can inline it here:
> 
> Index: src/org/biojavax/bio/seq/io/GenbankFormat.java
> ===================================================================
> RCS file: /home/repository/biojava/biojava-live/src/org/biojavax/bio/seq/io/GenbankFormat.java,v
> retrieving revision 1.63
> diff -u -r1.63 GenbankFormat.java
> --- src/org/biojavax/bio/seq/io/GenbankFormat.java    28 Jun 2006 17:02:47 -0000    1.63
> +++ src/org/biojavax/bio/seq/io/GenbankFormat.java    1 Jul 2006 20:34:48 -0000
> @@ -274,6 +274,9 @@
>                  Matcher m = lp.matcher(loc);
>                  if (m.matches()) {
>                      rlistener.setName(m.group(1));
> +                    // default accession to locus name for sources that do not have accessions proper.
> +                    accession = m.group(1);
> +                    rlistener.setAccession(accession);
>                      rlistener.setDivision(m.group(5));
>                      rlistener.addSequenceProperty(Terms.getMolTypeTerm(), m.group(3));
>                      rlistener.addSequenceProperty(Terms.getDateUpdatedTerm(),m.group(6));
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416



