[Biojava-l] BiojavaX EmblFormat

mark.schreiber at novartis.com
Thu Mar 9 01:02:09 UTC 2006


The BioJavaX parser uses regular expressions to parse these lines. I will 
need to check what needs changing in these regexes to allow parsing of 
these files.

Thanks for your testing!

- Mark





"Jolyon Holdstock" <jolyon.holdstock at ogt.co.uk>
Sent by: biojava-l-bounces at portal.open-bio.org
03/08/2006 06:47 PM

 
        To:     <biojava-l at biojava.org>
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] BiojavaX EmblFormat


Hi,

 

I am using the new format parsers in BioJavaX. GenbankFormat is great,
but I am having some trouble with the EMBLFormat class. I have
downloaded a sequence file (ID:U00096) from the EBI in EMBL format but I
don't believe it is parsing properly.

 

My code is as follows:

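import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.biojava.bio.BioException;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

public class ReadEmbl {
  public static void main(String[] args) {
    String fileName = "path to file";
    try {
      // No explicit namespace is supplied (second argument is null).
      RichSequenceIterator rsi = RichSequence.IOTools.readEMBLDNA(
          new BufferedReader(new FileReader(fileName)), null);
      // Print a few properties of every record in the file.
      while (rsi.hasNext()) {
        RichSequence seq = rsi.nextRichSequence();
        System.out.println(seq.getURN());
        System.out.println(seq.length());
        System.out.println(seq.getAccession());
      }
    }
    catch (IOException ioe) {
      System.out.println("BioJava IOException " + ioe);
    }
    catch (BioException bioe) {
      System.out.println("BioJavaX BioException " + bioe);
      bioe.printStackTrace();
    }
  }
}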

 

The old BioJava parser will read it:

seq = SeqIOTools.readEmbl(new BufferedReader(new FileReader(fileName))).nextSequence(); //works

 

 

I checked the web CVS and the EMBLFormat class is 3 months old so I am
using the most recent version.

I have pasted a snippet of the sequence file that retains the problems
below.

 

The errors are:

 

The ID line isn't parsed when 'genomic' is present; deleting it removes
the problem.

 

org.biojava.bio.BioException: Could not read sequence

Caused by: org.biojava.bio.seq.io.ParseException: 

    Bad ID line found: U00096     standard; circular genomic DNA; PRO; 4639675 BP.

ID   U00096     standard; circular genomic DNA; PRO; 4639675 BP. //fails

ID   U00096     standard; circular DNA; PRO; 4639675 BP. //works
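
As a rough illustration of what a more tolerant ID-line pattern could look like, here is a minimal sketch. It is not the regex that EMBLFormat actually uses; the class name, method name and group layout are made up for this example. It assumes the line has already had its "ID" tag stripped (as in the error message above) and it accepts any molecule type text, such as "genomic DNA", between the optional "circular" keyword and the next semicolon.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmblIdLineSketch {
    // Hypothetical pattern, NOT the one in org.biojavax.bio.seq.io.EMBLFormat.
    // Fields: name, data class, optional "circular", molecule type, division, length.
    private static final Pattern ID_LINE = Pattern.compile(
        "^(\\S+)\\s+(\\S+);\\s+(circular\\s+)?([^;]+);\\s+(\\S+);\\s+(\\d+)\\s+BP\\.$");

    /**
     * Returns {name, dataClass, topology, moleculeType, division, length},
     * or null when the line does not match.
     */
    public static String[] parse(String idLine) {
        Matcher m = ID_LINE.matcher(idLine.trim());
        if (!m.matches()) {
            return null;
        }
        // Records without the "circular" keyword are treated as linear here.
        String topology = (m.group(3) != null) ? "circular" : "linear";
        return new String[] {
            m.group(1), m.group(2), topology, m.group(4).trim(), m.group(5), m.group(6)
        };
    }

    public static void main(String[] args) {
        String[] fields = parse("U00096     standard; circular genomic DNA; PRO; 4639675 BP.");
        System.out.println(String.join(" | ", fields));
    }
}
```

With this pattern both the "circular genomic DNA" and the "circular DNA" variants of the line are accepted, and the molecule type comes back as a single field.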

 

 

There is a problem with the RX tag, which fails with this output:

 

org.biojava.bio.BioException: Could not read sequence

Caused by: java.lang.ArrayIndexOutOfBoundsException: 1

      at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:352)

 
Replacing 

RX   DOI; 10.1126/science.277.5331.1453.

with 

XX   RX   DOI; 10.1126/science.277.5331.1453.

removes the error.
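
The actual cause at EMBLFormat.java:352 is not established here, but an index of 1 being out of bounds is what one would see if a line were split into fewer pieces than expected. The following is only a sketch of a defensive way to split an RX line body ("DATABASE; identifier.") without indexing into a split result; the class and method names are hypothetical and it does not reproduce BioJavaX's own code.

```java
public class EmblRxLineSketch {
    /**
     * Splits an RX line body such as "DOI; 10.1126/science.277.5331.1453."
     * into {database, identifier}. Returns null instead of throwing when
     * the line has no ';' separator or one of the parts is empty.
     */
    public static String[] parse(String rxLine) {
        String s = rxLine.trim();
        int sep = s.indexOf(';');
        if (sep < 0) {
            return null; // no separator: nothing sensible to extract
        }
        String database = s.substring(0, sep).trim();
        String identifier = s.substring(sep + 1).trim();
        // Drop only the final terminating '.', since DOIs contain dots themselves.
        if (identifier.endsWith(".")) {
            identifier = identifier.substring(0, identifier.length() - 1);
        }
        if (database.isEmpty() || identifier.isEmpty()) {
            return null;
        }
        return new String[] { database, identifier };
    }

    public static void main(String[] args) {
        String[] doi = parse("DOI; 10.1126/science.277.5331.1453.");
        System.out.println(doi[0] + " -> " + doi[1]);
        String[] pubmed = parse("PUBMED; 9278503.");
        System.out.println(pubmed[0] + " -> " + pubmed[1]);
    }
}
```

Splitting on the first semicolon only, and stripping just the trailing full stop, keeps the dots inside the DOI intact.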

 

 

There is an error with parsing the authors:

 

org.biojava.bio.BioException: Could not read sequence

Caused by: java.lang.IllegalArgumentException: Authors string cannot be null

      at org.biojavax.DocRefAuthor$Tools.parseAuthorString(DocRefAuthor.java:75)

      at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:395)

      at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:100)
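
Why the authors string ends up null is not determined here. Purely as an illustration of the kind of guard that would avoid this exception, the sketch below joins the bodies of consecutive RA lines (tag already removed), strips the terminating semicolon and returns an empty list rather than failing when there is nothing to parse. The class and method names are invented for this example and are not part of BioJavaX.

```java
import java.util.ArrayList;
import java.util.List;

public class EmblAuthorsSketch {
    /**
     * Takes the bodies of consecutive RA lines (without the "RA" tag) and
     * returns the individual author names. A null or empty input yields an
     * empty list instead of an exception.
     */
    public static List<String> parse(List<String> raLineBodies) {
        List<String> authors = new ArrayList<String>();
        if (raLineBodies == null) {
            return authors;
        }
        // Join continuation lines into one comma-separated string.
        StringBuilder joined = new StringBuilder();
        for (String body : raLineBodies) {
            if (body != null) {
                joined.append(body.trim()).append(' ');
            }
        }
        String all = joined.toString().trim();
        // The last RA line of a reference ends with ';'.
        if (all.endsWith(";")) {
            all = all.substring(0, all.length() - 1);
        }
        for (String name : all.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                authors.add(trimmed);
            }
        }
        return authors;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<String>();
        lines.add("Blattner F.R., Plunkett G., Bloch C.A.,");
        lines.add("Riley M.;");
        System.out.println(parse(lines));
    }
}
```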

 

 

I am looking at the code trying to see where the problems are but
suspect that it may be beyond me.

So if anybody has some experience of this I would welcome their input.

 

Thanks,

 

Jolyon

 

 

 

 

This is a snippet of the sequence file that reproduces the errors for me.

 

ID   U00096     standard; circular genomic DNA; PRO; 4639675 BP.

XX

AC   U00096; AE000111-AE000510;

XX

SV   U00096.2

XX

DT   23-FEB-2006 (Rel. 86, Created)

DT   06-MAR-2006 (Rel. 87, Last updated, Version 3)

XX

DE   Escherichia coli K-12 MG1655, complete genome.

XX

KW   .

XX

OS   Escherichia coli K12

OC   Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;

OC   Enterobacteriaceae; Escherichia.

XX

RN   [1]

RP   1-4639675

RX   DOI; 10.1126/science.277.5331.1453.

RX   PUBMED; 9278503.

RA   Blattner F.R., Plunkett G., Bloch C.A., Perna N.T., Burland V., Riley M.,

RA   Collado-Vides J., Glasner J.D., Rode C.K., Mayhew G.F., Gregor J.,

RA   Davis N.W., Kirkpatrick H.A., Goeden M.A., Rose D.J., Mau B., Shao Y.;

RT   "The complete genome sequence of Escherichia coli K-12";

RL   Science 277(5331):1453-1474(1997).

XX

RN   [2]

RP   1-4639675

RX   DOI; 10.1093/nar/gkj150.

RX   PUBMED; 16397293.

RA   Riley M., Abe T., Arnaud M.B., Berlyn M.K., Blattner F.R., Chaudhuri R.R.,

RA   Glasner J.D., Horiuchi T., Keseler I.M., Kosuge T., Mori H., Perna N.T.,

RA   Plunkett G. III, Rudd K.E., Serres M.H., Thomas G.H., Thomson N.R.,

RA   Wishart D., Wanner B.L.;

RT   "Escherichia coli K-12: a cooperatively developed annotation

RT   snapshot--2005";

RL   (er) Nucleic Acids Res. 34 (1), 1-9 (2006)

XX

RN   [3]

RC   Woods Hole, Mass., on 14-18 November 2003 (sequence corrections)

RP   1-4639675

RA   Arnaud M., Berlyn M.K.B., Blattner F.R., Galperin M.Y., Glasner J.D.,

RA   Horiuchi T., Kosuge T., Mori H., Perna N.T., Plunkett G. III, Riley M.,

RA   Rudd K.E., Serres M.H., Thomas G.H., Wanner B.L.;

RT   "Workshop on Annotation of Escherichia coli K-12";

RL   Unpublished.

XX

RN   [4]

RC   ASAP download 10 June 2004 (annotation updates)

RP   1-4639675

RA   Glasner J.D., Perna N.T., Plunkett G. III, Anderson B.D., Bockhorst J.,

RA   Hu J.C., Riley M., Rudd K.E., Serres M.H.;

RT   "ASAP: Escherichia coli K-12 strain MG1655 version m56";

RL   Unpublished.

XX

RN   [5]

RC   GenBank accessions AG613214 to AG613378 (sequence corrections)

RP   1-4639675

RA   Hayashi K., Morooka N., Mori H., Horiuchi T.;

RT   "A more accurate sequence comparison between genomes of Escherichia coli

RT   K12 W3110 and MG1655 strains";

RL   Unpublished.

XX

RN   [6]

RC   GenBank accession AY605712 (sequence corrections)

RP   1-4639675

RA   Perna N.T.;

RT   "Escherichia coli K-12 MG1655 yqiK-rfaE intergenic region, genomic sequence

RT   correction";

RL   Unpublished.

XX

RN   [7]

RP   1-4639675

RA   Rudd K.E.;

RT   "A manual approach to accurate translation start site annotation: an E.

RT   coli K-12 case study";

RL   Unpublished.

XX

RN   [8]

RP   1-4639675

RA   Blattner F.R., Plunkett G. III.;

RT   ;

RL   Submitted (16-JAN-1997) to the EMBL/GenBank/DDBJ databases.

RL   Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,

RL   WI 53706-1580, USA

XX

RN   [9]

RP   1-4639675

RA   Blattner F.R., Plunkett G. III.;

RT   ;

RL   Submitted (02-SEP-1997) to the EMBL/GenBank/DDBJ databases.

RL   Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,

RL   WI 53706-1580, USA

XX

RN   [10]

RP   1-4639675

RA   Plunkett G. III.;

RT   ;

RL   Submitted (13-OCT-1998) to the EMBL/GenBank/DDBJ databases.

RL   Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,

RL   WI 53706-1580, USA

XX

RN   [11]

RC   Sequence update by submitter

RP   1-4639675

RA   Plunkett G. III.;

RT   ;

RL   Submitted (10-JUN-2004) to the EMBL/GenBank/DDBJ databases.

RL   Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,

RL   WI 53706-1580, USA

XX

RN   [12]

RC   Protein updates by submitter

RP   1-4639675

RA   Plunkett G. III.;

RT   ;

RL   Submitted (07-FEB-2006) to the EMBL/GenBank/DDBJ databases.

RL   Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,

RL   WI 53706-1580, USA

XX

DR   EMBL-TPA; BR000242.

XX

FH   Key             Location/Qualifiers

FH

FT   source          1..4639675

FT                   /organism="Escherichia coli K12"

FT                   /strain="K-12"

FT                   /sub_strain="MG1655"

FT                   /mol_type="genomic DNA"

FT                   /db_xref="taxon:83333"

FT   gene            190..255

FT                   /gene="thrL"

FT                   /locus_tag="b0001"

FT                   /note="synonyms: ECK0001, JW4367"

FT   CDS             190..255

FT                   /codon_start=1

FT                   /transl_table=11

FT                   /gene="thrL"

FT                   /locus_tag="b0001"

FT                   /product="thr operon leader peptide"

FT                   /function="1.5.1.8 metabolism; building block biosynthesis;

FT                   amino acids; threonine"

FT                   /function="leader; Amino acid biosynthesis: Threonine"

FT                   /note="go_process: threonine biosynthesis [goid 0009088]"

FT                   /protein_id="AAC73112.1"

FT                   /translation="MKRISTTITTTITITTGNGAG"

FT   gene            337..2799

FT                   /gene="thrA"

FT                   /locus_tag="b0002"

FT                   /note="synonyms: Hs, thrD, ECK0002, JW0001"

FT   CDS             337..2799

FT                   /codon_start=1

FT                   /transl_table=11

FT                   /gene="thrA"

FT                   /locus_tag="b0002"

FT                   /product="fused aspartokinase I and homoserine

FT                   dehydrogenase I"

FT                   /function="1.5.1.21 metabolism; building block

FT                   biosynthesis; amino acids; homoserine"

FT                   /function="1.5.1.8 metabolism; building block biosynthesis;

FT                   amino acids; threonine"

FT                   /function="7.1 location of gene products; cytoplasm"

FT                   /function="enzyme; Amino acid biosynthesis: Threonine"

FT                   /EC_number="1.1.1.3"

FT                   /EC_number="2.7.2.4"

FT                   /note="bifunctional: aspartokinase I (N-terminal);

FT                   homoserine dehydrogenase I (C-terminal); go_component:

FT                   cytoplasm [goid 0005737]; go_process: threonine

FT                   biosynthesis [goid 0009088]; go_process: homoserine

FT                   biosynthesis [goid 0009090]"

FT                   /protein_id="AAC73113.1"

FT                   /translation="MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITN

FT                   HLVAMIEKTISGQDALPNISDAERIFAELLTGLAAAQPGFPLAQLKTFVDQEFAQIKHV

FT                   LHGISLLGQCPDSINAALICRGEKMSIAIMAGVLEARGHNVTVIDPVEKLLAVGHYLES

FT                   TVDIAESTRRIAASRIPADHMVLMAGFTAGNEKGELVVLGRNGSDYSAAVLAACLRADC

FT                   CEIWTDVDGVYTCDPRQVPDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPCL

FT                   IKNTGNPQAPGTLIGASRDEDELPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMS

FT                   RARISVVLITQSSSEYSISFCVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAII

FT                   SVVGDGMRTLRGISAKFFAALARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQM

FT                   LFNTDQVIEVFVIGVGGVGGALLEQLKRQQSWLKNKHIDLRVCGVANSKALLTNVHGLN

FT                   LENWQEELAQAKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAVADQYADFLREGFHVVT

FT                   PNKKANTSSMDYYHQLRYAAEKSRRKFLYDTNVGAGLPVIENLQNLLNAGDELMKFSGI

FT                   LSGSLSYIFGKLDEGMSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGREL

FT                   ELADIEIEPVLPAEFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNIDEDG

FT                   VCRVKIAEVDGNDPLFKVKNGENALAFYSHYYQPLPLVLRGYGAGNDVTAAGVFADLLR

FT                   TLSWKLGV"

FT   gene            2801..3733

FT                   /gene="thrB"

FT                   /locus_tag="b0003"

FT                   /note="synonyms: ECK0003, JW0002"

FT   CDS             2801..3733

FT                   /codon_start=1

FT                   /transl_table=11

FT                   /gene="thrB"

FT                   /locus_tag="b0003"

FT                   /product="homoserine kinase"

FT                   /function="1.5.1.8 metabolism; building block biosynthesis;

FT                   amino acids; threonine"

FT                   /function="7.1 location of gene products; cytoplasm"

FT                   /function="enzyme; Amino acid biosynthesis: Threonine"

FT                   /EC_number="2.7.1.39"

FT                   /note="go_component: cytoplasm [goid 0005737]; go_process:

FT                   threonine biosynthesis [goid 0009088]"

FT                   /protein_id="AAC73114.1"

FT                   /translation="MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFS

FT                   LNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACSVV

FT                   AALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDIISQ

FT                   QVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCIAHGRHLAGFIHACYSRQPELA

FT                   AKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVAD

FT                   WLGKNYLQNQEGFVHICRLDTAGARVLEN"

FT   gene            3734..5020

FT                   /gene="thrC"

FT                   /locus_tag="b0004"

FT                   /note="synonyms: ECK0004, JW0003"

FT   CDS             3734..5020

FT                   /codon_start=1

FT                   /transl_table=11

FT                   /gene="thrC"

FT                   /locus_tag="b0004"

FT                   /product="threonine synthase"

FT                   /function="1.5.1.8 metabolism; building block biosynthesis;

FT                   amino acids; threonine"

FT                   /function="7.1 location of gene products; cytoplasm"

FT                   /function="enzyme; Amino acid biosynthesis: Threonine"

FT                   /EC_number="4.2.3.1"

FT                   /note="go_component: cytoplasm [goid 0005737]; go_process:

FT                   threonine biosynthesis [goid 0009088]"

FT                   /protein_id="AAC73115.1"

FT                   /translation="MKLYNLKDHNEQVSFAQAVTQGLGKNQGLFFPHDLPEFSLTEIDE

FT                   MLKLDFVTRSAKILSAFIGDEIPQEILEERVRAAFAFPAPVANVESDVGCLELFHGPTL

FT                   AFKDFGGRFMAQMLTHIAGDKPVTILTATSGDTGAAVAHAFYGLPNVKVVILYPRGKIS

FT                   PLQEKLFCTLGGNIETVAIDGDFDACQALVKQAFDDEELKVALGLNSANSINISRLLAQ

FT                   ICYYFEAVAQLPQETRNQLVVSVPSGNFGDLTAGLLAKSLGLPVKRFIAATNVNDTVPR

FT                   FLHDGQWSPKATQATLSNAMDVSQPNNWPRVEELFRRKIWQLKELGYAAVDDETTQQTM

FT                   RELKELGYTSEPHAAVAYRALRDQLNPGEYGLFLGTAHPAKFKESVEAILGETLDLPKE

FT                   LAERADLPLLSHNLPADFAALRKLMMNHQ"

XX

SQ   Sequence 4639675 BP; 1142228 A; 1179554 C; 1176923 G; 1140970 T; 0 other;

     agcttttcat tctgactgca acgggcaata tgtctctgtg tggattaaaa aaagagtgtc        60

     tgatagcagc ttctgaactg gttacctgcc gtgagtaaat taaaatttta ttgacttagg       120

     tcactaaata ctttaaccaa tataggcata gcgcacagac agataaaaat tacagagtac       180

     acaacatcca tgaaacgcat tagcaccacc attaccacca ccatcaccat taccacaggt       240

     aacggtgcgg gctgacgcgt acaggaaaca cagaaaaaag cccgcacctg acagtgcggg       300

     cttttttttt cgaccaaagg taacgaggta acaaccatgc gagtgttgaa gttcggcggt       360

     acatcagtgg caaatgcaga acgttttctg cgtgttgccg atattctgga aagcaatgcc       420

     aggcaggggc aggtggccac cgtcctctct gcccccgcca aaatcaccaa ccacctggtg       480

     gcgatgattg aaaaaaccat tagcggccag gatgctttac ccaatatcag cgatgccgaa       540

     cgtatttttg ccgaactttt gacgggactc gccgccgccc agccggggtt cccgctggcg       600

     caattgaaaa ctttcgtcga tcaggaattt gcccaaataa aacatgtcct gcatggcatt       660

     agtttgttgg ggcagtgccc ggatagcatc aacgctgcgc tgatttgccg tggcgagaaa       720

     atgtcgatcg ccattatggc cggcgtatta gaagcgcgcg gtcacaacgt tactgttatc       780

     gatccggtcg aaaaactgct ggcagtgggg cattacctcg aatctaccgt cgatattgct       840

     gagtccaccc gccgtattgc ggcaagccgc attccggctg atcacatggt gctgatggca       900

     ggtttcaccg ccggtaatga aaaaggcgaa ctggtggtgc ttggacgcaa cggttccgac       960

     tactctgctg cggtgctggc tgcctgttta cgcgccgatt gttgcgagat ttggacggac      1020

     gttgacgggg tctatacctg cgacccgcgt caggtgcccg atgcgaggtt gttgaagtcg      1080

     atgtcctacc aggaagcgat ggagctttcc tacttcggcg ctaaagttct tcacccccgc      1140

     accattaccc ccatcgccca gttccagatc ccttgcctga ttaaaaatac cggaaatcct      1200

     caagcaccag gtacgctcat tggtgccagc cgtgatgaag acgaattacc ggtcaagggc      1260

     atttccaatc tgaataacat ggcaatgttc agcgtttctg gtccggggat gaaagggatg      1320

     gtcggcatgg cggcgcgcgt ctttgcagcg atgtcacgcg cccgtatttc cgtggtgctg      1380

     attacgcaat catcttccga atacagcatc agtttctgcg ttccacaaag cgactgtgtg      1440

     cgagctgaac gggcaatgca ggaagagttc tacctggaac tgaaagaagg cttactggag      1500

     ccgctggcag tgacggaacg gctggccatt atctcggtgg taggtgatgg tatgcgcacc      1560

     ttgcgtggga tctcggcgaa attctttgcc gcactggccc gcgccaatat caacattgtc      1620

     gccattgctc agggatcttc tgaacgctca atctctgtcg tggtaaataa cgatgatgcg      1680

     accactggcg tgcgcgttac tcatcagatg ctgttcaata ccgatcaggt tatcgaagtg      1740

     tttgtgattg gcgtcggtgg cgttggcggt gcgctgctgg agcaactgaa gcgtcagcaa      1800

     agctggctga agaataaaca tatcgactta cgtgtctgcg gtgttgccaa ctcgaaggct      1860

     ctgctcacca atgtacatgg ccttaatctg gaaaactggc aggaagaact ggcgcaagcc      1920

     aaagagccgt ttaatctcgg gcgcttaatt cgcctcgtga aagaatatca tctgctgaac      1980

     ccggtcattg ttgactgcac ttccagccag gcagtggcgg atcaatatgc cgacttcctg      2040

     cgcgaaggtt tccacgttgt cacgccgaac aaaaaggcca acacctcgtc gatggattac      2100

     taccatcagt tgcgttatgc ggcggaaaaa tcgcggcgta aattcctcta tgacaccaac      2160

     gttggggctg gattaccggt tattgagaac ctgcaaaatc tgctcaatgc aggtgatgaa      2220

     ttgatgaagt tctccggcat tctttctggt tcgctttctt atatcttcgg caagttagac      2280

     gaaggcatga gtttctccga ggcgaccacg ctggcgcggg aaatgggtta taccgaaccg      2340

     gacccgcgag atgatctttc tggtatggat gtggcgcgta aactattgat tctcgctcgt      2400

     gaaacgggac gtgaactgga gctggcggat attgaaattg aacctgtgct gcccgcagag      2460

     tttaacgccg agggtgatgt tgccgctttt atggcgaatc tgtcacaact cgacgatctc      2520

     tttgccgcgc gcgtggcgaa ggcccgtgat gaaggaaaag ttttgcgcta tgttggcaat      2580

     attgatgaag atggcgtctg ccgcgtgaag attgccgaag tggatggtaa tgatccgctg      2640

     ttcaaagtga aaaatggcga aaacgccctg gccttctata gccactatta tcagccgctg      2700

     ccgttggtac tgcgcggata tggtgcgggc aatgacgtta cagctgccgg tgtctttgct      2760

     gatctgctac gtaccctctc atggaagtta ggagtctgac atggttaaag tttatgcccc      2820

     ggcttccagt gccaatatga gcgtcgggtt tgatgtgctc ggggcggcgg tgacacctgt      2880

     tgatggtgca ttgctcggag atgtagtcac ggttgaggcg gcagagacat tcagtctcaa      2940

     caacctcgga cgctttgccg ataagctgcc gtcagaacca cgggaaaata tcgtttatca      3000

     gtgctgggag cgtttttgcc aggaactggg taagcaaatt ccagtggcga tgaccctgga      3060

     aaagaatatg ccgatcggtt cgggcttagg ctccagtgcc tgttcggtgg tcgcggcgct      3120

     gatggcgatg aatgaacact gcggcaagcc gcttaatgac actcgtttgc tggctttgat      3180

     gggcgagctg gaaggccgta tctccggcag cattcattac gacaacgtgg caccgtgttt      3240

     tctcggtggt atgcagttga tgatcgaaga aaacgacatc atcagccagc aagtgccagg      3300

     gtttgatgag tggctgtggg tgctggcgta tccggggatt aaagtctcga cggcagaagc      3360

     cagggctatt ttaccggcgc agtatcgccg ccaggattgc attgcgcacg ggcgacatct      3420

     ggcaggcttc attcacgcct gctattcccg tcagcctgag cttgccgcga agctgatgaa      3480

     agatgttatc gctgaaccct accgtgaacg gttactgcca ggcttccggc aggcgcggca      3540

     ggcggtcgcg gaaatcggcg cggtagcgag cggtatctcc ggctccggcc cgaccttgtt      3600

     cgctctgtgt gacaagccgg aaaccgccca gcgcgttgcc gactggttgg gtaagaacta      3660

     cctgcaaaat caggaaggtt ttgttcatat ttgccggctg gatacggcgg gcgcacgagt      3720

     actggaaaac taaatgaaac tctacaatct gaaagatcac aacgagcagg tcagctttgc      3780

     gcaagccgta acccaggggt tgggcaaaaa tcaggggctg ttttttccgc acgacctgcc      3840

     ggaattcagc ctgactgaaa ttgatgagat gctgaagctg gattttgtca cccgcagtgc      3900

     gaagatcctc tcggcgttta ttggtgatga aatcccacag gaaatcctgg aagagcgcgt      3960

     gcgcgcggcg tttgccttcc cggctccggt cgccaatgtt gaaagcgatg tcggttgtct      4020

     ggaattgttc cacgggccaa cgctggcatt taaagatttc ggcggtcgct ttatggcaca      4080

     aatgctgacc catattgcgg gtgataagcc agtgaccatt ctgaccgcga cctccggtga      4140

     taccggagcg gcagtggctc atgctttcta cggtttaccg aatgtgaaag tggttatcct      4200

     ctatccacga ggcaaaatca gtccactgca agaaaaactg ttctgtacat tgggcggcaa      4260

     tatcgaaact gttgccatcg acggcgattt cgatgcctgt caggcgctgg tgaagcaggc      4320

     gtttgatgat gaagaactga aagtggcgct agggttaaac tcggctaact cgattaacat      4380

     cagccgtttg ctggcgcaga tttgctacta ctttgaagct gttgcgcagc tgccgcagga      4440

     gacgcgcaac cagctggttg tctcggtgcc aagcggaaac ttcggcgatt tgacggcggg      4500

     tctgctggcg aagtcactcg gtctgccggt gaaacgtttt attgctgcga ccaacgtgaa      4560

     cgataccgtg ccacgtttcc tgcacgacgg tcagtggtca cccaaagcga ctcaggcgac      4620

     gttatccaac gcgatggacg tgagtcagcc gaacaactgg ccgcgtgtgg aagagttgtt      4680

     ccgccgcaaa atctggcaac tgaaagagct gggttatgca gccgtggatg atgaaaccac      4740

     gcaacagaca atgcgtgagt taaaagaact gggctacact tcggagccgc acgctgccgt      4800

     agcttatcgt gcgctgcgtg atcagttgaa tccaggcgaa tatggcttgt tcctcggcac      4860

     cgcgcatccg gcgaaattta aagagagcgt ggaagcgatt ctcggtgaaa cgttggatct      4920

     gccaaaagag ctggcagaac gtgctgattt acccttgctt tcacataatc tgcccgccga      4980

     ttttgctgcg ttgcgtaaat tgatgatgaa tcatcagtaa aatctattca ttatctcaat      5040

     caggccgggt ttgcttttat gcagcccggc ttttttatga agaaattatg gagaaaaatg      5100

     acagggaaaa aggagaaatt ctcaataaat gcggtaactt agagattagg attgcggaga      5160

     ataacaaccg ccgttctcat cgagtaatct ccggatatcg acccataacg ggcaatgata      5220

     aaaggagtaa cctgtgaaaa agatgcaatc tatcgtactc gcactttccc tggttctggt      5280

     cgctcccatg gcagcacagg ctgcggaaat tacgttagtc ccgtcagtaa aattacagat      5340

     aggcgatcgt gataatcgtg gctattactg ggatggaggt cactggcgcg accacggctg      5400

     gtggaaacaa cattatgaat ggcgaggcaa tcgctggcac ctacacggac cgccgccacc      5460

     gccgcgccac cataagaaag ctcctcatga tcatcacggc ggtcatggtc caggcaaaca      5520

     tcaccgctaa atgacaaatg ccgggtaaca atccggcatt cagcgcctga tgcgacgctg      5580

     gcgcgtctta tcaggcctac gttaattctg caatatattg aatctgcatg cttttgtagg      5640

     caggataagg cgttcacgcc gcatccggca ttgactgcaa acttaacgct gctcgtagcg      5700

     tttaaacacc agttcgccat tgctggagga atcttcatca aagaagtaac cttcgctatt      5760

     aaaaccagtc agttgctctg gtttggtcag ccgattttca ataatgaaac gactcatcag      5820

     accgcgtgct ttcttagcgt agaagctgat gatcttaaat ttgccgttct tctcatcgag      5880

     gaacaccggc ttgataatct cggcattcaa tttcttcggc ttcaccgatt taaaatactc      5940

     atctgacgcc agattaatca ccacattatc gccttgtgct gcgagcgcct cgttcagctt      6000

     gttggtgatg atatctcccc agaattgata cagatctttc cctcgggcat tctcaagacg      6060

     gatccccatt tccagacgat aaggctgcat taaatcgagc gggcggagta cgccatacaa      6120

     gccggaaagc attcgcaaat gctgttgggc aaaatcgaaa tcgtcttcgc tgaaggtttc      6180

     ggcctgcaag ccggtgtaga catcaccttt aaacgccaga atcgcctggc gggcattcgc      6240

     cggcgtgaaa tctggctgcc agtcatgaaa gcgagcggcg ttgatacccg ccagtttgtc      6300

     gctgatgcgc atcagcgtgc taatctgcgg aggcgtcagt ttccgcgcct catggatcaa      6360

     ctgctgggaa ttgtctaaca gctccggcag cgtatagcgc gtggtggtca acgggctttg      6420

     gtaatcaagc gttttcgcag gtgaaataag aatcagcata tccagtcctt gcaggaaatt      6480

     tatgccgact ttagcaaaaa atgagaatga gttgatcgat agttgtgatt actcctgcga      6540

     aacatcatcc cacgcgtccg gagaaagctg gcgaccgata tccggataac gcaatggatc      6600

     aaacaccggg cgcacgccga gtttacgctg gcgtagataa tcactggcaa tggtatgaac      6660

     cacaggcgag agcagtaaaa tggcggtcaa attggtaata gccatgcagg ccattatgat      6720

     atctgccagt tgccacatca gcggaaggct tagcaaggtg ccgccgatga ccgttgcgaa      6780

     ggtgcagatc cgcaaacacc agatcgcttt agggttgttc aggcgtaaaa agaagagatt      6840

     gttttcggca taaatgtagt tggcaacgat ggagctgaag gcaaacagaa taaccacaag      6900

     ggtaacaaac tcagcacccc aggaacccat tagcacccgc atcgccttct ggataagctg      6960

     aataccttcc agcggcatgt aggttgtgcc gttacccgcc agtaatatca gcatggcgct      7020

     tgccgtacag atgaccaggg tgtcgataaa aatgccaatc atctggacaa tcccttgcgc      7080

     tgccggatgc ggaggccagg acgccgctgc cgctgccgcg tttggcgtcg aacccattcc      7140

     cgcctcattg gaaaacatac tgcgctgaaa accgttagta atcgcctggc ttaaggtata      7200

     tcccgccgcg ccgcctgccg cttcctgcca gccaaaagca ctctcaaaaa tagaccaaat      7260

     gacgtgggga agttgcccga tattcattac gcaaattacc aggctggtca gtacccagat      7320

     tatcgccatc aacgggacaa agccctgcat gagccgggcg acgccatgaa gaccgcgagt      7380

     gattgccagc agagtaaaga cagcgagaat aatgcctgtc accagcgggg gaaaatcaaa      7440

     agaaaaactc agggcgcggg caacggcgtt cgcttgaact ccgctgaaaa ttatgccata      7500

     ggcgatgagc aaaaagacgg cgaacagaac gcccatccag cgcatcccca gcccgcgcgc      7560

     catataccat gccggtccgc cacgaaactg cccattgacg tcacgttctt tataaagttg      7620

     tgccagagaa cattcggcaa acgaggtcgc catgccgata aacgcggcaa cccacatcca      7680

     aaagacggct ccaggtccac cggcggtaat agccagcgca acgccggcca ggttgccgct      7740

     acccacgcgc gccgcaagac tggtacacaa tgactgaaat gaggttaaac cgcctggctg      7800

     tggatgaatg ctatttttaa gacttttgcc aaactggcgg atgtagcgaa actgcacaaa      7860

     tccggtgcga aaagtgaacc aacaacctgc gccgaagagc aggtaaatca ttaccgatcc      7920

     ccaaaggacg ctgttaatga aggagaaaaa atctggcatg catatccctc ttattgccgg      7980

     tcgcgatgac tttcctgtgt aaacgttacc aattgtttaa gaagtatata cgctacgagg      8040

     tacttgataa cttctgcgta gcatacatga ggttttgtat aaaaatggcg ggcgatatca      8100

     acgcagtgtc agaaatccga aacagtctcg cctggcgata accgtcttgt cggcggttgc      8160

     gctgacgttg cgtcgtgata tcatcagggc agaccggtta catcccccta acaagctgtt      8220

     taaagagaaa tactatcatg acggacaaat tgacctccct tcgtcagtac accaccgtag      8280

     tggccgacac tggggacatc gcggcaatga agctgtatca accgcaggat gccacaacca      8340

     acccttctct cattcttaac gcagcgcaga ttccggaata ccgtaagttg attgatgatg      8400

     ctgtcgcctg ggcgaaacag cagagcaacg atcgcgcgca gcagatcgtg gacgcgaccg      8460

     acaaactggc agtaaatatt ggtctggaaa tcctgaaact ggttccgggc cgtatctcaa      8520

     ctgaagttga tgcgcgtctt tcctatgaca ccgaagcgtc aattgcgaaa gcaaaacgcc      8580

     tgatcaaact ctacaacgat gctggtatta gcaacgatcg tattctgatc aaactggctt      8640

     ctacctggca gggtatccgt gctgcagaac agctggaaaa agaaggcatc aactgtaacc      8700

     tgaccctgct gttctccttc gctcaggctc gtgcttgtgc ggaagcgggc gtgttcctga      8760

     tctcgccgtt tgttggccgt attcttgact ggtacaaagc gaataccgat aagaaagagt      8820

     acgctccggc agaagatccg ggcgtggttt ctgtatctga aatctaccag tactacaaag      8880

     agcacggtta tgaaaccgtg gttatgggcg caagcttccg taacatcggc gaaattctgg      8940

     aactggcagg ctgcgaccgt ctgaccatcg caccggcact gctgaaagag ctggcggaga      9000

//

 

Jolyon Holdstock Ph.D.

Senior Computational Biologist,

Oxford Gene Technology (Ops) Ltd.

Begbroke Business and Science Park

Sandy Lane, Yarnton

Oxford, OX5 1PF

 

Tel: 01865 309699

Fax: 01865 842116

 


 

 

