[Bioperl-l] Genbank files with CONTIG lines in them.
Fields, Christopher J
cjfields at illinois.edu
Mon Dec 16 04:28:22 UTC 2013
Govind,
Can you try this with the latest CPAN release (v 1.6.922)?
chris
On Dec 11, 2013, at 6:15 AM, Govind Chandra <govind.chandra at jic.ac.uk> wrote:
> Hi,
>
> Some Genbank files have a line beginning with "CONTIG" as shown below.
>
> .
> .
> .
> /protein_id="YP_008390690.1"
> /db_xref="GI:529229870"
> /db_xref="GeneID:16501453"
> /translation="MSAEATPNTGEVQRYVKGLGRAASFVAGLVVLAFAADCIPPWPF
> VTEDGSPAKLRRLGMLRCPACGLMSNREHRRLCRGPWRAGEDVST"
> CONTIG join(CP006261.1:1..19314)
> ORIGIN
> 1 ggggggcaga ggccatgcgg ctacgccgcg tcacctccgg gcctgcggcc ctcacggacg
> 61 gtgacggtca ctctccgcgg tcgtgcctac ggcacatccc cgccgccgtg tcaacccccg
> 121 cgcgcaactt ttccccgaca acctgcggtt gtcgtccgcc gtcccgggac cgcacccccc
> 181 acccgatcac cccccaccgg ccgggctacg cccacggccg gcccctcggc cgtctgtggc
> 241 ccacaggttc cccccgccgc ctacggcgtc tcgtccgggc ataccccccc ctgctacgcc
> 301 accccaccga acgcgccgag cccgcaaagg ccggcggcgc gtcggccgac acactccgtc
> 361 tgtccccgtg aggctgcggg tatcggccat gcctggcctg ccctgcttcg ccgctcggcc
> .
> .
> .
>
>
>
> If a CONTIG line is present in a Genbank file, then the string
> returned by the Bio::Seq->seq() method is zero-length or undefined (I
> haven't checked which).
>
> I made two versions of the same genbank file, one with the CONTIG line
> and one without. Then I ran the script pasted below.
>
>
> ### Code begins ###
>
> use strict;
> use Bio::SeqIO;
>
>
> for my $gbkfile (qw(withContigLine.gbk withoutContigLine.gbk)) {
>
> my $seqin = Bio::SeqIO->new(-file => $gbkfile);
> my $seqobj = $seqin->next_seq();
> my $ntseq = $seqobj->seq();
> my $strlen = length($ntseq);
> my $bplen = $seqobj->length();
>
> print <<"REPORT";
> $gbkfile
>
> Bioperl reports length as $bplen.
> Length of the sequence string is $strlen.
>
> =========================================
>
> REPORT
>
> }
>
> print("Perl version is: $]\n");
> print("Bioperl version is: ", $Bio::SeqIO::VERSION, "\n");
> printf "Bioperl version again: %vd\n", $Bio::SeqIO::VERSION;
>
> exit;
>
> ### Code Ends ###
>
> The output from the above script is pasted below.
>
>
> ### Output begins ###
>
> withContigLine.gbk
>
> Bioperl reports length as 19314.
> Length of the sequence string is .
>
> =========================================
>
> withoutContigLine.gbk
>
> Bioperl reports length as 19314.
> Length of the sequence string is 19314.
>
> =========================================
>
> Perl version is: 5.018000
> Bioperl version is: 1.006001
> Bioperl version again: 49.46.48.48.54.48.48.49
>
> ### Output ends ###
>
>
> Do I have to do something different to get the sequence string from
> Genbank files which have the CONTIG line in them?
>
> Any suggestions will be most gratefully received.
>
> Thanks
>
> Govind
>
> Govind Chandra
> Molecular Microbiology
> John Innes Centre
> Norwich UK.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
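If upgrading to v1.6.922 is not immediately possible, one possible stopgap follows from the observation above that the same file parses correctly once the CONTIG line is removed: strip the CONTIG record before the text reaches Bio::SeqIO. The sketch below is a minimal pure-Perl filter, not part of BioPerl. The helper name strip_contig is hypothetical, and the code assumes the usual GenBank flat-file layout in which record keywords start in column 1 and continuation lines begin with whitespace. It discards the CONTIG information, so it is only appropriate if you do not need that line.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): remove the CONTIG record,
# including any indented continuation lines, from GenBank-formatted text.
# Assumes keywords start in column 1 and continuation lines start with
# whitespace, as in the standard GenBank flat-file layout.
sub strip_contig {
    my ($text) = @_;
    my @kept;
    my $in_contig = 0;

    # split /^/ breaks the text into lines while keeping each newline
    for my $line (split /^/, $text) {
        if ($line =~ /^CONTIG\b/) {
            $in_contig = 1;    # start of the CONTIG record: drop it
            next;
        }
        # An indented line right after CONTIG is a continuation: drop it too
        next if $in_contig && $line =~ /^\s/;

        $in_contig = 0;        # any other line ends the CONTIG record
        push @kept, $line;
    }
    return join '', @kept;
}

# Small illustrative fragment (layout assumed; values taken from the post)
my $demo = <<'GBK';
                     /db_xref="GeneID:16501453"
CONTIG      join(CP006261.1:1..19314)
ORIGIN
        1 ggggggcaga ggccatgcgg ctacgccgcg tcacctccgg gcctgcggcc ctcacggacg
//
GBK

# Prints the fragment with the CONTIG line removed
print strip_contig($demo);
```

The cleaned text can then be read through an in-memory filehandle, for example `open my $fh, '<', \$clean or die $!;` followed by `Bio::SeqIO->new(-fh => $fh, -format => 'genbank')`. The explicit `-format` is needed here because there is no file name from which Bio::SeqIO could guess the format.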