[Bioperl-l] Parsing Genbank

Mark A. Jensen maj at fortinbras.us
Wed Dec 2 20:52:28 UTC 2009


Yes, 1.006 is 1.6. There is a later update, 1.6.1, but it does sound
as if there is a bug. If you can provide data that reproduces
it, as Chris suggests, we can get onto it.
Thanks, MAJ
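
For readers puzzled by the 1.006 versus 1.6 notation: Perl's decimal version numbers read the digits after the point in groups of three, so 1.006 corresponds to 1.6.0. Below is a minimal sketch using the core `version` module. The helper name `dotted_version` is hypothetical, and reading 1.006001 as the 1.6.1 release is an assumption based on the same convention rather than something stated in this thread.

```perl
use strict;
use warnings;
use version;

# Hypothetical helper: rewrite a decimal version string as a dotted triple.
sub dotted_version {
    my ($decimal) = @_;
    return version->parse($decimal)->normal;
}

# Illustrative values only; the second one is assumed, not taken from the thread.
for my $v ('1.006', '1.006001') {
    print "$v => ", dotted_version($v), "\n";
}
```

Under this convention the two values should come out as v1.6.0 and v1.6.1 respectively.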
  ----- Original Message ----- 
  From: Brandi Cantarel 
  To: Mark A. Jensen 
  Sent: Wednesday, December 02, 2009 3:38 PM
  Subject: Re: [Bioperl-l] Parsing Genbank


  How can I tell what version I am using? When I use the command from the website:


  perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'


  I get 1.006, but the BioPerl lib was updated in July, so it is probably version 1.6.0, since that was the last stable release.


  Brandi




  On Dec 2, 2009, at 2:48 PM, Mark A. Jensen wrote:


    With fake sequence data and that header, I don't get a problem:

    DB<2> x $cds->location
    0  Bio::Location::Simple=HASH(0x37b1df4)
     '_end' => 974
     '_location_type' => 'EXACT'
     '_root_verbose' => 0
     '_seqid' => 'subjpool12_contig3'
     '_start' => 911
     '_strand' => '-1'

    Are you using the latest BioPerl (1.6.1 or the trunk)?
    MAJ
    ----- Original Message -----
    From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>
    Cc: <bioperl-l at lists.open-bio.org>
    Sent: Wednesday, December 02, 2009 2:29 PM
    Subject: Re: [Bioperl-l] Parsing Genbank


    Here is some of my code; the real code actually enters the data into a database.


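    use strict;
    use warnings;
    use Bio::SeqIO;

    # Assumption: the GenBank file path is passed as the first argument;
    # the original snippet did not show where $gbkfile is set.
    my $gbkfile = shift @ARGV;
    my $in = Bio::SeqIO->new(-file   => $gbkfile,
                             -format => 'genbank');

    W1: while (my $seq = $in->next_seq()) {
        my @feats = $seq->get_all_SeqFeatures();
        my $j = 0;
        F1: foreach my $cds (@feats) {
            # skip every feature that is not a CDS
            next F1 unless ($cds->primary_tag() eq 'CDS');
            ###>> debugger stops here for above output

            # do something with the cds start and cds end
        }
    }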


    LOCUS       subjpool12_contig3          974 bp    DNA     linear   UNK 19-Nov-2009
    ACCESSION   subjpool12_contig3
    KEYWORDS    .
    SOURCE      human metagenome
    ORGANISM  human metagenome
              unclassified sequences; organismal metagenomes,metagenomes.
    FEATURES             Location/Qualifiers
       source          1..974
                       /mol_type="genomic DNA"
                       /isolation_source="Homo sapiens"
                       /organism="human metagenome"
                       /collection_date="19-Nov-2009"
       CDS             complement(911..974)
                       /locus_tag="subjpool12_contig3|metagene|gene_2"
                       /translation="IRIMTVELINPYIRHVEHST"
                       /score="2.52804"
                       /product="hypothetical protein"
                       /note="score=2.52804"
                       /note="score=2.52804"
                       /note="frame=1"
    ORIGIN
    #some sequence….





      From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.
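
The "do the math" route could be sketched as follows, on the assumption (not confirmed anywhere in this thread) that the reported 1 and 64 are positions counted from the far end of the 974 nt sequence, as they would be on its reverse complement. The helper name `to_forward_coords` is hypothetical and is not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl method: map a start/end pair counted
# from the far end of a sequence of length $len back to forward coordinates.
sub to_forward_coords {
    my ($start, $end, $len) = @_;
    return ($len - $end + 1, $len - $start + 1);
}

my ($start, $end) = to_forward_coords(1, 64, 974);
print "$start..$end\n";    # prints 911..974
```

This is plain arithmetic on the numbers quoted in the thread; it does not explain why the parser reported 1 and 64 in the first place.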




    ~~~~~~~~~~~~~~~~~~~~
    Brandi Cantarel, PhD
    Bioinformatics Analyst
    Institute for Genome Sciences
    School of Medicine
    University of Maryland, Baltimore

    On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote:


      Hi Brandi-

      If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an ordinary Bio::Seq, that's normal.

      Can you elaborate by posting your code?

      cheers,
      MAJ

      ----- Original Message -----
      From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>
      To: <bioperl-l at lists.open-bio.org>
      Sent: Wednesday, December 02, 2009 1:36 PM
      Subject: [Bioperl-l] Parsing Genbank





        Hi all,

        I am not sure if this is normal, but when I use SeqIO to parse GenBank files, it changes the coordinates of features on the minus strand.

        For example, I have a sequence with a CDS on the minus strand that runs from 911 to 974. The sequence is 974 nt long.

        x $cds->start
        1
        x $cds->end
        64

        How can I get the original coordinates? Is there a command for that, or will I have to just do the math?

        Feature or bug?





        ~~~~~~~~~~~~~~~~~~~~
        Brandi Cantarel, PhD
        Bioinformatics Analyst
        Institute for Genome Sciences
        School of Medicine
        University of Maryland, Baltimore

        _______________________________________________
        Bioperl-l mailing list
        Bioperl-l at lists.open-bio.org
        http://lists.open-bio.org/mailman/listinfo/bioperl-l






