[Bioperl-l] genomic coordinates always on the plus strand

Surya Saha ss2489 at cornell.edu
Sun May 6 21:15:12 UTC 2012


Hi Hermann,

To back up what Jim said: this convention is not specific to BioPerl. It
applies to all GFF files, the de facto file format for annotations. See
http://gmod.org/wiki/GFF. Coordinates are always numbered according to the
positive strand, so start is never greater than end. If you have two genes
that differ only in strand, then the GFF records will differ only in the
value of the strand field. Hope that helps.
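
As a rough illustration, here is a plain-Perl sketch of two genes that occupy
the same interval on opposite strands. The two records, the "example" source
and the gene IDs are made up, and parse_gff_line() and five_prime() are
hypothetical helpers written for this mail, not part of BioPerl:

```perl
use strict;
use warnings;

# Two made-up GFF3 records on the same interval, differing only in the
# strand column (and in the ID attribute).
my @gff_lines = (
    join("\t", 'chr1', 'example', 'gene', 1000, 2000, '.', '+', '.', 'ID=geneA'),
    join("\t", 'chr1', 'example', 'gene', 1000, 2000, '.', '-', '.', 'ID=geneB'),
);

# Hypothetical helper: split one GFF3 line into the fields used below.
sub parse_gff_line {
    my ($line) = @_;
    my ($seqid, $source, $type, $start, $end, $score, $strand, $phase, $attr)
        = split /\t/, $line;
    return {
        seqid  => $seqid,
        start  => $start,
        end    => $end,
        strand => $strand,
        attr   => $attr,
    };
}

# Hypothetical helper: start <= end on both strands, so the biological
# 5' end is 'start' on the plus strand and 'end' on the minus strand.
sub five_prime {
    my ($feature) = @_;
    return $feature->{strand} eq '-' ? $feature->{end} : $feature->{start};
}

my @features = map { parse_gff_line($_) } @gff_lines;
for my $f (@features) {
    print join("\t", $f->{attr}, $f->{start}, $f->{end}, $f->{strand},
               five_prime($f)), "\n";
}
```

Both records carry the same start and end; only the strand column tells you
which end of the interval is the 5' end of the gene.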

-Surya

On Sat, May 5, 2012 at 4:44 PM, Jim Hu <jimhu at tamu.edu> wrote:

> In BioPerl, end (to) is never less than start (from), and the strand is
> indicated by strand. IIRC, there is a proposal for how to handle this for
> features that cross the origin in circular genomes, but it hasn't been
> implemented yet.
> See:
>
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Range.html
>
> Jim Hu
>
> On May 4, 2012, at 2:29 PM, Hermann Norpois wrote:
>
> > Hello,
> >
> > in the tutorial
> > http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences there is a
> > script that retrieves genomic coordinates (see below). I tested
> > it with 14 gene IDs and always got coordinates on the "plus strand",
> > meaning $from was always a lower number than $to. In principle this is
> > nice, but I was surprised. It means either that all my genes are (by
> > chance) on the plus strand, or that there are two sets of "coordinates"
> > (one for the "plus" and one for the "minus" strand). In that case it
> > would be possible (theoretically, and not very likely) for two genes to
> > share one $from/$to pair (one on the plus and one on the minus strand,
> > with the same coordinates but different IDs).
> > I did not find anything about this issue in the documentation or in the
> > archive. Could anybody please comment on this?
> >
> > use strict;
> > use Bio::DB::EntrezGene;
> >
> > my $id = shift or die "Id?\n"; # use a Gene id
> > my $db = Bio::DB::EntrezGene->new;
> > my $seq = $db->get_Seq_by_id($id);
> > my $ac = $seq->annotation;
> > for my $ann ($ac->get_Annotations('dblink')) {
> >     if ($ann->database eq "Evidence Viewer") {
> >         # get the sequence identifier, the start, and the stop
> >         my ($contig, $from, $to) = $ann->url =~
> >             /contig=([^&]+).+from=(\d+)&to=(\d+)/;
> >         print "$contig\t$from\t$to\n";
> >     }
> > }
> >
> >
> > Thank you
> > Hermann Norpois
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> =====================================
> Jim Hu
> Professor
> Dept. of Biochemistry and Biophysics
> 2128 TAMU
> Texas A&M Univ.
> College Station, TX 77843-2128
> 979-862-4054
>


