[Bioperl-l] PopGen

Wed Jan 3 19:44:25 UTC 2007

Let's see if anyone else on bioperl-l has any comments/ideas on
how this could be done :)
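
In the meantime, the substitution approach Marian describes below can be
sketched in plain Perl. This is only a minimal illustration, not an existing
BioPerl function: apply_snps and the %variants layout are hypothetical names
made up for this example, the reference string is the toy sequence from the
original post, and "ind2" is invented data. It assumes 1-based positions,
single-base substitutions and one allele per position (no heterozygous
calls, no indels).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return a copy of $ref in
# which every 1-based position listed in %$snps carries the given base.
sub apply_snps {
    my ($ref, $snps) = @_;
    my $seq = $ref;
    for my $pos (sort { $a <=> $b } keys %$snps) {
        die "position $pos is outside the reference\n"
            if $pos < 1 || $pos > length $ref;
        # 4-argument substr replaces one base in place (0-based offset)
        substr($seq, $pos - 1, 1, $snps->{$pos});
    }
    return $seq;
}

# Toy reference from the original post; position 6 is 't'
my $reference = 'atgcgtatatg';

# Assumed layout: individual id => { position => observed base }
my %variants = (
    ind1 => { 6 => 'a' },
    ind2 => { 2 => 'c', 6 => 'a' },    # invented for illustration
);

# Print one FASTA record per individual
for my $id (sort keys %variants) {
    my $seq = apply_snps($reference, $variants{$id});
    print ">$id\n$seq\n";
}
```

With the toy data this prints atgcgaatatg for ind1 and acgcgaatatg for
ind2. The resulting strings could presumably be wrapped in Bio::Seq objects
and written out with Bio::SeqIO instead of being printed by hand.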

On 1/3/07, Marian Thieme <marian.thieme at klinik.uni-regensburg.de> wrote:
> Hi again,
>
> if I understand the t/data/example.hap file correctly, then it is nearly
> the desired format and I can import our SNP data into some Perl object
> relatively easily.
> Because you asked/encouraged me to give some feedback: I didn't find a
> way to output the complete sequence of each individual while taking into
> account the specific sequence properties (specific alleles) of that
> individual.
> If I understand the PopGen API correctly, it can represent
> SNPs/variations specific to an individual, but it doesn't cover the
> complete (reference) sequence.
> Of course, I can get a reference sequence from GenBank or
> elsewhere and produce an individual-specific sequence based on that
> reference seq. by substituting bases at the corresponding positions. But
> I was hoping there is some function which does this for me. If not,
> perhaps I can develop this feature and contribute it to the BioPerl
> package!?
>
> Marian
>
>
> Albert Vilella wrote:
>
> > To add a bit more info. Using the example.hap file in the t/data dir
> > of bioperl, you can see that the alleles correspond to the
> > nucleotides, and the marker name corresponds to the dbSNP rs id (I
> > guess in your case it can be something that relates to the coords of
> > the sequence):
> >
> > #!/usr/local/bin/perl
> >
> > use strict;
> > use warnings;
> >
> > use Bio::PopGen::IO;
> > my $io = Bio::PopGen::IO->new(-format => 'hapmap',
> >                               -file   => '../../t/data/example.hap');
> >
> > # Some IO might support reading in a population at a time
> >
> > my @population;
> > while ( my $ind = $io->next_individual ) {
> >    push @population, $ind;
> > }
> >
> > foreach my $individual (@population) {
> >    my @genotypes = $individual->get_Genotypes;
> >    foreach my $genotype (@genotypes) {
> >        print "individual_id ", $genotype->individual_id, "\n";
> >        # get_Alleles returns a list; join it so the alleles stay readable
> >        print "alleles ", join(',', $genotype->get_Alleles), "\n";
> >        print "marker_name ", $genotype->marker_name, "\n";
> >    }
> > }
> >
> > 1;
> >
> >
> > On 1/3/07, Albert Vilella <avilella at gmail.com> wrote:
> >
> >> Well, in that case the alleles are numerical ids instead of
> >> nucleotides... but in your case you will have the nucleotide
> >> corresponding to the coordinate with polymorphism...
> >>
> >> On 1/3/07, Marian Thieme <marian.thieme at klinik.uni-regensburg.de> wrote:
> >> > Albert,
> >> >
> >> > thank you very much for this hint. I completely overlooked the
> >> > PopGen package. But at least one question remains, because I didn't
> >> > fully understand the allele attribute of the Bio::PopGen::Genotype
> >> > object; perhaps you can help me:
> >> >
> >> > in the HOWTO (http://www.bioperl.org/wiki/HOWTO:PopGen) there is a
> >> > Genotype created by:
> >> >
> >> > my $genotype = Bio::PopGen::Genotype->new(-marker_name   => 'D7S123',
> >> >                                            -individual_id => '1001',
> >> >                                            -alleles       => ['104','107'] );
> >> >
> >> > Can you explain to me what the numbers mean (-alleles => ['104','107'])?
> >> > I would expect that an allele is specified by a position AND the bases
> >> > which differ from the bases in the original (reference) sequence.
> >> >
> >> > Regards,
> >> > Marian
> >> >
> >> > Albert Vilella wrote:
> >> >
> >> > > The Bio::PopGen modules contain Individual, Population and Genotype
> >> > > objects, among other utilities. There are some input/output formats
> >> > > in Bio::PopGen::IO and also some methods to go from an alignment to a
> >> > > population.
> >> > >
> >> > > That said, I am not entirely sure how much of that overlaps with
> >> > > Bio::Variation.
> >> > >
> >> > > If you think anything is missing that you would like to have
> >> > > implemented in bioperl, we would greatly appreciate your feedback,
> >> > >
> >> > > Cheers,
> >> > >
> >> > >    Albert.
> >> > >
> >> > > On 1/2/07, Marian Thieme <marian.thieme at klinik.uni-regensburg.de> wrote:
> >> > >
> >> > >> Hi all,
> >> > >>
> >> > >> I am quite new to bioperl and I have a question about sequence data:
> >> > >> I am working on a resequencing project. Here we have resequenced 1000
> >> > >> genes of a certain gene. My question: What is the easiest way to store
> >> > >> each discovered variation of each individual and get a FASTA sequence
> >> > >> for an arbitrary individual?
> >> > >>
> >> > >> I would expect that there is some way to set up a reference sequence
> >> > >> and store all variations relative to this reference sequence.
> >> > >> Afterward it should be possible to generate sequences for each
> >> > >> individual in question, right?
> >> > >>
> >> > >> My approach was the following:
> >> > >>
> >> > >> I have created a SeqDiff object:
> >> > >>
> >> > >> $seqDiff = Bio::Variation::SeqDiff->new (...)
> >> > >>
> >> > >>
> >> > >> and I have assigned the reference sequence to that object via:
> >> > >>
> >> > >> $seqDiff->dna_ori('atgcgtatatg');
> >> > >>
> >> > >>
> >> > >> Now I thought I could create some variations via a DNAMutation object:
> >> > >>
> >> > >> $dnamut = Bio::Variation::DNAMutation->new (
> >> > >>   -start => 6,
> >> > >>   -end => 6,
> >> > >>   -length => 1,
> >> > >>   -isMutation => 1,
> >> > >>   -upStreamSeq => 'atgcg',
> >> > >>   -dnStreamSeq => 'atatg'
> >> > >> );
> >> > >>
> >> > >> $a1 = Bio::Variation::Allele->new;
> >> > >> $a1->seq('t');
> >> > >> $dnamut->allele_ori($a1);
> >> > >>
> >> > >> my $a2 = Bio::Variation::Allele->new;
> >> > >> $a2->seq('a');
> >> > >> $dnamut->add_Allele($a2);
> >> > >>
> >> > >>
> >> > >>
> >> > >> Is that the correct way to describe the reference sequence, describe a
> >> > >> variation and attach this to the SeqDiff object?
> >> > >> Probably I didn't understand the API correctly. (I assumed start/end
> >> > >> means start/end position of the mutation.) Is it possible to get a
> >> > >> complete sequence printout (FASTA format) of each
> >> > >> variation/individual?
> >> > >>
> >> > >> Regards,
> >> > >> Marian
> >> > >>
> >> > >> --
> >> > >> Marian Thieme
> >> > >> University Regensburg
> >> > >> Institute of Functional Genomics
> >> > >> Josef-Engert-Str. 9
> >> > >> 93053
> >> > >> Regensburg
> >> > >> Germany
> >> > >> P: 0049 (0)941 943 5055
> >> > >> F: 0049 (0)941 943 5020
> >> > >> E: marian.thieme at klinik.uni-regensburg.de
> >> > >> W: http://www-cgi.uni-regensburg.de/Klinik/FunktionelleGenomik
> >> > >>
> >> > >> _______________________________________________
> >> > >> Bioperl-l mailing list
> >> > >> Bioperl-l at lists.open-bio.org
> >> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >> > >>
> >> >
> >>
>