[Bioperl-l] Sequence of Blast hit

Jonathan Manning bmb9jrm at bmb.leeds.ac.uk
Mon Jun 7 09:58:54 EDT 2004


Thanks for the help once again, Jason. This is part of my larger project,
and I'm trying to minimise the number of local files needed and keep
the overall script running reasonably quickly. Contig files take ages to
download, so I think it's back to the Ensembl API approach.

Thanks,

Jon

On Mon, 2004-06-07 at 14:34, Jason Stajich wrote:
> I'm a little confused about what you want at the end of the day.
> 
> For cDNA X you want the genomic locus of the homologous region in species
> Y?  So the locus is defined as the minimum and maximum span of the cDNA on
> some contig?  If you can generate a table that looks like
>   contig start end strand
> by parsing sequence alignment output (Bio::SearchIO), then you can get
> those subsequences with several tools, depending on where the DNA lives.
> If you are keeping the whole DNA local, Bio::DB::Fasta is the easiest
> (or you can use Bio::DB::Flat or Bio::Index::Fasta); if the contigs are
> in public repositories, use Bio::DB::GenBank or Bio::DB::EMBL.  There has
> been some talk of work on subsequence requests with
> Entrez/Bio::DB::GenBank, but I don't know whether it has actually been
> incorporated.  I personally find it simpler and more reliable
> (you always know which version you are working with, network problems
> don't bite you, etc.) to download the datasets and use the local indexes
> to solve the problem.
> 
> A word of caution: before using Bio::DB::Fasta, make sure all the
> sequences are consistently formatted in terms of line width; mixed
> widths will cause problems.  The simplest thing is to use
> sreformat/bp_sreformat.PLS and just re-export it as fasta.
> 
> -jason
> On Mon, 7 Jun 2004, Jonathan Manning wrote:
> 
> > Hi All,
> >
> > I have been using BLAST to map nucleotide sequences to the genome. I
> > have then been using the Ensembl API to extract information based on the
> > results. I'm doing something now where I really only need the matching
> > genomic sequence, and from organisms other than those represented in
> > Ensembl. The trouble is that when I BLAST with a cDNA, the resulting
> > HSPs clearly only match exonic regions, so I need to get the sequence
> > information from somewhere else. I don't want to retrieve the entire
> > contig file.
> >
> > Is there an easy way to download a subsequence of contig/chromosome
> > sequence without the whole file?
> >
> > Thanks,
> >
> > Jon
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
> 
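The two ideas in Jason's reply (reduce the HSPs to one span per contig, then pull that span out of a local FASTA file) can be sketched roughly as follows. This is plain Python using only the standard library, not BioPerl, and it is not how Bio::DB::Fasta is actually implemented. The function names `hsps_to_loci` and `fetch_subseq`, the sample rows, and the choice to group by (contig, strand) are all assumptions made for illustration. The second function only works because every sequence line is assumed to have the same width, which is the reason for the line-width warning above.

```python
def hsps_to_loci(rows):
    """Collapse HSP rows (contig, start, end, strand) into one span per
    (contig, strand): the minimum start and the maximum end seen.
    Grouping by strand as well as contig is an assumption of this sketch."""
    loci = {}
    for contig, start, end, strand in rows:
        lo, hi = min(start, end), max(start, end)
        key = (contig, strand)
        if key in loci:
            cur_lo, cur_hi = loci[key]
            loci[key] = (min(cur_lo, lo), max(cur_hi, hi))
        else:
            loci[key] = (lo, hi)
    return loci


def fetch_subseq(path, seq_offset, line_width, start, end):
    """Return bases start..end (1-based, inclusive) of one FASTA record
    without reading the whole record.

    seq_offset is the byte offset of the record's first base. Every full
    sequence line is assumed to hold exactly line_width bases followed by
    a single newline byte; mixed widths would break the arithmetic below.
    """
    def byte_pos(base_index):
        # 0-based base index -> byte offset: one extra byte per newline
        # that precedes this base.
        return seq_offset + base_index + base_index // line_width

    first = byte_pos(start - 1)
    last = byte_pos(end - 1)
    with open(path, "rb") as handle:
        handle.seek(first)
        chunk = handle.read(last - first + 1)
    # The chunk may span line breaks; drop them to get the bare sequence.
    return chunk.replace(b"\n", b"").decode("ascii")


# Hypothetical HSP table for one cDNA hitting three exons on one contig.
hsps = [("ctg1", 120, 180, 1), ("ctg1", 450, 520, 1), ("ctg1", 300, 360, 1)]
loci = hsps_to_loci(hsps)  # {("ctg1", 1): (120, 520)}
```

For hits on the minus strand the fetched span would still need to be reverse-complemented; that step is left out here.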


