[Bioperl-l] about gene "boundaries"

Thu Apr 29 06:57:13 UTC 2010

Hi Dimitar,

Attached is a C++ program to do your job. It is much faster than Perl and can do the job in less than a second, even for the full chromosome 1.

Steps:
1. Download the FASTA file and remove the header line (the line starting with ">", e.g. >asdasfdasfassa).
2. Run the RemoveNewline.pl program like this:

     RemoveNewline.pl inputfile > outputfile

3. Compile the C++ program using this command:

     g++ ExtractSequence.cpp -o ExtractSequence

4. Then run the C++ program like this on Linux:

     ./ExtractSequence inputfilename start stop

   or like this on Windows:

     ExtractSequence inputfilename start stop

   e.g. ExtractSequence chr1.fasta 10000 20000
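
For readers of the archive, where the attachments are scrubbed, here is a minimal sketch of what the steps above could look like in C++. It is not the attached ExtractSequence.cpp. The function names flatten_fasta and extract_region are hypothetical, and the sketch assumes a single-record FASTA file and 1-based, inclusive start/stop coordinates.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper for steps 1 and 2: copy a single-record FASTA file to
// out_path with the ">" header line and all line breaks removed.
void flatten_fasta(const std::string& in_path, const std::string& out_path) {
    std::ifstream in(in_path);
    std::ofstream out(out_path, std::ios::binary);
    if (!in || !out) {
        throw std::runtime_error("cannot open input or output file");
    }
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // tolerate Windows line endings
        }
        if (line.empty() || line[0] == '>') {
            continue;  // skip the header and blank lines
        }
        out << line;
    }
}

// Hypothetical helper for step 4: return bases start..stop (assumed 1-based,
// inclusive) from a file holding one sequence on a single line.
std::string extract_region(const std::string& path, long start, long stop) {
    if (start < 1 || stop < start) {
        throw std::invalid_argument("need 1 <= start <= stop");
    }
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    const std::size_t length = static_cast<std::size_t>(stop - start + 1);
    std::string region(length, '\0');
    // Seek directly to the first requested base instead of reading the whole file.
    in.seekg(static_cast<std::streamoff>(start - 1), std::ios::beg);
    in.read(&region[0], static_cast<std::streamsize>(length));
    // Keep only the bytes actually read; the file may end before stop.
    region.resize(static_cast<std::size_t>(in.gcount()));
    assert(region.size() <= length);
    // Drop a trailing line break if the region ran into the end of the file.
    while (!region.empty() && (region.back() == '\n' || region.back() == '\r')) {
        region.pop_back();
    }
    return region;
}
```

Because the flattened file has exactly one byte per base, the seek makes the lookup independent of where the region sits in the chromosome, which is presumably why this approach is fast. A small main() that parses inputfilename, start and stop from argv and prints the result of extract_region would give the command-line behaviour described in step 4.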

Hope this helps.

Thanks
Ashfaq

________________________________
From: Chris Fields <cjfields at illinois.edu>
To: Dimitar Kenanov <dimitark at bii.a-star.edu.sg>
Cc: bioperl-l at bioperl.org
Sent: Wed, 28 April, 2010 11:10:40 PM
Subject: Re: [Bioperl-l] about gene "boundaries"

By local DB, do you mean a BioPerl-based local DB?  Or is it something else?  This is a bit vague.

On the BioPerl side I suggest looking into Bio::DB::SeqFeature::Store for storing and querying genome information (it does exactly what you want if the proper information is loaded), or maybe the Ensembl Perl API, which can be used with a local or remote Ensembl setup.  Beyond that you'll need to be more specific.

chris
On Apr 28, 2010, at 8:17 AM, Dimitar Kenanov wrote:

> Hello guys,
> I have a question about gene "boundaries". Is there some module in BioPerl which can help me extract the DNA sequence from a genomic DB (from a specific chromosome)? I have my human genome in a local DB and some "from-to" data sets corresponding to different chromosomes, so I want to get the DNA sequences for these from-to ranges. I know I can do that the normal way, but if there is a way to do it with BioPerl it will be more consistent with the rest of the code.
> 
> Thanks for any tips :)
> 
> Cheers
> Dimitar
> 
> -- 
> Dimitar Kenanov
> Postdoctoral research fellow
> Protein Sequence Analysis Group
> Bioinformatics Institute
> A*STAR, Singapore
> email: dimitark at bii.a-star.edu.sg
> tel: +65 6478 8514
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-------------- next part --------------
A non-text attachment was scrubbed...
Name: ExtractSequence.cpp
Type: application/octet-stream
Size: 777 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100429/1edf4c04/attachment-0008.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RemoveNewline.pl
Type: application/octet-stream
Size: 137 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100429/1edf4c04/attachment-0009.obj>