[EMBOSS] SUMMARY: size limits on extractseq?
Iain Drummond
idrummon at receptor.mgh.harvard.edu
Mon May 31 12:18:50 UTC 2004
This turned out to be operator error. extractseq works fine on large files.
The NCBI mouse chromosome files are actually a collection of contigs, and
extractseq looks only at the first sequence in the file. Both comments from
Clinton and Peter were on the mark: the problem was the sequence file.
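
For anyone who hits the same symptom, a quick check is to count the records
in the FASTA file and list their lengths. The Python sketch below is only an
illustration and is not part of EMBOSS; the function name
fasta_record_lengths is made up for this example. It assumes a plain-text
FASTA file in which every record starts with a ">" header line.

```python
from typing import Iterable, List, Tuple


def fasta_record_lengths(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (header, residue count) for each record in a FASTA stream.

    Hypothetical helper for illustration; assumes each record begins
    with a '>' header line and that sequence lines hold only residues.
    """
    records: List[Tuple[str, int]] = []
    header = None
    length = 0
    for raw in lines:
        line = raw.strip()  # drops the newline and any stray '\r'
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, length))
            header = line[1:]
            length = 0
        elif header is not None:
            length += len(line)
    # Close the final record.
    if header is not None:
        records.append((header, length))
    return records


# Example use on a file (the file name is taken from the thread):
#   with open("mm_chr1.fa") as handle:
#       for name, n in fasta_record_lengths(handle):
#           print(n, name)
```

If the file really is a set of contigs, this should list more than one
record, and the length of the first record would be expected to match the
upper bound of the default range that extractseq offers.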
--
Iain Drummond, Ph.D.
Assistant Professor
Department of Medicine, Harvard Medical School and
Renal Unit, Massachusetts General Hospital
Mailing address:
Renal Unit / MGH 149-8000
149 13th St.
Charlestown, MA 02129
Tel: 617 726 5647
Fax: 617 726 5669
idrummond at partners.org
idrummon at receptor.mgh.harvard.edu
Lab Home Page:
http://danio.mgh.harvard.edu
> From: Clinton Fernandes <cfernand at utm.utoronto.ca>
> Reply-To: clintonf at interchange.ubc.ca
> Date: Sun, 30 May 2004 21:37:30 -0700
> To: Iain Drummond <idrummon at receptor.mgh.harvard.edu>
> Subject: Re: [EMBOSS] size limits on extractseq?
>
> It may be the source of the file.
>
> I have experienced difficulty sometimes saving a file in a Windows environment
> and working with the file in a Linux environment. While I obviously don't know
> the specifics of your environment, this may be something you haven't
> considered.
>
> I never had a problem with extractseq, but what I was experiencing seemed
> somewhat similar to your problem. A sequence I downloaded from the internet
> had invisible characters that dos2unix did not remove. This resulted in some very
> quirky behaviour in an extraction program that I had coded. I had to copy the
> sequence to the clipboard and paste it into a text editor in my Linux machine.
>
> Again, what I have described may be completely left-field to what you are
> experiencing, but it may bear looking into if the situations are similar.
>
> --
> Clinton Fernandes
> Bioinformatician
> UBC, Dept. of Microbiology
> Wesbrook Bldg, Room 224
> 6174 University Blvd,
> Vancouver, BC, Canada
>
> (604) 827-5160
> e-mail: clintonf at interchange.ubc.ca
>
>
> Quoting Iain Drummond <idrummon at receptor.mgh.harvard.edu>:
>
>> Is there an upper limit on the file size that extractseq can handle?
>>
>> I run into a problem using extractseq to get segments out of mouse
>> chromosome files. It will only access the first 19.7 MB or so of a 185 MB
>> file. The files are fasta files of the mouse genome build 32 from NCBI. Here
>> is what the problem looks like: (real file sizes are ls'd below)
>>
>> $ extractseq
>> Extract regions from a sequence
>> Input sequence: mm_chr1.fa
>> Regions to extract (eg: 4-57,78-94) [1-19589943]:
>>
>> $ extractseq
>> Extract regions from a sequence
>> Input sequence: mm_chr2.fa
>> Regions to extract (eg: 4-57,78-94) [1-19704910]:
>>
>> $ ls -l
>> total 3782918
>> -rw-r--r-- 1 nobody nobody 200460727 May 25 23:56 mm_chr1.fa
>> -rw-r--r-- 1 nobody nobody 135604265 May 26 15:16 mm_chr10.fa
>> -rw-r--r-- 1 nobody nobody 123799904 May 26 15:17 mm_chr11.fa
>> -rw-r--r-- 1 nobody nobody 120032577 May 26 15:19 mm_chr12.fa
>> -rw-r--r-- 1 nobody nobody 119608375 May 26 15:20 mm_chr13.fa
>> -rw-r--r-- 1 nobody nobody 185168052 May 26 14:28 mm_chr2.fa
>>