[Bioperl-l] from SimpleAlign to SAM/BAM
John Marshall
john.marshall at sanger.ac.uk
Wed May 19 16:22:19 UTC 2010
On 19 May 2010, at 14:34, Mark A. Jensen wrote:
> Albert-- have a look at Bio::Tools::Run::Samtools which incorporates
> the use of Bio::Assembly::IO::sam (I think).
I've only briefly skimmed the B:T:R:Samtools documentation, but it
appears mostly to encapsulate running the samtools subcommands. These
let you manipulate existing SAM and BAM files in various ways, but
don't give you anything for converting data that isn't already
SAM/BAM into SAM/BAM.
> ----- Original Message ----- From: "Albert Vilella" <avilella at gmail.com>
>> Considering I've got a way to map the cDNAs to chromosome coordinates,
>> how can I generate a SAM/BAM file with ~1,000,000 entries against
>> ~23,000 human coordinates?
Perhaps I misunderstand, but if you already have a bunch of snippets
of sequence and their mapped coordinates, then the easy way to
generate a SAM file containing them is just to print it out by hand.

A SAM file is just a tab-separated text file. For each sequence in
your Bio::SimpleAlign objects, print out a line containing appropriate
values for each of the 11 main SAM fields. (If the snippets are
effectively unpaired, then MRNM, MPOS, and ISIZE can just be *, 0, and
0, and the only FLAG values you'll be choosing between are 0, 4, 16,
and 20: 0 or 16 for a forward- or reverse-strand mapping, and 4 or 20
for the unmapped equivalents.)
You should also start the file with an @SQ header line for each of the
chromosomes you've mapped against.

(I'm assuming you've read http://samtools.sourceforge.net/SAM1.pdf --
it's a little vague, but should be more than enough to explain how to
e.g. print out a basic SAM file with only the main fields.)
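
As a rough illustration of the "print it out by hand" approach, here is
a minimal sketch in Python rather than BioPerl. The helper names
(sq_header, sam_record, write_sam), the toy reference "chr1" of length
1000, and the read names are all invented for the example. It assumes
unpaired, ungapped alignments (so the CIGAR is just <length>M; spliced
cDNAs would need a real CIGAR), uses MAPQ 255 (the spec's "not
available" value), and expects SEQ to be given as it reads on the
forward reference strand.

```python
import io
import sys


def sq_header(name, length):
    """Return one @SQ header line for a reference sequence."""
    return f"@SQ\tSN:{name}\tLN:{length}"


def sam_record(qname, rname, pos, seq, reverse=False,
               mapq=255, cigar=None, qual="*"):
    """Return one SAM line for an unpaired, mapped sequence.

    pos is the 1-based leftmost reference coordinate.
    """
    flag = 16 if reverse else 0      # 16 = mapped to the reverse strand
    if cigar is None:
        cigar = f"{len(seq)}M"       # assumption: ungapped match
    # The 11 main fields; the mate fields are *, 0, 0 for unpaired data.
    fields = [qname, flag, rname, pos, mapq, cigar, "*", 0, 0, seq, qual]
    return "\t".join(str(f) for f in fields)


def write_sam(out, references, alignments):
    """Write the @SQ headers, then one line per alignment."""
    for name, length in references:
        out.write(sq_header(name, length) + "\n")
    for aln in alignments:
        out.write(sam_record(**aln) + "\n")


if __name__ == "__main__":
    # Hypothetical data, standing in for whatever your mapping produces.
    refs = [("chr1", 1000)]
    alns = [
        dict(qname="cdna1", rname="chr1", pos=101, seq="ACGT"),
        dict(qname="cdna2", rname="chr1", pos=250, seq="GGCCA", reverse=True),
    ]
    write_sam(sys.stdout, refs, alns)
```

In practice you would loop over the sequences in your alignment objects
and fill in the name, chromosome, position, and strand from your own
mapping.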
Once you've printed out a simple SAM file, you can use B:T:R:Samtools
or samtools directly or other tools to convert it to the binary BAM
format and/or otherwise work with it.
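
For the conversion step, one option is to call samtools directly. The
sketch below only wraps "samtools view"; the helper names and file
names are placeholders, and it assumes samtools is installed and on
your PATH. The -S flag tells older samtools releases that the input is
SAM; newer releases detect the format themselves and ignore it.

```python
import shutil
import subprocess


def sam_to_bam_command(sam_path, bam_path):
    """Build the argument list for converting a SAM file to BAM."""
    # -b: write BAM output
    # -S: input is SAM (needed by older samtools, ignored by newer ones)
    # -o: name of the output file
    return ["samtools", "view", "-b", "-S", "-o", bam_path, sam_path]


def sam_to_bam(sam_path, bam_path):
    """Run the conversion; raises if samtools is missing or fails."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    subprocess.run(sam_to_bam_command(sam_path, bam_path), check=True)
```

Sorting and indexing the resulting BAM file (samtools sort, samtools
index) would be separate steps after this.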
Cheers,
John