[Bioperl-l] from SimpleAlign to SAM/BAM

John Marshall john.marshall at sanger.ac.uk
Wed May 19 16:22:19 UTC 2010


On 19 May 2010, at 14:34, Mark A. Jensen wrote:
> Albert-- have a look at Bio::Tools::Run::Samtools which incorporates  
> the use of Bio::Assembly::IO::sam (I think).

I've only briefly skimmed the B:T:R:Samtools documentation, but it
appears mostly to wrap the various samtools subcommands.  These
perform various manipulations on existing SAM and BAM files, but they
don't help with converting from non-SAM/BAM formats to SAM/BAM.

> ----- Original Message ----- From: "Albert Vilella" <avilella at gmail.com>
>> Considering I've got a way to map the cDNAs to chromosome coordinates,
>> how can I generate a SAM/BAM file with ~1,000,000 entries against
>> ~23,000 human coordinates?

Perhaps I misunderstand, but if you already have a bunch of snippets
of sequence and their mapped coordinates, then the easy way to
generate a SAM file containing them is just to print it out yourself
from a script.

A SAM file is just a tab-separated text file.  For each sequence in
your Bio::SimpleAlign objects, print out a line containing appropriate
values for each of the 11 mandatory SAM fields (QNAME, FLAG, RNAME,
POS, MAPQ, CIGAR, MRNM, MPOS, ISIZE, SEQ, QUAL).  If the snippets are
effectively unpaired, then MRNM,MPOS,ISIZE can just be *,0,0, and the
only FLAG values you'll be choosing between are 0 (mapped, forward
strand), 4 (unmapped), 16 (mapped, reverse strand) and 20 (unmapped,
reverse strand).
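To make that concrete, here is a minimal sketch in Python (chosen only
for brevity; the same logic ports directly to a Perl/BioPerl script).
It assumes unpaired records and builds the FLAG from the 0x4
(unmapped) and 0x10 (reverse strand) bits.  The helper names sam_line
and flag_for, the read name, the coordinates and the MAPQ of 60 are all
hypothetical; real values would come from your own cDNA mapping.

```python
# Hypothetical helpers for writing unpaired SAM records; the names
# sam_line and flag_for are illustrative, not part of any library.

FLAG_UNMAPPED = 0x4   # segment unmapped
FLAG_REVERSE = 0x10   # SEQ is reverse complemented

# MRNM, MPOS, ISIZE for unpaired records.
UNPAIRED_MATE_FIELDS = ("*", "0", "0")


def flag_for(mapped: bool, reverse: bool) -> int:
    """Combine the unmapped and reverse-strand bits: 0, 4, 16 or 20."""
    flag = 0
    if not mapped:
        flag |= FLAG_UNMAPPED
    if reverse:
        flag |= FLAG_REVERSE
    return flag


def sam_line(qname: str, flag: int, rname: str, pos: int, mapq: int,
             cigar: str, seq: str, qual: str = "*") -> str:
    """Return one tab-separated SAM record with the 11 mandatory fields.

    pos is the 1-based leftmost mapping position; qual defaults to "*"
    (no base qualities available).
    """
    if flag not in (0, 4, 16, 20):
        raise ValueError(f"unexpected FLAG for an unpaired record: {flag}")
    fields = [qname, str(flag), rname, str(pos), str(mapq), cigar,
              *UNPAIRED_MATE_FIELDS, seq, qual]
    return "\t".join(fields)


if __name__ == "__main__":
    # Made-up example: an 8-base snippet mapped to the forward strand.
    print(sam_line("cdna_0001", flag_for(mapped=True, reverse=False),
                   "chr1", 10468, 60, "8M", "ACGTACGT"))
```

Each call yields one line of the body of the SAM file; for ~1,000,000
snippets you would just call it in a loop and write the lines out.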

You should also start the file with an @SQ header line for each of
the chromosomes you've mapped against, giving its name (SN) and
length (LN).
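A minimal sketch of writing those header lines ahead of the records,
again in Python and under stated assumptions: write_sam is a
hypothetical helper, and the chromosome names and lengths in the
example are placeholders.  In practice you would take them from the
reference you mapped against (for instance the first two columns of
its samtools faidx .fai index).  It also checks that each record's
RNAME is either "*" or one of the @SQ names.

```python
import io


def write_sam(out, chrom_lengths, records):
    """Write one @SQ line per reference, then the alignment records.

    chrom_lengths maps reference name -> length in bases; records are
    already-formatted tab-separated SAM lines without trailing newlines.
    """
    for name, length in chrom_lengths.items():
        out.write(f"@SQ\tSN:{name}\tLN:{length}\n")
    for record in records:
        # RNAME (third column) must be "*" or one of the @SQ names.
        rname = record.split("\t")[2]
        if rname != "*" and rname not in chrom_lengths:
            raise ValueError(f"RNAME {rname!r} has no @SQ header")
        out.write(record + "\n")


if __name__ == "__main__":
    # Hypothetical references and record, for illustration only.
    buf = io.StringIO()
    write_sam(buf, {"chr1": 1000, "chr2": 800},
              ["cdna_0001\t0\tchr1\t5\t60\t4M\t*\t0\t0\tACGT\t*"])
    print(buf.getvalue(), end="")
```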

(I'm assuming you've read http://samtools.sourceforge.net/SAM1.pdf --  
it's a little vague, but should be more than enough to explain how to  
e.g. print out a basic SAM file with only the main fields.)

Once you've printed out a simple SAM file, you can use B:T:R:Samtools  
or samtools directly or other tools to convert it to the binary BAM  
format and/or otherwise work with it.
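If you'd rather drive samtools directly from a script, the classic
conversion is "samtools view -bS in.sam > out.bam".  Here is a hedged
Python sketch that builds that command and redirects its stdout into
the BAM file; the function names are hypothetical, and it assumes a
samtools binary is on your PATH.  (The -S flag was needed by older
samtools releases; current ones accept it but auto-detect SAM input.)

```python
import shutil
import subprocess


def sam_to_bam_command(sam_path):
    """Build the classic `samtools view -bS in.sam` command.

    -b writes BAM; -S says the input is SAM (needed by older samtools,
    accepted but ignored by current releases, which auto-detect).
    """
    return ["samtools", "view", "-b", "-S", sam_path]


def sam_to_bam(sam_path, bam_path):
    """Run samtools and capture its BAM output from stdout into bam_path."""
    if shutil.which("samtools") is None:
        raise FileNotFoundError("samtools is not on PATH")
    with open(bam_path, "wb") as out:
        subprocess.run(sam_to_bam_command(sam_path), stdout=out, check=True)
```

Usage would be something like sam_to_bam("cdnas.sam", "cdnas.bam")
(hypothetical file names).  You would then typically sort and index
the BAM with samtools sort and samtools index, whose exact arguments
vary between samtools versions.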

Cheers,

     John

