[Bioperl-l] fasta header replace

Frank Schwach fs5 at sanger.ac.uk
Mon Aug 30 15:11:06 UTC 2010


Hi Olivier,

Do you know how to read a file and build a hash from its contents? That 
is what you will need to do.
For example, if your mapping file is
A1 Strain_A
A2 Strain_A
A3 Strain_B

then you can do something like:

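# Open the mapping file for reading ('<'); '>' would truncate it.
open(my $in_fh, '<', $infile_path) or die "Cannot open $infile_path: $!";
my %well2strain;
while (my $line = <$in_fh>) {
    # Expect lines like "A1 Strain_A"; skip anything that does not match.
    my ($well, $strain) = $line =~ /^([A-Z]\d+)\s+(\w+)/ or next;
    $well2strain{$well} = $strain;
}
close $in_fh;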

You can then use the values of the hash to set the sequence ID as you 
parse the FASTA file. The BioPerl SeqIO howto gives details about how to 
read and write the FASTA file 
(http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples).
You can change the id of a sequence object with
$some_seq_object->id( 'my new ID');

See http://doc.bioperl.org/releases/bioperl-1.0/Bio/Seq.html for details.
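
To tie the two steps together, here is a minimal sketch of the renaming loop. It assumes a hash like %well2strain built from the mapping file, and it assumes the well position is the second-to-last underscore-separated field of the ID, as in the Macrogen header quoted below. The sub names strain_for_id and relabel_fasta are made up for illustration; only Bio::SeqIO->new, next_seq, write_seq and id come from BioPerl.

```perl
use strict;
use warnings;

# Return the strain for a sequence ID, or undef if no known well is found.
# Assumption: the well is the second-to-last underscore-separated field,
# e.g. "A1" in "100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1".
sub strain_for_id {
    my ($id, $well2strain) = @_;
    my ($well) = $id =~ /_([A-Z]\d+)_[^_]+$/;
    return defined $well ? $well2strain->{$well} : undef;
}

# Copy a FASTA file, replacing the ID of every sequence whose well is known.
# Sequences without a matching well are written out unchanged.
sub relabel_fasta {
    my ($in_path, $out_path, $well2strain) = @_;
    require Bio::SeqIO;    # loaded here so strain_for_id works without BioPerl
    my $in  = Bio::SeqIO->new(-file => $in_path,     -format => 'fasta');
    my $out = Bio::SeqIO->new(-file => ">$out_path", -format => 'fasta');
    while (my $seq = $in->next_seq) {
        my $strain = strain_for_id($seq->id, $well2strain);
        $seq->id($strain) if defined $strain;
        $out->write_seq($seq);
    }
    return;
}
```

You would call it as relabel_fasta($fasta_in, $fasta_out, \%well2strain), with the two paths being whatever your input and output files are called. If you would rather keep the well in the new name, set the ID to something like "${well}_$strain" instead.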

Hope that helps to get you started.

Frank

 

odclerck wrote:
> Hi,
> Was wondering if someone had an easy script available that converts the
> headers of a fasta sequences to a value stored in a separate text file.
>
> Macrogen produces files with sequences that look more or less like this:
>   
>> 100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1	1012, 1000 bases, 0 checksum.
>>     
>
> I can filter out the position on the plate e.g. "A1" easily but would like
> to replace this with the name of the strain stored in a different text file,
> e.g. "A1_D1222".
>
> Realize this sounds pretty basic to most of you, but I'm pretty new at
> scripting.
> Olivier
>
>   


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


