[Bioperl-l] parse EMBL Feature Table only

Frank Schwach fs5 at sanger.ac.uk
Mon Dec 14 12:18:17 UTC 2009


Hi,

Maybe I'm really missing something here but I can't find how to parse a
file that is basically just the Feature Table from an EMBL file, looking
like this:

FT   CDS             join(37467..37521,38078..38195,38312..38400,38859..38936,39067..39154,39379..39675,39818..39842)
FT                   /colour=7
FT                   /product="RNA-binding protein, putative"
FT   CDS             213199..214812
FT                   /colour=7
FT                   /product="eukaryotic translation initiation factor 3
FT                   subunit 7, putative"
...[more of the same]

So the file has no header and no actual sequence and it is used simply
to annotate a chromosome in a genome assembly. I've always used GFF for
that purpose but have been given this file now.
Bio::SeqIO->new(-format=>"EMBL") complains about the missing header, and
if I stick in a fake ID line, it warns about the missing sequence and the
fact that the features don't fit on the sequence (of length 0).
Of course it's not difficult to write my own parser but I'm sure there
must be a BioPerl way of doing that that I have just overlooked. Thanks
for your help.
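
For reference, the hand-rolled fallback I have in mind would be roughly
the sketch below. It is plain Perl with no BioPerl calls, and
parse_feature_table is a hypothetical helper name, not an existing
BioPerl function. It assumes every line starts with "FT", that the
feature key sits in column 6 with its location on the same line, and
that no continuation line of a quoted value starts with "/". Location
strings are kept as raw text and are not interpreted.

```perl
use strict;
use warnings;

# parse_feature_table is a hypothetical helper, not part of BioPerl.
# It takes the text of a headerless feature table and returns a list of
# hashes: { key => ..., location => ..., qualifiers => { name => [values] } }.
sub parse_feature_table {
    my ($text) = @_;
    my @features;
    my $qual;    # qualifier currently being read; undef while reading a location

    for my $line (split /\n/, $text) {
        if ($line =~ /^FT {3}(\S+)\s+(\S+)\s*$/) {
            # New feature: key in column 6, location further to the right.
            push @features, { key => $1, location => $2, qualifiers => {} };
            undef $qual;
        }
        elsif (!@features) {
            next;    # skip anything before the first feature line
        }
        elsif ($line =~ m{^FT\s+/([\w-]+)(?:=(.*?))?\s*$}) {
            # New qualifier, e.g. /colour=7 or /product="..."
            $qual = $1;
            push @{ $features[-1]{qualifiers}{$qual} }, defined $2 ? $2 : '';
        }
        elsif ($line =~ /^FT\s+(\S.*?)\s*$/) {
            # Continuation line: extends the open qualifier value, or the
            # location if this feature has no qualifier yet.
            if (defined $qual) {
                $features[-1]{qualifiers}{$qual}[-1] .= " $1";
            }
            else {
                $features[-1]{location} .= $1;
            }
        }
    }

    # Strip the double quotes that surround free-text values.
    for my $f (@features) {
        for my $values (values %{ $f->{qualifiers} }) {
            s/^"(.*)"$/$1/ for @$values;
        }
    }
    return @features;
}

# Usage on a fragment of the file shown above.
my $table = <<'END';
FT   CDS             213199..214812
FT                   /colour=7
FT                   /product="eukaryotic translation initiation factor 3
FT                   subunit 7, putative"
END

for my $f (parse_feature_table($table)) {
    print join("\t", $f->{key}, $f->{location}, $f->{qualifiers}{product}[0]), "\n";
}
```

Qualifier values are stored as lists because the same qualifier may
appear more than once on a feature.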
