[Bioperl-l] Validate Fasta

john herbert john.herbert at clinical-pharmacology.oxford.ac.uk
Wed Mar 3 06:10:11 EST 2004


Interestingly, it also does not complain if you convert the
FASTA-formatted Perl script to EMBL format :-)

ID   #!/usr/bin/perlstandard; AA; UNK; 527 BP.
XX
AC   unknown;
XX
DE   
XX
FH   Key             Location/Qualifiers
FH
XX
SQ   Sequence 527 BP; 38 A; 19 C; 5 G; 29 T; 436 other;
     my$backups ={'mysql'= "/mick/mys ql/",'apac he'="/res/ upity/apac        60
     he",'mwats on'="/res/ upity/mwat son",'www' ="/www/doc s",'ensemb       120
     l'="/too/f ools/ensem bl",'cgi'= "/www/cgi- bin/"};my$ location="       180
     /mick/back ups";my$da te=`date`; my@date=sp lit(/\s+/, $date);my$       240
     date=join( "_",@date[ 0..2],$dat e[$#date]) ;print"$da te\n";#whi       300
     le(my($nam e,$dir)=ea ch%{$backu ps}){forea ch$name(qw (apachemys       360
     qlmwatsonw wwensemblc gi)){$dir= $backups-{ $name};pri nt"tarzipp       420
     ing$dir\n" ;system("/ bin/tar-c$ dir$locati on/$name.$ date.tar")       480
     ;system("/ bin/gzip$l ocation/$n ame.$date. tar");}                     527
//
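
For what it is worth, a strict check can be sketched in plain Perl
without going through Bio::SeqIO at all. This is only an illustration
of the kind of test being asked for, not how BioPerl validates
anything: the function name fasta_dna_errors is hypothetical, and the
allowed alphabet (A, C, G, T and N, either case) is an assumption that
would need widening if IUPAC ambiguity codes are acceptable.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns a list of problems
# found in a string of FASTA text. An empty list means the text passed.
sub fasta_dna_errors {
    my ($text) = @_;
    my @errors;
    my $seen_header = 0;    # true once any '>' line has been seen
    my $residues    = 0;    # sequence characters since the last header
    my $line_no     = 0;

    for my $line (split /\r?\n/, $text) {
        $line_no++;
        if ($line =~ /^>/) {
            # A header must carry an ID directly after the '>'.
            push @errors, "line $line_no: header has no ID"
                unless $line =~ /^>\S/;
            push @errors, "line $line_no: previous record has no sequence"
                if $seen_header && !$residues;
            $seen_header = 1;
            $residues    = 0;
        }
        elsif (!$seen_header) {
            push @errors, "line $line_no: data before the first '>' header";
        }
        elsif ($line =~ /^[ACGTNacgtn]+$/) {
            # Assumed alphabet: plain DNA plus N, nothing else.
            $residues += length $line;
        }
        else {
            push @errors,
                "line $line_no: empty line, whitespace or non-DNA character";
        }
    }
    push @errors, "no FASTA records found" unless $seen_header;
    push @errors, "last record has no sequence"
        if $seen_header && !$residues;
    return @errors;
}

# Usage: a clean record, then the start of the Perl script from this thread.
my @ok  = fasta_dna_errors(">seq1 test\nACGTACGT\nGGCC\n");
my @bad = fasta_dna_errors(">#!/usr/bin/perl\nmy\$backups={'mysql'=\n");
print @ok ? "first input: invalid\n" : "first input: valid\n";
print "second input: $_\n" for @bad;
```

Because the check works line by line on the raw text, it sees the
whitespace and stray characters directly in the input file.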



>>> "michael watson (IAH-C)" <michael.watson at bbsrc.ac.uk> 03/03/2004
10:52:58 >>>
Thanks for your help, but I am afraid not....

-----Original Message-----
From: john herbert
[mailto:john.herbert at clinical-pharmacology.oxford.ac.uk] 
Sent: 03 March 2004 10:45
To: michael.watson at bbsrc.ac.uk; bioperl-l at portal.open-bio.org 
Subject: Re: [Bioperl-l] Validate Fasta


Hello Michael.
I'm not a BioPerl programmer extraordinaire (so anyone correct me if I
am wrong) but I think the -format flag should help here. 

Try 

my $in = Bio::SeqIO->new(-file => "rubbish.fasta", -format => 'Fasta');
my $out = Bio::SeqIO->new(-file => ">rubbish2.fasta", -format => 'Fasta');

I am pretty sure if you put this change in your code and run it on your
very nice Perl fasta sequence, it will complain. 

Kind regards,

John.


>>> "michael watson (IAH-C)" <michael.watson at bbsrc.ac.uk> 03/03/2004
10:16:04 >>>
Hi

I have searched the archives and only come up with one answer, and it
didn't work. I want to validate a FASTA sequence (DNA).  What I mean is
that if I am given a perfect FASTA sequence, then that's OK, but if there
are ANY whitespace characters, or any other characters that really
shouldn't be there, I want it to throw an error.  The script below was
suggested by Jason in 2002:

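use strict;
use warnings;
use Bio::SeqIO;

my $in = Bio::SeqIO->new(-file => "rubbish.fasta");
my $out = Bio::SeqIO->new(-file => ">rubbish2.fasta");

# Copy every record to the output; eval traps any exception that the
# parser or the writer throws along the way.
eval {
	while( my $seq = $in->next_seq ) {
		$out->write_seq($seq);
	}
};
if( $@ ) {
	# A label inside the finished eval cannot be re-entered with goto,
	# so the error is simply reported here.
	print "There's an Error!\n";
}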

I actually fired this at one of my scripts, a Perl script that clearly
wasn't a FASTA sequence: it has #'s, \ts, \ns and all sorts of non-DNA
sequence characters.  Here is the result:

>#!/usr/bin/perl
my$backups={'mysql'="/mick/mysql/",'apache'="/res/upity/apac
he",'mwatson'="/res/upity/mwatson",'www'="/www/Docs",'ensemb
l'="/too/fools/ensembl",'cgi'="/www/cgi-bin/"};my$location="
/mick/backups";my$date=`date`;my@date=split(/\s+/,$date);my$
date=join("_",@date[0..2],$date[$#date]);print"$date\n";#whi
le(my($name,$dir)=each%{$backups}){foreach$name(qw(apachemys
qlmwatsonwwwensemblcgi)){$dir=$backups-{$name};print"tarzipp
ing$dir\n";system("/bin/tar-c$dir$location/$name.$date.tar")
;system("/bin/gzip$location/$name.$date.tar");}

This is undoubtedly a wonderfully FASTA-formatted Perl script, but...

Anyone?  Any ideas?

Thanks in advance for the help!

Mick
_______________________________________________
Bioperl-l mailing list
Bioperl-l at portal.open-bio.org 
http://portal.open-bio.org/mailman/listinfo/bioperl-l 