[Bioperl-l] BIO::DB::FASTA ID
Michael Kiwala
mkiwala at watson.wustl.edu
Thu Jun 21 21:23:46 UTC 2007
You only have 1527 unique IDs in the file. Bio::DB::Fasta keys its index on the ID, so records that share an ID collapse into a single entry.
~$ grep '^>' Desktop/T_orthologs_Dpse_genes.fa | cut -d' ' -f1 | sort -u | wc -l
1527
Change your make_my_id function so that every ID it returns is unique.
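
To see which IDs collide before changing make_my_id, a sketch along these lines may help. It extends the pipeline above to print each repeated ID with its count. The function name dup_ids is made up for illustration, and it assumes, like the command above, that the ID is the first space-delimited word of the header.

```shell
# dup_ids FILE
# Print "count id" for every header ID that occurs more than once in FILE.
# Assumes the ID is everything between '>' and the first space.
# (dup_ids is a hypothetical helper name, not part of BioPerl.)
dup_ids() {
    grep '^>' "$1" |        # keep only the header lines
        cut -d' ' -f1 |     # first space-delimited field, e.g. ">Dpse_GA13134"
        sed 's/^>//' |      # drop the leading '>'
        sort |
        uniq -c |           # prefix each distinct ID with its occurrence count
        awk '$1 > 1 { print $1, $2 }'   # keep IDs seen more than once
}

# Usage:
#   dup_ids Desktop/T_orthologs_Dpse_genes.fa
```

Any ID this lists is shared by more than one record; those are the headers where make_my_id needs to return something more specific.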
Staffa, Nick (NIH/NIEHS) wrote:
> This program below returns only 1527 IDs from a fasta file that I have
> constructed, which has
> mildred> grep -c "^>Dpse" T_orthologs_Dpse_genes.fa
> 1820
> .
> It actually does not return the first 3 ids,
> nor the 5th, nor 7..36, 38,39,41..44......
> The header lines are of variable length and the sequence lines are 80
> characters except at the ends when they might be shorter.
> Is there some caveat that I am ignoring in my format that breaks
> Bio::DB::Fasta?
>
>
> #!/usr/bin/perl
> #
> #
> #
> use strict;
> use Bio::DB::Fasta;
> use Bio::Tools::SeqWords;
> use Bio::Seq;
> use Bio::SeqIO;
> $|=1;
> #
> #
> my $Dpse_UTR_file_for_T_orthologs =
> "/home/staffa/clients/Kari/D_pse_genome/testit/T_orthologs_Dpse_genes.fa";
> my $db = Bio::DB::Fasta->new
> ('/home/staffa/clients/Kari/D_pse_genome/testit/T_orthologs_Dpse_genes.fa',
> -reindex => 1, -makeid => \&make_my_id);
> my @ids = $db->ids;
> my $number_in = @ids;
> print "number of Dpse IDs = $number_in\n";
> foreach my $id (@ids){
> print "$id\n";
> }
> sub make_my_id {
> # parse header line:
> # >Dpse_GA13134 CG14636 NO UTR has 2 TATTTAT 117 145, 0 TTATTTATT
> my $line = shift;
> # print "line = $line\n";
> $line =~ />(\w+) /;
> my $ID = $1;
> # print "ID = $ID\n";
> return $ID;
> }