[Bioperl-l] Very basic Perl/BioPerl Help
Sean Davis
sdavis2 at mail.nih.gov
Thu Apr 14 11:32:33 EDT 2005
On Apr 14, 2005, at 11:03 AM, Colin Erdman wrote:
> Hello all,
>
> I certainly pounded away at this one last night, I thought this part
> would be easy, but after spending so much time getting my Entrez gene
> data parsed etc my brain was a bit rubbery.
>
> What I am trying to do is take either A) Two fasta files with
> refseq/genbank data OR B) Two text files with 1 accession# per line and
> compare them, outputting only those fasta seqs or accession #'s that are
> not present in both.
>
> So is it easier to just use perl somehow to compare the two raw acc#
> text files?
>
Colin,
If you load each text file into its own array (one accession per
element), you can easily do what I think you are asking by following
the Perl Cookbook recipe here:
http://www.unix.org.ua/orelly/perl/cookbook/ch04_08.htm
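For what it's worth, here is a minimal sketch of that idea using hashes as lookup tables. The sub name only_in_one and the accession strings are made up for illustration; in practice the two arrays would be filled by reading each file line by line and chomp-ing each line.

```perl
use strict;
use warnings;

# Return the items that appear in exactly one of the two lists:
# first those only in list A, then those only in list B.
sub only_in_one {
    my ($list_a, $list_b) = @_;
    my (%in_a, %in_b);
    $in_a{$_} = 1 for @$list_a;    # hash keys also remove duplicates
    $in_b{$_} = 1 for @$list_b;
    my @only_a = grep { !$in_b{$_} } sort keys %in_a;
    my @only_b = grep { !$in_a{$_} } sort keys %in_b;
    return (\@only_a, \@only_b);
}

# Hypothetical accession lists standing in for the two text files.
my @ours   = qw(ACC0001.1 ACC0002.1 ACC0003.1);
my @theirs = qw(ACC0002.1 ACC0003.1 ACC0004.1);

my ($only_ours, $only_theirs) = only_in_one(\@ours, \@theirs);
print "Only in first list:  @$only_ours\n";
print "Only in second list: @$only_theirs\n";
```

The hash lookups keep this roughly linear in the number of accessions, which matters once the lists get long.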
> I just will need to match up those accession #'s NOT currently in our
> list with the appropriate Entrez Genes using gene2accession, but I am
> not sure how to do that either. I am assuming using a hash, but they
> have been steep for me in terms of learning curve, but I'd like to learn
> them now, I will just need some intuitive support.
Yep. Hash will do it. Read in your file grabbing the appropriate
columns and putting them in a hash like:
    my %acc2genehash;
    while (my $line = <INF>) {
        chomp $line;
        my @params = split(/\t/, $line);
        # key: protein accession.version (column 6); value: GeneID (column 2)
        $acc2genehash{$params[5]} = $params[1];
    }
Then you can do:
    print $acc2genehash{'AAD12597.1'};
which will give you 1246500, the gene id of that accession (from the
first line of gene2accession).
I haven't tested the above code, and you still need to do file loading,
etc., but I hope you get the point.
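To make the file-loading part concrete, here is a rough sketch under a couple of assumptions: that the input is tab-delimited with the GeneID in column 2 and the protein accession.version in column 6, and that '-' marks an empty field. The sub name load_acc2gene is hypothetical, and the sample record uses placeholders for every field except the accession and gene id mentioned above. An in-memory filehandle stands in for the real file so the example runs on its own.

```perl
use strict;
use warnings;

# Build an accession -> GeneID lookup from gene2accession-style input.
# Assumption: GeneID is column 2, protein accession.version is column 6.
sub load_acc2gene {
    my ($fh) = @_;
    my %acc2gene;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/;                  # skip a header line, if any
        my @fields = split /\t/, $line;
        next unless defined $fields[5] && $fields[5] ne '-';
        $acc2gene{ $fields[5] } = $fields[1];
    }
    return \%acc2gene;
}

# One illustrative record; fields other than GeneID and accession are placeholders.
my $sample = join("\t", '-', '1246500', '-', '-', '-', 'AAD12597.1', '-', '-') . "\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory sample: $!";
my $acc2gene = load_acc2gene($fh);
close $fh;

# Look up each accession, reporting the ones with no match.
for my $acc ('AAD12597.1', 'NOT_IN_TABLE.1') {
    if (exists $acc2gene->{$acc}) {
        print "$acc\t$acc2gene->{$acc}\n";
    } else {
        print "$acc\tno gene found\n";
    }
}
```

For the real file, replace the in-memory open with something like open(my $fh, '<', 'gene2accession') and feed in the accessions that were missing from your list.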
Sean