[Bioperl-l] Very basic Perl/BioPerl Help

Thu Apr 14 11:03:06 EDT 2005

Hello all,

I pounded away at this one last night. I thought this part would be easy,
but after spending so much time getting my Entrez Gene data parsed, my brain
was a bit rubbery.

What I am trying to do is take either A) two FASTA files with RefSeq/GenBank
data OR B) two text files with one accession number per line, compare them,
and output only those FASTA sequences or accession numbers that appear in
one file but not the other.

So is it easier to just use plain Perl to compare the two raw
accession-number text files?

Or should I keep them as FASTA sequences and compare them using Bio::Seq
objects?
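
For option B, a minimal plain-Perl sketch might look like the following. It
assumes one accession per line and that version suffixes such as ".2" should
be ignored when comparing; the sub names read_accessions and only_in_first
and the two file names passed on the command line are hypothetical.

```perl
use strict;
use warnings;

# Read one accession per line and return a hash reference whose keys are
# the accessions. Assumption: version suffixes (e.g. ".2") are dropped so
# that different versions of the same record compare as equal.
sub read_accessions {
    my ($path) = @_;
    my %seen;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim whitespace and line endings
        next if $line eq '';        # skip blank lines
        $line =~ s/\.\d+$//;        # strip a trailing version number
        $seen{$line} = 1;
    }
    close $fh;
    return \%seen;
}

# Return, in sorted order, the keys of the first hash that are
# missing from the second hash.
sub only_in_first {
    my ($first, $second) = @_;
    my @only = grep { !exists $second->{$_} } keys %$first;
    @only = sort @only;
    return @only;
}

# Run the comparison only when two file names are given.
if (@ARGV == 2) {
    my ($old_file, $new_file) = @ARGV;
    my $old = read_accessions($old_file);
    my $new = read_accessions($new_file);
    print "only in $old_file: $_\n" for only_in_first($old, $new);
    print "only in $new_file: $_\n" for only_in_first($new, $old);
}
```

The hash works here as a set: each accession is a key, so checking whether
the other file contains it is a single exists() lookup instead of a nested
loop over both lists.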

The idea is to update a list of chromosome 21 genes, last revised in 2003, by
comparing the accession numbers in our list with all of the accession
numbers I pulled from an Entrez Gene query (21[CHR] AND Homo sapiens[ORGN]
NOT pseudogene), whose output I saved as an ASN.1 file. I have all the
accession numbers.

I will then need to match the accession numbers NOT currently in our list
to the appropriate Entrez Gene records using gene2accession, but I am not
sure how to do that either. I assume a hash is the way to go. Hashes have
had a steep learning curve for me, but I'd like to learn them now; I will
just need some intuitive support.
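
The gene2accession lookup could be sketched with a hash of hashes, as below.
This assumes the file is tab-delimited with tax_id in column 1, GeneID in
column 2, and the RNA, protein, and genomic accessions in columns 4, 6, and
8, with "-" for missing values; that layout should be checked against the
"#Format" header line of the actual file. The sub name
build_accession_to_gene and the command-line arguments are hypothetical.

```perl
use strict;
use warnings;

# Build a lookup: accession (version stripped) => { GeneID => 1, ... }.
# A hash of hashes is used because one accession, especially a genomic
# one, can belong to more than one gene.
# ASSUMED column layout (0-based): 0 = tax_id, 1 = GeneID,
# 3 = RNA accession, 5 = protein accession, 7 = genomic accession.
sub build_accession_to_gene {
    my ($fh, $tax_id) = @_;
    my %genes_for;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;          # skip the header/comment line
        chomp $line;
        my @f = split /\t/, $line;
        next unless @f >= 8 && $f[0] eq $tax_id;
        for my $acc (@f[3, 5, 7]) {
            next if $acc eq '-';        # "-" marks a missing value
            $acc =~ s/\.\d+$//;         # strip the version suffix
            $genes_for{$acc}{ $f[1] } = 1;
        }
    }
    return \%genes_for;
}

# Usage: script.pl gene2accession missing_accessions.txt
if (@ARGV == 2) {
    my ($g2a_file, $missing_file) = @ARGV;
    open my $g2a, '<', $g2a_file or die "Cannot open $g2a_file: $!";
    my $genes_for = build_accession_to_gene($g2a, '9606');   # 9606 = human
    close $g2a;

    open my $in, '<', $missing_file or die "Cannot open $missing_file: $!";
    while (my $acc = <$in>) {
        $acc =~ s/^\s+|\s+$//g;
        next if $acc eq '';
        $acc =~ s/\.\d+$//;
        my @ids = sort { $a <=> $b } keys %{ $genes_for->{$acc} || {} };
        print join("\t", $acc, @ids ? join(',', @ids) : 'no GeneID found'), "\n";
    }
    close $in;
}
```

The intuition is the same as a two-column lookup table: the accession is the
key you look up, and the value is the set of GeneIDs stored under it.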

Thanks all!

Colin