[Bioperl-l] construct chromosome sequences from bac sequences

Tue Dec 30 15:02:34 UTC 2008

Hi,

I have an FPC report and BAC sequences in hand. I was wondering what the
most practical way is to build chromosomes from this information.

I HAVE:
FPC file:
accession    chr    chr_start    chr_end    contig    contig_start    contig_end
aaaaaaaaaa    1    14700    215600    ctg1    14700    215600
bbbbbbbbbb    1    196000    362600    ctg1    196000    362600
cccccccccc    1    352800    524300    ctg1    352800    524300
.
.

BAC fasta file:
>aaaaaaaaaa
GATCGATCAGCATCGACTACGACT...
>bbbbbbbbbb
AGTAGCAGTAGCTAGCACTACGAC...
>cccccccccc
ACGATCAGCATCAGCATCGACTAC...
.
.
.

I WANT:
>chr1
GACGACTAGCTACGACTAC...
>chr2
AGCTGATCACGATCACGAC...

In theory, a sequence object called "Chr1" could be created, and then the
subsequences of Chr1 could be filled in according to the start and end
locations of each BAC in the FPC file. However, there are two issues that
might prevent the use of standard sequence objects.
1) There will be gaps in chromosomes. Is there a function to convert
unassigned locations to N?
2) There are overlaps between BAC sequences. If the overlapping sequences
are exactly the same, it won't be a problem, but if there are discrepancies
between them, a decision has to be made as to which sequence to use in the
final Chr1 sequence.
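
The idea above can be illustrated with a minimal sketch in plain Python (not BioPerl). It rests on several assumptions that are not stated in the FPC report itself: chr_start and chr_end are taken to be 1-based and inclusive, each BAC is laid down starting at chr_start and truncated to its reported span, and in an overlap the BAC with the lower start coordinate wins while the disagreement is recorded for later review. The name build_chromosomes is hypothetical and does not correspond to any existing library function.

```python
from collections import defaultdict


def build_chromosomes(fpc_rows, bac_seqs):
    """Lay BAC sequences onto N-filled chromosomes.

    fpc_rows: iterable of (accession, chrom, chr_start, chr_end),
              coordinates ASSUMED 1-based and inclusive.
    bac_seqs: dict mapping accession -> sequence string.
    Returns (chromosomes, conflicts).
    """
    by_chrom = defaultdict(list)
    for acc, chrom, start, end in fpc_rows:
        by_chrom[chrom].append((start, end, acc))

    chromosomes = {}
    conflicts = []  # (chrom, 1-based position, kept base, rejected base, accession)
    for chrom, placements in by_chrom.items():
        # Start from all N so that unassigned positions (gaps) stay N.
        length = max(end for _, end, _ in placements)
        seq = ["N"] * length
        for start, end, acc in sorted(placements):
            bac = bac_seqs[acc]
            span = min(end - start + 1, len(bac))
            for offset in range(span):
                pos = start - 1 + offset
                base = bac[offset]
                if seq[pos] == "N":
                    seq[pos] = base  # first base placed here
                elif base != "N" and seq[pos] != base:
                    # Overlap discrepancy: keep the earlier BAC, log the rest.
                    conflicts.append((chrom, pos + 1, seq[pos], base, acc))
        chromosomes["chr" + str(chrom)] = "".join(seq)
    return chromosomes, conflicts


# Toy data (made up for illustration, not taken from a real FPC report).
toy_fpc = [("a", "1", 3, 6), ("b", "1", 5, 8), ("c", "1", 11, 12)]
toy_bacs = {"a": "ACGT", "b": "GCAA", "c": "CC"}
toy_chroms, toy_conflicts = build_chromosomes(toy_fpc, toy_bacs)
print(toy_chroms["chr1"])  # NNACGTAANNCC
print(toy_conflicts)       # [('1', 6, 'T', 'C', 'b')]
```

Keeping the conflicts in a separate list leaves the decision about which BAC to trust open, so a different rule (for example, preferring the higher-quality BAC) can be applied afterwards without rebuilding the whole chromosome.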

thanks,

Alper Yilmaz