[Bioperl-l] construct chromosome sequences from bac sequences
Alper Yilmaz
alperyilmaz at gmail.com
Tue Dec 30 15:02:34 UTC 2008
Hi,
I have an FPC report and BAC sequences in hand. I was wondering what the
most practical way is to build chromosomes from this information.
I HAVE:
FPC file:
accession   chr  chr_start  chr_end  contig  contig_start  contig_end
aaaaaaaaaa  1    14700      215600   ctg1    14700         215600
bbbbbbbbbb  1    196000     362600   ctg1    196000        362600
cccccccccc  1    352800     524300   ctg1    352800        524300
.
.
BAC fasta file:
>aaaaaaaaaa
GATCGATCAGCATCGACTACGACT...
>bbbbbbbbbb
AGTAGCAGTAGCTAGCACTACGAC...
>cccccccccc
ACGATCAGCATCAGCATCGACTAC...
.
.
.
I WANT:
>chr1
GACGACTAGCTACGACTAC...
>chr2
AGCTGATCACGATCACGAC...
In theory, a sequence object called "Chr1" can be created, and then, according
to the start and end locations of each BAC in the FPC file, subsequences of
Chr1 can be retrieved. However, there are two issues which might prevent using
standard sequence objects.
1) There will be gaps in the chromosomes. Is there a function to convert
unassigned locations to N?
2) There are overlaps between BAC sequences. If the overlapping sequences
are exactly the same, it won't be a problem, but if there are discrepancies
between them, a decision has to be made as to which sequence to use in the
final Chr1 sequence.
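To make the two points above concrete, here is a rough sketch of the logic I
have in mind. It is plain Python, not BioPerl API, and everything in it is an
assumption for illustration: the function name build_chromosomes, the toy
accessions and sequences, 1-based chr_start coordinates, and the rule that the
BAC placed first wins when two BACs disagree in an overlap. Parsing of the FPC
and fasta files is left out.

```python
from collections import defaultdict


def build_chromosomes(placements, bac_seqs, gap_char="N"):
    """Lay BAC sequences onto chromosomes, padding unassigned positions.

    placements: iterable of (accession, chrom, chr_start), chr_start 1-based
                (assumed convention; check what the FPC report really uses).
    bac_seqs:   dict mapping accession -> sequence string.
    Returns (chromosomes, conflicts), where chromosomes maps chrom -> string
    and conflicts lists every overlap position where two BACs disagree.
    """
    chroms = defaultdict(list)  # chrom -> list of single characters
    conflicts = []  # (chrom, 1-based pos, kept base, rejected base, accession)

    # Process BACs left to right so "first placed wins" is well defined.
    for acc, chrom, start in sorted(placements, key=lambda p: (p[1], p[2])):
        seq = bac_seqs[acc]
        buf = chroms[chrom]
        end = start - 1 + len(seq)  # 0-based, exclusive

        # Point 1: any position not yet covered becomes the gap character.
        if len(buf) < end:
            buf.extend(gap_char * (end - len(buf)))

        for offset, base in enumerate(seq):
            pos = start - 1 + offset
            if buf[pos] == gap_char:
                # Unassigned so far (or an N from an earlier BAC): fill it.
                buf[pos] = base
            elif buf[pos] != base:
                # Point 2: discrepancy in an overlap. Keep the earlier base
                # and record the disagreement for later inspection.
                conflicts.append((chrom, pos + 1, buf[pos], base, acc))

    return {name: "".join(buf) for name, buf in chroms.items()}, conflicts


# Toy data, made up for illustration only.
placements = [("bacA", "1", 3), ("bacB", "1", 6), ("bacC", "1", 12)]
bac_seqs = {"bacA": "ACGTA", "bacB": "TTCCG", "bacC": "GG"}

chroms, conflicts = build_chromosomes(placements, bac_seqs)
for name, seq in chroms.items():
    print(f">chr{name}")
    print(seq)
for conflict in conflicts:
    print("conflict:", conflict)
```

The "first placed wins" rule is only a placeholder; the conflicts list is
there so that a better decision (for example by sequence quality) could be
made afterwards. The sketch also assumes each BAC sequence starts exactly at
its chr_start, which may not hold if the FPC coordinates are only estimates.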
thanks,
Alper Yilmaz