We report a 12.5x sequence-based physical map of the cultivated sunflower genome based on a whole-genome profiling (WGP) technique developed by Keygene NV. Fourteen-genome equivalent BAC libraries were constructed and pooled two-dimensionally, and the restriction fragments from the pooled BAC clones were sequenced using the Illumina Genome Analyzer platform. WGP takes advantage of these short read sequences to generate 20-30 unique sequence tags for each BAC clone. BAC pools were tagged individually, and unique sequence tags were assigned to individual BACs based on their coordinates in the 2D pool screens. The BAC clones were ordered into contigs based on shared regions containing identical sequence tags, using a customized version of Finger Print Contigs (FPC), the software routinely used for restriction profile-based physical mapping. The resulting FPC map was integrated with high-density sequence-based genetic maps and genome assemblies; on this basis, chimeric BACs were removed and chimeric contigs were split to generate an improved physical map comprising 8,576 contigs. The integrated physical-genetic map was used to construct Linkage Group (LG)-specific physical maps. Approximately 3.3 Gb (~92.5% of the 3.6 Gb genome) is assigned to the 17 linkage groups through these LG-specific physical maps. Integration of the physical map with the genetic maps and the genomic scaffolds not only validates the physical map but also demonstrates its ability to act as a common platform towards a superior sunflower genome assembly.
Whole Genome FPC Physical Map
The whole-genome physical map was assembled at a cutoff of 1e-15, with chimeric BACs removed and chimeric contigs split. Here, chimeric BACs are those determined to be chimeric through alignment with the early genetic maps as well as the higher-density SNP genetic map. Chimeric contigs were split using a custom Perl program instead of FPC. The resulting FPC file is incompatible with the FPC software, but is compatible with the FPC BioPerl modules.
Physical Map to Genetic Map Associations:
- BLAST all tags to the genome assembly scaffolds used in a Sliding Window Genetic Map
- Assign tags to chromosomes. A tag can be assigned to multiple chromosomes.
- Assign each BAC to the chromosome with the most unique tag hits.
- Characterize the BAC as confidently placed, poorly placed, or unplaced:
  - Allow multicopy tags, but give greater precedence to single copy tags.
    - Single copy tag weight = 1
    - Multicopy tag weight = (single copy tag weight) * (total number of multicopy tags) / (total number of single copy tags), which is approximately 0.4
    - Maximum total weight of a BAC = (# single copy tags in BAC) * 1 + (# multicopy tags in BAC) * 0.4
  - BAC is unplaced if it has fewer than 3 tags hitting any genetic locus.
  - BAC is confidently placed if the weight of its unique tags hitting the assigned chromosome >= 95% of the maximum tag weight.
  - BAC is poorly placed if the weight of its unique tags hitting the assigned chromosome < 95% of the maximum tag weight.
- Assign physical map contigs to chromosomes using only confidently placed BACs. If confidently placed BACs hit multiple chromosomes, then the physical map contig is chimeric.
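The placement rules above can be illustrated with a minimal Python sketch. It is not the code used to build the map. The function names, the input layout (a mapping from tag to the set of chromosomes it hits), and the tie-breaking rule are hypothetical, and the sketch assumes that a tag hitting exactly one chromosome counts as single copy while a tag hitting several counts as multicopy.

```python
from collections import defaultdict

SINGLE_COPY_WEIGHT = 1.0
MIN_TAG_HITS = 3            # fewer mapped tags than this -> unplaced
CONFIDENT_FRACTION = 0.95   # share of the maximum weight needed for a confident call


def multicopy_weight(n_multicopy, n_single_copy):
    """Multicopy tag weight: single copy weight scaled by the global tag ratio."""
    return SINGLE_COPY_WEIGHT * n_multicopy / n_single_copy


def classify_bac(tag_hits, multi_weight=0.4):
    """Classify one BAC. tag_hits maps tag id -> set of chromosomes the tag hits.

    Returns (chromosome or None, status), with status one of
    "unplaced", "confident", or "poor".
    """
    # Keep only tags that hit at least one genetic locus.
    mapped = {tag: chroms for tag, chroms in tag_hits.items() if chroms}
    if len(mapped) < MIN_TAG_HITS:
        return None, "unplaced"

    hits = defaultdict(int)      # number of tags hitting each chromosome
    weight = defaultdict(float)  # summed tag weight per chromosome
    max_weight = 0.0             # maximum total weight of the BAC
    for chroms in mapped.values():
        # Assumption: one chromosome hit means single copy, several mean multicopy.
        w = SINGLE_COPY_WEIGHT if len(chroms) == 1 else multi_weight
        max_weight += w
        for chrom in chroms:
            hits[chrom] += 1
            weight[chrom] += w

    # Chromosome with the most tag hits; ties broken by weight, then by name
    # (a hypothetical rule, chosen only to make the sketch deterministic).
    best = max(hits, key=lambda c: (hits[c], weight[c], c))
    if weight[best] >= CONFIDENT_FRACTION * max_weight:
        return best, "confident"
    return best, "poor"


def classify_contig(bac_calls):
    """Assign a contig from its BAC calls, using confidently placed BACs only."""
    chroms = {chrom for chrom, status in bac_calls if status == "confident"}
    if not chroms:
        return None, "unassigned"
    if len(chroms) > 1:
        return None, "chimeric"
    return next(iter(chroms)), "assigned"
```

For example, a BAC with two tags hitting only LG1 and one tag hitting only LG2 is assigned to LG1 but is poorly placed, because only two thirds of its maximum weight supports that chromosome.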
In the second round of BAC-chromosome placements, the 185,190 initially unplaced BACs were selected. Multicopy tags that mapped these BACs to multiple chromosomes were removed, and a majority rule of 51% was applied, yielding an additional 78,774 BAC placements.
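The second-round rule can be sketched as follows. This is an illustration only. The function name and input layout are hypothetical, and the sketch assumes that the 51% threshold is measured against the tags that remain after tags hitting multiple chromosomes are dropped.

```python
from collections import Counter


def second_round_placement(tag_hits, majority=0.51):
    """Place a previously unplaced BAC by majority rule.

    tag_hits maps tag id -> set of chromosomes the tag hits.
    Returns the chromosome, or None if no chromosome reaches the majority.
    """
    # Drop unmapped tags and tags that hit more than one chromosome.
    single = [next(iter(chroms)) for chroms in tag_hits.values() if len(chroms) == 1]
    if not single:
        return None

    chrom, count = Counter(single).most_common(1)[0]
    # Assumption: the majority is taken over the remaining tags only.
    return chrom if count / len(single) >= majority else None
```

With three remaining tags, two on LG1 and one on LG2, the BAC is placed on LG1 (67% of tags). With one tag on each of two chromosomes, neither reaches 51% and the BAC stays unplaced.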
Overall, 66% of the BACs were placed on the 17 Linkage Groups. The BAC-to-LG placements were used to generate the LG-specific physical maps at a cutoff of 1e-15 and a tolerance level of 0. The DQer function of FPC was used to break up all contigs containing >10% Questionable (Q) clones. Approximately 3.3 Gb (~92.5% of the 3.6 Gb genome) is assigned to the 17 linkage groups through these LG-specific physical maps.