Raw sequence data generated for the SAM population were processed, cleaned, and aligned to the sunflower reference genome HA412.v.1.1 using bioinformatics pipelines and tools developed with the software company SAP-AG. Alignment files were further processed to mark potential PCR duplicates using the alignment coordinates and mapping edit distance information stored in a database. Variants, including SNPs and indels, were called jointly across all accessions in a single run using the haplotype-based algorithm implemented in the open-source software freebayes (Garrison and Marth 2012). Altogether, 172,794,342 SNPs were detected among 284 accessions. The dataset was then filtered to remove sites with more than 30% missing data and sites with a minor allele frequency lower than 5%, leaving a total of 1,398,933 SNPs.
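The two site-level filters can be illustrated with a minimal sketch. This is not the pipeline used for the analysis; it assumes biallelic sites and diploid genotype calls, and it assumes that minor allele frequency is computed over called (non-missing) genotypes only. The function names and the genotype encoding are hypothetical; only the 30% and 5% thresholds come from the text.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical encoding: a diploid genotype is a pair of allele indices
# (0 = reference, 1 = alternate); None marks a missing call.
Genotype = Optional[Tuple[int, int]]


def missing_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of accessions with no genotype call at this site."""
    return sum(g is None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among called genotypes (biallelic site)."""
    alleles = [a for g in genotypes if g is not None for a in g]
    if not alleles:
        return 0.0
    alt_freq = sum(1 for a in alleles if a == 1) / len(alleles)
    return min(alt_freq, 1.0 - alt_freq)


def passes_filters(
    genotypes: Sequence[Genotype],
    max_missing: float = 0.30,  # threshold stated in the text
    min_maf: float = 0.05,      # threshold stated in the text
) -> bool:
    """Keep a site only if it satisfies both the missingness and MAF criteria."""
    return (
        missing_rate(genotypes) <= max_missing
        and minor_allele_frequency(genotypes) >= min_maf
    )


# Three toy sites: one that passes, one with too much missing data,
# and one with a minor allele that is too rare.
site_a = [(0, 0)] * 6 + [(0, 1)] * 2 + [None] * 2
site_b = [(0, 0)] * 5 + [None] * 5
site_c = [(0, 0)] * 19 + [(0, 1)]
kept = [passes_filters(s) for s in (site_a, site_b, site_c)]
```

In practice such filters would be applied to the VCF with a dedicated tool rather than custom code; the sketch only makes the two criteria explicit.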
The VCF files for the analysis described above will be made available following the publication of this dataset.