Annotations

A major challenge of the sunflower genome project has been handling the genome's large size and repetitive content. This page describes our custom annotation procedures and their results. Use the menu on the left to skip to the results or download sections. References for each data source are listed at the bottom of the page.

Annotation files

Transposon and gene annotations are listed below as separate files. Combined annotation files that contain all features are also provided under the section "Combined annotations".

Genes

Release version | Filename | Description | Format | Download
v1.0 | Ha412v1r1_CDS_v1.0.fasta.gz | Nucleotide CDS for each gene | FASTA | Download
v1.0 | Ha412v1r1_CDS_v1.0.fasta.gz.md5 | MD5 checksum of nucleotide CDS for each gene | MD5 checksum | Download
v1.0 | Ha412v1r1_CDS_iprscan_v1.0.tsv.gz | Functional annotation table | TSV | Download
v1.0 | Ha412v1r1_CDS_iprscan_v1.0.tsv.gz.md5 | MD5 checksum of functional annotation table | MD5 checksum | Download
v1.0 | Ha412v1r1_prot_v1.0.faa.gz | Translated CDS (protein sequence) | FASTA | Download
v1.0 | Ha412v1r1_prot_v1.0.faa.gz.md5 | MD5 checksum of translated CDS | MD5 checksum | Download
v1.0 | Ha412v1r1_genes_v1.0.gff3.gz | Annotated gene features in GFF3 format | GFF3 | Download
v1.0 | Ha412v1r1_genes_v1.0.gff3.gz.md5 | MD5 checksum of annotated gene features in GFF3 format | MD5 checksum | Download
v1.0 | Ha412v1r1_genes_v1.0.gtf.gz | Annotated gene features in GTF format | GTF | Download
v1.0 | Ha412v1r1_genes_v1.0.gtf.gz.md5 | MD5 checksum of annotated gene features in GTF format | MD5 checksum | Download
v1.0 | Ha412v1r1_genes_v1.0.fasta.gz | Full-length nucleotide sequence for each gene | FASTA | Download
v1.0 | Ha412v1r1_genes_v1.0.fasta.gz.md5 | MD5 checksum of full-length nucleotide sequence for each gene | MD5 checksum | Download
v1.0 | Ha412v1r1_genes_v1.0_HanXRQr1.0-20151230_genes_id-mapping.tsv.gz | Mapping file from HA412 to XRQ gene IDs | TSV | Download
v1.0 | Ha412v1r1_genes_v1.0_HanXRQr1.0-20151230_genes_id-mapping.tsv.gz.md5 | MD5 checksum of mapping file from HA412 to XRQ gene IDs | MD5 checksum | Download
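
Each data file above is paired with an MD5 checksum file, so a download can be checked for corruption before use. The following is a minimal Python sketch of that check, not part of the project's own tooling. It assumes the .md5 file follows the common md5sum layout, with the hex digest as the first whitespace-separated token; the function names are illustrative.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_path):
    """Read the expected digest from a .md5 file.

    Assumption: the first whitespace-separated token is the hex digest,
    as written by the md5sum tool. Adjust if the files use another layout.
    """
    text = Path(md5_path).read_text().strip()
    return text.split()[0].lower()


def verify(data_path, md5_path=None):
    """Return True if the file's digest matches its checksum file.

    By default the checksum file is assumed to sit next to the data file
    with '.md5' appended, matching the naming used in the table above.
    """
    md5_path = md5_path or f"{data_path}.md5"
    return md5_of_file(data_path) == expected_md5(md5_path)
```

For example, with both files downloaded into the working directory, verify("Ha412v1r1_CDS_v1.0.fasta.gz") should return True for an intact download. The digest is computed on the compressed file as downloaded, not on its decompressed contents.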

Transposons

The transposon annotations below were generated with Tephra, a software package developed for this project. If you use these annotations in your work, please cite the software as described at the link provided.

Release version | Filename | Description | Format | Download
v1.0 | Ha412v1r1_transposons_v1.0.gff3.gz | Annotated transposons in GFF3 format | GFF3 | Download
v1.0 | Ha412v1r1_transposons_v1.0.gff3.gz.md5 | MD5 checksum of annotated transposons in GFF3 format | MD5 checksum | Download
v1.0 | Ha412v1r1_transposons_v1.0.fasta.gz | Full-length nucleotide sequence for each transposon | FASTA | Download
v1.0 | Ha412v1r1_transposons_v1.0.fasta.gz.md5 | MD5 checksum of full-length nucleotide sequence for each transposon | MD5 checksum | Download

Combined annotations

Release version | Filename | Description | Format | Download
v1.0 | Ha412v1r1_genes_transposons_v1.0.gff3.gz | Annotated gene and transposon features in GFF3 format | GFF3 | Download
v2.0 | Ha412HO_V2.onlychr.fasta.mod.EDTA.TEanno.gff.gz | Annotated transposons in GFF3 format (V2) | GFF3 | Download
v1.0 | Ha412v1r1_genes_transposons_v1.0.fasta.gz | Full-length gene and transposon sequences | FASTA | Download
v1.0 | Ha412v1r1_genes_transposons_v1.0.fasta.gz.md5 | MD5 checksum of full-length gene and transposon sequences | MD5 checksum | Download
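
Because the combined GFF3 files mix gene and transposon features, a quick tally of feature types is a useful first look at their contents. The sketch below reads a gzip-compressed GFF3 file with the Python standard library only. It assumes the standard nine tab-separated GFF3 columns and key=value attributes separated by semicolons; the feature type names actually used in these files are described in FILE_SPECIFICATIONS.txt and are not assumed here. The function names are illustrative.

```python
import gzip
from collections import Counter


def iter_gff3(path):
    """Yield one dict per feature line of a gzip-compressed GFF3 file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            # An optional embedded FASTA section ends the feature records.
            if line.startswith("##FASTA"):
                break
            # Skip blank lines, directives, and comments.
            if not line or line.startswith("#"):
                continue
            cols = line.split("\t")
            if len(cols) != 9:
                continue  # not a well-formed feature line
            # Column 9 holds key=value pairs separated by semicolons.
            attributes = dict(
                pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
            )
            yield {
                "seqid": cols[0],
                "source": cols[1],
                "type": cols[2],
                "start": int(cols[3]),
                "end": int(cols[4]),
                "strand": cols[6],
                "attributes": attributes,
            }


def count_feature_types(path):
    """Count features by the value of the GFF3 'type' column."""
    return Counter(feature["type"] for feature in iter_gff3(path))
```

Calling count_feature_types("Ha412v1r1_genes_transposons_v1.0.gff3.gz") returns a Counter keyed by feature type. Reading line by line keeps memory use low, which matters for annotation files of a large genome.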

Eugene Curated Annotations (HA412HO Version 2)

Release version | Filename | Description | Format | Download
v2.0 | HAN412_Eugene_curated_v1_1.gff3.gz | Curated gene predictions for assembly HAN412, provided by Eugene (INRA) | GFF3 | Download
v2.0 | PSC8_Eugene_curated_v1_1.gff3.gz | Curated gene predictions for assembly PSC8, provided by Eugene (INRA) | GFF3 | Download
v2.0 | XRQv2_Eugene_curated_v1_1.gff3.gz | Curated gene predictions for assembly XRQv2, provided by Eugene (INRA) | GFF3 | Download

File Specifications

Release version | Filename | Description | Format | Download
v1.0 | FILE_SPECIFICATIONS.txt | Detailed description of all annotation file contents and formats | Plain text | Download
v1.1 | Version_1_1_Description.txt | Detailed curation history of the Eugene annotations | Plain text | Download

References

  1. Badouin H, Gouzy J, Grassa CJ, Murat F, Staton SE, et al. 2017. The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution. Nature 546(7656): 148-152.