A cross-domain tropical species dataset with Chinese vernacular names and CITES source links
Title: A Cross-Domain Tropical Species Dataset Incorporating Chinese Vernacular Names and CITES Source Links
Abstract:
This paper presents a versioned cross-domain dataset comprising 410,499 active tropical species (snapshot date: 2026-04-20). These species are categorized across three applied subdomains—tropical_plants, tropical_aquatic, and tropical_pets—which share a commercial and regulatory lifecycle despite being distributed among kingdom-organized biodiversity infrastructures. The resource integrates taxonomic identifiers from major databases, including GBIF, Plants of the World Online, iNaturalist, NCBI Taxonomy, the Catalogue of Life, and the Encyclopedia of Life. Additionally, it introduces three novel layers: a cross-domain ontology that re-segments taxa based on trade and husbandry contexts; a Chinese vernacular layer featuring explicit per-name provenance under a typology that excludes unverified machine-generated proposals; and a CITES source-linkage layer that connects each taxon to its corresponding Species+ entry.
Chinese vernacular coverage, defined as the proportion of taxa with a CJK Chinese name distinct from the scientific binomial, is 99.50 percent (408,456 of 410,499 taxa, computed over the full population). This metric measures completeness, not translation accuracy. Accuracy is constrained by the four-level provenance typology and is the subject of a preliminary internal review, which is reported here. A blind external audit remains the primary open validation item.
The original-contribution layers reference upstream content solely via stable identifiers, which facilitates CC-BY 4.0 reuse. The dataset is archived on Zenodo (DOI: 10.5281/zenodo.20377811). This preprint serves as the canonical v1.0 description of the dataset’s current state. A future Data Descriptor submission is anticipated but is contingent on completing the validation and release-engineering tasks outlined in the Limitations section.
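The coverage metric defined in the abstract can be illustrated with a minimal sketch. The function names, the record layout, and the CJK character test (CJK Unified Ideographs plus Extension A) are assumptions made for illustration and are not taken from the dataset's own tooling. The sample records are likewise hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumption: "CJK" is approximated by the CJK Unified Ideographs block
# plus Extension A; the dataset's actual character test may differ.
_CJK = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def has_chinese_name(scientific_name: str, vernacular: Optional[str]) -> bool:
    """True if the vernacular contains CJK characters and is not the binomial."""
    if not vernacular:
        return False
    name = vernacular.strip()
    is_binomial = name.casefold() == scientific_name.strip().casefold()
    return bool(_CJK.search(name)) and not is_binomial


def coverage(records: Iterable[Tuple[str, Optional[str]]]) -> float:
    """Proportion of (scientific_name, vernacular) records with a Chinese name."""
    rows = list(records)
    if not rows:
        return 0.0
    covered = sum(has_chinese_name(sci, zh) for sci, zh in rows)
    return covered / len(rows)


# Hypothetical records for illustration only (not drawn from the dataset).
sample = [
    ("Monstera deliciosa", "龟背竹"),
    ("Betta splendens", "Betta splendens"),  # placeholder equal to the binomial
    ("Python regius", None),                 # no vernacular recorded
    ("Ficus lyrata", "琴叶榕"),
]
print(f"{coverage(sample):.2%}")  # 50.00%

# Arithmetic behind the headline figure quoted in the abstract.
print(f"{408_456 / 410_499:.2%}")  # 99.50%
```

As the abstract stresses, a check of this kind measures only whether a name is present, not whether it is a correct translation.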
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC





