arXiv

A cross-domain tropical species dataset with Chinese vernacular names and CITES source links

Title: A Cross-Domain Tropical Species Dataset Incorporating Chinese Vernacular Names and CITES Source Links

Abstract:

This paper presents a versioned cross-domain dataset comprising 410,499 active tropical species (snapshot date: 2026-04-20). These species are categorized across three applied subdomains—tropical_plants, tropical_aquatic, and tropical_pets—which share a commercial and regulatory lifecycle despite being distributed among kingdom-organized biodiversity infrastructures. The resource integrates taxonomic identifiers from major databases, including GBIF, Plants of the World Online, iNaturalist, NCBI Taxonomy, the Catalogue of Life, and the Encyclopedia of Life. Additionally, it introduces three novel layers: a cross-domain ontology that re-segments taxa based on trade and husbandry contexts; a Chinese vernacular layer featuring explicit per-name provenance under a typology that excludes unverified machine-generated proposals; and a CITES source-linkage layer that connects each taxon to its corresponding Species+ entry.

Chinese vernacular coverage, defined as the proportion of taxa with a CJK Chinese name distinct from the scientific binomial, stands at 99.50 percent (408,456 of 410,499 taxa, computed over the full population rather than a sample). This metric characterizes completeness, not translation accuracy. Accuracy is constrained by the four-level provenance typology and is currently under a preliminary internal review, which is reported here. A blind external audit remains the primary open item for validation.
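The coverage definition above can be illustrated with a minimal sketch. It is not the dataset's actual pipeline: the record shape (a scientific name paired with an optional Chinese name), the helper names, and the choice of Unicode ranges used to detect CJK ideographs are all assumptions made for illustration, and the four sample rows are illustrative rather than drawn from the dataset. The last lines only re-derive the reported percentage from the two counts given in the abstract.

```python
import re

# Assumption: "CJK" is approximated by the CJK Unified Ideographs block
# plus Extension A. The dataset may use a different test.
CJK_RE = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def has_chinese_vernacular(scientific_name, chinese_name):
    """Return True if chinese_name contains a CJK ideograph and is not
    simply a copy of the scientific binomial (hypothetical helper)."""
    if not chinese_name:
        return False
    name = chinese_name.strip()
    if name.casefold() == scientific_name.strip().casefold():
        return False
    return CJK_RE.search(name) is not None


def coverage(records):
    """Share of (scientific_name, chinese_name) records that count as covered."""
    records = list(records)
    if not records:
        return 0.0
    covered = sum(has_chinese_vernacular(s, c) for s, c in records)
    return covered / len(records)


# Illustrative rows only; not taken from the dataset.
sample = [
    ("Monstera deliciosa", "龟背竹"),
    ("Betta splendens", "Betta splendens"),  # binomial copied into the name field
    ("Pogona vitticeps", None),              # no Chinese name recorded
    ("Epipremnum aureum", "绿萝"),
]
sample_coverage = coverage(sample)  # 2 of 4 rows qualify

# Re-derive the reported figure from the counts stated in the abstract.
reported_percent = round(408_456 / 410_499 * 100, 2)
```

As the abstract stresses, a check of this kind measures only whether a qualifying name is present; it says nothing about whether that name is a correct translation.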

In the original-contribution layers, upstream content is referenced solely via stable identifiers, which allows those layers to be reused under CC-BY 4.0. The dataset is archived on Zenodo (DOI: 10.5281/zenodo.20377811). This preprint serves as the canonical v1.0 description of the dataset’s current state. A future Data Descriptor submission is anticipated, contingent on completing the validation and release-engineering tasks outlined in the Limitations section.


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
