Data

Where CVE envs come from, the ecosystems we cover, and licensing.

Sourcing

CVE envs are sourced from OSV, GHSA, and distro advisories. To guard against training-data contamination, only CVEs disclosed within the 30 days before batch-cut time are admitted (see methodology). Each env is built from the upstream fix commit plus a curator-authored reproducer and gold patch.
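
The 30-day admission rule can be sketched as a simple date filter. This is a minimal illustration, not the actual CVEGym pipeline: the record layout, the `is_admissible` helper, the placeholder IDs, and the choice to treat the 30-day boundary as inclusive are all assumptions made for the example.

```python
from datetime import date, timedelta

# The 30-day figure comes from the text above; everything else is hypothetical.
ADMISSION_WINDOW = timedelta(days=30)


def is_admissible(disclosed: date, batch_cut: date) -> bool:
    """Return True if the CVE was disclosed within the window before batch cut.

    Assumption: the boundary is inclusive, and CVEs disclosed after the
    batch cut are rejected.
    """
    age = batch_cut - disclosed
    return timedelta(0) <= age <= ADMISSION_WINDOW


# Placeholder records; the IDs are not real CVE identifiers.
candidates = [
    {"id": "EXAMPLE-0001", "disclosed": date(2025, 1, 20)},  # 12 days old
    {"id": "EXAMPLE-0002", "disclosed": date(2024, 12, 15)},  # 48 days old
    {"id": "EXAMPLE-0003", "disclosed": date(2025, 1, 2)},  # exactly 30 days old
    {"id": "EXAMPLE-0004", "disclosed": date(2025, 2, 3)},  # after the batch cut
]

batch_cut = date(2025, 2, 1)
admitted = [c["id"] for c in candidates if is_admissible(c["disclosed"], batch_cut)]
print(admitted)
```

With a batch cut of 2025-02-01, only the first and third records pass. A stricter pipeline might prefer an exclusive boundary, or key the window on the advisory's publication timestamp rather than a calendar date.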

Ecosystem scope

Languages currently in scope: Python, C, C++, Rust, Go. Vulnerability classes vary across the batch and include memory safety, path traversal, SSRF, race conditions, symlink-following, and unbounded recursion. The full list of admitted envs is at tasks.

Licensing

License pending public release. The CVEGym eval engine and dataset are not yet open source. While the repo stays private, neither the eval set nor the task fixtures carry a published license, so formal citation of CVEGym in model cards is not supported until the repo opens. Until then, all run access is by direct request: email the team.

When the repo opens, this section will reflect the chosen license (likely Apache 2.0, pending legal review of the per-task source upstreams).