57b3ea [private]
Details redacted
medium native c4 held-out
This task's identity is intentionally redacted. The score below reflects DeepSource's run against it; CVE details, the repo, and the disclosure date are not disclosed. See methodology.
- kind: native
- complexity: 4/10
- recency: current
Runs
20 fcv cases on this task

| model | trajectory | | | |
|---|---|---|---|---|
gpt-5.3-codex | pass | 52.3 | 240s | — |
gpt-5.3-codex | pass | 52.3 | 240s | — |
gpt-5.3-codex | pass | 52.3 | 240s | — |
gpt-5.5 | pass | 75.9 | 364s | — |
gpt-5.5 | pass | 75.9 | 364s | — |
gpt-5.5 | pass | 75.9 | 364s | — |
claude-opus-4-8 | pass | 107.9 | 1169s | — |
claude-opus-4-8 | pass | 107.9 | 1169s | — |
claude-opus-4-8 | pass | 107.9 | 1169s | — |
glm-5.1 | pass | 103.3 | 1605s | — |
claude-haiku-4-5 | fail | 90.5 | 530s | — |
claude-haiku-4-5 | fail | 90.5 | 530s | — |
claude-haiku-4-5 | fail | 90.5 | 530s | — |
glm-5.2 | fail | 75.4 | 823s | — |
glm-5.2 | fail | 75.4 | 823s | — |
glm-5.2 | fail | 75.4 | 823s | — |
minimax-m2.7 | fail | 86.9 | 846s | — |
minimax-m2.7 | fail | 86.9 | 846s | — |
minimax-m2.7 | fail | 86.9 | 846s | — |
deepseek-v4-pro | fail | 78.7 | 864s | — |
deepseek-v4-pro | fail | 78.7 | 864s | — |
deepseek-v4-pro | fail | 78.7 | 864s | — |
kimi-k2.7-code | fail | 103.7 | 1103s | — |
kimi-k2.7-code | fail | 103.7 | 1103s | — |
kimi-k2.7-code | fail | 103.7 | 1103s | — |
glm-5.1 | fail | 103.3 | 1605s | — |
glm-5.1 | fail | 103.3 | 1605s | — |
kimi-k2.6 | fail | 121.6 | 2519s | — |
kimi-k2.6 | fail | 121.6 | 2519s | — |
kimi-k2.6 | fail | 121.6 | 2519s | — |