Merged
4 changes: 4 additions & 0 deletions CLAUDE.md
@@ -22,6 +22,10 @@ scoring regardless of reason; the id is informational.
distinguish a real implementation from a stub.
- `outcome_dependent_presence` — test appears in some eval runs but not others.
- `slow_or_hang` — test hangs mid-call or exceeds a duration threshold.
- `dependency_ignored` — a kept test marked `@pytest.mark.dependency(depends=[X])` whose
prerequisite `X` is an *ignored* test. Because `X` is deselected at collection,
`pytest-dependency` skips the dependent, so it would count as unresolved through no fault
of the submission. The reason `note` records the prerequisite (`depends on ignored test <X>`).
- `ignored_manual` — manually excluded.
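
The `dependency_ignored` rule above can be sketched as a small helper. This is a minimal illustration, not the project's actual tooling: the function name `find_dependency_ignored` and the `depends` input shape (test name mapped to its declared prerequisites) are hypothetical, and propagating the rule along dependency chains is an assumption that the description above does not confirm.

```python
from typing import Dict, List, Set


def find_dependency_ignored(
    depends: Dict[str, List[str]], ignored: Set[str]
) -> List[dict]:
    """Return ignore entries for kept tests whose prerequisite is ignored.

    `depends` (hypothetical input shape) maps a test name to the names in
    its `@pytest.mark.dependency(depends=[...])` marker.
    """
    ignored = set(ignored)  # copy so the caller's set is not mutated
    entries: List[dict] = []
    changed = True
    while changed:  # assumption: repeat so chains A -> B -> C are caught too
        changed = False
        for test in sorted(depends):
            if test in ignored:
                continue
            blockers = [p for p in depends[test] if p in ignored]
            if blockers:
                entries.append(
                    {
                        "name": test,
                        "reasons": [
                            {
                                "id": "dependency_ignored",
                                "note": f"depends on ignored test {blockers[0]}",
                            }
                        ],
                    }
                )
                ignored.add(test)  # the dependent is now ignored as well
                changed = True
    return entries


if __name__ == "__main__":
    deps = {
        "test_search_cancel_with_esc": ["test_search_opens_with_slash"],
        "test_search_no_results": ["test_search_opens_with_slash"],
    }
    for entry in find_dependency_ignored(deps, {"test_search_opens_with_slash"}):
        print(entry["name"], "->", entry["reasons"][0]["note"])
```

The sketch omits the `timestamp` and `user` fields that real entries carry, and it uses short test names where `tests.json` stores fully qualified ones.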

## Quick reference
@@ -729,8 +729,8 @@
"name": "eval.tests.test_args_parsing.test_u_out_of_range_or_overflow_values_still_enter_render_loop[args2]",
"reasons": [
{
- "id": "ignored_manual",
- "description": "This test passes -u 999999999999999999999999, which overflows C's 32-bit int via atoi(), wrapping to a small/negative value so the original binary enters the render loop. This is undefined behavior in C, not intentional design. A correct reimplementation in a language without integer overflow (e.g. Python) would reasonably reject or hang on this value. The test enshrines accidental C overflow semantics rather than meaningful program behavior."
+ "description": "This test passes -u 999999999999999999999999, which overflows C's 32-bit int via atoi(), wrapping to a small/negative value so the original binary enters the render loop. This is undefined behavior in C, not intentional design. A correct reimplementation in a language without integer overflow (e.g. Python) would reasonably reject or hang on this value. The test enshrines accidental C overflow semantics rather than meaningful program behavior.",
+ "id": "ignored_manual"
}
]
}
@@ -3202,6 +3202,17 @@
}
]
},
{
"name": "eval.tests.test_tui_cmatrix.test_screen_changes_over_time",
"reasons": [
{
"id": "dependency_ignored",
"note": "depends on ignored test test_launch_draws_nonempty_screen",
"timestamp": 1781756185,
"user": "kilian"
}
]
},
{
"name": "eval.tests.test_tui_cmatrix.test_unbound_key_does_not_crash_in_normal_mode",
"reasons": [
33 changes: 33 additions & 0 deletions src/programbench/data/tasks/antonmedv__fx.86d0d34/tests.json
@@ -13178,6 +13178,17 @@
"eval.tests.test_tui_search.test_search_prev_with_N"
],
"ignored_tests": [
{
"name": "eval.tests.test_tui_actions.test_command_line_cancel",
"reasons": [
{
"id": "dependency_ignored",
"note": "depends on ignored test test_command_line_with_colon",
"timestamp": 1781756196,
"user": "kilian"
}
]
},
{
"name": "eval.tests.test_tui_actions.test_command_line_with_colon",
"reasons": [
@@ -13242,6 +13253,17 @@
}
]
},
{
"name": "eval.tests.test_tui_search.test_search_cancel_with_esc",
"reasons": [
{
"id": "dependency_ignored",
"note": "depends on ignored test test_search_opens_with_slash",
"timestamp": 1781756196,
"user": "kilian"
}
]
},
{
"name": "eval.tests.test_tui_search.test_search_case_sensitive",
"reasons": [
@@ -13274,6 +13296,17 @@
}
]
},
{
"name": "eval.tests.test_tui_search.test_search_no_results",
"reasons": [
{
"id": "dependency_ignored",
"note": "depends on ignored test test_search_opens_with_slash",
"timestamp": 1781756196,
"user": "kilian"
}
]
},
{
"name": "eval.tests.test_tui_search.test_search_opens_with_slash",
"reasons": [
22 changes: 22 additions & 0 deletions src/programbench/data/tasks/antonmedv__walk.bf802ef/tests.json
@@ -3821,6 +3821,28 @@
}
]
},
{
"name": "eval.tests.test_tui.test_tui_shows_directories",
"reasons": [
{
"id": "dependency_ignored",
"note": "depends on ignored test test_tui_launches",
"timestamp": 1781756196,
"user": "kilian"
}
]
},
{
"name": "eval.tests.test_tui.test_tui_shows_files",
"reasons": [
{
"id": "dependency_ignored",
"note": "depends on ignored test test_tui_launches",
"timestamp": 1781756196,
"user": "kilian"
}
]
},
{
"name": "eval.tests.test_tui.test_yank_directory",
"reasons": [
33 changes: 33 additions & 0 deletions src/programbench/data/tasks/nuta__nsh.bdd0702/tests.json
@@ -5337,6 +5337,17 @@
}
]
},
{
"name": "eval.tests.test_history_mode.test_history_mode_ctrl_c_exits",
"reasons": [
{
"id": "dependency_ignored",
"note": "depends on ignored test test_history_mode_activates",
"timestamp": 1781756408,
"user": "kilian"
}
]
},
{
"name": "eval.tests.test_history_mode.test_history_mode_empty_search",
"reasons": [
@@ -5407,6 +5418,28 @@
}
]
},
{
"name": "eval.tests.test_history_mode.test_history_mode_shows_commands",
"reasons": [
{
"id": "dependency_ignored",
"note": "depends on ignored test test_history_mode_activates",
"timestamp": 1781756408,
"user": "kilian"
}
]
},
{
"name": "eval.tests.test_history_mode.test_history_mode_shows_help_text",
"reasons": [
{
"id": "dependency_ignored",
"note": "depends on ignored test test_history_mode_activates",
"timestamp": 1781756408,
"user": "kilian"
}
]
},
{
"name": "eval.tests.test_history_mode.test_history_mode_tab_edits",
"reasons": [