ci(test): enable pytest-xdist for cuopt Python tests#1486
ci(test): enable pytest-xdist for cuopt Python tests#1486ramakrishnap-nv wants to merge 6 commits into
Conversation
Add pytest-xdist to test_python_common dependencies and pass -n auto to the non-nightly cuopt pytest invocation to parallelize tests across available CPUs. Server tests are excluded due to fixed-port binding (localhost:18900). The run_cuopt_pytests.sh script already cd-s into python/cuopt/cuopt where xdist+coverage is known to work. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test eb9a30f |
CI Test Summary✅ All 31 test job(s) passed. |
-n auto caused two failures: - cudaErrorInvalidDevice: too many workers competing for the same GPU - warmstart timeout: two workers starting gRPC servers on the same port Switch to -n 2 to limit GPU contention. Fix the gRPC port conflict in _start_grpc_server_fixture by adding the xdist worker ID (from PYTEST_XDIST_WORKER) to the port, giving each worker a unique port within the 100-unit gap between fixture classes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
|
/ok to test 10d42be |
-n 2 passed cleanly; try -n 4 for more parallelism. The CI GPUs (L4/24GB, H100/80GB, RTX Pro 6000/48GB) have enough VRAM for 4 concurrent CUDA contexts without contention. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
-n 4 is untested across all GPU types; stay conservative at -n 2 which is known clean. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
📝 WalkthroughWalkthroughThis PR adds Changespytest-xdist Parallel Test Execution
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Add -n 2 to run_cuopt_server_pytests.sh. Fix the port conflict by introducing _worker_port() in utils.py, which offsets the base port (5555) by the xdist worker ID (from PYTEST_XDIST_WORKER). Both the cuoptproc fixture and RequestClient default port now use _worker_port() so each worker gets its own server instance on a unique port. Falls back to 5555 when not running under xdist. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
python/cuopt_server/cuopt_server/tests/utils/utils.py (2)
20-23: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winDuplicate worker-id parsing logic across test files.
The
PYTEST_XDIST_WORKER→ worker-id parsing here is essentially duplicated frompython/cuopt/cuopt/tests/linear_programming/test_cpu_only_execution.py(_start_grpc_server_fixture). Consider extracting a shared helper (e.g., in a common test-utils module orxdist's ownworker_idfixture/get_xdist_worker_id()) instead of reimplementing thegwNparsing in two places.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuopt_server/cuopt_server/tests/utils/utils.py` around lines 20 - 23, The worker-id parsing logic in _worker_port duplicates the same PYTEST_XDIST_WORKER handling already present in _start_grpc_server_fixture, so refactor both call sites to use a shared helper instead of re-parsing gwN in multiple tests. Move the worker id extraction into a common test utility module or switch both places to xdist’s built-in worker-id support (for example a helper like get_xdist_worker_id() or the worker_id fixture), and update _worker_port to consume that shared helper.
368-368: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueDefault port computed at import time, not call time.
port=_worker_port()is evaluated once when the class is defined (module import), unlikecuoptproc's use of_worker_port()at fixture-execution time. This happens to work today becausePYTEST_XDIST_WORKERis set before worker-side test collection, but it's a fragile pattern relative to the runtime evaluation used elsewhere in this same file.♻️ Suggested fix for lazy evaluation
- def __init__(self, port=_worker_port()): + def __init__(self, port=None): + if port is None: + port = _worker_port()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuopt_server/cuopt_server/tests/utils/utils.py` at line 368, The default port for the class initializer is being computed too early because __init__ uses port=_worker_port() at definition/import time instead of when an instance is created. Update the initializer so _worker_port() is called lazily inside the constructor path, matching the runtime evaluation pattern used by cuoptproc in this file, and keep the default behavior unchanged for callers that do not pass an explicit port.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@python/cuopt_server/cuopt_server/tests/utils/utils.py`:
- Around line 20-23: The worker-id parsing logic in _worker_port duplicates the
same PYTEST_XDIST_WORKER handling already present in _start_grpc_server_fixture,
so refactor both call sites to use a shared helper instead of re-parsing gwN in
multiple tests. Move the worker id extraction into a common test utility module
or switch both places to xdist’s built-in worker-id support (for example a
helper like get_xdist_worker_id() or the worker_id fixture), and update
_worker_port to consume that shared helper.
- Line 368: The default port for the class initializer is being computed too
early because __init__ uses port=_worker_port() at definition/import time
instead of when an instance is created. Update the initializer so _worker_port()
is called lazily inside the constructor path, matching the runtime evaluation
pattern used by cuoptproc in this file, and keep the default behavior unchanged
for callers that do not pass an explicit port.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 4d52f55b-597f-4924-be2f-21a9c38815d7
📒 Files selected for processing (2)
ci/run_cuopt_server_pytests.shpython/cuopt_server/cuopt_server/tests/utils/utils.py
Summary
pytest-xdisttotest_python_commondependencies (propagated to conda envs and pyproject.toml files via dependency file generator)-n autoto the non-nightly cuopt pytest invocation inci/run_cuopt_pytests.shrun_cuopt_server_pytests.sh) are unchanged — they bind a fixed port (localhost:18900) that would conflict across parallel workersThe script already
cds intopython/cuopt/cuoptwhere xdist + coverage is known to work (noted in the existing comment on line 7).Test plan
conda-python-testsjob shows parallel test workers in the logconda-python-testswall time is reduced compared to baseline🤖 Generated with Claude Code