test: clean up gRPC server process groups#1492
Conversation
1bbc8d7 to
df1c0de
Compare
|
/ok to test df1c0de |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughRefactors ChangesServerProcess process-group lifecycle refactor
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/tests/linear_programming/grpc/grpc_integration_test.cpp`:
- Around line 162-181: The stop() logic in grpc_integration_test.cpp only sends
SIGKILL inside the branch where reap_parent() fails, so a lingering GPU worker
can survive when the server parent exits promptly. Update stop() to escalate to
kill(-pgid_, SIGKILL) whenever wait_for_group_exit() shows the process group has
not exited, not just when reap_parent() returns false, and keep the cleanup
around clear_lifecycle_state() only after the full group is confirmed gone.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f3c4bb1d-39d0-4334-81f9-b3f48630ebc2
📒 Files selected for processing (1)
cpp/tests/linear_programming/grpc/grpc_integration_test.cpp
df1c0de to
fd1e773
Compare
|
/ok to test fd1e773 |
CI Test Summary✅ All 31 test job(s) passed. |
Signed-off-by: Miles Lubin <mlubin@nvidia.com>
fd1e773 to
d4ec23e
Compare
|
/ok to test d4ec23e |
|
/merge |
Makes the gRPC integration test's ServerProcess helper clean up reliably. Each spawned cuopt_grpc_server is now put in its own process group (setpgid), and teardown signals the whole group, so the GPU worker child can no longer be orphaned when a test ends or a server is restarted.
stop() now returns success/failure and is idempotent, the process group is verified before it's ever used as a kill target, and teardown paths assert the group is gone. Adds ServerProcessLifecycleTests covering idle stop, idempotent stop, and crash-during-solve cleanup.
Test-only change; no library or API changes.
The change is motivated by CI failures like https://github.com/NVIDIA/cuopt/actions/runs/28380812204/job/84090251116?pr=1481. The Codex-diagnosed issue here is:
ServerProcess::stop()signals only the server parent.spawn_worker().