Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

TL/UCP: add support for onesided dynamic segments #1149

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wfaderhold21 wants to merge 5 commits into openucx:master
base: master
Choose a base branch
Loading
from wfaderhold21:topic/mem_ext

Conversation

@wfaderhold21
Copy link
Collaborator

@wfaderhold21 wfaderhold21 commented Jul 3, 2025

What

This PR adds support in TL/UCP for implicit memory handle creation for onesided algorithms by the introduction of three internal interfaces: ucc_tl_ucp_dynamic_segment_{init | exchange | finalize}. The interfaces initialize a dynamic or implicit segment by registering the source and destination buffers at initialization time to create memory handles, exchanging memory handles during collective start, and finalizing (unmapping) memory on collective completion. These interfaces can be leveraged or ignored by existing or future algorithms. In the case that the user allocates and maps memory either through (1) context creation or (2) memory handles, then the interfaces do not perform any action. This is an update to PR #909.

Why ?

Even with the inclusion of PR #1070, the mapping of memory for message passing programming models such as MPI would require modification to current implementations. This is useful in situations where the messages exchanged in the collective are large.

greptile-apps[bot] reacted with thumbs up emoji
Copy link

Can one of the admins verify this patch?

Copy link
Contributor

manjugv commented Aug 15, 2025

@wfaderhold21 Let's target this for v1.6 release. @janjust @Sergei-Lebedev can we review this in upcoming weeks?

Copy link
Collaborator Author

@janjust @Sergei-Lebedev @nsarka @ikryukov I believe I've addressed the feedback thus far. Please let me know if you have any further comments/concerns.

Copy link
Collaborator Author

@janjust @nsarka @ikryukov @Sergei-Lebedev any comments on this?

@wfaderhold21 wfaderhold21 force-pushed the topic/mem_ext branch 3 times, most recently from 715356b to 5c5253e Compare October 11, 2025 14:12
Copy link
Collaborator Author

@janjust @Sergei-Lebedev @nsarka @ikryukov any feedback would be appreciated.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR introduces dynamic memory segment support for TL/UCP onesided collectives, enabling implicit memory handle creation at collective initialization rather than requiring pre-mapped memory regions. Three new internal interfaces (ucc_tl_ucp_dynamic_segment_{init|exchange|finalize}) handle on-demand registration of source/destination buffers, exchange of memory handles via service collectives (allgather), and cleanup on completion. The implementation removes static memory segment arrays (va_base, base_length) from team structures and adds dynamic segment metadata to task structures, allowing algorithms to defer registration until collective invocation. When memory is already mapped through context creation or explicit ucc_mem_map calls, the dynamic segment logic becomes a no-op, preserving backward compatibility. The changes primarily affect alltoall/alltoallv onesided algorithms, with corresponding test infrastructure updates to validate both pre-mapped and dynamic segment code paths.

Important Files Changed

Filename Score Overview
src/components/tl/ucp/tl_ucp_coll.c 2/5 Adds core dynamic segment implementation with memory handle creation/exchange/cleanup; contains critical memory leaks and use-after-free bugs in error paths
src/components/tl/ucp/alltoall/alltoall_onesided.c 2/5 Integrates dynamic segments into alltoall algorithm; missing memory handle swap for GET operations and unclear finalization status handling
src/components/tl/ucp/alltoallv/alltoallv_onesided.c 2/5 Adds global source memory handle support and enforces global destination handles; accesses flags field without mask validation
src/components/tl/ucp/tl_ucp_context.c 2/5 Fixes memory cleanup in unmap functions; introduces use-after-free bug at lines 827-830 where freed data is accessed
src/components/tl/ucp/tl_ucp_sendrecv.h 4/5 Refactors one-sided memory resolution with new segment resolution helper; thread-safety of concurrent rkey unpacking needs verification
src/components/tl/ucp/tl_ucp.h 4/5 Removes static segment arrays from team structure and adds public memory mapping API; architectural change affects all onesided algorithms
src/components/tl/ucp/tl_ucp_task.h 4/5 Adds task flag and embedded dynamic segment structure for per-collective memory management; well-designed integration with existing infrastructure
src/components/tl/ucp/tl_ucp_coll.h 3/5 Declares new dynamic segment interfaces with inline test helper; hard-coded threshold and alltoall-specific parameter may need refinement
src/components/tl/ucp/tl_ucp_team.c 3/5 Removes static memory info initialization from team creation; onesided algorithms must now use dynamic segments or context-level handles
test/mpi/test_alltoallv.cc 4/5 Restructures buffer allocation to use pre-mapped buffers for onesided operations and dynamic allocation otherwise; correctly implements dynamic segment support
test/gtest/coll/test_alltoall.cc 2/5 Adds dynamic segment test case but overrides flags after initialization, potentially creating inconsistent state
test/mpi/test_mpi.h 5/5 Adds memory handle tracking fields and dynamic segment flag to test infrastructure; clean and well-integrated additions
test/mpi/test_case.cc 5/5 Adds "DynSeg" label to test output for dynamic segment visibility; correct implementation following existing pattern
test/gtest/common/test_ucc.cc 5/5 Adds zero-initialization for memory map array to prevent undefined behavior; defensive programming improvement
test/mpi/test_mpi.cc 5/5 Test infrastructure changes supporting dynamic segment feature (review not detailed in file summaries)
src/components/tl/ucp/tl_ucp.c 5/5 Updates EXPORTED_MEMORY_HANDLE config from string to numeric boolean; cosmetic consistency improvement

Confidence score: 2/5

  • This PR contains critical memory management bugs including use-after-free errors and memory leaks that will cause crashes or undefined behavior in production code
  • Score reflects multiple severe issues: use-after-free in tl_ucp_context.c lines 827-830, memory leaks in tl_ucp_coll.c error paths (lines 228-229), use-after-free during cleanup loops (lines 594, 620-627), and missing memory handle swap for GET operations in alltoall_onesided.c. Additional concerns include unchecked flag field access in alltoallv_onesided.c line 35, hard-coded magic numbers in exchange loops, and unclear state handling during synchronous dynamic segment completion
  • Pay close attention to src/components/tl/ucp/tl_ucp_coll.c (memory management in error paths), src/components/tl/ucp/tl_ucp_context.c (use-after-free at lines 827-830), src/components/tl/ucp/alltoall/alltoall_onesided.c (GET operation memory handle logic), and src/components/tl/ucp/alltoallv/alltoallv_onesided.c (flags field validation)

Sequence Diagram

sequenceDiagram
 participant User
 participant UCC_Core
 participant TL_UCP_Coll
 participant Dynamic_Segment
 participant Service_Coll
 participant UCP_Worker
 User->>UCC_Core: ucc_collective_init(alltoall_onesided)
 UCC_Core->>TL_UCP_Coll: ucc_tl_ucp_alltoall_onesided_init()
 
 alt Memory handles NOT provided
 TL_UCP_Coll->>Dynamic_Segment: ucc_tl_ucp_coll_dynamic_segment_init()
 Dynamic_Segment->>Dynamic_Segment: dynamic_segment_map_memh(src)
 Dynamic_Segment->>Dynamic_Segment: dynamic_segment_map_memh(dst)
 Dynamic_Segment->>Dynamic_Segment: ucc_tl_ucp_mem_map(EXPORT)
 Dynamic_Segment-->>TL_UCP_Coll: Set UCC_TL_UCP_TASK_FLAG_USE_DYN_SEG
 end
 
 TL_UCP_Coll-->>UCC_Core: Return task handle
 
 User->>UCC_Core: ucc_collective_post(task)
 UCC_Core->>TL_UCP_Coll: ucc_tl_ucp_alltoall_onesided_start()
 
 alt Dynamic segments enabled
 TL_UCP_Coll->>Dynamic_Segment: ucc_tl_ucp_coll_dynamic_segment_exchange_nb()
 Dynamic_Segment->>Dynamic_Segment: dynamic_segment_pack_memory_handles()
 Dynamic_Segment->>Dynamic_Segment: ucc_tl_ucp_memh_pack()
 
 loop Exchange memory handle sizes
 Dynamic_Segment->>Service_Coll: dynamic_segment_calculate_sizes_start()
 Service_Coll->>Service_Coll: ucc_service_allgather(sizes)
 Dynamic_Segment->>Service_Coll: dynamic_segment_calculate_sizes_test()
 Service_Coll-->>Dynamic_Segment: Sizes exchanged
 end
 
 Dynamic_Segment->>Dynamic_Segment: dynamic_segment_allocate_buffers()
 
 loop Exchange packed memory handles
 Dynamic_Segment->>Service_Coll: dynamic_segment_pack_and_exchange_data_start()
 Service_Coll->>Service_Coll: ucc_service_allgather(packed_handles)
 Dynamic_Segment->>Service_Coll: dynamic_segment_pack_and_exchange_data_test()
 Service_Coll-->>Dynamic_Segment: Handles exchanged
 end
 
 Dynamic_Segment->>Dynamic_Segment: dynamic_segment_import_memory_handles()
 Dynamic_Segment->>Dynamic_Segment: ucc_tl_ucp_mem_map(IMPORT)
 Dynamic_Segment-->>TL_UCP_Coll: Memory handles ready
 end
 
 TL_UCP_Coll->>TL_UCP_Coll: Enqueue to progress queue
 
 loop Progress collective
 User->>UCC_Core: ucc_context_progress()
 UCC_Core->>TL_UCP_Coll: ucc_tl_ucp_alltoall_onesided_get/put_progress()
 
 alt Dynamic segment check
 TL_UCP_Coll->>Dynamic_Segment: ucc_tl_ucp_test_dynamic_segment()
 Dynamic_Segment-->>TL_UCP_Coll: Exchange complete/in-progress
 end
 
 loop Post onesided operations
 TL_UCP_Coll->>UCP_Worker: ucc_tl_ucp_get_nb/put_nb()
 UCP_Worker->>UCP_Worker: ucp_get_nbx/ucp_put_nbx()
 TL_UCP_Coll->>TL_UCP_Coll: alltoall_onesided_handle_completion()
 end
 
 TL_UCP_Coll->>TL_UCP_Coll: alltoall_onesided_wait_completion()
 UCP_Worker-->>TL_UCP_Coll: Operations complete
 end
 
 TL_UCP_Coll-->>UCC_Core: UCC_OK
 
 User->>UCC_Core: ucc_collective_finalize(task)
 UCC_Core->>TL_UCP_Coll: ucc_tl_ucp_alltoall_onesided_finalize()
 
 alt Dynamic segments used
 TL_UCP_Coll->>Dynamic_Segment: ucc_tl_ucp_coll_dynamic_segment_finalize()
 Dynamic_Segment->>Dynamic_Segment: ucc_tl_ucp_mem_unmap(IMPORT, global)
 Dynamic_Segment->>Dynamic_Segment: ucc_tl_ucp_mem_unmap(EXPORT, local)
 Dynamic_Segment->>Dynamic_Segment: Free buffers
 Dynamic_Segment-->>TL_UCP_Coll: Cleanup complete
 end
 
 TL_UCP_Coll-->>UCC_Core: Finalized
Loading

16 files reviewed, 18 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +282 to +285
for (auto i = 0; i < ctxs.size(); i++) {
ctxs[i]->args->mask = UCC_COLL_ARGS_FIELD_GLOBAL_WORK_BUFFER;
ctxs[i]->args->flags = 0; // No special flags for dynamic segments
}
Copy link
Contributor

@greptile-apps greptile-apps bot Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: overriding the mask and flags set by data_init creates inconsistent state. The function already set these based on coll_mask and coll_flags members (lines 38, 60-62). Consider setting coll_mask and coll_flags members before calling data_init instead. Should the test initialize coll_mask and coll_flags before calling data_init, or is there a specific reason to override after initialization?

data_init(size, dtype, count, ctxs, team, false);
for (auto i = 0; i < ctxs.size(); i++) {
ctxs[i]->args->mask = UCC_COLL_ARGS_FIELD_GLOBAL_WORK_BUFFER;
ctxs[i]->args->flags = 0; // No special flags for dynamic segments
Copy link
Contributor

@greptile-apps greptile-apps bot Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: setting flags = 0 removes the UCC_COLL_ARGS_FLAG_MEM_MAPPED_BUFFERS flag that was set on line 62 for onesided operations. This may break the onesided algorithm's assumptions.

Comment on lines 827 to 830
/* Free the TL data structure */
if (data) {
ucc_free(data);
memh->tl_data = NULL;
}
Copy link
Contributor

@greptile-apps greptile-apps bot Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: data is already freed in both branches above (export at line 805, import at line 820), so this check will always access freed memory. This is a use-after-free bug that could cause crashes.

return UCC_OK;
}

if (task->dynamic_segments.exchange_step < 5) {
Copy link
Contributor

@greptile-apps greptile-apps bot Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: magic number 5 is not self-documenting; define a constant like UCC_TL_UCP_DYN_SEG_EXCHANGE_STEPS to clarify the intended number of exchange phases

Comment on lines 279 to 290
ucc_status_t ucc_tl_ucp_coll_dynamic_segment_init(ucc_coll_args_t *coll_args,
ucc_tl_ucp_alltoall_onesided_alg_t alg,
ucc_tl_ucp_task_t *task);
Copy link
Contributor

@greptile-apps greptile-apps bot Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: parameter alg is typed as ucc_tl_ucp_alltoall_onesided_alg_t but dynamic segments may be used by other collective types (alltoallv, etc.); consider a more generic type or documenting the restriction. Is the alltoall-specific alg type intentional, or should this accept a generic algorithm enum?

Comment on lines +139 to +144
if (task->super.status != UCC_INPROGRESS &&
(task->flags & UCC_TL_UCP_TASK_FLAG_USE_DYN_SEG)) {
task->super.status = ucc_tl_ucp_coll_dynamic_segment_finalize(task);
}
Copy link
Contributor

@greptile-apps greptile-apps bot Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: dynamic segment finalization happens after completion/error but the finalize call's error status overwrites the original task status. If the task completed successfully (UCC_OK) but finalization fails, the error is propagated; however if the task already failed, the finalization error replaces it, losing the original failure reason. Should finalization errors be logged separately while preserving the original task failure status?

Comment on lines +196 to +205
if (task->super.status != UCC_INPROGRESS &&
(task->flags & UCC_TL_UCP_TASK_FLAG_USE_DYN_SEG)) {
task->super.status = ucc_tl_ucp_coll_dynamic_segment_finalize(task);
}
Copy link
Contributor

@greptile-apps greptile-apps bot Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: same issue as GET progress: finalization status overwrites task status

Comment on lines 209 to 226
if (task->flags & UCC_TL_UCP_TASK_FLAG_USE_DYN_SEG) {
status = ucc_tl_ucp_coll_dynamic_segment_exchange_nb(task);
if (status == UCC_INPROGRESS) {
return ucc_progress_queue_enqueue(UCC_TL_CORE_CTX(team)->pq, &task->super);
}
if (UCC_OK != status) {
task->super.status = status;
tl_error(UCC_TL_TEAM_LIB(team),
"failed to exchange dynamic segments");
return task->super.status;
}
}
Copy link
Contributor

@greptile-apps greptile-apps bot Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: when dynamic segment exchange completes synchronously (returns UCC_OK), the function continues to line 223 and enqueues the task. However, the progress functions (GET/PUT) will call ucc_tl_ucp_test_dynamic_segment again at lines 106 and 164, which may not handle the already-completed case correctly. Does ucc_tl_ucp_test_dynamic_segment return UCC_OK immediately when called again after successful completion?

Comment on lines 306 to 317
status = ucc_tl_ucp_coll_dynamic_segment_init(&coll_args->args, alg, task);
if (UCC_OK != status) {
tl_error(UCC_TL_TEAM_LIB(tl_team),
"failed to initialize dynamic segments");
goto out;
}
Copy link
Contributor

@greptile-apps greptile-apps bot Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: if dynamic segment initialization fails, the code jumps to out: but the task object has already been allocated (line296) and is never cleaned up, causing a memory leak

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Adds dynamic segment infrastructure for onesided collectives, enabling implicit memory handle creation and exchange when buffers aren't pre-mapped. Three new interfaces (init, exchange_nb, finalize) manage the lifecycle of memory registration, handle exchange via service collectives, and cleanup.

Key Changes:

  • New tl_ucp_coll.c functions (771 lines) implement multi-step exchange protocol with state machine
  • alltoall_onesided.c integrates dynamic segments with conditional memory handle usage
  • alltoallv_onesided.c adds validation but doesn't support dynamic segments yet (not supported per line 185)
  • Task structure extended with dynamic_segments field tracking exchange state and buffers

Critical Issues Found:

  • Heap corruption bug: Lines 574 and 606 in tl_ucp_coll.c incorrectly call ucc_free() on pointers that point into a contiguous global_buffer, not separately allocated objects
  • Memory leak: Line 213 fails to free src_memh->tl_h->tl_data when dst mapping fails
  • Dead code: Line 559 checks i == UCC_RANK_INVALID inside a loop where this can never be true

Other Concerns:

  • Finalization errors overwrite original task status (alltoall lines 141, 198)
  • No validation that handles are global when provided by user in some paths

Confidence Score: 1/5

  • Critical heap corruption and memory leak bugs make this PR unsafe to merge without fixes
  • Multiple critical memory management bugs will cause crashes or corruption: freeing pointers into contiguous buffer allocation (lines 574, 606) and leaking allocated data on error path (line 213). These are deterministic bugs that will trigger in error scenarios
  • src/components/tl/ucp/tl_ucp_coll.c requires immediate attention - contains heap corruption bug at lines 574 and 606, plus memory leak at line 213. Must be fixed before merge

Important Files Changed

File Analysis

Filename Score Overview
src/components/tl/ucp/tl_ucp_coll.c 1/5 added 771 lines for dynamic segment infrastructure. Critical bugs: heap corruption from freeing pointers into contiguous buffer (lines 574, 606), memory leak when dst mapping fails (line 213)
src/components/tl/ucp/alltoall/alltoall_onesided.c 3/5 integrated dynamic segment support for alltoall. Memory handle swap logic preserved for GET but applied conditionally. Finalization errors may overwrite original task status
src/components/tl/ucp/alltoallv/alltoallv_onesided.c 4/5 minor formatting and validation improvements. Added check for global dst memory handle flag. No dynamic segment integration yet (alltoallv not supported per line 185)
src/components/tl/ucp/tl_ucp_coll.h 4/5 added enum for exchange steps, three new function declarations for dynamic segment lifecycle. Clean interface design with test helper function
src/components/tl/ucp/tl_ucp_task.h 4/5 added UCC_TL_UCP_TASK_FLAG_USE_DYN_SEG flag and dynamic_segments struct with exchange state, buffers, and service collective requests. Well-structured additions

Sequence Diagram

sequenceDiagram
 participant App as Application
 participant Alg as Alltoall Algorithm
 participant Init as dynamic_segment_init
 participant Exch as dynamic_segment_exchange_nb
 participant Fin as dynamic_segment_finalize
 participant Svc as Service Collective
 
 App->>Alg: ucc_tl_ucp_alltoall_onesided_init
 Alg->>Init: ucc_tl_ucp_coll_dynamic_segment_init
 Note over Init: Check if buffers already mapped
 alt Buffers not mapped
 Init->>Init: map src_memh via ucc_tl_ucp_mem_map
 Init->>Init: map dst_memh via ucc_tl_ucp_mem_map
 Init->>Alg: Set USE_DYN_SEG flag
 else Buffers pre-mapped
 Init->>Alg: Return UCC_OK (no-op)
 end
 
 App->>Alg: ucc_tl_ucp_alltoall_onesided_start
 Alg->>Exch: ucc_tl_ucp_coll_dynamic_segment_exchange_nb
 
 Note over Exch: STEP_INIT: Pack memory handles
 Exch->>Exch: dynamic_segment_pack_memory_handles
 Exch->>Alg: Return UCC_INPROGRESS
 
 App->>Alg: progress (via task queue)
 Alg->>Exch: ucc_tl_ucp_test_dynamic_segment
 Note over Exch: STEP_SIZE_TEST: Exchange sizes
 Exch->>Svc: Start allgather for pack sizes
 Exch->>Alg: Return UCC_INPROGRESS
 
 App->>Alg: progress
 Alg->>Exch: continue exchange
 Exch->>Svc: Test allgather completion
 Svc-->>Exch: Complete with global sizes
 
 Note over Exch: STEP_DATA_ALLOC: Allocate buffers
 Exch->>Exch: dynamic_segment_allocate_buffers
 
 Note over Exch: STEP_DATA_START: Exchange handles
 Exch->>Svc: Start allgather for packed handles
 Exch->>Alg: Return UCC_INPROGRESS
 
 App->>Alg: progress
 Alg->>Exch: continue exchange
 Exch->>Svc: Test allgather completion
 Svc-->>Exch: Complete with all handles
 
 Note over Exch: STEP_DATA_TEST: Import handles
 Exch->>Exch: dynamic_segment_import_memory_handles
 loop For each rank
 Exch->>Exch: ucc_tl_ucp_mem_map(IMPORT)
 end
 Exch->>Exch: Set exchange_step = COMPLETE
 Exch->>Alg: Return UCC_OK
 
 Note over Alg: Execute onesided operations
 Alg->>Alg: ucc_tl_ucp_get_nb / ucc_tl_ucp_put_nb
 
 App->>Alg: progress until complete
 
 App->>Alg: Task completion/finalize
 Alg->>Fin: ucc_tl_ucp_coll_dynamic_segment_finalize
 loop For each global handle
 Fin->>Fin: ucc_tl_ucp_mem_unmap(IMPORT)
 end
 Fin->>Fin: unmap src_local and dst_local
 Fin->>Alg: Return UCC_OK
Loading

9 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

Copy link
Collaborator Author

based on WG discussions, will remove a2av support from here. cleanup and if OK will merge

Copy link
Contributor

greptile-apps bot commented Dec 18, 2025
edited
Loading

Greptile Summary

This PR implements dynamic (implicit) memory handle creation and exchange for one-sided collective algorithms in TL/UCP. The implementation introduces three new interfaces (ucc_tl_ucp_dynamic_segment_{init | exchange | finalize}) that handle registration, exchange via service collectives, and cleanup of memory handles at runtime when user-provided handles are not available.

Major changes:

  • Adds 760+ lines in tl_ucp_coll.c implementing multi-step memory handle exchange using service allgathers
  • Integrates dynamic segments with alltoall GET/PUT algorithms, removing requirement for pre-mapped buffers
  • Renames ucc_tl_ucp_alltoall_onesided_alg_t to generic ucc_tl_ucp_onesided_alg_type for broader use
  • Removes unused va_base/base_length team fields
  • Adds comprehensive test for dynamic segments in alltoall

Key issue found:

  • Service collective request leak in error cleanup path (line 770): if an error occurs after starting an allgather but before completion, the in-progress service requests are not finalized

The implementation properly handles most error paths and memory management after addressing previous review feedback. Memory handle assignment for GET/PUT algorithms appears correct (GET uses dst_local/src_global, PUT uses src_local/dst_global).

Confidence Score: 3/5

  • This PR has one resource leak issue that should be fixed, but otherwise implements a complex feature with reasonable correctness
  • Score reflects the service collective request leak in error paths which could cause resource exhaustion over time. The core implementation logic appears sound and previous review feedback has been addressed, but this cleanup issue needs resolution before merge.
  • Pay close attention to src/components/tl/ucp/tl_ucp_coll.c error cleanup paths (line 770)

Important Files Changed

Filename Overview
src/components/tl/ucp/tl_ucp_coll.c Adds 760+ lines implementing dynamic segment memory handle exchange infrastructure. Memory management appears correct after previous fixes. Core logic handles packing, allgather exchange, and unpacking of memory handles across ranks.
src/components/tl/ucp/alltoall/alltoall_onesided.c Integrates dynamic segment exchange with alltoall GET/PUT algorithms. Removes requirement for pre-mapped buffers. Memory handle assignment looks correct (dst_local/src_global for GET, src_local/dst_global for PUT).
src/components/tl/ucp/tl_ucp_task.h Adds dynamic_segments struct to task with local/global memh pointers, exchange args, and state tracking. Structure design appears sound for managing multi-step exchange process.
test/gtest/coll/test_alltoall.cc Adds test for dynamic segments. Test properly overrides mask/flags to simulate non-premapped buffers. Initializes coll_mask/coll_flags members correctly.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (6)

  1. src/components/tl/ucp/tl_ucp_coll.c, line 597 (link)

    logic: global[j] is not a separately allocated object—it points into the contiguous global_buffer allocated at line 567. Calling ucc_free(global[j]) will corrupt the heap. Only free global[j]->tl_h (already done at line 597) and the global array itself, not individual entries.

  2. src/components/tl/ucp/alltoall/alltoall_onesided.c, line 117-118 (link)

    logic: memory handle swap missing for GET with dynamic segments. The original code (lines 95-98) swaps src_memh and dst_memh for GET because remote reads need the destination buffer's rkey. Without this swap for dynamic segments, remote address resolution will fail.

  3. test/gtest/coll/test_alltoall.cc, line 283-284 (link)

    logic: overriding mask and flags after data_init discards values set by the function (lines 38, 60-62). The function already configured these fields based on coll_mask and coll_flags members. Set this->coll_mask and this->coll_flags before calling data_init instead of overriding afterward.

  4. src/components/tl/ucp/alltoall/alltoall_onesided.c, line 138-141 (link)

    style: finalization errors overwrite original task status. If the task completed successfully but finalization fails, the error propagates correctly. However, if the task already failed, finalization errors replace the original failure reason. Consider logging finalization errors separately while preserving the original status when task->super.status != UCC_OK.

  5. src/components/tl/ucp/alltoall/alltoall_onesided.c, line 306-310 (link)

    logic: if dynamic segment initialization fails (line 306), the code jumps to out: but task was already allocated at line 296 and is never freed, causing a memory leak. Add cleanup for task before the out: label or use a different error path.

  6. src/components/tl/ucp/alltoall/alltoall_onesided.c, line 209-223 (link)

    style: when exchange completes synchronously (returns UCC_OK at line 209), execution continues to line 223 and enqueues the task. Progress functions will call ucc_tl_ucp_test_dynamic_segment again (lines 106, 164). Verify that ucc_tl_ucp_test_dynamic_segment returns UCC_OK immediately when exchange_step == COMPLETE.

13 files reviewed, 6 comments

Edit Code Review Agent Settings | Greptile

TL/UCP: alltoall onesided convert to dyn seg
TEST: add mpi dyn seg alltoallv test
REVIEW: fix clang-tidy
REVIEW: address feedback
REVIEW: cleanup rebase
REVIEW: address feedback
TL/UCP: add dynamic segment get support
REVIEW: fix clang-tidy
REVIEW: fix clang-tidy
REVIEW: fix asan error
REVIEW: fix clang-tidy
TL/UCP: revert alltoallv dynamic segments
REVIEW: fix clang-tidy
REVIEW: address feedback
TEST: remove alltoallv dynseg
REVIEW: address feedback
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (1)

  1. src/components/tl/ucp/tl_ucp_coll.c, line 198-201 (link)

    logic: condition should also check that the memory handles are present in the args, not just that the mask fields are set. Both src and dst memh could be NULL even if the mask is set.

13 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (1)

  1. src/components/tl/ucp/tl_ucp_coll.c, line 770-776 (link)

    logic: potential service collective request leak: if an error occurs after starting scoll_req_sizes or scoll_req_data but before they complete, these requests are not finalized in the cleanup path

13 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

@Sergei-Lebedev Sergei-Lebedev Awaiting requested review from Sergei-Lebedev

@janjust janjust Awaiting requested review from janjust

@nsarka nsarka Awaiting requested review from nsarka

@ikryukov ikryukov Awaiting requested review from ikryukov

1 more reviewer

@greptile-apps greptile-apps[bot] greptile-apps[bot] left review comments

Reviewers whose approvals may not affect merge requirements

At least 2 approving reviews are required to merge this pull request.

Assignees

No one assigned

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /