Tuple reducer leaks memory when using set/concat/custom reducers
Tuple reductions were added after 6.7.1, but it looks like only CkReductionMgr::reduceMessages() was updated with the necessary fixes for tuple/set reductions. It looks like tuple reductions are currently untested on sections/CkMulticast.
Also there are two different TODOs noted in the code related to memory management in tuple reductions, one potentially a memory leak. These need to be addressed.
There is already an open issue for supporting [reductiontarget]'s with Tuple reductions.
#1 Updated by Sam White over 2 years ago
One of the TODOs, in CkReduction::tupleReduction(), is concerning:
CkReductionMsg* result = reducerFp(num_messages, simulated_messages); return_data[reduction_idx] = CkReduction::tupleElement(result->getLength(), result->getData(), reducerType); // TODO - leak - the built in reducers all reuse message memory, so this is not safe to delete // delete result;
It looks like the result msg is being leaked for tuple reductions on non-builtin reducers (any custom reductions that do not reuse msgs).
The other concerning part here is that only the copy of the reduction code in CkReductionMgr was updated with the fix for tuple reductions that consist of set reducers, so CkMulticast and the other duplicates of the reduction code are seemingly missing this.
#4 Updated by Sam White over 2 years ago
- Subject changed from Investigate completeness of Tuple reduction support to Potential memory leak in Tuple reducer
Juan noted in review of the above patch that CkMulticast is fine. The memory leak is the main concern left here (besides the fact that tuple reductions can't be used in reductiontarget entry methods, but that has a separate redmine issue).
#7 Updated by Sam White about 2 years ago
It looks like the issue here is that all of the builtin reducers--except set and concat--reuse one of their input messages as the returned message, since the size of all messages is the same, so it is not safe for tuple reducers to delete its intermediate result message. Consequently, if you do a tuple reduction composed of a set or concat or user-defined reducer that doesn't reuse message memory, the tuple reducer will leak that memory.
Here's valgrind output of a tuple/set reduction:
==24117== 5,600 bytes in 16 blocks are definitely lost in loss record 291 of 309 ==24117== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==24117== by 0x6BC9D5: malloc_nomigrate (libmemory-default.c:724) ==24117== by 0x6C9532: CmiAlloc (convcore.c:2930) ==24117== by 0x5D85E3: envelope::alloc(unsigned char, unsigned int, unsigned short) (envelope.h:312) ==24117== by 0x5D8939: _allocEnv(int, int, int) (envelope.h:485) ==24117== by 0x5EC80C: CkAllocMsg (msgalloc.C:21) ==24117== by 0x64E8D4: CkReductionMsg::alloc(int, unsigned long, int*, int) (ckreduction.C:1250) ==24117== by 0x658775: CMessage_CkReductionMsg::operator new(unsigned long, int*, int) (CkReduction.def.h:610) ==24117== by 0x64E80B: CkReductionMsg::buildNew(int, void const*, CkReduction::reducerType, CkReductionMsg*) (ckreduction.C:1226) ==24117== by 0x652E23: set(int, CkReductionMsg**) (ckreduction.C:1502) ==24117== by 0x653B38: CkReduction::tupleReduction(int, CkReductionMsg**) (ckreduction.C:1723) ==24117== by 0x64DB63: CkReductionMgr::reduceMessages() (ckreduction.C:951)
There is currently no way for the tuple reducer to know whether the reducer it is using is allocating new memory or not. That API would need to be added. I see two options:
1. Add another argument to addReducer()
2. Add another argument to CkReduction::tupleElement's constructor
I think 1 is a better design (because this is a property of the reducer itself) but 2 is less intrusive (only needs to be added for reducers that are actually used in tuple reductions)... Thoughts?
#8 Updated by Phil Miller about 2 years ago
Design 1 is also preferable (I think, if I understand correctly) because such a declaration would only have to appear once, ever, while in design 2, every call site to a tuple reduction using a reducer of concern would have to make the necessary declaration.
As a heuristic, would it be safe to implement and document that the tuple reducer will compare the message pointer returned from the reducer to the first message passed into it, and act based on that?
#9 Updated by Sam White about 2 years ago
Ah, I didn't think of that. We can indeed just check the returned pointer against the zeroth message passed into it, document that if a reducer is going to reuse a message it needs to be message 0, and then we don't need any API changes.
My first attempt at that solution passed all AMPI gather tests, but fails the typed_reduction example's test. The difference between the two is that AMPI uses a tuple reduction consisting of 2 set reductions. typed_reduction uses a tuple reduction consisting of 5 different reductions: max, sum, min, statistics, and set. The first 4 of those reductions pass their correctness tests but for some reason the set reducer's message data is incorrect...