Bug #1699

[inline] entry methods should use perfect forwarding in C++11

Added by Nils Deppe 10 months ago. Updated 2 months ago.

Target version:
Start date:
Due date:
% Done:




Currently inline entry methods are not as useful as they could be in C++11 because they do not use perfect forwarding. For example, if I call an inline entry method:


there is a still a copy done, even though in C++11 move semantics can be used to elide the copy. The necessary change to the generated receive_data method is to have:

template <class TemporalId, class ReceiveData_t>
#if __cplusplus >= 201103L
void CProxyElement_AlgorithmArray<
    Tentacle, Metavariables, OrderedActionsList, InboxTagList, TemporalId,
    InitialDataBox>::receive_data(TemporalId&& impl_noname_1,
                                  ReceiveData_t&& impl_noname_2,
                                  bool enable_if_disabled,
                                  const CkEntryOptions* impl_e_opts)
/* current code */
#endif  // __cplusplus >= 201103L

then for the call to the local object:

#if __cplusplus >= 201103L
obj->template receive_data<ReceiveTag>(std::forward<TemporalId>(impl_noname_1),
obj->template receive_data<ReceiveTag>(impl_noname_1, impl_noname_2, enable_if_disabled);

I would argue that without the use of perfect forwarding [inline] entry methods do not actually work as advertised in C++11 and so that is why I'm filing this as a bug rather than a feature request. The current workaround would be to pass a const pointer to a non-const object, then move that, but that does not actually do the same thing: if the sender wanted the receiver to copy the data (ie it's meant to be a const &) then you have a bug.

Related issues

Related to Charm++ - Feature #921: Entry tag [inline] is unable to optimize away most of the overhead Implemented 12/14/2015


#1 Updated by Sam White 10 months ago

  • Target version set to 6.9.0

#2 Updated by Eric Bohm 9 months ago

  • Assignee set to Eric Mikida

#3 Updated by Sam White 6 months ago

This looks to be straightforward? We don't need to check for '#if __cplusplus >= 201103L' since we are assuming full C++11 support in v6.9.0+

#4 Updated by Eric Mikida 6 months ago

  • Status changed from New to In Progress

Yeah I forgot to mark in progress. Have the implementation (without C++11 guards as mentioned) just had to test it and push. I'll finish that up today.

#5 Updated by Eric Mikida 6 months ago

Nils, unless I'm misunderstanding something, the above example would still either require that the objects function take a const ref, or by value (therefore still making a copy).

This is because for the code generated when the non-inlined path is taken, if the objects method just takes a regular reference, you get a compilation error for trying to bying a non-const lvalue reference to a temporary object.

Is that intended, or is there something I'm missing?

#6 Updated by Eric Mikida 6 months ago

Nevermind. The restriction I mentioned was due to a different change for marshalling objects without default constructors. It makes this patch less straightforward but I think I understand the workaround now. I still want to clarify that for inline methods you would want the decls to look something like the following, right? Specifically, the function for the object takes a reference (const or non-const should both be options).

In the ci:
entry [inline] void receiveData(DataType d)

In the h
void receiveData(DataType& d)

#7 Updated by Eric Mikida 6 months ago

After some more exploration, I'm not sure why this change is needed. Where is the extra copy you are referring to happening? The call to the proxy is pass-by-reference, and if you have your object method take a const reference, then the whole chain is pass-by-reference and no copies are made. If the objects method is not a const reference, then there is a single copy made between the proxy and your object to get pass-by-value semantics.

I guess I'm just confused by what exactly the bug is, where the extra copy occurs, and what this fix aims to do. (You can probably ignore my last updates)

#8 Updated by Sam White 5 months ago

  • Status changed from In Progress to Feedback

Nils, can you answer Eric's questions above?

#9 Updated by Nils Deppe 5 months ago

The reason is that currently without perfect forwarding the called function would receive an lvalue reference, which you should never move out of and could be really problematic. In order for the called function to properly be able to grab the data in the most efficient way I need to know whether or not I can move arguments passed to it. For example, say I call the same inline entry methods twice, once with an rvalue reference and once with a non-const lvalue reference. In the rvalue case I can safely move the data, in the lvalue case I cannot, resulting in an extra copy.

#10 Updated by Sam White 5 months ago

  • Status changed from Feedback to In Progress

#11 Updated by Sam White 5 months ago

  • Status changed from In Progress to Implemented

#12 Updated by Eric Mikida 4 months ago

The patch where this is implemented just replaces the currently generated code with the code that Nils provided above. However this breaks the API by enforcing that rvalue references always be passed, so something like proxy.entryMethod(obj) would have to be rewritten to proxy.entryMethod(std::move(obj)), which we don't want. We could have instead have the method overridden with the header Nils provided, however this would be an all or nothing approach, either all args would be r value refs, or none would. Is that sufficient? Is there a better solution I may be missing?

#13 Updated by Sam White 4 months ago

It's not clear what to do here for 6.9.0, any thoughts Nils?

#14 Updated by Nils Deppe 4 months ago

This should not be an API breaking change. If implemented correctly perfect forwarding does not break existing code, it only optimizes could in cases where it can use a move. The distinction is that the double ampersand is only added to template parameters where the type is deduced. The C++ committee seems to largely regret using double ampersand for both rvalue and forwarding references. If you point me to a commit, PR, or file I can take a look and see if I'm right and provide the solution. There are scenarios where perfect forwarding isn't possible, and we might have encountered that.

#16 Updated by Evan Ramos 4 months ago

If it would be useful at all, it is possible for lvalue and rvalue reference overloads to exist side by side.

void test(int &);
void test(int &&);

If generated code is what is affected by this, one downside would be that the number of functions that would need to be generated is 2^n in the number of parameters that need overloads of both types.

#17 Updated by Eric Mikida 4 months ago

It is certainly possible that my implementation is correct, this is really my first time doing anything specifically with r-value references. The one place that seems like it may still be an issue even if others can be resolved is typedefs, since the charmxi parser does not actually resolve them.

#18 Updated by Eric Bohm 4 months ago

  • Related to Feature #921: Entry tag [inline] is unable to optimize away most of the overhead added

#19 Updated by Evan Ramos 3 months ago

I believe the build failures we are seeing with Eric's patch are a result of the generated code not meeting the requirements for a forwarding reference:

If I understand this correctly, each forwarded parameter's type would need a corresponding function template parameter.

As an illustration, this is the kind of code that charmxi generates with the patch currently:

int f(bool&& x)
  return g(std::forward<T>(x));

This is what it would need to generate in order for x to be seen as a forwarding reference and not an rvalue reference:

template <typename T1 = bool>
int f(T1&& x)
  return g(std::forward<T>(x));

#20 Updated by Eric Mikida 3 months ago

Would that mean we'd basically have to make every single entry method a templated method? Well, I guess just every single inline and local method.

#21 Updated by Evan Ramos 3 months ago

I think so.

#22 Updated by Sam White 3 months ago

  • Tags set to #spectre

#23 Updated by Evan Ramos 3 months ago

  • Category set to Charmxi

#24 Updated by Sam White 3 months ago

  • Status changed from Implemented to Merged

This was merged last night and broke the Windows autobuilds:

../../../bin/charmc -optimize -production  -optimize -production  -o groupdependence.o groupdependence.C
c:\cygwin\home\nikhil\autobuild\netlrts-win-x86_64\charm\netlrts-win-x86_64\tests\charm++\megatest\groupdependence.def.h(1518): error C2765: 'CProxyElement_arrayA::test_em_inline': an explicit specialization of a function template cannot have any default arguments

#25 Updated by Evan Ramos 3 months ago

Sam White wrote:

This was merged last night and broke the Windows autobuilds:

Addressed in

#26 Updated by Sam White 2 months ago

Nils identified an issue with the above patch for Spectre.

New patches from Evan are here:

Also available in: Atom PDF