Bug #1572

Improve pup_stl performance

Added by Nils Deppe about 2 years ago. Updated over 1 year ago.

Target version:
Start date:
Due date:
% Done:



The serialization of std::vector (and other STL containers) is extremely slow. The reason for this is twofold. First, push_back is used to push into a zero-size (not just empty) container, which results in an insane amount of calls to malloc. My testing found that this can be solved by changing the pup_stl_container function in pup_stl.h to be:

  template <class container,class dtype>
  inline void PUP_stl_container(er &p,container &c) {
    int nElem=PUP_stl_container_size(p,c);
    if (p.isUnpacking()) {  // Unpacking: Extract each element and push_back:
    PUP_stl_container_items<container, dtype>(p, c);

This provides a factor of 2-3 or so speedup in serializing std::vector<double>.

The second issue is that even for arithmetic types, such as int, double, and float, the same inefficient algorithm is used. By using PUParray for these cases I get a speedup of more than a factor of 4-5. Here is the changed function:

  template <class T>
  inline void operator|(er &p, typename std::vector<T> &v) {
    if (std::is_arithmetic<T>::value) {
      int number_of_elements = PUP_stl_container_size(p, v);
      if (p.isUnpacking()) {
        if (v.size() != number_of_elements) {
      PUParray(p,, number_of_elements);
    } else {
      PUP_stl_container<std::vector<T>, T>(p, v);

The code could be further optimized by removing the if statement in favor of std::enable_if in the function template parameters to "specialize" the function call for arithmetic and non-arithmetic functions. Unfortunately std::is_arithmetic is part of C++11. However, the code is easy to implement. See, for example, the libc++ implementation. At the very least this change should be made for v6.9.0.


#1 Updated by Sam White about 2 years ago

  • Category set to Migration

I can't imagine why it was written this way, and the git history offers no explanation. I added Nils's first suggestion as a patch here:

#2 Updated by Nils Deppe about 2 years ago

Thank you! I think we should keep this ticket around setting the target to 6.9.0 for the second optimization. Thinking about it now, the second optimization could be added to PUP_stl_container by checking container::value_type, but that still requires C++11.

#3 Updated by Sam White about 2 years ago

  • Target version set to 6.9.0

I added explicit specializations for the common arithmetic types so that we have that for the common cases in 6.8.0:

We can leave this open targeted to 6.9.0 to complete it once we require C++11 support.

#4 Updated by Sam White about 2 years ago

  • Subject changed from Serialization of std::vector<double> is slow to Improve pup_stl performance

The two patches I posted above were merged, so the remaining issues are to specialize for all arithmetic types (or all types declared PUPbytes), and to use reserve() when possible instead of resize() to avoid calling the constructor repeatedly when unpacking.

#5 Updated by Phil Miller about 2 years ago

Also to use emplace_back instead of push_back where possible to move instead of copy.

#6 Updated by Nils Deppe about 2 years ago

push_back has an r-value override in C++11, so as long as `std::move` is used that will work fine. I prefer using emplace_back only when wanting to call a constructor other than the copy and move, but this is merely a personal preference :)

#7 Updated by Phil Miller about 2 years ago

Ah, yeah, that makes sense. I'd agree with your preferred expression.

#8 Updated by Eric Bohm about 2 years ago

  • Status changed from New to In Progress

#9 Updated by Eric Bohm about 2 years ago

  • Assignee set to Sam White

#10 Updated by Sam White over 1 year ago

  • Status changed from In Progress to Implemented

Implemented the second optimization here, now that we are clear of 6.8.2 and are requiring C++11 support for 6.9.0:

#11 Updated by Sam White over 1 year ago

  • Status changed from Implemented to Merged

Also available in: Atom PDF