Activity

From 04/26/2017 to 05/25/2017

05/25/2017

06:36 PM Bug #1574 (Merged): lrts smp/multicore megacon build fails with undefined reference to `TraceTime...
One example, on Bridges:... Jim Phillips
05:45 PM Bug #1572: Improve pup_stl performance
I can't imagine why it was written this way, and the git history offers no explanation. I added Nils's first suggesti... Sam White
12:08 PM Bug #1572 (Merged): Improve pup_stl performance
The serialization of @std::vector@ (and other STL containers) is extremely slow. The reason for this is twofold. Firs... Nils Deppe
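The snippet above is cut off before it names the two causes. As an illustration only, assuming one contributor is a per-element serialization call, the sketch below contrasts an element-by-element pack with a single bulk copy of a vector's contiguous storage for trivially copyable element types. `Packer`, `pack_elementwise`, and `pack_bulk` are hypothetical names. They are not the pup_stl API.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical byte-buffer packer standing in for a PUP::er (not the Charm++ class).
struct Packer {
  std::vector<unsigned char> buf;
  void bytes(const void* p, std::size_t n) {
    const unsigned char* c = static_cast<const unsigned char*>(p);
    buf.insert(buf.end(), c, c + n);
  }
};

// Element-by-element path: one call per element, as a generic container
// serializer would make. Each element is copied as raw bytes here, so this
// sketch also assumes a trivially copyable T.
template <class T>
void pack_elementwise(Packer& p, const std::vector<T>& v) {
  std::size_t n = v.size();
  p.bytes(&n, sizeof n);
  for (const T& x : v) p.bytes(&x, sizeof x);
}

// Bulk path: when T can be copied as raw bytes, one copy of the vector's
// contiguous storage replaces n separate calls.
template <class T>
void pack_bulk(Packer& p, const std::vector<T>& v) {
  static_assert(std::is_trivially_copyable<T>::value,
                "bulk copy requires a trivially copyable element type");
  std::size_t n = v.size();
  p.bytes(&n, sizeof n);
  if (n != 0) p.bytes(v.data(), n * sizeof(T));
}
```

For a `std::vector<int>` both functions emit identical bytes; the difference is n buffer calls versus one.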
04:46 PM Bug #1507 (In Progress): ckio test failure on gni-crayxc
Re-opening pending investigation Phil Miller
08:57 AM Bug #1507: ckio test failure on gni-crayxc
I'm not sure if it is the same bug, but after several checkpoints I get the following errors while in CkIO:
--------...
Thomas Quinn
04:21 PM Bug #1559 (Merged): cpuaffinity.c build errors on Linux and Win64 with --enable-tracing
The patches intended to fix the first two errors have been merged, so I'm closing this. If the @TraceTimerCommon@ fai... Phil Miller
02:44 PM Bug #635 (In Progress): all trees should be pe/node/physnode/network topology aware
Juan Galvez
02:44 PM Feature #1573: Make HDF5 library available for AMPI
CkExit has no return code parameter. How do Charm++ applications signal failure? Matthias Diener
02:18 PM Feature #1573: Make HDF5 library available for AMPI
We can now. It used to be that CkExit could only be called once by something on PE 0, I believe. That was changed rec... Sam White
02:16 PM Feature #1573: Make HDF5 library available for AMPI
OK, I'll do that. Regarding the exit() vs. CkExit() calls, can't we just @#define exit(foo) CkExit(foo)@ in ampi.h? Matthias Diener
02:14 PM Feature #1573: Make HDF5 library available for AMPI
I would recompile AMPI with MSG_ORDER_DEBUG prints (top of ampi.C) enabled. That usually helps debug hangs. Sam White
02:11 PM Feature #1573: Make HDF5 library available for AMPI
With some modest changes, hdf5 compiles successfully (serial+parallel version).
The following serial tests (@make te...
Matthias Diener
12:31 PM Feature #1573 (Merged): Make HDF5 library available for AMPI
Currently, the hdf5 library needs some changes to work correctly under AMPI:
* -exit vs. CkExit()-
* -charmrun outp...
Matthias Diener
12:41 PM Bug #1571 (Merged): Documentation for ReadOnly is inaccurate regarding the number of copies per p...
Phil Miller

05/24/2017

12:55 PM Projections Feature #1524: Time Profile With Bracketed User Events
There have been substantial changes to the visualization of user bracketed events in Projections that are scheduled t... Matthias Diener
10:47 PM Bug #1571 (Implemented): Documentation for ReadOnly is inaccurate regarding the number of copies ...
https://charm.cs.illinois.edu/gerrit/#/c/2549/ Sam White

05/23/2017

02:51 PM Bug #1571 (Merged): Documentation for ReadOnly is inaccurate regarding the number of copies per p...
"They are broadcast to every PE by the Charm++ runtime, and can be accessed in the same way as C++ ``global'' variabl... Eric Bohm
01:47 PM Bug #1553 (In Progress): Support for sdag entry method with rdma parameter
I have finished implementing the SDAG code for rdma entry methods. The use case for this is the stencil3d example wher... Nitin Bhat

05/22/2017

09:52 AM Feature #1237 (Implemented): Onesided sender side implementation for GNI layer
Feature: https://charm.cs.illinois.edu/gerrit/#/c/1908/
- Used buffering of short messages for sending messages whe...
Nitin Bhat
09:33 PM Bug #1509: -tracemode summary always fails an assertion at exit
Does '-tracemode summary' now pass 'make test' for LIBS? If so, we should add it to one of the autobuild targets Sam White
09:26 PM Bug #1522 (Rejected): Verbs failure on small messages
This only happens with CmiDirect, which I think is being replaced by rdma entry methods... Sam White

05/21/2017

04:32 PM Bug #1542: CkArrayCreated callback should be part of CkArrayOptions
We may want to get any API change here into 6.8.0. We need to decide on the relationship between the new initCallback... Sam White
03:03 PM Feature #1088: Trace MPI_ functions in AMPI
Change std::map to std::unordered_map: https://charm.cs.illinois.edu/gerrit/#/c/2545/ Sam White
02:46 PM Bug #1570 (Merged): Cpuaffinity ignores '++quiet'
When specifying a commap, cpuaffinity ignores '++quiet' and prints anyway:... Sam White
12:31 PM Bug #1561 (Merged): RDMA failures on multicore/SMP builds
Sam White

05/19/2017

01:30 PM Bug #1561 (Implemented): RDMA failures on multicore/SMP builds
Patch: https://charm.cs.illinois.edu/gerrit/#/c/2543/ Vipul Harsh
01:27 PM Bug #1561 (In Progress): RDMA failures on multicore/SMP builds
Vipul Harsh
09:44 AM Bug #1561: RDMA failures on multicore/SMP builds
The issue manifests in AMPI tests, but is not an AMPI issue as such. So, recategorizing and tagging Phil Miller
09:09 AM Bug #1568 (Merged): ckio failure on netlrts-linux
Phil Miller
09:08 AM Bug #1560 (Implemented): icc build fails on NASA Pleiades
https://charm.cs.illinois.edu/gerrit/2542 Phil Miller
08:26 PM Feature #1569 (Merged): Support the Flang Fortran compiler
Add Flang configurations to the build system: https://github.com/flang-compiler/flang Sam White

05/18/2017

03:37 PM Bug #1560: icc build fails on NASA Pleiades
The Intel 16 / GCC 6 combination errors at compile time. The configure output also shows error messages, but somehow ... Phil Miller
02:34 PM Bug #1560: icc build fails on NASA Pleiades
Notes on the errors encountered with different icc/gcc version matchups:
|_.gcc \ Intel |_.12 ...
Phil Miller
11:21 AM Bug #1568 (Implemented): ckio failure on netlrts-linux
https://charm.cs.illinois.edu/gerrit/2537 Phil Miller
08:29 AM Bug #1568 (Merged): ckio failure on netlrts-linux
Autobuild for netlrts-linux failed in the code added here: https://charm.cs.illinois.edu/autobuild/cur/netlrts-linux.txt Sam White

05/17/2017

05:45 PM Feature #1492: Remove need for +LBCommOff
Related: this patch allows '+LBCommOff' to avoid more of the overhead that comm stats collection entails: https://cha... Sam White
05:43 PM Bug #1514 (Merged): Throw a runtime error for registrations that occur after startup
Sam White
04:17 PM Bug #1564 (Implemented): Inline entry methods don't respect group dependence from CkEntryOptions
Phil Miller
02:59 PM Bug #1564 (Merged): Inline entry methods don't respect group dependence from CkEntryOptions
When the target object exists locally and delivery happens inline, the CkEntryOptions are ignored, even if it indicat... Phil Miller
04:17 PM Bug #1567 (Merged): [aggregate] entry methods should refuse to accept CkEntryOptions, since the c...
Without C++11 support, we can stick in... Phil Miller
04:10 PM Bug #1566 (Implemented): Parameter marshalled entry methods mostly don't set group dependence in ...
Phil Miller
04:08 PM Bug #1566 (Merged): Parameter marshalled entry methods mostly don't set group dependence in messa...
For parameter marshalled entry methods, charmxi only generated code to pull the group dependence from CkEntryOptions ... Phil Miller
03:55 PM Bug #1565 (Implemented): Non-group entry methods don't respect envelope group dependence
Phil Miller
03:37 PM Bug #1565 (Merged): Non-group entry methods don't respect envelope group dependence
Phil Miller
03:35 PM Feature #1417: Reduce CkReductionMsg envelope size
We could potentially have a boolean field indicating whether the CkReductionMsg is for a section, and only allocate the space for ... Sam White
02:52 PM Bug #1563 (Merged): Chare Array construction doesn't respect setGroupDepID in CkEntryOptions (or ...
@CProxy_ArrayFoo::ckNew@ accepts a @CkEntryOptions@ argument, but doesn't put it to sensible use.
When the underly...
Phil Miller
02:06 PM Bug #1509 (Merged): -tracemode summary always fails an assertion at exit
Phil Miller
08:51 PM Bug #1509 (Implemented): -tracemode summary always fails an assertion at exit
https://charm.cs.illinois.edu/gerrit/#/c/2527/ Ronak Buch
10:20 AM Bug #1561: RDMA failures on multicore/SMP builds
At least on netlrts-darwin-x86_64-smp, this does not crash for me when specifying ppn (current charm master, examples... Matthias Diener
09:21 AM Bug #1561: RDMA failures on multicore/SMP builds
Yes, we talked to Vipul after group meeting, and he is taking this issue over. For RDMA sends within a process, the m... Sam White
09:19 AM Bug #1561: RDMA failures on multicore/SMP builds
I suspect that the issue is that the RDMA code is doing packing/unpacking in something of the wrong place - when the ... Phil Miller
09:18 AM Bug #1561: RDMA failures on multicore/SMP builds
That started failing exactly the same as the multicore builds:
http://localhost:8080/job/Nightly-Build/label=linux,p...
Phil Miller
09:17 AM Bug #1561: RDMA failures on multicore/SMP builds
Indeed it does:
http://localhost:8080/job/Nightly-Build/label=linux,platform=mpi-linux-x86_64-smp/
(correct for por...
Phil Miller
08:52 AM Documentation #1219 (Merged): Update SDAG forall documentation
Sam White

05/16/2017

05:41 PM Feature #1353: charmc hardcodes unversioned compiler names
Unfortunately, I've found that using different compilers for building and linking can cause linking failures, so redu... William Throwe
09:22 AM Feature #1353: charmc hardcodes unversioned compiler names
Also, my apologies for the slow response to this. Phil Miller
09:21 AM Feature #1353: charmc hardcodes unversioned compiler names
A fix for this, and more generally to enable standard build configuration practices for Charm++, will be available th... Phil Miller
05:14 PM Bug #1509: -tracemode summary always fails an assertion at exit
OK, I've confirmed that this really did come about precisely with the commit changing the exit process, 67aa76d3b7e42... Phil Miller
04:03 PM Bug #1561: RDMA failures on multicore/SMP builds
The same failure should happen in SMP mode when running with ppn > 1. Sam White
01:56 PM Bug #1561: RDMA failures on multicore/SMP builds
git bisect shows that this commit causes the error:
https://charm.cs.illinois.edu/gerrit/#/c/2520/
Matthias Diener
01:17 PM Bug #1561: RDMA failures on multicore/SMP builds
K, doing a git bisect on './build AMPI multicore-darwin-x86_64 -j8 -g -O0' to find what commit broke this would be go... Sam White
01:12 PM Bug #1561: RDMA failures on multicore/SMP builds
(Note that the stack trace in message #8 was with RDMA enabled).
I think I did not run @make clean@ in the first t...
Matthias Diener
10:52 AM Bug #1561: RDMA failures on multicore/SMP builds
Whoa, what is SDAG doing in there?!
Can you try reverting this recently merged series of 3 commits (in which Eric ...
Sam White
10:47 AM Bug #1561: RDMA failures on multicore/SMP builds
Full stack trace with @-g -O0@, RDMA is enabled:... Matthias Diener
10:34 AM Bug #1561: RDMA failures on multicore/SMP builds
We still use the RDMA path for sends that are local (the sender and the receiver are on the same PE), so that's one pos... Sam White
10:28 AM Bug #1561: RDMA failures on multicore/SMP builds
Full stack frame just before the crash (this is with RDMA off, supposedly):... Matthias Diener
10:25 AM Bug #1561: RDMA failures on multicore/SMP builds
Weird, but megampi may be simpler to debug and is the first issue here.
That stack trace is the same one we saw be...
Sam White
10:19 AM Bug #1561: RDMA failures on multicore/SMP builds
GDB output:... Matthias Diener
10:15 AM Bug #1561: RDMA failures on multicore/SMP builds
On Darwin, compiling with @-DAMPI_RDMA_IMPL=0@ does not fix the crash. (Full build command: @./build AMPI multicore-d... Matthias Diener
12:37 PM Bug #1560: icc build fails on NASA Pleiades
If we don't catch this during configure and abort with a message explaining why it failed, I think we're basically as... Sam White
09:17 AM Bug #1560: icc build fails on NASA Pleiades
We ran into the exact same issue on some of the NERSC Cray systems.
We could maybe push a test that would trigger ...
Phil Miller
12:13 PM Documentation #1219 (Implemented): Update SDAG forall documentation
https://charm.cs.illinois.edu/gerrit/2526 Phil Miller
11:56 AM Bug #635: all trees should be pe/node/physnode/network topology aware
This is a potentially serious performance defect, not just something to tidy up. Phil Miller
11:53 AM Bug #635: all trees should be pe/node/physnode/network topology aware
Reductions are now at least SMP aware due to the fix for #1278. We still form a topology-oblivious tree over the node... Phil Miller
09:14 AM Charm-NG Feature #1562: Enable message allocation, construction, packing, etc, without generated .ci file ...
One key goal in any API evolution or redesign would be eliminating the need for the @-fno-lifetime-dse@ flag passed t... Phil Miller
09:12 AM Charm-NG Feature #1562 (New): Enable message allocation, construction, packing, etc, without generated .ci...
If we keep explicit message types around as Charm++'s API design moves forward, we need to address how they will be h... Phil Miller
09:08 AM Feature #1343: Let user-defined main() work for all execution environments
Revisiting this, are there situations where the desired outcome of this issue isn't satisfied?
Maybe @readonly@ var...
Phil Miller
08:46 AM Bug #1507 (Merged): ckio test failure on gni-crayxc
Phil Miller

05/15/2017

06:28 PM Bug #1561: RDMA failures on multicore/SMP builds
One potential source of this issue is the RDMA stuff that was recently merged in AMPI. You can build AMPI with '-DAMP... Sam White
08:15 AM Bug #1561 (Merged): RDMA failures on multicore/SMP builds
multicore builds for linux, darwin, and win all failed in tests/ampi/megampi/ with +p2 +vp2 Sam White
05:10 PM Bug #1509: -tracemode summary always fails an assertion at exit
Ping. Progress? Phil Miller
05:05 PM Bug #1507 (Implemented): ckio test failure on gni-crayxc
Underlying issue with the patch provided, given that it was failing after restart from a checkpoint, was that an arra... Phil Miller
03:53 PM Bug #1507 (In Progress): ckio test failure on gni-crayxc
I'm seeing issues with that patch on simple ChaNGa test runs. Working through them now. Phil Miller
12:05 PM Bug #1559: cpuaffinity.c build errors on Linux and Win64 with --enable-tracing
This https://charm.cs.illinois.edu/gerrit/#/c/2524/ should take care of all cpuaffinity errors, although I didn't exp... Juan Galvez
10:55 PM Bug #1560: icc build fails on NASA Pleiades
Yes, that is true on a few other systems too. AFAIK we decided to require at least gcc v4.4 headers for 6.8.0, and we... Sam White
10:20 PM Bug #1560: icc build fails on NASA Pleiades
That last line was a clue: it looks like the intel compiler depends on the g++ libraries, so a modern gcc has to be l... Thomas Quinn
08:42 PM Bug #1560 (Merged): icc build fails on NASA Pleiades
Building with
./build ChaNGa verbs-linux-x86_64 cuda smp icc -j8 --with-production
gives errors like:
../bin/charm...
Thomas Quinn

05/14/2017

11:37 AM Bug #1556 (Merged): AMPI Fortran bindings for MPI_STATUS(ES)_IGNORE are broken
Phil Miller
09:53 AM Feature #1352 (Merged): CkArrayOptions callback for completion of chare array initialization
Sam White
09:52 AM Bug #1558 (Merged): win64 debug build fails to build due to missing lrand48
Sam White
08:30 PM Bug #1559 (In Progress): cpuaffinity.c build errors on Linux and Win64 with --enable-tracing
This should fix some of the machine layers but possibly not all of them:
https://charm.cs.illinois.edu/gerrit/#/c/25...
Juan Galvez

05/13/2017

12:14 AM Bug #1559: cpuaffinity.c build errors on Linux and Win64 with --enable-tracing
Somewhat arbitrarily assigning to Juan only because he's touched cpuaffinity before and doesn't appear to have any ot... Sam White
10:41 PM Feature #1546: RDMA example with migration
I think it's important to have an SDAG + Migration + RDMA example/test, but up to you whether that is this issue or not Sam White

05/12/2017

05:20 PM Bug #1559 (Merged): cpuaffinity.c build errors on Linux and Win64 with --enable-tracing
Building charm --no-build-shared --enable-tracing --enable-tracing-commthread -optimize
On multicore-linux64-iccst...
Jim Phillips
05:02 PM Feature #1546: RDMA example with migration
The bug that spawned this request, #1539, has now been fixed. Is it still critical to have a new test/example that sp... Phil Miller
08:56 AM Feature #1546: RDMA example with migration
I believe Nitin modified the stencil load balancing to use RDMA, and that is pending on a fix for RDMA entry methods ... Sam White
04:59 PM Bug #1539 (Merged): Failure in migration when using RDMA sends in AMPI
Phil Miller
08:33 PM Bug #1539 (Implemented): Failure in migration when using RDMA sends in AMPI
Patch: https://charm.cs.illinois.edu/gerrit/#/c/2520/ Vipul Harsh
04:23 PM Bug #1558 (Implemented): win64 debug build fails to build due to missing lrand48
https://charm.cs.illinois.edu/gerrit/#/c/2522/ Sam White
01:42 PM Bug #1558: win64 debug build fails to build due to missing lrand48
The temporary fix that I have now is to replace @lrand48()@ with @rand()@ for WIN64 builds. Would that be sufficient?... Karthik Senthil
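One detail worth checking for the @rand()@ substitution: on Windows `RAND_MAX` is 32767, while POSIX `lrand48()` returns values in [0, 2^31). Whether the narrower range matters for randomized message queues is not stated here. The sketch below assumes it might and draws from the C++ `<random>` library on Windows instead. `portable_lrand48` is a hypothetical helper, not the patch that was merged.

```cpp
#include <cassert>
#include <cstdlib>
#include <random>

// Hypothetical helper: a stand-in for POSIX lrand48(), which returns a
// non-negative long uniformly distributed over [0, 2^31).
long portable_lrand48() {
#if defined(_WIN32)
  // Windows has no lrand48, and std::rand() there tops out at RAND_MAX (32767),
  // so draw from a standard engine over the full lrand48 range instead.
  static std::mt19937 engine(12345u);  // arbitrary fixed seed for this sketch
  static std::uniform_int_distribution<long> dist(0L, 2147483647L);
  return dist(engine);
#else
  return lrand48();  // POSIX systems declare lrand48 in <stdlib.h>
#endif
}
```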
12:43 PM Bug #1558: win64 debug build fails to build due to missing lrand48
Yes. My "debug" build options are "--no-build-shared --enable-randomized-msgq --with-prio-type=int --enable-error-ch... Jim Phillips
12:35 PM Bug #1558: win64 debug build fails to build due to missing lrand48
I think this is only on builds with randomized queues on Windows, but it still needs to be fixed for 6.8.0.
Assign...
Sam White
09:53 AM Bug #1558 (Merged): win64 debug build fails to build due to missing lrand48
... Jim Phillips
03:09 PM Bug #1556 (Implemented): AMPI Fortran bindings for MPI_STATUS(ES)_IGNORE are broken
https://charm.cs.illinois.edu/gerrit/#/c/2521/ Sam White
12:26 PM Bug #1556 (In Progress): AMPI Fortran bindings for MPI_STATUS(ES)_IGNORE are broken
Sam White
12:25 PM Documentation #1432 (Merged): Document CkLoop caller function
Sam White
12:25 PM Bug #1555 (Merged): converse segfaults processing msg whose handler has not been registered on th...
Sam White
12:24 PM Bug #833 (Merged): mpi smp build is locked to one core per node by default
Sam White

05/11/2017

06:27 PM Bug #1539: Failure in migration when using RDMA sends in AMPI
If that's all, I think the message just needs to have pack called on it before it gets forwarded, and unpack after it... Phil Miller
06:23 PM Bug #1539: Failure in migration when using RDMA sends in AMPI
I looked at the code and it never actually changes the pointers inside the rdma wrappers in the message, hence the rd... Vipul Harsh
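The idea behind pack-before-forward is that pointers stored inside a message must be turned into a position-independent form before the message is copied elsewhere, then restored at its new address. The sketch below shows that idea for a hypothetical `Msg` whose header points into its own payload. It is not the AMPI RDMA wrapper code, and the comments above do not say whether those wrapper pointers refer into the message itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical message whose header holds a pointer into its own payload.
struct Msg {
  unsigned char* data;  // absolute pointer while unpacked; offset while packed
  std::size_t len;
  unsigned char payload[64];
};

// pack: store the pointer as an offset from the start of the message, so the
// bytes stay meaningful after the message is copied to another address.
void pack(Msg* m) {
  std::ptrdiff_t off = m->data - reinterpret_cast<unsigned char*>(m);
  m->data = reinterpret_cast<unsigned char*>(static_cast<std::uintptr_t>(off));
}

// unpack: rebase the stored offset onto the message's current address.
void unpack(Msg* m) {
  std::uintptr_t off = reinterpret_cast<std::uintptr_t>(m->data);
  m->data = reinterpret_cast<unsigned char*>(m) + off;
}
```

If the message were copied without the `pack` call, the copy's `data` would still point into the original buffer, which is the kind of stale pointer described in the comment above.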
05:10 PM Bug #1507: ckio test failure on gni-crayxc
It will take me a little while to reproduce my problem, since it usually happens after restarting from a checkpoint.
...
Thomas Quinn
04:40 PM Bug #1507 (Implemented): ckio test failure on gni-crayxc
https://charm.cs.illinois.edu/gerrit/2519
Tom, if you're still seeing an issue here, could you try the above ...
Phil Miller
02:38 PM Bug #1507 (In Progress): ckio test failure on gni-crayxc
Phil Miller
02:36 PM Bug #1507: ckio test failure on gni-crayxc
Looks like the issue is that a message referencing the newly-constructed write session is reaching PEs other than 0 b... Phil Miller
04:21 PM Feature #1352 (Implemented): CkArrayOptions callback for completion of chare array initialization
Hackishly re-using the reduction manager's spanning tree now. Phil Miller
02:38 PM Feature #1352: CkArrayOptions callback for completion of chare array initialization
It looks like there's another use case for this outside AMPI - CkIO. Phil Miller
03:05 PM Bug #1557 (New): AMPI bindings for C-Fortran interop are incomplete
AMPI is missing definitions for MPI_F_STATUS(ES)_IGNORE and MPI_Status_f2c and MPI_Status_c2f. There may be other sim... Sam White
03:04 PM Bug #1556 (Merged): AMPI Fortran bindings for MPI_STATUS(ES)_IGNORE are broken
MPI_STATUS_IGNORE and MPI_STATUSES_IGNORE are both declared as arrays of 8 integers in ampif.h while the C++ code in ... Sam White
12:39 PM Bug #1555 (Implemented): converse segfaults processing msg whose handler has not been registered ...
Fix here: https://charm.cs.illinois.edu/gerrit/#/c/2517/
Juan Galvez
11:50 AM Bug #1555 (Merged): converse segfaults processing msg whose handler has not been registered on th...
No error is printed even with error checking enabled.
So, at least, with error checking there should be an explici...
Juan Galvez
10:04 AM Bug #1275: DistributedLB: Objects not migrating after strategy runs
I have updated the gerrit patch https://charm.cs.illinois.edu/gerrit/#/c/1951/ with Harshitha's fix from her branch. ... Kavitha Chandrasekar
09:13 PM Bug #1514 (Implemented): Throw a runtime error for registrations that occur after startup
Seonmyeong Bak

05/10/2017

06:01 PM Bug #833 (Implemented): mpi smp build is locked to one core per node by default
Posted a new patch in gerrit.
Should work on all architectures, including Cray because it does not rely on the Net...
Juan Galvez
03:55 PM Bug #1514: Throw a runtime error for registrations that occur after startup
After the initialization is done, calling the templated entry method in an uninstantiated form leads to CkAbort. ... Seonmyeong Bak
02:40 AM Bug #1514: Throw a runtime error for registrations that occur after startup
https://charm.cs.illinois.edu/gerrit/#/c/2510/ Seonmyeong Bak
01:38 AM Bug #1553 (Merged): Support for sdag entry method with rdma parameter
I tried adding rdma functionality to the receiveGhosts method in examples/charm++/load_balancing/stencil3d. The closu... Nitin Bhat

05/09/2017

05:37 PM Bug #1547 (In Progress): Deprecate the FFT library in ck-libs in favor of Nikhil's new FFT library
There are a few complications to accomplishing this in that Nikhil's FFT library is not a drop-in replacement for the... Eric Bohm
04:59 PM Bug #647 (Merged): Make MeshStreamer classes [migratable] to support checkpoint/restart
Phil Miller
03:17 PM Bug #647 (Implemented): Make MeshStreamer classes [migratable] to support checkpoint/restart
Sam White
04:58 PM Bug #854 (Merged): RRMap broken for >1D chare arrays
Phil Miller
03:18 PM Bug #854 (Implemented): RRMap broken for >1D chare arrays
Sam White
04:09 PM Feature #1352: CkArrayOptions callback for completion of chare array initialization
We decided that since getting a proper reduction done inside CkArray will be ugly and doing all-to-one pt2pt sends wi... Sam White
03:52 PM Feature #1468: Enable pre-pinning memory for the zero-copy message sends through the Entry Method...
The automatic caching approach could go in 6.8.1, but an explicit API would have to be the next feature release. Phil Miller
03:50 PM Feature #1394: Node-level message aggregation for CkMulticast
This won't be an API change, AFAICT, so it could be done in a patch release. Phil Miller
03:43 PM Bug #1539: Failure in migration when using RDMA sends in AMPI
To reproduce this, do './build AMPI mpi-linux-x86_64 -g -O0' then 'make test' in examples/ampi/Cjacobi3D/.
Basical...
Sam White
03:18 PM Bug #833 (In Progress): mpi smp build is locked to one core per node by default
Sam White
02:00 PM Feature #1551: Better support for AMPI/Projections with multiple virtual ranks
Related issues:
https://charm.cs.illinois.edu/redmine/issues/1005
https://charm.cs.illinois.edu/redmine/issues/1524
Sam White
09:19 PM Bug #1540 (Merged): Memory leaks in RDMA
Phil Miller
07:28 PM Feature #1459: Zero-copy send support for the netlrts machine layer
We could do the packetization in a set aside buffer that we copy the user's data through as we send it. The key is to... Phil Miller
07:18 PM Feature #1459 (In Progress): Zero-copy send support for the netlrts machine layer
The current netlrts layer (UDP) in machine-eth.c sends a Datagram header with every packet it sends.
For every pack...
Nitin Bhat

05/08/2017

06:14 PM Feature #1551 (In Progress): Better support for AMPI/Projections with multiple virtual ranks
Two WIP patches:
- https://charm.cs.illinois.edu/gerrit/2503 (Projections)
- https://charm.cs.illinois.edu/gerrit/2...
Matthias Diener
05:20 PM Bug #1540: Memory leaks in RDMA
That fixes the memory leaks in the lower layer implementations.
In the rdma example too, there was a memory leak ...
Nitin Bhat
05:02 PM Bug #1540: Memory leaks in RDMA
Is that fix addressing all of the known leaks? If so, this can be marked Merged Phil Miller
05:05 PM Bug #887 (Closed): Investigate initialization of NullLB WRT thread safety
Phil Miller

05/07/2017

03:17 PM Bug #1540 (Implemented): Memory leaks in RDMA
Fix for leaks in the machine layer implementations (PAMI, Verbs, MPI): https://charm.cs.illinois.edu/gerrit/#/c/2461/... Nitin Bhat
07:03 PM Feature #1088: Trace MPI_ functions in AMPI
For the second issue (overlapping events), I created a new bug (#1551). Matthias Diener
07:02 PM Feature #1551 (Merged): Better support for AMPI/Projections with multiple virtual ranks
Since the tracing framework has no idea about virtual AMPI ranks (specified with +vp), bracketed user events seem to ... Matthias Diener

05/05/2017

03:43 PM Bug #1360: AMPI megampi test fails on mpi-crayxc and darwin builds
Patch is here:
https://charm.cs.illinois.edu/gerrit/#/c/2201/
Matthias Diener
01:52 PM Bug #887 (Resolved): Investigate initialization of NullLB WRT thread safety
Knowing there's synchronization protecting this fully seems to be sufficient. If others agree, this can be closed.
...
Phil Miller
01:43 PM Bug #887: Investigate initialization of NullLB WRT thread safety
When Group constructors are invoked on PE 0, broadcast messages are sent to other PEs to create the Group's chares on... Kavitha Chandrasekar

05/04/2017

05:13 PM Bug #1379 (Merged): SDAG doesn't properly handle callbacks to [reductiontarget] methods with refnums
Sam White
02:32 PM Bug #1379 (Implemented): SDAG doesn't properly handle callbacks to [reductiontarget] methods with...
Was a quick fix. The error was just a code gen issue that split an assignment. Fix posted https://charm.cs.illinois.e... Eric Mikida
10:46 AM Bug #1379 (In Progress): SDAG doesn't properly handle callbacks to [reductiontarget] methods with...
See Sam's comments in Gerrit about breaking the Bigsim build Phil Miller
05:13 PM Bug #1408 (Merged): Improve visibility and usability of flushTraceLog()
Sam White
07:04 PM Bug #1408 (Implemented): Improve visibility and usability of flushTraceLog()
https://charm.cs.illinois.edu/gerrit/#/c/2495/ Ronak Buch
05:12 PM Bug #1421 (Merged): Running leanmd with error checking enabled in Charm++ triggers assertion erro...
Sam White
03:35 PM Documentation #1251 (Merged): Document shrink/expand in the manual
Bilge Acun
12:13 PM Bug #833: mpi smp build is locked to one core per node by default
Code hangs on Blue Waters with multiple processes per node.
Problem is, the first process on the node doesn't rece...
Juan Galvez
11:32 AM Bug #887: Investigate initialization of NullLB WRT thread safety
Yes, we did talk about deferring the bug. I will take a look at it today and if I cannot simulate the race, I will up... Kavitha Chandrasekar
10:53 AM Bug #887: Investigate initialization of NullLB WRT thread safety
I think the conclusion in the Core meeting was that this could be deferred due to limited impact and low likelihood o... Phil Miller
11:04 AM Bug #1545: Serialize std::vector with Custom Allocator
We're not going to require any more C++11 support in a patch/bug-fix release like 6.8.1 than we do in the feature rel... Phil Miller
10:51 AM Cleanup #1550 (New): Missing 'make test' for examples/charm++/state_space_searchengine/
Phil Miller
10:49 AM Bug #1533 (Merged): State Space Search Examples Broken
Phil Miller
09:32 AM Feature #1549 (New): record priorities in traces and use in projections
One common cause of performance issues is wrong or missing priorities. Since tracing already records most of the inf... Jim Phillips
09:24 AM Bug #1539: Failure in migration when using RDMA sends in AMPI
I think I've found the problem, though I have no idea why it's happening: RDMA messages are *not* being forwarded to ... Sam White
07:46 AM Feature #1546: RDMA example with migration
This is really needed now to help debug current and future issues with RDMA before 6.8.0 Sam White

05/03/2017

02:43 PM Documentation #1251 (Implemented): Document shrink/expand in the manual
https://charm.cs.illinois.edu/gerrit/#/c/2492/ Bilge Acun
02:26 PM Bug #1533 (Implemented): State Space Search Examples Broken
The mainchare class was not inheriting from CBase_Main. Fix here: https://charm.cs.illinois.edu/gerrit/#/c/2489/ Sam White
11:45 AM Bug #1531 (Merged): Main (scheduler) thread can suspend, but that confuses QD, where other thread...
Phil Miller
11:45 AM Support #1548 (New): Reassess whether the primary scheduler thread should support CthSuspend
Phil Miller
11:42 AM Feature #1088: Trace MPI_ functions in AMPI
Also, clean up heap memory allocated for the funcmap. I made a half-baked attempt at that here but it has issues note... Sam White
10:27 AM Feature #1088: Trace MPI_ functions in AMPI
Another thing to followup:
- MPI_Finalize() does not get traced, because the thread gets only suspended, and the TCH...
Matthias Diener
08:53 PM Feature #1088 (Merged): Trace MPI_ functions in AMPI
2 things to follow up on:
1. Use unordered_map instead of map (use tr1::unordered_map if CMK_USING_XLC before 6.8.0)...
Sam White
10:44 PM Bug #1545: Serialize std::vector with Custom Allocator
That is a good question... This probably depends on the allocator itself. For some stateful allocators it'll make sen... Nils Deppe

05/02/2017

05:29 PM Bug #1547 (Merged): Deprecate the FFT library in ck-libs in favor of Nikhil's new FFT library
Right now the libraries manual still points to the in-tree version of the FFT library. Anyone using this library shou... Eric Mikida
05:27 PM Bug #1379 (Merged): SDAG doesn't properly handle callbacks to [reductiontarget] methods with refnums
Eric Mikida
02:39 PM Bug #1379: SDAG doesn't properly handle callbacks to [reductiontarget] methods with refnums
Updated the patch so it can now handle all types of reductions by using a templated method for setting the refnum of ... Eric Mikida
04:18 PM Bug #1531: Main (scheduler) thread can suspend, but that confuses QD, where other thread suspends...
Decision in Core was to go ahead with fixing this for now. After the release, we'll re-assess the 'cute trick' giving... Phil Miller
04:05 PM Bug #78: AMPI failure with migration under Cray compiler due to tcmalloc bugs or incompatibility
This is very low priority since it is A) AMPI-specific and B) CrayCC-specific. Sam White
04:03 PM Bug #1540 (In Progress): Memory leaks in RDMA
Nitin will investigate the same leak I fixed in the MPI layer, in the PAMILRTS and Verbs layers. Sam White
04:00 PM Feature #1546 (Merged): RDMA example with migration
The failures in AMPI RDMA have raised concern that there is something bad going on when using RDMA + migration, even ... Sam White
03:46 PM Bug #1509 (In Progress): -tracemode summary always fails an assertion at exit
Ronak Buch
03:46 PM Bug #1408 (In Progress): Improve visibility and usability of flushTraceLog()
Ronak Buch
03:35 PM Bug #1541 (Merged): Fix ambiguous explanations of Ctv, Cpv, and Csv variables
Phil Miller
03:35 PM Bug #1543 (Merged): Memory leaks in asynchronous array creation
Phil Miller
12:43 PM Bug #647: Make MeshStreamer classes [migratable] to support checkpoint/restart
@MeshStreamer@ base class and @GroupMeshStreamer@ are done. Currently working on @ArrayMeshStreamer@ and @GroupChunkM... Karthik Senthil
11:17 AM Bug #1539: Failure in migration when using RDMA sends in AMPI
The culprit is the AMPI RDMA patch (below) on mpi-linux-x86_64:
9349919 AMPI #1111: avoid sender-side copy for lar...
Sam White
11:13 AM Bug #1539: Failure in migration when using RDMA sends in AMPI
The range of commits that appear in the time window in question and look like they could be the culprit is as follows... Phil Miller
11:11 AM Bug #1539: Failure in migration when using RDMA sends in AMPI
Never mind, the issue seems to be different on mpi-darwin-x86_64, and not being able to use ++debug on MPI builds hurt... Sam White
11:14 AM Bug #854: RRMap broken for >1D chare arrays
Note that the patch implements this for RRMap, and adds the mechanism to do it in BlockLB, but I think BlockLB will n... Sam White
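The log doesn't say how the patch for #854 handles multidimensional indices. One plausible approach is to linearize the index in row-major order and then take it modulo the number of PEs, so elements along every dimension get spread across PEs. This is a sketch under that assumption, using a hypothetical `rrProcForIndex` helper rather than Charm++'s actual `RRMap` code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not Charm++'s RRMap): map an N-dimensional chare array
// index to a PE in round-robin fashion. Each idx[d] must lie in [0, dims[d]).
int rrProcForIndex(const std::vector<int>& idx, const std::vector<int>& dims, int numPes) {
  if (idx.size() != dims.size() || idx.empty() || numPes <= 0)
    throw std::invalid_argument("index/dimension mismatch or no PEs");
  long long linear = 0;
  for (std::size_t d = 0; d < dims.size(); ++d) {
    if (idx[d] < 0 || idx[d] >= dims[d])
      throw std::out_of_range("index component outside its dimension");
    linear = linear * dims[d] + idx[d];  // row-major: last dimension varies fastest
  }
  // Consecutive linear positions land on consecutive PEs.
  return static_cast<int>(linear % numPes);
}
```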
11:07 AM Bug #1514 (In Progress): Throw a runtime error for registrations that occur after startup
Seonmyeong Bak
10:52 AM Bug #1421: Running leanmd with error checking enabled in Charm++ triggers assertion error in lbdb.h
Proposed fix here: https://charm.cs.illinois.edu/gerrit/2460 Phil Miller
10:49 AM Bug #1545: Serialize std::vector with Custom Allocator
For the moment, I'm going to target this at 6.8.1, since I don't think we're willing to hold the release to get this ... Phil Miller
10:48 AM Bug #1545: Serialize std::vector with Custom Allocator
I think we could do this, but I'm concerned about handling of instances in which the allocator isn't simply stateless... Phil Miller

05/01/2017

05:57 PM Bug #1514: Throw a runtime error for registrations that occur after startup
I think that's just a symptom of the example program I gave you. The call to the entry method that wasn't instantiate... Eric Mikida
05:17 PM Bug #1514: Throw a runtime error for registrations that occur after startup
This issue doesn't happen on the multicore version. Even without instantiation of templates in the .ci files, the application i... Seonmyeong Bak
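The check requested in #1514 can be pictured as a registry that is sealed once startup completes, so that a late registration (for example, from a template entry method that was never instantiated in the .ci file) fails loudly instead of misbehaving later. This is a minimal sketch with a hypothetical `Registry` class, not Charm++'s actual registration code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical registry that rejects registrations after startup is complete.
class Registry {
 public:
  // Returns the index assigned to the newly registered name.
  int add(const std::string& name) {
    if (sealed_)
      throw std::runtime_error("registration of '" + name + "' after startup");
    names_.push_back(name);
    return static_cast<int>(names_.size()) - 1;
  }
  // Called once initialization has finished; later add() calls throw.
  void seal() { sealed_ = true; }
  std::size_t size() const { return names_.size(); }

 private:
  bool sealed_ = false;
  std::vector<std::string> names_;
};
```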
05:13 PM Bug #1545 (New): Serialize std::vector with Custom Allocator
It should be possible to serialize std::vector that has a custom allocator. What works for me right now is changing a... Nils Deppe
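Accepting a `std::vector` with a custom allocator mostly means templating the serialization routine on the allocator type as well as the element type. The sketch below uses hypothetical `Packer`/`Unpacker` byte buffers in place of Charm++'s `PUP::er`, restricts itself to trivially copyable elements, and assumes a default-constructible, stateless allocator (`PlainAlloc`, also hypothetical); the stateful-allocator case Phil raises is not handled.

```cpp
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>
#include <vector>

// Minimal stateless allocator, standing in for a user's custom allocator.
template <typename T>
struct PlainAlloc {
  using value_type = T;
  PlainAlloc() = default;
  template <typename U>
  PlainAlloc(const PlainAlloc<U>&) {}
  T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
  void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const PlainAlloc<T>&, const PlainAlloc<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const PlainAlloc<T>&, const PlainAlloc<U>&) { return false; }

// Hypothetical byte-buffer serializers standing in for PUP::er.
struct Packer {
  std::vector<unsigned char> buf;
  void bytes(const void* p, std::size_t n) {
    const unsigned char* c = static_cast<const unsigned char*>(p);
    buf.insert(buf.end(), c, c + n);
  }
};

struct Unpacker {
  explicit Unpacker(const std::vector<unsigned char>& b) : buf(b) {}
  void bytes(void* p, std::size_t n) {
    std::memcpy(p, buf.data() + pos, n);
    pos += n;
  }
  const std::vector<unsigned char>& buf;
  std::size_t pos = 0;
};

// Templated on Alloc as well as T, so std::vector<T, MyAlloc<T>> is accepted.
template <typename T, typename Alloc>
void pack(Packer& p, const std::vector<T, Alloc>& v) {
  static_assert(std::is_trivially_copyable<T>::value, "sketch handles trivially copyable T only");
  std::size_t n = v.size();
  p.bytes(&n, sizeof n);                       // element count first
  if (n) p.bytes(v.data(), n * sizeof(T));     // then the contiguous payload
}

template <typename T, typename Alloc>
void unpack(Unpacker& u, std::vector<T, Alloc>& v) {
  static_assert(std::is_trivially_copyable<T>::value, "sketch handles trivially copyable T only");
  std::size_t n = 0;
  u.bytes(&n, sizeof n);
  v.resize(n);                                 // storage comes from v's own allocator
  if (n) u.bytes(v.data(), n * sizeof(T));
}
```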
04:52 PM Bug #1539: Failure in migration when using RDMA sends in AMPI
This is not related to AMPI's use of RDMA at all, and can be reproduced on commits before the AMPI RDMA patch was mer... Sam White
02:05 PM Bug #1379 (Implemented): SDAG doesn't properly handle callbacks to [reductiontarget] methods with...
A solution is implemented in https://charm.cs.illinois.edu/gerrit/#/c/2484/
As mentioned in the commit message, ca...
Eric Mikida
08:30 PM Feature #1494: Broadcast trees are not topology-aware
This would still be great to have for 6.8.0 if you/Juan don't have any other 6.8.0 bugs. Sam White

04/30/2017

02:38 PM Bug #1544 (Merged): CMK_TIMER_USE_PPC64 inaccurate with variable clock speeds
The pami-linux-ppc64le machine layer is defaulting to CMK_TIMER_USE_PPC64
src/arch/pami-linux-ppc64le/conv-mach.h:...
Jim Phillips

04/29/2017

07:29 PM Bug #1539: Failure in migration when using RDMA sends in AMPI
Still no real diagnosis of the underlying problem here...... Sam White
07:01 PM Bug #1421 (Implemented): Running leanmd with error checking enabled in Charm++ triggers assertion...
Sam White

04/28/2017

06:24 PM Bug #1540 (Implemented): Memory leaks in RDMA
Updated documentation & example program: https://charm.cs.illinois.edu/gerrit/#/c/2469/
Update AMPI RDMA use: http...
Sam White
02:59 PM Bug #1540: Memory leaks in RDMA
Yeah, it should be a regular Charm message but the documentation says "It should be noted that the received CkDataMsg... Sam White
02:57 PM Bug #1540: Memory leaks in RDMA
The documentation and examples may need to be more explicit about disposing of the message, though. Phil Miller
02:56 PM Bug #1540: Memory leaks in RDMA
This is just delivering a message, like any other, right? If so, the user can choose to manually delete the message, ... Phil Miller
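The ownership rule discussed for #1540 is that the callback receives an ordinary heap-allocated message, so the handler owns it and must delete it (or pass it on) to avoid a leak. The sketch models that with a hypothetical `DataMsg` type and a live-object counter; it does not use Charm++'s `CkDataMsg` or `CkCallback` API.

```cpp
#include <memory>

// Hypothetical stand-in for a runtime-allocated callback message.
struct DataMsg {
  static int live;  // number of DataMsg objects currently allocated
  void* payload;
  explicit DataMsg(void* p) : payload(p) { ++live; }
  ~DataMsg() { --live; }
  DataMsg(const DataMsg&) = delete;
  DataMsg& operator=(const DataMsg&) = delete;
};
int DataMsg::live = 0;

// The runtime allocates the message and transfers ownership to the handler.
// Wrapping the raw pointer in unique_ptr makes the delete automatic, even on early return.
void onRdmaComplete(DataMsg* raw, int& completions) {
  std::unique_ptr<DataMsg> msg(raw);  // handler now owns the message
  if (msg->payload != nullptr) ++completions;
}  // msg is deleted here; without this delete, the message would leak

// Simulates the runtime invoking the callback.
void deliver(void* payload, int& completions) {
  onRdmaComplete(new DataMsg(payload), completions);
}
```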
02:39 PM Bug #1540: Memory leaks in RDMA
We didn't really work out the semantics of the callback function. But I think it would be better to have the charm run... Nitin Bhat
01:34 PM Bug #1540 (In Progress): Memory leaks in RDMA
Fix for the leak in the MPI layer's RDMA implementation: https://charm.cs.illinois.edu/gerrit/#/c/2461/
Can Nitin/...
Sam White
12:12 PM Bug #1540: Memory leaks in RDMA
The first one above doesn't happen on multicore because multicore doesn't actually ever call CkRdmaIssueRgets... Sam White
12:02 PM Bug #1540 (Merged): Memory leaks in RDMA
Running valgrind on examples/charm++/rdma/ you see two memory leaks on at least the mpi-linux-x86_64 and multicore-li... Sam White
06:15 PM Bug #1541 (Resolved): Fix ambiguous explanations of Ctv, Cpv, and Csv variables
https://charm.cs.illinois.edu/gerrit/#/c/2468/ Sam White
03:33 PM Bug #1541 (Merged): Fix ambiguous explanations of Ctv, Cpv, and Csv variables
The explanation of Ctv variables is ambiguous in the comments of the source code and in the manual.
Thus 2 fixes r...
Jaemin Choi
05:34 PM Bug #1379 (In Progress): SDAG doesn't properly handle callbacks to [reductiontarget] methods with...
Current issue:
Upon receiving a message, generated code creates a closure for that message, which in the case of mar...
Eric Mikida
03:20 PM Bug #1379: SDAG doesn't properly handle callbacks to [reductiontarget] methods with refnums
So the change proposed by Phil is easy to implement, but I'm worried about what it might break...
At the point in th...
Eric Mikida
05:25 PM Bug #1543 (Implemented): Memory leaks in asynchronous array creation
https://charm.cs.illinois.edu/gerrit/#/c/2465/ Sam White
05:17 PM Bug #1543: Memory leaks in asynchronous array creation
Could you run that with the option @--track-origins=yes@ (IIRC) to get fuller detail on the leak event? Phil Miller
05:06 PM Bug #1543 (Merged): Memory leaks in asynchronous array creation
AMPI uses the asynchronous array creation API to create split communicators.
Running valgrind on megampi with 4 VPs gives ...
Sam White
05:23 PM Bug #1542: CkArrayCreated callback should be part of CkArrayOptions
Design-wise, I agree with you, it probably should be (have been) part of CkArrayOptions.
We can add the new versio...
Phil Miller
03:54 PM Bug #1542 (New): CkArrayCreated callback should be part of CkArrayOptions
The asynchronous array creation API added an optional parameter after CkArrayOptions to ckNew(). I think this callbac... Sam White
12:01 PM Bug #1539: Failure in migration when using RDMA sends in AMPI
The second leak above also happens on multicore builds... opened a separate issue for memory leaks in RDMA: https://ch... Sam White
10:17 AM Bug #1539: Failure in migration when using RDMA sends in AMPI
Even if I run with NullLB, these two leaks still show up in Valgrind for an mpi-linux-x86_64 build (I haven't tried va... Sam White
09:39 AM Bug #1539 (Merged): Failure in migration when using RDMA sends in AMPI
Several autobuild targets segfaulted in ampi/Cjacobi3D last night after AMPI RDMA was merged: mpi-linux-x86_64, mpi-l... Sam White
11:42 AM Bug #1507: ckio test failure on gni-crayxc
I've been having a CkIO failure on a ChaNGa production run on Blue Waters that may be related to this. The symptom i... Thomas Quinn
08:38 AM Bug #647 (In Progress): Make MeshStreamer classes [migratable] to support checkpoint/restart
https://charm.cs.illinois.edu/gerrit/#/c/2454/ Sam White
11:08 PM Bug #1530 (Merged): Isomalloc on SMP mode always prints warning about +isomalloc_sync, even when ...
Sam White
10:12 PM Documentation #1516 (Merged): Document the Statistics reducer
https://charm.cs.illinois.edu/gerrit/#/c/2445/ Sam White
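The log only links the documentation patch for #1516. As an illustration, and assuming a statistics reduction carries a count, mean, and sum of squared deviations (M2) per contributor, the standard pairwise update of Chan et al. merges two partial results as shown below. `Stats` and `combine` are hypothetical names, not the Charm++ API.

```cpp
#include <cstdint>

// Hypothetical per-contributor summary: count, running mean, and sum of squared deviations.
struct Stats {
  std::int64_t count = 0;
  double mean = 0.0;
  double m2 = 0.0;  // sum of squared deviations from the mean

  static Stats of(double x) { Stats s; s.count = 1; s.mean = x; return s; }
  double sampleVariance() const { return count > 1 ? m2 / (count - 1) : 0.0; }
};

// Pairwise merge (Chan et al.). It is mathematically associative, so partial
// results can be combined in whatever shape the reduction tree has.
Stats combine(const Stats& a, const Stats& b) {
  if (a.count == 0) return b;
  if (b.count == 0) return a;
  Stats r;
  r.count = a.count + b.count;
  const double n = static_cast<double>(r.count);
  const double delta = b.mean - a.mean;
  r.mean = a.mean + delta * static_cast<double>(b.count) / n;
  r.m2 = a.m2 + b.m2 +
         delta * delta * static_cast<double>(a.count) * static_cast<double>(b.count) / n;
  return r;
}
```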
10:00 PM Bug #1537 (Merged): Support for Shrink/Expand in 6.8.0
Sam White
10:00 PM Bug #1526 (Merged): Unused Variable impl_obj in _call_ArchiveChare_CkMigrateMessage
Sam White
09:59 PM Bug #1527 (Merged): Unused Parameter Warning in Entry Methods with no Parameters
Sam White
09:58 PM Bug #1163 (Merged): AMPI_Put should use targdisp from creation
Sam White
07:12 PM Bug #1528 (Merged): charmxi: Message type with no variable arrays can't be declared with { }
Ronak Buch

04/27/2017

06:51 PM Bug #1493 (Merged): Deleting an array also deletes all common elements from its bound arrays
Ronak Buch
06:51 PM Bug #1529 (Merged): Easy build-time option to elide LB support (cut tracing overhead, etc)
Ronak Buch
05:25 PM Feature #1111 (Merged): Avoid sender-side copy in AMPI for large contiguous messages
Related issues left to do:
- Use RDMA for MPI-3's MPI_Rput and MPI_Raccumulate routines.
- Use RDMA sends when usin...
Sam White
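Feature #1111's idea of avoiding the sender-side copy only for large contiguous messages amounts to a size-and-layout test before each send. The sketch below assumes a hypothetical cutoff constant (`kRdmaThreshold`) and a hypothetical `chooseSendPath` function; AMPI's actual threshold and code paths are not shown in this log.

```cpp
#include <cstddef>

// Hypothetical cutoff; the real value used by AMPI is not given here.
constexpr std::size_t kRdmaThreshold = 64 * 1024;

enum class SendPath { CopyIntoMessage, ZeroCopy };

// Small or non-contiguous sends are packed into a message buffer (one copy on the
// sender); large contiguous sends hand the user buffer to the RDMA path directly.
SendPath chooseSendPath(std::size_t bytes, bool contiguous) {
  if (contiguous && bytes >= kRdmaThreshold) return SendPath::ZeroCopy;
  return SendPath::CopyIntoMessage;
}
```

On the zero-copy path the sender's buffer has to stay valid and unmodified until the transfer completes, so the copy is only skipped where its cost is large enough to matter.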
04:56 PM Bug #1487 (Merged): Leaving -DGPU_MEMPOOl causes gpu manager to not build
Ronak Buch
04:56 PM Bug #1488 (Merged): GPU manager runs out of memory on talent
Ronak Buch
04:55 PM Bug #1506 (Merged): examples/hello/4darray breaks when doing sections
Ronak Buch
12:39 PM Bug #1506 (Implemented): examples/hello/4darray breaks when doing sections
Patch: https://charm.cs.illinois.edu/gerrit/#/c/2451/
The example wasn't using messages inherited from CkMcastBaseMs...
Vipul Harsh
04:46 PM Bug #1163 (Implemented): AMPI_Put should use targdisp from creation
https://charm.cs.illinois.edu/gerrit/#/c/2456/ Sam White
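In MPI one-sided communication, the target address of a put is the window base plus `target_disp` scaled by the `disp_unit` that the target process supplied when the window was created. Assuming #1163 refers to that scaling, the computation looks like the sketch below; `Window` and `targetAddress` are hypothetical names, not AMPI internals.

```cpp
#include <cstddef>

// Hypothetical record of what the target passed to MPI_Win_create.
struct Window {
  char* base;            // start of the exposed memory
  std::ptrdiff_t size;   // size of the window in bytes
  int dispUnit;          // displacement unit fixed at window creation
};

// MPI rule: target_addr = window_base + target_disp * disp_unit.
// Returns nullptr if the access would fall outside the window.
char* targetAddress(const Window& w, std::ptrdiff_t targetDisp, std::ptrdiff_t bytes) {
  const std::ptrdiff_t offset = targetDisp * w.dispUnit;
  if (offset < 0 || bytes < 0 || offset + bytes > w.size) return nullptr;
  return w.base + offset;
}
```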
04:15 PM Bug #1275: DistributedLB: Objects not migrating after strategy runs
It fixes only the 2-PE case. The migration issue will likely be fixed with https://charm.cs.illinois.edu/gerrit/#/c/1... Kavitha Chandrasekar
04:09 PM Bug #1275: DistributedLB: Objects not migrating after strategy runs
Does your patch fix this in general or just for the 2 PE case, Kavitha? Ronak Buch
04:08 PM Documentation #1398: Document addReducer's new option 'streamable'
The correct link for the patch is: https://charm.cs.illinois.edu/gerrit/#/c/2447/, Sam's was abandoned due to duplica... Ronak Buch
04:07 PM Documentation #1398 (Merged): Document addReducer's new option 'streamable'
Ronak Buch
04:07 PM Bug #1410 (Merged): Tuple reducer leaks memory when using set/concat/custom reducers
Ronak Buch
01:02 PM Bug #1531 (Implemented): Main (scheduler) thread can suspend, but that confuses QD, where other t...
https://charm.cs.illinois.edu/gerrit/2453 Phil Miller
12:51 PM Bug #1531: Main (scheduler) thread can suspend, but that confuses QD, where other thread suspends...
So, it looks like adjusting the 'standin' scheduler thread code to call CsdScheduleForever instead of CsdSchedulePoll... Phil Miller
12:27 PM Bug #1531: Main (scheduler) thread can suspend, but that confuses QD, where other thread suspends...
A little more detailed instrumentation:... Phil Miller
12:18 PM Bug #1531: Main (scheduler) thread can suspend, but that confuses QD, where other thread suspends...
Here's a very minimal reproduction case... Phil Miller
10:42 AM Bug #1531: Main (scheduler) thread can suspend, but that confuses QD, where other thread suspends...
Confirmed that a callback to resume the thread works fine. QD has an issue with this. Phil Miller
10:24 AM Bug #1538: Support Shrink/Expand in verbs
These seem to be the only patches that will need to be ported over from netlrts to verbs:
https://charm.cs.illinoi...
Sam White
08:52 AM Bug #1538 (New): Support Shrink/Expand in verbs
Shrink/expand was fully ported only to netlrts, never to verbs.
Necessary changes should be the same as netlrts,...
Bilge Acun
07:29 PM Bug #647: Make MeshStreamer classes [migratable] to support checkpoint/restart
I have completed adding PUP functions for all the involved classes in @VirtualRouter.h@ and for the @MeshStreamer@ ba... Karthik Senthil

04/26/2017

05:54 PM Bug #1493 (Implemented): Deleting an array also deletes all common elements from its bound arrays
The fix was very straightforward, and is implemented here: https://charm.cs.illinois.edu/gerrit/2448
The only issu...
Eric Mikida
05:14 PM Bug #1493: Deleting an array also deletes all common elements from its bound arrays
This bug looks to be due to the fact that ~CkMigratable() tries to access myRec in order to get the LBDB database for... Eric Mikida
05:06 PM Bug #1410: Tuple reducer leaks memory when using set/concat/custom reducers
Added documentation on reusing reduction msg memory in custom reducers: https://charm.cs.illinois.edu/gerrit/#/c/2446/ Sam White
04:22 PM Bug #1410 (Implemented): Tuple reducer leaks memory when using set/concat/custom reducers
https://charm.cs.illinois.edu/gerrit/#/c/2444/ Sam White
03:00 PM Bug #1410 (In Progress): Tuple reducer leaks memory when using set/concat/custom reducers
Sam White
01:41 PM Bug #1410: Tuple reducer leaks memory when using set/concat/custom reducers
Ah, I didn't think of that. We can indeed just check the returned pointer against the zeroth message passed into it, ... Sam White
12:05 PM Bug #1410: Tuple reducer leaks memory when using set/concat/custom reducers
Design 1 is also preferable (I think, if I understand correctly) because such a declaration would only have to appear... Phil Miller
11:54 AM Bug #1410: Tuple reducer leaks memory when using set/concat/custom reducers
It looks like the issue here is that all of the builtin reducers--except set and concat--reuse one of their input mes... Sam White
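Sam's comment explains that most built-in reducers return one of their input messages, while set and concat allocate a fresh result. A tuple reducer that runs sub-reducers therefore has to delete a sub-result separately only when it is not the reused input, which is the pointer check against the zeroth message suggested above. The sketch models this with hypothetical `Msg` and reducer types, and assumes a reducer returns either `msgs[0]` or a newly allocated message, never another input.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reduction message with a live-object counter for leak checking.
struct Msg {
  static int live;
  int value;
  explicit Msg(int v) : value(v) { ++live; }
  ~Msg() { --live; }
};
int Msg::live = 0;

using Reducer = Msg* (*)(std::vector<Msg*>& msgs);

// Like most built-ins: accumulate into msgs[0] and return it.
Msg* sumInPlace(std::vector<Msg*>& msgs) {
  for (std::size_t i = 1; i < msgs.size(); ++i) msgs[0]->value += msgs[i]->value;
  return msgs[0];
}

// Like set/concat: allocate a brand-new result message.
Msg* sumFresh(std::vector<Msg*>& msgs) {
  int total = 0;
  for (Msg* m : msgs) total += m->value;
  return new Msg(total);
}

// Runs a sub-reducer on non-empty temporary inputs, extracts the result, and
// frees every message exactly once: the result is deleted on its own only
// when it is not msgs[0], i.e. when the reducer allocated it.
int applyAndRelease(Reducer r, std::vector<Msg*>& msgs) {
  Msg* result = r(msgs);
  const int value = result->value;
  if (result != msgs[0]) delete result;
  for (Msg* m : msgs) delete m;
  msgs.clear();
  return value;
}
```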
05:06 PM Documentation #1398 (Implemented): Document addReducer's new option 'streamable'
https://charm.cs.illinois.edu/gerrit/#/c/2446/ Sam White
10:29 AM Bug #1507: ckio test failure on gni-crayxc
The interesting note from the case I looked into is that a sane array ID is returned into the proxy object, but the m... Phil Miller
10:27 AM Bug #1507: ckio test failure on gni-crayxc
I may have a simpler test case for this, or at least one that exhibits the same CmiAbort behavior. Phil Miller
09:02 AM Bug #1537 (Implemented): Support for Shrink/Expand in 6.8.0
https://charm.cs.illinois.edu/gerrit/#/c/2438/ Bilge Acun
 
