Activity

From 10/08/2017 to 11/06/2017

11/06/2017

04:51 PM Bug #1738 (New): Failure at LrtsInit with OFI build with verbs provider on Golub
When running 1darray hello example program on golub with the OFI build, I experienced the following error:... Jaemin Choi
03:24 PM Bug #903: ckexit with interop hangs sometimes
The example from the interop documentation (examples/charm++/user-driven-interop) also hangs in smp mode. Eric Mikida
03:23 PM Bug #903: ckexit with interop hangs sometimes
This issue, or a related one, is now coming up in Charades as well. I still need to explore more, but for me it's a han... Eric Mikida
12:17 PM Bug #872: smp standalone inconsistency
+ppn N, no +p -> N worker threads
+p N, no +ppn -> N worker threads
+ppn N +p N -> N worker threads
+ppn N +p M, M...
Jim Phillips
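The three complete rules in the comment above can be written down as a small lookup. This is only a sketch: `workerThreads` is a hypothetical helper, not part of charmrun or Charm++, and the fourth rule (`+ppn N +p M`) is cut off in the source, so it is reported as unknown rather than guessed.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper (an assumption for illustration, not a Charm++ API).
// Returns the worker-thread count for the three combinations that the
// comment above spells out in full.
std::optional<int> workerThreads(std::optional<int> ppn, std::optional<int> p) {
    if (ppn && !p) return *ppn;               // +ppn N, no +p   -> N worker threads
    if (p && !ppn) return *p;                 // +p N, no +ppn   -> N worker threads
    if (ppn && p && *ppn == *p) return *ppn;  // +ppn N +p N     -> N worker threads
    // +ppn N +p M with M != N is truncated in the source, and the case with
    // neither flag is not listed at all, so both are left undefined here.
    return std::nullopt;
}
```

Calling `workerThreads(4, std::nullopt)` or `workerThreads(4, 4)` yields 4, while `workerThreads(4, 8)` yields an empty optional.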
12:00 PM Bug #872: smp standalone inconsistency
What is the desired behavior? Evan Ramos
11:57 AM Support #1674: Add 'ofi' target to autobuild
I've been working on getting it running on golub, but there is an issue with using ++mpiexec, where if you set ppn in... Jaemin Choi
08:57 AM Support #1674: Add 'ofi' target to autobuild
Bump. Any reason to not add this on golub? Sam White
08:08 PM Bug #1572 (Merged): Improve pup_stl performance
Sam White
08:08 PM Bug #1716 (Merged): Add a strict configure test for C++11 compiler support
Sam White

11/04/2017

05:08 PM Cleanup #1314: Replace widespread dynamic allocated arrays with std::vector
Cleanup ckreduction.C: https://charm.cs.illinois.edu/gerrit/#/c/3234/
Cleanup the rest of ck-core (except CkSectionI...
Sam White
04:42 PM Feature #988: charmrun should not ignore +ppn
++ppn (two pluses) is working fine. Should +ppn (one plus) work in the same circumstances as ++ppn? Evan Ramos
12:54 PM Feature #988: charmrun should not ignore +ppn
Reassigning to Evan since he's been working on charmrun Sam White

11/03/2017

04:31 PM Bug #1724 (Implemented): Make BGQ builds default to using bgclang
Made bgclang the default compiler for all BGQ builds {pami/pamilrts/mpi}-bluegeneq: https://charm.cs.illinois.edu/ger... Sam White
02:45 PM Bug #1737 (New): tests/charm++/pingpong and examples/charm++/zerocopy/pingpong fail when run on 2...
Running on 2 processors: ./pgm
jsrun -n2 ./pgm
Choosing optimized barrier algorithm name I0:HybridBinomial:SHMEM:P...
Nitin Bhat
12:52 PM Bug #1694 (In Progress): Projections shows garbage for indices of 4d, 5d, 6d array elements
Can you take over 3218? It's failing in LDSend() currently. I think there might be another object in the LB Database ... Sam White
12:27 PM Cleanup #12: Factor out massive duplication in reductions
A little progress here: https://charm.cs.illinois.edu/gerrit/3224 Phil Miller
11:28 AM Bug #1448 (Merged): Potential buffer overflows in fscanf()
Matthias Diener
10:33 AM Bug #1736 (New): pami-linux-ppc64le-async-smp programs crash on Summitdev with an assertion failure
Build command: ./build LIBS pami-linux-ppc64le async smp --no-build-shared --disable-charmdebug --with-production -j2... Nitin Bhat
10:02 PM Feature #1657: CMA support for nocopy sends using the Entry Method API across processes on the sa...
Experimenting with different models has shown that CMA (Cross Memory Attach) is a good candidate for exploiting shm f... Nitin Bhat
09:12 PM Bug #1443: Serialization for std::unique_ptr Fails With Abstract Base Class
Hi Eric, that's great news! We use @Requires@ instead of @std::enable_if@ because upon failure it provides an "error ... Nils Deppe
08:44 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
I've tested the version with the problem above. The short-circuit now actually improves performance! Thomas Quinn

11/02/2017

05:33 PM Support #1725: Improve pup_stl testing
Just FYI, I've added some cases already for bug #1443, which should be pushed to gerrit soon. There is still definite... Eric Mikida
05:32 PM Bug #1443 (In Progress): Serialization for std::unique_ptr Fails With Abstract Base Class
Eric Mikida
05:32 PM Bug #1443: Serialization for std::unique_ptr Fails With Abstract Base Class
Nils, I've pulled in the code you linked, as well as your definition of Requires, which is used heavily in your stl p... Eric Mikida
04:38 PM Bug #1443: Serialization for std::unique_ptr Fails With Abstract Base Class
(formatted code) Phil Miller
05:07 PM Feature #1575: The OpenMP integration modified to run on Converse user-level threads
The main commit for this was merged, along with several follow-on fixes.
Looks like maybe all that's left relating...
Phil Miller
04:44 PM Bug #1726 (Merged): Bigsim autobuild failures in checkpoint/restart test
Phil Miller
03:51 PM Bug #1726: Bigsim autobuild failures in checkpoint/restart test
https://charm.cs.illinois.edu/gerrit/#/c/3220/ Phil Miller
03:50 PM Bug #1726: Bigsim autobuild failures in checkpoint/restart test
Looks like bgtest is happy with that patch, too. Phil Miller
03:47 PM Bug #1726 (Implemented): Bigsim autobuild failures in checkpoint/restart test
It looks like the fix for #1735 also solves this. I'll run the tests a few more times to gain a bit of confidence. Phil Miller
03:01 PM Bug #1726: Bigsim autobuild failures in checkpoint/restart test
This bug is not specific to the checkpoint/restart test. The hang happens when the number of Charm PEs does not divide th... Karthik Senthil
12:10 PM Bug #1726: Bigsim autobuild failures in checkpoint/restart test
Phil commented that git bisect shows this commit at fault: https://charm.cs.illinois.edu/gerrit/#/c/381/ Sam White
09:55 AM Bug #1726: Bigsim autobuild failures in checkpoint/restart test
Build system changes broke bigsim even worse for the past couple of days, but those have been fixed, so this is again showi... Sam White
04:44 PM Bug #1735 (Merged): Hang in syncfttest restart after fixing #537
Phil Miller
03:51 PM Bug #1735: Hang in syncfttest restart after fixing #537
https://charm.cs.illinois.edu/gerrit/#/c/3220/ Phil Miller
03:28 PM Bug #1735 (Implemented): Hang in syncfttest restart after fixing #537
Phil Miller
03:28 PM Bug #1735: Hang in syncfttest restart after fixing #537
In my testing so far, the fix is looking good. Will post it on Gerrit shortly. I need to tweak some of the tests so t... Phil Miller
03:21 PM Bug #1735 (In Progress): Hang in syncfttest restart after fixing #537
Ronak Buch
03:17 PM Bug #1735: Hang in syncfttest restart after fixing #537
The fix may actually be simple - replacing CmiBarrier with CmiNode(All)Barrier. Phil Miller
03:13 PM Bug #1735 (Merged): Hang in syncfttest restart after fixing #537
Per git bisect, the fix for issue #537 ( commit be7ee10917004cb13a8fd0c27fd0f026bf774c43 / change If277ed8110b41de323... Phil Miller
04:28 PM Bug #1715 (Merged): 20% slowdown in ChaNGa after commit 159fd36fc
Jaemin Choi
02:38 PM Bug #1715 (Implemented): 20% slowdown in ChaNGa after commit 159fd36fc
Sam White
01:41 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
Pushed a patch here: https://charm.cs.illinois.edu/gerrit/3219.
Jaemin ran some tests on golub using the verbs SMP...
Eric Mikida
03:39 PM Bug #1679: Do Not Require Default Constructors for Serializable Classes
An example program (and/or a fragment) would be useful to have in redmine to illustrate the feature. Laxmikant "Sanjay" Kale
12:28 PM Bug #1728: Darwin clang compilation fails without -D_DARWIN_C_SOURCE on Sierra
FYI, updating macports properly also allows a build targeted to gcc to work. So really the issue here was an incomplet... Eric Bohm
11:43 AM Bug #1728 (Rejected): Darwin clang compilation fails without -D_DARWIN_C_SOURCE on Sierra
I had revised the path to place a macports installed gcc first. So the most recent issue I hit was due to a problem... Eric Bohm
09:52 AM Bug #1728: Darwin clang compilation fails without -D_DARWIN_C_SOURCE on Sierra
Can you check config.log for hwloc? That will have more details of what went wrong Sam White
12:10 AM Bug #1375 (Merged): os-isomalloc failures during startup on SMP builds
Patch to make os-isomalloc the recommended and tested version of Isomalloc everywhere: https://charm.cs.illinois.edu/... Sam White
09:29 PM Bug #1694: Projections shows garbage for indices of 4d, 5d, 6d array elements
I happened to be looking at CkArrayIndex for issue #1065, realized it was missing support in places for >3 dimensiona... Sam White

11/01/2017

04:05 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
Yes, it looks like when things migrate, we are no longer catching 'multi-hop' messages and updating the sender. So wh... Eric Mikida
03:33 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
And one more data point: if I use "NullLB", tree building speeds up by a factor of 3! Thomas Quinn
03:15 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
I did make a "comm" projections plot, and I think I am seeing what Sanjay is suggesting: a message goes to a comm thr... Thomas Quinn
03:02 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
We think we have narrowed down the issue. Should have more info and possibly a patch to test by end of day tomorrow. Eric Mikida
10:31 AM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
Can someone upload the two gprof outputs to compare?
(the pngs for com are not viewable on my mac.. but I don't have "...
Laxmikant "Sanjay" Kale
03:09 PM Support #1732 (New): Add CUDA as an option in the build script
Jaemin Choi
02:50 PM Bug #1523: Verbs RDMA send fails on 0-byte sized message
I'll test this on golub, which is the only verbs machine I have access to. Jaemin Choi
02:46 PM Feature #952: Update AMPI's version of ROMIO
Pushing to 6.9.1 Matthias Diener
02:45 PM Bug #1464: CUDA example programs hang when run with 1 PE
Low priority, as the default mode of execution is CUDA events and not CUDA callbacks.
Will look into this issue agai...
Jaemin Choi
09:58 AM Feature #1569 (Merged): Support the Flang Fortran compiler
Sam White
09:22 AM Support #1674: Add 'ofi' target to autobuild
Is there a problem with ofi on golub? This needs to get done Sam White
08:11 PM Bug #1728: Darwin clang compilation fails without -D_DARWIN_C_SOURCE on Sierra
Hmm, that's strange. I just did a fresh checkout on High Sierra 10.13.1/Xcode 9.1, and charm++ compiles without proble... Matthias Diener

10/31/2017

05:15 PM Bug #1325: AMPI programs fail to link with Isomalloc heaps
We aren't currently getting Isomalloc tests run on autobuild on Darwin, because the linker there doesn't like -Wl,--a... Sam White
05:04 PM Bug #1728: Darwin clang compilation fails without -D_DARWIN_C_SOURCE on Sierra
Updating to High Sierra has landed me at a different bug.
Makefile:1028: Variable OPTS is defined to an empty stri...
Eric Bohm
12:25 PM Bug #1728: Darwin clang compilation fails without -D_DARWIN_C_SOURCE on Sierra
So to summarize, I wouldn't bother fixing this, as it affects only a specific and very particular system configuratio... Matthias Diener
11:19 PM Bug #1728: Darwin clang compilation fails without -D_DARWIN_C_SOURCE on Sierra
This seems to be a common bug with the new Xcode 9 on macOS Sierra (10.12): https://github.com/tensorflow/tensorflow/... Matthias Diener
11:01 PM Bug #1728: Darwin clang compilation fails without -D_DARWIN_C_SOURCE on Sierra
I added Matthias as a watcher here because I think he found a fix for this issue? Sam White
02:45 PM Bug #1375 (Implemented): os-isomalloc failures during startup on SMP builds
This fixes memory os-isomalloc and updates the other modules to work on SMP mode: https://charm.cs.illinois.edu/gerri... Sam White
02:05 PM Bug #1375: os-isomalloc failures during startup on SMP builds
And the subsequent crash was simply because there were @CpvInitialize@ calls in @meta_init()@, and the call was guard... Phil Miller
01:43 PM Bug #1375: os-isomalloc failures during startup on SMP builds
Here's the real fix for the hang:
https://charm.cs.illinois.edu/gerrit/3204
It turns out that there was a mis-align...
Phil Miller
12:19 PM Feature #1731 (In Progress): Complete spack installation script
Spack has a charm package, but it is out of date (v6.7.1) and only supports a few charm build options (net, netlrts, ... Sam White
12:04 PM Bug #1076 (Rejected): failure to exit after long run on PSC Bridges
Sam White
09:30 AM Bug #1443: Serialization for std::unique_ptr Fails With Abstract Base Class
You should be able to just drop in our implementation into your code. We also have various other STL containers imple... Nils Deppe
08:47 AM Bug #1443: Serialization for std::unique_ptr Fails With Abstract Base Class
This needs to be revisited now that we're working on 6.9.0 Sam White
09:28 AM Documentation #1611 (Implemented): Document network dependent rdma thresholds, above which benefi...
I mistakenly thought this had been merged a while ago Sam White
11:15 PM Documentation #1611 (Merged): Document network dependent rdma thresholds, above which benefits of...
Sam White
09:05 AM Feature #1730 (Merged): The RTS should set std::set_terminate
Sam White
09:04 AM Feature #1729 (New): Mark the entire RTS noexcept
Charm++ doesn't use exception handling internally, and we've seen that passing -fno-exceptions -fno-rtti and such can... Sam White
09:00 AM Bug #571: pxshm shared queue lockless implementation is invalid
We are looking to use CMA rather than pxshm where possible, possibly obviating the need for this. Sam White
08:55 AM Feature #975 (Merged): OFI Layer
Sam White
08:51 AM Cleanup #1311: Align XL-specific conditional compilation TRAM to relevant versions
TRAM needs to be tested on the XLC on Summit-dev now without the #if CMK_USING_XLC's in charmxi's generated code
...
Sam White
08:40 AM Bug #1664 (Merged): Port Sameer's PAMI changes for POWER8 to PAMILRTS
Sam White
11:34 PM Feature #1158 (New): AMPI scatter(v) performance is poor
Sam White
11:23 PM Feature #1322 (Closed): PSM2 network layer
OFI outperforms MPI on PSM2 and has been merged Sam White
11:22 PM Feature #1389 (Merged): AMPI ATAReq test/wait performs poorly
Sam White
11:18 PM Feature #1480: API to control whether a PE helps other threads that generate CkLoop/OpenMP/Parall...
This should be near the top of priorities for the Intra-node group for 6.9.0 Sam White
11:16 PM Feature #1478 (Closed): Investigate use of pxshm in CmiAlloc
CMA doesn't require registration of any kind, and we are moving toward using it rather than pxshm. Sam White
11:08 PM Bug #1542: CkArrayCreated callback should be part of CkArrayOptions
If we are going to make a breaking API change here, 6.9.0 is the time to do it Sam White
11:07 PM Feature #1655: Enable use of shm transport for regular messages in LRTS
The current plan is to use CMA for interprocess copies. Nitin is working on it now. Sam White
11:05 PM Feature #1722: pxshm for mpi layer
The current plan is to use CMA. It is supported by Linux kernels v3.2+. It's not clear if it will be ready for 6.9.0 ... Sam White
11:04 PM Feature #1721: pxshm in OFI
CMA will work on all Linux v3.2+ kernels Sam White

10/30/2017

05:40 PM Bug #1728 (Rejected): Darwin clang compilation fails without -D_DARWIN_C_SOURCE on Sierra
Apple LLVM version 9.0.0 (clang-900.0.38)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /App...
Eric Bohm
11:54 AM Bug #901: Threads awoken by CthAwaken don't let Projections trace back to the event that woke them
https://charm.cs.illinois.edu/gerrit/gitweb?p=charm.git;a=commitdiff;h=15f34d71f4c7704dde34db3efec1c9689604e4cb Sam White
11:53 AM Feature #1727 (Merged): Make Boost uFcontext the default ULT implementation on supported platforms
Boost uFcontext threads have lower context switching overhead than any existing implementation we have, so should be made the ... Sam White
09:44 AM Feature #1394: Node-level message aggregation for CkMulticast
Respecting the dependencies during creation seems to solve problems. Performance still needs to be tuned.
But node...
Juan Galvez

10/29/2017

06:17 PM Feature #1133 (Implemented): PMPI_ interface for AMPI
The patch is here: https://charm.cs.illinois.edu/gerrit/#/c/2544/
Linux support:
Works on {netlrts,mpi}-linux-x86...
Matthias Diener
07:35 PM Feature #1133: PMPI_ interface for AMPI
It makes sense to me. If you already have a small test working then I'd say go ahead. Even if it didn't work on the M... Sam White

10/28/2017

06:52 PM Feature #1133: PMPI_ interface for AMPI
My current thinking is to implement something like this (using MPI_Send as an example):
In ampi.h:...
Matthias Diener
12:05 PM Bug #1726 (Merged): Bigsim autobuild failures in checkpoint/restart test
tests/charm++/chkpt/ is failing the past couple of days on the Bigsim autobuild target on Charity.... Sam White

10/27/2017

05:22 PM Bug #1668: Ensure that all libraries/modules will build as dynamic/shared objects (.so/.dylib vs .a)
charmc wants to reference hwloc when linking a shared object for QuickThreads. hwloc doesn't build a corresponding sh... Phil Miller
10:13 AM Bug #1668: Ensure that all libraries/modules will build as dynamic/shared objects (.so/.dylib vs .a)
Yeah, there is no update regarding romio yet. Matthias Diener
04:17 PM Bug #1174 (In Progress): Use hwloc data from compute host, rather than assuming they're identical...
Phil Miller
04:16 PM Feature #1175: Don't require autoconf to be installed on user systems for hwloc build
Agreed, https://charm.cs.illinois.edu/gerrit/#/c/3041/ resolves this Phil Miller
03:59 PM Feature #1175 (Merged): Don't require autoconf to be installed on user systems for hwloc build
I believe the merge of package-tarball.sh completes this task. Evan Ramos
03:39 PM Support #1725: Improve pup_stl testing
Basically, add some different stl containers as members of HeapObject, initialize them to whatever values, then after... Sam White
10:07 AM Support #1725 (In Progress): Improve pup_stl testing
examples/charm++/PUP/STLPUP/ only tests std::vector<float> right now. The PUP routine for std::vector has different s... Sam White
09:52 AM Bug #1716 (Implemented): Add a strict configure test for C++11 compiler support
Make configure test strict: https://charm.cs.illinois.edu/gerrit/#/c/3189/
Make clang the default compiler on BGQ:...
Sam White
09:51 AM Bug #1724: Make BGQ builds default to using bgclang
If we detect that we are on a BGQ and can't find bgclang, fail and tell the user to do 'soft add +mpiwrapper-bgclang'... Sam White
09:51 AM Bug #1724 (Merged): Make BGQ builds default to using bgclang
bgxlc and bggcc don't support C++11, so make the default compiler on BGQ builds bgclang. Sam White
09:26 AM Bug #1560: icc build fails on NASA Pleiades
We need to update this for 6.9.0 and C++11 support. Our configure script points people to this issue if using ICC wit... Sam White
01:28 AM Feature #1723 (Merged): Rebase the OpenMP version onto the latest version of LLVM runtime library
It has been a year since the initial version of the OpenMP integration was merged into the main branch of charm.
Before ...
Seonmyeong Bak
01:21 AM Bug #1577 (Closed): User-level thread based OpenMP integration support on Mac
Seonmyeong Bak
01:20 AM Feature #1609 (Merged): User-level thread implementation based on Boost context library
Seonmyeong Bak
01:19 AM Feature #1575 (Merged): The OpenMP integration modified to run on Converse user-level threads
Seonmyeong Bak
11:08 PM Feature #1569 (Implemented): Support the Flang Fortran compiler
Matthias Diener
11:06 PM Feature #1569: Support the Flang Fortran compiler
Patch here: https://charm.cs.illinois.edu/gerrit/3187 Matthias Diener
09:46 PM Bug #1572 (Implemented): Improve pup_stl performance
Implemented the second optimization here, now that we are clear of 6.8.2 and are requiring C++11 support for 6.9.0: h... Sam White
09:20 PM Cleanup #1065: Create a more efficient caching structure for location lookup
There'd be no need for a pup for this particular structure - it can be completely discarded and reconstituted when th... Phil Miller
09:18 PM Cleanup #1065: Create a more efficient caching structure for location lookup
Change CkLocMgr's std::unordered_map's to ska::flat_hash_map's: https://charm.cs.illinois.edu/gerrit/#/c/3170/
Nee...
Sam White
09:19 PM Bug #1679: Do Not Require Default Constructors for Serializable Classes
Moved to a constructor with a tag argument approach. We now do not require either of the default or move constructor.... Phil Miller
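As a rough illustration of the tag-argument approach mentioned above, the sketch below builds an object that has neither a default nor a move constructor by passing a dedicated tag type, then fills it in afterwards. All names (`ReconstructTag`, `Particle`, `restore`, `unpackParticle`) are hypothetical stand-ins chosen for this example; they are not taken from the Charm++ sources, and `restore` merely stands in for whatever unpacking step the runtime performs.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical tag type: its only job is to select the reconstruction constructor.
struct ReconstructTag {};

class Particle {
public:
    // Ordinary constructor: requires arguments, so there is no default constructor.
    Particle(int id, std::string name) : id_(id), name_(std::move(name)) {}
    // Tag constructor: creates an empty shell to be filled in by deserialization.
    explicit Particle(ReconstructTag) : id_(0) {}
    // Moves are disabled (which also suppresses the implicit copy constructor).
    Particle(Particle&&) = delete;

    // Hypothetical stand-in for the unpacking step.
    void restore(int id, std::string name) { id_ = id; name_ = std::move(name); }
    int id() const { return id_; }
    const std::string& name() const { return name_; }

private:
    int id_;
    std::string name_;
};

static_assert(!std::is_default_constructible<Particle>::value, "no default constructor");
static_assert(!std::is_move_constructible<Particle>::value, "no move constructor");

// Hypothetical receiver side: construct in place via the tag, then restore state.
std::unique_ptr<Particle> unpackParticle(int id, std::string name) {
    auto obj = std::make_unique<Particle>(ReconstructTag{});
    obj->restore(id, std::move(name));
    return obj;
}
```

The two `static_assert` lines check at compile time that the class really lacks both constructors, so the tag constructor is the only way the receiving side can create the object.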
07:04 PM Cleanup #537 (Merged): Data races in handler registration and assignment to global index variables
Phil Miller

10/26/2017

04:54 PM Feature #1497: CMA support for passing data between processes on the same node
Nitin is working on adding support for using Cross Memory Attach (CMA) for this. We already have an implementation wor... Sam White
02:00 PM Bug #1668: Ensure that all libraries/modules will build as dynamic/shared objects (.so/.dylib vs .a)
libampiromio is only built statically because the version of ROMIO we have doesn't support shared builds.
I believe ...
Sam White
01:49 PM Bug #1668: Ensure that all libraries/modules will build as dynamic/shared objects (.so/.dylib vs .a)
Looks like our hwloc also needs to have shared object compilation enabled. It's missing, probably because it doesn't ... Phil Miller
12:19 PM Bug #1714 (Merged): examples/zerocopy/pingpong crashes intermittently on OFI layer SMP mode build...
Sam White
11:44 AM Feature #1721: pxshm in OFI
Communication with Intel about SHM support in OFI:
Summary: Our OFI layer built on PSM2, which is mostly the defau...
Nitin Bhat
11:57 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
I don't know if this helps, but I notice an even bigger slow down in the "tree building" phase of ChaNGa: nearly a fa... Thomas Quinn
08:40 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
I can see the PNGs if I first download them, then view them with "display". Thomas Quinn

10/25/2017

04:47 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
Looking into this more now. The lack of updateLocation methods makes sense because they are part of the location mana... Eric Mikida
02:32 PM Feature #1722 (Rejected): pxshm for mpi layer
Consult with Nitin regarding implementation details. Short term, figure out if this is plausible for 6.9 or not. Eric Bohm
02:30 PM Feature #1721 (Rejected): pxshm in OFI
Check with OFI at Intel regarding whether and how this should be done. Eric Bohm
01:54 PM Feature #1113: smart-build.pl should detect supercomputers with specialized software environments...
Phil, do you think we should try to get this done in 6.9? I think the overall effort isn't hard, but there are a lot... Eric Bohm
01:31 PM Support #408 (Closed): Limit output during Charm build (i.e. 'quiet' build)
Already implemented as --quiet. It is unclear whether this should be made the default. Eric Bohm
01:26 PM Feature #541: SMP message passing must enforce memory ordering consistency
This appears to require an audit of memory consistency usage throughout machine-smp. Eric Bohm
01:12 PM Feature #34: Reduce Charm Message Send Overhead for Marshalled Messages
The zero copy schemes should address this issue, but we'll need to retest. Eric Bohm
11:25 PM Bug #901: Threads awoken by CthAwaken don't let Projections trace back to the event that woke them
This issue is similar to the following CkLoop tracing issue, and I fixed it in my local branch for ULT OpenMP to be shown... Seonmyeong Bak
09:05 PM Bug #1716: Add a strict configure test for C++11 compiler support
We'll still need to check the Intel compiler's incompatibility with the active g++/libstdc++, I think.
We'll want ...
Sam White
09:03 PM Bug #1718 (Closed): Configure check for C++11 support
Duplicate of #1716. Sam White

10/24/2017

05:33 PM Bug #1718 (Closed): Configure check for C++11 support
We currently only check for the few C++11 features that we require in 6.8.1, not for full C++11 support which we want... Sam White
11:50 AM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
I tried the experiment of running the benchmark without load balancing (well, one load balance at the very beginning ... Thomas Quinn

10/23/2017

05:43 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
I tried the experiment of running the benchmark without load balancing (well, one load balance at the very beginning ... Thomas Quinn

10/21/2017

09:09 AM Bug #1714: examples/zerocopy/pingpong crashes intermittently on OFI layer SMP mode builds for pro...
If there's a 6.8.2 this will be in it. Sam White

10/20/2017

06:48 PM Bug #1717 (Closed): Sending variable sized messages with std::vectors
Sam White
06:47 PM Bug #1717 (Rejected): Sending variable sized messages with std::vectors
The issue is with the message object's constructor not copying all the elements of the vector into its array members.... Sam White
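The diagnosis above is that the message constructor did not copy every vector element into the message's array members. A minimal sketch of a constructor that does copy the whole vector is shown below. It uses a plain C++ struct as a stand-in: `VarMsg` and its members are hypothetical and do not reproduce Charm++'s variable-size message machinery or the code from the mailing-list report.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a message with one variable-length array member.
struct VarMsg {
    std::size_t count;                 // number of elements held in `values`
    std::unique_ptr<double[]> values;  // array sized to match the source vector

    // Copy *all* elements of the vector into the array member, not just the first.
    explicit VarMsg(const std::vector<double>& src)
        : count(src.size()), values(new double[src.size()]) {
        std::copy(src.begin(), src.end(), values.get());
    }
};
```

Sizing the array from `src.size()` and copying the full `[begin, end)` range keeps the element count and the array contents in step, which is the property the report found missing.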
06:09 PM Bug #1717 (Closed): Sending variable sized messages with std::vectors
This bug is based on the email sent by Jozsef Bakosi to the mailing list: https://lists.cs.illinois.edu/lists/arc/ch... Karthik Senthil
01:51 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
I don't think this slowdown has to do with the use of unordered map. Even before this change the location manager sti... Eric Mikida
12:54 PM Feature #1040: support multiple InfiniBand cards per node
If we're using pami then it's not so critical, except if we wanted to compare pami to verbs. Jim Phillips
11:17 AM Bug #1716 (Merged): Add a strict configure test for C++11 compiler support
We might also want to change the default compiler of BGQ to clang. Otherwise, make sure the error message of using xl... Sam White
10:18 AM Bug #1714: examples/zerocopy/pingpong crashes intermittently on OFI layer SMP mode builds for pro...
The 6.8.1 release was already tagged. Sam White
09:43 AM Feature #1704: Add a pamilrts-linux-ppc64le build target
https://charm.cs.illinois.edu/gerrit/#/c/3141/ Sam White

10/19/2017

05:22 PM Cleanup #537 (Implemented): Data races in handler registration and assignment to global index var...
https://charm.cs.illinois.edu/gerrit/#/c/381/ Ronak Buch

10/18/2017

10:26 AM Feature #1704 (In Progress): Add a pamilrts-linux-ppc64le build target
Nitin Bhat
09:47 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
The given case doesn't have an unusual number of chares/PE. It does have a very small compute/communication ratio: I... Thomas Quinn

10/17/2017

06:40 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
Do you happen to know if that input case running on that many PEs will result in having many more chares per PE than ... Sam White
12:16 PM Bug #1715 (Merged): 20% slowdown in ChaNGa after commit 159fd36fc
A ChaNGa user reported a noticeable slowdown when compiling with recent versions of charm. A "git bisect" session p... Thomas Quinn
11:06 AM Feature #1657: CMA support for nocopy sends using the Entry Method API across processes on the sa...
See the following paper for a description of how to use XPMEM efficiently. The key is that you can register the entir... Sam White

10/16/2017

02:24 PM Bug #1714 (Implemented): examples/zerocopy/pingpong crashes intermittently on OFI layer SMP mode ...
Fix: https://charm.cs.illinois.edu/gerrit/#/c/3133/
The bug was due to a race condition where the message in the c...
Nitin Bhat
09:37 AM Bug #1714 (Merged): examples/zerocopy/pingpong crashes intermittently on OFI layer SMP mode build...
Nitin Bhat

10/13/2017

01:52 PM Feature #1713 (New): DDT support for getting the addresses of contiguous parts of non-contiguous ...
When sending large buffers consisting of non-contiguous datatypes via the zero copy API, we want to perform multiple ... Sam White
11:07 AM Feature #1637 (Merged): Zero-copy send support in the OFI layer
Phil Miller
09:57 AM Bug #1708: Charm++ programs hang with mpi-crayxc build on Edison when run on 2 hosts
The hang occurs in NAMD with "setenv HUGETLB_MORECORE yes" but not "setenv HUGETLB_MORECORE no". Jim Phillips
09:50 AM Bug #1708: Charm++ programs hang with mpi-crayxc build on Edison when run on 2 hosts
The mpi-crayxc build of NAMD works fine with craype-hugepages8M loaded at Charm++ build, NAMD build, and run.
I also...
Jim Phillips
09:05 AM Feature #112: object location services: Share array element location cache above PE level
AMPI could now benefit from this, and I think many Charm applications already do their own process-level location man... Sam White

10/12/2017

02:48 PM Bug #1709 (Implemented): Need a test that uses +partitions
Gerrit patch: https://charm.cs.illinois.edu/gerrit/#/c/3125/ Karthik Senthil
10:40 AM Feature #1321: multiple communication threads per process
Ronak, do your results mean that CmiPushPE is the main hotspot for the communication thread?
As far as I understand there ar...
Mikhail Shiryaev

10/11/2017

05:10 PM Bug #1708 (Implemented): Charm++ programs hang with mpi-crayxc build on Edison when run on 2 hosts
Added a fix to check that hugepages is not loaded while building using charmc for mpi-crayxc and mpi-crayxe builds. F... Nitin Bhat
01:52 PM Bug #1708: Charm++ programs hang with mpi-crayxc build on Edison when run on 2 hosts
When does this issue occur?
The issue occurs presumably because of an incompatibility between using Cray MPI when C...
Nitin Bhat
10:20 AM Bug #1708: Charm++ programs hang with mpi-crayxc build on Edison when run on 2 hosts
I ran into this while running examples/charm++/zerocopy/pingpong/ to get numbers for mpi-crayxc build. I later found ... Nitin Bhat
03:45 PM Bug #1641: charmrun with nodelist option (++nodelist) fails on campus cluster
Would it be possible to grant me access to Golub/Taub so I can test this directly? Evan Ramos
03:39 PM Feature #1394: Node-level message aggregation for CkMulticast
This is crashing on BW with 64 nodes.
The dependency chain for building CkArray group is locMgr->mcastMgr->array. ...
Juan Galvez
03:23 PM Feature #1394: Node-level message aggregation for CkMulticast
Currently debugging this on Blue Waters. Juan Galvez
03:23 PM Feature #176: objid_t: tracing infrastructure should use objid_t
This is implicitly dependent on 64-bit ID, which is somewhat unstable according to Eric Mikida, so it's been waiting ... Ronak Buch
03:19 PM Cleanup #1059: Unify Data Collection in Charm++
Ronak, could you make the scheduling decision on this, and maybe identify incremental subtasks that could be schedule... Phil Miller
03:17 PM Feature #885: extend physical node detection across partitions
Michael, putting this on you as the GPU support lead. Phil Miller
03:16 PM Feature #1040: support multiple InfiniBand cards per node
Nitin, please work out how critical and feasible this is, and thus whether it should be a target to complete for 6.9,... Phil Miller
03:14 PM Feature #1436: trace CcdCallFnAfter() causality
Ronak or Karthik, please get whatever further details are necessary, and decide if this should be addressed in the ne... Phil Miller
03:11 PM Bug #1104 (Merged): AMPI instances may change if migrated while suspended
The above patches have fixed all known instances of this. Sam White
03:11 PM Feature #1677: improved topology-aware partitioner
Juan, please follow up with more details/discussion, and decide on scheduling this. Phil Miller
03:10 PM Bug #1214 (New): AMPI_Just_migrated callbacks break using tlsglobals/isomalloc
Sam White
03:10 PM Bug #1155 (New): AMPI's non-blocking collectives are not sequenced
Sam White
03:10 PM Feature #1321: multiple communication threads per process
I've been doing more detailed tracing on OFI. Here's a representative example of the machine state tracing (two proce... Ronak Buch
03:10 PM Bug #1325 (New): AMPI programs fail to link with Isomalloc heaps
Sam White
03:09 PM Bug #1279 (New): Proactive fault tolerance fails due to sending message to dead node.
Sam White
02:55 PM Feature #363 (Rejected): Investigate implementation of CCS on BG/Q
This is probably just not going to happen, so closing it out. Phil Miller
02:43 PM Feature #1450 (Feedback): Clean up and add CUDA example programs
Jaemin Choi
02:41 PM Cleanup #537: Data races in handler registration and assignment to global index variables
Some fixes that I've been trying to do for this (editing calls to happen only once and inserting barriers to prevent ... Ronak Buch
02:37 PM Cleanup #537: Data races in handler registration and assignment to global index variables
Ronak noted that the fix ran into trouble in rebasing and cleaning up. He'll add details here and/or on Gerrit Phil Miller
02:40 PM Documentation #1611: Document network dependent rdma thresholds, above which benefits of the zero...
Not going to hold the 6.8.1 release for this. Phil Miller
02:38 PM Bug #1162 (Closed): tracing runs segfault while writing logs
No longer seems to be reproducible. Re-open or open a new issue if it is observed again or can be reproduced. Phil Miller
10:28 AM Bug #1706 (Merged): MPI LrtsAbort doesn't kill all replicas
Phil Miller
09:46 PM Feature #1637 (Implemented): Zero-copy send support in the OFI layer
Gerrit Link: https://charm.cs.illinois.edu/gerrit/#/c/3122/ Nitin Bhat

10/10/2017

02:33 PM Feature #1712 (Rejected): Avoid intermediate ctx to scheduler in case of ULTs
If a ULT yields, we currently context switch back to the scheduler thread always, even if the next task in the schedu... Sam White
12:16 PM Bug #1710: syncft tests: warning and crash on init_checkpt
I think the flag '+restartisomalloc' may be needed here? If so we need to try to automate that or at least document it Sam White
07:55 AM Bug #1710 (New): syncft tests: warning and crash on init_checkpt
http://ppl-jenkins:8080/job/Nightly-Build/label=trusty,platform=net-linux-x86_64-syncft/1346/console... Phil Miller
07:58 AM Bug #1711: syncft tests: unclear failure
Possibly similar / the same: http://ppl-jenkins:8080/job/Nightly-Build/label=trusty,platform=net-linux-x86_64-syncft/... Phil Miller
07:57 AM Bug #1711 (In Progress): syncft tests: unclear failure
http://ppl-jenkins:8080/job/Nightly-Build/label=trusty,platform=net-linux-x86_64-syncft/1338/console... Phil Miller

10/09/2017

08:59 AM Bug #1705 (Merged): examples/charm++/kmeans occasionally loops forever, seen on uth-linux-x86_64
Phil Miller

10/08/2017

09:48 AM Bug #1706: MPI LrtsAbort doesn't kill all replicas
How would you detect that the partition you are sending an asynchronous message to has aborted?
Also, there are many...
Jim Phillips
08:54 AM Bug #1706: MPI LrtsAbort doesn't kill all replicas
I'm kind of inclined to question the assumption that (presumably) an application-level call to @CmiAbort@ *should* br... Phil Miller
09:42 AM Bug #1709 (Merged): Need a test that uses +partitions
Bug #1675 should have been caught much earlier with a simple @make test@ in the main repository.
One challenge is ...
Phil Miller
08:44 AM Bug #1675 (Merged): OFI replica crashes
Phil Miller
12:15 AM Bug #1675 (Implemented): OFI replica crashes
Karthik Senthil
12:14 AM Bug #1675: OFI replica crashes
Gerrit patch: https://charm.cs.illinois.edu/gerrit/#/c/3115/ Karthik Senthil
 
