Activity

From 10/02/2017 to 10/31/2017

11/01/2017

08:11 PM Bug #1728: Darwin clang compilation fails without -D_DARWIN_C_SOURCE on Sierra
Hmm, that's strange. I just did a fresh checkout on High Sierra 10.13.1/Xcode 9.1, and charm++ compiles without proble... Matthias Diener

10/31/2017

05:15 PM Bug #1325: AMPI programs fail to link with Isomalloc heaps
We aren't currently getting Isomalloc tests run on autobuild on Darwin, because the linker there doesn't like -Wl,--a... Sam White
05:04 PM Bug #1728: Darwin clang compilation fails without -D_DARWIN_C_SOURCE on Sierra
Updating to High Sierra has landed me at a different bug.
Makefile:1028: Variable OPTS is defined to an empty stri...
Eric Bohm
12:25 PM Bug #1728: Darwin clang compilation fails without -D_DARWIN_C_SOURCE on Sierra
So to summarize, I wouldn't bother fixing this, as it affects only a specific and very particular system configuratio... Matthias Diener
11:19 PM Bug #1728: Darwin clang compilation fails without -D_DARWIN_C_SOURCE on Sierra
This seems to be a common bug with the new Xcode 9 on macOS Sierra (10.12): https://github.com/tensorflow/tensorflow/... Matthias Diener
11:01 PM Bug #1728: Darwin clang compilation fails without -D_DARWIN_C_SOURCE on Sierra
I added Matthias as a watcher here because I think he found a fix for this issue? Sam White
02:45 PM Bug #1375 (Implemented): os-isomalloc failures during startup on SMP builds
This fixes memory os-isomalloc and updates the other modules to work on SMP mode: https://charm.cs.illinois.edu/gerri... Sam White
02:05 PM Bug #1375: os-isomalloc failures during startup on SMP builds
And the subsequent crash was simply because there were @CpvInitialize@ calls in @meta_init()@, and the call was guard... Phil Miller
01:43 PM Bug #1375: os-isomalloc failures during startup on SMP builds
Here's the real fix for the hang:
https://charm.cs.illinois.edu/gerrit/3204
It turns out that there was a mis-align...
Phil Miller
12:19 PM Feature #1731 (Merged): Complete spack installation script
Spack has a charm package, but it is out of date (v6.7.1) and only supports a few charm build options (net, netlrts, ... Sam White
12:04 PM Bug #1076 (Rejected): failure to exit after long run on PSC Bridges
Sam White
09:30 AM Bug #1443: Serialization for std::unique_ptr Fails With Abstract Base Class
You should be able to just drop in our implementation into your code. We also have various other STL containers imple... Nils Deppe
08:47 AM Bug #1443: Serialization for std::unique_ptr Fails With Abstract Base Class
This needs to be revisited now that we're working on 6.9.0 Sam White
09:28 AM Documentation #1611 (Implemented): Document network dependent rdma thresholds, above which benefi...
I mistakenly thought this had been merged a while ago Sam White
11:15 PM Documentation #1611 (Merged): Document network dependent rdma thresholds, above which benefits of...
Sam White
09:05 AM Feature #1730 (Merged): The RTS should set std::set_terminate
Sam White
09:04 AM Feature #1729 (New): Mark the entire RTS noexcept
Charm++ doesn't use exception handling internally, and we've seen that passing -fno-exceptions -fno-rtti and such can... Sam White
09:00 AM Bug #571: pxshm shared queue lockless implementation is invalid
We are looking to use CMA rather than pxshm where possible, possibly obviating the need for this. Sam White
08:55 AM Feature #975 (Merged): OFI Layer
Sam White
08:51 AM Cleanup #1311: Align XL-specific conditional compilation TRAM to relevant versions
TRAM now needs to be tested with XLC on Summit-dev, without the #if CMK_USING_XLC's in charmxi's generated code
...
Sam White
08:40 AM Bug #1664 (Merged): Port Sameer's PAMI changes for POWER8 to PAMILRTS
Sam White
11:34 PM Feature #1158 (New): AMPI scatter(v) performance is poor
Sam White
11:23 PM Feature #1322 (Closed): PSM2 network layer
OFI outperforms MPI on PSM2 and has been merged Sam White
11:22 PM Feature #1389 (Merged): AMPI ATAReq test/wait performs poorly
Sam White
11:18 PM Feature #1480: API to control whether a PE helps other threads that generate CkLoop/OpenMP/Parall...
This should be near the top of priorities for the Intra-node group for 6.9.0 Sam White
11:16 PM Feature #1478 (Closed): Investigate use of pxshm in CmiAlloc
CMA doesn't require registration of any kind, and we are moving toward using it rather than pxshm. Sam White
11:08 PM Bug #1542: CkArrayCreated callback should be part of CkArrayOptions
If we are going to make a breaking API change here, 6.9.0 is the time to do it Sam White
11:07 PM Feature #1655: Enable use of shm transport for regular messages in LRTS
The current plan is to use CMA for interprocess copies. Nitin is working on it now. Sam White
11:05 PM Feature #1722: pxshm for mpi layer
The current plan is to use CMA. It is supported by Linux kernels v3.2+. It's not clear if it will be ready for 6.9.0 ... Sam White
11:04 PM Feature #1721: pxshm in OFI
CMA will work on all Linux v3.2+ kernels Sam White

10/30/2017

05:40 PM Bug #1728 (Rejected): Darwin clang compilation fails without -D_DARWIN_C_SOURCE on Sierra
Apple LLVM version 9.0.0 (clang-900.0.38)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /App...
Eric Bohm
11:54 AM Bug #901: Threads awoken by CthAwaken don't let Projections trace back to the event that woke them
https://charm.cs.illinois.edu/gerrit/gitweb?p=charm.git;a=commitdiff;h=15f34d71f4c7704dde34db3efec1c9689604e4cb Sam White
11:53 AM Feature #1727 (Merged): Make Boost uFcontext the default ULT implementation on supported platforms
Boost uFcontext threads have lower context-switching overhead than any existing implementation we have, so should be made the ... Sam White
09:44 AM Feature #1394: Node-level message aggregation for CkMulticast
Respecting the dependencies during creation seems to solve problems. Performance still needs to be tuned.
But node...
Juan Galvez

10/29/2017

06:17 PM Feature #1133 (Implemented): PMPI_ interface for AMPI
The patch is here: https://charm.cs.illinois.edu/gerrit/#/c/2544/
Linux support:
Works on {netlrts,mpi}-linux-x86...
Matthias Diener
07:35 PM Feature #1133: PMPI_ interface for AMPI
It makes sense to me. If you already have a small test working then I'd say go ahead. Even if it didn't work on the M... Sam White

10/28/2017

06:52 PM Feature #1133: PMPI_ interface for AMPI
My current thinking is to implement something like this (using MPI_Send as an example):
In ampi.h:...
Matthias Diener
12:05 PM Bug #1726 (Merged): Bigsim autobuild failures in checkpoint/restart test
tests/charm++/chkpt/ is failing the past couple of days on the Bigsim autobuild target on Charity.... Sam White

10/27/2017

05:22 PM Bug #1668: Ensure that all libraries/modules will build as dynamic/shared objects (.so/.dylib vs .a)
charmc wants to reference hwloc when linking a shared object for QuickThreads. hwloc doesn't build a corresponding sh... Phil Miller
10:13 AM Bug #1668: Ensure that all libraries/modules will build as dynamic/shared objects (.so/.dylib vs .a)
Yeah, there is no update regarding romio yet. Matthias Diener
04:17 PM Bug #1174 (In Progress): Use hwloc data from compute host, rather than assuming they're identical...
Phil Miller
04:16 PM Feature #1175: Don't require autoconf to be installed on user systems for hwloc build
Agreed, https://charm.cs.illinois.edu/gerrit/#/c/3041/ resolves this Phil Miller
03:59 PM Feature #1175 (Merged): Don't require autoconf to be installed on user systems for hwloc build
I believe the merge of package-tarball.sh completes this task. Evan Ramos
03:39 PM Support #1725: Improve pup_stl testing
Basically, add some different stl containers as members of HeapObject, initialize them to whatever values, then after... Sam White
10:07 AM Support #1725 (In Progress): Improve pup_stl testing
examples/charm++/PUP/STLPUP/ only tests std::vector<float> right now. The PUP routine for std::vector has different s... Sam White
09:52 AM Bug #1716 (Implemented): Add a strict configure test for C++11 compiler support
Make configure test strict: https://charm.cs.illinois.edu/gerrit/#/c/3189/
Make clang the default compiler on BGQ:...
Sam White
09:51 AM Bug #1724: Make BGQ builds default to using bgclang
If we detect that we are on a BGQ and can't find bgclang, fail and tell the user to do 'soft add +mpiwrapper-bgclang'... Sam White
09:51 AM Bug #1724 (Merged): Make BGQ builds default to using bgclang
bgxlc and bggcc don't support C++11, so make the default compiler on BGQ builds bgclang. Sam White
09:26 AM Bug #1560: icc build fails on NASA Pleiades
We need to update this for 6.9.0 and C++11 support. Our configure script points people to this issue if using ICC wit... Sam White
01:28 AM Feature #1723 (Merged): Rebase the OpenMP version onto the latest version of LLVM runtime library
It has been a year since the initial version of the OpenMP integration was merged into the main branch of charm.
Before ...
Seonmyeong Bak
01:21 AM Bug #1577 (Closed): User-level thread based OpenMP integration support on Mac
Seonmyeong Bak
01:20 AM Feature #1609 (Merged): User-level thread implementation based on Boost context library
Seonmyeong Bak
01:19 AM Feature #1575 (Merged): The OpenMP integration modified to run on Converse user-level threads
Seonmyeong Bak
11:08 PM Feature #1569 (Implemented): Support the Flang Fortran compiler
Matthias Diener
11:06 PM Feature #1569: Support the Flang Fortran compiler
Patch here: https://charm.cs.illinois.edu/gerrit/3187 Matthias Diener
09:46 PM Bug #1572 (Implemented): Improve pup_stl performance
Implemented the second optimization here, now that we are clear of 6.8.2 and are requiring C++11 support for 6.9.0: h... Sam White
09:20 PM Cleanup #1065: Create a more efficient caching structure for location lookup
There'd be no need for a pup for this particular structure - it can be completely discarded and reconstituted when th... Phil Miller
09:18 PM Cleanup #1065: Create a more efficient caching structure for location lookup
Change CkLocMgr's std::unordered_map's to ska::flat_hash_map's: https://charm.cs.illinois.edu/gerrit/#/c/3170/
Nee...
Sam White
09:19 PM Bug #1679: Do Not Require Default Constructors for Serializable Classes
Moved to a constructor with a tag argument approach. We now do not require either of the default or move constructor.... Phil Miller
07:04 PM Cleanup #537 (Merged): Data races in handler registration and assignment to global index variables
Phil Miller

10/26/2017

04:54 PM Feature #1497: CMA support for passing data between processes on the same node
Nitin is working on adding support for using Cross Memory Attach (CMA) for this. We already have an implementation wor... Sam White
02:00 PM Bug #1668: Ensure that all libraries/modules will build as dynamic/shared objects (.so/.dylib vs .a)
libampiromio is only built statically because the version of ROMIO we have doesn't support shared builds.
I believe ...
Sam White
01:49 PM Bug #1668: Ensure that all libraries/modules will build as dynamic/shared objects (.so/.dylib vs .a)
Looks like our hwloc also needs to have shared object compilation enabled. It's missing, probably because it doesn't ... Phil Miller
12:19 PM Bug #1714 (Merged): examples/zerocopy/pingpong crashes intermittently on OFI layer SMP mode build...
Sam White
11:44 AM Feature #1721: pxshm in OFI
Communication with Intel about SHM support in OFI:
Summary: Our OFI layer built on PSM2, which is mostly the defau...
Nitin Bhat
11:57 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
I don't know if this helps, but I notice an even bigger slowdown in the "tree building" phase of ChaNGa: nearly a fa... Thomas Quinn
08:40 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
I can see the PNGs if I first download them, then view them with "display". Thomas Quinn

10/25/2017

04:47 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
Looking into this more now. The lack of updateLocation methods makes sense because they are part of the location mana... Eric Mikida
02:32 PM Feature #1722 (Rejected): pxshm for mpi layer
Consult with Nitin regarding implementation details. Short term, figure out if this is plausible for 6.9 or not. Eric Bohm
02:30 PM Feature #1721 (Rejected): pxshm in OFI
Check with OFI at Intel regarding whether and how this should be done. Eric Bohm
01:54 PM Feature #1113: smart-build.pl should detect supercomputers with specialized software environments...
Phil, do you think we should try to get this done in 6.9? I think the overall effort isn't hard, but there are a lot... Eric Bohm
01:31 PM Support #408 (Closed): Limit output during Charm build (i.e. 'quiet' build)
Already implemented as --quiet. It is unclear whether this should be made the default. Eric Bohm
01:26 PM Feature #541: SMP message passing must enforce memory ordering consistency
This appears to require an audit of memory consistency usage throughout machine-smp. Eric Bohm
01:12 PM Feature #34: Reduce Charm Message Send Overhead for Marshalled Messages
The zero copy schemes should address this issue, but we'll need to retest. Eric Bohm
11:25 PM Bug #901: Threads awoken by CthAwaken don't let Projections trace back to the event that woke them
This issue is similar to the following CkLoop tracing issue and I fixed it in my local branch for ULT OpenMP to be shown... Seonmyeong Bak
09:05 PM Bug #1716: Add a strict configure test for C++11 compiler support
We'll still need to check the Intel compiler's incompatibility with the active g++/libstdc++, I think.
We'll want ...
Sam White
09:03 PM Bug #1718 (Closed): Configure check for C++11 support
Duplicate of #1716. Sam White

10/24/2017

05:33 PM Bug #1718 (Closed): Configure check for C++11 support
We currently only check for the few C++11 features that we require in 6.8.1, not for full C++11 support which we want... Sam White
11:50 AM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
I tried the experiment of running the benchmark without load balancing (well, one load balance at the very beginning ... Thomas Quinn

10/23/2017

05:43 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
I tried the experiment of running the benchmark without load balancing (well, one load balance at the very beginning ... Thomas Quinn

10/21/2017

09:09 AM Bug #1714: examples/zerocopy/pingpong crashes intermittently on OFI layer SMP mode builds for pro...
If there's a 6.8.2 this will be in it. Sam White

10/20/2017

06:48 PM Bug #1717 (Closed): Sending variable sized messages with std::vectors
Sam White
06:47 PM Bug #1717 (Rejected): Sending variable sized messages with std::vectors
The issue is with the message object's constructor not copying all the elements of the vector into its array members.... Sam White
06:09 PM Bug #1717 (Closed): Sending variable sized messages with std::vectors
This bug is based on the email sent by Joszef Bakosi to the mailing list : https://lists.cs.illinois.edu/lists/arc/ch... Karthik Senthil
01:51 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
I don't think this slowdown has to do with the use of unordered map. Even before this change the location manager sti... Eric Mikida
12:54 PM Feature #1040: support multiple InfiniBand cards per node
If we're using pami then it's not so critical, except if we wanted to compare pami to verbs. Jim Phillips
11:17 AM Bug #1716 (Merged): Add a strict configure test for C++11 compiler support
We might also want to change the default compiler of BGQ to clang. Otherwise, make sure the error message of using xl... Sam White
10:18 AM Bug #1714: examples/zerocopy/pingpong crashes intermittently on OFI layer SMP mode builds for pro...
The 6.8.1 release was already tagged. Sam White
09:43 AM Feature #1704: Add a pamilrts-linux-ppc64le build target
https://charm.cs.illinois.edu/gerrit/#/c/3141/ Sam White

10/19/2017

05:22 PM Cleanup #537 (Implemented): Data races in handler registration and assignment to global index var...
https://charm.cs.illinois.edu/gerrit/#/c/381/ Ronak Buch

10/18/2017

10:26 AM Feature #1704 (In Progress): Add a pamilrts-linux-ppc64le build target
Nitin Bhat
09:47 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
The given case doesn't have an unusual number of chares/PE. It does have a very small compute/communication ratio: I... Thomas Quinn

10/17/2017

06:40 PM Bug #1715: 20% slowdown in ChaNGa after commit 159fd36fc
Do you happen to know if that input case running on that many PEs will result in having many more chares per PE than ... Sam White
12:16 PM Bug #1715 (Merged): 20% slowdown in ChaNGa after commit 159fd36fc
A ChaNGa user reported a noticeable slowdown when compiling with recent versions of charm. A "git bisect" session p... Thomas Quinn
11:06 AM Feature #1657: CMA support for nocopy sends using the Entry Method API across processes on the sa...
See the following paper for a description of how to use XPMEM efficiently. The key is that you can register the entir... Sam White

10/16/2017

02:24 PM Bug #1714 (Implemented): examples/zerocopy/pingpong crashes intermittently on OFI layer SMP mode ...
Fix: https://charm.cs.illinois.edu/gerrit/#/c/3133/
The bug was due to a race condition where the message in the c...
Nitin Bhat
09:37 AM Bug #1714 (Merged): examples/zerocopy/pingpong crashes intermittently on OFI layer SMP mode build...
Nitin Bhat

10/13/2017

01:52 PM Feature #1713 (New): DDT support for getting the addresses of contiguous parts of non-contiguous ...
When sending large buffers consisting of non-contiguous datatypes via the zero copy API, we want to perform multiple ... Sam White
11:07 AM Feature #1637 (Merged): Zero-copy send support in the OFI layer
Phil Miller
09:57 AM Bug #1708: Charm++ programs hang with mpi-crayxc build on Edison when run on 2 hosts
The hang occurs in NAMD with "setenv HUGETLB_MORECORE yes" but not "setenv HUGETLB_MORECORE no". Jim Phillips
09:50 AM Bug #1708: Charm++ programs hang with mpi-crayxc build on Edison when run on 2 hosts
The mpi-crayxc build of NAMD works fine with craype-hugepages8M loaded at Charm++ build, NAMD build, and run.
I also...
Jim Phillips
09:05 AM Feature #112: object location services: Share array element location cache above PE level
AMPI could now benefit from this, and I think many Charm applications already do their own process/node-level locatio... Sam White

10/12/2017

02:48 PM Bug #1709 (Implemented): Need a test that uses +partitions
Gerrit patch : https://charm.cs.illinois.edu/gerrit/#/c/3125/ Karthik Senthil
10:40 AM Feature #1321: multiple communication threads per process
Ronak, do your results mean that CmiPushPE is the main hotspot for the communication thread?
As far as I understand there ar...
Mikhail Shiryaev

10/11/2017

05:10 PM Bug #1708 (Implemented): Charm++ programs hang with mpi-crayxc build on Edison when run on 2 hosts
Added a fix to check that hugepages is not loaded while building using charmc for mpi-crayxc and mpi-crayxe builds. F... Nitin Bhat
01:52 PM Bug #1708: Charm++ programs hang with mpi-crayxc build on Edison when run on 2 hosts
When does this issue occur?
The issue occurs presumably because of an incompatibility between using Cray MPI when C...
Nitin Bhat
10:20 AM Bug #1708: Charm++ programs hang with mpi-crayxc build on Edison when run on 2 hosts
I ran into this while running examples/charm++/zerocopy/pingpong/ to get numbers for mpi-crayxc build. I later found ... Nitin Bhat
03:45 PM Bug #1641: charmrun with nodelist option (++nodelist) fails on campus cluster
Would it be possible to grant me access to Golub/Taub so I can test this directly? Evan Ramos
03:39 PM Feature #1394: Node-level message aggregation for CkMulticast
This is crashing on BW with 64 nodes.
The dependency chain for building CkArray group is locMgr->mcastMgr->array. ...
Juan Galvez
03:23 PM Feature #1394: Node-level message aggregation for CkMulticast
Currently debugging this on Blue Waters. Juan Galvez
03:23 PM Feature #176: objid_t: tracing infrastructure should use objid_t
This is implicitly dependent on 64-bit ID, which is somewhat unstable according to Eric Mikida, so it's been waiting ... Ronak Buch
03:19 PM Cleanup #1059: Unify Data Collection in Charm++
Ronak, could you make the scheduling decision on this, and maybe identify incremental subtasks that could be schedule... Phil Miller
03:17 PM Feature #885: extend physical node detection across partitions
Michael, putting this on you as the GPU support lead. Phil Miller
03:16 PM Feature #1040: support multiple InfiniBand cards per node
Nitin, please work out how critical and feasible this is, and thus whether it should be a target to complete for 6.9,... Phil Miller
03:14 PM Feature #1436: trace CcdCallFnAfter() causality
Ronak or Karthik, please get whatever further details are necessary, and decide if this should be addressed in the ne... Phil Miller
03:11 PM Bug #1104 (Merged): AMPI instances may change if migrated while suspended
The above patches have fixed all known instances of this. Sam White
03:11 PM Feature #1677: improved topology-aware partitioner
Juan, please follow up with more details/discussion, and decide on scheduling this. Phil Miller
03:10 PM Bug #1214 (New): AMPI_Just_migrated callbacks break using tlsglobals/isomalloc
Sam White
03:10 PM Bug #1155 (New): AMPI's non-blocking collectives are not sequenced
Sam White
03:10 PM Feature #1321: multiple communication threads per process
I've been doing more detailed tracing on OFI. Here's a representative example of the machine state tracing (two proce... Ronak Buch
03:10 PM Bug #1325 (New): AMPI programs fail to link with Isomalloc heaps
Sam White
03:09 PM Bug #1279 (New): Proactive fault tolerance fails due to sending message to dead node.
Sam White
02:55 PM Feature #363 (Rejected): Investigate implementation of CCS on BG/Q
This is probably just not going to happen, so closing it out. Phil Miller
02:43 PM Feature #1450 (Feedback): Clean up and add CUDA example programs
Jaemin Choi
02:41 PM Cleanup #537: Data races in handler registration and assignment to global index variables
Some fixes that I've been trying to do for this (editing calls to happen only once and inserting barriers to prevent ... Ronak Buch
02:37 PM Cleanup #537: Data races in handler registration and assignment to global index variables
Ronak noted that the fix ran into trouble in rebasing and cleaning up. He'll add details here and/or on Gerrit Phil Miller
02:40 PM Documentation #1611: Document network dependent rdma thresholds, above which benefits of the zero...
Not going to hold the 6.8.1 release for this. Phil Miller
02:38 PM Bug #1162 (Closed): tracing runs segfault while writing logs
No longer seems to be reproducible. Re-open or open a new issue if it is observed again or can be reproduced. Phil Miller
10:28 AM Bug #1706 (Merged): MPI LrtsAbort doesn't kill all replicas
Phil Miller
09:46 PM Feature #1637 (Implemented): Zero-copy send support in the OFI layer
Gerrit Link: https://charm.cs.illinois.edu/gerrit/#/c/3122/ Nitin Bhat

10/10/2017

02:33 PM Feature #1712 (Rejected): Avoid intermediate ctx to scheduler in case of ULTs
If a ULT yields, we currently context switch back to the scheduler thread always, even if the next task in the schedu... Sam White
12:16 PM Bug #1710: syncft tests: warning and crash on init_checkpt
I think the flag '+restartisomalloc' may be needed here? If so we need to try to automate that or at least document it Sam White
07:55 AM Bug #1710 (New): syncft tests: warning and crash on init_checkpt
http://ppl-jenkins:8080/job/Nightly-Build/label=trusty,platform=net-linux-x86_64-syncft/1346/console... Phil Miller
07:58 AM Bug #1711: syncft tests: unclear failure
Possibly similar / the same: http://ppl-jenkins:8080/job/Nightly-Build/label=trusty,platform=net-linux-x86_64-syncft/... Phil Miller
07:57 AM Bug #1711 (In Progress): syncft tests: unclear failure
http://ppl-jenkins:8080/job/Nightly-Build/label=trusty,platform=net-linux-x86_64-syncft/1338/console... Phil Miller

10/09/2017

08:59 AM Bug #1705 (Merged): examples/charm++/kmeans occasionally loops forever, seen on uth-linux-x86_64
Phil Miller

10/08/2017

09:48 AM Bug #1706: MPI LrtsAbort doesn't kill all replicas
How would you detect that the partition you are sending an asynchronous message to has aborted?
Also, there are many...
Jim Phillips
08:54 AM Bug #1706: MPI LrtsAbort doesn't kill all replicas
I'm kind of inclined to question the assumption that (presumably) an application-level call to @CmiAbort@ *should* br... Phil Miller
09:42 AM Bug #1709 (Merged): Need a test that uses +partitions
Bug #1675 should have been caught much earlier with a simple @make test@ in the main repository.
One challenge is ...
Phil Miller
08:44 AM Bug #1675 (Merged): OFI replica crashes
Phil Miller
12:15 AM Bug #1675 (Implemented): OFI replica crashes
Karthik Senthil
12:14 AM Bug #1675: OFI replica crashes
Gerrit patch : https://charm.cs.illinois.edu/gerrit/#/c/3115/ Karthik Senthil

10/06/2017

06:11 PM Documentation #1611 (Implemented): Document network dependent rdma thresholds, above which benefi...
Fix: https://charm.cs.illinois.edu/gerrit/#/c/3100/
This documentation exposed two bugs: https://charm.cs.illinois...
Nitin Bhat
05:51 PM Bug #1708: Charm++ programs hang with mpi-crayxc build on Edison when run on 2 hosts
The bug was not caught by autobuild. My guess is that it runs mpi-crayxc tests only on 1 host. Nitin Bhat
05:46 PM Bug #1708 (Implemented): Charm++ programs hang with mpi-crayxc build on Edison when run on 2 hosts
The hang happens when a 2 process (logical node) run is made on 2 hosts. However, it works on a 2 process run on 1 hos... Nitin Bhat
05:43 PM Bug #1707 (New): Nocopy Entry method API ack handling crashes on pamilrts-bluegeneq-async-smp
Nitin Bhat
04:41 PM Bug #1706: MPI LrtsAbort doesn't kill all replicas
Regarding 3, if you could get the comm thread to participate in a node-all barrier you could just have it call MPI_Ab... Jim Phillips
12:17 PM Bug #1706 (In Progress): MPI LrtsAbort doesn't kill all replicas
I'm not sure how we should go about making this safe for thread level FUNNELED in SMP mode.
1. Add a flag to the obj...
Sam White
10:46 AM Bug #1706: MPI LrtsAbort doesn't kill all replicas
Or have the comm thread call MPI_Abort. Jim Phillips
10:29 AM Bug #1706: MPI LrtsAbort doesn't kill all replicas
Assign to Sam as MPI machine layer owner.
I'm thinking @LrtsAbort@ should just call @MPI_Abort@ and not try to do ...
Phil Miller
09:48 AM Bug #1706: MPI LrtsAbort doesn't kill all replicas
Also, we have rank 0 (rather than the communication thread) making MPI calls, which is not kosher for MPI_THREAD_FUNN... Jim Phillips
09:44 AM Bug #1706 (Merged): MPI LrtsAbort doesn't kill all replicas
User reports that when one replica on Stampede2 dies the others keep running. It looks like the machine_exit code do... Jim Phillips
11:00 AM Bug #1701: Cannot have non-copyable types in constructor arguments
Yep, that's exactly correct. The only difference between my modified code and the generated is the addition of @std::... Nils Deppe
10:53 AM Bug #1701: Cannot have non-copyable types in constructor arguments
OK, so I think what you're saying is that the generated code needs to move the temporary instances, so that the recip... Phil Miller
10:49 AM Bug #1701: Cannot have non-copyable types in constructor arguments
OK, our notes crossed paths as we were writing them. I'll look further in a bit. Phil Miller
10:47 AM Bug #1701: Cannot have non-copyable types in constructor arguments
Indeed, CProxy_Foo's generated code takes its arguments by const& even when declared in the .ci file as taking them b... Phil Miller
10:44 AM Bug #1701: Cannot have non-copyable types in constructor arguments
Looking now, this ticket could be made much more helpful on my part. The serialization is not the problem, it is the ... Nils Deppe
10:32 AM Bug #1701: Cannot have non-copyable types in constructor arguments
I take it you're trying to pass an rvalue reference to such an object? It doesn't really make semantic sense for any ... Phil Miller
10:58 AM Bug #1700: Overloaded reduction targets result in compilation error
I'm not sure you'll get the correct behavior even with just one of those methods - what does the contribute call to p... Phil Miller
10:45 AM Bug #1685 (Closed): charmc Chokes on @explicit@ constructors
Phil Miller
10:31 AM Bug #1685: charmc Chokes on @explicit@ constructors
Ah good. Yes, with moving away from ci files I don't think this is worth anyone's time to fix. Nils Deppe
10:05 AM Bug #1685: charmc Chokes on @explicit@ constructors
So, it wasn't stated earlier, but it looks like having @explicit@ in the class declarations in the C++ code does work... Phil Miller
10:20 AM Support #126: Document process launching arguments with aim of cross-machine rationalization
This likely ends up being obviated by the shift to hwloc-driven launch Phil Miller
10:17 AM Projections Feature #995: Color by user supplied parameter (e.g. timestep) in non-timeline tools
Where's the code for this? Did it ever get integrated? I don't see it on master in the projections repo. Phil Miller
10:06 AM Support #1391 (Closed): Add an SMP/multicore build test to Jenkins
This has been working smoothly for a while. Phil Miller
09:59 PM Bug #1676: Replicas slower than separate jobs on GNI systems
Sorry, dynamic and static SMSG have indistinguishable performance at large replica counts, although the final WallClo... Jim Phillips
09:20 PM Bug #1676: Replicas slower than separate jobs on GNI systems
For 16 nodes:
aprun -n 496 -r 1 -N 31 -d 1 /u/sciteam/jphillip/NAMD_LATEST_CRAY-XE-ugni-BlueWaters/namd2 +pemap 0-30...
Jim Phillips

10/05/2017

06:09 PM Bug #1676: Replicas slower than separate jobs on GNI systems
And while you're at it, could you post your full command line and the runtime's startup output? Phil Miller
06:08 PM Bug #1676: Replicas slower than separate jobs on GNI systems
Could you try the same test (4 nodes per replica, increasing replica count) with @+useDynamicSmsg@? I'm kinda suspect... Phil Miller
12:54 PM Feature #1682 (In Progress): Expose Arrays' Index Type as a Type Alias
https://charm.cs.illinois.edu/gerrit/3109
I was able to add @array_index_t@ to @CProxy_@*. Adding it to the base l...
Evan Ramos
10:54 AM Feature #1682: Expose Arrays' Index Type as a Type Alias
I'd go with the obvious: @array_element_t@ :) Nils Deppe
11:04 AM Bug #1705 (Implemented): examples/charm++/kmeans occasionally loops forever, seen on uth-linux-x8...
https://charm.cs.illinois.edu/gerrit/3107 Phil Miller
10:51 AM Bug #1705: examples/charm++/kmeans occasionally loops forever, seen on uth-linux-x86_64
Looks like a floating point associativity failure in the use of the @sum_double@ reduction:... Phil Miller
10:39 AM Bug #1705 (Merged): examples/charm++/kmeans occasionally loops forever, seen on uth-linux-x86_64
We've seen this failure a few times, but never debugged it. I've added some prints and after a dozen or so runs, got ... Phil Miller
10:55 AM Bug #1686: Use a namespace for Charm++ code
Sounds like a good plan to me :) Nils Deppe
08:53 AM Feature #1704 (Merged): Add a pamilrts-linux-ppc64le build target
pami is in some sense deprecated in favor of pamilrts already. We only support the zero-copy API on pamilrts, not pam... Sam White
09:33 PM Bug #1702: Inconsistent charm++ archives
Awesome, thank you very much! Nils Deppe
08:11 PM Bug #1702 (Closed): Inconsistent charm++ archives
I've posted a gzipped version of the same tarball.
We'll release future versions as tar.gz, mostly for convenience...
Phil Miller

10/04/2017

05:13 PM Bug #1676: Replicas slower than separate jobs on GNI systems
From some basic profiling it appears that the amount of time spent in alloc_mempool_block (but not the number of call... Jim Phillips
03:34 PM Bug #1702: Inconsistent charm++ archives
Ah okay, well I'm fine with the change, I just wanted to make sure it's intentional. We restrict ourselves to the new... Nils Deppe
03:18 PM Bug #1702: Inconsistent charm++ archives
It was intentional, to provide a smaller download. I hadn't thought about the impact on systems that would be looking... Phil Miller
02:21 PM Bug #881: Automatically determine location of nvcc when compiling programs using charmc in accel
Sam White wrote:
> Is this critical for 6.8.1? Retarget to 6.9.0 if not
After looking at the title of this bug I ...
Michael Robson
01:35 PM Bug #1162 (Feedback): tracing runs segfault while writing logs
I was never able to reproduce this. Has this been an issue for you at all as of late, Jim (or anyone else)? Ronak Buch
01:29 PM Bug #1273 (Closed): Tracemode utilization crashes in production build of Charm++
Was likely fixed a while ago, but never updated. Since it's not reproducible, I'll close it for now. Ronak Buch
01:25 PM Bug #829 (Closed): CkLoop projections tracing doesn't reflect the origin/traceback of work
As far as I know, this is a duplicate of #1437, and it's been fixed in https://charm.cs.illinois.edu/gerrit/#/c/3084/... Ronak Buch

10/03/2017

02:32 PM Bug #1676: Replicas slower than separate jobs on GNI systems
All of the replicas are uniformly slow. There is no inter-replica interaction.
I haven't looked at large node count...
Jim Phillips
02:19 PM Bug #1676: Replicas slower than separate jobs on GNI systems
Ok, so the effect grows in magnitude with replica count, and requires at least a few nodes to occur.
What about th...
Phil Miller
02:14 PM Bug #1676: Replicas slower than separate jobs on GNI systems
No and no. I've been using 4 nodes per replica, non-smp. The effect starts to be visible above noise at 16 replicas... Jim Phillips
11:47 AM Bug #1676: Replicas slower than separate jobs on GNI systems
Querying test-case reduction, since there are basically no progress notes on this issue:
* Is a 2 node, 2 replica jo...
Phil Miller
02:32 PM Bug #1510: Hang in tests/charm++/chkpt when using -tracemode perfReport
So it looks like after the restart, @t->getTraceOn()@ returns false, and on multiple paths, this means that the code ... Phil Miller
01:11 PM Bug #1510: Hang in tests/charm++/chkpt when using -tracemode perfReport
Hang happens in @traceAutoPerfExitFunction@, at @autoPerfProxy.endStepResumeCb(true, CkMyPe(), CkCallbackResumeThread... Phil Miller
01:00 PM Bug #1510: Hang in tests/charm++/chkpt when using -tracemode perfReport
https://charm.cs.illinois.edu/gerrit/3097 tracemode perfReport: don't close file in race with code that will write to... Phil Miller
12:40 PM Bug #1510: Hang in tests/charm++/chkpt when using -tracemode perfReport
OK, I'm running this now, and actually seeing a crash in the exit path in the first run of @./hello@:... Phil Miller
12:09 PM Bug #1510: Hang in tests/charm++/chkpt when using -tracemode perfReport
Taking a second look at this, are we particularly concerned with tracing support across checkpoint/restart? Do we act... Phil Miller
01:14 PM Cleanup #566 (Merged): Charm++ cell example cleanup
Phil Miller
11:48 AM Cleanup #566 (Feedback): Charm++ cell example cleanup
Phil Miller
11:48 AM Cleanup #566 (Implemented): Charm++ cell example cleanup
Sam White
12:54 PM Bug #1273: Tracemode utilization crashes in production build of Charm++
I tested this again on my lab machine (netlrts-linux-x86_64-smp) for wave2d and jacobi2d. I didn't run into any crashes. Karthik Senthil
11:55 AM Bug #1273: Tracemode utilization crashes in production build of Charm++
Karthik, can you test this again? Sam White
11:50 AM Bug #1162: tracing runs segfault while writing logs
Bump. Need to reproduce the failures and address them, or retarget to 6.9.0 Sam White
11:48 AM Bug #829 (Feedback): CkLoop projections tracing doesn't reflect the origin/traceback of work
Phil Miller
11:47 AM Bug #881: Automatically determine location of nvcc when compiling programs using charmc in accel
Is this critical for 6.8.1? Retarget to 6.9.0 if not Sam White
11:40 AM Bug #1081 (Merged): Converse command line arguments produce false warnings
Phil Miller

10/02/2017

03:48 PM Cleanup #566: Charm++ cell example cleanup
Or better, just review this if indeed that's the direction to go:
https://charm.cs.illinois.edu/gerrit/3092
Phil Miller
03:46 PM Cleanup #566: Charm++ cell example cleanup
Wasn't the decision on this to just delete the code in question, since there are up-to-date replacements in the upcom... Phil Miller
03:35 PM Bug #1201 (Rejected): SMP builds segfault on NULL lock in tests/charm++/chkpt
This doesn't seem to have appeared in any of the Jenkins Nightly-Build runs for any configuration since at least June... Phil Miller
03:23 PM Bug #1081: Converse command line arguments produce false warnings
https://charm.cs.illinois.edu/gerrit/3091 Phil Miller
03:18 PM Bug #1081 (Implemented): Converse command line arguments produce false warnings
Phil Miller
03:02 PM Bug #1081: Converse command line arguments produce false warnings
I'm going to take a quick look at this, and if I can't just nail it down, we should defer it. If we've lived with it ... Phil Miller
02:36 PM Bug #1680: ci file compilation fails with no details to debug when the module name has a hyphen
Fix for charmxi was here: https://charm.cs.illinois.edu/gerrit/3090 Phil Miller
02:33 PM Bug #1680 (Merged): ci file compilation fails with no details to debug when the module name has a...
Phil Miller
12:45 PM Bug #1680 (Implemented): ci file compilation fails with no details to debug when the module name ...
Samarth Kulshreshtha
02:35 PM Bug #1683 (Merged): Charmc ends up deleting .C/.cpp file in case you forget to specify the output...
Phil Miller
12:48 PM Bug #1702 (Closed): Inconsistent charm++ archives
The v6.8.0 archive is a @.tar.bz2@ archive while older versions are @.tar.gz@. This makes it more difficult when pack... Nils Deppe
 
