charm.git
6 years ago · Disable remote_event macro for CMK_CRAYXC · 25/425/6
Yanhua Sun [Fri, 7 Nov 2014 18:02:10 +0000 (10:02 -0800)]
Disable remote_event macro for CMK_CRAYXC

Change-Id: I691154bb9352ff4790df3b28451c4f8529d8a97c

6 years ago · fix uGNI machine.c send_smsg_message function call error · 26/426/3
Yanhua Sun [Sat, 8 Nov 2014 04:16:28 +0000 (20:16 -0800)]
fix uGNI machine.c send_smsg_message function call error

Change-Id: I6162d576f18acaaee18fee67d45ecb7ddbe97546

6 years ago · Issue 374, BGQ Mapping: use kernel call to find the mapping · 22/422/2
Nikhil Jain [Tue, 4 Nov 2014 21:49:20 +0000 (21:49 +0000)]
Issue 374, BGQ Mapping: use kernel call to find the mapping

Change-Id: I19391f3662f82e978899a48cfb45ee4ed194e822

6 years ago · Add pxshm conv-mach in verbs · 70/370/5
Nikhil Jain [Mon, 15 Sep 2014 20:29:17 +0000 (15:29 -0500)]
Add pxshm conv-mach in verbs

This should enable the use of pxshm with the verbs layer. A lock is
needed for correctness.

Change-Id: Ia89893f8dbff3ff7d68d88ee0b55f8a22df40fe6

6 years ago · Feature #574 win: Copy binaries to <build>/bin not /usr/bin · 23/423/7
Michael Robson [Wed, 5 Nov 2014 19:00:40 +0000 (13:00 -0600)]
Feature #574 win: Copy binaries to <build>/bin not /usr/bin

Change-Id: I0e383dbc7b5426bb4ebd1df80923845b5b320deb

6 years ago · Changes to handle failures during checkpointing · 15/415/4
Xiang Ni [Tue, 28 Oct 2014 04:24:02 +0000 (23:24 -0500)]
Changes to handle failures during checkpointing

This commit enables the Charm++ RTS to handle failures that occur during checkpointing. It does so by keeping the previous checkpoint until the new checkpoint has been generated on all processors. When failures happen, CmiReduce is used to find the last safe checkpoint across all processors.

Change-Id: Ib83de2c5d66b6257809d365b1c617298bb89f9c3
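
A minimal sketch of the double-buffered checkpoint idea described above, assuming each processor keeps two checkpoint slots and records the epoch of its newest complete checkpoint. The names (CheckpointStore, beginWrite, finishWrite, lastSafeEpoch) are hypothetical, and a loop over a std::vector stands in for the CmiReduce min-reduction across processors; this is not the actual Charm++ code.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-processor bookkeeping with two checkpoint slots, so the
// previous checkpoint survives while the next one is being written.
struct CheckpointStore {
  int epoch[2] = {-1, -1};  // epoch held in each slot; -1 means empty or invalid
  int newest = -1;          // epoch of the newest complete checkpoint

  // Pick the slot that does not hold `newest` and invalidate it before writing.
  int beginWrite() {
    int slot = (epoch[0] == newest) ? 1 : 0;
    epoch[slot] = -1;
    return slot;
  }

  // Call only once the write of epoch `e` into `slot` has fully completed.
  void finishWrite(int slot, int e) {
    epoch[slot] = e;
    newest = e;
  }

  bool has(int e) const { return epoch[0] == e || epoch[1] == e; }
};

// Stand-in for a min-reduction across processors (requires a non-empty list):
// the last checkpoint completed everywhere is the minimum of the newest epochs.
int lastSafeEpoch(const std::vector<CheckpointStore>& procs) {
  int safe = procs.front().newest;
  for (const CheckpointStore& p : procs) safe = std::min(safe, p.newest);
  return safe;
}
```

Because a slot is marked invalid before it is overwritten, a processor that fails in the middle of a write still holds its previous epoch, and the minimum over all processors selects an epoch that every processor still has.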

6 years ago · #161 verbs, netlrts: Complete rename of 'noprocforcommthread' to 'sleepOnIdle' · 77/377/3
Phil Miller [Mon, 29 Sep 2014 22:32:51 +0000 (17:32 -0500)]
#161 verbs, netlrts: Complete rename of 'noprocforcommthread' to 'sleepOnIdle'

Change-Id: I96aeca113f1dcbdfb81264a7e33d151b738c3bbd

6 years ago · net & multicore: Print overall execution time at shutdown · 76/376/3
Phil Miller [Mon, 29 Sep 2014 16:40:35 +0000 (11:40 -0500)]
net & multicore: Print overall execution time at shutdown

Change-Id: I01b187ac9a242537d518c1fa9f4e3acf0dc8b27b

6 years ago · Make examples/collide workable · 19/419/1
Phil Miller [Fri, 31 Oct 2014 19:14:32 +0000 (14:14 -0500)]
Make examples/collide workable

Add a top-level Makefile, and use $TESTOPTS in `make test' in the
subdirectory Makefiles.

Change-Id: I6016fa8081b3f251fdcba29a2c23a45f1c0f1d3f

7 years ago · Windows unix2nt_cc: Add support for un-defining preprocessor macros · 10/410/4
Phil Miller [Tue, 21 Oct 2014 19:07:31 +0000 (14:07 -0500)]
Windows unix2nt_cc: Add support for un-defining preprocessor macros

Change-Id: Id85176dd7c0c2ad9118c53fffa7b31cb44288c7b

7 years ago · netlrts & verbs: eliminate usage of CmiState in the machine layer · 03/403/6
Bilge Acun [Wed, 15 Oct 2014 15:26:30 +0000 (10:26 -0500)]
netlrts & verbs: eliminate usage of CmiState in the machine layer

CmiState usage is abstracted away from the machine layer by replacing
CmiGetPeGlobal with CmiMyPeGlobal and adding a new function,
CmiIdleLock_hasMessage, to check whether the state has messages.

Change-Id: I772849f6228c95b94cd8f6d8baa5488b99cf11d7

7 years ago · Set stack size for QT threads · 14/414/3
Ehsan Totoni [Sun, 26 Oct 2014 16:51:19 +0000 (11:51 -0500)]
Set stack size for QT threads

QT code allocates the stack but does not
set the stack size in the base thread data structure.
This was causing a crash in net-linux builds in the
thread stack check routine, since that routine uses the stack size.

Change-Id: Ibccaf7b2629fc4150ba3076b975202bc64fe1be7

7 years ago · solving issue of thread check code on win c · 13/413/2
Ehsan Totoni [Fri, 24 Oct 2014 20:48:46 +0000 (15:48 -0500)]
solving issue of thread check code on win c

The Windows C compiler follows the C90 standard, which
prohibits mixing declarations and code.
The code is rearranged to follow those rules.

Change-Id: Ica7c9aa5f4c23619eea6e2138162139ec3332a4d

7 years ago · Bug #547 charmrun: exit(1) if mixed localhost and non-localhost detected in a nodelist · 12/412/5
Michael Robson [Wed, 22 Oct 2014 02:42:06 +0000 (21:42 -0500)]
Bug #547 charmrun: exit(1) if mixed localhost and non-localhost detected in a nodelist

If localhost is used together with other hostnames, execution will
hang unless the only other hostname used is the current host. This is
because nodes try to connect to themselves instead of the intended node,
since they use the loopback IP address resolved from localhost.

Change-Id: Ia3d533eec16489741d58f5712a23c4df4e282ae3
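
The check can be sketched as a scan over the parsed nodelist. This is a hypothetical helper, not charmrun's actual code, and it assumes hosts are compared by their literal names, so an entry such as 127.0.0.1 would not be recognized as localhost.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not charmrun's actual code: true when a nodelist
// mixes "localhost" entries with any other hostname.
bool nodelistMixesLocalhost(const std::vector<std::string>& hosts) {
  bool sawLocalhost = false;
  bool sawOther = false;
  for (const std::string& h : hosts) {
    if (h == "localhost")
      sawLocalhost = true;
    else
      sawOther = true;
  }
  return sawLocalhost && sawOther;
}
```

A caller would report the problem and exit(1) when this returns true. The sketch rejects every mix, matching the commit subject, even though the message notes that mixing localhost with only the current host would not hang.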

7 years ago · Adding sanity check (e.g. stack) in CthSuspend · 97/397/4
Ehsan Totoni [Sun, 12 Oct 2014 20:29:32 +0000 (15:29 -0500)]
Adding sanity check (e.g. stack) in CthSuspend

CthSuspend is called after user functions, so it is a good place
to check for data corruption and stack overflow.
This code checks the magic number for corruption,
and checks whether the current stack pointer is still
between the start and end of the thread's stack.

Change-Id: I22a39436e979085ecb692bdee9e15c1eb14b68ef
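
A minimal sketch of the two checks, assuming a thread descriptor that stores a magic number and the bounds of the thread's stack. ThreadInfo, kThreadMagic, StackCheck and checkThread are hypothetical names and the magic value is arbitrary; the real Cth structure and constant are not shown in this log.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical thread descriptor; field names are illustrative only.
struct ThreadInfo {
  std::uint32_t magic;    // set to kThreadMagic when the thread is created
  char* stackBase;        // lowest address of the thread's stack
  std::size_t stackSize;  // stack size in bytes
};

constexpr std::uint32_t kThreadMagic = 0x7C7A1D2Bu;  // arbitrary sentinel

enum class StackCheck { Ok, Corrupted, Overflow };

// First verify the magic number, then check that `sp` (the current stack
// pointer) still lies inside [stackBase, stackBase + stackSize).
StackCheck checkThread(const ThreadInfo& t, const void* sp) {
  if (t.magic != kThreadMagic) return StackCheck::Corrupted;
  auto p = reinterpret_cast<std::uintptr_t>(sp);
  auto lo = reinterpret_cast<std::uintptr_t>(t.stackBase);
  if (p < lo || p >= lo + t.stackSize) return StackCheck::Overflow;
  return StackCheck::Ok;
}
```

Inside a running thread, the address of a local variable is a common approximation of the current stack pointer.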

7 years ago · changes for the new PE on Blue Waters. · 11/411/4
Gengbin Zheng [Tue, 21 Oct 2014 19:43:36 +0000 (14:43 -0500)]
changes for the new PE on Blue Waters.

The cray-pmi package does not seem to be configured correctly by pkg-config.
However, the PMI issue from the last PE (PMI environment variables were not
picked up by the CC wrapper) is fixed, so we can largely discard the fix in
conv-mach.sh for the last PE.

Change-Id: I55b8fa4f20e41cc09770b136940cbaf197229fb5

7 years ago · netlrts - verbs: initializing numNodes and myNodeID in LrtsInit · 92/392/7
Bilge Acun [Tue, 7 Oct 2014 23:06:56 +0000 (18:06 -0500)]
netlrts - verbs: initializing numNodes and myNodeID in LrtsInit

Eliminate usage of _Cmi_numnodes and _Cmi_mynode; replace them with machine-specific variables.

Change-Id: I10ae3e7552a9d770d62f182877c2570c0200f325

7 years ago · Add Testopts to bgtest targets · 09/409/2
Nikhil Jain [Tue, 21 Oct 2014 18:53:07 +0000 (13:53 -0500)]
Add Testopts to bgtest targets

Change-Id: Ic396f6f16945680aa52d17968ddafba20e4963eb

7 years ago · Move TESTOPTS after binary name, so charmrun doesn't get confused with RTS flags · 07/407/1
Phil Miller [Fri, 17 Oct 2014 21:54:00 +0000 (16:54 -0500)]
Move TESTOPTS after binary name, so charmrun doesn't get confused with RTS flags

Change-Id: Id20dace44164d333995160ad7972cd248551fe29

7 years ago · Cleanup #570 removing nothing_doing example · 02/402/2
Bilge Acun [Tue, 14 Oct 2014 22:32:42 +0000 (17:32 -0500)]
Cleanup #570 removing nothing_doing example

Change-Id: I6166a4824229906b0fe50606087dfbbb5c4ec417

7 years ago · Cleanup #562 removing Maj_Min example - outdated frequency scaling example · 01/401/2
Bilge Acun [Tue, 14 Oct 2014 22:28:38 +0000 (17:28 -0500)]
Cleanup #562 removing Maj_Min example - outdated frequency scaling example

Change-Id: If5ae2bcad24aba86b3127dd09e1eb5d5d5c46e20

7 years ago · Standardize make commands to use "$(MAKE) -C" · 06/406/3
Michael Robson [Wed, 15 Oct 2014 21:50:25 +0000 (16:50 -0500)]
Standardize make commands to use "$(MAKE) -C"

Previously these commands used a mix of make and $(MAKE).
$(MAKE) is preferred for recursive calls because it uses the same make
program that invoked the top-level call. It also passes the
MAKEFLAGS variable to sub-makes. See section 5.7.1 of the GNU make
manual for more info.

Another source of variation was the use of "cd DIR ; make", which
does not correctly catch errors, or "cd DIR && make", which does
catch errors. I have used the -C flag instead, per the instructions in
section 5.7 of the GNU make manual on recursive use of make. This
behaves like "cd DIR && make": it fails if the cd fails, instead of
silently continuing in the current directory.

Change-Id: I139231172c7422ac556a5a81f734f541173746d0

7 years ago · Change "cd $$d; $(MAKE)" to "cd $$d && $(MAKE)" to catch errors in cd command · 05/405/3
Michael Robson [Wed, 15 Oct 2014 21:41:48 +0000 (16:41 -0500)]
Change "cd $$d; $(MAKE)" to "cd $$d && $(MAKE)" to catch errors in cd command

Change-Id: I91fe67890eec6863c3209c2ad6c48209bd51ec1e

7 years ago · tests/charm++/communication_overhead: Respect $TESTOPTS · 00/400/1
Phil Miller [Tue, 14 Oct 2014 19:43:55 +0000 (14:43 -0500)]
tests/charm++/communication_overhead: Respect $TESTOPTS

Change-Id: I4b55b0ff68e65b9f8fe08e81fa72934625631048

7 years ago · fix bgtest error in NQueen makefile · 99/399/1
YanhuaSun [Tue, 14 Oct 2014 02:00:58 +0000 (21:00 -0500)]
fix bgtest error in NQueen makefile

Change-Id: I5e7e67e4fde59941de3ca443e68ad6e49a65480b

7 years ago · Cleanup SDAG output by always putting a space after the comma · 96/396/1
Michael Robson [Thu, 9 Oct 2014 23:22:10 +0000 (18:22 -0500)]
Cleanup SDAG output by always putting a space after the comma

Change-Id: Iaa31a64df66a2c175e75c75e6fe1eeb9e9623e4c

7 years ago · charmxi sdag: Fix cases where commas would be missing in generated parameter lists · 95/395/1
Phil Miller [Thu, 9 Oct 2014 21:55:59 +0000 (16:55 -0500)]
charmxi sdag: Fix cases where commas would be missing in generated parameter lists

The cases in question involved constructions of the following form:

entry void foo(void) {
  when bar[refnum_bar](int r),
       baz[refnum_baz](int z) {
  }
};

Specifically, the failing cases had multiple entries associated with a
single `when' construct, each with a reference number expression, in a
context with no state to be provided from any enclosing scope.

Change-Id: I4d54998a1b2a720a2af395d0200550893da24adf
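
Missing commas like these are a classic separator bug in code generators. A minimal sketch of one way to avoid it, assuming the generator collects parameter fragments (enclosing-scope state, reference numbers, entry parameters) into a list in which some fragments can be empty: join only the non-empty fragments, always with ", ". The function name and the fragments are hypothetical; this is not charmxi's actual code.

```cpp
#include <string>
#include <vector>

// Join non-empty fragments with ", " so that an empty fragment (for example,
// no state from an enclosing scope) never causes a missing or doubled comma.
std::string joinParams(const std::vector<std::string>& parts) {
  std::string out;
  for (const std::string& p : parts) {
    if (p.empty()) continue;
    if (!out.empty()) out += ", ";
    out += p;
  }
  return out;
}
```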

7 years ago · charmxi: Always emit a definition of the ChareClass_SDAG_CODE macro · 94/394/1
Phil Miller [Thu, 9 Oct 2014 19:37:48 +0000 (14:37 -0500)]
charmxi: Always emit a definition of the ChareClass_SDAG_CODE macro

Rob Van Der Wijngaart of Intel noted that the definition of this macro
being conditional on the presence of SDAG code in a given chare class's
declaration was a usability impediment. Specifically, removing some
unused SDAG code from the .ci file could cause compilation failures in
the C++ source files.

Change-Id: I5d60b2dee2b2d8f9ba67c48ec1a8283c7d75ec48
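
A minimal sketch of why an unconditional definition helps. Worker is a hypothetical chare class, and the sketch assumes the emitted definition is empty when the .ci file declares no SDAG code for the class; the class body that mentions the macro still compiles. Without any definition, the same class body would fail with an undeclared identifier.

```cpp
// Assumed stand-in for what charmxi would emit for a class with no SDAG code.
#define Worker_SDAG_CODE

class Worker {
 public:
  Worker_SDAG_CODE  // expands to nothing here; with SDAG code it would carry
                    // the generated member declarations
  int step = 0;
};
```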

7 years ago · remove queens and PMEMimic from example · 93/393/3
YanhuaSun [Wed, 8 Oct 2014 03:22:48 +0000 (22:22 -0500)]
remove queens and PMEMimic from example

Change-Id: Ie1abae9ea4a06757d34586344a85dc2c899b7c74

7 years ago · clang: remove archaic 'register' declarations to quiet warnings from system headers · 78/378/4
Phil Miller [Mon, 29 Sep 2014 22:36:13 +0000 (17:36 -0500)]
clang: remove archaic 'register' declarations to quiet warnings from system headers

Change-Id: I093c0b8dbde118f79f3af2bbf9967febe0b4fbbf

7 years ago · charmxi: Support more general expressions in the length of readonly array variables · 87/387/3
Phil Miller [Mon, 6 Oct 2014 21:59:23 +0000 (16:59 -0500)]
charmxi: Support more general expressions in the length of readonly array variables

Change-Id: Ib932a05b43aa87d08c97d86cfa0e456aa31b565b

7 years ago · TRAM: remove the need to add -module completion when linking application binary · 91/391/2
Lukasz Wesolowski [Tue, 7 Oct 2014 20:40:10 +0000 (15:40 -0500)]
TRAM: remove the need to add -module completion when linking application binary

Change-Id: I490e8b2d2bfed3fbdb50d6b406ee5a17c7efcf82

7 years ago · TRAM documentation: add description of how to link in the required module and registe... · 89/389/4
Lukasz Wesolowski [Tue, 7 Oct 2014 19:38:07 +0000 (14:38 -0500)]
TRAM documentation: add description of how to link in the required module and register template instances

Change-Id: Icdbe2081ff2e01eaa3339066e7dcf02166b324e5

7 years ago · Fix a race condition in CkLoop · 63/363/5
Harshitha [Tue, 2 Sep 2014 17:22:19 +0000 (12:22 -0500)]
Fix a race condition in CkLoop

Change-Id: I04ef27b939344e39fdb3ee9ce6252f82545d455d

7 years ago · Bug #445 Remove edges from commData when either the sender or the receiver is not... · 90/390/2
Harshitha [Tue, 7 Oct 2014 20:20:10 +0000 (15:20 -0500)]
Bug #445 Remove edges from commData when either the sender or the receiver is not present/deleted

Change-Id: I396ab052f5747eccc1231953f8073b7f90c74e4e

7 years ago · charmxi: Support multi-dimensional arrays for readonly variables · 80/380/4
Phil Miller [Tue, 30 Sep 2014 19:29:12 +0000 (14:29 -0500)]
charmxi: Support multi-dimensional arrays for readonly variables

Change-Id: Ia43fc6e651674ed492e5e194618479fe17f9f996

7 years ago · When the charm build is ChaNGa, set --enable-lbuserdata · 85/385/2
Harshitha [Mon, 6 Oct 2014 17:38:45 +0000 (12:38 -0500)]
When the charm build is ChaNGa, set --enable-lbuserdata

Change-Id: Ic1c11d27609efd9467e0d9a0b12f3d6db8cefa94

7 years ago · Bug #552. Check source is not NULL before memcpy · 72/372/4
Harshitha [Fri, 19 Sep 2014 02:27:11 +0000 (21:27 -0500)]
Bug #552. Check source is not NULL before memcpy

Change-Id: I3d100e876f3e16ab814252058f0cab1628e0ef8c
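
A minimal sketch of the guard. Under C11 (7.24.1) and C++, memcpy requires valid pointers even when the length is zero, so a null source has to be skipped explicitly. copyIfPresent is a hypothetical name; the log does not show which copy site in the LB user data code was patched.

```cpp
#include <cstddef>
#include <cstring>

// Copy `n` bytes only when there is a source to copy from. memcpy with a
// null pointer is undefined behavior, even for n == 0, so guard it.
void copyIfPresent(void* dst, const void* src, std::size_t n) {
  if (src != nullptr && n > 0) std::memcpy(dst, src, n);
}
```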

7 years ago · Add documentation on how to use your own load balancer without modifying the charm... · 75/375/3
Harshitha [Tue, 23 Sep 2014 13:33:10 +0000 (08:33 -0500)]
Add documentation on how to use your own load balancer without modifying the charm source.

Change-Id: I013c06fd92400aa968ef4c15462ddb970e3db677

7 years ago · Fix conv-mach for correctly using user-defined MPI compilers · 74/374/1
Nikhil Jain [Mon, 22 Sep 2014 20:47:34 +0000 (15:47 -0500)]
Fix conv-mach for correctly using user-defined MPI compilers

Change-Id: I410a2003d783808cf94634b729c5d3a2863b2f4b

7 years ago · Bug #552. Initialize the data part in the lbuserdata. · 71/371/2
Harshitha [Thu, 18 Sep 2014 21:27:08 +0000 (16:27 -0500)]
Bug #552. Initialize the data part in the lbuserdata.

Change-Id: I3b9828ca58eb3579f3030e7eb21d16b24715748c

7 years ago · disable recent optimizations on ckreduction for syncFT, since they are broken. · 39/339/3
Gengbin Zheng [Fri, 1 Aug 2014 23:30:23 +0000 (18:30 -0500)]
disable recent optimizations on ckreduction for syncFT, since they are broken.

Change-Id: I7dec4b5b4333e309d62508cfa2cec7a30fb26139

7 years ago · Adding a test measuring task(chare) spawning performance · 76/276/4
Lukasz Wesolowski [Wed, 11 Jun 2014 23:03:15 +0000 (18:03 -0500)]
Adding a test measuring task(chare) spawning performance

Change-Id: Ia625cfa2d658ccec230337b2f474018c3a63c7b5

7 years ago · Update wave2d example to use new init function so liveViz works with other reductions · 26/326/4
Zhengqi Yang [Tue, 29 Jul 2014 17:10:44 +0000 (12:10 -0500)]
Update wave2d example to use new init function so liveViz works with other reductions

Change-Id: I8548ef3c64bc0a6eaeda4aabb211c721e2c5d8d5

7 years ago · Bug #501: Also consider CPUs that are temporarily powered down by the OS. · 92/292/3
Jim Phillips [Wed, 18 Jun 2014 19:54:16 +0000 (14:54 -0500)]
Bug #501: Also consider CPUs that are temporarily powered down by the OS.

Change-Id: I69d00eb73ecbbc2eb9fec5bdc38a88b4a7946873

7 years ago · Add test for marshalled pingpong on 1d array · 69/369/3
Eric Bohm [Tue, 9 Sep 2014 17:01:14 +0000 (12:01 -0500)]
Add test for marshalled pingpong on 1d array

Change-Id: I994419c202d0ee302c0f47393a98ce59c51301a3

7 years ago · Add conv-mach.{h,sh} for threadsanitizer · 58/358/3
Ronak Buch [Wed, 27 Aug 2014 22:04:03 +0000 (17:04 -0500)]
Add conv-mach.{h,sh} for threadsanitizer

Change-Id: I9a298b7abc39e02603fd9cd84fba5b254d69bcb1

7 years ago · Adding a benchmark to determine communication overhead for Charm++ group and array... · 94/294/2
Lukasz Wesolowski [Fri, 28 Feb 2014 20:05:37 +0000 (14:05 -0600)]
Adding a benchmark to determine communication overhead for Charm++ group and array messages

Change-Id: I9550ce09bdc3c028d54efa1157e467c78e570c6d

7 years ago · Update version to 6.6.0 · 65/365/2
Eric Bohm [Mon, 8 Sep 2014 17:09:22 +0000 (12:09 -0500)]
Update version to 6.6.0

Change-Id: I4bbfa91b5cc735eee5260856268f2ea013207a84

7 years ago · Modernize README · 68/368/1
Ronak Buch [Tue, 9 Sep 2014 16:59:39 +0000 (11:59 -0500)]
Modernize README

Change-Id: I67ca822d7d9e0c838a32a4d0d04d147288254dcc

7 years ago · CkIO: make Manager *manager a Ckpv variable for SMP safety. · 64/364/1
Thomas Quinn [Thu, 4 Sep 2014 15:39:25 +0000 (10:39 -0500)]
CkIO: make Manager *manager a Ckpv variable for SMP safety.

Change-Id: I9496ca4778578d6706163897b0c251ae12dbde73

7 years ago · CkIO: work around contribute/ckdestroy problem. · 56/356/4
Thomas Quinn [Sun, 24 Aug 2014 22:57:56 +0000 (17:57 -0500)]
CkIO: work around contribute/ckdestroy problem.

Change-Id: Ia805cbc9a2c0d15bba719697255ae78a92b7119a

7 years ago · Update installation manual to use gerrit url · 60/360/3 · v6.6.0
Michael Robson [Wed, 27 Aug 2014 23:19:40 +0000 (18:19 -0500)]
Update installation manual to use gerrit url

Change-Id: I65fbf156390d87df0fb8d38ad9ccfc6c22589172

7 years ago · Release notes for 6.6.0 · 59/359/1
Phil Miller [Wed, 27 Aug 2014 22:32:30 +0000 (17:32 -0500)]
Release notes for 6.6.0

Change-Id: Icc2bf71eca3f81f68491631c1d5bcc749cf86d08

7 years ago · bugfix: copy QLOGIC from net-linux-x86_64-ibverbs to verbs-linux-x86_64 · 18/318/3 · v6.6.0-rc4
Eric John Bohm of group sohrab [Thu, 17 Jul 2014 22:38:38 +0000 (18:38 -0400)]
bugfix: copy QLOGIC from net-linux-x86_64-ibverbs to verbs-linux-x86_64

Tested on the Yale Omega cluster, which is QLogic-based.

Change-Id: I51c9f2ab5856d99be8cb314326ecae4bae8c82da

7 years ago · smart-build: Default to multicore on Mac OS (darwin) systems · 50/350/4
Phil Miller [Fri, 8 Aug 2014 02:05:32 +0000 (21:05 -0500)]
smart-build: Default to multicore on Mac OS (darwin) systems

Change-Id: I2e7efc142f35e298c5737a80d59a0e2bce730bf4

7 years ago · #544 smart-build: Make selection of multicore builds effective, as an explicit choice... · 53/353/2
Phil Miller [Fri, 8 Aug 2014 20:45:00 +0000 (15:45 -0500)]
#544 smart-build: Make selection of multicore builds effective, as an explicit choice for single-node use

The previous logic for selecting multicore as a shared-memory option
didn't actually do anything. It also conflated its nature as a
single-node-only option with available optimizations of the
distributed-memory builds.

Change-Id: I1fcaaaef8c4fce95b6cd773e489bca59da6e7297

7 years ago · smart-build: add 'use warnings;' to flag any bad future changes · 52/352/3
Phil Miller [Fri, 8 Aug 2014 02:36:58 +0000 (21:36 -0500)]
smart-build: add 'use warnings;' to flag any bad future changes

Change-Id: Ib598f0c3fcca2bd24b053740e17284020c962879

7 years ago · Add another constructor for OrbLB which specifies whether to use the lb userdata · 23/323/2
Harshitha [Sun, 27 Jul 2014 03:51:05 +0000 (22:51 -0500)]
Add another constructor for OrbLB which specifies whether to use the lb userdata
so that other load balancers that inherit from OrbLB can specify whether to use
the lb userdata or not.

Change-Id: Icc217d90c81a4b974a9f5d8e37fbf821fb5c4ee8
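
A minimal sketch of the pattern, not the actual OrbLB declaration: the class names, the member, and the default of false for the original constructor are all assumptions. A derived balancer opts in or out by calling the flag-taking constructor.

```cpp
// Hypothetical base balancer with an extra constructor taking a flag.
class OrbLBSketch {
 public:
  OrbLBSketch() : OrbLBSketch(false) {}  // assumed default: no user data
  explicit OrbLBSketch(bool useUserData) : useUserData_(useUserData) {}
  virtual ~OrbLBSketch() = default;

  bool usesUserData() const { return useUserData_; }

 private:
  bool useUserData_;
};

// A hypothetical derived balancer that chooses explicitly.
class MyOrbLB : public OrbLBSketch {
 public:
  MyOrbLB() : OrbLBSketch(true) {}
};
```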

7 years ago · smart-build: add 'use strict;' and remove undefined variable reference $smp as a... · 51/351/2
Phil Miller [Fri, 8 Aug 2014 02:26:51 +0000 (21:26 -0500)]
smart-build: add 'use strict;' and remove undefined variable reference $smp as a result

Change-Id: Iab4c8ce7577cbb3c39b42e497ae75b77fad0ec7a

7 years ago · Replacing unsigned ints with ints in cuda-hybrid-api.cu for consistency · 44/344/4
Harshit Dokania [Tue, 5 Aug 2014 20:04:22 +0000 (15:04 -0500)]
Replacing unsigned ints with ints in cuda-hybrid-api.cu for consistency

Change-Id: I30515d3092b87d51803daf1644eea3c29a9f5251

7 years ago · Cth: Fix formatting of bulk documentation comment · 14/314/5
Phil Miller [Wed, 9 Jul 2014 20:05:06 +0000 (15:05 -0500)]
Cth: Fix formatting of bulk documentation comment

Change-Id: If97f80a7feb5c42ad5816ac8c3754fc78dd3a32a

7 years ago · Cth: Avoid racy fetch-and-increment by using atomic operation (#535) · 13/313/5
Phil Miller [Wed, 9 Jul 2014 20:04:04 +0000 (15:04 -0500)]
Cth: Avoid racy fetch-and-increment by using atomic operation (#535)

Change-Id: Id0339574ceb6f892a0ed17de36939debf9e34bcc
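
A minimal sketch of the difference, assuming a shared counter used to hand out identifiers. With a plain int, two threads can both read the same value before either writes it back; std::atomic<int>::fetch_add performs the read-modify-write as one indivisible operation. The names nextId, allocateId and allocateConcurrently are hypothetical; the log does not show which Cth counter was changed or which atomic primitive Charm++ used.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// fetch_add returns the previous value atomically, so every caller receives
// a distinct id even when called concurrently.
std::atomic<int> nextId{0};

int allocateId() { return nextId.fetch_add(1); }

// Spawn `nThreads` threads that each allocate `perThread` ids; each thread
// writes only its own range of `ids`, so the vector itself is not raced on.
std::vector<int> allocateConcurrently(int nThreads, int perThread) {
  std::vector<int> ids(nThreads * perThread);
  std::vector<std::thread> workers;
  for (int t = 0; t < nThreads; ++t)
    workers.emplace_back([&ids, t, perThread] {
      for (int i = 0; i < perThread; ++i) ids[t * perThread + i] = allocateId();
    });
  for (std::thread& w : workers) w.join();
  return ids;
}
```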

7 years ago · CPU Topology: Protect all access to 'done' count with topoLock to avoid undefined... · 11/311/5
Phil Miller [Wed, 9 Jul 2014 18:34:37 +0000 (13:34 -0500)]
CPU Topology: Protect all access to 'done' count with topoLock to avoid undefined races (#535)

Change-Id: I99eef67e061c688e5f07c25e60b7020645ccc595

7 years ago · CPU Topology: Move racy assignment into rank == 0 block (#535) · 10/310/5
Phil Miller [Wed, 9 Jul 2014 18:22:03 +0000 (13:22 -0500)]
CPU Topology: Move racy assignment into rank == 0 block (#535)

All of the assignments to cpuTopo.numPes write the same value. However,
per the C and C++ language standards, there's no such thing as a 'harmless'
data race: any program with a data race has undefined behavior.

Change-Id: I33189417e97672daf8b787d05a0c594de4937863
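
A minimal sketch of the single-writer rule, assuming a shared topology record read by every rank's thread. Only rank 0 writes numPes; the other ranks wait on a release/acquire flag before reading, so the write happens-before every read and there is no data race. CpuTopo, topoReady, rankBody and runRanks are hypothetical, and the atomic flag is a swapped-in stand-in for whatever synchronization the Charm++ code relies on after the rank == 0 block.

```cpp
#include <atomic>
#include <thread>
#include <vector>

struct CpuTopo { int numPes = 0; };  // hypothetical shared record
CpuTopo cpuTopo;
std::atomic<bool> topoReady{false};

// Only rank 0 writes; the release store / acquire load pair orders that
// write before every other rank's read.
int rankBody(int rank, int numPes) {
  if (rank == 0) {
    cpuTopo.numPes = numPes;
    topoReady.store(true, std::memory_order_release);
  } else {
    while (!topoReady.load(std::memory_order_acquire)) std::this_thread::yield();
  }
  return cpuTopo.numPes;
}

// Run `n` ranks as threads and collect the value each one read.
std::vector<int> runRanks(int n) {
  std::vector<int> seen(n);
  std::vector<std::thread> ts;
  for (int r = 0; r < n; ++r)
    ts.emplace_back([&seen, r, n] { seen[r] = rankBody(r, n); });
  for (std::thread& t : ts) t.join();
  return seen;
}
```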

7 years ago · CpvInitialize: Avoid races detected by ThreadSanitizer (#535) · 08/308/5
Phil Miller [Wed, 9 Jul 2014 00:44:56 +0000 (19:44 -0500)]
CpvInitialize: Avoid races detected by ThreadSanitizer (#535)

A minimal Charm++ program that called CkExit() in its mainchare constructor
would trigger 548 data race warnings from ThreadSanitizer. All but
40 of these were the result of CpvInitialize not properly synchronizing
rank 0 with the other threads. Rather than trying to dance with memory
fences, it's simpler to get true thread safety by just locking around accesses
to variables touched by multiple threads. In the process, stop having all
threads spin while rank 0 does the initialization; instead, whichever thread
gets the lock first takes care of it.

The races detected for a single CpvInitialize on netlrts-linux-x86_64-smp-clang
run standalone with 1 PE were as follows:

==================
WARNING: ThreadSanitizer: data race (pid=11242)
  Write of size 4 at 0x7f2813db8ac0 by main thread:
    #0 ConverseRunPE ./machine-common-core.c:1214 (exe+0x000000249354)
    #1 ConverseInit ./machine-common-core.c:1152 (exe+0x000000248c48)
    #2 main main.C:18 (exe+0x000000155782)

  Previous read of size 4 at 0x7f2813db8ac0 by thread T1:
    #0 ConverseRunPE ./machine-common-core.c:1214 (exe+0x000000249247)
    #1 call_startfn ./machine-smp.c:413 (exe+0x00000024fedb)

  Thread T1 (tid=11244, running) created by main thread at:
    #0 pthread_create ??:0 (exe+0x00000011e9c2)
    #1 CmiStartThreads ./machine-smp.c:506 (exe+0x0000002491c2)
    #2 ConverseInit ./machine-common-core.c:1150 (exe+0x000000248c3f)
    #3 main main.C:18 (exe+0x000000155782)

SUMMARY: ThreadSanitizer: data race ./machine-common-core.c:1214 ConverseRunPE
==================
==================
WARNING: ThreadSanitizer: data race (pid=11242)
  Read of size 8 at 0x7f2813dbfaa8 by thread T1:
    #0 ConverseRunPE ./machine-common-core.c:1214 (exe+0x000000249292)
    #1 call_startfn ./machine-smp.c:413 (exe+0x00000024fedb)

  Previous write of size 8 at 0x7f2813dbfaa8 by main thread:
    #0 ConverseRunPE ./machine-common-core.c:1214 (exe+0x000000249310)
    #1 ConverseInit ./machine-common-core.c:1152 (exe+0x000000248c48)
    #2 main main.C:18 (exe+0x000000155782)

  Thread T1 (tid=11244, running) created by main thread at:
    #0 pthread_create ??:0 (exe+0x00000011e9c2)
    #1 CmiStartThreads ./machine-smp.c:506 (exe+0x0000002491c2)
    #2 ConverseInit ./machine-common-core.c:1150 (exe+0x000000248c3f)
    #3 main main.C:18 (exe+0x000000155782)

SUMMARY: ThreadSanitizer: data race ./machine-common-core.c:1214 ConverseRunPE
==================
==================
WARNING: ThreadSanitizer: data race (pid=11242)
  Write of size 8 at 0x7d040000f798 by thread T1:
    #0 ConverseRunPE ./machine-common-core.c:1214 (exe+0x0000002492a1)
    #1 call_startfn ./machine-smp.c:413 (exe+0x00000024fedb)

  Previous write of size 8 at 0x7d040000f798 by main thread:
    #0 calloc ??:0 (exe+0x00000011b6d6)
    #1 ConverseRunPE ./machine-common-core.c:1214 (exe+0x0000002492fd)
    #2 ConverseInit ./machine-common-core.c:1152 (exe+0x000000248c48)
    #3 main main.C:18 (exe+0x000000155782)

  Location is heap block of size 16 at 0x7d040000f790 allocated by main thread:
    #0 calloc ??:0 (exe+0x00000011b6d6)
    #1 ConverseRunPE ./machine-common-core.c:1214 (exe+0x0000002492fd)
    #2 ConverseInit ./machine-common-core.c:1152 (exe+0x000000248c48)
    #3 main main.C:18 (exe+0x000000155782)

  Thread T1 (tid=11244, running) created by main thread at:
    #0 pthread_create ??:0 (exe+0x00000011e9c2)
    #1 CmiStartThreads ./machine-smp.c:506 (exe+0x0000002491c2)
    #2 ConverseInit ./machine-common-core.c:1150 (exe+0x000000248c3f)
    #3 main main.C:18 (exe+0x000000155782)

SUMMARY: ThreadSanitizer: data race ./machine-common-core.c:1214 ConverseRunPE
==================

Change-Id: I0b2470e42a7a74d532d9bbdcdc66ece858be960d
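
A minimal sketch of the "whichever thread gets the lock first" pattern described above, assuming a single shared slot guarded by one std::mutex. SharedSlot and getOrInit are hypothetical names, not the CpvInitialize implementation.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Every access to the shared fields happens under `lock`, so there is no
// data race, and no thread has to spin waiting for a designated rank.
struct SharedSlot {
  std::mutex lock;
  bool initialized = false;
  int value = 0;
  int initCount = 0;  // how many times initialization ran (for checking)
};

int getOrInit(SharedSlot& s, int initialValue) {
  std::lock_guard<std::mutex> guard(s.lock);
  if (!s.initialized) {  // the first thread to get here does the work
    s.value = initialValue;
    s.initialized = true;
    ++s.initCount;
  }
  return s.value;
}
```

std::call_once expresses the same idea directly in the C++ standard library.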

7 years ago · Delete CVS keyword blocks · 48/348/1
Phil Miller [Wed, 6 Aug 2014 20:10:33 +0000 (15:10 -0500)]
Delete CVS keyword blocks

Change-Id: I5ad979416f8a977484c6daccf3305a85b1742012

7 years ago · LocMgr: Delete code that's been commented since its introduction · 47/347/1
Phil Miller [Wed, 6 Aug 2014 20:10:06 +0000 (15:10 -0500)]
LocMgr: Delete code that's been commented since its introduction

Change-Id: Icab0d428952bfa838cb822dc797786774260d6eb

7 years ago · Causal FT cleanup: add missing newline in print · 46/346/1
Phil Miller [Wed, 6 Aug 2014 20:09:35 +0000 (15:09 -0500)]
Causal FT cleanup: add missing newline in print

Change-Id: I0bbf70653db38deab914b12374593dd9446ab764

7 years ago · DefaultArrayMap: When PE count changes across ckpt/restart, don't recalculate bin... · 34/334/4
Phil Miller [Thu, 31 Jul 2014 21:17:35 +0000 (16:17 -0500)]
DefaultArrayMap: When PE count changes across ckpt/restart, don't recalculate bin sizes for deleted arrays

Change-Id: I6a4a69f0db1d0c13188a810a4f269914259daf7b

7 years ago · CkLocMgr: Cleanup · 33/333/4
Phil Miller [Thu, 31 Jul 2014 18:52:55 +0000 (13:52 -0500)]
CkLocMgr: Cleanup

Change-Id: I705aca7703120c08e461afc03493663d9f1d1c36

7 years ago · CkLocMgr: Maintain assigned array map across checkpoint/restart · 32/332/5
Phil Miller [Thu, 31 Jul 2014 18:52:35 +0000 (13:52 -0500)]
CkLocMgr: Maintain assigned array map across checkpoint/restart

Change-Id: I77fbbc2b4eb92097033e716b48ab5315912905fb

7 years ago · Moved enums from inside the macros, as they were being undefined · 43/343/5
Harshit Dokania [Mon, 4 Aug 2014 23:16:22 +0000 (18:16 -0500)]
Moved enums from inside the macros, as they were being undefined

Change-Id: I20f558682417c74252ecefaf0c8a80cd7df06d11

7 years ago · #433 charmc: Don't try to link RTS libraries in build/headnode-hosted binaries · 42/342/1
Phil Miller [Mon, 4 Aug 2014 23:05:48 +0000 (18:05 -0500)]
#433 charmc: Don't try to link RTS libraries in build/headnode-hosted binaries

Change-Id: I003b71a9744ebb299d33710ab545a7803a28053b

7 years ago · Improve grammar in comments · 41/341/3
Ronak Buch [Mon, 4 Aug 2014 17:02:00 +0000 (12:02 -0500)]
Improve grammar in comments

Change-Id: I1c3e01e366253fc7647b542a2bd302870476ce3d

7 years ago · ckpt: Don't try to count, pack, or signal the elements of a deleted CkLocMgr · 31/331/2
Phil Miller [Thu, 31 Jul 2014 18:49:36 +0000 (13:49 -0500)]
ckpt: Don't try to count, pack, or signal the elements of a deleted CkLocMgr

Change-Id: If0bd37a43e07dd374e5495bcd4b4fbae70f1254d

7 years ago · CkIO: Re-initialize manager pointer after restarting from checkpoint · 14/114/3
Phil Miller [Wed, 26 Feb 2014 22:35:09 +0000 (16:35 -0600)]
CkIO: Re-initialize manager pointer after restarting from checkpoint

Change-Id: I03fb2d33886eabd9c613358cb05443c2d39f1262

7 years ago · Cray XC30 bugs #401, #486, #533: Prevent random crashes and hangs from receipt of... · 24/324/5
Gengbin Zheng [Mon, 28 Jul 2014 16:01:46 +0000 (11:01 -0500)]
Cray XC30 bugs #401, #486, #533: Prevent random crashes and hangs from receipt of garbage SMSGs

Fixed a bug on Cray XC where Charm++ randomly crashed after receiving garbage SMSGs.

According to Cray developer: "The (uGNI) interface is the same but the FMA implementation grew more
complex for XC due to the need for PCI deadlock avoidance.  Internally on Aries, SMSG will retransmit
messages due to deadlock avoidance logic in addition to transaction errors.  Could more frequent
retransmissions have exposed an existing Charm bug related to freeing/overwriting a message buffer
before an SMSG send has successfully completed (as indicated by a global completion event)?"

Due to a hardware change on Cray XC, it is not safe to free a send message right after SmsgSend
returns GNI_RC_SUCCESS. One has to wait until an SMSG event is received from the completion queue.
The fix is guarded by the macro CMK_SMSGS_FREE_AFTER_EVENT, which is turned on for Cray XC. It does
carry a performance penalty, so it remains off for Cray XE, where the same crash has not been reported so far.
Another minor optimization for non-SMP builds is to turn off the unnecessary SMP_LOCK.

Change-Id: Iefc9ee2dea9886b6c6731ad01f4690afd0f0fd07
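
A minimal sketch of the deferred-free idea behind CMK_SMSGS_FREE_AFTER_EVENT, assuming each send is tagged with a message id that later appears in a completion event. PendingSends and its methods are hypothetical, and the uGNI calls themselves (sending, polling the completion queue) are not modeled.

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// A send the library reports as accepted may still be retransmitted, so the
// buffer is parked until its completion event arrives, and only then freed.
class PendingSends {
 public:
  PendingSends() = default;
  PendingSends(const PendingSends&) = delete;
  PendingSends& operator=(const PendingSends&) = delete;
  ~PendingSends() {
    for (auto& entry : pending_) std::free(entry.second);
  }

  // Record a buffer handed to the network under message id `msgId`.
  void onSendAccepted(unsigned msgId, void* buf) { pending_[msgId] = buf; }

  // Completion event for `msgId`: only now is it safe to release the buffer.
  bool onCompletionEvent(unsigned msgId) {
    auto it = pending_.find(msgId);
    if (it == pending_.end()) return false;  // unknown id
    std::free(it->second);
    pending_.erase(it);
    return true;
  }

  std::size_t inFlight() const { return pending_.size(); }

 private:
  std::unordered_map<unsigned, void*> pending_;
};
```

The cost the commit mentions shows up here as buffers staying allocated longer and an extra lookup per completion event.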

7 years ago · fix a major bug in TLSglobals where a global variable with initializer does not get... · 27/327/4
Gengbin Zheng [Tue, 29 Jul 2014 21:43:46 +0000 (16:43 -0500)]
fix a major bug in TLSglobals where a global variable with initializer does not get the correct value in each thread.

This bug was first found in Cactus with AMPI, where initialized TLS global
function pointer variables got garbage values, but it can easily be reproduced
by printing, in each thread, the value of any __thread variable that has an initializer.

Change-Id: Ifa5c28f466f1ed74552800adf46645d0ea5f5abf
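
A standalone check of the expected semantics, using ordinary OS threads and C++ thread_local (the counterpart of __thread): every thread must see the initializer's value in its own copy. This only illustrates the behavior the fix restores; the bug itself was in Charm++'s TLSglobals handling for its own user-level threads, which this sketch does not exercise. tlsValue and observeInitialValues are hypothetical names.

```cpp
#include <thread>
#include <vector>

thread_local int tlsValue = 17;  // each thread gets its own copy, set to 17

// Each thread records its copy before modifying it; the modification must
// not be visible to any other thread.
std::vector<int> observeInitialValues(int nThreads) {
  std::vector<int> seen(nThreads);
  std::vector<std::thread> ts;
  for (int t = 0; t < nThreads; ++t)
    ts.emplace_back([&seen, t] {
      seen[t] = tlsValue;  // expected: 17 in every thread
      tlsValue = -1;       // changes only this thread's copy
    });
  for (std::thread& th : ts) th.join();
  return seen;
}
```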

7 years ago · Make CkLoop suitable for checkpoint restart by making it migratable and adding · 21/321/4
Harshitha [Fri, 25 Jul 2014 18:59:53 +0000 (13:59 -0500)]
Make CkLoop suitable for checkpoint restart by making it migratable and adding
pup routines.

Change-Id: Iecce0defeed827c0cbf3f5117fa6e90c220f1083

7 years ago · Add documentation for checkpointing single chares. · 25/325/1
Harshitha [Mon, 28 Jul 2014 20:11:47 +0000 (15:11 -0500)]
Add documentation for checkpointing single chares.

Change-Id: I248e27f8e3d4d43d341e4adb099b50f0fc9fe9c0

7 years ago · Bug #531: Fix and optimize stripe calculation in CkIO · 20/320/3
Ronak Buch [Thu, 24 Jul 2014 22:09:58 +0000 (17:09 -0500)]
Bug #531: Fix and optimize stripe calculation in CkIO

Change-Id: I371e3c9548c9a3ca16a2ee99c7e4c01c2cf16ed1

7 years ago · Bug #528: NAMD startup phase 0 hangs for smp on stampede · 06/306/3
Jim Phillips [Sun, 6 Jul 2014 14:25:06 +0000 (09:25 -0500)]
Bug #528: NAMD startup phase 0 hangs for smp on stampede
Modify fix for "Bug #485: Avoid race condition in printing warning about threads oversubscribing cores" to remove CmiNodeAllBarrier().

Change-Id: I14bb12257929bb80a9fa31f3f2587f6c0f41a75b

7 years ago · Refactor GPU Manager to remove code duplication in gpuProgressFn · 17/317/6
Harshit Dokania [Thu, 17 Jul 2014 20:07:20 +0000 (15:07 -0500)]
Refactor GPU Manager to remove code duplication in gpuProgressFn

Change-Id: I9a4f57586d625b8c1d63f8b4dc2cafa2551e6616

7 years ago · Added code for cublas Matrix Multiplication Kernel · 16/316/4
Harshit Dokania [Sat, 12 Jul 2014 08:14:28 +0000 (03:14 -0500)]
Added code for cublas Matrix Multiplication Kernel

Change-Id: I863f29e8d011f517abf0c5a0d0c386dc37243e39

7 years ago · Update API description in TRAM manual · 05/305/3
Lukasz Wesolowski [Thu, 3 Jul 2014 23:21:40 +0000 (18:21 -0500)]
Update API description in TRAM manual

Change-Id: I9e484b71580be9b741baf4b64bb888585320fb56

7 years ago · Issue #511: Automatically determine location of CUDA toolkit when building GPU Manager · 03/303/7
Harshit Dokania [Tue, 1 Jul 2014 21:01:43 +0000 (16:01 -0500)]
Issue #511: Automatically determine location of CUDA toolkit when building GPU Manager

Change-Id: I7d3c4336fed5a9469ea05bac56d5a34d29084423

7 years ago · CmiOpen: fix build break on Windows from lack of mode_t typedef · 02/302/1
Phil Miller [Tue, 1 Jul 2014 17:32:41 +0000 (12:32 -0500)]
CmiOpen: fix build break on Windows from lack of mode_t typedef

Change-Id: I54c5fea639bad6ddb247bd64daa7ee71fca59085

7 years ago · gni: Add '+checksum' command line argument to enable message checksum verification · 99/299/2
Phil Miller [Tue, 24 Jun 2014 20:38:35 +0000 (15:38 -0500)]
gni: Add '+checksum' command line argument to enable message checksum verification

Change-Id: I92ae6585df16a656ad73541812c2dcadcd008c93

7 years ago · gni bug #486: compute checksum should be only for useful message, instead of full... · 98/298/4
YanhuaSun [Tue, 24 Jun 2014 11:25:52 +0000 (06:25 -0500)]
gni bug #486: compute checksum should be only for useful message, instead of full buffer

Change-Id: Ie33395973e7a7f1b2618acef20b596594508361f
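
A minimal sketch of the distinction, assuming a message occupies the first msgSize bytes of a larger buffer. The checksum is a simple XOR of bytes chosen for illustration; the actual gni checksum routine may differ. What matters is the length argument: covering the full buffer folds in bytes past the message that need not match between sender and receiver.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative XOR checksum over only the `msgSize` bytes that hold the
// message, not over the whole allocated buffer.
std::uint8_t checksum(const std::uint8_t* buf, std::size_t msgSize) {
  std::uint8_t sum = 0;
  for (std::size_t i = 0; i < msgSize; ++i) sum ^= buf[i];
  return sum;
}
```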

7 years ago · Converse file IO wrappers: write warning to (possibly redirected) stderr when retryin... · 96/296/1
Phil Miller [Mon, 23 Jun 2014 15:57:11 +0000 (10:57 -0500)]
Converse file IO wrappers: write warning to (possibly redirected) stderr when retrying due to EINTR

Change-Id: I8c3b95ce284f5d2311cc01f3d31b107e7411b0fb

7 years ago · ckio #446: Retry open calls that result in EINTR · 95/295/2
Phil Miller [Thu, 19 Jun 2014 18:40:06 +0000 (13:40 -0500)]
ckio #446: Retry open calls that result in EINTR

On Lustre, it's not unusual for open() to fail with EINTR. When it
does, this should not be a fatal error. Make the code retry in that case, as
it already does for writes and syncs.

Change-Id: Ifdf16d0e258cdd71e8f8934094785fc5494b86ed
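
A minimal sketch of the retry loop, combined with the stderr warning from the Converse file IO wrappers commit above. openRetryingOnEintr is a hypothetical name and the message text is illustrative; any error other than EINTR is returned to the caller unchanged, with errno still set.

```cpp
#include <cerrno>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Retry open() while it fails with EINTR, warning on stderr each time.
int openRetryingOnEintr(const char* path, int flags, mode_t mode) {
  for (;;) {
    int fd = open(path, flags, mode);
    if (fd >= 0 || errno != EINTR) return fd;
    std::fprintf(stderr, "Warning: open(%s) interrupted by a signal, retrying\n", path);
  }
}
```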

7 years ago · Bug #519: Properly check for and report errors from pthread_setaffinity_np and pthrea... · 89/289/2
Jim Phillips [Tue, 17 Jun 2014 18:22:25 +0000 (13:22 -0500)]
Bug #519: Properly check for and report errors from pthread_setaffinity_np and pthread_getaffinity_np

These functions return their error code directly, rather than setting errno.

Change-Id: I208c46ec03274fdcc85a690bc560380e27e45389
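
A minimal sketch of the correct error check, assuming Linux with glibc (these functions are GNU extensions). Because the return value is the error code, it is the value to pass to strerror; errno is not the place to look. reportCurrentAffinity is a hypothetical name; pthread_setaffinity_np follows the same convention.

```cpp
#include <pthread.h>
#include <sched.h>
#include <cstdio>
#include <cstring>

// Query the calling thread's CPU affinity and report failures using the
// returned error code rather than errno.
int reportCurrentAffinity(cpu_set_t* out) {
  int rc = pthread_getaffinity_np(pthread_self(), sizeof(cpu_set_t), out);
  if (rc != 0)
    std::fprintf(stderr, "pthread_getaffinity_np failed: %s\n", std::strerror(rc));
  return rc;
}
```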

7 years ago · Fixing cuda/overlapTestGPUManager example to work with arbitrary matrix sizes · 90/290/6
Harshit Dokania [Tue, 17 Jun 2014 23:03:50 +0000 (18:03 -0500)]
Fixing cuda/overlapTestGPUManager example to work with arbitrary matrix sizes

Change-Id: I6fac60378b985489326c28448de19ef539532c43

7 years ago · Bug #500: Reduce pinned memory and number of pinned regions in ibverbs layers. · 79/279/3
Thomas R. Quinn [Thu, 12 Jun 2014 21:33:10 +0000 (14:33 -0700)]
Bug #500: Reduce pinned memory and number of pinned regions in ibverbs layers.

NUMPOOLS is reduced to eliminate pools of buffers greater than 1 MB.
Blocking of pools of smaller buffers is increased.

Change-Id: I068528e17bd9e0286b5306f1b78c2c560f5f8a28

7 years ago · Bug #485: Avoid race condition in printing warning about threads oversubscribing... · 78/278/1
Jim Phillips [Thu, 12 Jun 2014 21:41:01 +0000 (16:41 -0500)]
Bug #485: Avoid race condition in printing warning about threads oversubscribing cores

Change-Id: I7665d6ca756d6c8cbdca426c10e747e62ded9b9d

7 years ago · manuals: fix errors in projections (--with, refereed, binary location) · 75/275/2
Michael Robson [Wed, 11 Jun 2014 20:55:44 +0000 (15:55 -0500)]
manuals: fix errors in projections (--with, refereed, binary location)

Change-Id: I2b205d83bb5ad612b846ec893c7ff17e0e78ced7

7 years ago · net/machine.c: Fix build break on MS VC++ from mixing declarations and statements · 71/271/2
Phil Miller [Sat, 7 Jun 2014 21:50:28 +0000 (16:50 -0500)]
net/machine.c: Fix build break on MS VC++ from mixing declarations and statements

Change-Id: I60bac71340002cdfc4a668dcc0d0fd75d43dc00f

7 years ago · Doc: Correct location of Projections visualization tool · 74/274/2
Ronak Buch [Wed, 11 Jun 2014 06:59:20 +0000 (01:59 -0500)]
Doc: Correct location of Projections visualization tool

Change-Id: Icc0bf41f83bb581cc62a5d356d9cb86bac9d54cd

7 years ago · Fix path to projections binary in README.bigsim_quick · 72/272/3
Ehsan Totoni [Tue, 10 Jun 2014 19:34:53 +0000 (14:34 -0500)]
Fix path to projections binary in README.bigsim_quick

Change-Id: I8571b587eb0499bbd773e17d8feaf6f6ff84efb1