Project

General

Profile

Cleanup #539

Cleanup #535: Errors reported by ThreadSanitizer

Data race in ConverseExit

Added by Phil Miller almost 5 years ago. Updated about 1 year ago.

Status:
In Progress
Priority:
Low
Assignee:
-
Category:
SMP
Target version:
Start date:
07/28/2014
Due date:
% Done:

0%


Description

SUMMARY: ThreadSanitizer: data race ./machine-common-core.c:1350 ConverseExit // comm thread reads without lock

History

#1 Updated by Nikhil Jain over 4 years ago

This is explicitly protected by commThdExitLock. Not sure what the race condition is.

#2 Updated by Phil Miller over 4 years ago

Any circumstance in which a variable is read in one thread and written in another, in which at least one of them is not synchronized, is a race condition according to the language standards. Nikhil was concerned that locking on this variable in the comm thread risks hurting performance severely. Phil expects that the performance impact will actually be negligible, because the lock should remain uncontended in the cache of the comm thread's cores until the runtime exits. If benchmarks are happy with locking, then we should just do that.

#3 Updated by Phil Miller over 4 years ago

The solution we came up with in the last core meeting was to have the comm thread allocate and send a 'exiting now' message to each of its worker threads, and have the handler for that message set each thread's local flag to 'exiting'. This eliminates the shared variables entirely.

This can probably also be deferred past 6.6.1.

#4 Updated by Nikhil Jain over 4 years ago

I should have paid more thought before agreeing that this solution will work. We have reversed the cause and the effect: the worker threads exit first and when all of them have exited, the comm thread exits (by checking the counter).

#5 Updated by Phil Miller over 4 years ago

  • Target version changed from 6.6.1 to 6.7.0

#6 Updated by Nikhil Jain over 4 years ago

  • Priority changed from Normal to Low

#7 Updated by Nikhil Jain over 3 years ago

  • Target version changed from 6.7.0 to 6.7.1

#8 Updated by Sam White over 3 years ago

  • Target version changed from 6.7.1 to 6.8.0

#9 Updated by Sam White over 2 years ago

  • Assignee deleted (Nikhil Jain)
  • Category set to SMP

#10 Updated by Eric Bohm over 2 years ago

  • Assignee set to Seonmyeong Bak

#11 Updated by Phil Miller about 2 years ago

  • Target version changed from 6.8.0 to 6.8.1

#12 Updated by Eric Bohm almost 2 years ago

  • Target version changed from 6.8.1 to 6.9.0

#13 Updated by Seonmyeong Bak over 1 year ago

  • Target version changed from 6.9.0 to 6.9.1

#14 Updated by Sam White over 1 year ago

  • Assignee deleted (Seonmyeong Bak)

#15 Updated by Eric Bohm over 1 year ago

  • Target version changed from 6.9.1 to Unscheduled

#16 Updated by Sam White about 1 year ago

  • Status changed from New to In Progress

Converted the volatile int + CmiNodeLock to a std::atomic<int>, but this may well hurt performance. So we need to benchmark it: https://charm.cs.illinois.edu/gerrit/#/c/charm/+/4103/

Also available in: Atom PDF