Bug #1220

AMPI: Support tlsglobals with dynamically linked objects

Added by Phil Miller about 2 years ago. Updated 7 months ago.

Status:
Merged
Priority:
High
Assignee:
Category:
AMPI
Target version:
Start date:
09/22/2016
Due date:
% Done:

0%


Description

Currently, the TLS-based privatization of global variables won't work in a program or library compiled as a dynamic object. My understanding is that this happens because not all of the TLS segment definitions appear in the main binary's ELF headers, as they do when linking statically. As long as new objects aren't being loaded with dlopen(), there shouldn't be an insurmountable hurdle to fixing this; it just takes reading in data from all the right tables and sections for every object that makes up the overall process image.

History

#1 Updated by Phil Miller about 2 years ago

Notes from a conversation with a friend who works heavily on TLS implementation issues:

Me
So, I'm going to have to deal with TLS allocation for variables in dynamically loaded libraries. Can I pick your brain at some point?

Andrew Hunter
Yeah

Me
You had to deal with the dlopen-style case, not just exec-time loading, right?
I assume dealing with things loaded during exec is much easier

Andrew Hunter
yeah

Me
Though are there complications there, too?

Andrew Hunter
not a ton
the case where we just have dynamic dependencies (not sure what the perfect word is, but you know, stuff that shows up in ldd) is very, very similar to the one-binary case

Me
Right, dynamic library references

Andrew Hunter
dlopen has two complexities

Me
Do you know of anything weird in how the compiler generates references to TLS variables depending on any of this?

Andrew Hunter
a) glibc does a sort of versioning thing on the control struct to add stuff to it without breaking things
b) initial_exec is weird. so there are 1000 ways to generate a tls reference and the compiler(s) are not consistent at all here
you need to read this https://www.akkadia.org/drepper/tls.pdf
and someone proposed a new api that's sort of supported and sometimes generated but i can't find a doc for (also, the "TLS" overload makes searching hard)
do you have a particular problem that's showing up?

Me
At the moment, we can consistently and successfully do some dirty hacks on static linked binaries to instantiate TLS segments for each user level thread we create, and activate the right one on context switch

Andrew Hunter
i hate you, but go on

Me
Our previous configuration of the build hard coded that we would static link when doing this. With dynamic linked code, we had no documented knowledge of what would go wrong

Andrew Hunter
so
a few things
first, you're more likely to get the compiler inserting calls to tls_get_addr() instead of redirections that ld fills in (note that it's allowed to do this in static mode too, it just rarely does)

Me
Testing today showed that at least one issue is that the normal bits in the elf header we looked at didn't show symbols for TLS variables unless they were declared with extern somewhere

Andrew Hunter
so the first thing you have to do is make sure tls_get_addr works

Me
Tls_get_addr works, afaik

Andrew Hunter
okay
the second thing is for dlopen in that case glibc iterates thread control structs and updates some important values (read by tls_get_addr)

Me
We're allocating an entire new segment for each thread, and setting the segment register used for that on context switch

Andrew Hunter
also, initializes state
so there you're in trouble
dlopen can be hard

Me
I'm willing to forsake dlopen and document that as a limitation for now, I think
The big thing is just to get dynamically linked binaries and their libraries working

Andrew Hunter
hmm
i don't know what bullshit you're doing to set up the segment but that should probably just work if it works with a static
essentially the same stuff gets set up, there's just an array per dso

Me
OK, so we may just need to hack the code up a little to look at each DSO
That kinda squares with what my juniors have noted
If you have any pointers to material that would help shortcut writing that code, it would be helpful. Otherwise, I'm sure we can figure it out

Andrew Hunter
not off the top of my head
i don't have snippets that do this because we don't cheat
we just use glibc
but if you look at their implementation of tls_get_addr
and the init code
you'll see what's going on

#2 Updated by Sam White over 1 year ago

  • Target version set to 6.8.1

#3 Updated by Sam White about 1 year ago

In charm/src/util/cmitls.c, the routine getTLSPhdrEntry() iterates over all the entries in the ELF program header table and checks for the PT_TLS type to find the address of the TLS segment. Since statically linked libraries are part of the same executable, all thread_local variables in those libraries show up here. But for shared objects, I think we need to open up their ELF headers and iterate through their symbol tables as well.

#4 Updated by Sam White about 1 year ago

  • Target version changed from 6.8.1 to 6.9.0
  • Assignee changed from Sam White to Evan Ramos

#5 Updated by Evan Ramos 9 months ago

Is this feature blocking 6.9.0 or should we reschedule it for a future version?

#6 Updated by Sam White 9 months ago

  • Target version deleted (6.9.0)

It's not blocking anything, so I'll move it off 6.9.0, but it is important.

#7 Updated by Evan Ramos 9 months ago

  • Priority changed from Normal to High

#8 Updated by Evan Ramos 8 months ago

I have investigated this issue in depth and have come to some conclusions. In summary, tlsglobals works by swapping the pointer to the TLS implementation's top-most data structure for each virtual process. This structure contains the storage for the thread_local variables of at least thread #0. Under certain circumstances, the compiler can generate code that directly accesses this block without traversing the rest of the data structure to look up the full location description, and we exploit this to change as little as possible in the TLS implementation's state while achieving the desired result. The work necessary to make the feature functional with shared objects is also what would be needed for it to work with compilers other than gcc, without the -mno-tls-direct-seg-refs flag, and/or in SMP mode: more robustly rewriting the TLS implementation's data structures. Unfortunately, these structures are implementation-defined, meaning we would be tracking glibc, a moving target, and our code would be subject to the same breakages over time as swapglobals.

#9 Updated by Sam White 8 months ago

I think it's probably worth seeing how much the data structures in question have actually changed over time. For instance, if the same patch will work for all versions that we see in use, then it's probably worth doing.

Also, what is the issue with -tlsglobals in SMP mode? I don't know of any limitation there, and it's one of the advantages we list of -tlsglobals over -swapglobals.

#10 Updated by Evan Ramos 8 months ago

Sam White wrote:

I think it's probably worth seeing how much the data structures in question have actually changed over time. For instance, if the same patch will work for all versions that we see in use, then it's probably worth doing.

Okay, I can look into glibc's implementation further.

Also, what is the issue with -tlsglobals in SMP mode? I don't know of any limitation there, and it's one of the advantages we list of -tlsglobals over -swapglobals.

I may have confused it with swapglobals.

#11 Updated by Sam White 8 months ago

Yeah, -swapglobals doesn't work in SMP mode.

#12 Updated by Evan Ramos 8 months ago

  • Status changed from New to In Progress

#13 Updated by Evan Ramos 7 months ago

  • Status changed from In Progress to Implemented

#14 Updated by Sam White 7 months ago

  • Status changed from Implemented to Merged
  • Target version set to 6.9.0
