(A Very) Experimental Threading in R

I’ve been trying to find a way to introduce threads to R. There are many reasons one might want to: simplified input/output logic, sending tasks to the background (e.g. building a model asynchronously), or running computation-intensive tasks in parallel (e.g. a parallel, chunk-wise var() on a large vector). Finally, it’s just a neat problem to look at 😉 I’m trying to follow an approach similar to Python’s global interpreter lock.

So far it seems that:

  • one can re-enter the interpreter with R_tryEval, which internally calls R_ToplevelExec, which in turn intercepts all long jumps (e.g. errors)
  • there are a few basic checks to verify whether the stack is in good shape, e.g. R_CStackStart which checks stack frames and R_PPStackTop which checks objects under PROTECTion

I think that one can run multiple threads in R and maintain a separate interpreter “instance” in each of them. The R interpreter uses the stack for its bookkeeping, and each thread has its own stack. It also counts objects excluded from garbage collection with PROTECT. Thus, when coming back to a given R interpreter “instance” (after a thread-level context switch), one needs to re-set R_PPStackTop to whatever value that thread was left with.

I have put these ideas together in the form of an R package, thread (GitHub). This is what it can do:

  • start a new thread and execute an R function in its own interpreter
  • switch between threads on specific function calls, e.g. thread_join(), thread_print(), thread_sleep()
  • finish thread execution
  • keep track of R_PPStackTop
  • avoid SIGSEGV-faulting the R process 😉

Here’s an example where two functions are run in parallel R threads (it’s also available via thread::run_r_printing_example()):

library(thread)

thread_runner <- function (data) {
  thread_print(paste("thread", data, "starting\n"))
  for (i in 1:10) {
    timeout <- as.integer(abs(rnorm(1, 500, 1000)))
    thread_print(paste("thread", data, "iteration", i,
                       "sleeping for", timeout, "\n"))
    thread_sleep(timeout)
  }
  thread_print(paste("thread", data, "exiting\n"))
}
 
message("starting the first thread")
thread1 <- new_thread(thread_runner, 1)
print(ls(threads))
 
message("starting the second thread")
thread2 <- new_thread(thread_runner, 2)
print(ls(threads))
 
message("going to join() both threads")
thread_join(thread1)
thread_join(thread2)

And here’s the output from my Ubuntu 16.10 x64:

starting the first thread
[1] "thread_140737231587072"
starting the second thread
[1] "thread_140737223194368" "thread_140737231587072"
going to join() both threads
thread 1 starting
thread 1 iteration 1 sleeping for 144 
thread 2 starting
thread 2 iteration 1 sleeping for 587 
thread 1 iteration 2 sleeping for 761 
thread 2 iteration 2 sleeping for 1327 
thread 1 iteration 3 sleeping for 360 
thread 1 iteration 4 sleeping for 1802 
thread 2 iteration 3 sleeping for 704 
thread 2 iteration 4 sleeping for 463 
thread 1 iteration 5 sleeping for 368 
thread 2 iteration 5 sleeping for 977 
thread 1 iteration 6 sleeping for 261 
thread 1 iteration 7 sleeping for 323 
thread 1 iteration 8 sleeping for 571 
thread 2 iteration 6 sleeping for 509 
thread 2 iteration 7 sleeping for 2521 
thread 1 iteration 9 sleeping for 298 
thread 1 iteration 10 sleeping for 394 
thread 1 exiting
thread 2 iteration 8 sleeping for 966 
thread 2 iteration 9 sleeping for 533 
thread 2 iteration 10 sleeping for 1795 
thread 2 exiting

How far is this from real thread support in R? Well, there are three major challenges before this is really useful:

  1. Context switches happen only when a function from this package is called explicitly
  2. Memory allocation needs to be synchronized
  3. Error handling runs into R_run_onexits, which in turn throws a very nasty error message – this suggests I haven’t covered all features of the interpreter related to switching stacks

Issues #1 and #2 are related: one cannot leave R (release the R interpreter lock) and enter an arbitrary C function, because it is legal to call allocVector() from any C/C++ code. This in turn needs to happen synchronously: only one thread can execute allocVector() (or more specifically, allocVector3()) at any given time. I think that the best way to address it would be to patch R (main/memory.c) and introduce a pointer to allocVector3 (similar to ptr_R_WriteConsole). Then the thread package would inject a decorator for allocVector3 with additional synchronization logic.

Issue #3 is not clear to me yet, but it also suggests that the specifics of R code execution need more attention.

I’ll be grateful for comments and suggestions. I think R could benefit from native thread support, if only to simplify program logic – but maybe also to run parts of computation-intensive code in a lightweight parallel manner.

Posted in R

7 thoughts on “(A Very) Experimental Threading in R”

  1. True multi-threading, in which the threads execute R code in parallel, seems way too difficult, without totally rewriting both the interpreter and numerous packages with C code. To cite just one difficulty that occurs to me, how will the garbage collector know what objects are in use? C code currently needs to use PROTECT only when there is the possibility of an allocation. In bits of code that don’t do allocation, pointers to objects can be kept in local variables, which may live in processor registers, where it seems difficult for the garbage collector to find them. And which thread is running the garbage collector, anyway?

    If you want to use threads just to implement coroutines, with only one executing at a time, that might be more feasible, though I would expect problems, possibly intractable, with that as well.

    In pqR’s version of the R interpreter (see pqR-project.org), multiple threads execute in parallel, but all but the “master” thread do only numerical computations, using storage that has already been allocated. This avoids the intractable issues.

    1. Hi Radford,

      I know of pqR and actually it’s on my list to learn more about your patch – so I could even say I feel honored to see a comment from you. The only thing that stopped me was its size, which I expect to be considerable, while I wanted to get my hands dirty first to get a feeling for the problem.

      You confirm my intuitions, too. I thought enabling true parallelism in R would be too much of a task; that’s why I mention Python’s global interpreter lock as an inspiration. Initially I thought it would be possible to leave R (and enter C) and provide secure threading there. Now it seems it’s a variation of the same problem, as the interpreter needs to be exclusively present in one thread only, because of memory allocation and garbage collection. One way I am still going to investigate is identifying the set of memory-related APIs that need to be synchronized and securing them with an exclusive mutex. That way only one thread could change the shared resource – the interpreter’s managed heap.

      It still means that old C code cannot use this feature, because allocation and protection from garbage collection are not atomic (which is a big problem). There could be a new call, e.g. allocVector4, with a flag to return a protected vector, but only new code would know about it.

      Another way could be to provide synchronized and safe APIs through a package and allow only R-level threads, plus C-level protected allocation in new code that accesses R via this new library. It would mean that all standard calls (even entering C) would take over the synchronized resource (the interpreter and managed heap). Only threads that explicitly promise not to call the core R API directly would be allowed to run in parallel. Still, allocation via allocVector4 would take over the shared resource, so core R needs to be patched to respect that. (Although one could simply copy the garbage collector from R and maintain a separate, synchronized heap just for threads. Objects could be accessed from the main thread and vice versa but would never be deallocated by the main garbage collector.)

      A completely insane idea if R cannot be patched would be to replace entries in process’ symbol table. That’s the last resort, however, as it’s not portable and complex.

      Finally, there is the ability to allocate all the memory ahead of time and then go to C with no need to access the managed heap. Current core R does not do this (I checked cumsum): even if a low-level function runs on pre-allocated memory, it’s hidden behind a public API that will allocate. So again, the answer is either to patch core R or to provide a C library with all required APIs (plus a minimal patch to core R, which seems unavoidable).

      I know the R Core team doesn’t want threads, and maybe they’re right. R is a tool for statisticians and it needs a sound statistical toolset rather than general programming facilities. But who knows where R could go with an approachable threading API. And it’s so much fun to even try it 😉
