A new `subprocess` package for R

Here’s a new package that brings to R a new API for handling child processes, similar to the way Python handles them.

Unlike the already available system() and system2() functions from the base package, or the mclapply() function from the parallel package, this new API is aimed at long-lived child processes that the parent R process can control programmatically. Here’s an example:

handle <- spawn_process("/usr/bin/sshpass",
                        c("ssh", "-T", "user@domain.com"))
process_write(handle, "password\n")
process_write(handle, "ls\n")
process_read(handle, "stdout")
#> "bin"   "public_html"   "www-backup"

This can of course be done with system2("ssh", c("user@domain.com", "ls")) as well (at least as long as password-less ssh access is configured). However, if a number of subsequent commands have to be run in response to the user’s input, keeping a single connection open can save time; otherwise you have to wait for ssh to establish a new connection every time a command is executed.
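
To make the benefit of a long-lived child more concrete, here is a minimal sketch that keeps a single child process open and sends it several commands in turn. Since not every reader has an ssh server and sshpass at hand, it uses a second R session as a stand-in for the remote shell (an assumption made purely for illustration). The helper `send_command()` is hypothetical and not part of subprocess; the `PIPE_STDOUT` and `TIMEOUT_INFINITE` constants and the millisecond `timeout` argument of `process_read()` follow the package manual of later releases, so check the manual pages of the version you have installed.

```r
library(subprocess)

# Path to the R executable of the running installation. A second R session
# stands in for the remote shell (assumption made purely for illustration).
r_binary <- file.path(R.home("bin"),
                      if (.Platform$OS.type == "windows") "R.exe" else "R")

# One long-lived child; --vanilla skips profile files, --slave suppresses
# the banner and prompts so that only command output comes back.
handle <- spawn_process(r_binary, c("--vanilla", "--slave"))

# Hypothetical helper (not part of subprocess): send one command, then wait
# up to `timeout` milliseconds for whatever the child prints on stdout.
send_command <- function(handle, command, timeout = 5000) {
  process_write(handle, paste0(command, "\n"))
  process_read(handle, PIPE_STDOUT, timeout = timeout)
}

# Both commands go to the same child, so start-up is paid only once, and
# the variable defined by the first command is still there for the second.
out1 <- send_command(handle, "x <- 21; print(x)")
out2 <- send_command(handle, "print(x * 2)")

# Ask the child to exit and wait until it has finished.
process_write(handle, "q(save = 'no')\n")
process_wait(handle, TIMEOUT_INFINITE)
```

The same pattern applies to the ssh session: spawn once, then call `send_command()` as often as the user’s input requires.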

A perhaps sillier example is working with a local (or, for that matter, remote) Spark session. Imagine there is no package dedicated to Spark (which might well be the case with the next new thing you find under your Christmas tree this year). The simplest approach could be to open the Spark console and keep it alive, sending commands to its standard input and parsing its text output. However naive, this approach can save some prototyping time.

handle <- spawn_process("/usr/bin/spark-shell")
process_write(handle, 'val textFile = sc.textFile("README.md")\n')
process_write(handle, 'textFile.count()\n')
process_read(handle, "stdout")
#> [1] "textFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:25"
#> [2] "res0: Long = 126"
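
A console such as spark-shell may need a while before it answers, so a single read can come back empty. One way to cope is to poll until an expected line shows up. The sketch below assumes `process_read()` accepts a pipe and a millisecond `timeout` as in the package manual of later releases; `read_until()` is a hypothetical helper, and a second R session that sleeps for a second stands in for the slow console.

```r
library(subprocess)

# Hypothetical helper (not part of subprocess): keep polling stdout until
# some collected line matches `pattern` or `timeout` ms have elapsed.
read_until <- function(handle, pattern, timeout = 10000, poll = 200) {
  deadline <- Sys.time() + timeout / 1000
  collected <- character()
  while (Sys.time() < deadline) {
    # Wait at most `poll` ms for new output, then check everything so far.
    chunk <- process_read(handle, PIPE_STDOUT, timeout = poll)
    collected <- c(collected, chunk)
    if (any(grepl(pattern, collected))) break
  }
  collected
}

# A second R session stands in for a slow console: the command sleeps for
# about a second before it prints a marker line.
r_binary <- file.path(R.home("bin"),
                      if (.Platform$OS.type == "windows") "R.exe" else "R")
handle <- spawn_process(r_binary, c("--vanilla", "--slave"))
process_write(handle, "Sys.sleep(1); cat('DONE\\n')\n")
result <- read_until(handle, "^DONE")

# Shut the stand-in console down again.
process_write(handle, "q(save = 'no')\n")
process_wait(handle, TIMEOUT_INFINITE)
```

For the Spark session above, waiting for a line matching `^res[0-9]+:` after `textFile.count()` would be one plausible choice of pattern, although the exact console output depends on the Spark version.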

The new subprocess package is available from my GitHub account and from CRAN. All functions work on both Linux and Windows, and the few OS-specific details (such as signals) are described in the respective manual pages. There is also an introductory vignette.
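
Because signals differ between Linux and Windows, a portable way to end a child is to ask it to quit first and only then fall back to terminating it. The sketch below assumes the `process_state()`, `process_wait()`, `process_terminate()` and `process_return_code()` functions and the state names `"running"`, `"exited"` and `"terminated"` as documented in the package manual of later releases; again, a second R session serves as the child.

```r
library(subprocess)

# A second R session serves as the child process (illustrative choice).
r_binary <- file.path(R.home("bin"),
                      if (.Platform$OS.type == "windows") "R.exe" else "R")
handle <- spawn_process(r_binary, c("--vanilla", "--slave"))
state_before <- process_state(handle)

# Polite shutdown first: ask the child to quit with a specific exit status
# and give it up to five seconds to do so.
process_write(handle, "q(save = 'no', status = 3)\n")
process_wait(handle, timeout = 5000)

# If it is still alive, fall back to terminating it; how this is done
# (signal or Windows API call) is OS-specific and covered in the manual.
if (process_state(handle) == "running") {
  process_terminate(handle)
  process_wait(handle, TIMEOUT_INFINITE)
}

state_after <- process_state(handle)
exit_code <- process_return_code(handle)   # 3 if q() took effect
```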

I should also say that this package was designed with Python’s subprocess module in mind; I greatly admire both the module and the language. Its R counterpart is currently at version 0.7.4, a number meant to indicate that it is not yet a perfect equivalent. More features (such as waiting on stdout and stderr simultaneously) are still to come.
