C interface for R

2023.03.27

Basics
Compiling
Foreign Function Interface for C
References

Basics

To interface C code with R through the .C interface, the C functions need the following properties:

C functions called through .C must return void, so they have to return the results of the computation through their arguments.
All arguments are passed by reference: R passes a pointer to a number or to an array, even for scalars. Be careful to dereference the pointers correctly.
Each file containing C code to be called by R should include the R.h header file. If it uses special functions (e.g. distribution functions), it should also include the proper R header (Rmath.h).
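The conventions above can be sketched in plain C. The routine name scale_vec and its arguments are hypothetical, and the R.h include is left out only so that the snippet also compiles outside an R build; a file meant for R would normally include it.

```c
/* Hypothetical .C-style routine: multiplies each element of x by *factor.
 * - it returns void, so the result is written back into x
 * - every argument is a pointer, including the scalars n and factor */
void scale_vec(double *x, int *n, double *factor)
{
    for (int i = 0; i < *n; i++) {   /* n is a pointer, so dereference it */
        x[i] *= *factor;             /* same for factor */
    }
}
```

From R, a routine like this would be reached with something along the lines of .C("scale_vec", x = as.double(x), as.integer(length(x)), as.double(2))$x, where the modified vector comes back as the element named x.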

R has two C APIs: the old one in Rdefines.h and the full one in Rinternals.h. To get the include directory from R:

R.home("include")

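Most API functions are defined with an Rf_ or R_ prefix. Those with the Rf_ prefix are also available under unprefixed names (e.g. allocVector for Rf_allocVector) through macros in the headers. Defining R_NO_REMAP before including the headers turns this remapping off.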

Compiling

R, rather than the C compiler directly, is used for compilation. R takes care of the header search paths and of linking against the right libraries. Given a C file called foo.c, use

R CMD SHLIB foo.c

to compile. This outputs foo.so. Alternatively, you can specify the output name with the -o flag, just as with GCC or Clang. The .so extension is the traditional Unix name for a shared object.

To load compiled C code in R, use the dyn.load func. E.g. for the above example

dyn.load("foo.so")

All non-static functions defined in foo.c are then available for R to call.

Foreign Function Interface for C

The .C func provides an interface to compiled code which has been linked into R, either at build time or with dyn.load. The interface can be used with any other language which can generate C interfaces.
The first arg of .C is a char string specifying the name of the func as known to C. Check that the symbol has been loaded with is.loaded(...).

Type mappings (.C interface)

On all R platforms int and INTEGER are 32-bit.

Hello World

hello.c :

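#include <R.h>

/* Called through .C, so n arrives as a pointer to the count. */
void hello(int *n)
{
    int i;
    for (i = 0; i < *n; i++) {   /* dereference n to get the count */
        Rprintf("Hello, world!\n");
    }
}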

hello.R:

hello_r <- function(n) {
    .C("hello", as.integer(n))
}

.C, .Call, and .External

.C vs .Call

There are two different options to integrate C code with R: .C or .Call. The traditional method is .C while .Call is more modern and allows different usage.

.C lets you pass vectors from R to C, operate on them, and pass them back. The C code has no way to get the length of a vector, so the length has to be passed as a separate argument. If you pass a length larger than the actual length, the C code reads or writes past the end of the vector, which can crash R.
.Call lets the C code query the lengths of vectors. .C cannot pass an object created in C back to R, while .Call allows objects to be created in C and returned to R.
.C does not share objects with the C code: it copies them on the way in. .Call passes the R objects themselves without copying, which makes .Call faster.

.Call vs .External

The two are essentially identical on the R side. On the C side, however, .Call passes a fixed number of arguments, whereas .External passes a single argument list of any length (a LISTSXP).

Differences between the interfaces on the R side:

.C("convolve",
    as.double(a),
    as.integer(length(a)),
    as.double(b),
    as.integer(length(b)),
    ab = double(length(a) + length(b) - 1))$ab

.Call("convolve2", a, b)
.External("convolveE", a, b)
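The C routine behind the .C call is not shown in these notes. A minimal sketch of what it might look like, assuming the argument order used in the call above (a, its length, b, its length, and the output buffer ab), is:

```c
/* Sketch of a .C-style convolution; the body is an assumption, only the
 * argument order is taken from the R call.
 * ab must have room for *na + *nb - 1 elements. */
void convolve(double *a, int *na, double *b, int *nb, double *ab)
{
    int nab = *na + *nb - 1;

    for (int i = 0; i < nab; i++)
        ab[i] = 0.0;                      /* clear the output buffer */

    for (int i = 0; i < *na; i++)
        for (int j = 0; j < *nb; j++)
            ab[i + j] += a[i] * b[j];     /* accumulate each product */
}
```

Because the lengths arrive as separate arguments, nothing stops a caller from passing a wrong value. The routine trusts them, which is exactly the hazard described for .C above.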

Here is an example add function using .Call:

add.c:

// In  C ----------------------------------------
#include <R.h>
#include <Rinternals.h>

SEXP add(SEXP a, SEXP b) {
    SEXP result = PROTECT(allocVector(REALSXP, 1));
    REAL(result)[0] = asReal(a) + asReal(b);
    UNPROTECT(1);
    return result;
}

add.R:

# In  R ----------------------------------------
add <- function(a, b) {
    .Call("add", a, b)
}

Data structures

All R objects are stored in an S-expression datatype (SEXP). Since all R objects are S-expressions, every C function called through .Call or .External must take SEXP arguments and return a SEXP. The SEXP type is technically a pointer to a structure with typedef SEXPREC. Expanded, the acronyms read Symbolic EXPression RECord (SEXPREC) and Symbolic EXPression pointer (SEXP).

There are 22 subtypes (some are esoteric and rarely used):

Vectors: LGLSXP, INTSXP, REALSXP, CPLXSXP, STRSXP, VECSXP, EXPRSXP
List-likes: LISTSXP, LANGSXP
Symbols & strings: SYMSXP, CHARSXP

SEXP internals

A header struct plus a union construct.
A major special case is made for VECTOR_SEXPREC, which uses a slightly shorter structure immediately followed by the data.
Other subtypes are generally a header plus a 3-pointer structure (CAR/CDR/TAG for lists, formals/body/env for functions, etc.).
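The "header plus union" layout can be illustrated with a toy model in plain C. Every name below (toy_sexp, toy_sexprec, toy_first_slot) is hypothetical, the header fields are placeholders, and the vector special case is left out; this is not R's actual definition.

```c
#include <stddef.h>

/* Toy model only: these are NOT R's real definitions. */
typedef struct toy_sexprec *toy_sexp;   /* plays the role of SEXP */

enum toy_type { TOY_LIST, TOY_CLOSURE };

struct toy_header {
    enum toy_type type;   /* tells which union member is in use */
    int marked;           /* stand-in for garbage-collector bookkeeping */
};

struct toy_sexprec {
    struct toy_header hdr;   /* common header shared by all subtypes */
    union {                  /* three pointers whose meaning depends on type */
        struct { toy_sexp car, cdr, tag; } list;
        struct { toy_sexp formals, body, env; } closure;
    } u;
};

/* First of the three slots: CAR for a list cell, formals for a closure. */
toy_sexp toy_first_slot(toy_sexp s)
{
    return s->hdr.type == TOY_LIST ? s->u.list.car : s->u.closure.formals;
}
```

The point of the union is that both subtypes occupy the same three pointer slots, so the record has one size and the header's type field decides how the slots are interpreted.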

Accessing and creating vectors

From RTcl_ObjAsDoubleVector:

ans = allocVector(REALSXP, count);
for (i = 0; i < count; i++) {
    ret = Tcl_GetDoubleFromObj(RTcl_interp, elem[i], &x);
    if (ret != TCL_OK) x = NA_REAL;
    REAL(ans)[i] = x;
}

REAL(ans) gives a pointer to the base of an array which can be indexed as usual.
NA_REAL to encode missing values
Allocation performed with allocVector

Character vectors example

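PROTECT(ans = allocVector(STRSXP, count));
for (i = 0; i < count; i++) {
    /* mkChar allocates, which is why ans is protected above */
    SET_STRING_ELT(ans, i,
                   mkChar(Tcl_GetStringFromObj(elem[i], NULL)));
}
UNPROTECT(1);   /* once, after the loop has filled the vector */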

Need to use mkChar to generate CHARSXP object
Use SET_STRING_ELT to change element of vector (write barrier)
Need to PROTECT

List-like structures

Internally R is based on Scheme, a Lisp variant.
Lists in R are really VECSXP objects (a generic vector), but internally we have LISTSXP which are similar to LISP lists. These are almost invisible at the R level.
LANGSXP objects are structurally similar to LISTSXP; EXPRSXP objects are like VECSXPs with mostly LANGSXP elements.

CAR and CDR

Lists are traditionally built from cells of paired pointers, e.g. (A B C).
They are accessed through Contents of Address Register (CAR), which yields the first element, and Contents of Decrement Register (CDR), which yields the rest of the list.
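The paired-pointer layout of a list such as (A B C) can be sketched with a toy cons cell in plain C. The struct and function names are hypothetical and the elements are plain strings; this is the classic Lisp layout, not R's LISTSXP.

```c
#include <stddef.h>

/* Toy cons cell: NOT R's LISTSXP, just the classic Lisp layout. */
struct cell {
    const char *car;    /* the element held by this cell */
    struct cell *cdr;   /* the rest of the list, NULL at the end */
};

/* Walk the CDR chain and count the cells. */
int list_length(const struct cell *p)
{
    int n = 0;
    for (; p != NULL; p = p->cdr)
        n++;
    return n;
}

/* Return the CAR of the i-th cell (0-based), or NULL if the list is shorter. */
const char *list_nth(const struct cell *p, int i)
{
    while (p != NULL && i-- > 0)
        p = p->cdr;
    return p != NULL ? p->car : NULL;
}
```

Walking a list is always the same loop: look at the CAR of the current cell, then move to its CDR until the end marker is reached.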

PROTECT

R maintains a protection stack to guard objects against its garbage collector. PROTECT(obj) pushes the object onto the protection stack, and UNPROTECT(n) pops the top n objects off the stack.

for (i = argc - 1; i > 1; i--)
{
    PROTECT(alist);
    alist = LCONS(mkString(argv[i]), alist);
    UNPROTECT(1);
}

It is better to protect too often than too rarely. The trade-off is between code that almost always runs despite a missed protection, and then fails unpredictably, and clutter that makes the code hard to maintain.

You do not need to protect an object:

when you don’t need the object anymore.
when the object is part of an object which is already protected.
across calls where no allocation is involved.
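The protection stack described in this section can be modelled in a few lines of plain C. All names here (toy_protect, toy_unprotect, toy_is_protected) are hypothetical, and the fixed-size array is a simplifying assumption; the sketch only mirrors the push/pop discipline, not R's collector.

```c
#include <stddef.h>

#define TOY_STACK_MAX 16

static void *toy_stack[TOY_STACK_MAX];  /* objects the collector must keep */
static int toy_top = 0;                 /* number of protected objects */

/* Push obj and return it, so a call can wrap an expression as PROTECT does. */
void *toy_protect(void *obj)
{
    if (toy_top < TOY_STACK_MAX)   /* a real implementation would signal overflow */
        toy_stack[toy_top++] = obj;
    return obj;
}

/* Pop the top n entries; the caller says how many, not which ones. */
void toy_unprotect(int n)
{
    if (n < 0)
        return;
    toy_top = n > toy_top ? 0 : toy_top - n;
}

/* What a collector would ask: is obj currently on the stack? */
int toy_is_protected(const void *obj)
{
    for (int i = 0; i < toy_top; i++)
        if (toy_stack[i] == obj)
            return 1;
    return 0;
}
```

Since unprotecting is by count rather than by object, protections have to be released in the reverse order of their creation, which is why PROTECT and UNPROTECT calls are normally kept balanced within one function.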

Example of parsing args from .External

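demo.c:

#include <R.h>
#include <Rinternals.h>

SEXP printargs(SEXP alist)
{
    SEXP p, ans;
    int n;

    /* The first element of the list is the name passed to .External; skip it. */
    for (p = CDR(alist), n = 0; p != R_NilValue; p = CDR(p), n++) {
        PrintValue(CAR(p));
    }

    /* Nothing is allocated after this point, so ans needs no PROTECT. */
    ans = allocVector(INTSXP, 1);
    INTEGER(ans)[0] = n;
    return ans;
}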

Compilation:

R CMD SHLIB demo.c

Loading and calling:

dyn.load("demo.so")
.External("printargs", 1, 2, 3:5, "hello")

External Pointers

R level interface

The interface to pointers is a C level interface. From R these are opaque objects with a type of externalptr.

Like environments and names, pointer reference objects are not copied by duplicate. If you want to create an R object which corresponds to a pointer, you should do something like

p <- .Call(...) # Create ptr object
object <- list(p)
class(object) <- "myclass"

C level interface

A pointer reference is constructed by calling R_MakeExternalPtr with three arguments:

the pointer value,
a tag (SEXP), used e.g. to attach type information to the pointer reference,
and a value to be protected, which can be used to associate with the pointer an R object that must remain alive as long as the pointer is alive.

SEXP R_MakeExternalPtr(void *p, SEXP tag, SEXP prot);


Reader funcs are additionally provided to allow the pointer, tag, and protected values to be retrieved:

void *R_ExternalPtrAddr(SEXP s);
SEXP R_ExternalPtrTag(SEXP s);
SEXP R_ExternalPtrProtected(SEXP s);


The pointer value can be cleared (set to NULL) or given a new value. Code that uses pointer references should check for NULL values since these can occur as a result of clearing or save/loads. It may also occasionally be useful to be able to change the tag or protected values of a pointer object.

Tip (pointers in finalization): As part of finalization it is a good idea to clear a pointer reference, just in case it has managed to get itself resurrected.

void R_ClearExternalPtr(SEXP s);
void R_SetExternalPtrAddr(SEXP s, void *p);
void R_SetExternalPtrTag(SEXP s, SEXP tag);
void R_SetExternalPtrProtected(SEXP s, SEXP p);
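The advice to check for NULL after a clear or a save/load can be illustrated with a toy model in plain C. The struct toy_extptr and both functions are hypothetical stand-ins, with a string as the tag for simplicity; they are not R's implementation.

```c
#include <stddef.h>

/* Toy model of a pointer reference: NOT R's implementation. */
struct toy_extptr {
    void *addr;        /* the wrapped C pointer */
    const char *tag;   /* type information, a string here for simplicity */
    void *prot;        /* object kept alive alongside the pointer */
};

/* Mirrors the idea of clearing a reference: only the address is reset. */
void toy_clear(struct toy_extptr *p)
{
    p->addr = NULL;
}

/* Defensive reader: returns fallback when the reference has been cleared. */
int toy_read_int(const struct toy_extptr *p, int fallback)
{
    if (p->addr == NULL)
        return fallback;
    return *(const int *)p->addr;
}
```

In code written against the real API, the equivalent check would compare the result of R_ExternalPtrAddr against NULL before dereferencing it.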


Saving in a workspace

When a pointer object is saved in a workspace its pointer field is saved as NULL since the pointer values are not likely to be useful across sessions. The tag object will be retained. For more info see A Future for R: Non-Exportable Objects.

Finalization

A finalizer can be registered for a pointer reference or an environment.

typedef void (*R_CFinalizer_t)(SEXP);


Finalizers are registered with

void R_RegisterFinalizer(SEXP s, SEXP fun);
void R_RegisterCFinalizer(SEXP s, R_CFinalizer_t fun);


It is an error to register an object for finalization more than once.

The finalization function will be called sometime after the garbage collector detects that the object is no longer accessible from within R. The exact timing is not predictable. There is no guarantee that finalizers will be called before system exit, even for objects that may already have been determined to be eligible for finalization.
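The registration rules above can be sketched with a toy model in plain C. The names are hypothetical, void * stands in for SEXP, and toy_collect merely simulates the moment the collector decides an object is unreachable; the real timing is up to R.

```c
#include <stddef.h>

/* Same shape as R_CFinalizer_t, with void * standing in for SEXP. */
typedef void (*toy_finalizer_t)(void *);

struct toy_object {
    void *data;
    toy_finalizer_t fin;   /* NULL while no finalizer is registered */
};

/* Returns 0 on success, -1 if a finalizer is already registered. */
int toy_register_finalizer(struct toy_object *obj, toy_finalizer_t fin)
{
    if (obj->fin != NULL)
        return -1;
    obj->fin = fin;
    return 0;
}

/* Stand-in for the collector noticing that obj is unreachable. */
void toy_collect(struct toy_object *obj)
{
    if (obj->fin != NULL) {
        toy_finalizer_t fin = obj->fin;
        obj->fin = NULL;       /* make sure the finalizer runs at most once */
        fin(obj->data);
    }
}

/* Example finalizer: counts how often it was invoked. */
void toy_count_calls(void *data)
{
    ++*(int *)data;
}
```

Clearing the finalizer slot before invoking it plays the same role as clearing a pointer reference during finalization: a second pass over the same object finds nothing left to do.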