C interface for R
Basics
If we want to interface C code with R through .C, the C functions need to have the following properties:
- C functions called by R must all return void, which means they need to return the results of the computation in their arguments.
- All arguments passed to the C function are passed by reference, which means we pass a pointer to a number or array. Be careful to dereference the pointers correctly.
- Each file containing C code to be called by R should include the R.h header file. If using special functions (e.g. distribution functions), also include the proper R header (Rmath.h).
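A minimal sketch of a function that follows these rules. The name sum_squares and its arguments are hypothetical, and the R.h include is left out only so that the sketch compiles without an R installation; a file loaded into R would add it as described above.

```c
/* Hypothetical example: sum of squares of a numeric vector, written in
 * the style expected by .C. Everything arrives as a pointer and the
 * function itself returns void. */
void sum_squares(double *x, int *n, double *result)
{
    double total = 0.0;
    for (int i = 0; i < *n; i++) {   /* the length is a pointer too, so dereference it */
        total += x[i] * x[i];
    }
    *result = total;                 /* "return" the value by writing into an argument */
}
```

From R this could be called along the lines of .C("sum_squares", as.double(x), as.integer(length(x)), result = double(1))$result.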
R has two C APIs: the old one in Rdefines.h, and the full one in Rinternals.h. To get the include directory from R:
R.home("include")
Most API functions are defined with the prefix Rf_ or R_. The Rf_ ones are also available without the prefix through macros (e.g. allocVector for Rf_allocVector). Defining R_NO_REMAP before including the headers switches this remapping off.
Compiling
R is used for compilation rather than calling the C compiler directly, because R takes care of the include paths and of linking against its libraries. Given a C file called foo.c, use
R CMD SHLIB foo.c
to compile. This outputs foo.so. Alternatively you can specify the output name with the -o flag, just as with GCC or Clang. The .so extension is the traditional UNIX name for a shared object; on Windows the result is a .dll.
To load compiled C code in R, use the dyn.load function. E.g. for the above example:
dyn.load("foo.so")
All the functions defined in foo.c will then be available for R to call.
Foreign Function Interface for C
- The .C function provides an interface to compiled code which has been linked into R, either at build time or with dyn.load. The interface can be used with any other language which can generate C interfaces.
- The first argument of .C is a character string specifying the name of the function as known to C. Check that the symbol has been loaded with is.loaded(...).
Type mappings (.C interface)
On all R platforms int and INTEGER are 32-bit.
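As an illustration of the usual correspondences (R integer and logical vectors arrive as int *, numeric vectors as double *, character vectors as char **), here is a small sketch. The function describe and its parameters are hypothetical; R.h is omitted only so the sketch compiles stand-alone.

```c
#include <string.h>

/* Hypothetical .C-style function touching three common argument types:
 *   R integer / logical -> int *
 *   R numeric           -> double *
 *   R character         -> char **
 */
void describe(int *count, double *scale, char **label,
              int *label_len, double *scaled)
{
    *label_len = (int) strlen(label[0]);  /* first string of the character vector */
    *scaled = (*count) * (*scale);        /* integer times double, stored as double */
}
```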
Hello World
hello.c:
#include <R.h>
/* Print the greeting *n times; .C passes n as a pointer to int. */
void hello(int *n)
{
    int i;
    for (i = 0; i < *n; i++) {
        Rprintf("Hello, world!\n");
    }
}
hello.R:
hello_r <- function(n) {
.C("hello", as.integer(n))
}
.C, .Call, and .External
.C vs .Call
There are two different options to integrate C code with R: .C or .Call. The traditional method is .C, while .Call is more modern and gives the C code direct access to R objects.
- .C allows you to pass some vectors from R to C, do some operations on them and then pass them back. However, there is no way to get the length of a vector on the C side, so the length has to be passed as a separate argument. If you pass a length larger than the actual length, the C code reads or writes past the end of the vector, which can crash R.
- .Call allows you to get the lengths of vectors.
- .C does not allow you to pass an object that you create in C back to R, while .Call allows the creation and passing of objects back to R.
- Since .C does not share objects, it copies them on the way into C, while .Call passes the R objects themselves (already protected from garbage collection) without copying, which makes .Call faster.
.Call vs .External
- Essentially identical on the R side; however, on the C side .Call passes a fixed number of arguments, whereas .External passes a single argument list of any length (a LISTSXP).
Differences between the interfaces on the R side:
.C("convolve",
as.double(a),
as.integer(length(a)),
as.double(b),
as.integer(length(b)),
ab = double(length(a) + length(b) - 1))$ab
.Call("convolve2", a, b)
.External("convolveE", a, b)
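For the .C variant, the C side might look like the sketch below, written along the lines of the convolution example in the Writing R Extensions manual. It assumes, as in the R call above, that ab has room for length(a) + length(b) - 1 values; R.h is omitted only so the sketch compiles stand-alone.

```c
/* Sketch of a .C-style convolution: all arguments are pointers and the
 * result is written into ab, which the caller allocates. */
void convolve(double *a, int *na, double *b, int *nb, double *ab)
{
    int nab = *na + *nb - 1;

    for (int i = 0; i < nab; i++) {
        ab[i] = 0.0;                 /* clear the output buffer */
    }
    for (int i = 0; i < *na; i++) {
        for (int j = 0; j < *nb; j++) {
            ab[i + j] += a[i] * b[j];
        }
    }
}
```

Note that both lengths have to be passed explicitly; the C code has no other way to learn them.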
Here is an example add function using .Call.
add.c:
// In C ----------------------------------------
#include <R.h>
#include <Rinternals.h>
SEXP add(SEXP a, SEXP b) {
SEXP result = PROTECT(allocVector(REALSXP, 1));
REAL(result)[0] = asReal(a) + asReal(b);
UNPROTECT(1);
return result;
}
add.R:
# In R ----------------------------------------
add <- function(a, b) {
.Call("add", a, b)
}
Data structures
All R objects are stored in an S-expression datatype (SEXP). Since all R objects are S-expressions, every C function called through .Call or .External must take and return SEXP types. The SEXP type is technically a pointer to a structure with typedef SEXPREC. SEXPREC stands for Symbolic EXPression RECord, and a SEXP is a pointer to one.
There are more than 20 subtypes (some are esoteric and rarely used), for example:
- Vectors: LGLSXP, INTSXP, REALSXP, CPLXSXP, STRSXP, VECSXP, EXPRSXP
- List-likes: LISTSXP, LANGSXP
- Symbols & strings: SYMSXP, CHARSXP
SEXP internals
- A header struct + a union construct
- A major special case is made for the VECTOR_SEXPREC, which uses a slightly shorter structure immediately followed by the data.
- Other subtypes are generally a header plus a 3-pointer structure (CAR/CDR/TAG for lists, formals/body/env for functions, etc.)
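The layout described above can be pictured with a toy model. Every name here (toy_sexprec, toy_header, toy_alloc_real and so on) is hypothetical and the fields are heavily simplified; this is not R's actual definition, only a sketch of the "header plus union" and "header plus data" shapes.

```c
#include <stdlib.h>

typedef struct toy_sexprec *toy_sexp;

/* Toy header shared by every node (R's real header holds more). */
struct toy_header {
    unsigned int type;    /* which subtype this node is */
    toy_sexp attrib;      /* attributes */
};

/* Generic node: header plus a union of 3-pointer structures. */
struct toy_sexprec {
    struct toy_header hdr;
    union {
        struct { toy_sexp car, cdr, tag; } list;
        struct { toy_sexp formals, body, env; } closure;
    } u;
};

/* Vector node: a shorter fixed part, with the data right behind it. */
struct toy_vector_sexprec {
    struct toy_header hdr;
    size_t length;
    double data[];        /* flexible array member standing in for the payload */
};

/* Allocate a zero-filled toy real vector; returns NULL on failure. */
struct toy_vector_sexprec *toy_alloc_real(size_t n)
{
    struct toy_vector_sexprec *v = malloc(sizeof *v + n * sizeof(double));
    if (v == NULL) {
        return NULL;
    }
    v->hdr.type = 0;
    v->hdr.attrib = NULL;
    v->length = n;
    for (size_t i = 0; i < n; i++) {
        v->data[i] = 0.0;
    }
    return v;
}
```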
Accessing and creating vectors
From RTcl_ObjAsDoubleVector:
ans = allocVector(REALSXP, count);
for (i = 0; i < count; i++) {
ret = Tcl_GetDoubleFromObj(RTcl_interp, elem[i], &x);
if (ret != TCL_OK) x = NA_REAL;
REAL(ans)[i] = x;
}
- REAL(ans) gives a pointer to the base of an array which can be indexed as usual.
- NA_REAL is used to encode missing values.
- Allocation is performed with allocVector.
Character vectors example
PROTECT(ans = allocVector(STRSXP, count));
for (i = 0; i < count; i++) {
    /* mkChar allocates, so ans must stay protected for the whole loop */
    SET_STRING_ELT(ans, i,
                   mkChar(Tcl_GetStringFromObj(elem[i], NULL)));
}
UNPROTECT(1);
- Need to use mkChar to generate a CHARSXP object
- Use SET_STRING_ELT to change an element of the vector (write barrier)
- Need to PROTECT, because mkChar allocates and could otherwise trigger collection of ans
List-like structures
- Internally R is based on Scheme, a Lisp variant.
- Lists in R are really VECSXP objects (a generic vector), but internally we have LISTSXP, which are similar to Lisp lists. These are almost invisible at the R level.
- LANGSXP objects are structurally similar to LISTSXP, and EXPRSXP are like VECSXPs with mostly LANGSXP elements.
CAR and CDR
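- Lists are traditionally built from cells of paired pointers, e.g. (A B C)
- CAR (Contents of the Address part of the Register) and CDR (Contents of the Decrement part of the Register) are how the two halves of a cell are accessed: CAR gives the first element, CDR gives the rest of the list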
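The paired-pointer idea can be sketched with a toy cons cell in plain C. The names cell, cons, list_length and free_list are hypothetical and this is not how R implements LISTSXP; it only shows how (A B C) is built and walked through car and cdr.

```c
#include <stdlib.h>

/* Toy cons cell: car holds the element, cdr points to the rest of the list. */
typedef struct cell {
    const char *car;
    struct cell *cdr;     /* NULL marks the end of the list */
} cell;

/* Prepend an element to a list; aborts if memory runs out. */
cell *cons(const char *car, cell *cdr)
{
    cell *c = malloc(sizeof *c);
    if (c == NULL) {
        abort();
    }
    c->car = car;
    c->cdr = cdr;
    return c;
}

/* Walk the cdr chain and count the cells. */
int list_length(const cell *p)
{
    int n = 0;
    for (; p != NULL; p = p->cdr) {
        n++;
    }
    return n;
}

/* Release every cell of the list. */
void free_list(cell *p)
{
    while (p != NULL) {
        cell *next = p->cdr;
        free(p);
        p = next;
    }
}
```

The list (A B C) is then cons("A", cons("B", cons("C", NULL))), the same right-to-left construction that the LCONS loop in the PROTECT section uses.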
PROTECT
To guard objects against the garbage collector, R maintains a protection stack. PROTECT(obj) pushes the object onto the protection stack, and UNPROTECT(n) pops the top n objects off the stack.
for (i = argc - 1; i > 1; i--)
{
PROTECT(alist);
alist = LCONS(mkString(argv[i]), alist);
UNPROTECT(1);
}
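The push and pop behaviour of the protection stack can be modelled with a toy version in plain C. All names (toy_protect, toy_unprotect, toy_is_protected) and the fixed stack size are hypothetical; R's own stack lives inside the interpreter and reports errors differently.

```c
#include <stdlib.h>

#define TOY_STACK_MAX 64

static void *toy_stack[TOY_STACK_MAX];  /* objects the toy collector must keep */
static int toy_top = 0;                 /* number of entries currently on the stack */

/* Push an object; returns it so calls can be nested like PROTECT(x = ...). */
void *toy_protect(void *obj)
{
    if (toy_top >= TOY_STACK_MAX) {
        abort();                        /* toy handling of stack overflow */
    }
    toy_stack[toy_top++] = obj;
    return obj;
}

/* Pop the top n entries; popping more than are present empties the stack. */
void toy_unprotect(int n)
{
    toy_top = (n > toy_top) ? 0 : toy_top - n;
}

/* A collector would keep exactly the objects for which this returns 1. */
int toy_is_protected(const void *obj)
{
    for (int i = 0; i < toy_top; i++) {
        if (toy_stack[i] == obj) {
            return 1;
        }
    }
    return 0;
}
```

Because the stack only counts entries, every exit path of a function has to unprotect as many objects as it protected.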
It is better to protect too often than too rarely. The trade-off is between code with a missed protection, which will run correctly almost every time and then fail unpredictably, and clutter that makes the code hard to maintain.
Don’t protect
- when you don’t need the object anymore.
- when the object is part of an object which is already protected.
- across calls where no allocation is involved.
Example of parsing args from .External
printargs.c:
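#include <R.h>
#include <Rinternals.h>
/* Print every argument passed via .External and return their count. */
SEXP printargs(SEXP alist)
{
    SEXP p, ans;
    int n;
    /* The first element of alist is the function name, so skip it. */
    for (p = CDR(alist), n = 0; p != R_NilValue; p = CDR(p), n++) {
        PrintValue(CAR(p));
    }
    ans = allocVector(INTSXP, 1);
    INTEGER(ans)[0] = n;
    return ans;
}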
Compilation:
R CMD SHLIB printargs.c
Loading and calling from R:
dyn.load("printargs.so")
.External("printargs", 1, 2, 3:5, "hello")
External Pointers
R level interface
The interface to pointers is a C level interface. From R these are opaque objects with a type of externalptr.
Like environments and names, pointer reference objects are not copied by duplicate, so two R variables holding the same pointer object refer to the same underlying pointer. If you want to create an R object which corresponds to a pointer, you should wrap it before setting attributes such as the class, with something like
p <- .Call(...) # Create ptr object
object <- list(p)
class(object) <- "myclass"
C level interface
A pointer reference is constructed by calling R_MakeExternalPtr with three arguments:
- the pointer value,
- a tag (SEXP), used e.g. to attach type information to the pointer reference,
- and a value to be protected, which can be used for associating with the pointer an R object that must remain alive as long as the pointer is alive.
SEXP R_MakeExternalPtr(void *p, SEXP tag, SEXP prot);
Reader functions are additionally provided to allow the pointer, tag, and protected values to be retrieved:
void *R_ExternalPtrAddr(SEXP s);
SEXP R_ExternalPtrTag(SEXP s);
SEXP R_ExternalPtrProtected(SEXP s);
The pointer value can be cleared (set to NULL) or given a new value.
Code that uses pointer references should check for NULL
values since
these can occur as a result of clearing or save/loads. It may also
occasionally be useful to be able to change the tag or protected values
of a pointer object.
[!TIP] Pointers in Finalization As part of finalization it is a good idea to clear a pointer reference just in case it has managed to get itself resurrected.
void R_ClearExternalPtr(SEXP s);
void R_SetExternalPtrAddr(SEXP s, void *p);
void R_SetExternalPtrTag(SEXP s, SEXP tag);
void R_SetExternalPtrProtected(SEXP s, SEXP p);
Saving in a workspace
When a pointer object is saved in a workspace its pointer field is saved as NULL since the pointer values are not likely to be useful across sessions. The tag object will be retained. For more info see A Future for R: Non-Exportable Objects.
Finalization
A finalizer can be registered for a pointer reference or an environment.
typedef void (*R_CFinalizer_t)(SEXP);
Finalizers are registered with
void R_RegisterFinalizer(SEXP s, SEXP fun);
void R_RegisterCFinalizer(SEXP s, R_CFinalizer_t fun);
It is an error to register an object for finalization more than once.
The finalization function will be called sometime after the garbage collector detects that the object is no longer accessible from within R. The exact timing is not predictable. There is no guarantee that finalizers will be called before system exit, even for objects that may already have been determined to be eligible for finalization.
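The two habits recommended above, checking for NULL before use and clearing the pointer during finalization, can be modelled in plain C. The types toy_extptr and resource and the functions below are hypothetical stand-ins; with the real API the same steps would go through R_ExternalPtrAddr and R_ClearExternalPtr.

```c
#include <stdlib.h>

/* Toy stand-in for an external pointer object. */
typedef struct {
    void *addr;
} toy_extptr;

/* Hypothetical C object owned by the pointer. */
typedef struct {
    double value;
} resource;

/* Create the resource and wrap it; addr stays NULL if allocation fails. */
toy_extptr toy_make(double value)
{
    toy_extptr p = { NULL };
    resource *r = malloc(sizeof *r);
    if (r != NULL) {
        r->value = value;
        p.addr = r;
    }
    return p;
}

/* Accessor: returns 0 without touching *out when the pointer is cleared. */
int toy_get_value(const toy_extptr *p, double *out)
{
    if (p->addr == NULL) {
        return 0;
    }
    *out = ((const resource *) p->addr)->value;
    return 1;
}

/* Finalizer: free the resource, then clear the pointer so that a repeated
 * call or a late access is harmless. */
void toy_finalize(toy_extptr *p)
{
    if (p->addr == NULL) {
        return;
    }
    free(p->addr);
    p->addr = NULL;
}
```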
References
- .C vs .Call
- r-internals documentation (unofficial)
- Statistical Computing 2 - Interfacing R with C/C++
- R Internals
- Now you C me
- Storing C objects in R
- Simple references with pointer
- Non-exportable objects