R Notes

Tags :: Language

General

Caching intermediate computations as Rdata

inter_fp <- file.path("data_test.Rdata")

## Check if data already exists on disk, otherwise create it and save to disk
if (file.exists(inter_fp)) {

  message(paste(inter_fp, "exists. Skipping"))

  ## NOTE:
  ## load(...) will load /all/ variables from the Rdata into the current
  ## environment. Variables from the Rdata will /overwrite/ variables of the
  ## same name in the current environment.

  load(inter_fp) # Loads variable "fun_sample" to environment

} else {

  ## ... do computation to create intermediate data

  N <- 100

  ## Simulate N samples
  fun_sample <- rnorm(N)

  ## save(...) allows us to only write /specific/ variables to an Rdata file,
  ## instead of all the data in the current environment.
  save(fun_sample, file = inter_fp)

  ## save(...) can also be used to save any number of variables
  ## e.g. save(fun_sample, N, file = inter_fp)
}


## Use loaded intermediate data later!
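## seq_along(...) is used instead of 1:100 because N is not defined when the
## data was loaded from the cache rather than simulated.
plot(seq_along(fun_sample), fun_sample)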
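The note above warns that load(...) overwrites same-named variables. One way to sidestep that, sketched here as an alternative and not as what these notes originally used, is saveRDS/readRDS: they store a single object and return it, so the caller chooses the variable name. The helper name cached and the file name data_test.rds are hypothetical.

```r
## Hypothetical helper: return the cached object if it exists on disk,
## otherwise compute it, save it, and return it.
cached <- function(fp, compute) {
  if (file.exists(fp)) {
    message(fp, " exists. Skipping")
    return(readRDS(fp))
  }
  value <- compute()
  saveRDS(value, file = fp)
  value
}

## Hypothetical cache location inside the session's temporary directory
rds_fp <- file.path(tempdir(), "data_test.rds")

## First call simulates and saves; the caller decides the variable name
fun_sample <- cached(rds_fp, function() rnorm(100))

## Second call reads the same values back from disk instead of re-simulating
fun_sample_again <- cached(rds_fp, function() rnorm(100))
```

Because readRDS returns the object instead of injecting names into the environment, nothing in the current session is overwritten implicitly.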

Memory model

Memory Profiling

R code

Profiling/logging memory usage in R is tricky, and is definitely not helped by the lack of anything like C's __FILE__ and __LINE__ macros.

There are a couple of “experimental” (2006) additions to the language to profile memory usage.

  • Rprof has an option called memory.profiling which will write out memory information: heap sizes, memory in nodes, and the number of calls to Rf_duplicate (the internal C function which copies data)

  • summaryRprof can summarize the profile written by Rprof
  • Only available if compiled with --enable-memory-profiling
    • tracemem marks an object so that a stack trace is printed whenever the object is duplicated, or copied by coercion or arithmetic functions. It is intended for tracking accidental copying of large objects. untracemem will untrace an object, and tracingState controls whether tracing information is printed.
    • tracemem cannot be used on functions, since it uses the same trace bit that trace uses, and it will not work on objects such as environments that are passed by reference.
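A minimal sketch of the tracemem workflow described above. It assumes nothing beyond base R; the capability check is there because tracemem raises an error on builds without memory profiling. The variable names are illustrative.

```r
## A large vector whose accidental copies we want to detect
x <- rnorm(1e6)

## TRUE only if this R build was compiled with memory profiling support
can_trace <- capabilities("profmem")

## Mark x so that duplications of it are reported
if (can_trace) tracemem(x)

y <- x       # no copy yet: x and y still share the same memory
y[1] <- 0    # modifying y forces a duplicate; on a profiling build a
             # "tracemem[...]" line is printed at this point

## Stop tracing x
if (can_trace) untracemem(x)
```

On a build without memory profiling the script still runs, it just prints no trace line.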

If R is compiled with memory profiling, Rprofmem starts and stops a pure memory-use profiler.
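A short sketch of starting and stopping Rprofmem around one allocation. The log file location and the threshold value are arbitrary choices for illustration, and the whole profiling step is skipped on builds without memory profiling.

```r
## Will hold the lines of the allocation log (stays empty if profiling
## is unavailable in this build)
alloc_log <- character(0)

if (capabilities("profmem")) {
  log_fp <- tempfile(fileext = ".out")   # illustrative log location

  ## Start profiling; only report large-vector allocations above ~1 MB
  Rprofmem(log_fp, threshold = 1e6)

  big <- numeric(1e6)                    # roughly 8 MB, so it is reported

  ## Passing NULL stops the profiler
  Rprofmem(NULL)

  alloc_log <- readLines(log_fp)
  print(head(alloc_log))
} else {
  message("This R build has no memory profiling; see the configure flag above")
}
```

Each log line gives the number of bytes allocated followed by the call stack at the time of the allocation.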

Enable memory profiling

RUN wget -c https://cran.r-project.org/src/base/R-4/R-4.1.1.tar.gz && \
    tar -xf R-4.1.1.tar.gz && \
    cd R-4.1.1 && \
    ./configure --enable-memory-profiling && \
    make -j$(nproc) -O && \
    make install && \
    cd /app && \
    rm R-4.1.1.tar.gz && \
    rm -r R-4.1.1

Methods to get memory usage

  • object.size:
    • Gets memory allocation on object by object basis
  • memory.size
    • Windows specific
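As a quick illustration of the object-by-object approach, the sketch below measures a few example objects with object.size and ranks them. The objects themselves are made up for the example; gc() is included only as a portable way to see overall memory use, since memory.size is Windows specific.

```r
## Example objects of different sizes (illustrative only)
objs <- list(
  x     = numeric(1e6),
  m     = matrix(0, nrow = 1000, ncol = 100),
  small = 1:10
)

## Size of a single object, printed in human-readable units
print(object.size(objs$x), units = "MB")

## Size in bytes of each object, largest first
sizes <- sapply(objs, function(o) as.numeric(object.size(o)))
print(sort(sizes, decreasing = TRUE))

## Overall memory used by the R session (not per object)
gc()
```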

How much performance is lost with profiling enabled at compile time?

Compiled packages

To enable profiling of compiled C code, the packages need to be installed with specific flags. The -g flag with GCC or Clang produces debugging information in the operating system's native format.

The compilation variables for Make are set in ~/.R/Makevars. This directory/file might have to be created by you:

mkdir -p ~/.R && touch ~/.R/Makevars

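Set the flag in that file as CXXFLAGS = -g (CXXFLAGS applies to C++ sources; C sources use CFLAGS).

After reinstalling the packages, run the script with the gperftools CPU profiler preloaded; the profile is written to the file named by CPUPROFILE:

LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libprofiler.so CPUPROFILE=sample.profile /usr/local/lib/R/bin/exec/R -f main_profile.R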

Valgrind

Valgrind can be used on Linux with a common CPU type. Test an R script with it like this:

R -d "valgrind --tool=memcheck --leak-check=full" --no-save < test.R

This is expected to run ~20x slower than without valgrind. In some cases it will be even slower.

On platforms where valgrind and its headers are installed, you can build a version of R with extra instrumentation to help valgrind detect errors in the use of memory allocated from the R heap. The configure option is --with-valgrind-instrumentation=level, where level is 0, 1, or 2.

  • Level 0 is the default and does not add anything.
  • Level 1 will detect some uses of uninitialized memory and has little impact on speed.
    • Uninitialized memory in some numeric, logical, integer, raw, complex vectors, and in memory allocated by R_alloc
  • Level 2 will detect other memory-use bugs but makes R much slower when running under valgrind.
    • Includes using the data sections of R vectors after they are freed
    • Using level 2 with gctorture can be even more effective (and even slower)
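gctorture is switched on from within the R script being tested. A minimal sketch, assuming the code under test would normally be a call into compiled code; a plain R expression stands in for it here because these notes do not name a specific package.

```r
## gctorture(TRUE) forces a garbage collection at (almost) every allocation,
## which is very slow, so keep the tortured region as small as possible.
gctorture(TRUE)

res <- tryCatch(
  sum(sqrt(seq_len(10))),      # placeholder for the code being tested
  finally = gctorture(FALSE)   # always switch torture mode off again
)
```

A script like this can then be run under valgrind with the R -d command shown earlier.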

Compiling

Option flags available when configuring can be seen with configure --help

Configure options

; ./configure --help
`configure' configures R 4.1.1 to adapt to many kinds of systems.

Usage: ./configure [OPTION]... [VAR=VALUE]...

To assign environment variables (e.g., CC, CFLAGS...), specify them as
VAR=VALUE.  See below for descriptions of some of the useful variables.

Defaults for the options are specified in brackets.

Configuration:
  -h, --help              display this help and exit
      --help=short        display options specific to this package
      --help=recursive    display the short help of all the included packages
  -V, --version           display version information and exit
  -q, --quiet, --silent   do not print `checking ...' messages
      --cache-file=FILE   cache test results in FILE [disabled]
  -C, --config-cache      alias for `--cache-file=config.cache'
  -n, --no-create         do not create output files
      --srcdir=DIR        find the sources in DIR [configure dir or `..']

Installation directories:
  --prefix=PREFIX         install architecture-independent files in PREFIX
                          [/usr/local]
  --exec-prefix=EPREFIX   install architecture-dependent files in EPREFIX
                          [PREFIX]

By default, `make install' will install all the files in
`/usr/local/bin', `/usr/local/lib' etc.  You can specify
an installation prefix other than `/usr/local' using `--prefix',
for instance `--prefix=$HOME'.

For better control, use the options below.

Fine tuning of the installation directories:
  --bindir=DIR            user executables [EPREFIX/bin]
  --sbindir=DIR           system admin executables [EPREFIX/sbin]
  --libexecdir=DIR        program executables [EPREFIX/libexec]
  --sysconfdir=DIR        read-only single-machine data [PREFIX/etc]
  --sharedstatedir=DIR    modifiable architecture-independent data [PREFIX/com]
  --localstatedir=DIR     modifiable single-machine data [PREFIX/var]
  --libdir=DIR            object code libraries [EPREFIX/lib]
  --includedir=DIR        C header files [PREFIX/include]
  --oldincludedir=DIR     C header files for non-gcc [/usr/include]
  --datarootdir=DIR       read-only arch.-independent data root [PREFIX/share]
  --datadir=DIR           read-only architecture-independent data [DATAROOTDIR]
  --infodir=DIR           info documentation [DATAROOTDIR/info]
  --localedir=DIR         locale-dependent data [DATAROOTDIR/locale]
  --mandir=DIR            man documentation [DATAROOTDIR/man]
  --docdir=DIR            documentation root [DATAROOTDIR/doc/R]
  --htmldir=DIR           html documentation [DOCDIR]
  --dvidir=DIR            dvi documentation [DOCDIR]
  --pdfdir=DIR            pdf documentation [DOCDIR]
  --psdir=DIR             ps documentation [DOCDIR]

R installation directories:
  --libdir=DIR        R files to R_HOME=DIR/R [EPREFIX/$LIBnn]
    rdocdir=DIR       R doc files to DIR      [R_HOME/doc]
    rincludedir=DIR   R include files to DIR  [R_HOME/include]
    rsharedir=DIR     R share files to DIR    [R_HOME/share]

X features:
  --x-includes=DIR    X include files are in DIR
  --x-libraries=DIR   X library files are in DIR

System types:
  --build=BUILD     configure for building on BUILD [guessed]
  --host=HOST       cross-compile to build programs to run on HOST [BUILD]

Optional Features:
  --disable-option-checking  ignore unrecognized --enable/--with options
  --disable-FEATURE       do not include FEATURE (same as --enable-FEATURE=no)
  --enable-FEATURE[=ARG]  include FEATURE [ARG=yes]
  --enable-R-profiling    attempt to compile support for Rprof() [yes]
  --enable-memory-profiling
                          attempt to compile support for Rprofmem(),
                          tracemem() [no]
  --enable-R-framework[=DIR]
                          macOS only: build R framework (if possible), and
                          specify its installation prefix [no,
                          /Library/Frameworks]
  --enable-R-shlib        build the shared/dynamic library 'libR' [no]
  --enable-R-static-lib   build the static library 'libR.a' [no]
  --enable-BLAS-shlib     build BLAS into a shared/dynamic library [perhaps]
  --enable-maintainer-mode
                          enable make rules and dependencies not useful (and
                          maybe confusing) to the casual installer [no]
  --enable-strict-barrier provoke compile error on write barrier violation
                          [no]
  --enable-prebuilt-html  build static HTML help pages [no]
  --enable-lto            enable link-time optimization [no]
  --enable-java           enable Java [yes]
  --enable-byte-compiled-packages
                          byte-compile base and recommended packages [yes]
  --enable-static[=PKGS]  (libtool) build static libraries [default=no]
  --enable-shared[=PKGS]  (libtool) build shared libraries [default=yes]
  --enable-fast-install[=PKGS]
                          (libtool) optimize for fast installation
                          [default=yes]
  --disable-libtool-lock  avoid locking (might break parallel builds)
  --enable-long-double    use long double type [yes]
  --disable-openmp        do not use OpenMP
  --disable-largefile     omit support for large files
  --disable-nls           do not use Native Language Support

  --disable-rpath         do not hardcode runtime library paths

Optional Packages:
  --with-PACKAGE[=ARG]    use PACKAGE [ARG=yes]
  --without-PACKAGE       do not use PACKAGE (same as --with-PACKAGE=no)
  --with-blas             use system BLAS library (if available), or specify
                          it [no]
  --with-lapack           use system LAPACK library (if available), or specify
                          it [no]
  --with-readline         use readline library [yes]
  --with-pcre2            use PCRE2 library (if available) [yes]
  --with-pcre1            use PCRE1 library (if available and PCRE2 is not)
                          [yes]
  --with-aqua             macOS only: use Aqua (if available) [yes]
  --with-tcltk            use Tcl/Tk (if available), or specify its library
                          dir [yes]
  --with-tcl-config=TCL_CONFIG
                          specify location of tclConfig.sh []
  --with-tk-config=TK_CONFIG
                          specify location of tkConfig.sh []
  --with-cairo            use cairo (and pango) if available [yes]
  --with-libpng           use libpng library (if available) [yes]
  --with-jpeglib          use jpeglib library (if available) [yes]
  --with-libtiff          use libtiff library (if available) [yes]
  --with-system-tre       use system tre library (if available) [no]
  --with-valgrind-instrumentation
                          Level of additional instrumentation for Valgrind
                          (0/1/2) [0]
  --with-system-valgrind-headers
                          use system valgrind headers (if available) [no]
  --with-internal-tzcode  use internal time-zone code [no, yes on macOS]
  --with-internal-towlower
                          use internal code for towlower/upper [no, yes on
                          macOS and Solaris]
  --with-internal-iswxxxxx
                          use internal iswprint etc. [no, yes on macOS,
                          Solaris and AIX]
  --with-internal-wcwidth use internal wcwidth [yes]
  --with-recommended-packages
                          use/install recommended R packages [yes]
  --with-ICU              use ICU library (if available) [yes]
  --with-static-cairo     allow for the use of static cairo libraries [no, yes
                          on macOS]
  --with-pic[=PKGS]       (libtool) try to use only PIC/non-PIC objects
                          [default=use both]
  --with-aix-soname=aix|svr4|both
                          (libtool) shared library versioning (aka "SONAME")
                          variant to provide on AIX, [default=aix].
  --with-gnu-ld           assume the C compiler uses GNU ld [default=no]
  --with-sysroot[=DIR]    Search for dependent libraries within DIR (or the
                          compiler's sysroot if not specified).
  --with-x                use the X Window System
  --with-gnu-ld           assume the C compiler uses GNU ld [default=no]
  --with-libpth-prefix[=DIR]  search for libpth in DIR/include and DIR/lib
  --without-libpth-prefix     don't search for libpth in includedir and libdir
  --with-included-gettext use the GNU gettext library included here [no]
  --with-libintl-prefix[=DIR]  search for libintl in DIR/include and DIR/lib
  --without-libintl-prefix     don't search for libintl in includedir and libdir

Some influential environment variables:
  R_PRINTCMD  command used to spool PostScript files to the printer
  R_PAPERSIZE paper size for the local (PostScript) printer
  R_BATCHSAVE set default behavior of R when ending a session
  MAIN_CFLAGS additional CFLAGS used when compiling the main binary
  SHLIB_CFLAGS
              additional CFLAGS used when building shared objects
  MAIN_FFLAGS additional FFLAGS used when compiling the main binary
  SHLIB_FFLAGS
              additional FFLAGS used when building shared objects
  MAIN_LD     command used to link the main binary
  MAIN_LDFLAGS
              flags which are necessary for loading a main program which will
              load shared objects (DLLs) at runtime
  CPICFLAGS   special flags for compiling C code to be turned into a shared
              object.
  FPICFLAGS   special flags for compiling Fortran code to be turned into a
              shared object.
  SHLIB_LD    command for linking shared objects which contain object files
              from a C or Fortran compiler only
  SHLIB_LDFLAGS
              special flags used by SHLIB_LD
  DYLIB_LD    command for linking dynamic libraries which contain object files
              from a C or Fortran compiler only
  DYLIB_LDFLAGS
              special flags used for make a dynamic library
  CXXPICFLAGS special flags for compiling C++ code to be turned into a shared
              object
  SHLIB_CXXLD command for linking shared objects which contain object files
              from the C++ compiler
  SHLIB_CXXLDFLAGS
              special flags used by SHLIB_CXXLD
  TCLTK_LIBS  flags needed for linking against the Tcl and Tk libraries
  TCLTK_CPPFLAGS
              flags needed for finding the tcl.h and tk.h headers
  MAKE        make command
  TAR         tar command
  R_BROWSER   default browser
  R_PDFVIEWER default PDF viewer
  BLAS_LIBS   flags needed for linking against external BLAS libraries
  LAPACK_LIBS flags needed for linking against external LAPACK libraries
  LIBnn       'lib' or 'lib64' for dynamic libraries
  SAFE_FFLAGS Safe Fortran fixed-form compiler flags for e.g. dlamc.f
  r_arch      Use architecture-dependent subdirs with this name
  DEFS        C defines for use when compiling R
  JAVA_HOME   Path to the root of the Java environment
  R_SHELL     shell to be used for shell scripts, including 'R'
  YACC        The `Yet Another Compiler Compiler' implementation to use.
              Defaults to the first program found out of: `bison -y', `byacc',
              `yacc'.
  YFLAGS      The list of arguments that will be passed by default to $YACC.
              This script will default YFLAGS to the empty string to avoid a
              default value of `-d' given by some make applications.
  PKG_CONFIG  path to pkg-config (or pkgconf) utility
  PKG_CONFIG_PATH
              directories to add to pkg-config's search path
  PKG_CONFIG_LIBDIR
              path overriding pkg-config's default search path
  CC          C compiler command
  CFLAGS      C compiler flags
  LDFLAGS     linker flags, e.g. -L<lib dir> if you have libraries in a
              nonstandard directory <lib dir>
  LIBS        libraries to pass to the linker, e.g. -l<library>
  CPPFLAGS    (Objective) C/C++ preprocessor flags, e.g. -I<include dir> if
              you have headers in a nonstandard directory <include dir>
  CPP         C preprocessor
  FC          Fortran compiler command
  FCFLAGS     Fortran compiler flags
  CXX         C++ compiler command
  CXXFLAGS    C++ compiler flags
  CXXCPP      C++ preprocessor
  OBJC        Objective C compiler command
  OBJCFLAGS   Objective C compiler flags
  LT_SYS_LIBRARY_PATH
              User-defined run-time library search path.
  CXX11       C++11 compiler command
  CXX11STD    special flag for compiling and for linking C++11 code, e.g.
              -std=c++11
  CXX11FLAGS  C++11 compiler flags
  CXX11PICFLAGS
              special flags for compiling C++11 code to be turned into a
              shared object
  SHLIB_CXX11LD
              command for linking shared objects which contain object files
              from the C++11 compiler
  SHLIB_CXX11LDFLAGS
              special flags used by SHLIB_CXX11LD
  CXX14       C++14 compiler command
  CXX14STD    special flag for compiling and for linking C++14 code, e.g.
              -std=c++14
  CXX14FLAGS  C++14 compiler flags
  CXX14PICFLAGS
              special flags for compiling C++14 code to be turned into a
              shared object
  SHLIB_CXX14LD
              command for linking shared objects which contain object files
              from the C++14 compiler
  SHLIB_CXX14LDFLAGS
              special flags used by SHLIB_CXX14LD
  CXX17       C++17 compiler command
  CXX17STD    special flag for compiling and for linking C++17 code, e.g.
              -std=c++17
  CXX17FLAGS  C++17 compiler flags
  CXX17PICFLAGS
              special flags for compiling C++17 code to be turned into a
              shared object
  SHLIB_CXX17LD
              command for linking shared objects which contain object files
              from the C++17 compiler
  SHLIB_CXX17LDFLAGS
              special flags used by SHLIB_CXX17LD
  CXX20       C++20 compiler command
  CXX20STD    special flag for compiling and for linking C++20 code, e.g.
              -std=c++20
  CXX20FLAGS  C++20 compiler flags
  CXX20PICFLAGS
              special flags for compiling C++20 code to be turned into a
              shared object
  SHLIB_CXX20LD
              command for linking shared objects which contain object files
              from the C++20 compiler
  SHLIB_CXX20LDFLAGS
              special flags used by SHLIB_CXX20LD
  XMKMF       Path to xmkmf, Makefile generator for X Window System

Use these variables to override the choices made by `configure' or to help
it to find libraries and programs with nonstandard names/locations.

Report bugs to <https://bugs.r-project.org>.
R home page: <https://www.r-project.org>.
