Package 'autometric' reference manual

Title:	Background Resource Logging
Description:	Intense parallel workloads can be difficult to monitor. Packages 'crew.cluster', 'clustermq', and 'future.batchtools' distribute hundreds of worker processes over multiple computers. If a worker process exhausts its available memory, it may terminate silently, leaving the underlying problem difficult to detect or troubleshoot. Using the 'autometric' package, a worker can proactively monitor itself in a detached background thread. The worker process itself runs normally, and the thread writes to a log every few seconds. If the worker terminates unexpectedly, 'autometric' can read and visualize the log file to reveal potential resource-related reasons for the crash. The 'autometric' package borrows heavily from the methods of packages 'ps' <doi:10.32614/CRAN.package.ps> and 'psutil'.
Authors:	William Michael Landau [aut, cre], Eli Lilly and Company [cph, fnd], Posit Software, PBC [cph] (For the 'ps' package. See LICENSE.note.), Jay Loden [cph] (For the 'psutil' package. See LICENSE.note.), Dave Daeschler [cph] (For the 'psutil' package. See LICENSE.note.), Giampaolo Rodola [cph] (For the 'psutil' package. See LICENSE.note.)
Maintainer:	William Michael Landau <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.2.9000
Built:	2025-03-14 03:27:45 UTC
Source:	https://github.com/wlandau/autometric

Check the log thread.

Description

Check if the log is running.

Usage

log_active()

Value

TRUE if a background thread is actively writing to the log, FALSE otherwise. The result is based on a static C variable, so the information is second-hand.

Examples

  path <- tempfile()
  log_active()
  log_start(seconds = 0.5, path = path)
  log_active()
  Sys.sleep(2)
  log_stop()
  Sys.sleep(2)
  log_active()
  unlink(path)

Get log phase

Description

Get the current log phase.

Usage

log_phase_get()

Value

Character string with the name of the current log phase.

Examples

  path <- tempfile()
  log_phase_get()
  log_print(path = path)
  log_phase_set("different")
  log_phase_get()
  log_print(path = path)
  log_phase_reset()
  log_phase_get()
  log_read(path)

Reset log phase

Description

Reset the current log phase to the default value.

Usage

log_phase_reset()

Value

NULL (invisibly). Called for its side effects.

Examples

  path <- tempfile()
  log_phase_get()
  log_print(path = path)
  log_phase_set("different")
  log_phase_get()
  log_print(path = path)
  log_phase_reset()
  log_phase_get()
  log_read(path)

Set log phase

Description

Set the current log phase.

Usage

log_phase_set(phase)

Arguments

phase

Character string with the phase of the log. Only the first 255 characters are used. Cannot include the pipe character "|" because it is the delimiter of fields in the log output.
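The constraints on phase can be checked before calling log_phase_set(). The helper below, check_phase(), is hypothetical and not part of 'autometric'. It is a minimal base-R sketch that rejects the pipe character and returns the 255-character prefix that would be kept.

```r
# Hypothetical helper, not part of 'autometric': validate a phase name
# against the documented constraints before passing it to log_phase_set().
check_phase <- function(phase) {
  stopifnot(is.character(phase), length(phase) == 1L, !is.na(phase))
  if (grepl("|", phase, fixed = TRUE)) {
    stop("phase must not contain the pipe character \"|\"")
  }
  # Only the first 255 characters are used, so return that prefix.
  substr(phase, 1L, 255L)
}

short <- check_phase("model_fitting")
long <- check_phase(strrep("a", 300L))
nchar(long)
```

A name that passes the check can then be handed to log_phase_set() unchanged.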

Value

NULL (invisibly). Called for its side effects.

Examples

  path <- tempfile()
  log_phase_get()
  log_print(path = path)
  log_phase_set("different")
  log_phase_get()
  log_print(path = path)
  log_phase_reset()
  log_phase_get()
  log_read(path)

Plot a metric of a process over time

Description

Visualize a metric of a log over time for a single process ID in a single log file.

Usage

log_plot(log, pid = NULL, name = NULL, phase = NULL, metric = "resident", ...)

Arguments

`log`	Data frame returned by `log_read()`. Must be nonempty. `log_plot()` only includes rows with status code equal to 0.
`pid`	Either `NULL` or a non-negative integer with the process ID to plot. At least one of `pid` or `name` must be `NULL`.
`name`	Either `NULL` or a character string with the name of the process to plot. The name was previously specified in the names of the `pids` argument of `log_start()` or `log_print()`. At least one of `pid` or `name` must be `NULL`.
`phase`	Either `NULL` or a character string specifying the name of a log phase (see `log_phase_set()`). If not `NULL`, then `log_plot()` will only visualize data from the given log phase.
`metric`	Character string with the name of a metric to plot against time. Must be only a single string. Defaults to the resident set size (RSS), the total amount of memory used by the process. See `log_read()` for descriptions of the metrics available.
`...`	Named optional parameters to pass to the base function `plot()`.
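
The filtering described in these arguments can be approximated in base R. The sketch below is not the internal code of log_plot(). It uses a toy data frame with made-up values and assumes the log carries a phase column alongside time, status, and the metric.

```r
# Toy data frame with made-up values, shaped like a few columns of
# log_read() output (the phase column is an assumption).
log <- data.frame(
  time = c(0, 1, 2, 3, 4),
  status = c(0, 0, 1, 0, 0),
  phase = c("setup", "setup", "fit", "fit", "fit"),
  cpu = c(0, 12, 99, 48, 51)
)
# Keep successful entries (status 0) from a single phase.
keep <- log$status == 0 & log$phase == "fit"
selected <- log[keep, , drop = FALSE]
# Plot the metric against time; extra arguments go straight to plot().
if (interactive()) {
  plot(selected$time, selected$cpu, type = "l", xlab = "time", ylab = "cpu")
}
```

The entry with a nonzero status is dropped even though it belongs to the selected phase, which mirrors the rule that only rows with status code 0 are plotted.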

Value

A base plot of a metric of a log over time.

Examples

  path <- tempfile()
  log_start(seconds = 0.25, path = path)
  Sys.sleep(1)
  log_stop()
  Sys.sleep(2)
  log <- log_read(path)
  log_plot(log, metric = "cpu")
  unlink(path)
Print once to the log.

Description

Sample CPU load metrics and print a single line to the log for each process in pids. Used for debugging and testing only. Not for users.

Usage

log_print(
  path,
  seconds = 1,
  pids = c(local = Sys.getpid()),
  error = getOption("autometric_error", TRUE)
)

Arguments

`path`	Character string, path to a file to log resource usage. On Windows, the path must point to a physical file on disk. On Unix-like systems, `path` can be `"/dev/stdout"` to print to standard output or `"/dev/stderr"` to print to standard error. Standard output is the most convenient option for high-performance computing scenarios where worker processes already write to log files. Such workers usually already redirect standard output to a physical file, as with a cluster like SLURM, or capture messages with Amazon CloudWatch. Normally, standard output and standard error are discouraged because of how they interact with the R API and R console. However, the exported user interface of `autometric` only ever prints from a detached POSIX thread where it is unsafe to use `Rprintf()` or `REprintf()`.
`seconds`	Positive number, number of seconds to sample CPU load metrics before printing to the log. This number should be at least 1, usually more. A low number of seconds could burden the operating system and generate large log files.
`pids`	Nonempty vector of non-negative integers of process IDs to monitor. NOTE: On Mac OS, only the currently running process can be monitored. This is due to security restrictions around certain system calls, cf. https://os-tres.net/blog/2010/02/17/mac-os-x-and-task-for-pid-mach-call/. If the `pids` vector is named, then the names will show alongside the process IDs in the log entries. Names cannot include the pipe character `"|"` because it is the delimiter of fields in the log output.
`error`	`TRUE` to throw an error if the thread is not supported on the current platform. (See `log_support()`.)

Value

NULL (invisibly). Called for its side effects.

Examples

  path <- tempfile()
  log_print(path = path)
  log_read(path)
  unlink(path)

Read a log.

Description

Read a log file into R.

Usage

log_read(
  path,
  units_cpu = c("percentage", "fraction"),
  units_memory = c("megabytes", "bytes", "kilobytes", "gigabytes"),
  units_time = c("seconds", "minutes", "hours", "days"),
  hidden = FALSE
)

Arguments

`path`	Character vector of paths to files and/or directories of logs to read.
`units_cpu`	Character string with the units of the `cpu` field. Defaults to `"percentage"` and must be in `c("percentage", "fraction")`.
`units_memory`	Character string with the units of the `memory` field. Defaults to `"megabytes"` and must be in `c("megabytes", "bytes", "kilobytes", "gigabytes")`.
`units_time`	Character string, units of the `time` field. Defaults to `"seconds"` and must be in `c("seconds", "minutes", "hours", "days")`.
`hidden`	`TRUE` to include hidden files in the files and directories listed in `path`, `FALSE` to omit.

Details

log_read() is capable of reading a log file where both autometric and other processes have printed. Whenever autometric writes to a log, it bounds the beginning and end of the text with the keyword "__AUTOMETRIC__". That way, log_read() knows to only read and process the correct lines of the file.

In addition, it automatically converts the log data into the units given by the units_time, units_cpu, and units_memory arguments.
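
The keyword-based filtering can be illustrated in base R. This is a sketch, not the parser inside log_read(). It assumes each entry occupies a single line that begins and ends with the keyword, and the field values in the sample lines are made up.

```r
# Made-up lines for illustration: two entries bounded by the keyword,
# mixed with unrelated output from another process.
lines <- c(
  "starting worker",
  "__AUTOMETRIC__|field1|field2|__AUTOMETRIC__",
  "some other message",
  "__AUTOMETRIC__|field3|field4|__AUTOMETRIC__"
)
keyword <- "__AUTOMETRIC__"
# Keep only lines that both start and end with the keyword.
is_entry <- startsWith(lines, keyword) & endsWith(lines, keyword)
entries <- lines[is_entry]
# Split each entry on the pipe delimiter and drop the bounding keywords.
fields <- lapply(strsplit(entries, "|", fixed = TRUE), function(x) {
  x[x != keyword]
})
```

Because the pipe character delimits fields, it cannot appear inside a phase or process name.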

Value

A data frame of metrics from the log with one row per log entry and columns with metadata and resource usage metrics. log_read() automatically converts the data into the units chosen with arguments units_time, units_cpu, and units_memory. The returned data frame has the following columns:

version: Version of the package used to write the log entry.
pid: Process ID monitored.
status: A status code for the log entry. Status 0 means logging succeeded. A status code not equal to 0 indicates something went wrong and the metrics should not be trusted.
time: numeric time stamp at which the entry was logged. log_read() automatically recenters this column so that time 0 indicates the first logged entry. Use the units_time argument to customize the units of this field.
core: CPU load of the process scaled relative to a single CPU core. Measures the amount of time the process spends running during a given interval of elapsed time.

On Mac OS, the package uses native system calls to get CPU core usage. On Linux and Windows, the package calculates it manually using the user + kernel clock cycles that ran during a sampling interval. It measures the clock cycles that the process executed during the interval, converts the clock cycles into seconds, then divides the result by the elapsed time of the interval. The length of the sampling interval is the seconds argument supplied to log_start(), or the length of time between calls to log_print(). The first core measurement is 0 to reflect that a full sampling interval has not elapsed yet.

core can be read in as a percentage or fraction, depending on the units_cpu argument.
cpu: core divided by the number of logical CPU cores. This metric measures the load on the machine as a whole, not just the CPU core it runs on. Use the units_cpu argument to customize the units of this field.
resident: resident set size (RSS), the total amount of memory used by the process at the time of logging. This includes the memory unique to the process (unique set size, USS) and shared memory. Use the units_memory argument to customize the units of this field.
virtual: total virtual memory available to the process. The process does not necessarily use all this memory, but it can request more virtual memory throughout its life cycle. Use the units_memory argument to customize the units of this field.
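
The relationship between core and cpu can be worked through with plain arithmetic. The numbers below are illustrative and not measurements, and the variable names are chosen for this sketch only.

```r
# Illustrative inputs, not measurements.
cpu_seconds <- 0.5   # user + kernel CPU time consumed during the interval
elapsed <- 2         # length of the sampling interval in seconds
n_cores <- 4         # number of logical CPU cores (assumed)

# core: CPU time relative to a single core over the interval.
core_fraction <- cpu_seconds / elapsed
# cpu: core divided by the number of logical cores.
cpu_fraction <- core_fraction / n_cores

# The same values expressed as percentages.
core_percent <- 100 * core_fraction
cpu_percent <- 100 * cpu_fraction
```

With these inputs the process kept one core busy a quarter of the time, which is a much smaller share of a four-core machine as a whole.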

Examples

  path <- tempfile()
  log_start(seconds = 0.5, path = path)
  Sys.sleep(2)
  log_stop()
  Sys.sleep(2)
  log_read(path)
  unlink(path)

Start the log thread.

Description

Start a background thread that periodically writes system usage metrics of the current R process to a log file. See log_read() for explanations of the specific metrics.

Usage

log_start(
  path,
  seconds = 1,
  pids = c(local = Sys.getpid()),
  error = getOption("autometric_error", TRUE)
)

Arguments

`path`	Character string, path to a file to log resource usage. On Windows, the path must point to a physical file on disk. On Unix-like systems, `path` can be `"/dev/stdout"` to print to standard output or `"/dev/stderr"` to print to standard error. Standard output is the most convenient option for high-performance computing scenarios where worker processes already write to log files. Such workers usually already redirect standard output to a physical file, as with a cluster like SLURM, or capture messages with Amazon CloudWatch. Normally, standard output and standard error are discouraged because of how they interact with the R API and R console. However, the exported user interface of `autometric` only ever prints from a detached POSIX thread where it is unsafe to use `Rprintf()` or `REprintf()`.
`seconds`	Positive number, number of seconds between writes to the log file. This number should be noticeably large, anywhere from half a second to several seconds or minutes. A low number of seconds could burden the operating system and generate large log files. Because of the way CPU usage measurements work, the first log entry starts only after the first interval of `seconds` has passed.
`pids`	Nonempty vector of non-negative integers of process IDs to monitor. NOTE: On Mac OS, only the currently running process can be monitored. This is due to security restrictions around certain system calls, cf. https://os-tres.net/blog/2010/02/17/mac-os-x-and-task-for-pid-mach-call/. If the `pids` vector is named, then the names will show alongside the process IDs in the log entries. Names cannot include the pipe character `"|"` because it is the delimiter of fields in the log output.
`error`	`TRUE` to throw an error if the thread is not supported on the current platform. (See `log_support()`.)

Details

Only one thread can run at a time. If the thread is already running, then log_start() does not start an additional one. Before creating a new thread, call log_stop() to terminate the first one.
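
Since a second call to log_start() has no effect while a thread is running, one possible pattern is to check log_active() first. The sketch below assumes the 'autometric' package is installed and does nothing otherwise. The variable started exists only for this illustration.

```r
# Sketch: start the log thread only if none is running yet.
# Assumes 'autometric' is installed; otherwise the block is skipped.
started <- FALSE
if (requireNamespace("autometric", quietly = TRUE) &&
    autometric::log_support()) {
  path <- tempfile()
  if (!autometric::log_active()) {
    autometric::log_start(seconds = 0.5, path = path)
    started <- TRUE
  }
  Sys.sleep(1)
  # Signal the thread to stop and give it time to notice.
  autometric::log_stop()
  Sys.sleep(2)
  unlink(path)
}
```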

Value

NULL (invisibly). Called for its side effects.

Examples

  path <- tempfile()
  log_start(seconds = 0.5, path = path)
  Sys.sleep(2)
  log_stop()
  Sys.sleep(2)
  log_read(path)
  unlink(path)

Stop the log thread.

Description

Stop the background thread that periodically writes system usage metrics of the current R process to a log file.

Usage

log_stop()

Details

The background thread is detached, so there is no way to directly terminate it (other than terminating the main thread, i.e. restarting R). log_stop() merely signals to the thread using a static C variable protected by a mutex. It may take time for the thread to notice, depending on the value of seconds you supplied to log_start(). For this reason, you may see one or two lines in the log even after you call log_stop().
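
Because the thread stops asynchronously, a caller may prefer to poll instead of sleeping for a fixed time. The helper below, wait_until(), is hypothetical and not part of 'autometric'. It is a generic base-R polling loop, demonstrated here with a timer-based condition so that it runs without the package.

```r
# Hypothetical helper, not part of 'autometric': poll a condition until it
# returns TRUE or a timeout expires. Returns TRUE on success, FALSE on timeout.
wait_until <- function(condition, timeout = 10, interval = 0.1) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(condition())) {
      return(TRUE)
    }
    if (Sys.time() >= deadline) {
      return(FALSE)
    }
    Sys.sleep(interval)
  }
}

# Self-contained demonstration: the condition becomes TRUE
# after about half a second.
start <- Sys.time()
finished <- wait_until(function() {
  as.numeric(difftime(Sys.time(), start, units = "secs")) > 0.5
})
```

With the package loaded, calling wait_until(function() !log_active()) after log_stop() would block until log_active() reports FALSE, assuming that value changes once the thread has noticed the signal.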

Value

NULL (invisibly). Called for its side effects.

Examples

  path <- tempfile()
  log_start(seconds = 0.5, path = path)
  Sys.sleep(2)
  log_stop()
  log_read(path)
  unlink(path)

Log support

Description

Check if your system supports background logging.

Usage

log_support()

Details

The background logging functionality requires a Linux, Mac, or Windows computer. It also requires POSIX thread support and the nanosleep() C function.

Value

TRUE if your system supports background logging, FALSE otherwise.

Examples

  log_support()
