Title: | Background Resource Logging |
---|---|
Description: | Intense parallel workloads can be difficult to monitor. Packages 'crew.cluster', 'clustermq', and 'future.batchtools' distribute hundreds of worker processes over multiple computers. If a worker process exhausts its available memory, it may terminate silently, leaving the underlying problem difficult to detect or troubleshoot. Using the 'autometric' package, a worker can proactively monitor itself in a detached background thread. The worker process itself runs normally, and the thread writes to a log every few seconds. If the worker terminates unexpectedly, 'autometric' can read and visualize the log file to reveal potential resource-related reasons for the crash. The 'autometric' package borrows heavily from the methods of packages 'ps' <doi:10.32614/CRAN.package.ps> and 'psutil'. |
Authors: | William Michael Landau [aut, cre] , Eli Lilly and Company [cph, fnd], Posit Software, PBC [cph] (For the 'ps' package. See LICENSE.note.), Jay Loden [cph] (For the 'psutil' package. See LICENSE.note.), Dave Daeschler [cph] (For the 'psutil' package. See LICENSE.note.), Giampaolo Rodola [cph] (For the 'psutil' package. See LICENSE.note.) |
Maintainer: | William Michael Landau <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.2.9000 |
Built: | 2024-11-14 23:20:51 UTC |
Source: | https://github.com/wlandau/autometric |
Check if the log is running.
log_active()
log_active()
TRUE
if a background thread is actively writing to the log,
FALSE
otherwise. The result is based on a static C variable,
so the information is second-hand.
path <- tempfile() log_active() log_start(seconds = 0.5, path = path) log_active() Sys.sleep(2) log_stop() Sys.sleep(2) log_active() unlink(path)
path <- tempfile() log_active() log_start(seconds = 0.5, path = path) log_active() Sys.sleep(2) log_stop() Sys.sleep(2) log_active() unlink(path)
Get the current log phase.
log_phase_get()
log_phase_get()
Character string with the name of the current log phase.
path <- tempfile() log_phase_get() log_print(path = path) log_phase_set("different") log_phase_get() log_print(path = path) log_phase_reset() log_phase_get() log_read(path)
path <- tempfile() log_phase_get() log_print(path = path) log_phase_set("different") log_phase_get() log_print(path = path) log_phase_reset() log_phase_get() log_read(path)
Reset the current log phase to the default value.
log_phase_reset()
log_phase_reset()
NULL
(invisibly). Called for its side effects.
path <- tempfile() log_phase_get() log_print(path = path) log_phase_set("different") log_phase_get() log_print(path = path) log_phase_reset() log_phase_get() log_read(path)
path <- tempfile() log_phase_get() log_print(path = path) log_phase_set("different") log_phase_get() log_print(path = path) log_phase_reset() log_phase_get() log_read(path)
Set the current log phase.
log_phase_set(phase)
log_phase_set(phase)
phase |
Character string with the phase of the log.
Only the first 255 characters are used.
Cannot include the pipe character |
NULL
(invisibly). Called for its side effects.
path <- tempfile() log_phase_get() log_print(path = path) log_phase_set("different") log_phase_get() log_print(path = path) log_phase_reset() log_phase_get() log_read(path)
path <- tempfile() log_phase_get() log_print(path = path) log_phase_set("different") log_phase_get() log_print(path = path) log_phase_reset() log_phase_get() log_read(path)
Visualize a metric of a log over time for a single process ID in a single log file.
log_plot(log, pid = NULL, name = NULL, phase = NULL, metric = "resident", ...)
log_plot(log, pid = NULL, name = NULL, phase = NULL, metric = "resident", ...)
log |
Data frame returned by |
pid |
Either |
name |
Either |
phase |
Either |
metric |
Character string with the name of a metric to plot
against time. Must be only a single string.
Defaults to the resident set size (RSS), the total amount of memory
used by the process.
See |
... |
Named optional parameters to pass to the base
function |
A base plot of a metric of a log over time.
path <- tempfile() log_start(seconds = 0.25, path = path) Sys.sleep(1) log_stop() Sys.sleep(2) log <- log_read(path) log_plot(log, metric = "cpu") unlink(path)
path <- tempfile() log_start(seconds = 0.25, path = path) Sys.sleep(1) log_stop() Sys.sleep(2) log <- log_read(path) log_plot(log, metric = "cpu") unlink(path)
Sample CPU load metrics and
print a single line to the log for each process in pids
.
Used for debugging and testing only. Not for users.
log_print( path, seconds = 1, pids = c(local = Sys.getpid()), error = getOption("autometric_error", TRUE) )
log_print( path, seconds = 1, pids = c(local = Sys.getpid()), error = getOption("autometric_error", TRUE) )
path |
Character string, path to a file to log resource usage.
On Windows, the path must point to a physical file on disk.
On Unix-like systems, Standard output is the most convenient option for high-performance computing scenarios where worker processes already write to log files. Such workers usually already redirect standard output to a physical file, as with a cluster like SLURM, or capture messages with Amazon CloudWatch. Normally, standard output and standard error are discouraged because
of how they interact with the R API and R console. However, the
exported user interface of |
seconds |
Positive number, number of seconds to sample CPU load metrics before printing to the log. This number should be at least 1, usually more. A low number of seconds could burden the operating system and generate large log files. |
pids |
Nonempty vector of non-negative integers
of process IDs to monitor. NOTE: On Mac OS, only the currently running
process can be monitored.
This is due to security restrictions around certain system calls, c.f.
https://os-tres.net/blog/2010/02/17/mac-os-x-and-task-for-pid-mach-call/. # nolint
If the |
error |
|
NULL
(invisibly). Called for its side effects.
path = tempfile() log_print(path = path) log_read(path) unlink(path)
path = tempfile() log_print(path = path) log_read(path) unlink(path)
Read a log file into R.
log_read( path, units_cpu = c("percentage", "fraction"), units_memory = c("megabytes", "bytes", "kilobytes", "gigabytes"), units_time = c("seconds", "minutes", "hours", "days"), hidden = FALSE )
log_read( path, units_cpu = c("percentage", "fraction"), units_memory = c("megabytes", "bytes", "kilobytes", "gigabytes"), units_time = c("seconds", "minutes", "hours", "days"), hidden = FALSE )
path |
Character vector of paths to files and/or directories of logs to read. |
units_cpu |
Character string with the units of the |
units_memory |
Character string with the units of the
|
units_time |
Character string, units of the |
|
log_read()
is capable of reading a log file where both
autometric
and other processes have printed. Whenever autometric
writes to a log, it bounds the beginning and end of the text
with the keyword "__AUTOMETRIC__"
.
that way, log_read()
knows to only read and process the correct
lines of the file.
In addition, it automatically converts the log data
into the units units_time
,
units_cpu
, and units_memory
arguments.
A data frame of metrics from the log with one row per log entry
and columns with metadata and resource usage metrics.
log_read()
automatically converts the data into the units
chosen with arguments units_time
, units_cpu
, and units_memory
.
The returned data frame has the following columns:
version
: Version of the package used to write the log entry.
pid
: Process ID monitored.
status
: A status code for the log entry. Status 0 means
logging succeeded. A status code not equal to 0 indicates
something went wrong and the metrics should not be trusted.
time
: numeric time stamp at which the entry was logged.
log_read()
automatically recenters this column so that time 0
indicates the first logged entry.
Use the units_time
argument to customize the units of this field.
core
: CPU load of the process scaled relative to a single
CPU core. Measures the amount of time the process spends running
during a given interval of elapsed time.
On Mac OS, the package uses native system calls to get CPU core usage.
On Linux and Windows, the package calculates it manually using.
user + kernel clock cycles that ran during a sampling interval.
It measures the clock cycles that the process executed during
the interval, converts the clock cycles into seconds,
then divides the result by the elapsed time of the interval.
The length of the sampling interval is the seconds
argument
supplied to log_start()
, or length of time between
calls to log_print()
.
The first core
measurement is 0 to reflect that a full sampling
interval has not elapsed yet.
core
can be read in as a percentage or fraction, depending on
the units_cpu
argument.
cpu
: core
divided by the number of logical CPU cores.
This metric measures the load on the machine as a whole,
not just the CPU core it runs on.
Use the units_cpu
argument to customize the units of this field.
rss
: resident set size, the total amount of memory used by the
process at the time of logging. This include the memory unique
to the process (unique set size USS) and shared memory.
Use the units_memory
argument to customize the units of this field.
virtual
: total virtual memory available to the process.
The process does not necessarily use all this memory, but
it can request more virtual memory throughout its life cycle.
Use the units_memory
argument to customize the units of this field.
path <- tempfile() log_start(seconds = 0.5, path = path) Sys.sleep(2) log_stop() Sys.sleep(2) log_read(path) unlink(path)
path <- tempfile() log_start(seconds = 0.5, path = path) Sys.sleep(2) log_stop() Sys.sleep(2) log_read(path) unlink(path)
Start a background thread that periodically writes
system usage metrics of the current R process to a log file.
See log_read()
for explanations of the specific metrics.
log_start( path, seconds = 1, pids = c(local = Sys.getpid()), error = getOption("autometric_error", TRUE) )
log_start( path, seconds = 1, pids = c(local = Sys.getpid()), error = getOption("autometric_error", TRUE) )
path |
Character string, path to a file to log resource usage.
On Windows, the path must point to a physical file on disk.
On Unix-like systems, Standard output is the most convenient option for high-performance computing scenarios where worker processes already write to log files. Such workers usually already redirect standard output to a physical file, as with a cluster like SLURM, or capture messages with Amazon CloudWatch. Normally, standard output and standard error are discouraged because
of how they interact with the R API and R console. However, the
exported user interface of |
seconds |
Positive number, number of seconds between writes to the
log file. This number should be noticeably large, anywhere between
half a second to several seconds or minutes.
A low number of seconds could burden the operating system
and generate large log files. Because of the way CPU usage measurements
work, the first log entry starts only after after the first interval of
|
pids |
Nonempty vector of non-negative integers
of process IDs to monitor. NOTE: On Mac OS, only the currently running
process can be monitored.
This is due to security restrictions around certain system calls, c.f.
https://os-tres.net/blog/2010/02/17/mac-os-x-and-task-for-pid-mach-call/. # nolint
If the |
error |
|
Only one thread can run at a time. If the thread is already
running, then log_start()
does not start an additional one.
Before creating a new thread, call log_stop()
to terminate
the first one.
NULL
(invisibly). Called for its side effects.
path <- tempfile() log_start(seconds = 0.5, path = path) Sys.sleep(2) log_stop() Sys.sleep(2) log_read(path) unlink(path)
path <- tempfile() log_start(seconds = 0.5, path = path) Sys.sleep(2) log_stop() Sys.sleep(2) log_read(path) unlink(path)
Stop the background thread that periodically writes system usage metrics of the current R process to a log file.
log_stop()
log_stop()
The background thread is detached, so is there no way to
directly terminate it (other than terminating the main thread,
i.e. restarting R). log_stop()
merely signals to the thread
using a static C variable protected by a mutex. It may take
time for the thread to notice, depending on the value of seconds
you supplied to log_start()
. For this reason, you may see one or two
lines in the log even after you call log_stop()
.
NULL
(invisibly). Called for its side effects.
path <- tempfile() log_start(seconds = 0.5, path = path) Sys.sleep(2) log_stop() log_read(path) unlink(path)
path <- tempfile() log_start(seconds = 0.5, path = path) Sys.sleep(2) log_stop() log_read(path) unlink(path)
Check if your system supports background logging.
log_support()
log_support()
The background logging functionality requires a Linux, Mac,
or Windows computer, It also requires POSIX thread support
and the nanosleep()
C function.
TRUE
if your system supports background logging, FALSE
otherwise.
log_support()
log_support()