R packages by wlandau

targets - Dynamic Function-Oriented 'Make'-Like Declarative Pipelines

Pipeline tools coordinate the pieces of computationally demanding analysis projects. The 'targets' package is a 'Make'-like pipeline tool for statistics and data science in R. The package skips costly runtime for tasks that are already up to date, orchestrates the necessary computation with implicit parallel computing, and abstracts files as R objects. If all the current output matches the current upstream code and data, then the whole pipeline is up to date, and the results are more trustworthy than otherwise. The methodology in this package borrows from GNU 'Make' (2015, ISBN:978-9881443519) and 'drake' (2018, <doi:10.21105/joss.00550>).
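As a minimal sketch of the skip-when-up-to-date behavior described above, assuming the 'targets' package is installed: the target names `raw` and `total` and the temporary project directory are illustrative choices, not anything prescribed by the package.

```r
library(targets)

# Work in a throwaway directory so the sketch leaves the current project alone.
project <- tempfile("pipeline_")
dir.create(project)
old <- setwd(project)

# Write a small _targets.R file declaring two targets (hypothetical names).
tar_script({
  list(
    tar_target(raw, c(1, 2, 3, 4)),  # upstream data
    tar_target(total, sum(raw))      # depends on raw
  )
})

tar_make()                  # first run: builds both targets
result <- tar_read(total)   # read the stored value back as an R object

tar_make()                  # second run: nothing changed, so both targets are skipped
outdated <- tar_outdated()  # names of targets that would rerun

setwd(old)
```

Editing the command of `raw` in `_targets.R` and calling `tar_make()` again would rerun `raw` and, if its value changes, `total` as well.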

Last updated 5 hours ago

data-science, high-performance-computing, make, peer-reviewed, pipeline, r-targetopia, reproducibility, reproducible-research, targets, workflow

15.18 score 979 stars 22 dependents 4.6k scripts 12k downloads

drake - A Pipeline Toolkit for Reproducible Computation at Scale

A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every execution starts from scratch, there is native support for parallel and distributed computing, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website <https://docs.ropensci.org/drake/> and the online manual <https://books.ropensci.org/drake/>.

Last updated 4 months ago

data-science, drake, high-performance-computing, makefile, peer-reviewed, pipeline, reproducibility, reproducible-research, ropensci, workflow

11.49 score 1.3k stars 1 dependent 1.7k scripts 2.2k downloads

tarchetypes - Archetypes for Targets

Function-oriented Make-like declarative pipelines for statistics and data science are supported in the 'targets' R package. As an extension to 'targets', the 'tarchetypes' package provides convenient user-side functions to make 'targets' easier to use. By establishing reusable archetypes for common kinds of targets and pipelines, these functions help express complicated reproducible pipelines concisely and compactly. The methods in this package were influenced by the 'drake' R package by Will Landau (2018) <doi:10.21105/joss.00550>.

Last updated 6 days ago

data-science, high-performance-computing, peer-reviewed, pipeline, r-targetopia, reproducibility, targets, workflow

11.27 score 142 stars 10 dependents 1.7k scripts 2.9k downloads

crew - A Distributed Worker Launcher Framework

In computationally demanding analysis projects, statisticians and data scientists asynchronously deploy long-running tasks to distributed systems, ranging from traditional clusters to cloud services. The 'NNG'-powered 'mirai' R package by Gao (2023) <doi:10.5281/zenodo.7912722> is a sleek and sophisticated scheduler that efficiently processes these intense workloads. The 'crew' package extends 'mirai' with a unifying interface for third-party worker launchers. Inspiration also comes from packages 'future' by Bengtsson (2021) <doi:10.32614/RJ-2021-048>, 'rrq' by FitzJohn and Ashton (2023) <https://github.com/mrc-ide/rrq>, 'clustermq' by Schubert (2019) <doi:10.1093/bioinformatics/btz284>, and 'batchtools' by Lang, Bischl, and Surmann (2017) <doi:10.21105/joss.00135>.
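A minimal sketch of submitting a task through a 'crew' controller, assuming the 'crew' package is installed and using its local-process launcher. The worker count, idle timeout, and the task name "square" are illustrative choices.

```r
library(crew)

# Create and start a controller that launches workers as local R processes.
controller <- crew_controller_local(workers = 2, seconds_idle = 10)
controller$start()

# Push a task asynchronously; the command runs on a worker, not in this session.
controller$push(name = "square", command = 4^2)

# Block until all pushed tasks are done, then collect the finished task.
controller$wait(mode = "all")
task <- controller$pop()
value <- task$result[[1]]

# Shut down the workers and release resources.
controller$terminate()
```

Launcher plugins such as those in 'crew.cluster' supply other controller constructors, and the push, wait, and pop calls are meant to stay the same across them.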

Last updated 8 hours ago

high-performance-computing

11.15 score 137 stars 2 dependents 243 scripts 2.0k downloads

mirai - Minimalist Async Evaluation Framework for R

Designed for simplicity, a 'mirai' evaluates an R expression asynchronously in a parallel process, locally or distributed over the network. The result is automatically available upon completion. Modern networking and concurrency, built on 'nanonext' and 'NNG' (Nanomsg Next Gen), ensures reliable and efficient scheduling over fast inter-process communications or TCP/IP secured by TLS. Distributed computing can launch remote resources via SSH or cluster managers. An inherently queued architecture handles many more tasks than available processes, and requires no storage on the file system. Innovative features include support for otherwise non-exportable reference objects, event-driven promises, and asynchronous parallel map.
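A minimal sketch of asynchronous evaluation with 'mirai', assuming the package is installed. The expression and the argument names `x` and `y` are illustrative.

```r
library(mirai)

# Evaluate an expression in a separate R process; the call returns immediately.
m <- mirai(
  {
    Sys.sleep(0.5)  # stand-in for a long-running computation
    x + y
  },
  x = 1,
  y = 2
)

# The main session stays free for other work while the task runs.
still_running <- unresolved(m)

# Wait for completion, then read the result from the mirai object.
call_mirai(m)
value <- m$data
```

Without daemons configured, each `mirai()` call starts its own background process; persistent workers can be set up beforehand with `daemons()`.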

Last updated 1 day ago

async, asynchronous-tasks, concurrency, distributed-computing, high-performance-computing, parallel-computing

10.90 score 221 stars 7 dependents 130 scripts 4.0k downloads

brms.mmrm - Bayesian MMRMs using 'brms'

The mixed model for repeated measures (MMRM) is a popular model for longitudinal clinical trial data with continuous endpoints, and 'brms' is a powerful and versatile package for fitting Bayesian regression models. The 'brms.mmrm' R package leverages 'brms' to run MMRMs, and it supports a simplified interface to reduce difficulty and align with the best practices of the life sciences. References: Bürkner (2017) <doi:10.18637/jss.v080.i01>, Mallinckrodt (2008) <doi:10.1177/009286150804200402>.

Last updated 6 months ago

brms, life-sciences, mc-stan, mmrm, stan, statistics

8.80 score 21 stars 13 scripts 623 downloads

nanonext - NNG (Nanomsg Next Gen) Lightweight Messaging Library

R binding for NNG (Nanomsg Next Gen), a successor to ZeroMQ. NNG is a socket library for reliable, high-performance messaging over in-process, IPC, TCP, WebSocket and secure TLS transports. Implements 'Scalability Protocols', a standard for common communications patterns including publish/subscribe, request/reply and service discovery. As its own threaded concurrency framework, provides a toolkit for asynchronous programming and distributed computing. Intuitive 'aio' objects resolve automatically when asynchronous operations complete, and synchronisation primitives allow R to wait upon events signalled by concurrent threads.

Last updated 5 hours ago

concurrency, https, ipc-message, messaging-library, nng, rpc, socket-communication, synchronization-primitives, tcp-protocol, websocket, mbedtls

8.79 score 60 stars 9 dependents 28 scripts 5.8k downloads

jagstargets - Targets for JAGS Pipelines

Bayesian data analysis usually incurs long runtimes and cumbersome custom code. A pipeline toolkit tailored to Bayesian statisticians, the 'jagstargets' R package leverages 'targets' and 'R2jags' to ease these burdens. 'jagstargets' makes it super easy to set up scalable JAGS pipelines that automatically parallelize the computation and skip expensive steps when the results are already up to date. Minimal custom code is required, and there is no need to manually configure branching, so usage is much easier than 'targets' alone. For the underlying methodology, please refer to the documentation of 'targets' <doi:10.21105/joss.02959> and 'JAGS' (Plummer 2003) <https://www.r-project.org/conferences/DSC-2003/Proceedings/Plummer.pdf>.

Last updated 4 months ago

bayesian, high-performance-computing, jags, make, r-targetopia, reproducibility, rjags, statistics, targets, cpp

6.95 score 10 stars 32 scripts 631 downloads

stantargets - Targets for Stan Workflows

Bayesian data analysis usually incurs long runtimes and cumbersome custom code. A pipeline toolkit tailored to Bayesian statisticians, the 'stantargets' R package leverages 'targets' and 'cmdstanr' to ease these burdens. 'stantargets' makes it super easy to set up scalable Stan pipelines that automatically parallelize the computation and skip expensive steps when the results are already up to date. Minimal custom code is required, and there is no need to manually configure branching, so usage is much easier than 'targets' alone. 'stantargets' can access all of 'cmdstanr''s major algorithms (MCMC, variational Bayes, and optimization) and it supports both single-fit workflows and multi-rep simulation studies. For the statistical methodology, please refer to 'Stan' documentation (Stan Development Team 2020) <https://mc-stan.org/>.

Last updated 2 months ago

bayesian, high-performance-computing, make, r-targetopia, reproducibility, stan, statistics, targets

6.85 score 49 stars 180 scripts

crew.cluster - Crew Launcher Plugins for Traditional High-Performance Computing Clusters

In computationally demanding analysis projects, statisticians and data scientists asynchronously deploy long-running tasks to distributed systems, ranging from traditional clusters to cloud services. The 'crew.cluster' package extends the 'mirai'-powered 'crew' package with worker launcher plugins for traditional high-performance computing systems. Inspiration also comes from packages 'mirai' by Gao (2023) <https://github.com/shikokuchuo/mirai>, 'future' by Bengtsson (2021) <doi:10.32614/RJ-2021-048>, 'rrq' by FitzJohn and Ashton (2023) <https://github.com/mrc-ide/rrq>, 'clustermq' by Schubert (2019) <doi:10.1093/bioinformatics/btz284>, and 'batchtools' by Lang, Bischl, and Surmann (2017) <doi:10.21105/joss.00135>.

Last updated 7 days ago

crew, high-performance-computing

6.80 score 29 stars 68 scripts 745 downloads

proffer - Profile R Code and Visualize with 'Pprof'

Like similar profiling tools, the 'proffer' package automatically detects sources of slowness in R code. The distinguishing feature of 'proffer' is its utilization of 'pprof', which supplies interactive visualizations that are efficient and easy to interpret. Behind the scenes, the 'profile' package converts native Rprof() data to a protocol buffer that 'pprof' understands. For the documentation of 'proffer', visit <https://r-prof.github.io/proffer/>. To learn about the implementations and methodologies of 'pprof', 'profile', and protocol buffers, visit <https://github.com/google/pprof>, <https://github.com/r-prof/profile>, and <https://protobuf.dev>, respectively.

Last updated 5 months ago

6.40 score 88 stars 48 scripts 644 downloads

gittargets - Data Version Control for the Targets Package

In computationally demanding data analysis pipelines, the 'targets' R package (2021, <doi:10.21105/joss.02959>) maintains an up-to-date set of results while skipping tasks that do not need to rerun. This process increases speed and increases trust in the final end product. However, it also overwrites old output with new output, and past results disappear by default. To preserve historical output, the 'gittargets' package captures version-controlled snapshots of the data store, and each snapshot links to the underlying commit of the source code. That way, when the user rolls back the code to a previous branch or commit, 'gittargets' can recover the data contemporaneous with that commit so that all targets remain up to date.

Last updated 9 months ago

data-science, data-version-control, data-versioning, reproducibility, reproducible-research, targets, workflow

5.99 score 88 stars 11 scripts 763 downloads

crew.aws.batch - A Crew Launcher Plugin for AWS Batch

In computationally demanding analysis projects, statisticians and data scientists asynchronously deploy long-running tasks to distributed systems, ranging from traditional clusters to cloud services. The 'crew.aws.batch' package extends the 'mirai'-powered 'crew' package with a worker launcher plugin for AWS Batch. Inspiration also comes from packages 'mirai' by Gao (2023) <https://github.com/shikokuchuo/mirai>, 'future' by Bengtsson (2021) <doi:10.32614/RJ-2021-048>, 'rrq' by FitzJohn and Ashton (2023) <https://github.com/mrc-ide/rrq>, 'clustermq' by Schubert (2019) <doi:10.1093/bioinformatics/btz284>, and 'batchtools' by Lang, Bischl, and Surmann (2017) <doi:10.21105/joss.00135>.

Last updated 7 days ago

aws-batch, crew, high-performance-computing

5.02 score 15 stars 6 scripts 599 downloads

autometric - Background Resource Logging

Intense parallel workloads can be difficult to monitor. Packages 'crew.cluster', 'clustermq', and 'future.batchtools' distribute hundreds of worker processes over multiple computers. If a worker process exhausts its available memory, it may terminate silently, leaving the underlying problem difficult to detect or troubleshoot. Using the 'autometric' package, a worker can proactively monitor itself in a detached background thread. The worker process itself runs normally, and the thread writes to a log every few seconds. If the worker terminates unexpectedly, 'autometric' can read and visualize the log file to reveal potential resource-related reasons for the crash. The 'autometric' package borrows heavily from the methods of packages 'ps' <doi:10.32614/CRAN.package.ps> and 'psutil'.

Last updated 5 months ago

4.38 score 7 stars 9 scripts 1.1k downloads

multiverse.internals - Internal Infrastructure for R-multiverse

R-multiverse requires this internal infrastructure package to automate contribution reviews and populate universes.

Last updated 19 days ago

3.30 score 1 star 1 script

multitools - Tools for Contributing Packages to R-multiverse

'R-multiverse' is a community-curated collection of R package releases, powered by 'R-universe'. The 'multitools' package has tools for maintainers of packages in 'R-multiverse'.

Last updated 10 months ago

2.65 score 3 stars