| Title: | Imputation with Deep Learning Methods |
|---|---|
| Description: | Imputation of mixed-type and compositional data with neural networks. The architecture (number and size of hidden layers, dropout, activation, optimiser) is user-configurable. See Templ (2021) <doi:10.1007/978-3-030-71175-7>. |
| Authors: | Matthias Templ [aut, cre] (ORCID: <https://orcid.org/0000-0002-8638-5276>) |
| Maintainer: | Matthias Templ <[email protected]> |
| License: | GPL-2 |
| Version: | 1.1.0 |
| Built: | 2026-06-11 10:29:41 UTC |
| Source: | https://github.com/cran/deepImp |
Concentrations of 16 volatile compounds measured in beer samples, together with an indicator distinguishing fresh from aged beer. The compounds are recognised markers of beer flavour and ageing (e.g. furfural, 5-hydroxymethylfurfural, hexanal, and Strecker aldehydes such as 3-methylbutanal and phenylacetaldehyde). The data are used to illustrate imputation of mixed-type data: continuous concentrations plus a two-level condition indicator.
data(beer)
A data frame with 86 rows and 17 variables:
3-methylbutanal concentration.
3-methylbutanone concentration.
2-methylbutanal concentration.
hexanal concentration.
2-furanmethanol (furfuryl alcohol) concentration.
heptanal concentration.
2-acetylfuran concentration.
5-methyl-2-furaldehyde concentration.
furanoic acid ethyl ester concentration.
2-acetyl-5-methylfuran concentration.
2-phenylethanal (phenylacetaldehyde) concentration.
nicotinic acid ethyl ester concentration.
2-phenylethyl acetate concentration.
gamma-nonalactone concentration.
furfural concentration.
5-hydroxymethylfurfural (HMF) concentration.
indicator of beer condition (fresh vs. aged).
A backend-neutral description of the multilayer perceptron used by impNNet().
The same specification is translated into a torch model (and, in a later release, a keras model).
deepimp_arch(hidden = c(256, 128, 64), dropout = 0.1, activation = "relu", batchnorm = TRUE, optimizer = "adam", learning_rate = 0.001)
deepimp_arch_small(dropout = 0.1, activation = "relu", batchnorm = TRUE, optimizer = "adam", learning_rate = 0.001)
| Argument | Description |
|---|---|
| hidden | integer vector of hidden-layer widths; its length is the depth. |
| dropout | dropout rate applied after every hidden layer; a scalar in |
| activation | hidden-layer activation: one of |
| batchnorm | logical; apply batch normalisation after each hidden linear layer. |
| optimizer | one of |
| learning_rate | positive learning rate for the optimiser. |
an object of class "deepimp_arch".
deepimp_arch(hidden = c(128, 64), dropout = 0.2)
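To make the hidden argument concrete, the sketch below counts the trainable parameters of a fully connected network built from a vector of widths. It assumes a plain stack of linear layers (input, then the hidden layers, then one output unit) and, when batchnorm is on, one batch-normalisation layer per hidden layer with a learnable scale and shift per unit; count_mlp_params() is a hypothetical helper, not part of deepImp, and the torch backend's exact layout may differ.

```r
# Hypothetical helper (not part of deepImp): trainable parameters of a fully
# connected network n_in -> hidden[1] -> ... -> hidden[k] -> n_out.
# Assumes every linear layer has a weight matrix plus a bias vector, and that
# batch normalisation (if used) adds a learnable scale and shift per hidden unit.
count_mlp_params <- function(n_in, hidden, n_out = 1L, batchnorm = TRUE) {
  widths <- c(n_in, hidden, n_out)
  fan_in <- widths[-length(widths)]
  fan_out <- widths[-1L]
  linear <- sum(fan_in * fan_out + fan_out)   # weights + biases
  bn <- if (batchnorm) 2 * sum(hidden) else 0 # scale + shift per hidden unit
  linear + bn
}

# Default deepimp_arch() widths, 10 predictors, one numeric target
count_mlp_params(10, c(256, 128, 64), batchnorm = FALSE)  # 44033
count_mlp_params(10, c(256, 128, 64), batchnorm = TRUE)   # 44929
```

Dropout adds no parameters, so it does not enter the count; it only changes how the existing weights are trained.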
Extract imputed data from a deepimp object
getImputed(object, m = 1L, ...)
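| Argument | Description |
|---|---|
| object | a "deepimp" object, as returned by impNNet() or impNNetCoDa(). |
| m | which completed dataset to return: an integer index, or "all" for the list of all completed datasets. |
| ... | unused. |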
a data.frame, or a list of data.frames when m = "all".
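With m = "all" the result is a list of completed data frames. The impNNetCoDa() documentation notes that these replicates are not valid multiple imputation, so pooling rules do not apply, but their spread is still a useful descriptive check. A minimal sketch, assuming every element has the same rows and columns as the original data; imputation_spread() and the toy list are hypothetical stand-ins for getImputed(imp, m = "all").

```r
# Hypothetical helper (not part of deepImp): for every numeric column, the
# standard deviation of the imputed value across the m completed data frames,
# reported only for cells that were missing in the original data.
imputation_spread <- function(original, completed) {
  num <- vapply(original, is.numeric, logical(1))
  out <- list()
  for (v in names(original)[num]) {
    miss <- which(is.na(original[[v]]))
    if (length(miss) == 0L) next
    # one row per missing cell, one column per replicate
    vals <- vapply(completed, function(d) as.numeric(d[[v]][miss]),
                   numeric(length(miss)))
    vals <- matrix(vals, nrow = length(miss))
    out[[v]] <- data.frame(row = miss, sd = apply(vals, 1, sd))
  }
  out
}

# Toy stand-in for getImputed(imp, m = "all") with two replicates
orig <- data.frame(x = c(1, NA, 3, NA), g = c("a", "b", NA, "a"))
completed <- list(
  data.frame(x = c(1, 2.0, 3, 4.0), g = c("a", "b", "a", "a")),
  data.frame(x = c(1, 2.4, 3, 3.6), g = c("a", "b", "b", "a"))
)
imputation_spread(orig, completed)
```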
Classifies each column as "numeric", "mixed" (semi-continuous, i.e. a spike
of repeated values such as zeros plus a continuous part), "binary",
"nominal", or "count". The result is used by impNNet() to choose the model head
for each variable and by the summary() method for "deepimp" objects.
guessType(x)
| Argument | Description |
|---|---|
| x | a data.frame, data.table, or tibble. |
a list with indices (a type-by-variable logical matrix) and type
(a character vector, one entry per column).
Matthias Templ
data(sleep, package = "VIM")
guessType(sleep)
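The decision rules inside guessType() are not spelled out above. The sketch below shows one plausible rule set for the five types; the thresholds (a column is "mixed" when one value occurs at least twice and makes up at least 20% of the observed entries) and the function guess_type_sketch() are assumptions, not the package's implementation, and unlike guessType() it returns a plain named character vector.

```r
# Hypothetical illustration only; guessType() may use different rules and
# returns a list (indices, type) rather than a named character vector.
guess_type_sketch <- function(x, spike_share = 0.2) {
  vapply(x, function(v) {
    v <- v[!is.na(v)]
    n_unique <- length(unique(v))
    if (is.factor(v) || is.character(v) || is.logical(v)) {
      return(if (n_unique == 2L) "binary" else "nominal")
    }
    if (n_unique == 2L) return("binary")
    if (all(v >= 0) && all(v == round(v))) return("count")
    # a single value repeated often (e.g. many zeros) next to a continuous
    # remainder is treated as semi-continuous ("mixed")
    top <- max(table(v))
    if (top >= 2L && top / length(v) >= spike_share) "mixed" else "numeric"
  }, character(1))
}

toy <- data.frame(
  num  = c(1.2, 3.4, 2.2, 5.1, 0.7),
  semi = c(0, 0, 0, 2.5, 3.7),
  bin  = c(1, 0, 1, 1, 0),
  cnt  = c(0, 3, 1, 7, 2),
  nom  = c("a", "b", "c", "a", "b")
)
guess_type_sketch(toy)
# num "numeric", semi "mixed", bin "binary", cnt "count", nom "nominal"
```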
Iterative, chained imputation of numeric, count, semi-continuous, binary and nominal variables with a configurable multilayer perceptron (torch backend).
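The chained scheme can be sketched in base R by swapping the neural network for ordinary least squares: starting values fill the gaps, then each incomplete variable is re-predicted from all the others in turn, sweep after sweep. This is a simplified illustration for numeric variables only; lm() stands in for the multilayer perceptron, column means replace the "knn" initialisation, a fixed number of sweeps replaces the eps convergence check, and chained_impute_sketch() is a hypothetical name.

```r
# Simplified chained (sequential regression) imputation for numeric data.
# lm() stands in for the neural network; column means stand in for "knn".
chained_impute_sketch <- function(data, sweeps = 3L) {
  miss <- lapply(data, is.na)                     # where the gaps are
  filled <- data
  for (v in names(data)) {                        # crude starting values
    filled[[v]][miss[[v]]] <- mean(data[[v]], na.rm = TRUE)
  }
  incomplete <- names(data)[vapply(miss, any, logical(1))]
  for (s in seq_len(sweeps)) {
    for (v in incomplete) {
      obs <- !miss[[v]]
      fml <- reformulate(setdiff(names(data), v), response = v)
      # fit on rows where v was observed, predict the rows where it was missing
      fit <- lm(fml, data = filled[obs, , drop = FALSE])
      filled[[v]][!obs] <- predict(fit, newdata = filled[!obs, , drop = FALSE])
    }
  }
  filled
}

set.seed(1)
d <- data.frame(x1 = rnorm(100), x3 = rnorm(100))
d$x2 <- 2 * d$x1 + rnorm(100, sd = 0.1)
d$x2[1:10] <- NA
head(chained_impute_sketch(d))
```

Because each model is refitted on the current completed data, later sweeps use improved values for the predictors; impNNet() applies the same idea with a separate network head per variable type.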
impNNet(data, arch = NULL, m = 1L, backend = c("torch", "keras"), vartypes = "guess", initialize = "knn", iterations = 3L, eps = 0.01, normalize = TRUE, epochs = 400L, patience = 40L, validation_split = 0.2, batch_size = 32L, seed = NULL, verbose = FALSE, ...)
| Argument | Description |
|---|---|
| data | a data.frame (tibbles/data.tables are coerced). |
| arch | a "deepimp_arch" object describing the network (see deepimp_arch()). |
| m | number of stochastic replicate completions. NOTE: with |
| backend | |
| vartypes | |
| initialize | starting values for NAs: |
| iterations | maximum number of chained sweeps. |
| eps | convergence tolerance on the standardised mean change. |
| normalize | standardise numeric predictors. |
| epochs, patience, validation_split, batch_size | training controls: maximum number of training epochs, early-stopping patience (epochs without improvement in validation loss), fraction of rows held out for validation, and mini-batch size. |
| seed | optional integer seed (sets the R and backend RNGs). Exact reproducibility is guaranteed when |
| verbose | print progress. |
| ... | architecture scalar overrides forwarded to |
a "deepimp" object; read the data with getImputed().
Matthias Templ
deepimp_arch(), getImputed(), guessType()
if (requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()) {
  data(sleep, package = "VIM")
  imp <- impNNet(sleep, arch = deepimp_arch_small(), epochs = 5, seed = 1)
  head(getImputed(imp))
}
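Two stopping rules are at work in impNNet(): the chained sweeps stop once the change between sweeps falls below eps, and each network stops training once its validation loss has not improved for patience epochs. A minimal sketch of both, assuming "standardised mean change" means the largest absolute change in a numeric column's mean divided by that column's standard deviation in the previous sweep; std_mean_change(), converged() and early_stop_epoch() are hypothetical helpers, not package functions.

```r
# Hypothetical convergence check between two consecutive sweeps.
# Assumption: for each numeric column, the absolute change in its mean divided
# by its standard deviation in the previous sweep; the largest value is used.
std_mean_change <- function(prev, curr) {
  num <- names(prev)[vapply(prev, is.numeric, logical(1))]
  max(vapply(num, function(v) {
    abs(mean(curr[[v]]) - mean(prev[[v]])) / sd(prev[[v]])
  }, numeric(1)))
}
converged <- function(prev, curr, eps = 0.01) std_mean_change(prev, curr) < eps

# Patience-based early stopping on a sequence of validation losses: training
# stops after `patience` epochs without improvement, and the epoch with the
# lowest validation loss seen so far is returned.
early_stop_epoch <- function(val_loss, patience = 40L) {
  best <- Inf
  best_epoch <- 0L
  wait <- 0L
  for (e in seq_along(val_loss)) {
    if (val_loss[e] < best) {
      best <- val_loss[e]
      best_epoch <- e
      wait <- 0L
    } else {
      wait <- wait + 1L
      if (wait >= patience) break
    }
  }
  best_epoch
}

prev <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))
curr <- prev
curr$a[1] <- 1.04
converged(prev, curr)                                           # TRUE
early_stop_epoch(c(5, 4, 3, 3.5, 3.6, 3.7, 3.8), patience = 3)  # 3
```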
Imputes rounded zeros (values below a detection limit) in compositional data with a configurable neural network, following Templ (2021). Each part with rounded zeros is pivoted to the front and the data are transformed to pivot log-ratio coordinates, so that the first coordinate carries all relative information about that part. This coordinate is regressed on the remaining coordinates, the imputed values are censored at the detection limit, and the result is back-transformed so that the observed values keep their original absolute values.
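Pivoting a part to the front matters because only the first pivot coordinate involves that part: for a composition of D parts, z1 = sqrt((D - 1) / D) * log(x1 / g(x2, ..., xD)), with g the geometric mean. A base-R sketch of the coordinates and of a back-transformation of the first part that leaves the observed parts x2, ..., xD at their absolute values; pivot_coords() and back_first_part() are hypothetical names, and the package's internal transformation may differ in detail.

```r
# Geometric mean of a positive vector
gmean <- function(v) exp(mean(log(v)))

# Pivot coordinates of one composition x (length D, all parts > 0):
# z_j = sqrt((D - j) / (D - j + 1)) * log(x_j / gmean(x_{j+1}, ..., x_D))
pivot_coords <- function(x) {
  D <- length(x)
  vapply(seq_len(D - 1L), function(j) {
    sqrt((D - j) / (D - j + 1)) * log(x[j] / gmean(x[(j + 1L):D]))
  }, numeric(1))
}

# Turn a (predicted) first coordinate into an absolute value for part 1,
# keeping the observed parts x_2, ..., x_D unchanged on their original scale.
back_first_part <- function(z1, rest) {
  D <- length(rest) + 1L
  gmean(rest) * exp(z1 * sqrt(D / (D - 1)))
}

x <- c(0.5, 2, 4, 8)          # part 1 already pivoted to the front
z <- pivot_coords(x)
back_first_part(z[1], x[-1])  # 0.5
```

In the imputation, the network would predict z1 for rows with a rounded zero in part 1, and back_first_part() maps that prediction to the original scale without rescaling the observed parts.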
impNNetCoDa(x, dl = NULL, label = 0, coda = TRUE, correction = c("truncate", "expectation", "none"), initialize = "kNNa", arch = NULL, m = 1L, backend = c("torch", "keras"), iterations = 2L, eps = 0.01, normalize = TRUE, epochs = 400L, patience = 40L, validation_split = 0.2, batch_size = 32L, seed = NULL, verbose = FALSE, ...)
| Argument | Description |
|---|---|
| x | a data.frame of compositional parts (positive values; rounded zeros marked by label). |
| dl | detection limits, a numeric vector of length |
| label | the value marking a rounded zero in x. |
| coda | if |
| correction | censoring of imputed values: |
| initialize | starting values for the rounded zeros: |
| arch | a "deepimp_arch" object describing the network (see deepimp_arch()). |
| m | number of stochastic replicate completions (not valid multiple imputation; see |
| backend | |
| iterations | maximum number of chained sweeps. |
| eps | convergence tolerance on the standardised mean change. |
| normalize | standardise numeric predictors. |
| epochs, patience, validation_split, batch_size | training controls: maximum number of training epochs, early-stopping patience (epochs without improvement in validation loss), fraction of rows held out for validation, and mini-batch size. |
| seed | optional integer seed (sets the R and backend RNGs). Exact reproducibility is guaranteed when |
| verbose | print progress. |
| ... | architecture scalar overrides forwarded to |
a "deepimp" object; read the data with getImputed().
Matthias Templ
Templ, M. (2021) Imputation of rounded zeros for high-dimensional compositional data.
impNNet(), deepimp_arch(), getImputed()
if (requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()) {
  set.seed(1)
  x <- data.frame(a = runif(50, 5, 10), b = runif(50, 5, 10), c = runif(50, 5, 10))
  x$a[1:5] <- 0  # rounded zeros, marked by label = 0
  imp <- impNNetCoDa(x, dl = c(1, 1, 1), label = 0, arch = deepimp_arch_small(), epochs = 5, seed = 1)
  head(getImputed(imp))
}
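The correction argument controls what happens to imputed rounded zeros that land above the detection limit. Its exact rules are not given here; the sketch below assumes that a "truncate"-style correction simply caps each imputed value at the detection limit of its part. censor_at_dl() is a hypothetical helper, and other choices (for example a fraction of dl, or a conditional expectation as the "expectation" option suggests) are possible.

```r
# Hypothetical sketch: cap imputed values of rounded zeros at the part's
# detection limit. `orig` marks rounded zeros with `label`; `dl` holds one
# detection limit per column. Only cells that were rounded zeros are touched.
censor_at_dl <- function(orig, imputed, dl, label = 0) {
  stopifnot(ncol(orig) == length(dl), identical(dim(orig), dim(imputed)))
  for (j in seq_along(dl)) {
    rz <- which(orig[[j]] == label)   # rows that were rounded zeros
    imputed[[j]][rz] <- pmin(imputed[[j]][rz], dl[j])
  }
  imputed
}

orig <- data.frame(a = c(0, 5, 6), b = c(4, 0, 7))
imp  <- data.frame(a = c(1.4, 5, 6), b = c(4, 0.3, 7))
censor_at_dl(orig, imp, dl = c(1, 1))  # a[1] capped at 1, b[2] stays 0.3
```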
Low-level constructor for the object returned by impNNet(). Most users do
not call this directly; use getImputed() to read the completed data.
new_deepimp(data, imputed, arch, info = list())
| Argument | Description |
|---|---|
| data | data.frame with missing values (original input). |
| imputed | list of completed data.frames (one per imputation). |
| arch | a "deepimp_arch" object. |
| info | list with training metadata (backend, vartypes, convergence, ...). |
an object of class "deepimp".
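new_deepimp() follows the usual constructor pattern for S3 classes: bundle the components in a list and set the class attribute. The sketch below shows that pattern with the four documented arguments, plus a matching accessor in the spirit of getImputed(); the field names, the validity checks, and both function names (new_deepimp_sketch(), get_imputed_sketch()) are assumptions, not the package code.

```r
# Hypothetical S3 constructor mirroring new_deepimp(data, imputed, arch, info)
new_deepimp_sketch <- function(data, imputed, arch, info = list()) {
  stopifnot(
    is.data.frame(data),
    is.list(imputed), length(imputed) >= 1L,
    all(vapply(imputed, is.data.frame, logical(1))),
    is.list(info)
  )
  structure(
    list(data = data, imputed = imputed, arch = arch, info = info),
    class = "deepimp"
  )
}

# Matching accessor: one completed data set by index, or all of them
get_imputed_sketch <- function(object, m = 1L) {
  if (identical(m, "all")) object$imputed else object$imputed[[m]]
}

d <- data.frame(x = c(1, NA))
obj <- new_deepimp_sketch(d, list(data.frame(x = c(1, 2)), data.frame(x = c(1, 3))),
                          arch = NULL)
get_imputed_sketch(obj, 2)
```

Keeping the constructor minimal and validating only the structure is the common S3 convention; user-facing checks belong in the functions that call it, such as impNNet().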