---
title: "Conventions for MLModels Implementation"
author: "Brian J Smith"
date: "2021-07-23"
output: rmarkdown::html_vignette
bibliography: bibliography.bib
vignette: >
%\VignetteIndexEntry{Conventions for MLModels Implementation}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## Model Constructor Components
- `MLModel` is a function supplied by the **MachineShop** package. It allows for the integration of statistical and machine learning models supplied by other R packages with the **MachineShop** model fitting, prediction, and performance assessment tools.
- The following are guidelines for writing model constructor functions that are wrappers around the `MLModel` function.
- In this context, the term "constructor" refers to the wrapper function and "source package" to the package supplying the original model implementation.
### Constructor Arguments
- The constructor should produce a valid model if called without any arguments; i.e., not have any required arguments.
- The source package defaults will be used for parameters with `NULL` values.
- Model formula, data, and weights are separate from model parameters and should not be defined as constructor arguments.
### name Slot
- Use the same name as the constructor.
### packages Slot
- Include all external packages whose functions are called directly from within the constructor.
- Use :: to reference source package functions.
### response_types Slot
- Include all response variable types (`"binary"`, `"factor"`, `"matrix"`, `"numeric"`, `"ordered"`, and/or `"Surv"`) that can be analyzed with the model.
### weights Slot
- Logical indicating whether the model supports case weights.
### params Slot
- List of parameter values set by the constructor, typically obtained internally with `new_params(environment())` if all arguments are to be passed to the source package fit function as supplied. Additional steps may be needed to pass the constructor arguments to the source package in a different format; e.g., when some model parameters must be passed in a control structure, as in `C50Model` and `CForestModel`.
### fit Function
- The first three arguments should be `formula`, `data`, and `weights` followed by an ellipsis (`...`).
- If weights are not supported, the following, or equivalent, should be included in the function:
```{r eval = FALSE}
if(!all(weights == 1)) warning("weights are not supported and will be ignored")
```
- Only add elements to the resulting fit object if they are needed and will be used in the `predict` or `varimp` functions.
- Return the fit object.
### predict Function
- The arguments are a model fit `object`, `newdata` frame, optionally `times` for prediction at survival time points, and an ellipsis.
- The predict function should return a vector or column matrix of probabilities for the second level of binary factors, a matrix whose columns contain the probabilities for factors with more than two levels, a matrix of predicted responses if matrix, a vector or column matrix of predicted responses if numeric, a matrix whose columns contain survival probabilities at `times` if supplied, or a vector of predicted survival means if `times` are not supplied.
### varimp Function
- Should have a single model fit `object` argument followed by an ellipsis.
- Variable importance results should generally be returned as a vector with elements named after the corresponding predictor variables. The package will handle conversions to a data frame and `VariableImportance` object. If there is more than one set of relevant variable importance measures, they can be returned as a matrix or data frame with predictor variable names as the row names.
## Documenting an MLModel
### Model Parameters
- Include the first sentences from the source package.
- Start sentences with the parameter value type (logical, numeric, character, etc.).
- Start sentences with lowercase.
- Omit indefinite articles (a, an, etc.) from the starting sentences.
### Details Section
- Include response types (binary, factor, matrix, numeric, ordered, and/or Surv).
- Include the following sentence:
> Default values for the \code{NULL} arguments and further model details can be
> found in the source link below.
### Return (Value) Section
- Include the following sentence:
> MLModel class object.
### See Also Section
- Include a link to the source package function and the other method functions shown below.
```
\code{\link[