CDR Configuration Files

The CDR utilities in this module read configuration files written in INI format. Basic information about INI file syntax can be found e.g. here. This reference assumes familiarity with INI syntax.

CDR configuration files contain the sections and fields described below.

Section: `[data]`

The [data] section supports the following fields:

REQUIRED

X_train: str; Path to training data (predictor matrix)
y_train: str; Path to training data (response matrix)
series_ids: space-delimited list of str; Names of columns used to define unique time series

Note that, unlike e.g. linear models, CDR does not require synchronous predictors and responses, which is why separate data objects must be provided for each of these components. If the predictors and responses are synchronous, the X_train and y_train fields can simply point to the same file. The system will treat each unique combination of values in the columns given in series_ids as constituting a unique time series.
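As an illustration, the required fields might be laid out as in the sketch below. It uses Python's standard configparser module rather than CDR's own config loader, and the file paths and the column names subject and sentence_id are hypothetical placeholders. The toy rows only show how unique combinations of series_ids values delimit time series.

```python
import configparser

# Hypothetical [data] section: paths and column names are placeholders.
CONFIG_TEXT = """
[data]
X_train = data/X_train.csv
y_train = data/y_train.csv
series_ids = subject sentence_id
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
data = parser["data"]  # option lookup is case-insensitive by default

# series_ids is space-delimited, so split it into individual column names.
series_ids = data["series_ids"].split()

# Toy observations: each unique combination of values in the series_ids
# columns counts as one time series.
rows = [
    {"subject": "s1", "sentence_id": "1"},
    {"subject": "s1", "sentence_id": "1"},
    {"subject": "s1", "sentence_id": "2"},
    {"subject": "s2", "sentence_id": "1"},
]
unique_series = {tuple(row[col] for col in series_ids) for row in rows}
n_series = len(unique_series)
```

Here the four toy rows fall into three distinct series, because the first two rows share the same subject and sentence_id values.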

OPTIONAL

X_dev: str; Path to dev data (predictor matrix)
y_dev: str; Path to dev data (response matrix)
X_test: str; Path to test data (predictor matrix)
y_test: str; Path to test data (response matrix)
history_length: int; Length of history window in timesteps (default: 128)
filters: str; List of filters to apply to response data (;-delimited).

All variables used in a filter must be contained in the data files indicated by the y_* parameters in the [data] section of the config file. Each filter is written as a variable name followed by a logical operator and a comparison value. Supported logical operators are <, <=, >, >=, ==, and !=. For example, to keep only data points for which column foo is less than or equal to 100, the following filter can be added:

filters = foo <= 100

To keep only data points for which the column foo does not equal bar, the following filter can be added:

filters = foo != bar

Filters can be conjunctively combined:

filters = foo > 5; foo <= 100

Count-based filters are also supported, using the designated nunique suffix. For example, if the column subject is being used to define a random effects level and we want to exclude all subjects with fewer than 100 data points, this can be accomplished using the following filter:

filters = subjectnunique >= 100

More complex filtration conditions are not supported automatically in CDR but can be applied to the data by the user as a preprocess.
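The semantics of simple and conjunctive filters can be sketched as follows. This is an illustrative reading of the filter syntax described above, not CDR's actual implementation; the helper names parse_filters and keep_row are hypothetical, and count-based (nunique) filters are left out for brevity.

```python
import operator

# Supported comparison operators. Two-character symbols come first so that
# "<=" is matched before "<".
OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_filters(spec):
    """Split a ;-delimited filter string into (column, symbol, value) triples."""
    conditions = []
    for clause in spec.split(";"):
        clause = clause.strip()
        if not clause:
            continue
        for symbol in OPS:  # dicts preserve insertion order
            if symbol in clause:
                column, value = clause.split(symbol, 1)
                conditions.append((column.strip(), symbol, value.strip()))
                break
        else:
            raise ValueError(f"No supported operator in filter: {clause!r}")
    return conditions


def _coerce(text):
    """Compare numerically when possible, otherwise as strings."""
    try:
        return float(text)
    except ValueError:
        return text


def keep_row(row, conditions):
    """Return True only if the row satisfies every condition (conjunction)."""
    return all(
        OPS[symbol](_coerce(str(row[column])), _coerce(value))
        for column, symbol, value in conditions
    )


rows = [{"foo": 3}, {"foo": 50}, {"foo": 100}, {"foo": 250}]
conditions = parse_filters("foo > 5; foo <= 100")
kept = [row for row in rows if keep_row(row, conditions)]
```

With these toy rows, only the observations where foo is 50 or 100 satisfy both conditions and are retained.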

Several CDR utilities (e.g. for prediction and evaluation) are designed to handle train, dev, and test partitions of the input data, but these partitions must be constructed in advance. This package also provides a partition utility that can be used to partition input data by applying modular arithmetic to some subset of the variables in the data. For usage details run:

python -m cdr.bin.partition -h
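The general idea of partitioning by modular arithmetic can be sketched as below. The exact rule used by the partition utility is not documented here (consult its -h output), so the function assign_partition, its parameters mod, dev_remainder and test_remainder, and the toy column names are all assumptions made for illustration.

```python
def assign_partition(row, fields, mod=4, dev_remainder=0, test_remainder=1):
    """Assign a row to train/dev/test from the sum of integer fields modulo mod.

    Hypothetical scheme: one remainder goes to dev, one to test, and all
    other remainders go to train.
    """
    key = sum(int(row[field]) for field in fields) % mod
    if key == dev_remainder:
        return "dev"
    if key == test_remainder:
        return "test"
    return "train"


# Toy data: 2 subjects x 4 sentences, with integer-coded identifiers.
rows = [{"subject": s, "sentence_id": i} for s in range(2) for i in range(4)]
partitions = [assign_partition(row, ["subject", "sentence_id"]) for row in rows]

counts = {name: partitions.count(name) for name in ("train", "dev", "test")}
```

Because the assignment depends only on the identifier values, rerunning it on the same data always reproduces the same partition, and with mod=4 roughly half of the data ends up in train.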

IMPORTANT NOTES

The files indicated in X_* must contain the following columns:
- time: Timestamp associated with each observation
- A column for each variable in series_ids
- A column for each predictor variable indicated in the model formula
The files indicated in y_* must contain the following columns:
- time: Timestamp associated with each observation
- A column for the response variable in the model formula
- A column for each variable in series_ids
- A column for each random grouping factor in the model formula
- A column for each variable used for data filtration (see above)
Data in y_* may be filtered/partitioned, but data in X_* must be uncensored unless independent reason exists to assume that certain observations never have an impact on the response.
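The column requirements above can be checked before training with a short script along these lines. This is a sketch only: the helper functions are hypothetical, the toy files are comma-delimited in-memory strings chosen for convenience, and the column names (word_length, rt, correct) are placeholders rather than anything CDR requires.

```python
import csv
import io


def required_X_columns(series_ids, predictors):
    """Columns expected in a predictor (X_*) file."""
    return ["time"] + list(series_ids) + list(predictors)


def required_y_columns(series_ids, response, ranef_factors=(), filter_vars=()):
    """Columns expected in a response (y_*) file."""
    return (
        ["time", response]
        + list(series_ids)
        + list(ranef_factors)
        + list(filter_vars)
    )


def missing_columns(header, required):
    """Return required column names absent from the header, without repeats."""
    present = set(header)
    return [name for name in dict.fromkeys(required) if name not in present]


# Toy files held in memory; real data would be read from the configured paths.
X_TEXT = "time,subject,word_length\n0.0,s1,3\n"
Y_TEXT = "time,subject,rt\n0.5,s1,312\n"

header_X = next(csv.reader(io.StringIO(X_TEXT)))
header_y = next(csv.reader(io.StringIO(Y_TEXT)))

missing_X = missing_columns(
    header_X, required_X_columns(["subject"], ["word_length"])
)
missing_y = missing_columns(
    header_y,
    required_y_columns(
        ["subject"], "rt", ranef_factors=["subject"], filter_vars=["correct"]
    ),
)
```

In this toy case the predictor file is complete, while the response file lacks the correct column that a filter would need.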

Section: `[global_settings]`

The [global_settings] section supports the following fields:

outdir: str; Path to output directory where checkpoints, plots, and Tensorboard logs should be saved (default: ./cdr_model/). If it does not exist, this directory will be created. At runtime, the train utility will copy the config file to this directory as config.ini, serving as a record of the settings used to generate the analysis.
use_gpu_if_available: bool; If available, run on GPU. If False, always runs on CPU even when system has compatible GPU.
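A [global_settings] section might look like the fragment embedded below. The sketch reads it with Python's standard configparser module, which is an assumption made for illustration rather than a description of CDR's own loader. The output path is a hypothetical placeholder.

```python
import configparser

# Hypothetical [global_settings] section.
CONFIG_TEXT = """
[global_settings]
outdir = ./results/my_cdr_model/
use_gpu_if_available = False
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
settings = parser["global_settings"]

# Fall back to the documented default output directory if outdir is omitted.
outdir = settings.get("outdir", fallback="./cdr_model/")

# getboolean accepts True/False (as well as yes/no, on/off, 1/0).
use_gpu = settings.getboolean("use_gpu_if_available")
```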

Section: `[cdr_settings]`

The [cdr_settings] section supports the following fields:

All Models

outdir: str; Path to output directory, where logs and model parameters are saved. Default: ./cdr_model/
use_distributional_regression: bool; Whether to model all parameters of the response distribution as dependent on IRFs of the impulses (distributional regression). If False, only the mean depends on the predictors (other parameters of the response distribution are treated as constant). Default (CDR): False; Default (CDRNN): True
response_distribution_map: str or None; Definition of response distribution. Can be a single distribution name (shared across all response variables), a space-delimited list of distribution names (one per response variable), a space-delimited list of ‘;’-delimited tuples matching response variables to distribution names (e.g. response;Bernoulli), or None, in which case the response distribution will be inferred as JohnsonSU for continuous variables and Categorical for categorical variables. Default: None
center_inputs: bool; DISCOURAGED UNLESS YOU HAVE A GOOD REASON, since this can distort rate estimates. Center inputs by subtracting training set means. Can improve convergence speed and reduce vulnerability to local optima. Only affects fitting – prediction, likelihood computation, and plotting are reported on the source values. Default: False
rescale_inputs: bool; Rescale input features by dividing by training set standard deviation. Can improve convergence speed and reduce vulnerability to local optima. Only affects fitting – prediction, likelihood computation, and plotting are reported on the source values. Default: True
history_length: int; Length of the history (backward) window (in timesteps). Default: 128
future_length: int; Length of the future (forward) window (in timesteps). Note that causal IRF kernels cannot be used if future_length > 0. Default: 0
t_delta_cutoff: float or None; Maximum distance in time to consider (can help improve training stability on data with large gaps in time). If 0 or None, no cutoff. Default: None
constraint: str; Constraint function to use for bounded variables. One of ['abs', 'square', 'softplus']. Default: softplus
random_variables: str; Space-delimited list of model components to instantiate as (variationally optimized) random variables, rather than point estimates. Can be any combination of: [‘intercept’, ‘coefficient’, ‘interaction’, ‘irf_param’, ‘nn’]. Can also be ‘all’, ‘none’, or ‘default’, which defaults to all components except ‘nn’. Default: default
scale_loss_with_data: bool; Whether to multiply the scale of the LL loss by N, where N is num batches. This turns the loss into an expectation over training set likelihood. Default: True
scale_regularizer_with_data: bool; Whether to multiply the scale of all weight regularization by B * N, where B is batch size and N is num batches. If scale_loss_with_data is true, this approach ensures a stable regularization strength (relative to the loss) across datasets and batch sizes. Default: False
n_iter: int; Number of training iterations. If using variational inference, this becomes the expected number of training iterations and is used only for Tensorboard logging, with no impact on training behavior. Default: 100000
minibatch_size: int or None; Size of minibatches to use for fitting (full-batch if None). Default: 1024
eval_minibatch_size: int; Size of minibatches to use for prediction/evaluation. Default: 1024
n_samples_eval: int; Number of posterior predictive samples to draw for prediction/evaluation. Ignored for evaluating CDR MLE models. Default: 1000
optim_name: str or None; Name of the optimizer to use. Must be one of ['SGD', 'Momentum', 'AdaGrad', 'AdaDelta', 'Adam', 'FTRL', 'RMSProp', 'Nadam']. Default: Adam
max_gradient: float or None; Maximum allowable value for the gradient, which will be clipped as needed. If None, no max gradient. Default: None
max_global_gradient_norm: float or None; Maximum allowable value for the global norm of the gradient, which will be clipped as needed. If None, no max global norm for the gradient. Default: 1.0
use_safe_optimizer: bool; Stabilize training by preventing the optimizer from applying updates involving NaN gradients (affected weights will remain unchanged after the update). Incurs slight additional computational overhead and can lead to bias in the training process. Default: False
epsilon: float; Epsilon parameter to use for numerical stability in bounded parameter estimation (imposes a positive lower bound on the parameter). Default: 1e-05
response_dist_epsilon: float; Epsilon parameter to use for numerical stability in bounded parameters of the response distribution (imposes a positive lower bound on the parameter). Default: 1e-05
optim_epsilon: float; Epsilon parameter to use if optim_name in ['Adam', 'Nadam'], ignored otherwise. Default: 1e-08
learning_rate: float; Initial value for the learning rate. Default (CDR): 0.001; Default (CDRNN): 0.01
learning_rate_min: float; Minimum value for the learning rate. Default: 0.0
lr_decay_family: str or None; Functional family for the learning rate decay schedule (no decay if None). Default: None
lr_decay_rate: float; Coefficient by which to decay the learning rate every lr_decay_steps (ignored if lr_decay_family==None). Default: 1.0
lr_decay_steps: int; Span of iterations over which to decay the learning rate by lr_decay_rate (ignored if lr_decay_family==None). Default: 100
lr_decay_iteration_power: float; Power to which the iteration number t should be raised when computing the learning rate decay. Default: 0.5
lr_decay_staircase: bool; Keep learning rate flat between lr_decay_steps (ignored if lr_decay_family==None). Default: False
filter_outlier_losses: float, bool or None; Whether outlier large losses are filtered out while training continues. If False, outlier losses trigger a restart from the most recent save point. Ignored unless loss_cutoff_n_sds is specified. Using this option avoids restarts, but can lead to bias if training instances are systematically dropped. If None, False, or 0, no loss filtering. Default: False
loss_cutoff_n_sds: float or None; How many moving standard deviations above the moving mean of the loss to use as a cut-off for stability (if outlier large losses are detected, training restarts from the preceding checkpoint). If None, or 0, no loss cut-off. Default: 1000
ema_decay: float; Decay factor to use for exponential moving average for parameters (used in prediction). Default: 0.999
convergence_n_iterates: int or None; Number of timesteps over which to average parameter movements for convergence diagnostics. If None or 0, convergence will not be programmatically checked (reduces memory overhead, but convergence must then be visually diagnosed). Default (CDR): 500; Default (CDRNN): 100
convergence_stride: int; Stride (in iterations) over which to compute convergence. If larger than 1, iterations within a stride are averaged with the most recently saved value. Larger values increase the receptive field of the slope estimates, making convergence diagnosis less vulnerable to local perturbations but also increasing the number of post-convergence iterations necessary in order to identify convergence. If early_stopping is True, convergence_stride will implicitly be multiplied by eval_freq. Default: 1
convergence_alpha: float or None; Significance threshold above which to fail to reject the null of no correlation between convergence basis and training time. Larger values are more stringent. Default: 0.5
early_stopping: bool; Whether to diagnose convergence based on dev set performance (True) or training set performance (False). Default: True
regularizer_name: str or None; Name of global regularizer (e.g. l1_regularizer, l2_regularizer); can be overridden by regularizers for more specific parameters. If None, no regularization. Default: None
regularizer_scale: str or float; Scale of global regularizer; can be overridden by regularizer scales for more specific parameters (ignored if regularizer_name==None). Default: 0.0
intercept_regularizer_name: str, "inherit" or None; Name of intercept regularizer (e.g. l1_regularizer, l2_regularizer); overrides regularizer_name. If 'inherit', inherits regularizer_name. If None, no regularization. Default: inherit
intercept_regularizer_scale: str, float or "inherit"; Scale of intercept regularizer (ignored if regularizer_name==None). If 'inherit', inherits regularizer_scale. Default: inherit
coefficient_regularizer_name: str, "inherit" or None; Name of coefficient regularizer (e.g. l1_regularizer, l2_regularizer); overrides regularizer_name. If 'inherit', inherits regularizer_name. If None, no regularization. Default: inherit
coefficient_regularizer_scale: str, float or "inherit"; Scale of coefficient regularizer (ignored if regularizer_name==None). If 'inherit', inherits regularizer_scale. Default: inherit
irf_regularizer_name: str, "inherit" or None; Name of IRF parameter regularizer (e.g. l1_regularizer, l2_regularizer); overrides regularizer_name. If 'inherit', inherits regularizer_name. If None, no regularization. Default: inherit
irf_regularizer_scale: str, float or "inherit"; Scale of IRF parameter regularizer (ignored if regularizer_name==None). If 'inherit', inherits regularizer_scale. Default: inherit
ranef_regularizer_name: str, "inherit" or None; Name of random effects regularizer (e.g. l1_regularizer, l2_regularizer); overrides regularizer_name. If 'inherit', inherits regularizer_name. If None, no regularization. Regularization only applies to random effects without variational priors. Default (CDR): inherit; Default (CDRNN): l2_regularizer
ranef_regularizer_scale: str, float or "inherit"; Scale of random effects regularizer (ignored if regularizer_name==None). If 'inherit', inherits regularizer_scale. Regularization only applies to random effects without variational priors. Default (CDR): inherit; Default (CDRNN): 100.0
regularize_mean: bool; Mean-aggregate regularized variables. If False, use sum aggregation. Default: False
save_freq: int; Frequency (in iterations) with which to save model checkpoints. Default: 1
plot_freq: int; Frequency (in iterations) with which to plot model estimates (or 0 to turn off incremental plotting). Default: 10
eval_freq: int; Frequency (in iterations) with which to evaluate on dev data (or 0 to turn off incremental evaluation). Default: 10
log_freq: int; Frequency (in iterations) with which to log model params to Tensorboard. Default: 1
log_fixed: bool; Log fixed effects to Tensorboard. Can slow training of models with many fixed effects. Default: True
log_random: bool; Log random effects to Tensorboard. Can slow training of models with many random effects. Default: True
log_graph: bool; Log the network graph to Tensorboard. Default: False
indicator_names: str; Space-delimited list of predictors that are indicators (0 or 1). Used for plotting and effect estimation (value 0 is always used as reference, rather than mean). Default:
default_reference_type: "mean" or 0.0; Reference stimulus to use by default for plotting and effect estimation. If 0, zero vector. If mean, training set mean by predictor. Default (CDR): 0.0; Default (CDRNN): mean
reference_values: str; Predictor values to use as a reference in plotting and effect estimation. Structured as space-delimited pairs NAME=FLOAT. Any predictor without a specified reference value will use either 0 or the training set mean, depending on default_reference_type. Default:
plot_step: str; Size of step by predictor to take above reference in univariate IRF plots. Structured as space-delimited pairs NAME=FLOAT. Any predictor without a specified step size will inherit from plot_step_default. Default:
plot_step_default: str or float; Default size of step to take above reference in univariate IRF plots, if not specified in plot_step. Either a float or the string 'sd', which indicates training sample standard deviation. Default: sd
reference_time: float; Timepoint at which to plot interactions. Default: 0.0
plot_n_time_units: float; Number of time units to use for plotting. Default: 1
plot_n_time_points: int; Resolution of plot axis (for 3D plots, uses sqrt of this number for each axis). Default: 1024
plot_dirac: bool; Whether to include any Dirac delta IRFs (stick functions at t=0) in plot. Default: False
plot_x_inches: float; Width of plot in inches. Default: 6.0
plot_y_inches: float; Height of plot in inches. Default: 4.0
plot_legend: bool; Whether to include a legend in plots with multiple components. Default: True
generate_univariate_irf_plots: bool; Whether to plot univariate IRFs over time. Default: True
generate_univariate_irf_heatmaps: bool; Whether to plot univariate IRF heatmaps over time. Default: False
generate_curvature_plots: bool; Whether to plot IRF curvature at time reference_time. Default: True
generate_irf_surface_plots: bool; Whether to plot IRF surfaces. Default: False
generate_interaction_surface_plots: bool; Whether to plot IRF interaction surfaces at time reference_time. Default: False
generate_err_dist_plots: bool; Whether to plot the average error distribution for real-valued responses. Default: True
generate_nonstationarity_surface_plots: bool; Whether to plot IRF surfaces showing non-stationarity in the response. Default: False
cmap: str; Name of MatPlotLib cmap specification to use for plotting (determines the color of lines in the plot). Default: gist_rainbow
dpi: int; Dots per inch of saved plot image file. Default: 300
keep_plot_history: bool; Keep IRF plots from each checkpoint of a run, which can help visualize learning trajectories but can also consume a lot of disk space. If False, only the most recent plot of each type is kept. Default: False
declare_priors_fixef: bool; Specify Gaussian priors for all fixed model parameters (if False, use implicit improper uniform priors). Default: True
declare_priors_ranef: bool; Specify Gaussian priors for all random model parameters (if False, use implicit improper uniform priors). Default: True
intercept_prior_sd: str, float or None; Standard deviation of prior on fixed intercept. Can be a space-delimited list of ;-delimited floats (one per distributional parameter per response variable), a float (applied to all responses), or None, in which case the prior is inferred from prior_sd_scaling_coefficient and the empirical variance of the response on the training set. Default: None
coef_prior_sd: str, float or None; Standard deviation of prior on fixed coefficients. Can be a space-delimited list of ;-delimited floats (one per distributional parameter per response variable), a float (applied to all responses), or None, in which case the prior is inferred from prior_sd_scaling_coefficient and the empirical variance of the response on the training set. Default: None
irf_param_prior_sd: str or float; Standard deviation of prior on convolutional IRF parameters. Can be either a space-delimited list of ;-delimited floats (one per distributional parameter per response variable) or a float (applied to all responses). Default: 1.0
y_sd_prior_sd: float or None; Standard deviation of prior on standard deviation of output model. If None, inferred as y_sd_prior_sd_scaling_coefficient times the empirical variance of the response on the training set. Default: None
prior_sd_scaling_coefficient: float; Factor by which to multiply priors on intercepts and coefficients if inferred from the empirical variance of the data (i.e. if intercept_prior_sd or coef_prior_sd is None). Ignored for any prior widths that are explicitly specified. Default: 1
y_sd_prior_sd_scaling_coefficient: float; Factor by which to multiply prior on output model variance if inferred from the empirical variance of the data (i.e. if y_sd_prior_sd is None). Ignored if prior width is explicitly specified. Default: 1
ranef_to_fixef_prior_sd_ratio: float; Ratio of widths of random to fixed effects priors. I.e. if less than 1, random effects have tighter priors. Default: 0.1
posterior_to_prior_sd_ratio: float; Ratio of posterior initialization SD to prior SD. Low values are often beneficial to stability, convergence speed, and quality of final fit by avoiding erratic sampling and divergent behavior early in training. Default: 0.01
center_X_time: bool; Whether to center time values as inputs under the hood. Times are automatically shifted back to the source location for plotting and model criticism. Default: False
center_t_delta: bool; Whether to center time offset values under the hood. Offsets are automatically shifted back to the source location for plotting and model criticism. Default: False
rescale_X_time: bool; Whether to rescale time values as inputs by their training SD under the hood. Times are automatically reconverted back to the source scale for plotting and model criticism. Default: True
rescale_t_delta: bool; Whether to rescale time offset values by their training SD under the hood. Offsets are automatically reconverted back to the source scale for plotting and model criticism. Default: False
nn_use_input_scaler: bool; Whether to apply a Hadamard scaling layer to the inputs to any NN components. Default: False
log_transform_t_delta: bool; Whether to log-modulus transform time offset values for stability under the hood (log-modulus is used to handle negative values in non-causal models). Offsets are automatically reconverted back to the source scale for plotting and model criticism. Default: False
nonstationary: bool; Whether to model non-stationarity in NN components by feeding impulse timestamps as input. Default: True
n_layers_ff: int or None; Number of hidden layers in feedforward encoder. If None, inferred from length of n_units_ff. Default: 2
n_units_ff: int, str or None; Number of units per feedforward encoder hidden layer. Can be an int, which will be used for all layers, or a str with n_layers_ff space-delimited integers, one for each layer in order from bottom to top. If 0 or None, no feedforward encoder. Default: 128
n_layers_rnn: int or None; Number of RNN layers. If None, inferred from length of n_units_rnn. Default: None
n_units_rnn: int, str or None; Number of units per RNN layer. Can be an int, which will be used for all layers, or a str with n_layers_rnn space-delimited integers, one for each layer in order from bottom to top. Can also be 'infer', which infers the size from the number of predictors, or 'inherit', which uses size n_units_hidden_state. If 0 or None, no RNN encoding (i.e. use a context-independent convolution kernel). Default: None
n_layers_rnn_projection: int or None; Number of hidden layers in projection of RNN state (or of timestamp + predictors if no RNN). If None, inferred automatically. Default: None
n_units_rnn_projection: int, str or None; Number of units per hidden layer in projection of RNN state. Can be an int, which will be used for all layers, or a str with n_layers_rnn_projection space-delimited integers, one for each layer in order from bottom to top. If 0 or None, no hidden layers in RNN projection. Default: None
n_layers_irf: int or None; Number of IRF hidden layers. If None, inferred from length of n_units_irf. Default: 2
n_units_irf: int, str or None; Number of units per hidden layer in IRF. Can be an int, which will be used for all layers, or a str with n_layers_irf space-delimited integers, one for each layer in order from bottom to top. If 0 or None, no hidden layers. Default: 128
input_dependent_irf: bool; Whether or not NN IRFs are input-dependent (can modify their shape at different values of the predictors). Default: True
ranef_l1_only: bool; Whether to include random effects only on first layer of feedforward transforms (True) or on all neural components. Default: False
ranef_bias_only: bool; Whether to include random effects only on bias terms of neural components (True) or also on weight matrices. Default: True
normalizer_use_ranef: bool; Whether to include random effects in normalizer layers (True) or not. Default: False
ff_inner_activation: str or None; Name of activation function to use for hidden layers in feedforward encoder. Default: gelu
ff_activation: str or None; Name of activation function to use for output of feedforward encoder. Default: None
rnn_activation: str or None; Name of activation to use in RNN layers. Default: tanh
recurrent_activation: str or None; Name of recurrent activation to use in RNN layers. Default: sigmoid
rnn_projection_inner_activation: str or None; Name of activation function to use for hidden layers in projection of RNN state. Default: gelu
rnn_projection_activation: str or None; Name of activation function to use for final layer in projection of RNN state. Default: None
irf_inner_activation: str or None; Name of activation function to use for hidden layers in IRF. Default: gelu
irf_activation: str or None; Name of activation function to use for final layer in IRF. Default: None
kernel_initializer: str or None; Name of initializer to use in encoder kernels. Default: glorot_uniform_initializer
recurrent_initializer: str or None; Name of initializer to use in encoder recurrent kernels. Default: orthogonal_initializer
weight_sd_init: str, float or None; Standard deviation of kernel initialization distribution (Normal, mean=0). Can also be 'glorot', which uses the SD of the Glorot normal initializer. If None, inferred from other hyperparams. Default: glorot
batch_normalization_decay: bool, float or None; Decay rate to use for batch normalization in internal layers. If True, uses decay 0.999. If False or None, no batch normalization. Default: None
layer_normalization_type: bool, str or None; Type of layer normalization, one of ['z', 'length', None]. If 'z', classical z-transform-based normalization. If 'length', normalize by the norm of the activation vector. If True, uses 'z'. If False or None, no layer normalization. Default: z
normalize_ff: bool; Whether to apply normalization (if applicable) to hidden layers of feedforward encoders. Default: True
normalize_irf: bool; Whether to apply normalization (if applicable) to non-initial internal IRF layers. Default: True
normalize_after_activation: bool; Whether to apply normalization (if applicable) after the non-linearity (otherwise, applied before). Default: False
shift_normalized_activations: bool; Whether to use trainable shift in batch/layer normalization layers. Default: True
rescale_normalized_activations: bool; Whether to use trainable scale in batch/layer normalization layers. Default: True
normalize_inputs: bool; Whether to apply normalization (if applicable) to the inputs. Default: False
normalize_final_layer: bool; Whether to apply normalization (if applicable) to the final layer. Default: False
nn_regularizer_name: str, "inherit" or None; Name of weight regularizer (e.g. l1_regularizer, l2_regularizer); overrides regularizer_name. If 'inherit', inherits regularizer_name. If None, no regularization. Default: None
nn_regularizer_scale: str, float or "inherit"; Scale of weight regularizer (ignored if regularizer_name==None). If 'inherit', inherits regularizer_scale. Default: 1.0
activity_regularizer_name: str, "inherit" or None; Name of activity regularizer (e.g. l1_regularizer, l2_regularizer); overrides regularizer_name. If 'inherit', inherits regularizer_name. If None, no activity regularization. Default: None
activity_regularizer_scale: str, float or "inherit"; Scale of activity regularizer (ignored if regularizer_name==None). If 'inherit', inherits regularizer_scale. Default: 5.0
ff_regularizer_name: str or None; Name of weight regularizer (e.g. l1_regularizer, l2_regularizer) on output layer of feedforward encoders; overrides regularizer_name. If None, inherits from nn_regularizer_name. Default: None
ff_regularizer_scale: str or float; Scale of weight regularizer (ignored if regularizer_name==None) on output layer of feedforward encoders. If None, inherits from nn_regularizer_scale. Default: 5.0
regularize_initial_layer: bool; Whether to regularize the first layer of NN components. Default: True
regularize_final_layer: bool; Whether to regularize the last layer of NN components. Default: False
rnn_projection_regularizer_name: str or None; Name of weight regularizer (e.g. l1_regularizer, l2_regularizer) on output layer of RNN projection; overrides regularizer_name. If None, inherits from nn_regularizer_name. Default: None
rnn_projection_regularizer_scale: str or float; Scale of weight regularizer (ignored if regularizer_name==None) on output layer of RNN projection. If None, inherits from nn_regularizer_scale. Default: 5.0
context_regularizer_name: str, "inherit" or None; Name of regularizer on contribution of context (RNN) to hidden state (e.g. l1_regularizer, l2_regularizer); overrides regularizer_name. If 'inherit', inherits regularizer_name. If None, no regularization. Default: l1_l2_regularizer
context_regularizer_scale: float or "inherit"; Scale of weight regularizer (ignored if context_regularizer_name==None). If 'inherit', inherits regularizer_scale. Default: 10.0
maxnorm: float or None; Bound on norm of dense kernel dimensions for max-norm regularization. If None, no max-norm regularization. Default: None
input_dropout_rate: float or None; Rate at which to drop input_features. Default: None
ff_dropout_rate: float or None; Rate at which to drop neurons of FF projection. Default: 0.5
rnn_h_dropout_rate: float or None; Rate at which to drop neurons of RNN hidden state. Default: None
rnn_c_dropout_rate: float or None; Rate at which to drop neurons of RNN cell state. Default: None
h_rnn_dropout_rate: float or None; Rate at which to drop neurons of h_rnn. Default: 0.5
rnn_dropout_rate: float or None; Rate at which to entirely drop the RNN. Default: 0.5
irf_dropout_rate: float or None; Rate at which to drop neurons of IRF layers. Default: 0.5
ranef_dropout_rate: float or None; Rate at which to drop random effects indicators. Default: None
dropout_final_layer: bool; Whether to apply dropout to the last layer of NN components. Default: False
fixed_dropout: bool; Whether to fix the dropout mask over the time dimension during training, ensuring that each training instance is processed by the same resampled model. Default: True
declare_priors_weights: bool; Specify Gaussian priors for model weights (if False, use implicit improper uniform priors). Default: True
declare_priors_biases: bool; Specify Gaussian priors for model biases (if False, use implicit improper uniform priors). Default: True
declare_priors_gamma: bool; Specify Gaussian priors for gamma parameters of any batch normalization layers (if False, use implicit improper uniform priors). Default: True
weight_prior_sd: str or float; Standard deviation of prior on CDRNN hidden weights. A float, 'glorot', or 'he'. Default: glorot
bias_prior_sd: str or float; Standard deviation of prior on CDRNN hidden biases. A float, 'glorot', or 'he'. Default: 1.0
gamma_prior_sd: str or float; Standard deviation of prior on batch norm gammas. A float, 'glorot', or 'he'. Ignored unless batch normalization is used. Default: 1
bias_sd_init: str, float or None; Initial standard deviation of variational posterior over biases. If None, inferred from other hyperparams. Default: None
gamma_sd_init: str, float or None; Initial standard deviation of variational posterior over batch norm gammas. If None, inferred from other hyperparams. Ignored unless batch normalization is used. Default: None
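The way the 'inherit' value ties parameter-specific regularizers back to the global regularizer can be sketched as follows. The fragment uses Python's standard configparser module with a hypothetical [cdr_settings] section; the helpers parse_value and resolve are illustrative assumptions, and CDR's own parsing of None and numeric values may differ.

```python
import configparser

# Hypothetical [cdr_settings] section using fields documented above.
CONFIG_TEXT = """
[cdr_settings]
n_iter = 5000
minibatch_size = None
learning_rate = 0.001
regularizer_name = l2_regularizer
regularizer_scale = 0.01
coefficient_regularizer_name = inherit
coefficient_regularizer_scale = inherit
irf_regularizer_name = None
"""


def parse_value(text):
    """Interpret an INI string as None, bool, int, float, or str (in that order)."""
    if text == "None":
        return None
    if text in ("True", "False"):
        return text == "True"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def resolve(settings, name, global_name):
    """Return settings[name], substituting the global value for 'inherit'.

    Missing fields are treated as 'inherit', matching the documented default
    of the intercept, coefficient, and IRF regularizer fields.
    """
    value = settings.get(name, "inherit")
    return settings[global_name] if value == "inherit" else value


parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
settings = {key: parse_value(val) for key, val in parser["cdr_settings"].items()}

coef_reg = resolve(settings, "coefficient_regularizer_name", "regularizer_name")
coef_scale = resolve(settings, "coefficient_regularizer_scale", "regularizer_scale")
irf_reg = resolve(settings, "irf_regularizer_name", "regularizer_name")
intercept_reg = resolve(settings, "intercept_regularizer_name", "regularizer_name")
```

Under this reading, the coefficient and (unspecified) intercept regularizers both resolve to the global l2_regularizer, while the IRF parameters are left unregularized because their regularizer name is explicitly None.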

Variational Bayes

declare_priors_fixef: bool; Specify Gaussian priors for all fixed model parameters (if False, use implicit improper uniform priors). Default: True
declare_priors_ranef: bool; Specify Gaussian priors for all random model parameters (if False, use implicit improper uniform priors). Default: True
intercept_prior_sd: str, float or None; Standard deviation of prior on fixed intercept. Can be a space-delimited list of ;-delimited floats (one per distributional parameter per response variable), a float (applied to all responses), or None, in which case the prior is inferred from prior_sd_scaling_coefficient and the empirical variance of the response on the training set. Default: None
coef_prior_sd: str, float or None; Standard deviation of prior on fixed coefficients. Can be a space-delimited list of ;-delimited floats (one per distributional parameter per response variable), a float (applied to all responses), or None, in which case the prior is inferred from prior_sd_scaling_coefficient and the empirical variance of the response on the training set. Default: None
irf_param_prior_sd: str or float; Standard deviation of prior on convolutional IRF parameters. Can be either a space-delimited list of ;-delimited floats (one per distributional parameter per response variable) or a float (applied to all responses). Default: 1.0
y_sd_prior_sd: float or None; Standard deviation of prior on standard deviation of output model. If None, inferred as y_sd_prior_sd_scaling_coefficient times the empirical variance of the response on the training set. Default: None
prior_sd_scaling_coefficient: float; Factor by which to multiply priors on intercepts and coefficients if inferred from the empirical variance of the data (i.e. if intercept_prior_sd or coef_prior_sd is None). Ignored for any prior widths that are explicitly specified. Default: 1
y_sd_prior_sd_scaling_coefficient: float; Factor by which to multiply prior on output model variance if inferred from the empirical variance of the data (i.e. if y_sd_prior_sd is None). Ignored if prior width is explicitly specified. Default: 1
ranef_to_fixef_prior_sd_ratio: float; Ratio of widths of random to fixed effects priors. I.e. if less than 1, random effects have tighter priors. Default: 0.1
posterior_to_prior_sd_ratio: float; Ratio of posterior initialization SD to prior SD. Low values are often beneficial to stability, convergence speed, and quality of final fit by avoiding erratic sampling and divergent behavior early in training. Default: 0.01
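
The relationships among these prior-width settings can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the library's implementation: the function name derive_prior_sds is hypothetical, a single fixed-effects prior SD is taken as given, and "empirical variance" is assumed here to mean the population variance of the training response.

```python
import statistics


def derive_prior_sds(y_train, fixef_prior_sd,
                     y_sd_prior_sd=None,
                     y_sd_prior_sd_scaling_coefficient=1.0,
                     ranef_to_fixef_prior_sd_ratio=0.1,
                     posterior_to_prior_sd_ratio=0.01):
    """Hypothetical helper showing how the documented ratios interact."""
    if y_sd_prior_sd is None:
        # Documented rule: scaling coefficient times the empirical variance
        # of the response (population variance is an assumption here).
        y_sd_prior_sd = (y_sd_prior_sd_scaling_coefficient
                         * statistics.pvariance(y_train))
    # Random-effects priors are a fixed fraction of the fixed-effects width.
    ranef_prior_sd = fixef_prior_sd * ranef_to_fixef_prior_sd_ratio
    # Posterior SDs are initialized as a fraction of the prior SD.
    posterior_sd_init = fixef_prior_sd * posterior_to_prior_sd_ratio
    return {
        "y_sd_prior_sd": y_sd_prior_sd,
        "ranef_prior_sd": ranef_prior_sd,
        "posterior_sd_init": posterior_sd_init,
    }


# Toy response values, invented for illustration only.
widths = derive_prior_sds([1.0, 3.0, 5.0, 7.0], fixef_prior_sd=2.0)
```

With the default ratios, the random-effects prior is ten times tighter than the fixed-effects prior, and the posterior starts out a hundred times narrower than its prior.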

Neural Network Components

center_X_time: bool; Whether to center time values as inputs under the hood. Times are automatically shifted back to the source location for plotting and model criticism. Default: False
center_t_delta: bool; Whether to center time offset values under the hood. Offsets are automatically shifted back to the source location for plotting and model criticism. Default: False
rescale_X_time: bool; Whether to rescale time values as inputs by their training SD under the hood. Times are automatically reconverted back to the source scale for plotting and model criticism. Default: True
rescale_t_delta: bool; Whether to rescale time offset values by their training SD under the hood. Offsets are automatically reconverted back to the source scale for plotting and model criticism. Default: False
nn_use_input_scaler: bool; Whether to apply a Hadamard scaling layer to the inputs to any NN components. Default: False
log_transform_t_delta: bool; Whether to log-modulus transform time offset values for stability under the hood (log-modulus is used to handle negative values in non-causal models). Offsets are automatically reconverted back to the source scale for plotting and model criticism. Default: False
nonstationary: bool; Whether to model non-stationarity in NN components by feeding impulse timestamps as input. Default: True
n_layers_ff: int or None; Number of hidden layers in feedforward encoder. If None, inferred from length of n_units_ff. Default: 2
n_units_ff: int, str or None; Number of units per feedforward encoder hidden layer. Can be an int, which will be used for all layers, or a str with n_layers_ff space-delimited integers, one for each layer in order from bottom to top. If 0 or None, no feedforward encoder. Default: 128
n_layers_rnn: int or None; Number of RNN layers. If None, inferred from length of n_units_rnn. Default: None
n_units_rnn: int, str or None; Number of units per RNN layer. Can be an int, which will be used for all layers, or a str with n_layers_rnn space-delimited integers, one for each layer in order from bottom to top. Can also be 'infer', which infers the size from the number of predictors, or 'inherit', which uses size n_units_hidden_state. If 0 or None, no RNN encoding (i.e. use a context-independent convolution kernel). Default: None
n_layers_rnn_projection: int or None; Number of hidden layers in projection of RNN state (or of timestamp + predictors if no RNN). If None, inferred automatically. Default: None
n_units_rnn_projection: int, str or None; Number of units per hidden layer in projection of RNN state. Can be an int, which will be used for all layers, or a str with n_layers_rnn_projection space-delimited integers, one for each layer in order from bottom to top. If 0 or None, no hidden layers in RNN projection. Default: None
n_layers_irf: int or None; Number of IRF hidden layers. If None, inferred from length of n_units_irf. Default: 2
n_units_irf: int, str or None; Number of units per hidden layer in IRF. Can be an int, which will be used for all layers, or a str with n_layers_irf space-delimited integers, one for each layer in order from bottom to top. If 0 or None, no hidden layers. Default: 128
input_dependent_irf: bool; Whether or not NN IRFs are input-dependent (can modify their shape at different values of the predictors). Default: True
ranef_l1_only: bool; Whether to include random effects only on first layer of feedforward transforms (True) or on all neural components. Default: False
ranef_bias_only: bool; Whether to include random effects only on bias terms of neural components (True) or also on weight matrices. Default: True
normalizer_use_ranef: bool; Whether to include random effects in normalizer layers (True) or not. Default: False
ff_inner_activation: str or None; Name of activation function to use for hidden layers in feedforward encoder. Default: gelu
ff_activation: str or None; Name of activation function to use for output of feedforward encoder. Default: None
rnn_activation: str or None; Name of activation to use in RNN layers. Default: tanh
recurrent_activation: str or None; Name of recurrent activation to use in RNN layers. Default: sigmoid
rnn_projection_inner_activation: str or None; Name of activation function to use for hidden layers in projection of RNN state. Default: gelu
rnn_projection_activation: str or None; Name of activation function to use for final layer in projection of RNN state. Default: None
irf_inner_activation: str or None; Name of activation function to use for hidden layers in IRF. Default: gelu
irf_activation: str or None; Name of activation function to use for final layer in IRF. Default: None
kernel_initializer: str or None; Name of initializer to use in encoder kernels. Default: glorot_uniform_initializer
recurrent_initializer: str or None; Name of initializer to use in encoder recurrent kernels. Default: orthogonal_initializer
weight_sd_init: str, float or None; Standard deviation of kernel initialization distribution (Normal, mean=0). Can also be 'glorot', which uses the SD of the Glorot normal initializer. If None, inferred from other hyperparams. Default: glorot
batch_normalization_decay: bool, float or None; Decay rate to use for batch normalization in internal layers. If True, uses decay 0.999. If False or None, no batch normalization. Default: None
layer_normalization_type: bool, str or None; Type of layer normalization, one of ['z', 'length', None]. If 'z', classical z-transform-based normalization. If 'length', normalize by the norm of the activation vector. If True, uses 'z'. If False or None, no layer normalization. Default: z
normalize_ff: bool; Whether to apply normalization (if applicable) to hidden layers of feedforward encoders. Default: True
normalize_irf: bool; Whether to apply normalization (if applicable) to non-initial internal IRF layers. Default: True
normalize_after_activation: bool; Whether to apply normalization (if applicable) after the non-linearity (otherwise, applied before). Default: False
shift_normalized_activations: bool; Whether to use trainable shift in batch/layer normalization layers. Default: True
rescale_normalized_activations: bool; Whether to use trainable scale in batch/layer normalization layers. Default: True
normalize_inputs: bool; Whether to apply normalization (if applicable) to the inputs. Default: False
normalize_final_layer: bool; Whether to apply normalization (if applicable) to the final layer. Default: False
nn_regularizer_name: str, "inherit" or None; Name of weight regularizer (e.g. l1_regularizer, l2_regularizer); overrides regularizer_name. If 'inherit', inherits regularizer_name. If None, no regularization. Default: None
nn_regularizer_scale: str, float or "inherit"; Scale of weight regularizer (ignored if regularizer_name==None). If 'inherit', inherits regularizer_scale. Default: 1.0
activity_regularizer_name: str, "inherit" or None; Name of activity regularizer (e.g. l1_regularizer, l2_regularizer); overrides regularizer_name. If 'inherit', inherits regularizer_name. If None, no activity regularization. Default: None
activity_regularizer_scale: str, float or "inherit"; Scale of activity regularizer (ignored if regularizer_name==None). If 'inherit', inherits regularizer_scale. Default: 5.0
ff_regularizer_name: str or None; Name of weight regularizer (e.g. l1_regularizer, l2_regularizer) on output layer of feedforward encoders; overrides regularizer_name. If None, inherits from nn_regularizer_name. Default: None
ff_regularizer_scale: str or float; Scale of weight regularizer (ignored if regularizer_name==None) on output layer of feedforward encoders. If None, inherits from nn_regularizer_scale. Default: 5.0
regularize_initial_layer: bool; Whether to regularize the first layer of NN components. Default: True
regularize_final_layer: bool; Whether to regularize the last layer of NN components. Default: False
rnn_projection_regularizer_name: str or None; Name of weight regularizer (e.g. l1_regularizer, l2_regularizer) on output layer of RNN projection; overrides regularizer_name. If None, inherits from nn_regularizer_name. Default: None
rnn_projection_regularizer_scale: str or float; Scale of weight regularizer (ignored if regularizer_name==None) on output layer of RNN projection. If None, inherits from nn_regularizer_scale. Default: 5.0
context_regularizer_name: str, "inherit" or None; Name of regularizer on contribution of context (RNN) to hidden state (e.g. l1_regularizer, l2_regularizer); overrides regularizer_name. If 'inherit', inherits regularizer_name. If None, no regularization. Default: l1_l2_regularizer
context_regularizer_scale: float or "inherit"; Scale of weight regularizer (ignored if context_regularizer_name==None). If 'inherit', inherits regularizer_scale. Default: 10.0
maxnorm: float or None; Bound on norm of dense kernel dimensions for max-norm regularization. If None, no max-norm regularization. Default: None
input_dropout_rate: float or None; Rate at which to drop input_features. Default: None
ff_dropout_rate: float or None; Rate at which to drop neurons of FF projection. Default: 0.5
rnn_h_dropout_rate: float or None; Rate at which to drop neurons of RNN hidden state. Default: None
rnn_c_dropout_rate: float or None; Rate at which to drop neurons of RNN cell state. Default: None
h_rnn_dropout_rate: float or None; Rate at which to drop neurons of h_rnn. Default: 0.5
rnn_dropout_rate: float or None; Rate at which to entirely drop the RNN. Default: 0.5
irf_dropout_rate: float or None; Rate at which to drop neurons of IRF layers. Default: 0.5
ranef_dropout_rate: float or None; Rate at which to drop random effects indicators. Default: None
dropout_final_layer: bool; Whether to apply dropout to the last layer of NN components. Default: False
fixed_dropout: bool; Whether to fix the dropout mask over the time dimension during training, ensuring that each training instance is processed by the same resampled model. Default: True
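
The log-modulus transform referred to under log_transform_t_delta can be illustrated as follows. This is a minimal sketch assuming the standard definition, sign(x) * log(1 + |x|); the function names are hypothetical, and whether the library uses exactly this form internally is not stated in this reference.

```python
import math


def log_modulus(x):
    """Standard log-modulus transform: sign(x) * log(1 + |x|)."""
    return math.copysign(math.log1p(abs(x)), x)


def inverse_log_modulus(y):
    """Invert the transform: sign(y) * (exp(|y|) - 1)."""
    return math.copysign(math.expm1(abs(y)), y)


# Negative offsets (possible in non-causal models) stay negative and finite.
negative_offset = log_modulus(-2.5)
# Converting back recovers the source scale, up to floating-point error.
recovered = inverse_log_modulus(negative_offset)
```

Unlike a plain logarithm, this transform is defined at zero and for negative inputs, and it is symmetric around zero.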

Variational Bayesian Neural Network Components

declare_priors_weights: bool; Specify Gaussian priors for all fixed model parameters (if False, use implicit improper uniform priors). Default: True
declare_priors_biases: bool; Specify Gaussian priors for model biases (if False, use implicit improper uniform priors). Default: True
declare_priors_gamma: bool; Specify Gaussian priors for gamma parameters of any batch normalization layers (if False, use implicit improper uniform priors). Default: True
weight_prior_sd: str or float; Standard deviation of prior on CDRNN hidden weights. A float, 'glorot', or 'he'. Default: glorot
bias_prior_sd: str or float; Standard deviation of prior on CDRNN hidden biases. A float, 'glorot', or 'he'. Default: 1.0
gamma_prior_sd: str or float; Standard deviation of prior on batch norm gammas. A float, 'glorot', or 'he'. Ignored unless batch normalization is used. Default: 1
bias_sd_init: str, float or None; Initial standard deviation of variational posterior over biases. If None, inferred from other hyperparams. Default: None
gamma_sd_init: str, float or None; Initial standard deviation of variational posterior over batch norm gammas. If None, inferred from other hyperparams. Ignored unless batch normalization is used. Default: None

Section: `[irf_name_map]`

The optional [irf_name_map] section simply permits prettier variable naming in plots. For example, the internal name for a convolution applied to predictor A may be ShiftedGammaKgt1.s(A)-Terminal.s(A), which is not very readable. To address this, the string above can be mapped to a more readable name using an INI key-value pair, as shown:

ShiftedGammaKgt1.s(A)-Terminal.s(A) = A

The model will then print A in plots rather than ShiftedGammaKgt1.s(A)-Terminal.s(A). Unused entries in the name map are ignored, and model variables that do not have an entry in the name map print with their default internal identifier.
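
The lookup behavior described above (mapped names are replaced, unmapped names fall back to their internal identifier) can be sketched with Python's standard configparser module. This is an illustration rather than the library's own code; the helper display_name and the unmapped identifier used at the end are hypothetical.

```python
import configparser

CONFIG_TEXT = """
[irf_name_map]
ShiftedGammaKgt1.s(A)-Terminal.s(A) = A
"""

parser = configparser.ConfigParser()
# configparser lowercases keys by default; keep the original case so that
# keys match the internal identifiers exactly.
parser.optionxform = str
parser.read_string(CONFIG_TEXT)
name_map = dict(parser["irf_name_map"])


def display_name(internal_name, name_map):
    """Return the mapped name, or the internal identifier if unmapped."""
    return name_map.get(internal_name, internal_name)


mapped = display_name("ShiftedGammaKgt1.s(A)-Terminal.s(A)", name_map)
# Hypothetical identifier with no entry in the name map.
unmapped = display_name("ShiftedGammaKgt1.s(B)-Terminal.s(B)", name_map)
```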

Sections: `[model_CDR_*]`

Arbitrarily many sections named [model_CDR_*] can be provided in the config file, where * stands in for a unique identifier. Each such section defines a different CDR model and must contain at least one field, formula, whose value is a CDR model formula (see CDR Model Formulas for more on CDR formula syntax). The identifier CDR_* will be used by the CDR utilities to reference the fitted model and its output files.

For example, to define a CDR model called readingtimes, the section header [model_CDR_readingtimes] is included in the config file along with an appropriate formula specification. To use this specific model once fitted, it can be referenced using the identifier CDR_readingtimes. For example, the following call will extract predictions on dev data from a fitted CDR_readingtimes defined in config file config.ini:

python -m cdr.bin.predict config.ini -m CDR_readingtimes -p dev

Additional fields from [cdr_settings] may be specified for a given model, in which case the locally-specified setting (rather than the globally specified setting or the default value) will be used to train the model. For example, imagine that [cdr_settings] contains the field n_iter = 1000. All CDR models subsequently specified in the config file will train for 1000 iterations. However, imagine that model [model_CDR_longertrain] should train for 5000 iterations instead. This can be specified within the same config file as:

[model_CDR_longertrain]
n_iter = 5000
formula = ...

This setup allows a single config file to define a variety of CDR models, as long as they all share the same data. Distinct datasets require distinct config files.
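
The precedence described above (model-local setting, then [cdr_settings], then the default value) can be sketched with Python's standard configparser module. This is a simplified illustration, not the library's parsing code: resolve_setting is a hypothetical helper, the section [model_CDR_main] is invented for contrast, and values are left as raw strings.

```python
import configparser

CONFIG_TEXT = """
[cdr_settings]
n_iter = 1000

[model_CDR_main]
formula = ...

[model_CDR_longertrain]
n_iter = 5000
formula = ...
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)


def resolve_setting(parser, model_section, key, default=None):
    """Look up key locally, then globally, then fall back to default."""
    if parser.has_option(model_section, key):
        return parser.get(model_section, key)
    if parser.has_option("cdr_settings", key):
        return parser.get("cdr_settings", key)
    return default


# The local override wins for the model that declares one.
local_value = resolve_setting(parser, "model_CDR_longertrain", "n_iter")
# Models without a local value inherit the global setting.
global_value = resolve_setting(parser, "model_CDR_main", "n_iter")
```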

For hypothesis testing, fixed effect ablation can be conveniently automated using the ablate model field. For example, the following specification implicitly defines 7 unique models, one for each of the |powerset(a, b, c)| - 1 = 7 non-empty sets of fixed effects that can be retained from a, b, and c (the full model plus six ablated variants):

[model_CDR_example]
n_iter = 5000
ablate = a b c
formula = C(a + b + c, Normal()) + (C(a + b + c, Normal()) | subject)

The ablated models are named using '!' followed by the name for each ablated predictor. Therefore, the above specification is equivalent to (and much easier to write than) the following:

[model_CDR_example]
n_iter = 5000
formula = C(a + b + c, Normal()) + (C(a + b + c, Normal()) | subject)

[model_CDR_example!a]
n_iter = 5000
formula = C(b + c, Normal()) + (C(a + b + c, Normal()) | subject)

[model_CDR_example!b]
n_iter = 5000
formula = C(a + c, Normal()) + (C(a + b + c, Normal()) | subject)

[model_CDR_example!c]
n_iter = 5000
formula = C(a + b, Normal()) + (C(a + b + c, Normal()) | subject)

[model_CDR_example!a!b]
n_iter = 5000
formula = C(c, Normal()) + (C(a + b + c, Normal()) | subject)

[model_CDR_example!a!c]
n_iter = 5000
formula = C(b, Normal()) + (C(a + b + c, Normal()) | subject)

[model_CDR_example!b!c]
n_iter = 5000
formula = C(a, Normal()) + (C(a + b + c, Normal()) | subject)
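
The naming scheme in the expanded listing above can be reproduced with a short Python sketch. It mirrors the seven section names shown, in which at least one of the ablated predictors always remains as a fixed effect; it is an illustration, not the library's implementation, and ablation_section_names is a hypothetical helper.

```python
from itertools import combinations


def ablation_section_names(base_name, ablate):
    """Enumerate section names for the full model and its ablations."""
    names = []
    # Ablate 0 to n-1 predictors, so at least one fixed effect is retained.
    for k in range(len(ablate)):
        for subset in combinations(ablate, k):
            # Each ablated predictor is appended as '!' plus its name.
            names.append(base_name + "".join("!" + p for p in subset))
    return names


names = ablation_section_names("model_CDR_example", ["a", "b", "c"])
for name in names:
    print("[" + name + "]")
```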
