# CDR Configuration Files

The CDR utilities in this module read config files that follow the INI standard. This reference assumes familiarity with basic INI syntax (sections, fields, and values).

CDR configuration files contain the sections and fields described below.

## Section: `[data]`

The `[data]` section supports the following fields:

**REQUIRED**

- **X_train**: `str`; Path to training data (impulse matrix)
- **y_train**: `str`; Path to training data (response matrix)
- **series_ids**: space-delimited list of `str`; Names of columns used to define unique time series

Note that, unlike e.g. linear models, CDR does not require synchronous predictors (impulses) and responses, which is why separate data objects must be provided for each of these components.
If the predictors and responses are synchronous, the `X_train` and `y_train` fields can point to the same file.
The system will treat each unique combination of values in the columns given in `series_ids` as constituting a unique time series.
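As a rough illustration of how the required fields fit together, the sketch below parses a minimal `[data]` section with Python's standard `configparser`. The file names and the `read_data_section` helper are hypothetical, and this is not necessarily how the CDR utilities load their configuration.

```python
import configparser

# Hypothetical config text; the file names are placeholders.
CONFIG_TEXT = """
[data]
X_train = X.csv
y_train = y.csv
series_ids = subject docid
"""

def read_data_section(text):
    """Parse the [data] section and split series_ids into column names."""
    parser = configparser.ConfigParser()
    # configparser lowercases field names by default; keep them as written
    # so that fields can be looked up as X_train, y_train, etc.
    parser.optionxform = str
    parser.read_string(text)
    data = dict(parser["data"])
    # series_ids is a space-delimited list of column names
    data["series_ids"] = data["series_ids"].split()
    return data

data = read_data_section(CONFIG_TEXT)
print(data["X_train"], data["y_train"], data["series_ids"])
```

With synchronous predictors and responses, `X_train` and `y_train` would simply hold the same path.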

**OPTIONAL**

- **X_dev**: `str`; Path to dev data (impulse matrix)
- **y_dev**: `str`; Path to dev data (response matrix)
- **X_test**: `str`; Path to test data (impulse matrix)
- **y_test**: `str`; Path to test data (response matrix)
- **history_length**: `int`; Length of history window in timesteps (default: `128`)
- **filters**: `str`; List of filters to apply to response data (`;`-delimited).

All variables used in a filter must be contained in the data files indicated by the `y_*` parameters in the `[data]` section of the config file.
Each filter consists of a variable name followed by a condition on its value.
Supported comparison operators are `<`, `<=`, `>`, `>=`, `==`, and `!=`.
For example, to keep only data points for which column `foo` is less than or equal to 100, the following filter can be added:

```
filters = foo <= 100
```

To keep only data points for which the column `foo` does not equal `bar`, the following filter can be added:

```
filters = foo != bar
```

Filters can be conjunctively combined:

```
filters = foo > 5; foo <= 100
```

Count-based filters are also supported, using the designated `nunique` suffix.
For example, if the column `subject` is being used to define a random effects level and we want to exclude all subjects with 100 or fewer data points, this can be accomplished using the following filter:

```
filters = subjectnunique > 100
```

More complex filtration conditions are not supported automatically in CDR but can be applied to the data by the user as a preprocess.
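The filter syntax above can be illustrated with a small pure-Python sketch that parses a `;`-delimited filter string and applies the resulting conditions conjunctively. The names `parse_filters` and `apply_filters` are hypothetical, count-based (`nunique`) filters are left out, and CDR's own filtering code may work differently.

```python
import operator

# Two-character operators come first so that "<=" is never mistaken for "<".
OPERATORS = [
    ("<=", operator.le), (">=", operator.ge),
    ("==", operator.eq), ("!=", operator.ne),
    ("<", operator.lt), (">", operator.gt),
]

def _coerce(text):
    """Interpret a filter value as a number when possible, else as a string."""
    try:
        return float(text)
    except ValueError:
        return text

def parse_filters(spec):
    """Split a ';'-delimited filter string into (column, test, value) triples."""
    parsed = []
    for clause in filter(None, (c.strip() for c in spec.split(";"))):
        for symbol, test in OPERATORS:
            if symbol in clause:
                column, value = clause.split(symbol, 1)
                parsed.append((column.strip(), test, _coerce(value.strip())))
                break
        else:
            raise ValueError(f"no supported operator in filter {clause!r}")
    return parsed

def apply_filters(rows, spec):
    """Keep only the rows that satisfy every filter (conjunction)."""
    filters = parse_filters(spec)
    return [r for r in rows if all(test(r[col], val) for col, test, val in filters)]

rows = [{"foo": 3}, {"foo": 50}, {"foo": 200}]
print(apply_filters(rows, "foo > 5; foo <= 100"))  # keeps only the row with foo == 50
```

Conditions that cannot be expressed this way (for example, disjunctions) would still need to be applied to the data beforehand.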

Several CDR utilities (e.g. for prediction and evaluation) are designed to handle train, dev, and test partitions of the input data, but these partitions must be constructed in advance.
This package also provides a `partition` utility that can be used to partition input data by applying modular arithmetic to some subset of the variables in the data.
For usage details run:

`python -m cdr.bin.partition -h`
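To give a sense of what partitioning by modular arithmetic can look like, here is a minimal sketch. The function name, the choice of modulus, and the mapping from remainders to partitions are all assumptions made for illustration; they do not reproduce the arguments or behavior of the `partition` utility, which are documented in its help output.

```python
def assign_partition(row, keys, modulus=4, dev=(0,), test=(1,)):
    """Assign a row to train/dev/test using the sum of integer key fields modulo `modulus`.

    All parameter names and defaults here are illustrative assumptions.
    """
    remainder = sum(int(row[k]) for k in keys) % modulus
    if remainder in dev:
        return "dev"
    if remainder in test:
        return "test"
    return "train"

rows = [
    {"subject": 3, "sentid": 5},  # (3 + 5) % 4 == 0
    {"subject": 1, "sentid": 0},  # (1 + 0) % 4 == 1
    {"subject": 2, "sentid": 0},  # (2 + 0) % 4 == 2
]
labels = [assign_partition(r, ["subject", "sentid"]) for r in rows]
print(labels)
```

Because the assignment depends only on the key values, the same row always lands in the same partition across runs.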

**IMPORTANT NOTES**

- The files indicated in `X_*` must contain the following columns:
  - **time**: Timestamp associated with each observation
  - A column for each variable in `series_ids`
  - A column for each predictor variable indicated in the model formula
- The file in `y_*` must contain the following columns:
  - **time**: Timestamp associated with each observation
  - A column for the response variable in the model formula
  - A column for each variable in `series_ids`
  - A column for each random grouping factor in the model formula
  - A column for each variable used for data filtration (see above)
- Data in `y_*` may be filtered/partitioned, but data in `X_*` **must be uncensored** unless independent reason exists to assume that certain observations never have an impact on the response.
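A quick pre-flight check of the column requirements above might look like the following sketch. The helper `missing_columns` and the example column names are hypothetical; the check only inspects a list of header names and is not part of the CDR package.

```python
def missing_columns(header, series_ids, extra=()):
    """Return the required column names that are absent from `header`.

    `extra` holds formula-dependent columns: predictors for X_* files, or the
    response, random grouping factors, and filter variables for y_* files.
    """
    required = ["time", *series_ids, *extra]
    return [name for name in required if name not in header]

# Hypothetical headers for an impulse file and a response file
x_header = ["time", "subject", "docid", "wordlen"]
y_header = ["time", "subject", "rt"]

print(missing_columns(x_header, ["subject", "docid"], extra=["wordlen"]))
print(missing_columns(y_header, ["subject", "docid"], extra=["rt"]))
```

An empty list means every required column is present; otherwise the list names what still needs to be added.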

## Section: `[global_settings]`

The `[global_settings]` section supports the following fields:

- **outdir**: `str`; Path to output directory where checkpoints, plots, and Tensorboard logs should be saved (default: `./cdr_model/`). If it does not exist, this directory will be created. At runtime, the `train` utility will copy the config file to this directory as `config.ini`, serving as a record of the settings used to generate the analysis.
- **use_gpu_if_available**: `bool`; If available, run on GPU. If `False`, always runs on CPU even when system has compatible GPU.

## Section: `[cdr_settings]`

The `[cdr_settings]` section supports the following fields:

### All models

- **outdir**: `str`; Path to output directory, where logs and model parameters are saved. **Default**: ./cdr_model/
- **use_distributional_regression**: `bool`; Whether to model all parameters of the predictive distribution as dependent on IRFs of the impulses (distributional regression). If `False`, only the mean depends on the predictors (other parameters of the predictive distribution are treated as constant). **Default (CDR)**: False; **Default (CDRNN)**: True
- **predictive_distribution_map**: `str` or `None`; Map defining predictive distribution. Can be a space-delimited list of distribution names (one per response variable), a space-delimited list of `;`-delimited tuples matching response variables to distribution names (e.g. `response;Bernoulli`), or `None`, in which case the predictive distribution will be inferred as `Normal` for continuous variables and `Categorical` for categorical variables. **Default**: None
- **center_inputs**: `bool`; DISCOURAGED UNLESS YOU HAVE A GOOD REASON, since this can distort rate estimates. Center inputs by subtracting training set means. Can improve convergence speed and reduce vulnerability to local optima. Only affects fitting – prediction, likelihood computation, and plotting are reported on the source values. **Default**: False
- **rescale_inputs**: `bool`; Rescale input features by dividing by training set standard deviation. Can improve convergence speed and reduce vulnerability to local optima. Only affects fitting – prediction, likelihood computation, and plotting are reported on the source values. **Default (CDR)**: False; **Default (CDRNN)**: True
- **standardize_response**: `bool`; Standardize (Z-transform) the response variable implicitly during training using training set mean and variance. Can improve convergence speed and reduce vulnerability to local optima. Only affects fitting – prediction, likelihood computation, and plotting are reported on the source values. **Default**: True
- **history_length**: `int`; Length of the history (backward) window (in timesteps). **Default**: 128
- **future_length**: `int`; Length of the future (forward) window (in timesteps). Note that causal IRF kernels cannot be used if **future_length** > 0. **Default**: 0
- **asymmetric_error**: `bool`; Whether to model numeric responses by default with an (asymmetric) SinhArcsinh transform of the Normal distribution. Otherwise, defaults to a Normal distribution. Only affects response variables whose distributions have not been explicitly specified using **predictive_distribution_map**. **Default**: False
- **constraint**: `str`; Constraint function to use for bounded variables. One of `['abs', 'square', 'softplus']`. **Default**: softplus
- **scale_loss_with_data**: `bool`; Whether to multiply the scale of the LL loss by N, where N is num batches. This turns the loss into an expectation over training set likelihood. **Default**: True
- **scale_regularizer_with_data**: `bool`; Whether to multiply the scale of all weight regularization by B * N, where B is batch size and N is num batches. If **scale_loss_with_data** is true, this approach ensures a stable regularization strength (relative to the loss) across datasets and batch sizes. **Default**: False
- **n_iter**: `int`; Number of training iterations. If using variational inference, this becomes the expected number of training iterations and is used only for Tensorboard logging, with no impact on training behavior. **Default**: 100000
- **minibatch_size**: `int` or `None`; Size of minibatches to use for fitting (full-batch if `None`). **Default**: 1024
- **eval_minibatch_size**: `int`; Size of minibatches to use for prediction/evaluation. **Default**: 10000
- **n_samples_eval**: `int`; Number of posterior predictive samples to draw for prediction/evaluation. Ignored for evaluating CDR MLE models. **Default**: 1000
- **optim_name**: `str` or `None`; Name of the optimizer to use. Must be one of `'SGD'`, `'Momentum'`, `'AdaGrad'`, `'AdaDelta'`, `'Adam'`, `'FTRL'`, `'RMSProp'`, `'Nadam'`. **Default**: Nadam

- **max_global_gradient_norm**: `float` or `None`; Maximum allowable value for the global norm of the gradient, which will be clipped as needed. If `None`, no gradient clipping. **Default (CDR)**: None; **Default (CDRNN)**: 1.0
- **epsilon**: `float`; Epsilon parameter to use for numerical stability in bounded parameter estimation. **Default (CDR)**: 1e-05; **Default (CDRNN)**: 0.01
- **optim_epsilon**: `float`; Epsilon parameter to use if **optim_name** in `['Adam', 'Nadam']`, ignored otherwise. **Default**: 1e-08
- **learning_rate**: `float`; Initial value for the learning rate. **Default (CDR)**: 0.001; **Default (CDRNN)**: 0.01
- **learning_rate_min**: `float`; Minimum value for the learning rate. **Default**: 0.0
- **lr_decay_family**: `str` or `None`; Functional family for the learning rate decay schedule (no decay if `None`). **Default**: None
- **lr_decay_rate**: `float`; Coefficient by which to decay the learning rate every `lr_decay_steps` (ignored if `lr_decay_family==None`). **Default**: 0.0
- **lr_decay_steps**: `int`; Span of iterations over which to decay the learning rate by `lr_decay_rate` (ignored if `lr_decay_family==None`). **Default**: 25
- **lr_decay_iteration_power**: `float`; Power to which the iteration number `t` should be raised when computing the learning rate decay. **Default**: 1
- **lr_decay_staircase**: `bool`; Keep learning rate flat between `lr_decay_steps` (ignored if `lr_decay_family==None`). **Default**: False
- **loss_filter_n_sds**: `float` or `None`; How many moving standard deviations above the moving mean of the loss to use as a cut-off for stability (suppressing large losses). If `None` or `0`, no loss filtering. **Default (CDR)**: None; **Default (CDRNN)**: 1000.0
- **ema_decay**: `float` or `None`; Decay factor to use for exponential moving average for parameters (used in prediction). **Default**: 0.999
- **convergence_n_iterates**: `int` or `None`; Number of timesteps over which to average parameter movements for convergence diagnostics. If `None` or `0`, convergence will not be programmatically checked (reduces memory overhead, but convergence must then be visually diagnosed). **Default (CDR)**: 500; **Default (CDRNN)**: 100
- **convergence_stride**: `int`; Stride (in iterations) over which to compute convergence. If larger than 1, iterations within a stride are averaged with the most recently saved value. Larger values increase the receptive field of the slope estimates, making convergence diagnosis less vulnerable to local perturbations but also increasing the number of post-convergence iterations necessary in order to identify convergence. **Default**: 1
- **convergence_alpha**: `float` or `None`; Significance threshold above which to fail to reject the null of no correlation between convergence basis and training time. Larger values are more stringent. **Default**: 0.5
- **regularizer_name**: `str` or `None`; Name of global regularizer; can be overridden by more regularizers for more specific parameters (e.g. `l1_regularizer`, `l2_regularizer`). If `None`, no regularization. **Default**: None
- **regularizer_scale**: `str` or `float`; Scale of global regularizer; can be overridden by more regularizers for more specific parameters (ignored if `regularizer_name==None`). **Default**: 0.0
- **coefficient_regularizer_name**: `str`, `"inherit"` or `None`; Name of coefficient regularizer (e.g. `l1_regularizer`, `l2_regularizer`); overrides **regularizer_name**. If `'inherit'`, inherits **regularizer_name**. If `None`, no regularization. **Default**: inherit
- **coefficient_regularizer_scale**: `str`, `float` or `"inherit"`; Scale of coefficient regularizer (ignored if `regularizer_name==None`). If `'inherit'`, inherits **regularizer_scale**. **Default**: inherit
- **intercept_regularizer_name**: `str`, `"inherit"` or `None`; Name of intercept regularizer (e.g. `l1_regularizer`, `l2_regularizer`); overrides **regularizer_name**. If `'inherit'`, inherits **regularizer_name**. If `None`, no regularization. **Default**: inherit
- **intercept_regularizer_scale**: `str`, `float` or `"inherit"`; Scale of intercept regularizer (ignored if `regularizer_name==None`). If `'inherit'`, inherits **regularizer_scale**. **Default**: inherit
- **ranef_regularizer_name**: `str`, `"inherit"` or `None`; Name of random effects regularizer (e.g. `l1_regularizer`, `l2_regularizer`); overrides **regularizer_name**. If `'inherit'`, inherits **regularizer_name**. If `None`, no regularization. **Default (CDR)**: inherit; **Default (CDRNN)**: l2_regularizer
- **ranef_regularizer_scale**: `str`, `float` or `"inherit"`; Scale of random effects regularizer (ignored if `regularizer_name==None`). If `'inherit'`, inherits **regularizer_scale**. **Default (CDR)**: inherit; **Default (CDRNN)**: 10.0
- **save_freq**: `int`; Frequency (in iterations) with which to save model checkpoints. **Default (CDR)**: 100; **Default (CDRNN)**: 10
- **log_freq**: `int`; Frequency (in iterations) with which to log model params to Tensorboard. **Default (CDR)**: 100; **Default (CDRNN)**: 1
- **log_random**: `bool`; Log random effects to Tensorboard. **Default**: True
- **log_graph**: `bool`; Log the network graph to Tensorboard. **Default**: False
- **indicator_names**: `str`; Space-delimited list of predictors that are indicators (0 or 1). Used for plotting and effect estimation (value 0 is always used as reference, rather than mean). **Default**: (empty)
- **default_reference_type**: `"mean"` or `0.0`; Reference stimulus to use by default for plotting and effect estimation. If 0, zero vector. If mean, training set mean by predictor. **Default (CDR)**: 0.0; **Default (CDRNN)**: mean
- **reference_values**: `str`; Predictor values to use as a reference in plotting and effect estimation. Structured as space-delimited pairs `NAME=FLOAT`. Any predictor without a specified reference value will use either 0 or the training set mean, depending on **plot_mean_as_reference**. **Default**: (empty)
- **plot_step**: `str`; Size of step by predictor to take above reference in univariate IRF plots. Structured as space-delimited pairs `NAME=FLOAT`. Any predictor without a specified step size will step 1 SD from training set. **Default**: (empty)
- **plot_step_default**: `str` or `float`; Default size of step to take above reference in univariate IRF plots, if not specified in **plot_step**. Either a float or the string `'sd'`, which indicates training sample standard deviation. **Default**: 1.0
- **reference_time**: `float`; Timepoint at which to plot interactions. **Default**: 0.0
- **plot_n_time_units**: `float`; Number of time units to use for plotting. **Default**: 5
- **plot_n_time_points**: `int`; Resolution of plot axis (for 3D plots, uses sqrt of this number for each axis). **Default**: 1024
- **plot_x_inches**: `float`; Width of plot in inches. **Default**: 6.0
- **plot_y_inches**: `float`; Height of plot in inches. **Default**: 4.0
- **plot_legend**: `bool`; Whether to include a legend in plots with multiple components. **Default**: True
- **generate_curvature_plots**: `bool`; Whether to plot IRF curvature at time **reference_time**. **Default (CDR)**: False; **Default (CDRNN)**: True
- **generate_irf_surface_plots**: `bool`; Whether to plot IRF surfaces. **Default (CDR)**: False; **Default (CDRNN)**: True
- **generate_interaction_surface_plots**: `bool`; Whether to plot IRF interaction surfaces at time **reference_time**. **Default**: True
- **generate_err_dist_plots**: `bool`; Whether to plot the average error distribution for real-valued responses. **Default**: True
- **generate_nonstationarity_surface_plots**: `bool`; Whether to plot IRF surfaces showing non-stationarity in the response. **Default (CDR)**: False; **Default (CDRNN)**: True
- **cmap**: `str`; Name of Matplotlib cmap specification to use for plotting (determines the color of lines in the plot). **Default**: gist_rainbow
- **dpi**: `int`; Dots per inch of saved plot image file. **Default**: 300
- **keep_plot_history**: `bool`; Keep IRF plots from each checkpoint of a run, which can help visualize learning trajectories but can also consume a lot of disk space. If `False`, only the most recent plot of each type is kept. **Default**: False
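The `'inherit'` convention used by the parameter-specific regularizer fields can be sketched as a simple lookup that falls back to the global **regularizer_name** and **regularizer_scale**. The helper `resolve_regularizer` and the dictionary representation of the settings are assumptions for illustration, not the package's internal logic.

```python
def resolve_regularizer(settings, prefix):
    """Return (name, scale) for a parameter group such as 'coefficient' or 'ranef'.

    `settings` is a plain dict of already-parsed config values (an assumption
    of this sketch). Unspecified group-level fields are treated as 'inherit'.
    """
    name = settings.get(f"{prefix}_regularizer_name", "inherit")
    scale = settings.get(f"{prefix}_regularizer_scale", "inherit")
    if name == "inherit":
        name = settings.get("regularizer_name")
    if scale == "inherit":
        scale = settings.get("regularizer_scale", 0.0)
    if name is None:
        # No regularizer for this group, so the scale is irrelevant
        return None, None
    return name, scale

settings = {
    "regularizer_name": "l2_regularizer",
    "regularizer_scale": 0.1,
    "intercept_regularizer_name": None,  # explicitly disable for intercepts
    "ranef_regularizer_scale": 10.0,     # override only the scale
}
for group in ("coefficient", "intercept", "ranef"):
    print(group, resolve_regularizer(settings, group))
```

In this example the coefficients inherit both global values, the intercepts are unregularized, and the random effects keep the global regularizer name with their own scale.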

### All CDR models

- **irf_regularizer_name**: `str`, `"inherit"` or `None`; Name of IRF parameter regularizer (e.g. `l1_regularizer`, `l2_regularizer`); overrides **regularizer_name**. If `'inherit'`, inherits **regularizer_name**. If `None`, no regularization. **Default**: inherit
- **irf_regularizer_scale**: `str`, `float` or `"inherit"`; Scale of IRF parameter regularizer (ignored if `regularizer_name==None`). If `'inherit'`, inherits **regularizer_scale**. **Default**: inherit

### CDRMLE

### CDRBayes

- **coef_prior_sd**: `str`, `float` or `None`; Standard deviation of prior on fixed coefficients. Can be a space-delimited list of `;`-delimited floats (one per distributional parameter per response variable), a `float` (applied to all responses), or `None`, in which case the prior is inferred from **prior_sd_scaling_coefficient** and the empirical variance of the response on the training set. **Default**: None
- **irf_param_prior_sd**: `str` or `float`; Standard deviation of prior on convolutional IRF parameters. Can be either a space-delimited list of `;`-delimited floats (one per distributional parameter per response variable) or a `float` (applied to all responses). **Default**: 1.0
- **y_sd_prior_sd**: `float` or `None`; Standard deviation of prior on standard deviation of output model. If `None`, inferred as **y_sd_prior_sd_scaling_coefficient** times the empirical variance of the response on the training set. **Default**: None
- **prior_sd_scaling_coefficient**: `float`; Factor by which to multiply priors on intercepts and coefficients if inferred from the empirical variance of the data (i.e. if **intercept_prior_sd** or **coef_prior_sd** is `None`). Ignored for any prior widths that are explicitly specified. **Default**: 1
- **y_sd_prior_sd_scaling_coefficient**: `float`; Factor by which to multiply prior on output model variance if inferred from the empirical variance of the data (i.e. if **y_sd_prior_sd** is `None`). Ignored if prior width is explicitly specified. **Default**: 1
- **ranef_to_fixef_prior_sd_ratio**: `float`; Ratio of widths of random to fixed effects priors. I.e. if less than 1, random effects have tighter priors. **Default**: 0.1
- **posterior_to_prior_sd_ratio**: `float`; Ratio of posterior initialization SD to prior SD. Low values are often beneficial to stability, convergence speed, and quality of final fit by avoiding erratic sampling and divergent behavior early in training. **Default**: 0.01
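The two ratio fields above relate prior and posterior widths multiplicatively. The sketch below spells out that arithmetic using the documented defaults. The function `derived_sds` is hypothetical, and applying the posterior ratio to the random effects prior as well as the fixed effects prior is an assumption of this sketch.

```python
def derived_sds(fixef_prior_sd,
                ranef_to_fixef_prior_sd_ratio=0.1,
                posterior_to_prior_sd_ratio=0.01):
    """Derive random-effects prior SD and posterior initialization SDs from ratios."""
    # Random effects prior width is a fraction of the fixed effects prior width
    ranef_prior_sd = fixef_prior_sd * ranef_to_fixef_prior_sd_ratio
    return {
        "fixef_prior_sd": fixef_prior_sd,
        "ranef_prior_sd": ranef_prior_sd,
        # Posterior SDs start out as a fraction of the corresponding prior SDs
        "fixef_posterior_sd_init": fixef_prior_sd * posterior_to_prior_sd_ratio,
        "ranef_posterior_sd_init": ranef_prior_sd * posterior_to_prior_sd_ratio,
    }

sds = derived_sds(1.0)
for name, value in sds.items():
    print(f"{name}: {value:g}")
```

With a fixed effects prior SD of 1, the defaults give a random effects prior ten times narrower and posterior initializations one hundred times narrower than their priors.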

### All CDRNN models

- **center_X_time**: `bool`; Whether to center time values as inputs under the hood. Times are automatically shifted back to the source location for plotting and model criticism. **Default**: False
- **center_t_delta**: `bool`; Whether to center time offset values under the hood. Offsets are automatically shifted back to the source location for plotting and model criticism. **Default**: False
- **rescale_X_time**: `bool`; Whether to rescale time values as inputs by their training SD under the hood. Times are automatically reconverted back to the source scale for plotting and model criticism. **Default**: True
- **rescale_t_delta**: `bool`; Whether to rescale time offset values by their training SD under the hood. Offsets are automatically reconverted back to the source scale for plotting and model criticism. **Default**: True
- **nonstationary**: `bool`; Whether to model non-stationarity by feeding impulse timestamps as input. **Default**: True
- **n_layers_input_projection**: `int` or `None`; Number of hidden layers in input projection. If `None`, inferred from length of **n_units_input_projection**. **Default**: 2
- **n_units_input_projection**: `int`, `str` or `None`; Number of units per input projection hidden layer. Can be an `int`, which will be used for all layers, or a `str` with **n_layers_input_projection** space-delimited integers, one for each layer in order from bottom to top. If `0` or `None`, no hidden layers in input projection. **Default**: 32
- **n_layers_rnn**: `int` or `None`; Number of RNN layers. If `None`, inferred from length of **n_units_rnn**. **Default**: None
- **n_units_rnn**: `int`, `str` or `None`; Number of units per RNN layer. Can be an `int`, which will be used for all layers, or a `str` with **n_layers_rnn** space-delimited integers, one for each layer in order from bottom to top. Can also be `'infer'`, which infers the size from the number of predictors, or `'inherit'`, which uses size **n_units_hidden_state**. If `0` or `None`, no RNN encoding (i.e. use a context-independent convolution kernel). **Default**: None
- **n_layers_rnn_projection**: `int` or `None`; Number of hidden layers in projection of RNN state (or of timestamp + predictors if no RNN). If `None`, inferred automatically. **Default**: None
- **n_units_rnn_projection**: `int`, `str` or `None`; Number of units per hidden layer in projection of RNN state. Can be an `int`, which will be used for all layers, or a `str` with **n_layers_rnn_projection** space-delimited integers, one for each layer in order from bottom to top. If `0` or `None`, no hidden layers in RNN projection. **Default**: None
- **n_units_hidden_state**: `int` or `str`; Number of units in CDRNN hidden state. Must be an `int`. **Default**: 32
- **n_layers_irf**: `int` or `None`; Number of IRF hidden layers. If `None`, inferred from length of **n_units_irf**. **Default**: 2
- **n_units_irf**: `int`, `str` or `None`; Number of units per hidden layer in IRF. Can be an `int`, which will be used for all layers, or a `str` with **n_layers_irf** space-delimited integers, one for each layer in order from bottom to top. If `0` or `None`, no hidden layers. **Default**: 32
- **input_projection_inner_activation**: `str` or `None`; Name of activation function to use for hidden layers in input projection. **Default**: gelu
- **input_projection_activation**: `str` or `None`; Name of activation function to use for output of input projection. **Default**: None
- **rnn_activation**: `str` or `None`; Name of activation to use in RNN layers. **Default**: tanh
- **recurrent_activation**: `str` or `None`; Name of recurrent activation to use in RNN layers. **Default**: sigmoid
- **rnn_projection_inner_activation**: `str` or `None`; Name of activation function to use for hidden layers in projection of RNN state. **Default**: gelu
- **rnn_projection_activation**: `str` or `None`; Name of activation function to use for final layer in projection of RNN state. **Default**: None
- **hidden_state_activation**: `str` or `None`; Name of activation function to use for CDRNN hidden state. **Default**: gelu
- **irf_inner_activation**: `str` or `None`; Name of activation function to use for hidden layers in IRF. **Default**: gelu
- **irf_activation**: `str` or `None`; Name of activation function to use for final layer in IRF. **Default**: None
- **kernel_initializer**: `str` or `None`; Name of initializer to use in encoder kernels. **Default**: glorot_uniform_initializer
- **recurrent_initializer**: `str` or `None`; Name of initializer to use in encoder recurrent kernels. **Default**: orthogonal_initializer
- **batch_normalization_decay**: `float` or `None`; Decay rate to use for batch normalization in internal layers. If `None`, no batch normalization. **Default**: None
- **normalize_input_projection**: `bool`; Whether to apply normalization (if applicable) to hidden layers of the input projection. **Default**: True
- **normalize_h**: `bool`; Whether to apply normalization (if applicable) to the hidden state. **Default**: True
- **normalize_irf_l1**: `bool`; Whether to apply normalization (if applicable) to the first IRF layer. **Default**: False
- **normalize_irf**: `bool`; Whether to apply normalization (if applicable) to non-initial internal IRF layers. **Default**: False
- **normalize_after_activation**: `bool`; Whether to apply normalization (if applicable) after the non-linearity (otherwise, applied before). **Default**: True
- **normalization_use_gamma**: `bool`; Whether to use trainable scale in batch/layer normalization layers. **Default**: True
- **layer_normalization_type**: `str` or `None`; Type of layer normalization, one of `['z', 'length', None]`. If `'z'`, classical z-transform-based normalization. If `'length'`, normalize by the norm of the activation vector. If `None`, no layer normalization. Incompatible with batch normalization. **Default**: None
- **nn_regularizer_name**: `str`, `"inherit"` or `None`; Name of weight regularizer (e.g. `l1_regularizer`, `l2_regularizer`); overrides **regularizer_name**. If `'inherit'`, inherits **regularizer_name**. If `None`, no regularization. **Default**: l2_regularizer
- **nn_regularizer_scale**: `str`, `float` or `"inherit"`; Scale of weight regularizer (ignored if `regularizer_name==None`). If `'inherit'`, inherits **regularizer_scale**. **Default**: 1.0
- **context_regularizer_name**: `str`, `"inherit"` or `None`; Name of regularizer on contribution of context (RNN) to hidden state (e.g. `l1_regularizer`, `l2_regularizer`); overrides **regularizer_name**. If `'inherit'`, inherits **regularizer_name**. If `None`, no regularization. **Default**: l2_regularizer
- **context_regularizer_scale**: `float` or `"inherit"`; Scale of weight regularizer (ignored if `context_regularizer_name==None`). If `'inherit'`, inherits **regularizer_scale**. **Default**: 1.0
- **maxnorm**: `float` or `None`; Bound on norm of dense kernel dimensions for max-norm regularization. If `None`, no max-norm regularization. **Default**: None
- **input_dropout_rate**: `float` or `None`; Rate at which to drop input_features. **Default**: None
- **input_projection_dropout_rate**: `float` or `None`; Rate at which to drop neurons of input projection layers. **Default**: 0.2
- **rnn_h_dropout_rate**: `float` or `None`; Rate at which to drop neurons of RNN hidden state. **Default**: None
- **rnn_c_dropout_rate**: `float` or `None`; Rate at which to drop neurons of RNN cell state. **Default**: None
- **h_in_dropout_rate**: `float` or `None`; Rate at which to drop neurons of h_in. **Default**: 0.2
- **h_rnn_dropout_rate**: `float` or `None`; Rate at which to drop neurons of h_rnn. **Default**: 0.2
- **h_dropout_rate**: `float` or `None`; Rate at which to drop neurons of h. **Default**: None
- **rnn_dropout_rate**: `float` or `None`; Rate at which to entirely drop the RNN. **Default**: None
- **irf_dropout_rate**: `float` or `None`; Rate at which to drop neurons of IRF layers. **Default**: 0.2
- **ranef_dropout_rate**: `float` or `None`; Rate at which to drop random effects indicators. **Default**: 0.2
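Several of the `n_units_*` fields accept either a single integer or a space-delimited string of per-layer sizes, paired with an `n_layers_*` field. One way such a pair could be expanded into a per-layer list is sketched below. The helper `expand_units` is hypothetical, and its handling of edge cases (for instance, treating a lone integer with no layer count as a single layer) is an assumption rather than documented behavior.

```python
def expand_units(n_units, n_layers=None):
    """Expand an n_units spec into a list of layer sizes, ordered bottom to top."""
    if n_units is None or n_units == 0:
        return []  # no hidden layers
    if isinstance(n_units, int):
        # A single integer is reused for every layer (assumed: one layer if
        # the layer count is not given)
        return [n_units] * (n_layers if n_layers is not None else 1)
    units = [int(u) for u in str(n_units).split()]
    if n_layers is not None and len(units) != n_layers:
        raise ValueError(f"expected {n_layers} layer sizes, got {len(units)}")
    return units

print(expand_units(32, n_layers=2))  # same size for both layers
print(expand_units("64 32"))         # layer count inferred from the list length
print(expand_units(None, n_layers=2))
```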

### CDRNNMLE

- **weight_sd_init**: `float` or `str`; Standard deviation of kernel initialization distribution (Normal, mean=0). Can also be `'glorot'`, which uses the SD of the Glorot normal initializer. **Default**: glorot

### CDRNNBayes

- **declare_priors_weights**: `bool`; Specify Gaussian priors for all fixed model parameters (if `False`, use implicit improper uniform priors). **Default**: True
- **declare_priors_biases**: `bool`; Specify Gaussian priors for model biases (if `False`, use implicit improper uniform priors). **Default**: False
- **declare_priors_gamma**: `bool`; Specify Gaussian priors for gamma parameters of any batch normalization layers (if `False`, use implicit improper uniform priors). **Default**: False
- **weight_prior_sd**: `str` or `float`; Standard deviation of prior on CDRNN hidden weights. A `float`, `'glorot'`, or `'he'`. **Default**: glorot
- **bias_prior_sd**: `str` or `float`; Standard deviation of prior on CDRNN hidden biases. A `float`, `'glorot'`, or `'he'`. **Default**: 1.0
- **gamma_prior_sd**: `str` or `float`; Standard deviation of prior on batch norm gammas. A `float`, `'glorot'`, or `'he'`. Ignored unless batch normalization is used. **Default**: glorot
- **y_sd_prior_sd**: `float` or `None`; Standard deviation of prior on standard deviation of output model. If `None`, inferred as **y_sd_prior_sd_scaling_coefficient** times the empirical variance of the response on the training set. **Default**: None
- **prior_sd_scaling_coefficient**: `float`; Factor by which to multiply priors on intercepts and coefficients if inferred from the empirical variance of the data (i.e. if **intercept_prior_sd** or **coef_prior_sd** is `None`). Ignored for any prior widths that are explicitly specified. **Default**: 1
- **y_sd_prior_sd_scaling_coefficient**: `float`; Factor by which to multiply prior on output model variance if inferred from the empirical variance of the data (i.e. if **y_sd_prior_sd** is `None`). Ignored if prior width is explicitly specified. **Default**: 1
- **ranef_to_fixef_prior_sd_ratio**: `float`; Ratio of widths of random to fixed effects priors. I.e. if less than 1, random effects have tighter priors. **Default**: 0.1
- **weight_sd_init**: `str`, `float` or `None`; Initial standard deviation of variational posterior over weights. If `None`, inferred from other hyperparams. **Default**: None
- **bias_sd_init**: `str`, `float` or `None`; Initial standard deviation of variational posterior over biases. If `None`, inferred from other hyperparams. **Default**: None
- **gamma_sd_init**: `str`, `float` or `None`; Initial standard deviation of variational posterior over batch norm gammas. If `None`, inferred from other hyperparams. Ignored unless batch normalization is used. **Default**: None
- **posterior_to_prior_sd_ratio**: `float`; Ratio of posterior initialization SD to prior SD. Low values are often beneficial to stability, convergence speed, and quality of final fit by avoiding erratic sampling and divergent behavior early in training. **Default**: 0.01

## Section: `[irf_name_map]`

The optional `[irf_name_map]` section simply permits prettier variable naming in plots.
For example, the internal name for a convolution applied to predictor `A` may be `ShiftedGammaKgt1.s(A)-Terminal.s(A)`, which is not very readable.
To address this, the string above can be mapped to a more readable name using an INI key-value pair, as shown:

```
ShiftedGammaKgt1.s(A)-Terminal.s(A) = A
```

The model will then print `A` in plots rather than `ShiftedGammaKgt1.s(A)-Terminal.s(A)`.
Unused entries in the name map are ignored, and model variables that do not have an entry in the name map print with their default internal identifier.
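The lookup behavior described above (mapped names are replaced, unmapped names fall through unchanged) can be sketched with the standard `configparser` and a dictionary lookup. The `display_name` helper is hypothetical and stands in for whatever the plotting code does internally.

```python
import configparser

CONFIG_TEXT = """
[irf_name_map]
ShiftedGammaKgt1.s(A)-Terminal.s(A) = A
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep the case of IRF identifiers as written
parser.read_string(CONFIG_TEXT)
name_map = dict(parser["irf_name_map"])

def display_name(internal_name):
    """Return the mapped name if one exists, else the internal identifier."""
    return name_map.get(internal_name, internal_name)

print(display_name("ShiftedGammaKgt1.s(A)-Terminal.s(A)"))
print(display_name("ShiftedGammaKgt1.s(B)-Terminal.s(B)"))
```

The second call has no entry in the map, so it returns the internal identifier unchanged.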

## Sections: `[model_CDR_*]`

Arbitrarily many sections named `[model_CDR_*]` can be provided in the config file, where `*` stands in for a unique identifier.
Each such section defines a different CDR model and must contain at least one field, `formula`, whose value is a CDR model formula (see CDR Model Formulas for more on CDR formula syntax).
The identifier `CDR_*` will be used by the CDR utilities to reference the fitted model and its output files.

For example, to define a CDR model called `readingtimes`, the section header `[model_CDR_readingtimes]` is included in the config file along with an appropriate `formula` specification.
To use this specific model once fitted, it can be referenced using the identifier `CDR_readingtimes`.
For example, the following call will extract predictions on dev data from a fitted `CDR_readingtimes` defined in config file **config.ini**:

```
python -m cdr.bin.predict config.ini -m CDR_readingtimes -p dev
```

Additional fields from `[cdr_settings]` may be specified for a given model, in which case the locally-specified setting (rather than the globally specified setting or the default value) will be used to train the model.
For example, imagine that `[cdr_settings]` contains the field `n_iter = 1000`.
All CDR models subsequently specified in the config file will train for 1000 iterations.
However, imagine that model `[model_CDR_longertrain]` should train for 5000 iterations instead.
This can be specified within the same config file as:

```
[model_CDR_longertrain]
n_iter = 5000
formula = ...
```

This setup allows a single config file to define a variety of CDR models, as long as they all share the same data. Distinct datasets require distinct config files.
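The precedence described above (model-local setting, then `[cdr_settings]`, then the built-in default) can be sketched as a three-step lookup. The `get_setting` helper, the `DEFAULTS` table, and the section name `model_CDR_main` are hypothetical; values are left as strings, as `configparser` returns them.

```python
import configparser

# Documented default for n_iter; a real table would cover every field.
DEFAULTS = {"n_iter": "100000"}

CONFIG_TEXT = """
[cdr_settings]
n_iter = 1000

[model_CDR_main]
formula = ...

[model_CDR_longertrain]
n_iter = 5000
formula = ...
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep field names as written
parser.read_string(CONFIG_TEXT)

def get_setting(parser, model_section, key):
    """Look up a setting: model section first, then [cdr_settings], then default."""
    if parser.has_option(model_section, key):
        return parser.get(model_section, key)
    if parser.has_option("cdr_settings", key):
        return parser.get("cdr_settings", key)
    return DEFAULTS[key]

print(get_setting(parser, "model_CDR_main", "n_iter"))
print(get_setting(parser, "model_CDR_longertrain", "n_iter"))
```

Here `model_CDR_main` picks up the shared value from `[cdr_settings]`, while `model_CDR_longertrain` uses its own override.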

For hypothesis testing, fixed effect ablation can be conveniently automated using the `ablate` model field.
For example, the following specification implicitly defines 7 unique models, one for each of the `|powerset(a, b, c)| - 1 = 7` non-null ablations of `a`, `b`, and `c`:

```
[model_CDR_example]
n_iter = 5000
ablate = a b c
formula = C(a + b + c, Normal()) + (C(a + b + c, Normal()) | subject)
```

The ablated models are named using `'!'` followed by the ablated impulse name for each ablated impulse.
Therefore, the above specification is equivalent to (and much easier to write than) the following:

```
[model_CDR_example]
n_iter = 5000
formula = C(a + b + c, Normal()) + (C(a + b + c, Normal()) | subject)
[model_CDR_example!a]
n_iter = 5000
formula = C(b + c, Normal()) + (C(a + b + c, Normal()) | subject)
[model_CDR_example!b]
n_iter = 5000
formula = C(a + c, Normal()) + (C(a + b + c, Normal()) | subject)
[model_CDR_example!c]
n_iter = 5000
formula = C(a + b, Normal()) + (C(a + b + c, Normal()) | subject)
[model_CDR_example!a!b]
n_iter = 5000
formula = C(c, Normal()) + (C(a + b + c, Normal()) | subject)
[model_CDR_example!a!c]
n_iter = 5000
formula = C(b, Normal()) + (C(a + b + c, Normal()) | subject)
[model_CDR_example!b!c]
n_iter = 5000
formula = C(a, Normal()) + (C(a + b + c, Normal()) | subject)
```
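The naming scheme in the expansion above can be reproduced with a short enumeration over subsets of the ablated impulses. The function `ablated_model_names` is a hypothetical illustration. Stopping short of the subset that removes every listed impulse is inferred from the seven models shown above rather than from the package source.

```python
from itertools import combinations

def ablated_model_names(base, ablate):
    """List model identifiers for the full model and each ablation.

    Enumerates subsets of size 0 .. len(ablate) - 1, i.e. every subset except
    the one that ablates all listed impulses, matching the expansion above.
    """
    names = []
    for size in range(len(ablate)):
        for subset in combinations(ablate, size):
            names.append(base + "".join("!" + impulse for impulse in subset))
    return names

names = ablated_model_names("CDR_example", ["a", "b", "c"])
print(len(names))
print(names)
```

Each name can then be passed to the CDR utilities in the same way as any explicitly defined model identifier.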