CDR Package API

Complete API for all public classes and methods in this package.

cdr.backend module

cdr.config module

class cdr.config.Config(path)[source]

Bases: object

Parses an *.ini file and stores settings needed to define a set of CDR experiments.

Parameters:: path – Path to *.ini file

build_cdr_settings(settings, add_defaults=True, global_settings=None, is_cdr=True, is_cdrnn=False)[source]

Given a settings object parsed from a config file, compute CDR parameter dictionary.

Parameters:

settings – settings from a ConfigParser object.
add_defaults – bool; whether to add default settings not explicitly specified in the config.
global_settings – dict or None; dictionary of global defaults for parameters missing from settings.
is_cdr – bool; whether this is a CDR(NN) model.
is_cdrnn – bool; whether this is a CDRNN model.

Returns:

dict; dictionary of settings key-value pairs.
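
As a rough illustration of how add_defaults and global_settings might interact, the sketch below merges three layers of settings using the standard-library configparser. It does not call the cdr package. The section name, the keys, the default values, and the precedence order (explicit setting, then global default, then built-in default) are all assumptions made for the example.

```python
import configparser

# Hypothetical built-in defaults (illustration only, not CDR's real defaults).
BUILTIN_DEFAULTS = {"learning_rate": "0.001", "n_iter": "1000"}

# Hypothetical config contents; the section and key names are made up.
INI_TEXT = """
[model_example]
n_iter = 500
"""


def build_settings_sketch(settings, add_defaults=True, global_settings=None):
    """Merge explicit settings with global and built-in defaults."""
    out = {}
    if add_defaults:
        out.update(BUILTIN_DEFAULTS)      # lowest precedence
    if global_settings is not None:
        out.update(global_settings)       # overrides built-in defaults
    out.update(dict(settings))            # explicit settings win
    return out


parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)
section = parser["model_example"]

merged = build_settings_sketch(section, global_settings={"learning_rate": "0.01"})
explicit_only = build_settings_sketch(section, add_defaults=False)
```

Here merged contains both keys, with n_iter taken from the file and learning_rate from the global defaults, while explicit_only contains only what the file specifies.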

expand_submodels()[source]

Expand models into cross-validation folds and/or ensembles.

Returns:: None

set_model(model_name=None)[source]

Change internal state to that of model named model_name. Config instances can store settings for multiple models. set_model() determines which model’s settings are returned by Config getter methods.

Parameters:: model_name – str; name of target model
Returns:: None

class cdr.config.PlotConfig(path=None)[source]

Bases: object

Parses an *.ini file and stores settings needed to define CDR plots

Parameters:: path – Path to *.ini file

build_plot_settings(settings)[source]

Given a settings object parsed from a config file, compute plot parameters.

Parameters:: settings – settings from a ConfigParser object.
Returns:: dict; dictionary of settings key-value pairs.

cdr.data module

cdr.data.add_responses(names, y)[source]

Add response variable(s) to a dataframe, applying any preprocessing required by the formula string.

Parameters:

names – str or list of str; name(s) of dependent variable(s)
y – pandas DataFrame; response data.

Returns:

pandas DataFrame; response data with any missing ops applied.

cdr.data.build_CDR_impulse_data(X, first_obs, last_obs, X_in_Y_names=None, X_in_Y=None, impulse_names=None, history_length=128, future_length=0, int_type='int32', float_type='float32')[source]

Construct impulse data arrays in the required format for CDR fitting/evaluation for a single response array.

Parameters:

X – list of pandas tables; impulse (predictor) data.
first_obs – list of index vectors (list, pandas series, or numpy vector) of first observations; the list contains vectors of row indices, one for each element of X, of the first impulse in the time series associated with the response. If None, inferred from Y.
last_obs – list of index vectors (list, pandas series, or numpy vector) of last observations; the list contains vectors of row indices, one for each element of X, of the last impulse in the time series associated with the response. If None, inferred from Y.
X_in_Y_names – list of str; names of predictors contained in Y rather than X. If None, no such predictors.
X_in_Y – pandas DataFrame or None; table of predictors contained in Y rather than X. If None, no such predictors.
impulse_names – list of str; names of columns in X to be used as impulses by the model. If None, all columns returned.
history_length – int; maximum number of history (backward) observations.
future_length – int; maximum number of future (forward) observations.
int_type – str; name of int type.
float_type – str; name of float type.

Returns:

triple of numpy arrays; let N, T, I, R respectively be the number of rows in Y, history length, number of impulse dimensions, and number of response dimensions. Outputs are (1) impulses with shape (N, T, I), (2) impulse timestamps with shape (N, T, I), and (3) impulse mask with shape (N, T, I).

cdr.data.build_CDR_response_data(responses, Y=None, first_obs=None, last_obs=None, Y_time=None, Y_gf=None, X_in_Y_names=None, X_in_Y=None, Y_category_map=None, response_to_df_ix=None, gf_names=None, gf_map=None)[source]

Construct response data arrays in the required format for CDR fitting/evaluation for one or more response arrays.

Parameters:

responses – list of str; names of columns in Y to be used as responses (dependent variables) by the model.
Y – list of pandas tables, or None; response data. If None, does not return a response array.
first_obs – list of list of index vectors (list, pandas series, or numpy vector) of first observations, or None; the list contains one element for each response array. Inner lists contain vectors of row indices, one for each element of X, of the first impulse in the time series associated with each response. If None, inferred from Y.
last_obs – list of list of index vectors (list, pandas series, or numpy vector) of last observations, or None; the list contains one element for each response array. Inner lists contain vectors of row indices, one for each element of X, of the last impulse in the time series associated with each response. If None, inferred from Y.
Y_time – list of response timestamp vectors (list, pandas series, or numpy vector), or None; vector(s) of response timestamps, one for each response array. Needed to timestamp any response-aligned predictors (ignored if none in model).
Y_gf – list of pandas DataFrame, or None; data frames containing random grouping factor levels, one for each response array, if applicable.
X_in_Y_names – list of str; names of predictors contained in Y rather than X (must be present in all elements of Y). If None, no such predictors.
X_in_Y – list of pandas DataFrame or None; tables (one per response array) of predictors contained in Y rather than X (must be present in all elements of Y). If None, no such predictors.
Y_category_map – dict or None; map from category labels to integers for each categorical response.
response_to_df_ix – dict or None; map from response names to lists of indices of the response files that contain them.
gf_names – list or None; list of names of random grouping factor variables. If None and Y_gf provided, will use all columns of Y_gf.
gf_map – list of dict or None; list maps from random grouping factor levels to their indices, one map per grouping factor variable in gf_names.

Returns:

7-tuple of numpy arrays; let N, R, XF, YF, Z, and K respectively be the number of rows (sum total number of rows in Y), number of response dimensions, number of distinct predictor files (X), number of distinct response files (Y), number of random grouping factor variables, and number of response-aligned predictors. Outputs are (1) responses with shape (N, R) or None if Y is None, (2) an XF-tuple of first observation vectors indexing start indices for each entry in X, (3) a YF-tuple of last observation vectors indexing end indices for each entry in X, (4) response timestamps with shape (N,), (5) response masks (masking out any missing response variables per row) with shape (N, R), (6) random grouping factor matrix with shape (N, Z), or None if no random grouping factors provided, and (7) response-aligned predictors with shape (N, K).

cdr.data.c(df)[source]

Zero-center pandas series or data frame

Parameters:: df – pandas Series or DataFrame; input data
Returns:: pandas Series or DataFrame; centered data

cdr.data.compare_elementwise_perf(a, b, y=None, mode='err')[source]

Compare model performance elementwise.

Parameters:

a – numpy vector; vector of elementwise scores (or predictions if mode is corr) for model a.
b – numpy vector; vector of elementwise scores (or predictions if mode is corr) for model b.
y – numpy vector or None; vector of observations. Used only if mode is corr.
mode – str; Type of performance metric. One of err, loglik, or corr.

Returns:

numpy vector; vector of elementwise performance differences

cdr.data.compute_filter(y, field, cond)[source]

Compute filter given a field and condition

Parameters:

y – pandas DataFrame; response data.
field – str; name of column on whose values to filter.
cond – str; string representation of condition to use for filtering.

Returns:

numpy vector; boolean mask to use for pandas subsetting operations.

cdr.data.compute_filters(Y, filters=None)[source]

Compute filters given a filter map.

Parameters:

Y – pandas DataFrame; response data.
filters – list; list of key-value pairs mapping column names to filtering criteria for their values.

Returns:

numpy vector; boolean mask to use for pandas subsetting operations.
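
The two filtering functions above can be pictured with the following sketch, which builds a boolean mask from a condition string and then combines several such masks. It operates on plain lists of dicts rather than pandas objects and does not call the cdr package. The condition syntax (a comparison operator followed by a number) and the choice to combine filters with a logical AND are assumptions for illustration.

```python
import operator

# Assumed condition syntax: a comparison operator followed by a number,
# for example "<= 500". The syntax accepted by the package may differ.
OPS = {
    "<=": operator.le, ">=": operator.ge, "!=": operator.ne,
    "==": operator.eq, "<": operator.lt, ">": operator.gt,
}


def compute_filter_sketch(rows, field, cond):
    """Return one boolean per row indicating whether row[field] meets cond."""
    cond = cond.strip()
    # Try two-character operators before one-character ones.
    for symbol in sorted(OPS, key=len, reverse=True):
        if cond.startswith(symbol):
            threshold = float(cond[len(symbol):])
            return [OPS[symbol](row[field], threshold) for row in rows]
    raise ValueError("Unrecognized condition: %r" % cond)


def compute_filters_sketch(rows, filters=None):
    """Combine per-field masks with a logical AND (an assumption)."""
    mask = [True] * len(rows)
    for field, cond in (filters or []):
        field_mask = compute_filter_sketch(rows, field, cond)
        mask = [m and f for m, f in zip(mask, field_mask)]
    return mask


rows = [{"rt": 50.0}, {"rt": 150.0}, {"rt": 480.0}, {"rt": 3200.0}]
single = compute_filter_sketch(rows, "rt", "<= 500")
combined = compute_filters_sketch(rows, [("rt", "<= 500"), ("rt", "> 100")])
```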

cdr.data.compute_partition(y, modulus, n)[source]

Given a splitID column, use modular arithmetic to partition data into n subparts.

Parameters:

y – pandas DataFrame; response data.
modulus – int; modulus to use for splitting, must be at least as large as n.
n – int; number of subparts in the partition.

Returns:

list of numpy vectors; one boolean vector per subpart of the partition, selecting only those elements of y that belong.
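
One way to picture a modular partition is sketched below. It works on a plain list of integer split IDs and does not call the cdr package. How residues are assigned to subparts here (the first subpart takes every residue up to modulus - n, and each remaining subpart takes exactly one residue) is an assumption chosen for illustration, as are the names train, dev, and test.

```python
def compute_partition_sketch(split_ids, modulus, n):
    """Return n boolean masks that together cover every element exactly once."""
    if modulus < n:
        raise ValueError("modulus must be at least as large as n")
    residues = [i % modulus for i in split_ids]
    # Subpart 0 absorbs residues 0 .. modulus - n (assumed scheme).
    partition = [[r <= modulus - n for r in residues]]
    # Each remaining subpart receives exactly one residue.
    for k in range(1, n):
        partition.append([r == modulus - n + k for r in residues])
    return partition


split_ids = list(range(8))
train, dev, test = compute_partition_sketch(split_ids, modulus=4, n=3)
```

With modulus=4 and n=3, half of the IDs fall in the first subpart and a quarter in each of the other two, which shows how a larger modulus shifts more data into the first subpart.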

cdr.data.compute_splitID(y, split_fields)[source]

Map tuples in columns designated by split_fields into integer ID to use for data partitioning.

Parameters:

y – pandas DataFrame; response data.
split_fields – list of str; column names to use for computing split ID.

Returns:

numpy vector; integer vector of split ID’s.
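
The idea of mapping tuples of column values to integer IDs can be sketched as follows. This version assigns IDs in order of first appearance, which is only one possible mapping and is not taken from the cdr source; the column names in the example are hypothetical.

```python
def compute_split_id_sketch(rows, split_fields):
    """Assign each distinct tuple of split_fields values an integer ID."""
    ids = {}
    out = []
    for row in rows:
        key = tuple(row[f] for f in split_fields)
        # setdefault stores the next unused integer for an unseen tuple
        # and returns the stored integer for a tuple seen before.
        out.append(ids.setdefault(key, len(ids)))
    return out


rows = [
    {"subject": "s1", "doc": "a"},
    {"subject": "s1", "doc": "b"},
    {"subject": "s2", "doc": "a"},
    {"subject": "s1", "doc": "a"},
]
split_ids = compute_split_id_sketch(rows, ["subject", "doc"])
```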

cdr.data.compute_time_mask(X_time, first_obs, last_obs, history_length=128, future_length=0, int_type='int32', float_type='float32')[source]

Compute mask for expanded impulse data zeroing out non-existent impulses.

Parameters:

X_time – pandas Series; timestamps associated with each impulse in X.
first_obs – pandas Series; vector of row indices in X of the first impulse in the time series associated with each response.
last_obs – pandas Series; vector of row indices in X of the last preceding impulse in the time series associated with each response.
history_length – int; maximum number of history (backward) observations.
future_length – int; maximum number of future (forward) observations.
int_type – str; name of int type.
float_type – str; name of float type.

Returns:

numpy array; boolean impulse mask.

cdr.data.corr_cdr(X_2d, impulse_names, impulse_names_2d, time, time_mask)[source]

Compute correlation matrix, including correlations across time where necessitated by 2D predictors.

Parameters:

X_2d – numpy array; the impulse data. Must be of shape (batch_len, history_length+future_length, n_impulses), can be computed from sources by build_CDR_impulse_data().
impulse_names – list of str; names of columns in X_2d to be used as impulses by the model.
impulse_names_2d – list of str; names of columns in X_2d that correspond to 2D predictors.
time – 3D numpy array; array of timestamps for each event in X_2d.
time_mask – 3D numpy array; array of masks over padding events in X_2d.

Returns:

pandas DataFrame; the correlation matrix.

cdr.data.expand_impulse_sequence(X, X_time, first_obs, last_obs, window_length, int_type='int32', float_type='float32', fill=0.0)[source]

Expand out impulse stream in X for each response in the target data.

Parameters:

X – pandas DataFrame; impulse (predictor) data.
X_time – pandas Series; timestamps associated with each impulse in X.
first_obs – pandas Series; vector of row indices in X of the first impulse in the time series associated with each response.
last_obs – pandas Series; vector of row indices in X of the last preceding impulse in the time series associated with each response.
window_length – int; number of steps in time dimension of output
int_type – str; name of int type.
float_type – str; name of float type.
fill – float; fill value for padding cells.

Returns:

3-tuple of numpy arrays; the expanded impulse array, the expanded timestamp array, and a boolean mask zeroing out locations of non-existent impulses.
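
To make the expansion concrete, the sketch below builds one fixed-length window per response from a single impulse column, together with matching timestamps and a mask. It uses plain Python lists and does not call the cdr package. Treating last_obs as an exclusive end index, keeping the most recent impulses when a series is longer than the window, and padding on the left are all assumptions for illustration.

```python
def expand_impulse_sequence_sketch(X, X_time, first_obs, last_obs,
                                   window_length, fill=0.0):
    """Expand a 1D impulse stream into per-response windows."""
    X_out, time_out, mask_out = [], [], []
    for start, end in zip(first_obs, last_obs):
        # Keep at most the window_length most recent impulses in [start, end).
        start = max(start, end - window_length)
        n_real = end - start
        n_pad = window_length - n_real
        # Pad on the left so the most recent impulse is always last.
        X_out.append([fill] * n_pad + list(X[start:end]))
        time_out.append([fill] * n_pad + list(X_time[start:end]))
        # The mask is False over padding cells and True over real impulses.
        mask_out.append([False] * n_pad + [True] * n_real)
    return X_out, time_out, mask_out


X = [1.0, 2.0, 3.0, 4.0, 5.0]
X_time = [0.0, 0.5, 1.0, 1.5, 2.0]
first_obs = [0, 0, 2]
last_obs = [1, 4, 5]

X_2d, time_2d, mask_2d = expand_impulse_sequence_sketch(
    X, X_time, first_obs, last_obs, window_length=3
)
```

The first response has only one preceding impulse, so its window holds two padding cells; the second has four, so the oldest one is dropped.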

cdr.data.filter_invalid_responses(Y, dv, crossval_factor=None, crossval_fold=None)[source]

Filter out rows with non-finite responses.

Parameters:

Y – pandas table or list of pandas tables; response data.
dv – str or list of str; name(s) of column(s) containing the dependent variable(s)
crossval_factor – str or None; name of column containing the selection variable for cross validation. If None, no cross validation filtering.
crossval_fold – list or None; list of valid values for cross-validation selection. Used only if crossval_factor is not None.

Returns:

2-tuple of pandas DataFrame and pandas Series; valid data and indicator vector used to filter out invalid data.
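
A minimal sketch of this kind of filtering, using lists of dicts instead of pandas tables and without calling the cdr package, is given below. It assumes that crossval_fold lists the values of crossval_factor to keep; the column names are hypothetical.

```python
import math


def filter_invalid_responses_sketch(rows, dv, crossval_factor=None,
                                    crossval_fold=None):
    """Return the valid rows and the boolean indicator used to select them."""
    dvs = [dv] if isinstance(dv, str) else list(dv)
    keep = []
    for row in rows:
        # A row is valid only if every dependent variable is finite.
        ok = all(math.isfinite(row[name]) for name in dvs)
        if ok and crossval_factor is not None:
            # Assumption: crossval_fold holds the values to keep.
            ok = row[crossval_factor] in (crossval_fold or [])
        keep.append(ok)
    valid = [row for row, k in zip(rows, keep) if k]
    return valid, keep


rows = [
    {"rt": 310.0, "fold": 1},
    {"rt": float("nan"), "fold": 1},
    {"rt": 275.0, "fold": 2},
    {"rt": float("inf"), "fold": 2},
]
valid, keep = filter_invalid_responses_sketch(rows, "rt")
valid_cv, keep_cv = filter_invalid_responses_sketch(
    rows, "rt", crossval_factor="fold", crossval_fold=[2]
)
```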

cdr.data.get_first_last_obs_lists(y)[source]

Convenience utility to extract all first_obs and last_obs columns in y, sorted by file index

Parameters:: y – pandas DataFrame; response data.
Returns:: pair of list of str; first_obs column names and last_obs column names

cdr.data.get_rangf_array(Y, rangf_names, rangf_map)[source]

Collect random grouping factor indicators as numpy integer arrays that can be read by TensorFlow. Returns vertical concatenation of GF arrays from each element of Y.

Parameters:

Y – pandas table or list of pandas tables; response data.
rangf_names – list of str; names of columns containing random grouping factor levels (order is preserved, changing the order will change the resulting array).
rangf_map – list of dict; map for each random grouping factor from levels to unique indices.

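Returns:

numpy integer array; vertical concatenation of the random grouping factor arrays from each element of Y.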

cdr.data.get_time_windows(X, Y, series_ids, forward=False, window_length=128, t_delta_cutoff=None, verbose=True)[source]

Compute row indices in X of initial and final impulses for each element of Y. Assumes time series are already sorted by series_ids.

Parameters:

X – pandas DataFrame; impulse (predictor) data.
Y – pandas DataFrame; response data.
series_ids – list of str; column names whose jointly unique values define unique time series.
forward – bool; whether to compute forward windows (future inputs) or backward windows (past inputs, used if forward is False).
window_length – int; maximum size of time window to consider. If np.inf, no bound on window size.
t_delta_cutoff – float or None; maximum distance in time to consider (can help improve training stability on data with large gaps in time). If 0 or None, no cutoff.
verbose – bool; whether to report progress to stderr

Returns:

2-tuple of numpy vectors; first and last impulse observations (respectively) for each response in Y
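
The backward-window case can be sketched for a single, already sorted time series with the standard-library bisect module. This is not the package's implementation: it ignores series_ids and the forward option, treats the last index as exclusive, and counts impulses that share a timestamp with the response as part of its history. All of these are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right


def get_time_windows_sketch(X_time, Y_time, window_length=128,
                            t_delta_cutoff=None):
    """Return first and last impulse indices for each response timestamp."""
    first_obs, last_obs = [], []
    for t in Y_time:
        # Exclusive end: number of impulses with timestamp <= t.
        last = bisect_right(X_time, t)
        # Bound the window by the number of impulses.
        first = max(0, last - window_length)
        if t_delta_cutoff:
            # Also drop impulses more than t_delta_cutoff before t.
            first = max(first, bisect_left(X_time, t - t_delta_cutoff))
        first_obs.append(first)
        last_obs.append(last)
    return first_obs, last_obs


X_time = [0.0, 1.0, 2.0, 3.0, 10.0]
Y_time = [0.5, 3.0, 10.5]

first, last = get_time_windows_sketch(X_time, Y_time, window_length=2)
first_cut, last_cut = get_time_windows_sketch(
    X_time, Y_time, window_length=2, t_delta_cutoff=2.0
)
```

With the cutoff, the impulse at time 3.0 is excluded from the window of the response at time 10.5 because it lies more than 2.0 time units in the past.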

cdr.data.preprocess_data(X, Y, formula_list, series_ids, filters=None, history_length=128, future_length=0, t_delta_cutoff=None, all_interactions=False, verbose=True, debug=False)[source]

Preprocess CDR data.

Parameters:

X – list of pandas tables; impulse (predictor) data.
Y – list of pandas tables; response data.
formula_list – list of Formula; CDR formula for which to preprocess data.
series_ids – list of str; column names whose jointly unique values define unique time series.
filters – list; list of key-value pairs mapping column names to filtering criteria for their values.
history_length – int; maximum number of history (backward) observations.
future_length – int; maximum number of future (forward) observations.
t_delta_cutoff – float or None; maximum distance in time to consider (can help improve training stability on data with large gaps in time). If 0 or None, no cutoff.
all_interactions – bool; add powerset of all conformable interactions.
verbose – bool; whether to report progress to stderr
debug – bool; print debugging information

Returns:

7-tuple; predictor data, response data, filtering mask, response-aligned predictor names, response-aligned predictors, 2D predictor names, and 2D predictors

cdr.data.s(df)[source]

Rescale pandas series or data frame by its standard deviation

Parameters:: df – pandas Series or DataFrame; input data
Returns:: pandas Series or DataFrame; rescaled data

cdr.data.split_cdr_outputs(outputs, lengths)[source]

Takes a dictionary of arbitrary depth containing CDR outputs with their labels as keys and splits each output into a list of outputs with lengths corresponding to lengths. Useful for aligning CDR outputs to response files, since CDR internally concatenates the response files when more than one is provided. Recursively modifies the dict in place.

Parameters:

outputs – dict of arbitrary depth with numpy arrays at the leaves; the source CDR outputs
lengths – array-like vector of lengths to split the outputs into

Returns:

dict; same key-val structure as outputs but with each leaf split into a list of len(lengths) vectors, one for each length value.
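
The recursive splitting described above can be sketched as follows, using plain lists at the leaves instead of numpy arrays and without calling the cdr package. The keys in the example dict are hypothetical.

```python
def split_outputs_sketch(outputs, lengths):
    """Split every leaf of a nested dict into len(lengths) consecutive chunks.

    Modifies the dict in place and also returns it.
    """
    for key, val in outputs.items():
        if isinstance(val, dict):
            # Recurse into nested dicts until a leaf is reached.
            split_outputs_sketch(val, lengths)
        else:
            chunks, start = [], 0
            for n in lengths:
                chunks.append(val[start:start + n])
                start += n
            # Reassigning an existing key is safe while iterating.
            outputs[key] = chunks
    return outputs


outputs = {
    "preds": {"rt": [1, 2, 3, 4, 5]},
    "loglik": [0.1, 0.2, 0.3, 0.4, 0.5],
}
split = split_outputs_sketch(outputs, lengths=[2, 3])
```

If two response files with 2 and 3 rows were concatenated, each leaf is split back into one chunk per file.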

cdr.data.z(df)[source]

Z-transform pandas series or data frame

Parameters:: df – pandas Series or DataFrame; input data
Returns:: pandas Series or DataFrame; z-transformed data
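
The three transforms c, s, and z relate to each other as in the sketch below, which uses the standard-library statistics module on a plain list and does not call the cdr package. The function names are hypothetical stand-ins, and the use of the sample standard deviation (dividing by n - 1) is an assumption; the package may use a different convention.

```python
import statistics


def center(values):
    """Subtract the mean from every value (zero-centering)."""
    mean = statistics.mean(values)
    return [v - mean for v in values]


def rescale(values):
    """Divide every value by the sample standard deviation (assumed)."""
    sd = statistics.stdev(values)
    return [v / sd for v in values]


def z_transform(values):
    """Center, then rescale; centering does not change the deviation."""
    return rescale(center(values))


data = [2.0, 4.0, 6.0]
centered = center(data)
rescaled = rescale(data)
z_scored = z_transform(data)
```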

cdr.formula module

class cdr.formula.Formula(bform_str, standardize=True)[source]

Bases: object

A class for parsing R-style mixed-effects CDR model formula strings and applying them to CDR data matrices.

Parameters:: bform_str – str; an R-style mixed-effects CDR model formula string

ablate_impulses(impulse_ids)[source]

Remove impulses in impulse_ids from fixed effects (retaining in any random effects).

Parameters:: impulse_ids – list of str; impulse ID’s
Returns:: None

apply_formula(X, Y, X_in_Y_names=None, all_interactions=False, series_ids=None)[source]

Extract all data and compute all transforms required by the model formula.

Parameters:

X – list of pandas tables; impulse data.
Y – list of pandas tables; response data.
X_in_Y_names – list or None; List of column names for response-aligned predictors (predictors measured for every response rather than for every input) if applicable, None otherwise.
all_interactions – bool; add powerset of all conformable interactions.
series_ids – list of str or None; list of ids to use as grouping factors for lagged effects. If None, lagging will not be attempted.

Returns:

triple; transformed X, transformed y, response-aligned predictor names

apply_op(op, arr)[source]

Apply op op to array arr.

Parameters:

op – str; name of op.
arr – numpy or pandas array; source data.

Returns:

numpy array; transformed data.

apply_op_2d(op, arr, time_mask)[source]

Apply op to 2D predictor (predictor whose value depends on properties of the response).

Parameters:

op – str; name of op.
arr – numpy or pandas array; source data.
time_mask – numpy array; mask for padding cells

Returns:

numpy array; transformed data

apply_ops(impulse, X)[source]

Apply all ops defined for an impulse

Parameters:

impulse – Impulse object; the impulse.
X – list of pandas tables; table containing the impulse data.

Returns:

pandas table; table augmented with transformed impulse.

apply_ops_2d(impulse, X_2d_predictor_names, X_2d_predictors, time_mask)[source]

Apply all ops defined for a 2D predictor (predictor whose value depends on properties of the response).

Parameters:

impulse – Impulse object; the impulse.
X_2d_predictor_names – list of str; names of 2D predictors.
X_2d_predictors – numpy array; source data.
time_mask – numpy array; mask for padding cells

Returns:

2-tuple; list of new predictor name, numpy array of predictor values

static bases(family)[source]

Get the number of bases of a spline kernel.

Parameters:: family – str; name of IRF family
Returns:: int or None; number of bases of spline kernel, or None if family is not a spline.

build(bform_str, standardize=True)[source]

Construct internal data from formula string

Parameters:: bform_str – str; source string.
Returns:: None

categorical_transform(X)[source]

Get transformed formula with categorical predictors in X expanded.

Parameters:: X – list of pandas tables; input data.
Returns:: Formula; transformed Formula object

compute_2d_predictor(predictor_name, X, first_obs, last_obs, history_length=128, future_length=None, minibatch_size=50000)[source]

Compute 2D predictor (predictor whose value depends on properties of the most recent impulse).

Parameters:

predictor_name – str; name of predictor
X – pandas table; input data
first_obs – pandas Series or 1D numpy array; row indices in X of the start of the series associated with each regression target.
last_obs – pandas Series or 1D numpy array; row indices in X of the most recent observation in the series associated with each regression target.
minibatch_size – int; minibatch size for computing predictor, can help with memory footprint

Returns:

2-tuple; new predictor name, numpy array of predictor values

initialize_nns()[source]

Initialize a dictionary mapping ids to metadata for all NN components in this CDR model

Returns:: dict; mapping from NN str id to NN object storing metadata for that NN.

insert_impulses(impulses, irf_str, rangf=None)[source]

Insert the given impulses into fixed effects and all random terms.

Parameters:: impulses – list of str; impulse ID’s
Returns:: None

static irf_params(family)[source]

Return list of parameter names for a given IRF family.

Parameters:: family – str; name of IRF family
Returns:: list of str; parameter names

static is_LCG(family)[source]

Check whether a kernel is LCG.

Parameters:: family – str; name of IRF family
Returns:: bool; whether the kernel is LCG (linear combination of Gaussians)

pc_transform(n_pc, pointers=None)[source]

Get transformed formula with impulses replaced by principal components.

Parameters:

n_pc – int; number of principal components in transform.
pointers – dict; map from source nodes to transformed nodes.

Returns:

list of IRFNode; tree forest representing current state of the transform.

process_ast(t, terms=None, has_intercept=None, ops=None, rangf=None, impulses_by_name=None, interactions_by_name=None, under_irf=False, under_interaction=False)[source]

Recursively process a node of the Python abstract syntax tree (AST) representation of the formula string and insert data into internal representation of model formula.

Parameters:

t – AST node.
terms – list or None; CDR terms computed so far, or None if no CDR terms computed.
has_intercept – dict; map from random grouping factors to boolean values representing whether that grouping factor has a random intercept. None is used as a key to refer to the population-level intercept.
ops – list; names of ops computed so far, or None if no ops computed.
rangf – str or None; name of rangf for random term currently being processed, or None if currently processing fixed effects portion of model.

Returns:

None

process_irf(t, input_irf, ops=None, rangf=None, nn_inputs=None, impulses_by_name=None, interactions_by_name=None)[source]

Process data from AST node representing part of an IRF definition and insert data into internal representation of the model.

Parameters:

t – AST node.
input_irf – IRFNode, Impulse, InterationImpulse, or NNImpulse object; child IRF of current node
ops – list of str, or None; ops applied to IRF. If None, no ops applied
rangf – str or None; name of rangf for random term currently being processed, or None if currently processing fixed effects portion of model.
nn_inputs – tuple or None; tuple of input impulses to neural network IRF, or None if not a neural network IRF.

Returns:

IRFNode object; the IRF node

re_transform(X)[source]

Get transformed formula with regex predictors expanded based on matches to the columns in X.

Parameters:: X – list of pandas tables; input data.
Returns:: Formula; transformed Formula object

remove_impulses(impulse_ids)[source]

Remove impulses in impulse_ids from the model (both fixed and random effects).

Parameters:: impulse_ids – list of str; impulse ID’s
Returns:: None

response_names()[source]

Get list of names of modeled response variables.

Returns:: list of str; names of modeled response variables.

responses()[source]

Get list of modeled response variables.

Returns:: list of Impulse; modeled response variables.

to_lmer_formula_string(z=False, correlated=True)[source]

Generate an lme4-style LMER model string representing the structure of the current CDR model. Useful for 2-step analysis in which data are transformed using CDR, then fitted using LME.

Parameters:

z – bool; z-transform convolved predictors.
correlated – bool; whether to use correlated random intercepts and slopes.

Returns:

str; the LMER formula string.

to_string(t=None)[source]

Stringify the formula, using t as the RHS.

Parameters:: t – IRFNode or None; IRF node to use as RHS. If None, uses root IRF associated with Formula instance.
Returns:: str; stringified formula.

unablate_impulses(impulse_ids)[source]

Insert impulses in impulse_ids into fixed effects (leaving random effects structure unchanged).

Parameters:: impulse_ids – list of str; impulse ID’s
Returns:: None

class cdr.formula.IRFNode(family=None, impulse=None, p=None, irfID=None, coefID=None, ops=None, fixed=True, rangf=None, nn_impulses=None, nn_config=None, impulses_as_inputs=True, inputs_to_add=None, inputs_to_drop=None, param_init=None, trainable=None, response_params_list=None)[source]

Bases: object

Data structure representing a node in a CDR IRF tree. For more information on how the CDR IRF structure is encoded as a tree, see the reference on CDR IRF trees.

Parameters:

family – str; name of IRF kernel family.
impulse – Impulse object or None; the impulse if terminal, else None.
p – IRFNode object or None; the parent IRF node, or None if no parent (parent nodes can be connected after initialization).
irfID – str or None; string ID of node if applicable. If None, automatically-generated ID will describe node’s family and structural position.
coefID – str or None; string ID of coefficient if applicable. If None, automatically-generated ID will describe node’s family and structural position. Only applicable to terminal nodes, so this property will not be used if the node is non-terminal.
ops – list of str, or None; ops to apply to IRF node. If None, no ops.
fixed – bool; Whether node exists in the model’s fixed effects structure.
rangf – list of str, str, or None; names of any random grouping factors associated with the node.
nn_impulses – tuple or None; tuple of input impulses to neural network IRF, or None if not a neural network IRF.
nn_config – dict or None; dictionary of settings for NN IRF component.
impulses_as_inputs – bool; whether to include impulses in input of a neural network IRF.
inputs_to_add – list of Impulse/NNImpulse or None; list of impulses to add to input of neural network IRF.
inputs_to_drop – list of Impulse/NNImpulse or None; list of impulses to remove from input of neural network IRF (keeping them in output).
param_init – dict; map from parameter names to initial values, which will also be used as prior means.
trainable – list of str, or None; trainable parameters at this node. If None, all parameters are trainable.
response_params_list – list of 2-tuple of str, or None; Response distribution parameters modeled by this IRF, with each parameter represented as a pair (DIST_NAME, PARAM_NAME). DIST_NAME can be None, in which case the IRF will apply to any distribution parameter matching PARAM_NAME.

ablate_impulses(impulse_ids)[source]

Remove impulses in impulse_ids from fixed effects (retaining in any random effects).

Parameters:: impulse_ids – list of str; impulse ID’s
Returns:: None

add_child(t)[source]

Add child to this node in the IRF tree

Parameters:: t – IRFNode; child node.
Returns:: IRFNode; child node with updated parent.

add_interactions(response_interactions)[source]

Add a ResponseInteraction object (or list of them) to this node.

Parameters:: response_interactions – ResponseInteraction or list of ResponseInteraction; response interaction(s) to add
Returns:: None

add_rangf(rangf)[source]

Add random grouping factor name to this node.

Parameters:: rangf – str; random grouping factor name
Returns:: None

atomic_irf_by_family()[source]

Get map from IRF kernel family names to list of IDs of IRFNode instances belonging to that family.

Returns:: dict from str to list of str; IRF IDs by family.

atomic_irf_param_init_by_family()[source]

Get map from IRF kernel family names to maps from IRF IDs to maps from IRF parameter names to their initialization values.

Returns:: dict; parameter initialization maps by family.

atomic_irf_param_trainable_by_family()[source]

Get map from IRF kernel family names to maps from IRF IDs to lists of trainable parameters.

Returns:: dict; trainable parameter maps by family.

bases()[source]

Get the number of bases of node.

Returns:: int or None; number of bases of node, or None if node is not a spline.

categorical_transform(X, expansion_map=None)[source]

Generate transformed copy of node with categorical predictors in X expanded. Recursive. Returns a tree forest representing the current state of the transform. When run from ROOT, should always return a length-1 list representing a single-tree forest, in which case the transformed tree is accessible as the 0th element.

Parameters:

X – list of pandas tables; input data.
expansion_map – dict; Internal variable. Do not use.

Returns:

list of IRFNode; tree forest representing current state of the transform.

coef2impulse()[source]

Get map from coefficient IDs dominated by node to lists of corresponding impulses.

Returns:: dict; map from coefficient IDs to lists of corresponding impulses.

coef2terminal()[source]

Get map from coefficient IDs dominated by node to lists of corresponding terminal IRF nodes.

Returns:: dict; map from coefficient IDs to lists of corresponding terminal IRF nodes.

coef_by_rangf()[source]

Get map from random grouping factor names to associated coefficient IDs dominated by node.

Returns:: dict; map from random grouping factor names to associated coefficient IDs.

coef_id()[source]

Get coefficient ID for this node.

Returns:: str or None; coefficient ID, or None if non-terminal.

coef_names()[source]

Get list of names of coefficients dominated by node.

Returns:: list of str; names of coefficients dominated by node.

depth()[source]

Get depth of node in tree.

Returns:: int; depth
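
Several methods in this class are described in terms of tree structure: a node has a parent, may have children, has a depth, and "dominates" the nodes beneath it. The toy class below illustrates those notions only. It is a hypothetical stand-in and not cdr.formula.IRFNode; the attribute names other than p, the terminal_names helper, and the convention that the root has depth 0 are assumptions.

```python
class Node:
    """Toy tree node (hypothetical; not cdr.formula.IRFNode)."""

    def __init__(self, name):
        self.name = name
        self.p = None          # parent node, or None for the root
        self.children = []

    def add_child(self, t):
        """Attach t beneath this node and return it with its parent set."""
        t.p = self
        self.children.append(t)
        return t

    def depth(self):
        """Number of edges between this node and the root (assumed convention)."""
        return 0 if self.p is None else 1 + self.p.depth()

    def terminal_names(self):
        """Names of all terminal nodes dominated by this node."""
        if not self.children:
            return [self.name]
        names = []
        for child in self.children:
            names.extend(child.terminal_names())
        return names


root = Node("ROOT")
irf = root.add_child(Node("irf_a"))
leaf1 = irf.add_child(Node("x1"))
leaf2 = irf.add_child(Node("x2"))

leaf_depth = leaf1.depth()
dominated = root.terminal_names()
```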

fixed_coef_names()[source]

Get list of names of fixed coefficients dominated by node.

Returns:: list of str; names of fixed coefficients dominated by node.

fixed_interaction_names()[source]

Get list of names of fixed interactions dominated by node.

Returns:: list of str; names of fixed interactions dominated by node.

formula_terms()[source]

Return data structure representing formula terms dominated by node, grouped by random grouping factor. Key None represents the fixed portion of the model (no random grouping factor).

Returns:: dict; map from random grouping factors to data structure representing formula terms. Data structure contains 2 fields, 'impulses' containing impulses and 'irf' containing IRF Nodes.

has_coefficient(rangf)[source]

Report whether rangf has any coefficients in this subtree

Parameters:: rangf – Random grouping factor
Returns:: bool; whether rangf has any coefficients in this subtree

has_composed_irf()[source]

Check whether node dominates any IRF compositions.

Returns:: bool; whether node dominates any IRF compositions.

has_irf(rangf)[source]

Report whether rangf has any IRFs in this subtree

Parameters:: rangf – Random grouping factor
Returns:: bool; whether rangf has any IRFs in this subtree

impulse2coef()[source]

Get map from impulses dominated by node to lists of corresponding coefficient IDs.

Returns:: dict; map from impulses to lists of corresponding coefficient IDs.

impulse2terminal()[source]

Get map from impulses dominated by node to lists of corresponding terminal IRF nodes.

Returns:: dict; map from impulses to lists of corresponding terminal IRF nodes.

impulse_names(include_interactions=False, include_nn=False, include_nn_inputs=True)[source]

Get list of names of impulses dominated by node.

Parameters:

include_interactions – bool; whether to return impulses defined by interaction terms.
include_nn – bool; whether to return NN transformations of impulses.
include_nn_inputs – bool; whether to return input impulses to NN transformations.

Returns:

list of str; names of impulses dominated by node.

impulse_set(include_interactions=False, include_nn=False, include_nn_inputs=True, out=None)[source]

Get set of impulses dominated by node.

Parameters:

include_interactions – bool; whether to return impulses defined by interaction terms.
include_nn – bool; whether to return NN transformations of impulses.
include_nn_inputs – bool; whether to return input impulses to NN transformations.
out – set or None; initial set to modify.

Returns:: set of Impulse; impulses dominated by node.

impulses(include_interactions=False, include_nn=False, include_nn_inputs=True)[source]

Get alphabetically sorted list of impulses dominated by node.

Parameters:

include_interactions – bool; whether to return impulses defined by interaction terms.
include_nn – bool; whether to return NN transformations of impulses.
include_nn_inputs – bool; whether to return input impulses to NN transformations.

Returns:

list of Impulse; impulses dominated by node.

impulses_by_name(include_interactions=False, include_nn=False, include_nn_inputs=True)[source]

Get dictionary mapping names of impulses dominated by node to their corresponding impulses.

Parameters:

include_interactions – bool; whether to return impulses defined by interaction terms.
include_nn – bool; whether to return NN transformations of impulses.
include_nn_inputs – bool; whether to return input impulses to NN transformations.

Returns:

dict; map from names of impulses dominated by node to their corresponding Impulse objects.

impulses_from_response_interaction()[source]

Get list of any impulses from response interactions associated with this node.

Returns:: list of Impulse; impulses dominated by node.

interaction_by_rangf()[source]

Get map from random grouping factor names to associated interaction IDs dominated by node.

Returns:: dict; map from random grouping factor names to associated interaction IDs.

interaction_names()[source]

Get list of names of interactions dominated by node.

Returns:: list of str; names of interactions dominated by node.

interactions()[source]

Return list of all response interactions used in this subtree, sorted by name.

Returns:: list of ResponseInteraction

interactions2inputs()[source]

Get map from IDs of ResponseInteractions dominated by node to lists of IDs of their inputs.

Returns:: dict; map from IDs of ResponseInteractions nodes to lists of their inputs.

irf_by_rangf()[source]

Get map from random grouping factor names to IDs of associated IRF nodes dominated by node.

Returns:: dict; map from random grouping factor names to IDs of associated IRF nodes.

irf_id()[source]

Get IRF ID for this node.

Returns:: str or None; IRF ID, or None if terminal.

irf_to_formula(rangf=None)[source]

Generates a representation of this node’s impulse response kernel in formula string syntax

Parameters:: rangf – random grouping factor for which to generate the stringification (fixed effects if rangf==None).
Returns:: str; formula string representation of node

is_LCG()[source]

Check the non-parametric type of a node’s kernel, or return None if parametric.

Parameters:: family – str; name of IRF family
Returns:: str or None; name of kernel type if non-parametric, else None.

local_name()[source]

Get descriptive name for this node, ignoring its position in the IRF tree.

Returns:: str; name.

name()[source]

Get descriptive name for this node.

Returns:: str; name.

nns_by_key(nns_by_key=None)[source]

Get a dict mapping NN keys to objects associated with them.

Parameters:: nns_by_key – dict or None; dictionary to modify. Empty if None.
Returns:: dict; map from string keys to list of associated IRFNode and/or NNImpulse objects.

node_table()[source]

Get map from names to nodes of all nodes dominated by node (including self).

Returns:: dict; map from names to nodes of all nodes dominated by node.

nonparametric_coef_names()[source]: Get list of names of nonparametric coefficients dominated by node. :return: list of str; names of spline coefficients dominated by node.

static pointers2namemmaps(p)[source]

Get a map from source to transformed IRF node names.

Parameters:: p – dict; map from source to transformed IRF nodes.
Returns:: dict; map from source to transformed IRF node names.

re_transform(X, expansion_map=None)[source]

Generate transformed copy of node with regex-matching predictors in X expanded. Recursive. Returns a tree forest representing the current state of the transform. When run from ROOT, should always return a length-1 list representing a single-tree forest, in which case the transformed tree is accessible as the 0th element.

Parameters:

X – list of pandas tables; input data.
expansion_map – dict; Internal variable. Do not use.

Returns:

list of IRFNode; tree forest representing current state of the transform.

remove_impulses(impulse_ids)[source]

Remove impulses in impulse_ids from the model (both fixed and random effects).

Parameters:: impulse_ids – list of str; impulse ID’s
Returns:: None

supports_non_causal()[source]

Check whether model contains only IRF kernels that lack the causality constraint t >= 0.

Returns:: bool: whether model contains only IRF kernels that lack the causality constraint t >= 0.

terminal()[source]

Check whether node is terminal.

Returns:: bool; whether node is terminal.

terminal2coef()[source]

Get map from IDs of terminal IRF nodes dominated by node to lists of corresponding coefficient IDs.

Returns:: dict; map from IDs of terminal IRF nodes to lists of corresponding coefficient IDs.

terminal2impulse()[source]

Get map from terminal IRF nodes dominated by node to lists of corresponding impulses.

Returns:: dict; map from terminal IRF nodes to lists of corresponding impulses.

terminal_names()[source]

Get list of names of terminal IRF nodes dominated by node.

Returns:: list of str; names of terminal IRF nodes dominated by node.

terminals()[source]

Get list of terminal IRF nodes dominated by node.

Returns:: list of IRFNode; terminal IRF nodes dominated by node.

terminals_by_name()[source]

Get dictionary mapping names of terminal IRF nodes dominated by node to their corresponding nodes.

Returns:: dict; map from node names to nodes

unablate_impulses(impulse_ids)[source]

Insert impulses in impulse_ids into fixed effects (leaving random effects structure unchanged).

Parameters:: impulse_ids – list of str; impulse ID’s
Returns:: None

unary_nonparametric_coef_names()[source]

Get list of names of non-parametric coefficients with no siblings dominated by node. Because unary splines are non-parametric, their coefficients are fixed at 1. Trainable coefficients are therefore perfectly confounded with the spline parameters. Splines dominating multiple coefficients are excepted, since the same kernel shape must be scaled in different ways.

Returns:: list of str; names of unary spline coefficients dominated by node.

class cdr.formula.Impulse(name, ops=None, is_re=False)[source]

Bases: object

Data structure representing an impulse in a CDR model.

Parameters:

name – str; name of impulse
ops – list of str, or None; ops to apply to impulse. If None, no ops.
is_re – bool; whether impulse is a regular expression search pattern

categorical(X)[source]

Checks whether impulse is categorical in a dataset

Parameters:: X – list of pandas tables; data to check.
Returns:: bool; True if impulse is categorical in X, False otherwise.

expand_categorical(X)[source]

Expand any categorical predictors in X into 1-hot columns.

Parameters:: X – list of pandas tables; input data
Returns:: 2-tuple of pandas table, list of Impulse; expanded data, list of expanded Impulse objects

expand_re(X)[source]

Expand any regular expression predictors in X into a sequence of all matching columns.

Parameters:: X – list of pandas tables; input data
Returns:: list of Impulse; list of expanded Impulse objects
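The idea behind regular-expression expansion can be sketched without pandas. The sketch below is not the cdr implementation: expand_pattern is a hypothetical helper, each table is represented only by its list of column names, and full-string matching is an assumption.

```python
import re

def expand_pattern(pattern, tables):
    """Return sorted column names, across all tables, that fully match pattern.

    Hypothetical helper for illustration; each table is represented here
    only by its list of column names.
    """
    matcher = re.compile(pattern)
    matched = set()
    for columns in tables:
        for col in columns:
            # Assumption: a column matches only if the whole name matches.
            if matcher.fullmatch(col):
                matched.add(col)
    return sorted(matched)

# Made-up column names spread over two predictor tables.
tables = [["time", "surprisal_1", "surprisal_2"], ["time", "rate", "surprisal_3"]]
expanded = expand_pattern(r"surprisal_\d+", tables)
print(expanded)
```

Each matching column name would then correspond to one expanded Impulse.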

get_matcher()[source]

Return a compiled regex matcher to compare to data columns

Returns:: re object

is_nn_impulse()[source]

Type check for whether impulse represents an NN transformation of impulses.

Returns:: False

name()[source]

Get name of term.

Returns:: str; name.

class cdr.formula.ImpulseInteraction(impulses, ops=None)[source]

Bases: object

Data structure representing an interaction of impulse-aligned variables (impulses) in a CDR model.

Parameters:

impulses – list of Impulse; impulses to interact.
ops – list of str, or None; ops to apply to interaction. If None, no ops.

expand_categorical(X)[source]

Expand any categorical predictors in X into 1-hot columns.

Parameters:: X – list of pandas tables; input data.
Returns:: 3-tuple of pandas table, list of ImpulseInteraction, list of list of Impulse; expanded data, list of expanded ImpulseInteraction objects, list of lists of expanded Impulse objects, one list for each interaction.

expand_re(X)[source]

Expand any regular expression predictors in X into a sequence of all matching columns.

Parameters:: X – list of pandas tables; input data
Returns:: 2-tuple of list of ImpulseInteraction, list of list of Impulse; list of expanded ImpulseInteraction objects, list of lists of expanded Impulse objects, one list for each interaction.

impulses()[source]

Get list of impulses dominated by interaction.

Returns:: list of Impulse; impulses dominated by interaction.

is_nn_impulse()[source]

Type check for whether impulse represents an NN transformation of impulses.

Returns:: False

name()[source]

Get name of interaction impulse.

Returns:: str; name.

class cdr.formula.NN(nodes, nn_type, rangf=None, nn_key=None, nn_config=None)[source]

Bases: object

Data structure representing a neural network within a CDR model.

Parameters:

nodes – list of IRFNode, and/or NNImpulse objects; nodes associated with this NN
nn_type – str; name of NN type ('irf' or 'impulse').
rangf – str or list of str; random grouping factors for which to build random effects for this NN.
nn_key – str or None; key uniquely identifying this NN node (constructed automatically if None).
nn_config – dict or None; map of NN config fields to their values for this NN node.

all_impulse_names()[source]

Get list of all impulse names associated with this NN component.

Returns:: list of str: All impulse names associated with this NN component.

input_impulse_names()[source]

Get list of input impulse names associated with this NN component.

Returns:: list of str: Input impulse names associated with this NN component.

name()[source]

Get name of NN.

Returns:: str; name.

output_impulse_names()[source]

Get list of output impulse names associated with this NN component (NN IRF only).

Returns:: list of str: Output impulse names associated with this NN component.

class cdr.formula.NNImpulse(impulses, impulses_as_inputs=True, inputs_to_add=None, inputs_to_drop=None, nn_config=None)[source]

Bases: object

Data structure representing a feedforward neural network transform of one or more impulses in a CDR model.

Parameters:

impulses – list of Impulse; impulses to transform.
impulses_as_inputs – bool; whether to include impulses as NN inputs.
inputs_to_add – list of Impulse or None; extra impulses to add to NN input.
inputs_to_drop – list of Impulse or None; output impulses to drop from NN input.
nn_config – dict or None; map of NN config fields to their values for this NN node.

expand_categorical(X)[source]

Expand any categorical predictors in X into 1-hot columns.

Parameters:: X – list of pandas tables; input data.
Returns:: 3-tuple of pandas table, list of NNImpulse, list of list of Impulse; expanded data, list of expanded NNImpulse objects, list of lists of expanded Impulse objects, one list for each interaction.

expand_re(X)[source]

Expand any regular expression predictors in X into a sequence of all matching columns.

Parameters:: X – list of pandas tables; input data
Returns:: 2-tuple of list of NNImpulse, list of list of Impulse; list of expanded NNImpulse objects, list of lists of expanded Impulse objects, one list for each interaction.

impulses()[source]

Get list of output impulses dominated by NN.

Returns:: list of Impulse; impulses dominated by NN.

is_nn_impulse()[source]

Type check for whether impulse represents an NN transformation of impulses.

Returns:: True

name()[source]

Get name of NN impulse.

Returns:: str; name.

class cdr.formula.ResponseInteraction(responses, rangf=None)[source]

Bases: object

Data structure representing an interaction of response-aligned variables (containing at least one IRF-convolved impulse) in a CDR model.

Parameters:

responses – list of terminal IRFNode, Impulse, and/or ImpulseInteraction objects; responses to interact.
rangf – str or list of str; random grouping factors for which to build random effects for this interaction.

add_rangf(rangf)[source]

Add random grouping factor name to this interaction.

Parameters:: rangf – str; random grouping factor name
Returns:: None

contains_member(x)[source]

Check if object is a member of the set of responses belonging to this interaction

Parameters:: x – IRFNode, Impulse, and/or ImpulseInteraction object; object to check.
Returns:: bool; whether x is a member of the set of responses

dirac_delta_responses()[source]

Get list of response-aligned Dirac delta variables dominated by interaction.

Returns:: list of Impulse and/or ImpulseInteraction objects; Dirac delta variables dominated by interaction.

irf_responses()[source]

Get list of IRFs dominated by interaction.

Returns:: list of IRFNode objects; terminal IRFs dominated by interaction.

name()[source]

Get name of interaction impulse.

Returns:: str; name.

nn_impulse_responses()[source]

Get list of NN impulse terms dominated by interaction.

Returns:: list of NNImpulse objects; NN impulse terms dominated by interaction.

replace(old, new)[source]

Replace an old input with a new one

Parameters:

old – IRFNode, Impulse, and/or ImpulseInteraction object; response to remove.
new – IRFNode, Impulse, and/or ImpulseInteraction object; response to add.

Returns:

None

responses()[source]

Get list of variables dominated by interaction.

Returns:: list of IRFNode, Impulse, and/or ImpulseInteraction objects; impulses dominated by interaction.

cdr.formula.pythonize_string(s)[source]

Convert string to valid python variable name

Parameters:: s – str; source string
Returns:: str; pythonized string
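One plausible way to perform this kind of conversion is sketched below. It is not the rule set used by cdr.formula.pythonize_string: to_identifier is a hypothetical stand-in and the replacement rules are assumptions.

```python
import keyword
import re

def to_identifier(s):
    """Hypothetical stand-in: map an arbitrary string to a valid Python identifier."""
    # Replace every character that is not an ASCII letter, digit, or underscore.
    out = re.sub(r"\W", "_", s, flags=re.ASCII)
    # Identifiers cannot be empty or start with a digit.
    if not out or out[0].isdigit():
        out = "_" + out
    # Avoid reserved words such as "class" or "for".
    if keyword.iskeyword(out):
        out += "_"
    return out

name = to_identifier("Gamma(alpha=2, beta=1)")
print(name)
```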

cdr.formula.standardize_formula_string(s)[source]

Standardize a formula string, removing notational variation. IRF specifications C(...) are sorted alphabetically by the IRF call name e.g. Gamma(). The order of impulses within an IRF specification is preserved.

Parameters:: s – str; the formula string to be standardized
Returns:: str; standardization of s

cdr.io module

cdr.io.read_tabular_data(X_paths, Y_paths, series_ids, categorical_columns=None, sep=' ', verbose=True)[source]

Read impulse and response data into pandas dataframes and perform basic pre-processing.

Parameters:

X_paths – str or list of str; path(s) to impulse (predictor) data (multiple tables are concatenated). Each path may also be a ;-delimited list of paths to files containing predictors with different timestamps, where the predictors in each file are all timestamped with respect to the same reference point.
Y_paths – str or list of str; path(s) to response data (multiple tables are concatenated). Each path may also be a ;-delimited list of paths to files containing different response variables with different timestamps, where the response variables in each file are all timestamped with respect to the same reference point.
series_ids – list of str; column names whose jointly unique values define unique time series.
categorical_columns – list of str; column names that should be treated as categorical.
sep – str; string representation of field delimiter in input data.
verbose – bool; whether to log progress to stderr.

Returns:

2-tuple of list(pandas DataFrame); (impulse data, response data). X and Y each have one element for each dataset in X_paths/Y_paths, each containing the column-wise concatenation of all column files in the path.
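The nesting described above (one dataset per path entry, one or more ;-delimited files per dataset) can be illustrated with a small sketch. split_path_spec and the file names are hypothetical, and the real function also reads and merges the files.

```python
def split_path_spec(paths):
    """Hypothetical helper: turn a path spec into a list of per-dataset file lists."""
    # A bare string is treated as a single dataset.
    if isinstance(paths, str):
        paths = [paths]
    # Each entry may itself be a ;-delimited list of files.
    return [[p.strip() for p in spec.split(";") if p.strip()] for spec in paths]

# Two datasets: the first is split over two files, the second is one file.
specs = split_path_spec(["X_a.csv;X_b.csv", "X_c.csv"])
print(specs)
```

Under this reading, the returned impulse list would have two elements here, one per outer entry.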

cdr.kwargs module

class cdr.kwargs.Kwarg(key, default_value, dtypes, descr, aliases=None, default_value_cdrnn='same', suppress=False)[source]

Bases: object

Data structure for storing keyword arguments and their docstrings.

Parameters:

key – str; Key
default_value – Any; Default value
dtypes – list or class; List of classes or single class. Members can also be specific required values, either None or values of type str.
descr – str; Description of kwarg
default_value_cdrnn – Any; Default value for CDRNN if distinct from CDR. If 'same', CDRNN uses default_value.
suppress – bool; Whether to print documentation for this kwarg. Useful for hiding deprecated or little-used kwargs in order to simplify autodoc output.

dtypes_str()[source]

String representation of dtypes permitted for kwarg.

Returns:: str; dtypes string.

get_type_name(x)[source]

String representation of name of a dtype

Parameters:: x – dtype; the dtype to name.
Returns:: str; name of dtype.

in_settings(settings)[source]

Check whether kwarg is specified in a settings object parsed from a config file.

Parameters:: settings – settings from a ConfigParser object.
Returns:: bool; whether kwarg is found in settings.

kwarg_from_config(settings, is_cdrnn=False)[source]

Given a settings object parsed from a config file, return value of kwarg cast to appropriate dtype. If missing from settings, return default.

Parameters:

settings – settings from a ConfigParser object or dict.
is_cdrnn – bool; whether this is for a CDRNN model.

Returns:

value of kwarg
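The lookup-with-default behavior can be sketched with the standard configparser module. This is an illustration under assumptions, not the cdr code path: value_from_settings, the section name, and the keys are made up.

```python
import configparser

def value_from_settings(settings, key, default, cast):
    """Hypothetical helper: return settings[key] cast with `cast`, else default."""
    if key in settings:
        return cast(settings[key])
    return default

# Illustrative config text; section and key names are assumptions.
parser = configparser.ConfigParser()
parser.read_string("""
[cdr_settings]
n_iter = 500
""")
settings = parser["cdr_settings"]

# Present in the config: parsed from the string "500".
n_iter = value_from_settings(settings, "n_iter", 1000, int)
# Missing from the config: falls back to the default.
learning_rate = value_from_settings(settings, "learning_rate", 0.001, float)
print(n_iter, learning_rate)
```

The same helper works with a plain dict, mirroring the "ConfigParser object or dict" input noted above.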

static type_comparator(a, b)[source]

Types precede strings, which precede None

Parameters:

a – First element
b – Second element

Returns:

-1, 0, or 1, depending on outcome of comparison
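A comparator with this ordering can be plugged into sorted via functools.cmp_to_key. The sketch below re-creates the ordering independently; rank and compare are hypothetical names, not the library's code.

```python
from functools import cmp_to_key

def rank(x):
    """Rank used for ordering: types first, then strings, then None."""
    if isinstance(x, type):
        return 0
    if isinstance(x, str):
        return 1
    if x is None:
        return 2
    raise TypeError("unsupported dtype entry: %r" % (x,))

def compare(a, b):
    """Return -1, 0, or 1 following the ordering above (a sketch, not cdr's code)."""
    ra, rb = rank(a), rank(b)
    return (ra > rb) - (ra < rb)

# A mixed list of permitted dtypes and required values.
dtypes = [None, "same", int, "auto", float]
ordered = sorted(dtypes, key=cmp_to_key(compare))
print(ordered)
```

Because sorted is stable, entries with the same rank keep their original relative order.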

cdr.kwargs.cdr_kwarg_docstring()[source]

Generate docstring snippet summarizing all CDR kwargs, dtypes, and defaults.

Returns:: str; docstring snippet

cdr.kwargs.docstring_from_kwarg(kwarg)[source]

Generate docstring from CDR keyword argument object.

Parameters:: kwarg – Keyword argument object.
Returns:: str; docstring.

cdr.kwargs.plot_kwarg_docstring()[source]

Generate docstring snippet summarizing all plotting kwargs, dtypes, and defaults.

Returns:: str; docstring snippet

cdr.model module

cdr.opt module

cdr.plot module

cdr.plot.plot_heatmap(m, row_names, col_names, outdir='.', filename='eigenvectors.png', plot_x_inches=7, plot_y_inches=5, cmap='Blues')[source]

Plot a heatmap. Used in CDR for visualizing eigenvector matrices in principal components models.

Parameters:

m – 2D numpy array; source data for plot.
row_names – list of str; row names.
col_names – list of str; column names.
outdir – str; output directory.
filename – str; filename.
plot_x_inches – float; width of plot in inches.
plot_y_inches – float; height of plot in inches.
cmap – str; name of matplotlib cmap object (determines colors of plotted IRF).

Returns:

None

cdr.plot.plot_irf(plot_x, plot_y, irf_names, lq=None, uq=None, density=None, sort_names=True, prop_cycle_length=None, prop_cycle_map=None, outdir='.', filename='irf_plot.png', irf_name_map=None, plot_x_inches=6, plot_y_inches=4, ylim=None, cmap='gist_rainbow', legend=True, xlab=None, ylab=None, use_line_markers=False, use_grid=True, transparent_background=False, dpi=300, dump_source=False)[source]

Plot impulse response functions.

Parameters:

plot_x – numpy array with shape (T,1); time points for which to plot the response. For example, if the plots contain 1000 points from 0s to 10s, plot_x could be generated as np.linspace(0, 10, 1000).
plot_y – numpy array with shape (T, N); response of each IRF at each time point.
irf_names – list of str; CDR ID’s of IRFs in the same order as they appear in axis 1 of plot_y.
lq – numpy array with shape (T, N), or None; lower bound of credible interval for each time point. If None, no credible interval will be plotted.
uq – numpy array with shape (T, N), or None; upper bound of credible interval for each time point. If None, no credible interval will be plotted.
sort_names – bool; alphabetically sort IRF names.
prop_cycle_length – int or None; Length of plotting properties cycle (defines step size in the color map). If None, inferred from irf_names.
prop_cycle_map – list of int, or None; Integer indices to use in the properties cycle for each entry in irf_names. If None, indices are automatically assigned.
outdir – str; output directory.
filename – str; filename.
irf_name_map – dict of str to str; map from CDR IRF ID’s to more readable names to appear in legend. Any plotted IRF whose ID is not found in irf_name_map will be represented with the CDR IRF ID.
plot_x_inches – float; width of plot in inches.
plot_y_inches – float; height of plot in inches.
ylim – 2-element tuple or list; (lower_bound, upper_bound) to use for y axis. If None, automatically inferred.
cmap – str; name of matplotlib cmap object (determines colors of plotted IRF).
legend – bool; include a legend.
xlab – str or None; x-axis label. If None, no label.
ylab – str or None; y-axis label. If None, no label.
use_line_markers – bool; add markers to IRF lines.
use_grid – bool; whether to show a background grid.
transparent_background – bool; use a transparent background. If False, uses a white background.
dpi – int; dots per inch.
dump_source – bool; Whether to dump the plot source array to a csv file.

Returns:

None

cdr.plot.plot_irf_as_heatmap(plot_x, plot_y, irf_names, sort_names=True, outdir='.', filename='irf_hm.png', irf_name_map=None, plot_x_inches=6, plot_y_inches=4, ylim=None, cmap='seismic', xlab=None, ylab=None, transparent_background=False, dpi=300, dump_source=False)[source]

Plot impulse response functions as a heatmap.

Parameters:

plot_x – numpy array with shape (T,1); time points for which to plot the response. For example, if the plots contain 1000 points from 0s to 10s, plot_x could be generated as np.linspace(0, 10, 1000).
plot_y – numpy array with shape (T, N); response of each IRF at each time point.
irf_names – list of str; CDR ID’s of IRFs in the same order as they appear in axis 1 of plot_y.
sort_names – bool; alphabetically sort IRF names.
outdir – str; output directory.
filename – str; filename.
irf_name_map – dict of str to str; map from CDR IRF ID’s to more readable names to appear in legend. Any plotted IRF whose ID is not found in irf_name_map will be represented with the CDR IRF ID.
plot_x_inches – float; width of plot in inches.
plot_y_inches – float; height of plot in inches.
ylim – 2-element tuple or list; (lower_bound, upper_bound) to use for y axis. If None, automatically inferred.
cmap – str; name of matplotlib cmap object (determines colors of plotted IRF).
xlab – str or None; x-axis label. If None, no label.
ylab – str or None; y-axis label. If None, no label.
transparent_background – bool; use a transparent background. If False, uses a white background.
dpi – int; dots per inch.
dump_source – bool; Whether to dump the plot source array to a csv file.

Returns:

None

cdr.plot.plot_qq(theoretical, actual, actual_color='royalblue', expected_color='firebrick', outdir='.', filename='qq_plot.png', plot_x_inches=6, plot_y_inches=4, legend=True, xlab='Theoretical', ylab='Empirical', ticks=True, as_lines=False, transparent_background=False, dpi=300)[source]

Generate quantile-quantile plot.

Parameters:

theoretical – numpy array with shape (T,); theoretical error quantiles.
actual – numpy array with shape (T,); empirical errors.
actual_color – str; color for actual values.
expected_color – str; color for expected values.
outdir – str; output directory.
filename – str; filename.
plot_x_inches – float; width of plot in inches.
plot_y_inches – float; height of plot in inches.
legend – bool; include a legend.
xlab – str or None; x-axis label. If None, no label.
ylab – str or None; y-axis label. If None, no label.
as_lines – bool; render QQ plot using lines. Otherwise, use points.
transparent_background – bool; use a transparent background. If False, uses a white background.
dpi – int; dots per inch.

Returns:

None

cdr.plot.plot_surface(x, y, z, lq=None, uq=None, density=None, bounds_as_surface=False, outdir='.', filename='surface.png', irf_name_map=None, plot_x_inches=6, plot_y_inches=4, xlim=None, ylim=None, zlim=None, plot_type='wireframe', cmap='coolwarm', xlab=None, ylab=None, zlab='Response', title=None, transparent_background=False, dpi=300, dump_source=False)[source]

Plot an IRF or interaction surface.

Parameters:

x – numpy array with shape (M,N); x locations for each plot point, copied N times.
y – numpy array with shape (M,N); y locations for each plot point, copied M times.
z – numpy array with shape (M,N); z locations for each plot point.
lq – numpy array with shape (M,N), or None; lower bound of credible interval for each plot point. If None, no credible interval will be plotted.
uq – numpy array with shape (M,N), or None; upper bound of credible interval for each plot point. If None, no credible interval will be plotted.
bounds_as_surface – bool; whether to plot interval bounds using additional surfaces. If False, bounds are plotted with vertical error bars instead. Ignored if lq, uq are None.
outdir – str; output directory.
filename – str; filename.
irf_name_map – dict of str to str; map from CDR IRF ID’s to more readable names to appear in legend. Any plotted IRF whose ID is not found in irf_name_map will be represented with the CDR IRF ID.
plot_x_inches – float; width of plot in inches.
plot_y_inches – float; height of plot in inches.
xlim – 2-element tuple or list or None; (lower_bound, upper_bound) to use for x axis. If None, automatically inferred.
ylim – 2-element tuple or list or None; (lower_bound, upper_bound) to use for y axis. If None, automatically inferred.
zlim – 2-element tuple or list or None; (lower_bound, upper_bound) to use for z axis. If None, automatically inferred.
plot_type – str; name of plot type to generate. One of ["contour", "surf", "trisurf"].
cmap – str; name of matplotlib cmap object (determines colors of plotted IRF).
legend – bool; include a legend.
xlab – str or None; x-axis label. If None, no label.
ylab – str or None; y-axis label. If None, no label.
zlab – str or None; z-axis label. If None, no label.
use_line_markers – bool; add markers to IRF lines.
transparent_background – bool; use a transparent background. If False, uses a white background.
dpi – int; dots per inch.
dump_source – bool; Whether to dump the plot source array to a csv file.

Returns:

None

cdr.signif module

cdr.signif.correlation_test(y, x1, x2, nested=False, verbose=True)[source]

Perform a parametric test of difference in correlation with observations between two prediction vectors, based on Steiger (1980).

Parameters:

y – numpy vector; observation vector.
x1 – numpy vector; first prediction vector.
x2 – numpy vector; second prediction vector.
nested – bool; assume that the second model is nested within the first.
verbose – bool; report progress logs to standard error.

Returns:

cdr.signif.permutation_test(a, b, n_iter=10000, n_tails=2, mode='loss', agg='mean', nested=False, verbose=True)[source]

Perform a paired permutation test for significance.

Parameters:

a – numpy array; first error/loss/prediction matrix, shape (n_item, n_model).
b – numpy array; second error/loss/prediction matrix, shape (n_item, n_model).
n_iter – int; number of resampling iterations.
n_tails – int; number of tails.
mode – str; one of ["mse", "loglik"], the type of error used (SE’s are averaged while loglik’s are summed).
agg – str; aggregation function over ensemble components. E.g., 'mean', 'median', 'min', 'max'.
nested – bool; assume that the second model is nested within the first.
verbose – bool; report progress logs to standard error.

Returns:
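The general technique of a paired permutation test can be sketched in pure Python. This is a simplified illustration, not the cdr implementation: it assumes 1-D per-item losses, a mean statistic, and two tails, and paired_permutation_test is a hypothetical name.

```python
import random

def paired_permutation_test(a, b, n_iter=10000, seed=0):
    """Two-tailed paired permutation test on the difference of mean losses.

    Illustrative only: handles 1-D inputs and a mean statistic.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    rng = random.Random(seed)
    n = len(a)
    diffs = [x - y for x, y in zip(a, b)]
    observed = sum(diffs) / n
    hits = 0
    for _ in range(n_iter):
        # Swapping a pair is equivalent to flipping the sign of its difference.
        resampled = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(resampled) >= abs(observed):
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    p_value = (hits + 1) / (n_iter + 1)
    return observed, p_value

# Made-up per-item losses for two models on the same eight items.
a = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 1.2, 1.1]
b = [0.5, 0.6, 0.4, 0.7, 0.6, 0.5, 0.6, 0.5]
diff, p = paired_permutation_test(a, b, n_iter=2000)
print(diff, p)
```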

cdr.synth module

class cdr.synth.SyntheticModel(n_pred, irf_name, irf_params=None, coefs=None, fn=None, interactions=False, ranef_range=None, n_ranef_levels=None)[source]

Bases: object

A data structure representing a synthetic “true” model for empirical validation of CDR fits. Contains a randomly generated set of IRFs that can be used to convolve data, and provides methods for sampling data with particular structure and convolving it with the true IRFs in order to generate a response vector.

Parameters:

n_pred – int; Number of predictors in the synthetic model.
irf_name – str; Name of IRF kernel to use. One of ['Exp', 'Normal', 'Gamma', 'ShiftedGamma'].
irf_params – dict or None; Dictionary of IRF parameters to use, with parameter names as keys and numeric arrays as values. Values must each have n_pred cells. If None, parameter values will be randomly sampled.
coefs – numpy array or None; Vector of coefficients to use, where len(coefs) == n_pred. If None, coefficients will be randomly sampled.
fn – str or None; Effect shape to use. One of ['quadratic', 'exp', 'logmod', 'linear']. If None, linear effects.
interactions – bool; Whether there are randomly sampled pairwise interactions (same bounds as those used for coefs).
ranef_range – float or None; Maximum magnitude of simulated random effects. If 0 or None, no random effects.
n_ranef_levels – int or None; Number of random effects levels. If 0 or None, no random effects.

convolve(X, t_X, t_y, history_length=None, err_sd=None, allow_instantaneous=True, ranef_level=None, verbose=True)[source]

Convolve data using the model’s IRFs.

Parameters:

X – numpy array; 2-D array of predictors.
t_X – numpy array; 1-D vector of predictor timestamps.
t_y – numpy array; 1-D vector of response timestamps.
history_length – int or None; Drop preceding events more than history_length steps into the past. If None, no history clipping.
err_sd – float or None; Standard deviation of Gaussian noise to inject into responses. If None, use the empirical standard deviation of the response vector.
allow_instantaneous – bool; Whether to compute responses when t==0.
ranef_level – str or None; Random effects level to use (or None to use population-level effect)
verbose – bool; Verbosity.

Returns:

(2-D numpy array, 1-D numpy array); Matrix of convolved predictors, vector of responses
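The roles of history_length and allow_instantaneous can be illustrated with a single-predictor sketch in pure Python. It assumes an exponential kernel, sorted timestamps, and no noise or random effects; exp_irf and convolve_one are hypothetical names rather than part of cdr.synth.

```python
import math

def exp_irf(t, rate):
    """Exponential kernel rate * exp(-rate * t), defined for t >= 0."""
    return rate * math.exp(-rate * t)

def convolve_one(x, t_X, t_y, rate, history_length=None, allow_instantaneous=True):
    """Convolve one predictor (values x at times t_X) to response times t_y.

    Assumes t_X is sorted ascending. Illustrative only.
    """
    out = []
    for ty in t_y:
        # Indices of predictor events that may influence the response at ty.
        if allow_instantaneous:
            idx = [i for i, tx in enumerate(t_X) if tx <= ty]
        else:
            idx = [i for i, tx in enumerate(t_X) if tx < ty]
        if history_length is not None:
            # Keep only the most recent history_length events.
            idx = idx[-history_length:] if history_length > 0 else []
        out.append(sum(x[i] * exp_irf(ty - t_X[i], rate) for i in idx))
    return out

# Three unit impulses at t = 0, 1, 2 and one response query at t = 2.
x = [1.0, 1.0, 1.0]
t_X = [0.0, 1.0, 2.0]
t_y = [2.0]
full = convolve_one(x, t_X, t_y, rate=1.0)
clipped = convolve_one(x, t_X, t_y, rate=1.0, history_length=1)
strict = convolve_one(x, t_X, t_y, rate=1.0, allow_instantaneous=False)
print(full, clipped, strict)
```

With history_length=1 only the most recent event contributes, and with allow_instantaneous=False the event at t == 0 offset from the response is skipped.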

convolve_v2(X, t_X, t_y, err_sd=None, allow_instantaneous=True, verbose=True)[source]

Convolve data using the model’s IRFs. Alternate memory-intensive implementation that is faster for small arrays but can exhaust resources for large ones.

Parameters:

X – numpy array; 2-D array of predictors.
t_X – numpy array; 1-D vector of predictor timestamps.
t_y – numpy array; 1-D vector of response timestamps.
err_sd – float; Standard deviation of Gaussian noise to inject into responses.
allow_instantaneous – bool; Whether to compute responses when t==0.
verbose – bool; Verbosity.

Returns:

(2-D numpy array, 1-D numpy array); Matrix of convolved predictors, vector of responses

get_curves(n_time_units=None, n_time_points=None, ranef_level=None)[source]

Extract response curves as an array.

Parameters:

n_time_units – float; Number of units of time over which to extract curves.
n_time_points – int; Number of samples to extract for each curve (resolution of curve)
ranef_level – str or None; Random effects level to use (or None to use population-level effect)

Returns:

numpy array; 2-D numpy array with shape [T, K], where T is n_time_points and K is the number of predictors in the model.

irf(x, coefs=False, ranef_level=None)[source]

Computes the values of the model’s IRFs elementwise over a vector of timepoints.

Parameters:

x – numpy array; 1-D array with shape [N] containing timepoints at which to query the IRFs.
coefs – bool; Whether to rescale responses by coefficients
ranef_level – str or None; Random effects level to use (or None to use population-level effect)

Returns:

numpy array; 2-D array with shape [N, K] containing values of the model’s K IRFs evaluated at the timepoints in x.

plot_irf(n_time_units=None, n_time_points=None, dir='.', filename='synth_irf.png', plot_x_inches=6, plot_y_inches=4, cmap='gist_rainbow', legend=False, xlab=None, ylab=None, use_line_markers=False, transparent_background=False)[source]

Plot impulse response functions.

Parameters:

n_time_units – float; number of time units to use for plotting.
n_time_points – int; number of points to use for plotting.
dir – str; output directory.
filename – str; filename.
plot_x_inches – float; width of plot in inches.
plot_y_inches – float; height of plot in inches.
cmap – str; name of matplotlib cmap object (determines colors of plotted IRF).
legend – bool; include a legend.
xlab – str or None; x-axis label. If None, no label.
ylab – str or None; y-axis label. If None, no label.
use_line_markers – bool; add markers to IRF lines.
transparent_background – bool; use a transparent background. If False, uses a white background.

Returns:

None

sample_data(m, n=None, X_interval=None, y_interval=None, rho=None, align_X_y=True)[source]

Samples synthetic predictors and time vectors.

Parameters:

m – int; Number of predictors.
n – int; Number of response query points.
X_interval – str, float, list, tuple, or None; Predictor interval model. If None, predictor offsets are randomly sampled from an exponential distribution with parameter 1. If float, predictor offsets are evenly spaced with interval X_interval. If list or tuple, the first element is the name of a scipy distribution to use for sampling offsets, and all remaining elements are positional arguments to that distribution.
y_interval – str, float, list, tuple, or None; Response interval model. If None, response offsets are randomly sampled from an exponential distribution with parameter 1. If float, response offsets are evenly spaced with interval y_interval. If list or tuple, the first element is the name of a scipy distribution to use for sampling offsets, and all remaining elements are positional arguments to that distribution.
rho – float; Level of pairwise correlation between predictors.
align_X_y – bool; Whether predictors and responses are required to be sampled at the same points in time.

Returns:

(2-D numpy array, 1-D numpy array, 1-D numpy array); Matrix of predictors, vector of predictor timestamps, vector of response timestamps
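
The two simplest interval models described above (None and float) can be sketched as follows: timestamps are the cumulative sum of per-step offsets, which are either drawn from an exponential distribution with parameter 1 or held constant. The helper name sample_timestamps is hypothetical, and the scipy-distribution case is omitted.

```python
import numpy as np

def sample_timestamps(n, interval=None, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    if interval is None:
        # Random offsets from an exponential distribution with parameter 1.
        offsets = rng.exponential(scale=1.0, size=n)
    else:
        # Evenly spaced offsets with a fixed interval.
        offsets = np.full(n, float(interval))
    # Timestamps are the running total of the offsets.
    return np.cumsum(offsets)

t_random = sample_timestamps(5, rng=np.random.default_rng(0))
t_even = sample_timestamps(5, interval=0.5)
```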

cdr.util module

cdr.util.filter_models(names, filters=None, cdr_only=False)[source]

Return models contained in names that are permitted by filters, preserving order in which filters were matched. Filters can be ordinary strings, regular expression objects, or string representations of regular expressions. For a regex filter to be considered a match, the expression must entirely match the name. If filters is zero-length, returns names.

Parameters:

names – list of str; pool of model names to filter.
filters – list of {str, SRE_Pattern} or None; filters to apply in order. If None, no additional filters.
cdr_only – bool; if True, only returns CDR models. If False, returns all models admitted by filters.

Returns:

list of str; names in names that pass at least one filter, or all of names if no filters are applied.

cdr.util.filter_names(names, filters)[source]

Return elements of names permitted by filters, preserving order in which filters were matched. Filters can be ordinary strings, regular expression objects, or string representations of regular expressions. For a regex filter to be considered a match, the expression must entirely match the name.

Parameters:

names – list of str; pool of names to filter.
filters – list of {str, SRE_Pattern}; filters to apply in order.

Returns:

list of str; names in names that pass at least one filter
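
The matching rule described above (a filter must match the entire name, and results follow the order in which filters matched) can be sketched with the standard re module. This is an illustration under those stated rules, not the package's code. The name filter_names_sketch and the example model names are hypothetical.

```python
import re

def filter_names_sketch(names, filters):
    out = []
    for f in filters:
        # Accept either a compiled pattern or a string to compile.
        pattern = f if isinstance(f, re.Pattern) else re.compile(f)
        for name in names:
            # fullmatch requires the expression to match the whole name.
            if pattern.fullmatch(name) and name not in out:
                out.append(name)
    return out

names = ['CDR_main', 'CDR_ablated', 'LME_baseline']

# Results follow filter order, not the order of names.
selected = filter_names_sketch(names, ['LME.*', 'CDR_main'])

# 'CDR' alone matches no name in full, so nothing is selected.
partial = filter_names_sketch(names, ['CDR'])
```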

cdr.util.get_random_permutation(n)[source]

Draw a random permutation of the integers 0 to n - 1, for use in shuffling arrays of length n. For example, a permutation and its inverse can be generated by calling p, p_inv = get_random_permutation(n). To randomly shuffle a length-n vector x, index it as x[p]. To restore the original order, index the shuffled vector with p_inv.

Parameters:: n – int; length of the permutation (indices run from 0 to n - 1)
Returns:: 2-tuple of numpy arrays; the permutation and its inverse
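
The relationship between a permutation and its inverse can be sketched in plain numpy. This is a stand-in that assumes the behavior described above, not the package's implementation. The name random_permutation_sketch and the optional rng argument are hypothetical.

```python
import numpy as np

def random_permutation_sketch(n, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    p = rng.permutation(n)
    # Build the inverse: position p[i] of p_inv holds i.
    p_inv = np.empty_like(p)
    p_inv[p] = np.arange(n)
    return p, p_inv

x = np.array([10, 20, 30, 40, 50])
p, p_inv = random_permutation_sketch(len(x), rng=np.random.default_rng(0))

shuffled = x[p]              # shuffle
restored = shuffled[p_inv]   # un-shuffle
```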

cdr.util.load_cdr(dir_path, suffix='')[source]

Convenience function for reconstructing a saved CDR object. It first loads metadata from m.obj, then uses that metadata to construct the computation graph. If saved weights are found, they are then loaded into the graph.

Parameters:

dir_path – Path to directory containing the CDR checkpoint files.
suffix – str; file suffix.

Returns:

The loaded CDR instance.

cdr.util.mae(true, preds)[source]

Compute mean absolute error (MAE).

Parameters:

true – True values
preds – Predicted values

Returns:

float; MAE

cdr.util.mse(true, preds)[source]

Compute mean squared error (MSE).

Parameters:

true – True values
preds – Predicted values

Returns:

float; MSE
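
For reference, the standard definitions of MAE and MSE can be written in a few lines of numpy. The function names below are hypothetical stand-ins, not the package's code.

```python
import numpy as np

def mae_sketch(true, preds):
    # Mean of absolute differences.
    true, preds = np.asarray(true, dtype=float), np.asarray(preds, dtype=float)
    return float(np.mean(np.abs(true - preds)))

def mse_sketch(true, preds):
    # Mean of squared differences.
    true, preds = np.asarray(true, dtype=float), np.asarray(preds, dtype=float)
    return float(np.mean((true - preds) ** 2))

true = [1.0, 2.0, 3.0]
preds = [1.0, 3.0, 5.0]

mae_value = mae_sketch(true, preds)   # errors 0, 1, 2
mse_value = mse_sketch(true, preds)   # squared errors 0, 1, 4
```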

cdr.util.names2ix(names, l, dtype=<class 'numpy.int32'>)[source]

Generate a 1-D numpy array of the indices in l that correspond to the names in names.

Parameters:

names – list of str; names to look up in l
l – list of str; list of names from which to extract indices
dtype – numpy dtype object; return dtype

Returns:

numpy array; indices of names in l
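
A minimal sketch of this lookup, assuming every name occurs in l (the real function's handling of missing names is not documented here). The name names2ix_sketch is hypothetical.

```python
import numpy as np

def names2ix_sketch(names, l, dtype=np.int32):
    # Allow a single name as well as a list of names.
    if isinstance(names, str):
        names = [names]
    # list.index returns the first position of each name in l.
    return np.array([l.index(name) for name in names], dtype=dtype)

columns = ['a', 'b', 'c']
ix = names2ix_sketch(['c', 'a'], columns)
```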

cdr.util.nested(model_name_1, model_name_2)[source]

Check whether two CDR models are nested with 1 degree of freedom.

Parameters:

model_name_1 – str; name of first model
model_name_2 – str; name of second model

Returns:

bool; True if models are nested with 1 degree of freedom, False otherwise

cdr.util.pca(X, n_dim=None, dtype=<class 'numpy.float32'>)[source]

Perform principal components analysis on a data table.

Parameters:

X – numpy or pandas array; the input data
n_dim – int or None; maximum number of principal components. If None, all components are retained.
dtype – numpy dtype; return dtype

Returns:

5-tuple of numpy arrays; transformed data, eigenvectors, eigenvalues, input means, and input standard deviations
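
The five return values can be illustrated with a textbook PCA on standardized inputs. This is a sketch under that assumption. Whether the package standardizes, and how it orders or scales the components, is not stated here, and the name pca_sketch is hypothetical.

```python
import numpy as np

def pca_sketch(X, n_dim=None, dtype=np.float32):
    X = np.asarray(X, dtype=float)
    # Standardize each column using its mean and standard deviation.
    means = X.mean(axis=0, keepdims=True)
    sds = X.std(axis=0, keepdims=True)
    Z = (X - means) / sds
    # Eigendecomposition of the (symmetric) covariance matrix.
    eigenvals, eigenvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    # eigh returns ascending eigenvalues; reorder to descending.
    order = np.argsort(eigenvals)[::-1]
    eigenvals, eigenvecs = eigenvals[order], eigenvecs[:, order]
    if n_dim is not None:
        eigenvals, eigenvecs = eigenvals[:n_dim], eigenvecs[:, :n_dim]
    # Project the standardized data onto the retained components.
    X_pc = Z @ eigenvecs
    return tuple(a.astype(dtype) for a in (X_pc, eigenvecs, eigenvals, means, sds))

data = np.random.default_rng(0).normal(size=(50, 3))
X_pc, eigenvecs, eigenvals, means, sds = pca_sketch(data, n_dim=2)
```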

cdr.util.percent_variance_explained(true, preds)[source]

Compute percent variance explained.

Parameters:

true – True values
preds – Predicted values

Returns:

float; percent variance explained
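
One common definition of percent variance explained is 100 * (1 - MSE / Var(true)). The sketch below assumes that definition. The package may differ, for example by clipping negative values, and the name pve_sketch is hypothetical.

```python
import numpy as np

def pve_sketch(true, preds):
    true = np.asarray(true, dtype=float)
    preds = np.asarray(preds, dtype=float)
    # Residual variance relative to the total variance of the true values.
    mse = np.mean((true - preds) ** 2)
    return float(100.0 * (1.0 - mse / np.var(true)))

true = [1.0, 2.0, 3.0, 4.0]

perfect = pve_sketch(true, true)                    # predictions match exactly
baseline = pve_sketch(true, [2.5, 2.5, 2.5, 2.5])   # always predict the mean
```

Under this definition, predicting the mean of the true values explains none of the variance, and a perfect prediction explains all of it.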

cdr.util.reg_name(string)[source]

Standardize a variable name for regularization.

Parameters:: string – str; input string
Returns:: str; transformed string

cdr.util.sn(string)[source]

Compute a valid scope name version of a string.

Parameters:: string – str; input string
Returns:: str; transformed string