CDR Package API

Complete API for all public classes and methods in this package.

cdr.backend module

cdr.backend.CDRNNStateTuple

alias of cdr.backend.AttentionalLSTMDecoderStateTuple

cdr.base module

cdr.config module

class cdr.config.Config(path)[source]

Bases: object

Parses an *.ini file and stores settings needed to define a set of CDR experiments.

Parameters

path – Path to *.ini file

build_cdr_settings(settings, add_defaults=True, global_settings=None, is_cdr=True, is_cdrnn=False)[source]

Given a settings object parsed from a config file, compute CDR parameter dictionary.

Parameters
  • settings – settings from a ConfigParser object.

  • add_defaults – bool; whether to add default settings not explicitly specified in the config.

  • global_settings – dict or None; dictionary of global defaults for parameters missing from settings.

  • is_cdr – bool; whether this is a CDR(NN) model.

  • is_cdrnn – bool; whether this is a CDRNN model.

Returns

dict; dictionary of settings key-value pairs.
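The roles of add_defaults and global_settings can be pictured as a layered lookup: a value is taken from the model's own settings if present, else from the global settings, else from the defaults. The sketch below is only an illustration of that idea using the standard-library configparser; it is not the package's implementation, and the section names, keys, and DEFAULTS table are hypothetical.

```python
import configparser

# Hypothetical defaults and config text, for illustration only.
DEFAULTS = {"n_iter": "1000", "learning_rate": "0.001"}
INI_TEXT = """
[global_settings]
learning_rate = 0.01

[model_example]
n_iter = 500
"""

def build_settings(settings, add_defaults=True, global_settings=None):
    """Resolve each key from model settings, then globals, then defaults."""
    global_settings = global_settings or {}
    keys = set(settings) | set(global_settings)
    if add_defaults:
        keys |= set(DEFAULTS)
    out = {}
    for key in sorted(keys):
        if key in settings:
            out[key] = settings[key]
        elif key in global_settings:
            out[key] = global_settings[key]
        else:
            out[key] = DEFAULTS[key]  # only reachable when add_defaults is True
    return out

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)
resolved = build_settings(
    dict(parser["model_example"]),
    global_settings=dict(parser["global_settings"]),
)
```

Here resolved takes n_iter from the model section and learning_rate from the global section; with add_defaults=False, keys found only in DEFAULTS are left out.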

set_model(model_name=None)[source]

Change internal state to that of model named model_name. Config instances can store settings for multiple models. set_model() determines which model’s settings are returned by Config getter methods.

Parameters

model_name – str; name of target model

Returns

None

class cdr.config.PlotConfig(path=None)[source]

Bases: object

Parses an *.ini file and stores settings needed to define CDR plots

Parameters

path – Path to *.ini file

build_plot_settings(settings)[source]

Given a settings object parsed from a config file, compute plot parameters.

Parameters

settings – settings from a ConfigParser object.

Returns

dict; dictionary of settings key-value pairs.

cdr.data module

cdr.data.add_responses(names, y)[source]

Add response variable(s) to a dataframe, applying any preprocessing required by the formula string.

Parameters
  • names – str or list of str; name(s) of dependent variable(s)

  • y – pandas DataFrame; response data.

Returns

pandas DataFrame; response data with any missing ops applied.

cdr.data.build_CDR_impulse_data(X, first_obs, last_obs, X_in_Y_names=None, X_in_Y=None, impulse_names=None, history_length=128, future_length=0, int_type='int32', float_type='float32')[source]

Construct impulse data arrays in the required format for CDR fitting/evaluation for a single response array.

Parameters
  • X – list of pandas tables; impulse (predictor) data.

  • first_obs – list of index vectors (list, pandas series, or numpy vector) of first observations; the list contains vectors of row indices, one for each element of X, of the first impulse in the time series associated with the response. If None, inferred from Y.

  • last_obs – list of index vectors (list, pandas series, or numpy vector) of last observations; the list contains vectors of row indices, one for each element of X, of the last impulse in the time series associated with the response. If None, inferred from Y.

  • X_in_Y_names – list of str; names of predictors contained in Y rather than X. If None, no such predictors.

  • X_in_Y – pandas DataFrame or None; table of predictors contained in Y rather than X. If None, no such predictors.

  • impulse_names – list of str; names of columns in X to be used as impulses by the model. If None, all columns returned.

  • history_length – int; maximum number of history (backward) observations.

  • future_length – int; maximum number of future (forward) observations.

  • int_type – str; name of int type.

  • float_type – str; name of float type.

Returns

triple of numpy arrays; let N, T, and I respectively be the number of rows in Y, history length, and number of impulse dimensions. Outputs are (1) impulses with shape (N, T, I), (2) impulse timestamps with shape (N, T, I), and (3) impulse mask with shape (N, T, I).

cdr.data.build_CDR_response_data(responses, Y=None, first_obs=None, last_obs=None, Y_time=None, Y_gf=None, X_in_Y_names=None, X_in_Y=None, Y_category_map=None, response_to_df_ix=None, gf_names=None, gf_map=None)[source]

Construct response data arrays in the required format for CDR fitting/evaluation for one or more response arrays.

Parameters
  • responses – list of str; names of columns in Y to be used as responses (dependent variables) by the model.

  • Y – list of pandas tables, or None; response data. If None, does not return a response array.

  • first_obs – list of list of index vectors (list, pandas series, or numpy vector) of first observations, or None; the list contains one element for each response array. Inner lists contain vectors of row indices, one for each element of X, of the first impulse in the time series associated with each response. If None, inferred from Y.

  • last_obs – list of list of index vectors (list, pandas series, or numpy vector) of last observations, or None; the list contains one element for each response array. Inner lists contain vectors of row indices, one for each element of X, of the last impulse in the time series associated with each response. If None, inferred from Y.

  • Y_time – list of response timestamp vectors (list, pandas series, or numpy vector), or None; vector(s) of response timestamps, one for each response array. Needed to timestamp any response-aligned predictors (ignored if none in model).

  • Y_gf – list of pandas DataFrame, or None; data frames containing random grouping factor levels, one for each response array, if applicable.

  • X_in_Y_names – list of str; names of predictors contained in Y rather than X (must be present in all elements of Y). If None, no such predictors.

  • X_in_Y – list of pandas DataFrame or None; tables (one per response array) of predictors contained in Y rather than X (must be present in all elements of Y). If None, no such predictors.

  • Y_category_map – dict or None; map from category labels to integers for each categorical response.

  • response_to_df_ix – dict or None; map from response names to lists of indices of the response files that contain them.

  • gf_names – list or None; list of names of random grouping factor variables. If None and Y_gf provided, will use all columns of Y_gf.

  • gf_map – list of dict or None; list of maps from random grouping factor levels to their indices, one map per grouping factor variable in gf_names.

Returns

7-tuple of numpy arrays; let N, R, XF, YF, Z, and K respectively be the number of rows (sum total number of rows in Y), number of response dimensions, number of distinct predictor files (X), number of distinct response files (Y), number of random grouping factor variables, and number of response-aligned predictors. Outputs are (1) responses with shape (N, R) or None if Y is None, (2) an XF-tuple of first observation vectors indexing start indices for each entry in X, (3) a YF-tuple of last observation vectors indexing end indices for each entry in X, (4) response timestamps with shape (N,), (5) response masks (masking out any missing response variables per row) with shape (N, R), (6) random grouping factor matrix with shape (N, Z), or None if no random grouping factors provided, and (7) response-aligned predictors with shape (N, K).

cdr.data.c(df)[source]

Zero-center pandas series or data frame

Parameters

df – pandas Series or DataFrame; input data

Returns

pandas Series or DataFrame; centered data

cdr.data.compute_filter(y, field, cond)[source]

Compute filter given a field and condition

Parameters
  • y – pandas DataFrame; response data.

  • field – str; name of column on whose values to filter.

  • cond – str; string representation of condition to use for filtering.

Returns

numpy vector; boolean mask to use for pandas subsetting operations.

cdr.data.compute_filters(Y, filters=None)[source]

Compute filters given a filter map.

Parameters
  • Y – pandas DataFrame; response data.

  • filters – list; list of key-value pairs mapping column names to filtering criteria for their values.

Returns

numpy vector; boolean mask to use for pandas subsetting operations.

cdr.data.compute_partition(y, modulus, n)[source]

Given a splitID column, use modular arithmetic to partition data into n subparts.

Parameters
  • y – pandas DataFrame; response data.

  • modulus – int; modulus to use for splitting, must be at least as large as n.

  • n – int; number of subparts in the partition.

Returns

list of numpy vectors; one boolean vector per subpart of the partition, selecting only those elements of y that belong.
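As a rough illustration of partitioning by modular arithmetic, the sketch below assigns each split ID to a subpart according to its residue modulo modulus. How residues are allotted to subparts is an assumption made here (the first subpart absorbs the surplus residues, every other subpart receives exactly one); the package may distribute them differently. Plain Python lists stand in for the splitID column and the numpy masks.

```python
def compute_partition_sketch(split_ids, modulus, n):
    """Return n boolean masks over split_ids using residues modulo `modulus`."""
    if modulus < n:
        raise ValueError("modulus must be at least as large as n")
    residues = [i % modulus for i in split_ids]
    cutoff = modulus - n  # assumption: residues 0..cutoff go to the first subpart
    masks = [[r <= cutoff for r in residues]]
    for target in range(cutoff + 1, modulus):
        # each remaining subpart receives exactly one residue
        masks.append([r == target for r in residues])
    return masks

masks = compute_partition_sketch([0, 1, 2, 3, 4, 5, 6, 7], modulus=4, n=3)
```

With modulus=4 and n=3, the first mask selects residues 0 and 1 (half of the data) and the other two masks select a quarter each, so every row belongs to exactly one subpart.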

cdr.data.compute_splitID(y, split_fields)[source]

Map tuples in columns designated by split_fields into integer ID to use for data partitioning.

Parameters
  • y – pandas DataFrame; response data.

  • split_fields – list of str; column names to use for computing split ID.

Returns

numpy vector; integer vector of split IDs.
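One way to picture the mapping from value tuples to integer IDs is a dictionary that issues a new integer the first time each distinct tuple is seen. This is a minimal sketch under that assumption, not the package's code; the rows are plain dicts rather than a DataFrame, and the column names are hypothetical.

```python
def compute_split_id_sketch(rows, split_fields):
    """Assign one integer per distinct tuple of values in split_fields."""
    ids = {}
    out = []
    for row in rows:
        key = tuple(row[f] for f in split_fields)
        if key not in ids:
            ids[key] = len(ids)  # IDs issued in order of first appearance
        out.append(ids[key])
    return out

# Hypothetical column names and values.
rows = [
    {"subject": "s1", "sentid": 0},
    {"subject": "s1", "sentid": 0},
    {"subject": "s1", "sentid": 1},
    {"subject": "s2", "sentid": 0},
]
split_ids = compute_split_id_sketch(rows, ["subject", "sentid"])
```

Rows that share the same values in every split field receive the same ID, so a later modular partition keeps them in the same subpart.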

cdr.data.compute_time_mask(X_time, first_obs, last_obs, history_length=128, future_length=0, int_type='int32', float_type='float32')[source]

Compute mask for expanded impulse data zeroing out non-existent impulses.

Parameters
  • X_time – pandas Series; timestamps associated with each impulse in X.

  • first_obs – pandas Series; vector of row indices in X of the first impulse in the time series associated with each response.

  • last_obs – pandas Series; vector of row indices in X of the last preceding impulse in the time series associated with each response.

  • history_length – int; maximum number of history (backward) observations.

  • future_length – int; maximum number of future (forward) observations.

  • int_type – str; name of int type.

  • float_type – str; name of float type.

Returns

numpy array; boolean impulse mask.

cdr.data.corr_cdr(X_2d, impulse_names, impulse_names_2d, time, time_mask)[source]

Compute correlation matrix, including correlations across time where necessitated by 2D predictors.

Parameters
  • X_2d – numpy array; the impulse data. Must be of shape (batch_len, history_length+future_length, n_impulses), can be computed from sources by build_CDR_impulse_data().

  • impulse_names – list of str; names of columns in X_2d to be used as impulses by the model.

  • impulse_names_2d – list of str; names of columns in X_2d that designate 2D predictors.

  • time – 3D numpy array; array of timestamps for each event in X_2d.

  • time_mask – 3D numpy array; array of masks over padding events in X_2d.

Returns

pandas DataFrame; the correlation matrix.

cdr.data.expand_impulse_sequence(X, X_time, first_obs, last_obs, window_length, int_type='int32', float_type='float32', fill=0.0)[source]

Expand out impulse stream in X for each response in the target data.

Parameters
  • X – pandas DataFrame; impulse (predictor) data.

  • X_time – pandas Series; timestamps associated with each impulse in X.

  • first_obs – pandas Series; vector of row indices in X of the first impulse in the time series associated with each response.

  • last_obs – pandas Series; vector of row indices in X of the last preceding impulse in the time series associated with each response.

  • window_length – int; number of steps in time dimension of output

  • int_type – str; name of int type.

  • float_type – str; name of float type.

  • fill – float; fill value for padding cells.

Returns

3-tuple of numpy arrays; the expanded impulse array, the expanded timestamp array, and a boolean mask zeroing out locations of non-existent impulses.
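The expansion can be pictured as cutting one fixed-length window per response out of a flat impulse stream and padding windows that come up short. The sketch below handles a single impulse column with plain lists. It assumes that last_obs is an exclusive end index and that windows are right-aligned (padding in the leading cells); neither convention is stated above, so treat both as assumptions rather than the package's behavior.

```python
def expand_impulse_sequence_sketch(x, x_time, first_obs, last_obs,
                                   window_length, fill=0.0):
    """Build one fixed-length window per response from a flat impulse stream.

    Assumes last_obs is exclusive and that windows are right-aligned, so
    padding occupies the leading cells. Both are assumptions of this sketch.
    """
    values, times, mask = [], [], []
    for start, end in zip(first_obs, last_obs):
        start = max(start, end - window_length)  # keep the most recent steps
        n_pad = window_length - (end - start)
        values.append([fill] * n_pad + list(x[start:end]))
        times.append([fill] * n_pad + list(x_time[start:end]))
        mask.append([False] * n_pad + [True] * (end - start))
    return values, times, mask

x = [1.0, 2.0, 3.0, 4.0]
x_time = [0.1, 0.2, 0.3, 0.4]
values, times, mask = expand_impulse_sequence_sketch(
    x, x_time, first_obs=[0, 0], last_obs=[1, 4], window_length=3
)
```

The first response has a single available impulse, so its window holds two padding cells that the mask marks as False; the second response has more history than fits, so only its three most recent impulses are kept.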

cdr.data.filter_invalid_responses(Y, dv, crossval_factor=None, crossval_fold=None)[source]

Filter out rows with non-finite responses.

Parameters
  • Y – pandas table or list of pandas tables; response data.

  • dv – str or list of str; name(s) of column(s) containing the dependent variable(s)

  • crossval_factor – str or None; name of column containing the selection variable for cross validation. If None, no cross validation filtering.

  • crossval_fold – list or None; list of valid values for cross-validation selection. Used only if crossval_factor is not None.

Returns

2-tuple of pandas DataFrame and pandas Series; valid data and indicator vector used to filter out invalid data.
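A minimal sketch of the finite-response filter, using plain dicts in place of a DataFrame and leaving out the cross-validation arguments. The column name rt is hypothetical, and this is an illustration of the idea rather than the package's code.

```python
import math

def filter_invalid_responses_sketch(rows, dv):
    """Keep rows whose dependent variables are all finite; also return the mask."""
    dv = [dv] if isinstance(dv, str) else list(dv)
    keep = [all(math.isfinite(row[name]) for name in dv) for row in rows]
    valid = [row for row, ok in zip(rows, keep) if ok]
    return valid, keep

# Hypothetical response column "rt" containing NaN and infinite values.
rows = [{"rt": 310.0}, {"rt": float("nan")}, {"rt": float("inf")}, {"rt": 275.5}]
valid, keep = filter_invalid_responses_sketch(rows, "rt")
```

The returned mask has one entry per input row, so it can be reused to subset any other table aligned with the responses.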

cdr.data.get_first_last_obs_lists(y)[source]

Convenience utility to extract all first_obs and last_obs column names from y, sorted by file index.

Parameters

y – pandas DataFrame; response data.

Returns

pair of list of str; first_obs column names and last_obs column names

cdr.data.get_rangf_array(Y, rangf_names, rangf_map)[source]

Collect random grouping factor indicators as numpy integer arrays that can be read by Tensorflow. Returns vertical concatenation of GF arrays from each element of Y.

Parameters
  • Y – pandas table or list of pandas tables; response data.

  • rangf_names – list of str; names of columns containing random grouping factor levels (order is preserved, changing the order will change the resulting array).

  • rangf_map – list of dict; map for each random grouping factor from levels to unique indices.

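Returns

numpy array; integer array of random grouping factor indices, vertically concatenated across the elements of Y.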

cdr.data.get_time_windows(X, Y, series_ids, forward=False, window_length=128, verbose=True)[source]

Compute row indices in X of initial and final impulses for each element of Y. Assumes time series are already sorted by series_ids.

Parameters
  • X – pandas DataFrame; impulse (predictor) data.

  • Y – pandas DataFrame; response data.

  • series_ids – list of str; column names whose jointly unique values define unique time series.

  • forward – bool; whether to compute forward windows (future inputs) or backward windows (past inputs, used if forward is False).

  • window_length – int; maximum size of time window to consider. If np.inf, no bound on window size.

  • verbose – bool; whether to report progress to stderr

Returns

2-tuple of numpy vectors; first and last impulse observations (respectively) for each response in y
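For a backward window, the idea can be sketched with a binary search over sorted impulse timestamps. The version below covers a single time series only (it ignores series_ids), treats last_obs as an exclusive end index, and counts impulses that are simultaneous with the response as part of its history. All three choices are assumptions of the sketch, not documented behavior.

```python
from bisect import bisect_right

def get_time_windows_sketch(x_time, y_time, window_length=128):
    """Return (first_obs, last_obs) row indices into x_time for each response.

    Assumes a single series with sorted timestamps, an exclusive last_obs,
    and that impulses simultaneous with the response are included.
    """
    first_obs, last_obs = [], []
    for t in y_time:
        end = bisect_right(x_time, t)        # one past the last impulse <= t
        start = max(0, end - window_length)  # cap the window size
        first_obs.append(start)
        last_obs.append(end)
    return first_obs, last_obs

first_obs, last_obs = get_time_windows_sketch(
    x_time=[0.0, 1.0, 2.0, 3.0], y_time=[0.5, 3.0], window_length=2
)
```

Passing float("inf") as window_length leaves the window unbounded, since the start index is then always clamped to 0.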

cdr.data.preprocess_data(X, Y, formula_list, series_ids, filters=None, history_length=128, future_length=0, all_interactions=False, verbose=True, debug=False)[source]

Preprocess CDR data.

Parameters
  • X – list of pandas tables; impulse (predictor) data.

  • Y – list of pandas tables; response data.

  • formula_list – list of Formula; CDR formula for which to preprocess data.

  • series_ids – list of str; column names whose jointly unique values define unique time series.

  • filters – list; list of key-value pairs mapping column names to filtering criteria for their values.

  • history_length – int; maximum number of history (backward) observations.

  • future_length – int; maximum number of future (forward) observations.

  • all_interactions – bool; add powerset of all conformable interactions.

  • verbose – bool; whether to report progress to stderr

  • debug – bool; print debugging information

Returns

7-tuple; predictor data, response data, filtering mask, response-aligned predictor names, response-aligned predictors, 2D predictor names, and 2D predictors

cdr.data.s(df)[source]

Rescale pandas series or data frame by its standard deviation

Parameters

df – pandas Series or DataFrame; input data

Returns

pandas Series or DataFrame; rescaled data

cdr.data.split_cdr_outputs(outputs, lengths)[source]

Takes a dictionary of arbitrary depth containing CDR outputs with their labels as keys and splits each output into a list of outputs with lengths corresponding to lengths. Useful for aligning CDR outputs to response files, since multiple response files can be provided, which are underlyingly concatenated by CDR. Recursively modifies the dict in place.

Parameters
  • outputs – dict of arbitrary depth with numpy arrays at the leaves; the source CDR outputs

  • lengths – array-like vector of lengths to split the outputs into

Returns

dict; same key-val structure as outputs but with each leaf split into a list of len(lengths) vectors, one for each length value.
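The recursive, in-place splitting described above can be sketched as follows. Plain lists stand in for the numpy arrays at the leaves, and the output keys are hypothetical; this is an illustration of the idea, not the package's code.

```python
def split_outputs_sketch(outputs, lengths):
    """Recursively replace each leaf sequence with a list of consecutive chunks."""
    for key, value in outputs.items():
        if isinstance(value, dict):
            split_outputs_sketch(value, lengths)  # descend; modifies in place
        else:
            chunks, start = [], 0
            for n in lengths:
                chunks.append(value[start:start + n])
                start += n
            outputs[key] = chunks
    return outputs

# Hypothetical outputs for two response files with 2 and 3 rows.
outputs = {"preds": {"rt": [1, 2, 3, 4, 5]}, "loglik": [0.1, 0.2, 0.3, 0.4, 0.5]}
split_outputs_sketch(outputs, lengths=[2, 3])
```

After the call, each leaf is a list with one chunk per response file, in the order the files were concatenated.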

cdr.data.z(df)[source]

Z-transform pandas series or data frame

Parameters

df – pandas Series or DataFrame; input data

Returns

pandas Series or DataFrame; z-transformed data
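The three transforms c(), s(), and z() relate to one another as centering, rescaling, and the composition of the two. The sketch below shows that relationship on plain lists with the standard-library statistics module. The function names are hypothetical, and the use of the sample standard deviation is an assumption; the package may use a different convention.

```python
from statistics import mean, stdev

def center(values):
    """Zero-center: subtract the mean."""
    m = mean(values)
    return [v - m for v in values]

def rescale(values):
    """Rescale: divide by the (sample) standard deviation."""
    sd = stdev(values)
    return [v / sd for v in values]

def z_transform(values):
    """Z-transform: center, then divide by the (sample) standard deviation."""
    sd = stdev(values)
    return [v / sd for v in center(values)]

data = [2.0, 4.0, 6.0]
centered = center(data)
rescaled = rescale(data)
standardized = z_transform(data)
```

Centering changes only the location of the data, rescaling changes only its spread, and the z-transform applies both.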

cdr.cdrbase module

cdr.cdrbayes module

cdr.cdrmle module

cdr.cdrnnbase module

cdr.cdrnnbayes module

cdr.cdrnnmle module

cdr.formula module

class cdr.formula.Formula(bform_str, standardize=True)[source]

Bases: object

A class for parsing R-style mixed-effects CDR model formula strings and applying them to CDR data matrices.

Parameters

bform_str – str; an R-style mixed-effects CDR model formula string

ablate_impulses(impulse_ids)[source]

Remove impulses in impulse_ids from fixed effects (retaining in any random effects).

Parameters

impulse_ids – list of str; impulse IDs

Returns

None

apply_formula(X, Y, X_in_Y_names=None, all_interactions=False, series_ids=None)[source]

Extract all data and compute all transforms required by the model formula.

Parameters
  • X – list of pandas tables; impulse data.

  • Y – list of pandas tables; response data.

  • X_in_Y_names – list or None; list of column names for response-aligned predictors (predictors measured for every response rather than for every input) if applicable, None otherwise.

  • all_interactions – bool; add powerset of all conformable interactions.

  • series_ids – list of str or None; list of ids to use as grouping factors for lagged effects. If None, lagging will not be attempted.

Returns

triple; transformed X, transformed y, response-aligned predictor names

apply_op(op, arr)[source]

Apply op op to array arr.

Parameters
  • op – str; name of op.

  • arr – numpy or pandas array; source data.

Returns

numpy array; transformed data.

apply_op_2d(op, arr, time_mask)[source]

Apply op to 2D predictor (predictor whose value depends on properties of the response).

Parameters
  • op – str; name of op.

  • arr – numpy or array; source data.

  • time_mask – numpy array; mask for padding cells

Returns

numpy array; transformed data

apply_ops(impulse, X)[source]

Apply all ops defined for an impulse

Parameters
  • impulse – Impulse object; the impulse.

  • X – list of pandas tables; table containing the impulse data.

Returns

pandas table; table augmented with transformed impulse.

apply_ops_2d(impulse, X_2d_predictor_names, X_2d_predictors, time_mask)[source]

Apply all ops defined for a 2D predictor (predictor whose value depends on properties of the response).

Parameters
  • impulse – Impulse object; the impulse.

  • X_2d_predictor_names – list of str; names of 2D predictors.

  • X_2d_predictors – numpy array; source data.

  • time_mask – numpy array; mask for padding cells

Returns

2-tuple; list of new predictor name, numpy array of predictor values

static bases(family)[source]

Get the number of bases of a spline kernel.

Parameters

family – str; name of IRF family

Returns

int or None; number of bases of spline kernel, or None if family is not a spline.

build(bform_str, standardize=True)[source]

Construct internal data from formula string

Parameters

bform_str – str; source string.

Returns

None

categorical_transform(X)[source]

Get transformed formula with categorical predictors in X expanded.

Parameters

X – list of pandas tables; input data.

Returns

Formula; transformed Formula object

compute_2d_predictor(predictor_name, X, first_obs, last_obs, history_length=128, future_length=None, minibatch_size=50000)[source]

Compute 2D predictor (predictor whose value depends on properties of the most recent impulse).

Parameters
  • predictor_name – str; name of predictor

  • X – pandas table; input data

  • first_obs – pandas Series or 1D numpy array; row indices in X of the start of the series associated with each regression target.

  • last_obs – pandas Series or 1D numpy array; row indices in X of the most recent observation in the series associated with each regression target.

  • minibatch_size – int; minibatch size for computing predictor, can help with memory footprint

Returns

2-tuple; new predictor name, numpy array of predictor values

insert_impulses(impulses, irf_str, rangf=None)[source]

Insert impulses in impulse_ids into fixed effects and all random terms.

Parameters

impulse_ids – list of str; impulse IDs

Returns

None

static irf_params(family)[source]

Return list of parameter names for a given IRF family.

Parameters

family – str; name of IRF family

Returns

list of str; parameter names

static is_LCG(family)[source]

Check whether a kernel is LCG.

Parameters

family – str; name of IRF family

Returns

bool; whether the kernel is LCG (linear combination of Gaussians)

pc_transform(n_pc, pointers=None)[source]

Get transformed formula with impulses replaced by principal components.

Parameters
  • n_pc – int; number of principal components in transform.

  • pointers – dict; map from source nodes to transformed nodes.

Returns

list of IRFNode; tree forest representing current state of the transform.

process_ast(t, terms=None, has_intercept=None, ops=None, rangf=None, impulses_by_name=None, interactions_by_name=None, under_irf=False, under_interaction=False)[source]

Recursively process a node of the Python abstract syntax tree (AST) representation of the formula string and insert data into internal representation of model formula.

Parameters
  • t – AST node.

  • terms – list or None; CDR terms computed so far, or None if no CDR terms computed.

  • has_intercept – dict; map from random grouping factors to boolean values representing whether that grouping factor has a random intercept. None is used as a key to refer to the population-level intercept.

  • ops – list; names of ops computed so far, or None if no ops computed.

  • rangf – str or None; name of rangf for random term currently being processed, or None if currently processing fixed effects portion of model.

Returns

None

process_irf(t, input, ops=None, rangf=None)[source]

Process data from AST node representing part of an IRF definition and insert data into internal representation of the model.

Parameters
  • t – AST node.

  • input – IRFNode object; child IRF of current node

  • ops – list of str, or None; ops applied to IRF. If None, no ops applied

  • rangf – str or None; name of rangf for random term currently being processed, or None if currently processing fixed effects portion of model.

Returns

IRFNode object; the IRF node

remove_impulses(impulse_ids)[source]

Remove impulses in impulse_ids from the model (both fixed and random effects).

Parameters

impulse_ids – list of str; impulse IDs

Returns

None

response_names()[source]

Get list of names of modeled response variables.

Returns

list of str; names of modeled response variables.

responses()[source]

Get list of modeled response variables.

Returns

list of Impulse; modeled response variables.

to_lmer_formula_string(z=False, correlated=True)[source]

Generate an lme4-style LMER model string representing the structure of the current CDR model. Useful for 2-step analysis in which data are transformed using CDR, then fitted using LME.

Parameters
  • z – bool; z-transform convolved predictors.

  • correlated – bool; whether to use correlated random intercepts and slopes.

Returns

str; the LMER formula string.

to_string(t=None)[source]

Stringify the formula, using t as the RHS.

Parameters

t – IRFNode or None; IRF node to use as RHS. If None, uses root IRF associated with Formula instance.

Returns

str; stringified formula.

unablate_impulses(impulse_ids)[source]

Insert impulses in impulse_ids into fixed effects (leaving random effects structure unchanged).

Parameters

impulse_ids – list of str; impulse IDs

Returns

None

class cdr.formula.IRFNode(family=None, impulse=None, p=None, irfID=None, coefID=None, ops=None, cont=False, fixed=True, rangf=None, param_init=None, trainable=None)[source]

Bases: object

Data structure representing a node in a CDR IRF tree. For more information on how the CDR IRF structure is encoded as a tree, see the reference on CDR IRF trees.

Parameters
  • family – str; name of IRF kernel family.

  • impulse – Impulse object or None; the impulse if terminal, else None.

  • p – IRFNode object or None; the parent IRF node, or None if no parent (parent nodes can be connected after initialization).

  • irfID – str or None; string ID of node if applicable. If None, automatically-generated ID will describe node’s family and structural position.

  • coefID – str or None; string ID of coefficient if applicable. If None, automatically-generated ID will describe node’s family and structural position. Only applicable to terminal nodes, so this property will not be used if the node is non-terminal.

  • ops – list of str, or None; ops to apply to IRF node. If None, no ops.

  • cont – bool; whether the node connects directly to a continuous predictor. Only applicable to terminal nodes, so this property will not be used if the node is non-terminal.

  • fixed – bool; whether node exists in the model’s fixed effects structure.

  • rangf – list of str, str, or None; names of any random grouping factors associated with the node.

  • param_init – dict; map from parameter names to initial values, which will also be used as prior means.

  • trainable – list of str, or None; trainable parameters at this node. If None, all parameters are trainable.

ablate_impulses(impulse_ids)[source]

Remove impulses in impulse_ids from fixed effects (retaining in any random effects).

Parameters

impulse_ids – list of str; impulse IDs

Returns

None

add_child(t)[source]

Add child to this node in the IRF tree

Parameters

t – IRFNode; child node.

Returns

IRFNode; child node with updated parent.
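The parent/child bookkeeping behind add_child(), together with the depth() and terminal() queries documented further down, can be pictured with a toy tree. ToyNode is a hypothetical stand-in, not the IRFNode class; in particular, treating a childless node as terminal and giving the root depth 0 are assumptions of this sketch.

```python
class ToyNode:
    """Hypothetical stand-in for a tree node; not the IRFNode class itself."""

    def __init__(self, name):
        self.name = name
        self.p = None          # parent, connected after initialization
        self.children = []

    def add_child(self, t):
        """Attach t under this node and return it with its parent updated."""
        t.p = self
        self.children.append(t)
        return t

    def terminal(self):
        """Assumption: a node with no children is terminal."""
        return not self.children

    def depth(self):
        """Number of edges between this node and the root."""
        return 0 if self.p is None else self.p.depth() + 1

root = ToyNode("ROOT")
kernel = root.add_child(ToyNode("kernel"))
leaf = kernel.add_child(ToyNode("terminal"))
```

Because add_child() returns the child, a chain of nodes can be built by calling it on the result of the previous call.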

add_interactions(response_interactions)[source]

Add a ResponseInteraction object (or list of them) to this node.

Parameters

response_interactions – ResponseInteraction or list of ResponseInteraction; response interaction(s) to add

Returns

None

add_rangf(rangf)[source]

Add random grouping factor name to this node.

Parameters

rangf – str; random grouping factor name

Returns

None

atomic_irf_by_family()[source]

Get map from IRF kernel family names to list of IDs of IRFNode instances belonging to that family.

Returns

dict from str to list of str; IRF IDs by family.

atomic_irf_param_init_by_family()[source]

Get map from IRF kernel family names to maps from IRF IDs to maps from IRF parameter names to their initialization values.

Returns

dict; parameter initialization maps by family.

atomic_irf_param_trainable_by_family()[source]

Get map from IRF kernel family names to maps from IRF IDs to lists of trainable parameters.

Returns

dict; trainable parameter maps by family.

bases()[source]

Get the number of bases of node.

Returns

int or None; number of bases of node, or None if node is not a spline.

categorical_transform(X, expansion_map=None)[source]

Generate transformed copy of node with categorical predictors in X expanded. Recursive. Returns a tree forest representing the current state of the transform. When run from ROOT, should always return a length-1 list representing a single-tree forest, in which case the transformed tree is accessible as the 0th element.

Parameters
  • X – list of pandas tables; input data.

  • expansion_map – dict; internal variable. Do not use.

Returns

list of IRFNode; tree forest representing current state of the transform.

coef2impulse()[source]

Get map from coefficient IDs dominated by node to lists of corresponding impulses.

Returns

dict; map from coefficient IDs to lists of corresponding impulses.

coef2terminal()[source]

Get map from coefficient IDs dominated by node to lists of corresponding terminal IRF nodes.

Returns

dict; map from coefficient IDs to lists of corresponding terminal IRF nodes.

coef_by_rangf()[source]

Get map from random grouping factor names to associated coefficient IDs dominated by node.

Returns

dict; map from random grouping factor names to associated coefficient IDs.

coef_id()[source]

Get coefficient ID for this node.

Returns

str or None; coefficient ID, or None if non-terminal.

coef_names()[source]

Get list of names of coefficients dominated by node.

Returns

list of str; names of coefficients dominated by node.

depth()[source]

Get depth of node in tree.

Returns

int; depth

fixed_coef_names()[source]

Get list of names of fixed coefficients dominated by node.

Returns

list of str; names of fixed coefficients dominated by node.

fixed_interaction_names()[source]

Get list of names of fixed interactions dominated by node.

Returns

list of str; names of fixed interactions dominated by node.

formula_terms()[source]

Return data structure representing formula terms dominated by node, grouped by random grouping factor. Key None represents the fixed portion of the model (no random grouping factor).

Returns

dict; map from random grouping factors to data structures representing formula terms. Each data structure contains 2 fields, 'impulses' containing impulses and 'irf' containing IRF nodes.

has_coefficient(rangf)[source]

Report whether rangf has any coefficients in this subtree

Parameters

rangf – Random grouping factor

Returns

bool; whether rangf has any coefficients in this subtree

has_composed_irf()[source]

Check whether node dominates any IRF compositions.

Returns

bool; whether node dominates any IRF compositions.

has_irf(rangf)[source]

Report whether rangf has any IRFs in this subtree

Parameters

rangf – Random grouping factor

Returns

bool; whether rangf has any IRFs in this subtree

impulse2coef()[source]

Get map from impulses dominated by node to lists of corresponding coefficient IDs.

Returns

dict; map from impulses to lists of corresponding coefficient IDs.
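The paired accessors coef2impulse() and impulse2coef() read as two views of one many-to-many relation. Assuming they are exact inverses of each other (an assumption, not something stated above), one view can be derived from the other as sketched here; the coefficient and impulse names are hypothetical.

```python
def invert_map(mapping):
    """Invert a map from keys to lists of values into values -> lists of keys."""
    inverse = {}
    for key, values in mapping.items():
        for value in values:
            inverse.setdefault(value, []).append(key)
    return inverse

# Hypothetical coefficient and impulse names.
coef2impulse = {"coef_a": ["x1"], "coef_b": ["x1", "x2"]}
impulse2coef = invert_map(coef2impulse)
```

Inverting twice recovers the original map up to the ordering of the lists.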

impulse2terminal()[source]

Get map from impulses dominated by node to lists of corresponding terminal IRF nodes.

Returns

dict; map from impulses to lists of corresponding terminal IRF nodes.

impulse_names(include_interactions=False)[source]

Get list of names of impulses dominated by node.

Parameters

include_interactions – bool; whether to return impulses defined by interaction terms.

Returns

list of str; names of impulses dominated by node.

impulses(include_interactions=False)[source]

Get list of impulses dominated by node.

Parameters

include_interactions – bool; whether to return impulses defined by interaction terms.

Returns

list of Impulse; impulses dominated by node.

impulses_by_name()[source]

Get dictionary mapping names of impulses dominated by node to their corresponding impulses.

Returns

dict; map from impulse names to impulses

impulses_from_response_interaction()[source]

Get list of any impulses from response interactions associated with this node.

Returns

list of Impulse; impulses dominated by node.

interaction_by_rangf()[source]

Get map from random grouping factor names to associated interaction IDs dominated by node.

Returns

dict; map from random grouping factor names to associated interaction IDs.

interaction_names()[source]

Get list of names of interactions dominated by node.

Returns

list of str; names of interactions dominated by node.

interactions()[source]

Return list of all response interactions used in this subtree, sorted by name.

Returns

list of ResponseInteraction

interactions2inputs()[source]

Get map from IDs of ResponseInteractions dominated by node to lists of IDs of their inputs.

Returns

dict; map from IDs of ResponseInteractions nodes to lists of their inputs.

irf_by_rangf()[source]

Get map from random grouping factor names to IDs of associated IRF nodes dominated by node.

Returns

dict; map from random grouping factor names to IDs of associated IRF nodes.

irf_id()[source]

Get IRF ID for this node.

Returns

str or None; IRF ID, or None if terminal.

irf_to_formula(rangf=None)[source]

Generates a representation of this node’s impulse response kernel in formula string syntax

Parameters

rangf – random grouping factor for which to generate the stringification (fixed effects if rangf==None).

Returns

str; formula string representation of node

is_LCG()[source]

Check the non-parametric type of a node’s kernel, or return None if parametric.

Parameters

family – str; name of IRF family

Returns

str or None; name of kernel type if non-parametric, else None.

local_name()[source]

Get descriptive name for this node, ignoring its position in the IRF tree.

Returns

str; name.

name()[source]

Get descriptive name for this node.

Returns

str; name.

node_table()[source]

Get map from names to nodes of all nodes dominated by node (including self).

Returns

dict; map from names to nodes of all nodes dominated by node.

nonparametric_coef_names()[source]

Get list of names of nonparametric coefficients dominated by node.

Returns

list of str; names of spline coefficients dominated by node.

pc_transform(n_pc, pointers=None)[source]

Generate principal-components-transformed copy of node. Recursive. Returns a tree forest representing the current state of the transform. When run from ROOT, should always return a length-1 list representing a single-tree forest, in which case the transformed tree is accessible as the 0th element.

Parameters
  • n_pcint; number of principal components in transform.

  • pointersdict; map from source nodes to transformed nodes.

Returns

list of IRFNode; tree forest representing current state of the transform.

static pointers2namemmaps(p)[source]

Get a map from source to transformed IRF node names.

Parameters

pdict; map from source to transformed IRF nodes.

Returns

dict; map from source to transformed IRF node names.

remove_impulses(impulse_ids)[source]

Remove impulses in impulse_ids from the model (both fixed and random effects).

Parameters

impulse_ids – list of str; impulse IDs

Returns

None

supports_non_causal()[source]

Check whether model contains only IRF kernels that lack the causality constraint t >= 0.

Returns

bool; whether model contains only IRF kernels that lack the causality constraint t >= 0.

terminal()[source]

Check whether node is terminal.

Returns

bool; whether node is terminal.

terminal2coef()[source]

Get map from IDs of terminal IRF nodes dominated by node to lists of corresponding coefficient IDs.

Returns

dict; map from IDs of terminal IRF nodes to lists of corresponding coefficient IDs.

terminal2impulse()[source]

Get map from terminal IRF nodes dominated by node to lists of corresponding impulses.

Returns

dict; map from terminal IRF nodes to lists of corresponding impulses.

terminal_names()[source]

Get list of names of terminal IRF nodes dominated by node.

Returns

list of str; names of terminal IRF nodes dominated by node.

terminals()[source]

Get list of terminal IRF nodes dominated by node.

Returns

list of IRFNode; terminal IRF nodes dominated by node.
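
To make "terminal IRF nodes dominated by node" concrete, here is a minimal sketch of the traversal on a toy tree. The `ToyNode` class is hypothetical and is not the package's `IRFNode`; in particular, treating a childless node as terminal is an assumption made only for illustration.

```python
class ToyNode:
    """Hypothetical stand-in for a tree node; not the cdr IRFNode class."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def terminal(self):
        # Assumption: a node with no children counts as terminal.
        return len(self.children) == 0

    def terminals(self):
        # Collect the terminal nodes in this subtree (including self).
        if self.terminal():
            return [self]
        out = []
        for child in self.children:
            out.extend(child.terminals())
        return out

    def terminal_names(self):
        return [node.name for node in self.terminals()]


root = ToyNode("ROOT", [
    ToyNode("Gamma", [ToyNode("Gamma-a"), ToyNode("Gamma-b")]),
    ToyNode("Exp", [ToyNode("Exp-c")]),
])
print(root.terminal_names())
```

The same recursive pattern would apply to the other "dominated by node" getters, which differ only in what they collect from each subtree.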

terminals_by_name()[source]

Get dictionary mapping names of terminal IRF nodes dominated by node to their corresponding nodes.

Returns

dict; map from node names to nodes

unablate_impulses(impulse_ids)[source]

Insert impulses in impulse_ids into fixed effects (leaving random effects structure unchanged).

Parameters

impulse_ids – list of str; impulse IDs

Returns

None

unary_nonparametric_coef_names()[source]

Get list of names of non-parametric coefficients with no siblings dominated by node. Because unary splines are non-parametric, their coefficients are fixed at 1. Trainable coefficients are therefore perfectly confounded with the spline parameters. Splines dominating multiple coefficients are excepted, since the same kernel shape must be scaled in different ways.

Returns

list of str; names of unary spline coefficients dominated by node.

class cdr.formula.Impulse(name, ops=None)[source]

Bases: object

Data structure representing an impulse in a CDR model.

Parameters
  • namestr; name of impulse

  • opslist of str, or None; ops to apply to impulse. If None, no ops.

categorical(X)[source]

Checks whether impulse is categorical in a dataset

Parameters

X – list of pandas tables; data to check.

Returns

bool; True if impulse is categorical in X, False otherwise.

expand_categorical(X)[source]

Expand any categorical predictors in X into 1-hot columns.

Parameters

X – list of pandas tables; input data

Returns

2-tuple of pandas table, list of Impulse; expanded data, list of expanded Impulse objects
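
For readers unfamiliar with 1-hot expansion, the sketch below shows the general idea on a plain Python list rather than a pandas table. The function name `one_hot_expand` and the `<column>_<level>` naming scheme are assumptions for illustration and may not match how the package names expanded columns.

```python
def one_hot_expand(column_name, values):
    """Expand one categorical column into 1-hot columns (illustrative only)."""
    levels = sorted(set(values))
    # Hypothetical naming scheme: "<column>_<level>".
    return {
        f"{column_name}_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }


# Made-up categorical column with three levels.
expanded = one_hot_expand("pos", ["N", "V", "N", "A"])
for name, column in expanded.items():
    print(name, column)
```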

name()[source]

Get name of term.

Returns

str; name.

class cdr.formula.ImpulseInteraction(impulses, ops=None)[source]

Bases: object

Data structure representing an interaction of impulse-aligned variables (impulses) in a CDR model.

Parameters
  • impulseslist of Impulse; impulses to interact.

  • opslist of str, or None; ops to apply to interaction. If None, no ops.

expand_categorical(X)[source]

Expand any categorical predictors in X into 1-hot columns.

Parameters

X – list of pandas tables; input data.

Returns

3-tuple of pandas table, list of ImpulseInteraction, list of list of Impulse; expanded data, list of expanded ImpulseInteraction objects, list of lists of expanded Impulse objects, one list for each interaction.

impulses()[source]

Get list of impulses dominated by interaction.

Returns

list of Impulse; impulses dominated by interaction.

name()[source]

Get name of interaction.

Returns

str; name.

class cdr.formula.ResponseInteraction(responses, rangf=None)[source]

Bases: object

Data structure representing an interaction of response-aligned variables (containing at least one IRF-convolved impulse) in a CDR model.

Parameters
  • responseslist of terminal IRFNode, Impulse, and/or ImpulseInteraction objects; responses to interact.

  • rangfstr or list of str; random grouping factors for which to build random effects for this interaction.

add_rangf(rangf)[source]

Add random grouping factor name to this interaction.

Parameters

rangfstr; random grouping factor name

Returns

None

contains_member(x)[source]

Check if object is a member of the set of responses belonging to this interaction

Parameters

xIRFNode, Impulse, and/or ImpulseInteraction object; object to check.

Returns

bool; whether x is a member of the set of responses

irf_responses()[source]

Get list of IRFs dominated by interaction.

Returns

list of IRFNode objects; terminal IRFs dominated by interaction.

name()[source]

Get name of interaction.

Returns

str; name.

non_irf_responses()[source]

Get list of non-IRF response-aligned variables dominated by interaction.

Returns

list of Impulse and/or ImpulseInteraction objects; non-IRF variables dominated by interaction.

replace(old, new)[source]

Replace an old input with a new one

Parameters
  • oldIRFNode, Impulse, and/or ImpulseInteraction object; response to remove.

  • newIRFNode, Impulse, and/or ImpulseInteraction object; response to add.

Returns

None

responses()[source]

Get list of variables dominated by interaction.

Returns

list of IRFNode, Impulse, and/or ImpulseInteraction objects; variables dominated by interaction.

cdr.formula.pythonize_string(s)[source]

Convert string to valid python variable name

Parameters

sstr; source string

Returns

str; pythonized string
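
One plausible way to perform such a conversion is sketched below. This is an assumption about the general technique, not the package's actual rule set; the helper name `to_identifier` is hypothetical.

```python
import re


def to_identifier(s):
    """Illustrative conversion of a string to a valid Python identifier."""
    # Replace every character that is not a letter, digit, or underscore.
    out = re.sub(r"\W", "_", s)
    # Identifiers cannot start with a digit.
    if out and out[0].isdigit():
        out = "_" + out
    return out


print(to_identifier("log(x)"))
print(to_identifier("2nd term"))
```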

cdr.formula.standardize_formula_string(s)[source]

Standardize a formula string, removing notational variation. IRF specifications C(...) are sorted alphabetically by the IRF call name, e.g. Gamma(). The order of impulses within an IRF specification is preserved.

Parameters

sstr; the formula string to be standardized

Returns

str; standardization of s

cdr.io module

cdr.io.read_tabular_data(X_paths, Y_paths, series_ids, categorical_columns=None, sep=' ', verbose=True)[source]

Read impulse and response data into pandas dataframes and perform basic pre-processing.

Parameters
  • X_pathsstr or list of str; path(s) to impulse (predictor) data (multiple tables are concatenated). Each path may also be a ;-delimited list of paths to files containing predictors with different timestamps, where the predictors in each file all share the same set of timestamps.

  • Y_pathsstr or list of str; path(s) to response data (multiple tables are concatenated). Each path may also be a ;-delimited list of paths to files containing different response variables with different timestamps, where the response variables in each file all share the same set of timestamps.

  • series_idslist of str; column names whose jointly unique values define unique time series.

  • categorical_columnslist of str; column names that should be treated as categorical.

  • sepstr; string representation of field delimiter in input data.

  • verbosebool; whether to log progress to stderr.

Returns

2-tuple of list(pandas DataFrame); (impulse data, response data). X and Y each have one element for each dataset in X_paths/Y_paths, each containing the column-wise concatenation of all column files in the path.
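
The ;-delimited path convention described above can be pictured with a small sketch that only normalizes the path specification and does not read any files. The helper name `split_path_spec` and the file names are hypothetical.

```python
def split_path_spec(paths):
    """Normalize a str or list of str into a list of lists of file paths."""
    if isinstance(paths, str):
        paths = [paths]
    # Each entry may bundle several files separated by ";".
    return [[p.strip() for p in entry.split(";")] for entry in paths]


# Two datasets: the first is spread over two column files.
spec = split_path_spec(["train_X1.csv;train_X2.csv", "dev_X.csv"])
print(spec)
```

Under this reading, each inner list corresponds to one dataset, which matches the statement that the returned lists have one element per dataset.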

cdr.kwargs module

class cdr.kwargs.Kwarg(key, default_value, dtypes, descr, aliases=None, default_value_cdrnn='same', suppress=False)[source]

Bases: object

Data structure for storing keyword arguments and their docstrings.

Parameters
  • keystr; Key

  • default_value – Any; Default value

  • dtypeslist or class; List of classes or single class. Members can also be specific required values, either None or values of type str.

  • descrstr; Description of kwarg

  • default_value_cdrnn – Any; Default value for CDRNN if distinct from CDR. If 'same', CDRNN uses default_value.

  • suppressbool; Whether to print documentation for this kwarg. Useful for hiding deprecated or little-used kwargs in order to simplify autodoc output.

dtypes_str()[source]

String representation of dtypes permitted for kwarg.

Returns

str; dtypes string.

get_type_name(x)[source]

String representation of name of a dtype

Parameters

x – dtype; the dtype to name.

Returns

str; name of dtype.

in_settings(settings)[source]

Check whether kwarg is specified in a settings object parsed from a config file.

Parameters

settings – settings from a ConfigParser object.

Returns

bool; whether kwarg is found in settings.

kwarg_from_config(settings, is_cdrnn=False)[source]

Given a settings object parsed from a config file, return value of kwarg cast to appropriate dtype. If missing from settings, return default.

Parameters
  • settings – settings from a ConfigParser object.

  • is_cdrnnbool; whether this is for a CDRNN model.

Returns

value of kwarg
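
The "cast if present, otherwise fall back to the default" behavior can be sketched with the standard-library `configparser`. The helper `value_from_settings`, the `[model]` section, and the keys `n_iter` and `learning_rate` are assumptions for illustration, not names confirmed by this package.

```python
import configparser


def value_from_settings(settings, key, default, cast=str):
    """Return settings[key] cast to a type, or the default if missing."""
    if key in settings:
        return cast(settings[key])
    return default


parser = configparser.ConfigParser()
parser.read_string("[model]\nn_iter = 500\n")
settings = parser["model"]

# Present in the config: the string "500" is cast to int.
n_iter = value_from_settings(settings, "n_iter", 1000, cast=int)
# Missing from the config: the default is returned unchanged.
lr = value_from_settings(settings, "learning_rate", 0.001, cast=float)
print(n_iter, lr)
```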

static type_comparator(a, b)[source]

Types precede strings, which precede None

Parameters
  • a – First element

  • b – Second element

Returns

-1, 0, or 1, depending on outcome of comparison
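
A comparator with the stated ordering (types, then strings, then None) could look like the sketch below. It is written independently of the package, and the helper names `rank` and `compare` are hypothetical; `functools.cmp_to_key` adapts it for use with `sorted`.

```python
from functools import cmp_to_key


def rank(x):
    # Types first (0), then strings (1), then anything else such as None (2).
    if isinstance(x, type):
        return 0
    if isinstance(x, str):
        return 1
    return 2


def compare(a, b):
    """Return -1, 0, or 1 following the ordering types < strings < None."""
    ra, rb = rank(a), rank(b)
    return (ra > rb) - (ra < rb)


ordered = sorted([None, "same", int, "inherit", float], key=cmp_to_key(compare))
print(ordered)
```

Because `sorted` is stable, entries of the same kind keep their original relative order.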

cdr.kwargs.cdr_kwarg_docstring()[source]

Generate docstring snippet summarizing all CDR kwargs, dtypes, and defaults.

Returns

str; docstring snippet

cdr.kwargs.docstring_from_kwarg(kwarg)[source]

Generate docstring from CDR keyword argument object.

Parameters

kwarg – Keyword argument object.

Returns

str; docstring.

cdr.kwargs.plot_kwarg_docstring()[source]

Generate docstring snippet summarizing all plotting kwargs, dtypes, and defaults.

Returns

str; docstring snippet

cdr.opt module

cdr.plot module

class cdr.plot.MidpointNormalize(vcenter=0.0, vmin=None, vmax=None, clip=False)[source]

Bases: matplotlib.colors.Normalize

cdr.plot.plot_heatmap(m, row_names, col_names, dir='.', filename='eigenvectors.png', plot_x_inches=7, plot_y_inches=5, cmap='Blues')[source]

Plot a heatmap. Used in CDR for visualizing eigenvector matrices in principal components models.

Parameters
  • m – 2D numpy array; source data for plot.

  • row_nameslist of str; row names.

  • col_nameslist of str; column names.

  • dirstr; output directory.

  • filenamestr; filename.

  • plot_x_inchesfloat; width of plot in inches.

  • plot_y_inchesfloat; height of plot in inches.

  • cmap – str; name of matplotlib cmap object (determines colors of the heatmap).

Returns

None

cdr.plot.plot_irf(plot_x, plot_y, irf_names, lq=None, uq=None, density=None, sort_names=True, prop_cycle_length=None, prop_cycle_map=None, dir='.', filename='irf_plot.png', irf_name_map=None, plot_x_inches=6, plot_y_inches=4, ylim=None, cmap='gist_rainbow', legend=True, xlab=None, ylab=None, use_line_markers=False, transparent_background=False, dpi=300, dump_source=False)[source]

Plot impulse response functions.

Parameters
  • plot_xnumpy array with shape (T,1); time points for which to plot the response. For example, if the plots contain 1000 points from 0s to 10s, plot_x could be generated as np.linspace(0, 10, 1000).

  • plot_ynumpy array with shape (T, N); response of each IRF at each time point.

  • irf_names – list of str; CDR IDs of IRFs in the same order as they appear in axis 1 of plot_y.

  • lqnumpy array with shape (T, N), or None; lower bound of credible interval for each time point. If None, no credible interval will be plotted.

  • uqnumpy array with shape (T, N), or None; upper bound of credible interval for each time point. If None, no credible interval will be plotted.

  • sort_namesbool; alphabetically sort IRF names.

  • prop_cycle_lengthint or None; Length of plotting properties cycle (defines step size in the color map). If None, inferred from irf_names.

  • prop_cycle_maplist of int, or None; Integer indices to use in the properties cycle for each entry in irf_names. If None, indices are automatically assigned.

  • dirstr; output directory.

  • filenamestr; filename.

  • irf_name_map – dict of str to str; map from CDR IRF IDs to more readable names to appear in legend. Any plotted IRF whose ID is not found in irf_name_map will be represented with the CDR IRF ID.

  • plot_x_inchesfloat; width of plot in inches.

  • plot_y_inchesfloat; height of plot in inches.

  • ylim – 2-element tuple or list; (lower_bound, upper_bound) to use for y axis. If None, automatically inferred.

  • cmapstr; name of matplotlib cmap object (determines colors of plotted IRF).

  • legendbool; include a legend.

  • xlabstr or None; x-axis label. If None, no label.

  • ylabstr or None; y-axis label. If None, no label.

  • use_line_markersbool; add markers to IRF lines.

  • transparent_backgroundbool; use a transparent background. If False, uses a white background.

  • dpiint; dots per inch.

  • dump_sourcebool; Whether to dump the plot source array to a csv file.

Returns

None

cdr.plot.plot_qq(theoretical, actual, actual_color='royalblue', expected_color='firebrick', dir='.', filename='qq_plot.png', plot_x_inches=6, plot_y_inches=4, legend=True, xlab='Theoretical', ylab='Empirical', ticks=True, as_lines=False, transparent_background=False, dpi=300)[source]

Generate quantile-quantile plot.

Parameters
  • theoreticalnumpy array with shape (T,); theoretical error quantiles.

  • actualnumpy array with shape (T,); empirical errors.

  • actual_colorstr; color for actual values.

  • expected_colorstr; color for expected values.

  • dirstr; output directory.

  • filenamestr; filename.

  • plot_x_inchesfloat; width of plot in inches.

  • plot_y_inchesfloat; height of plot in inches.

  • legendbool; include a legend.

  • xlabstr or None; x-axis label. If None, no label.

  • ylabstr or None; y-axis label. If None, no label.

  • as_linesbool; render QQ plot using lines. Otherwise, use points.

  • transparent_backgroundbool; use a transparent background. If False, uses a white background.

  • dpiint; dots per inch.

Returns

None
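
As a hedged illustration of how the two inputs to a quantile-quantile plot might be prepared, the sketch below computes standard normal quantiles and sorted empirical errors with the standard library. Whether plot_qq expects pre-sorted arrays, and which plotting positions the package uses, are not stated here; the `(i + 0.5) / n` positions and the residual values are assumptions.

```python
from statistics import NormalDist


def normal_quantiles(n):
    """Standard normal quantiles at the plotting positions (i + 0.5) / n."""
    dist = NormalDist(mu=0.0, sigma=1.0)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


errors = [0.3, -1.2, 0.1, 2.0, -0.4]   # made-up residuals
actual = sorted(errors)                 # empirical quantiles
theoretical = normal_quantiles(len(actual))
for t, a in zip(theoretical, actual):
    print(round(t, 3), a)
```

In practice these lists would be converted to numpy arrays of shape (T,) before being passed as theoretical and actual.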

cdr.plot.plot_surface(x, y, z, lq=None, uq=None, density=None, bounds_as_surface=False, dir='.', filename='surface.png', irf_name_map=None, plot_x_inches=6, plot_y_inches=4, xlim=None, ylim=None, zlim=None, plot_type='wireframe', cmap='coolwarm', xlab=None, ylab=None, zlab='Response', title=None, transparent_background=False, dpi=300, dump_source=False)[source]

Plot an IRF or interaction surface.

Parameters
  • xnumpy array with shape (M,N); x locations for each plot point, copied N times.

  • ynumpy array with shape (M,N); y locations for each plot point, copied M times.

  • znumpy array with shape (M,N); z locations for each plot point.

  • lqnumpy array with shape (M,N), or None; lower bound of credible interval for each plot point. If None, no credible interval will be plotted.

  • uqnumpy array with shape (M,N), or None; upper bound of credible interval for each plot point. If None, no credible interval will be plotted.

  • bounds_as_surfacebool; whether to plot interval bounds using additional surfaces. If False, bounds are plotted with vertical error bars instead. Ignored if lq, uq are None.

  • dirstr; output directory.

  • filenamestr; filename.

  • irf_name_map – dict of str to str; map from CDR IRF IDs to more readable names to appear in legend. Any plotted IRF whose ID is not found in irf_name_map will be represented with the CDR IRF ID.

  • plot_x_inchesfloat; width of plot in inches.

  • plot_y_inchesfloat; height of plot in inches.

  • xlim – 2-element tuple or list or None; (lower_bound, upper_bound) to use for x axis. If None, automatically inferred.

  • ylim – 2-element tuple or list or None; (lower_bound, upper_bound) to use for y axis. If None, automatically inferred.

  • zlim – 2-element tuple or list or None; (lower_bound, upper_bound) to use for z axis. If None, automatically inferred.

  • plot_type – str; name of plot type to generate. The documented options are ["contour", "surf", "trisurf"]; note that the signature default is "wireframe".

  • cmapstr; name of matplotlib cmap object (determines colors of plotted IRF).

  • legendbool; include a legend.

  • xlabstr or None; x-axis label. If None, no label.

  • ylabstr or None; y-axis label. If None, no label.

  • zlabstr or None; z-axis label. If None, no label.

  • use_line_markersbool; add markers to IRF lines.

  • transparent_backgroundbool; use a transparent background. If False, uses a white background.

  • dpiint; dots per inch.

  • dump_sourcebool; Whether to dump the plot source array to a csv file.

Returns

None

cdr.signif module

cdr.signif.correlation_test(y, x1, x2, nested=False, verbose=True)[source]

Perform a parametric test of difference in correlation with observations between two prediction vectors, based on Steiger (1980).

Parameters
  • ynumpy vector; observation vector.

  • x1numpy vector; first prediction vector.

  • x2numpy vector; second prediction vector.

  • nestedbool; assume that the second model is nested within the first.

  • verbosebool; report progress logs to standard error.

Returns

cdr.signif.permutation_test(a, b, n_iter=10000, n_tails=2, mode='loss', nested=False, verbose=True)[source]

Perform a paired permutation test for significance.

Parameters
  • anumpy vector; first error/loss/prediction vector.

  • bnumpy vector; second error/loss/prediction vector.

  • n_iterint; number of resampling iterations.

  • n_tailsint; number of tails.

  • mode – str; one of ["mse", "loglik"], the type of error used (squared errors are averaged, while log-likelihoods are summed).

  • nestedbool; assume that the second model is nested within the first.

  • verbosebool; report progress logs to standard error.
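
The idea of a paired permutation test can be sketched as follows. This is a generic two-tailed, sign-flipping version on mean differences using only the standard library; it is not the package's implementation, and the function name, the `seed` argument, and the example loss values are assumptions made for illustration.

```python
import random


def paired_permutation_test(a, b, n_iter=10000, seed=0):
    """Two-tailed paired permutation test on the mean difference of a and b."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_iter):
        # Swapping a[i] and b[i] is equivalent to flipping the sign of diffs[i].
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total / len(diffs)) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_iter + 1)


loss_a = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 1.2, 1.1]   # made-up per-item losses
loss_b = [0.8, 0.9, 0.7, 0.8, 1.0, 0.8, 0.9, 0.9]
p_value = paired_permutation_test(loss_a, loss_b, n_iter=2000)
print(p_value)
```

Since every item favors the second vector in this toy data, only resamples that flip all signs the same way reach the observed statistic, so the estimated p-value is small.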

Returns

cdr.synth module

class cdr.synth.SyntheticModel(n_pred, irf_name, irf_params=None, coefs=None)[source]

Bases: object

A data structure representing a synthetic “true” model for empirical validation of CDR fits. Contains a randomly generated set of IRFs that can be used to convolve data, and provides methods for sampling data with particular structure and convolving it with the true IRFs in order to generate a response vector.

Parameters
  • n_predint; Number of predictors in the synthetic model.

  • irf_namestr; Name of IRF kernel to use. One of ['Exp', 'Normal', 'Gamma', 'ShiftedGamma'].

  • irf_paramsdict or None; Dictionary of IRF parameters to use, with parameter names as keys and numeric arrays as values. Values must each have n_pred cells. If None, parameter values will be randomly sampled.

  • coefs – numpy array or None; Vector of coefficients to use, where len(coefs) == n_pred. If None, coefficients will be randomly sampled.

convolve(X, t_X, t_y, history_length=None, err_sd=None, allow_instantaneous=True, verbose=True)[source]

Convolve data using the model’s IRFs.

Parameters
  • X – numpy array; 2-D array of predictors.

  • t_X – numpy array; 1-D vector of predictor timestamps.

  • t_y – numpy array; 1-D vector of response timestamps.

  • history_lengthint or None; Drop preceding events more than history_length steps into the past. If None, no history clipping.

  • err_sdfloat or None; Standard deviation of Gaussian noise to inject into responses. If None, use the empirical standard deviation of the response vector.

  • allow_instantaneousbool; Whether to compute responses when t==0.

  • verbosebool; Verbosity.

Returns

(2-D numpy array, 1-D numpy array); Matrix of convolved predictors, vector of responses
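
As a rough picture of what convolving impulses with an IRF involves, the sketch below sums, for each response timestamp, the preceding impulse values weighted by a kernel evaluated at the time delay. It handles a single predictor with an exponential kernel in pure Python; the function names and data are hypothetical, and noise injection, coefficients, and history clipping are omitted.

```python
import math


def exp_irf(t, rate=1.0):
    """Exponential kernel, zero for negative delays (causal)."""
    return rate * math.exp(-rate * t) if t >= 0 else 0.0


def convolve_impulses(x, t_x, t_y, irf):
    """For each response time, sum impulse values weighted by the IRF of the delay."""
    out = []
    for ty in t_y:
        total = 0.0
        for value, tx in zip(x, t_x):
            if tx <= ty:                  # only past or simultaneous events
                total += value * irf(ty - tx)
        out.append(total)
    return out


x = [1.0, 2.0]          # one predictor, two events
t_x = [0.0, 1.0]        # event timestamps
t_y = [0.0, 1.0, 2.0]   # response timestamps
convolved = convolve_impulses(x, t_x, t_y, exp_irf)
print(convolved)
```

Changing `tx <= ty` to `tx < ty` would exclude simultaneous events, which is the kind of behavior the allow_instantaneous flag controls.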

convolve_v2(X, t_X, t_y, err_sd=None, allow_instantaneous=True, verbose=True)[source]

Convolve data using the model’s IRFs. Alternate memory-intensive implementation that is faster for small arrays but can exhaust resources for large ones.

Parameters
  • X – numpy array; 2-D array of predictors.

  • t_X – numpy array; 1-D vector of predictor timestamps.

  • t_y – numpy array; 1-D vector of response timestamps.

  • err_sdfloat; Standard deviation of Gaussian noise to inject into responses.

  • allow_instantaneousbool; Whether to compute responses when t==0.

  • verbosebool; Verbosity.

Returns

(2-D numpy array, 1-D numpy array); Matrix of convolved predictors, vector of responses

get_curves(n_time_units=None, n_time_points=None)[source]

Extract response curves as an array.

Parameters
  • n_time_unitsfloat; Number of units of time over which to extract curves.

  • n_time_pointsint; Number of samples to extract for each curve (resolution of curve)

Returns

numpy array; 2-D numpy array with shape [T, K], where T is n_time_points and K is the number of predictors in the model.

irf(x, coefs=False)[source]

Computes the values of the model’s IRFs elementwise over a vector of timepoints.

Parameters
  • x – numpy array; 1-D array with shape [N] containing timepoints at which to query the IRFs.

  • coefsbool; Whether to rescale responses by coefficients

Returns

numpy array; 2-D array with shape [N, K] containing values of the model’s K IRFs evaluated at the timepoints in x.

plot_irf(n_time_units=None, n_time_points=None, dir='.', filename='synth_irf.png', plot_x_inches=6, plot_y_inches=4, cmap='gist_rainbow', legend=False, xlab=None, ylab=None, use_line_markers=False, transparent_background=False)[source]

Plot impulse response functions.

Parameters
  • n_time_units – float; number of time units to use for plotting.

  • n_time_pointsint; number of points to use for plotting.

  • dirstr; output directory.

  • filenamestr; filename.

  • plot_x_inchesfloat; width of plot in inches.

  • plot_y_inchesfloat; height of plot in inches.

  • cmapstr; name of matplotlib cmap object (determines colors of plotted IRF).

  • legendbool; include a legend.

  • xlabstr or None; x-axis label. If None, no label.

  • ylabstr or None; y-axis label. If None, no label.

  • use_line_markersbool; add markers to IRF lines.

  • transparent_backgroundbool; use a transparent background. If False, uses a white background.

Returns

None

sample_data(m, n=None, X_interval=None, y_interval=None, rho=None, align_X_y=True)[source]

Samples synthetic predictors and time vectors

Parameters
  • mint; Number of predictors.

  • nint; Number of response query points.

  • X_intervalstr, float, list, tuple, or None; Predictor interval model. If None, predictor offsets are randomly sampled from an exponential distribution with parameter 1. If float, predictor offsets are evenly spaced with interval X_interval. If list or tuple, the first element is the name of a scipy distribution to use for sampling offsets, and all remaining elements are positional arguments to that distribution.

  • y_intervalstr, float, list, tuple, or None; Response interval model. If None, response offsets are randomly sampled from an exponential distribution with parameter 1. If float, response offsets are evenly spaced with interval y_interval. If list or tuple, the first element is the name of a scipy distribution to use for sampling offsets, and all remaining elements are positional arguments to that distribution.

  • rhofloat; Level of pairwise correlation between predictors.

  • align_X_ybool; Whether predictors and responses are required to be sampled at the same points in time.

Returns

(2-D numpy array, 1-D numpy array, 1-D numpy array); Matrix of predictors, vector of predictor timestamps, vector of response timestamps
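
The two simplest interval models described above (exponential offsets when the interval is None, evenly spaced offsets when it is a float) can be sketched with the standard library. The helper `sample_timestamps` and its `seed` argument are hypothetical, and the scipy-distribution option is not covered.

```python
import random


def sample_timestamps(n, interval=None, seed=0):
    """Return n increasing timestamps built from cumulative offsets."""
    rng = random.Random(seed)
    t, out = 0.0, []
    for _ in range(n):
        if interval is None:
            t += rng.expovariate(1.0)   # exponential offsets with rate 1
        else:
            t += float(interval)        # evenly spaced offsets
        out.append(t)
    return out


random_times = sample_timestamps(5)
regular_times = sample_timestamps(4, interval=0.5)
print(random_times)
print(regular_times)
```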

cdr.util module

cdr.util.filter_models(names, filters, cdr_only=False)[source]

Return models contained in names that are permitted by filters, preserving order in which filters were matched. Filters can be ordinary strings, regular expression objects, or string representations of regular expressions. For a regex filter to be considered a match, the expression must entirely match the name. If filters is zero-length, returns names.

Parameters
  • nameslist of str; pool of model names to filter.

  • filterslist of {str, SRE_Pattern}; filters to apply in order

  • cdr_onlybool; if True, only returns CDR models. If False, returns all models admitted by filters.

Returns

list of str; names in names that pass at least one filter, or all of names if no filters are applied.

cdr.util.filter_names(names, filters)[source]

Return elements of names permitted by filters, preserving order in which filters were matched. Filters can be ordinary strings, regular expression objects, or string representations of regular expressions. For a regex filter to be considered a match, the expression must entirely match the name.

Parameters
  • nameslist of str; pool of names to filter.

  • filterslist of {str, SRE_Pattern}; filters to apply in order

Returns

list of str; names in names that pass at least one filter
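
The "entire match" requirement and the filter-order guarantee can be illustrated with `re.fullmatch`. This is a sketch of the described behavior rather than the package's code; it treats every string filter as a regular expression, which is an assumption, and the model names are made up.

```python
import re


def filter_by_patterns(names, filters):
    """Keep names fully matched by at least one filter, in filter order."""
    out = []
    for f in filters:
        # re.compile returns an already-compiled pattern unchanged.
        pattern = re.compile(f)
        for name in names:
            # fullmatch requires the whole name to match the expression.
            if pattern.fullmatch(name) and name not in out:
                out.append(name)
    return out


names = ["CDR_main", "CDR_ablated", "LME_baseline"]
kept = filter_by_patterns(names, ["LME.*", re.compile("CDR_main")])
print(kept)
```

Note that a filter such as "CDR" keeps nothing here, because it matches only a prefix of each name.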

cdr.util.get_random_permutation(n)[source]

Draw a random permutation of the integers 0 to n - 1. Used to shuffle arrays of length n. For example, a permutation and its inverse can be generated by calling p, p_inv = get_random_permutation(n). To randomly shuffle an n-dimensional vector x, call x[p]. To un-shuffle a vector that has already been shuffled in this way, index it with p_inv.

Parameters

n – int; number of elements to permute (values range from 0 to n - 1)

Returns

2-tuple of numpy arrays; the permutation and its inverse
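
The relationship between a permutation and its inverse can be checked with a small standard-library sketch. It uses Python lists instead of numpy arrays, so indexing is written as list comprehensions; the function name `random_permutation` and its `seed` argument are hypothetical.

```python
import random


def random_permutation(n, seed=None):
    """Return a random permutation of 0..n-1 and its inverse."""
    rng = random.Random(seed)
    p = list(range(n))
    rng.shuffle(p)
    p_inv = [0] * n
    for position, index in enumerate(p):
        p_inv[index] = position   # p_inv undoes the reordering applied by p
    return p, p_inv


x = ["a", "b", "c", "d", "e"]
p, p_inv = random_permutation(len(x), seed=1)
shuffled = [x[i] for i in p]                 # analogue of x[p]
restored = [shuffled[i] for i in p_inv]      # analogue of shuffled[p_inv]
print(shuffled, restored)
```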

cdr.util.load_cdr(dir_path)[source]

Convenience method for reconstructing a saved CDR object. First loads in metadata from m.obj, then uses that metadata to construct the computation graph. Then, if saved weights are found, these are loaded into the graph.

Parameters

dir_path – Path to directory containing the CDR checkpoint files.

Returns

The loaded CDR instance.

cdr.util.mae(true, preds)[source]

Compute mean absolute error (MAE).

Parameters
  • true – True values

  • preds – Predicted values

Returns

float; MAE

cdr.util.mse(true, preds)[source]

Compute mean squared error (MSE).

Parameters
  • true – True values

  • preds – Predicted values

Returns

float; MSE
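
For reference, the two error measures above follow the standard definitions, sketched here in plain Python on made-up values. The function names are hypothetical and the package itself presumably operates on numpy arrays.

```python
def mean_absolute_error(true, preds):
    """Average of |true - pred| over all items."""
    return sum(abs(t - p) for t, p in zip(true, preds)) / len(true)


def mean_squared_error(true, preds):
    """Average of (true - pred)^2 over all items."""
    return sum((t - p) ** 2 for t, p in zip(true, preds)) / len(true)


true = [1.0, 2.0, 3.0, 4.0]
preds = [1.0, 2.5, 2.0, 4.5]
print(mean_absolute_error(true, preds), mean_squared_error(true, preds))
```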

cdr.util.names2ix(names, l, dtype=<class 'numpy.int32'>)[source]

Generate 1D numpy array of indices in l corresponding to names in names

Parameters
  • nameslist of str; names to look up in l

  • llist of str; list of names from which to extract indices

  • dtypenumpy dtype object; return dtype

Returns

numpy array; indices of names in l
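
The lookup can be pictured with plain lists as below. This sketch returns a Python list rather than a numpy array, and the helper name `names_to_indices` is hypothetical.

```python
def names_to_indices(names, pool):
    """Return the position in pool of each entry in names."""
    if isinstance(names, str):
        names = [names]
    # list.index returns the first position at which each name occurs.
    return [pool.index(name) for name in names]


indices = names_to_indices(["c", "a"], ["a", "b", "c"])
print(indices)
```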

cdr.util.nested(model_name_1, model_name_2)[source]

Check whether two CDR models are nested with 1 degree of freedom

Parameters
  • model_name_1str; name of first model

  • model_name_2str; name of second model

Returns

bool; True if models are nested with 1 degree of freedom, False otherwise

cdr.util.pca(X, n_dim=None, dtype=<class 'numpy.float32'>)[source]

Perform principal components analysis on a data table.

Parameters
  • Xnumpy or pandas array; the input data

  • n_dimint or None; maximum number of principal components. If None, all components are retained.

  • dtypenumpy dtype; return dtype

Returns

5-tuple of numpy arrays; transformed data, eigenvectors, eigenvalues, input means, and input standard deviations

cdr.util.percent_variance_explained(true, preds)[source]

Compute percent variance explained.

Parameters
  • true – True values

  • preds – Predicted values

Returns

float; percent variance explained
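
A common definition of percent variance explained is 100 × (1 − MSE / Var(true)). The sketch below uses that formula as an assumption; the package may differ in details such as clipping negative values or the variance estimator, and the function name is hypothetical.

```python
def pct_variance_explained(true, preds):
    """100 * (1 - MSE / variance of the true values); illustrative formula."""
    n = len(true)
    mean_true = sum(true) / n
    variance = sum((t - mean_true) ** 2 for t in true) / n
    mse = sum((t - p) ** 2 for t, p in zip(true, preds)) / n
    return 100.0 * (1.0 - mse / variance)


true = [1.0, 2.0, 3.0, 4.0]
perfect = pct_variance_explained(true, true)          # predictions equal targets
baseline = pct_variance_explained(true, [2.5] * 4)    # always predict the mean
print(perfect, baseline)
```

Under this definition, perfect predictions score 100 and a constant prediction at the mean scores 0.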

cdr.util.reg_name(string)[source]

Standardize a variable name for regularization

Parameters

stringstr; input string

Returns

str; transformed string

cdr.util.sn(string)[source]

Compute a Tensorboard-compatible version of a string.

Parameters

stringstr; input string

Returns

str; transformed string