CDR Package API
Complete API for all public classes and methods in this package.
cdr.backend module

cdr.backend.CDRNNStateTuple
alias of cdr.backend.AttentionalLSTMDecoderStateTuple
cdr.base module
cdr.config module

class cdr.config.Config(path)
Bases: object
Parses an *.ini file and stores settings needed to define a set of CDR experiments.
 Parameters
path – path to *.ini file.

build_cdr_settings(settings, add_defaults=True, global_settings=None, is_cdr=True, is_cdrnn=False)
Given a settings object parsed from a config file, compute the CDR parameter dictionary.
 Parameters
settings – settings from a ConfigParser object.
add_defaults – bool; whether to add default settings not explicitly specified in the config.
global_settings – dict or None; dictionary of global defaults for parameters missing from settings.
is_cdr – bool; whether this is a CDR(NN) model.
is_cdrnn – bool; whether this is a CDRNN model.
 Returns
dict; dictionary of settings key-value pairs.
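As a rough illustration of the idea behind build_cdr_settings, the sketch below reads one section of an *.ini file with the standard-library configparser and layers it over defaults. The section name cdr_settings, the keys, the DEFAULTS table, and the helper build_settings are all hypothetical and are not taken from the package.

```python
import configparser

# Hypothetical built-in defaults; the real package defines its own.
DEFAULTS = {"history_length": 128, "future_length": 0}


def build_settings(settings, add_defaults=True, global_settings=None):
    """Merge one config section with global and built-in defaults."""
    out = {}
    if add_defaults:
        out.update(DEFAULTS)
    if global_settings is not None:
        out.update(global_settings)
    for key, value in settings.items():
        # Values in an *.ini file are strings; cast plain integers.
        out[key] = int(value) if value.lstrip("-").isdigit() else value
    return out


parser = configparser.ConfigParser()
parser.read_string("""
[cdr_settings]
history_length = 64
outdir = results
""")
params = build_settings(parser["cdr_settings"], global_settings={"future_length": 16})
print(params)
```

Explicit settings win over global defaults, which in turn win over built-in defaults; that precedence is a design assumption of this sketch.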
cdr.data module

cdr.data.add_responses(names, y)
Add response variable(s) to a dataframe, applying any preprocessing required by the formula string.
 Parameters
names – str or list of str; name(s) of dependent variable(s).
y – pandas DataFrame; response data.
 Returns
pandas DataFrame; response data with any missing ops applied.

cdr.data.build_CDR_impulse_data(X, first_obs, last_obs, X_in_Y_names=None, X_in_Y=None, impulse_names=None, history_length=128, future_length=0, int_type='int32', float_type='float32')
Construct impulse data arrays in the required format for CDR fitting/evaluation for a single response array.
 Parameters
X – list of pandas tables; impulse (predictor) data.
first_obs – list of index vectors (list, pandas series, or numpy vector) of first observations; the list contains vectors of row indices, one for each element of X, of the first impulse in the time series associated with the response. If None, inferred from Y.
last_obs – list of index vectors (list, pandas series, or numpy vector) of last observations; the list contains vectors of row indices, one for each element of X, of the last impulse in the time series associated with the response. If None, inferred from Y.
X_in_Y_names – list of str; names of predictors contained in Y rather than X. If None, no such predictors.
X_in_Y – pandas DataFrame or None; table of predictors contained in Y rather than X. If None, no such predictors.
impulse_names – list of str; names of columns in X to be used as impulses by the model. If None, all columns returned.
history_length – int; maximum number of history (backward) observations.
future_length – int; maximum number of future (forward) observations.
int_type – str; name of int type.
float_type – str; name of float type.
 Returns
triple of numpy arrays; let N, T, I, R respectively be the number of rows in Y, history length, number of impulse dimensions, and number of response dimensions. Outputs are (1) impulses with shape (N, T, I), (2) impulse timestamps with shape (N, T, I), and (3) impulse mask with shape (N, T, I).

cdr.data.build_CDR_response_data(responses, Y=None, first_obs=None, last_obs=None, Y_time=None, Y_gf=None, X_in_Y_names=None, X_in_Y=None, Y_category_map=None, response_to_df_ix=None, gf_names=None, gf_map=None)
Construct response data arrays in the required format for CDR fitting/evaluation for one or more response arrays.
 Parameters
responses – list of str; names of columns in Y to be used as responses (dependent variables) by the model.
Y – list of pandas tables, or None; response data. If None, does not return a response array.
first_obs – list of list of index vectors (list, pandas series, or numpy vector) of first observations, or None; the list contains one element for each response array. Inner lists contain vectors of row indices, one for each element of X, of the first impulse in the time series associated with each response. If None, inferred from Y.
last_obs – list of list of index vectors (list, pandas series, or numpy vector) of last observations, or None; the list contains one element for each response array. Inner lists contain vectors of row indices, one for each element of X, of the last impulse in the time series associated with each response. If None, inferred from Y.
Y_time – list of response timestamp vectors (list, pandas series, or numpy vector), or None; vector(s) of response timestamps, one for each response array. Needed to timestamp any response-aligned predictors (ignored if none in model).
Y_gf – list of pandas DataFrame, or None; data frames containing random grouping factor levels, if applicable.
X_in_Y_names – list of str; names of predictors contained in Y rather than X (must be present in all elements of Y). If None, no such predictors.
X_in_Y – list of pandas DataFrame or None; tables (one per response array) of predictors contained in Y rather than X (must be present in all elements of Y). If None, no such predictors.
Y_category_map – dict or None; map from category labels to integers for each categorical response.
response_to_df_ix – dict or None; map from response names to lists of indices of the response files that contain them.
gf_names – list or None; list of names of random grouping factor variables. If None and Y_gf provided, will use all columns of Y_gf.
gf_map – list of dict or None; list of maps from random grouping factor levels to their indices, one map per grouping factor variable in gf_names.
 Returns
7-tuple of numpy arrays; let N, R, XF, YF, Z, and K respectively be the number of rows (sum total number of rows in Y), number of response dimensions, number of distinct predictor files (X), number of distinct response files (Y), number of random grouping factor variables, and number of response-aligned predictors. Outputs are (1) responses with shape (N, R) or None if Y is None, (2) an XF-tuple of first observation vectors indexing start indices for each entry in X, (3) a YF-tuple of last observation vectors indexing end indices for each entry in X, (4) response timestamps with shape (N,), (5) response masks (masking out any missing response variables per row) with shape (N, R), (6) random grouping factor matrix with shape (N, Z), or None if no random grouping factors provided, and (7) response-aligned predictors with shape (N, K).

cdr.data.c(df)
Zero-center pandas series or data frame.
 Parameters
df – pandas Series or DataFrame; input data.
 Returns
pandas Series or DataFrame; centered data.
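Zero-centering simply subtracts the mean from every value. The sketch below shows the arithmetic on a plain Python list; the helper name center is hypothetical, and with pandas the equivalent expression would presumably be df - df.mean().

```python
from statistics import mean


def center(values):
    """Subtract the mean so that the result has mean zero."""
    m = mean(values)
    return [v - m for v in values]


centered = center([1.0, 2.0, 6.0])
print(centered)  # [-2.0, -1.0, 3.0]
```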

cdr.data.compute_filter(y, field, cond)
Compute filter given a field and condition.
 Parameters
y – pandas DataFrame; response data.
field – str; name of column on whose values to filter.
cond – str; string representation of condition to use for filtering.
 Returns
numpy vector; boolean mask to use for pandas subsetting operations.

cdr.data.compute_filters(Y, filters=None)
Compute filters given a filter map.
 Parameters
Y – pandas DataFrame; response data.
filters – list; list of key-value pairs mapping column names to filtering criteria for their values.
 Returns
numpy vector; boolean mask to use for pandas subsetting operations.

cdr.data.compute_partition(y, modulus, n)
Given a splitID column, use modular arithmetic to partition data into n subparts.
 Parameters
y – pandas DataFrame; response data.
modulus – int; modulus to use for splitting, must be at least as large as n.
n – int; number of subparts in the partition.
 Returns
list of numpy vectors; one boolean vector per subpart of the partition, selecting only those elements of y that belong.
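One way such a modular partition could work is sketched below. The assignment of residues to subparts (the first subpart absorbs every residue not claimed by a later one) is an assumption made for illustration, and partition_by_modulus is a hypothetical helper operating on plain lists rather than a DataFrame.

```python
def partition_by_modulus(split_ids, modulus, n):
    """Return n boolean masks over split_ids (assumed residue scheme)."""
    if modulus < n:
        raise ValueError("modulus must be at least as large as n")
    residues = [i % modulus for i in split_ids]
    # Subpart 0 absorbs every residue not claimed by a later subpart.
    masks = [[r <= modulus - n for r in residues]]
    # Each remaining subpart takes exactly one residue class.
    for k in range(1, n):
        masks.append([r == modulus - n + k for r in residues])
    return masks


split_ids = list(range(8))
train, dev, test = partition_by_modulus(split_ids, modulus=4, n=3)
print(sum(train), sum(dev), sum(test))  # 4 2 2
```

With modulus=4 and n=3, half of the IDs land in the first subpart and a quarter in each of the others, so the modulus controls the relative sizes of the subparts.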

cdr.data.compute_splitID(y, split_fields)
Map tuples in columns designated by split_fields into integer ID to use for data partitioning.
 Parameters
y – pandas DataFrame; response data.
split_fields – list of str; column names to use for computing split ID.
 Returns
numpy vector; integer vector of split IDs.
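The mapping from value tuples to integer IDs can be pictured as follows. This is a sketch under the assumption that any stable tuple-to-integer map is acceptable; the package may compute its IDs differently, and split_ids_from_tuples is a hypothetical helper that works on a list of dict rows.

```python
def split_ids_from_tuples(rows, split_fields):
    """Assign one integer per distinct tuple of split_fields values."""
    ids = {}
    out = []
    for row in rows:
        key = tuple(row[f] for f in split_fields)
        # First-seen order determines the integer for each new tuple.
        out.append(ids.setdefault(key, len(ids)))
    return out


rows = [
    {"subject": "s1", "doc": "a"},
    {"subject": "s1", "doc": "b"},
    {"subject": "s2", "doc": "a"},
    {"subject": "s1", "doc": "a"},
]
ids = split_ids_from_tuples(rows, ["subject", "doc"])
print(ids)  # [0, 1, 2, 0]
```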

cdr.data.compute_time_mask(X_time, first_obs, last_obs, history_length=128, future_length=0, int_type='int32', float_type='float32')
Compute mask for expanded impulse data zeroing out nonexistent impulses.
 Parameters
X_time – pandas Series; timestamps associated with each impulse in X.
first_obs – pandas Series; vector of row indices in X of the first impulse in the time series associated with each response.
last_obs – pandas Series; vector of row indices in X of the last preceding impulse in the time series associated with each response.
history_length – int; maximum number of history (backward) observations.
future_length – int; maximum number of future (forward) observations.
int_type – str; name of int type.
float_type – str; name of float type.
 Returns
numpy array; boolean impulse mask.

cdr.data.corr_cdr(X_2d, impulse_names, impulse_names_2d, time, time_mask)
Compute correlation matrix, including correlations across time where necessitated by 2D predictors.
 Parameters
X_2d – numpy array; the impulse data. Must be of shape (batch_len, history_length+future_length, n_impulses); can be computed from sources by build_CDR_impulse_data().
impulse_names – list of str; names of columns in X_2d to be used as impulses by the model.
impulse_names_2d – list of str; names of columns in X_2d that designate 2D predictors.
time – 3D numpy array; array of timestamps for each event in X_2d.
time_mask – 3D numpy array; array of masks over padding events in X_2d.
 Returns
pandas DataFrame; the correlation matrix.

cdr.data.expand_impulse_sequence(X, X_time, first_obs, last_obs, window_length, int_type='int32', float_type='float32', fill=0.0)
Expand out impulse stream in X for each response in the target data.
 Parameters
X – pandas DataFrame; impulse (predictor) data.
X_time – pandas Series; timestamps associated with each impulse in X.
first_obs – pandas Series; vector of row indices in X of the first impulse in the time series associated with each response.
last_obs – pandas Series; vector of row indices in X of the last preceding impulse in the time series associated with each response.
window_length – int; number of steps in time dimension of output.
int_type – str; name of int type.
float_type – str; name of float type.
fill – float; fill value for padding cells.
 Returns
3-tuple of numpy arrays; the expanded impulse array, the expanded timestamp array, and a boolean mask zeroing out locations of nonexistent impulses.
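The expansion can be sketched for a single impulse column with plain lists. Two details here are assumptions rather than documented behavior: last_obs is treated as an exclusive bound (as in a Python slice), and padding is placed at the start of each window so that the most recent impulse sits last. The helper expand_window is hypothetical.

```python
def expand_window(x, x_time, first_obs, last_obs, window_length, fill=0.0):
    """Build padded per-response windows, timestamps, and a validity mask."""
    X_out, t_out, mask = [], [], []
    for lo, hi in zip(first_obs, last_obs):
        # Assumption: last_obs is exclusive; keep only the most recent steps.
        lo = max(lo, hi - window_length)
        n_pad = window_length - (hi - lo)
        X_out.append([fill] * n_pad + x[lo:hi])
        t_out.append([fill] * n_pad + x_time[lo:hi])
        mask.append([False] * n_pad + [True] * (hi - lo))
    return X_out, t_out, mask


x = [10.0, 11.0, 12.0, 13.0]
x_time = [0.0, 0.5, 1.0, 1.5]
X_win, t_win, m = expand_window(
    x, x_time, first_obs=[0, 0], last_obs=[2, 4], window_length=3
)
print(X_win)  # [[0.0, 10.0, 11.0], [11.0, 12.0, 13.0]]
print(m)      # [[False, True, True], [True, True, True]]
```

The mask is what lets downstream code tell a padded 0.0 apart from a real impulse whose value happens to be 0.0.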

cdr.data.filter_invalid_responses(Y, dv, crossval_factor=None, crossval_fold=None)
Filter out rows with non-finite responses.
 Parameters
Y – pandas table or list of pandas tables; response data.
dv – str or list of str; name(s) of column(s) containing the dependent variable(s).
crossval_factor – str or None; name of column containing the selection variable for cross-validation. If None, no cross-validation filtering.
crossval_fold – list or None; list of valid values for cross-validation selection. Used only if crossval_factor is not None.
 Returns
2-tuple of pandas DataFrame and pandas Series; valid data and indicator vector used to filter out invalid data.

cdr.data.get_first_last_obs_lists(y)
Convenience utility to extract all first_obs and last_obs columns in Y, sorted by file index.
 Parameters
y – pandas DataFrame; response data.
 Returns
pair of list of str; first_obs column names and last_obs column names.

cdr.data.get_rangf_array(Y, rangf_names, rangf_map)
Collect random grouping factor indicators as numpy integer arrays that can be read by TensorFlow. Returns vertical concatenation of GF arrays from each element of Y.
 Parameters
Y – pandas table or list of pandas tables; response data.
rangf_names – list of str; names of columns containing random grouping factor levels (order is preserved, changing the order will change the resulting array).
rangf_map – list of dict; map for each random grouping factor from levels to unique indices.
 Returns

cdr.data.get_time_windows(X, Y, series_ids, forward=False, window_length=128, verbose=True)
Compute row indices in X of initial and final impulses for each element of Y. Assumes time series are already sorted by series_ids.
 Parameters
X – pandas DataFrame; impulse (predictor) data.
Y – pandas DataFrame; response data.
series_ids – list of str; column names whose jointly unique values define unique time series.
forward – bool; whether to compute forward windows (future inputs) or backward windows (past inputs, used if forward is False).
window_length – int; maximum size of time window to consider. If np.inf, no bound on window size.
verbose – bool; whether to report progress to stderr.
 Returns
2-tuple of numpy vectors; first and last impulse observations (respectively) for each response in Y.
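For intuition, the backward case can be sketched for a single, already sorted time series using binary search. The sketch assumes that the end index is exclusive and that an impulse sharing a timestamp with the response belongs to its history; neither is confirmed by this reference. backward_windows is a hypothetical helper.

```python
from bisect import bisect_right


def backward_windows(x_time, y_time, window_length=128):
    """Return (first_obs, last_obs) row indices into x_time per response."""
    first_obs, last_obs = [], []
    for t in y_time:
        # Exclusive end: index just past the last impulse at or before t.
        hi = bisect_right(x_time, t)
        first_obs.append(max(0, hi - window_length))
        last_obs.append(hi)
    return first_obs, last_obs


x_time = [0.0, 1.0, 2.0, 3.0]
y_time = [0.5, 2.0, 10.0]
first, last = backward_windows(x_time, y_time, window_length=2)
print(first, last)  # [0, 1, 2] [1, 3, 4]
```

Handling several series would mean running the same search within each group defined by series_ids.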

cdr.data.preprocess_data(X, Y, formula_list, series_ids, filters=None, history_length=128, future_length=0, all_interactions=False, verbose=True, debug=False)
Preprocess CDR data.
 Parameters
X – list of pandas tables; impulse (predictor) data.
Y – list of pandas tables; response data.
formula_list – list of Formula; CDR formula for which to preprocess data.
series_ids – list of str; column names whose jointly unique values define unique time series.
filters – list; list of key-value pairs mapping column names to filtering criteria for their values.
history_length – int; maximum number of history (backward) observations.
future_length – int; maximum number of future (forward) observations.
all_interactions – bool; add powerset of all conformable interactions.
verbose – bool; whether to report progress to stderr.
debug – bool; print debugging information.
 Returns
7-tuple; predictor data, response data, filtering mask, response-aligned predictor names, response-aligned predictors, 2D predictor names, and 2D predictors.

cdr.data.s(df)
Rescale pandas series or data frame by its standard deviation.
 Parameters
df – pandas Series or DataFrame; input data.
 Returns
pandas Series or DataFrame; rescaled data.
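Rescaling divides every value by the standard deviation without shifting the mean. The sketch below uses the sample standard deviation (ddof=1, which is also the pandas default for std()); whether the package uses the sample or population estimate is an assumption here, and scale is a hypothetical helper.

```python
from statistics import stdev


def scale(values):
    """Divide by the sample standard deviation (ddof=1, an assumption)."""
    sd = stdev(values)
    return [v / sd for v in values]


scaled = scale([2.0, 4.0, 6.0])
print(scaled)
```

Here the sample variance is (4 + 0 + 4) / 2 = 4, so each value is divided by 2.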

cdr.data.split_cdr_outputs(outputs, lengths)
Takes a dictionary of arbitrary depth containing CDR outputs with their labels as keys and splits each output into a list of outputs with lengths corresponding to lengths. Useful for aligning CDR outputs to response files, since multiple response files can be provided, which CDR concatenates internally. Recursively modifies the dict in place.
 Parameters
outputs – dict of arbitrary depth with numpy arrays at the leaves; the source CDR outputs.
lengths – array-like vector of lengths to split the outputs into.
 Returns
dict; same key-value structure as outputs but with each leaf split into a list of len(lengths) vectors, one for each length value.
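The recursive, in-place splitting described above can be sketched as follows, with plain lists standing in for the numpy arrays at the leaves. The helper name split_outputs and the example keys are hypothetical.

```python
def split_outputs(outputs, lengths):
    """Recursively replace each leaf sequence with consecutive chunks."""
    for key, value in outputs.items():
        if isinstance(value, dict):
            split_outputs(value, lengths)
        else:
            chunks, start = [], 0
            for n in lengths:
                chunks.append(value[start:start + n])
                start += n
            # Reassigning an existing key is safe while iterating.
            outputs[key] = chunks
    return outputs


outputs = {
    "preds": {"y": [1, 2, 3, 4, 5]},
    "loglik": [0.1, 0.2, 0.3, 0.4, 0.5],
}
split_outputs(outputs, lengths=[2, 3])
print(outputs["preds"]["y"])  # [[1, 2], [3, 4, 5]]
```

Each leaf ends up as a list with one chunk per response file, in the order the files were concatenated.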
cdr.cdrbase module
cdr.cdrbayes module
cdr.cdrmle module
cdr.cdrnnbase module
cdr.cdrnnbayes module
cdr.cdrnnmle module
cdr.formula module

class cdr.formula.Formula(bform_str, standardize=True)
Bases: object
A class for parsing R-style mixed-effects CDR model formula strings and applying them to CDR data matrices.
 Parameters
bform_str – str; an R-style mixed-effects CDR model formula string.

ablate_impulses(impulse_ids)
Remove impulses in impulse_ids from fixed effects (retaining in any random effects).
 Parameters
impulse_ids – list of str; impulse IDs.
 Returns
None

apply_formula(X, Y, X_in_Y_names=None, all_interactions=False, series_ids=None)
Extract all data and compute all transforms required by the model formula.
 Parameters
X – list of pandas tables; impulse data.
Y – list of pandas tables; response data.
X_in_Y_names – list or None; list of column names for response-aligned predictors (predictors measured for every response rather than for every input) if applicable, None otherwise.
all_interactions – bool; add powerset of all conformable interactions.
series_ids – list of str or None; list of ids to use as grouping factors for lagged effects. If None, lagging will not be attempted.
 Returns
triple; transformed X, transformed y, response-aligned predictor names.

apply_op(op, arr)
Apply the op named op to array arr.
 Parameters
op – str; name of op.
arr – numpy or pandas array; source data.
 Returns
numpy array; transformed data.

apply_op_2d(op, arr, time_mask)
Apply op to 2D predictor (predictor whose value depends on properties of the response).
 Parameters
op – str; name of op.
arr – numpy array; source data.
time_mask – numpy array; mask for padding cells.
 Returns
numpy array; transformed data.

apply_ops(impulse, X)
Apply all ops defined for an impulse.
 Parameters
impulse – Impulse object; the impulse.
X – list of pandas tables; table containing the impulse data.
 Returns
pandas table; table augmented with transformed impulse.

apply_ops_2d(impulse, X_2d_predictor_names, X_2d_predictors, time_mask)
Apply all ops defined for a 2D predictor (predictor whose value depends on properties of the response).
 Parameters
impulse – Impulse object; the impulse.
X_2d_predictor_names – list of str; names of 2D predictors.
X_2d_predictors – numpy array; source data.
time_mask – numpy array; mask for padding cells.
 Returns
2-tuple; list of new predictor name, numpy array of predictor values.

static bases(family)
Get the number of bases of a spline kernel.
 Parameters
family – str; name of IRF family.
 Returns
int or None; number of bases of spline kernel, or None if family is not a spline.

build(bform_str, standardize=True)
Construct internal data from formula string.
 Parameters
bform_str – str; source string.
 Returns
None

categorical_transform(X)
Get transformed formula with categorical predictors in X expanded.
 Parameters
X – list of pandas tables; input data.
 Returns
Formula; transformed Formula object.

compute_2d_predictor(predictor_name, X, first_obs, last_obs, history_length=128, future_length=None, minibatch_size=50000)
Compute 2D predictor (predictor whose value depends on properties of the most recent impulse).
 Parameters
predictor_name – str; name of predictor.
X – pandas table; input data.
first_obs – pandas Series or 1D numpy array; row indices in X of the start of the series associated with each regression target.
last_obs – pandas Series or 1D numpy array; row indices in X of the most recent observation in the series associated with each regression target.
minibatch_size – int; minibatch size for computing predictor, can help with memory footprint.
 Returns
2-tuple; new predictor name, numpy array of predictor values.

insert_impulses(impulses, irf_str, rangf=None)
Insert impulses in impulse_ids into fixed effects and all random terms.
 Parameters
impulse_ids – list of str; impulse IDs.
 Returns
None

static irf_params(family)
Return list of parameter names for a given IRF family.
 Parameters
family – str; name of IRF family.
 Returns
list of str; parameter names.

static is_LCG(family)
Check whether a kernel is LCG.
 Parameters
family – str; name of IRF family.
 Returns
bool; whether the kernel is LCG (linear combination of Gaussians).

pc_transform(n_pc, pointers=None)
Get transformed formula with impulses replaced by principal components.
 Parameters
n_pc – int; number of principal components in transform.
pointers – dict; map from source nodes to transformed nodes.
 Returns
list of IRFNode; tree forest representing current state of the transform.

process_ast(t, terms=None, has_intercept=None, ops=None, rangf=None, impulses_by_name=None, interactions_by_name=None, under_irf=False, under_interaction=False)
Recursively process a node of the Python abstract syntax tree (AST) representation of the formula string and insert data into internal representation of model formula.
 Parameters
t – AST node.
terms – list or None; CDR terms computed so far, or None if no CDR terms computed.
has_intercept – dict; map from random grouping factors to boolean values representing whether that grouping factor has a random intercept. None is used as a key to refer to the population-level intercept.
ops – list; names of ops computed so far, or None if no ops computed.
rangf – str or None; name of rangf for random term currently being processed, or None if currently processing fixed effects portion of model.
 Returns
None
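Because the right-hand side of an R-style formula is often also a syntactically valid Python expression, the standard-library ast module can parse it into a tree that is then walked recursively. The sketch below only illustrates that general technique: the example formula fragment, the labels it emits, and the helper top_level_terms are hypothetical and do not reproduce the package's parser.

```python
import ast


def top_level_terms(rhs):
    """Split the right-hand side of a formula into its additive terms."""

    def flatten(node):
        # `a + b + c` parses as nested BinOp(Add) nodes; unnest them.
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            return flatten(node.left) + flatten(node.right)
        return [node]

    terms = []
    for node in flatten(ast.parse(rhs, mode="eval").body):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
            # `(... | g)` is read here as a random term grouped by g.
            terms.append(("random", node.right.id))
        elif isinstance(node, ast.Call):
            terms.append(("call", node.func.id))
        else:
            terms.append(("other", type(node).__name__))
    return terms


terms = top_level_terms("C(a + b, Normal()) + (1 | subject)")
print(terms)  # [('call', 'C'), ('random', 'subject')]
```

A full parser would recurse into each call and random term rather than stopping at the top level.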

process_irf(t, input, ops=None, rangf=None)
Process data from AST node representing part of an IRF definition and insert data into internal representation of the model.
 Parameters
t – AST node.
input – IRFNode object; child IRF of current node.
ops – list of str, or None; ops applied to IRF. If None, no ops applied.
rangf – str or None; name of rangf for random term currently being processed, or None if currently processing fixed effects portion of model.
 Returns
IRFNode object; the IRF node.

remove_impulses(impulse_ids)
Remove impulses in impulse_ids from the model (both fixed and random effects).
 Parameters
impulse_ids – list of str; impulse IDs.
 Returns
None

response_names()
Get list of names of modeled response variables.
 Returns
list of str; names of modeled response variables.

responses()
Get list of modeled response variables.
 Returns
list of Impulse; modeled response variables.

to_lmer_formula_string(z=False, correlated=True)
Generate an lme4-style LMER model string representing the structure of the current CDR model. Useful for 2-step analysis in which data are transformed using CDR, then fitted using LME.
 Parameters
z – bool; z-transform convolved predictors.
correlated – bool; whether to use correlated random intercepts and slopes.
 Returns
str; the LMER formula string.

class cdr.formula.IRFNode(family=None, impulse=None, p=None, irfID=None, coefID=None, ops=None, cont=False, fixed=True, rangf=None, param_init=None, trainable=None)
Bases: object
Data structure representing a node in a CDR IRF tree. For more information on how the CDR IRF structure is encoded as a tree, see the reference on CDR IRF trees.
 Parameters
family – str; name of IRF kernel family.
impulse – Impulse object or None; the impulse if terminal, else None.
p – IRFNode object or None; the parent IRF node, or None if no parent (parent nodes can be connected after initialization).
irfID – str or None; string ID of node if applicable. If None, automatically-generated ID will describe node’s family and structural position.
coefID – str or None; string ID of coefficient if applicable. If None, automatically-generated ID will describe node’s family and structural position. Only applicable to terminal nodes, so this property will not be used if the node is non-terminal.
ops – list of str, or None; ops to apply to IRF node. If None, no ops.
cont – bool; whether node connects directly to a continuous predictor. Only applicable to terminal nodes, so this property will not be used if the node is non-terminal.
fixed – bool; whether node exists in the model’s fixed effects structure.
rangf – list of str, str, or None; names of any random grouping factors associated with the node.
param_init – dict; map from parameter names to initial values, which will also be used as prior means.
trainable – list of str, or None; trainable parameters at this node. If None, all parameters are trainable.

ablate_impulses(impulse_ids)
Remove impulses in impulse_ids from fixed effects (retaining in any random effects).
 Parameters
impulse_ids – list of str; impulse IDs.
 Returns
None

add_child(t)
Add child to this node in the IRF tree.
 Parameters
t – IRFNode; child node.
 Returns
IRFNode; child node with updated parent.
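The tree bookkeeping that methods such as add_child, terminals, and impulse2terminal describe can be pictured with a much smaller structure. The Node class below is a hypothetical stand-in written for illustration; it borrows a few method names from this reference but is not the package's IRFNode and omits families, coefficients, and random effects.

```python
class Node:
    """Hypothetical stand-in for a tree node with a parent pointer."""

    def __init__(self, name, impulse=None):
        self.name = name
        self.impulse = impulse  # set only on terminal nodes
        self.p = None           # parent, connected by add_child
        self.children = []

    def add_child(self, t):
        """Attach t below this node and return it with its parent set."""
        t.p = self
        self.children.append(t)
        return t

    def terminals(self):
        """Collect the terminal nodes dominated by this node."""
        if not self.children:
            return [self]
        out = []
        for child in self.children:
            out.extend(child.terminals())
        return out

    def impulse2terminal(self):
        """Map each impulse name to the names of terminals carrying it."""
        out = {}
        for node in self.terminals():
            out.setdefault(node.impulse, []).append(node.name)
        return out


root = Node("ROOT")
irf = root.add_child(Node("Normal"))
irf.add_child(Node("Normal-a", impulse="a"))
irf.add_child(Node("Normal-b", impulse="b"))
print(root.impulse2terminal())  # {'a': ['Normal-a'], 'b': ['Normal-b']}
```

Most of the map-returning methods in this class can be read as variations on this pattern: walk the subtree, then index what was found by a different key.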

add_interactions(response_interactions)
Add a ResponseInteraction object (or list of them) to this node.
 Parameters
response_interactions – ResponseInteraction or list of ResponseInteraction; response interaction(s) to add.
 Returns
None

add_rangf(rangf)
Add random grouping factor name to this node.
 Parameters
rangf – str; random grouping factor name.
 Returns
None

atomic_irf_by_family()
Get map from IRF kernel family names to list of IDs of IRFNode instances belonging to that family.
 Returns
dict from str to list of str; IRF IDs by family.

atomic_irf_param_init_by_family()
Get map from IRF kernel family names to maps from IRF IDs to maps from IRF parameter names to their initialization values.
 Returns
dict; parameter initialization maps by family.

atomic_irf_param_trainable_by_family()
Get map from IRF kernel family names to maps from IRF IDs to lists of trainable parameters.
 Returns
dict; trainable parameter maps by family.

bases()
Get the number of bases of node.
 Returns
int or None; number of bases of node, or None if node is not a spline.

categorical_transform(X, expansion_map=None)
Generate transformed copy of node with categorical predictors in X expanded. Recursive. Returns a tree forest representing the current state of the transform. When run from ROOT, should always return a length-1 list representing a single-tree forest, in which case the transformed tree is accessible as the 0th element.
 Parameters
X – list of pandas tables; input data.
expansion_map – dict; internal variable. Do not use.
 Returns
list of IRFNode; tree forest representing current state of the transform.

coef2impulse()
Get map from coefficient IDs dominated by node to lists of corresponding impulses.
 Returns
dict; map from coefficient IDs to lists of corresponding impulses.

coef2terminal()
Get map from coefficient IDs dominated by node to lists of corresponding terminal IRF nodes.
 Returns
dict; map from coefficient IDs to lists of corresponding terminal IRF nodes.

coef_by_rangf()
Get map from random grouping factor names to associated coefficient IDs dominated by node.
 Returns
dict; map from random grouping factor names to associated coefficient IDs.

coef_id()
Get coefficient ID for this node.
 Returns
str or None; coefficient ID, or None if non-terminal.

coef_names()
Get list of names of coefficients dominated by node.
 Returns
list of str; names of coefficients dominated by node.

fixed_coef_names()
Get list of names of fixed coefficients dominated by node.
 Returns
list of str; names of fixed coefficients dominated by node.

fixed_interaction_names()
Get list of names of fixed interactions dominated by node.
 Returns
list of str; names of fixed interactions dominated by node.

formula_terms()
Return data structure representing formula terms dominated by node, grouped by random grouping factor. Key None represents the fixed portion of the model (no random grouping factor).
 Returns
dict; map from random grouping factors to a data structure representing formula terms. The data structure contains 2 fields, 'impulses' containing impulses and 'irf' containing IRF nodes.

has_coefficient(rangf)
Report whether rangf has any coefficients in this subtree.
 Parameters
rangf – random grouping factor.
 Returns
bool; whether rangf has any coefficients in this subtree.

has_composed_irf()
Check whether node dominates any IRF compositions.
 Returns
bool; whether node dominates any IRF compositions.

has_irf(rangf)
Report whether rangf has any IRFs in this subtree.
 Parameters
rangf – random grouping factor.
 Returns
bool; whether rangf has any IRFs in this subtree.

impulse2coef()
Get map from impulses dominated by node to lists of corresponding coefficient IDs.
 Returns
dict; map from impulses to lists of corresponding coefficient IDs.

impulse2terminal()
Get map from impulses dominated by node to lists of corresponding terminal IRF nodes.
 Returns
dict; map from impulses to lists of corresponding terminal IRF nodes.

impulse_names(include_interactions=False)
Get list of names of impulses dominated by node.
 Parameters
include_interactions – bool; whether to return impulses defined by interaction terms.
 Returns
list of str; names of impulses dominated by node.

impulses(include_interactions=False)
Get list of impulses dominated by node.
 Parameters
include_interactions – bool; whether to return impulses defined by interaction terms.
 Returns
list of Impulse; impulses dominated by node.

impulses_by_name()
Get dictionary mapping names of impulses dominated by node to their corresponding impulses.
 Returns
dict; map from impulse names to impulses.

impulses_from_response_interaction()
Get list of any impulses from response interactions associated with this node.
 Returns
list of Impulse; impulses dominated by node.

interaction_by_rangf()
Get map from random grouping factor names to associated interaction IDs dominated by node.
 Returns
dict; map from random grouping factor names to associated interaction IDs.

interaction_names()
Get list of names of interactions dominated by node.
 Returns
list of str; names of interactions dominated by node.

interactions()
Return list of all response interactions used in this subtree, sorted by name.
 Returns
list of ResponseInteraction

interactions2inputs()
Get map from IDs of ResponseInteractions dominated by node to lists of IDs of their inputs.
 Returns
dict; map from IDs of ResponseInteractions nodes to lists of their inputs.

irf_by_rangf()
Get map from random grouping factor names to IDs of associated IRF nodes dominated by node.
 Returns
dict; map from random grouping factor names to IDs of associated IRF nodes.

irf_to_formula(rangf=None)
Generates a representation of this node’s impulse response kernel in formula string syntax.
 Parameters
rangf – random grouping factor for which to generate the stringification (fixed effects if rangf==None).
 Returns
str; formula string representation of node.

is_LCG()
Check the nonparametric type of a node’s kernel, or return None if parametric.
 Parameters
family – str; name of IRF family.
 Returns
str or None; name of kernel type if nonparametric, else None.

local_name()
Get descriptive name for this node, ignoring its position in the IRF tree.
 Returns
str; name.

node_table()
Get map from names to nodes of all nodes dominated by node (including self).
 Returns
dict; map from names to nodes of all nodes dominated by node.

nonparametric_coef_names()
Get list of names of nonparametric coefficients dominated by node.
 Returns
list of str; names of spline coefficients dominated by node.

pc_transform(n_pc, pointers=None)
Generate principal-components-transformed copy of node. Recursive. Returns a tree forest representing the current state of the transform. When run from ROOT, should always return a length-1 list representing a single-tree forest, in which case the transformed tree is accessible as the 0th element.
 Parameters
n_pc – int; number of principal components in transform.
pointers – dict; map from source nodes to transformed nodes.
 Returns
list of IRFNode; tree forest representing current state of the transform.

static pointers2namemmaps(p)
Get a map from source to transformed IRF node names.
 Parameters
p – dict; map from source to transformed IRF nodes.
 Returns
dict; map from source to transformed IRF node names.

remove_impulses(impulse_ids)
Remove impulses in impulse_ids from the model (both fixed and random effects).
 Parameters
impulse_ids – list of str; impulse IDs.
 Returns
None

supports_non_causal
()[source]¶ Check whether model contains only IRF kernels that lack the causality constraint t >= 0.
 Returns
bool
: whether model contains only IRF kernels that lack the causality constraint t >= 0.

terminal2coef
()[source]¶ Get map from IDs of terminal IRF nodes dominated by node to lists of corresponding coefficient IDs.
 Returns
dict
; map from IDs of terminal IRF nodes to lists of corresponding coefficient IDs.

terminal2impulse
()[source]¶ Get map from terminal IRF nodes dominated by node to lists of corresponding impulses.
 Returns
dict
; map from terminal IRF nodes to lists of corresponding impulses.

terminal_names
()[source]¶ Get list of names of terminal IRF nodes dominated by node.
 Returns
list
ofstr
; names of terminal IRF nodes dominated by node.

terminals
()[source]¶ Get list of terminal IRF nodes dominated by node.
 Returns
list
ofIRFNode
; terminal IRF nodes dominated by node.

terminals_by_name
()[source]¶ Get dictionary mapping names of terminal IRF nodes dominated by node to their corresponding nodes.
 Returns
dict
; map from node names to nodes
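
As a rough illustration of what "terminal nodes dominated by a node" means in the methods above, the sketch below walks a toy tree and collects its leaves. The `ToyNode` class, its attributes, and the node names are hypothetical stand-ins chosen for this example; they are not the `IRFNode` implementation.

```python
class ToyNode:
    """Hypothetical stand-in for a tree node; not the real IRFNode class."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = list(children or [])

    def terminals(self):
        """Return the leaf nodes dominated by this node (a leaf dominates itself)."""
        if not self.children:
            return [self]
        out = []
        for child in self.children:
            out.extend(child.terminals())
        return out

    def terminals_by_name(self):
        """Map leaf names to the corresponding leaf nodes."""
        return {node.name: node for node in self.terminals()}


# Toy tree: ROOT dominates two kernels, which dominate three leaves in total.
root = ToyNode("ROOT", [
    ToyNode("Gamma", [ToyNode("Gamma.a"), ToyNode("Gamma.b")]),
    ToyNode("Exp", [ToyNode("Exp.c")]),
])
leaf_names = [node.name for node in root.terminals()]
```

Calling the same methods on an interior node (for example the `Gamma` child) would return only the leaves under that subtree.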

unablate_impulses
(impulse_ids)[source]¶ Insert impulses in impulse_ids into fixed effects (leaving random effects structure unchanged).
 Parameters
impulse_ids –
list
ofstr
; impulse IDs.
 Returns
None

unary_nonparametric_coef_names
()[source]¶ Get list of names of nonparametric coefficients with no siblings dominated by node. Because unary splines are nonparametric, their coefficients are fixed at 1. Trainable coefficients are therefore perfectly confounded with the spline parameters. Splines dominating multiple coefficients are excepted, since the same kernel shape must be scaled in different ways.
 Returns
list
ofstr
; names of unary spline coefficients dominated by node.

class
cdr.formula.
Impulse
(name, ops=None)[source]¶ Bases:
object
Data structure representing an impulse in a CDR model.
 Parameters
name –
str
; name of impulseops –
list
ofstr
, orNone
; ops to apply to impulse. IfNone
, no ops.

categorical
(X)[source]¶ Checks whether impulse is categorical in a dataset
 Parameters
X – list of
pandas
tables; data to check.
 Returns
bool
;True
if impulse is categorical in X,False
otherwise.

class
cdr.formula.
ImpulseInteraction
(impulses, ops=None)[source]¶ Bases:
object
Data structure representing an interaction of impulse-aligned variables (impulses) in a CDR model.
 Parameters
impulses –
list
ofImpulse
; impulses to interact.ops –
list
ofstr
, orNone
; ops to apply to interaction. IfNone
, no ops.

expand_categorical
(X)[source]¶ Expand any categorical predictors in X into one-hot columns.
 Parameters
X – list of
pandas
tables; input data.
 Returns
3-tuple of
pandas
table,list
ofImpulseInteraction
,list
oflist
ofImpulse
; expanded data, list of expandedImpulseInteraction
objects, list of lists of expandedImpulse
objects, one list for each interaction.

class
cdr.formula.
ResponseInteraction
(responses, rangf=None)[source]¶ Bases:
object
Data structure representing an interaction of response-aligned variables (containing at least one IRF-convolved impulse) in a CDR model.
 Parameters
responses –
list
of terminalIRFNode
,Impulse
, and/orImpulseInteraction
objects; responses to interact.rangf –
str
or list ofstr
; random grouping factors for which to build random effects for this interaction.

add_rangf
(rangf)[source]¶ Add random grouping factor name to this interaction.
 Parameters
rangf –
str
; random grouping factor name.
 Returns
None

contains_member
(x)[source]¶ Check if object is a member of the set of responses belonging to this interaction
 Parameters
x –
IRFNode
,Impulse
, and/orImpulseInteraction
object; object to check.
 Returns
bool
; whether x is a member of the set of responses

irf_responses
()[source]¶ Get list of IRFs dominated by interaction.
 Returns
list
ofIRFNode
objects; terminal IRFs dominated by interaction.

non_irf_responses
()[source]¶ Get list of non-IRF response-aligned variables dominated by interaction.
 Returns
list
ofImpulse
and/orImpulseInteraction
objects; non-IRF variables dominated by interaction.

cdr.formula.
pythonize_string
(s)[source]¶ Convert string to valid python variable name
 Parameters
s –
str
; source string.
 Returns
str
; pythonized string
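
To make the idea of "pythonizing" a string concrete, here is a minimal sketch of one way to turn an arbitrary string into a valid Python identifier. The helper name `pythonize_sketch` and its replacement rules (underscores for invalid characters, a leading underscore before a digit) are assumptions made for illustration; the actual rules used by `pythonize_string` may differ.

```python
import re


def pythonize_sketch(s):
    """Hypothetical helper: map an arbitrary string to a valid Python identifier."""
    # Replace every character outside [A-Za-z0-9_] with an underscore.
    out = re.sub(r"\W", "_", s, flags=re.ASCII)
    # Identifiers may not be empty or start with a digit.
    if not out or out[0].isdigit():
        out = "_" + out
    return out


example = pythonize_sketch("log(rt) 2nd-pass")
```

This sketch does not guard against Python keywords such as `class`; a fuller version might append an underscore in that case.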

cdr.formula.
standardize_formula_string
(s)[source]¶ Standardize a formula string, removing notational variation. IRF specifications
C(...)
are sorted alphabetically by the IRF call name, e.g. Gamma()
. The order of impulses within an IRF specification is preserved.
 Parameters
s –
str
; the formula string to be standardized.
 Returns
str
; standardization of s
cdr.io module¶

cdr.io.
read_tabular_data
(X_paths, Y_paths, series_ids, categorical_columns=None, sep=' ', verbose=True)[source]¶ Read impulse and response data into pandas dataframes and perform basic preprocessing.
 Parameters
X_paths –
str
orlist
ofstr
; path(s) to impulse (predictor) data (multiple tables are concatenated). Each path may also be a ;-delimited list of paths to files containing predictors with different timestamps, where the predictors in each file all share the same set of timestamps.
Y_paths –
str
orlist
ofstr
; path(s) to response data (multiple tables are concatenated). Each path may also be a ;-delimited list of paths to files containing different response variables with different timestamps, where the response variables in each file all share the same set of timestamps.
series_ids –
list
ofstr
; column names whose jointly unique values define unique time series.categorical_columns –
list
ofstr
; column names that should be treated as categorical.sep –
str
; string representation of field delimiter in input data.verbose –
bool
; whether to log progress to stderr.
 Returns
2-tuple of list(
pandas
DataFrame); (impulse data, response data). X and Y each have one element for each dataset in X_paths/Y_paths, each containing the column-wise concatenation of all column files in the path.
cdr.kwargs module¶

class
cdr.kwargs.
Kwarg
(key, default_value, dtypes, descr, aliases=None, default_value_cdrnn='same', suppress=False)[source]¶ Bases:
object
Data structure for storing keyword arguments and their docstrings.
 Parameters
key –
str
; Key
default_value – Any; Default value
dtypes –
list
orclass
; List of classes or single class. Members can also be specific required values, eitherNone
or values of typestr
.descr –
str
; Description of kwarg
default_value_cdrnn – Any; Default value for CDRNN if distinct from CDR. If
'same'
, CDRNN uses default_value.suppress –
bool
; Whether to print documentation for this kwarg. Useful for hiding deprecated or little-used kwargs in order to simplify autodoc output.

dtypes_str
()[source]¶ String representation of dtypes permitted for kwarg.
 Returns
str
; dtypes string.

get_type_name
(x)[source]¶ String representation of name of a dtype
 Parameters
x – dtype; the dtype to name.
 Returns
str
; name of dtype.

in_settings
(settings)[source]¶ Check whether kwarg is specified in a settings object parsed from a config file.
 Parameters
settings – settings from a
ConfigParser
object.
 Returns
bool
; whether kwarg is found in settings.

kwarg_from_config
(settings, is_cdrnn=False)[source]¶ Given a settings object parsed from a config file, return value of kwarg cast to appropriate dtype. If missing from settings, return default.
 Parameters
settings – settings from a
ConfigParser
object.is_cdrnn –
bool
; whether this is for a CDRNN model.
 Returns
value of kwarg
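
The lookup-with-default behavior described above can be sketched with the standard-library `configparser` module. The helper `value_from_settings`, the section name `cdr_settings`, and the keys `n_iter` and `minibatch_size` are hypothetical names used only for this example; they are not taken from the `Kwarg` implementation.

```python
import configparser


def value_from_settings(settings, key, default, cast):
    """Hypothetical helper: read `key` from a config section, else return `default`."""
    if key in settings:
        # Config values are stored as strings, so cast to the target dtype.
        return cast(settings[key])
    return default


parser = configparser.ConfigParser()
parser.read_string("[cdr_settings]\nn_iter = 500\n")
settings = parser["cdr_settings"]

n_iter = value_from_settings(settings, "n_iter", 1000, int)          # found in config
batch = value_from_settings(settings, "minibatch_size", 1024, int)  # missing, default used
```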

cdr.kwargs.
cdr_kwarg_docstring
()[source]¶ Generate docstring snippet summarizing all CDR kwargs, dtypes, and defaults.
 Returns
str
; docstring snippet
cdr.opt module¶
cdr.plot module¶

class
cdr.plot.
MidpointNormalize
(vcenter=0.0, vmin=None, vmax=None, clip=False)[source]¶ Bases:
matplotlib.colors.Normalize

cdr.plot.
plot_heatmap
(m, row_names, col_names, dir='.', filename='eigenvectors.png', plot_x_inches=7, plot_y_inches=5, cmap='Blues')[source]¶ Plot a heatmap. Used in CDR for visualizing eigenvector matrices in principal components models.
 Parameters
m – 2D
numpy
array; source data for plot.row_names –
list
ofstr
; row names.col_names –
list
ofstr
; column names.dir –
str
; output directory.filename –
str
; filename.plot_x_inches –
float
; width of plot in inches.plot_y_inches –
float
; height of plot in inches.cmap –
str
; name ofmatplotlib
cmap
object (determines colors of plotted IRF).
 Returns
None

cdr.plot.
plot_irf
(plot_x, plot_y, irf_names, lq=None, uq=None, density=None, sort_names=True, prop_cycle_length=None, prop_cycle_map=None, dir='.', filename='irf_plot.png', irf_name_map=None, plot_x_inches=6, plot_y_inches=4, ylim=None, cmap='gist_rainbow', legend=True, xlab=None, ylab=None, use_line_markers=False, transparent_background=False, dpi=300, dump_source=False)[source]¶ Plot impulse response functions.
 Parameters
plot_x –
numpy
array with shape (T,1); time points for which to plot the response. For example, if the plots contain 1000 points from 0s to 10s, plot_x could be generated as np.linspace(0, 10, 1000)
.plot_y –
numpy
array with shape (T, N); response of each IRF at each time point.irf_names –
list
ofstr
; CDR IDs of IRFs in the same order as they appear in axis 1 of plot_y.
lq –
numpy
array with shape (T, N), orNone
; lower bound of credible interval for each time point. IfNone
, no credible interval will be plotted.uq –
numpy
array with shape (T, N), orNone
; upper bound of credible interval for each time point. IfNone
, no credible interval will be plotted.sort_names –
bool
; alphabetically sort IRF names.prop_cycle_length –
int
orNone
; Length of plotting properties cycle (defines step size in the color map). IfNone
, inferred from irf_names.prop_cycle_map –
list
ofint
, orNone
; Integer indices to use in the properties cycle for each entry in irf_names. IfNone
, indices are automatically assigned.dir –
str
; output directory.filename –
str
; filename.irf_name_map –
dict
ofstr
tostr
; map from CDR IRF IDs to more readable names to appear in legend. Any plotted IRF whose ID is not found in irf_name_map will be represented with the CDR IRF ID.
plot_x_inches –
float
; width of plot in inches.plot_y_inches –
float
; height of plot in inches.ylim – 2element
tuple
orlist
; (lower_bound, upper_bound) to use for y axis. IfNone
, automatically inferred.cmap –
str
; name ofmatplotlib
cmap
object (determines colors of plotted IRF).legend –
bool
; include a legend.
xlab –
str
or None
; x-axis label. If None, no label.
ylab –
str
or None
; y-axis label. If None, no label.
use_line_markers –
bool
; add markers to IRF lines.transparent_background –
bool
; use a transparent background. IfFalse
, uses a white background.dpi –
int
; dots per inch.dump_source –
bool
; Whether to dump the plot source array to a csv file.
 Returns
None

cdr.plot.
plot_qq
(theoretical, actual, actual_color='royalblue', expected_color='firebrick', dir='.', filename='qq_plot.png', plot_x_inches=6, plot_y_inches=4, legend=True, xlab='Theoretical', ylab='Empirical', ticks=True, as_lines=False, transparent_background=False, dpi=300)[source]¶ Generate quantile-quantile plot.
 Parameters
theoretical –
numpy
array with shape (T,); theoretical error quantiles.actual –
numpy
array with shape (T,); empirical errors.actual_color –
str
; color for actual values.expected_color –
str
; color for expected values.dir –
str
; output directory.filename –
str
; filename.plot_x_inches –
float
; width of plot in inches.plot_y_inches –
float
; height of plot in inches.legend –
bool
; include a legend.
xlab –
str
or None
; x-axis label. If None, no label.
ylab –
str
or None
; y-axis label. If None, no label.
as_lines –
bool
; render QQ plot using lines. Otherwise, use points.transparent_background –
bool
; use a transparent background. IfFalse
, uses a white background.dpi –
int
; dots per inch.
 Returns
None

cdr.plot.
plot_surface
(x, y, z, lq=None, uq=None, density=None, bounds_as_surface=False, dir='.', filename='surface.png', irf_name_map=None, plot_x_inches=6, plot_y_inches=4, xlim=None, ylim=None, zlim=None, plot_type='wireframe', cmap='coolwarm', xlab=None, ylab=None, zlab='Response', title=None, transparent_background=False, dpi=300, dump_source=False)[source]¶ Plot an IRF or interaction surface.
 Parameters
x –
numpy
array with shape (M,N); x locations for each plot point, copied N times.y –
numpy
array with shape (M,N); y locations for each plot point, copied M times.z –
numpy
array with shape (M,N); z locations for each plot point.lq –
numpy
array with shape (M,N), orNone
; lower bound of credible interval for each plot point. IfNone
, no credible interval will be plotted.uq –
numpy
array with shape (M,N), orNone
; upper bound of credible interval for each plot point. IfNone
, no credible interval will be plotted.bounds_as_surface –
bool
; whether to plot interval bounds using additional surfaces. IfFalse
, bounds are plotted with vertical error bars instead. Ignored if lq, uq areNone
.dir –
str
; output directory.filename –
str
; filename.irf_name_map –
dict
ofstr
tostr
; map from CDR IRF IDs to more readable names to appear in legend. Any plotted IRF whose ID is not found in irf_name_map will be represented with the CDR IRF ID.
plot_x_inches –
float
; width of plot in inches.plot_y_inches –
float
; height of plot in inches.xlim – 2element
tuple
orlist
orNone
; (lower_bound, upper_bound) to use for x axis. IfNone
, automatically inferred.ylim – 2element
tuple
orlist
orNone
; (lower_bound, upper_bound) to use for y axis. IfNone
, automatically inferred.zlim – 2element
tuple
orlist
orNone
; (lower_bound, upper_bound) to use for z axis. IfNone
, automatically inferred.plot_type –
str
; name of plot type to generate. One of["contour", "surf", "trisurf"]
.cmap –
str
; name ofmatplotlib
cmap
object (determines colors of plotted IRF).legend –
bool
; include a legend.
xlab –
str
or None
; x-axis label. If None, no label.
ylab –
str
or None
; y-axis label. If None, no label.
zlab –
str
or None
; z-axis label. If None, no label.
use_line_markers –
bool
; add markers to IRF lines.transparent_background –
bool
; use a transparent background. IfFalse
, uses a white background.dpi –
int
; dots per inch.dump_source –
bool
; Whether to dump the plot source array to a csv file.
 Returns
None
cdr.signif module¶

cdr.signif.
correlation_test
(y, x1, x2, nested=False, verbose=True)[source]¶ Perform a parametric test of difference in correlation with observations between two prediction vectors, based on Steiger (1980).
 Parameters
y –
numpy
vector; observation vector.x1 –
numpy
vector; first prediction vector.x2 –
numpy
vector; second prediction vector.nested –
bool
; assume that the second model is nested within the first.verbose –
bool
; report progress logs to standard error.
 Returns

cdr.signif.
permutation_test
(a, b, n_iter=10000, n_tails=2, mode='loss', nested=False, verbose=True)[source]¶ Perform a paired permutation test for significance.
 Parameters
a –
numpy
vector; first error/loss/prediction vector.b –
numpy
vector; second error/loss/prediction vector.n_iter –
int
; number of resampling iterations.n_tails –
int
; number of tails.mode –
str
; one of ["mse", "loglik"]
, the type of error used (squared errors are averaged while log-likelihoods are summed).
nested –
bool
; assume that the second model is nested within the first.verbose –
bool
; report progress logs to standard error.
 Returns
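
For readers unfamiliar with paired permutation testing, the following is a minimal pure-Python sketch of the general technique: labels are randomly swapped within each pair and the resampled difference in means is compared with the observed one. The function name, the `seed` argument, the add-one smoothing, and the one-tailed direction are assumptions for this illustration and are not taken from `permutation_test`, whose statistic and return values may differ.

```python
import random


def paired_permutation_sketch(a, b, n_iter=10000, n_tails=2, seed=0):
    """Hypothetical sketch of a paired permutation test on the difference in means."""
    rng = random.Random(seed)
    n = len(a)
    observed = sum(a) / n - sum(b) / n
    hits = 0
    for _ in range(n_iter):
        diff = 0.0
        for a_i, b_i in zip(a, b):
            # Under the null hypothesis the two labels in a pair are exchangeable.
            if rng.random() < 0.5:
                a_i, b_i = b_i, a_i
            diff += a_i - b_i
        diff /= n
        if n_tails == 2:
            hits += abs(diff) >= abs(observed)
        else:
            hits += diff >= observed
    # Add-one smoothing keeps the estimated p-value strictly positive.
    p_value = (hits + 1) / (n_iter + 1)
    return p_value, observed


a = [2.0 + 0.1 * i for i in range(20)]
b = [1.0 + 0.1 * i for i in range(20)]
p_diff, obs_diff = paired_permutation_sketch(a, b, n_iter=2000)
p_same, _ = paired_permutation_sketch(a, a, n_iter=2000)
```

When the two vectors are identical every resampled difference equals the observed one, so the estimated p-value is 1; when every pair differs in the same direction, almost no resample is as extreme as the observation.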
cdr.synth module¶

class
cdr.synth.
SyntheticModel
(n_pred, irf_name, irf_params=None, coefs=None)[source]¶ Bases:
object
A data structure representing a synthetic “true” model for empirical validation of CDR fits. Contains a randomly generated set of IRFs that can be used to convolve data, and provides methods for sampling data with particular structure and convolving it with the true IRFs in order to generate a response vector.
 Parameters
n_pred –
int
; Number of predictors in the synthetic model.irf_name –
str
; Name of IRF kernel to use. One of['Exp', 'Normal', 'Gamma', 'ShiftedGamma']
.irf_params –
dict
orNone
; Dictionary of IRF parameters to use, with parameter names as keys and numeric arrays as values. Values must each have n_pred cells. IfNone
, parameter values will be randomly sampled.coefs – numpy array or
None
; Vector of coefficients to use, where len(coefs) == n_pred
. IfNone
, coefficients will be randomly sampled.

convolve
(X, t_X, t_y, history_length=None, err_sd=None, allow_instantaneous=True, verbose=True)[source]¶ Convolve data using the model’s IRFs.
 Parameters
X – numpy array; 2D array of predictors.
t_X – numpy array; 1D vector of predictor timestamps.
t_y – numpy array; 1D vector of response timestamps.
history_length –
int
orNone
; Drop preceding events more than history_length steps into the past. If None, no history clipping.
err_sd –
float
orNone
; Standard deviation of Gaussian noise to inject into responses. IfNone
, use the empirical standard deviation of the response vector.allow_instantaneous –
bool
; Whether to compute responses when t==0.
verbose –
bool
; Verbosity.
 Returns
(2D numpy array, 1D numpy array); Matrix of convolved predictors, vector of responses
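
The convolution described above can be sketched in plain Python. This is an illustration under stated assumptions, not the `SyntheticModel.convolve` implementation: it assumes an exponential kernel of the form rate * exp(-rate * t), omits noise injection and history clipping, and uses the hypothetical names `convolve_sketch`, `rates`, and `coefs`.

```python
import math


def convolve_sketch(X, t_X, t_y, rates, coefs):
    """Hypothetical sketch: convolve predictors with exponential IRFs."""
    convolved = []
    for t_out in t_y:
        row = [0.0] * len(rates)
        for x_row, t_in in zip(X, t_X):
            delta = t_out - t_in
            if delta < 0:
                continue  # causal kernel: future events do not contribute
            for k, rate in enumerate(rates):
                row[k] += x_row[k] * rate * math.exp(-rate * delta)
        convolved.append(row)
    # Noise-free response: coefficient-weighted sum of the convolved predictors.
    y = [sum(c * v for c, v in zip(coefs, row)) for row in convolved]
    return convolved, y


X = [[1.0], [1.0]]   # two events, one predictor
t_X = [0.0, 1.0]     # event timestamps
X_conv, y = convolve_sketch(X, t_X, t_y=[0.5, 1.0], rates=[1.0], coefs=[2.0])
```

At the first query time (0.5) only the event at time 0 contributes; at the second (1.0) both events contribute, the later one with zero delay.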

convolve_v2
(X, t_X, t_y, err_sd=None, allow_instantaneous=True, verbose=True)[source]¶ Convolve data using the model’s IRFs. Alternate memory-intensive implementation that is faster for small arrays but can exhaust resources for large ones.
 Parameters
X – numpy array; 2D array of predictors.
t_X – numpy array; 1D vector of predictor timestamps.
t_y – numpy array; 1D vector of response timestamps.
err_sd –
float
; Standard deviation of Gaussian noise to inject into responses.allow_instantaneous –
bool
; Whether to compute responses when t==0.
verbose –
bool
; Verbosity.
 Returns
(2D numpy array, 1D numpy array); Matrix of convolved predictors, vector of responses

get_curves
(n_time_units=None, n_time_points=None)[source]¶ Extract response curves as an array.
 Parameters
n_time_units –
float
; Number of units of time over which to extract curves.n_time_points –
int
; Number of samples to extract for each curve (resolution of curve)
 Returns
numpy array; 2D numpy array with shape [T, K], where T is n_time_points and K is the number of predictors in the model.

irf
(x, coefs=False)[source]¶ Computes the values of the model’s IRFs elementwise over a vector of timepoints.
 Parameters
x – numpy array; 1D array with shape
[N]
containing timepoints at which to query the IRFs.coefs –
bool
; Whether to rescale responses by coefficients
 Returns
numpy array; 2D array with shape [N, K] containing values of the model’s K IRFs evaluated at the timepoints in x.

plot_irf
(n_time_units=None, n_time_points=None, dir='.', filename='synth_irf.png', plot_x_inches=6, plot_y_inches=4, cmap='gist_rainbow', legend=False, xlab=None, ylab=None, use_line_markers=False, transparent_background=False)[source]¶ Plot impulse response functions.
 Parameters
n_time_units –
float
; number of time units to use for plotting.
n_time_points –
int
; number of points to use for plotting.dir –
str
; output directory.filename –
str
; filename.plot_x_inches –
float
; width of plot in inches.plot_y_inches –
float
; height of plot in inches.cmap –
str
; name ofmatplotlib
cmap
object (determines colors of plotted IRF).legend –
bool
; include a legend.
xlab –
str
or None
; x-axis label. If None, no label.
ylab –
str
or None
; y-axis label. If None, no label.
use_line_markers –
bool
; add markers to IRF lines.transparent_background –
bool
; use a transparent background. IfFalse
, uses a white background.
 Returns
None

sample_data
(m, n=None, X_interval=None, y_interval=None, rho=None, align_X_y=True)[source]¶ Samples synthetic predictors and time vectors
 Parameters
m –
int
; Number of predictors.n –
int
; Number of response query points.X_interval –
str
,float
,list
,tuple
, orNone
; Predictor interval model. If None, predictor offsets are randomly sampled from an exponential distribution with parameter 1. If float, predictor offsets are evenly spaced with interval X_interval. If list or tuple, the first element is the name of a scipy distribution to use for sampling offsets, and all remaining elements are positional arguments to that distribution.
y_interval –
str
,float
,list
,tuple
, orNone
; Response interval model. If None, response offsets are randomly sampled from an exponential distribution with parameter 1. If float, response offsets are evenly spaced with interval y_interval. If list or tuple, the first element is the name of a scipy distribution to use for sampling offsets, and all remaining elements are positional arguments to that distribution.
rho –
float
; Level of pairwise correlation between predictors.align_X_y –
bool
; Whether predictors and responses are required to be sampled at the same points in time.
 Returns
(2D numpy array, 1D numpy array, 1D numpy array); Matrix of predictors, vector of predictor timestamps, vector of response timestamps
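
The interval models for `X_interval` and `y_interval` can be pictured as rules for generating the gaps between successive timestamps. The sketch below covers only the `None` (exponential offsets) and `float` (even spacing) cases; the helper name `sample_timestamps`, the `seed` argument, and the choice to place the first timestamp one offset after zero are assumptions for illustration, not the behavior of `sample_data`.

```python
import random


def sample_timestamps(n, interval=None, seed=0):
    """Hypothetical helper: build n non-decreasing timestamps from inter-event offsets."""
    rng = random.Random(seed)
    t, out = 0.0, []
    for _ in range(n):
        if interval is None:
            step = rng.expovariate(1.0)  # exponential offsets with rate 1
        else:
            step = float(interval)       # evenly spaced offsets
        t += step
        out.append(t)
    return out


t_random = sample_timestamps(5)              # irregular spacing
t_even = sample_timestamps(4, interval=0.5)  # regular spacing
```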
cdr.util module¶

cdr.util.
filter_models
(names, filters, cdr_only=False)[source]¶ Return models contained in names that are permitted by filters, preserving order in which filters were matched. Filters can be ordinary strings, regular expression objects, or string representations of regular expressions. For a regex filter to be considered a match, the expression must entirely match the name. If
filters
is zero-length, returns names.
 Parameters
names –
list
ofstr
; pool of model names to filter.filters –
list
of{str, SRE_Pattern}
; filters to apply in ordercdr_only –
bool
; ifTrue
, only returns CDR models. IfFalse
, returns all models admitted by filters.
 Returns
list
ofstr
; names in names that pass at least one filter, or all of names if no filters are applied.

cdr.util.
filter_names
(names, filters)[source]¶ Return elements of names permitted by filters, preserving order in which filters were matched. Filters can be ordinary strings, regular expression objects, or string representations of regular expressions. For a regex filter to be considered a match, the expression must entirely match the name.
 Parameters
names –
list
ofstr
; pool of names to filter.filters –
list
of{str, SRE_Pattern}
; filters to apply in order
 Returns
list
ofstr
; names in names that pass at least one filter
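
The matching rule described above (a filter must match the entire name, and results follow the order in which filters matched) can be sketched with `re.fullmatch`. The function `filter_names_sketch` and the example model names are hypothetical; in particular, this sketch treats every plain string as a regular expression and drops repeated matches, which are assumptions rather than documented behavior of `filter_names`.

```python
import re


def filter_names_sketch(names, filters):
    """Hypothetical sketch: keep names fully matched by any filter, in filter order."""
    out = []
    for f in filters:
        # Assumption: plain strings are compiled as regular expressions.
        pattern = f if isinstance(f, re.Pattern) else re.compile(f)
        for name in names:
            # fullmatch requires the expression to cover the entire name.
            if pattern.fullmatch(name) and name not in out:
                out.append(name)
    return out


names = ["CDR_main", "CDR_ablated", "LME", "GAM"]
kept = filter_names_sketch(names, ["GAM", re.compile("CDR_.*"), "CDR"])
```

Here the filter `"CDR"` admits nothing on its own, because it matches only a prefix of `CDR_main` and `CDR_ablated` rather than the whole name.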

cdr.util.
get_random_permutation
(n)[source]¶ Draw a random permutation of integers 0 to n - 1. Used to shuffle arrays of length n. For example, a permutation and its inverse can be generated by calling
p, p_inv = get_random_permutation(n)
. To randomly shuffle an n-dimensional vector x, call x[p]. To unshuffle x after it has already been shuffled, call x[p_inv].
 Parameters
n – maximum value
 Returns
2tuple of
numpy
arrays; the permutation and its inverse
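
The relationship between a permutation and its inverse can be checked with a small standard-library sketch. It uses Python lists instead of `numpy` arrays, and the name `random_permutation_sketch` and its `seed` argument are hypothetical; it illustrates the idea rather than reproducing `get_random_permutation`.

```python
import random


def random_permutation_sketch(n, seed=None):
    """Hypothetical sketch: a permutation of 0..n-1 and its inverse, as lists."""
    rng = random.Random(seed)
    p = list(range(n))
    rng.shuffle(p)
    p_inv = [0] * n
    for position, source in enumerate(p):
        p_inv[source] = position  # ensures p[p_inv[k]] == k for every k
    return p, p_inv


x = ["a", "b", "c", "d", "e"]
p, p_inv = random_permutation_sketch(len(x), seed=0)
shuffled = [x[i] for i in p]             # list analogue of x[p]
restored = [shuffled[i] for i in p_inv]  # list analogue of shuffled[p_inv]
```

Indexing the shuffled sequence with the inverse permutation recovers the original order, which is the property the docstring relies on.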

cdr.util.
load_cdr
(dir_path)[source]¶ Convenience method for reconstructing a saved CDR object. First loads in metadata from
m.obj
, then uses that metadata to construct the computation graph. Then, if saved weights are found, these are loaded into the graph. Parameters
dir_path – Path to directory containing the CDR checkpoint files.
 Returns
The loaded CDR instance.

cdr.util.
mae
(true, preds)[source]¶ Compute mean absolute error (MAE).
 Parameters
true – True values
preds – Predicted values
 Returns
float
; MAE

cdr.util.
mse
(true, preds)[source]¶ Compute mean squared error (MSE).
 Parameters
true – True values
preds – Predicted values
 Returns
float
; MSE
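
For reference, the two error measures above follow the standard definitions, sketched here in plain Python on lists. The names `mae_sketch` and `mse_sketch` are hypothetical; the library functions presumably operate on `numpy` arrays.

```python
def mae_sketch(true, preds):
    """Mean absolute error: average of |true - pred|."""
    return sum(abs(t - p) for t, p in zip(true, preds)) / len(true)


def mse_sketch(true, preds):
    """Mean squared error: average of (true - pred) ** 2."""
    return sum((t - p) ** 2 for t, p in zip(true, preds)) / len(true)


true = [1.0, 2.0, 4.0]
preds = [1.0, 3.0, 2.0]
mae_value = mae_sketch(true, preds)  # (0 + 1 + 2) / 3
mse_value = mse_sketch(true, preds)  # (0 + 1 + 4) / 3
```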

cdr.util.
names2ix
(names, l, dtype=<class 'numpy.int32'>)[source]¶ Generate 1D numpy array of indices in l corresponding to names in names
 Parameters
names –
list
ofstr
; names to look up in l
l –
list
ofstr
; list of names from which to extract indices
dtype –
numpy
dtype object; return dtype
 Returns
numpy
array; indices of names in l
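
The lookup performed by `names2ix` amounts to finding the position of each name in a reference list. A minimal sketch, returning a plain list rather than a `numpy` array and using the hypothetical name `names2ix_sketch`:

```python
def names2ix_sketch(names, l):
    """Hypothetical sketch: position in `l` of each entry of `names`."""
    # list.index returns the first matching position and raises ValueError if absent.
    return [l.index(name) for name in names]


ix = names2ix_sketch(["c", "a"], ["a", "b", "c"])
```

The output preserves the order of `names`, not the order of `l`.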

cdr.util.
nested
(model_name_1, model_name_2)[source]¶ Check whether two CDR models are nested with 1 degree of freedom
 Parameters
model_name_1 –
str
; name of first model
model_name_2 –
str
; name of second model
 Returns
bool
;True
if models are nested with 1 degree of freedom,False
otherwise

cdr.util.
pca
(X, n_dim=None, dtype=<class 'numpy.float32'>)[source]¶ Perform principal components analysis on a data table.
 Parameters
X –
numpy
or pandas
array; the input data
n_dim –
int
orNone
; maximum number of principal components. IfNone
, all components are retained.dtype –
numpy
dtype; return dtype
 Returns
5-tuple of
numpy
arrays; transformed data, eigenvectors, eigenvalues, input means, and input standard deviations

cdr.util.
percent_variance_explained
(true, preds)[source]¶ Compute percent variance explained.
 Parameters
true – True values
preds – Predicted values
 Returns
float
; percent variance explained
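
One common definition of percent variance explained is 100 * (1 - MSE / Var(true)). The sketch below implements that definition in plain Python; whether `percent_variance_explained` uses exactly this formula (for example, whether it clips negative values) is an assumption not confirmed by this page, and `pve_sketch` is a hypothetical name.

```python
def pve_sketch(true, preds):
    """Hypothetical sketch: 100 * (1 - MSE / population variance of the true values)."""
    n = len(true)
    mean = sum(true) / n
    variance = sum((t - mean) ** 2 for t in true) / n
    mse = sum((t - p) ** 2 for t, p in zip(true, preds)) / n
    return 100.0 * (1.0 - mse / variance)


perfect = pve_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # predictions equal the truth
baseline = pve_sketch([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # predicting the mean everywhere
```

Under this definition, perfect predictions explain all of the variance and always predicting the mean explains none of it.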