CDR Package API
Complete API for all public classes and methods in this package.
cdr.backend module
cdr.config module
- class cdr.config.Config(path)[source]
Bases: object
Parses an *.ini file and stores settings needed to define a set of CDR experiments.
- Parameters:
path – Path to *.ini file
- build_cdr_settings(settings, add_defaults=True, global_settings=None, is_cdr=True, is_cdrnn=False)[source]
Given a settings object parsed from a config file, compute CDR parameter dictionary.
- Parameters:
settings – settings from a ConfigParser object.
add_defaults – bool; whether to add default settings not explicitly specified in the config.
global_settings – dict or None; dictionary of global defaults for parameters missing from settings.
is_cdr – bool; whether this is a CDR(NN) model.
is_cdrnn – bool; whether this is a CDRNN model.
- Returns:
dict; dictionary of settings key-value pairs.
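As a rough illustration of what filling in defaults for parsed settings can look like, the sketch below reads an *.ini string with the standard-library configparser and merges in missing keys. The section name, the keys, the DEFAULTS dictionary, and the helper merge_settings are all hypothetical; this is not the package's build_cdr_settings implementation, only one plausible shape for it.

```python
import configparser

# Hypothetical config text; the section and key names are illustrative only.
INI_TEXT = """
[model_example]
history_length = 64
"""

# Hypothetical defaults, standing in for the package's own default settings.
DEFAULTS = {"history_length": "128", "future_length": "0"}


def merge_settings(settings, add_defaults=True, global_settings=None):
    """Return a dict of settings, optionally filling in missing keys."""
    out = dict(settings)
    # Global settings only fill keys that the section did not specify.
    if global_settings is not None:
        for key, value in global_settings.items():
            out.setdefault(key, value)
    # Defaults have the lowest precedence.
    if add_defaults:
        for key, value in DEFAULTS.items():
            out.setdefault(key, value)
    return out


parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)
merged = merge_settings(parser["model_example"])
print(merged)
```

In this sketch an explicitly configured value wins over a global setting, which in turn wins over a default; whether the package uses the same precedence is an assumption.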
cdr.data module
- cdr.data.add_responses(names, y)[source]
Add response variable(s) to a dataframe, applying any preprocessing required by the formula string.
- Parameters:
names – str or list of str; name(s) of dependent variable(s).
y – pandas DataFrame; response data.
- Returns:
pandas DataFrame; response data with any missing ops applied.
- cdr.data.build_CDR_impulse_data(X, first_obs, last_obs, X_in_Y_names=None, X_in_Y=None, impulse_names=None, history_length=128, future_length=0, int_type='int32', float_type='float32')[source]
Construct impulse data arrays in the required format for CDR fitting/evaluation for a single response array.
- Parameters:
X – list of pandas tables; impulse (predictor) data.
first_obs – list of index vectors (list, pandas series, or numpy vector) of first observations; the list contains vectors of row indices, one for each element of X, of the first impulse in the time series associated with the response. If None, inferred from Y.
last_obs – list of index vectors (list, pandas series, or numpy vector) of last observations; the list contains vectors of row indices, one for each element of X, of the last impulse in the time series associated with the response. If None, inferred from Y.
X_in_Y_names – list of str; names of predictors contained in Y rather than X. If None, no such predictors.
X_in_Y – pandas DataFrame or None; table of predictors contained in Y rather than X. If None, no such predictors.
impulse_names – list of str; names of columns in X to be used as impulses by the model. If None, all columns returned.
history_length – int; maximum number of history (backward) observations.
future_length – int; maximum number of future (forward) observations.
int_type – str; name of int type.
float_type – str; name of float type.
- Returns:
triple of numpy arrays; let N, T, I, R respectively be the number of rows in Y, history length, number of impulse dimensions, and number of response dimensions. Outputs are (1) impulses with shape (N, T, I), (2) impulse timestamps with shape (N, T, I), and (3) impulse mask with shape (N, T, I).
- cdr.data.build_CDR_response_data(responses, Y=None, first_obs=None, last_obs=None, Y_time=None, Y_gf=None, X_in_Y_names=None, X_in_Y=None, Y_category_map=None, response_to_df_ix=None, gf_names=None, gf_map=None)[source]
Construct response data arrays in the required format for CDR fitting/evaluation for one or more response arrays.
- Parameters:
responses – list of str; names of columns in Y to be used as responses (dependent variables) by the model.
Y – list of pandas tables, or None; response data. If None, does not return a response array.
first_obs – list of list of index vectors (list, pandas series, or numpy vector) of first observations, or None; the list contains one element for each response array. Inner lists contain vectors of row indices, one for each element of X, of the first impulse in the time series associated with each response. If None, inferred from Y.
last_obs – list of list of index vectors (list, pandas series, or numpy vector) of last observations, or None; the list contains one element for each response array. Inner lists contain vectors of row indices, one for each element of X, of the last impulse in the time series associated with each response. If None, inferred from Y.
Y_time – list of response timestamp vectors (list, pandas series, or numpy vector), or None; vector(s) of response timestamps, one for each response array. Needed to timestamp any response-aligned predictors (ignored if none in model).
Y_gf – list of pandas DataFrame, or None; data frames containing random grouping factor levels, one for each response array, if applicable.
X_in_Y_names – list of str; names of predictors contained in Y rather than X (must be present in all elements of Y). If None, no such predictors.
X_in_Y – list of pandas DataFrame or None; tables (one per response array) of predictors contained in Y rather than X (must be present in all elements of Y). If None, no such predictors.
Y_category_map – dict or None; map from category labels to integers for each categorical response.
response_to_df_ix – dict or None; map from response names to lists of indices of the response files that contain them.
gf_names – list or None; list of names of random grouping factor variables. If None and Y_gf provided, will use all columns of Y_gf.
gf_map – list of dict or None; list of maps from random grouping factor levels to their indices, one map per grouping factor variable in gf_names.
- Returns:
7-tuple of numpy arrays; let N, R, XF, YF, Z, and K respectively be the number of rows (sum total number of rows in Y), number of response dimensions, number of distinct predictor files (X), number of distinct response files (Y), number of random grouping factor variables, and number of response-aligned predictors. Outputs are (1) responses with shape (N, R) or None if Y is None, (2) an XF-tuple of first observation vectors indexing start indices for each entry in X, (3) a YF-tuple of last observation vectors indexing end indices for each entry in X, (4) response timestamps with shape (N,), (5) response masks (masking out any missing response variables per row) with shape (N, R), (6) random grouping factor matrix with shape (N, Z), or None if no random grouping factors provided, and (7) response-aligned predictors with shape (N, K).
- cdr.data.c(df)[source]
Zero-center pandas series or data frame
- Parameters:
df – pandas Series or DataFrame; input data
- Returns:
pandas Series or DataFrame; centered data
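Zero-centering amounts to subtracting the mean from every value. The sketch below shows the arithmetic on a plain Python list standing in for a pandas Series; the helper name center is hypothetical and is not the package's c() implementation.

```python
from statistics import mean


def center(values):
    """Subtract the mean so that the result averages to zero."""
    mu = mean(values)
    return [v - mu for v in values]


centered = center([1.0, 2.0, 6.0])
print(centered)  # [-2.0, -1.0, 3.0]
```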
- cdr.data.compare_elementwise_perf(a, b, y=None, mode='err')[source]
Compare model performance elementwise.
- Parameters:
a – numpy vector; vector of elementwise scores (or predictions if mode is corr) for model a.
b – numpy vector; vector of elementwise scores (or predictions if mode is corr) for model b.
y – numpy vector or None; vector of observations. Used only if mode is corr.
mode – str; type of performance metric. One of err, loglik, or corr.
- Returns:
numpy vector; vector of elementwise performance differences
- cdr.data.compute_filter(y, field, cond)[source]
Compute filter given a field and condition
- Parameters:
y – pandas DataFrame; response data.
field – str; name of column on whose values to filter.
cond – str; string representation of condition to use for filtering.
- Returns:
numpy vector; boolean mask to use for pandas subsetting operations.
- cdr.data.compute_filters(Y, filters=None)[source]
Compute filters given a filter map.
- Parameters:
Y – pandas DataFrame; response data.
filters – list; list of key-value pairs mapping column names to filtering criteria for their values.
- Returns:
numpy vector; boolean mask to use for pandas subsetting operations.
- cdr.data.compute_partition(y, modulus, n)[source]
Given a splitID column, use modular arithmetic to partition data into n subparts.
- Parameters:
y – pandas DataFrame; response data.
modulus – int; modulus to use for splitting, must be at least as large as n.
n – int; number of subparts in the partition.
- Returns:
list of numpy vectors; one boolean vector per subpart of the partition, selecting only those elements of y that belong.
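The idea of partitioning by modular arithmetic can be sketched as follows. The helper partition_masks is hypothetical, plain lists stand in for numpy vectors, and the rule for assigning residues to subparts (the first subpart absorbs all low residues, each remaining subpart takes a single high residue) is an assumption rather than a documented guarantee.

```python
def partition_masks(split_ids, modulus, n):
    """Return n boolean masks that partition rows by split_id % modulus."""
    if modulus < n:
        raise ValueError("modulus must be at least as large as n")
    residues = [sid % modulus for sid in split_ids]
    # Assumed scheme: the first subpart takes residues 0..(modulus - n),
    # and each remaining subpart takes exactly one of the higher residues.
    masks = [[r <= modulus - n for r in residues]]
    for i in range(n - 1, 0, -1):
        masks.append([r == modulus - i for r in residues])
    return masks


masks = partition_masks(list(range(8)), modulus=4, n=3)
for mask in masks:
    print(mask)
```

With modulus=4 and n=3, the first subpart receives roughly half the rows and the other two a quarter each, so the ratio of modulus to n controls how large the first subpart is under this scheme.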
- cdr.data.compute_splitID(y, split_fields)[source]
Map tuples in columns designated by split_fields into integer ID to use for data partitioning.
- Parameters:
y – pandas DataFrame; response data.
split_fields – list of str; column names to use for computing split ID.
- Returns:
numpy vector; integer vector of split IDs.
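One simple way to turn tuples of column values into integer IDs is to number each distinct tuple in order of first appearance. The sketch below does this on a list of dicts standing in for a pandas DataFrame; the helper split_ids and the column names are hypothetical, and the package may compute its IDs differently.

```python
def split_ids(rows, split_fields):
    """Assign one integer ID per distinct tuple of split-field values."""
    ids_by_key = {}
    out = []
    for row in rows:
        key = tuple(row[f] for f in split_fields)
        # The first time a tuple is seen, it gets the next unused integer.
        out.append(ids_by_key.setdefault(key, len(ids_by_key)))
    return out


rows = [
    {"subject": "s1", "sentid": 0},
    {"subject": "s1", "sentid": 1},
    {"subject": "s2", "sentid": 0},
    {"subject": "s1", "sentid": 0},
]
ids = split_ids(rows, ["subject", "sentid"])
print(ids)  # [0, 1, 2, 0]
```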
- cdr.data.compute_time_mask(X_time, first_obs, last_obs, history_length=128, future_length=0, int_type='int32', float_type='float32')[source]
Compute mask for expanded impulse data zeroing out non-existent impulses.
- Parameters:
X_time – pandas Series; timestamps associated with each impulse in X.
first_obs – pandas Series; vector of row indices in X of the first impulse in the time series associated with each response.
last_obs – pandas Series; vector of row indices in X of the last preceding impulse in the time series associated with each response.
history_length – int; maximum number of history (backward) observations.
future_length – int; maximum number of future (forward) observations.
int_type – str; name of int type.
float_type – str; name of float type.
- Returns:
numpy array; boolean impulse mask.
- cdr.data.corr_cdr(X_2d, impulse_names, impulse_names_2d, time, time_mask)[source]
Compute correlation matrix, including correlations across time where necessitated by 2D predictors.
- Parameters:
X_2d – numpy array; the impulse data. Must be of shape (batch_len, history_length+future_length, n_impulses), can be computed from sources by build_CDR_impulse_data().
impulse_names – list of str; names of columns in X_2d to be used as impulses by the model.
impulse_names_2d – list of str; names of columns in X_2d that designate 2D predictors.
time – 3D numpy array; array of timestamps for each event in X_2d.
time_mask – 3D numpy array; array of masks over padding events in X_2d.
- Returns:
pandas DataFrame; the correlation matrix.
- cdr.data.expand_impulse_sequence(X, X_time, first_obs, last_obs, window_length, int_type='int32', float_type='float32', fill=0.0)[source]
Expand out impulse stream in X for each response in the target data.
- Parameters:
X – pandas DataFrame; impulse (predictor) data.
X_time – pandas Series; timestamps associated with each impulse in X.
first_obs – pandas Series; vector of row indices in X of the first impulse in the time series associated with each response.
last_obs – pandas Series; vector of row indices in X of the last preceding impulse in the time series associated with each response.
window_length – int; number of steps in time dimension of output.
int_type – str; name of int type.
float_type – str; name of float type.
fill – float; fill value for padding cells.
- Returns:
3-tuple of numpy arrays; the expanded impulse array, the expanded timestamp array, and a boolean mask zeroing out locations of non-existent impulses.
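The expansion step can be pictured as cutting one fixed-length window out of the impulse stream per response and padding short windows. The following sketch handles a single impulse column with plain lists; the helper expand_windows, the exclusive end index, and the right alignment of real values are assumptions of the sketch, not documented behavior.

```python
def expand_windows(x, x_time, first_obs, last_obs, window_length, fill=0.0):
    """Build right-aligned, padded windows of impulses for each response.

    first_obs[i] is treated as an inclusive start row and last_obs[i] as an
    exclusive end row (both conventions are assumptions of this sketch).
    """
    values, times, mask = [], [], []
    for start, end in zip(first_obs, last_obs):
        # Keep at most window_length of the most recent rows.
        start = max(start, end - window_length)
        n_real = end - start
        n_pad = window_length - n_real
        values.append([fill] * n_pad + list(x[start:end]))
        times.append([fill] * n_pad + list(x_time[start:end]))
        # True marks a real impulse, False marks a padding cell.
        mask.append([False] * n_pad + [True] * n_real)
    return values, times, mask


x = [10.0, 11.0, 12.0, 13.0]
x_time = [0.0, 1.0, 2.0, 3.0]
values, times, mask = expand_windows(
    x, x_time, first_obs=[0, 0], last_obs=[1, 4], window_length=3
)
print(values)  # [[0.0, 0.0, 10.0], [11.0, 12.0, 13.0]]
print(mask)    # [[False, False, True], [True, True, True]]
```

Each output has one row per response and window_length columns, which corresponds to the first two dimensions of the (N, T, I) shapes described for build_CDR_impulse_data.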
- cdr.data.filter_invalid_responses(Y, dv, crossval_factor=None, crossval_fold=None)[source]
Filter out rows with non-finite responses.
- Parameters:
Y – pandas table or list of pandas tables; response data.
dv – str or list of str; name(s) of column(s) containing the dependent variable(s).
crossval_factor – str or None; name of column containing the selection variable for cross validation. If None, no cross validation filtering.
crossval_fold – list or None; list of valid values for cross-validation selection. Used only if crossval_factor is not None.
- Returns:
2-tuple of pandas DataFrame and pandas Series; valid data and indicator vector used to filter out invalid data.
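Filtering on finiteness can be sketched with math.isfinite. Here a list of dicts stands in for a pandas table, the helper filter_non_finite is hypothetical, cross-validation filtering is omitted, and dropping a row when any named response is non-finite is an assumption about how multiple responses are combined.

```python
import math


def filter_non_finite(rows, dv):
    """Return (valid_rows, keep), where keep[i] flags rows with finite responses."""
    names = [dv] if isinstance(dv, str) else list(dv)
    # Assumption: a row is dropped if ANY of its named responses is non-finite.
    keep = [all(math.isfinite(row[name]) for name in names) for row in rows]
    valid = [row for row, ok in zip(rows, keep) if ok]
    return valid, keep


rows = [{"rt": 350.0}, {"rt": float("nan")}, {"rt": float("inf")}, {"rt": 410.0}]
valid, keep = filter_non_finite(rows, "rt")
print(keep)   # [True, False, False, True]
print(valid)  # [{'rt': 350.0}, {'rt': 410.0}]
```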
- cdr.data.get_first_last_obs_lists(y)[source]
Convenience utility to extract out all first_obs and last_obs columns in Y sorted by file index
- Parameters:
y – pandas DataFrame; response data.
- Returns:
pair of list of str; first_obs column names and last_obs column names
- cdr.data.get_rangf_array(Y, rangf_names, rangf_map)[source]
Collect random grouping factor indicators as numpy integer arrays that can be read by TensorFlow. Returns vertical concatenation of GF arrays from each element of Y.
- Parameters:
Y – pandas table or list of pandas tables; response data.
rangf_names – list of str; names of columns containing random grouping factor levels (order is preserved, changing the order will change the resulting array).
rangf_map – list of dict; map for each random grouping factor from levels to unique indices.
- Returns:
- cdr.data.get_time_windows(X, Y, series_ids, forward=False, window_length=128, t_delta_cutoff=None, verbose=True)[source]
Compute row indices in X of initial and final impulses for each element of Y. Assumes time series are already sorted by series_ids.
- Parameters:
X – pandas DataFrame; impulse (predictor) data.
Y – pandas DataFrame; response data.
series_ids – list of str; column names whose jointly unique values define unique time series.
forward – bool; whether to compute forward windows (future inputs) or backward windows (past inputs, used if forward is False).
window_length – int; maximum size of time window to consider. If np.inf, no bound on window size.
t_delta_cutoff – float or None; maximum distance in time to consider (can help improve training stability on data with large gaps in time). If 0 or None, no cutoff.
verbose – bool; whether to report progress to stderr.
- Returns:
2-tuple of numpy vectors; first and last impulse observations (respectively) for each response in Y
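For backward windows, the core computation is to find, for each response timestamp, the range of impulse rows that precede it. The sketch below does this for a single sorted time series using bisect; the helper backward_windows is hypothetical, and it ignores series_ids, forward windows, and t_delta_cutoff. Treating the last index as an exclusive end is also an assumption.

```python
from bisect import bisect_right


def backward_windows(x_time, y_time, window_length=128):
    """For each response time, return (first, last) row indices into x_time.

    Assumes a single time series with x_time sorted ascending. `last` is an
    exclusive end index covering impulses at or before the response time.
    """
    first_obs, last_obs = [], []
    for t in y_time:
        last = bisect_right(x_time, t)
        first_obs.append(max(0, last - window_length))
        last_obs.append(last)
    return first_obs, last_obs


first_obs, last_obs = backward_windows(
    [0.0, 1.0, 2.0, 3.0], [0.5, 3.0], window_length=2
)
print(first_obs, last_obs)  # [0, 2] [1, 4]
```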
- cdr.data.preprocess_data(X, Y, formula_list, series_ids, filters=None, history_length=128, future_length=0, t_delta_cutoff=None, all_interactions=False, verbose=True, debug=False)[source]
Preprocess CDR data.
- Parameters:
X – list of pandas tables; impulse (predictor) data.
Y – list of pandas tables; response data.
formula_list – list of Formula; CDR formula for which to preprocess data.
series_ids – list of str; column names whose jointly unique values define unique time series.
filters – list; list of key-value pairs mapping column names to filtering criteria for their values.
history_length – int; maximum number of history (backward) observations.
future_length – int; maximum number of future (forward) observations.
t_delta_cutoff – float or None; maximum distance in time to consider (can help improve training stability on data with large gaps in time). If 0 or None, no cutoff.
all_interactions – bool; add powerset of all conformable interactions.
verbose – bool; whether to report progress to stderr.
debug – bool; print debugging information.
- Returns:
7-tuple; predictor data, response data, filtering mask, response-aligned predictor names, response-aligned predictors, 2D predictor names, and 2D predictors
- cdr.data.s(df)[source]
Rescale pandas series or data frame by its standard deviation
- Parameters:
df – pandas Series or DataFrame; input data
- Returns:
pandas Series or DataFrame; rescaled data
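Rescaling by the standard deviation divides every value by the spread of the data without shifting it. The sketch below uses a plain list in place of a pandas Series; the helper rescale is hypothetical, and using the sample standard deviation (n - 1 denominator, which is the pandas default) is an assumption about what s() does.

```python
from statistics import stdev


def rescale(values):
    """Divide each value by the sample standard deviation of the data."""
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [v / sd for v in values]


scaled = rescale([2.0, 4.0, 6.0])
print(scaled)
```

After rescaling, the data have unit standard deviation but keep their original mean divided by the same factor.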
- cdr.data.split_cdr_outputs(outputs, lengths)[source]
Takes a dictionary of arbitrary depth containing CDR outputs with their labels as keys and splits each output into a list of outputs with lengths corresponding to lengths. Useful for aligning CDR outputs to response files, since multiple response files can be provided, which are underlyingly concatenated by CDR. Recursively modifies the dict in place.
- Parameters:
outputs – dict of arbitrary depth with numpy arrays at the leaves; the source CDR outputs.
lengths – array-like vector of lengths to split the outputs into.
- Returns:
dict; same key-val structure as outputs but with each leaf split into a list of len(lengths) vectors, one for each length value.
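The recursive, in-place splitting described above can be sketched as follows, with plain lists standing in for numpy arrays at the leaves. The helper split_outputs and the example keys are hypothetical.

```python
def split_outputs(outputs, lengths):
    """Recursively split each leaf sequence into chunks of the given lengths.

    Modifies `outputs` in place and also returns it.
    """
    for key, value in outputs.items():
        if isinstance(value, dict):
            # Nested dict: descend until a leaf sequence is reached.
            split_outputs(value, lengths)
        else:
            chunks, start = [], 0
            for n in lengths:
                chunks.append(value[start:start + n])
                start += n
            outputs[key] = chunks
    return outputs


outputs = {"preds": {"rt": [1, 2, 3, 4, 5]}, "loglik": [0.1, 0.2, 0.3, 0.4, 0.5]}
split_outputs(outputs, [2, 3])
print(outputs)
# {'preds': {'rt': [[1, 2], [3, 4, 5]]}, 'loglik': [[0.1, 0.2], [0.3, 0.4, 0.5]]}
```

Here the two lengths would correspond to two response files with 2 and 3 rows, concatenated before being passed to the model.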
cdr.formula module
- class cdr.formula.Formula(bform_str, standardize=True)[source]
Bases: object
A class for parsing R-style mixed-effects CDR model formula strings and applying them to CDR data matrices.
- Parameters:
bform_str – str; an R-style mixed-effects CDR model formula string
- ablate_impulses(impulse_ids)[source]
Remove impulses in impulse_ids from fixed effects (retaining in any random effects).
- Parameters:
impulse_ids – list of str; impulse IDs
- Returns:
None
- apply_formula(X, Y, X_in_Y_names=None, all_interactions=False, series_ids=None)[source]
Extract all data and compute all transforms required by the model formula.
- Parameters:
X – list of pandas tables; impulse data.
Y – list of pandas tables; response data.
X_in_Y_names – list or None; list of column names for response-aligned predictors (predictors measured for every response rather than for every input) if applicable, None otherwise.
all_interactions – bool; add powerset of all conformable interactions.
series_ids – list of str or None; list of ids to use as grouping factors for lagged effects. If None, lagging will not be attempted.
- Returns:
triple; transformed X, transformed Y, response-aligned predictor names
- apply_op(op, arr)[source]
Apply op op to array arr.
- Parameters:
op – str; name of op.
arr – numpy or pandas array; source data.
- Returns:
numpy array; transformed data.
- apply_op_2d(op, arr, time_mask)[source]
Apply op to 2D predictor (predictor whose value depends on properties of the response).
- Parameters:
op – str; name of op.
arr – numpy array; source data.
time_mask – numpy array; mask for padding cells.
- Returns:
numpy array; transformed data
- apply_ops(impulse, X)[source]
Apply all ops defined for an impulse.
- Parameters:
impulse – Impulse object; the impulse.
X – list of pandas tables; table containing the impulse data.
- Returns:
pandas table; table augmented with transformed impulse.
- apply_ops_2d(impulse, X_2d_predictor_names, X_2d_predictors, time_mask)[source]
Apply all ops defined for a 2D predictor (predictor whose value depends on properties of the response).
- Parameters:
impulse – Impulse object; the impulse.
X_2d_predictor_names – list of str; names of 2D predictors.
X_2d_predictors – numpy array; source data.
time_mask – numpy array; mask for padding cells.
- Returns:
2-tuple; list of new predictor names, numpy array of predictor values
- static bases(family)[source]
Get the number of bases of a spline kernel.
- Parameters:
family – str; name of IRF family
- Returns:
int or None; number of bases of spline kernel, or None if family is not a spline.
- build(bform_str, standardize=True)[source]
Construct internal data from formula string
- Parameters:
bform_str – str; source string.
- Returns:
None
- categorical_transform(X)[source]
Get transformed formula with categorical predictors in X expanded.
- Parameters:
X – list of pandas tables; input data.
- Returns:
Formula; transformed Formula object
- compute_2d_predictor(predictor_name, X, first_obs, last_obs, history_length=128, future_length=None, minibatch_size=50000)[source]
Compute 2D predictor (predictor whose value depends on properties of the most recent impulse).
- Parameters:
predictor_name – str; name of predictor
X – pandas table; input data
first_obs – pandas Series or 1D numpy array; row indices in X of the start of the series associated with each regression target.
last_obs – pandas Series or 1D numpy array; row indices in X of the most recent observation in the series associated with each regression target.
minibatch_size – int; minibatch size for computing predictor, can help with memory footprint
- Returns:
2-tuple; new predictor name, numpy array of predictor values
- initialize_nns()[source]
Initialize a dictionary mapping ids to metadata for all NN components in this CDR model
- Returns:
dict; mapping from NN str id to NN object storing metadata for that NN.
- insert_impulses(impulses, irf_str, rangf=None)[source]
Insert impulses in impulse_ids into fixed effects and all random terms.
- Parameters:
impulse_ids – list of str; impulse IDs
- Returns:
None
- static irf_params(family)[source]
Return list of parameter names for a given IRF family.
- Parameters:
family – str; name of IRF family
- Returns:
list of str; parameter names
- static is_LCG(family)[source]
Check whether a kernel is LCG.
- Parameters:
family – str; name of IRF family
- Returns:
bool; whether the kernel is LCG (linear combination of Gaussians)
- pc_transform(n_pc, pointers=None)[source]
Get transformed formula with impulses replaced by principal components.
- Parameters:
n_pc – int; number of principal components in transform.
pointers – dict; map from source nodes to transformed nodes.
- Returns:
list of IRFNode; tree forest representing current state of the transform.
- process_ast(t, terms=None, has_intercept=None, ops=None, rangf=None, impulses_by_name=None, interactions_by_name=None, under_irf=False, under_interaction=False)[source]
Recursively process a node of the Python abstract syntax tree (AST) representation of the formula string and insert data into internal representation of model formula.
- Parameters:
t – AST node.
terms – list or None; CDR terms computed so far, or None if no CDR terms computed.
has_intercept – dict; map from random grouping factors to boolean values representing whether that grouping factor has a random intercept. None is used as a key to refer to the population-level intercept.
ops – list; names of ops computed so far, or None if no ops computed.
rangf – str or None; name of rangf for random term currently being processed, or None if currently processing fixed effects portion of model.
- Returns:
None
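To give a feel for recursive AST processing, the sketch below parses an expression with Python's standard ast module and walks it depth first, collecting identifiers. The expression is a hypothetical example resembling the right-hand side of a formula, and collect_names is an illustrative helper; the actual process_ast method builds the package's internal formula representation rather than a flat list of names.

```python
import ast

# Hypothetical expression; the predictor and grouping names are illustrative.
RHS = "C(a + b, Normal()) + (a | subject)"


def collect_names(node, names=None):
    """Recursively gather identifiers from an AST node, depth first."""
    if names is None:
        names = []
    if isinstance(node, ast.Name):
        names.append(node.id)
    # Visit every child node in source order.
    for child in ast.iter_child_nodes(node):
        collect_names(child, names)
    return names


tree = ast.parse(RHS, mode="eval")
names = collect_names(tree.body)
print(names)  # ['C', 'a', 'b', 'Normal', 'a', 'subject']
```

Because such an expression is valid Python syntax, operators like + and | arrive as BinOp nodes and calls like Normal() as Call nodes, which is what makes a recursive, per-node handler a natural fit.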
- process_irf(t, input_irf, ops=None, rangf=None, nn_inputs=None, impulses_by_name=None, interactions_by_name=None)[source]
Process data from AST node representing part of an IRF definition and insert data into internal representation of the model.
- Parameters:
t – AST node.
input_irf – IRFNode, Impulse, InterationImpulse, or NNImpulse object; child IRF of current node.
ops – list of str, or None; ops applied to IRF. If None, no ops applied.
rangf – str or None; name of rangf for random term currently being processed, or None if currently processing fixed effects portion of model.
nn_inputs – tuple or None; tuple of input impulses to neural network IRF, or None if not a neural network IRF.
- Returns:
IRFNode object; the IRF node
- re_transform(X)[source]
Get transformed formula with regex predictors expanded based on matches to the columns in X.
- Parameters:
X – list of pandas tables; input data.
- Returns:
Formula; transformed Formula object
- remove_impulses(impulse_ids)[source]
Remove impulses in impulse_ids from the model (both fixed and random effects).
- Parameters:
impulse_ids – list of str; impulse IDs
- Returns:
None
- response_names()[source]
Get list of names of modeled response variables.
- Returns:
list of str; names of modeled response variables.
- responses()[source]
Get list of modeled response variables.
- Returns:
list of Impulse; modeled response variables.
- to_lmer_formula_string(z=False, correlated=True)[source]
Generate an lme4-style LMER model string representing the structure of the current CDR model. Useful for 2-step analysis in which data are transformed using CDR, then fitted using LME.
- Parameters:
z – bool; z-transform convolved predictors.
correlated – bool; whether to use correlated random intercepts and slopes.
- Returns:
str; the LMER formula string.
- class cdr.formula.IRFNode(family=None, impulse=None, p=None, irfID=None, coefID=None, ops=None, fixed=True, rangf=None, nn_impulses=None, nn_config=None, impulses_as_inputs=True, inputs_to_add=None, inputs_to_drop=None, param_init=None, trainable=None, response_params_list=None)[source]
Bases: object
Data structure representing a node in a CDR IRF tree. For more information on how the CDR IRF structure is encoded as a tree, see the reference on CDR IRF trees.
- Parameters:
family – str; name of IRF kernel family.
impulse – Impulse object or None; the impulse if terminal, else None.
p – IRFNode object or None; the parent IRF node, or None if no parent (parent nodes can be connected after initialization).
irfID – str or None; string ID of node if applicable. If None, automatically-generated ID will describe node’s family and structural position.
coefID – str or None; string ID of coefficient if applicable. If None, automatically-generated ID will describe node’s family and structural position. Only applicable to terminal nodes, so this property will not be used if the node is non-terminal.
ops – list of str, or None; ops to apply to IRF node. If None, no ops.
fixed – bool; whether node exists in the model’s fixed effects structure.
rangf – list of str, str, or None; names of any random grouping factors associated with the node.
nn_impulses – tuple or None; tuple of input impulses to neural network IRF, or None if not a neural network IRF.
nn_config – dict or None; dictionary of settings for NN IRF component.
impulses_as_inputs – bool; whether to include impulses in input of a neural network IRF.
inputs_to_add – list of Impulse/NNImpulse or None; list of impulses to add to input of neural network IRF.
inputs_to_drop – list of Impulse/NNImpulse or None; list of impulses to remove from input of neural network IRF (keeping them in output).
param_init – dict; map from parameter names to initial values, which will also be used as prior means.
trainable – list of str, or None; trainable parameters at this node. If None, all parameters are trainable.
response_params_list – list of 2-tuple of str, or None; response distribution parameters modeled by this IRF, with each parameter represented as a pair (DIST_NAME, PARAM_NAME). DIST_NAME can be None, in which case the IRF will apply to any distribution parameter matching PARAM_NAME.
- ablate_impulses(impulse_ids)[source]
Remove impulses in impulse_ids from fixed effects (retaining in any random effects).
- Parameters:
impulse_ids – list of str; impulse IDs
- Returns:
None
- add_child(t)[source]
Add child to this node in the IRF tree
- Parameters:
t – IRFNode; child node.
- Returns:
IRFNode; child node with updated parent.
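The parent/child bookkeeping behind a tree like this can be sketched with a small stand-in class. Node below is hypothetical and far simpler than IRFNode: it only stores a name, a parent pointer p (named after the p parameter above), and a list of children; any deduplication or ID handling the real class performs is not modeled.

```python
class Node:
    """Hypothetical stand-in for a tree node with a parent pointer."""

    def __init__(self, name):
        self.name = name
        self.p = None          # parent node, connected after initialization
        self.children = []

    def add_child(self, t):
        """Attach t beneath this node and return it with its parent set."""
        t.p = self
        self.children.append(t)
        return t


root = Node("ROOT")
# Returning the child allows a chain of nodes to be built in one expression.
leaf = root.add_child(Node("Normal")).add_child(Node("a"))
print(leaf.name, leaf.p.name, leaf.p.p.name)  # a Normal ROOT
```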
- add_interactions(response_interactions)[source]
Add a ResponseInteraction object (or list of them) to this node.
- Parameters:
response_interactions – ResponseInteraction or list of ResponseInteraction; response interaction(s) to add
- Returns:
None
- add_rangf(rangf)[source]
Add random grouping factor name to this node.
- Parameters:
rangf – str; random grouping factor name
- Returns:
None
- atomic_irf_by_family()[source]
Get map from IRF kernel family names to list of IDs of IRFNode instances belonging to that family.
- Returns:
dict from str to list of str; IRF IDs by family.
- atomic_irf_param_init_by_family()[source]
Get map from IRF kernel family names to maps from IRF IDs to maps from IRF parameter names to their initialization values.
- Returns:
dict; parameter initialization maps by family.
- atomic_irf_param_trainable_by_family()[source]
Get map from IRF kernel family names to maps from IRF IDs to lists of trainable parameters.
- Returns:
dict; trainable parameter maps by family.
- bases()[source]
Get the number of bases of node.
- Returns:
int or None; number of bases of node, or None if node is not a spline.
- categorical_transform(X, expansion_map=None)[source]
Generate transformed copy of node with categorical predictors in X expanded. Recursive. Returns a tree forest representing the current state of the transform. When run from ROOT, should always return a length-1 list representing a single-tree forest, in which case the transformed tree is accessible as the 0th element.
- Parameters:
X – list of pandas tables; input data.
expansion_map – dict; internal variable. Do not use.
- Returns:
list of IRFNode; tree forest representing current state of the transform.
- coef2impulse()[source]
Get map from coefficient IDs dominated by node to lists of corresponding impulses.
- Returns:
dict; map from coefficient IDs to lists of corresponding impulses.
- coef2terminal()[source]
Get map from coefficient IDs dominated by node to lists of corresponding terminal IRF nodes.
- Returns:
dict; map from coefficient IDs to lists of corresponding terminal IRF nodes.
- coef_by_rangf()[source]
Get map from random grouping factor names to associated coefficient IDs dominated by node.
- Returns:
dict; map from random grouping factor names to associated coefficient IDs.
- coef_id()[source]
Get coefficient ID for this node.
- Returns:
str or None; coefficient ID, or None if non-terminal.
- coef_names()[source]
Get list of names of coefficients dominated by node.
- Returns:
list of str; names of coefficients dominated by node.
- fixed_coef_names()[source]
Get list of names of fixed coefficients dominated by node.
- Returns:
list of str; names of fixed coefficients dominated by node.
- fixed_interaction_names()[source]
Get list of names of fixed interactions dominated by node.
- Returns:
list of str; names of fixed interactions dominated by node.
- formula_terms()[source]
Return data structure representing formula terms dominated by node, grouped by random grouping factor. Key None represents the fixed portion of the model (no random grouping factor).
- Returns:
dict; map from random grouping factors to data structure representing formula terms. Data structure contains 2 fields, 'impulses' containing impulses and 'irf' containing IRF nodes.
- has_coefficient(rangf)[source]
Report whether rangf has any coefficients in this subtree
- Parameters:
rangf – Random grouping factor
- Returns:
bool: Whether rangf has any coefficients in this subtree
- has_composed_irf()[source]
Check whether node dominates any IRF compositions.
- Returns:
bool, whether node dominates any IRF compositions.
- has_irf(rangf)[source]
Report whether rangf has any IRFs in this subtree
- Parameters:
rangf – Random grouping factor
- Returns:
bool: Whether rangf has any IRFs in this subtree
- impulse2coef()[source]
Get map from impulses dominated by node to lists of corresponding coefficient IDs.
- Returns:
dict; map from impulses to lists of corresponding coefficient IDs.
- impulse2terminal()[source]
Get map from impulses dominated by node to lists of corresponding terminal IRF nodes.
- Returns:
dict; map from impulses to lists of corresponding terminal IRF nodes.
- impulse_names(include_interactions=False, include_nn=False, include_nn_inputs=True)[source]
Get list of names of impulses dominated by node.
- Parameters:
include_interactions – bool; whether to return impulses defined by interaction terms.
include_nn – bool; whether to return NN transformations of impulses.
include_nn_inputs – bool; whether to return input impulses to NN transformations.
- Returns:
list of str; names of impulses dominated by node.
- impulse_set(include_interactions=False, include_nn=False, include_nn_inputs=True, out=None)[source]
Get set of impulses dominated by node.
- Parameters:
include_interactions – bool; whether to return impulses defined by interaction terms.
include_nn – bool; whether to return NN transformations of impulses.
include_nn_inputs – bool; whether to return input impulses to NN transformations.
out – set or None; initial set to modify.
- Returns:
list of Impulse; impulses dominated by node.
- impulses(include_interactions=False, include_nn=False, include_nn_inputs=True)[source]
Get alphabetically sorted list of impulses dominated by node.
- Parameters:
include_interactions – bool; whether to return impulses defined by interaction terms.
include_nn – bool; whether to return NN transformations of impulses.
include_nn_inputs – bool; whether to return input impulses to NN transformations.
- Returns:
list of Impulse; impulses dominated by node.
- impulses_by_name(include_interactions=False, include_nn=False, include_nn_inputs=True)[source]
Get dictionary mapping names of impulses dominated by node to their corresponding impulses.
- Parameters:
include_interactions – bool; whether to return impulses defined by interaction terms.
include_nn – bool; whether to return NN transformations of impulses.
include_nn_inputs – bool; whether to return input impulses to NN transformations.
- Returns:
list of Impulse; impulses dominated by node.
- impulses_from_response_interaction()[source]
Get list of any impulses from response interactions associated with this node.
- Returns:
listofImpulse; impulses dominated by node.
- interaction_by_rangf()[source]
Get map from random grouping factor names to associated interaction IDs dominated by node.
- Returns:
dict; map from random grouping factor names to associated interaction IDs.
- interaction_names()[source]
Get list of names of interactions dominated by node.
- Returns:
listofstr; names of interactions dominated by node.
- interactions()[source]
Return list of all response interactions used in this subtree, sorted by name.
- Returns:
list of ResponseInteraction
- interactions2inputs()[source]
Get map from IDs of ResponseInteractions dominated by node to lists of IDs of their inputs.
- Returns:
dict; map from IDs of ResponseInteractions nodes to lists of their inputs.
- irf_by_rangf()[source]
Get map from random grouping factor names to IDs of associated IRF nodes dominated by node.
- Returns:
dict; map from random grouping factor names to IDs of associated IRF nodes.
- irf_to_formula(rangf=None)[source]
Generates a representation of this node’s impulse response kernel in formula string syntax
- Parameters:
rangf – random grouping factor for which to generate the stringification (fixed effects if rangf==None).
- Returns:
str; formula string representation of node
- is_LCG()[source]
Check the non-parametric type of a node’s kernel, or return None if parametric.
- Parameters:
family – str; name of IRF family
- Returns:
str or None; name of kernel type if non-parametric, else None.
- local_name()[source]
Get descriptive name for this node, ignoring its position in the IRF tree.
- Returns:
str; name.
- nns_by_key(nns_by_key=None)[source]
Get a dict mapping NN keys to objects associated with them.
- Parameters:
nns_by_key – dict or None; dictionary to modify. Empty if None.
- Returns:
dict; map from string keys to list of associated IRFNode and/or NNImpulse objects.
- node_table()[source]
Get map from names to nodes of all nodes dominated by node (including self).
- Returns:
dict; map from names to nodes of all nodes dominated by node.
- nonparametric_coef_names()[source]
Get list of names of nonparametric coefficients dominated by node.
- Returns:
list of str; names of spline coefficients dominated by node.
- static pointers2namemmaps(p)[source]
Get a map from source to transformed IRF node names.
- Parameters:
p – dict; map from source to transformed IRF nodes.
- Returns:
dict; map from source to transformed IRF node names.
- re_transform(X, expansion_map=None)[source]
Generate transformed copy of node with regex-matching predictors in X expanded. Recursive. Returns a tree forest representing the current state of the transform. When run from ROOT, should always return a length-1 list representing a single-tree forest, in which case the transformed tree is accessible as the 0th element.
- Parameters:
X – list of pandas tables; input data.
expansion_map – dict; Internal variable. Do not use.
- Returns:
list of IRFNode; tree forest representing current state of the transform.
- remove_impulses(impulse_ids)[source]
Remove impulses in impulse_ids from the model (both fixed and random effects).
- Parameters:
impulse_ids – list of str; impulse IDs
- Returns:
None
- supports_non_causal()[source]
Check whether model contains only IRF kernels that lack the causality constraint t >= 0.
- Returns:
bool; whether model contains only IRF kernels that lack the causality constraint t >= 0.
- terminal2coef()[source]
Get map from IDs of terminal IRF nodes dominated by node to lists of corresponding coefficient IDs.
- Returns:
dict; map from IDs of terminal IRF nodes to lists of corresponding coefficient IDs.
- terminal2impulse()[source]
Get map from terminal IRF nodes dominated by node to lists of corresponding impulses.
- Returns:
dict; map from terminal IRF nodes to lists of corresponding impulses.
- terminal_names()[source]
Get list of names of terminal IRF nodes dominated by node.
- Returns:
list of str; names of terminal IRF nodes dominated by node.
- terminals()[source]
Get list of terminal IRF nodes dominated by node.
- Returns:
list of IRFNode; terminal IRF nodes dominated by node.
- terminals_by_name()[source]
Get dictionary mapping names of terminal IRF nodes dominated by node to their corresponding nodes.
- Returns:
dict; map from node names to nodes
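The traversal methods above (node_table, terminals, terminal_names, terminals_by_name) all collect information from the subtree dominated by a node. The following is a minimal sketch of that pattern, not the library's implementation: TreeNode is a hypothetical stand-in class, and it assumes that a terminal is simply a node without children.

```python
class TreeNode:
    """Hypothetical stand-in for an IRF tree node (not the cdr.formula class)."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = list(children or [])

    def node_table(self):
        # Map names to nodes for this node and every descendant.
        table = {self.name: self}
        for child in self.children:
            table.update(child.node_table())
        return table

    def terminals(self):
        # Assumption: a node with no children counts as terminal.
        if not self.children:
            return [self]
        out = []
        for child in self.children:
            out.extend(child.terminals())
        return out

    def terminals_by_name(self):
        return {t.name: t for t in self.terminals()}


root = TreeNode("ROOT", [TreeNode("A", [TreeNode("A.x"), TreeNode("A.y")]), TreeNode("B")])
terminal_names = [t.name for t in root.terminals()]
```

Each method recurses into the children and merges their results, which is why every method in this group is described as operating on nodes "dominated by node".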
- unablate_impulses(impulse_ids)[source]
Insert impulses in impulse_ids into fixed effects (leaving random effects structure unchanged).
- Parameters:
impulse_ids – list of str; impulse IDs
- Returns:
None
- unary_nonparametric_coef_names()[source]
Get list of names of non-parametric coefficients with no siblings dominated by node. Because unary splines are non-parametric, a trainable coefficient would be perfectly confounded with the spline parameters, so their coefficients are fixed at 1. Splines dominating multiple coefficients are excepted, since the same kernel shape must be scaled in different ways.
- Returns:
list of str; names of unary spline coefficients dominated by node.
- class cdr.formula.Impulse(name, ops=None, is_re=False)[source]
Bases: object
Data structure representing an impulse in a CDR model.
- Parameters:
name – str; name of impulse
ops – list of str, or None; ops to apply to impulse. If None, no ops.
is_re – bool; whether impulse is a regular expression search pattern
- categorical(X)[source]
Checks whether impulse is categorical in a dataset.
- Parameters:
X – list of pandas tables; data to check.
- Returns:
bool; True if impulse is categorical in X, False otherwise.
- expand_categorical(X)[source]
Expand any categorical predictors in X into 1-hot columns.
- Parameters:
X – list of pandas tables; input data
- Returns:
2-tuple of pandas table, list of Impulse; expanded data, list of expanded Impulse objects
- expand_re(X)[source]
Expand any regular expression predictors in X into a sequence of all matching columns.
- Parameters:
X – list of pandas tables; input data
- Returns:
list of Impulse; list of expanded Impulse objects
- get_matcher()[source]
Return a compiled regex matcher to compare to data columns.
- Returns:
re object
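As a rough illustration of how a compiled matcher can expand a regular-expression impulse into concrete column names, here is a minimal sketch. The helper expand_pattern and the column names are hypothetical, tables are represented as plain lists of column names rather than pandas tables, and whole-name matching with fullmatch is an assumption rather than documented library behavior.

```python
import re


def expand_pattern(pattern, tables):
    """Hypothetical helper: column names in any table that the pattern matches."""
    matcher = re.compile(pattern)
    matches = []
    for columns in tables:
        for col in columns:
            # Assumption: the pattern must match the entire column name.
            if matcher.fullmatch(col) and col not in matches:
                matches.append(col)
    return matches


# Each inner list stands in for the columns of one data table.
tables = [["word_len", "word_freq", "time"], ["sound_power", "time"]]
expanded = expand_pattern(r"word_.*", tables)
```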
- class cdr.formula.ImpulseInteraction(impulses, ops=None)[source]
Bases: object
Data structure representing an interaction of impulse-aligned variables (impulses) in a CDR model.
- Parameters:
impulses – list of Impulse; impulses to interact.
ops – list of str, or None; ops to apply to interaction. If None, no ops.
- expand_categorical(X)[source]
Expand any categorical predictors in X into 1-hot columns.
- Parameters:
X – list of pandas tables; input data.
- Returns:
3-tuple of pandas table, list of ImpulseInteraction, list of list of Impulse; expanded data, list of expanded ImpulseInteraction objects, list of lists of expanded Impulse objects, one list for each interaction.
- expand_re(X)[source]
Expand any regular expression predictors in X into a sequence of all matching columns.
- Parameters:
X – list of pandas tables; input data
- Returns:
2-tuple of list of ImpulseInteraction, list of list of Impulse; list of expanded ImpulseInteraction objects, list of lists of expanded Impulse objects, one list for each interaction.
- impulses()[source]
Get list of impulses dominated by interaction.
- Returns:
list of Impulse; impulses dominated by interaction.
- class cdr.formula.NN(nodes, nn_type, rangf=None, nn_key=None, nn_config=None)[source]
Bases: object
Data structure representing a neural network within a CDR model.
- Parameters:
nodes – list of IRFNode and/or NNImpulse objects; nodes associated with this NN
nn_type – str; name of NN type ('irf' or 'impulse').
rangf – str or list of str; random grouping factors for which to build random effects for this NN.
nn_key – str or None; key uniquely identifying this NN node (constructed automatically if None).
nn_config – dict or None; map of NN config fields to their values for this NN node.
- all_impulse_names()[source]
Get list of all impulse names associated with this NN component.
- Returns:
list of str; all impulse names associated with this NN component.
- class cdr.formula.NNImpulse(impulses, impulses_as_inputs=True, inputs_to_add=None, inputs_to_drop=None, nn_config=None)[source]
Bases: object
Data structure representing a feedforward neural network transform of one or more impulses in a CDR model.
- Parameters:
impulses – list of Impulse; impulses to transform.
impulses_as_inputs – bool; whether to include impulses as NN inputs.
inputs_to_add – list of Impulse or None; extra impulses to add to NN input.
inputs_to_drop – list of Impulse or None; output impulses to drop from NN input.
nn_config – dict or None; map of NN config fields to their values for this NN node.
- expand_categorical(X)[source]
Expand any categorical predictors in X into 1-hot columns.
- Parameters:
X – list of pandas tables; input data.
- Returns:
3-tuple of pandas table, list of NNImpulse, list of list of Impulse; expanded data, list of expanded NNImpulse objects, list of lists of expanded Impulse objects, one list for each interaction.
- expand_re(X)[source]
Expand any regular expression predictors in X into a sequence of all matching columns.
- Parameters:
X – list of pandas tables; input data
- Returns:
2-tuple of list of ImpulseInteraction, list of list of Impulse; list of expanded ImpulseInteraction objects, list of lists of expanded Impulse objects, one list for each interaction.
- impulses()[source]
Get list of output impulses dominated by NN.
- Returns:
list of Impulse; impulses dominated by NN.
- class cdr.formula.ResponseInteraction(responses, rangf=None)[source]
Bases: object
Data structure representing an interaction of response-aligned variables (containing at least one IRF-convolved impulse) in a CDR model.
- Parameters:
responses – list of terminal IRFNode, Impulse, and/or ImpulseInteraction objects; responses to interact.
rangf – str or list of str; random grouping factors for which to build random effects for this interaction.
- add_rangf(rangf)[source]
Add random grouping factor name to this interaction.
- Parameters:
rangf – str; random grouping factor name
- Returns:
None
- contains_member(x)[source]
Check if object is a member of the set of responses belonging to this interaction.
- Parameters:
x – IRFNode, Impulse, and/or ImpulseInteraction object; object to check.
- Returns:
bool; whether x is a member of the set of responses
- dirac_delta_responses()[source]
Get list of response-aligned Dirac delta variables dominated by interaction.
- Returns:
list of Impulse and/or ImpulseInteraction objects; Dirac delta variables dominated by interaction.
- irf_responses()[source]
Get list of IRFs dominated by interaction.
- Returns:
list of IRFNode objects; terminal IRFs dominated by interaction.
- nn_impulse_responses()[source]
Get list of NN impulse terms dominated by interaction.
- Returns:
list of NNImpulse objects; NN impulse terms dominated by interaction.
- cdr.formula.pythonize_string(s)[source]
Convert string to valid Python variable name.
- Parameters:
s – str; source string
- Returns:
str; pythonized string
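One plausible way to turn an arbitrary string into a valid Python variable name is sketched below. The function pythonize_sketch is hypothetical and the substitution rule is an assumption; the library's actual rule may differ, and reserved keywords such as class are not handled here.

```python
import re


def pythonize_sketch(s):
    """Hypothetical sketch: map a string to a valid Python identifier."""
    # Replace every character that is not a letter, digit, or underscore.
    out = re.sub(r"\W", "_", s)
    # Identifiers may not be empty or start with a digit.
    if not out or out[0].isdigit():
        out = "_" + out
    return out


name = pythonize_sketch("s(x-1).log")
```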
- cdr.formula.standardize_formula_string(s)[source]
Standardize a formula string, removing notational variation. IRF specifications C(...) are sorted alphabetically by the IRF call name, e.g. Gamma(). The order of impulses within an IRF specification is preserved.
- Parameters:
s – str; the formula string to be standardized
- Returns:
str; standardization of s
cdr.io module
- cdr.io.read_tabular_data(X_paths, Y_paths, series_ids, categorical_columns=None, sep=' ', verbose=True)[source]
Read impulse and response data into pandas dataframes and perform basic pre-processing.
- Parameters:
X_paths – str or list of str; path(s) to impulse (predictor) data (multiple tables are concatenated). Each path may also be a ;-delimited list of paths to files containing predictors with different timestamps, where the predictors in each file are all timestamped with respect to the same reference point.
Y_paths – str or list of str; path(s) to response data (multiple tables are concatenated). Each path may also be a ;-delimited list of paths to files containing different response variables with different timestamps, where the response variables in each file are all timestamped with respect to the same reference point.
series_ids – list of str; column names whose jointly unique values define unique time series.
categorical_columns – list of str; column names that should be treated as categorical.
sep – str; string representation of field delimiter in input data.
verbose – bool; whether to log progress to stderr.
- Returns:
2-tuple of list(pandas DataFrame); (impulse data, response data). X and Y each have one element for each dataset in X_paths/Y_paths, each containing the column-wise concatenation of all column files in the path.
cdr.kwargs module
- class cdr.kwargs.Kwarg(key, default_value, dtypes, descr, aliases=None, default_value_cdrnn='same', suppress=False)[source]
Bases: object
Data structure for storing keyword arguments and their docstrings.
- Parameters:
key – str; Key
default_value – Any; Default value
dtypes – list or class; List of classes or single class. Members can also be specific required values, either None or values of type str.
descr – str; Description of kwarg
default_value_cdrnn – Any; Default value for CDRNN if distinct from CDR. If 'same', CDRNN uses default_value.
suppress – bool; Whether to suppress printing of documentation for this kwarg. Useful for hiding deprecated or little-used kwargs in order to simplify autodoc output.
- dtypes_str()[source]
String representation of dtypes permitted for kwarg.
- Returns:
str; dtypes string.
- get_type_name(x)[source]
String representation of name of a dtype
- Parameters:
x – dtype; the dtype to name.
- Returns:
str; name of dtype.
- in_settings(settings)[source]
Check whether kwarg is specified in a settings object parsed from a config file.
- Parameters:
settings – settings from a ConfigParser object.
- Returns:
bool; whether kwarg is found in settings.
- kwarg_from_config(settings, is_cdrnn=False)[source]
Given a settings object parsed from a config file, return value of kwarg cast to appropriate dtype. If missing from settings, return default.
- Parameters:
settings – settings from a ConfigParser object or dict.
is_cdrnn – bool; whether this is for a CDRNN model.
- Returns:
value of kwarg
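The lookup described above (read the value from the settings, cast it to the expected dtype, and fall back to the default when it is missing) can be sketched with the standard-library configparser module. This is a simplified illustration under stated assumptions, not the library's code: value_from_settings, the [model] section, and the keys n_iter, use_grid, and learning_rate are all hypothetical, and the sketch only handles a ConfigParser section, not a plain dict.

```python
import configparser


def value_from_settings(settings, key, default, dtype):
    """Hypothetical helper: read key from a config section, cast it, or fall back."""
    if key not in settings:
        return default
    if dtype is bool:
        # getboolean understands yes/no, true/false, on/off, and 1/0.
        return settings.getboolean(key)
    return dtype(settings[key])


parser = configparser.ConfigParser()
parser.read_string("[model]\nn_iter = 500\nuse_grid = no\n")
section = parser["model"]

n_iter = value_from_settings(section, "n_iter", 1000, int)
lr = value_from_settings(section, "learning_rate", 0.001, float)
use_grid = value_from_settings(section, "use_grid", True, bool)
```

Booleans get special handling because bool("no") is True in Python, so a plain cast would silently misread the config.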
- cdr.kwargs.cdr_kwarg_docstring()[source]
Generate docstring snippet summarizing all CDR kwargs, dtypes, and defaults.
- Returns:
str; docstring snippet
cdr.model module
cdr.opt module
cdr.plot module
- cdr.plot.plot_heatmap(m, row_names, col_names, outdir='.', filename='eigenvectors.png', plot_x_inches=7, plot_y_inches=5, cmap='Blues')[source]
Plot a heatmap. Used in CDR for visualizing eigenvector matrices in principal components models.
- Parameters:
m – 2D numpy array; source data for plot.
row_names – list of str; row names.
col_names – list of str; column names.
outdir – str; output directory.
filename – str; filename.
plot_x_inches – float; width of plot in inches.
plot_y_inches – float; height of plot in inches.
cmap – str; name of matplotlib cmap object (determines colors of plotted IRF).
- Returns:
None
- cdr.plot.plot_irf(plot_x, plot_y, irf_names, lq=None, uq=None, density=None, sort_names=True, prop_cycle_length=None, prop_cycle_map=None, outdir='.', filename='irf_plot.png', irf_name_map=None, plot_x_inches=6, plot_y_inches=4, ylim=None, cmap='gist_rainbow', legend=True, xlab=None, ylab=None, use_line_markers=False, use_grid=True, transparent_background=False, dpi=300, dump_source=False)[source]
Plot impulse response functions.
- Parameters:
plot_x – numpy array with shape (T,1); time points for which to plot the response. For example, if the plots contain 1000 points from 0s to 10s, plot_x could be generated as np.linspace(0, 10, 1000).
plot_y – numpy array with shape (T, N); response of each IRF at each time point.
irf_names – list of str; CDR IDs of IRFs in the same order as they appear in axis 1 of plot_y.
lq – numpy array with shape (T, N), or None; lower bound of credible interval for each time point. If None, no credible interval will be plotted.
uq – numpy array with shape (T, N), or None; upper bound of credible interval for each time point. If None, no credible interval will be plotted.
sort_names – bool; alphabetically sort IRF names.
prop_cycle_length – int or None; Length of plotting properties cycle (defines step size in the color map). If None, inferred from irf_names.
prop_cycle_map – list of int, or None; Integer indices to use in the properties cycle for each entry in irf_names. If None, indices are automatically assigned.
outdir – str; output directory.
filename – str; filename.
irf_name_map – dict of str to str; map from CDR IRF IDs to more readable names to appear in legend. Any plotted IRF whose ID is not found in irf_name_map will be represented with the CDR IRF ID.
plot_x_inches – float; width of plot in inches.
plot_y_inches – float; height of plot in inches.
ylim – 2-element tuple or list; (lower_bound, upper_bound) to use for y axis. If None, automatically inferred.
cmap – str; name of matplotlib cmap object (determines colors of plotted IRF).
legend – bool; include a legend.
xlab – str or None; x-axis label. If None, no label.
ylab – str or None; y-axis label. If None, no label.
use_line_markers – bool; add markers to IRF lines.
use_grid – bool; whether to show a background grid.
transparent_background – bool; use a transparent background. If False, uses a white background.
dpi – int; dots per inch.
dump_source – bool; Whether to dump the plot source array to a csv file.
- Returns:
None
- cdr.plot.plot_irf_as_heatmap(plot_x, plot_y, irf_names, sort_names=True, outdir='.', filename='irf_hm.png', irf_name_map=None, plot_x_inches=6, plot_y_inches=4, ylim=None, cmap='seismic', xlab=None, ylab=None, transparent_background=False, dpi=300, dump_source=False)[source]
Plot impulse response functions as a heatmap.
- Parameters:
plot_x – numpy array with shape (T,1); time points for which to plot the response. For example, if the plots contain 1000 points from 0s to 10s, plot_x could be generated as np.linspace(0, 10, 1000).
plot_y – numpy array with shape (T, N); response of each IRF at each time point.
irf_names – list of str; CDR IDs of IRFs in the same order as they appear in axis 1 of plot_y.
sort_names – bool; alphabetically sort IRF names.
outdir – str; output directory.
filename – str; filename.
irf_name_map – dict of str to str; map from CDR IRF IDs to more readable names to appear in legend. Any plotted IRF whose ID is not found in irf_name_map will be represented with the CDR IRF ID.
plot_x_inches – float; width of plot in inches.
plot_y_inches – float; height of plot in inches.
ylim – 2-element tuple or list; (lower_bound, upper_bound) to use for y axis. If None, automatically inferred.
cmap – str; name of matplotlib cmap object (determines colors of plotted IRF).
xlab – str or None; x-axis label. If None, no label.
ylab – str or None; y-axis label. If None, no label.
transparent_background – bool; use a transparent background. If False, uses a white background.
dpi – int; dots per inch.
dump_source – bool; Whether to dump the plot source array to a csv file.
- Returns:
None
- cdr.plot.plot_qq(theoretical, actual, actual_color='royalblue', expected_color='firebrick', outdir='.', filename='qq_plot.png', plot_x_inches=6, plot_y_inches=4, legend=True, xlab='Theoretical', ylab='Empirical', ticks=True, as_lines=False, transparent_background=False, dpi=300)[source]
Generate quantile-quantile plot.
- Parameters:
theoretical – numpy array with shape (T,); theoretical error quantiles.
actual – numpy array with shape (T,); empirical errors.
actual_color – str; color for actual values.
expected_color – str; color for expected values.
outdir – str; output directory.
filename – str; filename.
plot_x_inches – float; width of plot in inches.
plot_y_inches – float; height of plot in inches.
legend – bool; include a legend.
xlab – str or None; x-axis label. If None, no label.
ylab – str or None; y-axis label. If None, no label.
as_lines – bool; render QQ plot using lines. Otherwise, use points.
transparent_background – bool; use a transparent background. If False, uses a white background.
dpi – int; dots per inch.
- Returns:
None
- cdr.plot.plot_surface(x, y, z, lq=None, uq=None, density=None, bounds_as_surface=False, outdir='.', filename='surface.png', irf_name_map=None, plot_x_inches=6, plot_y_inches=4, xlim=None, ylim=None, zlim=None, plot_type='wireframe', cmap='coolwarm', xlab=None, ylab=None, zlab='Response', title=None, transparent_background=False, dpi=300, dump_source=False)[source]
Plot an IRF or interaction surface.
- Parameters:
x – numpy array with shape (M,N); x locations for each plot point, copied N times.
y – numpy array with shape (M,N); y locations for each plot point, copied M times.
z – numpy array with shape (M,N); z locations for each plot point.
lq – numpy array with shape (M,N), or None; lower bound of credible interval for each plot point. If None, no credible interval will be plotted.
uq – numpy array with shape (M,N), or None; upper bound of credible interval for each plot point. If None, no credible interval will be plotted.
bounds_as_surface – bool; whether to plot interval bounds using additional surfaces. If False, bounds are plotted with vertical error bars instead. Ignored if lq, uq are None.
outdir – str; output directory.
filename – str; filename.
irf_name_map – dict of str to str; map from CDR IRF IDs to more readable names to appear in legend. Any plotted IRF whose ID is not found in irf_name_map will be represented with the CDR IRF ID.
plot_x_inches – float; width of plot in inches.
plot_y_inches – float; height of plot in inches.
xlim – 2-element tuple or list or None; (lower_bound, upper_bound) to use for x axis. If None, automatically inferred.
ylim – 2-element tuple or list or None; (lower_bound, upper_bound) to use for y axis. If None, automatically inferred.
zlim – 2-element tuple or list or None; (lower_bound, upper_bound) to use for z axis. If None, automatically inferred.
plot_type – str; name of plot type to generate. One of ["contour", "surf", "trisurf"].
cmap – str; name of matplotlib cmap object (determines colors of plotted IRF).
legend – bool; include a legend.
xlab – str or None; x-axis label. If None, no label.
ylab – str or None; y-axis label. If None, no label.
zlab – str or None; z-axis label. If None, no label.
use_line_markers – bool; add markers to IRF lines.
transparent_background – bool; use a transparent background. If False, uses a white background.
dpi – int; dots per inch.
dump_source – bool; Whether to dump the plot source array to a csv file.
- Returns:
None
cdr.signif module
- cdr.signif.correlation_test(y, x1, x2, nested=False, verbose=True)[source]
Perform a parametric test of difference in correlation with observations between two prediction vectors, based on Steiger (1980).
- Parameters:
y – numpy vector; observation vector.
x1 – numpy vector; first prediction vector.
x2 – numpy vector; second prediction vector.
nested – bool; assume that the second model is nested within the first.
verbose – bool; report progress logs to standard error.
- Returns:
- cdr.signif.permutation_test(a, b, n_iter=10000, n_tails=2, mode='loss', agg='mean', nested=False, verbose=True)[source]
Perform a paired permutation test for significance.
- Parameters:
a – numpy array; first error/loss/prediction matrix, shape (n_item, n_model).
b – numpy array; second error/loss/prediction matrix, shape (n_item, n_model).
n_iter – int; number of resampling iterations.
n_tails – int; number of tails.
mode – str; one of ["mse", "loglik"], the type of error used (SEs are averaged while logliks are summed).
agg – str; aggregation function over ensemble components. E.g., 'mean', 'median', 'min', 'max'.
nested – bool; assume that the second model is nested within the first.
verbose – bool; report progress logs to standard error.
- Returns:
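For readers unfamiliar with paired permutation testing, the following is a generic sketch of the technique, not the library's implementation. It assumes two equal-length vectors of per-item losses, uses the difference in means as the test statistic, and is two-tailed; the function paired_permutation_test, its seed argument, and the add-one p-value estimate are assumptions made for the example. It does not cover the (n_item, n_model) matrices, mode, or agg options documented above.

```python
import random


def paired_permutation_test(a, b, n_iter=10000, seed=0):
    """Two-tailed paired permutation test on the difference in means (sketch)."""
    rng = random.Random(seed)
    n = len(a)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs) / n)
    hits = 0
    for _ in range(n_iter):
        # Swapping a[i] and b[i] flips the sign of the i-th paired difference.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(permuted) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_iter + 1)


losses_a = [0.0] * 30
losses_b = [1.0] * 30
p_different = paired_permutation_test(losses_a, losses_b, n_iter=1000)
p_identical = paired_permutation_test(losses_a, losses_a, n_iter=1000)
```

When the two loss vectors are identical, every resampled statistic ties with the observed one, so the estimated p-value is 1.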
cdr.synth module
- class cdr.synth.SyntheticModel(n_pred, irf_name, irf_params=None, coefs=None, fn=None, interactions=False, ranef_range=None, n_ranef_levels=None)[source]
Bases: object
A data structure representing a synthetic “true” model for empirical validation of CDR fits. Contains a randomly generated set of IRFs that can be used to convolve data, and provides methods for sampling data with particular structure and convolving it with the true IRFs in order to generate a response vector.
- Parameters:
n_pred – int; Number of predictors in the synthetic model.
irf_name – str; Name of IRF kernel to use. One of ['Exp', 'Normal', 'Gamma', 'ShiftedGamma'].
irf_params – dict or None; Dictionary of IRF parameters to use, with parameter names as keys and numeric arrays as values. Values must each have n_pred cells. If None, parameter values will be randomly sampled.
coefs – numpy array or None; Vector of coefficients to use, where len(coefs) == n_pred. If None, coefficients will be randomly sampled.
fn – str or None; Effect shape to use. One of ['quadratic', 'exp', 'logmod', 'linear']. If None, linear effects.
interactions – bool; Whether there are randomly sampled pairwise interactions (same bounds as those used for coefs).
ranef_range – float or None; Maximum magnitude of simulated random effects. If 0 or None, no random effects.
n_ranef_levels – int or None; Number of random effects levels. If 0 or None, no random effects.
- convolve(X, t_X, t_y, history_length=None, err_sd=None, allow_instantaneous=True, ranef_level=None, verbose=True)[source]
Convolve data using the model’s IRFs.
- Parameters:
X – numpy array; 2-D array of predictors.
t_X – numpy array; 1-D vector of predictor timestamps.
t_y – numpy array; 1-D vector of response timestamps.
history_length – int or None; Drop preceding events more than history_length steps into the past. If None, no history clipping.
err_sd – float or None; Standard deviation of Gaussian noise to inject into responses. If None, use the empirical standard deviation of the response vector.
allow_instantaneous – bool; Whether to compute responses when t==0.
ranef_level – str or None; Random effects level to use (or None to use population-level effect)
verbose – bool; Verbosity.
- Returns:
(2-D numpy array, 1-D numpy array); Matrix of convolved predictors, vector of responses
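To make the convolution step concrete, here is a minimal sketch for a single predictor and an exponential kernel. It is an illustration under stated assumptions rather than the library's implementation: the function convolve_exp and its rate parameter are hypothetical, timestamps in t_x are assumed to be sorted in ascending order, and noise injection, random effects, and multiple predictors are left out.

```python
import math


def convolve_exp(x, t_x, t_y, rate=1.0, history_length=None, allow_instantaneous=True):
    """Convolve one impulse series with the kernel exp(-rate * t) (sketch)."""
    responses = []
    for ty in t_y:
        # Impulses that precede (or coincide with) the response time, oldest first.
        past = [(xi, ty - ti) for xi, ti in zip(x, t_x)
                if ti < ty or (allow_instantaneous and ti == ty)]
        if history_length is not None:
            # Keep only the most recent history_length events.
            past = past[-history_length:] if history_length > 0 else []
        responses.append(sum(xi * math.exp(-rate * dt) for xi, dt in past))
    return responses


y = convolve_exp([1.0, 1.0], [0.0, 1.0], [1.0])
y_strict = convolve_exp([1.0, 1.0], [0.0, 1.0], [1.0], allow_instantaneous=False)
y_clipped = convolve_exp([1.0, 1.0], [0.0, 1.0], [1.0], history_length=1)
```

Each response is the sum of all earlier impulses weighted by the kernel evaluated at the elapsed time, which is the role the model's IRFs play in convolve.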
- convolve_v2(X, t_X, t_y, err_sd=None, allow_instantaneous=True, verbose=True)[source]
Convolve data using the model’s IRFs. Alternate memory-intensive implementation that is faster for small arrays but can exhaust resources for large ones.
- Parameters:
X – numpy array; 2-D array of predictors.
t_X – numpy array; 1-D vector of predictor timestamps.
t_y – numpy array; 1-D vector of response timestamps.
err_sd – float; Standard deviation of Gaussian noise to inject into responses.
allow_instantaneous – bool; Whether to compute responses when t==0.
verbose – bool; Verbosity.
- Returns:
(2-D numpy array, 1-D numpy array); Matrix of convolved predictors, vector of responses
- get_curves(n_time_units=None, n_time_points=None, ranef_level=None)[source]
Extract response curves as an array.
- Parameters:
n_time_units – float; Number of units of time over which to extract curves.
n_time_points – int; Number of samples to extract for each curve (resolution of curve)
ranef_level – str or None; Random effects level to use (or None to use population-level effect)
- Returns:
numpy array; 2-D numpy array with shape [T, K], where T is n_time_points and K is the number of predictors in the model.
- irf(x, coefs=False, ranef_level=None)[source]
Computes the values of the model’s IRFs elementwise over a vector of timepoints.
- Parameters:
x – numpy array; 1-D array with shape [N] containing timepoints at which to query the IRFs.
coefs – bool; Whether to rescale responses by coefficients
ranef_level – str or None; Random effects level to use (or None to use population-level effect)
- Returns:
numpy array; 2-D array with shape [N, K] containing values of the model’s K IRFs evaluated at the timepoints in x.
- plot_irf(n_time_units=None, n_time_points=None, dir='.', filename='synth_irf.png', plot_x_inches=6, plot_y_inches=4, cmap='gist_rainbow', legend=False, xlab=None, ylab=None, use_line_markers=False, transparent_background=False)[source]
Plot impulse response functions.
- Parameters:
n_time_units – float; number of time units to use for plotting.
n_time_points – int; number of points to use for plotting.
dir – str; output directory.
filename – str; filename.
plot_x_inches – float; width of plot in inches.
plot_y_inches – float; height of plot in inches.
cmap – str; name of matplotlib cmap object (determines colors of plotted IRF).
legend – bool; include a legend.
xlab – str or None; x-axis label. If None, no label.
ylab – str or None; y-axis label. If None, no label.
use_line_markers – bool; add markers to IRF lines.
transparent_background – bool; use a transparent background. If False, uses a white background.
- Returns:
None
- sample_data(m, n=None, X_interval=None, y_interval=None, rho=None, align_X_y=True)[source]
Samples synthetic predictors and time vectors.
- Parameters:
m – int; Number of predictors.
n – int; Number of response query points.
X_interval – str, float, list, tuple, or None; Predictor interval model. If None, predictor offsets are randomly sampled from an exponential distribution with parameter 1. If float, predictor offsets are evenly spaced with interval X_interval. If list or tuple, the first element is the name of a scipy distribution to use for sampling offsets, and all remaining elements are positional arguments to that distribution.
y_interval – str, float, list, tuple, or None; Response interval model. If None, response offsets are randomly sampled from an exponential distribution with parameter 1. If float, response offsets are evenly spaced with interval y_interval. If list or tuple, the first element is the name of a scipy distribution to use for sampling offsets, and all remaining elements are positional arguments to that distribution.
rho – float; Level of pairwise correlation between predictors.
align_X_y – bool; Whether predictors and responses are required to be sampled at the same points in time.
- Returns:
(2-D numpy array, 1-D numpy array, 1-D numpy array); Matrix of predictors, vector of predictor timestamps, vector of response timestamps
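The two simplest interval models described above (exponential offsets when the interval is None, even spacing when it is a float) can be sketched as follows. The function sample_timestamps and its seed argument are hypothetical, the scipy-distribution case is omitted, and starting the series one offset after time zero is an assumption.

```python
import random


def sample_timestamps(n, interval=None, seed=0):
    """Cumulative timestamps from a simple interval model (sketch)."""
    rng = random.Random(seed)
    t, out = 0.0, []
    for _ in range(n):
        # None: exponential offsets with rate 1; float: fixed spacing.
        step = rng.expovariate(1.0) if interval is None else float(interval)
        t += step
        out.append(t)
    return out


t_random = sample_timestamps(5)
t_even = sample_timestamps(4, interval=0.5)
```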
cdr.util module
- cdr.util.filter_models(names, filters=None, cdr_only=False)[source]
Return models contained in names that are permitted by filters, preserving order in which filters were matched. Filters can be ordinary strings, regular expression objects, or string representations of regular expressions. For a regex filter to be considered a match, the expression must entirely match the name. If filters is zero-length, returns names.
- Parameters:
names – list of str; pool of model names to filter.
filters – list of {str, SRE_Pattern} or None; filters to apply in order. If None, no additional filters.
cdr_only – bool; if True, only returns CDR models. If False, returns all models admitted by filters.
- Returns:
list of str; names in names that pass at least one filter, or all of names if no filters are applied.
- cdr.util.filter_names(names, filters)[source]
Return elements of names permitted by filters, preserving order in which filters were matched. Filters can be ordinary strings, regular expression objects, or string representations of regular expressions. For a regex filter to be considered a match, the expression must entirely match the name.
- Parameters:
names – list of str; pool of names to filter.
filters – list of {str, SRE_Pattern}; filters to apply in order
- Returns:
list of str; names in names that pass at least one filter
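The matching rule described above (filters applied in order, whole-name regex matches only) can be sketched with the standard re module. The function filter_names_sketch and the model names are hypothetical; treating plain strings as regular expressions and skipping names that were already matched are assumptions made for the example.

```python
import re


def filter_names_sketch(names, filters):
    """Names that fully match at least one filter, in filter order (sketch)."""
    out = []
    for f in filters:
        # Accept either a pattern string or an already compiled pattern.
        pattern = re.compile(f) if isinstance(f, str) else f
        for name in names:
            # fullmatch requires the expression to match the entire name.
            if pattern.fullmatch(name) and name not in out:
                out.append(name)
    return out


names = ["CDR_main", "CDR_ablated", "LME"]
kept = filter_names_sketch(names, ["LME", re.compile(r"CDR_.*")])
partial = filter_names_sketch(names, ["CDR"])
```

Because the filters drive the outer loop, the result follows filter order rather than the order of names, and a filter such as CDR admits nothing because it matches only part of each name.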
- cdr.util.get_random_permutation(n)[source]
Draw a random permutation of integers 0 to n. Used to shuffle arrays of length n. For example, a permutation and its inverse can be generated by calling p, p_inv = get_random_permutation(n). To randomly shuffle an n-dimensional vector x, call x[p]. To un-shuffle x after it has already been shuffled, call x[p_inv].
- Parameters:
n – maximum value
- Returns:
2-tuple of numpy arrays; the permutation and its inverse
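The relationship between a permutation and its inverse can be shown with plain Python lists. This is a sketch, not the library function: random_permutation_sketch and its seed argument are hypothetical, list comprehensions stand in for numpy fancy indexing, and the permutation is assumed to cover indices 0 through n - 1, as needed to shuffle a sequence of length n.

```python
import random


def random_permutation_sketch(n, seed=None):
    """Return a random permutation of range(n) and its inverse (sketch)."""
    rng = random.Random(seed)
    p = list(range(n))
    rng.shuffle(p)
    # p_inv[p[i]] = i, so indexing with p_inv undoes indexing with p.
    p_inv = [0] * n
    for i, j in enumerate(p):
        p_inv[j] = i
    return p, p_inv


x = ["a", "b", "c", "d", "e"]
p, p_inv = random_permutation_sketch(len(x), seed=0)
shuffled = [x[i] for i in p]             # analogous to x[p] in numpy
restored = [shuffled[i] for i in p_inv]  # analogous to x_shuffled[p_inv]
```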
- cdr.util.load_cdr(dir_path, suffix='')[source]
Convenience method for reconstructing a saved CDR object. First loads in metadata from m.obj, then uses that metadata to construct the computation graph. Then, if saved weights are found, these are loaded into the graph.
- Parameters:
dir_path – Path to directory containing the CDR checkpoint files.
suffix – str; file suffix.
- Returns:
The loaded CDR instance.
- cdr.util.mae(true, preds)[source]
Compute mean absolute error (MAE).
- Parameters:
true – True values
preds – Predicted values
- Returns:
float; MAE
- cdr.util.mse(true, preds)[source]
Compute mean squared error (MSE).
- Parameters:
true – True values
preds – Predicted values
- Returns:
float; MSE
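For reference, the two error measures above follow their standard definitions, sketched here in plain Python. The function names mae_sketch and mse_sketch are hypothetical, and the library versions presumably operate on numpy arrays rather than lists.

```python
def mae_sketch(true, preds):
    """Mean of absolute differences between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(true, preds)) / len(true)


def mse_sketch(true, preds):
    """Mean of squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(true, preds)) / len(true)


true = [1.0, 2.0, 3.0]
preds = [1.0, 2.0, 5.0]
mae_value = mae_sketch(true, preds)
mse_value = mse_sketch(true, preds)
```

With a single error of size 2 across three items, the MAE is 2/3 and the MSE is 4/3, which shows how squaring weights large errors more heavily.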
- cdr.util.names2ix(names, l, dtype=<class 'numpy.int32'>)[source]
Generate 1D numpy array of indices in l corresponding to names in names.
- Parameters:
names – list of str; names to look up in l
l – list of str; list of names from which to extract indices
dtype – numpy dtype object; return dtype
- Returns:
numpy array; indices of names in l
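The lookup can be illustrated with a short sketch that returns a plain list instead of a numpy array. The function names2ix_sketch and the column names are hypothetical, and raising an error for a name missing from l is an assumption about the intended behavior.

```python
def names2ix_sketch(names, l):
    """Position of each requested name within l (sketch)."""
    # list.index raises ValueError if a name is absent from l.
    return [l.index(name) for name in names]


columns = ["rate", "sound_power", "word_len"]
ix = names2ix_sketch(["word_len", "rate"], columns)
```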
- cdr.util.nested(model_name_1, model_name_2)[source]
Check whether two CDR models are nested with 1 degree of freedom.
- Parameters:
model_name_1 – str; name of first model
model_name_2 – str; name of second model
- Returns:
bool; True if models are nested with 1 degree of freedom, False otherwise
- cdr.util.pca(X, n_dim=None, dtype=<class 'numpy.float32'>)[source]
Perform principal components analysis on a data table.
- Parameters:
X – numpy or pandas array; the input data
n_dim – int or None; maximum number of principal components. If None, all components are retained.
dtype – numpy dtype; return dtype
- Returns:
5-tuple of numpy arrays; transformed data, eigenvectors, eigenvalues, input means, and input standard deviations
- cdr.util.percent_variance_explained(true, preds)[source]
Compute percent variance explained.
- Parameters:
true – True values
preds – Predicted values
- Returns:
float; percent variance explained