{xrst_begin csv.simulate}
{xrst_spell
   cv
   meas
   pini
   sincidence
   std
}

Simulate A Cascade Data Set
###########################

Prototype
*********
{xrst_literal
   # BEGIN_SIMULATE
   # END_SIMULATE
}

sim_dir
*******
This string is the directory name where the csv files are located.

Example
*******
:ref:`csv.simulate_xam-name`

Input Files
***********

option_sim.csv
==============
This csv file has two columns,
one called ``name`` and the other called ``value``.
The rows of this table are documented below by the name column.
If an option name does not appear, the corresponding
default value is used for the option.
The final value for each of the options is reported in the file
:ref:`csv.simulate@Output Files@option_sim_out.csv` .
Because each option has a default value,
new options can be added without invalidating previous
option_sim.csv files.

absolute_covariates
-------------------
This is a space separated list of the names of the absolute covariates.
The reference value for an absolute covariate is always zero.
(The reference value for a relative covariate is its average
for the location that is being fit.)
The default value for *absolute_covariates* is the empty string; i.e.,
there are no absolute covariates.

absolute_tolerance
------------------
This float is the absolute error tolerance for the integrator.
It determines the accuracy of
:ref:`csv.simulate@Output Files@data_sim.csv@meas_mean`
for each :ref:`integrand ` that requires the ODE; e.g.,
prevalence requires the ODE and Sincidence does not.
The default value for this option is 1e-5.

float_precision
---------------
This integer is the number of decimal digits of precision to
include for float values in the output csv files.
The default value for this option is 4.


integrand_step_size
-------------------
This float is the step size in age and time used to approximate
integrand averages from age_lower to age_upper
and time_lower to time_upper (columns of simulate.csv).
It must be greater than zero.
The default value for this option is 5.0.

random_depend_sex
-----------------
If :ref:`csv.simulate@Input Files@option_sim.csv@new_random_effects`
is false, this option is not used.
Otherwise, if this boolean is true, the random effects depend on sex.
If it is false, for each *node_name* and *rate*,
the random effects for ``female`` and ``male`` are equal; see
:ref:`csv.simulate@random_effect.csv` .
The default value for this option is false.

new_random_effects
------------------
If this boolean is true, a new set of random effects is generated and
:ref:`csv.simulate@random_effect.csv` is an output file.
Otherwise, random_effect.csv is an input file.
The default value for this boolean is true.

random_seed
-----------
This integer is used to seed the random number generator.
The default value for this option is
{xrst_code py}
   random_seed = int( time.time() )
{xrst_code}

std_random_effects_rate
-----------------------
If :ref:`csv.simulate@Input Files@option_sim.csv@new_random_effects`
is false, this option is not used.
Otherwise, this float is the standard deviation of the random effects
for the corresponding *rate* where *rate* is
pini, iota, rho, or chi.
The effects are in log of rate space, so this standard deviation
is also in log of rate space.
Only the rates that appear in
:ref:`csv.simulate@Input Files@no_effect_rate.csv`
are affected (the random effects for the other rates multiply zero).
The default value for this option is 0.0; i.e.,
there are no random effects for the corresponding rate.

trace
-----
If this boolean is true, a trace is printed during the simulation.
This shows that the simulation is making progress and is useful
when there is a lot of data to simulate.
The default value for this boolean is true.

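
The default handling described for option_sim.csv can be sketched as follows.
This is a minimal, hypothetical helper, not the at_cascade implementation:
the names ``option_sim_defaults`` and ``read_option_sim`` are illustrative,
values are kept as strings (as they appear in the csv file),
and rejecting unknown option names is an assumption of this sketch.

```python
import csv
import io
import time

# Hypothetical sketch (not the at_cascade implementation) of how option_sim.csv
# overrides defaults. Values stay as strings, as they appear in the csv file.
def option_sim_defaults() -> dict:
    """Default value for each option, as documented above."""
    defaults = {
        'absolute_covariates': '',
        'absolute_tolerance':  '1e-5',
        'float_precision':     '4',
        'integrand_step_size': '5.0',
        'random_depend_sex':   'false',
        'new_random_effects':  'true',
        'random_seed':         str(int(time.time())),
        'trace':               'true',
    }
    # one std_random_effects_rate option for each rate
    for rate in ('pini', 'iota', 'rho', 'chi'):
        defaults['std_random_effects_' + rate] = '0.0'
    return defaults

def read_option_sim(csv_text: str) -> dict:
    """Overlay the name,value rows of option_sim.csv on the defaults."""
    options = option_sim_defaults()
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row['name']
        # assumption: this sketch rejects names that are not documented options
        if name not in options:
            raise ValueError(f'option_sim.csv: unknown option {name}')
        options[name] = row['value']
    return options

options = read_option_sim('name,value\nfloat_precision,6\ntrace,false\n')
print(options['float_precision'], options['trace'], options['integrand_step_size'])
# prints: 6 false 5.0
```

The resulting dictionary plays the role of option_sim_out.csv:
every option is present, with the default used where the input file was silent.
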

-----------------------------------------------------------------------------

node.csv
========
This csv file defines the node tree.
It has the columns documented below.

node_name
---------
This string is a name describing the node in a way
that is easy for a human to remember.
It must be unique for each row.

parent_name
-----------
This string is the node name corresponding to the parent of this node.
The root node of the tree has an empty entry for this column.
If a node is a parent, it must have at least two children.
This avoids fitting the same location twice as one goes from
parent to child nodes.

-----------------------------------------------------------------------------

covariate.csv
=============
This csv file specifies the value of omega and the covariates.
For each node_name it has a rectangular grid in age and time.
In addition, the rectangular grid is the same for all nodes.

node_name
---------
This string identifies the node, in node.csv, corresponding to this row.

sex
---
This identifies which sex this row corresponds to.
The sex values ``female`` and ``male`` must appear in this table.
The sex value ``both`` does not appear.

age
---
This float is the age, in years, corresponding to this row.

time
----
This float is the time, in years, corresponding to this row.

omega
-----
This float is the value of omega (other cause mortality) for this row.
Often other cause mortality is approximated by all cause mortality.
Omega is a rate that is assumed to be known ahead of time and hence it is
specified together with the covariates.

covariate_name
--------------
Except for node_name, sex, age, time, and omega,
the columns of this file are covariates.
The header row specifies the *covariate_name* for a column
and the other rows are floats containing the corresponding covariate value.
The option_sim.csv
:ref:`csv.simulate@Input Files@option_sim.csv@absolute_covariates`
option specifies which covariates are absolute.
All the others are :ref:`relative covariates`.

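
The covariate values on this grid are later interpolated to each data point
(see the covariate_name column of data_sim.csv under Output Files):
bilinear interpolation at the mid point of the data point's age and time intervals,
extended as constant outside the grid.
A minimal sketch follows; the function ``bilinear_constant_extension``
and its arguments are hypothetical, it assumes the age and time grids
are sorted and strictly increasing, and it is not the at_cascade implementation.

```python
import bisect

# Hypothetical sketch: bilinear interpolation on a rectangular age-time grid,
# extended as constant in age (time) outside the age (time) range.
def bilinear_constant_extension(age_grid, time_grid, value, age, time):
    """
    age_grid, time_grid: sorted, strictly increasing lists of grid points
    value[i][j]:         covariate value at (age_grid[i], time_grid[j])
    """
    def bracket(grid, x):
        # return (lower index, upper index, weight on the upper index)
        if len(grid) == 1 or x <= grid[0]:
            return 0, 0, 0.0
        if x >= grid[-1]:
            last = len(grid) - 1
            return last, last, 0.0
        k = bisect.bisect_right(grid, x) - 1  # grid[k] <= x < grid[k+1]
        return k, k + 1, (x - grid[k]) / (grid[k + 1] - grid[k])

    i0, i1, wa = bracket(age_grid, age)
    j0, j1, wt = bracket(time_grid, time)
    return (
        (1.0 - wa) * (1.0 - wt) * value[i0][j0]
        + wa * (1.0 - wt) * value[i1][j0]
        + (1.0 - wa) * wt * value[i0][j1]
        + wa * wt * value[i1][j1]
    )

# usage: evaluate at the mid point of a data point's age and time intervals
age_grid  = [0.0, 50.0, 100.0]
time_grid = [1990.0, 2010.0]
value     = [[0.0, 2.0], [10.0, 12.0], [20.0, 22.0]]
age_lower, age_upper, time_lower, time_upper = 20.0, 30.0, 1995.0, 2005.0
mid_age  = (age_lower + age_upper) / 2.0
mid_time = (time_lower + time_upper) / 2.0
print(bilinear_constant_extension(age_grid, time_grid, value, mid_age, mid_time))
# prints: 6.0
```

Clamping each coordinate to its grid range before interpolating is what makes
the extension constant in age (time) outside the age (time) range.
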

Note that omega and sex are not referred to as covariates for this simulation.

-----------------------------------------------------------------------------

no_effect_rate.csv
==================
This csv file specifies the grid points at which each rate
is modeled during a simulation.
For each rate_name it has a
:ref:`csv.module@Notation@Rectangular Grid` in age and time.
These are no-effect rates; i.e., the rates without
the random and covariate effects.
Covariate multipliers that are constrained to zero during the fitting
can be used to get variation between nodes in the
no-effect rates corresponding to the fit.

rate_name
---------
This string is ``iota``, ``rho``, ``chi``, or ``pini`` and specifies the rate.
If one of these rates does not appear, it is modeled as always zero.
Other cause mortality ``omega`` is specified in
:ref:`csv.simulate@Input Files@covariate.csv` .

age
---
This float is the age, in years, corresponding to this row.

time
----
This float is the time, in years, corresponding to this row.

rate_truth
----------
This float is the no-effect rate value for all the nodes.
It is used to simulate the data.
As mentioned above, knocking out covariate multipliers can be used
to get variation in the no-effect rates that correspond to the fit.
If *rate_name* is ``pini``, *rate_truth* should be constant
with respect to *age* (because it is prevalence at age zero).

-----------------------------------------------------------------------------

multiplier_sim.csv
==================
This csv file provides information about the covariate multipliers.
Each row of this file, except the header row, corresponds to a
different multiplier. The multipliers are constant in age and time.

multiplier_id
-------------
This is an :ref:`csv.module@Notation@Index Column` for multiplier_sim.csv.

rate_name
---------
This string is ``iota``, ``rho``, ``chi``, or ``pini`` and specifies
which rate this covariate multiplier is affecting.

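
Taken together, the no-effect rate, the random effect, and the covariate
multipliers determine the rate used to simulate data.
The sketch below assumes the dismod_at form of an adjusted rate:
the no-effect rate times the exponential of the random effect plus,
for each multiplier, the multiplier times its covariate value minus the
reference value (see absolute_covariates above for the reference values).
It also assumes the ``sex`` values listed under covariate_or_sex
are used directly (reference 0.0).
The function ``adjusted_rate``, its arguments, and the covariate name ``income``
are hypothetical, and how at_cascade combines random effects across levels
of the node tree is not shown.

```python
import math

# Sex values used when covariate_or_sex is 'sex' (as listed on this page).
SEX_COVARIATE_VALUE = {'female': -0.5, 'both': 0.0, 'male': +0.5}

# Hypothetical sketch, assuming the dismod_at form of an adjusted rate:
#   rate = no_effect_rate * exp( random_effect + sum_j alpha_j * (x_j - ref_j) )
def adjusted_rate(no_effect_rate, random_effect, sex, covariate, reference, multiplier):
    """
    no_effect_rate: rate_truth interpolated to this age and time
    random_effect:  random effect (log of rate space) that applies here
    covariate:      dict covariate_name -> covariate value
    reference:      dict covariate_name -> reference value
    multiplier:     list of (covariate_or_sex, multiplier_truth) for this rate
    """
    effect = random_effect
    for covariate_or_sex, alpha in multiplier:
        if covariate_or_sex == 'sex':
            # assumption: the sex value is used directly (reference 0.0)
            effect += alpha * SEX_COVARIATE_VALUE[sex]
        else:
            name = covariate_or_sex
            effect += alpha * (covariate[name] - reference[name])
    return no_effect_rate * math.exp(effect)

rate = adjusted_rate(
    no_effect_rate=0.01, random_effect=0.0, sex='male',
    covariate={'income': 2.0}, reference={'income': 1.0},
    multiplier=[('sex', 0.2), ('income', 0.5)],
)
# effect = 0.2 * 0.5 + 0.5 * (2.0 - 1.0) = 0.6, so rate = 0.01 * exp(0.6)
print(rate)
```

Because the effects enter through an exponential, a rate whose no-effect value
is zero stays zero no matter what its random effects or multipliers are.
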

covariate_or_sex
----------------
If this is ``sex``, it specifies that this multiplier multiplies
the sex values where
{xrst_code py}"""
sex_covariate_value = {
   'female' : -0.5,
   'both'   :  0.0,
   'male'   : +0.5
}
"""{xrst_code}
i.e., female = -0.5, male = +0.5, and both = 0.0.
Otherwise, this is one of the covariate names in the covariate.csv file
and specifies which covariate value is being multiplied.

multiplier_truth
----------------
This is the value of the covariate multiplier used to simulate the data.

-----------------------------------------------------------------------------

simulate.csv
============
This csv file specifies the simulated data set
with each row corresponding to one data point.

simulate_id
-----------
This is an :ref:`csv.module@Notation@Index Column` for simulate.csv.

integrand_name
--------------
This string is a dismod_at integrand; e.g. ``Sincidence``.

node_name
---------
This string identifies the node corresponding to this data point.

sex
---
This string is the sex for this data point.

age_lower
---------
This float is the lower age limit for this data row.

age_upper
---------
This float is the upper age limit for this data row.

time_lower
----------
This float is the lower time limit for this data row.

time_upper
----------
This float is the upper time limit for this data row.

meas_std_cv
-----------
This float is the coefficient of variation for the measurement noise
for this data row; see
:ref:`csv.simulate@Output Files@data_sim.csv@meas_std` .

meas_std_min
------------
This float is the minimum value for the standard deviation
of the measurement noise for this data row; see
:ref:`csv.simulate@Output Files@data_sim.csv@meas_std` .

------------------------------------------------------------------------------

random_effect.csv
*****************
This file reports the random effect for each node, rate, and sex.
If :ref:`csv.simulate@Input Files@option_sim.csv@new_random_effects`
is true (false), this is an output (input) file.

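
When *new_random_effects* is true, one way to produce effects that satisfy the
Discussion below (the children of each parent sum to zero and the root is zero)
is to draw an independent normal for each child of a parent and subtract their mean.
This is an assumed method, not necessarily the one at_cascade uses;
centering reduces the variance of each effect by the factor (n-1)/n
for a parent with n children.
The function ``draw_random_effects`` and its arguments are hypothetical.

```python
import random

# Hypothetical sketch of drawing a new set of random effects (not necessarily
# the at_cascade method): for each parent, draw one normal per child and
# subtract the mean, so the children of each parent sum to zero.
def draw_random_effects(parent_of, rate_std, random_depend_sex, seed):
    """
    parent_of:         dict node_name -> parent_name ('' for the root)
    rate_std:          dict rate_name -> std_random_effects_rate
    random_depend_sex: if False, female and male get the same effect
    returns:           dict (node_name, rate_name, sex) -> random effect
    """
    rng = random.Random(seed)
    children = {}
    for node, parent in parent_of.items():
        children.setdefault(parent, []).append(node)
    if random_depend_sex:
        sex_groups = [('female',), ('male',)]
    else:
        sex_groups = [('female', 'male')]
    effect = {}
    for rate, std in rate_std.items():
        # the root node has no parent, so its random effects are zero
        for root in children.get('', []):
            for sex in ('female', 'male'):
                effect[(root, rate, sex)] = 0.0
        for parent, child_list in children.items():
            if parent == '':
                continue
            for group in sex_groups:
                draw = [rng.gauss(0.0, std) for _ in child_list]
                mean = sum(draw) / len(draw)
                for child, value in zip(child_list, draw):
                    for sex in group:
                        effect[(child, rate, sex)] = value - mean
    return effect

parent_of = {'world': '', 'a': 'world', 'b': 'world', 'a1': 'a', 'a2': 'a'}
effect = draw_random_effects(parent_of, {'iota': 0.5}, False, seed=123)
```

With *random_depend_sex* false, one draw is shared by both sexes,
which is why the female and male values coincide in that case.
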
Only the rate names that appear in
:ref:`csv.simulate@Input Files@no_effect_rate.csv@rate_name`
are included in random_effect.csv .
(Random effects for rates that are not in no_effect_rate.csv
would have no effect.)

node_name
=========
This string identifies the row in
:ref:`csv.simulate@Input Files@node.csv`
that this row corresponds to.
All of the nodes in the node table are present in this file.

rate_name
=========
This string is one of the
:ref:`csv.simulate@Input Files@no_effect_rate.csv@rate_name`
values in the no_effect_rate table.
All of the rates in the no_effect_rate table are present in this file.

sex
===
This identifies which sex the random effect corresponds to.
The sex values ``female`` and ``male`` will appear
and ``both`` will not appear.

random_effect
=============
This float value is the random effect for the specified
node, rate, and sex.
If new_random_effects is true and
:ref:`csv.simulate@Input Files@option_sim.csv@random_depend_sex`
is false, the value in this column does not depend on the
value in the sex column.

Discussion
==========

1. For a given parent node, rate, and sex,
   the sum of the random effects with respect to the child nodes is zero.
2. All the random effects for the root node are set to zero
   (the root node does not have a parent node).

-----------------------------------------------------------------------------

Output Files
************

option_sim_out.csv
==================
This is a copy of :ref:`csv.simulate@Input Files@option_sim.csv`
with the defaults filled in for missing values.

data_sim.csv
============
This contains the simulated data.
It is created during a simulate command
and has the following columns:

simulate_id
-----------
This integer identifies the row in simulate.csv
corresponding to this row in data_sim.csv.
This is an :ref:`csv.module@Notation@Index Column`
for simulate.csv and data_sim.csv.

meas_mean
---------
This float is the mean value for the measurement.
This is the model value without any measurement noise.
It corresponds to the simulation value for all the model variables
and covariates.
We refer to this as the *true value for the average integrand*
even when the fitting model is not correctly specified; i.e.,
when the set of model variables or covariates in
:ref:`csv.simulate-name` is different from the set in
:ref:`csv.fit-name` .

meas_std
--------
This float is the measurement standard deviation for the simulated
data point. This standard deviation is before censoring and is given by

| *meas_std* = max ( *meas_std_min* , *meas_std_cv* * *meas_mean* )

where
:ref:`csv.simulate@Input Files@simulate.csv@meas_std_min`
is the minimum measurement standard deviation, and
:ref:`csv.simulate@Input Files@simulate.csv@meas_std_cv`
is the coefficient of variation for the measurement noise.

meas_value
----------
This float is the simulated measured value.
The data are generated using a normal distribution with mean
*meas_mean* and standard deviation *meas_std* .
If the resulting measurement value would be less than zero,
the value zero is used; i.e.,
a censored normal is used to simulate the data.

covariate_name
--------------
For each
:ref:`csv.simulate@Input Files@covariate.csv@covariate_name`
there is a column with this name in data_sim.csv.
The values in these columns are floats corresponding
to the covariate value at the mid point of the age and time intervals
for this data point.
This value is obtained using bilinear interpolation
of the covariate values in covariate.csv.
The interpolation is extended as constant in age (time)
for points outside the age range (time range) in the covariate.csv file.

{xrst_end csv.simulate}