-------------------------------------------- lines 14-1056 of file: at_cascade/csv/fit.py --------------------------------------------
{xrst_begin csv.fit}
{xrst_spell
    avg
    avgint
    bnd
    cen
    const
    cov
    cpus
    cv
    dage
    dtime
    eigen
    ipopt
    iter
    meas
    mtexcess
    mtother
    mtwith
    mul
    num
    pini
    pos
    relrisk
    sincidence
    sqlite
    std
    underbars
    rcond
}

Fit a CSV Specified Cascade
###########################

Prototype
*********
{xrst_literal
    # BEGIN_FIT
    # END_FIT
}

Example
*******
:ref:`csv.break_fit_pred-name` .

fit_dir
*******
This string is the directory name where the csv files are located.

max_node_depth
**************
This is the number of generations below the root node that are included;
see :ref:`job_descendant@Node Depth Versus Job Depth`
and note that sex is the :ref:`option_all_table@split_covariate_name` .
If max_node_depth is zero, only the root node will be included.
If max_node_depth is None,
the root node and all its descendants are included.

Input Files
***********

option_fit.csv
==============
This csv file has two columns,
one called ``name`` and the other called ``value``.
The rows of this table are documented below by the name column.
If an option name does not appear, or the corresponding value is empty,
the default value is used for the option.
The final value for each of the options is reported in the file
:ref:`csv.fit@Output Files@option_fit_out.csv` .
Because each option has a default value,
new options are added in such a way that
previous option_fit.csv files are still valid.

absolute_covariates
-------------------
This is a space separated list of the names of the absolute covariates.
The reference value for an absolute covariate is always zero.
(The reference value for a relative covariate is its average
for the location that is being fit.)
The default value for *absolute_covariates* is the empty string; i.e.,
there are no absolute covariates.
The covariate named ``one`` is automatically created,
is always absolute, and should not be in this list.
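
The two-column layout of option_fit.csv described above can be sketched as
follows. This is only an illustration: the option names come from this
section, while the values, the covariate name ``bmi`` , and the helper
``write_option_fit`` are hypothetical and not part of at_cascade.

```python
import csv
import io

# Hypothetical option values; the option names are documented in this
# section, but these values and the covariate name "bmi" are illustrative.
options = {
    "absolute_covariates": "bmi",
    "max_fit": "250",
    "refit_split": "true",
}


def write_option_fit(file_obj, options):
    """Write a two-column (name, value) csv table to file_obj."""
    writer = csv.DictWriter(
        file_obj, fieldnames=["name", "value"], lineterminator="\n"
    )
    writer.writeheader()
    for name, value in options.items():
        writer.writerow({"name": name, "value": value})


# Write to an in-memory buffer; a real run would instead open
# option_fit.csv in fit_dir for writing.
buffer = io.StringIO()
write_option_fit(buffer, options)
option_fit_text = buffer.getvalue()
print(option_fit_text)
```

Options that are left out of the file (or have an empty value) fall back to
the defaults documented below.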

age_avg_split
-------------
This string contains a space separated list of float values
(there is one or more spaces between each float value).
Each float value is an age at which to split the integration of both the
ODE and the average of an integrand over an interval.
The default for this value is the empty string; i.e.,
no extra age splitting over the uniformly spaced grid specified by
:ref:`csv.fit@Input Files@option_fit.csv@ode_step_size`.

asymptotic_rcond_lower
----------------------
This float is a lower bound for an approximate reciprocal condition number
of the Hessian of the fixed effects objective.
This Hessian is used as an approximation for the information matrix when
using the ``asymptotic`` or ``censor_asymptotic``
:ref:`csv.fit@Input Files@option_fit.csv@sample_method` .
This option must be between zero and one and its default value is zero.
If the approximate reciprocal condition number is less than
*asymptotic_rcond_lower*, the asymptotic sample method will fail.

balance_sex
-----------
This is a boolean option.
The subsample of the data with size
:ref:`csv.fit@Input Files@option_fit.csv@max_fit`
always attempts to balance the child nodes; i.e.,
get an equal number of data values for each child of the node
currently being fit.
If *balance_sex* is true, the selection will also try to balance
the sex covariate values; i.e.,
get an equal amount of male and female data for each child node.

bound_random
------------
This float option specifies a bound on the random effects.
Sometimes the initial fixed effects are very far from truth and
the random effects try to compensate with large values.
This bound can stabilize the optimization in this case.
It is the intention that this bound not be active at the final value
for the fixed effects.
The default value for this option is infinity; i.e., no bound.

child_prior_dage
----------------
This option is true or false.
If it is false, no dage priors are created for the child jobs.
The default value for this option is true.
See the :ref:`create_shift_db@Problem` for a discussion of
why you may want to use this option.

child_prior_dtime
-----------------
This option is true or false.
If it is false, no dtime priors are created for the child jobs.
The default value for this option is true.
See the :ref:`create_shift_db@Problem` for a discussion of
why you may want to use this option.

child_prior_std_factor
----------------------
This factor multiplies the parent fit posterior standard deviation
for the value priors during a child fit
(except for the covariate multipliers).
If it is greater (less) than one, the child prior standard deviations
are larger (smaller) than indicated by the posterior corresponding
to the parent fit.
The default value for this option is 2.0.

child_prior_std_factor_mulcov
-----------------------------
This factor multiplies the parent fit posterior standard deviation
for the value priors for the covariate multipliers.
The default value for this option is *child_prior_std_factor* .

compress_interval
-----------------
This string contains two float values separated by one or more spaces.
The first (second) float value is called *age_size* ( *time_size* ).
By default, both *age_size* and *time_size* are 100.

#.  If for a :ref:`csv.fit@Input Files@data_in.csv` row,
    *age_upper* - *age_lower* <= *age_size* ,
    the age average for that data is approximated by its value at the
    midpoint age ( *age_upper* + *age_lower* ) / 2.
#.  If for a data_in.csv row,
    *time_upper* - *time_lower* <= *time_size* ,
    the time average for that data is approximated by its value at the
    midpoint time ( *time_upper* + *time_lower* ) / 2.

covariate_reference
-------------------
This string is either ``data_in.csv`` or ``covariate.csv`` .
If it is ``data_in.csv`` the reference value for each
(sex, node, covariate) is the average of the covariate
corresponding to the data that is fit for that (sex, node) .
If it is ``covariate.csv`` the reference value for each
(sex, node, covariate) is the average of the values in covariate.csv
that are for that sex, node, and covariate.
The default value for this option is ``data_in.csv`` .
See :ref:`csv.shock_cov@covariate_reference` in the csv.shock_cov example
for an example use of this option.

freeze_type
-----------
This option specifies the type of covariate multiplier freeze that is done.
It is either ``mean`` or ``posterior`` and its default is ``mean`` .
If :ref:`csv.fit@Input Files@option_fit.csv@refit_split` is false,
the freeze fit is the only fit at the root level.
If *refit_split* is true,
the freeze fit is the second fit at the root level; i.e.,
the fit directly after the sex split.
Note that in general the cascade can freeze the covariate multipliers
at any level; see :ref:`option_all_table@freeze_type`
in the option_all table.

mean
....
If the freeze_type is ``mean`` ,
the mean (optimal value) for the covariate multipliers,
determined by the freeze fit,
is used as the lower and upper limit for fits that are
descendants of the freeze fit.
Note that if the lower and upper limits are equal,
the corresponding model variable is treated as if it has no uncertainty.

posterior
.........
If the freeze_type is ``posterior`` ,
the posterior distribution for the covariate multipliers,
determined by the freeze fit,
is used as the prior for all the descendants of the freeze fit.
This enables one to account for the uncertainty of
covariate multiplier values.

hold_out_integrand
------------------
This string contains a space separated list of integrand names.
These integrands are held out from all the fits except for the
:ref:`no_ode_fit-name` .
The no_ode_fit is used to initialize the rates.
You can use this option to hold out direct measurements of the rates
that are only intended to help with the initialization
(are not real data).
The following is a list of the rates and the corresponding integrand
that is a direct measurement of the rate:

.. csv-table::
    :widths: auto
    :header-rows: 1

    Rate,Integrand
    iota,Sincidence
    rho,remission
    chi,mtexcess

The default value for *hold_out_integrand* is the empty string; i.e.,
all of the data is real data and is included in the fits.

max_abs_effect
--------------
This float option specifies an extra bound on the
absolute value of the covariate multipliers,
except for the measurement noise multipliers.
To be specific, the bound on the covariate multiplier is as large as
possible under the condition

    | *mul_bnd* * ( *cov_value* - *cov_ref* ) | <= *max_abs_effect*

where *mul_bnd* is the non-negative covariate multiplier bound,
*cov_value* is a data table value of the covariate,
and *cov_ref* is the reference value for the covariate.
It is an extra bound because it is in addition to the priors for a
covariate multiplier.
The default value for this option is 2.

max_fit
-------
This integer is the maximum number of data values to fit per integrand.
If for a particular fit an integrand has more than this number of
data values, a subsample of this size is randomly selected.
There is an exception to this rule:
the three fits for the root node
(corresponding to sex equal to female, both and male)
use twice this number of values per integrand.
This is because the sex covariate multiplier is frozen after the both fit
and the other covariate multipliers are frozen after
the female and male fits.
The default value for *max_fit* is 250.

max_fit_parent
--------------
If this integer is greater than or equal to zero,
*max_fit* only applies to the child data for a fit,
and *max_fit_parent* is the maximum number of data values for the parent.
The default value for *max_fit_parent* is minus one,
in which case *max_fit* applies to all the data for a fit.
Note that data corresponding to the parent node will not be used
when fitting any of its descendants.
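
The per-integrand subsampling that *max_fit* describes can be sketched as
follows. This is a simplified illustration, not the at_cascade
implementation: the function ``subsample_per_integrand`` and the example
rows are hypothetical, and the sketch ignores the balancing over child nodes
and sex, the doubling for the root node fits, and *max_fit_parent* .

```python
import random
from collections import defaultdict


def subsample_per_integrand(data_rows, max_fit, seed):
    """Return at most max_fit rows per integrand, chosen at random."""
    rng = random.Random(seed)
    # Group the rows by integrand name
    by_integrand = defaultdict(list)
    for row in data_rows:
        by_integrand[row["integrand_name"]].append(row)
    selected = []
    for rows in by_integrand.values():
        # Only integrands with more than max_fit values are subsampled
        if len(rows) > max_fit:
            rows = rng.sample(rows, max_fit)
        selected.extend(rows)
    return selected


# Hypothetical data: 10 Sincidence rows and 3 prevalence rows
data_rows = [
    {"data_id": i, "integrand_name": "Sincidence"} for i in range(10)
] + [
    {"data_id": 10 + i, "integrand_name": "prevalence"} for i in range(3)
]
subset = subsample_per_integrand(data_rows, max_fit=4, seed=0)
```

With these inputs, the Sincidence rows are cut down to four while all three
prevalence rows are kept.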

max_num_iter_fixed
------------------
This integer is the maximum number of Ipopt iterations to try before
giving up on fitting the fixed effects.
The default value for *max_num_iter_fixed* is 100.

max_number_cpu
--------------
This integer is the maximum number of cpus (processes) to use.
It must be greater than zero.
If it is one, the jobs are run sequentially,
more output is printed to the screen,
and the program can be cleanly stopped with a control-C.
The default value for this option is
{xrst_code py}
max_number_cpu = max(1, multiprocessing.cpu_count() - 1)
{xrst_code}

minimum_meas_cv
---------------
This float must be non-negative (greater than or equal to zero).
It specifies a lower bound on the standard deviation for each
measured data value as a fraction of the measurement value.
The default value for *minimum_meas_cv* is zero.

no_ode_ignore
-------------
This is a space separated list of rate and integrand names.
It specifies which integrands are ignored during a
:ref:`no_ode_fit-name` .
The priors for the following variables will not be changed by no_ode_fit:

#.  The rate names in *no_ode_ignore* .
#.  The covariate multipliers that affect the rates in *no_ode_ignore*.
#.  The covariate multipliers that affect measurement values for
    the integrands in *no_ode_ignore* .

all
...
In the special case where *no_ode_ignore* is ``all`` ,
the no_ode fit is not run and none of the priors are changed before the
:ref:`glossary@root_node` fit.

no_ode_fit
----------
If this is true (false) a :ref:`no_ode_fit-name` is (is not) used
to get better values for the fixed effects prior means.
The default value for *no_ode_fit* is true.

number_sample
-------------
This is the number of independent samples of the posterior distribution
for the fitted variables to generate (for each fit).

#.  This sampled posterior is used to create priors for the children
    of the node being fit.
#.  When splitting, the samples are used to create priors for the same node
    at the new split covariate values.
#.  These samples are also used by :ref:`csv.predict-name`
    to create posterior predictions for any function of the
    fitted variables.

The default value for this option is 20.
(You can get 1000 MCMC samples by just repeating
each of the 20 independent samples 50 times.)

ode_method
----------
The default for *ode_method* is ``iota_pos_rho_zero`` (see below).

no_ode
......
The *ode_method* value does not matter for the following integrands:
``Sincidence`` ,
``remission`` ,
``mtexcess`` ,
``mtother`` ,
``mtwith`` ,
``relrisk`` ,
``mulcov_`` *mulcov_id* .
If all of your integrands are in the set above,
you can use ``no_ode`` as the *ode_method* and avoid having to worry about
constraining certain rates to be positive or zero.

2DO
,,,
This ode_method does not currently work in the context of csv.fit because
csv.fit automatically requests the prevalence integrand
for predicting values of pini.
This should either be fixed or no_ode should be removed from the
possible ode_method values.

trapezoidal
...........
If *ode_method* is ``trapezoidal`` ,
a trapezoidal method is used to approximate the ODE solution.
Like ``no_ode``, you do not have to worry about constraining
certain rates to be positive or zero when using the trapezoidal method.

iota_zero_rho_zero
..................
If *ode_method* is ``iota_zero_rho_zero`` ,
the smoothing for *iota* and *rho* must always have
lower and upper limit zero.
In this case an eigen vector method is used to approximate the ODE solution.

iota_pos_rho_zero
.................
If *ode_method* is ``iota_pos_rho_zero`` ,
the smoothing for *iota* must always have lower limit greater than zero
and for *rho* lower and upper limit zero.
In this case an eigen vector method is used to approximate the ODE solution.

iota_zero_rho_pos
.................
If *ode_method* is ``iota_zero_rho_pos`` ,
the smoothing for *rho* must always have lower limit greater than zero
and for *iota* lower and upper limit zero.
In this case an eigen vector method is used to approximate the ODE solution.

iota_pos_rho_pos
................
If *ode_method* is ``iota_pos_rho_pos`` ,
the smoothing for *iota* and *rho* must always have
lower limit greater than zero.
In this case an eigen vector method is used to approximate the ODE solution.

ode_step_size
-------------
This float must be positive (greater than zero).
It specifies the step size in age and time to use when solving the ODE.
It is also used as the step size for approximating average integrands
over age-time intervals.
The smaller *ode_step_size*, the more computation is required to
approximate the ODE solution and the average integrands.
Finer resolution for specific ages can be achieved using the
:ref:`csv.fit@Input Files@option_fit.csv@age_avg_split` option.
The default value for this option is 10.0.

perturb_optimization_scale
--------------------------
This is the standard deviation of the log of a random multiplier
that perturbs the scaling point; see
:ref:`option_all_table@perturb_optimization_scale` .
The default value for this option is 0.3.

perturb_optimization_start
--------------------------
This is the standard deviation of the log of a random multiplier
that perturbs the starting point; see
:ref:`option_all_table@perturb_optimization_start` .
The default value for this option is 0.1.

quasi_fixed
-----------
If this boolean option is true,
a quasi-Newton method is used to optimize the fixed effects.
Otherwise a Newton method is used.
The Newton method uses second derivatives of the objective and hence
requires more work per iteration,
but it can often attain much more accuracy in the final solution.
The default value for *quasi_fixed* is true.

random_seed
-----------
This integer is used to seed the random number generator.
The default value for this option is
{xrst_code py}
random_seed = int( time.time() )
{xrst_code}

refit_split
-----------

#.  If this boolean is true,
    there is a female, male, and both fit at the root level.
    The both fit is used for the female and male priors.
    The female and male fits are used for the priors below the root level.
#.  If *refit_split* is false,
    there is no female or male fit at the root level
    and the both fit is used for the priors below the root level.
#.  The default value for this option is true.

Multiplier Freeze
.................
If *refit_split* is true,
the covariate multipliers are frozen after the sex split; i.e.,
after the separate female, male fits at the root level.
If *refit_split* is false,
the covariate multipliers are frozen after the both fit at the root level.

root_node_name
--------------
This string is the name of the root node.
The default for *root_node_name* is the top root of the entire node tree.
Only the root node and its descendants will be fit.
Sometimes it is useful to set :ref:`csv.fit@max_node_depth` to zero
and change *root_node_name* to a particular node that
the cascade is having trouble fitting.
This can greatly speed up model building.

root_node_sex
-------------
This is either ``female`` , ``male`` , or ``both``.
If it is ``both``,
then the ``female`` and ``male`` directories occur directly below
the directory for the root node; i.e.,
the sexes are split just after fitting the root node.
If it is not ``both``,
there is no ``female`` or ``male`` directory directly below
the directory for the root node
and all of the fits are for the *root_node_sex* .

sample_method
-------------
This string specifies the :ref:`option_all_table@sample_method` .
It must be ``asymptotic`` , ``censor_asymptotic`` or ``simulate``
and its default value is ``asymptotic`` .

shared_memory_prefix
--------------------
This string is added to the front of the name of the shared memory objects
used to run the cascade in parallel.
No two cascades can run at the same time with the same
shared memory prefix.
If a cascade does not terminate cleanly,
you may have to clear the shared memory before you can run it again; see
:ref:`clear_shared-name` .
The default value for this option is your user name ($USER)
with spaces replaced by underbars.
If the USER environment variable is not defined,
the value ``none`` is used for this default.

tolerance_fixed
---------------
This float is the tolerance for convergence of the
fixed effects optimization problem.
This is relative to one and its default value is 1e-4.

node.csv
========
This file has the same description as the simulate
:ref:`csv.simulate@Input Files@node.csv` file.

covariate.csv
=============
This csv file has the same description as the simulate
:ref:`csv.simulate@Input Files@covariate.csv` file.

Compression
-----------
The :ref:`csv.covariate_same-name` routine is used to detect when two
(node, sex) pairs have the same values for a covariate.
In addition, csv.fit detects when a covariate is constant
with respect to age or time or both.
If many (node_name, sex) pairs have the same values for a covariate,
or do not depend on age or time,
this can result in a large savings in the size of the root node database
and the amount of memory required by dismod_at.
This depends on the values you choose in covariate.csv.
The following summary of this savings is printed when csv.fit is run::

    csv.fit: create_root_database: covariate counts
    number (node, sex, covariate) combinations = ...
    number of corresponding weights = ...
    number that are constant w.r.t. age = ...
    number that are constant w.r.t. time = ...
    number that are constant w.r.t. both = ...

population
----------
If this table has a covariate called ``population`` ,
it is also used to weight the data as a function of age and time; e.g.,
see :ref:`csv.population-name` .
This function is different for each sex and location.

#.  The :ref:`csv.simulate-name` routine does not yet do this
    data weighting.
#.  No population weighting is used during the predictions in
    :ref:`csv.predict@Output Files@fit_predict.csv`
    because these predictions are for a single (age, time) point
    and not a rectangular (age, time) region.

Both Sexes
..........
The population weighting, and covariate value,
for data with sex equal to ``both``
is the average of the ``female`` and ``male`` populations.
One might think the ``both`` population would be the sum of the
female and male populations,
but this would make the population covariate different from
all the other covariates
(which use the average of the female and male values for both).

fit_goal.csv
============
If a :ref:`csv.simulate@Input Files@node.csv@node_name` is in this table,
and the node is a descendant of the root node,
it will be included in the fit.
All the ancestors of goal nodes, up to the root node, are also fit.

#.  This is different from the :ref:`glossary@fit_goal_set`
    which only contains nodes that are descendants of the root node.
#.  A fit_goal.csv file that only has its header line
    is the same as one that contains all the nodes in the node table.
#.  If you only have one node in this file,
    at_cascade will do a drill from the root node to the goal node.

node_name
---------
Is the name of a node in the fit goal set.
Each such node must be a descendant of the root node.

predict_integrand.csv
=====================
This is the list of integrands at which predictions are made
and stored in :ref:`csv.predict@Output Files@fit_predict.csv` .

integrand_name
--------------
This string is the name of one of the prediction integrands.
You can use the integrand name ``mulcov_0`` , ``mulcov_1`` , ...
which correspond to the first , second , ...
covariate multiplier in the mulcov.csv file.

{xrst_comment ---------------------------------------------------------------}

prior.csv
=========
This csv file has the following columns:

name
----
is a string containing the name of this prior.
No two priors can have the same name.

density
-------
is one of the following strings:
uniform, gaussian, cen_gaussian, log_gaussian,
laplace, cen_laplace, log_laplace.
(Only these densities are included, so far,
so that we do not have to worry about the degrees of freedom.)

mean
----
is a float containing the mean for the density for this prior
(before truncation).
If density is uniform,
this value is only used for starting and scaling the optimization.
This column must appear and its value cannot be empty.

std
---
is a float containing the standard deviation for the density for this prior
(before truncation).
If density is uniform, this value is not used and can be empty.
If all the densities are uniform, this column is optional.

eta
---
is a float specifying the offset for the log_gaussian and
log_laplace densities.
If the density is not log_gaussian or log_laplace,
this value is not used and can be empty.
If none of the densities are log_gaussian or log_laplace,
this column is optional.

lower
-----
is a float containing the lower limit for the truncated density
for this prior.
This column is optional; if it does not appear or its value is empty,
there is no lower bound.

upper
-----
is a float containing the upper limit for the truncated density
for this prior.
This column is optional; if it does not appear or its value is empty,
there is no upper bound.

{xrst_comment ---------------------------------------------------------------}

parent_rate.csv
===============
This file specifies the prior for the root node parent rates.
These are no effect rates; i.e.,
no random or covariate effects are included in these rates.
For each value of *rate_name*,
this file must have a rectangular grid in *age* and *time* .

rate_name
---------
is a string containing the name for the non-zero rates
(except for omega which is specified by covariate.csv).

age
---
is a float containing the age for this grid point.

time
----
is a float containing the time for this grid point.
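
The rectangular grid requirement for the *rate_name* , *age* , and *time*
columns can be checked with a sketch like the one below.
The function ``is_rectangular`` and the example rows are hypothetical and are
not part of at_cascade; the sketch only tests that, for each rate,
every age is paired with every time.

```python
from collections import defaultdict


def is_rectangular(rows):
    """Check that, for each rate_name, the (age, time) points form a grid."""
    points = defaultdict(set)
    for row in rows:
        point = (float(row["age"]), float(row["time"]))
        points[row["rate_name"]].add(point)
    for grid in points.values():
        ages = {age for age, _ in grid}
        times = {time for _, time in grid}
        # The points are a subset of ages x times, so equal counts
        # means every (age, time) combination is present.
        if len(grid) != len(ages) * len(times):
            return False
    return True


# Hypothetical rows: iota on a 2 x 2 grid in age and time
full_grid = [
    {"rate_name": "iota", "age": "0", "time": "1990"},
    {"rate_name": "iota", "age": "0", "time": "2020"},
    {"rate_name": "iota", "age": "100", "time": "1990"},
    {"rate_name": "iota", "age": "100", "time": "2020"},
]
missing_point = full_grid[:-1]
grid_ok = is_rectangular(full_grid)
grid_bad = is_rectangular(missing_point)
```

Different rates may use different grids; the check is applied to each
*rate_name* separately.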

value_prior
-----------
is a string containing the name of the value prior for this grid point.
Either *value_prior* or *const_value* must be non-empty but not both.
The standard deviation for a value prior is always in the same units
as the mean for the prior, even when the prior is log-scaled.

dage_prior
----------
is a string containing the name of the dage prior for this grid point.
If dage_prior is empty, there is no prior for the forward age difference
of this rate at this grid point.
This prior cannot be censored.
If a dage prior is log-scaled,
the standard deviation is for the difference w.r.t. age
of the offset log transform of the corresponding model variable.
Otherwise, the standard deviation is for the difference w.r.t. age
of the corresponding model variable.

dtime_prior
-----------
is a string containing the name of the dtime prior for this grid point.
If dtime_prior is empty, there is no prior for the forward time difference
of this rate at this grid point.
This prior cannot be censored.
If a dtime prior is log-scaled,
the standard deviation is for the difference w.r.t. time
of the offset log transform of the corresponding model variable.
Otherwise, the standard deviation is for the difference w.r.t. time
of the corresponding model variable.

const_value
-----------
is a float specifying a constant value for this grid point
or the empty string.
This is equivalent to the upper and lower limits being equal to this value.
Either *const_value* or *value_prior* must be non-empty but not both.

{xrst_comment ---------------------------------------------------------------}

child_rate.csv
==============
This csv file specifies the prior for the child rate effects
pini, iota, rho and chi.
These are random effects.
(The parent and child priors for omega are created automatically using the
:ref:`csv.simulate@Input Files@covariate.csv@omega` column in the
:ref:`csv.fit@Input Files@covariate.csv` file.
)

rate_name
---------
this string is the name of this rate and is one of the following:
pini, iota, rho, chi .
If one of these rates does not appear in child_rate.csv ,
that rate has no random effects.

value_prior
-----------
is a string containing the name of the value prior
for this child rate effect.
The child rate effects are constant in age and time
(this is a limitation of the csv.fit).
Note that the child rate effects are in log of rate space.
In other words, if :math:`u` is a child rate effect and
:math:`p(a, t)` is the corresponding parent rate as a function of
age and time,
the corresponding child rate as a function of age and time
:math:`c(a, t)` is

.. math::

    c(a,t) = \exp(u) p(a,t)

{xrst_comment ---------------------------------------------------------------}

mulcov.csv
==========
This csv file specifies the covariate multipliers.

covariate
---------
this string is the name of the covariate for this multiplier.
The covariate ``one`` is an absolute covariate that is always equal to one
and ``sex`` is the splitting covariate
( ``sex`` is sex name in :ref:`csv.module@sex_name2value` ).
All the other covariates are specified by
:ref:`csv.fit@Input Files@covariate.csv`.
If one of these covariates appears in the
:ref:`csv.fit@Input Files@option_fit.csv@absolute_covariates` list
it is an absolute covariate.
The other covariates in covariate.csv are :ref:`relative covariates` .
For relative covariates, the average of the covariate
(for the current node and sex being fit)
is subtracted before it is multiplied by a multiplier.

type
----
This string is rate_value, meas_value, or meas_noise.

rate_value
..........
The multiplier times the covariate affects the rate
in the effected column; i.e.,
the exponential of the product multiplies the rate.

meas_value
..........
The multiplier times the covariate affects the model for the integrand
in the effected column; i.e.,
the exponential of the product multiplies the model for the integrand.

meas_noise
..........
The multiplier times the covariate affects the model for the
measurement noise for the integrand in the effected column.
To be more specific,
the product is added to the standard deviation
for measurements for the integrand.

effected
--------
is the name of the integrand or rate affected by this multiplier;
see type above.

value_prior
-----------
is a string containing the name of the value prior
for this covariate multiplier.
Note that the covariate multipliers are constant in age and time
(this is a limitation of the csv.fit).
Either *value_prior* or *const_value* must be non-empty but not both.

const_value
-----------
is a float specifying a constant value for this covariate multiplier
or the empty string.
This is equivalent to the upper and lower limits being equal to this value.
Either *value_prior* or *const_value* must be non-empty but not both.

{xrst_comment ---------------------------------------------------------------}

data_in.csv
===========
This csv file specifies the data set
with each row corresponding to one data point.

Optional Columns
----------------
The following columns are optional and the empty string is used for
all the rows of a column that does not appear:
meas_std, eta, nu, sample_size.

data_id
-------
is an :ref:`csv.module@Notation@Index Column` for data_in.csv.
This is necessary so that the dismod_at data table data_id values
correspond to the data_in.csv data_id values.

integrand_name
--------------
This string is a dismod_at integrand name; e.g. ``Sincidence``.

density_name
------------
This string is one of the following dismod_at density names:

.. csv-table::

    gaussian, cen_gaussian, log_gaussian, cen_log_gaussian
    laplace, cen_laplace, log_laplace, cen_log_laplace
    students, , log_students,
    binomial,,,

node_name
---------
This string identifies the node corresponding to this data point.

sex
---
This string is the sex name for this data point.

age_lower
---------
This float is the lower age limit for this data row.

age_upper
---------
This float is the upper age limit for this data row.

time_lower
----------
This float is the lower time limit for this data row.

time_upper
----------
This float is the upper time limit for this data row.

meas_value
----------
This float is the measured value for this data point.

meas_std
--------
This float is the standard deviation of the measurement noise
for this data point.
This standard deviation is always in the same units as the data,
even when the density is log-scaled.

binomial
........
The *meas_std* must be empty when the density is binomial.
In this case the standard deviation corresponding to a measurement
is a function of the sample size and the model for the mean of the data.
This requires that the model for the mean of the data is positive; i.e.,
greater than zero.

eta
---
This float is the offset in the log transformation for the log densities
(it can be empty if this is not a log density).

nu
--
This float is the degrees of freedom for the students densities
(it can be empty if this is not a students density).

sample_size
-----------
This float should be empty if the density is not binomial.
Otherwise, it is the sample size for a binomial distribution
(see :ref:`csv.binomial-name` for an example):

.. csv-table::
    :widths: auto

    y,is the meas_value for this data
    n,is the sample size
    k,is the counts in the binomial distribution; k = y * n .
    p,is the success rate; p is the mean of y

The log of the binomial density function is:

.. math::

    \log {n \choose k} + k \log(p) + (n-k) \log(1 - p)

We suggest using a gaussian approximation of the binomial
when p * n is greater than 5.
This approximation will be faster and less likely to have
evaluation issues during the optimization.
If you do not have a good idea as to the value of p,
use a gaussian when k = y * n is greater than 5.

hold_out
--------
This integer is one (zero) if this data point is held out
(not held out) from the fit.
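
The binomial log density in the sample_size section,
and the rule of thumb for switching to a gaussian,
can be sketched as follows.
The function names are hypothetical and not part of at_cascade or dismod_at;
``math.lgamma`` is used for the binomial coefficient so that
k = y * n does not have to be an integer.

```python
import math


def binomial_log_density(y, n, p):
    """Log of the binomial density with k = y * n counts out of n."""
    k = y * n
    # log of (n choose k), written with log-gamma functions
    log_choose = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    )
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)


def suggest_gaussian(y, n, p=None):
    """Rule of thumb: p * n > 5, or k = y * n > 5 when p is not known."""
    expected_count = p * n if p is not None else y * n
    return expected_count > 5


# Hypothetical data point: measured value 0.3 with sample size 10
log_density = binomial_log_density(y=0.3, n=10, p=0.3)
small_sample = suggest_gaussian(y=0.3, n=10)
large_sample = suggest_gaussian(y=0.3, n=100)
```

For the data point above, k = 3, so the rule of thumb keeps the binomial
density; with a sample size of 100 it would suggest a gaussian instead.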

{xrst_comment ---------------------------------------------------------------}

Output Files
************

root.db
=======
This is the dismod_at sqlite database corresponding to
the root node for the cascade.

all_node.db
===========
This is the at_cascade sqlite all node database for the cascade.

dismod.db
=========

1.  There is a subdirectory of the :ref:`csv.fit@fit_dir`
    with the name of the root node.
    The ``dismod.db`` file in this directory is the
    `dismod_at_database`_ corresponding to the fit and predictions
    for the root node fit for both sexes.
2.  The root node directory has a ``female`` and ``male`` subdirectory.
    These directories contain the ``dismod.db`` database for
    the root node fit of the corresponding sex.
3.  For each node between the root node and the
    :ref:`fit_goal nodes ` ,
    and for the ``female`` and ``male`` sex, there is a directory.
    This is directly below the directory for its parent node and same sex.
    It contains the ``dismod.db`` database for the corresponding fit.

.. _dismod_at_database: https://dismod-at.readthedocs.io/latest/database.html

option_fit_out.csv
==================
This is a copy of :ref:`csv.fit@Input Files@option_fit.csv`
with the defaults filled in for missing values.

fit_predict.csv
===============
These are the predictions for all of the nodes at the age, time and
covariate values specified in covariate.csv.
The prediction is done using the optimal variable values.

avgint_id
---------
Each avgint_id corresponds to a different value for age, time,
or integrand in the sam_predict file.
The age and time values come from the covariate.csv file.
The integrands come from the predict_integrand.csv file.

integrand_name
--------------
is the integrand for this sample and is equal to
one of the integrand names in predict_integrand.csv.

avg_integrand
-------------
This float is the mode value for the average of the integrand,
with covariate and other effects but without measurement noise.

node_name
---------
is the node name for this sample
and cycles through the nodes in covariate.csv.

age
---
is the age for this prediction and is one of the ages in covariate.csv.

time
----
is the time for this prediction and is one of the times in covariate.csv.

sex
---
is the sex name for this data point; i.e., female, both, or male.

covariate_names
---------------
The rest of the columns are covariate names and contain
the value of the corresponding covariate in covariate.csv.

sam_predict.csv
===============
This is a sampling of the predictions for all of the nodes
at the age, time and covariate values specified in covariate.csv.
It has the same columns as fit_predict.csv (see above)
plus an extra column named sample_index.

sample_index
------------
For each sample_index value,
there is a complete set of all the values in the fit_predict.csv table.
A different (independent) sample of the model variables
from their posterior distribution
is used to do the predictions for each sample index.

{xrst_end csv.fit}