-----------------------------------------------
lines 13-353 of file: at_cascade/csv/predict.py
-----------------------------------------------

{xrst_begin csv.predict}
{xrst_spell
  avg
  avgint
  cpus
  meas
  pdf
  tru
  std
}

Prediction for a CSV Fit
########################

Prototype
*********
{xrst_literal
    # BEGIN_PREDICT
    # END_PREDICT
}

Example
*******
:ref:`csv.sim_fit_pred-name` .

fit_dir
*******
Same as the csv fit :ref:`csv.fit@fit_dir` .

sim_dir
*******
If this is None, the file
:ref:`csv.predict@Output Files@tru_predict.csv` is not created.
Otherwise, :ref:`csv.simulate@sim_dir` is the directory
used to simulate the data for this fit and the file
tru_predict.csv is created.

start_job_name
**************
This is the name of the job (fit) at which the predictions start.
This is a node name, followed by a period, followed by a sex.
Only this fit, and its descendants, will be included in the predictions.
If this argument is None, all of the jobs (fits) will be included.
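
The naming convention above can be sketched as follows.
The helper names ``job_name`` and ``split_job_name`` are hypothetical
(they are not part of at_cascade), and splitting at the last period is an
assumption made so that a node name containing a period still works:

```python
def job_name(node_name, sex):
    # A job name is a node name, followed by a period, followed by a sex
    return f"{node_name}.{sex}"


def split_job_name(name):
    # Assumption: the sex never contains a period, so split at the last one
    node_name, sex = name.rsplit(".", 1)
    return node_name, sex


start_job_name = job_name("n1", "female")
```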

max_job_depth
*************
This is the number of generations below start_job_name that are included;
see :ref:`job_descendant@Node Depth Versus Job Depth`
and note that sex is the :ref:`option_all_table@split_covariate_name` .
If max_job_depth is zero,  only the start job will be included.
If max_job_depth is None,  the start job and all its descendants are included.
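
As an illustration only, the effect of max_job_depth can be sketched with a
hypothetical ``job_children`` dictionary that maps each job name to the
names of its child jobs. This dictionary, the function name, and the
example tree are assumptions and are not how at_cascade stores its jobs:

```python
def jobs_to_predict(job_children, start_job_name, max_job_depth=None):
    # The start job is always included (generation zero)
    included = [start_job_name]
    generation = [start_job_name]
    depth = 0
    # Add one generation at a time until the depth limit or the leaves
    while generation and (max_job_depth is None or depth < max_job_depth):
        generation = [
            child
            for job in generation
            for child in job_children.get(job, [])
        ]
        included.extend(generation)
        depth += 1
    return included


# Hypothetical job tree where the split by sex happens below n0.both
job_children = {
    "n0.both": ["n1.female", "n1.male"],
    "n1.female": ["n2.female"],
    "n1.male": ["n2.male"],
}
```

With this tree, a depth of zero selects only ``n0.both`` , a depth of one
adds the two ``n1`` jobs, and None selects all five jobs.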

Input Files
***********

option_predict.csv
==================
This csv file has two columns,
one called ``name`` and the other called ``value``.
The rows of this table are documented below by the name column.
If an option name does not appear, or the corresponding value is empty,
the default value is used for the option.
The final value for each of the options is reported in the file
:ref:`csv.predict@Output Files@option_predict_out.csv` .
Because each option has a default value,
new options are added in such a way that
previous option_predict.csv files are still valid.
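
A minimal sketch of creating this file with the Python standard library is
shown below. It assumes that option_predict.csv is located in *fit_dir* ;
the function ``write_option_predict`` is a hypothetical helper and the
option values are only examples:

```python
import csv
import os
import tempfile


def write_option_predict(fit_dir, option):
    # option maps an option name to its value; names that are
    # left out get their default values
    file_name = os.path.join(fit_dir, "option_predict.csv")
    with open(file_name, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "value"])
        for name, value in option.items():
            writer.writerow([name, value])
    return file_name


# Use a temporary directory in place of a real fit_dir
fit_dir = tempfile.mkdtemp()
file_name = write_option_predict(
    fit_dir, {"plot": "true", "float_precision": 4}
)

# Read the file back to check its contents
with open(file_name, newline="") as f:
    option_rows = list(csv.DictReader(f))
```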

db2csv
------
If this boolean option is true,
the dismod_at `db2csv_command`_ is used to generate the csv files
corresponding to each :ref:`csv.fit@Output Files@dismod.db` .
This is only done for (node, sex) pairs that have samples; i.e.,
a successful fit and posterior samples.
If this option is true, the csv files will make it more difficult
to see the tree structure corresponding to the ``dismod.db`` files.
The default value for this option is false .

.. _db2csv_command: https://dismod-at.readthedocs.io/latest/db2csv_command.html

descendant_std_factor
---------------------
This factor scales the posterior samples of an ancestor fit before predicting
for a descendant job; i.e., (node, sex) pair.
It must be greater than zero and its default value is 1.
It is only used when predicting for a job that does **not** have samples.
In this case the closest ancestor that does have samples
is used to predict for the (node, sex) pair; see
:ref:`csv.ancestor_fit-name`.
For an example, see :ref:`csv.predict_descend-name` .

float_precision
---------------
This integer is the number of decimal digits of precision to
include for float values in the output csv files.
The default value for this option is 5.

.. _plot_curve: https://dismod-at.readthedocs.io/latest/plot_curve.html

max_number_cpu
--------------
This integer is the maximum number of cpus (processes) to use.
This must be greater than zero. If it is one, the jobs are run
sequentially, more output is printed to the screen, and the program
can be cleanly stopped with a control-C.
The default value for this option is
{xrst_code py}
    max_number_cpu = max(1, multiprocessing.cpu_count() - 1)
{xrst_code}

plot
----
The default value for this option is false .
If this boolean option is true,
a ``data_plot.pdf`` and a ``rate_plot.pdf`` file are created for each
:ref:`csv.fit@Output Files@dismod.db` database.
This is only done for (node, sex) pairs that have samples; i.e.,
a successful fit and posterior samples.
The data plot includes a maximum of 1,000 randomly chosen points for each
integrand in the predict_integrand.csv file.
The rate plot includes all the non-zero rates.
These are no effect rates; i.e., they are the estimated rate
for this node and sex without any covariate effects.
Predictions with covariate effects can be found in the csv
:ref:`csv.predict@Output Files` .

zero_meas_value
---------------
If this boolean option is true, the
:ref:`csv.fit@Input Files@mulcov.csv@type@meas_value` covariate
multipliers are set to zero during the predictions
(instead of their simulation, fit, or sample values).
The default value for this option is false .

number_sample_predict
---------------------
This integer option specifies the number of samples generated for each
prediction.  Its default value is the value of
:ref:`csv.fit@Input Files@option_fit.csv@number_sample` in
:ref:`csv.fit@Input Files@option_fit.csv`.
If number_sample_predict does not appear in option_predict.csv,
and number_sample does not appear in option_fit.csv,
the default value for number_sample
is the value used for number_sample_predict.
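
The fallback order described above can be sketched as follows. The function
name is hypothetical, the two dictionaries stand for the name and value
columns of the option files, and ``default_number_sample`` stands for the
csv fit default, which is not stated here:

```python
def resolve_number_sample_predict(
    option_predict, option_fit, default_number_sample
):
    # 1. An explicit, non-empty value in option_predict.csv
    value = option_predict.get("number_sample_predict", "")
    if value != "":
        return int(value)
    # 2. Otherwise, the number_sample value in option_fit.csv
    value = option_fit.get("number_sample", "")
    if value != "":
        return int(value)
    # 3. Otherwise, the default value for number_sample
    return default_number_sample
```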

covariate.csv
=============
Same as the csv fit
:ref:`csv.fit@Input Files@covariate.csv` .

fit_goal.csv
============
Same as the csv fit
:ref:`csv.fit@Input Files@fit_goal.csv` .

option_fit.csv
==============
The option_fit.csv
:ref:`csv.fit@Input Files@option_fit.csv@refit_split` value is used.

predict_integrand.csv
=====================
This is the list of integrands at which predictions are made
and stored in :ref:`csv.predict@Output Files@fit_predict.csv` .


{xrst_comment ---------------------------------------------------------------}


Output Files
************

option_predict_out.csv
======================
This is a copy of
:ref:`csv.predict@Input Files@option_predict.csv` with the default values
filled in for missing values.

fit_predict.csv
===============
#. If :ref:`csv.predict@start_job_name` is None,
   ``fit_predict.csv`` contains the predictions for all the fits.
   These predictions are for all of the nodes at the age, time and
   covariate values specified in covariate.csv.
   The prediction is done using the optimal variable values.

#. If :ref:`csv.predict@start_job_name` is not None,
   the predictions are only for jobs at or below the starting job.
   In addition, the predictions are stored below *fit_dir* in the file

        ``predict/fit_``\ *start_job_name*\ ``.csv``

   and not in ``fit_predict.csv`` .


avgint_id
---------
Each avgint_id corresponds to a different value for age, time,
or integrand in the fit_predict file.
The age and time values come from the covariate.csv file.
The integrand values come from the predict_integrand.csv file
and the covariate multiplier list.

integrand_name
--------------
is the integrand for this prediction and is one of the integrand names
in predict_integrand.csv.
The integrand names ``mulcov_0`` , ``mulcov_1`` , ...
correspond to the first , second , ...
covariate multiplier in the csv fit
:ref:`csv.fit@Input Files@mulcov.csv` file.

avg_integrand
-------------
This float is the model value for the average of the integrand,
with covariate and other effects but without measurement noise.

node_name
---------
is the name of the node that this row is predicting for.
This cycles through all the nodes in covariate.csv.

sex
---
is the sex, female, both, or male, that the predictions are for.

fit_node_name
-------------
is the node name corresponding to the fit, and samples, that were used
to do these predictions.
This identifies the nearest ancestor that had a successful fit and samples.

fit_sex
-------
is the sex corresponding to the fit, and samples, that were used
to do these predictions.

posterior
.........
If *fit_node_name* and *fit_sex* are the same as *node_name* and *sex* ,
the fit and samples succeeded for this *node_name* and *sex* and
this row contains a posterior prediction for this *node_name* and *sex* .

prior
.....
If *fit_node_name* is not the same as *node_name* ,
or *fit_sex* is not the same as *sex* ,
this row contains a prior prediction for this *node_name* and *sex* .
The pair ( *fit_node_name* , *fit_sex* ) corresponds to
the closest ancestor fit that was successful.
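
The posterior versus prior rule can be sketched as a small function that
takes one row of fit_predict.csv as a dictionary (for example, as returned
by ``csv.DictReader`` ). The function name ``prediction_kind`` is
hypothetical:

```python
def prediction_kind(row):
    # A row is a posterior prediction only when the fit used for the
    # prediction is the fit for this row's own node and sex
    same_node = row["fit_node_name"] == row["node_name"]
    same_sex = row["fit_sex"] == row["sex"]
    return "posterior" if same_node and same_sex else "prior"
```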

age
---
is the age for this prediction and is one of
the ages in covariate.csv.

time
----
is the time for this prediction and is one of
the times in covariate.csv.

covariate_names
---------------
The rest of the columns are covariate names and contain the value
of the corresponding covariate in
:ref:`csv.fit@Input Files@covariate.csv` .

tru_predict.csv
===============
If :ref:`csv.predict@sim_dir` is None, this file is not created.
Otherwise, this file contains the predictions for the model variables
corresponding to the simulation.
It is similar to :ref:`csv.predict@Output Files@fit_predict.csv`
with the following differences:

#. The first line (header line) is the same in this file and
   fit_predict.csv.
#. If the other lines, in both files, are sorted by
   ( *node_name* , *avgint_id* ) ,
   the other lines are the same except for the value in the
   avg_integrand column.
#. The model variables and true values are for the
   *fit_node_name* and *fit_sex* . Hence this does not really represent truth
   unless these are the same as *node_name* and *sex* .
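
Because the two files differ only in the avg_integrand column, one way to
compare them is to match rows on all the other columns. The following is a
sketch under that assumption; the function names are hypothetical and the
example text uses a reduced set of columns and made-up values:

```python
import csv
import io


def avg_integrand_by_key(csv_text):
    # Map every column except avg_integrand to the avg_integrand value
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = float(row.pop("avg_integrand"))
        key = tuple(sorted(row.items()))
        result[key] = value
    return result


def fit_minus_tru(fit_text, tru_text):
    fit = avg_integrand_by_key(fit_text)
    tru = avg_integrand_by_key(tru_text)
    # The non-header lines should match except for avg_integrand
    assert fit.keys() == tru.keys()
    return {key: fit[key] - tru[key] for key in fit}


header = "avgint_id,integrand_name,avg_integrand,node_name,sex,age,time\n"
fit_text = (
    header
    + "0,Sincidence,0.5,n0,both,50,2000\n"
    + "1,prevalence,0.75,n0,both,50,2000\n"
)
tru_text = (
    header
    + "1,prevalence,0.5,n0,both,50,2000\n"
    + "0,Sincidence,0.25,n0,both,50,2000\n"
)
difference = fit_minus_tru(fit_text, tru_text)
```

Matching on the remaining columns means the two files do not need to be
sorted before they are compared.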

sam_predict.csv
===============
This is a sampling of the predictions,
using the posterior distribution of the model variables.
It is the same as :ref:`csv.predict@Output Files@fit_predict.csv`
with the following differences:

#. The first line (header line) is the same in this file and
   fit_predict.csv except that sam_predict.csv has an extra column
   named sample_index.

#. Suppose that the other lines in sam_predict.csv and fit_predict.csv
   are sorted by ( *node_name* , *avgint_id* ) .

#. Let *n_sample* be the number of other lines in sam_predict.csv divided by
   the number of other lines in fit_predict.csv.

#. For each line in fit_predict.csv (not counting the header line),
   there are *n_sample* lines in sam_predict.csv,
   that are the same as the line in fit_predict.csv except for the value in the
   avg_integrand column and the extra sample_index column in
   sam_predict.csv.

#. If :ref:`option_all_table@sample_method` is asymptotic,
   the model variables for each sample are correlated Gaussian with mean
   equal to the optimal value and variance equal to the asymptotic
   approximation.

#. If :ref:`option_all_table@sample_method` is censor_asymptotic,
   model variables are the same as for asymptotic except that values above
   (below) their upper bound (lower bound) are converted to the corresponding
   bound.

#. If :ref:`option_all_table@sample_method` is simulate,
   the model variables for each sample are the optimal values corresponding
   to an independent data set.
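
The censoring step for censor_asymptotic can be sketched as follows. This
only illustrates converting values to their bounds; the asymptotic sample
and the bounds are taken as given, and the function name ``censor`` is
hypothetical:

```python
def censor(sample, lower, upper):
    # Values below the lower bound become the lower bound and
    # values above the upper bound become the upper bound
    return [
        min(max(value, low), up)
        for value, low, up in zip(sample, lower, upper)
    ]


censored = censor(
    [-0.5, 0.2, 3.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
)
```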

start_job_name
--------------
If *start_job_name* is not None,
the predictions are only for jobs at or below the starting job.
In addition, the predictions are stored below *fit_dir* in the file

    ``predict/sam_``\ *start_job_name*\ ``.csv``

and not in ``sam_predict.csv`` .
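
The file naming rule for the fit and sample predictions can be sketched as
follows. The function ``predict_file`` is a hypothetical helper, and placing
``fit_predict.csv`` and ``sam_predict.csv`` directly in *fit_dir* is an
assumption:

```python
import os


def predict_file(fit_dir, prefix, start_job_name=None):
    # prefix is "fit" for the fit predictions and "sam" for the samples
    assert prefix in ("fit", "sam")
    if start_job_name is None:
        # Assumption: these files are directly below fit_dir
        return os.path.join(fit_dir, f"{prefix}_predict.csv")
    # Predictions that start at a job go in the predict subdirectory
    return os.path.join(
        fit_dir, "predict", f"{prefix}_{start_job_name}.csv"
    )
```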

sample_index
------------
For each sample_index value, there is a complete set of all the values
in the fit_predict.csv table.
A different (independent) sample of the model variables
from their posterior distribution is used to do the predictions for
each sample index.
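
One possible use of the samples is to summarize the spread of each
prediction. The sketch below groups the rows of sam_predict.csv by every
column except avg_integrand and sample_index, and reports the number of
samples, the mean, and the sample standard deviation. The function name is
hypothetical and the example text uses a reduced set of columns and made-up
values:

```python
import csv
import io
import statistics


def summarize_samples(sam_text):
    groups = {}
    for row in csv.DictReader(io.StringIO(sam_text)):
        value = float(row.pop("avg_integrand"))
        row.pop("sample_index")
        # The remaining columns identify one line of fit_predict.csv
        key = tuple(sorted(row.items()))
        groups.setdefault(key, []).append(value)
    # statistics.stdev needs at least two samples per prediction
    return {
        key: (len(v), statistics.mean(v), statistics.stdev(v))
        for key, v in groups.items()
    }


sam_text = (
    "avgint_id,sample_index,integrand_name,avg_integrand,node_name,sex\n"
    "0,0,Sincidence,0.25,n0,both\n"
    "0,1,Sincidence,0.75,n0,both\n"
)
summary = summarize_samples(sam_text)
n_sample, mean, std = next(iter(summary.values()))
```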

{xrst_end csv.predict}