create_shift_db

Create Database With Shifted Covariate References

Prototype

# at_cascade.create_shift_db
def create_shift_db(
    all_node_database    ,
    fit_database         ,
    shift_databases      ,
    no_ode_fit           = False,
    job_table            = None,
) :
    assert type(all_node_database) == str
    assert type(fit_database) == str
    assert type(shift_databases) == dict
    assert type(no_ode_fit) == bool
    if no_ode_fit :
        assert job_table == None
    else :
        assert type(job_table) == list

Problem

The value and difference priors are treated as independent. Currently, the standard deviations for the difference priors are fixed at their root-level values. In addition, the standard deviations for the value priors are set using the posterior samples predicted by the parent job fit. The Theory section below discusses the problem with this. The following options provide a way to avoid this problem: child_prior_dage, child_prior_dtime, shift_prior_dage, shift_prior_dtime.

Theory

For this discussion we start our indexing at one (recall that Python starts its indexing at zero). Suppose that there is only one rate (e.g., iota), no data, and all the priors are Gaussian. The fit_info_theory page has a more general discussion of the analysis below. We use the following notation for the rate’s prior:

| Notation | Meaning |
| --- | --- |
| \(r_{i,j}\) | value of the rate at the i-th age and j-th time |
| \(N\) | number of age points in the rate grid |
| \(M\) | number of time points in the rate grid |
| \(m_v\) | mean for the rate values |
| \(m_a\) | mean for the rate age differences |
| \(m_t\) | mean for the rate time differences |
| \(s_v\) | standard deviation (std) of the value prior for the rate |
| \(s_a\) | std of the age difference prior for the rate |
| \(s_t\) | std of the time difference prior for the rate |
| \(L(r)\) | the negative log-likelihood |

Note that the means and standard deviations actually depend on the age and time indices (i,j). We have dropped these indices because they can be inferred from the (i,j) indices of \(r\).

\[\begin{split}2 L(r) = & + \sum_{i,j} \left[ \left( \frac{ r_{i,j} - m_v } { s_v } \right)^2 + \log ( 2 \pi s_v^2 ) \right] \\ & + \sum_{i < N , j} \left[ \left( \frac{ r_{i+1,j} - r_{i,j} - m_a } { s_a } \right)^2 + \log ( 2 \pi s_a^2 ) \right] \\ & + \sum_{i, j < M} \left[ \left( \frac{ r_{i,j+1} - r_{i,j} - m_t } { s_t } \right)^2 + \log ( 2 \pi s_t^2 ) \right]\end{split}\]

The partial of \(L(r)\) with respect to \(r_{N,M}\) is

\[\frac{ \partial L}{\partial r_{N,M}} = \frac{ r_{N,M} - m_v } { s_v^2 } + \frac{ r_{N,M} - r_{N-1,M} - m_a } { s_a^2 } + \frac{ r_{N,M} - r_{N,M-1} - m_t } { s_t^2 }\]

The second partial of \(L(r)\) with respect to \(r_{N,M}\) is

\[\frac{ \partial^2 L}{\partial r_{N,M} \partial r_{N,M} } = s_v^{-2} + s_a^{-2} + s_t^{-2}\]
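The second partial above can be checked numerically. The following is a minimal sketch, not part of at_cascade: the function names, the grid values, and the prior values are all made up for illustration, the means and standard deviations are taken to be constant over the grid, and the log terms are dropped because they do not depend on \(r\).

```python
import copy


def neg_log_like(r, m_v, m_a, m_t, s_v, s_a, s_t):
    """L(r) for a grid r[i][j], omitting terms that do not depend on r."""
    n_age, n_time = len(r), len(r[0])
    total = 0.0
    for i in range(n_age):
        for j in range(n_time):
            # value prior term
            total += ((r[i][j] - m_v) / s_v) ** 2
            # age difference term (exists for all but the last age point)
            if i + 1 < n_age:
                total += ((r[i + 1][j] - r[i][j] - m_a) / s_a) ** 2
            # time difference term (exists for all but the last time point)
            if j + 1 < n_time:
                total += ((r[i][j + 1] - r[i][j] - m_t) / s_t) ** 2
    return total / 2.0


def second_partial_at_corner(r, step, **prior):
    """Central difference estimate of the second partial w.r.t. r_{N,M}."""
    def shifted(delta):
        r_shift = copy.deepcopy(r)
        r_shift[-1][-1] += delta  # last age, last time
        return neg_log_like(r_shift, **prior)
    return (shifted(step) - 2.0 * shifted(0.0) + shifted(-step)) / step ** 2


# hypothetical prior and grid values: N = 3 age points, M = 2 time points
prior = dict(m_v=0.02, m_a=0.005, m_t=0.0, s_v=0.5, s_a=0.2, s_t=0.1)
r = [[0.010, 0.012], [0.020, 0.021], [0.030, 0.035]]

numeric = second_partial_at_corner(r, step=1e-3, **prior)
analytic = prior["s_v"] ** -2 + prior["s_a"] ** -2 + prior["s_t"] ** -2
print(numeric, analytic)
```

Because \(L\) is quadratic in \(r\), the central difference agrees with the analytic value up to rounding error.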

The inverse of the Hessian is the covariance matrix for the estimate of \(r\). Suppose that we want the standard deviation for \(r_{N,M}\) to be \(\sigma\). If we approximate the diagonal of the inverse by the inverse of the diagonal, it follows that

\[\begin{split}\sigma^2 & = \left( s_v^{-2} + s_a^{-2} + s_t^{-2} \right)^{-1} \\ \sigma^{-2} & = s_v^{-2} + s_a^{-2} + s_t^{-2}\end{split}\]
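As a small illustration of this relation, the sketch below computes \(\sigma\) from the three prior standard deviations and, going the other way, solves for the \(s_v\) that would give a target \(\sigma\). Both helper names are hypothetical and are not at_cascade functions. The example rests on the same diagonal approximation as the text.

```python
def combined_std(s_v, s_a, s_t):
    """sigma such that sigma^-2 = s_v^-2 + s_a^-2 + s_t^-2."""
    return (s_v ** -2 + s_a ** -2 + s_t ** -2) ** -0.5


def value_std_for_target(sigma, s_a, s_t):
    """s_v that gives the target sigma for fixed s_a and s_t."""
    remainder = sigma ** -2 - s_a ** -2 - s_t ** -2
    if remainder <= 0.0:
        # the difference priors alone are already at least this tight
        raise ValueError("target sigma is too large for these s_a, s_t")
    return remainder ** -0.5


# made-up numbers: difference stds of 0.5 and a target std of 0.1
s_v = value_std_for_target(sigma=0.1, s_a=0.5, s_t=0.5)
sigma = combined_std(s_v, s_a=0.5, s_t=0.5)
print(s_v, sigma)
```

Note that \(\sigma\) is always smaller than each of \(s_v\), \(s_a\), and \(s_t\), so using \(\sigma\) itself as the value prior standard deviation yields a tighter combined prior than intended.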

all_node_database

is a python string containing the name of the all_node_db.

fit_database

is a python string containing the name of a dismod_at database. This is a fit_database which has two predict tables (mentioned below). These tables are used to create priors in the child node databases. This argument can’t be None.

fit_node

We use fit_node to refer to the parent node in the dismod_at option table in the fit_database.

sample Table

This table is not used if no_ode_fit is true. If no_ode_fit is false, it contains the results of a dismod_at sample command for both the fixed and random effects.

c_shift_avgint Table

This is the avgint_parent_grid table corresponding to this fit_database.

c_shift_predict_sample Table

This table is not used if no_ode_fit is true. If no_ode_fit is false, it contains the predict table corresponding to a predict sample command using the c_shift_avgint table. Note that the predict_id column name was changed to c_shift_predict_sample_id (which is not the same as sample_id).

c_shift_predict_fit_var Table

This table contains the predict table corresponding to a predict fit_var command using the c_shift_avgint table. Note that the predict_id column name was changed to c_shift_predict_fit_var_id (which is not the same as var_id).

shift_databases

We use the notation shift_name for the keys in this dict.

shift_name

For each shift_name, shift_databases[shift_name] is the name of an input_node_database that is created by this command. The corresponding directory is assumed to already exist.

split_reference_name

If shift_name is a split_reference_name, the node corresponding to this shift database is the fit_node.

Child Node

If shift_name is the node name for a child of fit_node, the child is the node corresponding to this shift database.

Fit Node

If shift_name is the name of the fit_node, the node corresponding to this shift database is the fit_node. This case is used by no_ode_fit to create priors without shifting the covariate references.
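A shift_databases dict for child nodes might be built as in the sketch below, which also creates the directories because this routine assumes they already exist. The helper name, the node names n1 and n2, the file name dismod.db, and the one-directory-per-shift_name layout are assumptions made for the example, not requirements stated on this page.

```python
import os
import tempfile


def make_shift_databases(parent_dir, shift_names, file_name="dismod.db"):
    """Map each shift_name to a database path and create its directory."""
    shift_databases = dict()
    for shift_name in shift_names:
        # hypothetical layout: one sub-directory per shift_name
        directory = os.path.join(parent_dir, shift_name)
        os.makedirs(directory, exist_ok=True)
        shift_databases[shift_name] = os.path.join(directory, file_name)
    return shift_databases


with tempfile.TemporaryDirectory() as parent_dir:
    # n1 and n2 stand in for the children of the fit_node
    shift_databases = make_shift_databases(parent_dir, ["n1", "n2"])
    directories_exist = all(
        os.path.isdir(os.path.dirname(path))
        for path in shift_databases.values()
    )
    print(sorted(shift_databases), directories_exist)
```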

Value Priors

If the upper and lower limits are equal, the value priors in fit_database and the shift_databases are effectively the same. Otherwise, the means in the value priors are replaced using the corresponding values in the predict tables in the fit_database. If no_ode_fit is true, the standard deviations in the value priors are not replaced; if it is false, they are replaced. Note that if the value prior is uniform, the standard deviation is not used and the mean is only used to initialize the optimization.
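The rule for value priors can be summarized by the following sketch. It is a simplified illustration and not the at_cascade implementation: the function name is hypothetical, a prior is represented as a plain dict with lower, upper, mean, and std keys, and predict_mean and predict_std stand for numbers assumed to have been derived already from the predict tables.

```python
def shift_value_prior(prior, predict_mean, predict_std, no_ode_fit):
    """Return a copy of prior with its mean (and maybe std) replaced."""
    shifted = dict(prior)
    if prior["lower"] == prior["upper"]:
        # the value is constrained, so the prior is effectively unchanged
        return shifted
    shifted["mean"] = predict_mean
    if not no_ode_fit:
        # only a fit that is not a no_ode_fit supplies a new std
        shifted["std"] = predict_std
    return shifted


# made-up example priors
free = dict(lower=0.0, upper=1.0, mean=0.1, std=0.5)
fixed = dict(lower=0.2, upper=0.2, mean=0.2, std=0.5)

after_fit = shift_value_prior(free, 0.03, 0.01, no_ode_fit=False)
after_no_ode = shift_value_prior(free, 0.03, 0.01, no_ode_fit=True)
unchanged = shift_value_prior(fixed, 0.03, 0.01, no_ode_fit=False)
print(after_fit, after_no_ode, unchanged)
```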

dage and dtime Priors

The means of the dage and dtime priors are replaced using the corresponding differences in the predict tables in the fit_database.
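One way to picture this replacement is the sketch below, which forms forward differences of predicted values on an age by time grid. It assumes plain differences of the predicted values, following the \(r_{i+1,j} - r_{i,j}\) and \(r_{i,j+1} - r_{i,j}\) terms in the Theory section. The function name and the grid values are invented for the example.

```python
def difference_means(predict):
    """Forward differences of predict[i][j] in age (i) and time (j)."""
    n_age, n_time = len(predict), len(predict[0])
    # age differences exist for all but the last age point
    dage = [
        [predict[i + 1][j] - predict[i][j] for j in range(n_time)]
        for i in range(n_age - 1)
    ]
    # time differences exist for all but the last time point
    dtime = [
        [predict[i][j + 1] - predict[i][j] for j in range(n_time - 1)]
        for i in range(n_age)
    ]
    return dage, dtime


# made-up predicted values: 3 age points by 2 time points
predict = [[1.0, 1.5], [2.0, 3.0], [4.0, 4.5]]
dage, dtime = difference_means(predict)
print(dage, dtime)
```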

Log Table

There is no log table in the shifted databases.

no_ode_fit

This argument is true (false) if the fit_database is (is not) the result of a no_ode_fit. If no_ode_fit is false, the sample table and the c_shift_predict_sample table must be in the fit_database. In this case both the means and standard deviations in the value priors are replaced using the results of the fit. Otherwise, only the means are replaced.

job_table

If no_ode_fit is true this argument must be None. Otherwise it is the job_table for this cascade.