prevalence2iota¶

Example Estimation of iota From Prevalence Data¶

For this example everything is constant in time so the functions below do not depend on time.

Nodes¶

The following is a diagram of the node tree for this example. The root_node is n0, the fit_goal_set is {n1, n5, n6}, and the leaf nodes are {n3, n4, n5, n6}:

               n0
       /-----/\-----\
    n1              n2
   /  \            /  \
n3    n4        n5    n6
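
The tree above can also be written as a mapping from each node to its children. The following is a minimal sketch; the names children, all_nodes, and leaf_nodes are hypothetical and are not taken from the example source:

```python
# Hypothetical representation of the node tree (illustration only).
children = {
    'n0': ['n1', 'n2'],
    'n1': ['n3', 'n4'],
    'n2': ['n5', 'n6'],
}

# Every node that appears either as a parent or as a child.
all_nodes = set(children) | {c for kids in children.values() for c in kids}

# A leaf node is a node that has no children of its own.
leaf_nodes = sorted(n for n in all_nodes if n not in children)
print(leaf_nodes)
```

This reproduces the leaf set { n3, n4, n5, n6 } stated above.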

fit_goal_set¶

fit_goal_set = { 'n1', 'n5', 'n6' }

Rates¶

The only non-zero dismod_at rates for this example are iota and omega. We use iota(a, n, I) to denote the value for iota as a function of age a, node number n, and income I. We use omega(a, n) to denote the value for omega as a function of age a and node number n.

Covariate¶

There are two covariates for this example: income and one. The reference value for income is the average income corresponding to the fit_node. The one covariate is always equal to 1 and its reference value is always zero.

r_n¶

We use r_n for the reference value of income at node n. The code below sets this reference using the name avg_income:

avg_income       = { 'n3':1.0, 'n4':2.0, 'n5':3.0, 'n6':4.0 }
avg_income['n2'] = ( avg_income['n5'] + avg_income['n6'] ) / 2.0
avg_income['n1'] = ( avg_income['n3'] + avg_income['n4'] ) / 2.0
avg_income['n0'] = ( avg_income['n1'] + avg_income['n2'] ) / 2.0

alpha¶

We use alpha for the rate_value covariate multiplier which multiplies income. This multiplier affects the value of iota. The true value for alpha (used when simulating the data) is

alpha_true = - 0.2

gamma¶

We use gamma for the meas_noise covariate multiplier which multiplies one. This multiplier adds to the noise level for prevalence in log space, because the density for the prevalence data is log Gaussian.

Random Effects¶

For each node, there is a random effect on iota that is constant in age and time. Note that each leaf node has the random effect for the node above it as well as its own random effect.

s_n¶

We use s_n to denote the sum of the random effects for node n. The code below sets this sum using the name sum_random:

size_level1      = 0.2
size_level2      = 0.2
sum_random       = { 'n0': 0.0, 'n1': size_level1, 'n2': -size_level1 }
sum_random['n3'] = sum_random['n1'] + size_level2
sum_random['n4'] = sum_random['n1'] - size_level2
sum_random['n5'] = sum_random['n2'] + size_level2
sum_random['n6'] = sum_random['n2'] - size_level2
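
To make the meaning of "sum of the random effects" concrete, the sketch below rebuilds the same sums from one random effect per node by adding up the effects on the path from the node to the root. The names parent, own_effect, and sum_effects are hypothetical and only illustrate the bookkeeping; they are not part of the example source:

```python
size_level1 = 0.2
size_level2 = 0.2

# Hypothetical parent map and per-node random effects (illustration only).
parent = {'n1': 'n0', 'n2': 'n0',
          'n3': 'n1', 'n4': 'n1', 'n5': 'n2', 'n6': 'n2'}
own_effect = {
    'n0': 0.0,
    'n1': +size_level1, 'n2': -size_level1,
    'n3': +size_level2, 'n4': -size_level2,
    'n5': +size_level2, 'n6': -size_level2,
}

def sum_effects(node):
    # Add the node's own effect and the effects of all its ancestors.
    total = own_effect[node]
    while node in parent:
        node = parent[node]
        total += own_effect[node]
    return total

sum_random = {node: sum_effects(node) for node in sorted(own_effect)}
for node, s_n in sum_random.items():
    print(node, s_n)
```

In this decomposition the effects of the two children of any node cancel, and the leaf sums are approximately 0.4, 0.0, 0.0, and -0.4 for n3 through n6.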

Simulated Data¶

Random Seed¶

The random seed can be used to reproduce results. If the original value of this setting is zero, the clock is used to get a random seed. The actual value of random_seed is always printed.

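import random
import time

# Uncomment a fixed value to reproduce a previous run.
# random_seed = 1629371067
random_seed = 0
if random_seed == 0 :
    # zero means choose the seed from the clock
    random_seed = int( time.time() )
random.seed(random_seed)
print('prevalence2iota: random_seed = ', random_seed)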

rate_true(rate, a, t, n, c)¶

For rate equal to iota and omega, this is the true value for rate in node n at age a, time t, and covariate values c. The covariate values are a list in the same order as the covariate table. The values t and c[1] are not used by this function for this example.

def rate_true(rate, a, t, n, c) :
    income = c[0]
    one    = c[1]
    s_n    = sum_random[n]
    r_0    = avg_income['n0']
    r_n    = avg_income[n]
    effect = s_n + alpha_true * ( income - r_0 )
    if rate == 'iota' :
        return (1 + a / 100) * 1e-3 * math.exp(effect)
    if rate == 'omega' :
        return (1 + a / 100) * 1e-2 * math.exp(effect)
    return 0.0
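
As a worked evaluation, the sketch below calls a compact copy of rate_true for each leaf node at age 50, with income equal to that node's average income. The sum_random values are written out as literals and the name iota_leaf is hypothetical; everything else follows the definitions above:

```python
import math

alpha_true = -0.2
avg_income = {'n3': 1.0, 'n4': 2.0, 'n5': 3.0, 'n6': 4.0}
avg_income['n2'] = (avg_income['n5'] + avg_income['n6']) / 2.0
avg_income['n1'] = (avg_income['n3'] + avg_income['n4']) / 2.0
avg_income['n0'] = (avg_income['n1'] + avg_income['n2']) / 2.0
sum_random = {'n0': 0.0, 'n1': 0.2, 'n2': -0.2,
              'n3': 0.4, 'n4': 0.0, 'n5': 0.0, 'n6': -0.4}

def rate_true(rate, a, t, n, c):
    # Same formula as the function above; t and c[1] are not used.
    effect = sum_random[n] + alpha_true * (c[0] - avg_income['n0'])
    if rate == 'iota':
        return (1 + a / 100) * 1e-3 * math.exp(effect)
    if rate == 'omega':
        return (1 + a / 100) * 1e-2 * math.exp(effect)
    return 0.0

# iota at age 50 for each leaf node, at that node's average income
iota_leaf = dict()
for node in ['n3', 'n4', 'n5', 'n6']:
    c = [avg_income[node], None]
    iota_leaf[node] = rate_true('iota', 50.0, None, node, c)
    print(node, iota_leaf[node])
```

With these settings the total effect (random effects plus the income term) is about 0.7, 0.1, -0.1, and -0.7 for n3 through n6, so the true iota decreases from n3 to n6. For the same arguments, omega is ten times iota.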

y_i¶

The only simulated integrand for this example is prevalence. This data is simulated without any noise but it is modeled as having noise.
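
The following sketch shows what noiseless prevalence looks like for these rates. It assumes the usual two-compartment model (susceptible S and with-condition C) with remission and excess mortality equal to zero and zero prevalence at age zero. Under those assumptions omega cancels and prevalence P satisfies dP/da = iota * (1 - P), so P(a) = 1 - exp( - integral of iota from 0 to a ). The function names are hypothetical, and the example source relies on dismod_at rather than this formula to compute the data:

```python
import math

def iota_of_age(a, effect=0.0):
    # Same age dependence as rate_true('iota', ...) in this example.
    return (1.0 + a / 100.0) * 1e-3 * math.exp(effect)

def prevalence_closed_form(a, effect=0.0):
    # Integral of iota from 0 to a is 1e-3 * exp(effect) * (a + a^2 / 200).
    integral = 1e-3 * math.exp(effect) * (a + a * a / 200.0)
    return 1.0 - math.exp(-integral)

def prevalence_ode(a_end, effect=0.0, n_step=2000):
    # Runge-Kutta (RK4) solution of the assumed S, C equations:
    #   S' = -(iota + omega) * S,   C' = iota * S - omega * C
    def deriv(a, S, C):
        iota = iota_of_age(a, effect)
        omega = 10.0 * iota  # omega is ten times iota in this example
        return -(iota + omega) * S, iota * S - omega * C
    h = a_end / n_step
    a, S, C = 0.0, 1.0, 0.0
    for _ in range(n_step):
        k1 = deriv(a, S, C)
        k2 = deriv(a + h / 2, S + h / 2 * k1[0], C + h / 2 * k1[1])
        k3 = deriv(a + h / 2, S + h / 2 * k2[0], C + h / 2 * k2[1])
        k4 = deriv(a + h, S + h * k3[0], C + h * k3[1])
        S += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        C += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        a += h
    return C / (S + C)

for age in [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]:
    print(age, prevalence_closed_form(age), prevalence_ode(age))
```

The two columns should agree closely, which illustrates why prevalence data alone carries information about iota here even though omega is non-zero.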

n_i¶

Data is only simulated for the leaf nodes; i.e., each n_i is in the set { n3, n4, n5, n6 }. Since the data does not have any noise, the data residuals are a measure of how good the fit is for the nodes in the fit_goal_set.

a_i¶

For each leaf node, data is generated on the following age_grid:

age_grid = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0 ]

I_i¶

For each leaf node and each age in age_grid, data is generated for the following income_grid:

random_income = False
income_grid   = dict()
for node in [ 'n3', 'n4', 'n5', 'n6' ] :
    max_income  = 2.0 * avg_income[node]
    if random_income :
        n_income_grid = 10
        income_grid[node] = \
            [ random.uniform(0.0, max_income) for j in range(n_income_grid) ]
        income_grid[node] = sorted( income_grid[node] )
    else :
        n_income_grid = 3
        d_income_grid = max_income / (n_income_grid - 1)
        income_grid[node] = [ j * d_income_grid for j in range(n_income_grid) ]

Note that the check of the fit for the nodes in the fit_goal_set expects much more accuracy when the income grid is not chosen randomly.
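
For reference, the sketch below evaluates the non-random branch of the code above (random_income equal to False) and prints the resulting grid for each leaf node. The name grid_mean is hypothetical and is only used to show that each grid is centered on the node's average income:

```python
avg_income = {'n3': 1.0, 'n4': 2.0, 'n5': 3.0, 'n6': 4.0}

n_income_grid = 3
income_grid = dict()
grid_mean = dict()
for node in ['n3', 'n4', 'n5', 'n6']:
    # The grid runs from zero to twice the node's average income.
    max_income = 2.0 * avg_income[node]
    d_income_grid = max_income / (n_income_grid - 1)
    income_grid[node] = [j * d_income_grid for j in range(n_income_grid)]
    grid_mean[node] = sum(income_grid[node]) / n_income_grid
    print(node, income_grid[node], grid_mean[node])
```

The grids are [0, 1, 2], [0, 2, 4], [0, 3, 6], and [0, 4, 8] for n3 through n6, and the mean of each grid equals avg_income for that node.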

Omega Constraints¶

The omega_constraint routine is used to set the value of omega in the parent and child nodes.

Parent Rate Smoothing¶

iota¶

This is the smoothing used in the fit_node model for iota. Note that the value part of this smoothing is only used for the root_node. This smoothing uses the age_grid and one time point. There are no dtime priors because there is only one time point.

Value Prior¶

The following is the value prior used for the root_node:

    covariate_list = [ avg_income['n0'], None ]
    iota_50        = rate_true('iota', 50.0, None, 'n0', covariate_list)

        {   'name':    'parent_value_prior',
            'density': 'gaussian',
            'lower':   iota_50 / 10.0,
            'upper':   iota_50 * 10.0,
            'mean':    iota_50,
            'std':     iota_50 * 10.0,
            'eta':     iota_50 * 1e-3,
        }

The mean and standard deviation are only used for the root_node. The create_shift_db routine replaces them for other nodes.
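
To see the numbers behind this prior, the sketch below evaluates iota_50 directly. At the root node the sum of random effects is zero and the income equals its reference, so the effect in rate_true is zero and iota_50 reduces to (1 + 50 / 100) * 1e-3. The dictionary is a plain Python copy of the prior for illustration and is not passed to dismod_at here:

```python
import math

# Effect is zero at n0 when income equals avg_income['n0'].
effect = 0.0
iota_50 = (1 + 50.0 / 100) * 1e-3 * math.exp(effect)

parent_value_prior = {
    'name':    'parent_value_prior',
    'density': 'gaussian',
    'lower':   iota_50 / 10.0,
    'upper':   iota_50 * 10.0,
    'mean':    iota_50,
    'std':     iota_50 * 10.0,
    'eta':     iota_50 * 1e-3,
}
for key, value in parent_value_prior.items():
    print(key, value)
```

The mean is about 1.5e-3, the limits span one order of magnitude on either side of it, and the standard deviation is ten times the mean, so within the limits this prior is weak.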

dage Prior¶

The following is the dage prior used for the fit_node:

        {   'name':    'prior_iota_dage',
            'density': 'log_gaussian',
            'mean':    0.0,
            'std':     4.0,
            'eta':     iota_50 * 1e-3,
        }

Child Rate Smoothing¶

This is the smoothing used for the random effect for each child of the fit_node. There are no dage or dtime priors because there is only one age and one time point in this smoothing.

Value Prior¶

The following is the value prior used for the children of the fit_node:

        {   'name':    'child_value_prior',
            'density': 'gaussian',
            'mean':    0.0,
            'std':     1.0,
        }

Alpha Smoothing¶

This is the smoothing used for alpha which multiplies the income covariate. There is only one age and one time point in this smoothing so it does not have dage or dtime priors.

Value Prior¶

The following is the value prior used for this smoothing:

        {   'name':    'alpha_value_prior',
            'density': 'gaussian',
            'lower':   - 10 * abs(alpha_true),
            'upper':   + 10 * abs(alpha_true),
            'std':     + 10 * abs(alpha_true),
            'mean':    0.0,
        }

The mean and standard deviation are only used for the root_node. The create_shift_db routine replaces them for other nodes.

Gamma Smoothing¶

This is the smoothing used for gamma which multiplies the one covariate. There is only one age and one time point in this smoothing so it does not have dage or dtime priors. In addition, the value prior has upper and lower limits equal to the constant returned by the lambda function in this smoothing:

    fun = lambda a, t : (1.0, None, None)
    smooth_table.append({
        'name':       'gamma_smooth',
        'age_id':     [0],
        'time_id':    [0],
        'fun':        fun
    })

Checking The Fit¶

The results of the fit are checked by check_cascade_node using the avgint_table that was created by the root_node_db routine. The node_id for each row is replaced by the node_id for the fit being checked. That routine uses these tables to check the fit against the truth.

Child	Title
prevalence2iota.py	prevalence2iota: Python Source Code