split_covariate

Example Using split_reference Table

This example splits the analysis by sex. To simplify the example everything is constant w.r.t. age and time.

Nodes

The following is a diagram of the node tree for this example. The root_node is n0, and both the fit_goal_set and the set of leaf nodes are {n3, n4, n5, n6}:

               n0
       /-----/\-----\
    n1              n2
   /  \            /  \
n3    n4        n5    n6

fit_goal_set

fit_goal_set = { 'n3', 'n4', 'n5', 'n6' }

Rates

The only non-zero dismod_at rates for this example are iota and omega.

Splitting Covariate

This cascade is set up to split by the sex covariate at level zero. Both the refit_split true and false cases are run. The option_all_table for this example is:

option_all            = {
    'result_dir':                 'build/example',
    'root_node_name':             'n0',
    'root_split_reference_name':  'both',
    'split_covariate_name':       'sex',
    'shift_prior_std_factor':      1e3,
}
option_all['root_database'] = option_all['result_dir'] + '/root.db'
if refit_split :
    option_all['refit_split'] = 'true'
else :
    option_all['refit_split'] = 'false'

The split_reference_table for this example is:

split_reference_table = [
    {'split_reference_name': 'female', 'split_reference_value': 1.0},
    {'split_reference_name': 'both',   'split_reference_value': 2.0},
    {'split_reference_name': 'male',   'split_reference_value': 3.0},
]
split_reference_list = list()
for row in split_reference_table :
    split_reference_list.append( row['split_reference_value'] )

The node_split_table for this example is:

node_split_table = [ { 'node_name' :   'n0'} ]

Note that we have used node_name (instead of node_id) here and let create_all_node_db do the conversion to node_id. The cascade computation tree is:

               /-------------n0-------------\
       /---female---\                /----male----\
      n1             n2            n1              n2
   /  \            /  \          /  \            /  \
n3    n4        n5    n6      n3    n4        n5    n6
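
One way to read this diagram is as a list of (node, sex) fits. The following sketch enumerates them from the node tree and the split at n0. The names `node_children`, `jobs_below`, and `job_list` are hypothetical and are not part of the example source; the listing only mirrors the diagram and is not the scheduling logic of the cascade itself.

```python
# Node tree from the Nodes section: parent name -> list of child names.
node_children = {'n0': ['n1', 'n2'], 'n1': ['n3', 'n4'], 'n2': ['n5', 'n6']}

def jobs_below(node, sex):
    # hypothetical helper: every (node, sex) fit strictly below node
    jobs = []
    for child in node_children.get(node, []):
        jobs.append((child, sex))
        jobs.extend(jobs_below(child, sex))
    return jobs

# The root is fit with sex 'both'; below the split each sex has its own subtree.
job_list = [('n0', 'both')]
for sex in ('female', 'male'):
    job_list.extend(jobs_below('n0', sex))

print(len(job_list))   # 13: the root plus six nodes for each sex
print(job_list[:4])
```

Under these assumptions the first four entries are n0 with both sexes, followed by n1, n3, and n4 for female.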

The sex reference value for the root node (n0) corresponds to both sexes:

root_split_reference_id = 1
assert  \
split_reference_table[root_split_reference_id]['split_reference_name']=='both'

Covariate

There are two covariates for this example, sex and income. Income is the only Relative Covariate:

avg_income = dict()
leaf_node_set     = { 3, 4, 5, 6 }
for node_id in leaf_node_set :
    node_name = 'n' + str(node_id)
    avg_income[node_name] = [ 1.0 - node_id / 10.0, 1.0, 1.0 + node_id / 10.0 ]
# child_list
# children of node 0, 1, 2 in that order
child_list = [ (1,2), (3,4), (5,6) ]
for node_id in [2, 1, 0] :
    avg_list = list()
    for split_reference_id in range(3) :
        avg = 0.0
        for child_id in child_list[node_id] :
            child_name = 'n' + str(child_id)
            avg += avg_income[child_name][split_reference_id]
        avg = avg / len( child_list[node_id] )
        avg_list.append( avg )
    node_name = 'n' + str(node_id)
    #
    avg_income[node_name] = avg_list
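
As a check on the averaging above, the following minimal sketch recomputes the reference incomes and prints the values for the interior nodes. The names `leaf_income`, `children`, and `average_children` are hypothetical and are not part of the example source; the leaf values and the tree are copied from the code above.

```python
# Leaf reference incomes, indexed by split_reference_id (female, both, male).
leaf_income = {
    'n' + str(i): [1.0 - i / 10.0, 1.0, 1.0 + i / 10.0] for i in (3, 4, 5, 6)
}
children = {'n1': ('n3', 'n4'), 'n2': ('n5', 'n6'), 'n0': ('n1', 'n2')}

def average_children(income, children):
    # hypothetical helper: a parent's income is the mean of its children's
    result = dict(income)
    for parent in ('n2', 'n1', 'n0'):   # children are done before parents
        kids = children[parent]
        result[parent] = [
            sum(result[kid][k] for kid in kids) / len(kids) for k in range(3)
        ]
    return result

avg_income_check = average_children(leaf_income, children)
for name in ('n1', 'n2', 'n0'):
    print(name, [round(v, 2) for v in avg_income_check[name]])
# n1 [0.65, 1.0, 1.35]
# n2 [0.45, 1.0, 1.55]
# n0 [0.55, 1.0, 1.45]
```

The middle entry is 1.0 at every node because the reference income for both sexes is 1.0 at every leaf.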

alpha

We use alpha for the rate_value covariate multiplier that multiplies income. This multiplier affects the value of iota. The true value for alpha (used when simulating the data) is

alpha_true = - 0.2

Random Effects

There are no random effects for this example.

Simulated Data

rate_true(rate, a, t, n, c)

For rate equal to iota or omega, this is the true value for rate in node n at age a, time t, and covariate values c=[sex,income]. The covariate values are a list in the same order as the covariate table. The values a and t are not used by this function for this example.

from math import exp
#
def rate_true(rate, a, t, n, c) :
    # true_iota
    true_iota = {
        'n3' : 1e-2,
        'n4' : 2e-2,
        'n5' : 3e-2,
        'n6' : 4e-2
    }
    true_iota['n1'] = (true_iota['n3'] + true_iota['n4']) / 2.9
    true_iota['n2'] = (true_iota['n5'] + true_iota['n6']) / 2.9
    true_iota['n0'] = (true_iota['n1'] + true_iota['n2']) / 2.9
    #
    # effect
    sex    = c[0]
    income = c[1]
    #
    # split_reference_id
    split_reference_id = None
    for (row_id, row) in enumerate(split_reference_table) :
        if row['split_reference_value'] == sex :
            split_reference_id = row_id
    #
    r_income = avg_income[n][split_reference_id]
    effect   = alpha_true * ( income - r_income )
    #
    if rate == 'iota' :
        return true_iota[n] * exp(effect)
    if rate == 'omega' :
        return 2.0 * true_iota[n] * exp(effect)
    return 0.0
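
To illustrate how the income effect enters, here is a minimal stand-alone sketch of the same formula for a single node. The function name `iota_with_income` and the sample incomes are illustrative assumptions; only alpha_true, the n3 value of iota, and the n3 female reference income (1.0 - 3 / 10.0 = 0.7) come from this example.

```python
from math import exp, isclose

alpha_true = -0.2     # true covariate multiplier from this example
iota_n3    = 1e-2     # true iota for node n3 at its reference income
r_income   = 0.7      # n3 reference income for female

def iota_with_income(income):
    # hypothetical helper: same formula as rate_true for rate == 'iota'
    effect = alpha_true * (income - r_income)
    return iota_n3 * exp(effect)

# At the reference income the effect is zero, so iota is unchanged.
assert isclose(iota_with_income(0.7), iota_n3)

# One unit of income above the reference scales iota by exp(alpha_true).
assert isclose(iota_with_income(1.7), iota_n3 * exp(-0.2))

# Because alpha_true is negative, higher income lowers iota.
print(iota_with_income(1.7) < iota_n3)
```

In rate_true, omega uses the same effect and is twice the iota value.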

y_i

The only simulated integrand for this example is Sincidence, which is a direct measurement of iota. The data is simulated without any noise; i.e., the i-th measurement is simulated as y_i = rate_true('iota', None, None, n, [sex, I_i]) where n is the node, sex is the sex covariate value, and I_i is the income for the i-th measurement. The data is modeled as having noise even though no noise is simulated.

Cases Simulated

Data is simulated for the leaf nodes and for the female and male sexes; i.e., each n_i is in the set { n3, n4, n5, n6 } and each sex value is either female or male. Since the data does not have any noise, the data residuals are a measure of how good the fit is for the nodes in the fit_goal_set below female and male in the computation tree.
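
The combinations of node and sex described above can be enumerated as in the following sketch. It assumes one case per (node, sex) pair, which is an illustrative simplification; the number of data rows simulated for each pair is not stated here. The names `sex_value` and `cases` are hypothetical; the sex values come from the split_reference_table.

```python
from itertools import product

leaf_nodes = ['n3', 'n4', 'n5', 'n6']
# sex values from split_reference_table; 'both' is not simulated
sex_value = {'female': 1.0, 'male': 3.0}

# one illustrative case per (leaf node, sex) pair
cases = [
    {'node_name': node, 'sex_name': sex, 'sex': sex_value[sex]}
    for node, sex in product(leaf_nodes, sex_value)
]
print(len(cases))   # 8: four leaf nodes times two sexes
```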

Parent Rate Smoothing

This is the iota smoothing used for the fit_node. There are no dage or dtime priors because there is only one age and one time point in the smoothing grid.

Value Prior

The following is the value prior used for the root_node:

{
    'name':    'parent_value_prior',
    'density': 'gaussian',
    'lower':   iota_n0 / 10.0,
    'upper':   iota_n0 * 10.0,
    'mean':    iota_n0,
    'std':     iota_n0 * 10.0,
    'eta':     iota_n0 * 1e-3,
}

The mean and standard deviation are only used for the root_node. The create_shift_db routine replaces them for other nodes.

Alpha Smoothing

This is the smoothing used for alpha which multiplies the income covariate. There is only one age and one time point in this smoothing so it does not have dage or dtime priors.

Value Prior

The following is the value prior used for this smoothing:

{
    'name':    'alpha_value_prior',
    'density': 'gaussian',
    'lower':   - 10 * abs(alpha_true),
    'upper':   + 10 * abs(alpha_true),
    'std':     + 10 * abs(alpha_true),
    'mean':    0.0,
}

The mean and standard deviation are only used for the root_node. The create_shift_db routine replaces them for other nodes.

Checking The Fit

The results of the fit are checked by check_cascade_node using the avgint_table that was created by the root_node_db routine. The node_id for each row is replaced by the node_id for the fit being checked. The check_cascade_node routine then uses these tables to check that fit against the truth.

Child                 Title
split_covariate.py    split_covariate: Python Source Code