csv.root_node_sex¶

Start Fitting at a Particular Node and Sex¶

csv_file¶

This dictionary is used to hold the data corresponding to the csv files for this example:

csv_file = dict()

node.csv¶

csv_file['node.csv'] = \
'''node_name,parent_name
n0,
n1,n0
n2,n1
n3,n1
'''

The following is a diagram of this node tree:

        n0
        |
        n1
       /  \
     n2    n3
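As a minimal sketch (not part of at_cascade), the parent and child structure implied by node.csv can be recovered with Python's standard csv module. The names node_csv and children_of are hypothetical and are introduced here only for illustration.

```python
import csv
import io

# Same text as csv_file['node.csv'] above, copied so this sketch is self-contained.
node_csv = '''node_name,parent_name
n0,
n1,n0
n2,n1
n3,n1
'''

def children_of(text):
    """Return a dict mapping each node name to the list of its child node names."""
    children = dict()
    for row in csv.DictReader(io.StringIO(text)):
        # every node gets an entry, even if it has no children
        children.setdefault(row['node_name'], list())
        parent = row['parent_name']
        if parent != '':
            children.setdefault(parent, list()).append(row['node_name'])
    return children

children = children_of(node_csv)
print(children)
```

With n1 as the root node for the fit, the only nodes below the root are n2 and n3, which are also the goal nodes in this example.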

option_fit.csv¶

This example uses the default value for all the options in option_fit.csv except for:

root_node_name is n1 and root_node_sex is female.
refit_split is set to false.
tolerance_fixed is set to 1e-8.
random_seed is chosen using the Python time module.

csv_file['option_fit.csv']  = \
'''name,value
root_node_name,n1
root_node_sex,female
refit_split,false
tolerance_fixed,1e-8
'''
import time
random_seed    = str( int( time.time() ) )
csv_file['option_fit.csv'] += f'random_seed,{random_seed}\n'
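The following sketch shows, under the assumption that the file is read back with Python's standard csv module rather than by at_cascade, that appending the random_seed line produces one more name,value row. The names option_fit_csv and option are hypothetical.

```python
import csv
import io
import time

# Same text as csv_file['option_fit.csv'] above, copied so this sketch is self-contained.
option_fit_csv = '''name,value
root_node_name,n1
root_node_sex,female
refit_split,false
tolerance_fixed,1e-8
'''

# The seed is the current time in whole seconds, so it changes from run to run.
random_seed     = str( int( time.time() ) )
option_fit_csv += f'random_seed,{random_seed}\n'

# Map each option name to its value (all values are strings at this point).
option = { row['name'] : row['value']
    for row in csv.DictReader( io.StringIO(option_fit_csv) ) }
print(option)
```

Because the seed depends on the clock, two runs of the example normally use different random samples; fixing random_seed to a literal value would make a run repeatable.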

option_predict.csv¶

This example uses the default value for all the options in option_predict.csv.

csv_file['option_predict.csv']  = 'name,value\n'

covariate.csv¶

This example has one covariate called haqi. Other cause mortality, omega, is constant and equal to 0.02. The value of haqi depends on the node and sex.

csv_file['covariate.csv'] = \
'''node_name,sex,age,time,omega,haqi
n0,female,50,2000,0.02,1.0
n1,female,50,2000,0.02,1.0
n2,female,50,2000,0.02,0.5
n3,female,50,2000,0.02,1.5
n0,male,50,2000,0.02,1.2
n1,male,50,2000,0.02,1.2
n2,male,50,2000,0.02,0.7
n3,male,50,2000,0.02,1.7
'''
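The Source Code section of this example averages haqi for each (node, sex) pair and then averages those values to get haqi_avg. The sketch below mirrors that calculation using Python's standard csv module in place of at_cascade.csv.read_table; the names covariate_csv and average_haqi are hypothetical.

```python
import csv
import io

# Same text as csv_file['covariate.csv'] above, copied so this sketch is self-contained.
covariate_csv = '''node_name,sex,age,time,omega,haqi
n0,female,50,2000,0.02,1.0
n1,female,50,2000,0.02,1.0
n2,female,50,2000,0.02,0.5
n3,female,50,2000,0.02,1.5
n0,male,50,2000,0.02,1.2
n1,male,50,2000,0.02,1.2
n2,male,50,2000,0.02,0.7
n3,male,50,2000,0.02,1.7
'''

def average_haqi(text):
    """Return (node_sex2haqi, haqi_avg) for the covariate table in text."""
    # collect every haqi value for each (node_name, sex) pair
    values = dict()
    for row in csv.DictReader(io.StringIO(text)):
        key = (row['node_name'], row['sex'])
        values.setdefault(key, list()).append( float(row['haqi']) )
    # average within each pair, then average over the pairs
    node_sex2haqi = { key : sum(v) / len(v) for key, v in values.items() }
    haqi_avg      = sum( node_sex2haqi.values() ) / len(node_sex2haqi)
    return node_sex2haqi, haqi_avg

node_sex2haqi, haqi_avg = average_haqi(covariate_csv)
print(haqi_avg)
```

For this table there is one row per pair, so haqi_avg is the plain mean of the eight haqi values, which is 1.1 up to floating point rounding.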

fit_goal.csv¶

The goal is to fit the model for nodes n2 and n3.

csv_file['fit_goal.csv'] = \
'''node_name
n2
n3
'''

predict_integrand.csv¶

For this example we want to know the values of Sincidence and prevalence for each of the goal nodes. (Note that Sincidence is a direct measurement of iota.)

csv_file['predict_integrand.csv'] = \
'''integrand_name
Sincidence
prevalence
'''

prior.csv¶

We define three priors:

uniform_-1_1	a uniform distribution on [ -1, 1 ]
uniform_eps_1	a uniform distribution on [ 1e-6, 1 ]
gauss_01	a mean 0 standard deviation 1 Gaussian distribution

csv_file['prior.csv'] = \
'''name,lower,upper,mean,std,density
uniform_-1_1,-1.0,1.0,0.5,1.0,uniform
uniform_eps_1,1e-6,1.0,0.5,1.0,uniform
gauss_01,,,0.0,1.0,gaussian
'''

parent_rate.csv¶

The only non-zero rates are omega and iota (omega is known and specified by the covariate.csv file). The model for iota is constant with respect to age and time. Its value prior is uniform_eps_1. It does not have any dage or dtime priors because it is constant, so there are no age or time differences between grid values.

csv_file['parent_rate.csv'] = \
'''rate_name,age,time,value_prior,dage_prior,dtime_prior,const_value
iota,0.0,0.0,uniform_eps_1,,,
'''

child_rate.csv¶

The child rates are random effects that represent the difference between the rate for a node being fit and the rate for one of its child nodes. These random effects are different for each child node. They are constant in age and time, so age and time do not appear in child_rate.csv. In this example, when fitting n1, the child nodes are n2 and n3. When fitting n2 and n3, there are no child nodes (no random effects). Our prior for the random effects is gauss_01.

csv_file['child_rate.csv'] = \
'''rate_name,value_prior
iota,gauss_01
'''

mulcov.csv¶

There is one covariate multiplier; it multiplies haqi and affects iota. The root level prior for this multiplier is either uniform_-1_1 or constant and equal to the true value.

true_mulcov_haqi           = 0.5
root_mulcov_prior_constant = True
csv_file['mulcov.csv']  = 'covariate,type,effected,value_prior,const_value\n'
if root_mulcov_prior_constant :
    csv_file['mulcov.csv'] += f'haqi,rate_value,iota,,{true_mulcov_haqi}\n'
else :
    csv_file['mulcov.csv'] += 'haqi,rate_value,iota,uniform_-1_1,\n'

root_mulcov_prior_constant¶

If the root level prior is not constant (that is, uniform on [-1, +1]), the covariate multiplier is frozen (constant) for the n2 and n3 fits using the value found by the n1 fit, which is the root level fit in this example. Hence the prior for the n2 and n3 fits has the covariate multiplier constant. On the other hand, the n2 and n3 fit priors for iota have random variation due to the root level fit for the covariate multiplier not being constant.

data_in.csv¶

The data_in.csv file has one point for each combination of node and sex, for the nodes n1, n2, and n3. The integrand is Sincidence (a direct measurement of iota). The age intervals do not really matter because the true iota for this example is constant. The measurement standard deviation is 1e-4 during the fitting. None of the data is held out. The zero values in the meas_value column are placeholders; they are replaced as described after the table.

header  = 'data_id, integrand_name, node_name, sex, age_lower, age_upper, '
header += 'time_lower, time_upper, meas_value, meas_std, hold_out, '
header += 'density_name, eta, nu'
csv_file['data_in.csv'] = header + \
'''
0, Sincidence, n1, female, 0,  10, 1990, 2000, 0.0000,  1e-4, 0, gaussian, ,
1, Sincidence, n1, male,   0,  10, 1990, 2000, 0.0000,  1e-4, 0, gaussian, ,
2, Sincidence, n2, female, 10, 20, 2000, 2010, 0.0000,  1e-4, 0, gaussian, ,
3, Sincidence, n2, male,   10, 20, 2000, 2010, 0.0000,  1e-4, 0, gaussian, ,
4, Sincidence, n3, female, 20, 30, 2010, 2020, 0.0000,  1e-4, 0, gaussian, ,
5, Sincidence, n3, male,   20, 30, 2010, 2020, 0.0000,  1e-4, 0, gaussian, ,
'''
csv_file['data_in.csv'] = csv_file['data_in.csv'].replace(' ', '')
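The replace(' ', '') call above removes the spaces that were used to line up the columns. The sketch below illustrates why that matters when the text is parsed with Python's standard csv module: without the replacement, the column names and values would keep their leading spaces. The table here is a shortened, hypothetical subset of the data_in.csv columns, and the names text and rows are introduced only for illustration.

```python
import csv
import io

# A shortened, hypothetical subset of the data_in.csv columns and rows.
header = 'data_id, integrand_name, node_name, sex, meas_value, meas_std'
text   = header + '''
0, Sincidence, n1, female, 0.0000,  1e-4
1, Sincidence, n1, male,   0.0000,  1e-4
'''

# Parse before removing spaces: every key except the first has a leading space.
padded_rows = list( csv.DictReader( io.StringIO(text) ) )

# Remove the alignment spaces, as in the example above, and parse again.
text = text.replace(' ', '')
rows = list( csv.DictReader( io.StringIO(text) ) )

print( list( padded_rows[0].keys() ) )
print( list( rows[0].keys() ) )
```

This approach is only safe because none of the field values in this table contain spaces that need to be preserved.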

The meas_value entries of 0.0000 above are replaced by the true value of iota, with no measurement noise, even though the measurement standard deviation is modeled as 1e-4. See the following code:

        haqi              = node_sex2haqi[(node_name, sex)]
        effect            = true_mulcov_haqi * (haqi - haqi_avg)
        iota              = math.exp(effect) * no_effect_iota
        row['meas_value'] = float_format.format( iota )
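As a worked illustration of the formula above, the sketch below evaluates the true iota for the female rows using the values from this example: true_mulcov_haqi is 0.5, no_effect_iota is 0.1 (set in the source code), and haqi_avg is 1.1 (the mean of the eight haqi values in covariate.csv). The helper name true_iota is hypothetical.

```python
import math

true_mulcov_haqi = 0.5   # covariate multiplier used to simulate the data
no_effect_iota   = 0.1   # iota when haqi is equal to haqi_avg
haqi_avg         = 1.1   # mean of the eight haqi values in covariate.csv

def true_iota(haqi):
    """Return the true iota for a given haqi, using the formula in the excerpt above."""
    effect = true_mulcov_haqi * (haqi - haqi_avg)
    return math.exp(effect) * no_effect_iota

# haqi values for the female rows of covariate.csv
for node_name, haqi in [ ('n1', 1.0), ('n2', 0.5), ('n3', 1.5) ] :
    print( node_name, 'female', '{0:.5g}'.format( true_iota(haqi) ) )
```

A haqi below the average gives a negative effect and an iota below 0.1; a haqi above the average gives an iota above 0.1.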

Source Code¶

import math
import at_cascade
#
# main
def main() :
    #
    # fit_dir
    fit_dir = 'build/example/csv'
    at_cascade.empty_directory(fit_dir)
    #
    # write csv files
    for name in csv_file :
        file_name = f'{fit_dir}/{name}'
        with open(file_name, 'w') as file_ptr :
            file_ptr.write( csv_file[name] )
    #
    # node_sex2haqi
    file_name  = f'{fit_dir}/covariate.csv'
    table      = at_cascade.csv.read_table( file_name )
    node_sex2haqi = dict()
    for row in table :
        node_name = row['node_name']
        sex       = row['sex']
        haqi      = float(  row['haqi'] )
        key       = (node_name, sex)
        if key not in node_sex2haqi :
            node_sex2haqi[key] = list()
        node_sex2haqi[key].append( haqi )
    #
    # haqi_avg
    haqi_sum = 0.0
    for key in node_sex2haqi :
        value              = node_sex2haqi[key]
        node_sex2haqi[key] = sum(value) / len(value)
        haqi_sum          += node_sex2haqi[key]
    haqi_avg = haqi_sum / len(node_sex2haqi)
    #
    # data_in.csv
    float_format      = '{0:.5g}'
    no_effect_iota    = 0.1
    file_name         = f'{fit_dir}/data_in.csv'
    table             = at_cascade.csv.read_table( file_name )
    for row in table :
        node_name      = row['node_name']
        sex            = row['sex']
        integrand_name = row['integrand_name']
        assert integrand_name == 'Sincidence'
        #
        # BEGIN_MEAS_VALUE
        haqi              = node_sex2haqi[(node_name, sex)]
        effect            = true_mulcov_haqi * (haqi - haqi_avg)
        iota              = math.exp(effect) * no_effect_iota
        row['meas_value'] = float_format.format( iota )
        # END_MEAS_VALUE
    at_cascade.csv.write_table(file_name, table)
    #
    # fit
    at_cascade.csv.fit(fit_dir)
    #
    # predict
    at_cascade.csv.predict(fit_dir)
    #
    # prefix
    for prefix in [ 'fit' , 'sam' ] :
        #
        # predict_table
        file_name = f'{fit_dir}/{prefix}_predict.csv'
        predict_table = at_cascade.csv.read_table(file_name)
        #
        # node
        for node in [ 'n1', 'n2', 'n3' ] :
            # sex
            for sex in [ 'female', 'both', 'male' ] :
                #
                # sample_list
                sample_list = list()
                for row in predict_table :
                    include = True
                    include = include and row['integrand_name'] == 'Sincidence'
                    include = include and row['node_name'] == node
                    include = include and row['sex'] == sex
                    if not root_mulcov_prior_constant :
                        # Do not include the samples corresponding to priors
                        include = include and row['fit_node_name'] == node
                        include = include and row['fit_sex'] == sex
                    if include :
                        sample_list.append(row)
                if sex != 'female' :
                    assert len(sample_list) == 0
                else :
                    assert len(sample_list) > 0
                #
                if len(sample_list) > 0 :
                    sum_avgint = 0.0
                    for row in sample_list :
                        sum_avgint   += float( row['avg_integrand'] )
                    avgint    = sum_avgint / len(sample_list)
                    haqi      = float( row['haqi'] )
                    effect    = true_mulcov_haqi * (haqi - haqi_avg)
                    iota      = math.exp(effect) * no_effect_iota
                    rel_error = (avgint - iota) / iota
                    #
                    assert haqi == node_sex2haqi[ (node, sex) ]
                    if False :
                        print( node, prefix, rel_error )
                    else :
                        if prefix == 'sam' :
                            assert abs(rel_error) < 1e-2
                        else :
                            assert abs(rel_error) < 1e-4
#
if __name__ == '__main__' :
    main()
    print('root_node_sex.py: OK')