csv.no_data¶

Use a No Data Example to Understand Priors¶

csv_file¶

This dictionary is used to hold the data corresponding to the csv files for this example:

csv_file = dict()

node.csv¶

For this example the root node, n0, has no children.

csv_file['node.csv'] = \
'''node_name,parent_name
n0,
'''

option_fit.csv¶

This example uses the default value for the options that are not listed below. For example, refit_split is true so that we get predictions for the female / male split at node n0. (The separate female / male fits will abort because there is no data for either of these cases and they are not the root job case.) The number of samples is huge so that the sample covariance is close to the true covariance (for testing).

Name, Value
tolerance_fixed	this is set small, 1e-8, so we can check accuracy.
random_seed	chosen using current seconds reported by python time package.
number_sample	The number of samples of the posterior for each fit

random_seed    = str( int( time.time() ) )
number_sample  = 10000
csv_file['option_fit.csv']  = 'name,value\n'
csv_file['option_fit.csv'] += 'tolerance_fixed,1e-8\n'
csv_file['option_fit.csv'] += f'random_seed,{random_seed}\n'
csv_file['option_fit.csv'] += f'number_sample,{number_sample}\n'

option_predict.csv¶

This example uses the default value for all the options in option_predict.csv. (Future versions of this example will discuss the predictions.)

csv_file['option_predict.csv'] = 'name,value\n'
csv_file['option_predict.csv'] += 'db2csv,true\n'

covariate.csv¶

This example has no covariates

csv_file['covariate.csv'] = \
'''node_name,sex,age,time,omega
n0,female,0,2000,0.02
n0,female,100,2000,0.02
n0,male,0,2000,0.02
n0,male,100,2000,0.02
'''

fit_goal.csv¶

This example only fits the root node n0.

csv_file['fit_goal.csv'] = \
'''node_name
n0
'''

predict_integrand.csv¶

For this example we want to know the values of Sincidence. (Note that Sincidence is a direct measurement of iota.)

csv_file['predict_integrand.csv'] = \
'''integrand_name
Sincidence
'''

iota_mean¶

This is the mean value of iota in the gauss_eps_1 prior:

iota_mean = 0.01

iota_std¶

This is the standard deviation value of iota in the gauss_eps_1 prior:

iota_std = 0.1

diota_std¶

This is the standard deviation of the iota age difference in gauss_0d prior:

diota_std = 0.2

prior.csv¶

We define three priors:

gauss_eps_1	a Gaussian with lower eps, upper 1, iota_mean, iota_std
gauss_0_d	a Gaussian with mean 0 standard deviation diota_std

csv_file['prior.csv'] = \
    'name,lower,upper,mean,std,density\n' \
    f'gauss_eps_1,1e-6,1.0,{iota_mean},{iota_std},gaussian\n' \
    f'gauss_0_d,,,0.0,{diota_std},gaussian\n'

parent_rate.csv¶

The only non-zero rates are omega and iota (omega is known and specified by the covariate.csv file). The model for iota is linear w.r.t age and constant w.r.t. time. Its value prior is gauss_eps_1 and its dage prior is gauss_0_d. It does not have any dtime priors because there are no time differences between grid values.

csv_file['parent_rate.csv'] = \
'''rate_name,age,time,value_prior,dage_prior,dtime_prior,const_value
iota,0.0,2000,gauss_eps_1,gauss_0_d,,
iota,100.0,2000,gauss_eps_1,gauss_0_d,,
'''

child_rate.csv¶

The are no children (hence no random effects) in this example.

csv_file['child_rate.csv'] = \
'''rate_name,value_prior
'''

mulcov.csv¶

There are no covariate multipliers in this example:

csv_file['mulcov.csv'] = \
'''covariate,type,effected,value_prior,const_value
'''

data_in.csv¶

There is no data in this example

header  = 'data_id,integrand_name,node_name,sex,age_lower,age_upper,'
header += 'time_lower,time_upper,meas_value,meas_std,hold_out,'
header += 'density_name,eta,nu'
csv_file['data_in.csv'] = header + '\n'

Negative Log Likelihood¶

The standard deviation in the difference prior above is one. Consider the root job; i.e., the fit for node n0 and both sexes. Two times the negative log likelihood for this example \(2 L(x)\) is:

\[2 L( x ) = - \log( 2 \pi d^2 ) + \left( \frac{ x_1 - x_0 }{ d } \right)^2 + \sum_{j=0}^1 - \log( 2 \pi \sigma^2 ) + \left( \frac{ x_j - \bar{\iota} }{ \sigma } \right)^2\]

where \(x_0\) is iota at age 0, \(x_1\) is iota at age 100, \(\bar{\iota}\) is iota_mean , \(\sigma\) is iota_std , \(d\) is diota_std ,

Gradient¶

The partial of \(L(x)\) with respect to \(x_j\) is:

\[\frac{ \partial L(x)} { \partial x_j } = \frac{x_j - x_{1-j}}{ d^2 } + \frac{ x_j - \bar{\iota} }{ \sigma^2 }\]

Hessian¶

The second partial of \(L(x)\) with respect to \(x_j\) is:

\[\frac{ \partial^2 L(x)} { \partial x_j^2 } = d^{-2} + \sigma^{-2}\]

The cross partial with respect to \(x_0\) and \(x_1\) is:

\[\frac{ \partial^2 L(x)} { \partial x_0 \partial x_1 } = - d^{-2}\]

The Hessian of \(L(x)\) is:

\[\begin{split}H & = \begin{bmatrix} d^{-2} + \sigma^{-2} & -d^{-2} \\ -d^{-2} & d^{-2} + \sigma^{-2} \end{bmatrix} \\ H & = d^{-2} \begin{bmatrix} 1 + ( d / \sigma )^2 & - 1 \\ - 1 & 1 + ( d / \sigma )^2 \end{bmatrix}\end{split}\]

The determinant of \(d^2 H\) is:

\[\det(d^2 H ) = 2 ( d / \sigma )^2 + ( d / \sigma )^4\]

The inverse of \(d^2 H\) is:

\[\begin{split}d^{-2} H^{-1} & = \frac{1}{ 2 ( d / \sigma )^2 + ( d / \sigma )^4 } \begin{bmatrix} 1 + ( d / \sigma )^2 & 1 \\ 1 & 1 + ( d / \sigma )^2 \end{bmatrix} \\ d^{-2} H^{-1} & = \frac{ ( d / \sigma )^2 }{ 2 + ( d / \sigma )^2 } \begin{bmatrix} 1 + ( d / \sigma )^2 & 1 \\ 1 & 1 + ( d / \sigma )^2 \end{bmatrix}\end{split}\]

It follows that:

\[\begin{split}H^{-1} & = \frac{ \sigma^{-2} }{ 2 + ( d / \sigma )^2 } \begin{bmatrix} 1 + ( d / \sigma )^2 & 1 \\ 1 & 1 + ( d / \sigma )^2 \end{bmatrix} \\ H^{-1} & = \frac{ \sigma^{-2} }{ 2 ( \sigma / d )^{-2} + 1 } \begin{bmatrix} ( \sigma / d )^{-2} + 1 & ( \sigma / d )^{-2} \\ ( \sigma / d )^{-2} & ( \sigma / d )^{-2} + 1 \end{bmatrix}\end{split}\]

Covariance of sam_predict.csv¶

The covariance of the samples in o sam_predict.csv is equal to \(H^{-1}\) . Note that as \(\sigma / d\) (i.e. iota_std / diota_std ) gets small, the variance of the estimates for iota gets close to \(\sigma^2\) (i.e., the square of iota_std ) .

Source Code¶#
# main
def main() :
    #
    # fit_dir
    fit_dir = 'build/example/csv'
    at_cascade.empty_directory(fit_dir)
    #
    # write csv files
    for name in csv_file :
        file_name = f'{fit_dir}/{name}'
        file_ptr  = open(file_name, 'w')
        file_ptr.write( csv_file[name] )
        file_ptr.close()
    #
    # fit
    at_cascade.csv.fit(fit_dir)
    #
    # predict
    at_cascade.csv.predict(fit_dir)
    #
    # fit_predict
    # 3 sexes, 2 ages
    file_name   = f'{fit_dir}/fit_predict.csv'
    fit_predict = at_cascade.csv.read_table(file_name)
    assert len( fit_predict ) == 3 * 2
    #
    # row
    for row in fit_predict :
        # Sincidence is the only integerand in predict_integrand.csv
        assert row['integrand_name'] == 'Sincidence'
        # n0 is the only node in fit_goal.csv and it has no ancestors
        assert row['node_name'] == 'n0'
        # Since there is no data, only the root job (n0, both) is fit
        assert row['fit_sex'] == 'both'
        # Predictions are made for all sexes
        assert row['sex'] in [ 'female', 'male', 'both' ]
        # There are two age values in covariate.csv
        assert float(row['age']) in [ 0.0, 100.0 ]
        # There is only on time value in covariate.csv
        assert float(row['time']) == 2000.0
        #
        # rel_error
        iota      = float( row['avg_integrand'] )
        rel_error = 1.0 - iota / iota_mean
        assert abs( rel_error ) < 1e-8
    #
    # sam_predict
    # 3 sexes, 2 ages, number_sample samples
    file_name   = f'{fit_dir}/sam_predict.csv'
    sam_predict = at_cascade.csv.read_table(file_name)
    assert len( sam_predict ) == len( fit_predict ) * number_sample
    #
    # sample_array
    age2variable_index = { 0.0 : 0, 100.0 : 1 }
    sample_array    = numpy.empty( (number_sample, 2) )
    sample_array[:] = numpy.nan
    for row in sam_predict :
        if row['sex'] == 'both' :
            iota           = float( row['avg_integrand'] )
            sample_index   = int( row['sample_index'] )
            age            = float( row['age'] )
            variable_index = age2variable_index[age]
            sample_array[sample_index , variable_index]  = iota
    #
    # sample_cov
    sample_cov = numpy.cov( sample_array , rowvar = False )
    #
    # Hinv
    d2    = 1.0 / ( diota_std * diota_std )
    s2    = 1.0 / ( iota_std * iota_std )
    H     = [ [ d2 + s2 , - d2 ], [ - d2 , d2 + s2 ] ]
    H     = numpy.array(H)
    Hinv  = numpy.linalg.inv(H)
    #
    rel_error = (Hinv - sample_cov) / Hinv
    assert numpy.max( numpy.abs( rel_error ) ) < 0.1
#
#
if __name__ == '__main__' :
    main()
    print('prior_sam: OK')