csv.dage_prior¶

Remove dage Priors From Child Jobs¶

csv_file¶

This dictionary is used to hold the data corresponding to the csv files for this example:

csv_file = dict()

node.csv¶

For this example the root node, n0, has no children.

csv_file['node.csv'] = \
'''node_name,parent_name
n0,
'''

option_fit.csv¶

This example uses the default value for the options that are not listed below. The number of samples is huge so that the sample covariance is close to the true covariance (for testing). The child_prior_std_factor is one and the child_prior_dage is false, so that the child job iota variables have the same standard deviations as for the parent job.

Name, Value
tolerance_fixed	this is set small, 1e-8, so we can check accuracy.
random_seed	chosen using current seconds reported by python time package.
number_sample	is the number of samples of the posterior for each fit.
child_prior_dage	is false so there are no dage priors in child jobs.
child_prior_std_factor	is one so child job has same std.

random_seed    = str( int( time.time() ) )
number_sample  = 10000
csv_file['option_fit.csv']  = 'name,value\n'
csv_file['option_fit.csv'] += 'tolerance_fixed,1e-8\n'
csv_file['option_fit.csv'] += f'random_seed,{random_seed}\n'
csv_file['option_fit.csv'] += f'number_sample,{number_sample}\n'
csv_file['option_fit.csv'] += 'child_prior_dage,false\n'
csv_file['option_fit.csv'] += 'child_prior_std_factor,1.0\n'

option_predict.csv¶

This example uses the default value for all the options in option_predict.csv. (Future versions of this example will discuss the predictions.)

csv_file['option_predict.csv'] = 'name,value\n'
csv_file['option_predict.csv'] += 'db2csv,true\n'

covariate.csv¶

This example has no covariates

csv_file['covariate.csv'] = \
'''node_name,sex,age,time,omega
n0,female,0,2000,0.02
n0,female,100,2000,0.02
n0,male,0,2000,0.02
n0,male,100,2000,0.02
'''

fit_goal.csv¶

This example only fits the root node n0.

csv_file['fit_goal.csv'] = \
'''node_name
n0
'''

predict_integrand.csv¶

For this example we want to know the values of Sincidence. (Note that Sincidence is a direct measurement of iota.)

csv_file['predict_integrand.csv'] = \
'''integrand_name
Sincidence
'''

iota_mean¶

This is the mean value of iota in the gauss_eps_1 prior:

iota_mean = 0.01

iota_std¶

This is the standard deviation value of iota in the gauss_eps_1 prior:

iota_std = 0.1

diota_std¶

This is the standard deviation of the iota age difference in gauss_0d prior:

diota_std = 0.01

prior.csv¶

We define three priors:

gauss_eps_1	a Gaussian with lower eps, upper 1, iota_mean, iota_std
gauss_0_d	a Gaussian with mean 0 standard deviation diota_std

csv_file['prior.csv'] = \
   'name,lower,upper,mean,std,density\n' \
   f'gauss_eps_1,1e-6,1.0,{iota_mean},{iota_std},gaussian\n' \
   f'gauss_0_d,,,0.0,{diota_std},gaussian\n'

parent_rate.csv¶

The only non-zero rates are omega and iota (omega is known and specified by the covariate.csv file). The model for iota is linear w.r.t age and constant w.r.t. time. Its value prior is gauss_eps_1 and its dage prior is gauss_0_d. It does not have any dtime priors because there are no time differences between grid values.

csv_file['parent_rate.csv'] = \
'''rate_name,age,time,value_prior,dage_prior,dtime_prior,const_value
iota,0.0,2000,gauss_eps_1,gauss_0_d,,
iota,100.0,2000,gauss_eps_1,gauss_0_d,,
'''

child_rate.csv¶

The are no child nodes (hence no random effects) in this example.

csv_file['child_rate.csv'] = \
'''rate_name,value_prior
'''

mulcov.csv¶

There are no covariate multipliers in this example:

csv_file['mulcov.csv'] = \
'''covariate,type,effected,value_prior,const_value
'''

data_in.csv¶

There is one data point for each sex so that the child jobs get fit (otherwise the child fit jobs would abort). The data for are a direct measurement of chi (mtexcess) so it will not affect the value of iota in the fit.

header  = 'data_id,integrand_name,node_name,sex,age_lower,age_upper,'
header += 'time_lower,time_upper,meas_value,meas_std,hold_out,'
header += 'density_name,eta,nu'
csv_file['data_in.csv']  = header +  \
'''
0,mtexcess,n0,female,50,50,2000,2000,0.01,0.01,0,gaussian,,
1,mtexcess,n0,male,50,50,2000,2000,0.01,0.01,0,gaussian,,
'''

Negative Log Likelihood¶

The standard deviation in the difference prior above is one. Consider the root job; i.e., the fit for node n0 and both sexes. Two times the negative log likelihood for this example \(2 L(x)\) is:

\[2 L( x ) = - \log( 2 \pi d^2 ) + \left( \frac{ x_1 - x_0 }{ d } \right)^2 + \sum_{j=0}^1 - \log( 2 \pi \sigma^2 ) + \left( \frac{ x_j - \bar{\iota} }{ \sigma } \right)^2\]

where \(x_0\) is iota at age 0, \(x_1\) is iota at age 100, \(\bar{\iota}\) is iota_mean , \(\sigma\) is iota_std , \(d\) is diota_std ,

Gradient¶

The partial of \(L(x)\) with respect to \(x_j\) is:

\[\frac{ \partial L(x)} { \partial x_j } = \frac{x_j - x_{1-j}}{ d^2 } + \frac{ x_j - \bar{\iota} }{ \sigma^2 }\]

Hessian¶

The second partial of \(L(x)\) with respect to \(x_j\) is:

\[\frac{ \partial^2 L(x)} { \partial x_j^2 } = d^{-2} + \sigma^{-2}\]

The cross partial with respect to \(x_0\) and \(x_1\) is:

\[\frac{ \partial^2 L(x)} { \partial x_0 \partial x_1 } = - d^{-2}\]

The Hessian of \(L(x)\) is:

\[\begin{split}H & = \begin{bmatrix} d^{-2} + \sigma^{-2} & -d^{-2} \\ -d^{-2} & d^{-2} + \sigma^{-2} \end{bmatrix} \\ H & = d^{-2} \begin{bmatrix} 1 + ( d / \sigma )^2 & - 1 \\ - 1 & 1 + ( d / \sigma )^2 \end{bmatrix}\end{split}\]

The determinant of \(d^2 H\) is:

\[\det(d^2 H ) = 2 ( d / \sigma )^2 + ( d / \sigma )^4\]

The inverse of \(d^2 H\) is:

\[\begin{split}d^{-2} H^{-1} & = \frac{1}{ 2 ( d / \sigma )^2 + ( d / \sigma )^4 } \begin{bmatrix} 1 + ( d / \sigma )^2 & 1 \\ 1 & 1 + ( d / \sigma )^2 \end{bmatrix} \\ d^{-2} H^{-1} & = \frac{ ( d / \sigma )^2 }{ 2 + ( d / \sigma )^2 } \begin{bmatrix} 1 + ( d / \sigma )^2 & 1 \\ 1 & 1 + ( d / \sigma )^2 \end{bmatrix}\end{split}\]

It follows that:

\[\begin{split}H^{-1} & = \frac{ \sigma^{-2} }{ 2 + ( d / \sigma )^2 } \begin{bmatrix} 1 + ( d / \sigma )^2 & 1 \\ 1 & 1 + ( d / \sigma )^2 \end{bmatrix} \\ H^{-1} & = \frac{ \sigma^{-2} }{ 2 ( \sigma / d )^{-2} + 1 } \begin{bmatrix} ( \sigma / d )^{-2} + 1 & ( \sigma / d )^{-2} \\ ( \sigma / d )^{-2} & ( \sigma / d )^{-2} + 1 \end{bmatrix}\end{split}\]

Covariance of sam_predict.csv¶

The covariance of the samples in o sam_predict.csv is equal to \(H^{-1}\) . Note that as \(\sigma / d\) (i.e. iota_std / diota_std ) gets small, the variance of the estimates for iota gets close to \(\sigma^2\) (i.e., the square of iota_std ) .

Source Code¶#
# main
def main() :
   #
   # fit_dir
   fit_dir = 'build/example/csv'
   at_cascade.empty_directory(fit_dir)
   #
   # write csv files
   for name in csv_file :
      file_name = f'{fit_dir}/{name}'
      file_ptr  = open(file_name, 'w')
      file_ptr.write( csv_file[name] )
      file_ptr.close()
   #
   # fit
   at_cascade.csv.fit(fit_dir)
   #
   # predict
   at_cascade.csv.predict(fit_dir)
   #
   # fit_predict
   # fit_sex == both:   3 sexes, 2 ages
   # fit_sex == female: 1 sex,   2 ages
   # fit_sex == male:   1 sex,   2 ages
   file_name   = f'{fit_dir}/fit_predict.csv'
   fit_predict = at_cascade.csv.read_table(file_name)
   assert len( fit_predict ) == 5 * 2
   #
   # row
   for row in fit_predict :
      # Sincidence is the only integerand in predict_integrand.csv
      assert row['integrand_name'] == 'Sincidence'
      # n0 is the only node in fit_goal.csv and it has no ancestors
      assert row['node_name'] == 'n0'
      # Since there is no data, only the root job (n0, both) is fit
      assert row['fit_sex'] in [ 'female', 'male', 'both' ]
      # Predictions are made for all sexes
      assert row['sex'] in [ 'female', 'male', 'both' ]
      # There are two age values in covariate.csv
      assert float(row['age']) in [ 0.0, 100.0 ]
      # There is only on time value in covariate.csv
      assert float(row['time']) == 2000.0
      #
      # rel_error
      iota      = float( row['avg_integrand'] )
      rel_error = 1.0 - iota / iota_mean
      assert abs( rel_error ) < 1e-8
   #
   # sam_predict
   # 3 sexes, 2 ages, number_sample samples
   file_name   = f'{fit_dir}/sam_predict.csv'
   sam_predict = at_cascade.csv.read_table(file_name)
   assert len( sam_predict ) == len( fit_predict ) * number_sample
   #
   # Hinv
   d2    = 1.0 / ( diota_std * diota_std )
   s2    = 1.0 / ( iota_std * iota_std )
   H     = [ [ d2 + s2 , - d2 ], [ - d2 , d2 + s2 ] ]
   H     = numpy.array(H)
   Hinv  = numpy.linalg.inv(H)
   #
   # sex
   for sex in [ 'female', 'male' ] :
      #
      # sample_cov
      sample_cov = dict()
      for fit_sex in [ 'both', sex ] :
         #
         # sample_array
         age2variable_index = { 0.0 : 0, 100.0 : 1 }
         sample_array    = numpy.empty( (number_sample, 2) )
         sample_array[:] = numpy.nan
         for row in sam_predict :
            if row['sex'] == sex and row['fit_sex'] == fit_sex :
               iota           = float( row['avg_integrand'] )
               sample_index   = int( row['sample_index'] )
               age            = float( row['age'] )
               variable_index = age2variable_index[age]
               sample_array[sample_index , variable_index]  = iota
         #
         # sample_cov
         sample_cov[fit_sex] = numpy.cov( sample_array , rowvar = False )
      #
      rel_error = 1.0 - sample_cov[sex] / sample_cov['both']
      assert numpy.max( numpy.abs( rel_error ) ) < 0.1
#
#
if __name__ == '__main__' :
   main()
   print('prior_sam: OK')