csv.dage_prior

View page source

Remove dage Priors From Child Jobs

csv_file

This dictionary is used to hold the data corresponding to the csv files for this example:

csv_file = dict()

node.csv

For this example the root node, n0, has no children.

csv_file['node.csv'] = \
'''node_name,parent_name
n0,
'''

option_fit.csv

This example uses the default value for the options that are not listed below. The number of samples is huge so that the sample covariance is close to the true covariance (for testing). The child_prior_std_factor is one and the child_prior_dage is false, so that the child job iota variables have the same standard deviations as for the parent job.

Name, Value

tolerance_fixed

this is set small, 1e-8, so we can check accuracy.

random_seed

chosen using current seconds reported by python time package.

number_sample

is the number of samples of the posterior for each fit.

child_prior_dage

is false so there are no dage priors in child jobs.

child_prior_std_factor

is one so child job has same std.

random_seed    = str( int( time.time() ) )
number_sample  = 10000
csv_file['option_fit.csv']  = 'name,value\n'
csv_file['option_fit.csv'] += 'tolerance_fixed,1e-8\n'
csv_file['option_fit.csv'] += f'random_seed,{random_seed}\n'
csv_file['option_fit.csv'] += f'number_sample,{number_sample}\n'
csv_file['option_fit.csv'] += 'child_prior_dage,false\n'
csv_file['option_fit.csv'] += 'child_prior_std_factor,1.0\n'

option_predict.csv

This example uses the default value for all the options in option_predict.csv. (Future versions of this example will discuss the predictions.)

csv_file['option_predict.csv'] = 'name,value\n'
csv_file['option_predict.csv'] += 'db2csv,true\n'

covariate.csv

This example has no covariates

csv_file['covariate.csv'] = \
'''node_name,sex,age,time,omega
n0,female,0,2000,0.02
n0,female,100,2000,0.02
n0,male,0,2000,0.02
n0,male,100,2000,0.02
'''

fit_goal.csv

This example only fits the root node n0.

csv_file['fit_goal.csv'] = \
'''node_name
n0
'''

predict_integrand.csv

For this example we want to know the values of Sincidence. (Note that Sincidence is a direct measurement of iota.)

csv_file['predict_integrand.csv'] = \
'''integrand_name
Sincidence
'''

iota_mean

This is the mean value of iota in the gauss_eps_1 prior:

iota_mean = 0.01

iota_std

This is the standard deviation value of iota in the gauss_eps_1 prior:

iota_std = 0.1

diota_std

This is the standard deviation of the iota age difference in gauss_0d prior:

diota_std = 0.01

prior.csv

We define three priors:

gauss_eps_1

a Gaussian with lower eps, upper 1, iota_mean, iota_std

gauss_0_d

a Gaussian with mean 0 standard deviation diota_std

csv_file['prior.csv'] = \
   'name,lower,upper,mean,std,density\n' \
   f'gauss_eps_1,1e-6,1.0,{iota_mean},{iota_std},gaussian\n' \
   f'gauss_0_d,,,0.0,{diota_std},gaussian\n'

parent_rate.csv

The only non-zero rates are omega and iota (omega is known and specified by the covariate.csv file). The model for iota is linear w.r.t age and constant w.r.t. time. Its value prior is gauss_eps_1 and its dage prior is gauss_0_d. It does not have any dtime priors because there are no time differences between grid values.

csv_file['parent_rate.csv'] = \
'''rate_name,age,time,value_prior,dage_prior,dtime_prior,const_value
iota,0.0,2000,gauss_eps_1,gauss_0_d,,
iota,100.0,2000,gauss_eps_1,gauss_0_d,,
'''

child_rate.csv

The are no child nodes (hence no random effects) in this example.

csv_file['child_rate.csv'] = \
'''rate_name,value_prior
'''

mulcov.csv

There are no covariate multipliers in this example:

csv_file['mulcov.csv'] = \
'''covariate,type,effected,value_prior,const_value
'''

data_in.csv

There is one data point for each sex so that the child jobs get fit (otherwise the child fit jobs would abort). The data for are a direct measurement of chi (mtexcess) so it will not affect the value of iota in the fit.

header  = 'data_id,integrand_name,node_name,sex,age_lower,age_upper,'
header += 'time_lower,time_upper,meas_value,meas_std,hold_out,'
header += 'density_name,eta,nu'
csv_file['data_in.csv']  = header +  \
'''
0,mtexcess,n0,female,50,50,2000,2000,0.01,0.01,0,gaussian,,
1,mtexcess,n0,male,50,50,2000,2000,0.01,0.01,0,gaussian,,
'''

Negative Log Likelihood

The standard deviation in the difference prior above is one. Consider the root job; i.e., the fit for node n0 and both sexes. Two times the negative log likelihood for this example \(2 L(x)\) is:

\[2 L( x ) = - \log( 2 \pi d^2 ) + \left( \frac{ x_1 - x_0 }{ d } \right)^2 + \sum_{j=0}^1 - \log( 2 \pi \sigma^2 ) + \left( \frac{ x_j - \bar{\iota} }{ \sigma } \right)^2\]

where \(x_0\) is iota at age 0, \(x_1\) is iota at age 100, \(\bar{\iota}\) is iota_mean , \(\sigma\) is iota_std , \(d\) is diota_std ,

Gradient

The partial of \(L(x)\) with respect to \(x_j\) is:

\[\frac{ \partial L(x)} { \partial x_j } = \frac{x_j - x_{1-j}}{ d^2 } + \frac{ x_j - \bar{\iota} }{ \sigma^2 }\]

Hessian

The second partial of \(L(x)\) with respect to \(x_j\) is:

\[\frac{ \partial^2 L(x)} { \partial x_j^2 } = d^{-2} + \sigma^{-2}\]

The cross partial with respect to \(x_0\) and \(x_1\) is:

\[\frac{ \partial^2 L(x)} { \partial x_0 \partial x_1 } = - d^{-2}\]

The Hessian of \(L(x)\) is:

\[\begin{split}H & = \begin{bmatrix} d^{-2} + \sigma^{-2} & -d^{-2} \\ -d^{-2} & d^{-2} + \sigma^{-2} \end{bmatrix} \\ H & = d^{-2} \begin{bmatrix} 1 + ( d / \sigma )^2 & - 1 \\ - 1 & 1 + ( d / \sigma )^2 \end{bmatrix}\end{split}\]

The determinant of \(d^2 H\) is:

\[\det(d^2 H ) = 2 ( d / \sigma )^2 + ( d / \sigma )^4\]

The inverse of \(d^2 H\) is:

\[\begin{split}d^{-2} H^{-1} & = \frac{1}{ 2 ( d / \sigma )^2 + ( d / \sigma )^4 } \begin{bmatrix} 1 + ( d / \sigma )^2 & 1 \\ 1 & 1 + ( d / \sigma )^2 \end{bmatrix} \\ d^{-2} H^{-1} & = \frac{ ( d / \sigma )^2 }{ 2 + ( d / \sigma )^2 } \begin{bmatrix} 1 + ( d / \sigma )^2 & 1 \\ 1 & 1 + ( d / \sigma )^2 \end{bmatrix}\end{split}\]

It follows that:

\[\begin{split}H^{-1} & = \frac{ \sigma^{-2} }{ 2 + ( d / \sigma )^2 } \begin{bmatrix} 1 + ( d / \sigma )^2 & 1 \\ 1 & 1 + ( d / \sigma )^2 \end{bmatrix} \\ H^{-1} & = \frac{ \sigma^{-2} }{ 2 ( \sigma / d )^{-2} + 1 } \begin{bmatrix} ( \sigma / d )^{-2} + 1 & ( \sigma / d )^{-2} \\ ( \sigma / d )^{-2} & ( \sigma / d )^{-2} + 1 \end{bmatrix}\end{split}\]

Covariance of sam_predict.csv

The covariance of the samples in o sam_predict.csv is equal to \(H^{-1}\) . Note that as \(\sigma / d\) (i.e. iota_std / diota_std ) gets small, the variance of the estimates for iota gets close to \(\sigma^2\) (i.e., the square of iota_std ) .

Source Code

#
# main
def main() :
   #
   # fit_dir
   fit_dir = 'build/example/csv'
   at_cascade.empty_directory(fit_dir)
   #
   # write csv files
   for name in csv_file :
      file_name = f'{fit_dir}/{name}'
      file_ptr  = open(file_name, 'w')
      file_ptr.write( csv_file[name] )
      file_ptr.close()
   #
   # fit
   at_cascade.csv.fit(fit_dir)
   #
   # predict
   at_cascade.csv.predict(fit_dir)
   #
   # fit_predict
   # fit_sex == both:   3 sexes, 2 ages
   # fit_sex == female: 1 sex,   2 ages
   # fit_sex == male:   1 sex,   2 ages
   file_name   = f'{fit_dir}/fit_predict.csv'
   fit_predict = at_cascade.csv.read_table(file_name)
   assert len( fit_predict ) == 5 * 2
   #
   # row
   for row in fit_predict :
      # Sincidence is the only integerand in predict_integrand.csv
      assert row['integrand_name'] == 'Sincidence'
      # n0 is the only node in fit_goal.csv and it has no ancestors
      assert row['node_name'] == 'n0'
      # Since there is no data, only the root job (n0, both) is fit
      assert row['fit_sex'] in [ 'female', 'male', 'both' ]
      # Predictions are made for all sexes
      assert row['sex'] in [ 'female', 'male', 'both' ]
      # There are two age values in covariate.csv
      assert float(row['age']) in [ 0.0, 100.0 ]
      # There is only on time value in covariate.csv
      assert float(row['time']) == 2000.0
      #
      # rel_error
      iota      = float( row['avg_integrand'] )
      rel_error = 1.0 - iota / iota_mean
      assert abs( rel_error ) < 1e-8
   #
   # sam_predict
   # 3 sexes, 2 ages, number_sample samples
   file_name   = f'{fit_dir}/sam_predict.csv'
   sam_predict = at_cascade.csv.read_table(file_name)
   assert len( sam_predict ) == len( fit_predict ) * number_sample
   #
   # Hinv
   d2    = 1.0 / ( diota_std * diota_std )
   s2    = 1.0 / ( iota_std * iota_std )
   H     = [ [ d2 + s2 , - d2 ], [ - d2 , d2 + s2 ] ]
   H     = numpy.array(H)
   Hinv  = numpy.linalg.inv(H)
   #
   # sex
   for sex in [ 'female', 'male' ] :
      #
      # sample_cov
      sample_cov = dict()
      for fit_sex in [ 'both', sex ] :
         #
         # sample_array
         age2variable_index = { 0.0 : 0, 100.0 : 1 }
         sample_array    = numpy.empty( (number_sample, 2) )
         sample_array[:] = numpy.nan
         for row in sam_predict :
            if row['sex'] == sex and row['fit_sex'] == fit_sex :
               iota           = float( row['avg_integrand'] )
               sample_index   = int( row['sample_index'] )
               age            = float( row['age'] )
               variable_index = age2variable_index[age]
               sample_array[sample_index , variable_index]  = iota
         #
         # sample_cov
         sample_cov[fit_sex] = numpy.cov( sample_array , rowvar = False )
      #
      rel_error = 1.0 - sample_cov[sex] / sample_cov['both']
      assert numpy.max( numpy.abs( rel_error ) ) < 0.1
#
#
if __name__ == '__main__' :
   main()
   print('prior_sam: OK')