csv.predict_descend

View page source

Example Predicting From Self and a Descendant

iota_true

There are only two non-zero rate in this model, omega and the parent value for iota (the model for incidence).

omega_true = 0.01
iota_true  = 0.01

node_dict

Keys are nodes and values are corresponding parent node:

node_dict = {
    'n0' : ''   ,
    'n1' : 'n0' ,
    'n2' : 'n0' ,
}

age_grid, time_grid

The only rate for this example is iota and it is constant in age and time so we use a very simple age and time grid:

age_grid   = [0.0, 100.0]
time_grid  = [1980.0, 2020.0]

number_sample

This is the number of samples of the posterior distribution to generate for each successful fit. Note that each sample includes values for all the dismod_at model variables-name . For this example there is only one model variable iota .

number_sample = 4000

descendant_std_factor

This is the descendant_std_factor . This example checks that this factor is (is not) used when predicting for the node that was fit (for a descendant of the node that was fit).

descendant_std_factor = 2.0

random_seed

We us the current time in seconds to seed the random number generator:

import time
random_seed = int( time.time() )

n_data, std_data

There are n_data simulated measurements. These measurements are a Gaussian with mean iota_true and standard deviation std_data :

n_data = 1
std_data = iota_true * 0.2

input_file

Input CSV files that are placed in the fit directory:

input_file = dict()

predict_integrand.csv

All the measurements and predictions for this example are for Sincidence; i.e.. a direct measurement of iota:

input_file['predict_integrand.csv']  = 'integrand_name\n'
input_file['predict_integrand.csv'] += 'Sincidence\n'

There are measurements for n0, but not for n1 or n2. Thus node n0 will predict for itself and for n2 ( n1 is not in fit_goal.csv . )

option_fit.csv

input_file['option_fit.csv']  =  \
"""name,value
refit_split,false
"""
input_file['option_fit.csv'] += f'number_sample,{number_sample}\n'
input_file['option_fit.csv'] += f'random_seed,{random_seed}\n'

option_predict.csv

input_file['option_predict.csv']  =  'name,value\n'
input_file['option_predict.csv']  +=  \
    f'descendant_std_factor,{descendant_std_factor}\n'

fit_goal.csv

There will be no fit or predictions for n1:

input_file['fit_goal.csv']  = 'node_name\n'
input_file['fit_goal.csv'] += 'n2'

node.csv

input_file['node.csv'] = \
'node_name,parent_name\n'
for node_name in node_dict :
    parent_name = node_dict[node_name]
    input_file['node.csv'] += f'{node_name},{parent_name}\n'

covariate.csv

input_file['covariate.csv'] = 'node_name,sex,age,time,omega\n'
for node_name in [ 'n0', 'n1', 'n2' ] :
    for sex in [ 'female', 'male' ] :
        for age in age_grid :
            for time in time_grid :
                row   = f'{node_name},{sex},{age},{time},{omega_true}\n'
                input_file['covariate.csv'] += row

prior.csv

uniform_eps_0.1 is uniform on the interval [eps,.1] where eps = 1e-6, the mean, which is only used for initialization, is 0.03.

input_file['prior.csv'] = \
"""name,density,mean,std,eta,lower,upper
uniform_eps_0.1,uniform,0.01,,,1e-6,0.1
"""

parent_rate.csv

The parent rate for iota is has a uniform_eps_0.1 prior and is constant w.r.t. age and time:

input_file['parent_rate.csv'] = \
"""rate_name,age,time,value_prior,dage_prior,dtime_prior,const_value
iota,0.0,0.0,uniform_eps_0.1,,,
"""

child_rate.csv

All the child rates (rate random effects) are zero

input_file['child_rate.csv'] = 'rate_name,value_prior\n'

mulcov.csv

There are no covariate multipliers

input_file['mulcov.csv'] = 'covariate,type,effected,value_prior,const_value\n'

Hessian of Likelihood

For this example, iota is the only model variable (that does not have equal lower and upper bounds). Let \(m\) be n_data , \(sigma\) be std_data, and \(y_i\) be the measurement values. The negative log likelihood for the n0 fit is:

\[L( \iota , y ) = \frac{1}{2} \sum_{i=0}^{m} \log \left( 2 \pi \sigma^2 \right) + \left( \frac{y_i - \iota }{\sigma} \right)^2\]

The Hessian with respect to iota ( \(\iota\) ) is

\[\frac{ \partial^2 }{\partial \iota^2} L = \frac{m}{ \sigma^2 }\]

Rest of Source Code

Below is the source code, except for the settings above.

#
# os, sys, copy
import os
import sys
import copy
#
# random
import random
random.seed(random_seed)
#
# at_cascade
current_directory = os.getcwd()
if os.path.isfile( current_directory + '/at_cascade/__init__.py' ) :
    sys.path.insert(0, current_directory)
import at_cascade
#
# fit
def create_input_files(fit_dir) :
    #
    # csv files in input_file
    for name in input_file :
        file_name = f'{fit_dir}/{name}'
        file_ptr  = open(file_name, 'w')
        file_ptr.write( input_file[name] )
        file_ptr.close()
    #
    # data_table
    data_table = list()
    data_id    = -1
    age        = 50.0
    time       = 2000.0
    row = {
        'node_name'      : 'n0',
        'integrand_name' : 'Sincidence',
        'hold_out'       : 0,
        'density_name'   : 'gaussian',
        'age_lower'      : age,
        'age_upper'      : age,
        'time_lower'     : time,
        'time_upper'     : time,
        'eta'            : None,
        'nu'             : None,
        'meas_std'       : std_data,
    }
    for i in range(n_data) :
        if i % 2 == 0 :
            sex = 'female'
        else :
            sex = 'male'
        data_id          += 1
        row['data_id']    = data_id
        row['sex']        = sex
        row['meas_value'] = random.normalvariate( iota_true, std_data )
        data_table.append( copy.copy( row ) )
    #
    # data_in.csv
    at_cascade.csv.write_table(
        file_name = f'{fit_dir}/data_in.csv' ,
        table     = data_table ,
    )
#
# main
def main() :
    #
    # fit_dir
    fit_dir = 'build/example/csv/fit'
    at_cascade.empty_directory(fit_dir)
    #
    # create_input_files
    create_input_files(fit_dir)
    #
    # at_cascade.csv.fit
    at_cascade.csv.fit(fit_dir)
    #
    # at_cascade.csv.predict
    at_cascade.csv.predict(fit_dir)
    #
    # fit
    file_name = f'{fit_dir}/fit_predict.csv'
    fit_predict_table = at_cascade.csv.read_table(file_name);
    fit               = dict()
    for row in fit_predict_table :
        avgint_id = int( row['avgint_id'] )
        node_name = row['node_name']
        sex       = row['sex']
        key       = (avgint_id, node_name, sex)
        assert key not in fit
        fit[key]  = float( row['avg_integrand']  )
    #
    # sam
    file_name = f'{fit_dir}/sam_predict.csv'
    sam_predict_table = at_cascade.csv.read_table(file_name);
    sam               = dict()
    for row in sam_predict_table :
        avgint_id = int( row['avgint_id'] )
        node_name = row['node_name']
        sex       = row['sex']
        key       = (avgint_id, node_name, sex)
        if key not in sam :
            sam[key] = list()
        sam[key].append(  float( row['avg_integrand']  ) )
    #
    # hessian_like
    hessian_like = n_data / ( std_data * std_data )
    #
    # check predctions for n0
    for avgint_id in range(4) :
        key   = ( avgint_id, 'n0', 'both' )
        sumsq = 0.0
        assert number_sample == len( sam[key] )
        for avg_integrand in sam[key] :
            sumsq += ( avg_integrand - fit[key] )**2
        sam_cov = sumsq / number_sample
        relerr  = 1.0 -  sam_cov * hessian_like
        assert abs( relerr ) < 0.05
    #
    # check predctions for n2
    for avgint_id in range(4) :
        for sex in [ 'female', 'male' ] :
            key   = ( avgint_id, 'n2', sex )
            sumsq = 0.0
            assert number_sample == len( sam[key] )
            for avg_integrand in sam[key] :
                sumsq  += ( avg_integrand - fit[key] )**2
            sam_cov    =  sumsq / number_sample
            cov_factor = descendant_std_factor * descendant_std_factor
            relerr     = 1.0 -  (sam_cov / cov_factor) * hessian_like
            assert abs( relerr ) < 0.05
    #
    #
    print('predict_descend.py: OK')
#
if __name__ == '__main__' :
    main()
#