fit_info_theory¶

Adjusting Prior to Fit The Information Matrix¶

This method is motivated by the Problem with the current method for creating child job priors.

Simplification¶

We consider each rate separately because the prior does not allow for correlation between the different rates.

Notation¶

Notation	Meaning
$r_{i,j}$	value of the rate at the i-th age and j-th time
$N$	number of age points in the rate grid minus 1
$M$	number of time points in the rate grid minus 1
$\bar{v}_{i,j}$	mean for the rate value at (i,j)
$\bar{a}_{i,j}$	mean for age forward differences at (i,j)
$\bar{t}_{i,j}$	mean for time forward differences at (i,j)
$v_{i,j}$	variance for the rate value at (i,j)
$a_{i,j}$	variance for age forward differences at (i,j)
$t_{i,j}$	variance for time forward differences at (i,j)
$L(r)$	negative log-likelihood for the prior
$V(r)$	value contribution to $L(r)$
$A(r)$	age contribution to $L(r)$
$T(r)$	time contribution to $L(r)$

Hessian of Prior Negative Log Likelihood¶

The negative log likelihood for the prior, as a function of the rate values, is:

\[\begin{split}V (r) = & \sum_{i,j} \left[ ( r_{i,j} - \bar{v}_{i,j} )^2 / ( 2 v_{i,j} ) + \tfrac{1}{2} \log ( 2 \pi v_{i,j} ) \right] \\ A (r) = & \sum_{i < N, j} \left[ ( r_{i+1,j} - r_{i,j} - \bar{a}_{i,j} )^2 / ( 2 a_{i,j} ) + \tfrac{1}{2} \log ( 2 \pi a_{i,j} ) \right] \\ T (r) = & \sum_{i, j < M} \left[ ( r_{i,j+1} - r_{i,j} - \bar{t}_{i,j} )^2 / ( 2 t_{i,j} ) + \tfrac{1}{2} \log ( 2 \pi t_{i,j} ) \right] \\ L(r) = & V (r) + A (r) + T (r)\end{split}\]

The partials of the negative log likelihood for the prior are given by:

\[\begin{split}\frac{ \partial V }{ \partial r_{i,j} } = & ( r_{i,j} - \bar{v}_{i,j} ) / v_{i,j} \\ \frac{ \partial A }{ \partial r_{i,j} } = & \begin{cases} + ( r_{N,j} - r_{N-1,j} - \bar{a}_{N-1,j} ) / a_{N-1,j} & \text{if $i = N$} \\ - ( r_{1,j} - r_{0,j} - \bar{a}_{0,j} ) / a_{0,j} & \text{if $i = 0$} \\ + ( r_{i,j} - r_{i-1,j} - \bar{a}_{i-1,j} ) / a_{i-1,j} - ( r_{i+1,j} - r_{i,j} - \bar{a}_{i,j} ) / a_{i,j} & \text{otherwise} \end{cases} \\ \frac{ \partial T }{ \partial r_{i,j} } = & \begin{cases} + ( r_{i,M} - r_{i,M-1} - \bar{t}_{i,M-1} ) / t_{i,M-1} & \text{if $j = M$} \\ - ( r_{i,1} - r_{i,0} - \bar{t}_{i,0} ) / t_{i,0} & \text{if $j = 0$} \\ + ( r_{i,j} - r_{i,j-1} - \bar{t}_{i,j-1} ) / t_{i,j-1} - ( r_{i,j+1} - r_{i,j} - \bar{t}_{i,j} ) / t_{i,j} & \text{otherwise} \end{cases} \\ \frac{ \partial L }{ \partial r_{i,j} } = & \frac{ \partial V }{ \partial r_{i,j} } + \frac{ \partial A }{ \partial r_{i,j} } + \frac{ \partial T }{ \partial r_{i,j} }\end{split}\]
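The partials above can be checked numerically. The following is a minimal sketch, not part of the original method: the function names `prior_objective` and `prior_gradient` and the nested-list layout are assumptions made for illustration. Terms of $L(r)$ that do not depend on $r$ (the logarithms) are omitted because they do not affect the partials.

```python
def prior_objective(r, vbar, abar, tbar, v, a, t):
    """Part of L(r) = V(r) + A(r) + T(r) that depends on r.

    r, vbar, v are (N+1) x (M+1) nested lists;
    abar, a are N x (M+1); tbar, t are (N+1) x M.
    """
    n_age, n_time = len(r), len(r[0])
    total = 0.0
    for i in range(n_age):
        for j in range(n_time):
            # value term
            total += (r[i][j] - vbar[i][j]) ** 2 / (2 * v[i][j])
            # age forward difference term (only for i < N)
            if i + 1 < n_age:
                d = r[i + 1][j] - r[i][j] - abar[i][j]
                total += d ** 2 / (2 * a[i][j])
            # time forward difference term (only for j < M)
            if j + 1 < n_time:
                d = r[i][j + 1] - r[i][j] - tbar[i][j]
                total += d ** 2 / (2 * t[i][j])
    return total


def prior_gradient(r, vbar, abar, tbar, v, a, t):
    """Partials of L(r) with respect to each r[i][j]."""
    n_age, n_time = len(r), len(r[0])
    g = [[0.0] * n_time for _ in range(n_age)]
    for i in range(n_age):
        for j in range(n_time):
            g[i][j] += (r[i][j] - vbar[i][j]) / v[i][j]
            if i + 1 < n_age:
                # each age difference enters with + at (i+1, j) and - at (i, j)
                d = (r[i + 1][j] - r[i][j] - abar[i][j]) / a[i][j]
                g[i + 1][j] += d
                g[i][j] -= d
            if j + 1 < n_time:
                # each time difference enters with + at (i, j+1) and - at (i, j)
                d = (r[i][j + 1] - r[i][j] - tbar[i][j]) / t[i][j]
                g[i][j + 1] += d
                g[i][j] -= d
    return g
```

Accumulating one difference at a time reproduces the three cases ($i = 0$, $i = N$, otherwise) without branching on the grid boundary. Comparing `prior_gradient` with central finite differences of `prior_objective` is a quick way to confirm the signs.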

We use the following notation to simplify the expressions below:

\[u_{i,j} = v_{i,j}^{-1} ~ , ~ b_{i,j} = a_{i,j}^{-1} ~ , ~ s_{i,j} = t_{i,j}^{-1}\]

The second partials of $V(r)$ are given by:

\[\begin{split}\frac{ \partial^2 V }{ \partial r_{k,\ell} \partial r_{i,j} } = \begin{cases} u_{i,j} & \text{if $k=i$ and $\ell=j$} \\ 0 & \text{otherwise} \end{cases}\end{split}\]

The second partials of $A(r)$ when $i = N$ are given by:

\[\begin{split}(i = N) \frac{ \partial^2 A }{ \partial r_{k,\ell} \partial r_{i,j} } = \begin{cases} + b_{i-1,j} & \text{if $k=i$ and $\ell=j$} \\ - b_{i-1,j} & \text{if $k=i-1$ and $\ell=j$} \\ 0 & \text{otherwise} \end{cases}\end{split}\]

The second partials of $A(r)$ when $i = 0$ are given by:

\[\begin{split}(i = 0) \frac{ \partial^2 A }{ \partial r_{k,\ell} \partial r_{i,j} } = \begin{cases} + b_{i,j} & \text{if $k=i$ and $\ell=j$} \\ - b_{i,j} & \text{if $k=i+1$ and $\ell=j$} \\ 0 & \text{otherwise} \end{cases}\end{split}\]

The second partials of $A(r)$ when $0 < i < N$ are given by:

\[\begin{split}(0 < i < N) \frac{ \partial^2 A }{ \partial r_{k,\ell} \partial r_{i,j} } = \begin{cases} - b_{i-1,j} & \text{if $k=i-1$ and $\ell=j$} \\ + b_{i-1,j} + b_{i,j} & \text{if $k=i$ and $\ell=j$} \\ - b_{i,j} & \text{if $k=i+1$ and $\ell=j$} \\ 0 & \text{otherwise} \end{cases}\end{split}\]
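
As a small worked example, take $N = 2$ and a single fixed time index $j$, which is dropped from the subscripts here. Differentiating $A(r)$ twice with respect to $( r_0 , r_1 , r_2 )$ gives the symmetric tridiagonal matrix

\[\frac{ \partial^2 A }{ \partial r \, \partial r } = \begin{pmatrix} b_0 & - b_0 & 0 \\ - b_0 & b_0 + b_1 & - b_1 \\ 0 & - b_1 & b_1 \end{pmatrix}\]

Each forward difference contributes its inverse variance to the two diagonal entries it touches and the negative of it to the corresponding off-diagonal pair, so every row sums to zero.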

The second partials of $T(r)$ are given by:

\[\begin{split}(j = M) \frac{ \partial^2 T }{ \partial r_{k,\ell} \partial r_{i,j} } = & \begin{cases} + s_{i,j-1} & \text{if $k=i$ and $\ell=j$} \\ - s_{i,j-1} & \text{if $k=i$ and $\ell=j-1$} \\ 0 & \text{otherwise} \end{cases} \\ (j = 0) \frac{ \partial^2 T }{ \partial r_{k,\ell} \partial r_{i,j} } = & \begin{cases} + s_{i,j} & \text{if $k=i$ and $\ell=j$} \\ - s_{i,j} & \text{if $k=i$ and $\ell=j+1$} \\ 0 & \text{otherwise} \end{cases} \\ (0 < j < M) \frac{ \partial^2 T }{ \partial r_{k,\ell} \partial r_{i,j} } = & \begin{cases} - s_{i,j-1} & \text{if $k=i$ and $\ell=j-1$} \\ + s_{i,j-1} + s_{i,j} & \text{if $k=i$ and $\ell=j$} \\ - s_{i,j} & \text{if $k=i$ and $\ell=j+1$} \\ 0 & \text{otherwise} \end{cases}\end{split}\]

The second partials of $L(r)$ are given by:

\[\frac{ \partial^2 L }{ \partial r_{k,\ell} \partial r_{i,j} } = \frac{ \partial^2 V }{ \partial r_{k,\ell} \partial r_{i,j} } + \frac{ \partial^2 A }{ \partial r_{k,\ell} \partial r_{i,j} } + \frac{ \partial^2 T }{ \partial r_{k,\ell} \partial r_{i,j} }\]

We use $H(b, s)$ to denote the Hessian of $L(r)$ with respect to $r$, as a function of the prior parameters $b = \{ b_{i,j} \}$ and $s = \{ s_{i,j} \}$; i.e.,

\[H_{i,j}^{k,\ell} (b, s) = \frac{ \partial^2 L }{ \partial r_{k,\ell} \partial r_{i,j} }\]
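
The Hessian can be assembled directly from the inverse variances. The sketch below is illustrative only: the name `prior_hessian`, the dense nested-list storage, and the flat index `i * (M + 1) + j` for $r_{i,j}$ are assumptions, not part of the original text. It follows from differentiating $V$, $A$, and $T$ twice: each value term adds $u_{i,j}$ to one diagonal entry, and each forward difference with inverse variance $w$ adds $+w$ to two diagonal entries and $-w$ to the matching off-diagonal pair.

```python
def prior_hessian(u, b, s):
    """Dense Hessian of L(r) with respect to r.

    u is (N+1) x (M+1), b is N x (M+1), s is (N+1) x M (nested lists).
    The row/column index of r[i][j] is i * n_time + j (an assumed layout).
    """
    n_age, n_time = len(u), len(u[0])
    size = n_age * n_time
    H = [[0.0] * size for _ in range(size)]

    def index(i, j):
        return i * n_time + j

    def add_difference(p, q, w):
        # Hessian of w * (r_q - r_p - mean)^2 / 2 with respect to (r_p, r_q)
        H[p][p] += w
        H[q][q] += w
        H[p][q] -= w
        H[q][p] -= w

    for i in range(n_age):
        for j in range(n_time):
            H[index(i, j)][index(i, j)] += u[i][j]
            if i + 1 < n_age:
                add_difference(index(i, j), index(i + 1, j), b[i][j])
            if j + 1 < n_time:
                add_difference(index(i, j), index(i, j + 1), s[i][j])
    return H
```

For example, with three age points and one time point, `prior_hessian([[1.0], [2.0], [3.0]], [[10.0], [20.0]], [[], [], []])` is tridiagonal with diagonal `11, 32, 23` and off-diagonals `-10, -20`. A sparse format would be preferable for realistic grid sizes.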

Problem¶

Given an information matrix $I$, determine the parameter matrices $b$ and $s$ that minimize

\[\sum_{i,j} \sum_{k,\ell} \left( H_{i,j}^{k,\ell} (b, s) - I_{i,j}^{k,\ell} \right)^2\]

subject to a lower bound of zero and a positive upper bound on each element of $b$ and $s$.

This is a linear least-squares problem with lower and upper bounds on the variables being optimized, because every entry of $H(b, s)$ is linear in $b$ and $s$. It could be solved using scipy.optimize.lsq_linear.
The objective above is the squared Frobenius norm of the difference between the approximation $H(b, s)$ and the desired information matrix $I$.
The indices where $| k - i| > 1$ or $| \ell - j | > 1$ do not need to be included in the summation: $H_{i,j}^{k,\ell}$ is zero there, so those terms do not depend on $b$ or $s$.
If an element of $b$ or $s$ is zero at the solution, the corresponding difference term is not included in the prior (because its variance is infinite).
The positive upper bound on $b$ and $s$ corresponds to a lower limit on the difference variances in the prior.
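
One way the fit might be set up with scipy.optimize.lsq_linear is sketched below. This is an illustration under stated assumptions, not the author's implementation: the function name `fit_difference_precisions` is hypothetical, the inverse value variances $u$ are assumed to be held fixed (the problem statement optimizes only $b$ and $s$), $r_{i,j}$ is assumed to sit at flat index `i * n_time + j`, and the design matrix is stored densely for clarity. The grid is assumed to have at least one forward difference.

```python
import numpy as np
from scipy.optimize import lsq_linear


def fit_difference_precisions(info, u, upper):
    """Fit b and s so that H(b, s) approximates info in the Frobenius norm.

    info  : (K, K) target information matrix, K = n_age * n_time
    u     : (n_age, n_time) fixed inverse value variances (assumption)
    upper : positive upper bound on every element of b and s
    """
    n_age, n_time = u.shape
    size = n_age * n_time
    idx = np.arange(size).reshape(n_age, n_time)

    # flat index pairs (p, q) joined by an age or a time forward difference
    age_pairs = list(zip(idx[:-1, :].ravel(), idx[1:, :].ravel()))
    time_pairs = list(zip(idx[:, :-1].ravel(), idx[:, 1:].ravel()))
    pairs = age_pairs + time_pairs

    # column for one difference: +1 on two diagonal entries, -1 off-diagonal
    G = np.zeros((size * size, len(pairs)))
    for col, (p, q) in enumerate(pairs):
        E = np.zeros((size, size))
        E[p, p] = E[q, q] = 1.0
        E[p, q] = E[q, p] = -1.0
        G[:, col] = E.ravel()

    # H(b, s) = diag(u) + sum of theta_k * E_k, so move diag(u) to the target
    target = (np.asarray(info, dtype=float) - np.diag(u.ravel())).ravel()
    result = lsq_linear(G, target, bounds=(0.0, upper))

    n_b = len(age_pairs)
    b = result.x[:n_b].reshape(n_age - 1, n_time)
    s = result.x[n_b:].reshape(n_age, n_time - 1)
    return b, s
```

Rows of `G` for index pairs that no difference touches are all zero, so they only add a constant to the objective; dropping them and using a sparse matrix would reduce the problem size considerably.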
