---------------------------------------
lines 5-257 of file: xrst/fit_info.xrst
---------------------------------------

{xrst_begin fit_info_theory}
{xrst_spell
    delim
    frobenius
    scipy
    lsq
}

Adjusting Prior to Fit The Information Matrix
#############################################
This method is motivated by the
:ref:`create_shift_db@Problem` with the current method for
creating child job priors.

Simplification
**************
We consider each rate separately because the prior does not allow for
correlation between the different rates.

Notation
********
.. csv-table::
    :header-rows: 1
    :delim: ;

    Notation; Meaning
    :math:`r_{i,j}`       ; value of the rate at the i-th age and j-th time
    :math:`N`             ; number of age points in the rate grid minus 1
    :math:`M`             ; number of time points in the rate grid minus 1
    :math:`\bar{v}_{i,j}` ; mean for the rate value at (i,j)
    :math:`\bar{a}_{i,j}` ; mean for age forward differences at (i,j)
    :math:`\bar{t}_{i,j}` ; mean for time forward differences at (i,j)
    :math:`v_{i,j}`       ; variance for the rate value at (i,j)
    :math:`a_{i,j}`       ; variance for age forward differences at (i,j)
    :math:`t_{i,j}`       ; variance for time forward differences at (i,j)
    :math:`L(r)`          ; the negative log-likelihood for prior
    :math:`V(r)`          ; value contribution to :math:`L(z)`
    :math:`A(r)`          ; age contribution to :math:`L(z)`
    :math:`T(r)`          ; time contribution to :math:`L(z)`

Hessian of Prior Negative Log Likelihood
****************************************

The negative log likelihood for the prior,
as a function of the rate values, is:

.. math::

    V (r) = & \sum_{i,j}
    ( r_{i,j} - \bar{v}_{i,j} )^2 / ( 2 v_{i,j} )
    - \log ( 2 \pi v_{i,j} )
    \\
    A (r) = & \sum_{i < N, j}
    ( r_{i+i,j} - r_{i,j} - \bar{a}_{i,j} )^2 / ( 2 a_{i,j} )
    - \log ( 2 \pi a_{i,j} )
    \\
    T (r) = & \sum_{i, j < M}
    ( r_{i,j+1} - r_{i,j} - \bar{t}_{i,j} )^2 / ( 2 t_{i,j} )
    - \log ( 2 \pi t_{i,j} )
    \\
    L(r) = & V (r) + A (r) + B (r)

The partials of the negative log likelihood for the prior are given by:

..  math::

    \frac{ \partial V }{ \partial r_{i,j} } = &
    ( r_{i,j} - \bar{v}_{i,j} ) / v_{i,j}
    \\
    \frac{ \partial A }{ \partial r_{i,j} } = &
    \begin{cases}
        + ( r_{N,j} - r_{N-1,j} - \bar{a}_{N-1,j} ) / a_{N-1,j} &
        \text{if $i = N$}
        \\
        - ( r_{1,j} - r_{0,j} - \bar{a}_{0,j} ) / a_{0,j} &
        \text{if $i = 0$}
        \\
        + ( r_{i,j} - r_{i-1,j} - \bar{a}_{i-1,j} ) / a_{i-1,j}
        - ( r_{i+1,j} - r_{i,j} - \bar{a}_{i,j} ) / a_{i,j} &
        \text{otherwise}
    \end{cases}
    \\
    \frac{ \partial T }{ \partial r_{i,j} } = &
    \begin{cases}
        + ( r_{i,M} - r_{i,M-1} - \bar{t}_{i,M-1} ) / t_{i,M-1} &
        \text{if $j = M$}
        \\
        - ( r_{i,1} - r_{i,0} - \bar{t}_{i,0} ) / t_{i,0} &
        \text{if $j = 0$}
        \\
        + ( r_{i,j} - r_{i,j-1} - \bar{t}_{i,j-1} ) / t_{i,j-1}
        - ( r_{i,j+1} - r_{i,j} - \bar{t}_{i,j} ) / t_{i,j} &
        \text{otherwise}
    \end{cases}
    \\
    \frac{ \partial L }{ \partial r_{i,j} } = &
    \frac{ \partial V }{ \partial r_{i,j} } +
    \frac{ \partial A }{ \partial r_{i,j} } +
    \frac{ \partial T }{ \partial r_{i,j} }

We use the following notation to simplify the expressions below:

.. math::

    u_{i,j} = v_{i,j}^{-1} ~ , ~
    b_{i,j} = a_{i,j}^{-1} ~ , ~
    s_{i,j} = t_{i,j}^{-1}

The second partials of :math:`V(r)` are given by:

.. math::

    \frac{ \partial^2 V }{ \partial r_{k,\ell} \partial r_{i,j} } =
    \begin{cases}
        u_{i,j}  & \text{if $k=i$ and $\ell=j$}
        \\
        0        & \text{otherwise}
    \end{cases}

The second partials of :math:`A(r)` when :math:`i = N` are given by:

.. math::

    (i = N)
    \frac{ \partial^2 A }{ \partial r_{k,\ell} \partial r_{i,j} } =
    \begin{cases}
        + b_{i-1,j} & \text{if $k=i$ and $\ell=j$}
        \\
        - b_{i-1,j} & \text{if $k=i-1$ and $\ell=j$}
        \\
        0            & \text{otherwise}
    \end{cases}

The second partials of :math:`A(r)` when :math:`i = 0` are given by:

.. math::

    (i = 0)
    \frac{ \partial^2 A }{ \partial r_{k,\ell} \partial r_{i,j} } =
    \begin{cases}
        - b_{i,j} & \text{if $k=i$ and $\ell=j$}
        \\
        + b_{i,j} & \text{if $k=i+1$ and $\ell=j$}
        \\
        0            & \text{otherwise}
    \end{cases}

The second partials of :math:`A(r)` when :math:`0 < i < N` are given by:

.. math::
    (0 < i < N)
    \frac{ \partial^2 A }{ \partial r_{k,\ell} \partial r_{i,j} } =
    \begin{cases}
        - b_{i-1,j} & \text{if $k=i-1$ and $\ell=j$}
        \\
        + b_{i-1,j} + b_{i,j}  & \text{if $k=i$ and $\ell=j$}
        \\
        - b_{i,j} & \text{if $k=i+1$ and $\ell=j$}
        \\
        0            & \text{otherwise}
    \end{cases}


The second partials of :math:`T(r)` are given by:

.. math::

    (j = M)
    \frac{ \partial^2 T }{ \partial r_{k,\ell} \partial r_{i,j} } = &
    \begin{cases}
        + s_{i,j-1} & \text{if $k=i$ and $\ell=j$}
        \\
        - s_{i,j-1} & \text{if $k=i$ and $\ell=j-1$}
        \\
        0            & \text{otherwise}
    \end{cases}
    \\
    (j = 0)
    \frac{ \partial^2 T }{ \partial r_{k,\ell} \partial r_{i,j} } = &
    \begin{cases}
        - s_{i,j} & \text{if $k=i$ and $\ell=j$}
        \\
        + s_{i,j} & \text{if $k=i$ and $\ell=j+1$}
        \\
        0            & \text{otherwise}
    \end{cases}
    \\
    (0 < j < M)
    \frac{ \partial^2 T }{ \partial r_{k,\ell} \partial r_{i,j} } = &
    \begin{cases}
        - s_{i,j-1} & \text{if $k=i$ and $\ell=j-1$}
        \\
        + s_{i,j-1} + s_{i,j}  & \text{if $k=i$ and $\ell=j$}
        \\
        - s_{i,j} & \text{if $k=i$ and $\ell=j+1$}
        \\
        0            & \text{otherwise}
    \end{cases}

The second partials of :math:`L(r)` are given by:

.. math::

    \frac{ \partial^2 L }{ \partial r_{k,\ell} \partial r_{i,j} } =
    \frac{ \partial^2 V }{ \partial r_{k,\ell} \partial r_{i,j} } +
    \frac{ \partial^2 A }{ \partial r_{k,\ell} \partial r_{i,j} } +
    \frac{ \partial^2 T }{ \partial r_{k,\ell} \partial r_{i,j} }

We use :math:`H( b , s )` to denote the Hessian of :math:`L(r)` ,
with respect to :math:`r`,
as a function of the prior parameters
:math:`b = \{ b_{i,j} \}` and :math:`s = \{ s_{i,j} \}` ; i.e.,

.. math::

    H_{i,j}^{k,\ell} (a, b)
    =
    \frac{ \partial^2 L }{ \partial r_{k,\ell} \partial r_{i,j} }


Problem
*******
Given an information matrix :math:`I` ,
determine the parameter matrices :math:`b` and :math:`s` that minimize

.. math::

    \sum_{i,j} \sum_{k,\ell} \left(
        H_{i,j}^{k,\ell} (b, s) - I_{i,j}^{k,\ell}
    \right)^2

subject to a lower bound of zero and a positive upper bound
on the elements of :math:`b` and :math:`s` .

#. This is a linear least square problem subject to lower and upper bounds
   on the variables being optimized. It could be solved using
   ``scipy.optimize.lsq_linear`` .

#. The objective above is the Frobenius norm squared of the difference between
   the approximation and the desired information matrix.

#. The indices where :math:`| k - i| > 1` or :math:`| \ell - j | > 1`
   do not need to be included in the summation.

#. If an element of :math:`b` or :math:`s` is zero at the solution,
   the corresponding term is not included in the prior
   (because its variance is infinite).

#. The positive upper bound corresponds to a lower limit on the
   variances in the prior.

{xrst_end fit_info_theory}