\section{A General Non-linear Risk-sensitive Filter}
In this section, we consider a more general non-linear signal model than the
one described in Sec. 2. First, we introduce the signal model. Next, we introduce the measure change technique
that is relevant to this case. We also state the lemma describing the
infinite-dimensional linear recursion. We do not, however, provide a proof of the
lemma or of the theorem describing how to obtain the optimal state estimate,
because they are only straightforward extensions of the proofs of Lemma \ref{lem1}
and Theorem \ref{maintheo}.
\subsection{General Non-linear Signal Model}
The general non-linear signal model defined in $(\Omega, \cal F, P)$ is given
by:
\beqa
x_{k+1} = A(x_k, w_{k+1}) \nonumber \\
y_k = C(x_k, v_k) \label{eq:nonlinsm} \eeqa
where $\{x_l\}, l \in \nats$ is a discrete-time stochastic process taking
values in $\reals^d$; $\{w_l\}, l \in \nats$ is a sequence of {\em i.i.d}
random variables in $\reals^n$, where $w_l$ has a density $\psi_l$; and $\{v_l\},
l \in \nats$ is a sequence of {\em i.i.d} random variables in $\reals^p$, where
$v_l$ has a density $\phi_l$. We assume that each $\phi_l, l \in \nats$, is strictly
positive. We also assume that $A\;:\;\reals^d \times \reals^n \rightarrow \reals^d$
and $C\;:\;\reals^d \times \reals^p \rightarrow \reals^p$ are measurable
functions, that the observation process $\{y_l\}, l \in \nats$ is
$\reals^p$-valued, and that $x_0$ or its density $\pi_0(x)$ is known.
Along with all these, we will assume that there exists an inverse map
$D\;:\;\reals^d \times \reals^d \rightarrow \reals^n$ such that if $
x_{k+1} = A(x_k, w_{k+1})$, then $w_{k+1} = D(x_{k+1}, x_k)$. Similarly, there exists
an inverse map $G\;:\;\reals^p \times \reals^d \rightarrow \reals^p$ such
that $v_k = G(y_k, x_k)$.
Finally, we require the derivatives $c(x_l, v_l) = \frac {\partial C(x_l, v)}
{\partial v} \mid_{v = v_l}, g(y_l, x_l) = \frac {\partial G(y, x_l)}
{\partial y} \mid_{y = y_l}, l \in \nats$ to be nonsingular.
\begin{remark}
Note that the signal model described in Sec. 2 satisfies all these conditions.
\label{remnlin} \end{remark}
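To fix ideas, the model (\ref{eq:nonlinsm}) and its inverse maps can be instantiated numerically. The following sketch assumes a scalar additive-noise model with illustrative maps $f$ and $h$ (these maps and all numerical values are assumptions made for illustration, not part of the paper); with additive noise the inverse maps $D$ and $G$ are explicit, and the derivatives $c$ and $g$ are identity matrices, hence nonsingular.

```python
import numpy as np

# A minimal sketch of the general non-linear signal model (the specific
# maps f and h below are illustrative assumptions, not from the paper):
#   x_{k+1} = A(x_k, w_{k+1}) = f(x_k) + w_{k+1}
#   y_k     = C(x_k, v_k)     = h(x_k) + v_k
# With additive noise the inverse maps required by the model are explicit:
#   D(x_{k+1}, x_k) = x_{k+1} - f(x_k),   G(y_k, x_k) = y_k - h(x_k),
# and the derivatives c and g are identity matrices, hence nonsingular.

def f(x):
    return np.tanh(x)           # illustrative state map

def h(x):
    return x + 0.5 * x**3       # illustrative observation map

def D(x_next, x):
    return x_next - f(x)        # recovers w_{k+1}

def G(y, x):
    return y - h(x)             # recovers v_k

rng = np.random.default_rng(0)
x = np.array([0.1])
for k in range(5):
    w = rng.standard_normal(1)  # w_{k+1} ~ psi_{k+1}
    v = rng.standard_normal(1)  # v_k ~ phi_k (strictly positive density)
    y = h(x) + v
    x_next = f(x) + w
    # the inverse maps recover the noises
    assert np.allclose(D(x_next, x), w)
    assert np.allclose(G(y, x), v)
    x = x_next
```

The loop checks, step by step, that $D$ and $G$ invert the state and observation maps with respect to the noise arguments, which is exactly the structural assumption imposed above.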
\subsection{Change of Measure}
We define
$$
\bar \lambda_l = \frac {\phi_l (v_l)}{\phi_l (y_l)} \left| \det [c(x_l, v_l)] \right|^{-1}, \qquad
\bar \Lambda_k = \prod_{l=0}^k \bar \lambda_l $$
We start with a new probability measure $\bar P$ under which $\{y_l\}, l \in \nats$
is a sequence of {\em i.i.d} random variables, $y_l$ having density $\phi_l$,
and the dynamics of $x_l, l \in \nats$ are the same as under $P$.
By setting $\frac {dP}{d \bar P} \mid_{{\cal G}_k} = \bar \Lambda_k$, we can recover $P$; the existence of such a measure follows from Kolmogorov's Extension Theorem. Here, ${\cal G}_k$ is the complete $\sigma$-field generated by $\{x_0, \ldots,
x_k, y_0, \ldots, y_{k-1}\}$.
\begin{lemma}
Under $P$, $\{v_l\}, l \in \nats$ is a sequence of independent random variables having densities $\phi_l$.
\label{nlinind} \end{lemma}
\begin{proof}
Let us define $ y_k \ole (y_k^1 \ldots y_k^p)^{\p}, v_k \ole (v_k^1 \ldots v_k^p)^{\p},
t \ole (t^1 \ldots t^p)^{\p}$.
Then, since $y_k = C(x_k, v_k)$ with $x_k$ held fixed and $c(x_k, v_k)$ nonsingular, the change of variables formula gives
$$
dy_k^1 \ldots dy_k^p = \left| \det [c(x_k, v_k)] \right| dv_k^1 \ldots dv_k^p
$$
Hence,
\begin{eqnarray*}
\lefteqn{ \bar E [\bar \lambda_k | {\cal G}_k ] =
\int_{\reals} \ldots \int_{\reals}\frac {\phi_k (v_k)}{\phi_k (y_k)}
\left| \det [c(x_k, v_k)] \right|^{-1} \phi_k (y_k) dy_k^p \ldots dy_k^1} \\
& = & \int_{\reals} \ldots \int_{\reals}\phi_k (v_k) dv_k^p \ldots dv_k^1 \\
& = & 1 \end{eqnarray*}
Using this fact, we have
\beqa
\lefteqn{ P(v_k \leq t | {\cal G}_k) = E [ I(v_k \leq t) | {\cal G}_k ] } \nonumber \\
& = & \frac {\bar E [ \bar \Lambda_k I(v_k \leq t) | {\cal G}_k ]}{ \bar E
[ \bar \Lambda_k | {\cal G}_k ]} \nonumber \\
& = & \bar E [ \bar \lambda_k I(v_k \leq t) | {\cal G}_k ] \nonumber \\
& = & \int_{\reals} \ldots \int_{\reals}I(v_k \leq t) \frac {\phi_k (v_k)}{\phi_k (y_k)} \left| \det [c(x_k, v_k)] \right|^{-1} \phi_k (y_k) dy_k^p \ldots dy_k^1 \nonumber \\
& = & \int_{\reals} \ldots \int_{\reals}I(v_k \leq t) \phi_k (v_k) dv_k^p \ldots dv_k^1 \nonumber \\
& = & \int_{-\infty}^{t^1} \ldots \int_{-\infty}^{t^p} \phi_k (v_k)dv_k^p \ldots dv_k^1\eeqa
where the third equality holds because $\bar \Lambda_{k-1}$ is ${\cal G}_k$-measurable and $\bar E [\bar \lambda_k | {\cal G}_k ] = 1$. Since the right-hand side does not depend on ${\cal G}_k$, this completes the proof.
\end{proof}
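The key identity $\bar E [\bar \lambda_k | {\cal G}_k ] = 1$ can also be checked by Monte Carlo in a scalar additive-noise special case, where $c = 1$ and $\bar \lambda_k = \phi_k(y_k - h(x_k))/\phi_k(y_k)$. The Gaussian density and the map $h$ below are illustrative assumptions, not from the paper.

```python
import numpy as np

# Monte Carlo sanity check (illustrative scalar model, not from the paper):
# with y_k = h(x_k) + v_k we have c = 1 and v_k = G(y_k, x_k) = y_k - h(x_k),
# so the single-step factor is lambda_bar = phi(y - h(x)) / phi(y).
# Under P_bar, y_k ~ phi i.i.d., hence
#   E_bar[lambda_bar | x] = int phi(y - h(x)) dy = 1.

def phi(v):                       # standard normal density (assumed phi_k)
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)

def h(x):                         # illustrative observation map
    return 0.5 * x

rng = np.random.default_rng(1)
x = 1.0                           # condition on an arbitrary state value
y = rng.standard_normal(200_000)  # y_k ~ phi under the measure P_bar
lam = phi(y - h(x)) / phi(y)      # single-step Radon-Nikodym factor
print(abs(lam.mean() - 1.0) < 0.02)   # sample mean is 1 up to Monte Carlo error
```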
We will conclude this section with the following definition and lemma:
\begin{definition}
Define $\alpha_k(x)$ as the unnormalized density function such that
\beq
\alpha_k(x)dx = \bar E [\bar \Lambda_{k-1} \exp(\theta
\hat \Psi_{0,k-1}) I(x_k \in dx) | {\cal Z}_{k-1}] \label{eq:alphanlin} \eeq
where $\hat \Psi_{m,n}$ and ${\cal Z}_{k}$ have been defined in Sec. 2.
\label{def:alphanlin} \end{definition}
\begin{lemma}
The information state $\alpha_k (x)$ obeys the following recursion
\beqa
\alpha_{k+1} (x) & = & \frac{1}{\phi_k(y_k)}\int_{\reals^d} \phi_k(G(y_k, z))
\exp \left(\frac{\theta}{2} (z - \hat x_k)^{\p} Q_k (z - \hat x_k)\right)
\psi_{k+1}(D(x, z)) \left| \bar J(x, z) \right| \nonumber \\
& & \times \left| \det [c(z, G(y_k, z))] \right|^{-1} \alpha_k(z)\,dz \label{eq:recnlin} \eeqa
\label{lemnlin1} \end{lemma}
\begin{proof}
Similar to that of Lemma \ref{lem1}. \end{proof}
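The recursion (\ref{eq:recnlin}) can be approximated numerically on a grid. The sketch below assumes the scalar additive-noise special case $x_{k+1} = f(x_k) + w_{k+1}$, $y_k = h(x_k) + v_k$, in which $|\bar J| = |\det c|^{-1} = 1$, $D(x, z) = x - f(z)$, and $G(y, z) = y - h(z)$; the maps $f$, $h$ and the values of $\theta$, $Q_k$, $\hat x_k$ are illustrative assumptions.

```python
import numpy as np

# A numerical sketch of the information-state recursion for alpha_{k+1}
# in the scalar additive-noise case, where the Jacobian factors are 1:
#   alpha_{k+1}(x) = (1/phi(y_k)) * int phi(y_k - h(z))
#                    * exp((theta/2) Q (z - x_hat)^2) * psi(x - f(z))
#                    * alpha_k(z) dz
# All model ingredients below are illustrative assumptions.

def normal_pdf(u, sigma=1.0):
    return np.exp(-0.5 * (u / sigma)**2) / (sigma * np.sqrt(2.0 * np.pi))

def alpha_step(alpha, grid, y, x_hat, theta=0.05, Q=1.0,
               f=np.tanh, h=lambda x: 0.5 * x):
    """One step of the information-state recursion on a fixed grid."""
    dz = grid[1] - grid[0]
    risk = np.exp(0.5 * theta * Q * (grid - x_hat)**2)   # exp((theta/2) Q (z - x_hat)^2)
    lik = normal_pdf(y - h(grid)) / normal_pdf(y)        # phi(G(y, z)) / phi(y)
    trans = normal_pdf(grid[:, None] - f(grid[None, :])) # psi(D(x, z)), rows indexed by x
    return trans @ (lik * risk * alpha) * dz             # Riemann-sum approximation

grid = np.linspace(-5.0, 5.0, 401)
alpha = normal_pdf(grid)          # initial (unnormalized) density pi_0
alpha = alpha_step(alpha, grid, y=0.3, x_hat=0.0)
print(alpha.min() >= 0.0 and np.isfinite(alpha).all())
```

Because $\theta > 0$, the risk factor inflates the integrand away from $\hat x_k$, so the information state remains unnormalized; any state estimate must be extracted as in Theorem \ref{maintheo}, not by reading off the mode of $\alpha_k$ directly.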
\subsection{Smoothing}
In this section, we briefly state the smoothing results. We use Definition
\ref{def:nlinbg} for the unnormalised density of the smoothed estimate,
$\gamma_{k,T}(x)$, and for the adjoint process $\beta_{k,T}(x)$, to which
the new measure change technique applies in the obvious way. We state without proof
that Theorem \ref{gammatheonlin} holds in this case. We conclude the section
with the following lemma, whose proof is omitted because it is
quite similar to that of Lemma \ref{blemmanlin}.
\begin{lemma}
The process $\beta_{k,T}(x)$ satisfies the following backward recursion
\beqa
\beta_{k,T}(x) & = & \frac {\phi_k (G(y_k,x))}{\phi_k(y_k)} \exp \left(\frac {\theta}
{2} (x - \hat x_k)^{\p} Q_k (x - \hat x_k)\right) \left| \det [c(x, G(y_k, x))]\right|^{-1} \nonumber \\
& & \times \int_{\reals^d}\beta_{k+1, T}(\xi) \psi_{k+1}(D(\xi, x)) \left|\frac{\partial D}
{\partial \xi}(\xi, x) \right|
d\xi \label{eq:blemmagnlin} \eeqa
\label{blemmagnlin} \end{lemma}
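For concreteness, in the additive-noise special case $x_{k+1} = f(x_k) + w_{k+1}$, $y_k = h(x_k) + v_k$ (an illustrative assumption, with $n = d$), we have $G(y, x) = y - h(x)$ and $D(\xi, x) = \xi - f(x)$, so both $c$ and $\partial D / \partial \xi$ reduce to identity matrices and the backward recursion simplifies to

```latex
\beqa
\beta_{k,T}(x) & = & \frac {\phi_k (y_k - h(x))}{\phi_k(y_k)}
\exp \left(\frac {\theta}{2} (x - \hat x_k)^{\p} Q_k (x - \hat x_k)\right) \nonumber \\
& & \times \int_{\reals^d}\beta_{k+1, T}(\xi)\, \psi_{k+1}(\xi - f(x))\, d\xi \nonumber
\eeqa
```

In this form the backward pass is simply a likelihood reweighting followed by a convolution with the state-noise density.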
\section{Limiting Results}
In this section, we will consider the case when the risk-sensitive parameter
$\theta$ approaches $0$. It is easy to see that in this limit, the
optimization problem in Sec. 5 reduces to the Kalman Filtering problem, and the
equations (\ref{eq:estlin1}), (\ref{eq:estlin2}) and the associated Riccati
equation indeed become the Kalman Filtering equations (as has been pointed out
in \cite{Speyer}).
In the non-linear filtering problem, the form of the recursive estimation
involving the information state described by (\ref{eq:recal0}) approaches the
form derived in \cite{MElliot} as $\theta \rightarrow 0$. There is a minor
technical difference of course, due to the fact that in \cite{MElliot}, the
measure $\alpha_k(x)$ is defined as
$$
\alpha_k(x)dx = \bar E [ \bar \Lambda_k I(x_k \in dx) | {\cal Y}_k ] $$
which is natural for risk-neutral filtering because the conditional-mean
estimate is also the minimum-variance estimate. Accordingly, if we redefine
$\alpha_k(x)$ (for the signal model of Sec. 2) as
$$
\alpha_k(x)dx = \bar E [ \bar \Lambda_k \exp (\theta \hat \Psi_{0,k}) I(x_k \in dx) | {\cal Z}_k ] $$
instead of using Definition \ref{def:alpha}, we can obtain the corresponding
infinite-dimensional linear recursion in the information state. It is trivial
to show that when $\theta \rightarrow 0$, this recursion will be of exactly the same form as the recursion obtained in \cite{MElliot}.
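To make the limit explicit, note that $\hat \Psi_{0,k}$ does not depend on $\theta$, so (assuming the interchange of limit and conditional expectation is justified, e.g. by dominated convergence)

```latex
\lim_{\theta \rightarrow 0} \bar E [ \bar \Lambda_k \exp (\theta \hat \Psi_{0,k})
I(x_k \in dx) \,|\, {\cal Z}_k ]
= \bar E [ \bar \Lambda_k I(x_k \in dx) \,|\, {\cal Z}_k ]
```

which has the same form as the risk-neutral unnormalized density above.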
These two facts put together imply that we can recover the risk-neutral
filtering problem (the Kalman Filtering problem in the linear case), in both
the linear and the non-linear cases, as a special case of the risk-sensitive
filtering problem as $\theta \rightarrow 0$. A similar connection can be
established between risk-neutral HMM filtering and risk-sensitive HMM
filtering as $\theta \rightarrow 0$.