diff --git a/Bilder/fig_gradient.jpg b/Bilder/fig_gradient.jpg new file mode 100644 index 0000000..05e52a3 Binary files /dev/null and b/Bilder/fig_gradient.jpg differ diff --git a/Bilder/fig_w_opt.jpg b/Bilder/fig_w_opt.jpg new file mode 100644 index 0000000..86ed63b Binary files /dev/null and b/Bilder/fig_w_opt.jpg differ diff --git a/chapter_02.tex b/chapter_02.tex index 47f53dd..8823da5 100644 --- a/chapter_02.tex +++ b/chapter_02.tex @@ -159,55 +159,104 @@ The minimization of the error signal $e[n]$ can by achieved by applying differen \end{itemize} As computational efficiency is a key requirement for the implementation of real-time ANR on a low-power digital signal processor, the Least Mean Squares algorithm is chosen for the minimization of the error signal and therefore will be further explained in the following subchapter. -\subsubsection{Use of Least Mean Squares algorithm in adaptive filtering} -Before the Least Mean Squares algorithm can be explained in detail, the Wiener filter and the concept of gradient descent have to be introduced. +\subsubsection{The Wiener filter and Gradient Descent} +Before the Least Mean Squares algorithm can be explained in detail, the Wiener filter and the concept of gradient descent have to be introduced. \\ \\ \begin{figure}[H] \centering \includegraphics[width=0.7\linewidth]{Bilder/fig_wien.jpg} \caption{Simple implementation of a Wiener filter.} \label{fig:fig_wien} \end{figure} -\noindent The Wiener filter, the base of many adaptive filter designs, is a statistical filter used to minimize the mean square error between a desired signal and the output of a linear filter. The output $y[n]$ of the Wiener filter is the sum of the weighted input samples, where the weights are represented by the filter coefficients. +\noindent The Wiener filter, the basis of many adaptive filter designs, is a statistical filter used to minimize the Mean Squared Error between a target signal and the output of a linear filter. The output $y[n]$ of the Wiener filter is the sum of the weighted input samples, where the weights are represented by the filter coefficients. \begin{equation} \label{equation_wien} y[n] = w_0x[n] + w_1x[n-1] + ... + w_Mx[n-M] = \sum_{k=0}^{M} w_kx[n-k] \end{equation} -The Wiener filter aims to adjust it´s coefficients to generate a filter output, which resembles the corruption-noise $n[n]$ contained in the target signal $d[n]$ as close as possible. After the filter output is substracted from the target signal, we recvieve the error signal $e[n]$, which represents the cleaned signal $š[n]$ after the noise-component has been removed. +The Wiener filter aims to adjust its coefficients to generate a filter output that resembles the corruption noise signal $n[n]$ contained in the target signal $d[n]$ as closely as possible. After the filter output is subtracted from the target signal, we receive the error signal $e[n]$, which represents the cleaned signal $š[n]$ after the noise component has been removed. For better understanding, a simple Wiener filter with one coefficient shall be illustrated in the following mathematical approach, before the generalization to an n-dimensional filter is made. \begin{equation} \label{equation_wien_error} e[n] = d[n] - y[n] = d[n] - wx[n] \end{equation} If we square the error signal and calculate the expected value, we receive the Mean Squared Error $J$, mentioned in the previous chapter, which is the metric the Wiener filter aims to minimize by adjusting its coefficients $w$.
\begin{equation} -\label{equation_wien_error} +\label{equation_j} J = E(e[n]^2) = E(d^2[n])-2wE(d[n]x[n])+w^2E(x^2[n]) = MSE \end{equation} -The termns contained in Equation \ref{equation_wien_error} can be further be defined as: +The terms contained in Equation \ref{equation_j} can be further defined as: \begin{itemize} \item $\sigma^2$ = $E(d^2[n])$: The expected value of the squared corrupted target signal - a constant term independent of the filter coefficients $w$. \item P = $E(d[n]x[n])$: The cross-correlation between the corrupted target signal and the noise reference signal - a measure of how similar these two signals are. -\item R = $E(x^2[n])$: The auto-correlation of the noise reference signal - a measure of the signal's spectral power. +\item R = $E(x^2[n])$: The auto-correlation (or serial correlation) of the noise reference signal - a measure of the similarity of a signal with its delayed copy and therefore of the signal's spectral power. \end{itemize} -For a large number of samples, Equation {\ref{equation_wien_error}} can therefore be further simplified and written as: +Equation {\ref{equation_j}} can therefore be further simplified and written as: \begin{equation} -\label{equation_wien_error_final} +\label{equation_j_simple} J = \sigma^2 - 2wP + w^2R \end{equation} -As every part of Equation \ref{equation_wien_error_final} beside $w^2$ is constant, the MSE is a quadratic function of the filter coefficients $w$, offering a calculatable minimum. To find this minimum, we can calculate the derivative of $J$ with respect to $w$ and set it to zero: +As the terms $\sigma^2$, $P$ and $R$ in Equation \ref{equation_j_simple} are constants, $J$ is a quadratic function of the filter coefficient $w$, offering a calculable minimum. To find this minimum, the derivative of $J$ with respect to $w$ can be calculated and set to zero: \begin{equation} -\label{equation_gradient_j} +\label{equation_j_gradient} \frac{dJ}{dw} = -2P + 2wR = 0 \end{equation} -Solving Equation \ref{equation_gradient_j} for $w$ delivers the equation to calculate the optimal coefficients for the Wiener filter:: +Solving Equation \ref{equation_j_gradient} for $w$ yields the equation to calculate the optimal coefficient for the Wiener filter: \begin{equation} -\label{equation_wien_optimal} - w_{opt} = \frac{P}{R} +\label{equation_w_optimal} + w_{opt} = PR^{-1} \end{equation} -To find the optimal set of coefficients $w$ minimizing the Mean Squared Error $J$, we can apply the concept of gradient descent. Gradient descent is an iterative optimization algorithm used to find the minimum of a function by moving in the direction of the steepest descent, which is determined by the negative gradient of the function. In our case, we want to minimize the MSE $J$ by adjusting the filter coefficients $w$. The update rule for the coefficients using gradient descent can be expressed as: +\begin{figure}[H] + \centering + \includegraphics[width=0.7\linewidth]{Bilder/fig_w_opt.jpg} + \caption{Minimum of the Mean Squared Error $J$ located at the optimal coefficient $w^*$ \cite{source_dsp_ch9}} + \label{fig:fig_mse} +\end{figure} +\noindent If the Wiener filter consists not of a single coefficient but of several coefficients, Equation \ref{equation_wien} can be written in matrix form as +\begin{equation} +\label{equation_wien_matrix} + y[n] = \sum_{k=0}^{M} w_kx[n-k] = \textbf{W}^T\textbf{X}[n] +\end{equation} +where \textbf{X} is the input signal vector and \textbf{W} the filter coefficient vector.
+\begin{align} +\label{equation_input_vector} + \textbf{X}[n] = [x[n],x[n-1],...,x[n-M]]^T \\ + \label{equation_coefficient_vector} + \textbf{W} = [w_0,w_1,...,w_M]^T +\end{align} +Equation \ref{equation_j} can therefore also be rewritten in matrix form as: +\begin{equation} +\label{equation_j_matrix} + J = \sigma^2 - 2\textbf{W}^TP + \textbf{W}^TR\textbf{W} +\end{equation} +In the matrix form, $P$ denotes the cross-correlation vector between the corrupted target signal $d[n]$ and the input vector $\textbf{X}[n]$, and $R$ the auto-correlation matrix of $\textbf{X}[n]$. After setting the derivative of Equation \ref{equation_j_matrix} to zero and solving for $\textbf{W}$, we receive the optimal filter coefficient vector: +\begin{equation} +\label{equation_w_optimal_matrix} + \textbf{W}_{opt} = R^{-1}P +\end{equation} +\noindent For a large filter, the numerical solution of Equation \ref{equation_w_optimal_matrix} can be computationally expensive, as it involves the inversion of a potentially large matrix. Therefore, to find the optimal set of coefficients $w$, the concept of gradient descent, described by Widrow \& Stearns (1985) for adaptive filtering, can be applied. The gradient descent algorithm aims to minimize the MSE $J$ iteratively, sample by sample, by adjusting the filter coefficients $w$ in small steps in the direction of the steepest descent until the optimal coefficients are reached. The update rule for the coefficients using gradient descent can be expressed as \begin{equation} \label{equation_gradient} - w(n+1) = w(n) - \mu \nabla J(w(n)) + w[n+1] = w[n] - \mu \frac{dJ}{dw} \end{equation} -\subsection{Signal flow diagram of an implanted cochlear implant system} +where $\mu$ is the constant step size determining the rate of convergence. Figure \ref{fig:fig_w_opt} visualizes the concept of stepwise minimization of the MSE $J$ using gradient descent. Once the derivative of $J$ with respect to $w$ reaches zero, the optimal coefficients $w_{opt}$ are found and the coefficients are no longer updated. +\begin{figure}[H] + \centering + \includegraphics[width=0.9\linewidth]{Bilder/fig_gradient.jpg} + \caption{Visualization of the steepest descent algorithm applied to the Mean Squared Error. \cite{source_dsp_ch9}} + \label{fig:fig_w_opt} +\end{figure} +\subsubsection{The Least Mean Squares algorithm} +The steepest descent approach described in the subchapter above still involves the calculation of the derivative of the MSE $\frac{dJ}{dw}$, which is also computationally expensive, as it requires knowledge of the statistical properties of the input signals (cross-correlation $P$ and auto-correlation $R$). Therefore, in energy-critical real-time applications, like the implementation of ANR on a low-power DSP, a sample-based approximation in the form of the Least Mean Squares (LMS) algorithm is used instead. The LMS algorithm approximates the gradient of the MSE by using instantaneous estimates of the cross-correlation and auto-correlation. To achieve this, we remove the statistical expectation from the MSE $J$ and take the derivative to obtain a sample-wise approximation of $\frac{dJ}{dw[n]}$.
+\begin{align} +\label{equation_j_lms} + J = e[n]^2 = (d[n]-w[n]x[n])^2 \\ + \label{equation_j_lms_final} + \frac{dJ}{dw[n]} = 2(d[n]-w[n]x[n])\frac{d(d[n]-w[n]x[n])}{dw[n]} = -2e[n]x[n] +\end{align} +The result of Equation \ref{equation_j_lms_final} can now be inserted into Equation \ref{equation_gradient} to obtain the LMS update rule for the filter coefficients: +\begin{equation} +\label{equation_lms} + w[n+1] = w[n] + 2\mu e[n]x[n] +\end{equation} +The LMS algorithm therefore updates the filter coefficients $w[n]$ after every sample by adding a correction term, which is calculated from the error signal $e[n]$ and the reference noise signal $x[n]$, scaled by the constant step size $\mu$. By iteratively applying the LMS algorithm, the filter coefficients converge towards the optimal values that minimize the mean squared error between the target signal and the filter output. When a predefined acceptable error level is reached, the adaptation process can be stopped to save computing power.\\ \\ + \subsection{Signal flow diagram of an implanted cochlear implant system} \subsection{Derivation of the system’s transfer function based on the problem setup} \subsection{Example applications and high-level simulations using Python} diff --git a/literature.bib b/literature.bib index 1042729..4cb3984 100644 --- a/literature.bib +++ b/literature.bib @@ -23,21 +23,21 @@ @misc{source_dsp1, author = {Li Tan, Jean Jiang}, - title = {Digital Signal Processing Fundamentals and Applications 2nd Ed}, + title = {Digital Signal Processing Fundamentals and Applications 3rd Ed}, howpublished = {Elsevier Inc.}, year = {2013}, note = {ISBN: 978-0-12-415893-1} } @misc{source_dsp_ch1, author = {Li Tan, Jean Jiang}, - title = {Digital Signal Processing Fundamentals and Applications 2nd Ed}, + title = {Digital Signal Processing Fundamentals and Applications 3rd Ed}, howpublished = {Elsevier Inc.}, year = {2013}, note = {Chapter 1} } @misc{source_dsp_ch2, author = {Li Tan, Jean Jiang}, - title = {Digital Signal Processing Fundamentals and Applications 2nd Ed}, + title = {Digital Signal Processing Fundamentals and Applications 3rd Ed}, howpublished = {Elsevier Inc.}, year = {2013}, note = {Chapter 2} @@ -49,3 +49,10 @@ year = {1960}, note = {Pat. Nr. 2966549} } +@misc{source_dsp_ch9, + author = {Li Tan, Jean Jiang}, + title = {Digital Signal Processing Fundamentals and Applications 3rd Ed}, + howpublished = {Elsevier Inc.}, + year = {2013}, + note = {Chapter 9} +} \ No newline at end of file
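As a preview of the planned subsection on example applications and high-level Python simulations, the LMS update rule derived in the patched chapter can be illustrated with a short simulation sketch. The sketch below is a minimal, illustrative example only: the synthetic signals, the assumed noise path h_true, the filter length M and the step size mu are demonstration values and are not taken from the thesis setup. It compares the closed-form Wiener solution W_opt = R^(-1) P, estimated from sample statistics, with the sample-by-sample LMS adaptation.

import numpy as np

rng = np.random.default_rng(0)

# --- synthetic signals (all values are illustrative assumptions) -----------
N = 10000                                             # number of samples
M = 8                                                 # filter length (number of coefficients)
s = np.sin(2 * np.pi * 0.01 * np.arange(N))           # clean signal s[n]
x = rng.standard_normal(N)                            # reference noise x[n]
h_true = np.array([0.6, -0.3, 0.2, 0.1, 0.05, 0.0, 0.0, 0.0])  # assumed (unknown) noise path
noise = np.convolve(x, h_true)[:N]                    # corruption noise n[n] reaching the target
d = s + noise                                         # corrupted target signal d[n]

# --- closed-form Wiener solution W_opt = R^(-1) P ---------------------------
X = np.array([np.concatenate((np.zeros(k), x[:N - k])) for k in range(M)])  # delayed copies of x[n]
R = X @ X.T / N                                       # auto-correlation matrix of x[n]
P = X @ d / N                                         # cross-correlation vector between d[n] and x[n]
w_wiener = np.linalg.solve(R, P)                      # solves R w = P without explicit matrix inversion

# --- sample-by-sample LMS adaptation ----------------------------------------
mu = 0.01                                             # constant step size
w = np.zeros(M)                                       # filter coefficients w[n]
e = np.zeros(N)                                       # error signal e[n] (the cleaned signal)
for n in range(M, N):
    x_vec = x[n - M + 1:n + 1][::-1]                  # input vector [x[n], x[n-1], ..., x[n-M+1]]
    y = w @ x_vec                                     # filter output y[n]
    e[n] = d[n] - y                                   # error signal e[n] = d[n] - y[n]
    w = w + 2 * mu * e[n] * x_vec                     # LMS update: w[n+1] = w[n] + 2*mu*e[n]*x[n]

print("Wiener solution :", np.round(w_wiener, 3))
print("LMS coefficients:", np.round(w, 3))
print("noise power before:", np.mean((d - s) ** 2))
print("noise power after :", np.mean((e[N // 2:] - s[N // 2:]) ** 2))

After convergence, the LMS coefficients approach the Wiener solution and the residual noise power in the error signal drops well below the noise power of the corrupted target signal; how quickly this happens is governed by the step size mu, which must remain small enough for the given filter length and reference noise power to keep the adaptation stable.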