
\section{Performance evaluation of different implementation variants}
\subsection{Verification of the \ac{DSP} implementation}
To verify the general performance of the \ac{ANR} algorithm implemented on the \ac{DSP}, the complex use case of the high-level implementation is reused, which again comprises a 55-tap \ac{FIR} filter and an update of the filter coefficients every cycle. In contrast to the high-level implementation, the coefficient convergence is no longer part of the evaluation, but the metric for the \ac{ANR} performance remains the \ac{SNR} improvement.
\begin{figure}[H]
    \centering
    \includegraphics[width=1.0\linewidth]{Bilder/fig_plot_1_dsp_complex.png}
    \caption{Desired signal, corrupted signal, reference noise signal and filter output of the complex \ac{ANR} use case, simulated on the \ac{DSP}}
    \label{fig:fig_plot_1_dsp_complex.png}
\end{figure}
\begin{figure}[H]
    \centering
    \includegraphics[width=1.0\linewidth]{Bilder/fig_plot_2_dsp_complex.png}
    \caption{Error signal of the complex \ac{ANR} use case, simulated on the \ac{DSP}}
    \label{fig:fig_plot_2_dsp_complex.png}
\end{figure}
\begin{figure}[H]
    \centering
    \includegraphics[width=1.0\linewidth]{Bilder/fig_high_low_comparison.png}
\caption{Comparison of the high- and low-level simulation output.}
    \label{fig:fig_high_low_comparison.png}
\end{figure}
\begin{figure}[H]
    \centering
    \includegraphics[width=1.0\linewidth]{Bilder/fig_high_low_comparison_hist.png}
\caption{Histogram of the error amplitude between the high- and low-level simulation output.}
    \label{fig:fig_high_low_comparison_hist.png}
\end{figure}
\noindent Figures \ref{fig:fig_plot_1_dsp_complex.png} and \ref{fig:fig_plot_2_dsp_complex.png} show the results of the complex \ac{ANR} use case simulated on the \ac{DSP}. With an \ac{SNR}-Gain of 10.26 dB, it performs as well as the high-level implementation. Figure \ref{fig:fig_high_low_comparison.png} shows both outputs separately and then overlaid in one subfigure, together with the error amplitude. Figure \ref{fig:fig_high_low_comparison_hist.png} features a histogram of the error amplitude between the high- and low-level implementation, which indicates that the \ac{DSP} implementation works correctly. The small deviations can be explained by the fact that the \ac{DSP} implementation is based on fixed-point arithmetic, which leads to a slightly different convergence behavior. Nevertheless, the results show that the \ac{DSP} implementation of the \ac{ANR} algorithm achieves the same performance as the high-level implementation. The next step is to evaluate the performance of the \ac{DSP} implementation in terms of computational efficiency under different scenarios and with non-synchronous signals.
\subsection{Determination of the optimal filter length}
\noindent The main focus of the computational efficiency evaluation is the determination of the optimal filter length. To this end, different signal combinations that represent everyday situations for a \ac{CI} patient are considered. Again, a delay of 2 ms between the corrupting noise signal and the reference noise signal is applied, which increases the need for a longer filter. The desired signal, a male voice over a speaker, is now corrupted with five different noise signals to rule out that a single signal combination is unrepresentative of the overall performance of the \ac{ANR} algorithm:
\begin{itemize}
    \item Breathing noise: Already used in the high-level implementation, this noise signal is a typical noise source for \ac{CI} patients, especially in quiet environments. It consists of slowly rising and falling maxima.
    \item Coughing noise: This noise signal is generated by coughing and consists of a few long-lasting maxima that resemble a rectangular function.
    \item Scratching noise: This noise signal is generated by scratching a material such as hair or clothes with the fingernails. It consists of a high number of sharp peaks.
    \item Drinking noise: This noise signal is generated by swallowing a liquid and consists of a low number of sharp peaks with long pauses between them.
    \item Chewing noise: This noise signal is generated by consuming food and consists of a high number of peaks of different amplitudes.
\end{itemize}
The noise signals are visualized in Figure \ref{fig:fig_noise_signals.png}.
\begin{figure}[H]
    \centering
    \includegraphics[width=1.0\linewidth]{Bilder/fig_noise_signals.png}
    \caption{Noise signals used to corrupt the desired signal in the computational efficiency evaluation}
    \label{fig:fig_noise_signals.png}
\end{figure}
\noindent Combining the desired signal with each of these noise signals yields five different scenarios, each posing different challenges for the \ac{ANR} algorithm. For every scenario, the \ac{SNR}-Gain is calculated for an increasing number of filter coefficients, ranging from 16 to 64.
\begin{figure}[H]
    \centering
    \includegraphics[width=1.0\linewidth]{Bilder/fig_snr_comparison.png}
    \caption{Simulation of the expected \ac{SNR}-Gain for different noise signals and filter lengths applied to the desired signal of a male speaker. The applied delay between the signals amounts to 2 ms. The curves are smoothed with a third-order Savitzky-Golay filter.}
    \label{fig:fig_snr_comparison.png}
\end{figure}
\noindent Figure \ref{fig:fig_snr_comparison.png} shows the expected \ac{SNR}-Gain for the different noise signals and filter lengths. The results show that a minimum filter length of about 32 taps is required before a significant rise in the \ac{SNR}-Gain can be observed in every scenario. This is in stark contrast to the synchronous intermediate high-level simulation, where a filter length of only 16 taps provided sufficient noise reduction. The difference can be explained by the fact that the corrupting noise signal is now delayed relative to the reference noise signal, so the filter needs a certain length before it can adapt sufficiently. The results also show that the \ac{SNR}-Gain differs between the noise signals, which indicates that the noise signals have different characteristics, such as the number of peaks, their frequency spectrum and their amplitude.\\ \\
The mean \ac{SNR}-Gain over the different noise signals, also shown in Figure \ref{fig:fig_snr_comparison.png}, indicates that the increase in \ac{SNR}-Gain slows down after 95\% of the maximum \ac{SNR}-Gain is reached. This threshold is reached at a filter length of 45 taps. A filter length of 45 taps therefore represents an optimal solution for a satisfying performance of the \ac{ANR} algorithm, while a further increase of the filter length does not lead to a significant increase of the \ac{SNR}-Gain in this setup. This is an important finding, as it allows the computational efficiency of the \ac{ANR} algorithm to be optimized by choosing an appropriate filter length.
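\noindent The selection rule described above can be written compactly. The symbols $\bar{G}(N)$ for the mean \ac{SNR}-Gain at filter length $N$ and $N_{opt}$ for the chosen filter length are introduced here only for illustration and are not used elsewhere in this work:
\begin{equation}
    N_{opt} = \min \left\{ N \in [16, 64] \; : \; \bar{G}(N) \geq 0.95 \cdot \max_{16 \leq M \leq 64} \bar{G}(M) \right\} = 45
\end{equation}
\noindent As a plausibility check, and assuming that the simulation uses the sampling rate of 20 kHz of the \ac{PCM} interface, the applied delay of 2 ms corresponds to
\begin{equation}
    D = 2 \text{ ms} \cdot 20 \text{ kHz} = 40 \text{ samples}
\end{equation}
\noindent which lies between the observed minimum of about 32 taps and the chosen length of 45 taps. This is roughly in line with the explanation that the filter has to span the delay before it can adapt sufficiently, although it does not replace the simulation results.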
\subsection{Evaluation of the computational load for a fixed update implementation}
\subsubsection{Full-Update implementation}
\noindent Equation \ref{equation_computing_final} can now be used to calculate the number of cycles needed to compute one sample of the filter output, using a filter length of 45 taps and an update of the filter coefficients every cycle:
\begin{equation}
\label{equation_computing_calculation_full_update}
 C_{total} = 45 + (6 \cdot 45 + 8) \cdot 1 + 34 = 357 \text{ cycles}
\end{equation}
As already mentioned in the previous chapters, the sampling rate of the audio data provided to the \ac{PCM} interface amounts to 20 kHz. The preferred clock frequency of the \ac{DSP} is chosen as 16 MHz, which means that the \ac{DSP} core has a cycle budget of
\begin{equation}
\label{equation_cycle_budget}
    C_{budget} = \frac{16 \text{ MHz}}{20 \text{ kHz}} = 800 \text{ cycles}
\end{equation}
\noindent for one sample. With these two values, the load of the \ac{DSP} core can be calculated as follows:
\begin{equation}
\label{equation_load_calculation_full_update}
    Load_{DSP} = \frac{C_{total}}{C_{budget}} = \frac{357 \text{ cycles}}{800 \text{ cycles}} = 44.6 \%
\end{equation}
\noindent The results of Equations \ref{equation_computing_calculation_full_update} to \ref{equation_load_calculation_full_update} can be summarized as follows:\\ \\
With the optimal filter length of 45 taps and an update of the filter coefficients every cycle, the \ac{ANR} algorithm achieves an \ac{SNR}-Gain of about 11.54 dB, averaged over the different signal/noise combinations. Under these circumstances, the computational load of the \ac{DSP} core amounts to about 45\%, which means that the core can be halted for the remaining 55\% of the time until a new sample arrives, reducing the overall power consumption.\\ \\
The initial signal/noise combination of a male speaker disturbed by a breathing noise, which was used for the verification of the \ac{DSP} implementation, serves as the benchmark for the following evaluations. With 45 filter coefficients, it delivers an \ac{SNR}-Gain of about 9.47 dB.
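
\noindent The idle share of the full-update implementation can also be expressed in time. The following sketch assumes that the core is halted for all cycles not needed by the algorithm. The symbols $T_{busy}$ and $T_{idle}$ are introduced here only for illustration, and the sample period follows from the sampling rate of 20 kHz as 50 $\mu$s:
\begin{equation}
    T_{busy} = \frac{357 \text{ cycles}}{16 \text{ MHz}} \approx 22.3 \, \mu\text{s}
\end{equation}
\begin{equation}
    T_{idle} = 50 \, \mu\text{s} - T_{busy} \approx 27.7 \, \mu\text{s}
\end{equation}
\noindent Under this assumption, slightly more than half of every sample period remains available for halting the core.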
\subsubsection{Reduced-update implementation for the benchmark case}
The most straightforward method to further reduce the computing effort for the \ac{DSP} core is to reduce the update frequency of the filter coefficients. In samples without an update, no new filter coefficients are written into the filter line, which means that the filter calculated for a previous sample is applied to the current sample. Depending on the acoustic situation, the savings in computing power will most likely lead to a degradation of the noise reduction quality, whose extent depends on whether the current situation is highly dynamic (and would therefore require a frequent update of the filter coefficients) or rather static. Changing the update frequency changes the denominator in Equation \ref{equation_c_5} and therefore in Equation \ref{equation_computing_final}.\\ \\
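\noindent Judging from the numerical instances used in this chapter, the cycle count for the filter length of 45 taps can be sketched as a function of the update rate. The symbol $r$, denoting the fraction of samples with a coefficient update ($0 \leq r \leq 1$), is an illustrative assumption and may differ from the notation of Equation \ref{equation_computing_final}:
\begin{equation}
    C_{total}(r) = 45 + (6 \cdot 45 + 8) \cdot r + 34 = 79 + 278 \cdot r \text{ cycles}
\end{equation}
\begin{equation}
    Load_{DSP}(r) = \frac{79 + 278 \cdot r}{800}
\end{equation}
\noindent In this form, both quantities are linear in $r$. The update-independent part of 79 cycles sets a lower bound of $79/800 \approx 9.9\%$ for the load, which would only be reached without any coefficient updates.
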
As already mentioned, the reduction of the update rate is first evaluated for the benchmark case (male speaker disturbed by a breathing noise) and then checked for general validity. The \ac{SNR}-Gain of 9.47 dB with 45 filter coefficients therefore represents 100\% of the achievable noise reduction, at a maximum of 357 cycles (also 100\%), in the following figure.
\begin{figure}[H]
    \centering
    \includegraphics[width=1.0\linewidth]{Bilder/fig_snr_update_rate.png}
    \caption{Relative performance of the SNR-Gain, the cycles per sample and the DSP load as a function of the update rate of the ANR algorithm for the benchmark case. The baseline of 100\% is the full-update implementation. The marked dots represent the simulation results for an explicit setup.}
    \label{fig:fig_snr_update_rate.png}
\end{figure}

\noindent Figure \ref{fig:fig_snr_update_rate.png} illustrates the trend of the \ac{SNR}-Gain, the executed cycles per sample and the \ac{DSP} load relative to the full-update variant of the benchmark case. In contrast to the executed cycles per sample and the processor load, the \ac{SNR}-Gain does not change linearly as the update frequency is reduced. This behavior makes it possible to determine the update rate at which the most favorable ratio of \ac{SNR}-Gain to \ac{DSP} load can be expected.\\ \\
The maximum offset between the two graphs is found at an update rate of 0.39, meaning that the filter coefficients are updated in only roughly 2 out of 5 samples. Updating Equations \ref{equation_computing_calculation_full_update} and \ref{equation_load_calculation_full_update} accordingly yields:
\begin{equation}
\label{equation_computing_calculation_reduced_update_1}
 C_{total} = 45 + (6 \cdot 45 + 8) \cdot 0.39 + 34 \approx 188 \text{ cycles}
\end{equation}
\begin{equation}
\label{equation_load_calculation_reduced_update_1}
    Load_{DSP} = \frac{C_{total}}{C_{budget}} = \frac{188 \text{ cycles}}{800 \text{ cycles}} = 23.5 \%
\end{equation}
These results lead to the conclusion that the most cost-effective way to reduce the load of the \ac{DSP} is to reduce the update rate of the filter coefficients to 0.39. For the benchmark signal/noise combination, this nearly halves the processor load from 44.6\% to 23.5\%, while reducing the \ac{SNR}-Gain by only roughly 31\% from 9.47 dB to 6.40 dB. In the next step, the same analysis is applied to all introduced noise signals to assess the general validity of this observation.
\subsubsection{Reduced-update implementation for multiple noise signals}
The same evaluation as in the previous subsection is now conducted for the five introduced noise signals, with the difference that the y-axis now shows the performance gain (the distance between the relative SNR-Gain and the required relative cycles per sample) instead of the \ac{SNR}-Gain.
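
\noindent The performance gain described above can be written as follows. The symbols $P(r)$, $G(r)$ and $r^{*}$ are introduced here only for illustration: $r$ denotes the update rate, $G(r)$ the \ac{SNR}-Gain in dB, $C_{total}(r)$ the cycles per sample at update rate $r$, and $r = 1$ the full-update implementation:
\begin{equation}
    P(r) = \frac{G(r)}{G(1)} - \frac{C_{total}(r)}{C_{total}(1)}
\end{equation}
\begin{equation}
    r^{*} = \arg\max_{0 < r \leq 1} P(r)
\end{equation}
\noindent Under this reading, $P(1) = 0$ holds for the full-update implementation by construction, and a positive value means that the relative \ac{SNR}-Gain decreases more slowly than the relative number of cycles per sample.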
\begin{figure}[H]
    \centering
    \includegraphics[width=1.0\linewidth]{Bilder/fig_gain_update_rate.png}
    \caption{Performance gain (distance between relative SNR-Gain and needed relative cycles/sample) in relation to the update rate of the ANR algorithm for different noise signals.}
    \label{fig:fig_gain_update_rate.png}
\end{figure}
\begin{figure}[H]
    \centering
    \includegraphics[width=1.0\linewidth]{Bilder/fig_load_update_rate.png}
    \caption{Absolute \ac{DSP} load in relation to the update rate of the ANR algorithm for different noise signals.}
    \label{fig:fig_load_update_rate.png}
\end{figure}
\noindent Figure \ref{fig:fig_gain_update_rate.png} shows the performance gain for the five different scenarios. The maximum of the mean performance gain over all scenarios has now shifted to an update rate of 0.32. Figure \ref{fig:fig_load_update_rate.png} shows the load of the \ac{DSP} core for the different update rates, which is the same for all scenarios, as it depends only on the update rate itself. For an update rate of 0.32, the cycle count and the load become:
\begin{equation}
\label{equation_computing_calculation_reduced_update_2}
 C_{total} = 45 + (6 \cdot 45 + 8) \cdot 0.32 + 34 \approx 168 \text{ cycles}
\end{equation}
\begin{equation}
\label{equation_load_calculation_reduced_update_2}
    Load_{DSP} = \frac{C_{total}}{C_{budget}} = \frac{168 \text{ cycles}}{800 \text{ cycles}} = 21.0 \%
\end{equation}
Equations \ref{equation_computing_calculation_reduced_update_2} and \ref{equation_load_calculation_reduced_update_2} confirm that an update rate of 0.32 reduces the \ac{DSP} load to 21.0\%, which corresponds to a performance gain of 24.9\%. For all considered scenarios, an update rate of 0.32 therefore represents the best cost-value ratio, reducing the load while retaining the best possible noise reduction. Averaged over all scenarios, the mean \ac{SNR}-Gain is reduced by XX\% from 11.54 dB to XX.XX dB, while the load of the \ac{DSP} core is reduced by about 52.9\% from 44.6\% to 21.0\%.
\subsubsection{Computational load for reduced-update implementation}
\subsection{Evaluation of the computational load for an error driven implementation}
\subsubsection{Error threshold implementation for the benchmark case}
In contrast to the fixed update implementation of the previous section, the error-driven implementation is a more sophisticated approach that uses an error metric to decide whether a coefficient update is performed. The idea is that in a more static acoustic situation, the filter coefficients do not need to be updated as frequently as in a more dynamic situation, where the characteristics of the noise signal change more rapidly. As the fixed update implementation is not able to detect such changes, the reduction in update frequency is applied statically, which means that there are situations where it is beneficial and situations where it is not. The error-driven implementation, on the other hand, is able to detect such changes and can adapt the update frequency accordingly. It is therefore expected to deliver a better cost-value ratio than the fixed update implementation. It has to be kept in mind that this more complex approach also requires more computing power for the decision making, which affects the overall load of the \ac{DSP} core.
\begin{figure}[H]
    \centering
    \includegraphics[width=1.0\linewidth]{Bilder/fig_snr_error_threshold.png}
\caption{Relative performance of the SNR-Gain, the cycles per sample and the DSP load as a function of the fixed error threshold. The baseline is the full-update variant of the complex use case. The marked dots represent the simulation results for an explicit setup.}
    \label{fig:fig_snr_error_threshold.png.png}
\end{figure}
\subsubsection{Error threshold implementation for multiple noise signals}
The first approach for the error-driven implementation is to use a fixed error threshold. If the error signal remains below a threshold set in advance, the filter coefficients remain unchanged and are not updated. If the error signal exceeds the threshold, the filter coefficients are updated as in the full-update implementation. \\ \\ The crucial aspect of this approach is the right choice of the error threshold, which is expected to be highly dependent on the acoustic situation. To get an idea of a beneficial error threshold, different values are evaluated for the already used signal/noise benchmark of a male speaker disturbed by a breathing noise. The reduction in computational load must now be calculated over the whole audio track from the percentage of samples in which the error signal exceeds the threshold. In detail, if 50000 of 200000 samples exceed a certain error threshold, the update rate of the filter coefficients amounts to 0.25, which means that the filter coefficients are updated in only 25\% of the samples. The result can therefore be expressed in the same way as for the fixed update implementation, where the update rate is specified directly.
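
\noindent The calculation of the effective update rate can be sketched as follows. The symbols $K$ (number of samples in the track), $e[k]$ (error signal), $\theta$ (error threshold) and $r(\theta)$ are introduced here only for illustration, and comparing the magnitude of the error signal with the threshold is an assumption:
\begin{equation}
    r(\theta) = \frac{1}{K} \sum_{k=1}^{K} \mathbf{1}\left( |e[k]| > \theta \right)
\end{equation}
\noindent where $\mathbf{1}(\cdot)$ equals 1 if the condition holds and 0 otherwise. For the example above, this gives
\begin{equation}
    r = \frac{50000}{200000} = 0.25
\end{equation}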

\begin{figure}[H]
    \centering
    \includegraphics[width=1.0\linewidth]{Bilder/fig_gain_error_threshold.png}
\caption{Relative performance of the SNR-Gain, the cycles per sample and the DSP load as a function of the fixed error threshold. The baseline is the full-update variant of the complex use case. The marked dots represent the simulation results for an explicit setup.}
    \label{fig:fig_gain_error_threshold.png.png}
\end{figure}
\begin{figure}[H]
    \centering
    \includegraphics[width=1.0\linewidth]{Bilder/fig_load_error_threshold.png}
\caption{Relative performance of the SNR-Gain, the cycles per sample and the DSP load as a function of the fixed error threshold. The baseline is the full-update variant of the complex use case. The marked dots represent the simulation results for an explicit setup.}
    \label{fig:fig_load_error_threshold.png.png}
\end{figure}
\noindent The benchmark track is evaluated for error thresholds ranging from 0 to 0.5. The results, shown in Figure \ref{fig:fig_snr_error_threshold.png.png}, indicate a highly beneficial behavior for small thresholds, especially below 0.1, where the \ac{SNR}-Gain is only slightly reduced while the load of the \ac{DSP} core is significantly reduced. The maximum offset between the two graphs is found at an error threshold of 0.02. At this point, the \ac{SNR}-Gain is reduced by only 8.9\% to 8.63 dB, while the coefficient adaptation is conducted in only about 81400 of 200000 samples, which corresponds to an update rate of about 41\%.
\begin{equation}
\label{equation_computing_calculation_error_threshold}
 C_{total} = 45 + (6 \cdot 45 + 8) \cdot 0.407 + 34 \approx 192 \text{ cycles}
\end{equation}
\begin{equation}
\label{equation_load_calculation_error_threshold}
    Load_{DSP} = \frac{C_{total}}{C_{budget}} = \frac{192 \text{ cycles}}{800 \text{ cycles}} = 24.0 \%
\end{equation}
\subsubsection{Computational load for error threshold implementation}
\subsection{Summary of the performance evaluation}
The results of the fixed error threshold implementation show that, at its optimum setting, about the same reduction in required cycles per sample is achieved as with the reduced-update implementation (192 cycles compared to 188 cycles), while the \ac{SNR}-Gain is reduced by only 8.9\% compared to 31\% for the reduced-update implementation. The fixed error threshold implementation therefore delivers a far better cost-value ratio than the fixed update implementation, while still being a rather simple approach.
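
\noindent The comparison can be made explicit for the benchmark case with the performance gain used in Figure \ref{fig:fig_gain_update_rate.png}, i.e. the distance between the relative \ac{SNR}-Gain and the relative cycles per sample. The following rough calculation uses only the rounded values stated in this chapter and neglects any additional cycles for the threshold comparison. The symbols $P_{update}$ and $P_{threshold}$ are introduced here only for illustration:
\begin{equation}
    P_{update} \approx (100\% - 31\%) - \frac{188}{357} \approx 69.0\% - 52.7\% = 16.3\%
\end{equation}
\begin{equation}
    P_{threshold} \approx (100\% - 8.9\%) - \frac{192}{357} \approx 91.1\% - 53.8\% = 37.3\%
\end{equation}
\noindent Under these assumptions, the fixed error threshold more than doubles the performance gain of the benchmark case at nearly the same number of cycles per sample.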