Compare commits

..

5 Commits

Author SHA1 Message Date
Patrick Hangl 0a5244ec3f 6.0 2026-05-05 09:53:04 +02:00
Patrick Hangl 5e5331f099 6 2026-05-04 10:48:09 +02:00
Patrick Hangl dd514337d7 Correction commit 2026-04-25 10:32:46 +02:00
Patrick Hangl 52e5b87f76 Overwriting the spell-check correction 2026-04-25 10:26:49 +02:00
Patrick Hangl 66385250bc 5.3 2026-04-25 10:17:50 +02:00
15 changed files with 524 additions and 266 deletions
6 binary image files changed (new sizes: 1.0 MiB, 825 KiB, 652 KiB, 633 KiB, 703 KiB, 737 KiB)

+2
@@ -165,5 +165,7 @@
 \setcounter{FV@BreakBufferDepth}{0}
 \setcounter{minted@FancyVerbLineTemp}{0}
 \setcounter{listing}{0}
+\setcounter{caption@flags}{0}
+\setcounter{continuedfloat}{0}
 \setcounter{lstlisting}{0}
 }
+20 -21
@@ -10,40 +10,39 @@ The high-level implementation of the \ac{ANR} algorithm follows the theoretical
 \item Coefficient Update: The filter coefficients are updated by the corrector, which consists of the error signal scaled by the step size. The adaption step parameter controls how often the coefficients are updated.
 \item Iteration: Repeat the process for all samples in the input signal.
 \end{itemize}
-The flow diagram in Figure \ref{fig:fig_anr_logic} illustrates the logical flow of the \ac{ANR} algorithm, while the code snippet in Figure \ref{fig:fig_anr_code} provides the concrete code implementation of the \ac{ANR}-function.
+The flow diagram in Figure \ref{fig:fig_anr_logic} illustrates the logical flow of the \ac{ANR} algorithm, while the code snippet in Listing \ref{lst:lst_anr_code} provides the concrete code implementation of the \ac{ANR}-function.
 \begin{figure}[H]
 \centering
 \includegraphics[width=0.9\linewidth]{Bilder/fig_anr_logic.jpg}
 \caption{Flow diagram of the code implementation of the \ac{ANR} algorithm.}
 \label{fig:fig_anr_logic}
 \end{figure}
-\begin{figure}[H]
+\begin{listing}[H]
 \centering
-\begin{lstlisting}[language=Python]
+\begin{lstlisting}[style=pythonstyle]
 def anr_function(input, ref_noise, coefficients, mu, adaption_step=1):
-    coefficient_matrix = np.zeros((len(input), coefficients),
-                                  dtype=np.float32)
-    output = np.zeros(input.shape[0], dtype=np.float32)
-    filter = np.zeros(coefficients, dtype=np.float32)
-    for j in range(0, len(input) - len(filter)):
-        accumulator = 0
-        for i in range(coefficients):
-            noise = ref_noise[j+i]
-            accumulator += filter[i] * noise
-        output[j] = input[j] - accumulator
-        corrector = mu * output[j]
-        if (j % adaption_step) == 0:
-            for k in range(coefficients):
-                filter[k] += corrector*ref_noise[j+k]
-        coefficient_matrix[j, :] = filter[:]
+    sample_count = len(input)
+    filter_line = np.zeros(coefficients)
+    sample_line = np.zeros(coefficients)
+    output = np.zeros(sample_count)
+    coefficient_matrix = np.zeros((sample_count, coefficients))
+    for n in range(sample_count):
+        sample_line = np.roll(sample_line, 1)
+        sample_line[0] = ref_noise[n]
+
+        accumulator = np.dot(filter_line, sample_line)
+        error = input[n] - accumulator
+        output[n] = error
+
+        filter_line += mu * error * sample_line
+        coefficient_matrix[n, :] = filter_line
     return output, coefficient_matrix
 \end{lstlisting}
 \caption{High-level implementation of the \ac{ANR} algorithm in Python}
-\label{fig:fig_anr_code}
-\end{figure}
+\label{lst:lst_anr_code}
+\end{listing}
 \noindent The algorithm implementation shall now be put to the test with different use cases to demonstrate its functionality and performance under different scenarios, varying from simple to complex ones. Every use case includes graphical representations of the desired signal, the corrupted signal, the reference noise signal, the filter output, the error signal and the evolution of selected filter coefficients over time. In contrast to a realistic setup, the desired signal is available, allowing the performance of the algorithm to be evaluated based on the \ac{SNR}-Gain in dB and also visually by the amplitude of the error signal (difference between the desired signal and the filter output). The error signal and the \ac{SNR}-Gain are calculated as follows:
 \begin{gather}
 \label{equation_snr_gain_error}
@@ -53,7 +52,7 @@ The flow diagram in Figure \ref{fig:fig_anr_logic} illustrates the logical flow
 \end{gather}
 with $P_{Desired-signal}$ being the power of the desired signal, $P_{Noise-signal}$ being the power of the noise signal and $P_{Error-signal}$ being the power of the error signal, which is the difference between the desired signal and the filter output. A positive \ac{SNR}-Gain indicates an improvement in signal quality, while a negative \ac{SNR}-Gain indicates a degradation in signal quality after applying the \ac{ANR} algorithm.
 \subsection{Simple ANR use cases}
-To evaluate the general functionality and performance of the \ac{ANR} algorithm from Figure \ref{fig:fig_anr_code}, a set of three simple, artificial scenarios is introduced. These examples serve as a showcase to demonstrate the general functionality, the possibilities and the limitations of the \ac{ANR} algorithm.\\ \\
+To evaluate the general functionality and performance of the \ac{ANR} algorithm from Listing \ref{lst:lst_anr_code}, a set of three simple, artificial scenarios is introduced. These examples serve as a showcase to demonstrate the general functionality, the possibilities and the limitations of the \ac{ANR} algorithm.\\ \\
 In all three scenarios, a chirp signal with a frequency range from 100-1000 Hz is used as the desired signal, which is then corrupted with a sine wave (use cases 1 and 2) or Gaussian white noise (use case 3) as the noise signal. In this simple setup, the corrupting noise signal is also available as the reference noise signal. Every approach is conducted with 16 filter coefficients and a step size of 0.01. The four graphs in the respective first plot show the desired signal, the corrupted signal, the reference noise signal and the filter output. The two graphs in the respective second plot show the performance of the filter in form of the resulting error signal and the evolution of three filter coefficients over time.\\ \\
 \noindent This artificial setup could be solved analytically, as the signals do not pass through separate transfer functions, meaning that the reference noise signal is the same as the corrupting noise signal. Though this simple setup would not require an adaptive filter approach, it nevertheless allows a clear evaluation of the performance of the \ac{ANR} algorithm in different scenarios. Also, because the desired signal is known, the performance of the algorithm can be evaluated in a simple way.
 \subsubsection{Simple use case 1: Sine noise at 2000 Hz}
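The rewritten, vectorized `anr_function` in this diff can be exercised against use case 1 directly. The following driver is a sketch, not part of the repository: sample rate, signal length and the exact \ac{SNR}-Gain formula ($10\log_{10}(P_{Noise}/P_{Error})$, evaluated after convergence) are assumptions, while the 100-1000 Hz chirp, 2000 Hz sine noise, 16 coefficients and step size 0.01 come from the text.

```python
import numpy as np

def anr_function(input, ref_noise, coefficients, mu, adaption_step=1):
    # Vectorized LMS noise canceller, mirroring the new listing in the diff.
    sample_count = len(input)
    filter_line = np.zeros(coefficients)
    sample_line = np.zeros(coefficients)
    output = np.zeros(sample_count)
    coefficient_matrix = np.zeros((sample_count, coefficients))
    for n in range(sample_count):
        sample_line = np.roll(sample_line, 1)
        sample_line[0] = ref_noise[n]
        accumulator = np.dot(filter_line, sample_line)  # FIR estimate of the noise
        error = input[n] - accumulator                  # error doubles as cleaned output
        output[n] = error
        filter_line += mu * error * sample_line         # LMS coefficient update
        coefficient_matrix[n, :] = filter_line
    return output, coefficient_matrix

# Use case 1 setup (fs and duration are assumptions, the rest is from the text)
fs = 8000
t = np.arange(fs) / fs                                  # 1 s of samples
desired = np.sin(2 * np.pi * (100 * t + 450 * t**2))    # linear chirp, 100 -> 1000 Hz
noise = 0.5 * np.sin(2 * np.pi * 2000 * t)              # sine noise at 2000 Hz
corrupted = desired + noise

output, coeffs = anr_function(corrupted, noise, coefficients=16, mu=0.01)

# Evaluate on the second half of the signal, after the filter has converged
error = desired[fs // 2:] - output[fs // 2:]
snr_gain = 10 * np.log10(np.mean(noise[fs // 2:]**2) / np.mean(error**2))
print(f"SNR-Gain: {snr_gain:.1f} dB")                   # expected > 0 dB once converged
```

Because the reference noise equals the corrupting noise, the filter only has to learn an identity mapping, so even this short run yields a clearly positive gain.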
+51 -49
@@ -7,10 +7,10 @@
 \acronymused{ANR}
 \@writefile{toc}{\contentsline {subsection}{\numberline {4.1}Low-power system architecture and integration}{42}{}\protected@file@percent }
 \AC@undonewlabel{acro:SOC}
-\newlabel{acro:SOC}{{4.1}{42}{}{subsection.4.1}{}}
+\newlabel{acro:SOC}{{4.1}{42}{}{}{}}
 \acronymused{SOC}
 \AC@undonewlabel{acro:ARM}
-\newlabel{acro:ARM}{{4.1}{42}{}{subsection.4.1}{}}
+\newlabel{acro:ARM}{{4.1}{42}{}{}{}}
 \acronymused{ARM}
 \acronymused{DSP}
 \acronymused{ARM}
@@ -22,7 +22,7 @@
 \acronymused{DSP}
 \acronymused{DSP}
 \AC@undonewlabel{acro:MAC}
-\newlabel{acro:MAC}{{4.1.1}{42}{}{subsubsection.4.1.1}{}}
+\newlabel{acro:MAC}{{4.1.1}{42}{}{}{}}
 \acronymused{MAC}
 \acronymused{ARM}
 \acronymused{ANR}
@@ -37,10 +37,10 @@
 \acronymused{ARM}
 \acronymused{DSP}
 \AC@undonewlabel{acro:DMA}
-\newlabel{acro:DMA}{{4.1.1}{43}{}{subsubsection.4.1.1}{}}
+\newlabel{acro:DMA}{{4.1.1}{43}{}{}{}}
 \acronymused{DMA}
 \AC@undonewlabel{acro:PCM}
-\newlabel{acro:PCM}{{4.1.1}{43}{}{subsubsection.4.1.1}{}}
+\newlabel{acro:PCM}{{4.1.1}{43}{}{}{}}
 \acronymused{PCM}
 \acronymused{DSP}
 \acronymused{PCM}
@@ -51,7 +51,7 @@
 \acronymused{DSP}
 \acronymused{DSP}
 \AC@undonewlabel{acro:ALU}
-\newlabel{acro:ALU}{{4.1.1}{43}{}{subsubsection.4.1.1}{}}
+\newlabel{acro:ALU}{{4.1.1}{43}{}{}{}}
 \acronymused{ALU}
 \acronymused{DSP}
 \acronymused{MAC}
@@ -65,12 +65,12 @@
 \acronymused{DSP}
 \acronymused{DSP}
 \acronymused{ARM}
-\@writefile{lof}{\contentsline {figure}{\numberline {32}{\ignorespaces Simplified visualization of the interaction between the \ac {CI}-System, the \ac {ARM} core and the \ac {DSP} core, making use of the \ac {PCM} interface and shared memory for audio data exchange.}}{44}{}\protected@file@percent }
+\@writefile{lof}{\contentsline {figure}{\numberline {31}{\ignorespaces Simplified visualization of the interaction between the \ac {CI}-System, the \ac {ARM} core and the \ac {DSP} core, making use of the \ac {PCM} interface and shared memory for audio data exchange.}}{44}{}\protected@file@percent }
 \acronymused{CI}
 \acronymused{ARM}
 \acronymused{DSP}
 \acronymused{PCM}
-\newlabel{fig:fig_dsp_setup.jpg}{{32}{44}{}{figure.32}{}}
+\newlabel{fig:fig_dsp_setup.jpg}{{31}{44}{}{}{}}
 \acronymused{ARM}
 \acronymused{PCM}
 \acronymused{DMA}
@@ -82,10 +82,10 @@
 \acronymused{DSP}
 \acronymused{DMA}
 \acronymused{PCM}
-\@writefile{lof}{\contentsline {figure}{\numberline {33}{\ignorespaces Simplified flowchart of the sample processing between the \ac {ARM} core and the \ac {DSP} core via interrupts and shared memory.}}{45}{}\protected@file@percent }
+\@writefile{lof}{\contentsline {figure}{\numberline {32}{\ignorespaces Simplified flowchart of the sample processing between the \ac {ARM} core and the \ac {DSP} core via interrupts and shared memory.}}{45}{}\protected@file@percent }
 \acronymused{ARM}
 \acronymused{DSP}
-\newlabel{fig:fig_dsp_comm.jpg}{{33}{45}{}{figure.33}{}}
+\newlabel{fig:fig_dsp_comm.jpg}{{32}{45}{}{}{}}
 \@writefile{toc}{\contentsline {subsection}{\numberline {4.2}Software architecture and execution flow}{45}{}\protected@file@percent }
 \@writefile{toc}{\contentsline {subsubsection}{\numberline {4.2.1}ARMDSP communication and data exchange details}{45}{}\protected@file@percent }
 \acronymused{ANR}
@@ -117,11 +117,11 @@
 \acronymused{PCM}
 \acronymused{ARM}
 \acronymused{DSP}
-\@writefile{lof}{\contentsline {figure}{\numberline {34}{\ignorespaces Detailed visualization of the \ac {DMA} operations between the PCM interface to the shared memory section. When the memory buffer occupied, an interrupt is triggered, either to the \ac {DSP} core or to the \ac {ARM} core, depending on, if triggered during a Read- or Write-operation.}}{47}{}\protected@file@percent }
+\@writefile{lof}{\contentsline {figure}{\numberline {33}{\ignorespaces Detailed visualization of the \ac {DMA} operations between the PCM interface to the shared memory section. When the memory buffer occupied, an interrupt is triggered, either to the \ac {DSP} core or to the \ac {ARM} core, depending on, if triggered during a Read- or Write-operation.}}{47}{}\protected@file@percent }
 \acronymused{DMA}
 \acronymused{DSP}
 \acronymused{ARM}
-\newlabel{fig:fig_dsp_dma.jpg}{{34}{47}{}{figure.34}{}}
+\newlabel{fig:fig_dsp_dma.jpg}{{33}{47}{}{}{}}
 \acronymused{DMA}
 \acronymused{DMA}
 \acronymused{PCM}
@@ -138,10 +138,10 @@
 \acronymused{DSP}
 \acronymused{PCM}
 \acronymused{DSP}
-\newlabel{fig:fig_dps_code_memory}{{4.2.2}{48}{}{lstnumber.-2.13}{}}
-\@writefile{lof}{\contentsline {figure}{\numberline {35}{\ignorespaces Low-level implementation: Memory initialization and mapping}}{48}{}\protected@file@percent }
-\@writefile{lof}{\contentsline {figure}{\numberline {36}{\ignorespaces Exemplary memory map of the 4-element input buffer array. As it is initialized as a 16-bit integer array, each element occupies 2 bytes of memory, resulting in a total size of 8 bytes for the entire array. As the DSP architecture works in 32-bit double-words, the bytewise addressing is a result of the compiler abstraction.}}{48}{}\protected@file@percent }
-\newlabel{fig:fig_compiler.jpg}{{36}{48}{}{figure.36}{}}
+\@writefile{lol}{\contentsline {listing}{\numberline {2}{\ignorespaces Low-level implementation: Memory initialization and mapping}}{48}{}\protected@file@percent }
+\newlabel{lst:lst_dsp_code_memory}{{2}{48}{}{}{}}
+\@writefile{lof}{\contentsline {figure}{\numberline {34}{\ignorespaces Exemplary memory map of the 4-element input buffer array. As it is initialized as a 16-bit integer array, each element occupies 2 bytes of memory, resulting in a total size of 8 bytes for the entire array. As the DSP architecture works in 32-bit double-words, the bytewise addressing is a result of the compiler abstraction.}}{48}{}\protected@file@percent }
+\newlabel{fig:fig_compiler.jpg}{{34}{48}{}{}{}}
 \@writefile{toc}{\contentsline {paragraph}{Main loop and interrupt handling}{48}{}\protected@file@percent }
 \acronymused{DSP}
 \acronymused{ANR}
@@ -150,11 +150,11 @@
 \acronymused{DSP}
 \acronymused{ARM}
 \acronymused{DSP}
-\@writefile{lof}{\contentsline {figure}{\numberline {37}{\ignorespaces Low-level implementation: Main loop and interrupt handling}}{49}{}\protected@file@percent }
-\newlabel{fig:fig_dps_code_mainloop}{{37}{49}{}{figure.37}{}}
-\@writefile{lof}{\contentsline {figure}{\numberline {38}{\ignorespaces Flow diagram of the code implementation of the main loop and interrupt handling on the \ac {DSP} core.}}{50}{}\protected@file@percent }
+\@writefile{lol}{\contentsline {listing}{\numberline {3}{\ignorespaces Low-level implementation: Main loop and interrupt handling}}{49}{}\protected@file@percent }
+\newlabel{lst:lst_dsp_code_mainloop}{{3}{49}{}{}{}}
+\@writefile{lof}{\contentsline {figure}{\numberline {35}{\ignorespaces Flow diagram of the code implementation of the main loop and interrupt handling on the \ac {DSP} core.}}{50}{}\protected@file@percent }
 \acronymused{DSP}
-\newlabel{fig:fig_dsp_logic.jpg}{{38}{50}{}{figure.38}{}}
+\newlabel{fig:fig_dsp_logic.jpg}{{35}{50}{}{}{}}
 \@writefile{toc}{\contentsline {paragraph}{calculate\_output()-function}{50}{}\protected@file@percent }
 \acronymused{DSP}
 \acronymused{ANR}
@@ -167,12 +167,12 @@
 \@writefile{toc}{\contentsline {paragraph}{Logic operations}{51}{}\protected@file@percent }
 \acronymused{DSP}
 \acronymused{DSP}
-\@writefile{lof}{\contentsline {figure}{\numberline {39}{\ignorespaces Manual implementation of a max-function, returning the maximum of two integer values, taking 12 cycles to execute. The intrinsic functions of the DSP compiler allows a 4-cycle implementation of such an operation.}}{51}{}\protected@file@percent }
-\newlabel{fig:fig_dsp_code_find_max}{{39}{51}{}{figure.39}{}}
+\@writefile{lol}{\contentsline {listing}{\numberline {4}{\ignorespaces Manual implementation of a max-function, returning the maximum of two integer values, taking 12 cycles to execute. The intrinsic functions of the DSP compiler allow a 4-cycle implementation of such an operation.}}{51}{}\protected@file@percent }
+\newlabel{lst:lst_dsp_code_find_max}{{4}{51}{}{}{}}
 \@writefile{toc}{\contentsline {paragraph}{Cyclic array iteration}{51}{}\protected@file@percent }
 \acronymused{ANR}
-\@writefile{lof}{\contentsline {figure}{\numberline {40}{\ignorespaces Manual implementation of a cyclic array iteration function in C, taking the core 20 cycles to execute a pointer inremen of 1. The intrinsic functions of the DSP compiler allows a single-cycle implementation of such cyclic additions.}}{52}{}\protected@file@percent }
-\newlabel{fig:fig_dsp_code_cyclic_add}{{40}{52}{}{figure.40}{}}
+\@writefile{lol}{\contentsline {listing}{\numberline {5}{\ignorespaces Manual implementation of a cyclic array iteration function in C, taking the core 20 cycles to execute a pointer inremen of 1. The intrinsic functions of the DSP compiler allows a single-cycle implementation of such cyclic additions.}}{52}{}\protected@file@percent }
+\newlabel{lst:lst_dsp_code_cyclic_add}{{5}{52}{}{}{}}
 \acronymused{DSP}
 \acronymused{DSP}
 \@writefile{toc}{\contentsline {paragraph}{Fractional fixed-point arithmetic}{52}{}\protected@file@percent }
@@ -186,40 +186,40 @@
 \acronymused{FIR}
 \acronymused{FIR}
 \acronymused{DSP}
-\@writefile{toc}{\contentsline {paragraph}{write\_buffer}{53}{}\protected@file@percent }
+\@writefile{toc}{\contentsline {paragraph}{write\_buffer()}{53}{}\protected@file@percent }
 \acronymused{DSP}
-\@writefile{toc}{\contentsline {paragraph}{apply\_fir\_filter}{53}{}\protected@file@percent }
+\@writefile{toc}{\contentsline {paragraph}{apply\_fir\_filter()}{53}{}\protected@file@percent }
 \acronymused{FIR}
 \acronymused{MAC}
 \acronymused{DSP}
 \acronymused{DSP}
-\@writefile{lof}{\contentsline {figure}{\numberline {41}{\ignorespaces Code snippet of the $apply\_fir\_filter$-function, showing the use of the dual \ac {MAC} architecture of the \ac {DSP} and the fractional multiplication function. The loop iterates through the filter coefficients and reference noise signal samples, performing two multiplications and two additions in each cycle.}}{54}{}\protected@file@percent }
+\@writefile{lol}{\contentsline {listing}{\numberline {6}{\ignorespaces Code snippet of the $apply\_fir\_filter()$-function, showing the use of the dual \ac {MAC} architecture of the \ac {DSP} and the fractional multiplication function. The loop iterates through the filter coefficients and reference noise signal samples, performing two multiplications and two additions in each cycle.}}{54}{}\protected@file@percent }
+\newlabel{lst:lst_dsp_code_apply_fir_filter}{{6}{54}{}{}{}}
 \acronymused{MAC}
 \acronymused{DSP}
-\newlabel{fig:fig_dsp_code_apply_fir_filter}{{41}{54}{}{figure.41}{}}
-\@writefile{lof}{\contentsline {figure}{\numberline {42}{\ignorespaces Visualization of the FIR filter calculation in the $apply\_fir\_filter$-function during the 2nd cyclce of a calculation loop. The reference noise signal samples are stored in the sample line, while the filter coefficients are stored in a separate memory section (filter line).}}{54}{}\protected@file@percent }
-\newlabel{fig:fig_dsp_fir_cycle.jpg}{{42}{54}{}{figure.42}{}}
-\@writefile{toc}{\contentsline {paragraph}{update\_output}{55}{}\protected@file@percent }
-\@writefile{toc}{\contentsline {paragraph}{update\_filter\_coefficient}{55}{}\protected@file@percent }
+\@writefile{lof}{\contentsline {figure}{\numberline {36}{\ignorespaces Visualization of the FIR filter calculation in the $apply\_fir\_filter()$-function during the 2nd cyclce of a calculation loop. The reference noise signal samples are stored in the sample line, while the filter coefficients are stored in a separate memory section (filter line).}}{54}{}\protected@file@percent }
+\newlabel{fig:fig_dsp_fir_cycle.jpg}{{36}{54}{}{}{}}
+\@writefile{toc}{\contentsline {paragraph}{update\_output()}{55}{}\protected@file@percent }
+\@writefile{toc}{\contentsline {paragraph}{update\_filter\_coefficient()}{55}{}\protected@file@percent }
 \acronymused{DSP}
 \acronymused{MAC}
-\@writefile{lof}{\contentsline {figure}{\numberline {43}{\ignorespaces Code snippet of the $update\_filter\_coefficient$-function, again making use of the dual \ac {MAC} architecture of the \ac {DSP} and the fractional multiplication function. Additionaly, 32-bit values are loaded and stored as 64-bit values, using two also intrinisc functions, allowing to update two filter coefficients in a single cycle.}}{55}{}\protected@file@percent }
+\@writefile{lol}{\contentsline {listing}{\numberline {7}{\ignorespaces Code snippet of the $update\_filter\_coefficient()$-function, again making use of the dual \ac {MAC} architecture of the \ac {DSP} and the fractional multiplication function. Additionaly, 32-bit values are loaded and stored as 64-bit values, using two also intrinisc functions, allowing to update two filter coefficients in a single cycle.}}{55}{}\protected@file@percent }
+\newlabel{lst:lst_dsp_code_update_filter_coefficients}{{7}{55}{}{}{}}
 \acronymused{MAC}
 \acronymused{DSP}
-\newlabel{fig:fig_dsp_code_update_filter_coefficients}{{43}{55}{}{figure.43}{}}
-\@writefile{lof}{\contentsline {figure}{\numberline {44}{\ignorespaces Visualization of the coefficient calculation in the $update\_filter\_coefficient$-function during the 2nd cyclce of a calculation loop. The output is multiplied with the step size and the corresponding sample from the sample line, before being added to the current filter coefficient.}}{56}{}\protected@file@percent }
-\newlabel{fig:fig_dsp_coefficient_cycle.jpg}{{44}{56}{}{figure.44}{}}
-\@writefile{toc}{\contentsline {paragraph}{write\_output}{56}{}\protected@file@percent }
-\newlabel{equation_computing}{{24}{56}{}{equation.24}{}}
-\newlabel{equation_c_1}{{25}{56}{}{equation.25}{}}
-\newlabel{equation_c_2}{{26}{56}{}{equation.26}{}}
-\newlabel{equation_c_3}{{27}{56}{}{equation.27}{}}
-\newlabel{equation_c_4}{{28}{56}{}{equation.28}{}}
-\newlabel{equation_c_5}{{29}{56}{}{equation.29}{}}
-\newlabel{equation_computing_final}{{31}{57}{}{equation.31}{}}
+\@writefile{lof}{\contentsline {figure}{\numberline {37}{\ignorespaces Visualization of the coefficient calculation in the $update\_filter\_coefficient()$-function during the 2nd cyclce of a calculation loop. The output is multiplied with the step size and the corresponding sample from the sample line, before being added to the current filter coefficient.}}{56}{}\protected@file@percent }
+\newlabel{fig:fig_dsp_coefficient_cycle.jpg}{{37}{56}{}{}{}}
+\@writefile{toc}{\contentsline {paragraph}{update\_output()}{56}{}\protected@file@percent }
+\newlabel{equation_computing}{{24}{56}{}{}{}}
+\newlabel{equation_c_1}{{25}{56}{}{}{}}
+\newlabel{equation_c_2}{{26}{56}{}{}{}}
+\newlabel{equation_c_3}{{27}{56}{}{}{}}
+\newlabel{equation_c_4}{{28}{56}{}{}{}}
+\newlabel{equation_c_5}{{29}{56}{}{}{}}
+\newlabel{equation_computing_final}{{31}{57}{}{}{}}
 \acronymused{DSP}
-\@writefile{lof}{\contentsline {figure}{\numberline {45}{\ignorespaces Dependence of the total computing effort on the filter length $N$ and update rate $1/U$.}}{57}{}\protected@file@percent }
-\newlabel{fig:fig_c_total.png}{{45}{57}{}{figure.45}{}}
+\@writefile{lof}{\contentsline {figure}{\numberline {38}{\ignorespaces Dependence of the total computing effort on the filter length $\text {N}$ and update rate $\text {1/U}$.}}{57}{}\protected@file@percent }
+\newlabel{fig:fig_c_total.png}{{38}{57}{}{}{}}
 \@setckpt{chapter_04}{
 \setcounter{page}{58}
 \setcounter{equation}{31}
@@ -235,7 +235,7 @@
 \setcounter{subsubsection}{2}
 \setcounter{paragraph}{0}
 \setcounter{subparagraph}{0}
-\setcounter{figure}{45}
+\setcounter{figure}{38}
 \setcounter{table}{0}
 \setcounter{float@type}{16}
 \setcounter{tabx@nest}{0}
@@ -247,7 +247,7 @@
 \setcounter{citetotal}{0}
 \setcounter{multicitecount}{0}
 \setcounter{multicitetotal}{0}
-\setcounter{instcount}{18}
+\setcounter{instcount}{22}
 \setcounter{maxnames}{3}
 \setcounter{minnames}{1}
 \setcounter{maxitems}{3}
@@ -366,6 +366,8 @@
 \setcounter{FancyVerbLineBreakLast}{0}
 \setcounter{FV@BreakBufferDepth}{0}
 \setcounter{minted@FancyVerbLineTemp}{0}
-\setcounter{listing}{0}
+\setcounter{listing}{7}
+\setcounter{caption@flags}{2}
+\setcounter{continuedfloat}{0}
 \setcounter{lstlisting}{0}
 }
+66 -55
@@ -60,9 +60,9 @@ As the \ac{ARM} operation is not the main focus of this thesis and its behavior
The implementation of the \ac{ANR} algorithm on the \ac{DSP} core is structured into several key sections, each responsible for specific aspects of the algorithm's functionality. The following paragraphs outline the main components: The implementation of the \ac{ANR} algorithm on the \ac{DSP} core is structured into several key sections, each responsible for specific aspects of the algorithm's functionality. The following paragraphs outline the main components:
\paragraph{Memory initialization and mapping} \paragraph{Memory initialization and mapping}
The memory initialization section starts with the definition of the interrupt register (0xC00004) and the corresponding bit masks used to control the interrupt behavior of the \ac{DSP} core. Afterwards, a section in the shared memory is defined for the storage of input and output audio samples after/before the transport to/from the \ac{PCM} interface. The output section is initialized with an offset of 16 bytes from the input section (0x800000), resulting in a storage capability of 4 32-bit double-words for each of the two memory sections - this is more than needed, but prevents future memory relocation, if the necessity for more space would arise. After this initialization, the interrupt register and the memory sections are declared as volatile variables, telling the compiler, that these variables can be changed outside the normal program flow (e.g., by hardware interrupts), preventing certain optimizations. The final input/output buffers are then declared in form of two 16-bit arrays, consisting of 4 elements each. Finally, a variable is declared to signal the \ac{DSP} core, an interrupt has occurred, which changes the state of the interrupt register and signals a processing request. The memory initialization section starts with the definition of the interrupt register (0xC00004) and the corresponding bit masks used to control the interrupt behavior of the \ac{DSP} core. Afterwards, a section in the shared memory is defined for the storage of input and output audio samples after/before the transport to/from the \ac{PCM} interface. The output section is initialized with an offset of 16 bytes from the input section (0x800000), resulting in a storage capability of 4 32-bit double-words for each of the two memory sections - this is more than needed, but prevents future memory relocation, if the necessity for more space would arise. 
After this initialization, the interrupt register and the memory sections are declared as volatile variables, telling the compiler that these variables can change outside the normal program flow (e.g., through hardware interrupts), which prevents certain optimizations. The input and output buffers are then declared as two 16-bit arrays of four elements each. Finally, a flag variable is declared that signals to the \ac{DSP} core that an interrupt has occurred, i.e., that the interrupt register has changed state and a processing request is pending.
\begin{figure}[H] \begin{listing}[H]
\centering \centering
\begin{lstlisting}[language=C] \begin{lstlisting}[style=cstyle]
#define CSS_CMD 0xC00004 #define CSS_CMD 0xC00004
#define CSS_CMD_0 (1 << 0) #define CSS_CMD_0 (1 << 0)
#define CSS_CMD_1 (1 << 1) #define CSS_CMD_1 (1 << 1)
static volatile int action_required; static volatile int action_required;
\end{lstlisting} \end{lstlisting}
	\caption{Low-level implementation: Memory initialization and mapping}
	\label{fig:fig_dps_code_memory}
\end{figure} \label{lst:lst_dsp_code_memory}
\end{listing}
\noindent Figure \ref{fig:fig_compiler.jpg} shows an exemplary memory map of the input buffer array, taken from the compiler debugger. As the array is initialized as a 16-bit integer array, each element occupies 2 bytes of memory. \noindent Figure \ref{fig:fig_compiler.jpg} shows an exemplary memory map of the input buffer array, taken from the compiler debugger. As the array is initialized as a 16-bit integer array, each element occupies 2 bytes of memory.
\begin{figure}[H] \begin{figure}[H]
\centering \centering
	\includegraphics[width=1.0\linewidth]{Bilder/fig_compiler.jpg}
	\caption{Exemplary memory map of the input buffer array, taken from the compiler debugger.}
	\label{fig:fig_compiler.jpg}
\end{figure}
\paragraph{Main loop and interrupt handling} \paragraph{Main loop and interrupt handling}
The main loop of the \ac{DSP} core is quite compact, as it mainly handles interrupts and delegates the sample processing to the \ac{ANR} function. The loop starts by enabling interrupts with a compiler-specific function and setting up pointers for the output buffer and the sample variable. After setting the action flag to zero, the main function enters an infinite loop, signaling its halted state to the \ac{ARM} core by setting the interrupt register to 1 and halting the core.\\ \\
If the \ac{ARM} core requests a sample to be processed, it activates the \ac{DSP} core and triggers an interrupt, which sets the action flag to 1. The main loop then checks the action flag and sets the interrupt register back to 0, indicating to the \ac{ARM} core that the sample is now being processed. After resetting the action flag, the output pointer is updated to the next position in the output buffer using a cyclic addition function. Before the calculate\_output()-function is triggered, the sample calculated in the previous cycle is moved from its temporary memory location to the current position in the output buffer. Afterwards, the calculate\_output()-function is triggered for the current cycle and the loop restarts. The flow diagram in Figure \ref{fig:fig_dsp_logic.jpg} visualizes the described behavior of the main loop and interrupt handling on the \ac{DSP} core.
\begin{figure}[H] \begin{listing}[H]
\centering \centering
\begin{lstlisting}[language=C] \begin{lstlisting}[style=cstyle]
int main(void) { int main(void) {
enable_interrupts(); enable_interrupts();
output_pointer = &output_port[1]; output_pointer = &output_port[1];
sample_pointer = &sample; sample_pointer = &sample;
action_required = 0; action_required = 0;
while (1) { while (1) {
css_cmd_flag = CSS_CMD_1; css_cmd_flag = CSS_CMD_1;
core_halt(); core_halt();
if (action_required == 1) { if (action_required == 1) {
css_cmd_flag = CSS_CMD_0; css_cmd_flag = CSS_CMD_0;
action_required = 0; action_required = 0;
            output_pointer = cyclic_add(output_pointer, 2, output_port, 4);
*output_pointer = *sample_pointer; *output_pointer = *sample_pointer;
calculate_output(
&corrupted_signal,
&reference_noise_signal,
mode,
&input_port[1],
&input_port[0],
sample_pointer
);
} }
} }
} }
\end{lstlisting} \end{lstlisting}
\caption{Low-level implementation: Main loop and interrupt handling} \caption{Low-level implementation: Main loop and interrupt handling}
\label{fig:fig_dps_code_mainloop} \label{lst:lst_dsp_code_mainloop}
\end{figure} \end{listing}
\begin{figure}[H] \begin{figure}[H]
\centering \centering
\includegraphics[width=1.0\linewidth]{Bilder/fig_dsp_logic.jpg} \includegraphics[width=1.0\linewidth]{Bilder/fig_dsp_logic.jpg}
	\caption{Flow diagram of the main loop and interrupt handling on the \ac{DSP} core.}
	\label{fig:fig_dsp_logic.jpg}
\end{figure}
In the following, some examples of optimization possibilities are outlined, before the entire \ac{ANR} implementation on the \ac{DSP} is analyzed with regard to its performance.
\paragraph{Logic operations} \paragraph{Logic operations}
Logic operations, such as finding the maximum or minimum of two values, are quite common in signal processing algorithms. However, their implementation in C usually involves conditional statements (if-else), which can be inefficient on certain architectures due to pipeline stalls.\\ \\
The simple function shown in Figure \ref{fig:fig_dsp_code_find_max} returns the maximum of two given integer values. Processing this manual implementation on the \ac{DSP} takes 12 cycles to execute, while the intrinsic function of the \ac{DSP} compiler allows a 4-cycle execution. The simple function shown in Listing \ref{lst:lst_dsp_code_find_max} returns the maximum of two given integer values. Processing this manual implementation on the \ac{DSP} takes 12 cycles to execute, while the intrinsic function of the \ac{DSP} compiler allows a 4-cycle execution.
\begin{figure}[H] \begin{listing}[H]
\centering \centering
\begin{lstlisting}[language=C] \begin{lstlisting}[style=cstyle]
int find_max(int a, int b) { int find_max(int a, int b) {
return (a > b) ? a : b; return (a > b) ? a : b;
} }
\end{lstlisting} \end{lstlisting}
\caption{Manual implementation of a max-function, returning the maximum of two integer values, taking 12 cycles to execute. The intrinsic functions of the DSP compiler allow a 4-cycle implementation of such an operation.}
\label{fig:fig_dsp_code_find_max} \label{lst:lst_dsp_code_find_max}
\end{figure} \end{listing}
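The kind of branch-free selection such an intrinsic maps to can be sketched in portable C. The mask trick below illustrates the principle only; it is not the actual \ac{DSP} instruction sequence, and the function name is made up:

```c
/* Branch-free maximum: -(a < b) is all-ones when a < b and zero otherwise,
   so the XOR mask selects b in the first case and leaves a unchanged in
   the second. No conditional jump, hence no pipeline stall. */
static int find_max_branchless(int a, int b) {
    return a ^ ((a ^ b) & -(a < b));
}
```

For example, find\_max\_branchless(3, 7) evaluates the mask to all-ones and returns 7, while find\_max\_branchless(7, 3) leaves the first operand untouched.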
\paragraph{Cyclic array iteration} \paragraph{Cyclic array iteration}
Basically every part of the \ac{ANR} algorithm relies on iterating through memory sections in a cyclic manner. In C, this is usually implemented by defining an array containing the data and a pointer that is incremented after each access. When the pointer reaches the end of the array, it is reset to the beginning. This approach requires several different operations, such as pointer increments, if-clauses and for-loops.
\begin{figure}[H] \begin{listing}[H]
\centering \centering
\begin{lstlisting}[language=C] \begin{lstlisting}[style=cstyle]
int* cyclic_add(int *pointer, int increment, int *pointer_start, int buffer_length){
    int *new_pointer = pointer;
    for (int i = 0; i < abs(increment); i += 1){
        if (increment > 0){
            /* step forward, wrapping to the start of the buffer */
            new_pointer += 1;
            if (new_pointer >= pointer_start + buffer_length){
                new_pointer = pointer_start;
            }
        } else {
            /* step backward, wrapping to the end of the buffer */
            if (new_pointer == pointer_start){
                new_pointer = pointer_start + buffer_length;
            }
            new_pointer -= 1;
        }
    }
    return new_pointer;
}
\end{lstlisting} \end{lstlisting}
\caption{Manual implementation of a cyclic array iteration function in C, taking the core 20 cycles to execute a pointer increment of 1. The intrinsic functions of the DSP compiler allow a single-cycle implementation of such cyclic additions.}
\label{fig:fig_dsp_code_cyclic_add} \label{lst:lst_dsp_code_cyclic_add}
\end{figure} \end{listing}
\noindent Figure \ref{fig:fig_dsp_code_cyclic_add} shows a manual implementation of such a cyclic array iteration function in C, which updates the pointer to a new address. This implementation takes the \ac{DSP} 20 cycles to execute, while the compiler-provided intrinsic version only takes one cycle, making use of the \ac{DSP}'s specific architecture, which allows such a single-cycle operation.
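For illustration, the same wrap-around can also be written index-based with modular arithmetic. The portable sketch below (names are hypothetical) shows the arithmetic that the intrinsic performs in a single hardware cycle:

```c
/* Cyclic index addition: wraps the result into [0, buffer_length).
   The second modulo keeps the result non-negative for negative
   increments, mirroring a backwards-stepping pointer. */
static int cyclic_add_index(int index, int increment, int buffer_length) {
    return ((index + increment) % buffer_length + buffer_length) % buffer_length;
}
```

For a buffer of length 4, stepping from index 3 by +1 yields 0, and stepping from index 0 by -1 yields 3.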
\paragraph{Fractional fixed-point arithmetic} \paragraph{Fractional fixed-point arithmetic}
As already mentioned at the beginning of this chapter, the \ac{DSP} used here is a fixed-point processor, meaning that it does not support floating-point arithmetic natively. Instead, it relies on fixed-point arithmetic, which represents numbers as integers scaled by a fixed factor. This is a key requirement, as it allows the use of the implemented dual \ac{MAC} \ac{ALU}s. This approach is also faster and more energy efficient, and therefore more suitable for embedded systems. However, it also introduces challenges in terms of precision and range, which need to be taken into account when conducting certain calculations.\\ \\
To tackle these issues, the \ac{DSP} compiler provides intrinsic functions for fractional fixed-point arithmetic, such as a fractional multiplication function, which takes two 32-bit integers as input and returns an already bit-shifted 64-bit output, representing the fractional multiplication result. This approach removes the need for manual bit-shifting operations after each multiplication.\\ \\
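The effect of such a fractional multiplication intrinsic can be sketched on a host machine. The sketch below assumes Q31 scaling (fractions in $[-1, 1)$ scaled by $2^{31}$) and an assumed function name; the real intrinsic additionally targets the wide hardware accumulators:

```c
#include <stdint.h>

/* Q31 fractional multiply: the raw 64-bit product of two Q31 operands is
   scaled by 2^62, so shifting right by 31 realigns it to Q31 without any
   manual bookkeeping at the call site. */
static int64_t fract_mult_q31(int32_t a, int32_t b) {
    return ((int64_t)a * (int64_t)b) >> 31;
}
```

For example, multiplying 0.5 by 0.5 (both $2^{30}$ in Q31) yields $2^{29}$, i.e., 0.25 in Q31.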
To support such operations, a 72-bit accumulator is provided, allowing intermediate results to be stored without overflow or loss of precision.\\ \\
The $calculate\_output()$-function forms the center of the \ac{ANR} algorithm on the \ac{DSP} core and is responsible for the actual processing of the audio samples. The general functionality of the function in C is the same as in the high-level implementation (refer to Figure \ref{fig:fig_anr_logic}) and will therefore not be described in detail again. The main focus now lies on the computational efficiency of the different parts of the function, with the goal of deriving a formula that quantifies the computational effort of the different sub-parts in relation to configurable parameters like the filter length.\\ \\
The $calculate\_output()$-function consists of the following five main parts:
\begin{itemize} \begin{itemize}
\item $write\_buffer$: Pointer handling and buffer management \item $write\_buffer()$: Pointer handling and buffer management
\item $apply\_fir\_filter$: Application of the \ac{FIR} filter on the reference noise signal \item $apply\_fir\_filter()$: Application of the \ac{FIR} filter on the reference noise signal
\item $update\_output$: Calculation of the output sample (=error signal) \item $update\_output()$: Calculation of the output sample (=error signal)
\item $update\_filter\_coefficients$: Update of the \ac{FIR} filter coefficients based on the error signal \item $update\_filter\_coefficients()$: Update of the \ac{FIR} filter coefficients based on the error signal
\item $write\_output$: Writing the output sample back to the output port in the shared memory section \item $write\_output()$: Writing the output sample back to the output port in the shared memory section
\end{itemize} \end{itemize}
These sub-functions feature \ac{DSP}-specific optimizations, and their computational cost partly depends on settable parameters like the filter length. The following paragraphs analyze the computational efficiency of these sub-functions in detail.
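Before looking at the individual cycle counts, the interplay of the five parts can be summarized in a plain, portable reference implementation. The sketch uses floating point for readability, whereas the \ac{DSP} version uses fixed-point intrinsics; all names and the filter length are illustrative:

```c
#include <stddef.h>

#define FILTER_LEN 4 /* illustrative; the real length is a tuning parameter */

/* Plain-C reference of one calculate_output() cycle:
   write_buffer -> apply_fir_filter -> update_output ->
   update_filter_coefficients -> write_output. */
static double lms_step(double corrupted, double noise_ref,
                       double x[FILTER_LEN], double w[FILTER_LEN],
                       double step_size) {
    /* write_buffer: shift the new reference sample into the delay line */
    for (size_t i = FILTER_LEN - 1; i > 0; i--) x[i] = x[i - 1];
    x[0] = noise_ref;

    /* apply_fir_filter: estimate the noise component */
    double y = 0.0;
    for (size_t i = 0; i < FILTER_LEN; i++) y += w[i] * x[i];

    /* update_output: the error signal is the cleaned output sample */
    double e = corrupted - y;

    /* update_filter_coefficients: LMS corrector, scaled by the step size */
    for (size_t i = 0; i < FILTER_LEN; i++) w[i] += step_size * e * x[i];

    /* write_output: here simply returned to the caller */
    return e;
}
```

With all coefficients initialized to zero, the first call returns the corrupted sample unchanged and begins adapting the coefficients toward the noise path.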
\paragraph{write\_buffer}The $write\_buffer$-function is responsible for managing the input line, where the samples of the reference noise signal are stored for further processing. The buffer management mainly consists of a cyclic pointer increment operation and a pointer dereference operation that writes the new sample into the buffer. The cyclic pointer increment is implemented using the already mentioned intrinsic function of the \ac{DSP} compiler, while the pointer dereference operation takes 15 cycles to execute. This results in a total duration of 16 cycles for the $write\_buffer$-function, independent of the filter length or other parameters.
\paragraph{apply\_fir\_filter} The $apply\_fir\_filter$-function is responsible for applying the coefficients of the \ac{FIR} filter to the reference noise signal samples stored in the input line. The number of cycles needed for this function mainly depends on the length of the filter, as the number of multiplications and additions increases with the filter length. To increase the performance, the dual \ac{MAC} architecture of the \ac{DSP} is utilized, allowing two multiplications and two additions to be performed in a single cycle. Another \ac{DSP}-specific optimization is the use of the already introduced 72-bit accumulators and the fractional multiplication function, which allows performing multiplications on two 32-bit integers without losing precision or needing manual bit-shifting operations.
\begin{figure}[H] \begin{listing}[H]
\centering \centering
\begin{lstlisting}[language=C] \begin{lstlisting}[style=cstyle]
for (int i=0; i < n_coeff; i+=2) chess_loop_range(1,){ for (int i=0; i < n_coeff; i+=2) chess_loop_range(1,){
x0 = *p_x0; x0 = *p_x0;
w0 = *p_w; w0 = *p_w;
    x1 = *p_x1;
    w1 = *p_w1;
    /* ... cyclic updates of the sample and coefficient pointers (elided) ... */
    acc_fir_1+=fract_mult(x0, w0);
acc_fir_2+=fract_mult(x1, w1); acc_fir_2+=fract_mult(x1, w1);
} }
\end{lstlisting} \end{lstlisting}
\caption{Code snippet of the $apply\_fir\_filter$-function, showing the use of the dual \ac{MAC} architecture of the \ac{DSP} and the fractional multiplication function. The loop iterates through the filter coefficients and reference noise signal samples, performing two multiplications and two additions in each cycle.} \caption{Code snippet of the $apply\_fir\_filter()$-function, showing the use of the dual \ac{MAC} architecture of the \ac{DSP} and the fractional multiplication function. The loop iterates through the filter coefficients and reference noise signal samples, performing two multiplications and two additions in each cycle.}
\label{fig:fig_dsp_code_apply_fir_filter} \label{lst:lst_dsp_code_apply_fir_filter}
\end{figure} \end{listing}
\begin{figure}[H] \begin{figure}[H]
\centering \centering
\includegraphics[width=1.0\linewidth]{Bilder/fig_dsp_fir_cycle.jpg} \includegraphics[width=1.0\linewidth]{Bilder/fig_dsp_fir_cycle.jpg}
\caption{Visualization of the FIR filter calculation in the $apply\_fir\_filter$-function during the second cycle of a calculation loop. The reference noise signal samples are stored in the sample line, while the filter coefficients are stored in a separate memory section (filter line).}
\label{fig:fig_dsp_fir_cycle.jpg} \label{fig:fig_dsp_fir_cycle.jpg}
\end{figure} \end{figure}
\noindent The result is a computing effort of 1 cycle per item in the sample line buffer (which equals the filter length) plus 12 cycles of general function overhead, for a total of $N+12$ cycles for the $apply\_fir\_filter$-function, with $N$ being the filter length.
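The unroll-by-two structure that the dual \ac{MAC} exploits can be checked on a host machine with a portable sketch (illustrative names; the real code uses the 72-bit accumulators and the fractional multiply). Two independent accumulators remove the data dependency between consecutive multiply-accumulates, which is exactly what the two hardware lanes need:

```c
#include <stdint.h>
#include <stddef.h>

/* FIR dot product unrolled by two with independent accumulators,
   mirroring the dual-MAC loop shown above; n_coeff is assumed even. */
static int64_t fir_dual_acc(const int32_t *x, const int32_t *w, size_t n_coeff) {
    int64_t acc_fir_1 = 0, acc_fir_2 = 0;
    for (size_t i = 0; i < n_coeff; i += 2) {
        acc_fir_1 += (int64_t)x[i]     * (int64_t)w[i];     /* MAC lane 1 */
        acc_fir_2 += (int64_t)x[i + 1] * (int64_t)w[i + 1]; /* MAC lane 2 */
    }
    return acc_fir_1 + acc_fir_2;
}
```

Summing the two accumulators at the end gives the same result as a straightforward single-accumulator dot product.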
\paragraph{update\_output} The $update\_output$-function is responsible for calculating the output sample (the error signal) by subtracting the accumulated filter output from the corrupted input sample. The calculation is a simple subtraction and only takes 1 cycle to execute, independent of the filter length or other parameters.
\paragraph{update\_filter\_coefficient} The $update\_filter\_coefficient$-function represents the second computationally expensive part of the $calculate\_output()$-function. The output calculated by the previous function is multiplied with the step size and the corresponding sample from the reference noise signal, which is stored in the sample line buffer. The result is then added to the current filter coefficient to update it for the next cycle. Again, \ac{DSP}-specific optimizations like the dual \ac{MAC} architecture are used, resulting in a computing effort of 6 cycles per filter coefficient. Each function call adds an overhead of 8 cycles, resulting in a total of $6N+8$ cycles for the $update\_filter\_coefficient$-function, with $N$ again being the filter length.
\begin{figure}[H] \begin{listing}[H]
\centering \centering
\begin{lstlisting}[language=C] \begin{lstlisting}[style=cstyle]
for (int i=0; i< n_coeff; i+=2) chess_loop_range(1,){ for (int i=0; i< n_coeff; i+=2) chess_loop_range(1,){
lldecompose(*((long long *)p_w0), w0, w1); lldecompose(*((long long *)p_w0), w0, w1);
acc_w0 = to_accum(w0); acc_w0 = to_accum(w0);
    acc_w1 = to_accum(w1);
    /* ... accumulation of (step size * output) with the corresponding
       samples and 64-bit store of the coefficient pair (elided) ... */
p_w0+=2; p_w0+=2;
} }
\end{lstlisting} \end{lstlisting}
\caption{Code snippet of the $update\_filter\_coefficient$-function, again making use of the dual \ac{MAC} architecture of the \ac{DSP} and the fractional multiplication function. Additionally, 32-bit values are loaded and stored as 64-bit values using two further intrinsic functions, allowing two filter coefficients to be updated in a single cycle.}
\label{fig:fig_dsp_code_update_filter_coefficients} \label{lst:lst_dsp_code_update_filter_coefficients}
\end{figure} \end{listing}
\begin{figure}[H] \begin{figure}[H]
\centering \centering
\includegraphics[width=1.0\linewidth]{Bilder/fig_dsp_coefficient_cycle.jpg} \includegraphics[width=1.0\linewidth]{Bilder/fig_dsp_coefficient_cycle.jpg}
\caption{Visualization of the coefficient calculation in the $update\_filter\_coefficient$-function during the second cycle of a calculation loop. The output is multiplied with the step size and the corresponding sample from the sample line, before being added to the current filter coefficient.}
\label{fig:fig_dsp_coefficient_cycle.jpg} \label{fig:fig_dsp_coefficient_cycle.jpg}
\end{figure} \end{figure}
\paragraph{write\_output} The $write\_output$-function is responsible for writing the calculated output sample back into the shared memory section. The operation takes 5 cycles to execute, independent of the filter length or other parameters.
\noindent The total computing effort of the $calculate\_output()$-function as a function of the filter length $N$ can now be calculated by summing up the computing efforts of the different sub-functions:
\begin{equation} \begin{equation}
\label{equation_computing} \label{equation_computing}
\begin{aligned} \begin{aligned}
C_{total} = C_{write\_buffer} + C_{apply\_fir\_filter} + C_{update\_output} + \\ \text{C}_{\text{total}} = \text{C}_{\text{write\_buffer}} + \text{C}_{\text{apply\_fir\_filter}} + \text{C}_{\text{update\_output}} + \\
C_{update\_filter\_coefficient} + C_{write\_output} \text{C}_{\text{update\_filter\_coefficient}} + \text{C}_{\text{write\_output}}
\end{aligned} \end{aligned}
\end{equation} \end{equation}
The sub-functions can be expressed separately as functions of the filter length $N$ and of the coefficient update rate, represented by the parameter $1/U$ (e.g., if the coefficients are updated every 2 cycles, $1/U$ results in a value of 0.5):
\begin{gather} \begin{gather}
\label{equation_c_1} \label{equation_c_1}
C_{write\_buffer} = 16 \\ \text{C}_{\text{write\_buffer()}} = 16 \\
\label{equation_c_2} \label{equation_c_2}
C_{apply\_fir\_filter} = N + 12 \\ \text{C}_{\text{apply\_fir\_filter()}} = \text{N} + 12 \\
\label{equation_c_3} \label{equation_c_3}
C_{update\_output} = 1 \\ \text{C}_{\text{update\_output()}} = 1 \\
\label{equation_c_4} \label{equation_c_4}
	C_{update\_filter\_coefficient} = \frac{1}{U}(6N + 8)\\
\label{equation_c_5} \label{equation_c_5}
	C_{write\_output} = 5
\end{gather} \end{gather}
\noindent By inserting the sub-function costs into the total computing effort formula, Equation \ref{equation_computing} can now be expressed as: \noindent By inserting the sub-function costs into the total computing effort formula, Equation \ref{equation_computing} can now be expressed as:
\begin{equation} \begin{equation}
\label{equation_computing_final} \label{equation_computing_final}
	C_{total} = N + \frac{6N+8}{U} + 34
\end{equation} \end{equation}
Equation \ref{equation_computing_final} now provides an estimate of the computing effort required for one output sample in relation to the filter length $N$ and the coefficient update rate $1/U$. This formula can be used to estimate the required computing power (and therefore the power consumption) of the \ac{DSP} core for different parameter settings, allowing an optimal parameter configuration to be found with regard to both the quality of the noise reduction and the power consumption of the system.
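Equation \ref{equation_computing_final} is easy to tabulate; the small helper below (a sketch with purely illustrative parameter values) evaluates the estimate for what-if comparisons:

```c
/* Cycle estimate per output sample, following C_total = N + (6N+8)/U + 34,
   with filter length n and a coefficient update every u samples. */
static double c_total(int n, int u) {
    return n + (6.0 * n + 8.0) / u + 34.0;
}
```

For a filter length of 64, updating the coefficients on every sample costs an estimated 490 cycles per output sample, while updating only every second sample reduces the estimate to 294 cycles.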
\begin{figure}[H] \begin{figure}[H]
\centering \centering
\includegraphics[width=1.0\linewidth]{Bilder/fig_c_total.png} \includegraphics[width=1.0\linewidth]{Bilder/fig_c_total.png}
\caption{Dependence of the total computing effort on the filter length $N$ and update rate $1/U$.} \caption{Dependence of the total computing effort on the filter length $\text{N}$ and update rate $\text{1/U}$.}
\label{fig:fig_c_total.png} \label{fig:fig_c_total.png}
\end{figure} \end{figure}
+92 -29
View File
@@ -7,18 +7,18 @@
\acronymused{FIR}
\acronymused{ANR}
\acronymused{SNR}
\@writefile{lof}{\contentsline {figure}{\numberline {39}{\ignorespaces Desired signal, corrupted signal, reference noise signal and filter output of the complex \ac {ANR} use case, simulated on the \ac {DSP}}}{58}{}\protected@file@percent }
\acronymused{ANR}
\acronymused{DSP}
\newlabel{fig:fig_plot_1_dsp_complex.png}{{39}{58}{}{}{}}
\@writefile{lof}{\contentsline {figure}{\numberline {40}{\ignorespaces Error signal of the complex \ac {ANR} use case, simulated on the \ac {DSP}}}{59}{}\protected@file@percent }
\acronymused{ANR}
\acronymused{DSP}
\newlabel{fig:fig_plot_2_dsp_complex.png}{{40}{59}{}{}{}}
\@writefile{lof}{\contentsline {figure}{\numberline {41}{\ignorespaces Comparison of the high- and low-level simulation output.}}{59}{}\protected@file@percent }
\newlabel{fig:fig_high_low_comparison.png}{{41}{59}{}{}{}}
\@writefile{lof}{\contentsline {figure}{\numberline {42}{\ignorespaces Histogram of the error amplitude between the high- and low-level simulation output.}}{60}{}\protected@file@percent }
\newlabel{fig:fig_high_low_comparison_hist.png}{{42}{60}{}{}{}}
\acronymused{ANR}
\acronymused{DSP}
\acronymused{SNR}
@@ -31,13 +31,13 @@
\acronymused{CI}
\acronymused{ANR}
\acronymused{CI}
\@writefile{lof}{\contentsline {figure}{\numberline {43}{\ignorespaces Noise signals used to corrupt the desired signal in the computational efficiency evaluation}}{61}{}\protected@file@percent }
\newlabel{fig:fig_noise_signals.png}{{43}{61}{}{}{}}
\acronymused{ANR}
\acronymused{SNR}
\@writefile{lof}{\contentsline {figure}{\numberline {44}{\ignorespaces Simulation of the expected \ac {SNR}-Gain for different noise signals and filter lengths applied to the desired signal of a male speaker. The applied delay between the signals amounts to 2 ms. The graphs are smoothed by a third-order Savitzky-Golay filter.}}{62}{}\protected@file@percent }
\acronymused{SNR}
\newlabel{fig:fig_snr_comparison.png}{{44}{62}{}{}{}}
\acronymused{SNR}
\acronymused{SNR}
\acronymused{SNR}
@@ -47,39 +47,100 @@
\acronymused{ANR}
\acronymused{SNR}
\acronymused{ANR}
\@writefile{toc}{\contentsline {subsection}{\numberline {5.3}Evaluation of a fixed update implementation}{62}{}\protected@file@percent }
\@writefile{toc}{\contentsline {subsubsection}{\numberline {5.3.1}Full-Update implementation}{62}{}\protected@file@percent }
\newlabel{equation_computing_calculation_full_update}{{32}{62}{}{}{}}
\acronymused{PCM}
\acronymused{DSP}
\acronymused{DSP}
\newlabel{equation_cycle_budget}{{33}{63}{}{}{}}
\acronymused{DSP}
\newlabel{equation_load_calculation_full_update}{{34}{63}{}{}{}}
\acronymused{ANR}
\acronymused{SNR}
\acronymused{DSP}
\acronymused{DSP}
\acronymused{SNR}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {5.3.2}Reduced-update implementation for the benchmark case}{63}{}\protected@file@percent }
\acronymused{DSP}
\acronymused{SNR}
\@writefile{lof}{\contentsline {figure}{\numberline {45}{\ignorespaces Relative performance of the SNR-Gain, the cycles per sample and the DSP load with regard to the update rate for the benchmark case. The baseline of 100\% is the full-update implementation. The marked dots represent the results of the simulation for an explicit setup.}}{64}{}\protected@file@percent }
\newlabel{fig:fig_snr_update_rate.png}{{45}{64}{}{}{}}
\acronymused{SNR}
\acronymused{DSP}
\acronymused{SNR}
\acronymused{SNR}
\acronymused{DSP}
\newlabel{equation_computing_calculation_reduced_update_1}{{35}{64}{}{}{}}
\newlabel{equation_load_calculation_reduced_update_1}{{36}{64}{}{}{}}
\acronymused{DSP}
\acronymused{SNR}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {5.3.3}Reduced-update implementation for multiple noise signals}{64}{}\protected@file@percent }
\acronymused{SNR}
\@writefile{lof}{\contentsline {figure}{\numberline {46}{\ignorespaces Performance gain (distance between relative SNR-Gain and needed relative cycles/sample) in relation to the update rate of the ANR algorithm for different noise signals.}}{65}{}\protected@file@percent }
\newlabel{fig:fig_gain_update_rate.png}{{46}{65}{}{}{}}
\@writefile{lof}{\contentsline {figure}{\numberline {47}{\ignorespaces Absolute \ac {DSP} load in relation to the update rate of the ANR algorithm for different noise signals.}}{65}{}\protected@file@percent }
\acronymused{DSP}
\newlabel{fig:fig_load_update_rate.png}{{47}{65}{}{}{}}
\acronymused{DSP}
\newlabel{equation_computing_calculation_reduced_update_2}{{37}{65}{}{}{}}
\newlabel{equation_load_calculation_reduced_update_2}{{38}{66}{}{}{}}
\acronymused{DSP}
\acronymused{SNR}
\acronymused{DSP}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {5.3.4}Computational load for reduced-update implementation}{66}{}\protected@file@percent }
\newlabel{equation_update_1}{{39}{66}{}{}{}}
\newlabel{equation_update_2}{{40}{66}{}{}{}}
\acronymused{DSP}
\newlabel{equation_computing_calculation_reduced_update_3}{{41}{66}{}{}{}}
\newlabel{equation_load_calculation_reduced_update_3}{{42}{66}{}{}{}}
\@writefile{toc}{\contentsline {subsection}{\numberline {5.4}Evaluation of an error driven implementation}{66}{}\protected@file@percent }
\@writefile{toc}{\contentsline {subsubsection}{\numberline {5.4.1}Error threshold implementation for the benchmark case}{67}{}\protected@file@percent }
\@writefile{lof}{\contentsline {figure}{\numberline {48}{\ignorespaces Relative performance of the SNR-Gain, the cycles per samples and the DSP load in regard of the error threshold for the benchmark case. The baseline of 100\% is the full update implementation. The marked dots represent the results of the simulation for an explicit setup.}}{67}{}\protected@file@percent }
\newlabel{fig:fig_snr_error_threshold.png}{{48}{67}{}{}{}}
\acronymused{SNR}
\acronymused{DSP}
\newlabel{equation_computing_calculation_error threshold_1}{{43}{68}{}{}{}}
\newlabel{equation_load_calculation_error threshold_1}{{44}{68}{}{}{}}
\acronymused{DSP}
\acronymused{SNR}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {5.4.2}Error threshold implementation for multiple noise signals}{68}{}\protected@file@percent }
\acronymused{SNR}
\@writefile{lof}{\contentsline {figure}{\numberline {49}{\ignorespaces Performance gain (distance between relative SNR-Gain and needed relative cycles/sample) in relation to the error threshold for different noise signals.}}{68}{}\protected@file@percent }
\newlabel{fig:fig_gain_error_threshold.png}{{49}{68}{}{}{}}
\@writefile{lof}{\contentsline {figure}{\numberline {50}{\ignorespaces Absolute \ac {DSP} load in relation to the error threshold for different noise signals.}}{69}{}\protected@file@percent }
\acronymused{DSP}
\newlabel{fig:fig_load_error_threshold.png}{{50}{69}{}{}{}}
\acronymused{DSP}
\acronymused{DSP}
\newlabel{equation_computing_calculation_error_threshold_2}{{45}{69}{}{}{}}
\newlabel{equation_load_calculation_error_threshold_2}{{46}{69}{}{}{}}
\acronymused{DSP}
\acronymused{SNR}
\acronymused{DSP}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {5.4.3}Computational load for error threshold implementation}{69}{}\protected@file@percent }
\newlabel{equation_update_3}{{47}{70}{}{}{}}
\acronymused{DSP}
\newlabel{equation_computing_calculation_error_threshold_3}{{48}{70}{}{}{}}
\newlabel{equation_load_calculation_error_threshold_3}{{49}{70}{}{}{}}
\@writefile{toc}{\contentsline {subsection}{\numberline {5.5}Summary of the performance evaluation}{70}{}\protected@file@percent }
\acronymused{ANR}
\acronymused{SNR}
\acronymused{DSP}
\acronymused{DSP}
\acronymused{SNR}
\acronymused{SNR}
\acronymused{DSP}
\acronymused{DSP}
\acronymused{DSP}
\acronymused{SNR}
\acronymused{SNR}
\acronymused{DSP}
\acronymused{ANR}
\acronymused{CI}
\@setckpt{chapter_05}{
\setcounter{page}{72}
\setcounter{equation}{49}
\setcounter{enumi}{0}
\setcounter{enumii}{0}
\setcounter{enumiii}{0}
@@ -92,7 +153,7 @@
\setcounter{subsubsection}{0}
\setcounter{paragraph}{0}
\setcounter{subparagraph}{0}
\setcounter{figure}{50}
\setcounter{table}{0}
\setcounter{float@type}{16}
\setcounter{tabx@nest}{0}
@@ -104,7 +165,7 @@
\setcounter{citetotal}{0}
\setcounter{multicitecount}{0}
\setcounter{multicitetotal}{0}
\setcounter{instcount}{22}
\setcounter{maxnames}{3}
\setcounter{minnames}{1}
\setcounter{maxitems}{3}
@@ -213,7 +274,7 @@
\setcounter{lstnumber}{15}
\setcounter{FancyVerbLine}{0}
\setcounter{linenumber}{1}
\setcounter{LN@truepage}{71}
\setcounter{FancyVerbWriteLine}{0}
\setcounter{FancyVerbBufferLine}{0}
\setcounter{FV@TrueTabGroupLevel}{0}
@@ -223,6 +284,8 @@
\setcounter{FancyVerbLineBreakLast}{0}
\setcounter{FV@BreakBufferDepth}{0}
\setcounter{minted@FancyVerbLineTemp}{0}
\setcounter{listing}{7}
\setcounter{caption@flags}{2}
\setcounter{continuedfloat}{0}
\setcounter{lstlisting}{0}
}
+136 -26
View File
@@ -49,48 +49,158 @@ The vizualization of the noise signals is shown in Figure \ref{fig:fig_noise_sig
\caption{Simulation of the expected \ac{SNR}-Gain for different noise signals and filter lengths applied to the desired signal of a male speaker. The applied delay between the signals amounts to 2 ms. The graphs are smoothed by a third-order Savitzky-Golay filter.}
\label{fig:fig_snr_comparison.png}
\end{figure}
\noindent Figure \ref{fig:fig_snr_comparison.png} shows the expected \ac{SNR}-Gain for the different noise signals and filter lengths. The results show that a minimum filter length of about 32 taps is required before, in any case, a significant rise in the \ac{SNR}-Gain can be observed - in stark contrast to the synchronous intermediate high-level simulation, where a filter length of only 16 taps provided sufficient noise reduction. This can be explained by the fact that the corrupting noise signal is now delayed relative to the reference noise signal, meaning that the filter needs a certain length before it can adapt sufficiently. The results also show that the \ac{SNR}-Gain differs between the noise signals, indicating that the noise signals have different characteristics, such as the number of peaks, their frequency spectrum and their amplitude.\\ \\
The mean \ac{SNR}-Gain of the different noise signals, also shown in Figure \ref{fig:fig_snr_comparison.png}, indicates that after reaching 95\% of the maximum \ac{SNR}-Gain, the increase slows down considerably. This threshold is reached at a filter length of 45 taps. A filter length of 45 taps therefore represents an optimal choice for a satisfying performance of the \ac{ANR} algorithm, while a further increase of the filter length does not lead to a significant increase of the \ac{SNR}-Gain in this setup. This is an important finding, as it allows optimizing the computational efficiency of the \ac{ANR} algorithm by choosing an appropriate filter length.
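The selection rule described above - take the smallest filter length whose mean \ac{SNR}-Gain reaches 95\% of the observed maximum - can be sketched as follows. This is a Python sketch; the function name is an assumption and the curve values are hypothetical placeholders, chosen only so that the result mirrors the 45-tap finding above.

```python
def min_taps_for_gain_fraction(taps, mean_gain_db, fraction=0.95):
    # Smallest filter length whose mean SNR-Gain reaches the given
    # fraction of the maximum gain observed over all tested lengths.
    threshold = fraction * max(mean_gain_db)
    for n, g in zip(taps, mean_gain_db):
        if g >= threshold:
            return n
    return None

# Hypothetical mean SNR-Gain curve (dB), for illustration only:
taps = [16, 32, 45, 64, 128]
gains = [2.0, 8.5, 11.0, 11.3, 11.5]
```

With these placeholder values the rule selects 45 taps, matching the finding of the simulation.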
\subsection{Evaluation of a fixed update implementation}
\subsubsection{Full-Update implementation}
\noindent Equation \ref{equation_computing_final} can now be utilized to calculate the cycles needed for the computation of one sample of the filter output, using a filter length of 45 taps and an update of the filter coefficients for every sample. The needed cycles are calculated as follows:
\begin{equation}
\label{equation_computing_calculation_full_update}
\text{C}_{\text{total}} = 45 + (6*45+8)*1 + 34 = 357 \text{ cycles}
\end{equation}
As already mentioned in the previous chapters, the sampling rate of the audio data provided to the \ac{PCM} interface amounts to 20 kHz. The preferred clock frequency of the \ac{DSP} is chosen as 16 MHz, which means that the \ac{DSP} core has a cycle budget of
\begin{equation}
\label{equation_cycle_budget}
\text{C}_{\text{budget}} = \frac{16 \text{ MHz}}{20 \text{ kHz}} = 800 \text{ cycles}
\end{equation}
\noindent for one sample. With these two values, the load of the \ac{DSP} core can be calculated as follows:
\begin{equation}
\label{equation_load_calculation_full_update}
\text{Load}_{\text{DSP}} = \frac{\text{C}_{\text{total}}}{\text{C}_{\text{budget}}} = \frac{357 \text{ cycles}}{800 \text{ cycles}} = 44.6 \%
\end{equation}
\noindent The results, calculated in Equations \ref{equation_computing_calculation_full_update} to \ref{equation_load_calculation_full_update}, can be summarized as follows:\\ \\
With the optimal filter length of 45 taps and an update of the filter coefficients for every sample, the \ac{ANR} algorithm is able to achieve an \ac{SNR}-Gain of about 11.54 dB, averaged over different signal/noise combinations. Under these circumstances, the computational load of the \ac{DSP} core amounts to about 45\%, which means that the core can be halted for 55\% of the time between two samples, and therefore, the overall power consumption can be reduced.\\ \\
The initial signal/noise combination of a male speaker disturbed by a breathing noise, which is used for the verification of the \ac{DSP} implementation, shall serve as a benchmark for the coming evaluations. With 45 filter coefficients, it delivers an \ac{SNR}-Gain of about 9.47 dB.
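The cycle-budget and load calculation above can be condensed into a short sketch. This is an illustrative Python helper, not part of the firmware; the function and parameter names are assumptions.

```python
def dsp_load(cycles_per_sample: float, f_clk_hz: float, f_s_hz: float) -> float:
    # The cycle budget per sample is f_clk / f_s (800 cycles at
    # 16 MHz and 20 kHz); the load is the fraction of that budget
    # the ANR routine actually consumes.
    budget = f_clk_hz / f_s_hz
    return cycles_per_sample / budget
```

For the full-update case of 357 cycles per sample this yields the 44.6\% load derived above.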
\subsubsection{Reduced-update implementation for the benchmark case}
The most straightforward method to further reduce the computing effort of the \ac{DSP} core is to reduce the update frequency of the filter coefficients. The coefficients are then no longer recalculated and written into the Filter Line for every sample; instead, the filter calculated for an earlier sample is applied to the current one. Depending on the acoustic situation, the savings in computing power will most likely lead to a degradation of the noise reduction quality - the degradation depends on whether the current situation is highly dynamic (and would therefore require a frequent update of the filter coefficients) or rather static. Changing the update frequency changes the denominator in Equation \ref{equation_c_5} and therefore in Equation \ref{equation_computing_final}.\\ \\
As already mentioned, the reduction of the update rate is initially evaluated for the benchmark case (a male speaker disturbed by a breathing noise) and then checked for general validity. In the following figure, the \ac{SNR}-Gain of 9.47 dB with 45 filter coefficients therefore represents 100\% of the achievable noise reduction, with a maximum of 357 cycles (likewise 100\%).
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{Bilder/fig_snr_update_rate.png}
\caption{Relative performance of the SNR-Gain, the cycles per sample and the DSP load with regard to the update rate for the benchmark case. The baseline of 100\% is the full-update implementation. The marked dots represent the results of the simulation for an explicit setup.}
\label{fig:fig_snr_update_rate.png}
\end{figure}
\noindent Figure \ref{fig:fig_snr_update_rate.png} illustrates the trend of the \ac{SNR}-Gain, the executed cycles per sample and the \ac{DSP} load compared to the full-update variant of the benchmark case. Contrary to the executed cycles per sample and the load of the processor, the \ac{SNR}-Gain does not behave linearly as the update frequency is reduced. This behavior allows us to determine the update rate at which the most favorable ratio of \ac{SNR}-Gain to \ac{DSP} load can be expected.\\ \\
The maximum offset between the two graphs can be found at an update rate of 0.39, meaning that an update of the filter coefficients is only conducted in roughly 2 out of 5 samples. Updating Equations \ref{equation_computing_calculation_full_update} and \ref{equation_load_calculation_full_update} therefore delivers:
\begin{equation}
\label{equation_computing_calculation_reduced_update_1}
\text{C}_{\text{total}} = 45 + (6*45+8)*0.39 + 34 = 188 \text{ cycles}
\end{equation}
\begin{equation}
\label{equation_load_calculation_reduced_update_1}
\text{Load}_{\text{DSP}} = \frac{\text{C}_{\text{total}}}{\text{C}_{\text{budget}}} = \frac{188 \text{ cycles}}{800 \text{ cycles}} = 23.5 \%
\end{equation}
The interpretation of these results leads to the conclusion that the most cost-effective way to reduce the load of the \ac{DSP} would be to reduce the update rate of the filter coefficients to 0.39. In the case of the benchmark signal/noise combination, this nearly halves the processor load from 44.6\% to 23.5\%, while only reducing the \ac{SNR}-Gain by roughly 31\% from 9.47 dB to 6.40 dB. In the next step, the same analysis is applied to all introduced noise signals in order to assess the general validity of this observation.
\subsubsection{Reduced-update implementation for multiple noise signals}
The same evaluation as in the previous subsection is now conducted for the five introduced noise signals, with the difference that the y-axis now shows the performance gain (the distance between the relative SNR-Gain and the needed relative cycles/sample) instead of the \ac{SNR}-Gain.
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{Bilder/fig_gain_update_rate.png}
\caption{Performance gain (distance between relative SNR-Gain and needed relative cycles/sample) in relation to the update rate of the ANR algorithm for different noise signals.}
\label{fig:fig_gain_update_rate.png}
\end{figure}
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{Bilder/fig_load_update_rate.png}
\caption{Absolute \ac{DSP} load in relation to the update rate of the ANR algorithm for different noise signals.}
\label{fig:fig_load_update_rate.png}
\end{figure}
\noindent Figure \ref{fig:fig_gain_update_rate.png} shows the performance gain for the five different scenarios. The mean performance gain over all scenarios now peaks at an update rate of 0.32. Figure \ref{fig:fig_load_update_rate.png} shows the load of the \ac{DSP} core for the different update rates, which is the same for all scenarios, as it depends only on the update rate itself.
\begin{equation}
\label{equation_computing_calculation_reduced_update_2}
\text{C}_{\text{total}} = 45 + (6*45+8)*0.32 + 34 = 168 \text{ cycles}
\end{equation}
\begin{equation}
\label{equation_load_calculation_reduced_update_2}
\text{Load}_{\text{DSP}} = \frac{\text{C}_{\text{total}}}{\text{C}_{\text{budget}}} = \frac{168 \text{ cycles}}{800 \text{ cycles}} = 20.8 \%
\end{equation}
Equations \ref{equation_computing_calculation_reduced_update_2} and \ref{equation_load_calculation_reduced_update_2} confirm that for an update rate of 0.32, a reduction of the \ac{DSP} load to 20.8\% can be achieved, correlating with a performance gain of 24.9\%. This means that, across all viewed scenarios, an update rate of 0.32 represents the best cost-value ratio for reducing the load while still getting the best possible noise reduction. Averaged over all scenarios, this results in a mean \ac{SNR}-Gain reduction of 24.5\% from 11.54 dB to 8.72 dB, while the load of the \ac{DSP} core is reduced by about 53.4\% from 44.6\% to 20.8\%.
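Finding the update rate with the largest performance gain can be sketched as a small search over sampled points of the two curves. This is an illustrative Python sketch; the function names are assumptions, the cycle constants follow the cost model of this chapter, and the relative-gain samples are hypothetical placeholders chosen so that the maximum lands at 0.32, as observed above.

```python
def best_update_rate(rates, rel_snr_gains, n_taps=45, overhead=34):
    # Cycles per sample of the full-update variant (update rate 1).
    full = n_taps + (6 * n_taps + 8) + overhead

    def perf_gain(rate, rel_gain):
        # Performance gain: relative SNR-Gain minus relative cycles/sample.
        rel_cycles = (n_taps + (6 * n_taps + 8) * rate + overhead) / full
        return rel_gain - rel_cycles

    return max(zip(rates, rel_snr_gains), key=lambda p: perf_gain(*p))[0]

# Hypothetical mean relative SNR-Gain samples, for illustration only:
rates = [0.1, 0.2, 0.32, 0.5, 1.0]
rel_gains = [0.35, 0.55, 0.72, 0.82, 1.0]
```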
\subsubsection{Computational load for reduced-update implementation}
The most straightforward implementation of a reduced update rate is through the use of a counter and a modulo operation, which checks whether the filter coefficients have to be updated for the current sample. The code must therefore be extended by two blocks, which are responsible for additional computational load:
\begin{gather}
\label{equation_update_1}
\text{C}_{\text{increment\_counter}} = 5 \text{ cycles} \\
\label{equation_update_2}
\text{C}_{\text{check\_counter}} = 23 (24) \text{ cycles}
\end{gather}
Incrementing the counter and checking via a modulo operation whether the counter has reached the update period adds 29 cycles to the cycle count for one sample (28 when the coefficients are updated and 29 when they are not). Equations \ref{equation_computing_calculation_reduced_update_3} and \ref{equation_load_calculation_reduced_update_3} show the new calculation of the needed cycles and the load of the \ac{DSP} core for an update rate of 0.32:
\begin{equation}
\label{equation_computing_calculation_reduced_update_3}
\text{C}_{\text{total}} = 45 + (6*45+8)*0.32 + 63 = 197 \text{ cycles}
\end{equation}
\begin{equation}
\label{equation_load_calculation_reduced_update_3}
\text{Load}_{\text{DSP}} = \frac{\text{C}_{\text{total}}}{\text{C}_{\text{budget}}} = \frac{197 \text{ cycles}}{800 \text{ cycles}} = 24.6 \%
\end{equation}
The results from the updated equations show that the computational load for an update rate of 0.32 increases substantially from 20.8\% to 24.6\% through the use of a counter and a modulo operation, as the latter is computationally quite expensive. A better alternative would be a bitwise check, but this would restrict the possible update periods to powers of 2.
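The two update-check variants discussed above can be contrasted in a short sketch. Python is used here for illustration only; on the \ac{DSP} these would be counter, modulo and mask instructions, and the function names are assumptions.

```python
def should_update_modulo(counter: int, period: int) -> bool:
    # General check: update on every `period`-th sample. The modulo
    # operation is comparatively expensive on the DSP core.
    return counter % period == 0

def should_update_bitwise(counter: int, period: int) -> bool:
    # Cheaper mask-based check, valid only when `period` is a power
    # of two (update rates 1/2, 1/4, 1/8, ...).
    assert period & (period - 1) == 0, "period must be a power of two"
    return counter & (period - 1) == 0
```

Both checks agree whenever the period is a power of two; the bitwise variant simply cannot express a period of, e.g., 3 samples (an update rate of roughly 0.33).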
\subsection{Evaluation of an error driven implementation}
In contrast to the fixed-update implementation of the previous chapter, the error-driven implementation is a more sophisticated approach, which bases the decision for a coefficient update on an error metric.
The idea is that as the update steps of the filter coefficients get smaller, the benefit of applying them decreases. In practice, a closer look at the update of the filter coefficients is taken: As shown in Figure \ref{fig:fig_dsp_coefficient_cycle.jpg}, the size of the coefficient update is directly related to the error signal - if the error signal decreases, the update size of the filter coefficients also decreases. \\ \\
As the fixed-update implementation is not able to detect such changes, the reduction in update frequency is applied in a static way, which means that there are situations where it is beneficial and situations where it is not. The error-driven implementation, on the other hand, is able to detect such changing behavior and can therefore adapt the update frequency accordingly. The error-driven implementation is thus expected to deliver a better cost-value ratio than the fixed-update implementation.
\subsubsection{Error threshold implementation for the benchmark case}
The approach chosen for this thesis is the use of a fixed error threshold. This means that if the error signal remains below a certain, predetermined threshold, the filter coefficients remain unchanged and are not updated. If the error signal exceeds the threshold, the filter coefficients are updated as in the full-update implementation. \\ \\ The crucial aspect of this approach is the right choice of the error threshold, which is expected to be highly dependent on the acoustic situation. To get an idea of a beneficial error threshold, different values are initially evaluated for the already used benchmark case.\\ \\ The reduction in computational load must now be calculated for the whole audio track from the percentage of samples where the error signal exceeds the threshold and the coefficients are therefore adapted. In detail, this means that if, for a certain error threshold, 50000 of 200000 samples exceed said threshold, the filter coefficients are updated in 25\% of the samples - the update rate of the filter coefficients therefore amounts to 0.25. The result can thus be expressed in the same way as for the fixed-update implementation, where the update rate is directly calculated for one sample.
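The fixed-threshold gate described above amounts to a single magnitude comparison per sample. The following sketch illustrates it in C; the Q15 fixed-point scaling and the names are assumptions for illustration, not the actual firmware code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative fixed error threshold of 0.02, scaled to Q15 fixed point. */
#define ERROR_THRESHOLD_Q15 ((int32_t)(0.02 * 32768))  /* = 655 */

/* Returns 1 if the coefficients should be updated for this sample,
   i.e. if the magnitude of the error exceeds the fixed threshold. */
int coefficients_need_update(int32_t error_q15)
{
    return labs((long)error_q15) > ERROR_THRESHOLD_Q15;
}
```

Counting the samples for which this function returns 1 over a whole track directly yields the effective update rate described in the text.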
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{Bilder/fig_snr_error_threshold.png}
\caption{Relative performance of the SNR-Gain, the cycles per sample and the DSP load with respect to the error threshold for the benchmark case. The baseline of 100\% is the full-update implementation. The marked dots represent the results of the simulation for an explicit setup.}
\label{fig:fig_snr_error_threshold.png}
\end{figure}
\noindent Our benchmark track is evaluated for error thresholds ranging from 0 to 0.5. The results, presented in Figure \ref{fig:fig_snr_error_threshold.png}, show that for small thresholds, especially below 0.1, a highly beneficial behavior can be anticipated, where the \ac{SNR}-Gain is only slightly reduced while the load of the \ac{DSP} core drops significantly. The maximum offset between the two graphs can be found at an error threshold of 0.02 - at this point, the coefficient adaption is only conducted in about 81400 of 200000 samples, which corresponds to an update rate of about 41\%. Updating Equation \ref{equation_computing_calculation_full_update} and \ref{equation_load_calculation_full_update} therefore delivers:
\begin{equation}
\label{equation_computing_calculation_error threshold_1}
\text{C}_{\text{total}} = 45 + (6*45+8)*0.41 + 34 = 193 \text{ cycles}
\end{equation}
\begin{equation}
\label{equation_load_calculation_error threshold_1}
\text{Load}_{\text{DSP}} = \frac{\text{C}_{\text{total}}}{\text{C}_{\text{budget}}} = \frac{193 \text{ cycles}}{800 \text{ cycles}} = 24.1 \%
\end{equation}
The performance difference compared to reducing the update rate is already clear for the benchmark case: With a similar \ac{DSP} load of 24.1\% (again, nearly half the load of the full-update implementation), the \ac{SNR}-Gain is reduced by only 8.9\% from 9.47 dB to 8.63 dB. The same analysis will be applied to all introduced noise signals, to get an idea of the general validity of this observation.
\subsubsection{Error threshold implementation for multiple noise signals}
Again, the same evaluation as for the benchmark case is conducted for the five introduced noise signals, featuring the performance gain instead of the \ac{SNR}-Gain as a performance metric.
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{Bilder/fig_gain_error_threshold.png}
\caption{Performance gain (distance between relative SNR-Gain and needed relative cycles/sample) in relation to the error threshold for different noise signals.}
\label{fig:fig_gain_error_threshold.png}
\end{figure}
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{Bilder/fig_load_error_threshold.png}
\caption{Absolute \ac{DSP} load in relation to the error threshold for different noise signals.}
\label{fig:fig_load_error_threshold.png}
\end{figure}
\noindent Similar to the reduced update rate implementation, the observation made for every signal/noise combination is comparable to the benchmark case, but not identical. Figure \ref{fig:fig_gain_error_threshold.png} shows the performance gain for the five different scenarios. The most beneficial error threshold shifts noticeably to a value of 0.07. It is interesting to note that the benchmark case seems to be a bit of an exception compared to the behavior of the other scenarios.\\ \\
An error threshold of 0.07 results in a mean update of 38244 out of 200000 samples, which corresponds to an update rate of 19.1\%. The \ac{DSP} load is now no longer identical for all scenarios, but still quite similar - as Figure \ref{fig:fig_load_error_threshold.png} shows, the absolute load of the \ac{DSP} core for an error threshold of 0.07 amounts to only 16.6\%.
\begin{equation}
\label{equation_computing_calculation_error_threshold_2}
\text{C}_{\text{total}} = 45 + (6*45+8)*0.191 + 34 = 132 \text{ cycles}
\end{equation}
\begin{equation}
\label{equation_load_calculation_error_threshold_2}
\text{Load}_{\text{DSP}} = \frac{\text{C}_{\text{total}}}{\text{C}_{\text{budget}}} = \frac{132 \text{ cycles}}{800 \text{ cycles}} = 16.6 \%
\end{equation}
Equation \ref{equation_computing_calculation_error_threshold_2} and \ref{equation_load_calculation_error_threshold_2} confirm that for an error threshold of 0.07, the \ac{DSP} load can be reduced to 16.6\%, correlating with a performance gain of 48.4\%. This means that for all viewed scenarios, an error threshold of 0.07 represents the best cost-value ratio for reducing the load while still getting the best possible noise reduction. Averaged over all scenarios, the relative performance results in a mean \ac{SNR}-Gain reduction of 11.7\% from 11.54 dB to 10.19 dB, while the load of the \ac{DSP} core is reduced by about 62.8\% from 44.6\% to 16.6\%.
\subsubsection{Computational load for error threshold implementation}
In contrast to the fixed-update implementation, the error threshold implementation with a fixed error threshold does not require computationally expensive operations: The threshold is implemented as a 32-bit integer which is simply checked for every sample by a single if-clause.
\begin{gather}
\label{equation_update_3}
\text{C}_{\text{check\_threshold}} = 10 \text{ cycles}
\end{gather}
The check of the 32-bit threshold adds 10 cycles to the cycle count for one sample.
Equation \ref{equation_computing_calculation_error_threshold_3} and \ref{equation_load_calculation_error_threshold_3} show the new calculation of the needed cycles and the load of the \ac{DSP} core for an error threshold of 0.07:
\begin{equation}
\label{equation_computing_calculation_error_threshold_3}
\text{C}_{\text{total}} = 45 + (6*45+8)*0.191 + 44 = 142 \text{ cycles}
\end{equation}
\begin{equation}
\label{equation_load_calculation_error_threshold_3}
\text{Load}_{\text{DSP}} = \frac{\text{C}_{\text{total}}}{\text{C}_{\text{budget}}} = \frac{142 \text{ cycles}}{800 \text{ cycles}} = 17.8 \%
\end{equation}
Contrary to the fixed-update implementation, the computational load for an error threshold of 0.07 shows only a minimal increase from 16.6\% to 17.8\% through the use of a computationally cheap if-clause. This is a clear advantage compared to the fixed-update implementation.
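The cycle model used throughout this chapter (45 fixed cycles, $(6 \cdot 45 + 8)$ cycles per coefficient update scaled by the update rate, plus the variant-specific overhead) can be collected into a small helper that reproduces the numbers above. The function names are illustrative, not part of the firmware.

```c
#include <assert.h>

/* Total cycles per sample for a given filter length, effective update rate
   and variant-specific overhead (e.g. 34 for the plain output path, 44 with
   the threshold check, 63 with the counter/modulo gate). Rounded to the
   nearest integer, matching the equations in the text. */
static int total_cycles(int taps, double update_rate, int overhead)
{
    return (int)(45 + (6 * taps + 8) * update_rate + overhead + 0.5);
}

/* DSP load in percent for the cycle budget of 800 cycles per sample. */
static double dsp_load_percent(int cycles)
{
    return 100.0 * cycles / 800.0;
}
```

For example, `total_cycles(45, 0.191, 44)` reproduces the 142 cycles of Equation \ref{equation_computing_calculation_error_threshold_3}.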
\subsection{Summary of the performance evaluation}
The results of the two analyses can be summarized as follows: \\ \\ With the optimal filter length of 45 taps and an update of the filter coefficients in every cycle, the \ac{ANR} algorithm is able to achieve a \ac{SNR}-Gain of about 11.54 dB, averaged over all different signal/noise combinations. Under these circumstances, the computational load of the \ac{DSP} core amounts to about 45\%, which means that for 55\% of the time between two incoming samples the core can be halted, and therefore the overall power consumption can be reduced.\\ \\
A simple method to further reduce the load of the \ac{DSP} core is to reduce the update frequency of the filter coefficients. For the benchmark signal/noise combination, an update rate of 0.39 nearly halves the processor load from 44.6\% to 23.5\%, while reducing the \ac{SNR}-Gain by roughly 31\% from 9.47 dB to 6.40 dB. For all viewed scenarios, an update rate of 0.32 represents the best cost-value ratio for reducing the load while still getting the best possible noise reduction - with a mean \ac{SNR}-Gain reduction of 24.5\% from 11.54 dB to 8.72 dB, while the load of the \ac{DSP} core is reduced by about 53.4\% from 44.6\% to 20.8\%. While the performance benefit of this approach is reasonable, the computational effort of the implementation is significant - the 20.8\% total load rises to 24.6\%.\\ \\
A more sophisticated method to reduce the load of the \ac{DSP} core is an error-driven implementation, where the filter coefficients are only updated if the error signal exceeds a certain threshold. For the benchmark case, with a similar \ac{DSP} load of 24.1\%, the \ac{SNR}-Gain is reduced by only 8.9\% from 9.47 dB to 8.63 dB. For all viewed scenarios, an error threshold of 0.07 represents the best cost-value ratio for reducing the load while still getting the best possible noise reduction - with a mean \ac{SNR}-Gain reduction of 11.7\% from 11.54 dB to 10.19 dB, while the load of the \ac{DSP} core is reduced by about 62.8\% from 44.6\% to 16.6\%. This substantial performance gain comes at the cost of only a slight increase in computing effort - the 16.6\% total load rises only to 17.8\%.\\ \\
This result proves that an error-driven implementation of the \ac{ANR} algorithm is highly suitable to reduce the load needed for adaptive noise reduction in a \ac{CI} application, while still providing nearly 90\% of the maximum achievable performance under the viewed circumstances.
\section{Conclusion and outlook}
The focus of this thesis was to investigate the possibilities for the efficient implementation of a real-time capable \ac{ANR} algorithm in \ac{CI} systems.\\ \\ The initial high-level implementation in Python proved the general feasibility of the proposed \ac{LMS} method, where the \ac{SNR}-Gain was introduced as a metric for the quality of the \ac{ANR} algorithm. Said metric was used to evaluate the performance of the algorithm in various settings and noise conditions. First, a fictional desired signal (sine wave) and noise signal (sine wave or white noise) were used to check the algorithm for its general functionality. Then the step to real, recorded signals was made. The final and most complex combination (which then served as a benchmark for the remaining implementations) was the use of the same real-world signals, but now with different transfer functions and delays introduced, to mimic a complex, practical situation. In every case, the algorithm was able to achieve a significant improvement in the \ac{SNR} of the processed signals. \\ \\
\noindent The next challenge was to implement the algorithm in an efficient way in the C programming language, to achieve real-time capability. This was achieved by the use of \ac{DSP} compiler intrinsic functions, which allow performing logic operations with a minimum of needed instructions. After the C implementation was functional, its performance on the benchmark track was compared to the initial Python implementation. A histogram of the differences between the two outputs showed only minor deviations, which can be attributed to the fixed-point calculations of the \ac{DSP} compiler.\\ \\
\noindent With the working C implementation in place, a closer look at the performance, especially the cycles needed to compute one sample, was taken - the result was a formula which calculates the needed cycles as a function of the filter length and the update rate. With this information in mind, several noise sources were put under test to evaluate the optimal filter length, which is a trade-off between the performance improvement and the computational cost - the result was 45 coefficients.\\ \\
With a set filter length of 45 coefficients, the final improvement of the algorithm regarding performance and computational cost could be evaluated. The base was the computationally most costly full-update implementation, needing 357 cycles to process one sample - this corresponds to about 45\% \ac{DSP} load.\\ \\
\noindent The first approach was a rather simple reduction of the update rate, evaluated for the benchmark case and different signal/noise combinations. The result was a significant reduction in the needed cycles, but also a quite significant drop in the \ac{SNR}-Gain. Additionally, the implementation of such a universal reduction required computationally expensive processor operations, further reducing the cost-benefit ratio.\\ \\
\noindent The second approach was the proposed method of an error-driven optimization, utilizing the idea of a fixed threshold for the error signal. Again evaluated for the benchmark case and different signal/noise combinations, this approach can be considered a success, as it was able to achieve a significant reduction in the needed cycles while only reducing the \ac{SNR}-Gain by a small amount. The implementation of this method is also computationally efficient, as it only requires a simple comparison operation to check if an update is necessary.\\ \\
\noindent The error-driven optimization approach can therefore be seen as the clear winner, as it was able to further improve an already real-time capable \ac{ANR} algorithm by significantly reducing the computational load of the \ac{DSP} core, while only slightly reducing the performance improvement in terms of \ac{SNR}-Gain.\\ \\
\noindent For future work, a more advanced method to further optimize the system could be the use of a dynamic threshold, which could be adapted according to the current noise conditions. The background for this proposal is the fact that, besides the error signal, the noise signal itself also influences the size of the filter-coefficient update. In the current implementation, the threshold only depends on the error signal - if a situation arises where the noise signal is very small, but the error/output signal is high due to a high input signal, an update of the filter coefficients would be triggered, even if not necessary. A dynamic threshold, which also takes the noise signal into account, could further reduce the number of updates, but with a potentially higher computational effort.
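One possible shape of such a dynamic threshold is sketched below. This is purely hypothetical future work: the noise-magnitude estimator, the Q15 scaling and all constants are assumptions, not part of the implemented system.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BASE_THRESHOLD_Q15 655   /* ~0.02 in Q15, as in the fixed variant */
#define EMA_SHIFT 5              /* smoothing factor of 1/32 for the estimate */

static int32_t noise_mag_est = 0;  /* running estimate of the noise level */

/* Hypothetical dynamic gate: track the noise magnitude with a cheap
   exponential moving average and raise the threshold when the noise is
   small, so that a large error caused only by a strong input signal does
   not trigger an unnecessary coefficient update. */
int dynamic_update_check(int32_t noise_q15, int32_t error_q15)
{
    int32_t mag = (int32_t)labs((long)noise_q15);
    noise_mag_est += (mag - noise_mag_est) >> EMA_SHIFT;

    int32_t threshold = BASE_THRESHOLD_Q15;
    if (noise_mag_est < BASE_THRESHOLD_Q15)
        threshold = 2 * BASE_THRESHOLD_Q15;  /* stricter gate in low noise */

    return labs((long)error_q15) > threshold;
}
```

The extra cost over the fixed threshold is one absolute value, one shift-based moving average and one comparison per sample, which would still be far cheaper than the modulo-based gate.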
\noindent Also, the hybrid filter approach already mentioned in Chapter 2, which splits the filter into a static and an adaptive part, could be further investigated. The idea behind this approach is that the static part of the filter covers certain signal paths which are expected to be time-invariant, while the adaptive part of the filter only needs to cover changing signals.
\noindent Therefore, the final result of this thesis shows that the approach of an error-driven optimization, utilizing a fixed threshold for the error signal, is a viable method to achieve a significant performance improvement, reducing the computational load of the \ac{DSP} core by over 62\% while only reducing the \ac{SNR}-Gain by roughly 12\%.\\ \\
\usepackage[a4paper,margin=2.5cm]{geometry} % page dimensions
\usepackage{setspace}
\usepackage{listings}
\usepackage{xcolor}
\usepackage{minted}
\usepackage{caption}
\usepackage{acronym} % show only used acronyms
\captionsetup[listing]{
justification=raggedright,
singlelinecheck=false
}
% --- Listings Style ---
\lstdefinestyle{pythonstyle}{
language=Python,
basicstyle=\ttfamily\small,
keywordstyle=\color{blue!70!black}\bfseries,
commentstyle=\color{gray}\itshape,
stringstyle=\color{green!50!black},
numbers=left,
numberstyle=\tiny,
stepnumber=1,
numbersep=8pt,
backgroundcolor=\color{black!5},
frame=single,
rulecolor=\color{black!30},
breaklines=true,
tabsize=4,
showstringspaces=false
}
\lstdefinestyle{cstyle}{
language=C,
basicstyle=\ttfamily\small,
keywordstyle=\color{blue!70!black}\bfseries,
commentstyle=\color{gray}\itshape,
stringstyle=\color{green!50!black},
numbers=left,
numberstyle=\tiny,
stepnumber=1,
numbersep=8pt,
backgroundcolor=\color{black!5},
frame=single,
rulecolor=\color{black!30},
breaklines=true,
tabsize=4,
showstringspaces=false
}
\addbibresource{literature.bib}
\renewcommand{\thefootnote}{\arabic{footnote}}
\parskip.5\baselineskip