2026-05-16 13:18:32 +02:00
parent 37728f813f
commit fb1cc9ae89
9 changed files with 116 additions and 117 deletions
+4 -4
@@ -1,5 +1,5 @@
\section{Hardware implementation and performance quantification of the ANR algorithm on a low-power system}
Now, with a functioning high-level implementation in place, the focus shifts to the hardware implementation of the \ac{ANR} algorithm on a low-power system. The first subchapter describes the hardware, on which the \ac{ANR} algorithm is implemented, including its environment, which serves as a link to the \ac{CI} system itself. The following subchapter continues with the basic implementation of the \ac{ANR} algorithm on the hardware itself and shall provide the reader with a basic understanding of its challenges, possibilities and limitations. This implementation is then tested on a simulator to be compared to the high-level implementation.\\
Now, with a functioning high-level implementation in place, the focus shifts to the hardware implementation of the \ac{ANR} algorithm on a low-power system. The first subchapter describes the hardware on which the \ac{ANR} algorithm is implemented, including its environment, which serves as the link to the \ac{CI} system itself. The following subchapter covers the basic implementation of the \ac{ANR} algorithm on this hardware and gives the reader a basic understanding of its challenges, possibilities and limitations. This implementation is then tested on a simulator and compared to the high-level implementation.
\subsection{Low-power system architecture and integration}
This thesis considers a low-power \ac{SOC} architecture that integrates a general-purpose \ac{ARM} core with a dedicated \ac{DSP} core. The system combines the flexibility of an \ac{ARM}-based control processor with the computational efficiency of a specialized \ac{DSP}, splitting general computing tasks from real-time signal processing workloads.
\subsubsection{ARM and DSP hardware architecture overview}
@@ -22,7 +22,7 @@ In order to ensure a smooth, but power-efficient, operation together with the \a
\caption{Simplified visualization of the interaction between the \ac{CI}-System, the \ac{ARM} core and the \ac{DSP} core, making use of the \ac{PCM} interface and shared memory for audio data exchange.}
\label{fig:fig_dsp_setup.jpg}
\end{figure}
\noindent The \ac{ARM} Core receives the 16-bit audio data (the corrupted signal and the reference noise signal via two channels) from the CI system via a \ac{PCM} interface, which offers one 32-bit input and one 32-bit output register. An interrupt triggers the integrated \ac{DMA} controller when the input register is occupied, which transfers the audio data from the \ac{PCM} interface to the input buffer in a predefined memory location (now in two 16-bit samples again). Once completed, the \ac{DSP} core is requested to start processing the audio data. The \ac{DSP} core then reads the audio samples from the shared memory, processes them using the implemented \ac{ANR} algorithm, and writes the 16-bit processed sample back to an output buffer, also located in the shared memory. Finally, the \ac{ARM} core is notified via an interrupt from the \ac{DSP} core, that the processing is complete - the \ac{DMA} controller then transfers the processed audio samples from the output buffer back to the \ac{PCM} interface for playback (refer to Figure \ref{fig:fig_dsp_comm.jpg}).\\ \\
\noindent The \ac{ARM} core receives the 16-bit audio data (the corrupted signal and the reference noise signal on two separate channels) from the \ac{CI} system via a \ac{PCM} interface, which offers one 32-bit input and one 32-bit output register. When the input register is occupied, an interrupt triggers the integrated \ac{DMA} controller, which transfers the audio data from the \ac{PCM} interface to the input buffer at a predefined memory location (again as two 16-bit samples). Once the transfer is complete, the \ac{DSP} core is requested to start processing the audio data. The \ac{DSP} core then reads the audio samples from the shared memory, processes them using the implemented \ac{ANR} algorithm, and writes the 16-bit processed sample back to an output buffer, also located in the shared memory. Finally, the \ac{DSP} core notifies the \ac{ARM} core via an interrupt that the processing is complete; the \ac{DMA} controller then transfers the processed audio samples from the output buffer back to the \ac{PCM} interface for playback (refer to Figure \ref{fig:fig_dsp_comm.jpg}).\\ \\
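\noindent The packing of two 16-bit channels into one 32-bit \ac{PCM} word can be illustrated with a minimal C sketch. It is not the code used on the target: in the described system the \ac{DMA} controller performs the transfer, and the sketch only shows the bit-level split in software. The function names and the channel order (corrupted signal in the upper half-word, reference noise in the lower one) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reinterpret a 16-bit pattern as a signed two's-complement sample
 * without relying on implementation-defined conversions. */
static int16_t u16_to_s16(uint16_t u)
{
    return (u & 0x8000u) ? (int16_t)((int32_t)u - 65536) : (int16_t)u;
}

/* Split 32-bit PCM words into two 16-bit sample buffers.
 * ASSUMPTION (not stated in the text): the upper half-word carries the
 * corrupted signal, the lower half-word the reference noise signal. */
static void pcm_unpack_block(const uint32_t *words, size_t n,
                             int16_t *corrupted, int16_t *reference)
{
    assert(words != NULL && corrupted != NULL && reference != NULL);
    for (size_t i = 0; i < n; ++i) {
        corrupted[i] = u16_to_s16((uint16_t)(words[i] >> 16));
        reference[i] = u16_to_s16((uint16_t)(words[i] & 0xFFFFu));
    }
}

/* Place one processed 16-bit sample into a 32-bit output word.
 * ASSUMPTION: the sample occupies the lower half-word, upper half is zero. */
static uint32_t pcm_pack_output(int16_t sample)
{
    return (uint32_t)(uint16_t)sample;
}
```

If the interface delivers the channels in the opposite order, only the two shift and mask expressions need to be swapped.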
\begin{figure}[H]
\centering
\includegraphics[width=0.9\linewidth]{Bilder/fig_dsp_comm.jpg}
@@ -33,7 +33,7 @@ In order to ensure a smooth, but power-efficient, operation together with the \a
\subsection{Software architecture and execution flow}
\subsubsection{ARM-DSP communication and data exchange details}
In contrast to the high-level simulation environment written in Python in the previous chapter, the implementation of the \ac{ANR} algorithm on the \ac{DSP} requires a low-level programming approach that takes into account the specific architecture and capabilities of the processor and its environment. This includes considerations such as memory management, data types, and optimization techniques specific to the \ac{DSP} architecture. The implementation has to be written in the C programming language, which is a standard for embedded systems.\\ \\
The implementation of the \ac{ANR} algorithm on the \ac{DSP} follows the same overall structure as the high-level variant, but now the focus lies on memory management, interrupt-handling and communication between the two cores. The \ac{ARM} operates in a continuous loop, structured into several states:
The implementation of the \ac{ANR} algorithm on the \ac{DSP} follows the same overall structure as the high-level variant, but the focus now lies on memory management, interrupt handling and communication between the two cores. The \ac{ARM} core operates in a continuous loop, structured into several states:
\begin{itemize}
\item \textbf{Idle:} The \ac{ARM} core waits for an interrupt from the \ac{DMA} controller, indicating that new audio samples are available in the input buffer.
\item \textbf{Work:} After receiving the interrupt, the \ac{ARM} core triggers an interrupt on the \ac{DSP} core to start processing the audio samples.
@@ -187,7 +187,7 @@ The $calculate\_output()$ functions consists out of the following five main part
\end{itemize}
These sub-functions feature \ac{DSP}-specific optimizations, and their computational cost partly depends on configurable parameters such as the filter length. The following paragraphs analyze the computational efficiency of these sub-functions in detail.
\paragraph{write\_buffer()} The $write\_buffer()$-function is responsible for managing the sample line, where the samples of the reference noise signal are stored for further processing. The buffer management mainly consists of a cyclic pointer increment and a pointer dereference to write the new sample into the buffer. The cyclic pointer increment is implemented using the already mentioned intrinsic function of the \ac{DSP} compiler, while the pointer dereference operation takes 15 cycles to execute. This results in a total duration of 16 cycles for the $write\_buffer()$-function, independent of the filter length or other parameters.
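The buffer handling described above can be sketched in portable C as follows. This is an illustration only: the compiler intrinsic is replaced by an explicit wrap-around, and the buffer length, the 32-bit sample type, the structure layout and the order of write and increment are assumptions that are not taken from the actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_LINE_LEN 8u /* hypothetical length, chosen for illustration */

/* Hypothetical sample line: storage plus the index of the next write slot. */
typedef struct {
    int32_t data[SAMPLE_LINE_LEN];
    size_t  head;
} sample_line_t;

/* Portable stand-in for the DSP compiler's cyclic-increment intrinsic:
 * advance the index by one and wrap to zero at the end of the buffer. */
static size_t cyclic_inc(size_t idx, size_t len)
{
    assert(len > 0u && idx < len);
    return (idx + 1u == len) ? 0u : idx + 1u;
}

/* Store one reference-noise sample and advance the write position.
 * ASSUMPTION: the sample is written first, then the index is advanced. */
static void write_buffer_sketch(sample_line_t *line, int32_t sample)
{
    assert(line != NULL);
    line->data[line->head] = sample;
    line->head = cyclic_inc(line->head, SAMPLE_LINE_LEN);
}
```

Because neither step depends on the number of filter taps, the cost of such a routine is constant, which matches the fixed cycle count stated above.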
\paragraph{apply\_fir\_filter()} The $apply\_fir\_filter()$-function is responsible for applying the coefficients of the \ac{FIR} filter on the reference noise signal samples stored in the sample line. The needed cycles for this function are mainly depenendent on the lenght of the filter, as the number of multiplications and additions increase with the filter length. To increase the performance, the dual \ac{MAC} architecture of the \ac{DSP} is utilized, allowing two multiplications and two additions to be performed in a single cycle. Another \ac{DSP}-specific optimization is the use of the already introduced 72-bit accumulators and the fractional multiplication function, which allows performing multiplications on two 32-bit integers without losing precision or the need for manual bit-shifting operations.
\paragraph{apply\_fir\_filter()} The $apply\_fir\_filter()$-function is responsible for applying the coefficients of the \ac{FIR} filter to the reference noise signal samples stored in the sample line. The number of cycles needed for this function mainly depends on the length of the filter, as the number of multiplications and additions increases with the filter length. To increase the performance, the dual \ac{MAC} architecture of the \ac{DSP} is utilized, allowing two multiplications and two additions to be performed in a single cycle. Another \ac{DSP}-specific optimization is the use of the already introduced 72-bit accumulators and the fractional multiplication function, which allow multiplying two 32-bit integers without losing precision or requiring manual bit-shifting operations.
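For readers without access to the \ac{DSP} toolchain, the following portable C sketch mimics the structure of such a filter loop. It is not the implementation used in this work. It assumes Q1.31 fractional data, an even filter length and a cyclic sample line in which older samples sit at decreasing indices; all names are hypothetical. Two independent accumulators stand in for the dual \ac{MAC} units. Since standard C offers no 72-bit type, each product is scaled back to Q1.31 before accumulation, which discards low-order bits that the hardware accumulator would keep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Q1.31 fractional multiply: 64-bit product scaled back by 2^31.
 * Division (instead of a shift) keeps the behavior well defined for
 * negative products; it truncates toward zero. */
static int64_t q31_mul(int32_t a, int32_t b)
{
    return ((int64_t)a * (int64_t)b) / ((int64_t)1 << 31);
}

/* Clamp a wide accumulator to the 32-bit output range. */
static int32_t saturate_q31(int64_t acc)
{
    if (acc > INT32_MAX) return INT32_MAX;
    if (acc < INT32_MIN) return INT32_MIN;
    return (int32_t)acc;
}

/* FIR over a cyclic sample line: y = sum_k coeff[k] * x[n-k].
 * 'newest' is the index of x[n]; older samples are found at decreasing
 * indices with wrap-around (an assumed layout). */
static int32_t fir_q31_sketch(const int32_t *coeff, const int32_t *line,
                              size_t len, size_t newest)
{
    assert(coeff != NULL && line != NULL);
    assert(len > 0u && (len % 2u) == 0u && newest < len);

    int64_t acc0 = 0; /* stands in for the first MAC unit  */
    int64_t acc1 = 0; /* stands in for the second MAC unit */
    size_t idx = newest;

    for (size_t k = 0; k < len; k += 2u) {
        acc0 += q31_mul(coeff[k], line[idx]);
        idx = (idx == 0u) ? len - 1u : idx - 1u;
        acc1 += q31_mul(coeff[k + 1u], line[idx]);
        idx = (idx == 0u) ? len - 1u : idx - 1u;
    }
    return saturate_q31(acc0 + acc1);
}
```

The loop body runs once per pair of taps, which illustrates why the cycle count of the filter grows with the filter length, while the buffer write does not.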
\begin{listing}[H]
\centering
\begin{lstlisting}[style=cstyle]