% Masterarbeit/chapter_04.tex
% Patrick Hangl 8ad14d2268 4.1
% 2026-01-12 16:25:37 +01:00

\section{Hardware implementation and optimization of the ANR algorithm}
This section forms the main part of this thesis. The first subsection describes the hardware on which the \ac{ANR} algorithm is implemented, including its environment, which serves as the link to the \ac{CI} system itself. The second subsection covers the basic implementation of the \ac{ANR} algorithm on this hardware and gives the reader an understanding of its challenges, possibilities and limitations. This basic implementation serves as the baseline for the subsequent optimizations.\\
In the third subsection, this initial implementation is further optimized to improve its real-time performance on the \ac{DSP}. The last subsection focuses on the final optimizations of the \ac{ANR} algorithm itself, especially with respect to the capabilities of a hybrid \ac{ANR} approach.
\subsection{Description of the low-power DSP and its environment}
This thesis considers a low-power \ac{SOC} architecture that integrates a general-purpose \ac{ARM} core with a dedicated \ac{DSP} core. The system combines the flexibility of an \ac{ARM}-based control processor with the computational efficiency of a specialized \ac{DSP}, separating general computing tasks from real-time signal-processing workloads.
\subsubsection{Hardware overview}
A 32-bit \ac{ARM} core serves as the primary control unit of the system. It is responsible for high-level application logic, system configuration, peripheral management and scheduling, and serves as a general-purpose processing unit. Due to its universal instruction set and extensive input/output interfaces, the \ac{ARM} core is well suited for general tasks and for the interaction with the \ac{CI} system. Time-critical numerical processing is intentionally offloaded to the \ac{DSP} core in order to reduce the computational load and power consumption of the control processor.\\ \\
The \ac{DSP} used for the implementation features a 32-bit dual Harvard, dual \ac{MAC} architecture, primarily designed for audio signal-processing applications in low-power embedded systems. It does not feature a dedicated boot ROM, as it is initialized by the \ac{ARM} core. The firmware executing the \ac{ANR} algorithm is written in the C programming language. The proprietary compiler generates optimized assembly code, which is then translated into machine code that executes the \ac{ANR} algorithm on the incoming samples.\\ \\
All memory instances and registers of the \ac{SOC} are directly addressable by the \ac{ARM} core through the standard buses, which also allows a simplified control of the \ac{DSP} through a shared memory section. The memory mainly consists of the following two parts:
\begin{itemize}
\item \textbf{Program Memory:} This memory section stores the executable code for both the \ac{ARM} core and the \ac{DSP} core. It contains the compiled instructions that define the behavior of the system, including the \ac{ANR} algorithm implemented on the \ac{DSP}.
\item \textbf{Data Memory:} This memory section stores the runtime data and variables required during program execution. This includes the buffers for input/output audio samples and intermediate processing results. The shared memory section between the \ac{ARM} core and the \ac{DSP} core is also part of the data memory and has a total size of 64 KB.
\end{itemize}
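\noindent A possible layout of the shared memory section can be sketched as a single C structure. The type name \texttt{shared\_mem\_t}, the field names and the buffer length \texttt{BUF\_LEN} are hypothetical and chosen for illustration only; only the 64 KB limit is taken from the description above. A compile-time check (C11 \texttt{\_Static\_assert}) verifies that the layout fits into the section.
\begin{lstlisting}[language=C]
#include <stdint.h>

/* Hypothetical buffer length in samples (not taken from the firmware). */
#define BUF_LEN 256u
/* Size of the shared memory section as stated in the text: 64 KB. */
#define SHARED_MEM_SIZE (64u * 1024u)

/* Hypothetical layout of the shared memory section between ARM and DSP. */
typedef struct {
    volatile uint32_t command;  /* written by the ARM core, read by the DSP */
    volatile uint32_t status;   /* written by the DSP core, read by the ARM */
    int32_t input[BUF_LEN];     /* audio samples written by the DMA */
    int32_t output[BUF_LEN];    /* processed samples read by the DMA */
    int32_t scratch[1024];      /* intermediate processing results */
} shared_mem_t;

/* Compile-time check (C11): the layout must fit into the 64 KB section. */
_Static_assert(sizeof(shared_mem_t) <= SHARED_MEM_SIZE,
               "shared memory layout exceeds 64 KB");
\end{lstlisting}
On the target, such a structure would typically be placed at the start address of the shared memory section (e.g. via the linker configuration), so that both cores interpret the same bytes in the same way.\\ \\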
The data memory is supported by an integrated \ac{DMA} controller, which performs data transfers between peripherals and memory without burdening the processing cores. This is particularly important for transferring audio samples from the \ac{PCM} interface to the shared memory section for further processing by the \ac{DSP}, as well as for transferring processed samples back to the \ac{PCM} interface for playback.\\ \\
The aforementioned 64 KB shared memory section between the \ac{ARM} core and the \ac{DSP} core is crucial for efficient communication and data exchange between the two processing units, as described in the following subsection.\\ \\
When the \ac{DSP} is not required to process audio data, it can be paused by stopping the clock supplied to the \ac{DSP} core. While paused, the \ac{DSP} core remains in a low-power state, but the \ac{ARM} core can still access the shared memory and wake up the \ac{DSP} core when needed. This mechanism reduces the overall power consumption, which is crucial for battery-operated devices such as cochlear implants.\\ \\
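The pause and wake-up mechanism can be illustrated with a read-modify-write access to a clock control register. The register \texttt{dsp\_ctrl\_reg} and its clock-enable bit \texttt{DSP\_CLK\_EN} are hypothetical, as the actual control mechanism of the \ac{SOC} is not documented here. For a host-side test, the register is represented by a plain variable instead of a memory-mapped address.
\begin{lstlisting}[language=C]
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clock-enable bit in a DSP control register. */
#define DSP_CLK_EN (1u << 0)

/* On the target, this would be a memory-mapped register at a fixed
 * (hypothetical) address; here it is a plain variable for host testing. */
static volatile uint32_t dsp_ctrl_reg = 0u;

/* Pause the DSP by stopping its clock; shared memory stays accessible. */
static void dsp_pause(void)  { dsp_ctrl_reg &= ~DSP_CLK_EN; }

/* Wake up the DSP by re-enabling its clock. */
static void dsp_resume(void) { dsp_ctrl_reg |= DSP_CLK_EN; }

/* True if the clock-enable bit is set. */
static bool dsp_is_running(void) { return (dsp_ctrl_reg & DSP_CLK_EN) != 0u; }
\end{lstlisting}
Using bit masks ensures that only the clock-enable bit is modified, while all other bits of the register are preserved.\\ \\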
The processing unit of the \ac{DSP} is based on a load/store architecture, meaning that all operands must first be moved from memory into registers before any operation can be performed. The execution units (\ac{ALU} and multiplier) then operate on the register contents and write the results back into registers. Finally, the results must be explicitly moved back to memory.\\ \\
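The load/store principle can be made explicit in C by splitting a simple operation into its three phases. The function \texttt{scale\_block} is a hypothetical example; its local variables merely stand in for processor registers, as the actual register allocation is performed by the compiler.
\begin{lstlisting}[language=C]
#include <stdint.h>

/* Scales a block of samples: y[i] = x[i] * gain + offset.
 * Overflow handling is omitted for brevity. */
static void scale_block(const int32_t *x, int32_t *y, int n,
                        int32_t gain, int32_t offset)
{
    for (int i = 0; i < n; i++) {
        int32_t r_x = x[i];                /* 1. load operand into a register */
        int32_t r_y = r_x * gain + offset; /* 2. execution units work on registers */
        y[i] = r_y;                        /* 3. explicitly store result to memory */
    }
}
\end{lstlisting}
On a load/store architecture, even the single C statement \texttt{y[i] = x[i] * gain + offset;} is translated into such a sequence of load, arithmetic and store instructions.\\ \\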
The \ac{DSP} includes a three-stage pipeline consisting of fetch, decode and execute stages, which allows overlapping instruction execution and thus improves throughput. The architecture is optimized for high cycle efficiency when executing computationally intensive signal-processing workloads. The dual Harvard, dual \ac{MAC} architecture (two separate \ac{ALU}s) enables the execution of two \ac{MAC} operations, two memory operations (load/store) and two pointer updates in a single processor cycle.
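\\ \\
The benefit of the dual \ac{MAC} architecture can be illustrated with a dot product, the core operation of a finite impulse response filter. The following sketch uses two independent accumulators, so that the two multiply-accumulate operations of each loop iteration do not depend on each other; whether the proprietary compiler actually maps them onto both \ac{MAC} units in the same cycle is an assumption. The function name \texttt{dot\_dual\_mac} is hypothetical, and 64-bit accumulators are used to avoid overflow in this portable sketch.
\begin{lstlisting}[language=C]
#include <stdint.h>

/* Dot product with two independent accumulators (one per MAC unit).
 * Assumes n is even. */
static int64_t dot_dual_mac(const int32_t *a, const int32_t *b, int n)
{
    int64_t acc0 = 0;
    int64_t acc1 = 0;
    for (int i = 0; i < n; i += 2) {
        acc0 += (int64_t)a[i] * b[i];         /* MAC operation 0 */
        acc1 += (int64_t)a[i + 1] * b[i + 1]; /* MAC operation 1, independent of 0 */
    }
    return acc0 + acc1;                       /* combine the partial sums */
}
\end{lstlisting}
With a single accumulator, every addition would depend on the result of the previous one, which makes it harder to keep both \ac{MAC} units busy unless the compiler splits the sum itself.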
\subsubsection{Communication between the ARM core and the DSP}
In order to ensure smooth and power-efficient operation together with the \ac{CI} system, an interrupt-driven communication between the \ac{ARM} core and the \ac{DSP} core is crucial. The center of communication between the cores is the previously mentioned shared memory region, which is accessible by both processing units. This shared memory enables the exchange of data without the need for separate communication protocols or input/output interfaces (refer to Figure \ref{fig:fig_dsp_setup.jpg}). Synchronization between the cores is achieved using interrupt-based signaling: the \ac{ARM} core initiates processing requests by triggering an interrupt on the \ac{DSP}, while the \ac{DSP} notifies the \ac{ARM} core upon completion of a task by raising an interrupt as well. This approach ensures efficient coordination while minimizing active waiting (polling) and therefore unnecessary power consumption.
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{Bilder/fig_dsp_setup.jpg}
\caption{Simplified visualization of the interaction between the \ac{ARM} core and the \ac{DSP} core}
\label{fig:fig_dsp_setup.jpg}
\end{figure}
\noindent The \ac{ARM} core receives the audio data from the \ac{CI} system via a \ac{PCM} interface, which offers one input and one output register. An interrupt triggers the integrated \ac{DMA} controller, which transfers the audio data from the \ac{PCM} interface to a buffer at a predefined memory location. Once the buffer is filled with enough samples (), another interrupt is triggered by the \ac{DMA} controller itself, notifying the \ac{DSP} core to start processing the audio data. The \ac{DSP} core then reads the audio samples from the shared memory, processes them using the implemented \ac{ANR} algorithm, and writes the processed samples back to an output buffer, also located in the shared memory. Finally, the \ac{DSP} core notifies the \ac{ARM} core via an interrupt that the processing is complete; the \ac{DMA} controller then transfers the processed audio samples from the output buffer back to the \ac{PCM} interface for playback (refer to Figure \ref{fig:fig_dsp_comm.jpg}).\\ \\
\begin{figure}[H]
\centering
\includegraphics[width=0.9\linewidth]{Bilder/fig_dsp_comm.jpg}
\caption{Simplified flowchart of the sample processing between the \ac{ARM} core and the \ac{DSP} core via interrupts and shared memory.}
\label{fig:fig_dsp_comm.jpg}
\end{figure}
\subsection{Implementation of the ANR algorithm on the DSP}
\subsubsection{High-level description of the ANR algorithm implementation}
In contrast to the high-level simulation environment written in Python in the previous chapter, the implementation of the \ac{ANR} algorithm on the \ac{DSP} requires a low-level programming approach that takes into account the specific architecture and capabilities of the processor and its environment. This includes considerations such as memory management, data types, and optimization techniques specific to the \ac{DSP} architecture. The implementation has to be written in C, the standard programming language for embedded systems, as it allows low-level access to the hardware.\\ \\
The implementation of the \ac{ANR} algorithm on the \ac{DSP} follows the same overall structure as the high-level variant, but the focus now lies on memory management, interrupt handling and the communication between the two cores. The \ac{ARM} core operates in a continuous loop, structured into several states:
\begin{itemize}
\item \textbf{Idle:} The \ac{ARM} core waits for an interrupt from the \ac{DMA} controller, indicating that new audio samples are available in the input buffer.
\item \textbf{Work:} After receiving the interrupt, the \ac{ARM} core triggers an interrupt on the \ac{DSP} core to start processing the audio samples.
\item \textbf{Wait:} The \ac{ARM} core waits for an interrupt from the \ac{DSP} core, indicating that the processing is complete.
\item \textbf{Done/Idle:} Once the processing is complete, the \ac{ARM} core triggers the \ac{DMA} controller to transfer the processed audio samples from the output buffer back to the \ac{PCM} interface for playback. The \ac{ARM} core then returns to the idle state, waiting for the next batch of audio samples.
\end{itemize}
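\noindent The state sequence of the \ac{ARM} core can be sketched as a small state machine in C. The names \texttt{arm\_state\_t}, \texttt{arm\_step} and the event flags are hypothetical; the register accesses for triggering the \ac{DSP} interrupt and starting the \ac{DMA} transfer are hardware-specific and therefore only indicated by comments.
\begin{lstlisting}[language=C]
/* States of the ARM control loop as listed above. */
typedef enum { ARM_IDLE, ARM_WORK, ARM_WAIT, ARM_DONE } arm_state_t;

/* Hypothetical event flags, set by interrupt service routines. */
typedef struct {
    volatile int dma_input_ready; /* DMA: new input samples available */
    volatile int dsp_done;        /* DSP: processing complete */
} arm_events_t;

/* One step of the ARM control loop; returns the next state. */
static arm_state_t arm_step(arm_state_t s, arm_events_t *ev)
{
    switch (s) {
    case ARM_IDLE:
        if (ev->dma_input_ready) { ev->dma_input_ready = 0; return ARM_WORK; }
        return ARM_IDLE;
    case ARM_WORK:
        /* trigger the interrupt on the DSP core (hardware-specific) */
        return ARM_WAIT;
    case ARM_WAIT:
        if (ev->dsp_done) { ev->dsp_done = 0; return ARM_DONE; }
        return ARM_WAIT;
    case ARM_DONE:
        /* start the DMA transfer of the output buffer to the PCM interface */
        return ARM_IDLE;
    }
    return ARM_IDLE; /* not reached for valid states */
}
\end{lstlisting}
In the firmware, \texttt{arm\_step} would be called repeatedly from the continuous loop, while the event flags are set by the interrupt service routines of the \ac{DMA} controller and the \ac{DSP}.\\ \\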
In contrast, the \ac{DSP} core operates in an interrupt-driven manner:
\begin{itemize}
\item \textbf{Idle:} The \ac{DSP} core remains in a halted state, waiting for an interrupt from the \ac{ARM} core to start processing.
\item \textbf{Work:} Upon receiving the interrupt, the \ac{DSP} core reads the audio samples from the input buffer located in the shared memory, processes them using the implemented \ac{ANR} algorithm, and writes the processed samples back to the output buffer in the shared memory.
\item \textbf{Done/Idle:} After completing the processing, the \ac{DSP} core triggers an interrupt on the \ac{ARM} core to notify that the processing is complete. The \ac{DSP} core then returns to the idle state, waiting for the next processing request.
\end{itemize}
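\noindent The interrupt-driven behavior of the \ac{DSP} core can be sketched in a similar way. In this hypothetical example, the interrupt service routine \texttt{dsp\_isr} only sets a flag and the processing takes place outside of the interrupt context; \texttt{anr\_process} is a placeholder that merely copies the input to the output, and the completion interrupt towards the \ac{ARM} core is simulated by a counter.
\begin{lstlisting}[language=C]
#include <stdint.h>

#define BLOCK_LEN 4 /* hypothetical block length in samples */

/* Simulated input/output buffers in the shared memory. */
static int32_t in_buf[BLOCK_LEN];
static int32_t out_buf[BLOCK_LEN];

static volatile int start_request = 0; /* set by the ISR, cleared when served */
static int done_irqs = 0;              /* simulated interrupts towards the ARM */

/* ISR: executed when the ARM core raises its interrupt (Idle -> Work). */
static void dsp_isr(void) { start_request = 1; }

/* Placeholder for the ANR algorithm: copies input to output. */
static void anr_process(const int32_t *in, int32_t *out, int n)
{
    for (int i = 0; i < n; i++) out[i] = in[i];
}

/* One pass of the DSP main loop. On the target, the core would halt
 * while idle instead of returning. */
static void dsp_loop_once(void)
{
    if (!start_request) return;              /* Idle: nothing to do */
    start_request = 0;
    anr_process(in_buf, out_buf, BLOCK_LEN); /* Work */
    done_irqs++;                             /* Done: notify the ARM core */
}
\end{lstlisting}
Keeping the interrupt service routine short is a common design choice, as it keeps the interrupt latency low; performing the processing directly inside the routine would also be possible.\\ \\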
\noindent The \ac{DMA} controller plays a crucial role in this architecture by handling the data transfers between the \ac{PCM} interface and the shared memory buffers. It operates independently of both the \ac{ARM} core and the \ac{DSP} core, allowing for efficient data movement without burdening the processing units. The \ac{DMA} controller is configured to transfer audio samples in blocks, triggering interrupts to notify the respective cores when new data is available or when processing is complete.
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{Bilder/fig_dsp_dma.jpg}
\caption{Visualization of the \ac{DMA} operations between the \ac{PCM} interface and the shared memory section. When the memory buffer is half full, an interrupt is triggered, either to the \ac{DSP} core or to the \ac{ARM} core, depending on the input or output direction.}
\label{fig:fig_dsp_dma.jpg}
\end{figure}
\noindent Figure \ref{fig:fig_dsp_dma.jpg} visualizes the concrete operation of the \ac{DMA} controller during the audio sample processing. The \ac{DMA} controller is configured to transfer the audio samples one at a time from the \ac{PCM} interface into a defined location of the shared memory (input buffer). When the input buffer is half full, an interrupt is triggered to the \ac{DSP} core, notifying it to start processing the available samples. After processing, the results are written to another designated section of the memory (output buffer). When the output buffer is half full, another interrupt is triggered to the \ac{DMA} controller, indicating that the processed samples are ready to be transferred back to the \ac{PCM} interface for playback. \\ \\
As the operation of the \ac{ARM} core is not the main focus of this thesis, the following subsections focus on the implementation of the \ac{ANR} algorithm on the \ac{DSP} core itself. Since the behavior of the \ac{ARM} core has already been sufficiently described, further implementation details on it are omitted.
\subsubsection{Code implementation of the ANR algorithm}
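Assuming that the half-full interrupts are used for double buffering (ping-pong buffering), the \ac{DSP} can process one half of the buffer while the \ac{DMA} controller fills the other half, which implies an event both at the half-full and at the full mark. The following C simulation of the input direction illustrates when these events occur; the buffer length \texttt{BUF\_LEN} and the function \texttt{dma\_write\_sample} are hypothetical.
\begin{lstlisting}[language=C]
#include <stdint.h>

#define BUF_LEN 8             /* hypothetical buffer length in samples */
#define HALF (BUF_LEN / 2)

static int32_t dma_buf[BUF_LEN];
static int write_idx = 0;     /* position of the next DMA write */

/* Simulates one sample-wise DMA transfer from the PCM interface.
 * Returns the start index of the half that has just become full
 * (0 or HALF), or -1 if no interrupt would be raised. */
static int dma_write_sample(int32_t sample)
{
    dma_buf[write_idx] = sample;
    write_idx = (write_idx + 1) % BUF_LEN;
    if (write_idx == HALF) return 0;    /* first half full: interrupt */
    if (write_idx == 0)    return HALF; /* second half full: interrupt */
    return -1;
}
\end{lstlisting}
The returned start index tells the \ac{DSP} which half of the buffer it may safely read, while the \ac{DMA} controller continues writing into the other half.\\ \\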
\begin{itemize}
\item Definition of memory regions and constants
\item Main loop
\item Initialization of the signals
\item Calc function
\end{itemize}
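\par\noindent The structure outlined above (memory regions and constants, initialization, main loop and calc function) can be sketched in C as follows. All names and values are hypothetical, and the calc function uses a standard least-mean-squares (LMS) adaptive filter purely as a placeholder; the actual \ac{ANR} algorithm from the previous chapter may differ. Floating-point arithmetic is used for readability.
\begin{lstlisting}[language=C]
#include <stddef.h>

/* Memory regions and constants (hypothetical values) */
#define BLOCK_LEN 4u    /* samples per processing block */
#define N_TAPS    2u    /* adaptive filter length */
#define MU        0.1f  /* LMS step size */

static float in_ref[BLOCK_LEN];  /* reference (noise) input */
static float in_pri[BLOCK_LEN];  /* primary input (signal + noise) */
static float out_buf[BLOCK_LEN]; /* output: error signal e[n] */
static float w[N_TAPS];          /* adaptive filter coefficients */
static float x_hist[N_TAPS];     /* delay line of the reference signal */

/* Initialization of the signals */
static void anr_init(void)
{
    for (size_t k = 0; k < N_TAPS; k++) { w[k] = 0.0f; x_hist[k] = 0.0f; }
}

/* Calc function: one LMS iteration per sample (placeholder algorithm) */
static void anr_calc(void)
{
    for (size_t n = 0; n < BLOCK_LEN; n++) {
        for (size_t k = N_TAPS - 1u; k > 0; k--) x_hist[k] = x_hist[k - 1u];
        x_hist[0] = in_ref[n];
        float y = 0.0f;                          /* noise estimate */
        for (size_t k = 0; k < N_TAPS; k++) y += w[k] * x_hist[k];
        float e = in_pri[n] - y;                 /* noise-reduced sample */
        for (size_t k = 0; k < N_TAPS; k++) w[k] += MU * e * x_hist[k];
        out_buf[n] = e;
    }
}
\end{lstlisting}
The main loop itself is omitted: it would call \texttt{anr\_init} once at start-up and \texttt{anr\_calc} once per processing request from the \ac{ARM} core.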
\subsection{First optimization approach: algorithm implementation}
\subsection{Second optimization approach: hybrid ANR algorithm}