Submission
+10
-10
@@ -19,14 +19,14 @@ In order to ensure a smooth, but power-efficient, operation together with the \a
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{Bilder/fig_dsp_setup.jpg}
-\caption{Simplified visualization of the interaction between the \ac{CI}-System, the \ac{ARM} core and the \ac{DSP} core, making use of the \ac{PCM} interface and shared memory for audio data exchange.}
+\caption{Simplified visualization of the interaction between the \ac{CI}-System, the \ac{ARM} core and the \ac{DSP} core, making use of the \ac{PCM} interface and shared memory for audio data exchange}
\label{fig:fig_dsp_setup.jpg}
\end{figure}
\noindent The \ac{ARM} core receives the 16-bit audio data (the corrupted signal and the reference noise signal on two separate channels) from the CI system via a \ac{PCM} interface, which offers one 32-bit input and one 32-bit output register. An interrupt triggers the integrated \ac{DMA} controller when the input register is occupied; the controller then transfers the audio data from the \ac{PCM} interface to the input buffer at a predefined memory location (again as two 16-bit samples). Once the transfer is complete, the \ac{DSP} core is requested to start processing the audio data. The \ac{DSP} core then reads the audio samples from the shared memory, processes them using the implemented \ac{ANR} algorithm, and writes the 16-bit processed sample back to an output buffer, also located in the shared memory. Finally, the \ac{DSP} core notifies the \ac{ARM} core via an interrupt that the processing is complete, and the \ac{DMA} controller transfers the processed audio samples from the output buffer back to the \ac{PCM} interface for playback (refer to Figure \ref{fig:fig_dsp_comm.jpg}).\\ \\
\begin{figure}[H]
\centering
\includegraphics[width=0.9\linewidth]{Bilder/fig_dsp_comm.jpg}
-\caption{Simplified flowchart of the sample processing between the \ac{ARM} core and the \ac{DSP} core via interrupts and shared memory.}
+\caption{Simplified flowchart of the sample processing between the \ac{ARM} core and the \ac{DSP} core via interrupts and shared memory}
\label{fig:fig_dsp_comm.jpg}
\end{figure}
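\noindent As an illustration of the register handling described above, the following C sketch splits one 32-bit \ac{PCM} input word into the two 16-bit samples. The type and function names are hypothetical, and the assumption that the corrupted signal occupies the upper half-word and the reference noise signal the lower half-word is not taken from the actual implementation.

```c
#include <stdint.h>

/* Hypothetical container for one pair of 16-bit input samples. */
typedef struct {
    int16_t corrupted;  /* corrupted signal (assumed: upper half-word) */
    int16_t reference;  /* reference noise  (assumed: lower half-word) */
} sample_pair_t;

/* Split one 32-bit PCM register word into two signed 16-bit samples.
 * The channel-to-half-word mapping is an assumption for illustration. */
static sample_pair_t unpack_pcm_word(uint32_t word) {
    sample_pair_t pair;
    pair.corrupted = (int16_t)(uint16_t)(word >> 16);
    pair.reference = (int16_t)(uint16_t)(word & 0xFFFFu);
    return pair;
}
```

In the described setup this split is performed by the \ac{DMA} transfer itself, so the sketch only documents the resulting data layout.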
@@ -127,7 +127,7 @@ int main(void) {
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{Bilder/fig_dsp_logic.jpg}
-\caption{Flow diagram of the code implementation of the main loop and interrupt handling on the \ac{DSP} core.}
+\caption{Flow diagram of the code implementation of the main loop and interrupt handling on the \ac{DSP} core}
\label{fig:fig_dsp_logic.jpg}
\end{figure}
\paragraph{calculate\_output()-function}
@@ -166,7 +166,7 @@ int* cyclic array iteration(int *pointer, int increment, int *pointer_start, int
return new_pointer;
}
\end{lstlisting}
-\caption{Manual implementation of a cyclic array iteration function in C, taking the core 20 cycles to execute a pointer inremen of 1. The intrinsic functions of the DSP compiler allows a single-cycle implementation of such cyclic additions.}
+\caption{Manual implementation of a cyclic array iteration function in C, taking the core 20 cycles to execute a pointer increment of 1. The intrinsic functions of the DSP compiler allow a single-cycle implementation of such cyclic additions.}
\label{lst:lst_dsp_code_cyclic_add}
\end{listing}
\noindent Listing \ref{lst:lst_dsp_code_cyclic_add} shows a manual implementation of such a cyclic array iteration function in C, which updates the pointer to a new address. This implementation takes the \ac{DSP} 20 cycles to execute, while the already implemented compiler-optimized version only takes one cycle, making use of the specific architecture of the \ac{DSP} allowing such a single-cycle operation.
@@ -187,7 +187,7 @@ The $calculate\_output()$ functions consists out of the following five main part
\end{itemize}
These sub-functions feature \ac{DSP}-specific optimizations, and their computational cost partly depends on settable parameters such as the filter length. The following paragraphs analyze the computational efficiency of these sub-functions in detail.
\paragraph{write\_buffer()} The $write\_buffer()$-function is responsible for managing the Sample Line, where the samples of the reference noise signal are stored for further processing. The buffer management mainly consists of a cyclic pointer increment and a pointer dereference that writes the new sample into the buffer. The cyclic pointer increment is implemented using the already mentioned intrinsic function of the \ac{DSP} compiler and takes one cycle, while the pointer dereference takes 15 cycles to execute. This results in a total duration of 16 cycles for the $write\_buffer()$-function, independent of the filter length or other parameters.
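\noindent The buffer handling described above can be sketched in portable C as follows. This is a minimal illustration and not the actual \ac{DSP} code: the names, the buffer length of 8 and the 16-bit sample type are assumptions, and the cyclic increment is written out manually instead of using the compiler intrinsic.

```c
#include <assert.h>
#include <stdint.h>

#define SAMPLE_LINE_LEN 8  /* hypothetical buffer length */

/* Hypothetical Sample Line: a circular buffer with a write pointer. */
typedef struct {
    int16_t data[SAMPLE_LINE_LEN];
    int16_t *write_ptr;
} sample_line_t;

static void sample_line_init(sample_line_t *line) {
    for (int i = 0; i < SAMPLE_LINE_LEN; i++) {
        line->data[i] = 0;
    }
    line->write_ptr = line->data;
}

/* Advance a pointer by one element and wrap it at the end of the buffer.
 * On the DSP this step is assumed to be replaced by the intrinsic. */
static int16_t *cyclic_increment(int16_t *ptr, int16_t *start, int length) {
    assert(length > 0 && ptr >= start && ptr < start + length);
    ptr++;
    if (ptr == start + length) {
        ptr = start;
    }
    return ptr;
}

/* Cyclic pointer increment followed by the write of the new sample. */
static void write_buffer_sketch(sample_line_t *line, int16_t sample) {
    line->write_ptr = cyclic_increment(line->write_ptr, line->data,
                                       SAMPLE_LINE_LEN);
    *line->write_ptr = sample;
}
```

Because the sketch performs one increment and one write regardless of the buffer contents, its cost is constant, which mirrors the fixed cycle count stated for $write\_buffer()$.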
-\paragraph{apply\_fir\_filter()} The $apply\_fir\_filter()$-function is responsible for applying the coefficients of the \ac{FIR} filter on the reference noise signal samples stored in the Sample Line. The needed cycles for this function are mainly depenendent on the length of the filter, as the number of multiplications and additions increase with the filter length. To increase the performance, the dual \ac{MAC} architecture of the \ac{DSP} is utilized, allowing two multiplications and two additions to be performed in a single cycle. Another \ac{DSP}-specific optimization is the use of the already introduced 72-bit accumulators and the fractional multiplication function, which allow performing multiplications on two 32-bit integers without losing precision or the need for manual bit-shifting operations.
+\paragraph{apply\_fir\_filter()} The $apply\_fir\_filter()$-function is responsible for applying the coefficients stored in the Filter Line to the reference noise signal samples stored in the Sample Line. The number of cycles needed for this function mainly depends on the length of the filter, as the number of multiplications and additions increases with the filter length. To increase the performance, the dual \ac{MAC} architecture of the \ac{DSP} is utilized, allowing two multiplications and two additions to be performed in a single cycle. Another \ac{DSP}-specific optimization is the use of the already introduced 72-bit accumulators and the fractional multiplication function, which allow multiplying two 32-bit integers without losing precision and without manual bit-shifting operations.
\begin{listing}[H]
\centering
\begin{lstlisting}[style=cstyle]
@@ -241,10 +241,10 @@ for (int i=0; i< n_coeff; i+=2) chess_loop_range(1,){
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{Bilder/fig_dsp_coefficient_cycle.jpg}
-\caption{Visualization of the coefficient calculation in the $update\_filter\_coefficient()$-function during the 2nd cyclce of a calculation loop. The output is multiplied with the step size and the corresponding sample from the Sample Line, before being added to the current filter coefficient.}
+\caption{Visualization of the coefficient calculation in the $update\_filter\_coefficient()$-function during the 2nd cycle of a calculation loop. The output is multiplied with the step size and the corresponding sample from the Sample Line, before being added to the current filter coefficient in the Filter Line.}
\label{fig:fig_dsp_coefficient_cycle.jpg}
\end{figure}
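\noindent The coefficient update visualized above can be written as a short reference routine. The following C sketch is a hedged illustration only: it uses double-precision floating point instead of the fixed-point arithmetic of the \ac{DSP}, it indexes the Sample Line linearly instead of cyclically, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Floating-point reference of the coefficient update:
 * coeff[i] += step_size * output * samples[i]
 * The actual DSP code is assumed to use fractional fixed-point
 * arithmetic and cyclic addressing instead. */
static void update_coefficients_sketch(double *coeff, const double *samples,
                                       size_t n_coeff, double output,
                                       double step_size) {
    assert(coeff != NULL && samples != NULL);
    /* The product of output and step size is identical for all taps. */
    const double scaled_output = step_size * output;
    for (size_t i = 0; i < n_coeff; i++) {
        coeff[i] += scaled_output * samples[i];
    }
}
```

Computing the product of output and step size once outside the loop leaves one multiplication and one addition per coefficient, which is the kind of operation a \ac{MAC} unit executes.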
-\paragraph{update\_output()} The $update\_output()$-function is responsible for writing the calculated output sample back into the shared memory section. The operation takes 5 cycles to execute, independent of the filter length or other parameters.
+\paragraph{update\_output()} The $update\_output()$-function is responsible for writing the calculated output sample back into the shared memory section. The operation takes 5 cycles to execute, independent of the filter length or other parameters.\\ \\
\noindent The total computing effort of the $calculate\_output()$-function as a function of the filter length $\text{N}$ can now be calculated by summing up the computing efforts of the individual sub-functions:
\begin{equation}
\label{equation_computing}
@@ -275,7 +275,7 @@ Equation \ref{equation_computing_final} now provides an estimation of the necess
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{Bilder/fig_c_total.png}
-\caption{Dependence of the total computing effort on the filter length $\text{N}$ and update rate $\text{1/U}$.}
+\caption{Dependence of the total computing effort on the filter length $\text{N}$ and update rate $\text{1/U}$}
\label{fig:fig_c_total.png}
\end{figure}
\subsection{Verification of the DSP implementation}
@@ -295,13 +295,13 @@ To verify the general performance of the \ac{DSP}-implemented \ac{ANR} algorithm
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{Bilder/fig_high_low_comparison.png}
-\caption{Comparison of the high- and low-level simulation output.}
+\caption{Comparison of the high- and low-level simulation output}
\label{fig:fig_high_low_comparison.png}
\end{figure}
\begin{figure}[H]
\centering
\includegraphics[width=1.0\linewidth]{Bilder/fig_high_low_comparison_hist.png}
-\caption{Histogram of the error amplitude between the high- and low-level simulation output.}
+\caption{Histogram of the error amplitude between the high- and low-level simulation output}
\label{fig:fig_high_low_comparison_hist.png}
\end{figure}
\noindent Figures \ref{fig:fig_plot_1_dsp_complex.png} and \ref{fig:fig_plot_2_dsp_complex.png} show the results of the complex \ac{ANR} use case, simulated on the \ac{DSP}. With its \ac{SNR} gain of 10.26 dB, it performs as successfully as the high-level implementation. Figure \ref{fig:fig_high_low_comparison.png} shows both outputs separately and then together in one subfigure, along with the plotted error amplitude. Lastly, Figure \ref{fig:fig_high_low_comparison_hist.png} features a histogram of the error amplitude between the high- and low-level implementation, indicating the correct functionality of the \ac{DSP} implementation. The small deviations can be explained by the fact that the \ac{DSP} implementation is based on fixed-point arithmetic, which leads to a slightly different convergence behavior. Nevertheless, the results show that the \ac{DSP} implementation of the \ac{ANR} algorithm achieves the same performance as the high-level implementation. The next step is to evaluate the performance of the \ac{DSP} implementation in terms of computational efficiency under different scenarios and with non-synchronous signals.
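\noindent A comparison of this kind can be reduced to a single figure of merit, for example the largest absolute deviation between the two outputs. The following C sketch shows one possible way to compute it; the function name and the assumption that both outputs are available as 16-bit sample arrays of equal length are hypothetical and not taken from the evaluation scripts used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest absolute sample-wise difference between two output signals.
 * The difference is formed in 32 bits so that it cannot overflow. */
static int32_t max_abs_error(const int16_t *high_level,
                             const int16_t *low_level, size_t n) {
    assert(high_level != NULL && low_level != NULL);
    int32_t max_err = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t err = (int32_t)high_level[i] - (int32_t)low_level[i];
        if (err < 0) {
            err = -err;
        }
        if (err > max_err) {
            max_err = err;
        }
    }
    return max_err;
}
```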