Hardware Implementation of Task-based Quantization in Multi-user Signal Recovery

Xing Zhang, Member, IEEE, Haiyang Zhang, Member, IEEE, Nimrod Glazer, Member, IEEE, Oded Cohen, Eliya Reznitskiy, Shlomi Savariego, Moshe Namer, and Yonina C. Eldar, Fellow, IEEE

Abstract—Quantization plays a critical role in digital signal processing systems, allowing the representation of continuous-amplitude signals with a finite number of bits. However, accurately representing signals requires a large number of quantization bits, which causes severe cost, power consumption, and memory burden. A promising way to address this issue is task-based quantization. By exploiting the task information for the overall system design, task-based quantization can achieve satisfying performance with low quantization costs. In this work, we apply task-based quantization to multi-user signal recovery and present a hardware prototype implementation. The prototype consists of a tailored configurable combining board, and a software-based processing and demonstration system. Through experiments, we verify that with proper design, the task-based quantization achieves a reduction of 25 fold in memory by reducing from 16 receivers with 16 bits each to 2 receivers with 5 bits each, without compromising signal recovery performance.

Index terms— Task-based quantization, multi-user signal recovery, analog combiner, hardware implementation.

I. INTRODUCTION

Processing and storing information that originates as analog signals involves converting this information to bits by analog-to-digital converters (ADCs) [1]. In conventional receivers, the ADC is employed as a separate unit regardless of other parts of the system. ADCs typically sample at the Nyquist rate of the received signal and use high-resolution quantizers, so that sampling and quantization errors can be minimized. However, since the power consumption of ADCs and the required storage memory grow with the sampling rate and quantization resolution, conventional ADCs pose great challenges to practical applications with high data rates. For example, in future 6G wireless communication systems, where hundreds or even thousands of antennas and millimeter-wave (mmWave) or sub-terahertz (THz) signaling are employed, it is expected that up to 1 Tb/s data rate will be achieved [2], [3]. In such cases, the hardware implementation of high-resolution ADCs becomes a bottleneck. Therefore, more efficient sampling and quantization schemes are necessary.

Two prominent research directions to alleviate the power burden of ADCs are sub-Nyquist sampling and low-resolution quantization. Sub-Nyquist sampling aims to reduce the sampling rate by exploiting the underlying structure of the signal [1], [4]. For instance, the structure of finite rate of innovation (FRI) signals has been exploited to reduce the sampling rate of received signals in ultrasound [5], radar [6], and cognitive radio [7]. However, the sub-Nyquist sampling framework does not take into account the effect of quantization. As the power consumption of an ADC grows exponentially with the number of quantization bits, low-resolution quantization has attracted great interest in recent years. Low-resolution quantization uses only a few bits, or even a single bit, to discretize the signal amplitude. It has been applied in areas such as massive multiple-input multiple-output (MIMO) communications [8], [9], radar [10], direction of arrival estimation [11], and spectrum sensing [12]. Compared with conventional high-resolution quantizers, low-bit quantizers offer a significant rate reduction. However, the distortion induced by quantization must be compensated for in subsequent digital processing, which complicates information extraction in the digital domain and degrades overall system performance.

To address these issues, the authors of [13] proposed task-based quantization. By taking into account the underlying task in the system design, task-based quantization dramatically reduces the number of bits while allowing for accurate signal recovery. This is achieved by introducing an analog combiner, followed by joint optimization of the analog and digital processing and the bridge between them, i.e., the ADC. Task-based quantization has been applied to graph signal processing [14], channel estimation in massive MIMO communications [15], and target identification in radar [16]. Theoretical maturity of the concept suggests the need to demonstrate and evaluate the implementation of such systems in hardware, which is the focus of this work.

Here, we apply task-based quantization to multi-user signal recovery and present a prototype, which consists of a hardware board and a software-aided demonstration system. In the considered setting, the system task is to recover the multi-user transmitted signals, rather than the received signals on all antennas. Therefore, following the principle of task-based quantization, a tailored analog combiner board was built to pre-process the received signals prior to quantization. The outputs are then quantized by scalar quantizers with a limited number of bits. Finally, the task vector is recovered by an optimized digital matrix. To visually demonstrate the above process, a MATLAB-based graphical user interface (GUI) was developed, which covers parameter control, data processing, and result display. Experimental results illustrate the superiority of task-based quantization over the conventional task-ignorant approach, narrowing the gap between the theory and its practical application.

The rest of this paper is organized as follows: Section II formulates the problem of task-based quantization for multi-user signal recovery and provides the theoretical results. Next, in Section III, the system architecture and each component of the hardware prototype are introduced in detail. Experimental results are provided in Section IV, followed by conclusions in Section V.

Notation: Scalar quantities, column vectors and matrices are denoted by lowercase letters, \( a \), bold lowercase letters, \( \mathbf{a} \), and bold uppercase letters, \( \mathbf{A} \), respectively. The superscripts \( (\cdot)^T \), \( (\cdot)^{-1} \) and \( (\cdot)^H \) are, respectively, the transpose, inverse and Hermitian transpose operators. The symbol \( E[\cdot] \) represents statistical expectation, \( ||\cdot|| \) is the Euclidean norm, \( \mathbb{C} \) is the set of complex numbers, and \( \mathbf{I}_K \) is the \( K \times K \) identity matrix. We use \( a^+ \) to denote \( \max(a, 0) \), and \( \lfloor \cdot \rfloor \) to denote rounding down to the next smaller integer.

II. TASK-BASED QUANTIZATION FOR MULTI-USER SIGNAL RECOVERY

In this section, we provide a mathematical description of multi-user signal recovery under task-based quantization. In particular, we begin by introducing the system model of task-based quantization for multi-user signal recovery in Subsection II-A, followed by theoretical results in Subsection II-B.

A. System Model

Consider a single-cell network in which a base station (BS) is equipped with \( N \) antennas and serves \( K \) single-antenna user terminals (UTs), as shown in Fig. 1. In the uplink, let the \( K \times 1 \) vector \( \mathbf{s} \) collect the signals transmitted by all the UTs in the cell at one time instant. The received \( N \times 1 \) signal vector \( \mathbf{y} \) at the BS can be expressed as

\[
\mathbf{y} = \mathbf{Hs} + \mathbf{v}, \tag{1}
\]

where \( \mathbf{v} \) represents additive white Gaussian noise (AWGN). The \( N \times K \) matrix \( \mathbf{H} \) denotes the wireless channel, with its \( k \)-th column representing the channel between user \( k \) and the antenna array, given by [17], [18]

\[
\mathbf{h}_k = g_k e^{-j2\pi \frac{r_k f_c}{c}} \mathbf{a} (\theta_k). \tag{2}
\]

Here \( g_k \) denotes the path gain, where without loss of generality, we assume only the line of sight (LoS) path exists, \( c \) is the speed of light, \( f_c \) is the carrier frequency, \( r_k \) and \( \theta_k \) are respectively the distance and angle of arrival of the \( k \)-th user. The vector \( \mathbf{a} (\theta_k) \) is the steering vector, given by

\[
\mathbf{a} (\theta_k) = \left[ 1, e^{j\pi \sin \theta_k}, \ldots, e^{j\pi (N-1) \sin \theta_k} \right]^T. \tag{3}
\]

We assume the channel is quasi-static over the signal transmission time, so that \( \mathbf{H} \) can be estimated by pilots and is assumed to be known for the task of recovering \( \mathbf{s} \).
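To make the signal model concrete, the following Python sketch builds the LoS channel columns and one noisy received vector using only the standard library. It is an illustration under stated assumptions, not the prototype's MATLAB code: the propagation phase is taken as \( 2\pi r_k / \lambda \) with wavelength \( \lambda = c / f_c \), the transmitted symbols are drawn as real Gaussian values, and the gains, distances, angles, and noise level are hypothetical.

```python
import cmath
import math
import random


def steering_vector(theta, n_antennas):
    """Steering vector a(theta) of a half-wavelength-spaced uniform linear array."""
    return [cmath.exp(1j * math.pi * n * math.sin(theta)) for n in range(n_antennas)]


def channel_column(gain, distance, theta, n_antennas, fc=2.3e9, c=3.0e8):
    """LoS channel h_k: path gain times propagation phase times steering vector.

    Assumption: the phase is 2*pi*distance/wavelength with wavelength = c / fc.
    """
    phase = cmath.exp(-2j * math.pi * distance * fc / c)
    return [gain * phase * a for a in steering_vector(theta, n_antennas)]


def received_signal(h_cols, s, noise_std, rng):
    """y = H s + v, with H stored as a list of K columns and complex Gaussian v."""
    n_antennas = len(h_cols[0])
    sigma = noise_std / math.sqrt(2)  # split the noise power between I and Q
    y = []
    for n in range(n_antennas):
        v = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        y.append(sum(h[n] * s_k for h, s_k in zip(h_cols, s)) + v)
    return y


rng = random.Random(0)
N, K = 16, 2
# Hypothetical user geometry: (gain, distance in metres, angle of arrival).
H_cols = [channel_column(1.0, 50.0, math.radians(20), N),
          channel_column(0.8, 80.0, math.radians(-35), N)]
s = [complex(rng.gauss(0, 1), 0) for _ in range(K)]
y = received_signal(H_cols, s, noise_std=0.1, rng=rng)
```

Every entry of a channel column has magnitude equal to the path gain, so the users differ only in phase progression across the array, which is what the analog combiner later exploits.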

In conventional quantization systems, ADCs are only used to discretize the received signals. The task of recovering \( \mathbf{s} \) is performed separately in the digital domain. By contrast, task-based quantization proposed in [13] jointly designs the overall analog and digital system to estimate \( \mathbf{s} \). Specifically, the received signal \( \mathbf{y} \) is first projected to a \( K \times 1 \) vector \( \mathbf{z} \) by using an analog combiner \( \mathbf{A} \), i.e.,

\[
\mathbf{z} = \mathbf{Ay} = \mathbf{AHs} + \mathbf{Av}. \tag{4}
\]

Then, each entry of \( \mathbf{z} \) is sampled and quantized using scalar quantizers with dynamic range \( \gamma \) and resolution \( M_K \). The symbol \( M \) is the overall number of quantization levels, which represents the memory requirement of the system and is also directly related to the ADC power consumption. When the input is inside the dynamic range of the quantizer, the output can be written as the sum of the input and an additive zero-mean white noise signal according to the theory of dithered quantization [19], that is,

\[
\hat{\mathbf{z}} = \mathbf{AHs} + \mathbf{Av} + \mathbf{e}, \tag{5}
\]

where \( \mathbf{e} \) is the quantization noise with covariance \( \Delta^2 \mathbf{I}_K \). The symbol \( \Delta \) denotes the quantization spacing, defined as \( \Delta = \frac{2\gamma}{M_K} \), consistent with the quantizer in (13). When \( M_K \) is given, the value of \( \gamma \) determines the quantization spacing and, therefore, the variance of the quantization noise.

In the digital domain, the estimation of \( \mathbf{s} \), denoted as \( \hat{\mathbf{s}} \), is obtained as the output of the digital processing module \( \mathbf{B} \), yielding

\[
\hat{\mathbf{s}} = \mathbf{B}\hat{\mathbf{z}}. \tag{6}
\]

The problem now is to jointly design the analog combiner \( \mathbf{A} \), the dynamic range \( \gamma \), and the digital processing matrix \( \mathbf{B} \), so that the mean square error (MSE) of the task estimate can be minimized. Mathematically, we have the following objective

\[
\min_{\mathbf{A}, \gamma, \mathbf{B}} E \left[ ||\mathbf{s} - \hat{\mathbf{s}}||^2 \right]. \tag{7}
\]

B. Theoretical Results

According to the orthogonality principle, the MSE in (7), \( E[||\mathbf{s} - \hat{\mathbf{s}}||^2] \), can be re-expressed as

\[
E[||\mathbf{s} - \hat{\mathbf{s}}||^2] = E[||\mathbf{s} - \tilde{\mathbf{s}}||^2] + E[||\tilde{\mathbf{s}} - \hat{\mathbf{s}}||^2], \tag{8}
\]

where \( \tilde{\mathbf{s}} \) is the linear minimum mean square error (LMMSE) estimate of \( \mathbf{s} \) from \( \mathbf{y} \), that is, \( \tilde{\mathbf{s}} = \Gamma \mathbf{y} \), with \( \Gamma \) denoting the LMMSE estimation matrix. Note that the first term in the above equation does not depend on \( \hat{\mathbf{s}} \), and hence not on the design variables \( \mathbf{A} \), \( \gamma \), and \( \mathbf{B} \). The optimization problem in (7) can thus be equivalently replaced by

\[
\min_{\mathbf{A}, \gamma, \mathbf{B}} E \left[ ||\tilde{\mathbf{s}} - \hat{\mathbf{s}}||^2 \right], \tag{9}
\]

which has the same form as the problem studied in [13]. Therefore, in the following, we directly state the resulting solution and omit the proof.
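For reference, the LMMSE matrix has a familiar closed form under two additional assumptions that are made here only for illustration: \( \mathbf{s} \) is zero-mean with covariance \( \mathbf{I}_K \), and \( \mathbf{v} \) is zero-mean with covariance \( \sigma_v^2 \mathbf{I}_N \) and uncorrelated with \( \mathbf{s} \) (the symbol \( \sigma_v^2 \) is introduced for this example). With \( \mathbf{y} = \mathbf{Hs} + \mathbf{v} \), the cross-covariance is \( E[\mathbf{s}\mathbf{y}^H] = \mathbf{H}^H \) and the covariance of \( \mathbf{y} \) is \( \mathbf{H}\mathbf{H}^H + \sigma_v^2 \mathbf{I}_N \), so that

\[
\Gamma = E[\mathbf{s}\mathbf{y}^H] \left( E[\mathbf{y}\mathbf{y}^H] \right)^{-1} = \mathbf{H}^H \left( \mathbf{H}\mathbf{H}^H + \sigma_v^2 \mathbf{I}_N \right)^{-1}.
\]

In this case \( \Gamma \) is a \( K \times N \) matrix that depends only on the known channel and the noise level.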

Let \( \Sigma_{\mathbf{y}} \) be the covariance matrix of the received signal \( \mathbf{y} \), and let \( w_l \), \( l = 1, \ldots, K \), be the dither signal added to the input of the \( l \)-th quantizer. Let \( \mathbf{A}^\circ \) and \( \mathbf{B}^\circ \) be the optimal analog and digital processing matrices that achieve the minimal MSE distortion. Then we have the following results [13]:

Theorem 1: For any analog combining matrix \( \mathbf{A} \) and dynamic range \( \gamma \) such that \( \Pr(|(\mathbf{Ay})_l + w_l| > \gamma) = 0 \), namely, the quantizers operate within their dynamic range with probability one, the digital processing matrix which minimizes the MSE is given by

\[
\mathbf{B}^\circ(\mathbf{A}) = \Gamma \Sigma_{\mathbf{y}} \mathbf{A}^H \left( \mathbf{A} \Sigma_{\mathbf{y}} \mathbf{A}^H + \frac{2\gamma^2}{M_K^2} \mathbf{I}_K \right)^{-1}. \tag{10}
\]

Theorem 2: For the hardware-limited quantization system based on the model depicted in Fig. 1, the optimal analog combining matrix is given by \( \mathbf{A}^\circ = \mathbf{U}_A \Lambda_A \mathbf{V}_A^H \Sigma_{\mathbf{y}}^{-1/2} \), where

1. \( \mathbf{V}_A \in \mathbb{C}^{N \times N} \) is the right singular vectors matrix of \( \tilde{\Gamma} = \Gamma \Sigma_{\mathbf{y}}^{1/2} \).
2. \( \Lambda_A \in \mathbb{C}^{K \times N} \) is a diagonal matrix with diagonal entries

\[
(\Lambda_A)_{i,i}^2 = \frac{2\kappa_p}{M_K^2 \cdot K} \left( \zeta \cdot \lambda_{\tilde{\Gamma},i} - 1 \right)^+,
\]

where \( \kappa_p = \eta^2 \left( 1 - \frac{2\eta^2}{3M_K^2} \right)^{-1} \), with \( \eta \) denoting a constant that is set to guarantee that the quantizer operates within the dynamic range [13]. \( \{\lambda_{\tilde{\Gamma},i}\} \) are the singular values of \( \tilde{\Gamma} \) arranged in descending order, and \( \zeta \) is chosen such that

\[
\frac{2\kappa_p}{M_K^2 \cdot K} \sum_{i=1}^{K} \left( \zeta \cdot \lambda_{\tilde{\Gamma},i} - 1 \right)^+ = 1.
\]

3. \( \mathbf{U}_A \in \mathbb{C}^{K \times K} \) is a unitary matrix which guarantees that \( \mathbf{U}_A \Lambda_A \Lambda_A^H \mathbf{U}_A^H \) has identical diagonal entries.

The dynamic range of the quantizer is given by

\[
\gamma^2 = \frac{\eta^2}{K} \left( 1 - \frac{2\eta^2}{3M_K^2} \right)^{-1}, \tag{11}
\]

and the resulting minimal achievable distortion is

\[
E[||\tilde{\mathbf{s}} - \hat{\mathbf{s}}||^2] = \sum_{i=1}^{K} \frac{\lambda_{\tilde{\Gamma},i}^2}{\left( \zeta \cdot \lambda_{\tilde{\Gamma},i} - 1 \right)^+ + 1}. \tag{12}
\]

In our prototype, we configure the analog combiner according to Theorem 2, and the dynamic range of the scalar quantizer based on (11). The calculated matrix \(B^\circ\) in (10) is used for the task vector recovery in the digital domain. Details of the hardware implementation are discussed in the next section.
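The scalar quantities in Theorem 2 can be evaluated with a few lines of code. The sketch below is a hedged illustration in Python rather than the MATLAB host application: it solves the constraint on \( \zeta \) with a water-filling-style active-set search (one possible method, chosen here for simplicity) and then evaluates the squared diagonal entries, the dynamic range in (11), and the distortion in (12). The singular values and the choice \( \eta = 2 \) are hypothetical inputs, and all function names are introduced for this example.

```python
import math


def kappa(eta, m_levels):
    """kappa_p = eta^2 * (1 - 2*eta^2 / (3*M_K^2))^(-1), as defined in Theorem 2."""
    return eta ** 2 / (1.0 - 2.0 * eta ** 2 / (3.0 * m_levels ** 2))


def solve_zeta(sing_vals, c):
    """Find zeta such that c * sum_i (zeta * lambda_i - 1)^+ = 1.

    Tries active sets of decreasing size m (the m largest singular values);
    with m active terms the constraint is linear in zeta.
    """
    lams = sorted(sing_vals, reverse=True)
    if lams[0] <= 0:
        raise ValueError("at least one singular value must be positive")
    for m in range(len(lams), 0, -1):
        zeta = (1.0 / c + m) / sum(lams[:m])
        if zeta * lams[m - 1] > 1.0:  # smallest active term is strictly positive
            return zeta
    raise ValueError("no feasible zeta found")


def design(sing_vals, eta, m_levels):
    """Return zeta, squared diagonal gains, dynamic range gamma, and distortion."""
    k = len(sing_vals)
    kp = kappa(eta, m_levels)
    c = 2.0 * kp / (m_levels ** 2 * k)
    zeta = solve_zeta(sing_vals, c)
    gains_sq = [c * max(zeta * lam - 1.0, 0.0) for lam in sing_vals]
    gamma = math.sqrt(kp / k)  # gamma^2 = kappa_p / K, equivalent to (11)
    distortion = sum(lam ** 2 / (max(zeta * lam - 1.0, 0.0) + 1.0) for lam in sing_vals)
    return zeta, gains_sq, gamma, distortion


# Hypothetical example: K = 2 singular values, 5-bit quantizers (32 levels).
zeta, gains_sq, gamma, distortion = design([2.0, 1.0], eta=2.0, m_levels=32)
```

By construction the squared gains sum to one, which is exactly the normalization imposed on \( \zeta \); singular values that fall below the water level receive zero gain and contribute their full \( \lambda^2 \) to the distortion.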

III. HARDWARE IMPLEMENTATION

This section describes the system architecture of the hardware prototype, which realizes the task-based quantization scheme for multi-user signal recovery detailed in the previous section. We first present the high-level system architecture in Subsection III-A. The concrete structure of each component is provided in Subsection III-B, and the design challenges are detailed in Subsection III-C.

Fig. 2. The task-based quantization system.

A. High-level Architecture

Fig. 2 shows our hardware prototype, which consists of five main blocks: the GUI, the signal generator, the analog combiner board, the sampling board, and the computing center. Details of the employed hardware components are presented in Table I, and the main blocks are as follows:

1) GUI: The graphical user interface (GUI) is used for controlling the system parameters, which allows the user to configure the experimental setup in a user-friendly environment. The main controllable parameters include the number of user terminals, the number of receiving antennas, the number of quantization bits, and the SNR of the received signals. Based on these parameters, the MATLAB application running on the computing center generates the input data for the 16-channel digital-to-analog converters (DACs) that feed the analog combiner board (details in III-A3), as well as the optimal weights for the analog combiner configuration.

2) Signal Generator: The digital data generated by MATLAB are then fed to a field-programmable gate array (FPGA) board with 16 DAC transmit channels to generate analog waveform signals. This process mimics the real-world signals received at the base station (BS).

3) Analog Combiner Board: The 16 baseband analog signals from the DACs are next fed into the analog combiner board, as illustrated in Fig. 3. To transmit them at the desired frequency, they are up-converted to 2.3 GHz by 16 dedicated modulators. Then, the signal of each channel is passed through a 4-way power divider (splitter), yielding 64 analog RF signals in total. The 64 signals are then fed into four combiner boards. Each combiner board is fed by 16 channels and has a single output. The combiner boards apply their weights through analog vector multipliers, which are devices designed to control a signal's gain and phase. The overall process is illustrated in Fig. 4. In this way, the tailored analog combiner board converts the 16-channel signals into 4, implementing the function of the aforementioned analog combiner matrix \( \mathbf{A} \).

4) Sampling: The outputs (both I and Q) of the analog combiner board are fed into a sampling board. The four analog signals are down-converted from 2.3 GHz to 20 MHz and converted to digital signals using a 4DSP FMC168 16-bit digitizer card.

5) Computing Center: The four digital streams are then transferred to the MATLAB application on the computing center. The application emulates low-bit quantization in software and then recovers the multi-user signals in the digital domain. Finally, the results are displayed on the GUI to demonstrate the signal recovery performance of the task-based hardware prototype.

B. Details of Each Block

1) Waveform generation: The 16 digital baseband signals generated by the host application are transferred to the FPGA board in real time over an Ethernet cable. The FPGA board generates the corresponding analog baseband waveforms with a maximum bandwidth of 100 MHz.

2) Analog Combiner: The analog combiner board is custom-designed dedicated hardware that realizes a controllable analog combiner network. As shown in Fig. 3, the board consists of four parts:

A. Up-conversion: The 16 input complex baseband (BB) signals, whose maximum bandwidth is 100 MHz, are up-converted to RF signals using a 2.3 GHz carrier waveform. The carrier is generated by a VSG25A vector signal generator. After up-conversion, the RF signals represent the passband signals observed at the base station.

B. Passband signal splitting: The analog passband signal of each channel is split into four. In the considered setting, this yields 64 analog RF signals in total, which are then weighted and combined. The board supports up to four output RF chains. Since the number of users is set to 2 in this experiment, only two of them are used.

C. Parameter generation and configuration for the combiner: Each split signal is fed into an amplifier and then split again into two signals with a 90-degree phase offset. The two signals then enter an ADL5390 analog vector multiplier. The analog vector multiplier implements the phase and gain of each analog combining weight, which is applied to the input signal before combining. The applied weights are determined by the output DC levels of AD5674 octal 12-bit DACs with serial load capabilities, which receive control commands from an Arduino Nano microcontroller to configure the analog combining weights. The use of controllable gains and phases requires a calibration stage when the interconnections are established, to guarantee that the configured weights are correctly translated into the desired phase and gain values.

D. Summation of the incoming signals and down-conversion: The final step is summing the 16 output signals of each group after weighting, to obtain a combined passband signal. The signal is then down-converted by the same local oscillator that is employed for up-conversion, and filtered to baseband with a maximum bandwidth of 100 MHz.
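Functionally, the splitting, weighting, and summing stages amount to a matrix-vector product: each combiner board forms one weighted sum of the 16 inputs. The following Python sketch illustrates this equivalence at complex baseband. It ignores up-conversion, hardware noise, and calibration errors, and the weights are hypothetical phase-only values chosen for the example; they are not the weights computed from Theorem 2.

```python
import cmath
import math


def weight(gain, phase_rad):
    """Complex weight realised by one vector multiplier (gain and phase)."""
    return cmath.rect(gain, phase_rad)


def combine(weights, y):
    """Each combiner board outputs the sum of its weighted inputs: z_i = sum_n A[i][n] * y[n]."""
    return [sum(w * x for w, x in zip(row, y)) for row in weights]


N_IN, N_BOARDS = 16, 4

# Hypothetical weights: board i compensates the phase progression of a
# plane wave arriving from 10*i degrees, with gain 1/16 on every path.
A = [[weight(1.0 / N_IN, -math.pi * n * math.sin(math.radians(10 * i)))
      for n in range(N_IN)]
     for i in range(N_BOARDS)]

# Test input: a unit-amplitude plane wave arriving from 10 degrees.
y = [cmath.exp(1j * math.pi * n * math.sin(math.radians(10))) for n in range(N_IN)]
z = combine(A, y)
```

The 4 x 16 weight array has 64 entries, one per split RF signal, matching the number of vector-multiplier paths on the board. In this example the second board is matched to the input and returns a value of approximately one, while the other boards return smaller magnitudes.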

3) Quantization: The four output signals are forwarded to be sampled by the 4DSP FMC168 16-bit digitizer card. However, task-based quantization is intended to operate with low-bit quantizers. We therefore use software to emulate the hardware implementation of such a scalar quantizer, defined as

\[ q(x) = \begin{cases} 
\Delta \left( \left\lfloor \frac{x}{\Delta} \right\rfloor + \frac{1}{2} \right), & \text{for } |x| < \gamma \\
\text{sign}(x) \left( \gamma - \frac{\Delta}{2} \right), & \text{else,}
\end{cases} \tag{13} \]

where \( x \) is the input signal and \( \Delta = \frac{2\gamma}{M_K} \) represents the quantization spacing. The variable \( M_K \) is varied in the experiments to realize different numbers of bits. The symbol \( \text{sign}(\cdot) \) denotes the signum function, given by

\[ \text{sign}(x) = \begin{cases} 
+1, & x \geq 0 \\
-1, & \text{else.}
\end{cases} \tag{14} \]
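A software emulation of this quantizer takes only a few lines. The Python sketch below assumes the usual mid-rise form, in which \( x / \Delta \) is rounded down and inputs outside the dynamic range are clipped to the outermost level \( \pm(\gamma - \Delta/2) \). Applying the same quantizer separately to the I and Q components is an additional assumption made for this illustration, and the function names are not taken from the prototype's MATLAB code.

```python
import math


def sign(x):
    """Signum as in (14): +1 for x >= 0, -1 otherwise."""
    return 1.0 if x >= 0 else -1.0


def quantize(x, gamma, m_levels):
    """Uniform mid-rise quantizer with dynamic range gamma and m_levels levels."""
    delta = 2.0 * gamma / m_levels  # quantization spacing
    if abs(x) < gamma:
        return delta * (math.floor(x / delta) + 0.5)
    return sign(x) * (gamma - delta / 2.0)  # saturate at the outermost level


def quantize_complex(z, gamma, m_levels):
    """Assumption: I and Q are quantized independently with the same quantizer."""
    return complex(quantize(z.real, gamma, m_levels),
                   quantize(z.imag, gamma, m_levels))


# Example with gamma = 1 and a 2-bit quantizer (4 levels, spacing 0.5).
levels = sorted({quantize(x / 100.0, 1.0, 4) for x in range(-300, 301)})
```

With these settings the quantizer produces exactly four output values, -0.75, -0.25, 0.25, and 0.75, and any input beyond the dynamic range maps to one of the two outer values.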

4) Software (digital processing): The software part consists of two components: a computing center running the MATLAB-based host application, and a GUI-based control and display interface.

The computing center is a 64-bit computer with 8 CPU cores and 16 GB RAM running the MATLAB-based host application. The application is responsible for generating the digital baseband signal, computing the optimal analog and digital processing matrices as detailed in Theorems 1 and 2, computing the dynamic range of the quantizer, and post-processing the digital output to recover the task vector.

The display part of the GUI presents the experimental results in one of two modes: MSE distortion versus the number of bits, or MSE distortion versus SNR, as shown in Fig. 5. The control part lets users interact with the experimental setup by changing the parameters used in the experiment. The main controllable parameters include the dimensionality of the received signal and the task vector, the SNR level for plotting MSE distortion versus the number of bits, and the number of bits for plotting MSE distortion versus SNR. Details of the supported parameter combinations are summarized in Table II.

C. Design Challenges

One of the critical challenges in implementing the analog combiner board is to guarantee that all RF chains operate within the linear dynamic range of the device. This ensures that the combination of all 16 channels on each of the four combiner boards yields an accurate summation. In our case, there are \( 16 \times 4 = 64 \) RF chains that need to be calibrated, and the amplitude and phase of each need to be adjusted. To overcome this challenge, we introduced a calibration process that scans through the amplitude and phase of each RF chain and applies the necessary corrections. This is done by setting the DAC values that adjust the I and Q amplitudes of each signal, as shown in Fig. 6(a) and Fig. 6(b). Specifically, Fig. 6(a) presents the in-phase signals received from the 16 channels of a single board, while Fig. 6(b) presents the quadrature-phase signals received from the 16 channels of the same board. Fig. 6(c) shows the 16 calibrated RF-chain signals at the board output. The calibration is iterative and identifies the best linear operating point, at which the Euclidean distance from the center is optimal.

IV. HARDWARE RESULTS

In this section, hardware experiments are carried out to evaluate the performance of task-based quantization in multi-user signal recovery. We consider the case where the number of users is \( K = 2 \) and the number of antennas at the BS is \( N = 16 \). The signals transmitted by the two users follow a zero-mean, unit-variance Gaussian distribution, and the channel is generated based on (2) with \( L = 3 \) paths for each user. All the results are obtained by averaging over 2000 experiments.

As a comparison, task-agnostic vector quantization results are included. Unlike scalar quantizers, which operate on a scalar input, vector quantizers have a multivariate input. Therefore, vector quantization cannot be implemented using practical serial scalar ADCs. Here, we employ simulated task-agnostic vector quantization as a comparison since it represents the best system one can construct when the quantizer is designed separately from the task [13]. Furthermore, the ideal case where no quantization is imposed on the sampled signal is also provided as a benchmark. The results displayed on the GUI show that task-based quantization can achieve satisfying performance with a much smaller number of bits, i.e., going from 16 receivers with 16 bits each to 2 receivers with 5 bits each. Furthermore, the hardware results agree with the simulated ones, with only a small performance gap caused by imperfect hardware calibration and hardware noise, verifying the effectiveness of the task-based quantization hardware prototype.
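The memory saving quoted above can be checked with simple arithmetic, under the simplifying assumption that the memory per time instant is the number of quantized streams times the number of bits per sample:

\[
\frac{16 \times 16 \ \text{bits}}{2 \times 5 \ \text{bits}} = \frac{256}{10} = 25.6,
\]

which corresponds to the roughly 25-fold reduction in memory stated in the abstract.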

V. CONCLUSION

As data rates increase, conventional analog-to-digital converters (ADCs), which sample at the Nyquist rate and use high-resolution quantizers, face challenges in storage and power consumption. To reduce the number of quantization bits, task-based quantization has been proposed, which exploits the underlying task in the system design. In this work, we presented the application of task-based quantization to multi-user signal recovery and provided a hardware implementation. The prototype consists of a tailored configurable analog combiner board and a software-based processing and demonstration system. Experimental results illustrate the superiority of task-based quantization over conventional ADCs, narrowing the gap between the theory and its practical application.
REFERENCES


