Hi @Alperen, I've read through your post and I think you're on exactly the right track suspecting fixed-point quantization as the culprit. A setup that works fine when stationary but shows frame errors during motion is a classic signature of insufficient precision in the receiver's tracking loops and estimation algorithms. When the channel is static, even degraded estimates can lock on, but once motion introduces Doppler shifts and time-varying fading, the quantization noise really starts to hurt your synchronization and channel estimation accuracy.
You mentioned the default output type being something like (16,13) and the samples looking small. That's actually a good observation because the HDL OFDM examples do tend to use fairly conservative word lengths to keep the generated code compact. The problem is that 16 bits with 13 fractional bits doesn't leave much headroom, especially through operations like FFTs where you need bit growth, and it definitely impacts things like correlators and phase estimators that are critical for dealing with a moving receiver.
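To make the headroom issue concrete, here's a quick plain-Python sketch (just the arithmetic, not MATLAB fi objects) of what a signed (16,13) format gives you: a quantization step of 2^-13 and a range of roughly ±4, so large intermediates saturate and small correlation/phase terms get swallowed by the step size.

```python
# Plain-Python sketch of a signed fixed-point (16,13) format:
# 16-bit word, 13 fractional bits, so 3 bits (incl. sign) for the integer part.
WORD_LEN = 16
FRAC_LEN = 13
LSB = 2.0 ** -FRAC_LEN                             # quantization step ~0.000122
MAX_VAL = 2.0 ** (WORD_LEN - FRAC_LEN - 1) - LSB   # +3.9998779...
MIN_VAL = -2.0 ** (WORD_LEN - FRAC_LEN - 1)        # -4.0

def quantize(x):
    """Round x to the nearest representable (16,13) value, saturating at the ends."""
    q = round(x / LSB) * LSB
    return max(min(q, MAX_VAL), MIN_VAL)

print(LSB)             # 0.0001220703125
print(quantize(0.3))   # 0.300048828125 (nearest grid point)
print(quantize(10.0))  # saturates to 3.9998779296875
```

Anything smaller than about half an LSB (≈6e-5) quantizes to zero outright, which is exactly what bites weak correlation sidelobes and small phase increments.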
Let me walk through what I'd suggest based on your questions about where to increase word lengths and how to handle scaling. Starting with the stages that typically need more precision, the synchronization correlators are probably your biggest pain point right now. The SS detector uses Zadoff-Chu correlation, and when you're moving, even small quantization errors in the correlation peak detection can throw off your timing synchronization. I'd recommend bumping the correlation accumulator up to at least 28 or even 32 bits. Yes, it costs more resources, but timing sync is make-or-break for OFDM.
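For sizing that accumulator, the usual rule is that summing N products of two B-bit operands needs 2B + ceil(log2(N)) bits for full precision; 28-32 bits then means keeping the top of that after discarding product LSBs. A quick sketch of the rule (the 16-bit samples and 62-sample correlation length below are my illustrative assumptions, not the model's exact numbers):

```python
import math

# Sketch of the sizing rule: accumulating N products of two B-bit operands
# needs 2*B + ceil(log2(N)) bits to be overflow-proof at full precision.
def correlator_acc_bits(operand_bits, corr_len):
    return 2 * operand_bits + math.ceil(math.log2(corr_len))

# Assumed values for illustration: 16-bit I/Q samples, length-62 ZC sequence.
print(correlator_acc_bits(16, 62))  # 38 full-precision bits
```

So a 28-32 bit accumulator is already a compromise against the 38-bit full-precision bound; anything narrower is eating directly into your peak-detection margin.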
Next up would be your frequency estimation and the CORDIC/NCO blocks. The CFO estimator is trying to measure phase differences from the cyclic prefix correlation, and CORDIC precision directly impacts how accurately you can convert that to a phase angle. The rule of thumb for CORDIC is that if you want L bits of precision, you need a fraction length of about L plus log2(L). So for 16-bit target precision you're looking at 20-21 fractional bits. Also make sure your CORDIC iterations are set to word length minus one. For the NCO accumulator that's driving your frequency correction, I'd go with at least 24 to 32 bits because phase quantization in the NCO will show up as residual frequency error that gets worse over time.
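As a quick sanity check on those numbers (the 30.72 MHz sample rate below is just an illustrative assumption, not your model's actual rate):

```python
import math

# Rule-of-thumb sketch (my formulation, not the MathWorks docs verbatim):
# for roughly L bits of CORDIC output precision, budget about
# L + ceil(log2(L)) fractional bits, since each of the ~L iterations can
# contribute up to half an LSB of rounding error.
def cordic_frac_bits(target_bits):
    return target_bits + math.ceil(math.log2(target_bits))

print(cordic_frac_bits(16))  # 20 -> the 20-21 fractional bits above

# NCO frequency resolution: an N-bit phase accumulator clocked at fs
# steps in increments of fs / 2^N.
def nco_resolution_hz(acc_bits, fs_hz):
    return fs_hz / 2.0 ** acc_bits

FS = 30.72e6  # assumed sample rate, for illustration only
print(nco_resolution_hz(16, FS))  # 468.75 Hz per step -- too coarse
print(nco_resolution_hz(28, FS))  # ~0.11 Hz per step
```

That resolution gap is why a short NCO accumulator shows up as a slowly rotating constellation even after "correction".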
The FFT and IFFT blocks are interesting because they naturally grow bits to prevent overflow. With a 128-point FFT you get 7 bits of growth which is mathematically correct but can lead to awkward data types. What I'd do is increase your input precision going into the FFT from the default (16,13) up to something like (20,16) or even (24,20). Then let the FFT do its bit growth thing but also configure the multiply output scaling carefully so you don't end up with crazy wide data paths downstream. Also switch the FFT rounding mode from Floor to Convergent if you can afford the extra logic because Convergent rounding is unbiased and will give you better SNR.
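The bit-growth arithmetic is worth writing down, because it shows why the input precision choice dominates the output type:

```python
import math

# Sketch: an N-point FFT can grow the signal by log2(N) bits in the worst
# case, so an overflow-safe output word is the input word plus that growth.
def fft_output_type(n_fft, in_word, in_frac):
    growth = int(math.log2(n_fft))        # 7 bits for N = 128
    return (in_word + growth, in_frac)    # fraction length carried through

print(fft_output_type(128, 16, 13))  # (23, 13) from the default input
print(fft_output_type(128, 20, 16))  # (27, 16) with the wider input
```

Either way you end up with an odd word length at the FFT output, which is why the multiply output scaling there matters: pick where you quantize back down deliberately rather than letting type propagation decide.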
For channel estimation and equalization, this is where your motion problem is probably getting amplified. The channel estimator is working with pilot symbols and trying to interpolate across subcarriers, and the equalizer is doing division which is notoriously sensitive to quantization. I'd match whatever increased precision you gave to the FFT output, so if you went to (20,16) there, use the same for your channel estimates. For the equalizer's division operations, consider going even wider for intermediate calculations, maybe 28 or 32 bits, and only quantize back down at the final output. The equalizer typically adds one integer bit and drops one fractional bit which is fine if you started with enough precision but problematic if you're already starved for bits.
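To see why the divide is the weak point, consider a one-LSB error on the channel estimate: on a strong subcarrier it's negligible, but on a faded one the relative error after 1/h blows up. A toy zero-forcing sketch (plain Python, illustrative values):

```python
# Sketch: why the zero-forcing divide is quantization-sensitive. The same
# one-LSB error on the channel estimate is harmless on a strong subcarrier
# but large in relative terms on a faded one.
LSB = 2.0 ** -13   # (16,13) quantization step

def equalized_error(h, x=1.0):
    """Relative error in x_hat = (h*x) / (h + LSB) versus the true x."""
    h_q = h + LSB               # worst-case one-LSB estimate error
    x_hat = (h * x) / h_q
    return abs(x_hat - x) / abs(x)

print(equalized_error(0.9))    # strong subcarrier: ~1.4e-4
print(equalized_error(0.01))   # faded subcarrier: ~1.2e-2
```

Motion makes this worse because fades sweep across subcarriers, so at any instant some bins are sitting right in that small-|h| regime.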
Timing recovery PLL loop filters are another sensitive spot. Any loop that has an accumulator really suffers from quantization because errors accumulate over time. I'd use 32-bit accumulators with saturation enabled for anything in your timing or frequency tracking loops.
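A toy model of why those accumulators need care: truncating the state to the fixed-point grid after every update discards up to one LSB each time, and in a feedback loop those losses integrate instead of averaging out. (Pure-Python illustration, not the actual loop filter.)

```python
import math
import random

random.seed(1)
LSB = 2.0 ** -13

def integrate(n, quantizer):
    """Accumulate n small updates, re-quantizing the state every step."""
    acc, exact = 0.0, 0.0
    for _ in range(n):
        e = random.uniform(0.0, 2.0 * LSB)   # small loop-filter updates
        exact += e
        acc = quantizer(acc + e)
    return acc - exact                        # accumulated drift

floor_q = lambda x: math.floor(x / LSB) * LSB   # truncation
round_q = lambda x: round(x / LSB) * LSB        # round to nearest

print(integrate(10000, floor_q) / LSB)  # ~ -5000: half an LSB lost per step
print(integrate(10000, round_q) / LSB)  # small random wander, no systematic drift
```

That is the mechanism behind "errors accumulate over time": truncation turns a bounded per-sample error into a ramp.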
Now for your scaling question, which is really important. The tricky thing about OFDM is you've got signals going through multiple stages of filtering, FFTs, multiplication, and division, so you need to think about scaling end-to-end. Looking at the MathWorks examples, they typically use a global scaling approach where you scale at the receiver input by something like 0.875 to give yourself 12.5 percent headroom. This prevents overflow in the early stages. Then at various points through the receiver you might scale again, but the key is to do it deliberately and not just randomly multiply things hoping it helps.
One approach is to scale right after your FFT output before it goes into channel estimation. You could empirically tune this by looking at the magnitude of your FFT outputs in simulation and choosing a scale factor that keeps you in a good range, maybe using 70-80 percent of your available range. At the transmitter side, you'd scale before the DAC to hit your target output range. The example models often multiply by the reciprocal of the FFT size, so 1/128 in your case, which normalizes the IFFT output.
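The arithmetic behind those two scale factors, for concreteness (the observation that 1/N is a pure shift is why this particular normalization is essentially free in hardware):

```python
import math

N_FFT = 128
INPUT_SCALE = 0.875   # from the MathWorks example models

# Input scaling: 0.875 leaves a fixed overflow margin at the receiver input.
headroom = 1.0 - INPUT_SCALE
print(headroom)       # 0.125 -> 12.5% margin before the early stages

# IFFT normalization: 1/N is an exact power of two, i.e. a pure right-shift
# by log2(N) bits, so it costs no multiplier and introduces no extra error
# beyond the bits shifted out.
shift = int(math.log2(N_FFT))
print(shift, 1.0 / N_FFT == 2.0 ** -shift)   # 7 True
```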
What you want to avoid is having multiple sequential scaling operations that each introduce rounding errors that compound. It's better to have fewer, well-thought-out scaling points. Also be really careful about where you round versus truncate. Truncation biases your signal downward and can cause DC offsets to build up over time. Rounding, especially convergent rounding, is much better.
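You can see that downward bias directly by quantizing zero-mean samples both ways (plain-Python sketch):

```python
import math
import random

# Sketch: truncation is biased, rounding is not. Quantizing zero-mean
# samples with floor shifts the mean down by about half an LSB, which is
# the DC offset that then builds up through accumulators; round-to-nearest
# keeps the mean near zero.
random.seed(0)
LSB = 2.0 ** -13
xs = [random.uniform(-1.0, 1.0) for _ in range(100000)]

floor_bias = sum(math.floor(x / LSB) * LSB - x for x in xs) / len(xs)
round_bias = sum(round(x / LSB) * LSB - x for x in xs) / len(xs)

print(floor_bias / LSB)   # ~ -0.5 (half an LSB of DC bias)
print(round_bias / LSB)   # ~ 0
```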
For HDL Coder settings, definitely switch your rounding method to Convergent instead of Floor in the Model Configuration Parameters under Fixed-Point Tool. Floor is the default but it introduces bias. Convergent costs you maybe 10-20 percent more LUTs but typically buys you 1-2 dB better SNR which is probably worth it in your case. For saturation, don't just enable it globally because that's wasteful. Turn it on specifically for accumulators, loop filters, and anywhere you have feedback that could integrate errors. For things like straight data paths through filters, wrapping on overflow actually uses fewer resources and is usually fine.
For your FFT block specifically, go into the block parameters and look at the complex multiplier implementation. The "use 3 multipliers and 5 adders" option saves DSP blocks if you're resource constrained. Also make sure automatic pipelining is enabled in the HDL Workflow Advisor especially for high sample rate paths.
If you really want to be thorough, use the Fixed-Point Tool to analyze your design. You can set up a test bench and run it with the Fixed-Point Tool enabled, which will log min/max ranges for every signal in your design. Then you can see where you're using too many bits and where you're cutting it too close. The tool can even propose data types automatically based on simulation ranges, though you'll want to add some margin to those proposals because simulation might not cover worst-case conditions.
In terms of implementation sequence, I'd start with the highest impact, lowest effort changes first. Bump up your correlator precision to 28-32 bits and improve your CORDIC/NCO precision to 20-24 bits. Run your motion test again and see if that alone fixes most of your issues. If you're better but still seeing errors, then move to increasing FFT precision and then channel estimation/equalization precision. Finally, add the global scaling factors and switch to convergent rounding.
Resource-wise, going from 16-bit to 20-bit data paths typically costs you 30-40 percent more LUTs and 25-35 percent more DSP blocks. Enabling saturation where you need it adds another 15-20 percent LUTs. Convergent rounding adds 10-15 percent over Floor. So you're probably looking at 60-80 percent total resource increase, but you'll get dramatically better robustness. If resources are really tight, you could consider using HDL Coder's native floating-point support for just the critical blocks like CORDIC, NCO, and channel estimator. That uses more resources than fixed-point but less than you'd think, and it basically eliminates quantization as a problem in those blocks. You could keep your filters and FFTs in fixed-point and just use single-precision floating-point for the estimation and correction loops.
One more thing to check is whether your AGC or any gain control in the receiver is properly scaled. If your input signal level is varying a lot, that can interact badly with fixed-point quantization. You want your signal to be using most of your available dynamic range but not clipping.
I think if you follow this approach, starting with the sync correlators and tracking loops, you should see a big improvement in your frame error rate during motion. The key insight is that static channels are forgiving and hide quantization problems, but dynamic channels expose them ruthlessly. Once you get your precision up in the right places, your FPGA implementation should track the MATLAB floating-point behavior much more closely.
Let me know how it goes.