DDR4 A Controller PHY For Managed DRAM Solution With Damping-Resistor-Aided Pulse-Based Feed-Forward Equalizer
Abstract— A controller PHY for high-capacity DRAM is presented. To reduce the precursor and postcursor intersymbol interference (ISI) caused by the dispersive channel characteristics and the heavy load of many DRAM chips, and to attenuate reflection on a highly reflective command/address (C/A) channel, a damping-resistor-aided three-tap pulse-based feed-forward equalizer (PB-FFE) is introduced. An appropriate damping resistance attenuates reflection, and the PB-FFE compensates for the insertion loss added by the damping resistor. In addition, current flows in the PB-FFE only before and after a signal transition, which improves energy efficiency and keeps the turn-ON resistance unchanged during the no-transition region. A controller PHY based on this equalizer was fabricated in a 55-nm CMOS process. The PB-FFE increases the timing margin of the C/A signal from 0.23 to 0.29 UI at 1067 Mb/s. At 2133 Mb/s, the read timing and voltage margins of the DQ signal are 0.53 UI and 211 mV, respectively, after read training, and the write timing and voltage margins are 0.72 UI and 230 mV, respectively, after write training.

Index Terms— DRAM interface, dual-inline memory module (DIMM), feed-forward equalizer (FFE), glitch-free digitally controlled delay-line, memory controller, pulse-based equalizer.

I. INTRODUCTION

(DIMMs). When multiple DIMMs are used for additional capacity, registered DIMMs (RDIMMs) are used to reduce the loading on the command/address (C/A) lines [2], and load-reduced DIMMs (LRDIMMs) with additional data buffers are used to further reduce the loading of the data bus [3]. Another way to increase capacity is to stack multiple DRAM chips in one package. In this case, to prevent an increase in input and output (IO) loading, the internal data buses of each DRAM are usually connected using through-silicon vias (TSVs); however, the TSV process increases the manufacturing cost. To overcome this cost problem, a managed DRAM solution (MDS) was recently proposed as a cost-efficient solution with moderate performance [4]. In the MDS DIMM, eight DRAM chips are stacked in each package using wire bonding, and the IO loading of four DQ lines is reduced using an on-die repeater. However, each DRAM has 33 C/A pins, and repeating all of them would require too much area. Therefore, the C/A lines are connected to all DRAM chips, and each C/A transmitter (Tx) of the controller has to drive 80 DRAM chips. As a result, the C/A lines have a very large capacitive
Authorized licensed use limited to: Korea Advanced Inst of Science & Tech - KAIST. Downloaded on May 24,2025 at 15:17:07 UTC from IEEE Xplore. Restrictions apply.
2564 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 56, NO. 8, AUGUST 2021
Fig. 1. PHY architecture of the MDS controller; this article is predominantly about the C/A Tx, which is a sub-block of the controller PHY.
of cost and power consumption to implement equalizers such as a continuous-time linear equalizer (CTLE) and a feed-forward equalizer (FFE) in each DRAM receiver, and the precursor ISI is generally dominated by the first precursor; thus, a Tx equalizer is appropriate to compensate for it. Conventional feed-forward equalizing schemes in the transmitter shift the output data and sum the tap currents, which wastes power and changes the output impedance of the drivers when there are no transitions in the signal [11].

A pulse-based FFE (PB-FFE) [11]–[15] can overcome these disadvantages of the conventional FFE while compensating for ISI. Wang and Gai [11] presented a PB-FFE using precoded data, but its area- and wire-consuming encoder and ten serializers occupy a large area. In [12], a small current and a large termination resistor are used. However, a termination resistor larger than the channel characteristic impedance can cause large reflection in a highly reflective channel with heavy DRAM loading. A pre-emphasis-based FFE using a transition detector cannot remove the precursor ISI [13], [14]. The PB-FFE in [15] requires a quadrature clock for an additional return-to-zero (RZ) data aligner, which increases power consumption. Moreover, none of these PB-FFE designs can reduce postcursor reflections.

To alleviate these issues, we present an MDS controller PHY with a damping-resistor-aided three-tap pulse-based feed-forward equalizing C/A Tx for heavily loaded DRAM interfaces. The C/A Tx needs to drive the signal to 80 DRAM dies without receiver termination, which creates a highly reflective channel environment. A damping resistor is used at the DRAM receiver to attenuate reflection in this channel environment, but it increases insertion loss; the three-tap PB-FFE compensates for this insertion loss. The proposed PB-FFE injects current only before and after a signal transition to compensate for precursor and postcursor ISI, so no current flows through the PB-FFE when there is no transition. In addition, the impedance of the output driver does not change during the no-transition region. The position of the third tap, which cancels the postcursor reflection, can be controlled by an adjustable delay. Furthermore, because our PHY has 132 C/A Txs, the area of the encoder and serializers must be minimized; thus, the PB-FFE uses one serializer and a simple encoder to encode the serialized data.

II. MDS CONTROLLER PHY ARCHITECTURE

Fig. 1 shows the architecture of the MDS controller PHY and how it communicates with DRAMs. The PHY consists of an all-digital phase-locked loop (ADPLL), an all-digital delay-locked loop (ADDLL), a clock distribution circuit, a link training finite-state machine (LTFSM), eight pairs of clock signal (CK) Txs, four groups of 33 Txs for the C/A lines, 80 DQ signal transceivers for 20 nibbles of data, and 20 transceiver pairs for the corresponding DQS signals. Here, a nibble is a bundle of four IOs and one strobe pair. The ADPLL generates both the global CK (PHYCLK) used by the transceivers and the system clock (SYSCLK) used by the LTFSM. The frequency of PHYCLK is 1066 MHz, and that of SYSCLK is 533 MHz. The ADDLL has the same delay line as each transceiver and provides each transceiver with a delay control-code corresponding to one cycle of PHYCLK. Each delay line divides the received control-code
KO et al.: CONTROLLER PHY FOR MDS WITH DAMPING-RESISTOR-AIDED PB-FFE 2565
Fig. 3. C/A channel environment (a) without and (b) with damping resistor.
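To see why the damping resistor in Fig. 3 helps, a simplified picture treats one unterminated DRAM input as a series damping resistance R_d feeding a purely capacitive input C_in at the end of a line with characteristic impedance Z0, so that Gamma(f) = (Z_L - Z0)/(Z_L + Z0) with Z_L = R_d + 1/(j2*pi*f*C_in). The Python sketch below evaluates |Gamma| under that assumption. It is not the authors' channel model, and the values of Z0, C_in, and R_d are hypothetical placeholders rather than numbers from the paper.

```python
import math


def load_impedance(f_hz: float, r_damp: float, c_in: float) -> complex:
    """Series damping resistor in front of a purely capacitive DRAM input."""
    omega = 2.0 * math.pi * f_hz
    return r_damp + 1.0 / (1j * omega * c_in)


def reflection_coefficient(z_load: complex, z0: float) -> complex:
    """Voltage reflection coefficient of z_load at the end of a z0 line."""
    return (z_load - z0) / (z_load + z0)


if __name__ == "__main__":
    Z0 = 40.0       # hypothetical characteristic impedance (ohm)
    C_IN = 1.5e-12  # hypothetical DRAM input capacitance (F)
    for r_damp in (0.0, 20.0, 40.0):  # hypothetical damping resistances (ohm)
        mags = [abs(reflection_coefficient(load_impedance(f, r_damp, C_IN), Z0))
                for f in (0.5e9, 1.0e9, 2.0e9)]
        print(f"R_d = {r_damp:4.1f} ohm: |Gamma| at 0.5/1/2 GHz =",
              ", ".join(f"{m:.3f}" for m in mags))
```

With R_d = 0 the magnitude is exactly 1 at every frequency, because a lossless capacitive load reflects the entire incident wave; any nonzero R_d pulls |Gamma| below 1, and more so at higher frequencies. The same series resistance also adds insertion loss, which is the loss the PB-FFE is introduced to compensate.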
Fig. 5. Variation of the simulated eye width with the input parasitic resistance
and capacitance of each DRAM.
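Fig. 5 shows how strongly the eye width depends on the input parasitics of each DRAM. One textbook first-order approximation explains the trend for the capacitive part: when loads are spaced closely relative to the signal rise time, their capacitance can be folded into the line as distributed capacitance, which lowers the effective impedance to Z0/sqrt(1 + C_load/C_line) and stretches the propagation delay by sqrt(1 + C_load/C_line). The sketch below applies that approximation. Only the 80-die fan-out per C/A line comes from the text; the per-die capacitance, line capacitance, Z0, and delay are hypothetical.

```python
import math


def loaded_line(z0: float, t_delay: float, c_line: float,
                c_load: float) -> tuple[float, float]:
    """First-order effect of loads treated as distributed capacitance.

    z0      : unloaded characteristic impedance (ohm)
    t_delay : unloaded propagation delay of the segment (s)
    c_line  : intrinsic capacitance of the segment (F)
    c_load  : total input capacitance of the loads on the segment (F)
    Returns (effective impedance, loaded delay).
    """
    factor = math.sqrt(1.0 + c_load / c_line)
    return z0 / factor, t_delay * factor


if __name__ == "__main__":
    N_DIES = 80        # dies on one C/A line (from the text)
    C_IN = 1.0e-12     # hypothetical input capacitance per die (F)
    C_LINE = 20.0e-12  # hypothetical intrinsic line capacitance (F)
    Z0 = 40.0          # hypothetical unloaded impedance (ohm)
    T_DELAY = 1.0e-9   # hypothetical unloaded delay (s)
    z_eff, t_eff = loaded_line(Z0, T_DELAY, C_LINE, N_DIES * C_IN)
    print(f"Z_eff = {z_eff:.1f} ohm, loaded delay = {t_eff * 1e9:.2f} ns")
```

With these placeholder numbers the loaded impedance falls to less than half of Z0, which illustrates why a line carrying 80 inputs behaves very differently from a lightly loaded point-to-point link.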
Fig. 7. Return loss of the C/A channel, probed at the output of the C/A Tx in Fig. 2.
Fig. 13. Simulated eye diagram of the farthest DRAM input (a) without and (b) with the PB-FFE at 1.2 Gb/s. The vertical eye mask is 200 mV.
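The PB-FFE behavior described in Section I (current injected only just before and just after a transition, plus a third tap whose position is adjustable to cancel the postcursor reflection) can be sketched at the symbol level. The model below is a hypothetical, idealized illustration rather than the output driver of Fig. 10: the function name pb_ffe, the weights a_pre, a_post, and a_refl, and the offset d_refl are assumptions, and whereas the text sets the third-tap position with an adjustable delay, this model quantizes it to whole UIs.

```python
from typing import List


def pb_ffe(bits: List[int], a_pre: float, a_post: float,
           a_refl: float = 0.0, d_refl: int = 2) -> List[float]:
    """Idealized symbol-level model of a three-tap pulse-based FFE (hypothetical).

    bits   : NRZ data as 0/1, mapped to -1/+1 for the main driver
    a_pre  : weight of the pulse in the UI just before a transition
    a_post : weight of the pulse in the UI just after a transition
    a_refl : signed weight of the third (reflection) tap
    d_refl : third-tap offset in whole UIs after the postcursor pulse (>= 1)
    """
    if d_refl < 1:
        raise ValueError("d_refl must be at least 1 UI")
    x = [2 * b - 1 for b in bits]
    n = len(x)
    # step[i] is +1/-1 for a rising/falling transition into UI i, else 0.
    step = [0] + [(x[i] - x[i - 1]) // 2 for i in range(1, n)]
    y = []
    for i in range(n):
        pre = step[i + 1] if i + 1 < n else 0          # transition coming next
        post = step[i]                                 # transition just happened
        refl = step[i - d_refl] if i >= d_refl else 0  # delayed third tap
        # Every tap term is zero inside a run without transitions.
        y.append(x[i] - a_pre * pre + a_post * post + a_refl * refl)
    return y


if __name__ == "__main__":
    data = [0, 0, 1, 1, 1, 0, 1, 0, 0, 0]
    out = pb_ffe(data, a_pre=0.2, a_post=0.4, a_refl=0.1, d_refl=2)
    print([round(v, 2) for v in out])
```

With a_refl = 0, this model is algebraically identical to a conventional three-tap FFE whose main cursor is 1 + (a_pre + a_post)/2 and whose side taps are -a_pre/2 and -a_post/2, yet every equalizing term is zero inside runs of identical bits. That is the property the text credits for the energy saving and the unchanged driver impedance in the no-transition region.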
Fig. 10. Detailed implementation of the output driver in the PB-FFE.

for the same input rectangular mask. Considering supply and reference voltage noise, crosstalk, receiver offset in the DRAM, and timing skew between C/A signals, the required input mask is a horizontal eye opening of 200 ps and a vertical eye opening of 200 mV. The simulation result meets this required eye mask.
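Margins in this article are quoted in UI in some places and in absolute units in others, so a small conversion helper is handy when comparing them. The sketch below (hypothetical helper names) converts the reported C/A, read, and write timing margins to picoseconds and expresses the 200-ps horizontal mask in UI at the 1.2-Gb/s rate of the Fig. 13 simulation. It assumes one UI equals one bit period, 1/(data rate).

```python
def ui_ps(rate_mbps: float) -> float:
    """One unit interval (bit period) in picoseconds at rate_mbps (Mb/s)."""
    return 1e6 / rate_mbps


def ui_to_ps(margin_ui: float, rate_mbps: float) -> float:
    """Convert a margin in UI to picoseconds."""
    return margin_ui * ui_ps(rate_mbps)


def ps_to_ui(t_ps: float, rate_mbps: float) -> float:
    """Convert a time in picoseconds to UI."""
    return t_ps / ui_ps(rate_mbps)


if __name__ == "__main__":
    # (label, margin in UI, data rate in Mb/s), all as reported in the text
    reported = [
        ("C/A without PB-FFE", 0.23, 1067),
        ("C/A with PB-FFE", 0.29, 1067),
        ("DQ read", 0.53, 2133),
        ("DQ write", 0.72, 2133),
    ]
    for label, margin, rate in reported:
        print(f"{label:20s} {margin:.2f} UI @ {rate} Mb/s = "
              f"{ui_to_ps(margin, rate):.1f} ps")
    print(f"200-ps mask @ 1200 Mb/s = {ps_to_ui(200, 1200):.2f} UI")
```

At 1067 Mb/s one UI is about 937 ps, so the 0.06-UI improvement from the PB-FFE corresponds to roughly 56 ps of extra C/A timing margin, and the 200-ps mask is 0.24 UI at 1.2 Gb/s.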
Fig. 16. (a) Block diagram and (b) timing diagram of the proposed glitch-free
DCDL.
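Section II states that the ADDLL uses the same delay line as each transceiver and sends every transceiver the control code corresponding to one PHYCLK cycle, which each delay line then divides. The sketch below illustrates that code arithmetic for an idealized DCDL whose delay is linear in its code. The unit step T_STEP, the helper names, and the divisor passed to fractional_code are hypothetical, because the excerpt does not say how the code is divided, and the sketch ignores the glitch-free switching that Fig. 16 addresses.

```python
import math


def lock_code(f_clk_hz: float, t_step_s: float) -> int:
    """Smallest code of an ideal linear DCDL whose delay spans one clock period."""
    return math.ceil((1.0 / f_clk_hz) / t_step_s)


def fractional_code(cycle_code: int, divisor: int) -> int:
    """Code for 1/divisor of a cycle, rounded half-up (divisor is hypothetical)."""
    return (cycle_code + divisor // 2) // divisor


if __name__ == "__main__":
    F_PHYCLK = 1066e6        # PHYCLK from the text
    F_SYSCLK = F_PHYCLK / 2  # SYSCLK (533 MHz) is half of PHYCLK
    T_STEP = 10e-12          # hypothetical DCDL unit delay (10 ps)
    code = lock_code(F_PHYCLK, T_STEP)
    print(f"PHYCLK period = {1e12 / F_PHYCLK:.1f} ps, "
          f"SYSCLK = {F_SYSCLK / 1e6:.0f} MHz")
    print(f"one-cycle code = {code}")
    print(f"1/4-cycle code (hypothetical divisor) = {fractional_code(code, 4)}")
```

The usual motivation for distributing a code from a replica delay line in this way is that the locked code tracks process, voltage, and temperature, so a fraction of it yields a consistent fraction of a cycle in every transceiver without calibrating each one separately.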
Fig. 19. (a) Measured waveforms of CK-CKB, CS, and C/A at the package
ball, and simulated C/A signal at (b) package ball and (c) receiver buffer input
of the farthest DRAM die. All of them were observed at the B4 DRAM ODP
in Fig. 2.
Fig. 17. Die micrograph of the MDS controller chip.
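The lane and load counts quoted in Sections I, II, and VI are mutually consistent, and a short check makes that explicit. The sketch uses only numbers from the text (four C/A groups of 33 Txs, 20 nibbles of four DQs, 40 packages of eight stacked dies). The class name MdsCounts is hypothetical, and the per-Tx load assumes the dies are split evenly across the four C/A groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MdsCounts:
    """Hypothetical container for the counts quoted in the text."""
    ca_groups: int = 4          # four groups of C/A Txs
    ca_tx_per_group: int = 33   # 33 C/A pins per DRAM
    nibbles: int = 20           # 20 nibbles of data
    dq_per_nibble: int = 4      # a nibble bundles four IOs and one strobe pair
    packages: int = 40          # DRAM packages on the MDS DIMM
    dies_per_package: int = 8   # DRAM chips stacked per package

    @property
    def ca_tx_total(self) -> int:
        return self.ca_groups * self.ca_tx_per_group

    @property
    def dq_total(self) -> int:
        return self.nibbles * self.dq_per_nibble

    @property
    def dies_total(self) -> int:
        return self.packages * self.dies_per_package

    @property
    def dies_per_ca_tx(self) -> int:
        # Assumption: the dies are split evenly across the C/A groups.
        return self.dies_total // self.ca_groups


if __name__ == "__main__":
    c = MdsCounts()
    print(c.ca_tx_total, c.dq_total, c.dies_total, c.dies_per_ca_tx)
```

The result, 80 dies per C/A Tx, matches the load stated in the introduction and shows why the unrepeated C/A channel, rather than the repeated DQ path, is the hard one to drive.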
TABLE I
PERFORMANCE SUMMARY AND COMPARISON WITH OTHER DRAM INTERFACES
Fig. 22. Measured read timing and voltage margin on the DIMM.
Fig. 20. Measured C/A Margin with and without the PB-FFE.
Fig. 23. Measured write timing and voltage margin on the DIMM.
Fig. 21. Read margin measured by ATE.
TABLE II
PERFORMANCE SUMMARY AND COMPARISON WITH OTHER PB-FFES
Fig. 24. Power breakdown of (a) write and (b) read operation.

the controller during burst write and read operation is 1.97 W, which satisfies the requirement for an MDS DIMM [4]. Table I compares this PHY with other DRAM interfaces, and Table II compares its performance with other PB-FFE designs. Our damping-resistor-aided PB-FFE can transmit the signal to many loads with better energy efficiency.

VI. CONCLUSION

A controller PHY for a high-capacity DRAM solution was presented. It was mounted on an MDS DIMM [4] and interfaced to 40 DRAM packages. This controller supports all the training sequences specified in the DDR4 standard, including link training for C/A, read, and write operations. A glitch-free DCDL reduces training time. To attenuate reflection and reduce the ISI caused by the heavy load of many DRAM chips on a C/A channel, a damping-resistor-aided PB-FFE is used in the C/A Tx. The controller was fabricated in a 55-nm CMOS process and occupies 77.2 mm². Applying the PB-FFE improves its C/A timing margin at 1067 Mb/s from 0.23 to 0.29 UI. At 2133 Mb/s, the measured read timing and voltage margins are 0.53 UI and 211 mV after read training, and the write margins are 0.72 UI and 230 mV after write training. The power consumption of the controller during burst write and read operation is 1.97 W, which satisfies the requirement of the MDS DIMM [4]. Our damping-resistor-aided PB-FFE can also be applied to standard RDIMMs or LRDIMMs to drive the C/A channel with improved power efficiency.

REFERENCES

[1] A. M. Ionescu, "Energy efficient computing and sensing in the Zettabyte era: From silicon to the cloud," in IEDM Tech. Dig., San Francisco, CA, USA, Dec. 2017, pp. 1.2.1–1.2.8.
[2] DDR4 SDRAM Registered DIMM Design Specification, Standard 21C 4.20.28-1, JEDEC, May 2019.
[3] DDR4 SDRAM Load Reduced DIMM Design Specification, Standard 21C 4.20.27-1, JEDEC, Aug. 2015.
[4] S. Lee et al., "23.4 A 512GB 1.1 V managed DRAM solution with 16GB ODP and media controller," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San Francisco, CA, USA, Feb. 2019, pp. 384–386.
[5] J. Ren et al., "Precursor ISI reduction in high-speed I/O," in Proc. IEEE Symp. VLSI Circuits, Kyoto, Japan, Jun. 2007, pp. 134–135.
[6] W.-Y. Shin et al., "A 4.8Gb/s impedance-matched bidirectional multi-drop transceiver for high-capacity memory interface," in Proc. IEEE Int. Solid-State Circuits Conf., San Francisco, CA, USA, Feb. 2011, pp. 494–496.
[7] W. Lee et al., "Parallel branching of two 2-DIMM sections with write-direction impedance matching for an 8-drop 6.4-Gb/s SDRAM interface," IEEE Trans. Compon., Packag., Manuf. Technol., vol. 9, no. 2, pp. 336–342, Feb. 2019.
[8] J. Seo et al., "A 7.8-Gb/s 2.9-pJ/b single-ended receiver with 20-tap DFE for highly reflective channels," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 28, no. 3, pp. 818–822, Mar. 2020.
[9] H.-J. Chi et al., "A single-loop SS-LMS algorithm with single-ended integrating DFE receiver for multi-drop DRAM interface," IEEE J. Solid-State Circuits, vol. 46, no. 9, pp. 2053–2063, Sep. 2011.
[10] S.-J. Bae, H.-J. Chi, Y.-S. Sohn, and H.-J. Park, "A 2Gb/s 2-tap DFE receiver for multi-drop single-ended signaling systems with reduced noise," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San Francisco, CA, USA, 2004, pp. 244–245.
[11] Y. Wang and W. Gai, "Power-efficient pre-emphasis method for transmitters with LVDS drivers," Electron. Lett., vol. 50, no. 24, pp. 1811–1813, Nov. 2014.
[12] B. Kim and V. Stojanovic, "An energy-efficient equalized transceiver for RC-dominant channels," IEEE J. Solid-State Circuits, vol. 45, no. 6, pp. 1186–1197, Jun. 2010.
[13] S. Han, S. Lee, M. Choi, J.-Y. Sim, H.-J. Park, and B. Kim, "A coefficient-error-robust feed-forward equalizing transmitter for eye-variation and power improvement," IEEE J. Solid-State Circuits, vol. 51, no. 8, pp. 1902–1914, Aug. 2016.
[14] H.-G. Ko, S. Shin, J. Oh, K. Park, and D.-K. Jeong, "6.7 An 8Gb/s/μm FFE-combined crosstalk-cancellation scheme for HBM on silicon interposer with 3D-staggered channels," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San Francisco, CA, USA, Feb. 2020, pp. 128–130.
[15] S.-G. Kim, T. Kim, D.-H. Kwon, and W.-Y. Choi, "A 5–8 Gb/s low-power transmitter with 2-tap pre-emphasis based on toggling serialization," in Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC), Toyama, Japan, Nov. 2016, pp. 249–252.
[16] DDR4 SDRAM, Standard JESD79-4C, JEDEC, Jan. 2020.
[17] H.-H. Chuang et al., "Signal/power integrity modeling of high-speed memory modules using chip-package-board coanalysis," IEEE Trans. Electromagn. Compat., vol. 52, no. 2, pp. 381–391, May 2010.
[18] C. Menolfi et al., "A 16Gb/s source-series terminated transmitter in 65 nm CMOS SOI," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San Francisco, CA, USA, Feb. 2007, pp. 446–447.
[19] J.-H. Chae, Y.-U. Jeong, and S. Kim, "Data-dependent selection of amplitude and phase equalization in a quarter-rate transmitter for memory interfaces," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 67, no. 9, pp. 2972–2983, Sep. 2020.
[20] J.-T. Kwak, C.-K. Kwon, K.-W. Kim, S.-H. Lee, and J.-S. Kih, "A low cost high performance register-controlled digital DLL for 1 Gbps×32 DDR SDRAM," in Proc. Symp. VLSI Circuits Dig. Tech. Papers, Kyoto, Japan, 2003, pp. 283–284.
[21] D. De Caro, "Glitch-free NAND-based digitally controlled delay-lines," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 1, pp. 55–66, Jan. 2013.
[22] W. Yun et al., "A digital DLL with hybrid DCC using 2-step duty error extraction and 180° phase aligner for 2.67Gb/s/pin 16Gb 4-H stack DDR4 SDRAM with TSVs," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San Francisco, CA, USA, Mar. 2015, pp. 1–3.
[23] M. Kim et al., "A 4266 Mb/s/pin LPDDR4 interface with an asynchronous feedback CTLE and an adaptive 3-step eye detection algorithm for memory controller," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 65, no. 12, pp. 1894–1898, Dec. 2018.
[24] M. Kossel et al., "DDR4 transmitter with AC-boost equalization and wide-band voltage regulators for thin-oxide protection in 14-nm SOI CMOS technology," in Proc. 43rd IEEE Eur. Solid State Circuits Conf., Leuven, Belgium, Sep. 2017, pp. 115–118.

Sangyoon Lee received the B.S. degree in electrical and electronics engineering from Korea University, Seoul, South Korea, in 2016. He is currently pursuing the Ph.D. degree with Seoul National University, Seoul. His research interests include high-speed and low-power I/O interfaces and memory interfaces.

Jaewook Kim received the B.S. degree in electrical and electronics engineering from Korea University, Seoul, South Korea, in 2015. He is currently pursuing the Ph.D. degree with Seoul National University, Seoul. His research interests include high-speed I/O and memory interfaces.