INTEGRATED CIRCUIT DESIGN FOR HYBRID
OPTOELECTRONIC INTERCONNECTS

by
Kehan Zhu

A dissertation
submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy in Electrical and Computer Engineering
Boise State University

December 2016
DEFENSE COMMITTEE AND FINAL READING APPROVALS

of the dissertation submitted by

Kehan Zhu

Dissertation Title: Integrated Circuit Design for Hybrid Optoelectronic Interconnects

Date of Final Oral Examination: 4 December 2016

The following individuals read and discussed the dissertation submitted by student Kehan Zhu, and they evaluated his presentation and response to questions during the final oral examination. They found that the student passed the final oral examination.

John Chiasson, Ph.D. Co-Chair, Supervisory Committee
Vishal Saxena, Ph.D. Co-Chair, Supervisory Committee
Hao Chen, Ph.D. Member, Supervisory Committee
Wan Kuang, Ph.D. Member, Supervisory Committee
Subhanshu Gupta, Ph.D. External Examiner

The final reading approval of the dissertation was granted by John Chiasson, Ph.D., Co-Chair of the Supervisory Committee, and Vishal Saxena, Ph.D., Co-Chair of the Supervisory Committee. The dissertation was approved by the Graduate College.
Dedicated to My Parents
Acknowledgments

Looking back to when I arrived here in Boise, in 2012, I can remember those years since with gratitude: gratitude to so many here, who helped me. I had graduated with my Master’s Degree in China and had worked for four years before making the decision to come to Boise State University to study for my PhD. I am so thankful to have made it here. I wish I could name everyone who helped me, encouraged me, warmly welcomed me, but there are too many of you wonderful people to name right now. I must mention a few, however.

First, I would like to acknowledge my BSU advisor, Dr. Vishal Saxena, who introduced me to a wonderful research topic. His teaching skills, multi-disciplinary knowledge and unique insight inspired me throughout the process of my doctoral research.

I’d also like to thank the ECE department at BSU for funding my PhD program and MOSIS educational program for providing the two electrical chip tapeouts. Thanks to Sakkarapani Balagopal for assisting me during my first chip tape-out in November of 2013. Thanks to Virginia Molina for helping with the grating coupler alignment. And thanks to Dr. John Chiasson, Dr. Hao Chen and Dr. Wan Kuang, on my supervisory committee, for their support and encouragement during my PhD study.

I’m especially thankful to Hewlett-Packard Labs for providing the unique opportunity to work as a research associate in their Palo Alto facilities from May 2015 through May 2016. That was a great experience, contributing to a leading-edge project that...
might contribute to HP’s future. Thanks, Cheng Li, for your mentoring and Marco Fiorentino for your support there. Thanks, also, to all the members of the team; I learned a great deal from you. Nan Qi and Kunzhi Yu, thank you for sharing your knowledge of PCB design and chip testing.

I would also like to thank Ran Ding and Zhe Xuan. They were so helpful in discussions at the early stage of the MZM Verilog-A model development.

Thanks also to Don Dutcher and Ann Dutcher, residents of Boise. They have been so nice to me and willing to help me with anything during my stay in Boise. They are just like my parents in America.

Last, and definitely not least, I gratefully thank my parents back in Xiangtan City, Hunan Province, China, for their unconditional support.
Abstract

This dissertation focuses on high-speed circuit design for the integration of hybrid optoelectronic interconnects. It bridges the gap between electronic circuit design and optical device design by seamlessly incorporating compact Verilog-A models for optical components into a SPICE-like simulation environment, such as the Cadence design tools.

Optical components fabricated in the IME 130 nm SOI CMOS process are characterized, and corresponding compact Verilog-A models for the Mach-Zehnder modulator (MZM) device are developed. With this approach, electro-optical co-design and hybrid simulation are made possible.

The developed optical models are used for analyzing the system-level specifications of an MZM-based optoelectronic transceiver link. Link power budgets for NRZ, PAM-4 and PAM-8 signaling are simulated at the system level. The optimal transmitter extinction ratio (ER) is derived from the minimum optical modulation amplitude (OMA) required by the receiver.

A limiting receiver is fabricated in the IBM 130 nm CMOS process. By wire-bonding it side by side to a commercial high-speed InGaAs/InP PIN photodiode, we demonstrate that the hybrid optoelectronic limiting receiver achieves a bit error rate (BER) of $10^{-12}$ with a sensitivity of -6.7 dBm at 4 Gb/s.

A full-rate, 4-channel $2^9 - 1$ length parallel PRBS is fabricated in the IBM 130 nm SiGe BiCMOS process. Together with a 10 GHz phase-locked loop (PLL) designed from system architecture down to the transistor level, the PRBS is demonstrated operating at more than 10 Gb/s. Lessons learned from high-speed PCB design, in particular on signal integrity issues in the PCB transmission lines, are summarized.
# TABLE OF CONTENTS

ACKNOWLEDGMENTS ......................................................... v

ABSTRACT ................................................................. vii

LIST OF TABLES ......................................................... xiii

LIST OF FIGURES ......................................................... xiv

LIST OF ABBREVIATIONS ........................................... xxii

1 Introduction ......................................................... 1
   1.1 Motivation ....................................................... 3
   1.2 Contributions ................................................... 4
   1.3 Dissertation Organization ..................................... 4

2 MZM Device Characterization and Behavioral Modeling ........ 6
   2.1 MZM Device Mechanism ......................................... 7
   2.2 Modeling for MZM ............................................... 12
      2.2.1 Grating Coupler ........................................... 12
      2.2.2 Silicon Waveguide ......................................... 15
      2.2.3 High-Speed Phase Modulator ............................. 15
      2.2.4 Low-Speed Phase Modulator .............................. 19
         2.2.4.1 Thermal Phase Modulator ............................ 20
         2.2.4.2 PIN Phase Modulator ............................... 21
   2.3 EO Co-Design Consideration ................................... 21
      2.3.1 Current-Mode Drive ...................................... 22
      2.3.2 Voltage-Mode Drive ...................................... 27
      2.3.3 Velocity Mismatch ....................................... 27
   2.4 MZM Measurement and Behavioral Simulation .................... 31
   2.5 Summary ...................................................... 34

3 A Reconfigurable MZM Based Optical Link Budget Analysis ......... 35
   3.1 Derive OMA for Receiver ...................................... 36
   3.2 Determine ER for Transmitter ................................. 38
   3.3 Correlation between Transmitter and Receiver ................. 45
   3.4 Reconfigurable MZM Transmitter Simulation .................... 47
   3.5 Summary ...................................................... 48

4 A Hybrid Optoelectronic Limiting Receiver ....................... 50
   4.1 Receiver Architecture ........................................ 51
      4.1.1 Photodiode and Trans-impedance Amplifier ................ 52
         4.1.1.1 Gain and Bandwidth ................................. 54
         4.1.1.2 Noise and Sensitivity .............................. 56
      4.1.2 Limiting Amplifier ...................................... 59
         4.1.2.1 Gain Stages with Active Feedback ................... 60
         4.1.2.2 DC Offset Compensation ............................. 62
         4.1.2.3 Large-Signal in Limiting Region .................... 63
      4.1.3 Output Buffer ........................................... 65
   4.2 Experimental Results ......................................... 68
      6.3.5 Creation of PAM Signaling ............................... 114
   6.4 Experimental Results ......................................... 115
      6.4.1 Packaging and Socket .................................... 116
      6.4.2 PCB Engineering ......................................... 117
   6.5 Summary ...................................................... 124

7 Conclusion ................................................................. 125

REFERENCES ............................................................... 128

APPENDIX A Verilog-A to Enable Optical Simulation ............ 135
APPENDIX B Determine the PRBS Feedback Tap .................... 139
APPENDIX C First Author Publications during 2013-2016 ....... 141
LIST OF TABLES

2.1 RLGC parameters at 10 GHz ......................................................... 18
2.2 Parameter description used in Equation 2.9 and Equation 2.10, corresponding values used for curve fitting in Figure 2.10 are listed. ........ 19

4.1 TIA design parameters ............................................................... 54
4.2 Comparison of the optoelectronic RX fabricated in 130 nm (SOI) CMOS process ................................................................. 74

5.1 Loop filter parameters when $f_{VCO} = 11$ GHz, $N = 128$, $b = 25.57$, $c = 9$, $PM = 65^\circ$ .......................................................... 89
5.2 Simulated opppcres resistor and MIM capacitor characteristics at three corners in IBM8HP process ........................................... 89
5.3 Noise transfer functions from PLL o/p to each noise source ........... 93

6.1 Comparison with recent PRBS generators published in JSSC ........... 103
6.2 QFN package electrical parasitic provided by the vendor ................. 116
6.3 Design parameters of single-ended and differential transmission lines made of RO4350B Rogers material .................................. 120
6.4 Transient measurement for RF cable and three different transmission lines with 500 $mV_{pp}$ PRBS-7 pattern at 10 and 15 Gb/s, respectively. 121
LIST OF FIGURES

1.1 Simulated $f_T$ versus current density for four generations of IBM processes. The width for HBT and MOS are $5 \mu m$ and $15 \mu m$, respectively. Minimum length is used. .................................................. 2

2.1 Illustration of a MZM device, HSPM cross-section, not drawn to scale. . 9

2.2 Optical power transmission characteristic of the MZM as a function of the phase difference with and without considering the insertion loss introduced by the optical components. ......................... 10

2.3 Layout of a PAM4 MZM device. ........................................ 11

2.4 Measured transmission spectrum characteristic of the MZM device shown in Figure 2.3. .................................................. 11

2.5 Fiber array alignment to grating couplers on the silicon photonic die fabricated as a part of this research. .............................. 13

2.6 Loss profiles of IME TE 1550 nm grating coupler tested with $22^\circ$ polished angle fiber array.................................................. 14

2.7 Decompose HSPM model into electronic model and photonic model. . 17

2.8 The object properties of a Verilog-A HSPM cell. ......................... 17

2.9 Concatenate 1000 HSPM cells in Cadence schematic. .................... 18

2.10 Simulated and measured results comparison of (a) the change of relative phase shift and (b) pn junction capacitance as a function of the reverse-biased voltage on a 5 mm long HSPM. ....................... 19
2.11 Cross section of (a) thermal phase modulator and (b) p-i-n phase modulator (not drawn to scale). ........................................ 20
2.12 DC characteristic of a 200 µm long thermal phase modulator........... 21
2.13 Flowchart for the MZM-based transmitter design. ......................... 22
2.14 Schematic of the NRZ TX circuit to drive the MZM......................... 24
2.15 (a) CML dc transfer characteristic. (b) $V_{tail}$-input characteristic. . 25
2.16 A segmented MZM device consists of fourteen lumped HSPM elements and PIN PM as dc phase device. ................................. 28
2.17 Illustration of the segmented MZM with lumped HSPM devices. Each segment has a dedicated push-pull driver. ......................... 29
2.18 Velocity mismatch simulation of NRZ and PAM-4 signaling at 32 G symbol rate (9.8 ps optical delay per segment). ......................... 30
2.19 Deterministic jitter of the ideal PAM-4 signal. .............................. 31
2.20 Eye pattern at 20 Gb/s with a 1 Vpp differential drive for a 5 mm MZM using a 1555 nm wavelength. (a) Compact model simulation result. (b) Measured result in [6]. ................................. 32
2.21 Prototype of a MZM device wire-bonded on PCB. Fiber array is aligned on top of the chip. ............................................ 33
2.22 Eye pattern at 12.5 Gb/s with a 1.8 Vpp differential drive for a 1.1 mm segment MZM in Figure 2.3. (a) Compact model simulation result. (b) Measured result. ........................................... 33

3.1 A conceptual MZM based hybrid silicon photonic link block diagram. 37
3.2 MZM based optical link specification parameters. ............................... 37
3.3 Inverter based optical front-end receiver. ............................................ 38
3.4 PAM-2/4/8 receiver sensitivity based on a 32 Gb/s TIA in 16nm FinFET CMOS process ($i_{n,rms} = 4 \mu A, V_{th} = 20 mV, Z_{TIA} = 58 dB\Omega, \rho = 0.8 A/W$). .................................................. 39

3.5 Latch-based level shifter with example of potential reliability issue at the start of the signal toggling. Voltage at node 3 may jump to $2*VDDH$ or $-VDDL$ so that will stress the gates of INV1 and INV2. . 40

3.6 A block diagram of the MZM segment driver. ................................. 41

3.7 Illustration of the reconfigurable MZM transmitter using segmented serpentine layout style for the proposed driver shown in Figure 3.6. . . 42

3.8 An example of the OE buffer from TSMC digital standard cell library. . 43

3.9 Optical power transfer function of the MZM in a 130nm SOI CMOS process with different lengths (an effective phase shift of $7.58^\circ/mm$ is extracted when operating at 32 Gb/s). . . . . . 44

3.10 MZM ER and IL versus length. .................................................. 45

3.11 $OMAPD$ versus ER with varying input laser power. ................. 46

3.12 The overload current seen from the TIA versus ER. ...................... 47

3.13 Simulated NRZ modulation format eye diagrams with 5 segments and 9 segments, respectively, operating at 32 Gb/s. ......................... 48

3.14 Simulated PAM4 modulation format eye diagrams with 5+9, 4+7, 3+5 segment combinations operating at 64 Gb/s. ......................... 49

4.1 The architecture of the proposed hybrid optoelectronic receiver. ..... 52

4.2 Limiting receiver design specification partition. ............................. 53

4.3 The schematic of the TIA with photodiode; triple-well NMOS is used for TIA core devices. ................................. 54
4.4 Post layout simulation results for TIA frequency response and group delay at nominal process corner, 40°C. ........................................... 56
4.5 Histogram plot from nominal process Monte Carlo simulation of a 1.2 V nfet biased at maximum $f_T$, 40°C, in IBM 130 nm CMOS process. . 57
4.6 Post layout simulation eye diagram of TIA output at 4 Gb/s. . . . . . . . . 58
4.7 TIA input referred current noise spectrum from post layout simulation. 59
4.8 Bandwidth improvement factor and stage dc gain versus the number of stages for an achievable gain-bandwidth product of 20 GHz ($20 \times 1$ GHz). ................................................................. 61
4.9 Topology of the limiting amplifier. ............................................ 62
4.10 Illustration of dc offset compensation. ..................................... 64
4.11 Schematic of the folded cascode gain-boosted opamp used in the dc offset compensation feedback loop. ...................................... 65
4.12 A 2.5 GHz 150 mV$_{PP}$ sinusoidal signal gets amplified and gets more NRZ-like waveform with edges sharpened, as it travels along the LA chain. ................................................................. 66
4.13 The schematic of the level shifted output buffer. ......................... 67
4.14 Post layout simulated eye diagram of the signal coming out from the output buffer at 4 Gb/s. .................................................. 68
4.15 (a) Chip microphotograph. (b) Chip-on-board bonding to the PCB.
    (c) PCB setup. (d) Test setup in the Lab. .................................. 70
4.16 Block diagram of the test setup used for measuring the eye diagram and BER of the designed receiver. .................................. 71
4.17 Optical eye diagrams of the MZM output measured with PRBS-31 at 4 Gb/s with (a) 2 dBm and (c) 6 dBm laser power, at 5 Gb/s with (b) 2.8 dBm and (d) 6 dBm laser power, wavelength is set at 1550 nm.

4.18 (a)-(c) Electrical eye diagrams of a single-ended output of the limiting receiver measured with PRBS-31 from 4 Gb/s to 5 Gb/s. (d) Oscilloscope mode at 5 Gb/s with PRBS-7 pattern.

4.19 Power consumption breakdown of the limiting receiver.

4.20 BER bathtub measurement of the limiting receiver with PRBS-31 operating from 4 Gb/s to 5 Gb/s.

4.21 Sensitivity plot of the limiting receiver with PRBS-31 operating from 4 Gb/s to 5 Gb/s, BER versus MZM OMA and average power.

5.1 Schematic of the proposed type-II third-order PLL architecture.

5.2 Schematic of the LC VCO. Control bits $C < 2 : 0 >$ control the capacitor bank for discrete frequency coarse tuning, $C0=66$ fF.

5.3 Two different VCO layout examples.

5.4 Layout extracted simulation results of VCO characteristics.

5.5 PLL model with possible noise sources.

5.6 Plot of unity loop bandwidth over zero versus capacitor ratio in the loop filter for 65° phase margin.

5.7 Schematic of the linear phase frequency detector.

5.8 State diagram of the linear phase frequency detector.

5.9 Schematic of charge pump.

5.10 Simulated MOS capacitor characteristics with varying gate voltages at three corners.
5.11 Schematic of tunable resistor............................................................................ 90
5.12 Schematic of CML divider-by-2......................................................................... 91
5.13 Schematic of TSPC divider-by-2......................................................................... 92
5.14 Plots of PLL noise transfer function for each noise source......................... 94
5.15 Phase noise of each noise source introduced into the PLL......................... 95
5.16 PLL output noise due to individual noise sources........................................ 96
5.17 A micro picture of the PLL................................................................................ 97
5.18 Signal generator, signal analyzer and the prototype PCB............................ 98
5.19 VCO frequency versus capacitor control bits (66 $fF$ incremental) when
the PLL is in the lock state..................................................................................... 99
5.20 PSD of (a) the reference clock at 83 $MHz$ and (b) a PLL/64 output
signal at 166 $MHz$. ............................................................................................... 99
5.21 Phase noise of (a) the reference clock and (b) the PLL/64 signal............. 100
5.22 Comparison of the measured and the simulated phase noises for PLL
and free-running VCO.............................................................................................. 101

6.1 System block diagram of the MZM based PAM-4 transmitter using the
4-channel parallel PRBS......................................................................................... 103
6.2 An $n$-stage PRBS generator with possible $n$ and $k$ combinations (adapted
from [74]).............................................................................................................. 104
6.3 Single-ended version block diagram of the full-rate 4-channel $2^9 - 1$
parallel PRBS........................................................................................................... 107
6.4 Auto-correlation and cross-correlation of the 4-channel PRBS genera-
tor (signal amplitude rescaled to -1 and 1). ......................................................... 108
6.5 Output characteristic comparison of self biased wide swing NMOS cascode current mirror and BJT current mirror with beta helper and emitter degeneration. ........................................... 110
6.6 Schematic of the BJT DFF employed in Figure 6.3. ............... 111
6.7 Schematic of the XOR-merged DFF employed in Figure 6.3. ........ 111
6.8 Schematics of (left) clock buffer employed in Figure 6.3 and (right) bias condition for DFF clock inputs. ................................. 113
6.9 Schematic of output buffer. .............................................. 113
6.10 PRBS start-up process by enabling “set” signal. ................... 114
6.11 Simulated eye diagrams of data pattern at 40 G Baud rate for NRZ, PAM-4/8/16. ......................................................... 115
6.12 Micro photograph of half of the fabricated die. ..................... 116
6.13 Picture of the prototype FR4 PCB. ..................................... 117
6.14 Eye diagrams of PRBS output with the prototype FR4 PCB. ........ 118
6.15 Block diagram of the FFE to process the signal. .................... 118
6.16 Transmission line sample board made of RO4350B material. Total length of the transmission line including SMA footprints is 1.368 inch. 119
6.17 Cross-section of single-ended microstrip, GCPW and differential GCPW transmission lines. ........................................... 120
6.18 Measured S-parameters of the sample transmission lines in Figure 6.16. 121
6.19 TDR Simulated characteristic impedance with 25 ps rise time for a 10 MHz to 20 GHz GCPW measured S-parameters ............... 122
6.20 Picture of the prototype Rogers PCB. ................................. 123
6.21 Eye diagrams of PRBS output with prototype Rogers PCB. ........ 123
6.22 System integration of electrical die and photonics die via side-by-side wire bonding (Bond wires drawn not to scale).
LIST OF ABBREVIATIONS

- ER  Extinction Ratio
- FEC  Forward Error Correction
- LA  Limiting Amplifier
- LFSR  Linear Feedback Shift Register
- MRM  Micro Ring Modulator
- MZM  Mach-Zehnder Modulator
- NRZ  Non-Return-to-Zero
- OMA  Optical Modulation Amplitude
- PAM  Pulse Amplitude Modulation
- PLL  Phase-Locked Loop
- PPG  Pulse Pattern Generator
- PRBS  Pseudo-Random Bit Sequence
- PVT  Process, Voltage and Temperature
- TIA  Trans-Impedance Amplifier
CHAPTER 1

INTRODUCTION

High-speed, low-power, small-form-factor interconnects are increasingly in demand in today’s large-scale computing and switching systems. For example, contemporary data centers can use 4-lane 25 Gb/s optical transceivers to achieve 100 Gb/s data transfer over different distances, depending on the cable options. Ethernet and optical transport network (OTN) protocols have put 400G physical-layer technology development on the agenda in several emerging industry standards, such as IEEE 802.3bs [1], ITU-T G.709 [2], MSA [3] and OIF [4], as an upgrade from the 100G standard. Advanced modulation schemes such as 4-level pulse amplitude modulation (PAM-4) are adopted in the newer standards to achieve higher throughput in next-generation designs.
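As a small worked illustration of the lane arithmetic above, the sketch below computes the raw aggregate bit rate from the lane count, the symbol rate and the number of amplitude levels. The function name and the PAM-4 case at 25 GBd are illustrative assumptions, not figures from any of the cited standards, and coding or FEC overhead is ignored.

```python
import math


def aggregate_rate_gbps(lanes: int, baud_gbd: float, levels: int) -> float:
    """Raw aggregate bit rate in Gb/s (hypothetical helper, no overhead).

    lanes    : number of parallel lanes
    baud_gbd : symbol rate per lane in GBd
    levels   : amplitude levels per symbol (2 for NRZ, 4 for PAM-4)
    """
    bits_per_symbol = math.log2(levels)
    return lanes * baud_gbd * bits_per_symbol


if __name__ == "__main__":
    # 4 lanes of 25 Gb/s NRZ, as in the data-center example in the text.
    print(aggregate_rate_gbps(4, 25.0, 2))  # 100.0
    # Assumed case: same lanes and symbol rate, but PAM-4 (2 bits/symbol).
    print(aggregate_rate_gbps(4, 25.0, 4))  # 200.0
```

The point of the sketch is only that PAM-4 doubles the bits carried per symbol, so throughput can rise without raising the symbol rate of each lane.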

Electrical I/Os have reached a bottleneck in efficiently raising per-lane data rates beyond 28 Gb/s. This is true even for ultra-short-reach links, due to the speed limitations of the switching devices and the complex circuits and systems required to compensate for electrical channel loss and dispersion. Taking the actual bias condition and parasitics into consideration, the maximum achievable operating frequency of a reliable system is usually no more than $1/20$ to $1/10$ of the device’s cutoff frequency $f_T$. Figure 1.1 compares the $f_T$ versus current density of a single device for four generations of IBM processes. As demonstrated in the plot, it is challenging to achieve > 50 Gb/s/lane data transfer even using the IBM 130 nm SiGe BiCMOS process.
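The rule of thumb above can be turned into a quick estimate. The following is a minimal sketch, assuming a purely illustrative $f_T$ of 200 GHz; this value and the function name are hypothetical and are not read from Figure 1.1.

```python
from typing import Tuple


def usable_frequency_range_ghz(f_t_ghz: float) -> Tuple[float, float]:
    """Return (low, high) estimates of reliable operating frequency in GHz.

    Applies the 1/20 to 1/10 of f_T rule of thumb quoted in the text.
    """
    return f_t_ghz / 20.0, f_t_ghz / 10.0


if __name__ == "__main__":
    # Hypothetical device cutoff frequency, chosen only for illustration.
    low, high = usable_frequency_range_ghz(200.0)
    print(f"{low:.0f} GHz to {high:.0f} GHz")  # 10 GHz to 20 GHz
```

Under this assumption the usable range is far below the cutoff frequency itself, which is why per-lane rates above 50 Gb/s are hard to reach with the devices compared in the figure.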

![Figure 1.1: Simulated $f_T$ versus current density for four generations of IBM processes. The width for HBT and MOS are 5 $\mu$m and 15 $\mu$m, respectively. Minimum length is used.](image)

Recently, silicon-based photonic integration together with advanced modulation formats has emerged as a promising solution to meet these I/O design requirements. It has opened wider opportunities for circuit designers to exploit photonic devices for high-speed signal processing, and has also posed challenges in electrical-optical co-design, packaging and testing. For a given lithographic generation, e.g. a monolithic photonic integrated circuit (PIC) transceiver in a 130 nm SOI CMOS process, the achievable speed of the system is limited by the CMOS circuitry to about 10 Gb/s/lane [5], whereas a Mach-Zehnder modulator (MZM) fabricated in a silicon photonic platform, also in 130 nm SOI CMOS, can achieve up to 40 Gb/s when driven by external test equipment [6]. Comparable high-speed electrical devices (MOS or HBT) are not available for integration on the same die in commercial silicon photonic process platforms. Designers therefore have to rely on an advanced CMOS or BiCMOS process to design faster electrical circuitry and then interface it with the photonic chip. Packaging challenges, including the bonding options between the electrical die and the silicon photonic die, are critical to the overall signal integrity, as is optical packaging for high-volume production.

1.1 Motivation

Designing CMOS or BiCMOS circuits to interface with silicon photonic devices requires a high level of optoelectronic integration. In the past, silicon photonic devices were typically designed by engineers trained in optics, using specialized device simulators and field solvers that are not compatible with the traditional IC design flow. There are commercially available tools for photonic device and circuit simulation, such as the Lumerical computational solutions [7]. Recently, Lumerical demonstrated an optical system-level design tool called Lumerical Interconnect [7]. However, such a tool is specific to photonic integrated circuit (PIC) simulation and cannot be employed for hybrid optoelectronic system simulation, since a SPICE-like solver is required for transistor-level circuit simulation. Thus, IC designers who design circuits for integration with silicon photonics need compact photonic device models that encapsulate both electrical and optical properties for Electrical-Optical (EO) hybrid circuit simulation. Verilog-A, a hardware description language developed for behavioral modeling of analog circuits, is a good candidate for addressing this essential need [8]. There has been a general lack of such compact models for integrating silicon photonic devices with CMOS electronics, and this lack hinders EO co-design simulation and link budget analysis.

1.2 Contributions

This dissertation focuses on the design, analysis and hardware implementation of high-speed integrated circuits for optical interconnects. Specifically it addresses:

- Bridging the gap between electrical circuit design and optical device design by creating compact optical device models in Verilog-A.

- A systematic optical link power budget analysis that optimizes the system-level specifications for NRZ and PAM signaling formats. This analysis is used to guide the circuit-level design and energy-efficient optical system development.

- The design of high-speed circuit blocks for optical receivers and transmitters.

- High-frequency PCB design for maintaining signal integrity.

1.3 Dissertation Organization

The rest of the dissertation covers MZM device design, characterization and behavioral modeling; system-level optical link budget analysis; and two electrical chips, one for the receiver and one for the transmitter. Chapter 2 characterizes the optical components fabricated in a 130 nm SOI CMOS process platform and presents their compact behavioral models. Chapter 3 presents an MZM-based link power budget analysis and proposes an NRZ/PAM-4 reconfigurable optical transmitter based on voltage-mode drivers with a segmented MZM device. Chapter 4 showcases a hybrid optoelectronic limiting receiver designed in the IBM 130 nm CMOS process with an InGaAs/InP PIN photodiode. Chapter 5 and Chapter 6 detail a high-speed type-II third-order charge-pump PLL design and a full-rate, 4-channel $2^9 - 1$ length parallel PRBS design, respectively. The PRBS is clocked by the PLL designed on the same chip in the IBM 130 nm SiGe BiCMOS process. Chapter 7 discusses future directions for this research and concludes the dissertation.
CHAPTER 2

MZM DEVICE CHARACTERIZATION AND
BEHAVIORAL MODELING

Optical devices such as lasers, modulators and detectors are typically designed by engineers trained in optics, using specialized device simulators and field solvers that are not compatible with the traditional IC design flow. There are commercially available tools for optical device and circuit simulation, such as Lumerical computational solutions [7], PhoeniX software [9] and COMSOL [10]. Recent progress has been made with optical system-level design tools such as Lumerical’s Interconnect. However, such tools are specific to photonic integrated circuit (PIC) simulation and cannot be employed for hybrid optoelectronic system simulation, where a SPICE-like solver is required for transistor-level circuit simulation. The photonic process design kits (PDKs) provided by contemporary commercially accessible silicon photonic foundry services, such as IME in Singapore [11] and ePIXfab from Europractice [12], also do not provide models for co-simulation with electronic process platforms like standard CMOS technologies. Thus, IC designers who design circuits for integration with optical devices need compact device models that encapsulate both electrical and optical properties for Electrical-Optical (EO) hybrid circuit simulation. Verilog-A, a hardware description language developed for behavioral modeling of analog circuits, is a good candidate for addressing this essential need [8].
This chapter focuses on modeling one type of electro-optic modulator, the Mach-Zehnder modulator (MZM) [6]. The MZM is by far the most reliable indirect optical modulator in silicon photonic platforms, although its footprint is large compared to that of the micro-ring modulator (MRM) [13], so it requires relatively more power for the modulator drivers. The working mechanism of the MZM device is explained first, followed by the development of the Verilog-A model for behavioral simulation. Driving options are then discussed for lumped-element and traveling-wave modulators, and the performance of NRZ and PAM-4 signaling is analyzed.

2.1 MZM Device Mechanism

A Mach-Zehnder modulator device consists of optical components including a high-speed phase modulator (HSPM), a thermal phase modulator (TPM) or p-i-n phase modulator (PIN PM), and grating couplers. As illustrated in the cross-section in Figure 2.1, the HSPM is a lateral p-n diode ridge waveguide and is the key optical component that determines the operating speed of the MZM device. The doped ridge waveguide consists of a lightly doped p-n junction and heavily doped p++ and n++ implants for the contacts [6, 14]. Intermediate-density p+ and n+ regions can be added in between to reduce series resistance without inducing excessive optical loss [15]. Dynamic optical phase modulation is induced by applying a reverse bias voltage to the p-n diode, which changes the refractive index in the depletion region. A DC optical phase shift can be introduced by an asymmetric waveguide, a TPM or a PIN PM. The TPM relies on resistive heating to induce a phase shift along a length of waveguide, while the PIN PM uses carrier injection to change the refractive index. The grating couplers serve as optical I/Os and need to be placed some distance away from the core device so that the fiber alignment does not collide with the bond wires or the chip package.

The continuous-wave light source is split evenly into the two HSPM arms. The electric field set up by the reverse bias voltage applied to each HSPM arm changes the carrier density, which in turn induces a phase shift as the optical wave propagates along the arm. Depending on the relative E-field polarity applied to the HSPM arms, the light from the two paths interferes either constructively or destructively when it recombines at the output, so the phase modulation is converted into intensity modulation at the combiner. Neglecting the insertion loss of all optical components, the optical power transfer function ($T_{opt}$) of the MZM can be derived as shown in Equation 2.1 [14, 16], where $P_{in}$ and $P_{out}$ are the input laser power and the MZM output power, respectively, and $\phi_1$ and $\phi_2$ are the absolute phases of the two arms (HSPM + PIN PM). In reality, however, every optical component introduces insertion loss, and for accurate modeling all of these non-ideal effects need to be included in the model. A more accurate $T_{opt}$ expression is given in Equation 2.2 [17]. In this model, $k$ is the mismatch factor between the two arms, which will deviate from 0.5 in practice. The optical powers in the two branches before entering the combiner are denoted $P_1$ and $P_2$ and are given in dB scale by Equations 2.3 and 2.4, respectively. The two most significant insertion losses are introduced by the grating coupler ($IL_{GC}$) and the HSPM ($IL_{HSPM}$). Losses introduced by other optical elements such as the Y-junctions ($IL_{Y-junc.}$), silicon waveguides ($IL_{WG}$) and the low-speed phase modulator also need to be included in the MZM model. Moreover, the insertion loss of the phase modulators can be partitioned into static and dynamic parts.
The $T_{opt}$ curves derived from the models with and without insertion loss are plotted versus the phase difference in Figure 2.2. The two models yield significantly different extinction ratios (ER) and average powers, which affects the optical link analysis. Further, the dc phase operating point $\phi_{dc}$ should be set at the quadrature bias point ($90^\circ$) to achieve symmetric modulation. This can be achieved by using either an extra length (about 100 $\mu$m) of waveguide or a PIN phase modulator in one of the arms. To reduce power consumption, the asymmetric-length waveguide and the PIN phase modulator can be used together.

$$T_{opt} = \left| \frac{P_{out}}{P_{in}} \right|^2 = \frac{1 + \cos (\phi_1 - \phi_2)}{2}$$ (2.1)

$$T_{opt} = \frac{P_1 k + P_2 (1 - k) + 2 \sqrt{P_1 P_2 k (1 - k)} \cos (\phi_1 - \phi_2)}{P_{in} 10^{\frac{IL_{junc} + IL_{WG} + IL_{GC}}{10}}}$$ (2.2)

$$P_{1|dBm} = P_{in|dBm} - IL_{GC} - IL_{WG} + 10 \log_{10} k - IL_{HSPM} - IL_{LSPM}$$ (2.3)

$$P_{2|dBm} = P_{in|dBm} - IL_{GC} - IL_{WG} + 10 \log_{10} (1 - k) - IL_{HSPM} - IL_{LSPM}$$ (2.4)

**Figure 2.2:** Optical power transmission characteristic of the MZM as a function of the phase difference with and without considering the insertion loss introduced by the optical components.
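
As a quick numerical check of Equation 2.1, the lossless transfer function can be evaluated around the quadrature bias point. The Python sketch below is illustrative only: the function names and the ±30° phase swing are assumptions, not values taken from the measured device.

```python
import math

def t_opt_ideal(phi1, phi2):
    """Lossless MZM power transfer of Equation 2.1: (1 + cos(phi1 - phi2)) / 2."""
    return (1.0 + math.cos(phi1 - phi2)) / 2.0

def push_pull_levels(phi_dc, dphi):
    """Transmission of the two NRZ symbols when the arm phase difference
    swings by +/- dphi around the dc bias phi_dc (all in radians).
    Returns (low, high) for 0 < phi_dc < pi."""
    low = t_opt_ideal(phi_dc + dphi, 0.0)
    high = t_opt_ideal(phi_dc - dphi, 0.0)
    return low, high

# Quadrature bias (90 degrees) with an assumed +/-30 degree phase swing.
low, high = push_pull_levels(math.pi / 2, math.radians(30))
average = (low + high) / 2            # stays at 0.5 for quadrature bias
er_db = 10 * math.log10(high / low)   # extinction ratio in dB
print(low, high, average, er_db)
```

With these assumed numbers the two symbol levels sit symmetrically around 0.5, which is the reason for biasing at quadrature. Adding the insertion losses of Equation 2.2 lowers the average power and changes the ER, as Figure 2.2 indicates.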

The MZM figure of merit, \( V_{\pi} L_{\pi} \), in units of \( V\cdot cm \), is defined as the product of the driver voltage amplitude and the MZM length required to produce a phase shift of \( \pi \). It is expressed in Equation 2.5, in which \( \Delta \phi_{MZM} \) is the actual phase shift produced by a driver voltage \( V_{drv} \) applied to an MZM device of length \( L_{MZM} \).

\[ V_{\pi} L_{\pi} = \frac{\pi}{\Delta \phi_{MZM}} V_{drv} L_{MZM} \quad (2.5) \]
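
Equation 2.5 can be applied directly to a measured phase shift. The short Python sketch below uses hypothetical numbers (a 1 V drive on a 0.5 cm arm producing 45° of phase shift) purely to illustrate the arithmetic; the function names are not part of the model library.

```python
import math

def v_pi_l_pi(delta_phi_rad, v_drv, l_mzm_cm):
    """Equation 2.5: figure of merit in V*cm from a measured phase shift."""
    return math.pi / delta_phi_rad * v_drv * l_mzm_cm

def length_for_pi_shift(fom_v_cm, v_drv):
    """Arm length (cm) needed for a pi phase shift at a given drive voltage,
    assuming the phase shift scales linearly with voltage and length."""
    return fom_v_cm / v_drv

# Hypothetical measurement: 45 degrees of phase shift from 1 V on a 0.5 cm arm.
fom = v_pi_l_pi(math.radians(45), 1.0, 0.5)   # V*cm
l_pi = length_for_pi_shift(fom, 1.2)          # cm, for a 1.2 V drive
print(fom, l_pi)
```

The linear scaling in `length_for_pi_shift` is a simplification; the measured phase shift of a depletion-mode HSPM is a nonlinear function of the reverse bias.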

Figure 2.3 shows the layout of the PAM-4 traveling-wave MZM device. A 200 \( \mu m \) long thermal phase modulator is used in this case, which requires about 8-10 mA of current for the quadrature phase bias (90°). Ideally, the transmission spectrum of a symmetric MZM without any added dc phase bias should be flat. However, mismatches always exist between the two branches, so the measured transmission spectrum of the symmetric MZM has characteristics similar to those of the asymmetric MZMs in [18, 6, 19, 20], but with less voltage dependence. This is plotted in Figure 2.4. The free spectral range (FSR) is about 6 nm.

Figure 2.3: Layout of a PAM4 MZM device.

Figure 2.4: Measured transmission spectrum characteristic of the MZM device shown in Figure 2.3.
2.2 Modeling for MZM

The Verilog-A language is chosen for model development because it supports co-simulation with transistor-level circuitry in the Cadence design tool platform. However, because Verilog-A does not support complex numbers, the optical power and phase are processed separately. In order to display optical power (OptPower) and optical phase (OptPhase) in units of watts and radians, respectively, their natures must be declared explicitly in a Verilog-A discipline [21]. The optical discipline is given in Appendix A.1. With the optical power and phase disciplines and natures defined, an optical source block can be built as a general cell that converts voltages into optical power and optical phase quantities. The corresponding Verilog-A code is given in Appendix A.2. In a silicon photonic process platform, the MZM model shown in Figure 2.1 can be partitioned into four sub-blocks: the grating coupler, the un-doped waveguide used for optical routing and Y-junctions, the HSPM and the PIN PM. Each optical component is presented in the following subsections.

2.2.1 Grating Coupler

Grating couplers bring the optical signal into or out of the optical chip, typically interfacing externally to a fiber array as shown in Figure 2.5. The other interface of the grating coupler lies in the plane of the wafer and connects through a 500 nm wide rib (fully etched) waveguide to the rest of the optical devices. There are two types of grating couplers: the single-polarization grating coupler (SPGC) and the polarization-splitting grating coupler (PSGC). As a rule of thumb, for an IME TE SPGC aligned with a fiber array that has a 22° lid polish angle, the aligned fiber array sits about 35 \( \mu m \) above the die and the SPGC is about 250 \( \mu m \) behind the front edge of the fiber array.
The loss profile in decibels can be expressed by Equation 2.6 in terms of the peak loss \( \text{Loss}_{\text{peak}} \), the peak wavelength \( \lambda_{\text{peak}} \) and the -3 dB wavelength \( \lambda_{-3dB} \). Figure 2.6 shows four sets of measured SPGC data along with a Verilog-A model simulation result, which matches the measurements well. The peak loss is typically measured at about 4.5-6.5 dB per SPGC, occurring at different wavelengths depending on the fiber array tilt angle and the actual height above the die.

\[
\text{Loss} = -\text{Loss}_{\text{peak}} - \left( \frac{\lambda - \lambda_{\text{peak}}}{\lambda_{-3dB} / (2\sqrt{3})} \right)^2 \quad (2.6)
\]
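
The parabolic loss profile of Equation 2.6 is simple to evaluate numerically. The Python sketch below is only an illustration of the formula, not the Verilog-A model itself; the 5 dB peak loss, 1550 nm peak wavelength and 40 nm -3 dB wavelength are assumed example values.

```python
import math

def gc_loss_db(lam_nm, loss_peak_db, lam_peak_nm, lam_3db_nm):
    """Equation 2.6: grating coupler transmission in dB (negative means loss)."""
    x = (lam_nm - lam_peak_nm) / (lam_3db_nm / (2.0 * math.sqrt(3.0)))
    return -loss_peak_db - x ** 2

# Assumed example parameters (not fitted to the measured data in Figure 2.6).
LOSS_PEAK_DB = 5.0
LAM_PEAK_NM = 1550.0
LAM_3DB_NM = 40.0

at_peak = gc_loss_db(1550.0, LOSS_PEAK_DB, LAM_PEAK_NM, LAM_3DB_NM)
at_edge = gc_loss_db(1570.0, LOSS_PEAK_DB, LAM_PEAK_NM, LAM_3DB_NM)
print(at_peak, at_edge)
```

Evaluating the expression at an offset of \( \lambda_{-3dB}/2 \) from the peak gives exactly 3 dB of additional loss, which shows that the equation treats \( \lambda_{-3dB} \) as the full -3 dB width of the coupling spectrum.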

Figure 2.5: Fiber array alignment to grating couplers on the silicon photonic die fabricated as a part of this research.
Figure 2.6: Loss profiles of IME TE 1550 nm grating coupler tested with 22° polished angle fiber array.
2.2.2 Silicon Waveguide

Si waveguides in SOI technology are made possible by the high refractive index contrast between silicon and silicon dioxide. When used for optical signal routing, the waveguide introduces negligible insertion loss, but it does introduce a phase shift and a propagation delay that cannot be ignored in MZM design. The Verilog-A compact model captures the length- and temperature-dependent phase shift as shown in Equation 2.7, as well as an insertion loss that can be customized for single-mode and multi-mode waveguides, bends of different radii and tapers. In Equation 2.7, $n_{Si}$ is silicon’s effective refractive index and the $floor$ function removes the integer multiples of $2\pi$. The equation also includes the thermo-optic coefficient of the refractive index ($ndT$), which is about $1.86 \times 10^{-4}/^\circ C$ [22]. The propagation delay is the waveguide length divided by the group velocity ($v_g$), as expressed in Equation 2.8, in which $c$ and $n_g$ are the velocity of light in vacuum and the group index of silicon, respectively. For simplicity and without significant loss of accuracy, the refractive index and group index can be approximated as constants for operation over a narrow wavelength range.

\[
\Delta \phi = 2\pi \left( \frac{(n_{Si} - ndT \cdot (T - T_0)) L}{\lambda} - floor \left( \frac{(n_{Si} - ndT \cdot (T - T_0)) L}{\lambda} \right) \right) \quad (2.7)
\]

\[
t_d = \frac{L}{v_g} = \frac{Ln_g}{c} \quad (2.8)
\]
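
Equations 2.7 and 2.8 can be sketched in a few lines of Python. This is not the Verilog-A implementation from the appendix; the function names, the effective index of 2.5 and the group index of 4 (the value implied by the optical group velocity quoted in Section 2.2.3) are assumptions made for illustration. The temperature term keeps the sign convention of Equation 2.7 as written.

```python
import math

C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s

def wg_phase(length_m, lam_m, n_si, ndt, temp_c, t0_c):
    """Equation 2.7: phase shift wrapped into [0, 2*pi) with floor()."""
    cycles = (n_si - ndt * (temp_c - t0_c)) * length_m / lam_m
    return 2.0 * math.pi * (cycles - math.floor(cycles))

def wg_delay(length_m, n_g):
    """Equation 2.8: propagation delay t_d = L * n_g / c."""
    return length_m * n_g / C_VACUUM

# Assumed example: 100 um of routing waveguide at 1550 nm, 20 degC above T0.
phase = wg_phase(100e-6, 1.55e-6, 2.5, 1.86e-4, 45.0, 25.0)
t_d = wg_delay(1e-3, 4.0)  # delay of a 1 mm waveguide, in seconds
print(phase, t_d)
```

A 1 mm waveguide with a group index of 4 gives a delay of roughly 13 ps, which is the quantity the electrode delay has to match in the traveling-wave design.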

2.2.3 High-Speed Phase Modulator

The HSPM model is decomposed into an electronic model and an optical model, as shown in Figure 2.7. The electronic model can be treated either as the S-parameters of a short section of transmission line or as a distributed RLGC network in which the electrical propagation delay is designed to match the optical propagation delay. The latter is used here, in which $R_{tl}$ is the frequency-dependent metal skin resistance. It changes from 2.5 $\Omega$/mm at very low frequencies up to 10 $\Omega$/mm at 40 GHz. However, it is kept constant in the model (e.g., 5 $\Omega$/mm at 10 GHz) for simplicity, without significant loss of accuracy. Moreover, the value shown in the component description format (CDF) parameter can be overwritten for other frequencies by editing the object properties of the symbol in Cadence, as shown in Figure 2.8. $L_{tl}$ and $C_{tl}$ are the inductance and capacitance between the metal traces, respectively, which are also frequency-dependent [15]. The $p$-$n$ diode can be modeled as a parasitic resistance $R_{pn}$ and a depletion capacitance $C_{pn}$. $C_{pn}$ varies with the applied reverse-bias voltage as shown in Figure 2.10 [6]. The RLGC parameters used for modeling at 10 GHz are listed in Table 2.1. Rough estimates of the microwave velocity ($\frac{1}{\sqrt{L_{tl}(C_{tl}+C_{pn})}}$) and the optical group velocity ($\frac{3\times10^8 m/s}{n_g}$) are approximately $8.5 \times 10^7 m/s$ and $7.5 \times 10^7 m/s$, respectively, so the two velocities are roughly matched. The effect of velocity mismatch is analyzed in detail in Section 2.3.3. The distributed elements can be concatenated to form the MZM arm: the output phase of each stage is the voltage-dependent phase change induced by that stage added to the phase received from the preceding stage. The segment length of each cell should be set to $< \frac{1}{10} \frac{3\times10^8 m/s}{n_g\times5\times f_{Nyquist}}$; e.g., for operation at 10 Gb/s, the segment length of the cell should be less than 300 $\mu$m. Using a small cell length results in a large number of cascaded segments for a long MZM arm.
However, there is a convenient way to make the series connection, as illustrated in Figure 2.9, where a thousand 3 $\mu$m cells form a 3 mm long arm.
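
The velocity estimate and the segment-length rule above can be reproduced with the per-unit-length values of Table 2.1 and the junction capacitance of Equation 2.10 (parameters from Table 2.2). In the Python sketch below, the 1 V reverse bias and the group index of 4 are assumptions chosen for illustration, since the text does not state the bias used for its estimate.

```python
import math

# Per-unit-length values converted to SI (per metre).
L_TL = 450e-12 / 1e-3     # H/m, 450 pH/mm from Table 2.1
C_TL = 123e-15 / 1e-3     # F/m, 123 fF/mm from Table 2.1
C0 = 0.22e-15 / 1e-6      # F/m, 0.22 fF/um from Table 2.2
V_BI = 1.5                # V, built-in voltage from Table 2.2
M_CAP = 0.33              # capacitance power coefficient from Table 2.2

def c_pn(v_r):
    """Equation 2.10 with C_p = 0: depletion capacitance per metre."""
    return C0 * (1.0 + v_r / V_BI) ** (-M_CAP)

def v_microwave(v_r):
    """Microwave velocity 1 / sqrt(L_tl * (C_tl + C_pn)) in m/s."""
    return 1.0 / math.sqrt(L_TL * (C_TL + c_pn(v_r)))

def max_segment_length(bit_rate, n_g):
    """Segment-length bound: (1/10) * 3e8 / (n_g * 5 * f_Nyquist), in metres."""
    f_nyquist = bit_rate / 2.0
    return 0.1 * 3e8 / (n_g * 5.0 * f_nyquist)

v_rf = v_microwave(1.0)                    # assumed 1 V reverse bias
v_opt = 3e8 / 4.0                          # assumed group index of 4
seg_max = max_segment_length(10e9, 4.0)    # 10 Gb/s operation
print(v_rf, v_opt, seg_max)
```

Under these assumptions the microwave velocity comes out close to the quoted $8.5 \times 10^7$ m/s, and the bound for 10 Gb/s evaluates to 300 $\mu$m, in line with the text.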
Figure 2.7: Decompose HSPM model into electronic model and photonic model.

Figure 2.8: The object properties of a Verilog-A HSPM cell.
Figure 2.9: Concatenate 1000 HSPM cells in Cadence schematic.

Table 2.1: RLGC parameters at 10 GHz.

<table>
<thead>
<tr>
<th>$R_{tl}$</th>
<th>$L_{tl}$</th>
<th>$R_{pn}$</th>
<th>$C_{tl}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>5 Ω/mm</td>
<td>450 pH/mm</td>
<td>15 Ω · mm</td>
<td>123 fF/mm</td>
</tr>
</tbody>
</table>

For the optical model, the physical model equations for the voltage-dependent dynamic phase shift and the depletion junction capacitance are given in Equation 2.9 and Equation 2.10, respectively [8, 23]. The parameters are described in Table 2.2. The relative phase shift and depletion junction capacitance of a 5 mm long HSPM simulated in Verilog-A with these physical models match well with the measured results given in [6]; they are plotted in Figure 2.10(a) and (b), respectively. The HSPM has a static insertion loss due to the doped waveguide and a dynamic insertion loss due to the applied modulation voltage. The static insertion loss is about 0.63 dB/mm, which contributes a significant amount of loss when the device is long. The dynamic insertion loss is about an order of magnitude smaller than the static insertion loss and decreases as the applied voltage increases.

\[
\Delta \phi = c_{v2ph} \cdot L \cdot \frac{\pi}{180} + (k \cdot V_R)^{m_{ph}} \quad (2.9)
\]

\[
C_d = C_0 \left(1 + \frac{V_R}{V_{bi}}\right)^{-m_{cap}} + C_p \quad (2.10)
\]

Table 2.2: Description of the parameters used in Equation 2.9 and Equation 2.10, with the corresponding values used for curve fitting in Figure 2.10.

<table>
<thead>
<tr>
<th>Parameters</th>
<th>Values</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>$c_{v2ph}$</td>
<td>$-7^\circ / (V \cdot mm)$</td>
<td>Voltage to phase multiplication coeff.</td>
</tr>
<tr>
<td>$L$</td>
<td>5 mm</td>
<td>Length of the HSPM</td>
</tr>
<tr>
<td>$k$</td>
<td>1.2</td>
<td>Voltage to phase fitting coeff.</td>
</tr>
<tr>
<td>$m_{ph}$</td>
<td>0.8</td>
<td>Voltage to phase power coeff.</td>
</tr>
<tr>
<td>$C_0$</td>
<td>0.22 fF/µm</td>
<td>Zero-bias junction capacitance</td>
</tr>
<tr>
<td>$V_{bi}$</td>
<td>1.5 V</td>
<td>Built-in voltage</td>
</tr>
<tr>
<td>$m_{cap}$</td>
<td>0.33</td>
<td>Voltage to capacitance power coeff.</td>
</tr>
<tr>
<td>$C_p$</td>
<td>0</td>
<td>Parasitic capacitance</td>
</tr>
</tbody>
</table>

Figure 2.10: Simulated and measured results comparison of (a) the change of relative phase shift and (b) pn junction capacitance as a function of the reverse-biased voltage on a 5 mm long HSPM.

2.2.4 Low-Speed Phase Modulator

The low-speed phase modulator (LSPM) is used to tune the device’s optical properties, such as the optical phase for the MZM quadrature phase bias and the resonance wavelength of an MRM. Two commonly used LSPMs in the silicon photonic platform are thermal phase modulators (TPM) and p-i-n phase modulators (PIN PM). Their cross-sections are illustrated in Figure 2.11. For low-power design, it is desirable to obtain a larger phase shift range with less current.

Figure 2.11: Cross section of (a) thermal phase modulator and (b) p-i-n phase modulator (not drawn to scale).

### 2.2.4.1 Thermal Phase Modulator

The thermal phase modulator (TPM) is a doped Si waveguide, as shown in Figure 2.11 (a). It relies on resistive heating to induce a phase shift in a length of Si waveguide. The additional phase increases with the input current as expressed in Equation 2.11, in which \( \eta \) is the tuning efficiency in units of radians/mW. The measured dc characteristic of a 200 \( \mu m \) long p-doped TPM with the cross-section shown in Figure 2.11 (a) is plotted in Figure 2.12. It can be observed that its resistance does not vary monotonically with power. The corresponding \( \eta \) is about 0.11 radians/mW. Using a TPM for the quadrature phase bias is not a low-power solution, as it requires more current to achieve the phase shift than the PIN PM.

\[
\text{phase} = \eta I_{in}^2 R \quad (2.11)
\]
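
To get a feel for the numbers in Equation 2.11, the heater power and current needed for a 90° bias can be estimated from the quoted tuning efficiency. In the Python sketch below, the constant 180 Ω heater resistance is an assumption made for illustration; the measured resistance actually varies with power, as noted above.

```python
import math

ETA = 0.11  # tuning efficiency in rad/mW, from the text

def tpm_phase(i_in_a, r_ohm, eta=ETA):
    """Equation 2.11: phase = eta * I^2 * R, with the power expressed in mW."""
    power_mw = i_in_a ** 2 * r_ohm * 1e3
    return eta * power_mw

def quadrature_power_mw(eta=ETA):
    """Heater power (mW) needed for a pi/2 phase shift."""
    return (math.pi / 2.0) / eta

def quadrature_current(r_ohm, eta=ETA):
    """Heater current (A) for a pi/2 phase shift, assuming a constant resistance."""
    return math.sqrt(quadrature_power_mw(eta) * 1e-3 / r_ohm)

p_quad = quadrature_power_mw()       # mW
i_quad = quadrature_current(180.0)   # A, with an assumed 180 ohm heater
print(p_quad, i_quad)
```

With the assumed resistance, the required current falls within the 8-10 mA range quoted for the 200 \( \mu m \) TPM, at a heater power of roughly 14 mW.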
2.2.4.2 PIN Phase Modulator

The p-i-n phase modulator, shown in Figure 2.11 (b), is operated under forward bias in carrier-injection mode to change the refractive index. It is usually used to provide low-speed optical phase modulation, for instance to control the quadrature dc phase bias point of an MZM device. Like the HSPM, it has both static and dynamic insertion loss.

2.3 EO Co-Design Consideration

The trade-off between driver voltage swing and device length is key to determining the driver scheme, so the MZM device and the driver circuits need to be co-designed. Differential drive or differential push-pull drive on both MZM arms can reduce the arm length and is thus the preferred option for on-chip driver design, although mismatches in the differential drive introduce chirp. Signal chirping techniques can be useful in long-haul optical transmission [24], but they are not discussed here. A flowchart of the CMOS photonic design methodology is shown in Figure 2.13.

![Flowchart for the CMOS photonic design methodology](image)

**Figure 2.13:** Flowchart for the MZM-based transmitter design.

### 2.3.1 Current-Mode Drive

As the operating frequency increases, the electrodes of a long MZM device should be treated as a transmission line once the arm length becomes comparable to 1/10 of the signal wavelength. The wavelength ($\lambda$) can be calculated with Equation 2.12, in which $f$ and $\varepsilon_{eff}$ are the operating frequency and the effective dielectric constant of the material, respectively.

$$\lambda = \frac{c}{f \sqrt{\varepsilon_{eff}}}$$ (2.12)

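
The 1/10-wavelength criterion can be checked numerically. The Python sketch below assumes the standard relation $\lambda = c/(f\sqrt{\varepsilon_{eff}})$ for the signal wavelength on the electrode and an effective dielectric constant of 12.5, a value chosen here only because it gives a velocity close to the microwave velocity estimated in Section 2.2.3. The function names are hypothetical.

```python
import math

C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s

def signal_wavelength(freq_hz, eps_eff):
    """Wavelength on the electrode, assuming lambda = c / (f * sqrt(eps_eff))."""
    return C_VACUUM / (freq_hz * math.sqrt(eps_eff))

def needs_transmission_line(length_m, freq_hz, eps_eff):
    """True when the electrode length reaches 1/10 of the signal wavelength."""
    return length_m >= signal_wavelength(freq_hz, eps_eff) / 10.0

EPS_EFF = 12.5  # assumed effective dielectric constant

lam_10g = signal_wavelength(10e9, EPS_EFF)                     # metres
long_arm = needs_transmission_line(3e-3, 10e9, EPS_EFF)        # 3 mm arm
short_seg = needs_transmission_line(500e-6, 10e9, EPS_EFF)     # 500 um segment
print(lam_10g, long_arm, short_seg)
```

Under these assumptions a 3 mm arm at 10 GHz is well beyond the threshold and needs a traveling-wave electrode, whereas a 500 $\mu$m segment stays below it and can be treated as a lumped element.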
Since the MZM electrodes can be designed as on-chip transmission lines according to the back-end-of-line metallization specification of the process, a current-mode driver is a natural fit for driving a traveling-wave MZM device at high data rates. It is also required to provide enough voltage swing [25][26]. As an example, the IBM 130 nm CMOS process, which features 1.2 V core devices and 2.5 V I/O devices with maximum operating voltages of 1.6 V and 2.7 V, respectively, is employed to implement a CML driver with a 1.2 V single-ended swing, as shown in the schematic in Figure 2.14. For speed, the differential pair (diff-pair) should use 1.2 V core devices. For reliability, 2.5 V I/O devices (M3a, M3b) have to be cascoded on top of the 1.2 V devices as shown in Figure 2.14, at the expense of speed. Transistor sizing, the bias scheme, and the parasitics introduced by the pad and bond wire are all critical design considerations for high-speed circuit design.

In order to use the bias current efficiently to obtain the desired voltage swing, the input pairs (ML1,2 and MR1,2 in Figure 2.14) of the CML driver operate in the large-signal regime. The three operating regions of the input pair are illustrated along with the dc transfer characteristic in Figure 2.15 (a). Both transistors of the input pair are in saturation in region II. In regions I and III, one transistor of the input pair enters sub-threshold while the other either stays in saturation or, if its input amplitude is further increased, enters triode. The rise time of the output voltage is mainly determined by the RC time constant: the load capacitor is charged through the load resistors to the supply rail, and thus the total capacitance contributed by the current stage and the next stage should be minimized.
Figure 2.14: Schematic of the NRZ TX circuit to drive the MZM.

The fall time of the output is set by the discharging of the load capacitor while the transistor transitions from the sub-threshold region to the saturation region (it enters the triode region once $V_g > V_d + V_{TH}$ when the amplitude is large), with the discharge current approaching the tail current. Here, the output slew-rate limitation is alleviated by using a large tail current. The slope in region II can be increased by reducing the overdrive voltage of the input pair [27]; this also helps to satisfy Equation 2.13 and keep the tail current source in saturation, as illustrated in Figure 2.15 (b).

$$V_{pmin} = V_{cm} - \left. V_{gs,M_{R,L}} \right|_{\frac{I}{4}} > V_{dsat,Ms}$$ (2.13)

An open-drain CML driver with a single 50 Ω termination at the far end of the TWMZM is chosen to save power, provided that the far end can be perfectly matched. In practice, this method can only be used for frequencies up to a few gigahertz, since the far-end matching can hardly be made perfect [28]. Active back termination can be used to save power [29, 30]. A 2.5 mm long bondwire is adopted to introduce enough parasitic inductance for series peaking. However, series peaking only works for open-drain CML [31]. A headroom of 250 mV is chosen to satisfy the $V_{dsat}$ of $M_s$. $M_s$ has to be large because of the large bias current required. The non-ideal dc characteristic of the current source, together with the displacement current spikes induced by the parasitic capacitance during fast signal transitions, needs to be taken into account. In order to use the tail current efficiently to achieve the desired 1.2 V voltage swing across the MZM arms, a minimum single-ended amplitude of 0.4 V with a common-mode voltage of 1 V is required for the diff-pair to be switched on and off and steer the current into the resistive load. Sizing the diff-pair is a trade-off between its overdrive voltage and the maximum allowable parasitic capacitance. Carrying the desired 24 mA of current results in relatively large input-pair and cascode devices $M3$, which is detrimental to high-speed performance. However, $M3$ cannot be made too small because of ESD considerations. Thus, there is a trade-off between TX speed and ESD tolerance. An explicit capacitor is needed at the node $V_{cas}$ to minimize the signal feed-through caused by the parasitic capacitance $C_{dg,M3}$.

Since the MZM driver consumes a large current (24 mA), the resulting diff-pair is large and thus exhibits a large input capacitance. A predriver stage is therefore necessary [28] to drive the output stage with the required swing and suitably fast transitions. This requires the supply voltage of the predriver to be 1.4 V. Since the gate capacitance of $M_{R2}$ is about 33 fF, and supposing the $V_{outp1}$ node has 15 fF of parasitic capacitance including the drain capacitance of $M_{R1}$, the load resistance $R_L$ must be smaller than 95 $\Omega$ to keep the rise time below 0.125 UI (unit interval). Consequently, a 70 $\Omega$ load resistance with an 11.4 mA tail current is chosen for the predriver. The predriver diff-pair also needs a minimum amplitude of 0.4 V with a common-mode voltage of 1 V to be switched on and off efficiently. The sizes of the main n-channel MOSFETs and the resistor values are annotated in Figure 2.14.
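
The load-resistance bound quoted above can be reproduced with a single-pole RC estimate. The Python sketch below makes two assumptions that are not stated explicitly in the text: the 10%-90% rise time is taken as $2.2RC$, and the unit interval corresponds to 12.5 Gb/s, the rate used for the wire-bonded measurement in Section 2.4.

```python
def rise_time_10_90(r_ohm, c_farad):
    """10%-90% rise time of a single-pole RC node, assumed to be 2.2 * R * C."""
    return 2.2 * r_ohm * c_farad

def max_load_resistance(c_farad, bit_rate, ui_fraction):
    """Largest R_L that keeps the rise time below ui_fraction of one UI."""
    t_max = ui_fraction / bit_rate
    return t_max / (2.2 * c_farad)

C_NODE = 33e-15 + 15e-15   # gate capacitance of M_R2 plus node parasitics, F
BIT_RATE = 12.5e9          # assumed data rate, bit/s

r_max = max_load_resistance(C_NODE, BIT_RATE, 0.125)   # ohms
t_r_pre = rise_time_10_90(70.0, C_NODE)                # chosen 70 ohm load
swing_pre = 70.0 * 11.4e-3                             # predriver swing, V
swing_out = 50.0 * 24e-3                               # output stage swing, V
print(r_max, t_r_pre, swing_pre, swing_out)
```

Under these assumptions the bound comes out just under 95 $\Omega$, the chosen 70 $\Omega$ load gives a rise time of about 7.4 ps, and the current-times-resistance products match the 0.8 V peak-to-peak predriver swing and the 1.2 V output swing described above.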

It is worth noting that, in order to pass the lowest frequency component at a given data rate and PRBS pattern length, the on-chip ac coupling components $C_b$ and $R_b$ need to be chosen large enough [32]. DC coupling, or the use of a discrete bias-T for testing, is recommended to save the silicon area occupied by the passive RC components.
2.3.2 Voltage-Mode Drive

A voltage-mode driver is suitable for driving a lumped-element load. Circuit design techniques for voltage-mode drivers are required to provide enough voltage swing as well as high data rates [33][34]. For a lumped-element MZM device, inverter-based drivers are the better option because they dissipate no static current and remove the need for a termination resistor for impedance matching. Shorter MZM devices with high modulation efficiency (i.e., a low $V_\pi L_\pi$) are highly desirable for inverter-based drivers [35]. A long MZM can be built from multiple HSPM segments arranged in a serpentine style, as shown in Figure 2.16. Each segment is 500 $\mu$m long and can be treated as a lumped element. The device can be configured for either NRZ or PAM-4 modulation by controlling the drivers. For this application, flip-chip or CuPillar bonding, which features extremely small parasitic inductance, is necessary for integrating the inverter-based drivers. Another main challenge in multi-segment lumped-element MZM transmitter design is that precise delay cells are required between every two consecutive segments for velocity matching [36].

2.3.3 Velocity Mismatch

The microwave propagation delay must be matched to the optical propagation delay. Otherwise, the bandwidth of the optical link is degraded, especially when the device operates at higher data rates. As mentioned before, precise delay cells are required for a multi-segment lumped-element MZM device, while the electrode design for velocity matching in a traveling-wave MZM is explained in Section 2.2.3. The effect is studied here with electrical-optical (EO) behavioral simulation of a multi-segment lumped-element MZM device similar to Figure 2.16. A simplified schematic is illustrated in Figure 2.17. The time delay of each delay cell is varied by an offset time, $t_{os}$, with respect to the optical propagation delay. In this example, the optical propagation delay is set to 9.8 ps and the MZM device is modulated with a 32 Gb/s electrical signal. As is evident in the NRZ and PAM-4 eye diagrams plotted in Figure 2.18, where $t_{os}$ is varied from 0.5 ps to 2 ps, a large delay mismatch considerably degrades the overall bandwidth. This is especially critical for the PAM-4 signaling format, where the delay mismatch can cause skew among the top, middle and bottom eyes as well as more jitter. In other words, the effect of velocity mismatch on NRZ signaling performance is equivalent to that of slow rise and fall times of the modulation signal. PAM-4 signaling has additional intrinsic deterministic jitter compared to NRZ signaling, as can be observed from Figure 2.19. With a 10 ps 10%-90% rise and fall time at a 10 Gb/s bit rate, the PAM-4 signal has 5.66 ps of deterministic jitter due to the crossing points of certain level transitions. In certain practical applications, forward error correction (FEC) with DSP techniques is required for PAM-4 modulation because it has more jitter and a lower SNR than NRZ modulation [37].

The length of the waveguide can be tuned to meet the delay cell design spec.

Figure 2.16: A segmented MZM device consisting of fourteen lumped HSPM elements and a PIN PM as the dc phase device.

Figure 2.17: Illustration of the segmented MZM with lumped HSPM devices. Each segment has a dedicated push-pull driver.
Figure 2.18: Velocity mismatch simulation of NRZ and PAM-4 signaling at 32 G symbol rate (9.8 ps optical delay per segment).
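
A rough feel for why picosecond-level offsets matter can be obtained by accumulating the per-cell offset along the segmented arm. The Python sketch below is a back-of-the-envelope estimate, not the EO behavioral simulation itself; it assumes that the same offset $t_{os}$ occurs in every delay cell, that the offsets add linearly, and that there are fourteen segments as in Figure 2.16.

```python
def accumulated_skew(n_segments, t_os):
    """Electrical-to-optical skew at the last segment, assuming one delay cell
    between consecutive segments and the same offset t_os in every cell."""
    return (n_segments - 1) * t_os

def skew_in_ui(n_segments, t_os, symbol_rate):
    """Accumulated skew expressed as a fraction of the unit interval."""
    return accumulated_skew(n_segments, t_os) * symbol_rate

N_SEGMENTS = 14          # assumed, as in Figure 2.16
SYMBOL_RATE = 32e9       # symbols per second

skew_small = skew_in_ui(N_SEGMENTS, 0.5e-12, SYMBOL_RATE)
skew_large = skew_in_ui(N_SEGMENTS, 2e-12, SYMBOL_RATE)
print(skew_small, skew_large)
```

Under these assumptions a 2 ps offset per cell accumulates to 26 ps at the last segment, which is most of the 31.25 ps unit interval, whereas a 0.5 ps offset stays near one fifth of it.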
2.4 MZM Measurement and Behavioral Simulation

The developed MZM model is intended to be used for hybrid circuit-level simulations. The simulated 20 Gb/s optical eye pattern of a 5 mm MZM is shown in Figure 2.20 (a), and it closely predicts the optical power levels and extinction ratio (ER) in the eye pattern shown in Figure 2.20 (b) obtained from RF wafer probing in [6].
Figure 2.20: Eye pattern at 20 Gb/s with a 1 Vpp differential drive for a 5 mm MZM using a 1555 nm wavelength. (a) Compact model simulation result. (b) Measured result in [6].

The MZM device illustrated in Figure 2.3 was fabricated in the IME SOI CMOS process and chip-on-board (COB) wire-bonded onto a PCB. The platform used to test the prototype is shown in Figure 2.21. The high-speed electrical modulation signal was applied through 27 GHz bandwidth end-launch SMA connectors and impedance-controlled CPW transmission lines. High-frequency (up to 20 GHz) 50 Ω thin-film surface-mount resistors in an 0402 footprint are used for the MZM far-end termination. The PAM-4 MZM device has two pairs of HSPM arms, which can be modulated individually to obtain NRZ signaling. Figure 2.22 compares the Verilog-A model simulation with the measurement when only the shorter arms are modulated at 12.5 Gb/s. Jitter in the random pulse generator and relative intensity noise in the laser source are also included in the simulation stimulus to bring the simulated result closer to the measured result. The Anritsu MP17763C in our lab has 2.5 ps rms jitter (14 ps peak-to-peak), which contributes significantly to the jitter of the MZM optical output.
Figure 2.21: Prototype of a MZM device wire-bonded on PCB. Fiber array is aligned on top of the chip.

Figure 2.22: Eye pattern at 12.5 Gb/s with a 1.8 Vpp differential drive for a 1.1 mm segment MZM in Figure 2.3. (a) Compact model simulation result. (b) Measured result.
2.5 Summary

A traveling-wave MZM device is fabricated and characterized. A library of behavioral models of optical components is created, which enables the co-simulation of a silicon photonic MZM device and CMOS transistors in Cadence Spectre. Current-mode and voltage-mode driver schemes are discussed for traveling-wave and multi-segment lumped-element MZM devices, respectively. The effect of velocity mismatch is highlighted with compact-model behavioral simulation. The power consumption of the voltage-mode driving scheme may become comparable to, or even exceed, that of the current-mode driving scheme as the data rate and the number of lumped elements increase, owing to the $CV^2f$ relationship.

Behavioral simulations using the developed compact model are demonstrated for a traveling-wave MZM device tested by wafer probing at 20 Gb/s and for a traveling-wave MZM device wire-bonded on a PCB prototype and tested with RF cables at 12.5 Gb/s. The simulation results match the measurement results well. Optical component behavioral modeling, along with the creation of layout standard cells, is an indispensable part of optical process design kit (PDK) development. With the help of this optical library, IC designers with less background in optical device design can accurately simulate the optoelectronic system, so the design risk and time to market can be significantly reduced.
CHAPTER 3

A RECONFIGURABLE MZM BASED OPTICAL LINK BUDGET ANALYSIS

Hybrid integration of CMOS chips with silicon photonic devices has emerged as a promising cost-effective solution to meet the ever-increasing data transfer bandwidth requirements of computing systems. The MZM device is by far the most reliable indirect optical modulator in the silicon photonic platform, though its footprint is large and it thus requires relatively more power for the drivers. Escalating the amplitude modulation scheme from NRZ (PAM-2) to PAM-4, or even to PAM-8, to increase the data rate requires an analytic model to estimate the trade-offs among the electrical-to-optical (EO) channel loss, the sacrificed signal-to-noise ratio, the circuit design complexity, the chip area and the power consumption. It is imperative to find a methodology to evaluate the link topology at the system level and guide the transistor-level design of the transceiver circuits.

This chapter proposes a methodology that helps to find optimized system-level performance specifications in terms of improved link energy efficiency. CMOS drivers designed in the TSMC 16 nm FinFET CMOS process will be flip-chip bonded with CuPillar technology to a segmented serpentine-style MZM device in a 130 nm SOI CMOS photonic process. With the help of an output-enable function in the CMOS driver circuit, the effective length of the MZM device can be reconfigured, so NRZ and PAM-4 modulation with different extinction ratios (ER) can be achieved with a single design. Figure 3.1 illustrates a conceptual block diagram of an MZM-based silicon photonic link. The germanium homo-junction photodetector (PD) used at the receiver (RX) side features a small parasitic capacitance and can thus achieve more than 25 GHz of bandwidth with less than 2 V of bias voltage; its best-case responsivity ($\rho$) is 0.9 A/W. The ER [37] of the MZM at the transmitter (TX) side is determined by the MZM device insertion loss and modulation efficiency. The TX has to meet the minimum optical modulation amplitude (OMA) requirement derived at the RX side, taking into account the signal attenuation due to the PD coupling loss. However, the larger the received power at the PD, the larger the current at the input of the transimpedance amplifier (TIA), and an excessive current will overload the TIA and degrade its sensitivity. Therefore, an optimized optical link requires a suitable ER at the TX side to achieve the target BER performance with the least power consumption. Figure 3.2 lists the most important design parameters of interest for both the transmitter and the receiver. The electrical and optical parameters, which are essential for the detailed link analysis, can be simulated and extracted from the process design kits (PDKs) provided by the foundries.

3.1 Derive OMA for Receiver

As shown in Figure 3.3, a single-ended optical front-end receiver consists of a transimpedance amplifier (TIA) stage, an isolating stage, a variable gain amplifier (VGA) stage, and an automatic gain control (AGC) loop that includes a peak detector and a comparator [38, 39]. The gain of the TIA has to be the largest among the stages in the signal path for the best noise performance. The isolating stage decouples the dc operating points of the TIA and the VGA and provides a small gain. The VGA makes the overall gain tunable to cover input signals of different amplitudes; this is achieved by using the AGC to control the VGA stage.

The bit error rate (BER) performance is determined by the signal-to-noise ratio (SNR). It can be expressed with the complementary error function as given in Equation 3.1 [40], in which the SNR is represented by the scaling factor $\alpha$. The target $\alpha$, together with the other electrical parameters of the TIA, sets the OMA requirement seen by the PD as given in Equation 3.2. In Equation 3.2, $N$ is the number of PAM levels, $i_{n,\text{rms}}$ is the TIA's input-referred rms noise current (4 $\mu$A is used here as a design example), $V_{th}$ is the decision threshold after the TIA, and $Z_{TIA}$ is the TIA's transimpedance gain. By combining Equation 3.1 and Equation 3.2, BER as a function of the PD's $OMA_{\text{min}}$ is plotted in Figure 3.4. To achieve a BER of $10^{-12}$, the minimum $OMA_{PD}$ has to be approximately -10 dBm, -5 dBm and -1.5 dBm for PAM-2, PAM-4 and PAM-8, respectively.

\[
BER = \frac{1}{2} erfc \left( \frac{\alpha}{2\sqrt{2}} \right) \tag{3.1}
\]

\[
OMA_{PD} = (N - 1) \frac{\alpha\, i_{n,\text{rms}} + V_{th}/Z_{TIA}}{\rho} \tag{3.2}
\]
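The two equations above can be evaluated numerically as a quick sanity check. The following is a minimal sketch, assuming the parameter values quoted in the caption of Figure 3.4 ($i_{n,\text{rms}} = 4\ \mu$A, $V_{th} = 20$ mV, $Z_{TIA} = 58$ dB$\Omega$, $\rho = 0.8$ A/W); the function names and the bisection search are illustrative choices, not part of the original analysis.

```python
import math

def ber_from_alpha(alpha):
    """Equation 3.1: BER = 0.5 * erfc(alpha / (2*sqrt(2)))."""
    return 0.5 * math.erfc(alpha / (2.0 * math.sqrt(2.0)))

def alpha_for_ber(target_ber, lo=0.0, hi=40.0):
    """Invert Equation 3.1 by bisection (BER falls monotonically with alpha)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ber_from_alpha(mid) > target_ber:
            lo = mid   # BER still too high, a larger alpha is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

def oma_pd_dbm(n_levels, alpha, i_n_rms, v_th, z_tia_ohm, rho):
    """Equation 3.2, converted from watts to dBm."""
    oma_w = (n_levels - 1) * (alpha * i_n_rms + v_th / z_tia_ohm) / rho
    return 10.0 * math.log10(oma_w / 1e-3)

# Values taken from the Figure 3.4 caption.
alpha = alpha_for_ber(1e-12)
z_tia = 10.0 ** (58.0 / 20.0)   # 58 dBOhm converted to ohms
for n in (2, 4, 8):
    print(f"PAM-{n}: OMA_PD = {oma_pd_dbm(n, alpha, 4e-6, 20e-3, z_tia, 0.8):.2f} dBm")
```

With these inputs the sketch lands close to the -10 dBm, -5 dBm and -1.5 dBm values read from Figure 3.4, and the spacing between formats is simply $10\log_{10}(N-1)$.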

3.2 Determine ER for Transmitter

An MZM device needs a certain voltage swing to achieve the required modulation depth. This inevitably calls for circuit techniques that build a high-voltage driver from low-voltage (and therefore faster) MOSFETs. As Figure 3.5 shows, a latch-based level shifter couples the signal from the low-voltage domain to a higher-voltage domain. Compared to the traditional passive AC coupling circuit, the biggest advantages of this topology are that there is no dc wander, no dependency on pattern length or bit rate, and much less capacitance is required. However, as the voltage across the capacitor cannot change instantaneously, voltage overshoot or undershoot can occur when the signal starts toggling, depending on the initial conditions at the two plates of the capacitor. The overshoot and undershoot can exceed the gate oxide breakdown voltage and cause long-term reliability issues, which is not acceptable for product development. Capacitor charge reset circuits are therefore developed to protect the transistors that follow the capacitors. A complete MZM segment push-pull driver schematic is shown in Figure 3.6, where vip and vin are complementary signals in the same low-voltage domain, and vouth and voutl are complementary signals in the high-voltage and low-voltage domains, respectively. The initial reset and power-down functions can share the same circuitry. The two nodes, before and after the capacitor, are pulled to the low level of their respective voltage domains when either the rst or the pd signal is high. This is realized with an output-enabled (OE) buffer. When OE=1, the input passes to Z. When OE=0, Z is in a high-impedance state so that a small pull-down NMOS can pull the node low. The pull-down NMOS is kept small because of the high-speed operation. The pull-down in the VDDL-to-VDDH voltage domain needs a level shifter circuit and a thick-gate PMOS with an always-on weak pull-down NMOS. When reset or power-down is high, b=0 and node 2 is pre-charged to VDDH, so that node 3 is pulled down to VDDL. When reset or power-down is low, b=VDDH and node 2 is pulled to VDDL, so that node 3 is unaffected. The power-down signal can cut off the signal path for a specific MZM segment even if the signal along the chain is present.

Figure 3.4: PAM-2/4/8 receiver sensitivity based on a 32 Gb/s TIA in 16 nm FinFET CMOS process ($i_{n,rms} = 4\ \mu A$, $V_{th} = 20\ mV$, $Z_{TIA} = 58\ dB\Omega$, $\rho = 0.8\ A/W$).

![Diagram](image)

**Figure 3.5:** Latch-based level shifter with an example of a potential reliability issue at the start of signal toggling. The voltage at node 3 may jump to 2*VDDH or −VDDL, which stresses the gates of INV1 and INV2.

The driver is especially suited to driving the lumped high-speed phase modulator (HSPM) laid out in serpentine style, as shown in Figure 3.7. Flip-chip bonding is required. A PAM-4 transmitter block diagram is illustrated in Figure 3.8. The reset signal can be generated by the digital control circuit before the PLL clocking for the serializers is ready. Each driver can be individually powered down (disabling the modulation of its HSPM segment) by controlling the corresponding pd signal. Different ERs can thus be achieved by enabling or disabling the drivers of certain MZM segments. The rms current of a segment driver operating at 32 Gb/s can be 14 mA and 4.6 mA for VDDH (1.8 V) and VDDL (0.9 V), respectively. This added flexibility saves power by disabling drivers that are not needed when the ER requirement is not high. The transmitter can also be configured for NRZ modulation if the patterns for serializer1 and serializer2 are kept the same.

Figure 3.6: A block diagram of the MZM segment driver.

ER, which correlates the OMA with the average power, is an important specification for optical modulators. The optical power transfer function ($T_{opt}$) of the MZM device is given in Equation 3.3 and is plotted in Figure 3.9 for different lengths. In this plot, the MZM is driven by a push-pull driver with a 1.8 $V_{PP}$ swing on both of the MZM arms. The dc phase difference ($\phi_{dc}$) introduced by the PIN phase modulators in Figure 3.7 is set to 90°. Therefore, the MZM is ideally biased at the quadrature operating point for the best symmetric linear modulation. However, this assumption is made without considering the mismatches between the two arms due to process variation. $\phi_1$ and $\phi_2$ are the absolute phases of the MZM's two arms. $k$, which deviates from 0.5, is the mismatch factor of the Y-junction and between the two arms. The total static insertion loss is expressed in Equation 3.4. The optical powers of the two branches before entering the combiner are represented by Equation 3.5 and Equation 3.6, respectively. A dB scale is used in Equations 3.4 to 3.6. The two most significant insertion losses, $IL_{GC}$ and $IL_{HSPM,static}$, are introduced by the grating coupler and the high-speed phase modulators (HSPM), respectively. A nominal $IL_{GC}$ of 5 dB is used for this analysis. The static and dynamic insertion losses of the HSPM are 0.57 dB/mm and -0.08 dB/mm, respectively, when it is reverse-biased at 1.8 V. The longer the HSPM, the more static insertion loss is introduced. The larger the reverse-bias voltage, the smaller the dynamic loss of the HSPM. The static and dynamic insertion losses, $IL_{PIN,static}$ and $IL_{PIN,dyn}$, of a 250 $\mu$m PIN PM are 0.22 dB and 1.7 dB/$\pi$, respectively. Moreover, losses introduced by other optical elements, including the Y-junctions used for splitting and combining the light, optical routing such as the straight and bent silicon waveguides, and the PIN phase modulators, are considered in the MZM model. Thus, the ER can be accurately derived from the $T_{opt}$ plotted in Figure 3.9. Notably, less than 5% of the laser power is transmitted. Figure 3.10 shows the ER for different MZM lengths and the corresponding insertion loss profile. The achievable ER decreases as the data rate increases, because the bandwidth limitation of the electrical signal reduces the effective phase shift.

\[
T_{opt} = \frac{P_1 k + P_2(1 - k) + 2\sqrt{P_1 P_2 k(1 - k)} \cos(\phi_1 - \phi_2)}{P_{\text{laser}} 10^{\frac{IL_{\text{junc}} + IL_{\text{WG}} + IL_{\text{GC}}}{10}}} \tag{3.3}
\]

\[
IL_{\text{static}} = IL_{\text{GC}} + IL_{\text{WG}} + IL_{\text{HSPM,static}} + IL_{\text{PIN,static}} \tag{3.4}
\]

\[
P_1 = P_{\text{laser}} - IL_{\text{static}} + 10\log_{10} k - IL_{\text{HSPM,dyn}}(\phi_1) - IL_{\text{PIN,dyn}}(\phi_{dc}) \tag{3.5}
\]

\[
P_2 = P_{\text{laser}} - IL_{\text{static}} + 10\log_{10}(1 - k) - IL_{\text{HSPM,dyn}}(\phi_2) \tag{3.6}
\]

Figure 3.9: Optical power transfer function of the MZM in a 130 nm SOI CMOS process with different lengths (an effective phase shift of $7.58^\circ$/mm is extracted when operating at 32 Gb/s).

3.3 Correlation between Transmitter and Receiver

The OMA at the RX PD, after the coupling loss, must be large enough to meet the BER requirement for PAM-N signaling. It is plotted in Figure 3.11 with the laser power ranging from 10 to 19 dBm (the numbers in red next to each curve). The dash-dotted horizontal lines are the minimum required OMA at the PD for PAM-N signaling, derived from the parameters used in Figure 3.4. An insertion loss of 5 dB is assumed per grating coupler. It can be observed that there is an optimal ER for this specific MZM device, which is around 8 dB. To meet the $OMA_{\text{min}}$ required by the RX sensitivity for PAM-4 and PAM-8, the laser power needs to be increased to compensate for the degraded SNR. At least 11 dBm and 19 dBm of laser power are required for PAM-4 and PAM-8 modulation, respectively, to meet the OMA at the RX side for a BER of $10^{-12}$. A longer MZM device can increase the ER but does not necessarily help the OMA, because a longer MZM device introduces excessive loss.
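The trade-off between ER and loss can be made concrete with the textbook relation between OMA, average power and ER. The following is a minimal sketch that assumes an ideal two-level signal with $OMA = P_1 - P_0$, $ER = P_1/P_0$ and $P_{avg} = (P_1 + P_0)/2$; it is not the MZM model behind Figure 3.11, and the 1 mW average power is a placeholder.

```python
import math

def dbm_to_mw(p_dbm):
    """Convert optical power from dBm to mW."""
    return 10.0 ** (p_dbm / 10.0)

def mw_to_dbm(p_mw):
    """Convert optical power from mW to dBm."""
    return 10.0 * math.log10(p_mw)

def oma_from_avg_and_er(p_avg_mw, er_db):
    """OMA = 2 * P_avg * (r - 1) / (r + 1), with r the linear extinction ratio."""
    r = 10.0 ** (er_db / 10.0)
    return 2.0 * p_avg_mw * (r - 1.0) / (r + 1.0)

# Placeholder average power of 1 mW (assumption, not a value from this chapter).
P_AVG_MW = 1.0
for er_db in (3.0, 5.0, 8.0, 10.0):
    oma = oma_from_avg_and_er(P_AVG_MW, er_db)
    print(f"ER = {er_db:4.1f} dB -> OMA = {mw_to_dbm(oma):6.2f} dBm")

# OMA improvement (in dB) from raising the ER from 8 dB to 10 dB at fixed P_avg.
gain_db = mw_to_dbm(oma_from_avg_and_er(P_AVG_MW, 10.0) / oma_from_avg_and_er(P_AVG_MW, 8.0))
print(f"8 dB -> 10 dB ER buys {gain_db:.2f} dB of OMA")
```

Under these idealized assumptions, raising the ER from 8 dB to 10 dB buys only about half a dB of OMA at constant average power, so any additional insertion loss beyond that amount leaves the OMA worse off.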

Figure 3.11: $OMA_{PD}$ versus ER with varying input laser power.

Besides the OMA, the overload current, which is set by the average output power of the MZM and the $\rho$ of the PD, should also be watched. It is plotted in Figure 3.12. The TIA needs to tolerate this overload current by turning on sink-current nfets. The more overload current has to be canceled, the more sink-current nfets need to be turned on and the more input current noise is introduced, which in turn raises the minimum required OMA at the RX side.

Figure 3.12: The overload current seen from the TIA versus ER.

3.4 Reconfigurable MZM Transmitter Simulation

The NRZ modulation format can be realized by completely powering down either all 5 LSB segments or all 9 MSB segments. The simulated results are shown in Figure 3.13. Since NRZ modulation requires a lower ER, segments within the LSB or MSB group can be further powered down to save power. With 10 dBm (10 mW) laser power and a fixed current setting for the p-i-n phase modulator, the obtained ERs are about 2.94 dB and 5.55 dB, and the driver power consumption is 290 mW and 522 mW, for 5 and 9 active segments, respectively.

For PAM-4 modulation, the two serializers generate two uncorrelated data patterns. Simulated results with three different LSB and MSB segment combinations are shown in Figure 3.14. To meet the receiver's sensitivity requirement, PAM-4 signaling needs more laser power. As an example, with 13 dBm (19.95 mW) laser power and a fixed current setting for the p-i-n phase modulator, the driver power consumption for achieving extinction ratios of 9.38 dB, 7.12 dB and 4.92 dB is 812 mW, 638 mW and 464 mW, respectively. Different ER specifications for the PAM-4 modulation format can thus be realized to suit different application scenarios. For the 3+5 segment combination, the middle eye height is smaller than the heights of the top and bottom eyes. This can be improved by tuning the dc phase point through the current setting of the p-i-n phase modulator.
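The driver power figures quoted in this section scale linearly with the number of active segments, which can be checked with a few lines of arithmetic. The following is a minimal sketch using only the power numbers above and the data rates from the captions of Figures 3.13 and 3.14; the energy-per-bit figure counts driver power only and is an illustrative metric, not a result reported in this chapter.

```python
# (label, active segments, driver power in mW, data rate in Gb/s)
CONFIGS = [
    ("NRZ, 5 segments", 5, 290.0, 32.0),
    ("NRZ, 9 segments", 9, 522.0, 32.0),
    ("PAM-4, 5+9 segments", 14, 812.0, 64.0),
    ("PAM-4, 4+7 segments", 11, 638.0, 64.0),
    ("PAM-4, 3+5 segments", 8, 464.0, 64.0),
]

def power_per_segment_mw(segments, power_mw):
    """Average driver power per active segment."""
    return power_mw / segments

def energy_per_bit_pj(power_mw, rate_gbps):
    """mW / (Gb/s) = 1e-3 J/s / 1e9 bit/s = 1e-12 J/bit, i.e. pJ/bit."""
    return power_mw / rate_gbps

for label, segments, power_mw, rate_gbps in CONFIGS:
    print(f"{label:22s} {power_per_segment_mw(segments, power_mw):5.1f} mW/segment, "
          f"{energy_per_bit_pj(power_mw, rate_gbps):5.2f} pJ/bit (driver only)")
```

Every configuration works out to 58 mW per active segment, which is why powering down unused segments translates directly into driver power savings.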

![Simulated NRZ modulation format eye diagrams](image)

**Figure 3.13:** Simulated NRZ modulation format eye diagrams with 5 segments and 9 segments, respectively, operating at 32 Gb/s.

Figure 3.14: Simulated PAM-4 modulation format eye diagrams with 5+9, 4+7, 3+5 segment combinations operating at 64 Gb/s.

3.5 Summary

A driver has been designed for a reconfigurable MZM transmitter. Depending on the application, it can be configured for either the NRZ or the PAM-4 modulation format. Furthermore, it can be set to different ER specifications, which can save significant power in certain scenarios. A systematic optical link power budget analysis has been presented based on a CMOS driver topology for a lumped-element segmented MZM device. The minimum OMA for an error-free RX, the optimal ER for the MZM device and the minimum laser power are derived. The high-speed PAM-4 hybrid optoelectronic transceiver is under fabrication in the TSMC 16 nm FinFET CMOS process and an SOI CMOS silicon photonic process. It will be demonstrated by Hewlett-Packard Labs.

CHAPTER 4

A HYBRID OPTOELECTRONIC LIMITING RECEIVER

Optical communication chipsets play a vital role in the contemporary datacom industry. High-speed, low-power, small form factor interconnect modules are increasingly sought for cloud computing and switching systems. An optical receiver converts the small current signal detected by the photodiode (PD) into a resolvable voltage signal for the subsequent stage, such as clock and data recovery (CDR) circuitry, for further processing. Although CMOS PDs can be integrated to achieve monolithic optoelectronic receivers, they have extremely low responsivity ($\rho$) and bandwidth, so more transmitter optical power is required and equalization for bandwidth compensation becomes a necessity in the receiver [41, 42]. Avalanche PDs feature a good $\rho$ but require a high voltage that may not be compatible with practical system applications [43]. Costly III-V material PDs provide high bandwidth and good $\rho$ with a small voltage requirement, and are predominantly deployed in high-speed optical receivers [41, 44]. Recent advances in silicon photonic technologies have enabled a cost-effective SOI CMOS platform for electro-optical integration with improved energy efficiency [11]. It can provide Ge PDs for hybrid integration with a CMOS-compatible power supply requirement. However, low-cost, mass-producible packaging for silicon photonics is still challenging [45]. Nowadays, hybrid integration has become the mainstream solution due to cost considerations and the speed limitations of the transistors in available silicon photonic processes [5, 46, 38, 47].

Design analysis of the fabricated limiting receiver is elaborated in section 4.1. System-level analysis and discussion with measurement results are given in section 4.2. Finally, key design considerations learned from this case study are summarized.

4.1 Receiver Architecture

In this work, a limiting receiver was designed in the IBM 130 nm CMOS process. Similar to prior work such as [46, 43], there is no CDR or DeMUX circuitry, and the signal is taken out through an output buffer. As illustrated in Figure 4.1, the signal path of the CMOS limiting receiver consists of a front-end trans-impedance amplifier (TIA) followed by limiting amplifier (LA) stages, with a high-gain feedback opamp for dc offset compensation (DCOC). The final outputs are level shifted by an output buffer (OB), which uses a 3.3 V power supply. A pair of off-chip capacitors ($C_{ex}$) is used to achieve the desired low-frequency cut-off for the DCOC loop. The CMOS chip also includes a bandgap reference (designed using 3.3 V transistors) and a bias generator. A top-illuminated InGaAs/InP PIN photodiode (PD) is placed side by side with the CMOS die and wire bonded to the TIA input.

The system specification of the limiting receiver is illustrated in Figure 4.2. The overall bandwidth ($BW_{tot}$) of gain stages in cascade in the linear region can be first order estimated by using $\frac{1}{BW_{tot}} = \frac{1}{BW_1} + \frac{1}{BW_2} + \cdots$ [48]. However, for the latter stages operating in the limiting regime, the concept of small-signal bandwidth is not effectively applicable and must be replaced by modeling the slew-rate limited operation for large-signal excursions. As a rule of thumb, the whole receiver’s bandwidth needs
4.1.1 Photodiode and Trans-impedance Amplifier

According to the InGaAs/InP PIN PD device specification, the PD features a typical responsivity of 0.8 A/W, 5 nA dark current and a maximum capacitance of 100 fF when reverse biased at 5 V [44]. As shown in Figure 4.3, the voltage applied to the cathode of the PD should be adjusted to 5.6 V, as the anode sits at about 0.57 V, which is set by the operating bias voltage of the TIA input. The TIA consists of a shunt-feedback common-source stage and a second stage with a biased n-channel transistor (nfet) that acts as an active inductor load for bandwidth extension. The second stage also provides additional gain and adjusts the dc operating point for the next block through the bias voltage ($V_b$). $V_b$ must be set below $V_{DDL} + V_{th}$ to keep $M_3$ in saturation. The input current level rises when the incoming optical intensity increases, which can result in an improper dc operating point for the next stage, a CML-type stage in the LA. This undesired current is called the overload current, and it can be canceled by turning on the overload control signal ($ol\_en$). However, enabling the current mirror introduces noise and parasitic capacitance at the input of the TIA. To alleviate this problem, the current mirror is connected at the middle of the shunt resistance [5], at the cost of reduced overload current canceling capability. Furthermore, no input ESD protection device is added here, since this pin is not exposed after the CMOS die and the PD die are integrated by wire bonding. Machine model (MM) ESD stress during the wire bonding process can be avoided with ESD control precautions [49]. Design trade-offs among TIA gain, bandwidth and noise performance are discussed in the following.

Figure 4.2: Limiting receiver design specification partition.

Figure 4.3: Schematic of the TIA with photodiode. Triple-well NMOS devices are used for the TIA core.

Table 4.1: TIA design parameters

<table>
<thead>
<tr>
<th colspan="2">PDK displayed resistance (Ω)</th>
<th>Total width of nfets</th>
</tr>
</thead>
<tbody>
<tr>
<td>$R_{F1,2}$</td>
<td>$R_L$</td>
<td></td>
</tr>
<tr>
<td>384</td>
<td>768</td>
<td></td>
</tr>
</tbody>
</table>

4.1.1.1 Gain and Bandwidth

The frequency response of the first-stage TIA gain is expressed in Equation 4.1, without taking the bondwire effect into account. The low-frequency trans-impedance gain is approximately $R_F$ as long as $g_m R_F, g_m R_L \gg 1$. The dominant pole is due to the input capacitance, which includes the PD junction capacitance, the PD pad, the TIA input device and the pad capacitances. The corresponding input resistance is expressed in Equation 4.2. The second pole results from the output resistance of the TIA's first stage ($R_L \parallel \frac{1}{g_{m,M1}}$) and the capacitance at that node, which is much smaller than the input capacitance. The input bondwire can potentially create complex conjugate poles, which cause peaking in the frequency domain and ringing in the time domain.

\[
R_T(f) = \frac{R_L (1 - g_m R_F)}{1 + g_m R_L} \cdot \frac{1}{\left(1 + \frac{f}{f_1}\right)\left(1 + \frac{f}{f_2}\right)} \tag{4.1}
\]

\[
R_{in} = Z_{in}(s)|_{s=0} = \frac{R_L + R_F}{1 + g_{m,M1} R_L} \tag{4.2}
\]
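The low-frequency behaviour described by Equations 4.1 and 4.2 can be explored numerically. The following is a minimal sketch in which every component value is an assumption for illustration only: the transconductance `G_M`, the input capacitance `C_IN`, and the reading of Table 4.1 as a 768 Ω feedback resistance (two 384 Ω halves in series) are not confirmed design values.

```python
import math

def tia_low_freq_gain(g_m, r_f, r_l):
    """Equation 4.1 at f = 0; the negative sign indicates an inverting stage."""
    return r_l * (1.0 - g_m * r_f) / (1.0 + g_m * r_l)

def tia_input_resistance(g_m, r_f, r_l):
    """Equation 4.2: low-frequency input resistance of the shunt-feedback stage."""
    return (r_l + r_f) / (1.0 + g_m * r_l)

def dominant_pole_hz(r_in, c_in):
    """First-order estimate of the input pole, f1 = 1 / (2*pi*R_in*C_in)."""
    return 1.0 / (2.0 * math.pi * r_in * c_in)

# Hypothetical values, chosen only to illustrate the equations.
G_M = 40e-3      # S, assumed transconductance of M1
R_F = 768.0      # ohm, assumed total feedback resistance
R_L = 768.0      # ohm, load resistance as read from Table 4.1
C_IN = 300e-15   # F, assumed total input capacitance (PD, pads, input device)

r_in = tia_input_resistance(G_M, R_F, R_L)
print(f"low-frequency gain: {tia_low_freq_gain(G_M, R_F, R_L):.1f} ohm")
print(f"input resistance:   {r_in:.1f} ohm")
print(f"dominant pole:      {dominant_pole_hz(r_in, C_IN) / 1e9:.2f} GHz")
```

The sketch shows the two limits stated above: the gain magnitude approaches $R_F$ as $g_m$ grows, and the feedback lowers the input resistance well below $R_F$, which is what pushes the input pole out.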

With the design parameters listed in Table 4.1, the post-layout simulated gain frequency response of the 2-stage TIA is plotted in Figure 4.4. The layout parasitics are critical: in this example, they reduce the TIA's bandwidth by more than half compared to the schematic simulation result. Even though the mean $f_{T,\text{max}}$ of the nfet is 117.8 GHz in the schematic Monte Carlo simulation plotted in Figure 4.5, the achievable gain-bandwidth product of the nfet in a practical design is much less than $f_{T,\text{max}}$, especially without on-chip inductors. This is due to the actual bias condition and the parasitic effects introduced at the input and output nodes of the nfet.

The extracted-layout transient simulation was performed at the nominal process corner, including bondwire S-parameters extracted with ADS. A PRBS-15 pattern (chosen for practical simulation time) with 1.2 ps rms jitter is used as the stimulus. The stimulus has a 70 ps rise/fall time, with the zero level and one level at 1.5 $\mu$A and 174 $\mu$A, respectively, which are referenced from the measured MZM optical output after a 3.5 dB coupling loss into the PD with a $\rho$ of 0.8 A/W. The simulated signal at the output of the TIA, operating at 4 Gb/s, is plotted as an eye diagram in Figure 4.6. It has a peak-to-peak jitter of about 13.5 ps. The group delay of the 2-stage TIA is also plotted in Figure 4.4. As claimed in [48], the group delay variation is required to be less than $\pm 10\%$ of the bit period ($\pm 0.1$ UI) over the specified bandwidth. However, the simulated group delay variation is 330 ps, which is 1.32 UI at 4 Gb/s. The effect of group delay distortion on the TIA output eye diagram is not obvious, because the resulting jitter is overwhelmed by the source jitter. Moreover, the signal enters the limiting region in the following LA stages, which makes the group delay specification less critical at this stage.

Figure 4.4: Post layout simulation results for TIA frequency response and group delay at nominal process corner, 40°C.

4.1.1.2 Noise and Sensitivity

The first stage in the front-end circuit is key to the noise performance. The input-referred current noise density at mid-band frequencies is given in Equation 4.3, where $\gamma$ is a process-dependent thermal noise coefficient of the nfet. The thermal noise of the feedback resistor ($R_F$) is directly reflected at the input. It is preferable to increase $R_F$ and the $g_m$ of $M_1$ (equivalent to increasing the gain) to reduce the noise floor. However, increasing the gain reduces the bandwidth, and the nfet channel noise referred to the input starts to increase from the TIA bandwidth onwards. This can be observed in the post-layout simulation result plotted in Figure 4.7. For the integrated noise, it is necessary to look at the whole spectrum up to about twice the TIA bandwidth [48]. The estimated input-referred rms current noise ($i_{n,rms}$) is less than 1.2 $\mu$A.

\[
\overline{i_{n,in}^2} \approx \frac{4kT}{R_F} + \frac{4kT}{(1 - g_m R_F)^2} \left( \frac{1}{R_L} + \gamma g_m \right) \tag{4.3}
\]
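The two terms of Equation 4.3 can be compared numerically to see which one sets the noise floor. The following is a minimal sketch, assuming a temperature of 300 K and hypothetical values for $g_m$, $\gamma$, $R_F$ and $R_L$; none of these are confirmed design values, and the result is a mid-band density rather than the integrated $i_{n,rms}$.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def input_noise_density(g_m, r_f, r_l, gamma, temp_k=300.0):
    """Equation 4.3: mid-band input-referred current noise density in A^2/Hz."""
    four_kt = 4.0 * K_B * temp_k
    feedback_term = four_kt / r_f
    amplifier_term = four_kt / (1.0 - g_m * r_f) ** 2 * (1.0 / r_l + gamma * g_m)
    return feedback_term + amplifier_term

# Hypothetical values, chosen only to illustrate the equation.
G_M, R_F, R_L, GAMMA = 40e-3, 768.0, 768.0, 1.0

density = input_noise_density(G_M, R_F, R_L, GAMMA)
floor = 4.0 * K_B * 300.0 / R_F   # contribution of the feedback resistor alone
print(f"total density:     {math.sqrt(density) * 1e12:.2f} pA/sqrt(Hz)")
print(f"R_F contribution:  {math.sqrt(floor) * 1e12:.2f} pA/sqrt(Hz)")
```

For a large loop gain $g_m R_F$ the second term is suppressed by $(1 - g_m R_F)^2$, so the feedback resistor dominates the mid-band noise, consistent with the preference for a large $R_F$ stated above.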

A wideband front-end circuit admits more noise, while a narrowband front-end circuit suffers more ISI distortion at higher speed, which also reduces the sensitivity [48]. The bit error rate (BER) is derived by integrating the Gaussian noise distribution (thermal noise is assumed) from $Q$ times the standard deviation ($\sigma$) to infinity, which is expressed as the complementary error function given in Equation 4.4 [40]. To achieve a BER of $10^{-12}$, $Q = 7$ is required.

The sensitivity, defined as the optical modulation amplitude (OMA) seen at the PD, is expressed in Equation 4.5 for non-return-to-zero (NRZ) signaling, in which the scaling factor $\alpha = 2Q$ is the amplitude signal-to-noise ratio (SNR). The total noise current is

$$i_{\text{noise}} = i_{\text{n,rms}} + i_{\text{shot}} + i_{\text{RIN}},$$

where $i_{\text{shot}}$ and $i_{\text{RIN}}$ are the shot noise due to the PD dark current and the relative intensity noise due to the laser source, respectively. $\eta$ is the frequency-dependent loss factor, e.g. $\eta = 0.7$ at the -3 dB bandwidth. $V_{\text{th}}$ is the decision threshold of the circuit after the TIA [40]. $Z_{\text{TIA}}$ is the TIA's trans-impedance dc gain. From $\eta$ in Equation 4.5, it can be seen that the sensitivity degrades when the bandwidth is lowered beyond a certain extent.

Figure 4.7: TIA input referred current noise spectrum from post layout simulation.

\[
BER = \int_{Q\sigma}^{+\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right)dx = \frac{1}{2} erfc\left(\frac{Q}{\sqrt{2}}\right) \tag{4.4}
\]

\[
OMA_{PD} = \frac{(\alpha\, i_{\text{noise}}/\eta) + (V_{th}/Z_{TIA})}{\rho} \tag{4.5}
\]

4.1.2 Limiting Amplifier

The LA needs to provide a high gain-bandwidth product and convert the incoming single-ended signal from the TIA to a differential signal. For broadband high-gain amplifier design, low-gain stages in cascade are used. The gain and bandwidth trade-off for a cascade of identical stages has been studied systematically in [48, 50, 51, 52]. To simplify the analysis, assume that the LA consists of $n$ identical cascaded gain stages that do not load each other. As an example, consider a total gain-bandwidth product of 20 GHz, and assume that the gain ($A_s$) and bandwidth ($BW_s$) of a single-stage amplifier are 20 (26 dB) and 1 GHz, respectively. With $n$ identical first-order stages (Butterworth frequency response) in cascade, the total gain can be kept the same when each stage has a gain of $\sqrt[n]{A_s}$, while the total bandwidth is improved by a factor of $A_s^{1-\frac{1}{n}} \sqrt{2^{\frac{1}{n}}-1}$ over the bandwidth of a single stage with a gain of $A_s$. For this example, the bandwidth improvement ratio and the stage dc gain versus the number of stages are plotted in Figure 4.8. The peak bandwidth improvement is about 4.25 at $n = 6$, where the corresponding stage gain is 1.65. In other words, the bandwidth of each stage should be 12.14 GHz if the gain-bandwidth product remains constant. Using active feedback, rather than simply cascading stages, to build a higher-order system can extend the overall bandwidth, but at the cost of a reduced dc stage gain and increased power consumption [51, 52]. The gain, bandwidth and power consumption trade-offs need to be evaluated in the transistor-level circuit design.
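The numbers in this example can be reproduced with a short calculation. The following is a minimal sketch, assuming identical, non-interacting first-order stages with a constant gain-bandwidth product, so that the cascade bandwidth shrinks by $\sqrt{2^{1/n}-1}$; the function names and the search range of 1 to 12 stages are illustrative choices.

```python
import math

def stage_gain(a_s, n):
    """Per-stage gain needed so that n stages give a total gain of a_s."""
    return a_s ** (1.0 / n)

def bandwidth_improvement(a_s, n):
    """Total bandwidth of n stages relative to one stage with gain a_s.

    Each stage is faster by a_s**(1 - 1/n) at constant gain-bandwidth product,
    and the cascade of n first-order stages shrinks by sqrt(2**(1/n) - 1).
    """
    return a_s ** (1.0 - 1.0 / n) * math.sqrt(2.0 ** (1.0 / n) - 1.0)

A_S = 20.0        # total gain (26 dB)
BW_S_GHZ = 1.0    # bandwidth of a single stage with gain A_S

best_n = max(range(1, 13), key=lambda n: bandwidth_improvement(A_S, n))
print(f"best number of stages:   {best_n}")
print(f"bandwidth improvement:   {bandwidth_improvement(A_S, best_n):.2f}")
print(f"stage gain:              {stage_gain(A_S, best_n):.2f}")
print(f"per-stage bandwidth:     {BW_S_GHZ * A_S ** (1.0 - 1.0 / best_n):.2f} GHz")
```

The improvement curve is fairly flat around its peak, so a design can trade a stage or two for power or area with only a small bandwidth penalty.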

4.1.2.1 Gain Stages with Active Feedback

This LA design uses twelve single stages in total with active feedback, as shown in Figure 4.9. The inputs of the LA are connected to the TIA output and the feedback node, respectively. Ideally, the feedback node provides a dc bias for comparison with the TIA output. It should be noted that the amplitude and common-mode level of the differential signals in the first several stages are not equal because of the single-ended to differential conversion. To obtain a genuine differential signal, additional buffer stages are required. For easy hand calculation, assume that the feedback stage does not introduce extra loading on the single-stage cell $A_v(s)$. Then the transfer function of the third-order stage with active feedback can be expressed as Equation 4.6. It has a dc gain of $\frac{A_0^3}{1+A_0^2 G_f R_L}$, one non-dominant real pole $p_1$ expressed in Equation 4.7 and two dominant complex conjugate poles $p_{2,3}$ expressed in Equation 4.8. From the s-plane pole locations illustrated in Figure 4.9, it can be observed graphically that the active feedback technique extends the bandwidth of the third-order system compared to the case without active feedback. The $f_{-3dB}$ increases with the feedback gain, but at the expense of more dc gain. The physical layout routing for the feedback cell $G_f$ introduces some systematic mismatch. The interleaving active feedback proposed in [52] would complicate the layout design and is therefore avoided. Furthermore, the frequency peaking effect needs to be carefully examined with extracted-layout simulation.

Figure 4.8: Bandwidth improvement factor and stage dc gain versus the number of stages for an achievable gain-bandwidth product of 20 GHz (20×1 GHz).

\[
H(s) = \frac{A_0^3}{\left(1 + \frac{s}{\omega_p}\right)^3 + A_0^2 G_f R_L} \tag{4.6}
\]

\[
p_1 = -\omega_p \left(1 + \sqrt[3]{A_0^2 G_f R_L}\right) \tag{4.7}
\]

\[
p_{2,3} = -\omega_p \left(1 - \frac{1}{2}\sqrt[3]{A_0^2 G_f R_L} \pm j\frac{\sqrt{3}}{2} \sqrt[3]{A_0^2 G_f R_L}\right) \tag{4.8}
\]
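The pole locations can be checked numerically against the closed-loop denominator. The following is a minimal sketch, assuming the denominator of the third-order stage is $(1 + s/\omega_p)^3 + A_0^2 G_f R_L$ with three identical single-pole cells; the values of $A_0$, $G_f R_L$ and $\omega_p$ are hypothetical and chosen only for illustration.

```python
import math

def active_feedback_poles(a0, gf_rl, omega_p):
    """Closed-form poles: one real pole and a complex conjugate pair."""
    k = (a0 ** 2 * gf_rl) ** (1.0 / 3.0)   # cube root of the feedback term
    p1 = complex(-omega_p * (1.0 + k), 0.0)
    p2 = -omega_p * complex(1.0 - k / 2.0, math.sqrt(3.0) / 2.0 * k)
    p3 = p2.conjugate()
    return p1, p2, p3

def closed_loop_denominator(s, a0, gf_rl, omega_p):
    """Denominator assumed for the third-order stage with active feedback."""
    return (1.0 + s / omega_p) ** 3 + a0 ** 2 * gf_rl

def dc_gain(a0, gf_rl):
    """DC gain of the stage, A0^3 / (1 + A0^2 * Gf * RL)."""
    return a0 ** 3 / (1.0 + a0 ** 2 * gf_rl)

# Hypothetical values: cell gain 2, feedback factor Gf*RL = 0.1, normalized pole.
A0, GF_RL, OMEGA_P = 2.0, 0.1, 1.0

for p in active_feedback_poles(A0, GF_RL, OMEGA_P):
    residual = abs(closed_loop_denominator(p, A0, GF_RL, OMEGA_P))
    print(f"pole {p:.4f}, |denominator| = {residual:.1e}")
print(f"dc gain with feedback: {dc_gain(A0, GF_RL):.3f} (open loop: {A0 ** 3:.3f})")
```

The residuals confirm that each closed-form pole is a root of the assumed denominator, and the printed dc gain shows how much gain is given up for the bandwidth extension.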

4.1.2.2 DC Offset Compensation

Converting the single-ended signal to a differential signal requires a dc offset compensation loop, which senses the common-mode voltages of the LA's outputs and feeds them back to its input in negative feedback. The purpose of the negative feedback is to reduce the offset voltage of the system and to provide a stable dc bias close to the average voltage of the TIA output against PVT variations. Figure 4.10 illustrates the function. Equation 4.9 gives the relationship between the gains and the offset voltages in the loop. Equation 4.10 is the equivalent input-referred offset voltage. It can be seen that the offset voltages present at the inputs of A1 and A2 are reduced by the loop gain and the forward gain of the system, respectively. Thus the offset of the feedback opamp A2 has to be designed to be small. To ensure loop stability, the RC time constant needs to be large enough to place the dominant pole at a very low frequency and provide a stable phase margin. The low cutoff frequency point (P2) is set by the RC and depends on the gains of the A1 and A2 stages [48]. An on-chip capacitor results in a large area, whereas an off-chip capacitor has a bondwire inductance in series with the terminating capacitor. The loop stability needs to be carefully simulated across PVT variations, taking the effect of the bondwire into account. The higher the loop gain, the smaller the static error between the feedback node voltage and the average input voltage and, in turn, the more symmetrical the LA's differential outputs. Thus, a folded-cascode gain-boosted opamp (Figure 4.11) with a dc gain of 65 dB is used as the feedback opamp (A2) [53]. Amplifiers $G_t$ and $G_b$ with a compact CMFB scheme are used for gain-boosting [54]. In this design, P2 is set below 10 MHz. The RC needs to be large because of the large loop gain.

\[
[(V_{os1} - V_{cm}) A_1 + V_{os2}] A_2 = V_{cm} \tag{4.9}
\]

\[
V_{offset} - V_{cm} \approx \frac{V_{os1}}{A_1 A_2} + \frac{V_{os2}}{A_1} \tag{4.10}
\]

4.1.2.3 Large-Signal in Limiting Region

Depending on the magnitude of the input signal, once it has propagated to a certain stage in the limiting amplifier, the tail current in the CML stages is fully steered to one side, so the voltage swing is limited to the tail current times the load resistance. The rise time of the output voltage is mainly determined by the RC time constant. This delay depends on the charging of the load capacitor through the resistors to the supply rail, and thus the total capacitance contributed by the current stage and the next stage should be minimized. The fall time of the output is set by the discharging of the load capacitor, during which the transistor transitions from the sub-threshold region to the saturation region (when the amplitude is large, it enters the triode region once the gate voltage exceeds the drain voltage plus $V_{th}$), with the discharge current approaching the tail current [16]. The bandwidth-limited signal in the linear region gets sharpened in the limiting region. As shown in Figure 4.12, when a 2.5 GHz 150 $mV_{PP}$ sinusoidal signal propagates through the LA stages, it is first amplified in the linear region and then sharpened in the limiting region. Signals with slow rise/fall times are more susceptible to additive noise, so sharper transition edges are desirable, even though the signal is sharpened by the later limiting stages in the LA [28].

Figure 4.10: Illustration of dc offset compensation.

4.1.3 Output Buffer

Signal integrity, electro-migration and ESD protection are the main design considerations for the output buffer. The signals arriving at its inputs are already large-amplitude limited signals. The tail current of the output buffer is set by the required output signal amplitude and the termination resistance. As shown in Figure 4.13, 1.2 V nfets are used as the high-speed switching devices. To protect the output devices from ESD damage, a 3.3 V thick-gate-oxide nfet is cascoded on each 1.2 V nfet. In addition, a ballasting resistor (Rd) is added in series with the drain of each unit finger of the 3.3 V nfet as a current limiter. The unit width and total width of the nfet should be properly sized for ESD safety and laid out carefully.

Figure 4.12: A 2.5 GHz 150 mV_{PP} sinusoidal signal is amplified and takes on a more NRZ-like waveform with sharpened edges as it travels along the LA chain.

On-chip resistors with a nominal value of 50 Ω are used to reduce reflections from the transmission line on the PCB. The line is usually terminated with 50 Ω to ground at the equipment side (e.g., the oscilloscope). The required tail current can be estimated as the desired output amplitude divided by 25 Ω, which also sets the dc operating point of the CML buffer. By default, the dc tail current is set to 10 mA in the current mirror. However, some current mismatch is expected because the $V_{ds}$ values of the current mirror devices are not equal. To reduce the $V_{gs}$ drop of the switching devices ($M_{1,2}$) and keep $M_4$ in saturation, low-$V_T$ nFETs are adopted for $M_{1,2}$. A minimum metal width is required to meet the electro-migration reliability requirements.
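The tail-current estimate can be sketched numerically. The snippet below is a minimal illustration, assuming an ideal CML stage whose 50 Ω on-chip load appears in parallel with the 50 Ω equipment termination for the signal swing; the function names are hypothetical and device non-idealities are ignored.

```python
def effective_load_ohms(r_onchip=50.0, r_term=50.0):
    """Parallel combination of the on-chip load and the far-end termination."""
    return r_onchip * r_term / (r_onchip + r_term)


def tail_current_for_swing(v_swing, r_onchip=50.0, r_term=50.0):
    """Tail current (A) needed for a single-ended swing v_swing (V)."""
    return v_swing / effective_load_ohms(r_onchip, r_term)


def swing_for_tail_current(i_tail, r_onchip=50.0, r_term=50.0):
    """Single-ended swing (V) produced by a fully steered tail current (A)."""
    return i_tail * effective_load_ohms(r_onchip, r_term)


# Default setting quoted in the text: 10 mA tail current.
print(effective_load_ohms())            # 25.0 ohms
print(swing_for_tail_current(10e-3))    # 0.25 V under the ideal assumptions
```

Under these assumptions the default 10 mA setting corresponds to a single-ended swing of about 250 mV.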

![Figure 4.13: The schematic of the level shifted output buffer.](image)

Figure 4.14 shows the simulated eye diagram of the OB driving a 50 Ω load through a dc blocker. In the lab measurement, this signal goes not only to the sampling oscilloscope for eye measurement but also to the BERT for bathtub and sensitivity testing. Multiple jitter sources contribute to the total jitter (TJ) in the eye diagram of the OB. The distribution of TJ is the convolution of the distributions of random jitter (RJ) and deterministic jitter (DJ). Thus, in practice the probability density function (PDF) of TJ is not Gaussian, as illustrated in Figure 4.14. A tail-fitting technique can be used to construct a Gaussian distribution that matches the tails of the actual PDF [55]. The bathtub plot is obtained by building the cumulative distribution function (CDF) of the jitter on the left and right data edges. It shows the probability of error versus the sampling point and is used to estimate the eye opening at very low BER levels. Similarly, the BER can be characterized in the magnitude domain instead of the time domain. This is shown in the experimental results section.
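The construction of a bathtub curve from jitter distributions can be illustrated with a short sketch. It assumes a dual-Dirac model (DJ as two equal impulses convolved with Gaussian RJ), which is a common simplification and not necessarily the tail-fitting procedure of [55]. The RJ and DJ values and the transition density of 0.5 are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bathtub_ber(t_ui, rj_rms_ui, dj_pp_ui, transition_density=0.5):
    """BER at sampling phase t_ui (0..1 UI) for a dual-Dirac jitter model."""
    half_dj = dj_pp_ui / 2.0
    ber = 0.0
    # Each data edge is a mixture of two Gaussians centered at +/- DJ/2.
    for offset in (-half_dj, half_dj):
        # Left edge (nominally at 0 UI) crossing later than the sampling point.
        ber += 0.5 * q_function((t_ui - offset) / rj_rms_ui)
        # Right edge (nominally at 1 UI) crossing earlier than the sampling point.
        ber += 0.5 * q_function(((1.0 - t_ui) + offset) / rj_rms_ui)
    return transition_density * ber


def eye_opening_ui(rj_rms_ui, dj_pp_ui, target_ber=1e-12, steps=2000):
    """Fraction of the UI where the modeled BER stays at or below target_ber."""
    open_points = sum(
        1 for k in range(steps + 1)
        if bathtub_ber(k / steps, rj_rms_ui, dj_pp_ui) <= target_ber
    )
    return open_points / steps


# Hypothetical jitter budget: 0.02 UI rms RJ and 0.2 UI peak-to-peak DJ.
print(f"eye opening at 1e-12: {eye_opening_ui(0.02, 0.2):.3f} UI")
```

Sweeping `t_ui` from 0 to 1 and plotting `bathtub_ber` on a logarithmic axis gives the familiar bathtub shape.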

![Figure 4.14: Post layout simulated eye diagram of the signal coming out from the output buffer at 4 Gb/s.](image)

### 4.2 Experimental Results

To represent a complete optical link, a continuous-wave (CW) laser, a commercial LN ($LiNbO_3$) Mach-Zehnder modulator (MZM) device and a discrete broadband driver are used to assist the electro-optical testing. As shown in Figure 4.15 (a), the CMOS die fabricated in the IBM 130 nm CMOS process and the commercial PD die ($500\mu m \times 500\mu m$) are wire-bonded using the chip-on-board (COB) method. The bondwire gap between the TIA input pad and the PD anode pad is about $820 \mu m$. Figure 4.15 (b) shows the 4-layer PCB fabricated with Rogers RO4350B laminate. An ENEPIG surface finish is required for COB bonding. Parallel decoupling capacitors (100 pF, 1 µF, 10 µF) are used on the 1.2 V, 3.3 V and PD power supplies (no LDO is used). Grounded coplanar waveguide (GCPW) T-lines were designed to carry the signals out to the equipment. As indicated in Figure 4.15 (c), a single fiber probe is aligned on top of the PD. The differential outputs of the OB are connected to the sampling oscilloscope (Keysight DCA 86100D) and the BERT (Anritsu MP1800A), respectively, so that the waveform and eye diagram can be observed while the sensitivity is measured. A picture of the experimental test setup is shown in Figure 4.15 (d). The yellow fiber cable at the bottom right connects to the tunable CW laser source at the other side of the room. The laser wavelength is set to 1550 nm.

A detailed block diagram of the test setup is shown in Figure 4.16. The optical stimulus for the PD is generated by a commercial $LiNbO_3$ Mach-Zehnder modulator, which is driven by the Anritsu MP1800A PPG through a high-voltage-swing broadband driver. A polarization controller is required because non-polarization-maintaining single-mode patch cables are used.

### 4.2.1 Eye Measurement

The optical signal at the output of the MZM is recorded for later sensitivity analysis. Figure 4.17 shows the optical eye diagrams measured at the output of the MZM device with PRBS-31 at 4 Gb/s and 5 Gb/s, respectively, at different input laser powers. A total insertion loss of about 12.1 dB is observed from the laser to the MZM output. Note that the zero-level power shown in Figure 4.17 (a) and (b) goes down to negative values. This is because an uncalibrated high-bandwidth plug-in module (Keysight 86105D) was used for this optical measurement; a high-bandwidth module also integrates more noise. The insertion loss reduces to about 12 dB if the optical signal is offset upward by 4 $\mu W$. The actual extinction ratio (ER) is 17.8 dB and 16.5 dB at 4 Gb/s and 5 Gb/s, respectively, with 6 dBm laser power.

For the PD to achieve stable detection in the photoconductive mode, the PD power supply needs to be set above 5.6 V. Given a PD responsivity of 0.8 A/W, reading the MZM optical power level and the PD supply current gives an estimated coupling loss of 3.5 dB due to the fiber alignment. From this, the input current eye height that produces the electrical eye in Figure 4.18 is about 173 $\mu A$, 172 $\mu A$ and 170 $\mu A$ at 4 Gb/s, 4.5 Gb/s and 5 Gb/s, respectively. In Figure 4.18 (a)-(c), the data rate, rise/fall time, rms jitter, peak-to-peak jitter, eye height and SNR were recorded for one of the output buffer’s differential signals over a large number of counts, along with the eye diagram, using a PRBS-31 pattern. The receiver draws 39 mA from 1.2 V and 12 mA from 3.3 V, resulting in a total dc power consumption of 85 mW. The power consumption breakdown for the limiting receiver is plotted in Figure 4.19.

4.2.2 Bathtub and Sensitivity

One of the output buffer’s differential outputs was connected to the Anritsu MP1800A error detector. An automatic bathtub measurement was performed to measure the BER versus the internal CDR clock phase, as plotted in Figure 4.20. The bathtub plot is not smooth because the fiber alignment gradually drifts during the long measurement. At 4 Gb/s, the eye opening at BER = $10^{-12}$ is around 0.1 UI, while at 5 Gb/s, the eye opening at BER = $10^{-8}$ is around 0.2 UI.

Figure 4.17: Optical eye diagrams of the MZM output measured with PRBS-31 at 4 Gb/s with (a) 2 dBm and (c) 6 dBm laser power, and at 5 Gb/s with (b) 2.8 dBm and (d) 6 dBm laser power. The wavelength is set to 1550 nm.

Sensitivity is also characterized by manually sweeping the input laser power while recording the BER. Figure 4.21 plots the BER versus the MZM average power (top x-axis) and the optical modulation amplitude (OMA, bottom x-axis). OMA is the difference between the optical powers of the one and zero levels, expressed here in dBm. When the receiver operates at 4 Gb/s with a PRBS-31 pattern, a BER of $10^{-12}$ is achieved at a sensitivity of -6.2 dBm average power and -3.2 dBm OMA at the MZM output (estimated OMA ≈ -6.7 dBm after coupling to the PD).
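The relation between average power, extinction ratio, OMA and photodiode current can be checked with a few lines of arithmetic. This is a sketch only: it assumes an ideal two-level NRZ signal with equal one and zero probabilities, and the helper names are hypothetical. The 17.8 dB ER used below was reported at 6 dBm laser power, so agreement with the quoted OMA at the sensitivity point is expected to be approximate.

```python
import math


def dbm_to_mw(p_dbm):
    """Convert power in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def mw_to_dbm(p_mw):
    """Convert power in milliwatts to dBm."""
    return 10.0 * math.log10(p_mw)


def oma_dbm_from_average(p_avg_dbm, er_db):
    """OMA = P1 - P0 = 2 * P_avg * (ER - 1) / (ER + 1), with ER = P1 / P0."""
    er = 10.0 ** (er_db / 10.0)
    oma_mw = 2.0 * dbm_to_mw(p_avg_dbm) * (er - 1.0) / (er + 1.0)
    return mw_to_dbm(oma_mw)


def pd_current_ua(oma_dbm, coupling_loss_db, responsivity_a_per_w):
    """Peak-to-peak photocurrent (uA) after the fiber-to-PD coupling loss."""
    oma_at_pd_mw = dbm_to_mw(oma_dbm - coupling_loss_db)
    return responsivity_a_per_w * oma_at_pd_mw * 1e3  # mW * A/W = mA, then uA


# Values quoted in the text: -6.2 dBm average power, 17.8 dB ER,
# -3.2 dBm OMA at the MZM output, 3.5 dB coupling loss, 0.8 A/W responsivity.
print(f"OMA from average power: {oma_dbm_from_average(-6.2, 17.8):.2f} dBm")
print(f"PD current swing: {pd_current_ua(-3.2, 3.5, 0.8):.0f} uA")
```

With these inputs the computed OMA is within about 0.2 dB of the quoted -3.2 dBm, and the OMA of -6.7 dBm at the PD corresponds to a current swing of roughly 171 µA.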
4.3 Summary

A limiting receiver was designed and fabricated in the IBM 130 nm CMOS process. It was wire-bonded to a commercial discrete PD. A complete electro-optical link was demonstrated with a discrete high-swing broadband driver and a discrete LN MZM device as the transmitter. A BER of $10^{-12}$ at 4 Gb/s is achieved with a TIA bandwidth of $0.46 \times B$ and an LA bandwidth of $2 \times B$ in the receiver’s linear region. Table 4.2 compares several OE limiting receivers fabricated in 130 nm CMOS processes. It can be observed that using an integrated PD or flip-chip bonding to eliminate the input bondwire helps to improve the performance. Several other lessons can be drawn from this design: 1) Bondwire effects in the side-by-side bonding option and the overload current effect need to be co-simulated with the extracted layout. 2) To reduce the parasitic effects introduced by the metal routing, the TIA block can be placed close to the input pad. No latch-up issue is present since there are only nfets and no large switching currents in the TIA. ESD protection devices at the TIA input can be waived for high-speed applications. 3) The later stages of the LA, working in the limiting region, also act as pre-driver stages for the output buffer. 4) Other important issues such as signal integrity and power integrity, including the T-line design on the PCB, the selection of SMA connectors and power supply noise in different frequency bands, are key to the overall system performance.
Table 4.2: Comparison of the optoelectronic RX fabricated in 130 nm (SOI) CMOS process.

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th>This work</th>
</tr>
</thead>
<tbody>
<tr>
<td>RX topology</td>
<td>TIA+LA+CDR</td>
<td>TIA+EQ+LA+OB</td>
<td>TIA+EQ+LA+OB</td>
<td>TIA+LA+OB</td>
</tr>
<tr>
<td>TIA topology</td>
<td>Shunt feedback CS</td>
<td>Diff. shunt feedback</td>
<td>Diff. shunt feedback</td>
<td>Shunt feedback CS</td>
</tr>
<tr>
<td>Feedback resistance</td>
<td>310 $\Omega$</td>
<td>5 $k\Omega$ (PMOS)</td>
<td>4 $k\Omega$</td>
<td>768 $\Omega$</td>
</tr>
<tr>
<td>On-chip inductor</td>
<td>Yes</td>
<td>No</td>
<td>No</td>
<td>No</td>
</tr>
<tr>
<td>PD type</td>
<td>PIN flip-chip</td>
<td>CMOS Integrated</td>
<td>CMOS Integrated APD</td>
<td>Hybrid InGaAs/InP PIN wire-bond</td>
</tr>
<tr>
<td>PD capacitance</td>
<td>150 $fF$</td>
<td>1 $pF$</td>
<td>NA</td>
<td>100 $fF$</td>
</tr>
<tr>
<td>Optical wavelength</td>
<td>1550 nm</td>
<td>850 nm</td>
<td>850 nm</td>
<td>1550 nm</td>
</tr>
<tr>
<td>Max. data rate</td>
<td>10 Gb/s</td>
<td>4.5 Gb/s</td>
<td>10 Gb/s</td>
<td>4 Gb/s</td>
</tr>
<tr>
<td>Sensitivity</td>
<td>-19.5 dBm$^*$</td>
<td>-3.8 dBm$^*$</td>
<td>-4 dBm$^*$</td>
<td>-6.7 dBm (-6.2 dBm$^*$)</td>
</tr>
<tr>
<td>Measured method</td>
<td>Packaged</td>
<td>NA</td>
<td>Die with RF probing</td>
<td>COB with SMA and RF cable</td>
</tr>
</tbody>
</table>

Note: * Average input power is measured as sensitivity for BER=$10^{-12}$. Otherwise, it is characterized with OMA.
Figure 4.18: (a)-(c) Electrical eye diagrams of a single-ended output of the limiting receiver measured with PRBS-31 from 4 Gb/s to 5 Gb/s. (d) Oscilloscope mode at 5 Gb/s with PRBS-7 pattern.
Figure 4.19: Power consumption breakdown of the limiting receiver.

Figure 4.20: BER bathtub measurement of the limiting receiver with PRBS-31 operating from 4 Gb/s to 5 Gb/s.
Figure 4.21: Sensitivity plot of the limiting receiver with PRBS-31 operating from 4 Gb/s to 5 Gb/s, BER versus MZM OMA and average power.
CHAPTER 5

A 10 GHZ PHASE LOCK LOOP DESIGN

An on-chip phase-locked loop (PLL) that provides high-precision clocking is indispensable for data generation and synchronization in any transceiver. This is especially important in high-frequency, high-speed applications because feeding high-frequency clocks from off-chip is challenging. Higher-speed data transfer is increasingly in demand because of bandwidth-hungry applications such as video streaming, online gaming and cloud computing. In this chapter, a systematic design flow for a 10 GHz type-II third-order charge pump PLL in the IBM 130 nm SiGe BiCMOS process is presented along with experimental results. The PLL provides the system clock for the > 10 Gb/s full-rate pseudo-random binary sequence (PRBS), which will be introduced in the next chapter. As semiconductor technology advances, the channel length of the devices shrinks for higher speed and density. On the other hand, the gate leakage of the MOS capacitors used in the traditional analog loop filter increases as the gate oxide becomes thinner, which can cause instability in an analog PLL. Low-voltage charge pump design also poses challenges in < 100 nm technologies due to large device mismatches and stringent voltage headroom requirements. Digital PLLs adopt a digital loop filter, which solves the leakage problem, and can replace analog PLLs in < 100 nm processes [57]. However, in this 130 nm SiGe BiCMOS process, thick-gate-oxide MOS devices and a 2.5 V power supply are used, so the above-mentioned issues in analog PLL design are not a problem.

To perform the PLL loop stability analysis, the VCO characteristic for the specific topology and process must be available beforehand. The VCO is therefore characterized right after the PLL architecture is defined. This is followed by the system loop stability analysis, which is used to choose appropriate loop filter parameters. Finally, a noise budget analysis is performed to predict the noise contribution of each block.

5.1 PLL Architecture

![Figure 5.1: Schematic of the proposed type-II third-order PLL architecture.](image)

The proposed type-II third-order PLL architecture is shown in Figure 5.1. An LC VCO is chosen for its high resonance frequency and better phase noise performance. It also suits narrow-tuning-range applications, which is the case here. A divide-by-128 divider chain is used in the feedback loop, so the required reference clock frequency is the PLL output clock frequency divided by 128. A resistive positive feedback
An inverter-based input buffer takes the reference clock from an external source. A fully differential charge pump with an opamp that reduces the transient current mismatch is designed. The PLL is designed with SiGe HBTs as well as MOS fets with a 2.5 V power supply.

### 5.2 LC VCO Design

The LC VCO schematic is shown in Figure 5.2. The resonance frequency, expressed in Equation 5.1, depends on the inductance and the total capacitance. Usually, $L$ should be kept small to leave more tuning flexibility to the capacitor banks and to obtain a smaller $K_{VCO}$, as indicated in Equation 5.2. A lower $K_{VCO}$ is desirable to reduce the VCO phase noise due to spurs. The MOS varactor capacitance $C_{var}$ should be larger than or comparable to the load capacitance $C_L$. The MOS varactor is made of an N+ polysilicon gate over an n-well with a 5.2 nm gate oxide and operates between depletion and accumulation. It has a tuning voltage range of 1 V to -0.5 V and a capacitance range of 2.8:1. The unit area capacitance is about $6 \, fF/\mu m^2$ at 1.25 V. However, none of these passive devices is ideal in practice; they have series resistance, which needs to be canceled by a negative resistance. The negative resistance is realized by an npn BJT and pFET double cross-coupled pair. With the same bias current, in theory, the differential amplitude of the double cross-coupled pair is twice that of its single cross-coupled counterpart, so the double cross-coupled topology has a 6 dB phase noise improvement [58]. The tail current is made tunable to keep the oscillator out of the voltage-limited mode. The post-layout simulated single-ended peak-to-peak amplitude of the oscillator is about 600-890 mV across process and temperature variations. Since the oscillator works in the large-signal regime, one side of the npn BJT and pfet double cross-coupled pair will enter the reverse-active region and the linear region, respectively. This can incur a severe phase noise penalty. Design techniques are proposed in [59, 60] to shift down the base or gate voltage so that the oscillator operates in Class-C mode. However, this requires an extra bias voltage. For layout simplicity, a Class-B oscillator topology is used in this design. A pfet current mirror is used for the oscillator’s current bias because it has less $1/f$ noise up-conversion than its nfet counterpart. A large resistor and a pfet capacitor are added to filter out the noise introduced by the current source and the bandgap reference.

Figure 5.2: Schematic of the LC VCO. Control bits $C<2:0>$ control the capacitor bank for discrete coarse frequency tuning, $C_0 = 66$ fF.
\[
\omega_0 = \frac{1}{\sqrt{\frac{L}{2}\left(C_{var} + \sum_{n=0}^{2} V_C[n]\,2^n C_0 + C_L\right)}} \tag{5.1}
\]

\[
K_{VCO} = \frac{d}{dV}\left(\frac{\omega_0}{2\pi}\right) = -2\pi^2 f^3 L \frac{dC_{var}}{dV} \text{ (Hz/V)} \tag{5.2}
\]
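Equation 5.1 can be evaluated for each capacitor bank code with a short script. This is an illustrative sketch: only $C_0 = 66$ fF comes from the text, while the inductance, load capacitance and the linearized varactor model are hypothetical values chosen so that the result lands in the neighborhood of the operating band. The tuning gain is estimated by a finite difference of Equation 5.1 rather than from a closed-form expression.

```python
import math

C0 = 66e-15  # unit capacitor of the bank (from Figure 5.2)


def vco_frequency_hz(l_henry, c_var, c_load, code, c_unit=C0):
    """Equation 5.1: code is the 3-bit capacitor-bank setting (0..7)."""
    c_bank = code * c_unit  # equals the sum over n of bit[n] * 2^n * C0
    c_total = c_var + c_bank + c_load
    return 1.0 / (2.0 * math.pi * math.sqrt(0.5 * l_henry * c_total))


def kvco_hz_per_v(l_henry, c_var_of_v, c_load, code, v, dv=1e-3):
    """Finite-difference slope df/dV for a user-supplied varactor model."""
    f_hi = vco_frequency_hz(l_henry, c_var_of_v(v + dv), c_load, code)
    f_lo = vco_frequency_hz(l_henry, c_var_of_v(v - dv), c_load, code)
    return (f_hi - f_lo) / (2.0 * dv)


# Hypothetical tank values (not taken from the design).
L_TANK = 400e-12
C_LOAD = 300e-15


def c_var_linear(v):
    """Hypothetical linearized varactor: capacitance (F) versus tuning voltage."""
    return 500e-15 - 100e-15 * v


for code in range(8):
    f = vco_frequency_hz(L_TANK, c_var_linear(0.0), C_LOAD, code)
    k = kvco_hz_per_v(L_TANK, c_var_linear, C_LOAD, code, 0.0)
    print(f"code {code}: f0 = {f / 1e9:.2f} GHz, Kvco = {k / 1e6:.0f} MHz/V")
```

The loop shows the expected trend: each additional unit capacitor lowers the oscillation frequency, and the tuning gain shrinks as the fixed capacitance grows.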

Parasitic resistance and capacitance are detrimental to the LC tank resonance frequency. Even a small amount of resistance (< 1 Ω) has a profound effect on high-Q circuits such as resonators. A comparison of a bad layout and a good layout is given in Figure 5.3 (a) and (b), respectively. The inductor should be placed as close as possible to the cross-coupled transistors, and any capacitance introduced by the metal routing and the VCO buffer loading should be considered during the VCO design and simulation. The layout-extracted VCO characteristics at two extreme corners and one nominal corner are plotted in Figure 5.4. With the range of $K_{VCO}$ known, the charge pump current can be made digitally programmable to compensate for the $K_{VCO}$ variation.

Figure 5.3: Two different VCO layout examples.

Figure 5.4: Layout extracted simulation results of VCO characteristics.

5.3 Loop Stability Analysis

The corresponding PLL model is illustrated in Figure 5.5 with individual noise sources added for later noise analysis. The loop transfer function of type-II third-order PLL can be expressed in Equation 5.3.

$$L(s) = \frac{I_{CP}}{2\pi} \left[ \left( R_1 + \frac{1}{sC_1} \right) \parallel \frac{1}{sC_2} \right] \frac{2\pi K_{VCO}}{s} \frac{1}{N} = \frac{I_{CP}K_{VCO}}{(C_1 + C_2)N} \frac{1 + \frac{s}{\omega_Z}}{s^2(1 + \frac{s}{\omega_P})} \quad (5.3)$$

where $\omega_Z = 1/(R_1C_1)$ and $\omega_P = (C_1 + C_2)/(R_1C_1C_2)$. The unit of $K_{VCO}$ is Hz/V. Let $b = C_1/C_2$, then $\omega_P = (1 + b)\omega_Z$. In the 3$^{rd}$-order system, two poles are located at the origin, so the zero must be placed below the unity-gain loop bandwidth \( (\omega_{u,\text{loop}}) \) to provide phase margin. Thus we have \( \omega_Z < \omega_{u,\text{loop}} < \omega_P < \omega_{\text{ref}} \), where \( \omega_{\text{ref}} \) is the reference angular frequency. By letting \( c = \omega_{u,\text{loop}}/\omega_Z \) and using the trigonometric identity
\[
\tan^{-1} A - \tan^{-1} B = \tan^{-1}\left( \frac{A-B}{1+AB} \right),
\]
the phase margin (\( \varphi \)) of \( L(s) \) is given by Equation 5.4.

\[
\varphi = \tan^{-1}\left( \frac{\omega_{u,\text{loop}}}{\omega_Z} \right) - \tan^{-1}\left( \frac{\omega_{u,\text{loop}}}{\omega_P} \right) = \tan^{-1}\left( \frac{bc}{1 + b + c^2} \right) \tag{5.4}
\]

Unity loop bandwidth over zero (defined as \( c \)) versus the capacitor ratio in the loop filter (defined as \( b \)) is plotted in Figure 5.6 when the phase margin (\( \varphi \)) is 65°. In this case, it can be observed that \( C_1 \) has to be more than 18 times larger than \( C_2 \). Typically, \( c \) is set in the range of 6 to 10 [61].
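Equation 5.4 is easy to check numerically. The sketch below evaluates both the two-arctangent form and the closed form, using $b = 25.57$ and $c = 9$, the values listed with Table 5.1; the function names are hypothetical.

```python
import math


def phase_margin_deg(b, c):
    """Closed form of Equation 5.4 with b = C1/C2 and c = w_u,loop / w_Z."""
    return math.degrees(math.atan(b * c / (1.0 + b + c * c)))


def phase_margin_direct_deg(b, c):
    """Same quantity from the two arctangent terms, using w_P = (1 + b) * w_Z."""
    return math.degrees(math.atan(c) - math.atan(c / (1.0 + b)))


pm_closed = phase_margin_deg(25.57, 9.0)
pm_direct = phase_margin_direct_deg(25.57, 9.0)
print(f"closed form: {pm_closed:.2f} deg, direct: {pm_direct:.2f} deg")
```

Both forms agree and give a phase margin of approximately 65° for this $(b, c)$ pair.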
Next, let’s define $\omega_{\text{ref}} = a \omega_{u,\text{loop}}$. The factor $a$ is set to more than 10 for two main reasons: 1) to allow the VCO tuning voltage ($V_C$) to be approximated as continuous in time so that the PLL model can be linearized; 2) to filter out the periodic disturbance caused by reference clock feed-through. However, if the reference clock frequency is high, $C_2$ may become so small that it is comparable to the parasitic capacitance, and the noise performance is also sensitive to the loop bandwidth. As a rule of thumb, for this application it is better to set $f_{u,\text{loop}}$ below 1 MHz from a noise perspective. In this design, $f_{u,\text{loop}} = 500$ kHz is chosen to give enough margin.

$C_2$ can be solved by substituting $\omega_{u,\text{loop}}$ into the loop magnitude function as in Equation 5.5. Once $C_2$ is known, $C_1$ and $R_1$ can be easily derived from the $b$ coefficient and $\omega_Z$.

**Figure 5.6:** Plot of unity loop bandwidth over zero versus capacitor ratio in the loop filter for 65° phase margin.
\[ C_2 = \frac{I_{CP} K_{VCO}}{N} \frac{a^2 \sqrt{1 + c^2}}{\omega_{\text{ref}}^2 \sqrt{(1 + b)^2 + c^2}} \tag{5.5} \]

Thus far, \( I_{CP} \) and \( K_{VCO} \) seem to be the most important design variables to determine the loop filter parameters. As long as \( I_{CP} K_{VCO} \) is kept as a constant, \( C_2 \) will be fixed. It’s evident that the larger the \( K_{VCO} \) or \( I_{CP} \), the larger the capacitance and the smaller the resistance will be required in the loop filter.
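The procedure of solving $C_2$ from Equation 5.5 and then deriving $C_1$ and $R_1$ can be sketched as follows. Since $\omega_{\text{ref}} = a\,\omega_{u,\text{loop}}$, the factor $a^2/\omega_{\text{ref}}^2$ is written as $1/\omega_{u,\text{loop}}^2$. The inputs ($f_{u,\text{loop}} = 500$ kHz, $N = 128$, $b = 25.57$, $c = 9$ and the $I_{CP}$, $K_{VCO}$ pairs) are taken from the text and Table 5.1; the function name is hypothetical.

```python
import math


def loop_filter(i_cp, k_vco_hz_per_v, n_div, f_u_hz, b, c):
    """Return (C1, C2, R1) from Equation 5.5, with b = C1/C2 and c = w_u/w_Z."""
    w_u = 2.0 * math.pi * f_u_hz
    c2 = (i_cp * k_vco_hz_per_v / n_div) * math.sqrt(1.0 + c * c) / (
        w_u ** 2 * math.sqrt((1.0 + b) ** 2 + c * c)
    )
    c1 = b * c2            # from the definition b = C1 / C2
    w_z = w_u / c          # from the definition c = w_u / w_Z
    r1 = 1.0 / (w_z * c1)  # from w_Z = 1 / (R1 * C1)
    return c1, c2, r1


# (I_CP in A, K_VCO in Hz/V) pairs listed in Table 5.1.
for i_cp, k_vco in ((360e-6, 100e6), (100e-6, 400e6), (60e-6, 700e6)):
    c1, c2, r1 = loop_filter(i_cp, k_vco, 128, 500e3, 25.57, 9.0)
    print(f"C1 = {c1 * 1e12:.2f} pF, C2 = {c2 * 1e12:.2f} pF, R1 = {r1 / 1e3:.2f} kOhm")
```

The three computed rows can be compared directly with the component values listed in Table 5.1.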

### 5.4 Circuit Block Design

In this section, the design of the circuit blocks, including the phase frequency detector (PFD), charge pump, loop filter and feedback dividers, is described.

#### 5.4.1 Phase Frequency Detector

A common linear PFD with resettable DFFs is sufficient to meet the design requirement since the operating frequency is less than 100 MHz. Its schematic and state diagram are illustrated in Figure 5.7 and Figure 5.8, respectively. The PFD characteristic is ideally linear over the entire range of input phase difference from \(-2\pi\) to \(2\pi\). Latches are added at the differential outputs to improve the rise and fall times so that the crossing point can be adjusted to half of VDD. To avoid a dead zone in the PFD+CP+LF path, a minimum delay must be ensured in the reset path so that the outputs have enough pulse width to turn on the switching fets in the charge pump. Too large a delay, however, reduces the detectable phase range and the PFD operating frequency.
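The behavior of a PFD built from two resettable DFFs can be captured by a small state machine. The sketch below is an idealized behavioral model, not the circuit of Figure 5.7: it ignores the reset-path delay discussed above and treats the inputs as a time-ordered list of rising edges. The function name and edge labels are hypothetical.

```python
def pfd_states(edges):
    """Return the (UP, DN) state after each rising edge.

    edges is a time-ordered sequence of "ref" or "div" strings, each
    marking a rising edge of the reference or the divided VCO clock.
    """
    up = dn = False
    trace = []
    for edge in edges:
        if edge == "ref":
            up = True       # reference edge sets the UP flip-flop
        elif edge == "div":
            dn = True       # divider edge sets the DN flip-flop
        else:
            raise ValueError(f"unknown edge: {edge!r}")
        if up and dn:
            up = dn = False  # AND-gate reset (reset delay ignored here)
        trace.append((up, dn))
    return trace


# Reference leading the divider: UP pulses, DN stays low.
print(pfd_states(["ref", "div", "ref", "div"]))
# Divider running faster than the reference: DN stays asserted.
print(pfd_states(["div", "div", "ref"]))
```

In the real circuit the reset delay makes both outputs high for a short time on every cycle, which is what removes the dead zone.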
5.4.2 Charge Pump and Passive Loop Filter

On one hand, the smaller the charge pump current, the more noise it contributes to the PLL output phase noise. On the other hand, the larger the current, the larger the capacitance required in the loop filter. Figure 5.9 shows the schematic of the charge pump. By using differential switches (M1-M4) to steer the current, the up and down current sources are always kept in the saturation region. The complementary node $v_{cm}$ of $v_{ctrl}$ is held by a buffer to reduce the transient current mismatch (charge sharing) [62]. An opamp with wide input and output common-mode ranges is desired to cover the valid $v_{ctrl}$ range. Replica bias and large device dimensions are used in the current mirrors to reduce systematic and random process mismatch, respectively.

Charge pump currents are made controllable to compensate for the $K_{VCO}$ variation and stabilize the system. With the simulated $K_{VCO}$ values given in Figure 5.4 and Equation 5.5, possible loop filter parameters can be calculated, as listed in Table 5.1. On-chip capacitors and resistors have large process variations, so the loop filter needs to be designed with a certain tunability. Table 5.2 and Figure 5.10 characterize the resistor and the capacitors at three corner conditions. The variations of the resistor, the MIM capacitor and the 2.5 V nfet MOS capacitor can be as much as 37%, 34% and 6.3%, respectively. MIM capacitors are preferred in the LC VCO because of their better quality factor and because they are not voltage dependent, while the MOS capacitor, with less process variation, is better suited to the loop filter. The resistor can be made digitally programmable to cover the large process and temperature variations. The schematic is shown in Figure 5.11. The transmission gates need to be large enough that their on-resistance is negligible compared with the poly resistance in parallel.
Table 5.1: Loop filter parameters when $f_{VCO} = 11$ GHz, $N = 128$, $b = 25.57$, $c = 9$, $PM = 65^\circ$.

<table>
<thead>
<tr>
<th>$I_{CP}$</th>
<th>$K_{VCO}$</th>
<th>$C_1$</th>
<th>$C_2$</th>
<th>$R_1$</th>
</tr>
</thead>
<tbody>
<tr>
<td>360 $\mu$A</td>
<td>100 MHz/V</td>
<td>235.21 pF</td>
<td>9.2 pF</td>
<td>12.18 k$\Omega$</td>
</tr>
<tr>
<td>100 $\mu$A</td>
<td>400 MHz/V</td>
<td>261.34 pF</td>
<td>10.22 pF</td>
<td>10.96 k$\Omega$</td>
</tr>
<tr>
<td>60 $\mu$A</td>
<td>700 MHz/V</td>
<td>274.44 pF</td>
<td>10.73 pF</td>
<td>10.44 k$\Omega$</td>
</tr>
</tbody>
</table>

Table 5.2: Simulated opppcrees resistor and MIM capacitor characteristics at three corners in IBM8HP process.

<table>
<thead>
<tr>
<th>Device</th>
<th>$W/L$</th>
<th>$+3\sigma$ @ $0^\circ C$</th>
<th>nominal @ $27^\circ C$</th>
<th>$-3\sigma$ @ $80^\circ C$</th>
</tr>
</thead>
<tbody>
<tr>
<td>opppcrees</td>
<td>0.8$\mu$/2.1$\mu$</td>
<td>1183.5 $\Omega$</td>
<td>1000.9 $\Omega$</td>
<td>818.2 $\Omega$</td>
</tr>
<tr>
<td>MIM cap</td>
<td>10$\mu$/10$\mu$</td>
<td>120.5 $fF$</td>
<td>102.5 $fF$</td>
<td>86.04 $fF$</td>
</tr>
</tbody>
</table>

Figure 5.10: Simulated MOS capacitor characteristics with varying gate voltage at three corners.
5.4.3 Frequency Divider

As illustrated in Figure 5.1, CML DFFs and true single-phase-clock (TSPC) dynamic flip-flops [63] are used for the high-frequency and lower-frequency dividers, respectively. High-frequency divide-by-2 circuits require a minimum input amplitude at a given operating frequency, which can be characterized with a divider sensitivity curve [64]. The schematic of the CML divide-by-2 circuit is shown in Figure 5.12. AC coupling is used for the clock inputs so that the dc operating points are independent of the input signals. HBT devices are used for the amplification devices (B1-B2) and the regeneration devices (B3-B4), while NMOS fets (M1-M2) are used for the clocking switches because they are easier to bias under a relatively large tail current. The source follower stage (B5-B6), acting as a level shifter, provides better dc operating points for the BJTs to stay in the forward-active region, at the cost of more current consumption.

Figure 5.12: Schematic of the CML divide-by-2 circuit.

TSPC dividers feature low power consumption, but require a full swing input amplitude to be functional. Resistive feedback inverter with AC coupling can be used to convert the CML signal to CMOS signal. The TSPC divide-by-2 circuit as shown in Figure 5.13 is used for clock frequencies less than 1 GHz in this application.
5.5 PLL Phase Noise Analysis

Every block in the PLL adds noise to the system. In addition, the external reference clock and the supply voltage can contribute significant noise. To find the dominant noise source at the PLL output in different frequency bands, the noise of each block in the system is examined. The noise transfer function from each individual noise source to the PLL output is studied in closed-loop form. The final PLL output phase noise is the sum of the individual phase noise contributions, each filtered by its corresponding noise transfer function. The phase noise due to device thermal noise and flicker noise in each block is estimated with the Spectre simulator using periodic steady-state (pss) analysis and periodic noise (pnoise) analysis. The simulated noise data are collected and then post-processed with Matlab.

Figure 5.13: Schematic of the TSPC divide-by-2 circuit.
5.5.1 Noise Sources and Noise Transfer Function

The noise transfer functions from each noise source shown in Figure 5.5 to the PLL output are listed in Table 5.3. The reference, PFD and divider noise transfer functions are the same as the system closed-loop transfer function. A higher reference frequency can reduce the contribution of the PFD noise [65]. The charge pump noise transfer function has the same shape as that of the reference but is scaled by $\frac{2\pi}{I_{CP}}$, which means that a lower charge pump current contributes more noise. For the loop filter, the noise transfer function is further scaled by the loop filter impedance $Z(s)$, which results in a band-pass-like transfer function. VCO noise accumulates because of the integrating characteristic of the VCO transfer function. The transfer function for noise introduced after the VCO has a high-pass shape.

Table 5.3: Noise transfer functions from each noise source to the PLL output.

<table>
<thead>
<tr>
<th>NTF</th>
<th>Ref+Div</th>
<th>PFD+CP (A^-1)</th>
<th>LF</th>
<th>VCO</th>
</tr>
</thead>
<tbody>
<tr>
<td>PLL O/P</td>
<td>$\frac{NL(s)}{1+L(s)}$</td>
<td>$\frac{2\pi NL(s)}{I_{CP}(1+L(s))}$</td>
<td>$\frac{2\pi K_{VCO}}{s(1+L(s))}$</td>
<td>$\frac{1}{1+L(s)}$</td>
</tr>
</tbody>
</table>

The corresponding magnitude responses are plotted in Figure 5.14. It can be observed that all of the noise added to the PLL system can be attributed to two primary noise sources: detector noise and VCO noise [61]. Detector noise is considered to be the sum of white and spurious noise, and is composed of noise due to the reference, PFD and divider jitter, charge pump noise, and spurious noise from the reference clock. VCO noise is assumed to roll off at $-20$ dB/dec and is primarily caused by thermal noise in the VCO structure. In practice, however, VCO noise rolls off at a higher rate than $-20$ dB/dec at low frequencies due to the influence of flicker noise. Detector noise can be reduced by setting the bandwidth as low as possible. When the bandwidth is reduced, however, the VCO noise magnitude response not only shifts toward lower frequencies but also increases in magnitude. So, to minimize the VCO noise, the loop bandwidth needs to be raised. The peak of the band-pass response (refer to Figure 5.14) marks the bandwidth of the loop. Optimizing the loop bandwidth is essentially a trade-off between the noise due to the reference and the noise due to the VCO.
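The low-pass, band-pass and high-pass shapes described above can be reproduced by evaluating the closed-loop expressions on the $j\omega$ axis. The sketch below uses $L(s)$ from Equation 5.3 and the first row of Table 5.1; the loop filter entry is taken as $2\pi K_{VCO}/(s(1+L(s)))$, which is an assumption about the intended form of that table entry, and the function names are hypothetical.

```python
import math


def loop_gain(s, i_cp, k_vco, n_div, r1, c1, c2):
    """Open-loop gain L(s) of Equation 5.3 (k_vco in Hz/V)."""
    w_z = 1.0 / (r1 * c1)
    w_p = (c1 + c2) / (r1 * c1 * c2)
    gain = i_cp * k_vco / ((c1 + c2) * n_div)
    return gain * (1.0 + s / w_z) / (s * s * (1.0 + s / w_p))


def ntf_magnitudes(f_hz, i_cp, k_vco, n_div, r1, c1, c2):
    """Magnitudes of the noise transfer functions at offset frequency f_hz."""
    s = 1j * 2.0 * math.pi * f_hz
    lg = loop_gain(s, i_cp, k_vco, n_div, r1, c1, c2)
    ref = abs(n_div * lg / (1.0 + lg))
    return {
        "ref_div": ref,                                      # low-pass
        "pfd_cp": 2.0 * math.pi * ref / i_cp,                # low-pass, scaled
        "lf": abs(2.0 * math.pi * k_vco / (s * (1.0 + lg))),  # band-pass
        "vco": abs(1.0 / (1.0 + lg)),                        # high-pass
    }


# First row of Table 5.1.
PARAMS = dict(i_cp=360e-6, k_vco=100e6, n_div=128,
              r1=12.18e3, c1=235.21e-12, c2=9.2e-12)

for f in (1e2, 1e4, 5e5, 1e7, 1e8):
    m = ntf_magnitudes(f, **PARAMS)
    print(f"{f:9.0f} Hz  ref {20 * math.log10(m['ref_div']):7.2f} dB  "
          f"vco {20 * math.log10(m['vco']):7.2f} dB")
```

At low offsets the reference path gain approaches $20\log_{10}(128) \approx 42$ dB while the VCO path is strongly suppressed; far above the loop bandwidth the VCO path approaches 0 dB.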

5.5.2 Phase Noise Simulation

The phase noise of each building block, including the VCO, charge pump, feedback divider chain and loop filter resistor, is obtained from Spectre simulation (pss and pnoise analysis). The reference clock is generated by a Keysight E8257D analog signal generator, and its phase noise is measured with an Agilent PXA N9030A signal analyzer. The phase noise introduced by each noise source over an offset frequency range of 100 Hz to 10 MHz is plotted in Figure 5.15. Given the noise transfer functions listed in Table 5.3 and plotted in Figure 5.14, the PLL output phase noise due to each block is plotted in Figure 5.16. The solid black curve in Figure 5.16 denotes the total PLL output phase noise, which is the sum of all the other phase noise components. It can be observed that the reference phase noise seen at the output dominates the total output phase noise at lower offset frequencies, while the VCO phase noise becomes dominant from 2.5 MHz onwards.

Figure 5.15: Phase noise of each noise source introduced into the PLL.

Equation 5.6 can be used to calculate the rms jitter in the time domain from the phase noise PSD.

$$\sigma_{rms} = \frac{1}{2\pi f_{VCO}} \sqrt{\int_{f_{start}}^{f_{stop}} S(f) df}$$  \hspace{1cm} (5.6)
The rms jitter calculated from the phase noise profiles of the reference clock and the VCO is about 0.94 ps and 1.08 ns, respectively. The rms jitter of the PLL output phase noise profile is 0.76 ps. This indicates that most of the VCO noise is filtered out, but the reference clock still contributes significantly.
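The integration in Equation 5.6 can be sketched for tabulated phase noise data. The snippet assumes that $S(f)$ is the double-sideband phase PSD, i.e. $S(f) = 2 \cdot 10^{\mathcal{L}(f)/10}$ for a single-sideband phase noise $\mathcal{L}(f)$ in dBc/Hz; this convention is an assumption, as is the use of a simple trapezoidal rule. The flat -100 dBc/Hz profile is hypothetical test data, not a measured result.

```python
import math


def rms_jitter_s(offsets_hz, ssb_dbc_hz, f_carrier_hz):
    """Equation 5.6 with S(f) taken as 2 * 10^(L(f)/10) (assumed convention)."""
    psd = [2.0 * 10.0 ** (l_db / 10.0) for l_db in ssb_dbc_hz]
    area = 0.0
    for k in range(len(offsets_hz) - 1):
        df = offsets_hz[k + 1] - offsets_hz[k]
        area += 0.5 * (psd[k] + psd[k + 1]) * df  # trapezoidal rule
    return math.sqrt(area) / (2.0 * math.pi * f_carrier_hz)


# Hypothetical flat profile: -100 dBc/Hz from 1 kHz to 1 MHz on an 11 GHz carrier.
offsets = [1e3, 1e4, 1e5, 1e6]
profile = [-100.0, -100.0, -100.0, -100.0]
print(f"{rms_jitter_s(offsets, profile, 11e9) * 1e15:.1f} fs rms")
```

For measured or simulated profiles with steep slopes, a finer frequency grid or integration on a logarithmic axis gives a more accurate result than this coarse trapezoid.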

### 5.6 Experimental Results

The PLL was fabricated in the IBM 130 nm BiCMOS process. A micrograph of the die is shown in Figure 5.17. To avoid handling high-frequency signals off-chip, a VCO divide-by-64 signal (PLL/64) is brought out of the die for measurement. The die is packaged in a QFN-64 package and tested in a high-frequency QFN socket mounted on a prototype PCB. The test setup is given in Figure 5.18. A Keysight E8257D PSG analog signal generator, which can generate a clock of as high quality as a mechanical crystal, provides the reference clock for the PLL. The PLL/64 signal is fed into the Keysight PXA N9030A signal analyzer. In the first step of the testing, no reference clock is applied and the signal analyzer is set to spectrum mode to measure the free-running frequency of the PLL/64 signal. The signal generator is then turned on and its frequency is set close to the previously measured free-running frequency. The reference clock frequency is tuned in fine steps within a certain range to check whether the PLL/64 signal on the signal analyzer tracks the reference clock frequency. Once the PLL is locked, the phase noise can be measured by switching the signal analyzer to phase noise mode.

Figure 5.16: PLL output noise due to individual noise sources.

![Figure 5.17: Micrograph of the PLL.](image)

The VCO control voltage is not brought out of the chip, so the VCO frequency versus control voltage cannot be measured directly. Instead, the PLL is measured while the VCO capacitor bank setting is changed with the PLL in lock, at a fixed charge-pump current and loop filter setting. The VCO frequency derived in this way is plotted against the capacitor bank setting in Figure 5.19. Refer to Figure 5.2 for the VCO capacitor control bits ($C\langle 2{:}0 \rangle$). At this specific setting, the PLL locks and operates from 10.81 GHz to 11.66 GHz.

Figure 5.20 and Figure 5.21 show the spectrum and the phase noise, respectively, of the reference clock compared with the output signal when the PLL is locked to an 83 MHz reference clock. The harmonics in the PLL/64 output are due to the harmonics of the reference clock. At 100 Hz offset frequency, the phase noise of the reference clock and of the PLL/64 output signal is -113 dBc/Hz and -109 dBc/Hz, respectively.
Figure 5.19: VCO frequency versus capacitor control bits (66 fF incremental) when the PLL is in the lock state.

Figure 5.20: PSD of (a) the reference clock at 83 MHz and (b) a PLL/64 output signal at 166 MHz.
The phase noise of a clock decreases by 6 dB per divide-by-2, so the PLL output phase noise can be derived by adding 36 dB to the phase noise of the PLL/64 output signal (Figure 5.21 (b)). Figure 5.22 compares the measured and simulated phase noise of the PLL output and the free-running VCO, along with the measured reference clock phase noise. The measured free-running VCO phase noise is lower than the simulated result at lower offset frequencies but higher at higher offset frequencies. The PLL loop bandwidth is about 500 kHz by design and about 1 MHz as measured. Because the transfer function from the reference clock to the PLL output is low-pass, the reference noise dominates at low offset frequencies, where the measured PLL output phase noise matches the simulated result well. At higher offset frequencies, the measured PLL output phase noise deviates from the simulated result because the transfer function from the VCO noise to the PLL output is high-pass.
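The 36 dB correction can be checked with a short calculation. The sketch below assumes an ideal divider, for which the phase noise changes by $20\log_{10}N$ (6.02 dB per divide-by-2, rounded to 6 dB in the text); the function names are illustrative.

```python
import math

def divider_phase_noise_shift_db(divide_ratio):
    """Phase-noise change in dB across an ideal divide-by-N stage."""
    return 20.0 * math.log10(divide_ratio)

def refer_to_pll_output(divided_noise_dbc_hz, divide_ratio):
    """Scale phase noise measured after the divider back to the undivided output."""
    return divided_noise_dbc_hz + divider_phase_noise_shift_db(divide_ratio)

shift = divider_phase_noise_shift_db(64)     # six divide-by-2 stages
pll_out = refer_to_pll_output(-109.0, 64)    # PLL/64 reading at 100 Hz offset
print(f"shift = {shift:.2f} dB, PLL output = {pll_out:.1f} dBc/Hz")
```

Applied to the -109 dBc/Hz measured on the PLL/64 signal at 100 Hz offset, this gives roughly -73 dBc/Hz at the PLL output.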
Figure 5.22: Comparison of the measured and the simulated phase noises for PLL and free-running VCO.

5.7 Summary

A type-II third-order charge pump PLL was designed and analyzed in a systematic manner. The PLL was fabricated in the IBM 130 nm SiGe BiCMOS process and tested on a prototype PCB. No LDO regulators were used when testing the PLL, so the effect of power supply noise can be explored further to improve the system noise performance. The PLL is intended to provide the clock for a full-rate high-speed PRBS generator, which is covered in the next chapter.
Pseudo-random bit sequence (PRBS) generators have been proposed and studied for over 50 years for testing transceivers [66][67]. More recently, the testing of N-level pulse amplitude modulation (PAM-N) transceivers requires multiple uncorrelated data streams [68]. Test equipment vendors such as Anritsu offer pulse pattern generator (PPG) models that provide 2-channel (MP1800A) and 4-channel (MP1775A) PRBS streams up to 32 Gb/s and 12.5 Gb/s, respectively, but their cost is prohibitive [69]. Alternatively, a multi-channel PRBS generator can be designed and fabricated on-chip, which not only saves cost but also avoids the signal integrity challenges of feeding high-speed input signals from off-chip. PRBS generators faster than 20 Gb/s published in the IEEE Journal of Solid-State Circuits (JSSC) over the past 10 years are compared in Table 6.1. Reference [70] adopted an emitter-coupled logic (ECL) design in a SiGe bipolar process with a 200 GHz $f_T$. [71] and [72] used a 150 GHz $f_T$ SiGe BiCMOS process, and [71] also made use of on-chip inductors.

In this chapter, a system transition matrix design method is explained and applied to design a full-rate 4-channel $2^9 - 1$ parallel PRBS generator. It can be used to test transmitters up to PAM-16 modulation format. Designed in the IBM 130 nm SiGe
Table 6.1: Comparison with recent PRBS generators published in JSSC.

<table>
<thead>
<tr>
<th>Reference</th>
<th>Single lane length</th>
<th>Bit-rate</th>
<th>S.E. $V_{pp}$</th>
<th>Technology</th>
<th>Power</th>
<th>Area</th>
<th>Test method</th>
</tr>
</thead>
<tbody>
<tr>
<td>Knapp [70]</td>
<td>$2^7 - 1$ Half-rate</td>
<td>100 Gb/s</td>
<td>100 mV</td>
<td>SiGe bipolar $f_T = 200$ GHz</td>
<td>1.5 W</td>
<td>0.63 mm$^2$</td>
<td>RF probing</td>
</tr>
<tr>
<td></td>
<td>$2^{11} - 1$ Full-rate</td>
<td>54 Gb/s</td>
<td>300 mV</td>
<td>SiGe bipolar $f_T = 200$ GHz</td>
<td>1.9 W</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Dickson [71]</td>
<td>$2^{11} - 1$ Quarter-rate</td>
<td>80 Gb/s</td>
<td>430 mV</td>
<td>130 nm SiGe BiCMOS $f_T = 150$ GHz [71] used inductor</td>
<td>9.8 W</td>
<td>3.5 x 3.5 mm$^2$</td>
<td></td>
</tr>
<tr>
<td>Laskin [72]</td>
<td>$2^7 - 1$ Half-rate</td>
<td>23 Gb/s</td>
<td>125 mV</td>
<td>130 nm SiGe BiCMOS $f_T = 210$ GHz [71]</td>
<td>243 mW</td>
<td>393 x 178 $\mu m^2$</td>
<td></td>
</tr>
<tr>
<td>This work</td>
<td>$2^9 - 1$ Full-rate</td>
<td>11 Gb/s</td>
<td>450 mV</td>
<td>130 nm SiGe BiCMOS $f_T = 210$ GHz</td>
<td>1.3 W</td>
<td>1500 x 78 $\mu m^2$</td>
<td>QFN socket on PCB</td>
</tr>
</tbody>
</table>

BiCMOS process which provides hetero-junction bipolar transistors (HBT) with a maximum $f_T$ of 210 GHz, the PRBS generator can operate at more than 40 Gb/s if a 40 GHz clock is provided. However, the final implementation of this PRBS generator is clocked by the on-chip PLL covered in the previous chapter. A system block diagram of the PRBS application in a PAM-4 optical transmitter is given in Figure 6.1. Circuit-level design and simulation are presented first. Finally, the design is demonstrated on a prototype PCB, with an extensive discussion of PCB transmission lines.

![Figure 6.1: System block diagram of the MZM based PAM-4 transmitter using the 4-channel parallel PRBS.](image)
6.1 PRBS Principles

A linear-feedback shift register (LFSR) is maximal-length if and only if the corresponding feedback polynomial is primitive [73]. To save hardware and reduce capacitive load in a practical implementation, a primitive trinomial is preferred, which means the feedback polynomial has the form \( p(x) = x^n + x^k + 1 \), where \( n \), the degree of \( p(x) \), represents the number of registers in the loop, and \( 1 \leq k < n \) is a necessary but not sufficient condition. In the hardware implementation illustrated in Figure 6.2, \( n \) and \( k \) identify the \( n^{th} \) and \( k^{th} \) registers, called taps, which are connected to the XOR gate. However, for a polynomial \( p(x) \) of degree \( n \) with coefficients in the Galois field \( GF(2) \) to be primitive, \( k \) cannot be chosen arbitrarily. It must satisfy the condition that, in \( GF(2) \), \( x^{2^n-1} + 1 \mod p(x) \) is zero, and that \( x^{m} + 1 \mod p(x) \) is nonzero for every \( 0 < m < 2^n - 1 \). For example, for \( n = 8 \), no value of \( k \) meets the condition; for \( n = 9 \), only \( k = 4 \) and \( k = 5 \) do. Some of the possibilities for \( n \) and \( k \) are given in the table in Figure 6.2. A Python code for determining \( k \) with respect to \( n \) is given in Appendix B.1.
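The search for valid taps can be sketched in a few lines of Python. This is an illustrative brute-force version, not the Appendix B.1 code: it represents polynomials over \( GF(2) \) as integers and checks that the multiplicative order of \( x \) modulo \( p(x) \) is exactly \( 2^n - 1 \). The function names are assumptions made for this example.

```python
def is_primitive_trinomial(n, k):
    """Return True if x^n + x^k + 1 is primitive over GF(2).

    The polynomial is primitive exactly when the smallest m > 0 with
    x^m = 1 (mod p(x)) equals 2^n - 1. Bit i of an integer holds the
    coefficient of x^i.
    """
    p = (1 << n) | (1 << k) | 1
    full_period = (1 << n) - 1
    r = 1                               # r holds x^m mod p(x), starting at m = 0
    for m in range(1, full_period + 1):
        r <<= 1                         # multiply by x
        if (r >> n) & 1:                # degree reached n: subtract (XOR) p(x)
            r ^= p
        if r == 1:
            return m == full_period
    return False

def primitive_trinomial_taps(n):
    """List every k in 1..n-1 for which x^n + x^k + 1 is primitive."""
    return [k for k in range(1, n) if is_primitive_trinomial(n, k)]

for degree in (7, 8, 9):
    print(degree, primitive_trinomial_taps(degree))
```

For \( n = 9 \) the search returns \( k = 4 \) and \( k = 5 \), which includes the \( k = 5 \) used in this design, and for \( n = 8 \) it returns an empty list. The brute-force loop costs \( O(2^n) \) steps per candidate, which is acceptable only for the short register lengths considered here.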

![Figure 6.2: An n-stage PRBS generator with possible n and k combinations (adapted from [74]).](image)

If \( u_i(j) \) is defined as the state of the \( i^{th} \) DFF at the \( j^{th} \) clock cycle, it can be observed from Figure 6.2 that only the input of the first DFF is newly generated, with a period of \( 2^n - 1 \) clock cycles. The inputs of the remaining DFFs are simply one-clock-delayed copies of the preceding states. Therefore, the other states cannot simply be used to form a multi-channel PRBS, since they are highly correlated with each other at short delays. Another important issue for a PRBS generator is that the all-zero state must be avoided. In a full-rate PRBS generator made of CML DFFs, all the states may start from the common-mode voltage, so there is a chance that all the DFFs settle into the same logical “0” state. A start-up circuit is therefore essential to keep the PRBS generator from being trapped in the all-zero state.

6.2 Transition Matrix Method and Correlation

The transition matrix method can be used to analyze and understand the implementation of multiple parallel random sequence generation [66]. To generate four channels of PRBS with the maximum uncorrelated length of $2^9 - 1$, nine DFFs and four XOR gates are needed when operating at the full clock rate. In this design, $n = 9$ and $k = 5$ were chosen, and the corresponding transition matrix $T$ is given by Equation 6.1. If the $9^{th}$ DFF is initialized to 1 and the others to 0, the initial state of the nine DFFs is represented as $s(0) = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}^T$. After $l$ clock cycles, the state of the DFFs is $s(l) = T^l s(0)$.

$$T = \begin{bmatrix}
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix} \quad (6.1)$$
The 9th DFF output $s_9(j)$ of the above PRBS is either 1 or 0. In computing the correlation of sequences, it is standard practice to rescale the outputs to 1 and -1, respectively, which is done by letting $t(j) \triangleq 2s(j) - 1$. Implementing this transition matrix results in the sequence $\{t_9(j)\}=\{2s_9(j) - 1\}$ being uncorrelated with itself over a period of length $2^9 - 1$ [74]. This implies that for $i = 0, 1, 2, ..., 2^9 - 2$, the auto-correlation is given by Equation 6.2.

\[
\phi(i) = \frac{1}{2^9 - 1} \sum_{j=0}^{2^9-2} t_9(j)t_9(j-i) = \begin{cases} 
1, & i = 0 \\
\frac{-1}{2^{9}-1}, & i \neq 0 
\end{cases} \quad (6.2)
\]
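The two-valued auto-correlation can be reproduced numerically. The following Python sketch steps a 9-stage LFSR with taps at the 5th and 9th DFFs, maps the 9th DFF output to $\pm 1$, and evaluates the circular auto-correlation over one full period of $2^9 - 1$ bits. The function names and the list-based state are illustrative choices, not the Matlab processing used for Figure 6.4.

```python
def lfsr_output(n=9, k=5, cycles=None):
    """Return the n-th DFF output of an LFSR with taps k and n, one bit per clock."""
    period = (1 << n) - 1
    cycles = period if cycles is None else cycles
    state = [0] * (n - 1) + [1]              # s(0): only the last DFF is set
    out = []
    for _ in range(cycles):
        out.append(state[-1])
        feedback = state[k - 1] ^ state[n - 1]   # XOR of the k-th and n-th DFFs
        state = [feedback] + state[:-1]          # shift by one register
    return out

def autocorrelation(bits, lag):
    """Circular auto-correlation of the +/-1 mapped sequence, normalised by its length."""
    t = [2 * b - 1 for b in bits]
    length = len(t)
    return sum(t[j] * t[(j - lag) % length] for j in range(length)) / length

bits = lfsr_output()
print(len(bits), autocorrelation(bits, 0), autocorrelation(bits, 1))
```

The value at lag 0 is 1 and every other lag gives $-1/(2^9-1)$, which is the behaviour stated by Equation 6.2.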

As shown in [66], implementing $T^4$ (rather than $T$) results in two sets of four parallel IQ outputs, $s_1(l)$, $s_2(l)$, $s_3(l)$, $s_4(l)$ and $s_5(l)$, $s_6(l)$, $s_7(l)$, $s_8(l)$. The set $s_5(l)$, $s_6(l)$, $s_7(l)$, $s_8(l)$ is simply a one-clock-cycle delayed version of $s_1(l)$, $s_2(l)$, $s_3(l)$, $s_4(l)$. These outputs are found using $s(l) = (T^4)^l s(0)$. The matrix $T^4$ is given in Equation 6.3, and the corresponding hardware implementation block diagram is shown in Figure 6.3.

\[
T^4 = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 
\end{bmatrix} \quad (6.3)
\]
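Equation 6.3 can be cross-checked by raising the transition matrix to the fourth power with all arithmetic taken modulo 2. The sketch below is a plain-Python illustration; the helper names are assumptions, and the matrix $T$ is built directly from $n = 9$ and $k = 5$ (XOR feedback into the first DFF, a one-position shift elsewhere).

```python
def build_t(n=9, k=5):
    """Transition matrix of Equation 6.1: XOR feedback in row 1, shift below it."""
    t = [[0] * n for _ in range(n)]
    t[0][k - 1] = 1
    t[0][n - 1] = 1
    for row in range(1, n):
        t[row][row - 1] = 1
    return t

def matmul_gf2(a, b):
    """Matrix product with all arithmetic taken modulo 2."""
    size = len(a)
    return [[sum(a[i][m] & b[m][j] for m in range(size)) % 2
             for j in range(size)] for i in range(size)]

def matpow_gf2(mat, power):
    """Repeated multiplication starting from the identity matrix."""
    size = len(mat)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(power):
        result = matmul_gf2(result, mat)
    return result

T4 = matpow_gf2(build_t(), 4)
xor_gates = sum(1 for row in T4 if sum(row) == 2)   # rows that combine two states
for row in T4:
    print(" ".join(str(v) for v in row))
print("rows needing an XOR:", xor_gates)
```

Each row of $T^4$ containing two ones corresponds to one XOR gate in the hardware, and each row with a single one is a plain DFF-to-DFF connection.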
Figure 6.3: Single-ended version block diagram of the full-rate 4-channel $2^9 - 1$ parallel PRBS.

The function of the PRBS generator was simulated, and its outputs were processed in Matlab to compute the auto-correlation and cross-correlation, which are plotted in Figure 6.4 over several periods of length $2^9 - 1$. It is evident that the auto-correlation of a single channel and the cross-correlation between any two of the four channels repeat with the $2^9 - 1$ pattern period, and that the channels are spaced by $2^7$ bits. This means the four channels behave like differential IQ signals. Two PRBS patterns with a $180^\circ$ phase shift are key to using a MUX to form a higher data rate PRBS pattern that maintains the same pattern as the source. The four outputs $S1, \ldots, S4$ in Figure 6.3 correspond to $s_5(l), s_6(l), s_7(l), s_8(l)$. The first four rows of Equation 6.3 show that the implementation requires four XOR gates. A fifth XOR gate takes a “set” signal to ensure the PRBS generator can start up from an all-zero state. In normal operation the “set” signal is at logic low.
Figure 6.4: Auto-correlation and cross-correlation of the 4-channel PRBS generator (signal amplitude rescaled to -1 and 1).

6.3 Circuits Design and Simulation

Both CMOS transistors and hetero-junction bipolar transistors are provided in the IBM 130 nm SiGe BiCMOS process. The graded-base SiGe HBTs feature an accelerating drift field across the base, which reduces the base transit time and is key to improving device speed compared with all-silicon homo-junction bipolar transistors [75]. HBTs are chosen not only for their higher $f_T$, but also for their higher current density and higher operating voltage ratings. The current mode logic (CML) topology is adopted for the high-speed circuit design. Transistor biasing and sizing, which are important design aspects for CML operating at high data rates, are introduced below.
6.3.1 Current Reference

Under ideal conditions, the current-mirror output current is independent of the voltage between the output and common terminals. In practice, transistor-level current mirrors deviate from this ideal behavior in several ways. One of the most important deviations is the variation of the output current with the voltage at the output terminal, which is caused by channel length modulation in MOS devices and by the Early effect in bipolar devices. Figure 6.5 compares the dc I-V characteristic of a self-biased wide-swing NMOS cascode current mirror with that of a simple bipolar current mirror with beta helper and emitter degeneration [27]. When both are set to unity gain and provide 1 mA, the CMOS current mirror has a better dc I-V characteristic than its bipolar counterpart, while the latter consumes 90× less silicon area for the mirror branch alone. With transistor $B2$ reducing the gain error and the degeneration resistor $R_E$ boosting the output resistance, the performance of the bipolar current mirror is acceptable for this application. Equation 6.4 shows that the systematic gain error from the finite forward current gain $\beta_F$ is reduced by a factor of $[\beta_F + 1]$, which is the current gain of the emitter follower $B2$ [27]. The small-signal output resistance seen at the collector of transistor $B1$ is expressed in Equation 6.5. In addition, $I_{ref}$ can be made digitally controllable.

$$I_{C,B1} \simeq I_{ref} \times \left(1 - \frac{2}{\beta_F (\beta_F + 1)}\right) \quad (6.4)$$

$$R_o \simeq r_{o,B1} \times (1 + g_m R_E) \quad (6.5)$$
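To give a feel for the magnitudes in Equations 6.4 and 6.5, the sketch below evaluates both expressions in Python. All device values ($\beta_F$, Early voltage, $R_E$, thermal voltage) are hypothetical round numbers chosen for illustration; they are not taken from the process design kit or from the circuit in Figure 6.5.

```python
def beta_helper_gain_error(beta_f):
    """Fractional gain error of Equation 6.4 for a mirror with a beta helper."""
    return 2.0 / (beta_f * (beta_f + 1.0))

def degenerated_output_resistance(r_o, g_m, r_e):
    """Output resistance of Equation 6.5 with emitter degeneration r_e."""
    return r_o * (1.0 + g_m * r_e)

# Hypothetical device values, for illustration only.
beta_f = 100.0      # forward current gain
i_c = 1e-3          # 1 mA mirror current
v_t = 0.02585       # thermal voltage near room temperature, in volts
v_a = 50.0          # assumed Early voltage, in volts
r_e = 200.0         # assumed degeneration resistor, in ohms

error = beta_helper_gain_error(beta_f)
r_out = degenerated_output_resistance(v_a / i_c, i_c / v_t, r_e)
print(f"gain error = {error:.2e}, R_o = {r_out / 1e3:.0f} kOhm")
```

With these assumed numbers, the gain error is on the order of $10^{-4}$ and the degeneration raises the output resistance by the factor $1 + g_m R_E$.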
Figure 6.5: Output characteristic comparison of self biased wide swing NMOS cascode current mirror and BJT current mirror with beta helper and emitter degeneration.

6.3.2 CML DFF

In bipolar designs, the peak-$f_T$ current density ($J_{pf_T}$) of the device is set by sizing the emitter length ($l_E$) [71]. In the IBM 130 nm SiGe BiCMOS process, $J_{pf_T}$ is about 11.9 mA/$\mu$m$^2$, and the emitter width ($w_E$) is fixed at 120 nm. The tail current of the CML stage can be derived from Equation 6.6 with the emitter length ($l_E$) as the design variable, where $c$ is a constant chosen as 1.12 for CML design because the differential pair may not switch fully, and $c$ is set to 1 for emitter followers.

$$I_{tail} = c \times J_{pf_T} \times w_E \times l_E$$ \hspace{1cm} (6.6)
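Equation 6.6 is easy to tabulate. The Python sketch below uses the $J_{pf_T}$, $w_E$ and $c$ values quoted above; the emitter lengths in the loop are hypothetical examples, since the lengths used in the actual design are not listed here.

```python
def cml_tail_current_ma(emitter_length_um, c=1.12,
                        j_peak_ft_ma_per_um2=11.9, emitter_width_um=0.12):
    """Tail current of Equation 6.6 in mA for a given emitter length in um."""
    return c * j_peak_ft_ma_per_um2 * emitter_width_um * emitter_length_um

for l_e in (1.0, 2.5, 5.0):   # hypothetical emitter lengths in um
    print(f"l_E = {l_e} um -> I_tail = {cml_tail_current_ma(l_e):.2f} mA")
```

Passing `c=1.0` gives the corresponding bias current for the follower case.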

Two main circuit blocks are used in the PRBS core shown in Figure 6.3: the DFF (Figure 6.6) [76] and the XOR-merged DFF (Figure 6.7) [77]. Note that each block has two differential output pairs. The QH and QHb output pair is intended to drive the top devices of the XOR (whose input pair is denoted A and Ab). However, the BJTs connected to nodes A, Ab and B, Bb are then only barely biased in the forward-active region. This issue can be fixed by slightly reducing the load resistor of the XOR-merged master latch and slightly increasing that of the regular slave latch. The $V_{BE}$ of the upper BJT is $V_T \ln 2$ smaller than the $V_{BE}$ of the lower BJT because its collector current is halved. A 3.3 V power supply is used because the XOR-merged DFFs stack four BJTs. The clock signals of the DFFs are ac coupled, as described next.

![Figure 6.6: Schematic of the BJT DFF employed in Figure 6.3.](image)

![Figure 6.7: Schematic of the XOR-merged DFF employed in Figure 6.3.](image)
6.3.3 Clock and Data Buffers

It is impractical to drive nine DFFs with one clock buffer because of the heavy loading and because the DFFs are at different locations on the chip, so the PRBS clock is distributed to the DFFs through multiple clock buffers. As shown in Figure 6.3, three clock buffers each drive two DFFs and a fourth clock buffer drives the remaining three DFFs. An emitter follower is used as the clock buffer, as shown in Figure 6.8. Each clock buffer is ac coupled so that its dc operating point is independent of the previous stage. The resistors $R_A$ and $R_B$ that set the base operating point $V_B$ cannot be too large, especially when the base current is not negligible. As shown in Equation 6.7, the second term must be kept much smaller than the first so that $V_B$ does not drop too much. Figure 6.8 shows two bias condition examples, for an emitter follower and for a differential pair. Larger bias resistors can be used for an emitter follower with a current-source load because of the larger input resistance looking into its base.

$$V_B = \frac{V_{DD} \cdot R_B}{R_A + R_B} - I_B \cdot (R_A \parallel R_B) \quad (6.7)$$
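The effect of the resistor values on $V_B$ can be illustrated numerically. The Python sketch below evaluates Equation 6.7 for two divider scalings with the same ratio; the supply voltage matches the 3.3 V rail, while the resistor values and the 20 µA base current are hypothetical.

```python
def base_bias_voltage(v_dd, r_a, r_b, i_b):
    """Base operating point of Equation 6.7: divider voltage minus base-current drop."""
    divider = v_dd * r_b / (r_a + r_b)
    parallel = r_a * r_b / (r_a + r_b)
    return divider - i_b * parallel

# Hypothetical values: 3.3 V supply, 20 uA base current, two resistor scalings.
small = base_bias_voltage(3.3, 2e3, 4e3, 20e-6)
large = base_bias_voltage(3.3, 20e3, 40e3, 20e-6)
print(f"V_B = {small:.3f} V with 2k/4k, {large:.3f} V with 20k/40k")
```

Scaling both resistors by ten leaves the first term unchanged but makes the base-current drop ten times larger, which is why the divider resistance has to stay small when the base current is significant.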

The schematic of the output data buffer is shown in Figure 6.9. It uses 3.3 V and 2.5 V power supplies. The CML stage is added to filter out the ringing caused by the emitter follower stage and to provide extra gain. The final emitter follower stage can directly drive the 50 Ω-terminated oscilloscope input.
Figure 6.8: Schematics of (left) clock buffer employed in Figure 6.3 and (right) bias condition for DFF clock inputs.

Figure 6.9: Schematic of output buffer.

6.3.4 PRBS Startup

A PRBS generator stuck in the all-zero state at start-up can occur both in simulation and in hardware. A start-up pin was therefore added on the PCB, which can be manually set to VDD or ground. Figure 6.10 shows a simulated PRBS start-up process.

![Figure 6.10: PRBS start-up process by enabling “set” signal.](image)

### 6.3.5 Creation of PAM Signaling

By combining the PRBS outputs, PAM signaling can be formed up to PAM-16 with four channels. Figure 6.11 shows the simulated results at 40 GBaud for NRZ and PAM-4/8/16 signaling. However, the prototype developed for testing only has a clock at around 11 GHz and one data buffer, so only NRZ signaling can be measured. The rms power consumption of the PRBS generator, including the clock distribution and bandgap reference circuits, is about 1.2 W.
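How several binary streams combine into multi-level symbols can be shown with a short Python sketch. Binary weighting of the channels is assumed here for illustration (the first channel taken as the least significant bit), and the short bit lists are hypothetical stand-ins for the PRBS outputs S1 to S4.

```python
def pam_levels(channels):
    """Combine equal-length bit streams into PAM symbols with binary weights.

    channels[0] is treated as the least significant bit, so m channels
    produce PAM-2**m symbols in the range 0 .. 2**m - 1.
    """
    return [sum(bit << weight for weight, bit in enumerate(bits))
            for bits in zip(*channels)]

# Hypothetical short bit streams standing in for the PRBS outputs S1..S4.
s1 = [0, 1, 0, 1, 1, 0]
s2 = [0, 0, 1, 1, 0, 1]
s3 = [1, 0, 0, 1, 1, 1]
s4 = [0, 1, 1, 0, 1, 0]

print("PAM-4 :", pam_levels([s1, s2]))
print("PAM-8 :", pam_levels([s1, s2, s3]))
print("PAM-16:", pam_levels([s1, s2, s3, s4]))
```

Uniformly spaced levels result only if the streams are uncorrelated and the weights are accurate, which is why uncorrelated parallel PRBS channels are needed for PAM-N testing.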
Figure 6.11: Simulated eye diagrams of data patterns at 40 GBaud for NRZ and PAM-4/8/16.

6.4 Experimental Results

The micrograph of the fabricated die is shown in Figure 6.12, with the locations of the PLL, the full-rate PRBS generator and the data buffer denoted. High-speed PCB design plays an important role in a successful prototype demonstration. In this section, potential high-speed limitations due to the choice of packaging, socket, SMA connectors and PCB transmission line design are investigated. The main equipment required for testing this prototype is a high-quality signal generator for the PLL reference clock, a spectrum analyzer for checking PLL lock, and a sampling oscilloscope for measuring the eye diagram of the PRBS output.
6.4.1 Packaging and Socket

Since the complete die measures 4 mm by 4 mm and has 58 pads in total, the QFN_9X9_64A package was chosen. The electrical parasitic parameters of several sizes of QFN packages are listed in Table 6.2. A first-order RC estimate with a 50 Ω load indicates that the 64-pin QFN package can only support a bandwidth of around 8 GHz. The Ironwood Electronics QFN_64 socket (part number: SG_MLFP_7008), which features a 30 GHz bandwidth, was chosen [78]. The 64-pin QFN package itself is therefore the main bandwidth limitation.

Table 6.2: QFN package electrical parasitics provided by the vendor.

<table>
<thead>
<tr>
<th>Package</th>
<th>Inductance (nH)</th>
<th>Capacitance (pF)</th>
<th>Resistance (mΩ)</th>
</tr>
</thead>
<tbody>
<tr>
<td>4 × 4*</td>
<td>0.691</td>
<td>0.251</td>
<td>32</td>
</tr>
<tr>
<td>6 × 6*</td>
<td>1.156</td>
<td>0.355</td>
<td>51</td>
</tr>
<tr>
<td>8 × 8*</td>
<td>1.470</td>
<td>0.395/0.496</td>
<td>63</td>
</tr>
<tr>
<td>9 × 9**</td>
<td>1.221/1.895</td>
<td>0.395/0.496</td>
<td>242.3/315.6</td>
</tr>
</tbody>
</table>

Note: * Simulated results at 100 MHz; ** Simulated results at 2 GHz.
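The first-order RC estimate for the package can be reproduced as follows. This Python sketch assumes that the bandwidth is set only by the package capacitance from the 9 × 9 row of Table 6.2 working against a 50 Ω load, ignoring the lead inductance and resistance; which of the two listed capacitance values was used for the estimate in the text is not stated, so both are evaluated.

```python
import math

def rc_bandwidth_hz(resistance_ohm, capacitance_f):
    """-3 dB bandwidth of a first-order RC low-pass: 1 / (2 * pi * R * C)."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_f)

# Capacitance pair listed for the 9 x 9 package in Table 6.2, with a 50 ohm load.
for c_pf in (0.395, 0.496):
    f_3db = rc_bandwidth_hz(50.0, c_pf * 1e-12)
    print(f"C = {c_pf} pF -> f_3dB = {f_3db / 1e9:.2f} GHz")
```

The smaller capacitance gives a bandwidth close to 8 GHz, and the larger one gives a lower figure.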
6.4.2 PCB Engineering

Two 4-layer PCB versions were made for testing the chip. The first was made of FR-4 and the second of Rogers material, as shown in Figure 6.13 and Figure 6.20, respectively. GCPW transmission lines were designed for high-speed signaling. Regular SMA connectors and Southwest SMA connectors [79], with bandwidth specifications of 12 GHz and 27 GHz, respectively, are used on the first and second PCB versions. The footprint of the regular SMA connector is smaller than that of the Southwest one, so the PCB can be more compact with regular SMA connectors.

The PRBS measurement result with the first-version PCB is shown in the left eye diagram of Figure 6.14. The eye is almost closed. However, with the help of the math function embedded in the sampling oscilloscope, the signal can be copied and summed with a different delay and weighting. A 2-tap feed-forward equalizer (FFE) is illustrated in Figure 6.15. As one example, the eye can be opened slightly, as shown on the right of Figure 6.14 with the FFE delay and weightings denoted. This FFE setting provides about 5.1 dB of peaking at the Nyquist frequency of the sampling clock, at the cost of 10.46 dB of attenuation at dc.
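The peaking and dc attenuation of a 2-tap FFE follow directly from its frequency response $H(f) = w_0 + w_1 e^{-j 2\pi f \tau}$. The Python sketch below is an illustration under stated assumptions: the tap weights $w_0 = 1.05$ and $w_1 = -0.75$ are inferred here because they reproduce the quoted 5.1 dB and 10.46 dB figures, the delay $\tau$ is taken as one unit interval, and the 10 Gb/s rate is a hypothetical round number. The settings actually used are those annotated in Figure 6.14.

```python
import cmath
import math

def ffe_gain_db(freq_hz, w0, w1, delay_s):
    """Magnitude in dB of H(f) = w0 + w1 * exp(-j*2*pi*f*delay) for a 2-tap FFE."""
    h = w0 + w1 * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20.0 * math.log10(abs(h))

# Assumed settings (not read from the figure): one-UI delay at 10 Gb/s.
w0, w1 = 1.05, -0.75
delay = 1.0 / 10e9
nyquist = 1.0 / (2.0 * delay)

dc_db = ffe_gain_db(0.0, w0, w1, delay)
nyquist_db = ffe_gain_db(nyquist, w0, w1, delay)
print(f"dc gain = {dc_db:.2f} dB, Nyquist gain = {nyquist_db:.2f} dB")
```

At dc the two taps subtract ($|w_0 + w_1|$), and at the Nyquist frequency the delayed tap is inverted so the magnitudes add ($|w_0 - w_1|$), which is the source of the high-frequency boost.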

![Figure 6.14: Eye diagrams of PRBS output with the prototype FR4 PCB.](image)

The poor signal quality of the first-version PCB is due to the lack of vias under the side arms of the SMA footprints. The regular SMA connectors also have a relatively thick and long launch pin, which can further limit high-frequency operation. A test board with a set of single-ended transmission lines was fabricated along with the second-version PCB, using RO4350B material with a dielectric constant ($E_{r1}$ in Table 6.3) of 3.36. This board is shown in Figure 6.16; from top to bottom, it has a microstrip, a GCPW, a

![Figure 6.15: Block diagram of the FFE to process the signal.](image)

GCPW with coating and a GCPW for a regular SMA connector. The total length of each transmission line is 1.368 inch. Cross-sections of the microstrip and GCPW transmission lines are illustrated in Figure 6.17. The corresponding design parameters, targeting single-ended 50 Ω and differential 100 Ω impedance control, are listed in Table 6.3. With the same $W_{1,2}$ for the microstrip and GCPW transmission lines, the capacitance of the GCPW is larger because of the additional side capacitance to the side ground traces. The dominant capacitance is still that to the bottom ground plane, since $D_1 > H_1$. From $Z_0 = \sqrt{L/C}$, it follows that the designed GCPW has a lower characteristic impedance than the microstrip. Transmission line design tools such as Polar Si9000 [80] and the Ansys 2D Extractor transmission line toolkit [81] can be used to estimate the controlled impedance.

Figure 6.16: Transmission line sample board made of RO4350B material. Total length of the transmission line including SMA footprints is 1.368 inch.
Table 6.3: Design parameters of single-ended and differential transmission lines made of RO4350B Rogers material.

<table>
<thead>
<tr>
<th>Type</th>
<th>$H_1$</th>
<th>$E_{r1}$</th>
<th>$W_1$</th>
<th>$W_2$</th>
<th>$S_1$</th>
<th>$G_{1,2}$</th>
<th>$D_1$</th>
<th>$T_1$</th>
</tr>
</thead>
<tbody>
<tr>
<td>SE</td>
<td rowspan="2">3.937 mil</td>
<td rowspan="2">3.36</td>
<td>7.5 mil</td>
<td>8 mil</td>
<td>NA</td>
<td>20 mil</td>
<td>9 mil</td>
<td>1.4 mil</td>
</tr>
<tr>
<td>Diff.</td>
<td>5.5 mil</td>
<td>5.8 mil</td>
<td>5.2 mil</td>
<td>20 mil</td>
<td>7.3 mil</td>
<td>1.4 mil</td>
</tr>
</tbody>
</table>

The S-parameters ($S_{11}$ and $S_{21}$) of the four transmission lines on the test board (Figure 6.16) were measured with an Agilent N5225A PNA network analyzer (10 MHz-50 GHz) and are compared in Figure 6.18. The measurement shows that the lack of vias under the side ground shielding traces of the regular SMA connector footprint causes a large reflection coefficient ($S_{11}$) at all frequencies and deep $S_{21}$ suck-outs at lower frequencies. A sufficient number of vias is critical to ensure good connections between the top and bottom planes and to minimize accidental resonances. At 5 GHz, the GCPW has more insertion loss but a better $S_{11}$, so it delivers less amplitude but better jitter performance. Transient measurements were also performed with a 500 mV$_{PP}$ PRBS-7 pattern at 10 and 15 Gb/s. Peak-to-peak jitter and minimum eye height are recorded and compared in Table 6.4, along with those of an RF cable alone. Overall, all three types of transmission lines with the 27 GHz bandwidth Southwest SMA connectors are acceptable for high-speed signal transmission, and their signal integrity performance is comparable.
Figure 6.18: Measured S-parameters of the sample transmission lines in Figure 6.16.

Table 6.4: Transient measurement for RF cable and three different transmission lines with 500 mV<sub>PP</sub> PRBS-7 pattern at 10 and 15 Gb/s, respectively.

<table>
<thead>
<tr>
<th>DR (Gb/s)</th>
<th>T-line type</th>
<th>Jitter<sub>P-P</sub></th>
<th>Eye height</th>
</tr>
</thead>
<tbody>
<tr>
<td>10/15</td>
<td>Just RF cable</td>
<td>7.6/7.7 ps</td>
<td>312/268 mV</td>
</tr>
<tr>
<td></td>
<td>Microstrip</td>
<td>9.3/10.7 ps</td>
<td>248/183 mV</td>
</tr>
<tr>
<td></td>
<td>GCPW</td>
<td>8.8/11.2 ps</td>
<td>242/180 mV</td>
</tr>
<tr>
<td></td>
<td>Coated GCPW</td>
<td>9.9/11.5 ps</td>
<td>250/189 mV</td>
</tr>
</tbody>
</table>

Time-domain reflectometry (TDR) measurements can be used to determine the characteristic impedance along the signal path. The transient TDR simulation is performed in Ansys Circuits by importing the measured S-parameters in s2p format. With an input stimulus of 25 ps rise time, the impedance of the GCPW transmission line is plotted in Figure 6.19. The characteristic impedance is less than 50 Ω, so W<sub>1,2</sub> can be reduced slightly to bring the characteristic impedance of the GCPW closer to 50 Ω.

The shorter the rise time, the more impedance discontinuities are visible in the channel on the TDR plot. A shorter rise time is more sensitive to impedance changes, whereas a longer rise time may pass right over a small impedance discontinuity without revealing it. This can be attributed to the frequency content of the signal: a shorter rise time requires signal content over a wider bandwidth. The knee frequency \( f_{knee} = 0.5/t_{rise} \) is usually used to calculate the bandwidth. The available bandwidth determines how clean the time-domain representation will be. For example, a signal with a 100 ps rise time needs only 5 GHz of bandwidth, whereas a 10 ps rise time needs 50 GHz.
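The knee-frequency rule can be tabulated directly. The short Python sketch below evaluates \( f_{knee} = 0.5/t_{rise} \) for the two rise times in the example and for the 25 ps stimulus used in the TDR simulation; the function name is illustrative.

```python
def knee_frequency_hz(rise_time_s):
    """Knee frequency f_knee = 0.5 / t_rise."""
    return 0.5 / rise_time_s

for t_rise_ps in (100, 25, 10):
    f_knee = knee_frequency_hz(t_rise_ps * 1e-12)
    print(f"t_rise = {t_rise_ps} ps -> f_knee = {f_knee / 1e9:.0f} GHz")
```

A 25 ps rise time corresponds to a 20 GHz knee frequency, which equals the upper limit of the measured S-parameter data named in the Figure 6.19 caption.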

![Figure 6.19: TDR-simulated characteristic impedance with a 25 ps rise time, using the measured 10 MHz to 20 GHz S-parameters of the GCPW.](image)

As shown in Figure 6.20, GCPW transmission lines with 27 GHz bandwidth Southwest connectors are used in the final prototype PCB design. Testing was assisted by an FPGA for digital control. As shown in Figure 6.21, the quality of the measured PRBS output eye diagram is improved significantly. When operating at 10.9 Gb/s and driving the 50 Ω oscilloscope input through a dc blocker, the output has a maximum peak-to-peak jitter of 23.4 ps and an eye amplitude of 453 mV.

Figure 6.20: Picture of the prototype Rogers PCB.

Figure 6.21: Eye diagrams of PRBS output with prototype Rogers PCB.
6.5 Summary

A full-rate 4-channel parallel PRBS generator of length 511 is demonstrated operating at more than 10 Gb/s with an on-chip PLL. BJT design for high-speed circuits is discussed, and high-speed PCB design is also discussed extensively. The designed BiCMOS die is intended to be hybrid integrated with a PAM-4 traveling-wave MZM device on the silicon photonic die, as shown in Figure 6.22. However, this step was not reached because of issues in testing the modulator drivers.

Figure 6.22: System integration of electrical die and photonics die via side-by-side wire bonding (Bond wires drawn not to scale).
CHAPTER 7

CONCLUSION

This dissertation has covered, from the circuit designer’s perspective, optical device characterization, behavioral modeling and electrical circuit design. The core contents of this dissertation come from three chips designed during my PhD study: one optical chip in the IME 130 nm SOI CMOS process, one limiting receiver chip in the IBM 130 nm CMOS process, and one PRBS transmitter chip in the IBM 130 nm SiGe BiCMOS process. Chapter 2 focused on behavioral modeling of the silicon photonic MZM device. The developed model was verified with Cadence Spectre simulation, and the simulated result matched the measurement result well. Chapter 3 utilized the developed model and data from optical device characterization to analyze the link budget at the system level, based on a segmented MZM device with voltage drivers. The optimal transmitter extinction ratio was derived according to the receiver’s sensitivity requirement. The reliability issue of the latch-based ac coupling level shifter was discussed, and an NRZ/PAM-4 reconfigurable driver scheme was proposed. Chapter 4 showcased a hybrid optoelectronic limiting receiver design using the IBM 130 nm CMOS process and an InGaAs/InP PIN photodiode device. The prototype achieved a BER of $10^{-12}$ at a sensitivity level of -3.2 dBm MZM OMA at 4 Gb/s. Conclusions derived from this case study can provide insights to guide other optoelectronic limiting receiver designs and their testing at higher speeds. Chapter 5 systematically presented a
type-II third-order charge pump PLL from system-level architecture to transistor-level
design. Noise simulation results matched well with the measurement results. When
operating at 10.624 GHz, the phase noise of the PLL output is $-73 \text{ dBc/Hz}$ at 100
Hz offset frequency. Chapter 6 demonstrated a full-rate 4-channel parallel PRBS-9,
being clocked by the on-chip PLL presented in Chapter 5. The chip was fabricated
in the IBM 130 nm SiGe BiCMOS process. High-speed PCB design was also covered
with experimental results.

Building on this work, future work can include designing the complete
transceiver in one process platform, such as the more advanced but cost-effective
65 nm or 28 nm CMOS process nodes, or SiGe BiCMOS. The full-rate multi-channel
parallel PRBS design concept can be used to implement lower-speed multi-channel
parallel PRBS generators with serializers to create the high-speed patterns
required by PAM-4 modulation. In this way, it can better meet the production
requirement of integrating DSP circuits with the high-speed transmitter circuit.
Topics such as the design trade-offs between the optical modulator device and its
driver, and the PAM-4 CDR circuit, are important for building the larger system.

There is still debate over whether silicon photonic technology can completely replace traditional optical technologies such as vertical-cavity surface-emitting lasers (VCSELs). Regardless, research scientists are developing silicon photonic quantum dot lasers in the laboratory [82]. Silicon CMOS photonic link products, mainly MZM based, were first shipped to the market in 2012 by Luxtera [83]. Luxtera has recently announced a 100G-PAM4 silicon photonics chipset [84]. Other companies such as Macom have demonstrated the first CWDM4 laser photonic integrated circuits for 100G datacenter applications [85]. Acacia Communications announced the industry’s first 400G coherent transceiver module.

In choosing the electrical process for high-speed hybrid optoelectronic integration, advanced CMOS processes are preferred for DSP-based systems and by the academic community. However, SiGe BiCMOS and InP processes are still more widely used in industry, by companies such as Infinera, Inphi and Macom. Successful development of optoelectronic interconnect products requires both circuit design and optical design expertise. Integrated circuit design for high-speed optical communication keeps evolving because the growth of social media and the adoption of Internet of Things (IoT) technologies across many fields generate an explosive amount of data, which must be transported faster than ever and more energy-efficiently. Breakthroughs beyond 400G are expected in the near future, as many companies, research institutions and governments are investing heavily in funding and manpower in the field.

To this end, the work in this dissertation has examined several aspects of advanced-modulation-based optical interconnects, including optical and electronic chips, their co-design and simulation, prototyping, and test. This work forms a basis for further integration of PAM-N transceivers in finer CMOS processes with low-parasitic packaging techniques.
REFERENCES


[32] “D.C. Blocks, a Trap for the Unwary When Using Long Patterns.”


[75] IBM, *Design Kit and Technology Training, BiCMOS8HP, V1210*.


[81] http://www.ansys.com/Products/Electronics/Option-ANSYS-SI.


APPENDIX A

VERILOG-A TO ENABLE OPTICAL SIMULATION

The built-in Verilog-AMS definitions for natures, disciplines and constants are located in the MMSIM installation hierarchy under tools/spectre/etc/ahdl. Optical power and optical phase can be defined in a similar fashion in a separate custom discipline file, with the units “W” and “rads” specified in the corresponding natures.

Listing A.1: Custom optical discipline file

/*
Verilog-A definition of Silicon Photonics related Natures and Disciplines
$RCSfile: opticalDisciplines.vams,v
$Revision: 1.0 Date: Feb 15 2013$
$Kehan Zhu, Vishal Saxena @ Boise State University
*/
`ifdef DISCIPLINES_OPTICAL
`else
`define DISCIPLINES_OPTICAL

// Current, Angular_Velocity and Angular_Force are natures from the
// standard disciplines.vams, which must be included before this file.

// Optical Power in Watts
nature OPower
    units = "W";
    access = OptPower;
    abstol = 1e-9;
endnature

// Optical Phase in radians
nature OPhase
    units = "rads";
    access = OptPhase;
    ddt_nature = Angular_Velocity;
    abstol = 1e-9;
endnature

// Signal flow disciplines
discipline opticalPower
    potential OPower;
    flow Current;
enddiscipline

discipline opticalPhase
    potential OPhase;
    flow Angular_Force;
enddiscipline

`endif

Listing A.2: Optical source converts voltage to optical power and optical phase

// VerilogA for Modulators, OptSource, veriloga

`include "constants.vams"
`include "disciplines.vams"
`include "./.../../opticalDisciplines.vams"

module OptSource(VoptPower, VoptPhase, outOptPower, outOptPhase);

// voltage sources setting the power and phase
input VoptPower, VoptPhase;
output outOptPower, outOptPhase;

electrical VoptPower, VoptPhase;
opticalPower outOptPower;
opticalPhase outOptPhase;

analog begin
    // optical power cannot be negative: clamp it at 0
    if (V(VoptPower) > 0)
        OptPower(outOptPower) <+ V(VoptPower);
    else
        OptPower(outOptPower) <+ 0;

    // optical phase follows the control voltage directly
    OptPhase(outOptPhase) <+ V(VoptPhase);
end

endmodule
APPENDIX B

DETERMINE THE PRBS FEEDBACK TAP

A Python script is used to determine the PRBS feedback tap. Python and SymPy need to be installed to run it. The script divides $x^{2^n-1} + 1$ by the candidate feedback polynomial; for a valid tap, the coefficients of the remainder should all be even numbers. Even and odd numbers are equivalent to 0 and 1, respectively, in the modulo-2 operation, so an all-even remainder is zero in GF(2).

Listing B.1: Python code to determine the feedback tap for primitive trinomial

```python
from sympy import symbols, div, poly

x = symbols('x')
n = 9  # PRBS order (PRBS-9)
k = 4  # candidate tap; in GF(2), k should be chosen such that f mod p is 0
f = x**(2**n - 1) + 1  # dividend
p = x**n + x**k + 1    # feedback polynomial as divisor
q, r = div(f, p, x)    # quotient and remainder (integer coefficients)
remainder = poly(r, x)
coef = remainder.coeffs()

# The remainder is zero in GF(2) only if every coefficient is even.
ans = 0
for elem in coef:
    if elem % 2 != 0:
        ans = 1
        break

print(ans)  # The answer should be 0
```
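
As a complementary check, the candidate tap can also be tested by stepping a software model of the shift register and counting how many clock cycles pass before the initial state recurs. The sketch below is not part of the original design flow. The function name `lfsr_period`, the Fibonacci-style feedback arrangement and the seed value are assumptions for illustration. A maximal-length sequence of order $n$ should return $2^n - 1$.

```python
def lfsr_period(n, k, seed=1):
    """Cycle length of an n-stage LFSR whose feedback is stage n XOR stage k.

    Hypothetical helper for illustration; stages are numbered from 1.
    Returns None if the start state does not recur within 2**n steps.
    """
    mask = (1 << n) - 1
    start = seed & mask
    state = start
    for steps in range(1, (1 << n) + 1):
        fb = ((state >> (n - 1)) ^ (state >> (k - 1))) & 1
        state = ((state << 1) | fb) & mask
        if state == start:
            return steps
    return None


print(lfsr_period(9, 4))  # 511 for a maximal-length PRBS-9
```

A period shorter than $2^n - 1$ indicates that the chosen tap does not produce a maximal-length sequence, even in cases where the divisibility test alone passes.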
APPENDIX C

FIRST AUTHOR PUBLICATIONS DURING 2013-2016


