PCI-Express begins to meet the bandwidth challenge: the next few years will keep pace with CPU performance improvements!

The PCI bus was first introduced by Intel in 1992, and the PCI-X standard was created by IBM, Hewlett-Packard, and Compaq in 1998. Both PCI and PCI-X are half-duplex parallel buses on which all devices share a single set of wires. The PCI-SIG organization launched PCI-Express in 2003 as a serial successor to the PCI bus: it uses point-to-point links, so each device gets its own dedicated connection to the system rather than contending for a shared bus.

Despite the success of PCI-Express, we were stuck with the relatively slow PCI-Express 2.0 for over three years, and then stuck with PCI-Express 3.0 for another seven years. Just as multi-core processors entered a phase of rapid development with ever-increasing core counts, and just as Ethernet networks began to return to Moore's Law's two-year cycle of bandwidth doubling, the PCI-Express bus could not supply enough bandwidth for this evolving heterogeneous computing, and it became a new bottleneck in the system.

The bus situation inside the system got better in 2019. The PCI-Express 4.0 bus, introduced in 2017, first appeared in IBM's Power9-based Power Systems machines later that year, and is now making its way to X86 and Arm processors. With this update, peripherals hanging off the PCI-Express controllers on the CPU package can use sixteen lanes (an x16 link) running at 16Gb/sec raw to deliver 31.5GB/sec of bandwidth in each direction once the encoding overhead is removed. The x16 slot is typically used to drive high-end GPU or FPGA accelerators, and it represents the peak of PCI-Express peripheral bandwidth. In any case, this PCI-Express 4.0 x16 slot has almost eight times the bandwidth of the original PCI-Express 1.0 specification from a decade and a half ago.

With the PCI-Express 5.0 specification, the PCI-SIG has again doubled the raw lane speed, to 32Gb/sec, which allows an x16 link to deliver roughly 63GB/sec in each direction, or a bit under 126GB/sec over a duplex serial link. Credit for that goes to the physicists and chemists who devised the new materials used in the PHY communication circuits of these interconnects.
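The generation-to-generation bandwidth arithmetic above can be checked with a short sketch. The encoding schemes assumed here (8b/10b for PCI-Express 1.0 and 2.0, 128b/130b from 3.0 onward) are the standard ones for these generations; the function names are our own.

```python
# Per-generation PCI-Express bandwidth for an x16 link, one direction.
# Assumed encodings: 8b/10b for generations 1.0/2.0, 128b/130b thereafter.
GENERATIONS = {
    # generation: (raw lane rate in Gb/sec, encoding efficiency)
    "1.0": (2.5, 8 / 10),
    "2.0": (5.0, 8 / 10),
    "3.0": (8.0, 128 / 130),
    "4.0": (16.0, 128 / 130),
    "5.0": (32.0, 128 / 130),
}

def x16_bandwidth_gbytes(gen: str, lanes: int = 16) -> float:
    """Usable one-direction bandwidth in GB/sec for a given link width."""
    rate, efficiency = GENERATIONS[gen]
    return rate * efficiency * lanes / 8  # divide by 8: bits -> bytes

for gen in GENERATIONS:
    print(f"PCI-Express {gen} x16: {x16_bandwidth_gbytes(gen):.1f} GB/sec")
```

This reproduces the numbers in the text: 31.5GB/sec for a 4.0 x16 slot, almost eight times the 4GB/sec of a 1.0 x16 slot, and about 63GB/sec each way (126GB/sec duplex) for 5.0.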

Like the latest generation of Ethernet switch ASICs and the standards they are based on, the PCI-Express 6.0 standard will move to pulse amplitude modulation encoding (PAM4, to be precise) to cram twice as many bits into the signal; until now, PCI-Express has used conventional NRZ bit encoding. The upcoming PCI-Express 6.0 specification will also add a low-latency forward error correction (FEC) layer to the protocol, something most interconnects must add as bandwidth rises and with it the chance of bit errors. FEC adds a bit of latency to every interconnect protocol that uses it, and it is unclear how this will affect PCI-Express. The PCI-SIG announced the initial PCI-Express 6.0 specification work in June and released version 0.3 of the specification in October, and the final specification is planned to be released to hardware makers sometime in 2021. That could put PCI-Express 6.0 devices on the market between the second half of 2021 and the first half of 2022.
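The reason PAM4 doubles the data rate without a faster clock is worth spelling out: NRZ distinguishes two voltage levels per symbol (one bit), while PAM4 distinguishes four (two bits). A minimal illustration, using an assumed 32Gbaud symbol rate for the sake of round numbers:

```python
# Why PAM4 doubles the data rate at a fixed symbol rate:
# NRZ carries log2(2) = 1 bit per symbol, PAM4 carries log2(4) = 2 bits.
import math

def bits_per_symbol(levels: int) -> int:
    return int(math.log2(levels))

symbol_rate_gbaud = 32  # assumed symbol rate, for illustration only

nrz_rate = symbol_rate_gbaud * bits_per_symbol(2)   # two levels
pam4_rate = symbol_rate_gbaud * bits_per_symbol(4)  # four levels
print(f"NRZ:  {nrz_rate} Gb/sec raw")
print(f"PAM4: {pam4_rate} Gb/sec raw")
```

The trade-off, as the text notes, is that packing four levels into the same voltage swing shrinks the margin between them, raising the bit error rate and forcing the addition of FEC.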

With numerous multi-chip and heterogeneous architectures relying on PCI-Express transport and layering other protocols, such as CXL, CAPI, CCIX, and Gen-Z, on top of PCI-Express hardware, we think there is a good chance that NVLink and even OpenCAPI could be fused with some of these protocols, such as CXL, to create a single protocol for linking compute and accelerators to each other over PCI-Express. And as we have seen, steady bandwidth growth is critical to system architecture.

Here is an interesting chart in which the PCI-SIG has compiled historical bandwidth trends for the PCI, PCI-X, and PCI-Express buses since 1992. Unlike the numbers we discussed above, and the numbers we will plot in the chart further below, the PCI-SIG chart adds together both directions of a full-duplex link (an x16 link in the case of PCI-Express).

[Chart: PCI-SIG historical bandwidth trends for the PCI, PCI-X, and PCI-Express buses since 1992]

The lighter line shows what the bandwidth on the peripheral bus would have been over time if it had doubled every three years, like clock frequency. Clearly, the transition to the serial, duplex, point-to-point architecture of PCI-Express increased bandwidth well beyond that expectation. Importantly, with point-to-point links there is no shared bus as there was with PCI and PCI-X. On the PCI and PCI-X buses, adding a second card consumed some of the shared bandwidth, and adding many cards to a system caused a lot of arbitration overhead, so the relative performance of PCI and PCI-X devices could be low in heavily configured systems.
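A rough sketch makes the shared-bus penalty concrete. The 5% per-extra-device arbitration overhead below is an illustrative assumption, not a measured PCI/PCI-X figure; the 1.06GB/sec peak is the well-known rate of a 64-bit PCI-X bus at 133MHz.

```python
# Hypothetical comparison: a shared parallel bus divides its bandwidth
# among devices (plus arbitration overhead), while point-to-point serial
# links give every device its own dedicated lanes.

def shared_bus_per_device(total_gbytes: float, devices: int,
                          arbitration_overhead: float = 0.05) -> float:
    """Rough per-device bandwidth on a shared bus. The 5% arbitration
    overhead per extra device is an illustrative assumption."""
    usable = total_gbytes * (1 - arbitration_overhead * (devices - 1))
    return max(usable, 0.0) / devices

# A 64-bit PCI-X bus at 133MHz peaks at about 1.06 GB/sec, shared:
for n in (1, 2, 4):
    print(f"{n} device(s): {shared_bus_per_device(1.06, n):.2f} GB/sec each")

# By contrast, a PCI-Express x16 slot delivers its full bandwidth to
# each device, regardless of how many other slots are populated.
```

However the overhead is modeled, the shape is the same: per-device bandwidth on a shared bus falls faster than 1/N as cards are added, while point-to-point links do not degrade at all.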

The 36-month performance doubling cadence is significant because it is more or less the pace that Moore's Law now affords at least some processor vendors. Intel and IBM are currently at a three-year pace, a far cry from the 18 months that Intel co-founder Gordon Moore originally described, or even the modification he proposed in the 1980s, which stretched the halving of transistor costs to about 24 months.
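The difference between these cadences compounds dramatically. A quick calculation (ours, for illustration) of the growth factor over a decade at each doubling period:

```python
# Compounded growth over a fixed window for different doubling cadences.

def growth_factor(months: int, doubling_period_months: int) -> float:
    """Total growth multiple after `months` if capability doubles
    every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)

decade = 120  # months
for period in (18, 24, 36):
    print(f"doubling every {period} months -> "
          f"{growth_factor(decade, period):.0f}x in ten years")
```

Over ten years, an 18-month cadence compounds to roughly 100x, a 24-month cadence to 32x, and a 36-month cadence to only about 10x, which is why the slide from Moore's original pace to a three-year pace matters so much for system balance.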

The good news is that the pace of interconnect bandwidth growth is picking up even as the pace of CPU improvement slows, and the pace of networking is accelerating as well. As a result, it should be possible to bring the components of the system back into balance and to build true hybrid architectures in the next few years, without needing proprietary interconnects to enable fast communication between components.


[Chart: relative server performance versus Ethernet switch port bandwidth and PCI-Express bandwidth since 1998]

The graph above shows the relative performance of servers since the advent of the PCI-X protocol in 1998, plotted against the bandwidth of Ethernet switch ports and the actual PCI bus bandwidth over the same period.

Our relative server performance figure, based on two-socket X86 servers (the rest of the market is essentially noise in the data), is a very rough number; it takes into account incremental instructions per clock (IPC) per core and the number of cores per socket to compare performance over the long term.

The last time all three of these vectors rose sharply at the same time was in 2010; after that, both PCI-Express and Ethernet took a breather at PCI-Express 3.0 and 40Gb/sec, respectively, and paused for a while. Whether it takes 18 months, 24 months, or, as now, closer to 36 months to double the transistor count and cut the cost per transistor in half, the annual refresh rate of processors is still more or less limited by IPC gains and process tweaks. We may be optimistic about how Ethernet will jump from 100Gb/sec to 200Gb/sec and 400Gb/sec, and on to 800Gb/sec, in the next few years, but this is the roadmap of the major commercial switch chip vendors and their customers.

It has long looked as if PCI-Express would never keep up with CPU performance improvements, but for at least the next few years, it will.
