- Paving the way for Korea to become the world’s fifth supercomputer manufacturer, with an accelerator that surpasses others in its class
- Double-precision acceleration, unlike NPUs, opening up the market for high-performance domain-specific accelerators
A group of South Korean researchers has developed a floating-point accelerator chip, a core technology for supercomputers. The first of its kind developed domestically, the accelerator chip plays a crucial role in speeding up supercomputer calculations. Once commercialized, this technology will be a game-changer in Korea’s bid to become the world’s fifth nation to manufacture supercomputers.
The Electronics and Telecommunications Research Institute (ETRI) announced that it has developed an accelerator system-on-chip (SoC) named ‘K-AB21’1). The accelerator chip measures 77 mm x 67 mm and is fabricated on a 12-nanometer process.
The newly developed accelerator for supercomputers integrates a general-purpose processor and a 64-bit parallel computing unit, and delivers 8 teraflops (TFLOPS)2) for parallel processing of double-precision floating-point (FP64)3) computations. One 3U-size4) computational node can hold up to two accelerator chips, along with a main processing unit and a liquid cooling system.
In November, ETRI showcased the computational node with the accelerator chip at SuperComputing24, the world’s largest supercomputing technology exhibition, held in Atlanta, USA. At the exhibition, ETRI demonstrated the core functions of K-AB21 in operation. ETRI plans to carry out system-level verification of a high-performance computing server based on K-AB21, integrated with parallel-processing software stacks, in the first half of next year.
Currently, only four countries or regions are capable of developing and manufacturing supercomputers: the U.S., China, Japan and the E.U. (France). Each is focused on introducing general-purpose accelerators to improve performance.
However, general-purpose accelerators tend to focus on lower-precision computation for AI, making them less effective for traditional supercomputer applications that require high-precision computation. Likewise, the neural network processing unit (NPU), an accelerator for AI inference, supports only low-precision computation and is therefore not suitable for high-end engineering simulations or large-scale scientific research.
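The effect of precision on a long-running computation can be illustrated with a small sketch. It is not ETRI code and does not model K-AB21 or any NPU; it only emulates IEEE 754 half, single and double precision with Python's standard `struct` module. The helper names `round_to` and `accumulate`, the step of 0.1 and the count of 10,000 additions are assumptions chosen for illustration.

```python
import struct


def round_to(fmt: str, x: float) -> float:
    """Round a Python float (FP64) to the IEEE 754 format named by a struct code."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def accumulate(fmt: str, step: float, n: int) -> float:
    """Add `step` n times, rounding the running sum to `fmt` after every addition."""
    step = round_to(fmt, step)
    total = 0.0
    for _ in range(n):
        total = round_to(fmt, total + step)
    return total


N = 10_000   # hypothetical number of additions
STEP = 0.1   # hypothetical increment; the exact sum would be 1000

results = {
    "FP16": accumulate("<e", STEP, N),  # half precision
    "FP32": accumulate("<f", STEP, N),  # single precision
    "FP64": accumulate("<d", STEP, N),  # double precision
}

for name, value in results.items():
    print(f"{name}: sum = {value!r}, error vs 1000 = {abs(value - 1000.0):.3e}")
```

In this setup the half-precision sum stops growing well short of 1000, because the increment eventually becomes smaller than half the spacing between representable values, while the double-precision sum stays close to the exact result. Simulations that accumulate many small contributions are sensitive to this kind of error, which is one reason FP64 support matters for them.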
1) K-AB21: Code name (proper noun) for a system-on-chip (SoC) with an ultra-parallel acceleration function, developed by ETRI
2) Teraflops (TFLOPS, tera floating-point operations per second): A measure of the number of floating-point operations performed per second (FLOPS). "Tera" denotes 10 to the 12th power, so one teraflops is one trillion floating-point operations per second.
3) Double-precision floating point: Floating-point numbers are one way to represent real numbers. In contrast to fixed-point numbers, they take their name from the fact that the position of the radix point is not fixed. Floating-point numbers are divided into half-precision, single-precision and double-precision data types according to the number of bits used to represent the number (IEEE 754 standard). The more bits are used, the higher the precision. Double precision (FP64): 64 bits = 8 bytes
4) 3U size: The rack unit (U) is the standard unit of height for rack-mounted equipment, where 1U is 1.75 inches. A 3U unit is therefore 5.25 inches high.
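The quantities in the footnotes above can be tied together with simple arithmetic. The sketch below uses only figures from the article (8 TFLOPS per chip, up to two chips per node, 1U = 1.75 inches, 64 bits per FP64 value). The function names, the 10,000 x 10,000 matrix and the conventional 2n^3 operation count for dense matrix multiplication are assumptions for illustration. The results are theoretical peak values, not measured performance.

```python
TERA = 10**12                # "tera" = 10 to the 12th power

PEAK_TFLOPS_PER_CHIP = 8     # FP64 peak per chip, as stated in the article
MAX_CHIPS_PER_NODE = 2       # upper bound per 3U node, as stated in the article
RACK_UNIT_INCHES = 1.75      # height of 1U
FP64_BYTES = 64 // 8         # 64 bits = 8 bytes


def node_peak_flops(chips: int = MAX_CHIPS_PER_NODE) -> int:
    """Theoretical aggregate FP64 peak of one node, in operations per second."""
    return chips * PEAK_TFLOPS_PER_CHIP * TERA


def matmul_seconds_at_peak(n: int, flops_per_second: int) -> float:
    """Lower bound on the time for a dense n x n matrix multiply (about 2*n**3 operations)."""
    return 2 * n**3 / flops_per_second


def fp64_matrix_bytes(n: int) -> int:
    """Memory needed to store one n x n matrix of FP64 values."""
    return n * n * FP64_BYTES


def rack_height_inches(units: int) -> float:
    """Height of a chassis occupying the given number of rack units."""
    return units * RACK_UNIT_INCHES


n = 10_000  # hypothetical matrix dimension
print(f"Node peak: {node_peak_flops() / TERA:.0f} TFLOPS")
print(f"Matmul lower bound on one chip: {matmul_seconds_at_peak(n, PEAK_TFLOPS_PER_CHIP * TERA)} s")
print(f"One FP64 matrix: {fp64_matrix_bytes(n) / 1e6:.0f} MB")
print(f"3U height: {rack_height_inches(3)} in ({rack_height_inches(3) * 25.4:.2f} mm)")
```

Real applications rarely sustain the theoretical peak, so the matrix-multiply time should be read as a best case rather than an expected runtime.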
ETRI researchers have developed several core technologies for accelerating traditional supercomputer applications, including a proprietary supercomputer accelerator chip (SoC), software and a computing node. The accelerator is a massively parallel processor containing nearly 10 billion transistors, making it the largest of its kind developed in Korea.
The chip includes a ▲high-performance core, ▲more than 4,000 parallel floating-point operation units and ▲ultra-high-speed interfaces, such as DDR5 and PCIe Gen5 1). The software consists of a compiler, a runtime and a device driver.
Given the accelerator market’s diversification into technology-specific areas (GPGPU 2), TPU 3), NPU 4), IPU 5), etc.), the researchers expect that developing this accelerator will strengthen ETRI’s technology base and the local industry while supporting entry into the global market.
ETRI Senior Vice President Il Yeon Cho of the Artificial Intelligence Computing Research Laboratory explained, “This development delivers a top-class accelerator manufactured on a 12-nanometer process. From the chip to the system, this invaluable outcome, achieved through the researchers’ efforts, will help establish and revitalize the high-performance computing ecosystem in Korea.”
ETRI Research Fellow Woojong Han of the Supercomputing System Research Section, who led the project, said, “In an accelerator market dominated by global big tech companies, we will strengthen technology sovereignty in the field of high-performance computing (HPC). For high-performance computers, we have been entirely dependent on foreign technologies. I hope this achievement will provide a firm stepping-stone for developing supercomputers with home-grown technologies.”
Through this project, the research team produced 29 domestic and international patent applications, 15 SCI papers and three technology transfers to industry.
Once industry commercializes the technology following verification, the researchers expect it to target large-scale high-performance computing systems, with the scale and price of each system customized for the specific customer.
The researchers plan to transfer the technology, based on demand, not only to supercomputer system makers but also to the wider industry, including high-performance data centers, system integrators (SIs) and liquid cooling system makers, as well as to related areas such as self-driving vehicles, intelligent robots, edge servers and cloud-based AI training services.
1) DDR5, PCIe Gen5: DDR5 is the latest DRAM standard for main memory. It defines transfer rates of up to 8800 MT/s, with 4800 MT/s being widely adopted. / PCIe Gen5 is the fifth generation of the PCI Express standard, offering a speed of 32 GT/s per lane.
2) GPGPU (General-Purpose Graphics Processing Unit): A general-purpose accelerator with multi-purpose parallel computing units for AI, graphics and HPC processing
3) TPU (Tensor Processing Unit): An accelerator specialized for AI training and inference, built around tensor parallel computing units
4) NPU (Neural Processing Unit): An accelerator specialized for AI inference, built around low-precision parallel computing units focused on neural network operations
5) IPU (Infrastructure Processing Unit): An accelerator specialized for accelerating data processing
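The interface rates in footnote 1 can be converted into approximate bandwidth figures. The sketch below is a back-of-the-envelope calculation, not a description of the K-AB21 design. The article does not state the PCIe lane count or the memory channel configuration, so the x16 link and the 64-bit data width per memory channel are assumptions, as are the function names. The 128b/130b line encoding is the one used by PCI Express from the third generation onward.

```python
PCIE_GEN5_GT_PER_S = 32            # transfers per second per lane, in units of 10**9
PCIE_ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def pcie_gen5_gb_per_s(lanes: int) -> float:
    """Approximate usable bandwidth per direction in GB/s, before protocol overhead."""
    bits_per_second = PCIE_GEN5_GT_PER_S * 1e9 * PCIE_ENCODING_EFFICIENCY * lanes
    return bits_per_second / 8 / 1e9


def ddr5_gb_per_s(mega_transfers_per_s: int, bus_bits: int = 64) -> float:
    """Peak bandwidth in GB/s of one memory channel with the given data width."""
    return mega_transfers_per_s * 1e6 * bus_bits / 8 / 1e9


# Assumed x16 link; the article does not give the lane count.
print(f"PCIe Gen5 x16: about {pcie_gen5_gb_per_s(16):.1f} GB/s per direction")

# Assumed 64-bit data width per channel; rates taken from footnote 1.
print(f"DDR5-4800: {ddr5_gb_per_s(4800):.1f} GB/s per channel")
print(f"DDR5-8800: {ddr5_gb_per_s(8800):.1f} GB/s per channel")
```

These are peak figures. Sustained throughput depends on protocol overhead, access patterns and the number of channels or links actually implemented.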
Through follow-on R&D projects, the researchers also expect to advance the technologies and strengthen the industry ecosystem for developing high-performance computing systems.
ETRI attributes this achievement to the expertise and system software development capabilities it gained from an earlier project, MAHA, a supercomputer for genomic analysis.
Over the years, the government has supported national R&D to develop core technologies for large-scale high-performance computing and to strengthen domestic capabilities in this area. Such steady government support will pave the way for Korea to become the fifth country in the world capable of developing its own supercomputers. To help researchers thrive on the global stage, the government is pursuing a policy of actively supporting technology exports with strategies tailored to individual markets.
For supercomputers, which are essential infrastructure for advanced industrial development and for science and technology, Korea has been reliant on foreign products. However, this breakthrough will establish a firm base for technology sovereignty in high-performance computing, strengthen the industry ecosystem and foster local talent.
This achievement comes from the “Supercomputer Computing Node Development based on Massively parallel Processor” project funded by the Ministry of Science and ICT and the National Research Foundation of Korea (NRF). With ETRI as the leading institution, the Korea Institute of Science and Technology Information led the software development, and roughly ten university laboratories and two domestic enterprises collaborated on the R&D.
Woojong Han, Research Fellow
Supercomputing System Research Section
(Tel. 82-42-860-6670, woojong.han@etri.re.kr)