Democratizing domain-specific computing


Media: Article
Authors: Chi Y., Qiao W., Sohrabizadeh A., Wang J., Cong J.
Review published as: CR147606 (2308-0108)
Published in: Communications of the ACM 66(1): 74-85
Publisher: Association for Computing Machinery (ACM)

As computer professionals, we mostly envision computers as general-purpose tools by default. Over the past decades, Moore’s law and Dennard scaling have, year after year, given us consistently better “toys”: faster computers, larger storage spaces. With these tools, computer science has changed the face of humankind. However, most computing professionals focus on building software. Can software developers work to produce domain-specific accelerators (DSAs), that is, purpose-built computers (application-specific integrated circuits, ASICs) that, at the cost of losing generality, can deliver much better performance and energy efficiency than general-purpose chips?

The authors’ main focus is to present AutoDSE, a design-space exploration (DSE) tool, but some background is needed before introducing it.

Of course, the authors start by acknowledging that designing ASICs can be prohibitively expensive for most use cases, in no small part due to the cost of chip manufacturing. They focus, however, on field-programmable gate arrays (FPGAs), a special kind of chip that can be reconfigured to perform as DSAs. Although FPGAs are not as fast as manufactured ASICs, the authors argue that they can bring many of the advantages of purpose-specific computing, accelerating workloads tens or hundreds of times compared to using regular central processing units (CPUs) to tackle them.

The authors acknowledge the first hurdle for programmers used to conventional software development: a completely unfamiliar development environment, based on hardware description languages such as Verilog or VHDL, which are very different from the languages they use for the bulk of their day-to-day work. FPGA vendors offer high-level synthesis (HLS) tools so that FPGAs can be programmed in more familiar C/C++/OpenCL, annotated with “pragmas” that tell the compiler to apply specific parallelization, pipelining, buffering, and other optimizations. Used correctly, such pragmas can take the same code from running 108x slower than a CPU on an FPGA to running 89x faster; however, identifying the right places to optimize and the right optimization level for each is very hard, even for experts.
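
To give a concrete flavor, the following is a minimal sketch (not taken from the article) of a C kernel annotated in the style of AMD/Xilinx Vitis HLS pragmas; the kernel, the pragma placement, and the factors are illustrative assumptions, and the exact pragma spelling varies across HLS tool versions:

    #define N 1024

    /* Hypothetical kernel, annotated with Vitis-HLS-style pragmas. */
    void vec_scale_add(const float a[N], const float b[N], float out[N], float k) {
        /* Split each array across on-chip memory banks so several elements
           can be read in the same cycle (the factor is illustrative). */
        #pragma HLS array_partition variable=a cyclic factor=8
        #pragma HLS array_partition variable=b cyclic factor=8
        #pragma HLS array_partition variable=out cyclic factor=8

        for (int i = 0; i < N; i++) {
            /* Start a new iteration every clock cycle and replicate the loop
               body eight times to process eight elements in parallel. */
            #pragma HLS pipeline II=1
            #pragma HLS unroll factor=8
            out[i] = k * a[i] + b[i];
        }
    }

Changing or omitting any one of these annotations can swing performance by orders of magnitude, which is exactly the design space the article is concerned with.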

Most of the article explains, in great detail, specific issues often taken for granted when writing code in a traditional setting, as well as some challenges unique to FPGA programming. After many clear and frankly easy-to-understand examples, the authors present the above-mentioned DSE tool, AutoDSE, which automatically generates pragmas for the Merlin compiler. AutoDSE focuses on automatically detecting the right optimization points and the values to apply at each; the authors report that it achieves similar performance “while using 26.38x fewer optimization pragmas for the same input C programs.”
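
For contrast, the Merlin compiler that AutoDSE targets accepts a single higher-level annotation in place of a whole group of low-level ones. The sketch below reuses the same hypothetical kernel, with a pragma keyword drawn from the parallel/pipeline/tile vocabulary described for Merlin; the exact spelling and the factor value are assumptions, and the factor is one AutoDSE would normally pick by search rather than by hand:

    #define N 1024

    /* Same hypothetical kernel, annotated Merlin-style: one high-level
       pragma stands in for the unroll/pipeline/partition group above,
       and AutoDSE decides where to place it and with what value. */
    void vec_scale_add(const float a[N], const float b[N], float out[N], float k) {
        #pragma ACCEL parallel factor=8
        for (int i = 0; i < N; i++) {
            out[i] = k * a[i] + b[i];
        }
    }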

Given the current state of computing, it is quite likely we will no longer get the “free lunch” that Moore’s law and Dennard scaling brought to our field; in order to keep improving the speed of computation on problems of ever-increasing magnitude and complexity, programmers will need to find new techniques to accelerate their workloads. FPGAs certainly pose an attractive way forward, and tools such as the DSE tool presented in this article will likely help developers tackle the difficult task of translating software into hardware.

