Wednesday, February 11, 2009

Abstraction and Control-Dominated Hardware Designs

I've been saying for a while that SystemC is not a significantly improved solution for control-oriented designs, at least when it comes to designs targeted for synthesis into RTL (I'm not addressing other uses of SystemC in this post, such as simulation-based testbenches). A quick search turned up this letter I wrote to Deepchip covering the subject -- I'm sure I could find many more. And to be clear, since it's been intimated otherwise, I've never claimed that SystemC could not be used for control logic. Anything that you can code in RTL, you should be able to code in SystemC.

My point has been consistent and straightforward: SystemC offers little benefit over Verilog/VHDL/SystemVerilog for the expression of control logic (and is probably even a step backward for complex control designs: with SystemC, the description will necessarily be just as low-level and manual, but it adds a translation step away from RTL).

But, in this week's EETimes, Gabe Moretti quotes Gary Smith as saying that "Now we have ESL synthesizers from Forte Design Systems and Cadence that target both the Algorithmic and Control logic domains. Because of that we are moving away from the three domains (algorithmic, processor/memory, and control logic) view of ESL to a more traditional look at the methodology. The ESL methodology is indeed maturing."

I'd love to understand how things have matured in the control area -- I'm not sure that much has, beyond the messaging. Another piece this week might provide some clues:

John Sanguinetti of Forte has a contributed article for EDADesignLine called Abstraction and Control-Dominated Hardware Designs. Anyone considering SystemC for control logic should take a critical look at the example provided, particularly as it's the only example supporting the thesis. A cursory look at the example would indicate that there's a good code savings with the SystemC implementation -- a closer look highlights that this is a very special example, and one that doesn't particularly support the notion that SystemC provides unique abstraction for control logic. Here are some things I noticed:
  • The code illustrated is a basic sequential set of steps, with no flow control, no conflicts with shared resources, no conflicts with other FSMs in the system -- sure, it's "control" code, but it's hardly representative of what makes control complex.
  • The article mentions that this is illustrating a cooperating FSM (because it "cooperates" with a memory, I guess). Well, this is a pretty special (not to mention convenient) case of a "cooperating" FSM: the interactions are deterministic and stepwise and mirror images of each other. This is hardly an illustration of cooperating FSMs that have any usefully interesting interactions.
  • Isn't this showing abstraction? It definitely is -- and, in general, it's powerful to be able to hide code in a library and reuse it. It's especially elegant to overload operators as illustrated in this example. In SystemVerilog RTL, I don't believe you could do this type of overloading and have it synthesize. That said, I believe you could do almost exactly the same thing -- and get almost identical succinctness -- by using SystemVerilog tasks (I'm not an expert at SystemVerilog RTL, so I'm not sure I'm using the right terminology). Basically, the point is that there's nothing about RTL that precludes this type of abstraction -- and I believe you could closely mimic it. Does that make SystemVerilog ESL?
  • The code that's "eliminated" in this example has to be written elsewhere in a class library -- so, for a single use, there's no net savings in code.
  • What does the code in the library look like? Is it much better than what's illustrated in the RTL in this example? Or, is it RTL-like SystemC code put away in a library so that you write it once instead of at every instance?
  • What if the memory subsystem had different behaviors every time? For example, it completed transactions in differing numbers of cycles. What would that code look like?
  • How flexible is this code -- and how error-prone is it for an engineer to use, especially now that it's been hidden away? For example, this design assumes that only one process accesses the memory at any one time -- who guarantees this, and how? Is the code usable only where you know nothing else is accessing the memory, or at least not "at that time"? What's required to use it if other processes might access the memory at the same time? And if multiple processes do try to access the memory simultaneously, what happens? (I presume you'd get a bug, unless the library accounts for this; see the next point.)
  • What would the code look like in the library if it had to account for multiple processes needing to access the single memory resource at the same time? Would the synthesis work for this particular case where the memory access is abstracted with []?
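To make the operator-overloading point concrete, here's a minimal sketch in plain C++. It is not SystemC, and it is not the code from the article or from any vendor's library. The names HandshakeMemory, Ref and copy_and_increment are invented for illustration, and the fixed three-cycle cost per access is an assumption standing in for a request/wait/response handshake.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical memory model: each access is charged a fixed number of
// "cycles", standing in for a request / wait / response handshake.
class HandshakeMemory {
public:
    static constexpr std::size_t kWords = 16;
    static constexpr unsigned kCyclesPerAccess = 3;  // assumed fixed latency

    // Proxy returned by operator[] so reads and writes can be told apart.
    class Ref {
    public:
        Ref(HandshakeMemory& m, std::size_t a) : mem_(m), addr_(a) {}
        // Read path: mem[a] used as a value.
        operator std::uint32_t() const { return mem_.read(addr_); }
        // Write path: mem[a] = v.
        Ref& operator=(std::uint32_t v) { mem_.write(addr_, v); return *this; }
        // mem[a] = mem[b]: one read followed by one write.
        Ref& operator=(const Ref& other) {
            return *this = static_cast<std::uint32_t>(other);
        }
    private:
        HandshakeMemory& mem_;
        std::size_t addr_;
    };

    Ref operator[](std::size_t addr) { return Ref(*this, addr); }
    unsigned cycles() const { return cycles_; }

private:
    // The "library" side: this is where the handshake steps would live.
    std::uint32_t read(std::size_t addr) {
        assert(addr < kWords);
        cycles_ += kCyclesPerAccess;
        return data_[addr];
    }
    void write(std::size_t addr, std::uint32_t v) {
        assert(addr < kWords);
        cycles_ += kCyclesPerAccess;
        data_[addr] = v;
    }

    std::array<std::uint32_t, kWords> data_{};
    unsigned cycles_ = 0;
};

// The "user" side: four memory accesses, no visible handshake.
unsigned copy_and_increment(HandshakeMemory& mem) {
    mem[0] = 41;           // one write
    mem[1] = mem[0] + 1;   // one read, one write
    return mem[1];         // one read
}
```

Calling copy_and_increment on a fresh HandshakeMemory returns 42 and leaves cycles() at 12: four accesses at three cycles each. The user code is short, but the sketch sidesteps exactly the questions raised above: the latency is fixed, and nothing arbitrates between two callers.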
A code abstraction like the one in the article looks good -- but isn't really illustrative of the kind of control logic that really needs to get abstracted. Complex control designs have cooperating FSMs with shared resource contention, flow control considerations and more. Expressing this type of control (which is seen in most control-based designs, such as processors/controllers; DMA controllers; memory controllers; communications/networking IP; bus interfaces; system interconnects; etc.) is hard whether you do it in RTL or SystemC. It would be nice to know how SystemC looks and handles these types of issues -- and how it's better than RTL. A couple of years ago, I put forth a challenge to see what SystemC looked like for these types of designs -- and published very transparent examples of how and why Bluespec dramatically improves these types of designs. What does SystemC look like for these types of designs? Why don't the behavioral synthesis vendors provide examples of these types of designs?

These more complex interactions are one important area where atomic transactions offer a profoundly better abstraction than RTL, making complex concurrency dramatically simpler to express, easier to change, and much more. Abstraction for control is about addressing and improving shared resource management, arbitration, scheduling, flow control, etc.
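For readers unfamiliar with the idea, here's a rough sketch of guarded atomic actions in plain C++. It's a software illustration only: it is not Bluespec, it is not synthesizable, and the names Rule, SharedFifo, Stats, step and run_producer_consumer are invented for this sketch. Firing rules one at a time in a fixed priority order is a simplifying assumption.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// A rule is a guard plus an action. The action runs only when the guard
// holds, and it runs to completion before any other rule is considered.
struct Rule {
    std::function<bool()> guard;
    std::function<void()> action;
};

// Hypothetical shared resource: a bounded FIFO, modeled by its depth only.
struct SharedFifo {
    int depth = 0;
    int capacity = 2;
};

struct Stats {
    int produced = 0;
    int consumed = 0;
    int depth = 0;
};

// One "cycle": consider rules in priority order; each enabled rule fires
// atomically (assumed scheduling policy for this sketch).
void step(const std::vector<Rule>& rules) {
    for (const Rule& r : rules) {
        if (r.guard()) r.action();
    }
}

Stats run_producer_consumer(int cycles) {
    SharedFifo fifo;
    Stats stats;
    int cycle = 0;

    std::vector<Rule> rules = {
        // Producer: flow control lives in the guard (only fire if room).
        {[&] { return fifo.depth < fifo.capacity; },
         [&] { ++fifo.depth; ++stats.produced; }},
        // Consumer: ready only on even cycles, and only if data exists.
        {[&] { return cycle % 2 == 0 && fifo.depth > 0; },
         [&] { --fifo.depth; ++stats.consumed; }},
    };

    for (cycle = 0; cycle < cycles; ++cycle) {
        step(rules);
        // Neither rule has to re-check the other's behavior: the guards
        // alone keep the shared state within bounds.
        assert(fifo.depth >= 0 && fifo.depth <= fifo.capacity);
    }
    stats.depth = fifo.depth;
    return stats;
}
```

Running run_producer_consumer(6) yields five items produced, three consumed and a final depth of two; the producer stalls once, on cycle 4, when the FIFO is full. Neither rule mentions the other. The contention on the shared FIFO is handled entirely by the guards and the one-at-a-time firing.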

I'm not saying that SystemC can't be used to express control logic -- and I'm not suggesting that SystemC doesn't provide benefits over C/C++. It does -- and I'm sure there are many times where a dataflow implementation benefits from finer grained control and concurrency expressiveness. But, these aren't the fundamental questions.

The fundamental question is: what types of designs benefit from synthesizable design with SystemC versus Verilog/VHDL/SystemVerilog? And, how do they benefit? Abstraction buys you little if the quality of the results is not acceptable. Abstraction also buys you little if it improves the 20%, not the 80%. SystemC buys you little when there's little abstraction.

Thursday, February 05, 2009

A benchmark suite for C-based synthesis

An interesting development in the C-based behavioral synthesis space is the emergence of a benchmark suite. CHStone is a proposed suite that provides a common set of benchmarks on which to compare high-level synthesis tools. It's an interesting idea, as it might give potential customers a way to assess, compare and contrast different tools without having to devote as much work. Also, if the suite is really representative, realistic and broad enough in capability, it will allow customers to understand how the tools might perform in different applications.

At a minimum, it would be another datapoint prospective customers could use. Wouldn't it be interesting if vendors published source code, any support files required by the tool, generated Verilog RTL, and synthesis results (for popular silicon targets, e.g. TSMC 65 nm, Xilinx, Altera, ...) for each design in the benchmark suite?

People using published results should be thinking about an array of considerations:

* What are the source code and any ancillary files like?

* How usable is the RTL (debug/ECOs/...)?

* Can end-users recreate the results?

* In the microprocessor space, compilers and processors were tuned to address the benchmark suites. For example, there were compilers that recognized the Dhrystone source and spit out pre-designed, hand-optimized assembly code. Other tricks included optimizations so narrowly tailored that the benchmarks benefit but other designs almost never do. It's likely that games will be played over time if people start using these designs -- so end-users will have to change the sources in different dimensions to test for this (features/naming/loop structure/...). And this may diminish the value of this suite over time -- so new suites will need to be developed.

* I'm sure there are some others... (I've got to run -- I'll probably add some more when I've had time to digest.)

I don't have access to the IEEE paper that the developers of the CHStone suite wrote -- I'd love to read it to understand more. I think it's an interesting idea, as long as it doesn't get gamed. I'm curious about how good the proposed designs are at really pushing the tools -- and pushing them in different directions. Thoughts?