Saturday, April 11, 2009

The Magical Number Seven, Plus or Minus Two

In 1956, George Miller, then at Harvard University, published a famous paper titled "The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information". Its conclusion was that people's short-term memory can keep track of about seven things, plus or minus two.

This may explain why people prefer to think about and write sequential software instead of parallel software. Although our brains, like hardware, are built with a parallel architecture, we seem to prefer to think about only a few things at a time. Sequential programming languages let engineers build a design one step at a time, which limits the number of items that need to be coordinated and managed at any one moment.

Hardware is different -- it's inherently parallel, like the brain. Developing hardware forces engineers to coordinate many parallel activities -- especially where they intersect. A lot of time is spent managing shared access to resources -- and developing complex FSMs to ensure each resource is accessed by only one operation at a time. This makes hardware much more complex and harder to design.
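To give a feel for the kind of coordination logic described above, here is a minimal Python sketch of a two-client round-robin arbiter, the sort of small state machine that ensures a shared resource is used by only one operation per cycle. The names (`arbiter_step`, `simulate`, clients "a" and "b") are made up for illustration, and real designs would be written in an HDL and have to handle far more cases than this.

```python
def arbiter_step(last_grant, req_a, req_b):
    """Decide which client ('a', 'b', or None) gets the resource this cycle.

    When both clients request, the one that was NOT granted last wins
    (round-robin), so neither client can starve the other.
    """
    if req_a and req_b:
        return "b" if last_grant == "a" else "a"
    if req_a:
        return "a"
    if req_b:
        return "b"
    return None


def simulate(requests):
    """Run the arbiter over a list of (req_a, req_b) pairs, one pair per cycle."""
    last_grant = "b"  # arbitrary reset value: 'a' wins the first conflict
    grants = []
    for req_a, req_b in requests:
        grant = arbiter_step(last_grant, req_a, req_b)
        if grant is not None:
            last_grant = grant  # the only state the FSM remembers
        grants.append(grant)
    return grants


cycles = [(True, True), (True, True), (False, True), (True, False), (False, False)]
print(simulate(cycles))  # ['a', 'b', 'b', 'a', None]
```

Even this toy version forces the designer to think about both clients at once. Every additional client or shared resource multiplies the cases the hand-written control logic has to cover.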

It's only natural to want to move hardware abstractions to sequential languages like C/C++. These languages have large numbers of users -- and, because they're sequential, they are perceived to be easier to write. But there are some issues:
  • Behavioral synthesis tools are good in the small (operating on smaller, simpler blocks) but not so good in the large (where there is hierarchy/modularity/data dependencies/...). So, a simple algorithm can be efficiently synthesized -- but more complex algorithms cannot compete with optimal, hand-coded implementations.
  • Behavioral synthesis tools need to auto-parallelize the sequential code -- this technology has been around for years and has very clear, well understood limits. It's not good at system interconnect/composition -- and it's not good at complex control logic. From results we've seen, it's also often not very good at handling many algorithms.
Since C/C++ can't express concurrency except where auto-parallelization tools can effectively discern it, SystemC was introduced. While SystemC adds concurrency so that anything can be expressed, it adds little over SystemVerilog for the abstraction of control logic and system interconnect. As an abstraction, it's basically RTL -- except in the same places where C/C++ provides abstraction: algorithms that can be described with tightly nested for loops. Sure, you can write in a sequential software style, but you can't synthesize efficient hardware from this style -- and writing complex hardware like memory/DMA/... controllers is just as hard as with RTL.

Enter Atomic Transactions

To get efficient hardware, you need to design it in a parallel programming language such as Verilog, VHDL, or SystemC. But the abstraction level of these languages isn't letting us keep up with design complexity. Using them is a bit like writing software in assembly language. We need a better abstraction for concurrency in order to design hardware faster and with fewer bugs.

Here's where atomic transactions come in. They offer a much higher level of abstraction for concurrency -- while enabling explicit control over parallelism. Because they enable explicit parallel hardware design, engineers can consistently achieve efficient hardware implementations.

At the same time, atomic transactions allow engineers to think about one operation at a time -- without having to manage all the complexities of coordinating accesses to shared resources. The concurrency abstraction of atomic transactions is essentially "one-problem-at-a-time".
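To make "one-problem-at-a-time" a little more concrete, here is a rough Python model of the idea, not the syntax or scheduler of any real tool. Each rule is a guard plus an action, written without reference to the other rules, and the runner fires one enabled rule at a time as an indivisible step. The buffer, the rule names, and `CAPACITY` are all hypothetical, and firing a single randomly chosen rule per step is a simplifying assumption; a hardware compiler would presumably fire many non-conflicting rules in the same clock cycle.

```python
import random

CAPACITY = 2  # hypothetical depth of the shared buffer

# Each rule is (name, guard, action). The guard says when the rule may fire.
# The action reads the current state and returns the complete next state,
# so its updates take effect together as one atomic step.
RULES = [
    ("produce_a",
     lambda s: s["a_left"] > 0 and s["count"] < CAPACITY,
     lambda s: {**s, "a_left": s["a_left"] - 1, "count": s["count"] + 1}),
    ("produce_b",
     lambda s: s["b_left"] > 0 and s["count"] < CAPACITY,
     lambda s: {**s, "b_left": s["b_left"] - 1, "count": s["count"] + 1}),
    ("consume",
     lambda s: s["count"] > 0,
     lambda s: {**s, "count": s["count"] - 1, "consumed": s["consumed"] + 1}),
]


def run(state, rules, seed):
    """Fire one enabled rule at a time until no rule is enabled."""
    rng = random.Random(seed)
    trace = []
    while True:
        enabled = [rule for rule in rules if rule[1](state)]
        if not enabled:
            return state, trace
        name, _guard, action = rng.choice(enabled)
        state = action(state)
        # The shared buffer never overflows or underflows, in any firing order.
        assert 0 <= state["count"] <= CAPACITY
        trace.append(name)


START = {"a_left": 5, "b_left": 3, "count": 0, "consumed": 0}
final, trace = run(START, RULES, seed=0)
print(final["consumed"])  # 8: every produced item is eventually consumed
```

Notice that no rule mentions arbitration between the two producers. Each rule only states when it may fire and what it changes; keeping the shared buffer consistent falls out of the atomicity of each step rather than out of hand-written coordination logic.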

Sure, it would be great to move to a sequential programming language -- to keep the problem scope for the engineer to a limited number (let's say: 7 +/- 2). But it's of little use if you can't generate efficient hardware from it.

Like sequential languages, atomic transactions keep the problem scope down to one-problem-at-a-time (staying easily within the 7 +/- 2 zone that the human mind is comfortable with) -- but they enable efficient hardware implementation by staying explicitly parallel. Atomic transactions provide all the flavor without the calories -- that is, they provide abstraction for hardware design (by keeping the problems to one-at-a-time) while enabling efficient hardware implementation (no compromises in quality of results, or QoR).
