Hi, John,
Please inform your readers that EDAC has decided to sponsor the return of "Free Monday" to DAC this year. If they want to take advantage of this "Free Monday" registration, your readers must go to:
https://reg.mpassociates.com/reglive/register.aspx?confid=95
and complete all four pages of the registration. On the THIRD page they'll find a newly added "Free Monday Exhibits" option -- they MUST check this box to get this special registration.
On the fourth page they should see a web receipt with their unique bar code confirmation on it. They must print this entire page.
To enter the DAC Exhibit Hall on Monday, July 27th, the engineer must present a paper copy of his/her entire bar code page to the Advance Registration desk located in the North Lobby of Moscone Center.
See you at DAC, John!
- Bob Gardner
EDAC San Jose, CA
Tuesday, July 07, 2009
Free Monday's back on at DAC
According to an email blast by John Cooley this morning, Free Monday is back on at DAC. EDAC is sponsoring it -- which is great news. Here's the letter that Cooley sent out this morning in case you're not on Deepchip's email list:
Saturday, May 09, 2009
Setting good expectations about C/C++/SystemC synthesis
I was pleasantly surprised the other day to come across a blog entry on Cadence's website entitled: C-to-Silicon Compiler: A High Level and a Low Level Tool. Although it didn't get into a lot of detail or make many claims about where Cadence's SystemC synthesis tool is "high-level", it did acknowledge some of the areas where SystemC is "low-level" (for synthesis specifically). I found the entry honest and forthright, not to mention pretty consistent with how we've characterized SystemC.
As I noted in previous writings, most recently in February in this blog post, I've never claimed that SystemC isn't general purpose. What I've asserted is the following:
- SystemC (for synthesis) adds value in the same areas where C/C++ adds value: for doing algorithmic blocks that can be expressed at a high-level and efficiently synthesized. We believe the scope of solutions that you can efficiently develop at a high level and efficiently synthesize is limited primarily to block level, simpler algorithms -- but there's no doubt you can develop RTL quickly for datapath centric designs, irrespective of quality of results.
- For control logic, interfaces, and system interconnect, SystemC is very RTL-like. You can describe anything, but SystemC as a language does not add significant value over RTL for these types of designs (and, in fact, may be worse than RTL as the level of abstraction is the same, but it's one step further removed from the hardware, complicating debug). Fundamentally, SystemC's synthesizable model of concurrency and communications is very much that of RTL.
That's not to say that this doesn't add value over C/C++ -- of course, it does. It gives you finer grain control (and a way to express it) when you need it -- but this is done at the RTL level.
- Of course, C++ classes offer the ability to develop and substitute pre-built libraries for operators and interfaces. Forte provides an example of this in the article I referenced in my February post -- and, in this post, you can reference my criticisms of what Forte illustrated in their example.
- Complex I/O protocols (in fact, it says: "Trying to specify complex protocols at a High Level was the failure of the early High Level Synthesis tools")
- Multiple processes (and "various instances of the same hardware running concurrently")
- "Low level" communication between multiple of these concurrent processes (I presume he's alluding to managing access to shared resources)
There is plenty of room and need for a solution to deliver hardware from C/C++/SystemC -- especially if you don't need the quality of hand-coded RTL (in terms of latency, area, timing). (Particularly because Cadence allows one to replicate any aspect of a hand-coded RTL design, there's no doubt in my mind that one could replicate the QoR of a hand-coded design with their tool. The only question is how much productivity advantage you get when meeting the QoR of hand-coded design -- this will be entirely dependent on the type of design and how difficult it is to operate at a low-level of abstraction. Of course, with C/C++ solutions, you don't have the fine grained control of Cadence's solution.) And, it's refreshing to see a solution that, at least in this example, isn't claiming that it is more than it is.
But while they may be good for DSP filters, FFTs, and audio processors, these algorithmic synthesis tools don't offer a significant advantage over Verilog or VHDL for the bulk of gates that are shipped today, including microcontrollers, DMA controllers, cache and memory controllers, bus/switch interconnects, bus interfaces, network/link layer controllers, sorting/queuing engines, finite state machines, processors (whether CISC/RISC/DSP/graphics), etc.
But, this all leaves an open question: what about concurrency, complex control, interfaces, and system interconnect? Why do we have to be stuck with RTL for all of that?
Friday, May 01, 2009
Hackers delight -- A history of MIT pranks
Boston.com (the online presence of the Boston Globe) has a neat photo history of "pranks" at MIT on its website. I loved the 1982 prank where a large balloon came out of the ground near the 50 yard line in the middle of the annual Harvard-Yale football game (picture #9).
Saturday, April 11, 2009
The Magical Number Seven, Plus or Minus Two
In 1956, George Miller (then at Harvard University, later of Princeton University) published a famous paper entitled: "The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information". The conclusion of the paper was that people's short-term memory can keep track of about seven things, plus or minus two.
This may explain why people prefer to think and write sequential software instead of parallel software. Although our brains are constructed with a parallel architecture like hardware -- they seem to prefer to think about a few things at a time. Sequential programming languages enable engineers to build a design one step at a time, enabling them to limit the number of items that need to be coordinated and managed at any one time.
Hardware is different -- it's inherently parallel, like the brain. Developing hardware forces engineers to coordinate many parallel activities -- especially where they intersect. A lot of time is spent managing shared access to resources -- and developing complex FSMs to ensure each resource is accessed by only one operation at a time. This makes hardware much more complex and harder to design.
It's only natural to want to move hardware abstractions to sequential languages like C/C++. These languages have large numbers of users -- and, because they're sequential, they are perceived to be easier to write. But there are some issues:
- Behavioral synthesis tools are good in the small (operating on smaller, simpler blocks) but not so good in the large (where there is hierarchy/modularity/data dependencies/...). So, a simple algorithm can be efficiently synthesized -- but more complex algorithms cannot compete with optimal, hand-coded implementations.
- Behavioral synthesis tools need to auto-parallelize the sequential code -- this technology has been around for years and has very clear, well understood limits. It's not good at system interconnect/composition -- and it's not good at complex control logic. From results we've seen, it's also often not very good at handling many algorithms.
Enter Atomic Transactions
In order to get efficient hardware, you need to design hardware using a parallel programming language, as with Verilog, VHDL, or SystemC. But, the abstraction levels for these languages aren't allowing us to keep up with design complexity. These languages are sort of like writing software in assembly language. We need a better abstraction for concurrency in order to design hardware faster and with fewer bugs.
Here's where atomic transactions come in. They offer a much higher level of abstraction for concurrency -- while enabling explicit control over parallelism. Because they enable explicit parallel hardware design, engineers can consistently achieve efficient hardware implementations.
But, atomic transactions allow engineers to think about one operation at a time -- without having to manage all the complexities of coordinating accesses to shared resources. The concurrency abstraction of atomic transactions is essentially "one-problem-at-a-time".
Sure, it would be great to move to a sequential programming language -- to keep the problem scope for the engineer to a limited number (let's say: 7 +/- 2). But it's of little use if you can't generate efficient hardware from it.
Like sequential languages, atomic transactions keep the problem scope down to one-problem-at-a-time (staying easily within the 7 +/- 2 zone that the human mind is good and comfortable with) -- but enable the efficient implementation of hardware by staying explicitly parallel. Atomic transactions provide all the flavor, but without the calories -- that is, they provide abstraction for hardware design (by keeping the problems to one-at-a-time) while enabling efficient hardware implementation (no compromises in QoR).
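To make "one-problem-at-a-time" a little more concrete, here is a minimal software sketch of the idea, written in Python rather than a hardware language. Every name in it (Rule, make_sender, run_rules, the "slot" resource) is made up for illustration and is not taken from any tool. Each rule is a guard plus an action, written as if nothing else in the design existed; a tiny scheduler then fires one enabled rule at a time. Two senders compete for a single shared slot, and neither one contains any arbitration logic.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


@dataclass
class Rule:
    """A guarded atomic action (hypothetical model, not a real tool's API)."""
    name: str
    guard: Callable[[State], bool]    # may the rule fire in this state?
    action: Callable[[State], None]   # state update, applied as one unit


def make_sender(queue_key: str) -> Rule:
    """Move the next pending item into the shared slot when it is free."""
    def guard(s: State) -> bool:
        return bool(s[queue_key]) and s["slot"] is None

    def action(s: State) -> None:
        s["slot"] = s[queue_key].pop(0)

    return Rule("send_" + queue_key, guard, action)


def drain_guard(s: State) -> bool:
    return s["slot"] is not None


def drain_action(s: State) -> None:
    s["out"].append(s["slot"])
    s["slot"] = None


def run_rules(rules: List[Rule], state: State, max_steps: int = 100) -> List[str]:
    """Fire one enabled rule at a time until no rule is enabled."""
    trace = []
    for _ in range(max_steps):
        enabled = [r for r in rules if r.guard(state)]
        if not enabled:
            break
        chosen = enabled[0]           # fixed priority: first enabled rule wins
        chosen.action(state)
        trace.append(chosen.name)
    return trace


# Two senders share one slot; a third rule empties it.
state: State = {"a": [1, 2], "b": [10], "slot": None, "out": []}
rules = [make_sender("a"), make_sender("b"), Rule("drain", drain_guard, drain_action)]
trace = run_rules(rules, state)
print(state["out"])
print(trace)
```

The sketch only models the meaning of the rules; it runs them strictly in sequence. Real hardware generated from rules would be expected to fire many non-conflicting rules in the same clock cycle while preserving that one-at-a-time meaning, and the sketch makes no attempt to show that.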
Thursday, April 02, 2009
Coverage of our April 1st "product announcement" in Chip Design Magazine
Max wrote up/posted our announcement yesterday on Chip Design Magazine's website. It's great that Max recognizes a really big L.I.E. when he sees one.
What's a "chappesses"?
Atomic Rules reacts to Deepchip posting
Shep Siegel wrote a post last night after seeing his letter posted on Deepchip. He felt that his original letter had been diluted by editorial license -- and wanted to provide his original letter as full reference.
(My previous letters have been edited as well -- I believe this is standard editorial policy.)
Wednesday, April 01, 2009
(No fools! Unlike previous post) Bluespec users react on Deepchip
This was a long time coming -- Deepchip just posted user reactions to a previous post from last fall.
45 Billion ASIC Gates in a Box -- Whole-System Emulation
I don't know why anyone didn't think of it before -- why not just build an emulation platform that can easily fit the biggest chips, or even a complete system? Bluespec is introducing a game-changer today -- please check out the press release. :>)
Friday, March 06, 2009
C-Synthesis Benchmarks revisited -- and a proposed design for the benchmark suite
A few posts ago, I talked about a new benchmark suite being proposed for C-synthesis. I've had a chance to look at one of the benchmarks, specifically the MIPS design. Based on this benchmark, I think it's worth taking a deeper look at some of the other benchmarks.
The MIPS design is a small piece of C code that executes a subset of MIPS processor-like instructions -- it's basically an instruction set simulator built mostly as a large case statement on the opcode of the instruction. I'm curious what the synthesis results for this design would mean -- there is no pipeline, no microprocessor architecture, nor would one expect that a C synthesis tool would generate one from this code. Additionally, it's not a particularly complicated design. As a control-based example, I'm not sure it would tell you much.
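For readers who haven't seen this style of code, here is a rough sketch of what "an instruction set simulator built as a large case statement" looks like. It is written in Python for brevity and is not the benchmark's code: the function names, the three-instruction subset (ADDIU, ORI, ADDU) and the test program are my own choices for illustration, and only the field layout and opcode values follow the standard MIPS32 encoding.

```python
from typing import List

# Standard MIPS32 encodings for the few instructions modelled here.
OP_SPECIAL = 0x00   # R-type: operation selected by the funct field
OP_ADDIU = 0x09
OP_ORI = 0x0D
FUNCT_ADDU = 0x21
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Interpret a 16-bit field as a signed integer."""
    return value - 0x10000 if value & 0x8000 else value


def i_type(op: int, rs: int, rt: int, imm: int) -> int:
    """Assemble an I-type instruction word (helper for the example)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def r_type(rs: int, rt: int, rd: int, funct: int) -> int:
    """Assemble an R-type instruction word (helper for the example)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def simulate(program: List[int]) -> List[int]:
    """Execute the program one instruction at a time; return the registers."""
    regs = [0] * 32
    pc = 0                                   # word index, not a byte address
    while pc < len(program):
        word = program[pc]
        opcode = (word >> 26) & 0x3F
        rs = (word >> 21) & 0x1F
        rt = (word >> 16) & 0x1F
        rd = (word >> 11) & 0x1F
        funct = word & 0x3F
        imm = word & 0xFFFF

        # The "large case statement": one branch per supported instruction.
        if opcode == OP_SPECIAL and funct == FUNCT_ADDU:
            regs[rd] = (regs[rs] + regs[rt]) & MASK32
        elif opcode == OP_ADDIU:
            regs[rt] = (regs[rs] + sign_extend16(imm)) & MASK32
        elif opcode == OP_ORI:
            regs[rt] = regs[rs] | imm        # ORI zero-extends its immediate
        else:
            raise ValueError("unsupported instruction word 0x%08X" % word)

        regs[0] = 0                          # register $zero always reads as 0
        pc += 1
    return regs


program = [
    i_type(OP_ADDIU, 0, 1, 5),     # $1 = 5
    i_type(OP_ADDIU, 0, 2, -3),    # $2 = -3 (two's complement)
    r_type(1, 2, 3, FUNCT_ADDU),   # $3 = $1 + $2
]
regs = simulate(program)
print(regs[1], hex(regs[2]), regs[3])
```

Notice that nothing in the loop says anything about fetch/decode/execute stages, hazards or forwarding. The code describes what each instruction does, one instruction after another, which is why it is hard to read much into how a tool synthesizes it.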
If we are going to have a benchmark suite, it needs to reflect the complexity and functionality of real designs that people would do -- and ideally be of a size that allows design teams to understand differences. A good example of a complex design suitable to benchmarking is Reed-Solomon, which has:
- Complexity
- Decent size
- Practical use -- it's a design that someone might really need to implement
Hopefully, CHStone might consider Reed-Solomon in its mix. If I were looking at C synthesis, I'd take a look at this design.
Monday, March 02, 2009