Data Driven Development – Part 2

Generating SBOCV/AOCV derate tables for a cell involves calculating a derate value for a number of load/slew combinations.

In the graph below we’ve calculated the derate for 4 cells at various load/slew points. Colors are darkest where the derate approaches 1 and grow brighter as the derate moves away from 1. The + symbols show the library’s load/slew points.

AOCV Heat Map
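For readers new to these tables, here is a minimal sketch of one textbook way to turn a statistical delay into a late derate at a single load/slew point. It assumes each stage delay is an independent Gaussian with a known mean and sigma, and it uses a 3-sigma target. The function names and the (load, slew) -> (mean, sigma) input format are hypothetical, and this is not Amber Path FX's algorithm. The quantity a heat map like the one above colors is the distance of the derate from 1.

```python
import math


def late_derate(mean_delay: float, sigma: float, depth: int = 1, k: float = 3.0) -> float:
    """k-sigma late derate for a path of `depth` identical stages.

    Assumption: stage delays are independent Gaussians, so the path mean
    scales with depth and the path sigma with sqrt(depth).
    """
    path_mean = depth * mean_delay
    path_sigma = math.sqrt(depth) * sigma
    return (path_mean + k * path_sigma) / path_mean


def derate_table(stats, depth=1, k=3.0):
    """Map each (load, slew) point to its derate.

    `stats` is a hypothetical dict: (load, slew) -> (mean_delay, sigma).
    """
    return {point: late_derate(m, s, depth, k) for point, (m, s) in stats.items()}
```

For example, a stage with mean 10 and sigma 0.5 gives a derate of 1.15 at depth 1 and 1.075 at depth 4: under the independence assumption, random variation partially averages out along deeper paths.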

Besides being fast, the latest version of Amber Path FX supports saving all derate calculations and results to a database. You can now change the load/slew selection criteria and generate new derate tables in seconds. We are adding the ability to mine this data and perform variance checks to ensure your design is operating as you expect. We are just now learning how useful this information might be, not only to improve timing results but also to inform optimization choices.
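The idea of keeping every per-point derate and regenerating tables from a query, rather than re-simulating, can be sketched with Python's built-in sqlite3 module. The schema, function names, and values below are hypothetical and do not describe Amber Path FX's actual database format.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical per-point derate store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS derate ("
        " cell TEXT, load REAL, slew REAL, derate REAL,"
        " PRIMARY KEY (cell, load, slew))"
    )
    return conn


def save(conn, cell, rows):
    """Store every computed point; rows maps (load, slew) -> derate."""
    conn.executemany(
        "INSERT OR REPLACE INTO derate VALUES (?, ?, ?, ?)",
        [(cell, load, slew, d) for (load, slew), d in rows.items()],
    )
    conn.commit()


def select_table(conn, cell, load_range, slew_range):
    """Rebuild a table from stored points inside new load/slew bounds."""
    lo_l, hi_l = load_range
    lo_s, hi_s = slew_range
    cur = conn.execute(
        "SELECT load, slew, derate FROM derate WHERE cell = ?"
        " AND load BETWEEN ? AND ? AND slew BETWEEN ? AND ?"
        " ORDER BY load, slew",
        (cell, lo_l, hi_l, lo_s, hi_s),
    )
    return {(load, slew): d for load, slew, d in cur}
```

Changing the selection criteria then only means calling `select_table` with different bounds; no point has to be recomputed.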

These graphs also show how unsafe it can be to rely on a few cherry-picked load/slew points. That is the methodology some have suggested to make AOCV table generation more palatable with other SPICE tools.
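A tiny illustration of the risk: if the worst derate sits at a load/slew corner that the cherry-picked points skip, a table built from those points understates it. The grid values below are made up for illustration, not measured data, and the helper name is hypothetical.

```python
def worst_derate(table, points=None):
    """Largest derate over `points`, or over the whole table if points is None."""
    keys = table if points is None else points
    return max(table[p] for p in keys)


# Made-up 3x3 grid: (load, slew) -> derate. Illustrative only.
table = {
    (0.001, 0.01): 1.18, (0.001, 0.1): 1.10, (0.001, 0.5): 1.07,
    (0.01, 0.01): 1.09, (0.01, 0.1): 1.05, (0.01, 0.5): 1.04,
    (0.05, 0.01): 1.06, (0.05, 0.1): 1.04, (0.05, 0.5): 1.03,
}
cherry_picked = [(0.01, 0.1), (0.05, 0.5)]

full = worst_derate(table)                    # 1.18, from an unsampled corner
picked = worst_derate(table, cherry_picked)   # 1.05, understates the worst case
```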

With Amber Path FX you don’t have to make those choices up front. You can generate an SBOCV database covering a large range of load/slew points (the full set from the library, or a set based on your design rules) and then refine it based on how the cells are actually used.
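One simple way to do that refinement, assuming sorted library axes and a list of load (or slew) values observed in your design, is to keep only the smallest contiguous run of axis points that brackets what the design actually uses. This helper is hypothetical and shown for illustration; real selection criteria may also weigh design rules or margins.

```python
import bisect


def covering_axis(axis, used):
    """Smallest contiguous slice of a sorted axis that brackets the used values.

    axis: sorted library index values (loads or slews).
    used: values observed in the design (hypothetical input).
    """
    lo, hi = min(used), max(used)
    # Last axis point at or below the smallest used value (clamped to the start).
    i = max(bisect.bisect_right(axis, lo) - 1, 0)
    # First axis point at or above the largest used value (clamped to the end).
    j = min(bisect.bisect_left(axis, hi), len(axis) - 1)
    return axis[i:j + 1]
```

Running it once on the load axis and once on the slew axis gives the reduced grid to pull from the full database.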

We’ll have a lot more to say about this at the TSMC Technology Symposium in a few weeks, but if you have questions now, feel free to get in touch.

Data Driven Development

We’ve been taking a very close look at the data going into and out of our software. Sometimes the data validates our original assumptions. Other times it takes us in new and interesting directions. Here’s a picture that we found very enlightening:

heatmap

I’ll explain what this is in another post, but here is a hint: it has to do with our work on SBOCV/AOCV.

A 100x speedup in SBOCV / AOCV table generation

Sometimes big performance breakthroughs require taking a step back and reexamining a problem. We’ve done just that for SBOCV table generation and are really pleased with the outcome. (Stage-based on-chip variation, SBOCV, is also sometimes called AOCV.) We can now generate tables for all the combinational cells in the TSMC 40LP library in just under 4 hours! That’s 100 times faster than our June release.

We’ve been working closely with TSMC for a while now to provide a very fast solution for SBOCV table generation. Earlier this year, TSMC told us it took them 3 weeks to create tables for 30 simple cells using a commercial version of SPICE. That wasn’t 1 copy of SPICE; it was 120. The looming problem is that some of the more complex cells in the TSMC libraries have as many transistors as the 30 simple cells combined. As the complexity and transistor count of cells increase, so does the simulation time.

Clearly, TSMC and our other customers want to generate tables for more than a small subset of the cells in their libraries. Compared to SPICE, Amber Path FX is blazingly fast. However, our own testing and feedback from TSMC showed that finishing a whole library with Amber Path FX still took too long.

So how did we make table generation 100x faster than our June release? Well, we broke the problem into two separate questions. How can we make our fast transistor simulation even faster? And how can we take advantage of the commodity hardware we have at our disposal?

The first question was answered mainly with good old-fashioned software engineering: profiling, testing, optimization, and plenty of trial and error. Throw in some heuristics for good measure, and we were able to reduce the simulation time for complex cells by an order of magnitude. That’s a tremendous improvement, but still not enough to get the kind of turnaround times we were looking for. A side benefit of this work is that we now have a great set of data to keep improving performance in the coming months.

The answer to the second question seems obvious now, but it took a little while for us to realize how well it would turn out. Amber Path FX was already multi-threaded. We have a bunch of 4- and 8-CPU machines in our server farm and a few 16-CPU machines. The software scales nicely on a single machine. But what would happen if we could get more than one machine working on the problem at a time? It turns out that making the software multi-threaded from the start had laid a good foundation for distributed computing as well.
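Once cells are independent jobs, spreading them over several machines is largely a load-balancing problem. A minimal sketch, assuming transistor count is a usable proxy for simulation cost (the post notes that simulation time grows with it), is the classic longest-processing-time-first greedy heuristic: sort cells from most to least expensive and hand each one to the currently least-loaded machine. This is a standard scheduling technique offered for illustration, not a description of how Amber Path FX distributes its work; the cell names and costs are hypothetical.

```python
import heapq


def assign_cells(costs, machines):
    """Longest-processing-time-first assignment of cells to machines.

    costs: dict of cell name -> estimated cost (hypothetical proxy, e.g.
    transistor count). Returns one (total_cost, [cells]) pair per machine.
    """
    heap = [(0, m) for m in range(machines)]  # (current load, machine index)
    buckets = [[] for _ in range(machines)]
    loads = [0] * machines
    # Place the most expensive cells first so big jobs don't land last.
    for cell in sorted(costs, key=costs.get, reverse=True):
        load, m = heapq.heappop(heap)  # least-loaded machine so far
        buckets[m].append(cell)
        loads[m] = load + costs[cell]
        heapq.heappush(heap, (loads[m], m))
    return list(zip(loads, buckets))
```

With hypothetical costs `{"INV": 2, "NAND2": 4, "AOI22": 8, "XOR2": 12, "DFF": 24, "MUX4": 30}` and two machines, each machine ends up with a total cost of 40.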

We turned the distributed version of the Amber Path FX SBOCV table generator loose over the weekend using 15 of our 8-CPU machines (120 CPUs total). The results: a total run time of 3 hours, 50 minutes, and 7 seconds for 852 cells.
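To put the weekend run in per-cell terms, the arithmetic below divides the reported wall-clock time by the cell count. The CPU-time figure assumes all 120 CPUs were busy for the entire run, so it is an upper bound rather than a measured number.

```python
from datetime import timedelta

# Figures reported for the weekend run.
wall_s = timedelta(hours=3, minutes=50, seconds=7).total_seconds()  # 13807.0
cells = 852
cpus = 15 * 8  # 15 machines with 8 CPUs each

wall_per_cell = wall_s / cells        # about 16.2 s of wall-clock time per cell
cpu_per_cell = wall_s * cpus / cells  # at most about 1945 CPU-seconds (~32 CPU-minutes) per cell
```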

We think that Amber Path FX is going to be a great answer to the “Where do we get AOCV tables?” question. If you are seriously looking at adding SBOCV/AOCV to your static timing flow, we’d love to share our results with you.

System Level Design for Variability

Ed Sperling at the SLD blog wrote an interesting article a while ago on the challenges facing designers at 32 and 28 nm. In the article he lamented that the big EDA companies are leaving their customers to do the hard work of designing with variability in mind.

He is right that the big EDA companies seem to be mostly watching from the sidelines. It is the small EDA companies that are pushing forward with innovations around variation. Ed asserts that it’s “difficult stuff”. We agree, especially if you are trying to deliver practical solutions.

For example, we have worked closely with TSMC for more than 2 years building a high-accuracy static timing tool based on transistor-level models and a novel, high-performance approach to statistical calculations. Think Monte Carlo SPICE accuracy, but at hundreds of thousands of paths per hour.

The result of that collaboration, Amber Path FX, was just announced this spring.

report_timing -with_spice_accuracy -and_statistical -fast

Wouldn’t it be nice if your static timing tool had the ability to give you results with the accuracy of SPICE, handled variation, and still ran really fast?

We’ve just introduced something that does exactly that. Amber Path FX is a new take on statistical STA. It’s the first practical tool for getting SPICE-accurate timing results at 40nm and below. It’s not quite report_timing -with_spice_accuracy -and_statistical -fast, but it’s close.

What did we do differently from everyone else?

Well, first, we decided that rather than trying to replace your static timing environment we’d simply enhance it.  You still run PrimeTime (or whatever) to find your critical paths.

Then you source a simple Tcl script and run an extra command, still within your STA flow. The new command passes the critical paths to Amber Path FX for detailed analysis. Once the analysis is complete, you get SDF and timing reports that reflect the new, more accurate delays.

Second, we use a fast transistor level model of the cells used in your design and do a novel statistical analysis on each path. This is like running Monte Carlo SPICE on all your critical paths without the hassle of setting up the SPICE simulations.
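To make the Monte Carlo comparison concrete, here is a toy path-level Monte Carlo in plain Python. It assumes each stage delay is an independent Gaussian, which is a big simplification: a transistor-level analysis would capture non-Gaussian behavior and correlations that this model ignores. It only shows what a sampled path statistic looks like next to its closed form, not how Amber Path FX computes anything; all names and numbers are hypothetical.

```python
import math
import random
import statistics


def sample_path_delays(stage_stats, n, seed=0):
    """Draw n path delays, each a sum of independent Gaussian stage delays.

    stage_stats: list of (mean, sigma) per stage (hypothetical values).
    """
    rng = random.Random(seed)
    return [sum(rng.gauss(m, s) for m, s in stage_stats) for _ in range(n)]


def analytic_path(stage_stats):
    """Closed-form path mean and sigma under the same independence assumption."""
    mean = sum(m for m, _ in stage_stats)
    sigma = math.sqrt(sum(s * s for _, s in stage_stats))
    return mean, sigma


stages = [(10.0, 1.0)] * 5
delays = sample_path_delays(stages, n=20000)
mc_mean, mc_sigma = statistics.fmean(delays), statistics.stdev(delays)
exact_mean, exact_sigma = analytic_path(stages)  # 50.0 and sqrt(5), about 2.236
```

In this toy model the sampled and closed-form statistics agree closely; each sample costs one evaluation per stage, which is why sampling every path with full SPICE quickly becomes impractical.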

Finally, we do all of this with our fast, multi-threaded static timing engine, so we can handle hundreds of thousands of paths per hour. Monte Carlo SPICE is accurate, but it isn’t practical for more than a handful of critical paths. With Amber Path FX you can pick out thousands or even hundreds of thousands of paths to analyze.

We invite you to check out more of Amber Path FX at clkda.com and download a free evaluation.
