Wednesday, January 6, 2010

The "REAL" Story behind Fermi & Oak Ridge, Another View, Different Side



Which is true? Time will tell
Bush's legacy threatens progress?
source: brightsideofnews.com

Recently, a rumor appeared that Oak Ridge National Laboratory shelved its nVidia Fermi-based 10PFLOPS+ super-cluster due to Fermi's power consumption. We decided to find out what's really going on, and following two weeks of investigation, we managed to do just that.

The rumor that Oak Ridge cancelled its GPU-based supercomputer wasn't made from thin air. Even though nVidia and ORNL moved quickly to dispel the rumor, there is an issue, but it is much more complex than reported and, as things stand, not exactly project-threatening.

First of all, the part of the rumor which pegs the power consumption of NV100-class architecture as the reason for the rumored cancelation is absolute nonsense and anyone who thinks that should get their head examined. The upcoming Tesla C2050, C2070 and more importantly, the S2070 4GPU-server [the base for the 10+ PFLOPS rig in ORNL] use roughly the same amount of power as the present generation of Tesla products. Even the Tesla C2070, carrying 6GB of GDDR5 memory, consumes 190 Watts, in the range of the Tesla C1060 [65nm - 170W, 55nm - 160W]. nVidia explained to us that the custom-built GDDR5 ECC SDRAM consumes less power than the currently-used GDDR3 memory, and that allows 6GB of GDDR5 to replace 4GB of GDDR3 memory inside the same power envelope. The GPU is also downclocked to a level of high reliability and more importantly - to fit inside the GT200b thermal budget. As a consequence, it doesn't matter if you put a Tesla C2070 or a C1060 in your installation: power draw is roughly the same.

Now for the second part of the rumor - "Oak Ridge will cancel their Fermi-based supercomputer." We spoke with sources inside ORNL and close to ORNL, and as the story developed, we contacted our sources over in Washington D.C. I wish to offer my sincerest apologies for contacting our sources over the Christmas holidays, but we felt that the situation was important.

Oak Ridge National Laboratories

The Green Factor - EO 13423
Even though George W. Bush and the previous administration were criticized for bombing the Kyoto protocol, back in early 2007 George put his signature on Executive Order 13423. This EO marks the beginning of a new era in how governmental agencies account for their harm to the environment, through purchases of carbon credits, usage of renewable power resources etc. Just like police cars started to use Ethanol fuel, the world of governmental supercomputing laboratories had to change its approach. EO 13423 mandates that every new governmental project, including those at ORNL, has to have 10% of its power drawn from renewable resources. You might guess where this is heading - the problem that ORNL has might significantly limit its future expansion, since the closest such resource is none other than the Mississippi river and the Southwest Power Pool. As we all know, Oak Ridge is some 450 miles away - it is not exactly feasible to draw power lines across states [one of our sources did state that new power lines would actually do good, given the state of power infrastructure around the United States]. In the case of Oak Ridge and Tennessee, we would figure the "location, location, location" quote [Achmed the Dead Terrorist] sounds most appropriate.

The Tesla 20-Series powered supercomputer at Oak Ridge is expected to draw 60,000 MWh annually, so ORNL has to come up with 6,000 MWh from renewable sources. An alternative solution would be buying carbon credits. Based on specifications given to us, the 10PFLOPS+ GPGPU supercomputer will output something to the tune of 140,000 metric tons of emissions on an annual basis, so ORNL would need to buy 14,000 metric tons in carbon credits. Regardless of how you put it, the GPGPU supercomputer will run up 8-10 million dollars every year in power bills, plus the renewable power fees.
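The arithmetic behind those figures can be checked with a short sketch. It uses only the numbers quoted above; the constant and function names are made up for illustration, and the 10% share is the mandate as this article describes it, not a reading of the Executive Order itself.

```python
# Figures quoted in the article. All names here are illustrative assumptions,
# not anything published by ORNL or defined in EO 13423.
ANNUAL_ENERGY_MWH = 60_000            # expected annual draw of the supercomputer
ANNUAL_EMISSIONS_TONS = 140_000       # expected annual output in metric tons
RENEWABLE_PERCENT = 10                # share mandated, per the article
POWER_BILL_USD = (8_000_000, 10_000_000)  # quoted annual power bill range
HOURS_PER_YEAR = 365 * 24


def mandated_share(total, percent=RENEWABLE_PERCENT):
    """Return the part of `total` that falls under the renewable mandate."""
    return total * percent / 100


def summarize():
    """Derive the renewable and carbon-credit obligations plus two sanity checks."""
    low_bill, high_bill = POWER_BILL_USD
    return {
        "renewable_mwh": mandated_share(ANNUAL_ENERGY_MWH),
        "carbon_credit_tons": mandated_share(ANNUAL_EMISSIONS_TONS),
        # Average continuous draw implied by the annual energy figure.
        "average_draw_mw": ANNUAL_ENERGY_MWH / HOURS_PER_YEAR,
        # Price per MWh implied by the quoted bill range.
        "implied_usd_per_mwh": (
            low_bill / ANNUAL_ENERGY_MWH,
            high_bill / ANNUAL_ENERGY_MWH,
        ),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(name, value)
```

The last two entries are derived rather than quoted: 60,000 MWh a year works out to an average draw of roughly 6.8 MW, and an 8-10 million dollar bill implies roughly 133-167 dollars per MWh.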

An alternative solution for the Fermi-powered rig would be getting another Lab to take on the necessary increase in renewable power and trade it. The problem is that western-located Labs aren't happy to help Oak Ridge, especially given the circumstances under which the Fermi supercomputer was given to Oak Ridge. The only labs that can help ORNL are Lawrence Livermore and Sandia / Los Alamos National Laboratories. Sandia is limited to about 35MW. As our sources explained to us, Sandia is limited by the existing power infrastructure in New Mexico - "which resembles the 3rd world countries." We do not share the views of the source, but then again, 35MW does sound a little on the low side for everything that's going on.

If you are asking yourself why Kraken and Jaguar [the Opteron six-core upgrade for the Jaguar rig] were exempt from this, we are entering the murky waters of politics. Both systems were passed after Obama came into office, but they were passed by the Bush Administration, on the Republican side of this. Did they violate EO 13423? We can debate that all day long, but the personal take of the author of this article would be yes.

The Interesting Case of Lawrence Livermore
After Los Alamos could not help out, everything is falling on LLNL. The Livermore, CA-based Lab has an excess of renewable power, given the vast fields of wind turbines in and around the Altamont Pass. Future plans call for even more efficient windmills, but the reason why the US is lagging behind Europe is the various bird lovers who are afraid of some flying rats ending their lives after running into windmills [like they do today]. That is a topic for another story altogether.

Our sources told us that LLNL is not exactly enthusiastic about helping out ORNL without getting something serious in return. California got hung out to dry on supercomputing power throughout the Bush Administration because even though Arnold Schwarzenegger is a Republican, he wasn't exactly popular among Republicans, who worked out nice supercomputers for New Mexico and Tennessee, leaving Lawrence Berkeley and Lawrence Livermore with nothing.

Lawrence Livermore wants GPGPU performance due to its computing demands. According to our confidential sources, GPGPU excels when page files exceed 8K, which is virtually all the scientific computing needed by the interested parties in this developing story. LLNL could help Oak Ridge if, for instance, a second Fermi-based supercomputer would find its way to Livermore, California. With Obama in office, that isn't a remote possibility. There are growing fears that competing countries might run over the United States if the state doesn't wake up, but we have to ask ourselves when green started being an excuse for efficiency, and why the Fermi-based supercomputer wasn't evaluated on a Performance/Watt basis in the duties that computer will fulfill. There are a lot of questions that need to be answered, and who knows, perhaps one of them is why Oak Ridge is getting a Fermi-based supercomputer when in fact Livermore was passed over by the former administration, AFTER the 2008 elections. Something's gotta give.

Also, if you wonder why Livermore is not enthused about helping other Labs out, another reason is the next-generation supercomputer codenamed Blue Waters. Again, it may end up in NERSC http://www.nersc.gov/ over at Berkeley CA, instead of Poughkeepsie [New York] or Livermore [CA].

Is GPGPU the future and what needs to happen?
In conclusion of this story, we take it that Oak Ridge will seal the deal with Lawrence Livermore, and that Berkeley and Livermore will get their GPGPU-based toys. As a result, nVidia will sell more chips to the government as GPGPU begins to take over the HPC space. If you are wondering why we are not mentioning AMD as a company that also sells GPGPU products, according to our sources - AMD will start to make sense in 2011, after Bulldozer and Northern Islands make their debut. Until then, it is nVidia all the way. But that's another story altogether.

From now on, it looks like Oak Ridge will be limited to Jaguar II, Kraken and the as yet unnamed Fermi-based supercomputer, and for future projects, the Labs had better come up with something on the renewable side. Even though they compete for resources, EO 13423 just might be the way to go forward and start the cooperation between Labs.

Rumour Has IT
source: semiaccurate.com
REMEMBER THE TRIUMPHANT WIN for Fermi at the Oak Ridge National Laboratory that Nvidia heavily touted at its GTC conference keynote? The supercomputer project was just killed for power reasons. Fermi power reasons. Whoops.

That keynote 'win', shortly followed up by the showing of faked boards, was the highlight of an otherwise dull show, but it was used to show the potential of Fermi. Faked boards aside, putting GPUs into HPC clusters and supercomputers is what Nvidia has staked much of its future on. The Oak Ridge National Laboratory (ORNL) win was a massive PR statement, even if it was unlikely to net Nvidia any money directly.

These 'wins' tend to be 'sold' at breakeven or a loss when all the numbers are added up, but they provide some very compelling selling points for smaller and more lucrative corporate clusters. It was a halo that would have been used to sell many more Fermi and Quadro boards over the next few years.

We said "would have been" because word has reached us that the win is now dead and gone. Stone cold dead. Actually, the blame is being put on Fermi and its power use, so it might not be stone cold, maybe burnt to death by hot stones. In any case, Dear Leader is now zero for two in the 'wins' he touted at GTC. Fermi is massively delayed, underpowered, hot, and the stopgap GPUs are so good that Nvidia won't push them directly. Talk about opening a can of whoop-ass!

We plan on asking Nvidia for a comment about this, but it has recently decided not to respond to us any more, so don't hold your breath for an update. If it does come, it will be posted here.
