Thursday, April 29, 2010

AMD's Thuban Six-Core Phenom II X6 1090T & 1055T Reviewed

by: Anand Lal Shimpi
A very smart man once told me that absolute performance doesn’t matter; it’s performance at a given price point that makes a product successful. While AMD hasn’t held the absolute performance crown for several years now, that doesn’t mean the company’s products haven’t been successful. During the days of the original Phenom, AMD started the trend of offering more cores than Intel at a given price point. Intel had the Core 2 Duo; AMD responded with the triple-core Phenom X3.

As AMD’s products got more competitive, the more-for-less approach didn’t change. Today AMD will sell you three or four cores for the price of two from Intel. In some situations, this works to AMD’s benefit. The Athlon II X3 and X4 deliver better performance in highly threaded applications than the Intel alternatives. While Intel has better performance per clock, you can’t argue with more cores/threads for applications that can use them.

When Intel announced its first 6-core desktop processor, the Core i7 980X at $999, we knew a cheaper AMD alternative was coming. Today we get that alternative: the Phenom II X6, based on AMD’s new Thuban core.


It’s still a 45nm chip, but thanks to architecture and process tweaks, the new Phenom II X6 fits in the same power envelope as last year’s Phenom II X4 processors: 125W.

Update: AMD tells us that it gave us the wrong pricing on the 1090T. The part sells for $295, not $285, in 1000 unit quantities.

CPU Specification Comparison
Processor | Clock Speed | Max Turbo | L2 Cache | L3 Cache | TDP | Price
AMD Phenom II X6 1090T | 3.2GHz | 3.6GHz | 3MB | 6MB | 125W | $295
AMD Phenom II X6 1055T | 2.8GHz | 3.3GHz | 3MB | 6MB | 125W | $199
AMD Phenom II X4 965 BE | 3.4GHz | N/A | 2MB | 6MB | 125W/140W | $185
AMD Phenom II X4 955 BE | 3.2GHz | N/A | 2MB | 6MB | 125W | $165
AMD Phenom II X4 945 | 3.0GHz | N/A | 2MB | 6MB | 95W | $155
AMD Phenom II X4 925 | 2.8GHz | N/A | 2MB | 6MB | 95W | $145

You also don’t give up much clock speed. The fastest Phenom II X6 runs at 3.2GHz, just 200MHz shy of the fastest X4.

When Intel added two cores to Nehalem it also increased the L3 cache of the chip by 50%. The Phenom II X6 does no such thing. The 6 cores have to share the same 6MB L3 cache as the quad-core version.


The Phenom II X6 die. Monolithic, hexa-core

There’s also the issue of memory bandwidth. Intel’s Core i7 980X is paired with a triple channel DDR3 memory controller, more than enough for four cores under normal use and enough for a six core beast. In order to maintain backwards compatibility, the Phenom II X6 is still limited to the same dual channel memory controller as its quad-core predecessor.
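To put the shared L3 cache and memory controller in per-core terms, here is a back-of-the-envelope sketch. It assumes DDR3-1333 on every platform (the speed of the memory in this review's test beds) and uses the theoretical peak of 8 bytes per transfer per 64-bit channel; sustained bandwidth in practice is lower, and the helper name is illustrative.

```python
def peak_bandwidth_gbs(channels: int, mt_per_s: int = 1333) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    Assumption: DDR3-1333 by default; each DDR3 channel is 64 bits (8 bytes) wide.
    """
    bytes_per_transfer = 8
    return channels * bytes_per_transfer * mt_per_s / 1000


# name: (cores, memory channels, shared L3 in MB, or None if not compared here)
chips = {
    "Phenom II X6 1090T": (6, 2, 6),
    "Phenom II X4 965": (4, 2, 6),
    "Core i7 980X": (6, 3, None),
}

for name, (cores, channels, l3_mb) in chips.items():
    bw = peak_bandwidth_gbs(channels)
    line = f"{name}: {bw:.1f} GB/s total, {bw / cores:.2f} GB/s per core"
    if l3_mb is not None:
        line += f", {l3_mb / cores:.2f}MB L3 per core"
    print(line)
```

Under these assumptions each Thuban core gets two thirds of the L3 and memory bandwidth that a Phenom II X4 core gets, while the 980X's third channel keeps its per-core bandwidth at the quad-core level.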

CPU Specification Comparison
CPU | Codename | Manufacturing Process | Cores | Transistor Count | Die Size
AMD Phenom II X6 1090T | Thuban | 45nm | 6 | 904M | 346mm²
AMD Phenom II X4 965 | Deneb | 45nm | 4 | 758M | 258mm²
Intel Core i7 980X | Gulftown | 32nm | 6 | 1.17B | 240mm²
Intel Core i7 975 | Bloomfield | 45nm | 4 | 731M | 263mm²
Intel Core i7 870 | Lynnfield | 45nm | 4 | 774M | 296mm²
Intel Core i5 670 | Clarkdale | 32nm | 2 | 384M | 81mm²

The limitations are nitpicks in the grand scheme of things. While the 980X retails for $999, AMD’s most expensive 6-core processor will only set you back $295, and you can drop it into existing AM2+ and AM3 motherboards with a BIOS update. You're getting nearly 1 billion transistors for $200 - $300. Like I said earlier, it’s not about absolute performance, but performance at a given price point.
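The price argument is easy to quantify. This sketch divides the prices quoted in this review ($295 for the 1090T after AMD's correction) by core count and by transistor count from the tables above. It is only a crude value metric that ignores per-core performance entirely, and the helper name is illustrative.

```python
# (price in USD, cores, transistors in millions), taken from the tables above
cpus = {
    "Core i7 980X": (999, 6, 1170),
    "Phenom II X6 1090T": (295, 6, 904),
    "Phenom II X6 1055T": (199, 6, 904),
    "Phenom II X4 965 BE": (185, 4, 758),
}


def value_metrics(price_usd: float, cores: int, transistors_m: float) -> tuple[float, float]:
    """Return (dollars per core, millions of transistors per dollar)."""
    return price_usd / cores, transistors_m / price_usd


for name, (price, cores, transistors) in cpus.items():
    per_core, per_dollar = value_metrics(price, cores, transistors)
    print(f"{name}: ${per_core:.2f} per core, {per_dollar:.2f}M transistors per dollar")
```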

AMD 2010 Roadmap
CPU | Clock Speed | Max Turbo (≤ 3 cores) | L3 Cache | TDP | Release
AMD Phenom II X6 1090T | 3.2GHz | 3.6GHz | 6MB | 125W | Q2
AMD Phenom II X6 1075T | 3.0GHz | 3.5GHz | 6MB | 125W | Q3
AMD Phenom II X6 1055T | 2.8GHz | 3.3GHz | 6MB | 125W/95W | Q2
AMD Phenom II X6 1035T | 2.6GHz | 3.1GHz | 6MB | 95W | Q2
AMD Phenom II X4 960T | 3.0GHz | 3.4GHz | 6MB | 95W | Q2

We'll soon see more flavors of the Phenom II X6 as well as a quad-core derivative with 2 of these cores disabled. As a result, motherboard manufacturers are already talking about Phenom II X4 to X6 unlocking tools.

The new Phenom II X6 processors are aimed squarely at Intel’s 45nm Lynnfield CPUs. Both are built on 45nm processes, and AMD simply offers you more cores for roughly the same price. Instead of a quad-core Core i7 860, AMD will sell you a six-core 1090T. Oh, and the T stands for AMD’s Turbo Core technology.


AMD’s Turbo: It Works

In the Pentium 4 days Intel quickly discovered that there was a ceiling in terms of how much heat you could realistically dissipate in a standard desktop PC without resorting to more exotic cooling methods. Prior to the Pentium 4, desktop PCs saw generally rising TDPs for both CPUs and GPUs with little regard to maximum power consumption. It wasn’t until we started hitting physical limits of power consumption and heat dissipation that Intel (and AMD) imposed some limits.

High-end desktop CPUs now spend their days bumping up against 125 - 140W limits, while mainstream CPUs are down at 65W and mobile CPUs generally stay below 35W. These TDP limits become a problem as you scale up clock speed or core count.

In homogeneous multicore CPUs you’ve got a number of identical processor cores that together have to share the maximum TDP of the processor. If a single hypothetical 4GHz processor core hits 125W, then to fit two of them into the same TDP you have to run the cores at a lower clock speed, say 3.6GHz. Want a quad-core version? Drop the clock speed again. Six cores? Now you’re probably down to 3.2GHz.

(Figure: Single Core, Dual Core, Quad Core, Hex Core)
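The shrinking clock speeds can be sketched with a toy model. Assume, hypothetically, that one core at 4GHz uses the entire TDP and that each core's power grows as its frequency raised to some exponent. Dynamic power is roughly proportional to V² times f, and if voltage tracks frequency that gives an exponent of about 3. The figures in the text are illustrative rather than outputs of this model, and the exponent you pick dominates the result.

```python
def max_clock_ghz(cores: int, single_core_ghz: float = 4.0, exponent: float = 3.0) -> float:
    """Highest common clock for `cores` identical cores sharing one TDP.

    Toy model (assumption): one core at single_core_ghz uses the whole TDP, and
    per-core power scales as (f / single_core_ghz) ** exponent. Solving
    cores * (f / f1) ** exponent == 1 for f gives f1 * cores ** (-1 / exponent);
    the TDP value itself cancels out.
    """
    return single_core_ghz * cores ** (-1.0 / exponent)


for n in (1, 2, 4, 6):
    print(f"{n} core(s): {max_clock_ghz(n):.2f}GHz")
```

With an exponent of 3 the two-core clock lands near 3.2GHz, below the text's "say 3.6GHz"; a larger exponent gives gentler drops. Either way, every added core costs clock speed once the TDP is fixed.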

This is fine if all of your applications are multithreaded and can use all available cores, but life is rarely so perfect. Instead you’ve got a mix of applications and workloads that’ll use anywhere from one to six cores. Browsing the web may only task one or two cores, gaming might use two or four and encoding a video can use all six. If you opt for a six core processor you get great encoding performance, but worse gaming and web browsing performance. Go for a dual core chip and you’ll run the simple things quickly, but suffer in encoding and gaming performance. There’s no winning.

With Nehalem, Intel introduced power gate transistors. Stick one of these in front of a core's supply voltage line, turn it off, and the entire core shuts off. In the past, AMD and Intel only put gates in front of the clock signal going to a core (or blocks of a core). Clock gating made sure the core remained inactive, but the core could still leak power, a problem that got worse with smaller transistor geometries. Power gate transistors address both active and leakage power, so an idle core can be almost completely shut off.

If you can take a single core out of the TDP equation, then with some extra logic (around 1M transistors on Nehalem) you can increase the frequency of the remaining cores until you run into TDP or other physical limitations. This is how Intel’s Turbo Boost technology works. Depending on how many cores are active and how much power they’re consuming, a CPU with Intel’s Turbo Boost can run at up to some predefined frequency above its stock speed.
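Here is a rough sketch of that idea, not Intel's actual algorithm. It assumes idle cores are power gated to 0W, that each active core's power scales with the cube of its clock, and that frequency rises in fixed 133MHz steps; treat the step size, the function name and every number in the usage example as hypothetical. The clock climbs until the next step would exceed either the predefined turbo ceiling or the TDP.

```python
def turbo_clock_ghz(active_cores: int, base_ghz: float, max_turbo_ghz: float,
                    tdp_w: float, core_watts_at_base: float,
                    step_ghz: float = 0.133) -> float:
    """Clock for the active cores under a toy turbo model (assumptions above)."""

    def package_watts(freq_ghz: float) -> float:
        # Idle cores are assumed fully power gated, so only active cores count.
        return active_cores * core_watts_at_base * (freq_ghz / base_ghz) ** 3

    freq = base_ghz
    # Take another step only if it stays under both the turbo ceiling and the TDP.
    while (freq + step_ghz <= max_turbo_ghz + 1e-9
           and package_watts(freq + step_ghz) <= tdp_w):
        freq += step_ghz
    return freq


# Hypothetical quad-core: 2.8GHz base, 3.46GHz ceiling, 95W TDP, 23.75W per core at base.
for active in (1, 2, 3, 4):
    print(active, "active:", round(turbo_clock_ghz(active, 2.8, 3.46, 95, 23.75), 3), "GHz")
```

With all four cores busy there is no power headroom and the chip stays at its base clock; with fewer cores active, the freed-up budget turns into extra frequency.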

With Thuban, AMD introduces its own alternative called Turbo Core. The original Phenom processor had the ability to adjust the clock speed of each individual core. AMD disabled this functionality with the Phenom II to avoid some performance problems we ran into, but it’s back with Thuban.

If half (or more) of the CPU cores on a Thuban die are idle, Turbo Core does the following:

1) Decreases the clock speed of the idle cores down to as low as 800MHz.
2) Increases the voltage of all of the cores.
3) Increases the clock speed of the active cores up to 500MHz above their default clock speed.
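The three rules above can be written as a small per-core clock function. This is a simplified reading of the description, not AMD's firmware: step 2 (the voltage increase) is not modelled, each core is treated as simply busy or idle, and when Turbo Core is not engaged every core is left at its base clock even though real chips also downclock idle cores. The function and parameter names are hypothetical.

```python
def turbo_core_clocks(busy: list[bool], base_ghz: float, turbo_ghz: float,
                      idle_floor_ghz: float = 0.8) -> list[float]:
    """Per-core clocks (GHz) under a simplified reading of the Turbo Core rules."""
    if turbo_ghz - base_ghz > 0.5 + 1e-9:
        raise ValueError("Turbo Core raises active cores by at most 500MHz")
    idle = busy.count(False)
    if 2 * idle >= len(busy):  # half (or more) of the cores are idle
        # Rule 1: idle cores drop to the floor; rule 3: busy cores boost.
        return [turbo_ghz if is_busy else idle_floor_ghz for is_busy in busy]
    # Turbo Core not engaged: every core stays at base (simplification).
    return [base_ghz] * len(busy)


# Phenom II X6 1090T: 3.2GHz base, 3.6GHz max turbo.
print(turbo_core_clocks([True, True, False, False, False, False], 3.2, 3.6))
# [3.6, 3.6, 0.8, 0.8, 0.8, 0.8]
print(turbo_core_clocks([True] * 4 + [False] * 2, 3.2, 3.6))
# [3.2, 3.2, 3.2, 3.2, 3.2, 3.2]
```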

The end result is the same as Intel’s Turbo Boost from a performance standpoint. Lightly threaded apps see a performance increase. Even heavily threaded workloads might have periods of time that are bound by the performance of a single thread, and they benefit from AMD’s Turbo Core as well. In practice, Turbo Core appears to work. While I rarely saw the Phenom II X6 1090T hit 3.6GHz, I would see the occasional jump to 3.4GHz. As you can tell from the screenshot above, there's very little consistency between the cores and their operating frequencies; it seems they all run as fast or as slow as they possibly can.

AMD's Turbo Core Benefit
AMD Phenom II X6 1090T | Turbo Core Disabled | Turbo Core Enabled | Performance Increase
x264-HD 3.03 1st Pass | 71.4 fps | 74.5 fps | 4.3%
x264-HD 3.03 2nd Pass | 29.4 fps | 30.3 fps | 3.1%
Left 4 Dead | 117.3 fps | 127.2 fps | 8.4%
7-zip Compression Test | 3069 KB/s | 3197 KB/s | 4.2%

Turbo Core generally increased performance by 2 to 10% in our standard suite of tests. Given that the max clock speed increase on a Phenom II X6 1090T is 12.5% (3.2GHz to 3.6GHz), that’s not a bad range of performance improvement. Intel’s CPUs stand to gain a bit more (and use less power) from turbo thanks to the fact that Lynnfield, Clarkdale, et al. will physically shut off idle cores rather than just underclock them.
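The percentages in the Turbo Core table, and how they compare with the 12.5% maximum clock increase, can be reproduced from the numbers above. The "share of clock headroom" ratio is our own rough scaling measure, and it assumes the chip could sit at full turbo throughout, which, as noted above, rarely happens.

```python
# (Turbo Core disabled, Turbo Core enabled) results from the table above.
results = {
    "x264-HD 3.03 1st Pass": (71.4, 74.5),
    "x264-HD 3.03 2nd Pass": (29.4, 30.3),
    "Left 4 Dead": (117.3, 127.2),
    "7-zip Compression Test": (3069, 3197),
}
MAX_CLOCK_GAIN_PCT = (3.6 / 3.2 - 1) * 100  # 1090T: 3.2GHz base, 3.6GHz max turbo


def gain_pct(off: float, on: float) -> float:
    """Percentage improvement from enabling Turbo Core (higher is better)."""
    return (on / off - 1) * 100


for name, (off, on) in results.items():
    g = gain_pct(off, on)
    print(f"{name}: +{g:.1f}%, {g / MAX_CLOCK_GAIN_PCT:.0%} of the clock headroom")
```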

I have noticed a few situations where performance in a benchmark was unexpectedly low with Turbo Core enabled. This could be an artifact of independent core clocking similar to what we saw in the original Phenom days; however, I saw no consistent issues in my time with the chip thus far.


The Performance Summary

At $199 and $295, the obvious comparison points are Intel’s Core i5 750 and Core i7 860. We’ll dive into the complete performance tests in a bit, but if you’re looking for some quick analysis here’s what we’ve got.

Single threaded performance is squarely a Lynnfield advantage. Intel’s quad-cores can turbo up more and Intel does have the advantage of higher IPC.

Phenom II X6 vs. Intel's Lynnfield Processors
Processor | Cinebench R10 (Single Threaded) | Cinebench R10 (Multithreaded) | 3dsmax r9 | x264 HD - 2nd Pass | Left 4 Dead
AMD Phenom II X6 1090T | 3951 | 18526 | 13.7 | 28.5 fps | 127.2 fps
AMD Phenom II X6 1055T | 3547 | 16268 | 12.7 | 25.1 fps | 111.5 fps
Intel Core i7 860 | 4490 | 16598 | 15.0 | 26.8 fps | 131.0 fps
Intel Core i5 750 | 4238 | 14142 | 13.4 | 21.0 fps | 130.0 fps

Highly threaded encoding and 3D rendering performance are obviously right at home on the Phenom II X6. The shared 6MB L3 cache and lower IPC do appear to hamper the Phenom II X6 in a couple of tests, but for the most part, if you need threads, the X6 is the way to go.
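To make the table's trade-offs explicit, this sketch computes each X6's lead or deficit against its price-matched Intel rival. The shorthand test names are ours; the scores are copied from the table, and every column is higher-is-better.

```python
# Scores from the table above (higher is better in every column).
TESTS = ("cb_single", "cb_multi", "3dsmax", "x264_pass2", "l4d")
scores = {
    "Phenom II X6 1090T": (3951, 18526, 13.7, 28.5, 127.2),
    "Phenom II X6 1055T": (3547, 16268, 12.7, 25.1, 111.5),
    "Core i7 860": (4490, 16598, 15.0, 26.8, 131.0),
    "Core i5 750": (4238, 14142, 13.4, 21.0, 130.0),
}


def relative_pct(chip: str, rival: str, test: str) -> float:
    """How much faster (+) or slower (-) `chip` is than `rival`, in percent."""
    i = TESTS.index(test)
    return (scores[chip][i] / scores[rival][i] - 1) * 100


for amd, intel in [("Phenom II X6 1090T", "Core i7 860"),
                   ("Phenom II X6 1055T", "Core i5 750")]:
    deltas = ", ".join(f"{t} {relative_pct(amd, intel, t):+.1f}%" for t in TESTS)
    print(f"{amd} vs {intel}: {deltas}")
```

Both X6 parts come out ahead in multithreaded Cinebench and the x264 second pass (the 1055T by roughly 19% over the Core i5 750) and behind in single-threaded Cinebench, 3dsmax and Left 4 Dead.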

Applications in between generally favor Intel’s quad-cores over the Phenom II X6. This includes CPU-bound games.

None of this should be terribly surprising as it’s largely the same conclusion we came to with the Athlon II X3 and X4. If you run specific heavily threaded applications, you can’t beat the offer AMD is giving you. It’s the lighter or mixed use workloads that tend to favor Intel’s offerings at the same price points.


AMD’s 890FX Chipset

The Phenom II X6 will work in all existing Socket-AM2+ and AM3 motherboards that can 1) support the 125W TDP of the processors, and 2) have BIOS support (apparently over 160 boards at launch). Despite this impressive showing of backwards compatibility, we also get a new chipset today for those of you looking to build a new system instead of upgrading.

The 890FX is a mildly updated version of AMD’s 790FX chipset, mostly adding AMD’s SB850 South Bridge with 6Gbps SATA support. The number of PCIe 2.0 lanes and other major features remain unchanged.

Feature | AMD 890FX | AMD 890GX | AMD 790FX
CPU | AMD Socket-AM3 | AMD Socket-AM3 | AMD Socket-AM3/AM2+
Manufacturing Process | 65nm | 55nm | 65nm
PCI Express | 44 PCIe 2.0 lanes | 24 PCIe 2.0 lanes | 44 PCIe 2.0 lanes
Graphics | N/A | Radeon HD 4290 (DirectX 10.1) | N/A
South Bridge | SB850 | SB850 | SB750
USB | 14 USB 2.0 ports | 14 USB 2.0 ports | 12 USB 2.0 ports
SATA | 6 SATA 6Gbps ports | 6 SATA 6Gbps ports | 6 SATA 3Gbps ports
IOMMU | 1.2 | N/A | N/A
Max TDP | 19.6W | 25W | 19.6W

You get IOMMU support (an advantage over the 790FX), and despite being built on TSMC's 65nm process (versus 55nm for the 890GX), the 890FX pulls less power than the 890GX because it lacks integrated graphics.

The Test

To keep the review length manageable we're presenting a subset of our results here. For all benchmark results and even more comparisons be sure to use our performance comparison tool: Bench.

Motherboard: ASUS P7H57D-V EVO (Intel H57), Intel DP55KG (Intel P55), Intel DX58SO (Intel X58), Intel DX48BT2 (Intel X48), Gigabyte GA-MA790FX-UD5P (AMD 790FX), MSI 890FXA-GD70 (AMD 890FX)
Chipset Drivers: Intel 9.1.1.1015 (Intel), AMD Catalyst 8.12
Hard Disk: Intel X25-M SSD (80GB)
Memory: Corsair DDR3-1333 4 x 1GB (7-7-7-20), Corsair DDR3-1333 2 x 2GB (7-7-7-20)
Video Card: eVGA GeForce GTX 280 (Vista 64), ATI Radeon HD 5870 (Windows 7)
Video Drivers: ATI Catalyst 9.12 (Windows 7), NVIDIA ForceWare 180.43 (Vista64), NVIDIA ForceWare 178.24 (Vista32)
Desktop Resolution: 1920 x 1200
OS: Windows Vista Ultimate 32-bit (for SYSMark), Windows Vista Ultimate 64-bit, Windows 7 x64

SysMark 2007 Performance

Our journey starts with SYSMark 2007, the only all-encompassing performance suite in our review today. The idea here is simple: one benchmark to indicate the overall performance of your machine.

SYSMark really taxes two cores most of the time, giving the edge to Lynnfield and its aggressive turbo modes. Lightly threaded or mixed workloads won't do so well on the Phenom II X6.

SYSMark 2007 - Overall


Adobe Photoshop CS4 Performance

To measure performance under Photoshop CS4 we turn to the Retouch Artists’ Speed Test. The test does basic photo editing; there are a couple of color space conversions, many layer creations, color curve adjustment, image and canvas size adjustment, unsharp mask, and finally a Gaussian blur performed on the entire image.

The whole process is timed and thanks to the use of Intel's X25-M SSD as our test bed hard drive, performance is far more predictable than back when we used to test on mechanical disks.

Time is reported in seconds and the lower numbers mean better performance. The test is multithreaded and can hit all four cores in a quad-core machine.

Adobe Photoshop CS4 - Retouch Artists Speed Test

Performance is good, but even Photoshop doesn't use all six cores consistently enough to give the Phenom II X6 the edge it needs. It's faster than the Phenom II X4, but not faster than the Core i5 750.


DivX 6.8.5 with Xmpeg 5.0.3

Our DivX test is the same DivX / XMpeg 5.0.3 test we've run for the past few years. The 1080p source file is encoded using the unconstrained DivX profile, quality/performance is set to balanced (5), and enhanced multithreading is enabled.

Thanks to AMD's Turbo Core the Phenom II X6 is pretty close here, but still not able to topple Intel's Core i5 and i7.

DivX 6.8.5 w/ Xmpeg 5.0.3 - MPEG-2 to DivX Transcode


x264 HD Video Encoding Performance

Graysky's x264 HD test uses x264 to encode a 4Mbps 720p MPEG-2 source. The focus here is on quality rather than speed, thus the benchmark uses a 2-pass encode and reports the average frame rate in each pass.

And we finally see the Phenom II X6 flex its muscle; in the first pass, even the 1055T is faster than the Core i7 860:

x264 HD Encode Benchmark - 720p MPEG-2 to x264 Transcode

In the actual encoding pass the 1055T falls behind the 860 but it's still a good 19% faster than the Core i5 750.

x264 HD Encode Benchmark - 720p MPEG-2 to x264 Transcode


3dsmax 9 - SPECapc 3dsmax CPU Rendering Test

Today's desktop processors are more than fast enough to do professional level 3D rendering at home. To look at performance under 3dsmax we ran the SPECapc 3dsmax 8 benchmark (only the CPU rendering tests) under 3dsmax 9 SP1. The results reported are the rendering composite scores.


Not all heavily threaded workloads will show the Phenom II X6 in a good light. Here Intel maintains the advantage:


3dsmax 9 - SPECapc 3dsmax 8 CPU Test


Cinebench R10

Created by the Cinema 4D folks we have Cinebench, a popular 3D rendering benchmark that gives us both single and multi-threaded 3D rendering results.


Single threaded performance is obviously an Intel advantage, but crank up the thread count and there's no match for the Phenom II X6. As we pointed out earlier, if you've got a lot of CPU intensive threads there's no replacement for more cores.


Cinebench R10 - Single Threaded Benchmark


Cinebench R10 - Multi Threaded Benchmark


POV-Ray 3.7 beta 23 Ray Tracing Performance

POV-Ray is a popular, open-source raytracing application that also doubles as a great tool to measure CPU floating point performance.


I ran the SMP benchmark in beta 23 of POV-Ray 3.7. The numbers reported are the final score in pixels per second.


Once again, the Phenom II X6 does very well here.


POV-Ray 3.7 beta 23 - SMP Test


WinRAR - Archive Creation

Our WinRAR test simply takes 300MB of files and compresses them into a single RAR archive using the application's default settings. We're not doing anything exotic here, just looking at the impact of CPU performance on creating an archive.


The i7 860 wins against the 1090T, but the lack of Hyper-Threading keeps the Core i5 750 behind the Phenom II X6 1055T.


WinRAR 3.8 Compression - 300MB Archive


7-Zip Performance

7-Zip Benchmark - 32MB Dictionary


7-Zip 300MB 7z Archive - Max Compression


Gaming Performance

None of the games here can take advantage of more than four cores, so the Phenom II X6 ends up performing no differently from a Phenom II X4. Fortunately, thanks to Turbo Core, you rarely see a drop in performance compared to the Phenom II X4 965.


If you want the better gaming chip, you want Lynnfield.


Fallout 3 - 1680 x 1050 - Medium Quality


Left 4 Dead - 1680 x 1050 - Max Settings (No AA/AF/Vsync)


Crysis Warhead - 1680 x 1050 - Mainstream Quality (Physics on Enthusiast) - assault bench


Batman: Arkham Asylum


Dragon Age Origins


Dawn of War II


Power Consumption

Most impressive is AMD's ability to run six 45nm cores at the same power consumption as four 45nm cores. The Phenom II architecture in general does reasonably well at idle, but without power gating AMD can't compete with Intel's idle power levels.


Under load Intel also has the clear advantage.


Idle Power Consumption




Overclocking

The Phenom II X6 1090T is a Black Edition part, meaning it has a fully unlocked clock multiplier. With very little effort, our 3.2GHz sample was up and running at 3.8GHz without any additional cooling beyond the stock heatsink/fan.

With a little extra effort, 3.9GHz should be possible, but the fact that we can even run at 3.8GHz with six 45nm cores is very impressive. Update: You asked, and we pushed harder. Our 1090T sample can hit 4GHz at 1.45V and even reach 4.1GHz but not with great stability. The even more important takeaway is that AMD's 64-bit/4GHz limits appear to be gone with Thuban.








Final Words

Today's conclusion is no different than what we've been saying about AMD's CPU lineup for several months now. If you're running applications that are well threaded and you're looking to improve performance in them, AMD generally offers you better performance for the same money as Intel. It all boils down to AMD selling you more cores than Intel at the same price point.

Applications like video encoding and offline 3D rendering show the real strengths of the Phenom II X6. And thanks to Turbo Core, you give up little, if any, performance in less threaded applications compared to a Phenom II X4. The 1090T can easily trump the Core i7 860 and the 1055T can do even better against the Core i5 750.


You start running into problems when you look at lightly threaded applications or mixed workloads that aren't always stressing all six cores. In these situations Intel's quad-core Lynnfield processors (Core i5 700 series and Core i7 800 series) are better buys. They give you better performance in these light or mixed workload scenarios, not to mention lower overall power consumption.

The better way to look at it is to ask yourself what sort of machine you're building. If you're building a task specific box that will mostly run heavily threaded applications, AMD will sell you nearly a billion transistors for under $300 and you can't go wrong. If it's a more general purpose machine that you're assembling, Lynnfield seems like a better option.

