Friday, May 1, 2009

How Many CPU Cores Do You Need?

source: tomshardware.com

In the early years of the new millennium, with CPU clock speeds finally accelerating past the 1 GHz mark, some folks (Ed.: including Intel itself) predicted that the company's new NetBurst architecture would reach speeds of 10 GHz in the future. PC enthusiasts looked forward to a new world where CPU clocks kept increasing at an accelerating pace. Need more power? Just add clock speed.


Newton’s apple inevitably fell soundly on the heads of those starry-eyed dreamers who looked to MHz as the easiest way to continue scaling PC performance. Physics doesn’t allow for exponential increases in clock rate without exponential increases in heat, and there were a number of other challenges to consider, such as manufacturing technology. Indeed, the fastest commercial CPUs have been hovering between 3 GHz and 4 GHz for a number of years now.

Of course, progress can’t be stopped when money is involved, and with folks willing to shell out cash for more powerful computers, engineers set out to find ways to increase performance by improving efficiency rather than relying solely on clock speed. Parallelism presented itself as a solution--if you can’t make the CPU faster, well, why not add additional compute resources?

The trouble with parallelism is that software has to be specifically written to run in multiple threads--it doesn't offer an immediate return on investment, like clock speed. Back in 2005, when the first dual-core CPUs were seeing the light of day, they didn’t offer much in the way of tangible performance increases because there was so little desktop software available properly supporting them. In fact, most dual-core CPUs were slower than single-core CPUs in a great majority of tasks because single-core CPUs were available at higher clock speeds.

However, that was four years ago and a lot has changed. Many software developers have since been hard at work optimizing their applications to take advantage of multiple cores. Single-core CPUs are actually hard to find and two-, three-, and four-core CPUs are now the norm.

Which raises the question: how many CPU cores are right for you? Is a triple-core processor good enough for gaming, or should you splurge on a quad-core chip? Is a dual-core CPU good enough for the average user, or do more cores really make a difference? Which applications are optimized for multiple cores and which ones react only to specifications like frequency or cache size?

We thought it would be a good time to run some tests with apps from our updated benchmark suite (there are still more to come, too), running the gamut of one-, two-, three-, and four-core configurations to illustrate what multi-core CPUs really offer in 2009.

Test Methodology: How Do You Compare Multiple Cores?

To keep things on an even keel, we chose a quad-core CPU for our test bed--an overclocked Intel Core 2 Quad Q6600 at 2.7 GHz. After we ran all of our benchmarks on the system, we then disabled one of the cores, rebooted, and ran the benchmarks again. We rinsed and repeated until we had results for one to four CPU cores, all running at identical clock speeds on the same CPU.

It turns out that disabling a CPU core in Windows is easy to do. If you’re interested in trying it yourself, simply type "msconfig" in Windows Vista's "Start Search" field and hit enter. This will open up the system-configuration application.

From here, click the "Boot" tab, and then click the "Advanced options" button.

This will bring up the "BOOT Advanced Options" pop-up menu. Simply click the "Number of Processors" check-box and select the number of processors you’d like your system to use from the drop-down menu. It’s that simple.


After clicking "OK," you’ll be prompted to reboot. After the reboot, you can easily see if the proper number of processor cores is available through Windows Task Manager. Open Task Manager by pressing Ctrl-Alt-Delete and selecting "Task Manager" from the menu that appears.

Then, select the "Performance" tab. You will see a CPU-usage graph for each enabled processor (whether it's physical, as seen here, or logical, as you'd get from a Core i7 with Hyper-Threading) under "CPU Usage History." Two histograms mean that two cores are enabled, three mean three cores are active, etc.
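If you'd rather confirm the core count from a script than from Task Manager, the check can be sketched in a few lines of Python. This is not part of our test procedure, just an illustration; the helper name logical_processor_count is made up for this example. It assumes the operating system reports only the processors it was booted with, so after the msconfig change the number should match the value you selected.

```python
import os


def logical_processor_count() -> int:
    """Return the number of logical processors the OS reports (1 if unknown)."""
    count = os.cpu_count()  # can be None when the count cannot be determined
    return count if count is not None else 1


if __name__ == "__main__":
    # Counts logical processors, so a Hyper-Threading CPU reports more than
    # its number of physical cores, just like the Task Manager graphs.
    print(f"Logical processors visible to the OS: {logical_processor_count()}")
```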

Now that you know the method to our madness, let’s go over the details of the hardware and software we use for our tests.

System Hardware
Processor(s) = Intel Core 2 Quad Q6600 (Kentsfield), 2.7 GHz, FSB-1200, 8 MB L2 Cache
Platform = MSI P7N SLI Platinum Nvidia nForce 750i, BIOS A2
RAM = A-Data EXTREME DDR2 800+ 2 x 2,048 MB, DDR2-800, CL 5-5-5-18 at 1.8 V
Hard Drive = Western Digital Caviar WD5000AAJS-00YFA 500 GB, 7200 RPM, 8 MB cache, SATA 3.0 Gb/s
Networking = Onboard nForce 750i Gigabit Ethernet
Graphics Cards = Gigabyte GV-N250ZL-1GI 1 GB DDR3 PCIe
Power Supply = Ultra HE1000X, ATX 2.2, 1000W

Software and Drivers

Operating System = Microsoft Windows Vista Ultimate 64-bit 6.0.6001, SP1
DirectX Version = DirectX 10
Platform Driver = nForce Driver Version 15.25
Graphics Driver = Nvidia Forceware 182.50

Benchmarks and Settings
3D Games
Crysis = Quality settings set to lowest, Object Detail to High, Physics to Very High, version 1.2.1, 1024x768, Benchmark tool, 3-run average
Left 4 Dead = Quality settings set to lowest, 1024x768, version 1.0.1.1, timed demo.
World in Conflict = Quality settings set to lowest, 1024x768, Patch 1.009, Built-in benchmark.

Audio Encoding
iTunes = Version: 8.1.0.52, Audio CD ("Terminator II" SE), 53 min., Default format AAC
Lame MP3 = Version: 3.98 (64-bit), Audio CD ("Terminator II" SE), 53 min., wave to MP3, 160 Kb/s

Video Encoding
TMPGEnc 4.6 = Version: 4.6.3.268, Import File: "Terminator II" SE DVD (5 Minutes), Resolution: 720x576 (PAL) 16:9
DivX 6.8.5 = Encoding mode: Insane Quality, Enhanced Multi-Threading, Enabled using SSE4, Quarter-pixel search
XviD 1.2.1 = Display encoding status=off
MainConcept Reference 1.6.1 = MPEG2 to MPEG2 (H.264), MainConcept H.264/AVC Codec, 28 sec HDTV 1920x1080 (MPEG2), Audio: MPEG2 (44.1 KHz, 2 Channel, 16-Bit, 224 Kb/s), Mode: PAL (25 FPS), Profile: Tom’s Hardware Settings for Qct-Core

Applications
Autodesk 3D Studio Max 2009 (64-bit) = Version: 2009, Rendering Dragon Image at 1920x1080 (HDTV)
Adobe Photoshop CS3 = Version: 10.0x20070321, Filtering from a 69 MB TIF-Photo, Benchmark: Tomshardware-Benchmark V1.0.0.4, Filters: Crosshatch, Glass, Sumi-e, Accented Edges, Angled Strokes, Sprayed Strokes
Grisoft AVG Antivirus 8 = Version: 8.0.134, Virus base : 270.4.5/1533, Benchmark: Scan 334 MB Folder of ZIP/RAR compressed files
WinRAR 3.80 = Version 3.80, Benchmark: THG-Workload (334 MB)
WinZip 12 = Version 12, Compression=Best, Benchmark: THG-Workload (334 MB)

Synthetic Benchmarks and Settings
3DMark Vantage = Version: 1.02, GPU and CPU scores
PCMark Vantage = Version: 1.00, System, Memory, Hard Disk Drive benchmarks, Windows Media Player 10.00.00.3646
SiSoftware Sandra 2009 SP3 = CPU Test=CPU Arithmetic/MultiMedia, Memory Test=Bandwidth Benchmark

Synthetic Benchmarks: 3DMark And PCMark Vantage

I like to start things off with synthetic benchmarks, so that we can see if the real-world results mirror these theoretical workloads. It is important to keep in mind that synthetic benchmarks are usually written with the future in mind, so we expect them to be a little more responsive to configuration changes than today's real-world applications.


Let's start with the 3D gaming graphics benchmark, 3DMark Vantage. We selected the "Entry" preset, which is 3DMark's lowest-resolution preset, to better demonstrate the CPU's effect on the results:

The almost-linear speed increase is really interesting. The largest bump occurs between one and two CPU cores but it keeps scaling from there. Now let's have a look at PCMark Vantage, 3DMark's general-use counterpart:


PCMark suggests that an end user would see tangible benefits with up to three CPU cores, while the fourth core even drops performance by an insignificant amount. Let's dig a little deeper and see if we can find out why.

In the Memories test, we once again see the biggest performance jump between one and two CPU cores.


The productivity test is likely weighted heavily in PCMark's total system score, as here we see that performance tapers off at three cores. Let's move on to SiSoft's Sandra synthetic benchmarks and see if the results are similar.

Synthetic Benchmarks: SiSoft Sandra

We'll begin with SiSoft Sandra's CPU arithmetic and CPU multimedia benchmarks:


These synthetic benchmarks show a very linear progression from one to four CPU cores. Frankly, this benchmark software is written to utilize multiple cores very efficiently, but I doubt many real-world applications are going to be able to duplicate this linear progression.
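One common way to reason about why real applications rarely scale this cleanly is Amdahl's law, which estimates the speedup of a program when only part of its work can run in parallel. We did not use this model in our testing; the sketch below is purely illustrative, and the parallel fractions (50%, 90%, 100%) are assumed values, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed, illustrative parallel fractions; none of these come from our tests.
for fraction in (0.5, 0.9, 1.0):
    row = ", ".join(
        f"{cores} cores: {amdahl_speedup(fraction, cores):.2f}x"
        for cores in range(1, 5)
    )
    print(f"parallel fraction {fraction:.0%} -> {row}")
```

Only the fully parallel case reaches 4x on four cores. With half of the work stuck on one thread, the model tops out at 1.6x no matter how well the rest scales.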

Sandra's memory benchmark suggests that three cores will be more memory bandwidth friendly when it comes to integer buffered iSSE2 operations.



Synthetics have their place, but let's get into the meat and potatoes of this review: application benchmarks.

Application Benchmarks: Audio Encoding

Audio encoding has traditionally been a segment of software that either doesn't really benefit from multiple cores or hasn't been optimized by developers for multiple cores that well. Here are our Lame and iTunes results:

Lame doesn't show much advantage when using multiple cores at all. There appears to be a very slight advantage to using even numbers of CPU cores, which is odd. However, the difference is so small it could simply be within the margin of error.

Moving on to iTunes, we see a definite but small speed increase once two cores are enabled, and then it plateaus there.

Neither Lame nor iTunes is well-optimized for multiple CPUs when encoding audio. Conversely, we know that video encoding is often highly optimized for multiple cores due to its inherently parallel nature, so we'll look at that next.

Application Benchmarks: Video Encoding

We'll start our video encoding benchmarks with MainConcept Reference:

What a profound impact more CPU cores make in this benchmark: from nine minutes on the single-core 2.7 GHz Core 2 processor, down to only two minutes and 30 seconds when all four cores are enabled. Clearly, if video encoding is your thing, a quad-core processor is a must.
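To put those two encode times in perspective, here is a quick sketch that converts them into a speedup factor and a per-core efficiency. The function names are made up for this example; the only inputs are the times quoted above (nine minutes on one core, two minutes and 30 seconds on four).

```python
def speedup(baseline_seconds: float, improved_seconds: float) -> float:
    """How many times faster the improved run is than the baseline run."""
    return baseline_seconds / improved_seconds


def parallel_efficiency(speedup_factor: float, cores: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return speedup_factor / cores


single_core = 9 * 60      # nine minutes, in seconds
quad_core = 2 * 60 + 30   # two minutes and 30 seconds, in seconds

factor = speedup(single_core, quad_core)
print(f"Speedup: {factor:.1f}x")
print(f"Efficiency on four cores: {parallel_efficiency(factor, 4):.0%}")
```

That works out to a 3.6x speedup, or roughly 90% of perfect four-core scaling.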

Or is it? Let's see if TMPGEnc can offer the same advantages:


Here we see the effect of the encoder on the results. While DivX is highly optimized for multiple CPU cores, Xvid doesn't show as much of a benefit. Having said that, even Xvid shows about a 25% speed increase going from single to dual cores.

Application Benchmarks: 2D And 3D Graphics

Moving on to graphics applications, let's see how Adobe Photoshop deals with multiple CPU cores:


The answer seems to be that it doesn't deal with them at all. This is a strange result for such popular software, although we admit that we aren't running the newest version of Photoshop, CS4. But the results still aren't inspiring.

Let's try some 3D rendering with Autodesk 3ds Max:

It turns out that Autodesk 3ds Max loves multiple processors when rendering. This functionality has been part of 3ds Max since its early days in a DOS environment, because 3D rendering took so long that it was necessary to split the workload over a network to multiple computers. Once again, we have an application where quad-cores are extremely desirable.

Application Benchmarks: General Usage


The virus scanner benchmark might best reflect how the average user spends CPU cycles on a computer.

AVG Antivirus shows us some wonderful performance gains with multiple CPU cores. A PC's performance can slow to a crawl while a scan takes place, but these results show that multiple cores can assume more of the load while shortening scan times.


WinZip and WinRAR show almost no multi-threading enhancements. WinRAR does have the edge with an apparent ability to at least take advantage of two cores, but no more than that.

Game Benchmarks

Back in 2005, when dual-core CPUs were first introduced to the desktop, there were no games available that would show any performance advantage with a multi-core CPU versus a single-core model. But times have changed. So what do multiple CPU cores offer the average gamer today? Let's try some popular titles and see. We ran these games at a low 1024x768 resolution with low details to minimize the impact of the graphics card so we could really see how CPU-core limited these game titles might be.

We'll start with Crysis. All details were at minimum except object detail, which was set to High, and Physics, which was set to Very High. This should create a CPU bottleneck, regardless of the graphics settings:

Crysis shows an incredible dependency on the quantity of CPU cores, which is really surprising since we thought it would be more of a graphics card-limited title. Essentially, a single-core CPU delivers half the frame rates of three or four CPU cores in Crysis (at these settings; bear in mind that a more GPU-limited scenario will narrow the differences between CPU configurations substantially). Also interesting is that the game only seems to take advantage of three CPU cores and that there is no performance benefit to using four cores.

But we know that Crysis is heavy on the in-game physics calculations, so let's see if this trend continues in a physics-light game such as Left 4 Dead:


Indeed, Left 4 Dead shows similar results, although the lion's share of the performance jump happens when the second core is added. There is a slight jump with three cores, and once again, the fourth CPU core seems unused. This is an interesting trend, so let's see if it continues in the real-time-strategy game, World in Conflict:

The results are once again similar, but we see a bit of a surprising twist in that three CPU cores seem to fare a bit better than four. This falls close to the margin of error but at the very least it indicates that the fourth core is unused in these game titles.

Performance Analysis And Conclusion

Time to draw some conclusions. Because we have a lot of data to digest, let's simplify it by averaging the performance on a graph:

First, we see that the synthetic benchmarks are overly optimistic as to what multiple cores can accomplish compared to the average real-world scenario. The synthetic performance progression between a single core and multiple cores looks like an almost linear progression of 50% performance increases with each new CPU core.

The average application gains show us a much more realistic progression, with about a 35% speed increase with the second CPU core, a 15% jump with the third, and another 32% jump with the fourth. It's strange that the third CPU core seems to provide half the advantage of adding a fourth CPU core to the mix.

And when considering applications, we must look at individual software titles instead of just the big picture. Indeed, the audio-encoding software we tested offers little in the way of multi-core optimization. Video encoding software, conversely, offers massive benefits with more CPU cores, although the gain depends somewhat on the video encoder used. In the case of 3D rendering software, 3ds Max shows highly optimized multi-core performance, while 2D photo editors like Photoshop seem to have none at all. AVG anti-virus demonstrates massive performance increases with multiple cores, while compression utilities seem to sport little to no multi-threading benefits.

As far as games go, we see a huge 60% performance jump from going single-core to dual-core, and a further 25% leap from dual- to triple-core. Quad cores offer no benefits in the sampling of games we tested. While more games might change the landscape a little, we think the triple-core Phenom II X3s are looking good as a low-cost gaming option. It's also important to note here that as you start shifting to higher resolutions and adding visual detail, the picture gets a lot murkier as graphics muscle becomes the prevalent determinant of frame rates.
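The per-core percentages quoted for applications and games can be chained into an overall speedup over a single core. The sketch below assumes each percentage is measured relative to the previous core count, which is our reading of the averages rather than something the graph states outright; the function name is made up for this example.

```python
from math import prod  # available in Python 3.8 and later


def cumulative_speedup(step_gains_percent):
    """Chain per-step gains (in percent) into one overall speedup factor."""
    return prod(1 + gain / 100 for gain in step_gains_percent)


# Gains for the second, third, and fourth core, as quoted in the text.
application_steps = [35, 15, 32]
game_steps = [60, 25, 0]

print(f"Applications, 4 cores vs. 1: {cumulative_speedup(application_steps):.2f}x")
print(f"Games, 4 cores vs. 1: {cumulative_speedup(game_steps):.2f}x")
```

Under that assumption, both averages land at roughly double the single-core result, which lines up with Crysis delivering about half the frame rate on one core.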

After all is said and done, we can come to a few general conclusions based on this data. We don't think you have to be a power-user to enjoy the benefits of a multi-core CPU. This is in stark contrast to the situation four years ago. So, while these gains might not be overwhelming at first glance, it's impressive to note how much thread-level optimization has gone on in the last few years, particularly in the applications identified as most receptive to acceleration through parallelism. In fact, we'll go as far as to say that there is relatively little reason to consider a single-core CPU (if you can find one), except for power-saving applications.

There are a few applications for which users should invest in as many CPU cores as possible, which include video encoding, 3D rendering, and optimized productivity titles, such as AVG's virus-scanning software. The lesson for the gamer is that long gone are the days when a single-core CPU paired with a powerful graphics solution would be "good enough."


