AWS micro v. small - finally got a nice benchmark result

Patrick

I am fairly excited. Running Linux-Bench against the t1.micro and m1.small:

AWS t1.micro
c-ray Easy Test
c-ray-mt v1.1
Rendering took: 122 seconds (122089 milliseconds)
c-ray Medium Test
c-ray-mt v1.1
Rendering took: 2228 seconds (2228734 milliseconds)

AWS m1.small
c-ray Easy Test
c-ray-mt v1.1
Rendering took: 49 seconds (49035 milliseconds)
c-ray Medium Test
c-ray-mt v1.1
Rendering took: 732 seconds (732692 milliseconds)

AWS m3.medium
c-ray Easy Test
c-ray-mt v1.1
Rendering took: 25 seconds (25610 milliseconds)
c-ray Medium Test
c-ray-mt v1.1
Rendering took: 395 seconds (395223 milliseconds)

AWS m3.large
c-ray Easy Test
c-ray-mt v1.1
Rendering took: 10 seconds (10375 milliseconds)
c-ray Medium Test
c-ray-mt v1.1
Rendering took: 191 seconds (191703 milliseconds)

AWS m3.2xlarge
c-ray Easy Test
c-ray-mt v1.1
Rendering took: 5 seconds (5342 milliseconds)
c-ray Medium Test
c-ray-mt v1.1
Rendering took: 97 seconds (97886 milliseconds)

You can clearly see the t1.micro fall off in the longer medium test, while scaling across the other instance sizes is fairly decent. On the shorter easy test it is 122s vs. 49s, so the m1.small completes in about 40% of the time of the t1.micro. On the longer medium test the m1.small completes in about 33% of the time of the t1.micro.

That is starting to show the limits of the t1.micro burst mode.

For those wondering, I am doing this to generate test data for the web viewer.
 

Patrick

Somewhat crazy, but the t1.micro took just over 180 minutes to finish the c-ray benchmark. In comparison, the Intel Atom C2750 took just 6 min 37 sec.

Here is the comparison point:

Intel Atom C2750
c-ray Easy Test
c-ray-mt v1.1
Rendering took: 5 seconds (5930 milliseconds)
c-ray Medium Test
c-ray-mt v1.1
Rendering took: 83 seconds (83355 milliseconds)
 

Patrick

Good catch - fixed.

So I have managed to get all of the general purpose instances and all of the compute optimized instances going. The t1.micro is still chugging away, and I started the C3 instances after all of the m3 instances had finished. Pretty bad.
 

Patrick

Intel Xeon E5-1660 V2
c-ray Easy Test
c-ray-mt v1.1
Rendering took: 1 seconds (1321 milliseconds)
c-ray Medium Test
c-ray-mt v1.1
Rendering took: 24 seconds (24308 milliseconds)
 

Patrick

Previous generation Amazon AWS Instance Benchmarks

AWS m1.medium

c-ray Easy Test
c-ray-mt v1.1
Rendering took: 24 seconds (24132 milliseconds)
c-ray Medium Test
c-ray-mt v1.1
Rendering took: 360 seconds (360878 milliseconds)

AWS m1.large
c-ray Easy Test
c-ray-mt v1.1
Rendering took: 13 seconds (13684 milliseconds)
c-ray Medium Test
c-ray-mt v1.1
Rendering took: 182 seconds (182941 milliseconds)

AWS m1.xlarge
c-ray Easy Test
c-ray-mt v1.1
Rendering took: 6 seconds (6091 milliseconds)
c-ray Medium Test
c-ray-mt v1.1
Rendering took: 91 seconds (91853 milliseconds)

AWS c1.medium
c-ray Easy Test
c-ray-mt v1.1
Rendering took: 13 seconds (13507 milliseconds)
c-ray Medium Test
c-ray-mt v1.1
Rendering took: 182 seconds (182821 milliseconds)

AWS c1.xlarge
c-ray Easy Test
c-ray-mt v1.1
Rendering took: 3 seconds (3172 milliseconds)
c-ray Medium Test
c-ray-mt v1.1
Rendering took: 45 seconds (45847 milliseconds)
 

mackle

I recently used the $200 free credit Azure trial to explore Folding@Home performance. My impression was that the general compute instances had terrible performance for compute. They are far more suited to web services, where you can have many single-threaded workers each working independently, rather than compute, where everything points at one task and you need balance across all threads.

My understanding of the poor compute performance is that it is because they use a variety of slow tech (my instances were all 'second gen' instances, based on Opteron 4171 HEs). They are pretty slow chips to start off with, and for the 8-core instances you're operating across 2 NUMA nodes, which really hammers SMP performance (8-core perf. was worse than 4-core perf.). We're talking ~6,000 ppd across the 4-8 core instances.
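
(For anyone who wants to check this on their own instance, the NUMA layout the hypervisor exposes to the guest is visible with standard tools - the commands below are generic and assume numactl is installed:

lscpu | grep -i numa
numactl --hardware

Both simply report the node count and which CPUs and memory belong to each node.)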

The exception would be the horribly expensive compute-intensive VMs, based on '8/16 cores @ 2.6 GHz'* of Intel® Xeon® E5-2670. They delivered much higher performance (95,000 ppd**), but they also cost $3,000+ a month to run a Linux instance continuously... probably partly because they each came with 56/112GB of RAM.

*To this day, I am unclear whether this is 8/16 threads or actual physical cores...
** Noting that F@H ppd scales in a logarithmic fashion, so this is approx. 5x faster in compute terms than the ~6k ppd instances.
 

Patrick

I hope to get Rackspace and Azure instances benchmarked also. The 4171 HEs were actually popular: they had many PCIe lanes, were cheap, and could use RDIMMs. I know Rackspace and Amazon used them too, although I think those are being pulled out.
 

Patrick

I have all of the Rackspace Performance 1 instances plus the 512MB, 1GB, and 2GB standard instances running now. The new STHbench works!
 

cptbjorn

With t1.micro instances, if you just peg one and let something CPU-intensive run without limit, it will go through cycles of running pretty quickly for 10-30 seconds, followed by a period where it is massively throttled and you see CPU steal jump up to 98-99%; then the cycle repeats.

At least when I played with them a couple of years back, I got more work out of them by throttling the CPU usage to just beneath the threshold where Amazon's throttling kicks in. I used "cpulimit" and it worked pretty well. Micros in different regions allowed different percentages too; I remember us-west-2 let me do twice as much work on a micro as us-east-1 before throttling.
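
If anyone wants to reproduce this, a rough sketch of the approach (the 30% figure below is purely a placeholder - the real threshold varied by region, as noted above):

# watch the throttle cycle: the "st" (steal) column in vmstat, or %st in top
vmstat 5

# cap a CPU-heavy job below the throttle point with cpulimit (-l is percent of one core, -p the target pid)
cpulimit -l 30 -p <pid-of-the-job>

cpulimit works by sending SIGSTOP/SIGCONT to the target process, so it needs no changes to the workload itself.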
 

Patrick

New AWS instances released this week

t2.micro
c-ray Easy Test
c-ray-mt v1.1
Rendering took: 12 seconds (12022 milliseconds)
c-ray Medium Test
c-ray-mt v1.1
Rendering took: 197 seconds (197242 milliseconds)

t2.small
c-ray Easy Test
c-ray-mt v1.1
Rendering took: 11 seconds (11695 milliseconds)
c-ray Medium Test
c-ray-mt v1.1
Rendering took: 192 seconds (192869 milliseconds)

t2.medium
c-ray Easy Test
c-ray-mt v1.1
Rendering took: 6 seconds (6601 milliseconds)
c-ray Medium Test
c-ray-mt v1.1
Rendering took: 99 seconds (99617 milliseconds)

These are the "burstable" instances.

For reference, an Intel Atom C2550 put out:
c-ray Easy Test
c-ray-mt v1.1
Rendering took: 10 seconds (10525 milliseconds)
c-ray Medium Test
c-ray-mt v1.1
Rendering took: 152 seconds (152434 milliseconds)

The interesting part about this is that Linux-Bench runs long enough to clearly exhaust the burst-speed instances. The C2550 and the t2.medium started Linux-Bench at the same time; the C2550 has been done for a while now, while the t2 instances are not even on the last benchmark.
 

Patrick

So from the c-ray numbers alone one might conclude that the t2.medium is faster than the C2550.

Both are a bit slower than our target baseline, but here are the times to complete Linux-Bench:

AWS t2.micro - Time to complete (did not finish 7-zip)
real 409m53.438s
user 79m25.644s
sys 304m19.360s

AWS t2.small - Time to complete (did not finish 7-zip)
real 196m23.955s
user 74m36.776s
sys 109m43.005s

Second run
real 270m27.960s
user 77m7.756s
sys 178m44.506s

AWS t2.medium - Time to complete
real 142m45.583s
user 103m27.069s
sys 132m51.866s

Second run
real 172m44.659s
user 94m43.779s
sys 188m44.943s

Intel Atom C2550 - Time to complete
real 91m0.344s
user 181m21.790s
sys 55m3.327s

Second run
real 89m50.618s
user 180m27.243s
sys 54m55.130s

The bottom line is that in the short spurt that is c-ray (the fourth test in our suite) we see the t2.medium zip along. On the overall workload we see the Intel Atom C2550 absolutely clean house: 91m0s versus 142m46s real time, so the C2550 finishes roughly 52 minutes sooner and the t2.medium takes about 57% longer to complete.
 

Patrick

Added second runs for the t2.small, t2.medium and the Atom C2550, and added the first run for the t1.micro, which FINALLY completed.

Final verdict: it is easy to see the impact of scaling on the new t2 instances.
 

Carlos

Did you keep an eye on the t2 credits while running the tests? It may be better to run the benchmarks twice: once with plenty of credits to see how they perform in burst mode, and a second time without any credits so that they run at the baseline. To do the burst test you will need to leave them idle for a while to rack up credits; you can watch the credit balance in the console. Then, for the baseline tests, run something at 100% CPU until the credits run out and then do your tests.
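
A rough sketch of how that could be scripted, assuming the AWS CLI and the stress package are installed (the instance ID and timestamps are placeholders):

# burn CPU for an hour to drain the credit balance
stress --cpu 4 --timeout 3600

# then check the remaining credits via the CloudWatch CPUCreditBalance metric
aws cloudwatch get-metric-statistics --namespace AWS/EC2 \
  --metric-name CPUCreditBalance \
  --dimensions Name=InstanceId,Value=i-1234abcd \
  --start-time 2014-10-03T00:00:00Z --end-time 2014-10-03T12:00:00Z \
  --period 300 --statistics Average

Once the balance sits at or near zero, a benchmark run reflects baseline rather than burst performance.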
 

Patrick

Hi Carlos, we did something like this in July: AWS t2 instances with Linux-Bench Benchmark.
 

Carlos

Kind of, but it is not clear if run 1 was purely in burst mode for all the t2 machines.
My suggestion was based on this article: A look at Amazon EC2 t2.medium | Jason Read | LinkedIn. I liked that they benchmarked both the peak/burst performance and the degraded/non-burst performance independently.
To me it looks like the t2.medium is quite a beast (better than a c3.large) if your workload lets you recover enough credits. However, they only looked at the medium, and I was interested in your article because you looked at all 3 t2 types.
 

Patrick

Interesting. So that is the model where you have a dev/log instance that gets used once or maybe twice a day, and only for as long as it can complete its workload within the burst credit window?
 

Carlos

I believe so. Daily (or X times a day) log processing sounds like a good candidate for the t2 machines.
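
Something as simple as a daily crontab entry would fit that pattern (the script path here is just a placeholder):

# kick off the log crunching at 03:00 while the credit balance is full
0 3 * * * /usr/local/bin/process-logs.sh

As long as the job finishes before the credits run out it gets the full burst CPU, and the instance spends the rest of the day earning credits back.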
 

Patrick

If you get bored and want to compare, you can just fire up an Ubuntu instance and issue the one command to run Linux-Bench.
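
Launching one from the AWS CLI looks something like this (the AMI ID and key name below are placeholders - substitute a current Ubuntu AMI for your region), then SSH in and run the Linux-Bench one-liner from the Linux-Bench instructions:

aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type t2.medium \
  --key-name my-key --count 1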