x264 & Dual CPU

GCM

Active Member
Aug 24, 2015
137
43
28
So,

Like many, I'm a new owner of a dual E5-2670.

However, I'm hitting a brick wall with x264 encoding. I stream with a program called OBS (Open Broadcaster Software).

On paper, the dual E5-2670 should easily handle the "slow" preset for encoding. In practice, encoding falls over at anything slower than "medium": I get encoding warnings and choppy playback.

I've done a little research and it seems a few others have had this issue too, but no one seems to really have an answer. A few tests show that the second CPU will do almost nothing, unless the affinity is manually set to physical cores. Even then, when set to physical cores the second CPU will have greatly reduced load compared to the first.

That being said, has anyone encountered this and found a solid solution?
 

GCM

It's live, so NVENC won't be good enough quality: I would need 40+ Mbps from NVENC to match what x264 delivers at 3 Mbps.
 

Stereodude

Active Member
Feb 21, 2016
412
65
28
USA
What OS are you running? Windows? What resolution is the video?

AFAIK, x264 is not NUMA-aware, so getting it to work well on a NUMA architecture will be challenging. Even if you get it to utilize the 2nd CPU, it could get messy when threads that share data end up running on different physical CPUs.

I looked into how x264 would likely behave on one prior to ordering up the parts for a dual E5-2670 system. I'm okay with running multiple concurrent encodes and do not expect a single encode to utilize the whole system.
 

GCM

Windows 8.1 Pro, 1080p downscaled to 720p 60FPS.

It's live encoding, so I'm not able to run multiple encodes, sadly.
 

Stereodude

GCM said: "Windows 8.1 Pro, 1080p downscaled to 720p 60FPS."
Do you have full control over the x264 command line? If so, what command line switches are you using?

GCM said: "It's live encoding, so I'm not able to run multiple encodes, sadly."
Well, I wasn't sure if you were simultaneously encoding streams at several quality levels for streaming/broadcasting.
 

GCM

Stereodude said: "Do you have full control over the x264 command line? If so, what command line switches are you using?"

Stereodude said: "Well, I wasn't sure if you had several quality levels streams you were simultaneously encoding for streaming/broadcasting."
I do have full control over the command line. Pretty much everything is at default; I've messed around with a few settings, to no avail.
 

Stereodude

Can you paste your command line? Are you piping data in and out of it?

What settings have you been messing with? I'm not very knowledgeable about using x264 for real-time encoding. With offline encoding, adding more lookahead threads is supposed to help with idle cores, but I'm not sure whether that applies to real-time encoding.
 

smithse79

Active Member
Sep 17, 2014
196
33
28
40
I've (as has everybody) just built my dual E5-2670 as well. In the past, I was running x264 on Dual L5420s and ffmpeg with x264 seemed to be able to saturate the processor when encoding (I'm not doing live encoding, just my movies). With the E5-2670 setup, it only seems to be running about half my cores if even that. I've set threads=32 and only got it a bit higher.

In digging, it appears that x264 has issues with high thread counts in general. We may have to wait until they get the problem fixed.
 

Stereodude

smithse79 said: "I've (as has everybody) just built my dual E5-2670 as well. In the past, I was running x264 on Dual L5420s and ffmpeg with x264 seemed to be able to saturate the processor when encoding (I'm not doing live encoding, just my movies). With the E5-2670 setup, it only seems to be running about half my cores if even that. I've set threads=32 and only got it a bit higher."
Are you using AVIsynth, or how are you getting video into x264? You could be hitting a limitation of how fast you can decode the source and feed it to x264. There's also a big jump going from 8 logical cores to 32. Your dual E5-2670 can do a lot more work simultaneously.

FWIW, x264 should automatically set threads to 48 on a system with 32 logical cores. However, quality will suffer if you use too many threads. How many threads is too many is a function of the vertical resolution of the video and --mvrange-thread. Unless you're changing --mvrange-thread you don't want more than 1 thread per 40 lines of vertical resolution. For 1080p that's 27.
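As a rough illustration of that arithmetic, here is a minimal Python sketch. The function names are made up for this example. The 1.5x factor is inferred from the 32 cores / 48 threads figure above rather than read out of x264's source, and the 40-line rule assumes the default --mvrange-thread.

```python
def x264_auto_threads(logical_cores: int) -> int:
    """Thread count x264 appears to pick by default (assumed: 1.5x logical cores)."""
    return logical_cores * 3 // 2


def max_useful_threads(height: int, lines_per_thread: int = 40) -> int:
    """Upper bound on threads from the '1 thread per 40 lines' rule of thumb."""
    return height // lines_per_thread


if __name__ == "__main__":
    for cores in (8, 16, 32):
        print(f"{cores} logical cores -> {x264_auto_threads(cores)} threads by default")
    for height in (720, 1080):
        print(f"{height} lines -> at most {max_useful_threads(height)} threads")
```

By this estimate, 32 logical cores give a default of 48 threads, which is well past the 27 suggested for 1080p, so passing --threads explicitly may be worth trying.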
smithse79 said: "In digging, it appears that x264 has issues with high thread counts in general. We may have to wait until they get the problem fixed."
I don't expect they're doing anything to fix it, if they're even aware of it.

The best course of action is likely to run multiple x264 encodes at once and restrict how many threads each instance of x264 can spawn, as well as which NUMA node each encode runs on, and perhaps further restrict which cores in the node it will use. I don't have my dual E5-2670 system built yet, so I can't give you any more specific guidance. If you can't saturate the system with 2 simultaneous encodes, try 4.
 

mackle

Active Member
Nov 13, 2013
212
34
28
(OBS user, a while since my serious transcoding days, never used that many threads)
A couple of things, although it's hard to tell what the situation is without the command line switches/sample output:

1. If you're using CRF - what's the difference in bitrate for your content between medium and slow? (i.e. is it worth the extra processing?)
2. Have you tried setting --threads manually? (and/or how many threads does your output state?)

The one problem with setting increasingly more threads is you only have so much framespace to divide up. (ninja'd)

I seriously doubt x264 is going to 'fix' thread/NUMA issues. The focus is probably more on x265 now.
 

Stereodude

mackle said: "1. If you're using CRF - what's the difference in bitrate for your content between medium and slow? (i.e. is it worth the extra processing?)"
Given the amount of computing power he has, dropping to "slower" presets may not actually be slower. They may just utilize more of his CPU (to a point).
 

badskater

Automation Architect
May 8, 2013
121
42
28
Canada
x264 on dual CPUs has been a known issue for years. Setting --threads won't change that, as x264 isn't NUMA-aware. It's the main reason I built a single-CPU system for my x264 rendering.

See more info here:

The x264 Dual CPU Conundrum - Doom9's Forum
multiprocess system .. x264 - Doom9's Forum

It started with Nehalem/Westmere, mainly due to the number of threads used since then and the fact that each CPU has its own memory.
 

Stereodude

So I finally got time to test this with my dual v1 E5-2670 box running Windows 10 Pro. I was able to saturate both CPUs in my testing on a single x264 encode. I'm not sure it's a good idea, because per the x264 developers that is too many threads for the vertical resolution of the video, but my results were not what I was expecting given what I've seen posted here and elsewhere.

I was doing a 2 pass encode and the switches I used for the 2nd pass were:
Code:
--bitrate 18438 --preset veryslow --tune film --bluray-compat --vbv-maxrate 40000 --vbv-bufsize 30000 --level 4.1 --keyint 24 --open-gop --slices 4 --colorprim "bt709" --transfer "bt709" --colormatrix "bt709" --sar 1:1 --pass 2 --qpfile x.chp -o x.264 x.avs
I also found that the /NODE and /AFFINITY switches used with the START command restrict the number of threads x264 will spawn without adding any extra switches to the x264 command line. If you only give it 8 logical cores, it will only create as many threads as it would on an 8-logical-core system.

Like:
Code:
START "x264 Encode #1" /NORMAL /NODE 0 /AFFINITY 00FF x.bat
where x.bat is my x264 command and switches.
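To make the mask arithmetic concrete, here is a small Python sketch that prints START lines for several pinned encodes. It assumes 2 NUMA nodes with 16 logical cores each (a dual E5-2670 with Hyper-Threading on) and that /AFFINITY is interpreted relative to the selected /NODE, as the START help text describes. The batch file names encode1.bat, encode2.bat and so on are hypothetical, one per encode.

```python
def start_commands(nodes: int = 2, logical_per_node: int = 16,
                   encodes_per_node: int = 2) -> list[str]:
    """Build one START line per encode, splitting each node's cores evenly."""
    cores_each = logical_per_node // encodes_per_node
    commands = []
    encode_no = 0
    for node in range(nodes):
        for slot in range(encodes_per_node):
            encode_no += 1
            # cores_each consecutive bits, shifted to this encode's slot in the node
            mask = ((1 << cores_each) - 1) << (slot * cores_each)
            commands.append(
                f'START "x264 Encode #{encode_no}" /NORMAL '
                f'/NODE {node} /AFFINITY {mask:04X} encode{encode_no}.bat'
            )
    return commands


if __name__ == "__main__":
    for line in start_commands():
        print(line)
```

With the defaults this gives four encodes with masks 00FF and FF00 on each node. Calling start_commands(encodes_per_node=1) gives one encode per CPU with mask FFFF.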
 

Stereodude

Here are the results of my testing using the 2nd pass of a 2 pass Blu-ray re-encode (1920x1080) of a 2.35:1 movie with black bars on the top and bottom. The numbers are threads / lookahead-threads.

i7-4770K @ 4.2 GHz [12 / 1]
Code:
encoded 195503 frames, 10.39 fps, 18445.55 kb/s
[info]: frame I:9295  Avg QP:10.37  size:185709
[info]: frame P:69175 Avg QP:14.49  size:118257
[info]: frame B:117033 Avg QP:15.16  size: 75998
E5-2670 dual CPU pinned to only one CPU (/NODE 0 & /AFFINITY FFFF) [24 / 2]
Code:
encoded 195503 frames, 14.06 fps, 18445.64 kb/s
[info]: frame I:9274  Avg QP:10.37  size:185136
[info]: frame P:70206 Avg QP:14.49  size:117965
[info]: frame B:116023 Avg QP:15.16  size: 75866
E5-2670 dual CPU, thread counts forced via command line, no node or affinity restrictions [27 / 2]
Code:
encoded 195503 frames, 20.31 fps, 18445.60 kb/s
[info]: frame I:9274  Avg QP:10.38  size:185082
[info]: frame P:70206 Avg QP:14.49  size:117979
[info]: frame B:116023 Avg QP:15.16  size: 75861
E5-2670 dual CPU no restrictions [48 / 4]
Code:
encoded 195503 frames, 27.42 fps, 18445.61 kb/s
[info]: frame I:9274  Avg QP:10.39  size:184313
[info]: frame P:70206 Avg QP:14.49  size:118064
[info]: frame B:116023 Avg QP:15.16  size: 75871

As you can see, letting it generate as many threads as it wants based on the 32 logical cores doesn't quite double the speed of pinning it to only 1 CPU, but it's pretty close, meaning the NUMA architecture doesn't really cause x264 much trouble (at least with these encode settings).
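For anyone who wants the ratios, a quick Python sketch using only the fps figures reported above, with the single-CPU run as the baseline. The dictionary keys and the helper name are just labels for this example.

```python
# fps figures copied from the four runs above
FPS = {
    "i7-4770K [12 / 1]": 10.39,
    "dual E5-2670, one CPU [24 / 2]": 14.06,
    "dual E5-2670, forced [27 / 2]": 20.31,
    "dual E5-2670, unrestricted [48 / 4]": 27.42,
}
BASELINE = "dual E5-2670, one CPU [24 / 2]"


def speedups(fps: dict[str, float], baseline: str) -> dict[str, float]:
    """Return each run's speed relative to the baseline run."""
    return {name: value / fps[baseline] for name, value in fps.items()}


if __name__ == "__main__":
    for name, ratio in speedups(FPS, BASELINE).items():
        print(f"{name}: {FPS[name]:.2f} fps, {ratio:.2f}x")
```

The unrestricted run works out to about 1.95x the single-CPU run, and the 27-thread run to about 1.44x.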
 

smithse79

What did you use for the encode, FFMPEG? Was this Linux or Windows? Details man, we need details!!!
 

Stereodude

smithse79 said: "What did you use for the encode, FFMPEG? Was this Linux or Windows? Details man, we need details!!!"
Windows 10 Pro. I called x264 directly from the command line. I pasted the switches used in the command line in post 16. I fed the 64-bit version of x264 (build 2525) from 64-bit AviSynth 2.5.8. The source was decoded in hardware on the Nvidia graphics card (GeForce GT 720) using DGdecodeNV (an AviSynth filter).