x264 & Dual CPU

Discussion in 'Software Stuff' started by GCM, Mar 5, 2016.

  1. GCM

    GCM Active Member

    So,

    Like many, I'm a new owner of a dual E5-2670.

    However, I'm hitting a brick wall with x264 encoding. I stream with a program called OBS (Open Broadcaster Software).

    On paper, a pair of E5-2670s should easily handle the "slow" preset, but encoding falls over at anything slower than "medium": I get encoding warnings and choppy playback.

    I've done a little research and it seems a few others have had this issue too, but no one seems to really have an answer. A few tests show that the second CPU will do almost nothing, unless the affinity is manually set to physical cores. Even then, when set to physical cores the second CPU will have greatly reduced load compared to the first.
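    For anyone wanting to reproduce the "affinity set to physical cores" test, here is a minimal sketch of how such a mask can be built. It assumes Hyper-Threading siblings are numbered in adjacent pairs (0,1), (2,3), ..., which is common on Windows but should be verified on your own machine (for example with Sysinternals Coreinfo); the function name is hypothetical.

```python
# Sketch: build a hex affinity mask that selects one logical processor per
# physical core. ASSUMPTION: HT siblings are numbered in adjacent pairs
# (0,1), (2,3), ... which is common on Windows but must be checked per box.

def physical_core_mask(logical_cpus: int, threads_per_core: int = 2) -> int:
    """Return a bitmask with one bit set per physical core (first sibling)."""
    mask = 0
    for cpu in range(0, logical_cpus, threads_per_core):
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    # Dual E5-2670: 2 sockets x 8 cores x 2 threads = 32 logical processors.
    print(format(physical_core_mask(32), "X"))  # 55555555
```

    With HT disabled, pass `threads_per_core=1` and every logical processor is selected.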

    Has anyone encountered this and found a solid solution?
     
    #1
  2. Keljian

    Keljian Active Member

    How much encoding, and for what? OBS has NVENC support on newer NVIDIA cards, and the output can be very good...
     
    #2
  3. GCM

    GCM Active Member

    It's live, so NVENC won't give good enough quality: I would need 40+ Mbps with NVENC to match what x264 does at 3 Mbps.
     
    #3
  4. Stereodude

    Stereodude Active Member

    What OS are you running? Windows? What resolution is the video?

    AFAIK, x264 is not NUMA-aware, so getting it to work well on a NUMA architecture will be challenging. Even if you get it to use the 2nd CPU, things could get messy when threads that share information are on different physical CPUs.

    I looked into how x264 would likely behave on one prior to ordering up the parts for a dual E5-2670 system. I'm okay with running multiple concurrent encodes and do not expect a single encode to utilize the whole system.
     
    #4
  5. GCM

    GCM Active Member

    Windows 8.1 Pro, 1080p downscaled to 720p 60FPS.

    It's live encoding, so I'm not able to run multiple encodes, sadly.
     
    #5
  6. Patrick

    Patrick Administrator
    Staff Member

    Just wondering, is hyper-threading on?
     
    #6
  7. GCM

    GCM Active Member

    It is, but I've tried it with and without.
     
    #7
  8. Stereodude

    Stereodude Active Member

    Do you have full control over the x264 command line? If so, what command line switches are you using?

    Well, I wasn't sure whether you were simultaneously encoding several streams at different quality levels for streaming/broadcasting.
     
    #8
  9. GCM

    GCM Active Member

    Joined:
    Aug 24, 2015
    Messages:
    137
    Likes Received:
    43
    I do have full control over the command line. Pretty much everything is at default, I've messed around with a few settings to no avail.
     
    #9
  10. Stereodude

    Stereodude Active Member

    Can you paste your command line? Are you piping data in and out of it?

    What settings have you been messing with? I'm not very knowledgeable about using x264 for real-time encoding. With offline encoding, adding more lookahead threads is supposed to help with idle cores, but I'm not sure whether that applies to real time.
     
    #10
  11. smithse79

    smithse79 Active Member

    Like everybody else, I've just built my dual E5-2670 as well. In the past I ran x264 on dual L5420s, and ffmpeg with x264 seemed able to saturate the processors when encoding (I'm not doing live encoding, just my movies). With the E5-2670 setup, it only seems to use about half my cores, if that. I've set threads=32 and only got it a bit higher.

    From what I've dug up, x264 has issues with high thread counts in general. We may have to wait until they fix the problem.
     
    #11
  12. Stereodude

    Stereodude Active Member

    Are you using AVIsynth, or how are you getting video into x264? You could be hitting a limitation of how fast you can decode the source and feed it to x264. There's also a big jump going from 8 logical cores to 32. Your dual E5-2670 can do a lot more work simultaneously.

    FWIW, x264 should automatically set threads to 48 on a system with 32 logical cores. However, quality will suffer if you use too many threads. How many threads is too many is a function of the vertical resolution of the video and --mvrange-thread. Unless you're changing --mvrange-thread you don't want more than 1 thread per 40 lines of vertical resolution. For 1080p that's 27.
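    The two rules of thumb above can be written out as a quick calculator. This is a sketch based only on the figures in this post (auto threads = 1.5x logical cores, roughly 1 thread per 40 rows of vertical resolution with the default --mvrange-thread); it is not taken from x264's source, and the helper names are hypothetical.

```python
# Sketch of the rules of thumb from the post (not x264's actual code):
#  - automatic thread count = 1.5x the number of logical cores
#  - quality suffers beyond ~1 thread per 40 rows of vertical resolution
#    when --mvrange-thread is left at its default

def auto_threads(logical_cores: int) -> int:
    return logical_cores * 3 // 2


def max_threads_for_height(height: int, rows_per_thread: int = 40) -> int:
    return max(1, height // rows_per_thread)


def suggested_threads(logical_cores: int, height: int) -> int:
    """Cap the automatic count at the resolution-based limit."""
    return min(auto_threads(logical_cores), max_threads_for_height(height))


if __name__ == "__main__":
    print(auto_threads(32))              # 48 on a dual E5-2670 with HT
    print(max_threads_for_height(1080))  # 27
    print(max_threads_for_height(720))   # 18, e.g. a 720p stream
    print(suggested_threads(32, 1080))   # 27
```

    The 720p line is the relevant one for a 720p60 stream: by this rule of thumb, only about 18 threads are useful before quality starts to suffer.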
    I don't expect they're doing anything to fix it, if they're even aware of it.

    The likely best course of action is to run multiple x264 encodes at once, restricting how many threads each instance of x264 can spawn and which NUMA node each encode runs on, and perhaps further restricting which cores within the node it uses. I don't have my dual E5-2670 system built yet, so I can't give you any more specific guidance. If you can't saturate the system with 2 simultaneous encodes, try 4.
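    A rough planning sketch for that approach: split N encodes evenly across the NUMA nodes, give each one a contiguous block of logical processors, and size its thread count with the same 1.5x-logical-cores default applied to its own block. The layout (2 nodes, 16 logical processors each) matches a dual E5-2670 with HT on; everything else here, including the names, is a hypothetical illustration.

```python
# Hypothetical planning helper for "several encodes, each pinned to one NUMA
# node". ASSUMPTIONS: 2 nodes x 16 logical processors, instances split evenly,
# threads per instance = 1.5x the logical processors that instance gets.
from dataclasses import dataclass


@dataclass
class Slot:
    instance: int
    node: int
    first_cpu: int  # first logical processor, counted within the node
    cpu_count: int
    threads: int


def plan_encodes(instances: int, nodes: int = 2, cpus_per_node: int = 16) -> list[Slot]:
    if instances % nodes:
        raise ValueError("use a multiple of the node count so nodes stay balanced")
    per_node = instances // nodes
    if cpus_per_node % per_node:
        raise ValueError("instances per node must divide the node's logical CPUs")
    width = cpus_per_node // per_node
    plan = []
    for i in range(instances):
        node, slot = divmod(i, per_node)
        plan.append(Slot(i, node, slot * width, width, width * 3 // 2))
    return plan


if __name__ == "__main__":
    for s in plan_encodes(4):
        print(s)
```

    With 2 instances, each gets a whole node and 24 threads; with 4, each gets 8 logical processors and 12 threads.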
     
    #12
  13. mackle

    mackle Active Member

    (OBS user, a while since my serious transcoding days, never used that many threads)
    A couple of things, although it's hard to tell what the situation is without the command line switches/sample output:

    1. If you're using CRF - what's the difference in bitrate for your content between medium and slow? (i.e. is it worth the extra processing?)
    2. Have you tried setting --threads manually? (and/or how many threads does your output state?)

    The one problem with setting ever more threads is that you only have so much of the frame to divide up. (ninja'd)

    I seriously doubt x264 is going to 'fix' thread/numa issues. The focus is probably more on x265 now.
     
    #13
  14. Stereodude

    Stereodude Active Member

    Joined:
    Feb 21, 2016
    Messages:
    398
    Likes Received:
    61
    Given the amount of computing power he has dropping to "slower" presets may not actually be slower. They may just utilize more of his CPU (to a point).
     
    #14
  15. badskater

    badskater Active Member

    x264 on dual CPUs has been a known issue for years. Even with --threads it won't change, as x264 isn't NUMA-aware. It's the main reason I built a single-CPU system for my x264 rendering.

    See more info here:

    The x264 Dual CPU Conundrum - Doom9's Forum
    multiprocess system .. x264 - Doom9's Forum

    It started with Nehalem/Westmere, mainly due to the number of threads used since then and the fact that each CPU has its own memory.
     
    #15
  16. Stereodude

    Stereodude Active Member

    So I finally got time to test this with my dual v1 E5-2670 box running Windows 10 Pro. I was able to saturate both CPUs with a single x264 encode. I'm not sure it's a good idea, because per the x264 developers there are too many threads for the vertical resolution of the video, but my results were not what I expected given what I've seen posted here and elsewhere.

    I was doing a 2 pass encode and the switches I used for the 2nd pass were:
    Code:
    --bitrate 18438 --preset veryslow --tune film --bluray-compat --vbv-maxrate 40000 --vbv-bufsize 30000 --level 4.1 --keyint 24 --open-gop --slices 4 --colorprim "bt709" --transfer "bt709" --colormatrix "bt709" --sar 1:1 --pass 2 --qpfile x.chp -o x.264 x.avs
    I also found that the /NODE and /AFFINITY switches of the START command restrict the number of threads x264 will spawn, without adding any extra switches to the x264 command line. If you only give it 8 logical cores, it will only create as many threads as it would on an 8-logical-core system.

    Like:
    Code:
    START "x264 Encode #1" /NORMAL /NODE 0 /AFFINITY 00FF x.bat
    where x.bat is my x264 command and switches.
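    If you want other splits (e.g. two encodes per CPU), the /AFFINITY value is just a hex bitmask. Here is a minimal sketch that builds it from a range of logical processors and assembles a START line in the same form as above. It assumes the mask is interpreted relative to the node given with /NODE, as in the example; the window title and "x.bat" are placeholders, and the function names are hypothetical.

```python
# Sketch: turn a node-relative range of logical processors into the hex mask
# used by START /AFFINITY, and assemble a START line like the one above.
# ASSUMPTION: the mask is read relative to the node selected with /NODE.

def affinity_mask(first_cpu: int, count: int) -> str:
    """Hex mask with `count` bits set starting at bit `first_cpu`."""
    mask = ((1 << count) - 1) << first_cpu
    return format(mask, "04X")


def start_command(title: str, node: int, first_cpu: int, count: int, batch: str) -> str:
    return (f'START "{title}" /NORMAL /NODE {node} '
            f'/AFFINITY {affinity_mask(first_cpu, count)} {batch}')


if __name__ == "__main__":
    print(start_command("x264 Encode #1", 0, 0, 8, "x.bat"))  # .../AFFINITY 00FF x.bat
    print(start_command("x264 Encode #2", 0, 8, 8, "x.bat"))  # .../AFFINITY FF00 x.bat
```

    `affinity_mask(0, 16)` gives FFFF, the whole-node mask used for the single-CPU run in the test results.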
     
    #16
  17. Stereodude

    Stereodude Active Member

    Here are the results of my testing using the 2nd pass of a 2 pass Blu-ray re-encode (1920x1080) of a 2.35:1 movie with black bars on the top and bottom. The numbers are threads / lookahead-threads.

    i7-4770K @ 4.2 GHz [12 / 1]
    Code:
    encoded 195503 frames, 10.39 fps, 18445.55 kb/s
    [info]: frame I:9295  Avg QP:10.37  size:185709
    [info]: frame P:69175 Avg QP:14.49  size:118257
    [info]: frame B:117033 Avg QP:15.16  size: 75998
    E5-2670 dual CPU pinned to only one CPU (/NODE 0 & /AFFINITY FFFF) [24 / 2]
    Code:
    encoded 195503 frames, 14.06 fps, 18445.64 kb/s
    [info]: frame I:9274  Avg QP:10.37  size:185136
    [info]: frame P:70206 Avg QP:14.49  size:117965
    [info]: frame B:116023 Avg QP:15.16  size: 75866
    E5-2670 dual CPU 27 / 2 forced via command line (no node or affinity restrictions)
    Code:
    encoded 195503 frames, 20.31 fps, 18445.60 kb/s
    [info]: frame I:9274  Avg QP:10.38  size:185082
    [info]: frame P:70206 Avg QP:14.49  size:117979
    [info]: frame B:116023 Avg QP:15.16  size: 75861
    E5-2670 dual CPU no restrictions [48 / 4]
    Code:
    encoded 195503 frames, 27.42 fps, 18445.61 kb/s
    [info]: frame I:9274  Avg QP:10.39  size:184313
    [info]: frame P:70206 Avg QP:14.49  size:118064
    [info]: frame B:116023 Avg QP:15.16  size: 75871

    As you can see, letting it generate as many threads as it wants based on the 32 logical cores doesn't quite double the speed of pinning it to only 1 CPU, but it's pretty close, meaning the NUMA architecture doesn't really cause x264 much trouble (at least with these encode settings).
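    To put numbers on "pretty close": the sketch below just divides the fps figures reported above by the single-node run. Nothing is re-measured; the labels are shorthand for the four runs. By this arithmetic the unrestricted run is about 1.95x the single-node run, and forcing 27 threads gives about 1.44x.

```python
# Relative speeds computed from the fps figures reported in this post.
# The numbers are copied from the results above; no encode is re-run here.

results_fps = {
    "i7-4770K, 12/1": 10.39,
    "E5-2670 x2, one node, 24/2": 14.06,
    "E5-2670 x2, 27/2 forced": 20.31,
    "E5-2670 x2, unrestricted 48/4": 27.42,
}


def relative_speed(fps: dict[str, float], baseline_key: str) -> dict[str, float]:
    """Divide every result by the chosen baseline's fps."""
    base = fps[baseline_key]
    return {name: value / base for name, value in fps.items()}


if __name__ == "__main__":
    ratios = relative_speed(results_fps, "E5-2670 x2, one node, 24/2")
    for name, ratio in ratios.items():
        print(f"{name:32s} {results_fps[name]:6.2f} fps  {ratio:4.2f}x vs one node")
```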
     
    #17
  18. smithse79

    smithse79 Active Member

    What did you use for the encode, FFMPEG? Was this Linux or Windows? Details man, we need details!!!
     
    #18
  19. Stereodude

    Stereodude Active Member

    Windows 10 Pro. I called x264 directly from the command line; the switches I used are in post #16. I fed the 64-bit version of x264 build 2525 from 64-bit AVIsynth 2.5.8 x64. The source was decoded in hardware on the Nvidia graphics card (GeForce GT 720) using DGDecodeNV (an AVIsynth filter).
     
    #19