VTune counters do not add up to 100% when they should


111alan

Active Member
Mar 11, 2019
291
109
43
Haerbing Institution of Technology
I was using the Microarchitecture Exploration mode of VTune 2020. The problem is mainly in the "Port Utilization" tab, where I thought the four "Cycles of n Ports Utilized" metrics should add up to 100%, but their sum is lower than that.

I ran this test on a variety of applications, but none of them matched that expectation. Am I interpreting these metrics wrong?

Thx.

The description of these metrics: "This metric represents cycles fraction where the CPU executed total of [0,1,2,3-or-more] uop per cycle on all execution ports."
vtune.JPG
 

Nabladel

Member
Jan 27, 2017
38
5
8
37
Does VTune say which performance counter(s) are read by the "Cycles of X Ports Utilized" metrics?
 

Nabladel

Member
Jan 27, 2017
38
5
8
37
Hmm, the UOPS_EXECUTED.CORE_CYCLES_GE_XXX events seem to do what you think they do. The percentages do add up to pretty close to 100% in your second post (~99.7%). The 5% difference in your first post is probably due to noise? VTune is taking samples, so some noise is probably expected.

I think this is probably how those percentages were calculated:
TotalCount (Clockticks) = UOPS_EXECUTED.CORE_CYCLES_GE_NONE + UOPS_EXECUTED.CORE_CYCLES_GE_1 = 21062
Cycles of 0 ports utilized = UOPS_EXECUTED.CORE_CYCLES_GE_NONE / TotalCount = 11.18%
Cycles of 1 port utilized = (UOPS_EXECUTED.CORE_CYCLES_GE_1 - UOPS_EXECUTED.CORE_CYCLES_GE_2) / TotalCount = 15.13%
Cycles of 2 ports utilized = (UOPS_EXECUTED.CORE_CYCLES_GE_2 - UOPS_EXECUTED.CORE_CYCLES_GE_3) / TotalCount = 21.96%
Cycles of 3+ ports utilized = UOPS_EXECUTED.CORE_CYCLES_GE_3 / TotalCount = 51.71%
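
To make the arithmetic concrete, here is a minimal Python sketch of the breakdown guessed at above. It is not VTune's actual implementation; the function name, parameter names, and counter values are all made up for illustration. The point is that the ">= N" counts telescope, so the four fractions sum to exactly 100% only when all four counters cover the same set of cycles. If the counters are estimated from different samples, the sum can drift away from 100%.

```python
def port_utilization(cycles_none, cycles_ge_1, cycles_ge_2, cycles_ge_3):
    """Split cycles into 0 / 1 / 2 / 3+ buckets from cumulative '>= N' counts.

    Hypothetical helper: the arguments stand in for the
    UOPS_EXECUTED.CORE_CYCLES_* counts named in the post.
    """
    # Every cycle either executed no uops or executed at least one.
    total = cycles_none + cycles_ge_1
    return {
        "0": cycles_none / total,
        # Cycles with exactly one uop = (>= 1) minus (>= 2), and so on.
        "1": (cycles_ge_1 - cycles_ge_2) / total,
        "2": (cycles_ge_2 - cycles_ge_3) / total,
        "3+": cycles_ge_3 / total,
    }


# Made-up counts for illustration only (not taken from the screenshots).
fractions = port_utilization(
    cycles_none=2000, cycles_ge_1=18000, cycles_ge_2=15000, cycles_ge_3=10000
)

for bucket, share in fractions.items():
    print(f"Cycles of {bucket} ports utilized: {share:.2%}")
print(f"Sum: {sum(fractions.values()):.2%}")
```

With consistent inputs like these the sum is 100% by construction, so a visible shortfall in the report would point at the inputs (sampling noise, counters collected over different intervals) rather than at the formula.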
 

111alan

Active Member
Mar 11, 2019
291
109
43
Haerbing Institution of Technology
Thanks, I never thought of it this way. I just confirmed this by checking the description of the counters.

Still trying to understand why "front-end bandwidth" can take more pipeline slots than the entire "front-end bound"?
vtune4.JPG