Potential front page article?


Deslok

Well-Known Member
Jul 15, 2015
1,122
125
63
34
deslok.dyndns.org
I don't understand why they'd choose M.2 SSDs instead of DDR4 SO-DIMMs. They could have made the card with no onboard memory and a variable frame buffer that would be much faster than the M.2 drives.
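For a rough sense of the gap, here is a back-of-the-envelope sketch in Python. The specific parts are assumptions for illustration, not anything confirmed for this card: one DDR4-2400 SO-DIMM on a single 64-bit channel versus one M.2 drive on a PCIe 3.0 x4 link, peak theoretical numbers only.

```python
# Back-of-the-envelope peak bandwidth comparison.
# Assumptions (not from the card's spec): DDR4-2400 on one 64-bit channel,
# and an M.2 drive on a PCIe 3.0 x4 link. Real drives sustain less than
# the link peak, so this flatters the SSD side.

def ddr4_peak_gbs(mega_transfers_per_s: float, bus_bytes: int = 8) -> float:
    """Peak GB/s for one DDR4 channel: transfers/s * bytes per transfer."""
    return mega_transfers_per_s * 1e6 * bus_bytes / 1e9

def pcie3_peak_gbs(lanes: int) -> float:
    """Peak GB/s for a PCIe 3.0 link: 8 GT/s per lane, 128b/130b encoding."""
    bits_per_s_per_lane = 8e9 * 128 / 130
    return lanes * bits_per_s_per_lane / 8 / 1e9

sodimm = ddr4_peak_gbs(2400)
m2 = pcie3_peak_gbs(4)
print(f"DDR4-2400 SO-DIMM: {sodimm:.1f} GB/s")
print(f"PCIe 3.0 x4 M.2:   {m2:.2f} GB/s")
print(f"ratio:             {sodimm / m2:.1f}x")
```

Under those assumptions a single SO-DIMM channel peaks at roughly 19 GB/s against roughly 4 GB/s for the x4 link, and that is before latency is considered.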
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,513
5,804
113
I got a bit less excited about this once I learned the details of the card :-/
 

RobertFontaine

Active Member
Dec 17, 2015
663
148
43
57
Winterpeg, Canuckistan
If my reading is correct...

The ssds are essentially just nvme physically mounted to a video card:

1. These are standard retail nvme sticks mounted on the gpu.
2. The CPU on the computer talks to the nvme drives over the video card's pcie connection, pulls the data off the card through the bus, and then pushes it back through the pcie bus to the GPU. The gpu processor cannot speak nvme.
3. There is no such thing as hardware nvme raid, so the fact that they put 2 ssd sticks on the gpu is about as meaningful as putting 1 nvme stick on the gpu.

... TL;DR in its current state this is no different than an m.2 nvme adapter on the pcie bus sitting in the slot next to the gpu.
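If that reading is right, the data path looks something like the sketch below. This is a toy model in Python, not the card's actual driver: `upload_to_gpu` is a hypothetical stand-in for a real upload call, and a temporary file plays the part of data sitting on the ssd.

```python
# Sketch of the CPU-mediated path described above: the host reads from the
# nvme drive into system RAM, then pushes the same bytes over pcie to the GPU.
# `upload_to_gpu` is a hypothetical stand-in for a real driver/API upload call.
import os
import tempfile

CHUNK = 1 << 20  # 1 MiB staging buffer in host RAM

def upload_to_gpu(device_memory: bytearray, chunk: bytes) -> None:
    """Hypothetical GPU upload; here it just appends to a fake VRAM buffer."""
    device_memory.extend(chunk)

def stage_file_to_gpu(path: str, device_memory: bytearray) -> int:
    """Copy a file to the 'GPU' through host RAM; returns bus crossings."""
    crossings = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)                # hop 1: nvme -> host RAM
            if not chunk:
                break
            crossings += 1
            upload_to_gpu(device_memory, chunk)  # hop 2: host RAM -> GPU
            crossings += 1
    return crossings

# Demo with a temporary file standing in for data on the ssd.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(3 * CHUNK))
    demo_path = tmp.name

vram = bytearray()
hops = stage_file_to_gpu(demo_path, vram)
with open(demo_path, "rb") as f:
    original = f.read()
os.remove(demo_path)
print(hops, len(vram))
```

Every chunk crosses the bus twice, which is exactly what an adapter in the neighbouring slot would do as well.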
 
  • Like
Reactions: Patrick

RobertFontaine

Active Member
Dec 17, 2015
663
148
43
57
Winterpeg, Canuckistan
Maybe wait for version 2 of this card then.

Or perhaps something could be written in OpenCL to do disk io.
At first glance it seems like there should be something there and AMD rather than NVIDIA is the correct company to do it.

The xeon phi cards that I am so fond of are a bunch of pentium-class (P54C-derived) cores with a bunch of shared gddr5, some fp64-optimized instructions and a small linux implementation, all stuck on a coprocessor card.

Intel has improved on this with the current version by making them a bunch of atom-class cores with even better fp64 instructions, and by making it a first-class, on-the-motherboard cpu running centos.

AMD could outboard a cpu to the gpu infrastructure (they know how to do both) and build a coprocessor card that emphasizes the power of shaders (linear algebra) for the class of problems that are better solved this way while providing the outboard os to manage io.

... enter the realm of fantasy...
nvme still requires an onboard pcie controller, which might limit the bandwidth (this starts to get beyond me). When we start to talk about fabrics and point-to-point hardware communication, things get interesting and require someone else to explain. AMD's current architecture has a significant advantage in async compute, in that they have already started putting what are essentially cpus, the "async compute engines", in front of their pipelines. Under Vega these could become a lot closer to a first-class processor. The RX 480, for example, has 36 compute units. If you steal some gddr5 and extend the instruction set a little bit from the apu chips, you now have a 36 core, realtime microcore linux. Stick them in a token ring architecture, or better, with the flash directly accessible by the compute units, and then throw vector equations at them.

Now you have a 36 core supercomputer on a co-processor card that focuses specifically on linear algebra and vector math, with a nice bump in space for larger models that don't fit on single cards today. Is there a market for machine learning cards of this ilk? I suspect so. Maybe the architecture after Vega could provide this kind of horsepower to the hobbyist basement hacker.

.... fantasy ends here.
 
  • Like
Reactions: gigatexal

gigatexal

I'm here to learn
Nov 25, 2012
2,913
607
113
Portland, Oregon
alexandarnarayan.com
This could be a hint of really, really converged hardware. Imagine motherboards as nothing more than slots of PCIe 4 or something, where these daughter cards have a GPU on them with SSDs baked in and x86 or ARM coprocessors as well. I mean, if they took the PS4 idea of unified memory, with HBM, and somehow got the graphics shaders to be truly heterogeneous, you could have a system that reconfigures as a really beefy multi-core workstation when doing 2D work and as a really low-latency machine when gaming. Maybe my subconscious just wants to watch Transformers again. But I'd hope we see something like this in the PC world. The only problem is that you're then more or less buying an entire PC in one go, as the daughter card becomes the functional unit.
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,513
5,804
113
If you think about it, that is what KNL is. RAM and PCH/ disk directly to the many core compute.
 
  • Like
Reactions: gigatexal

RobertFontaine

Active Member
Dec 17, 2015
663
148
43
57
Winterpeg, Canuckistan
Only KNL is a first class cpu on the motherboard rather than a coprocessor board.
I'm kind of hoping to see KNL coprocessor boards next year to plug into one of the fancy watercooled development workstations.

It will be a new toy to lust after. I suspect the price will keep me out but it's fun to window shop.
 

Patrick

Administrator
Staff member
Dec 21, 2010
12,513
5,804
113
I like the watercooled KNL system, but the 4N2U systems are where I would spend my money at this point.
 

RobertFontaine

Active Member
Dec 17, 2015
663
148
43
57
Winterpeg, Canuckistan
Patrick said: "If you think about it, that is what KNL is. RAM and PCH/ disk directly to the many core compute."
There is a fundamental difference here. A gpu is a linear algebra engine. The async compute units merely (?) provide routing.

The KNL is a bunch of atom processors with AVX-512 instructions (they are a bunch of full-featured cpus).

In a perfect world you would have a bunch of each and route your algorithms appropriately. I think this is called a super computing centre (the correct spelling).

In a basement, a couple of xeon phi knights corner cards and a couple of rx 480s are the poverty solution for opencl, and it can play crysis ;)

With a budget, a 42U rack with high-bandwidth, low-latency interconnects.
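The "route your algorithms appropriately" idea can be sketched as a toy dispatcher. This is only an illustration in Python: the pool names and the task "kind" labels are made up, and a real scheduler would look at far more than a label.

```python
# Toy dispatcher: send dense linear algebra to the gpu pool and
# branchy, general-purpose work to the many-core cpu pool.
# Pool names and the "kind" labels are made up for illustration.
from dataclasses import dataclass, field

@dataclass
class Pool:
    name: str
    queue: list = field(default_factory=list)

GPU = Pool("rx480")
MANYCORE = Pool("xeon-phi")

ROUTES = {
    "matmul": GPU,
    "convolution": GPU,
    "graph-walk": MANYCORE,
    "parsing": MANYCORE,
}

def route(task_name: str, kind: str) -> str:
    """Queue a task on the pool suited to its kind; default to the cpu pool."""
    pool = ROUTES.get(kind, MANYCORE)
    pool.queue.append(task_name)
    return pool.name

for name, kind in [("layer1", "matmul"), ("tokenize", "parsing"), ("misc", "unknown")]:
    print(name, "->", route(name, kind))
```

Anything without a known kind falls back to the cpu pool here, on the assumption that full-featured cores can run arbitrary code while the shaders cannot.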
 
  • Like
Reactions: gigatexal