Potential front page article?

gigatexal · Jul 25, 2016

Radeon Pro Solid State Graphics keeps big data close to the GPU

AMD put M.2 SSDs on the PCB of the GPU. That's awesome!

Keljian · Jul 25, 2016

As someone pointed out, PCIe 3.0 x16 can transfer much faster than the fastest SSD (unless you team 4-6 of them). Gimmick?
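For anyone who wants to check the arithmetic, here is a rough sketch. The PCIe 3.0 numbers (8 GT/s per lane, 128b/130b encoding) are the published link parameters; the 2.5 GB/s SSD figure is only an assumed ballpark for a fast M.2 NVMe drive, not a measurement, and the function names are made up for illustration.

```python
# Back-of-the-envelope comparison of a PCIe 3.0 x16 link and one NVMe SSD.
# ASSUMPTION: 2.5 GB/s sequential read for a single fast M.2 drive (ballpark only).

PCIE3_GT_PER_S = 8.0      # giga-transfers per second, per lane (PCIe 3.0)
ENCODING = 128 / 130      # 128b/130b line encoding efficiency
ASSUMED_SSD_GB_S = 2.5    # hypothetical per-drive throughput


def pcie3_bandwidth_gb_s(lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, ignoring packet overhead."""
    return PCIE3_GT_PER_S * ENCODING / 8 * lanes


def ssd_equivalents(link_gb_s: float, ssd_gb_s: float) -> float:
    """How many drives' worth of throughput the link could carry."""
    return link_gb_s / ssd_gb_s


x16 = pcie3_bandwidth_gb_s(16)
x4 = pcie3_bandwidth_gb_s(4)   # a single M.2 slot is typically a x4 link

print(f"PCIe 3.0 x16: {x16:.2f} GB/s")
print(f"PCIe 3.0 x4:  {x4:.2f} GB/s")
print(f"Drives per x16 link: {ssd_equivalents(x16, ASSUMED_SSD_GB_S):.1f}")
```

Under that assumption the x16 link carries roughly six drives' worth of reads, which lines up with the "team 4-6 of them" estimate.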

gigatexal · Jul 25, 2016

I think the point is that the storage sits nearer the GPU, so pulling data into the GPU's main memory is quicker.

Keljian · Jul 25, 2016

Why not use XDMA? Seems like a better solution

RobertFontaine · Jul 26, 2016

The SSG cuts out the offloading through the CPU on the mainboard.
For many workflows, bandwidth across PCIe is the constraint on processing large models.

Deslok · Jul 26, 2016

I don't understand why they'd choose M.2 SSDs instead of DDR4 SO-DIMMs. They could have made the card with no memory and a variable frame buffer that would be much faster than the M.2 drives.

Keljian · Jul 27, 2016

Deslok said:
I don't understand why they'd choose M.2 SSDs instead of DDR4 SO-DIMMs. They could have made the card with no memory and a variable frame buffer that would be much faster than the M.2 drives.

Size. It's the only viable reason.

Patrick · Jul 29, 2016

I got a bit less excited about this once I learned the details of the card :-/

gigatexal · Jul 29, 2016

What specifically?

RobertFontaine · Jul 29, 2016

If my reading is correct...

The SSDs are essentially just NVMe drives physically mounted to a video card:

1. These are standard retail NVMe sticks mounted on the GPU.
2. The CPU in the host uses the PCIe bus on the video card to talk to the NVMe drives, pulling data off the card through the bus and then pushing it back through the PCIe bus to the GPU. The GPU processor cannot speak NVMe.
3. There is no such thing as hardware NVMe RAID, so putting 2 SSD sticks on the GPU is about as meaningful as putting 1 NVMe card on the GPU.

... TL;DR: in its current state this is no different from an M.2 NVMe adapter on the PCIe bus sitting in the slot next to the GPU.

gigatexal · Jul 29, 2016

Maybe wait for version 2 of this card then.

Or perhaps something could be written in OpenCL to do disk I/O.

RobertFontaine · Jul 30, 2016

gigatexal said:
Maybe wait for version 2 of this card then.

Or perhaps something could be written in OpenCL to do disk I/O.

At first glance it seems like there should be something there, and AMD, rather than NVIDIA, is the right company to do it.

The Xeon Phi cards that I am so fond of are a bunch of original-Pentium-class cores with a pool of shared GDDR5, some FP64-optimized instructions, and a small Unix-like OS, all stuck on a coprocessor card.

Intel has improved on this with the current version by making it a bunch of Atom-derived cores with even better FP64 instructions and turning it into a first-class, on-the-motherboard CPU running CentOS.

AMD could outboard a CPU to the GPU infrastructure (they know how to do both) and build a coprocessor card that emphasizes the power of shaders (linear algebra) for the class of problems that are better solved this way, while providing the outboard OS to manage I/O.

... enter the realm of fantasy...
NVMe still requires an onboard PCIe controller, which might limit the bandwidth (this starts to get beyond me). When we start to talk about fabrics and point-to-point hardware communication, things get interesting and require someone else to explain. AMD's current architecture has a significant advantage in async compute: they have already started putting what are essentially CPUs, the "async compute engines", in front of their pipelines. Under Vega these could become a lot closer to first-class processors. The RX 480, for example, has 36 compute units. If you steal some GDDR5 and extend the instruction set a little, borrowing from the APU chips, you now have a 36-core, realtime, microcore Linux machine. Stick them in a token ring architecture or better, with the flash directly accessible by the async compute units, and then throw vector equations at them.

Now you have a 36-core supercomputer on a coprocessor card that focuses on linear algebra and vector math specifically, with a nice bump in space for larger models that don't fit on single cards today. Is there a market for machine learning cards of this ilk? I suspect so. Maybe the architecture after Vega could provide this kind of horsepower to the hobbyist basement hacker.

.... fantasy ends here.

gigatexal · Jul 30, 2016

This could be the hint of really, really converged hardware. Imagine motherboards as nothing more than slots of PCIe 4 or something, where these daughter cards have a GPU on them with SSDs baked in and x86 or ARM coprocessors as well. I mean, if they took the PS4 idea with HBM and unified memory and somehow got the graphics shaders to be truly heterogeneous, you could have a system that reconfigures as a really beefy multi-core workstation when doing 2D work and as a really low-latency gaming machine when gaming. Maybe my subconscious just wants to watch Transformers again. But I'd hope we see something like this in the PC world. The only problem is that you're then more or less buying an entire PC in one go, as the daughter card becomes the functional unit.

Patrick · Jul 30, 2016

If you think about it, that is what KNL is: RAM and PCH/disk attached directly to the many-core compute.

RobertFontaine · Jul 30, 2016

Only KNL is a first-class CPU on the motherboard rather than a coprocessor board.
I'm kind of hoping to see KNL coprocessor boards next year to plug into one of the fancy water-cooled development workstations.

It will be a new toy to lust after. I suspect the price will keep me out but it's fun to window shop.

Patrick · Jul 30, 2016

RobertFontaine said:
Only KNL is a first-class CPU on the motherboard rather than a coprocessor board.
I'm kind of hoping to see KNL coprocessor boards next year to plug into one of the fancy water-cooled development workstations.

It will be a new toy to lust after. I suspect the price will keep me out but it's fun to window shop.

I like the watercooled KNL system, but the 4N2U systems are where I would spend my money at this point.

RobertFontaine · Jul 31, 2016

I'd like to spend your money as well.

gigatexal · Jul 31, 2016

Well maybe the STH cloud will get a KNL system and we can all play with it for a few days at a time.

RobertFontaine · Jul 31, 2016

Now isn't that an interesting idea... An adjunct to this is that I could host it here in my basement.

RobertFontaine · Jul 31, 2016

Patrick said:
If you think about it, that is what KNL is: RAM and PCH/disk attached directly to the many-core compute.

There is a fundamental difference here. A GPU is a linear algebra engine. The async compute units merely (?) provide routing.

KNL is a bunch of Atom-derived cores with AVX-512 instructions (they are full-featured CPUs).

In a perfect world you would have a bunch of each and route your algorithms appropriately. I think this is called a supercomputing centre (the correct spelling).

In a basement, a couple of Xeon Phi (Knights Corner) cards and a couple of RX 480s are the poverty solution for OpenCL, and it can play Crysis.

With a budget, a 42U rack with high-bandwidth, low-latency interconnects.
