workstation for huge graphics files

Discussion in 'DIY Server and Workstation Builds' started by aag, Oct 25, 2018.

  1. aag

    aag Member

    Joined:
    Jun 4, 2016
    Messages:
    46
    Likes Received:
    2
    I need to process 3D stacks of 0.5-1 TB each, derived from light-sheet microscopy of brains (http://mesospim.org). The loading times on our current hardware are excruciatingly slow, and this is limiting our research throughput.

    I therefore need to assemble the fastest hardware on earth for this purpose. Note that I do not need to draw a lot of geometry; this is not a virtual reality or gaming project. The bottleneck is pulling the files out of storage and displaying them - think of a huge stack of ultra-high-resolution TIFF files that need to be displayed in rapid succession.

    I would be grateful for advice on which kind of hardware would represent the current state-of-the-art for this purpose. My budget is ca. $ 15'000 but could run up to $ 20'000 if justified.
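
    For scale, here is my own back-of-envelope (assuming a 1 TB stack and ignoring decompression/CPU overhead): the load time is purely a function of sustained read bandwidth, so the target bandwidth follows directly from the wait we can tolerate.

    ```python
    # Required sustained read bandwidth to load a 1 TB stack in a given time
    # (back-of-envelope only; ignores decompression and CPU overhead).
    STACK_TB = 1.0

    def required_gb_s(load_minutes: float) -> float:
        """Sustained read bandwidth (GB/s) needed to load the stack in `load_minutes`."""
        return STACK_TB * 1000 / (load_minutes * 60)

    for minutes in (30, 10, 5, 2):
        print(f"load in {minutes:>2} min -> ~{required_gb_s(minutes):.1f} GB/s sustained")
    ```

    So even a 5-minute load of a full stack already needs roughly 3 GB/s sustained, which is beyond any single HDD or SATA SSD.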
     
    #1
  2. dandanio

    dandanio Member

    Joined:
    Oct 10, 2017
    Messages:
    35
    Likes Received:
    9
    Would you be able to provide more details? Where do you store the files now? How much data do you store? How fast do you need to access the data? How many concurrent streams at once? Once you ingest data, do you also save it later? How quickly will your storage needs grow?
    And just to give you an idea:
    832 TB – ZFS on Linux – Project “Cheap and Deep”: Part 1
     
    #2
    aag likes this.
  3. Blinky 42

    Blinky 42 Active Member

    Joined:
    Aug 6, 2015
    Messages:
    471
    Likes Received:
    156
    Also, the basic config of your current system and how long it takes to load the data would give a reference point for comparison.

    Have you tried various optimizations/improvements of your setup that helped? What bottlenecks have you identified in the current workflow/hardware/process (just I/O bandwidth?)? What platform are you on (Windows / Linux / other)?

    Are you dealing with one stack at a time, with each stack ~1 TB in size, or with multiple stacks, doing comparison/analysis between them? Is the entire 1 TB in the working set (like a stack of MRI images where you scroll up and down through the layers to see things, and hence need the whole set in memory all the time)?

    Are the images captured directly on the workstation, or do you need to pull them from tape/network/etc ? Is that done off-hours so it is sitting ready to go when you get to the workstation at the start of the day, or are you also waiting to get data into the system?

    What you have tried already and how well it helped is handy because if you have a few TB of NVMe already and the workstation is still "slow" then you may have a significant task ahead of you :)

    - Also, possible platforms: are you restricted to generic x86, or can you go with POWER etc.?
     
    #3
    aag likes this.
  4. aag

    aag Member

    Joined:
    Jun 4, 2016
    Messages:
    46
    Likes Received:
    2
    Thank you. Here are some specs of the current system:
    - Windows system
    - 30 TB within the PC (HDD)
    - 30 TB on a Thunderbolt-connected external RAID array
    - up to 500 GB needs to be manipulated in memory
    - capture is done on the same workstation, but not simultaneously with analysis
    - the major bottleneck is the time it takes to pull the files off the HDDs

    We haven't even started any optimization. It is clear that moving to SSD will help. I am thinking that acquisition and first manipulations/corrections could occur on SSD, then a batch job could transfer current files to the RAID on a daily basis. Hence we may not need huge storage on expensive SSD.

    Is it advantageous to use a PCIe NVMe SSD (like the Samsung 6 TB $6K card) over a SATA SSD?
    Which CPU would you recommend? I guess that we will need something capable of handling 512 GB RAM.
     
    #4
  5. pgh5278

    pgh5278 Active Member

    Joined:
    Oct 25, 2012
    Messages:
    473
    Likes Received:
    124
    Sounds like very interesting work, though I'm not sure you need the fastest hardware on earth for this. Have you worked out how much of the storage is general storage, and how much fast storage you require for the system to work effectively - enough to hold the upcoming jobs plus the jobs currently being displayed? How many terabytes is that, or is it 500 GB per display session? And what load time do you consider acceptable? With this information you can work backwards to quantify the equipment required (drive types and quantities, memory, CPU, cards, etc.) and optimize your spend. It will certainly help you get useful quotes; this type of machine is not uncommon in various organizations and universities. Please keep us posted on your adventure.
     
    #5
    aag likes this.
  6. aag

    aag Member

    Joined:
    Jun 4, 2016
    Messages:
    46
    Likes Received:
    2
    Thank you. I will get back to you on this.
     
    #6
  7. gigatexal

    gigatexal I'm here to learn

    Joined:
    Nov 25, 2012
    Messages:
    2,481
    Likes Received:
    440
    @Venturi has been doing this for imaging for years now.
     
    #7
  8. aag

    aag Member

    Joined:
    Jun 4, 2016
    Messages:
    46
    Likes Received:
    2
    At this moment, my two specific questions are:
    • Is it advantageous to use a PCIe SSD (like the Samsung 6 TB $6K card) with NVMe instead of a SATA SSD? The price differential is huge, but it may be justified.
    • Which CPU would you recommend for the stated purposes? I guess that we will need something capable of handling 512 GB RAM.
    I would be enormously grateful for opinions on these two issues!
     
    #8
    Last edited: Oct 26, 2018
  9. TLN

    TLN Active Member

    Joined:
    Feb 26, 2016
    Messages:
    304
    Likes Received:
    32
    1. Yes. SATA limits are well below NVMe speeds, and you need a big and fast SSD. No need to go 6 TB IMHO; you can start with a smaller drive instead. Intel Optane seems to be the fastest; the 905P model comes in 1.5 TB. I'd also suggest getting a spare SSD for cache/system.
    2. I'd recommend anything recent: EPYC, Xeon Scalable. You could also look into top gaming CPUs: lower core count, but high frequency.

    Extra ideas:
    -I'd go with network storage using 10gbps network.
    -I'd seriously consider overclocking. Compare Cinebench R15 scores, some i9 CPU might easily beat Xeons in performance.
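
    To put rough numbers on point 1 (rule-of-thumb interface ceilings, not benchmarks of any specific drive; the Optane figure is its rated sequential read):

    ```python
    # Approximate sustained-read ceilings per interface/drive class.
    # Rule-of-thumb figures; actual drives and workloads vary.
    ceilings_mb_s = {
        "SATA III (6 Gbps)": 550,
        "PCIe 3.0 x4 NVMe": 3500,
        "Optane 905P (rated seq. read)": 2600,
    }

    for name, mb_s in ceilings_mb_s.items():
        min_per_tb = 1e6 / mb_s / 60  # minutes to read 1 TB at that rate
        print(f"{name}: ~{mb_s} MB/s -> ~{min_per_tb:.0f} min to read 1 TB")
    ```

    For purely sequential loads of huge stacks, the SATA ceiling (~30 min/TB) is the thing to escape; Optane's advantage is latency/random I/O, not sequential bandwidth.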
     
    #9
    aag likes this.
  10. mstone

    mstone Active Member

    Joined:
    Mar 11, 2015
    Messages:
    473
    Likes Received:
    109
    This seems to require sequential rather than random IO. That means that Optane, enterprise NVMe, etc. are overkill and probably won't add much if any performance over lower-cost NVMe solutions (a Samsung 960 Pro would probably suffice). The hardest part will probably be cramming enough storage into a case if you're trying to upgrade an existing machine. If you can spec hardware with a large number of NVMe U.2 drive bays, this is an easy problem to solve.
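
    A generic way to sanity-check the sequential-IO assumption (my sketch, not from this thread) is to time a large blocked read. Point PATH at one of the real TIFF stacks; the self-contained dummy file below will mostly measure the page cache, so treat its number as an upper bound.

    ```python
    # Minimal sequential-read throughput check (generic sketch).
    # Replace PATH with a real stack file for a meaningful measurement;
    # the dummy file here only makes the script self-contained.
    import os
    import tempfile
    import time

    PATH = os.path.join(tempfile.gettempdir(), "dummy_stack.bin")
    with open(PATH, "wb") as f:
        f.write(os.urandom(64 * 1024 * 1024))  # 64 MiB stand-in for a real stack

    CHUNK = 8 * 1024 * 1024  # read in large blocks, as a viewer streaming slices would
    start = time.perf_counter()
    total = 0
    with open(PATH, "rb") as f:
        while chunk := f.read(CHUNK):
            total += len(chunk)
    elapsed = time.perf_counter() - start

    print(f"read {total / 1e6:.0f} MB at {total / elapsed / 1e6:.0f} MB/s")
    os.remove(PATH)
    ```

    If a drive rated at 3 GB/s only delivers a fraction of that here on real data, the bottleneck is elsewhere (filesystem, controller, decompression) and faster drives won't help.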
     
    #10
    aag likes this.
  11. aag

    aag Member

    Joined:
    Jun 4, 2016
    Messages:
    46
    Likes Received:
    2
    Thanks to everybody for the input. It seems that we may be able to sink up to $50k into this endeavor. This enables us to completely redesign the whole architecture. What I am currently considering is:
    1. Win10 client PC for image acquisition from light sheet microscope #1 (ca. 500 GB/specimen). 1 TB PCIe NVMe + 4 TB SATA SSD local storage. 256 GB RAM.
    2. Win10 client PC for image acquisition from light sheet microscope #2 (ca. 500 GB/specimen). 1 TB PCIe NVMe + 4 TB NVMe SSD local storage. 256 GB RAM.
    3. Win2016 server for analysis. 4 TB local NVMe SSD storage for data + 4 TB local NVMe SSD storage for software.
    4. External RAID array (30 TB) for offloading stacks to intermediate storage, attached to the server via Thunderbolt.
    5. Ultra-fast tape system for archive/backup, probably Hewlett Packard (?)
    6. Ultra-fast network switch, 10 Gb/s or maybe 40 Gb/s - InfiniBand or Ethernet? Copper or fiber? All three workstations are <10 m from each other.
    7. 10 or 40 Gb network cards (Mellanox?) for each workstation.
    Any advice, comments and critique is highly welcome!
     
    #11
  12. cesmith9999

    cesmith9999 Well-Known Member

    Joined:
    Mar 26, 2013
    Messages:
    1,028
    Likes Received:
    313
    I would look at a SAS HBA/RAID card and a JBOD for this. It would be faster and more reliable, and should be about the same cost.

    Chris
     
    #12
    aag likes this.
  13. aag

    aag Member

    Joined:
    Jun 4, 2016
    Messages:
    46
    Likes Received:
    2
    This is to give you an idea of the stuff that we are doing. It's about visualizing vessels and Alzheimer's plaques in brains. The movie has been dramatically downsized; the original resolution is 3 micron isotropic, which scales up to half a terabyte when recorded in three color channels.
     
    #13
  14. TLN

    TLN Active Member

    Joined:
    Feb 26, 2016
    Messages:
    304
    Likes Received:
    32
    Double-check your workflow before you drop $50k on equipment. Make sure it's exactly what you're looking for. Contact the guys who made that microscope to check what workstations they're working on. I highly doubt they're all using 256 GB RAM stations.

    I don't think scanning an image requires 256 GB RAM. Displaying that image and navigating it - maybe. I'd rather contact the manufacturer about specs.
     
    #14
  15. aag

    aag Member

    Joined:
    Jun 4, 2016
    Messages:
    46
    Likes Received:
    2
    The microscope is self-made; it is a development forked off OpenSPIM. But you are correct, scanning does not need much RAM.

    Once acquired, the 3D stacks need to be inspected for integrity and quality. This is currently the biggest bottleneck, as the loading of a stack into RAM takes up to 10 min. I'd rather do this on the acquisition machine than on the analysis server, because it takes a long time to move the files.
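
    Putting numbers to this (my own estimate, assuming a full 500 GB stack and the 10-minute load quoted above): the current setup is delivering well under 1 GB/s, so a single modern NVMe drive should already cut the QC wait by roughly 3-4x.

    ```python
    # Implied throughput of the current ~10-minute load, vs. one NVMe drive.
    # Assumes a 500 GB stack; adjust stack_gb to the real size.
    stack_gb = 500
    current_s = 10 * 60
    current_rate = stack_gb / current_s  # effective GB/s today
    nvme_rate = 3.0                      # ~sustained read of a PCIe 3.0 x4 NVMe drive

    print(f"current effective rate: {current_rate:.2f} GB/s")
    print(f"same stack from NVMe:  ~{stack_gb / nvme_rate / 60:.1f} min")
    ```

    Doing this QC pass on the acquisition machine, as described, only requires putting the fast drive there rather than on the server.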
     
    #15
  16. TLN

    TLN Active Member

    Joined:
    Feb 26, 2016
    Messages:
    304
    Likes Received:
    32
    Network = 10 Gbps
    Storage = all SSD
    and this problem is gone.
    There's no reason to process files on the acquisition machine (I guess that's the one connected to the microscope), so that machine can be "average": a fast SSD, a 10 Gbps NIC, a normal CPU, and let's say 16 GB of RAM.

    One server to process and store: fast NVMe drives, fast CPU(s), lots of memory. If your bottleneck is loading files into RAM, it will be solved with NVMe (read speed is ~25x your average hard drive).

    And backup storage with hard drives (again on the 10 Gbps connection). A bunch of hard drives will be good enough to load files onto that server for analysis.

    Pretty much, you only need one powerful PC to analyze data; the two operator PCs can be "normal" PCs with an SSD and 10 Gbps network.


    PS. I'll throw in a crazy idea: google the "7 gamers, 1 PC" video. Basically, if you have multiple video cards you can run multiple VMs as dedicated PCs. You could build one server like that and make it work for all four roles above (2x user PC, 1x analysis computer, 1x storage). You don't need to buy a 10 Gbps network (the internal ESXi switch is already 10 Gbps), and you can use all available CPU power where needed. You cannot be very flexible with memory, unless you go with thin clients, which may be a good idea as well.
    ^I'm no VMware expert, so take it with a grain of salt.
     
    #16
    aag likes this.
  17. mstone

    mstone Active Member

    Joined:
    Mar 11, 2015
    Messages:
    473
    Likes Received:
    109
    Not at all: 10 Gbps networking will be a significant bottleneck.
     
    #17
  18. aag

    aag Member

    Joined:
    Jun 4, 2016
    Messages:
    46
    Likes Received:
    2
    Dear mstone,
    Thank you for your viewpoint. Could you elaborate? I am now considering deploying a small optical fiber network, so that we could upgrade to faster network speeds in the future, should that become important. Here is a diagram of what I am currently envisaging (very preliminary, not yet discussed with the IT people):


    Screenshot 2018-11-03 08.53.39.png
     
    #18
  19. Rand__

    Rand__ Well-Known Member

    Joined:
    Mar 6, 2014
    Messages:
    2,599
    Likes Received:
    343
    10 Gb(it) will be a bottleneck since you are talking about all-NVMe storage and streamed writes/reads. Ideally you will be able to read and write files at 2 GB(yte)/s, so you should consider at least 25 GbE.
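
    To illustrate the bottleneck with rule-of-thumb numbers (mine, assuming ~90% protocol efficiency and a 500 GB stack):

    ```python
    # Approximate usable payload rate per Ethernet link speed, and the
    # resulting transfer time for one 500 GB stack. Rule-of-thumb only.
    def wire_rate_gbytes(gbit_s: float, efficiency: float = 0.9) -> float:
        """Usable GB/s on a link, assuming ~90% protocol efficiency."""
        return gbit_s / 8 * efficiency

    stack_gb = 500
    for link in (10, 25, 40, 100):
        rate = wire_rate_gbytes(link)
        print(f"{link:>3} GbE: ~{rate:.2f} GB/s -> ~{stack_gb / rate / 60:.1f} min per 500 GB stack")
    ```

    At 10 GbE the link caps out near 1.1 GB/s, i.e. slower than a single decent NVMe drive, which is why 25 GbE is the sensible floor for NVMe-to-NVMe copies.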

    Can you elaborate on the workflow again?

    Image is scanned at Win10 #x to a local drive. It's then reviewed immediately or after a batch has been scanned?
    After visual inspection it's moved/copied to the server, and a local backup is created on SATA drives.
    After the copy and backup are done, the next image is scanned?

    Or is the scanning done all the time and the inspection/copying/backing up is done in parallel?
    Why a local backup?

    Why use a dedicated Thunderbolt storage box, btw? You will not be able to saturate 40 Gb Thunderbolt with 4-6 disks in the first place. Is this 'warm' storage (i.e. tertiary, after client SATA + server primary NVMe)?

    I think you need to re-evaluate the primary (scanning, initial review, review) and secondary requirements (backups, categorization/tagging) again without referencing the current setup. If you have the cash/opportunity to design a whole new workflow, then do so without letting the old one impact your design (unless it's a core business requirement o/c).

    Just to provide examples of why I say this:
    You could change the backup of good images from the server to the client SSDs (thus offloading/distributing load) if you don't perform actual work/tagging/classification on the final image.
    You could parallelize scanning, initial review, and sending to the server (saving a lot of time) if the workflow allows it from a work (user) and speed (drives) point of view. Just because it is not possible now does not mean it has to be that way in the new solution.


    Tape is another topic - why a high-speed (ultra-expensive) tape system? Backing up to tape can run 24/7 if you design the process appropriately. A dedicated tape box with local storage, for example, would let you simply send data to the tape server, which then always holds either a local copy or a copy on tape (library), so you're always backed up.


    Also, the Visio gives 256 MB of RAM to the clients ;)
    Also, you didn't mention ECC (memory) at all - how important is integrity? This will limit your options. Using client/gaming HW would improve single-thread speed significantly (i7s are faster/cheaper than Xeons on single-core speed, but getting 256 GB of RAM for them will not work). How about old (used) HW? Probably not, but it would save a ton of money (DDR4 vs DDR3), with maybe 20% performance loss (generational improvements).
     
    #19
    Last edited: Nov 3, 2018
    aag likes this.
  20. gigatexal

    gigatexal I'm here to learn

    Joined:
    Nov 25, 2012
    Messages:
    2,481
    Likes Received:
    440
    At this high a cost and this huge a budget, might it be better to get a vendor like gamepc.com, or even Dell or someone, to spec and build it for you, so you have some kind of support if it falls over? I mean, yeah, it's fun to build this out, but I'd rather not carry the risk.
     
    #20
    aag likes this.