Hi guys,
First of all, I’m new to this forum and to server/HPC environments. I would be grateful for any suggestions and comments on my dilemmas about a project I am enthusiastic about and hope will become interesting to clients. So far it is funded entirely from my own budget and run as a ‘hobby’. Also, sorry for the length of the post.
The problem I am dealing with is processing raw files of about 2-4 GB each, in projects that initially were up to 20-80 GB in size. After initial setup, processing a project of this size takes about 10-20 h on an i7-4790K computer (4 cores + HT) with an SSD. Processing runs through approx. 50 subroutines, of which 25 are highly parallel (all 8 logical cores run at 100%). Only about 10 subroutines are strictly not parallel. In terms of time, the parallel subroutines consume about 60-80% of total project time. Each core needs at most 2-2.5 GB of RAM for calculation. The environment is Windows.
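To get a feeling for what more cores could buy, here is a rough Amdahl-style estimate based on the numbers above. It is only a sketch: it assumes the parallel subroutines scale perfectly with physical core count, and it ignores disk I/O. The function name, the 40-core figure (4 x 10-core CPUs) and the `per_core_speed` parameter are my own assumptions for illustration.

```python
def projected_speedup(parallel_fraction, old_cores, new_cores, per_core_speed=1.0):
    """Estimate speedup relative to the current machine.

    parallel_fraction: share of today's wall time spent in parallel subroutines
    old_cores / new_cores: physical cores now and on the candidate server
    per_core_speed: speed of a new core relative to an old one
                    (1.0 = identical; this is an assumption, not a measurement)
    """
    # Serial part does not shrink with more cores, only with faster cores.
    serial_time = (1.0 - parallel_fraction) / per_core_speed
    # Parallel part shrinks with the core ratio (ideal scaling assumed).
    parallel_time = parallel_fraction * (old_cores / new_cores) / per_core_speed
    return 1.0 / (serial_time + parallel_time)


# 60% and 80% parallel time, going from 4 cores to a hypothetical 40 cores
for fraction in (0.6, 0.8):
    print(fraction, round(projected_speedup(fraction, 4, 40), 2))
```

With equal per-core speed this gives roughly 2.2x to 3.6x, so the non-parallel subroutines limit the gain; a `per_core_speed` below 1.0 (older server cores at lower clocks) lowers it further.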
During processing, the initial files are joined and split into smaller files, which are then used for statistical analysis. Each raw file is split into about 400-800 files (depending on project setup and initial file size). Replacing HDDs with SSDs gave a dramatic time reduction. Upon project completion, the data is 2-2.5x larger than its initial size. Once processed, the whole project can be moved to a ‘standard’ PC for statistical analysis and visualization.
My idea for speeding things up is to purchase a used 4U server (such as an HP DL580, the cheapest one, or some Dell/IBM) with E7-4870 processors and 256 GB RAM. A new, fast NVMe SSD would be perfect, EXCEPT that (as far as I know) they can’t fit into this generation of 4U servers. In addition, new projects could easily be 1 TB in size (meaning at least 2 TB of disk, which is too expensive if PCIe SSDs are considered). Therefore, I think that RAID 0 with standard 500 GB SSDs would be an excellent replacement (each project is completely removed after analysis, so potential errors would be noticed immediately). Going to a newer series of processors would be too expensive. Using processors cheaper than the E7-4870 would be counterproductive, because both speed and number of cores are important.
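A quick sanity check of the sizing, as a sketch only. It assumes one worker per logical thread, 80 logical threads (4 x E7-4870 with 10 cores / 20 threads each), and that the finished project is 2-2.5x its initial size with no extra headroom for the filesystem. The helper names are made up for this example.

```python
import math


def raid0_drives_needed(project_gb, growth_factor, drive_gb=500):
    """Number of equal-size drives a RAID 0 set needs to hold the finished project."""
    final_gb = project_gb * growth_factor
    return math.ceil(final_gb / drive_gb)


def ram_needed_gb(workers, gb_per_worker=2.5):
    """Worst-case RAM if every worker hits its maximum at the same time."""
    return workers * gb_per_worker


# 1 TB project growing 2x to 2.5x, on 500 GB SSDs
print(raid0_drives_needed(1000, 2.0), raid0_drives_needed(1000, 2.5))

# 80 workers (assumed: one per logical thread) at 2.5 GB each, versus 256 GB installed
print(ram_needed_gb(80), ram_needed_gb(80) <= 256)
```

Under these assumptions a 1 TB project needs 4 to 5 drives of 500 GB, and 256 GB RAM covers the worst case of 200 GB.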
One more thing I would like to plan for is the possibility of buying an identical 4U server in the future and working on a project with ‘double’ the processing power. I’ve read that IBM can do this with its InfiniBand, but since HP is much cheaper, is it possible to use the DL580 in the same combination? Does that mean that 2 servers could work on 1 dataset even without RAID 0 on both units (which would be perfect)?
Noise and power consumption are not a problem at this stage; I have a spare room for the server and would connect to it through remote desktop.
How do I (wisely) start with this project? The budget could be up to 2500 EUR at the start. I was even thinking about Xeon Phi, but the software is not written for it and would not benefit from it.
Thx for reading and commenting!