You can make this work if your dataset consists of larger files rather than millions of small ones (for the purposes of this exercise, a small file is anything under 50MB). If it's millions of smaller files, you're going to spend all your time seeking, and that will kill your speed. All of the rest assumes you have...