Search results

  1. MDADM RAID6 - unstable behavior

    Some very interesting updates: an optimization for mdadm (md/raid5: avoid device_lock in read_one_chunk() · torvalds/linux@97ae272) was merged in kernel 5.14 that bumps the random read performance on my test machine from 1.2 million 4K random read IOPS to more than 7.2 million, and now... (a fio sketch for this kind of test follows these results)
  2. AMD Milan 7763 stuck at 400 MHz during heavy IO load

    I have a Gigabyte server with 2 x AMD Milan 7763 and 22 x Micron 9300 Pro 15.36TB directly attached, Ubuntu 20.04, Linux kernel 5.4, and I have been puzzled for some time by this strange behavior: under pure CPU load it goes up to 3.25 GHz on all cores, which is great. However, whenever I throw a large IO...
  3. MDADM RAID6 - unstable behavior

    @i386 Thanks for the link; I was aware of it, which is the reason for planning the Intel Optane SSDs. I was wondering from another perspective, as on many forums I rarely see people even mentioning write caches. Is it common practice to just not have any kind of cache and just take the hit of a full...
  4. MDADM RAID6 - unstable behavior

    @i386 Very curious how someone squeezes 200+GB out of mdadm also... Another topic: how is the write hole problem of RAID6 usually worked around? I was planning on 2 x Intel P5800X in RAID1 for it (a write-journal sketch follows these results)
  5. MDADM RAID6 - unstable behavior

    @lihp What I am doing is a delicate balance between maximum possible storage per server, performance and reliability, all based on years of experience observing SSDs in production. What I have observed is that, compared to HDDs, SSDs do tend to have bad sectors more often, therefore the...
  6. MDADM RAID6 - unstable behavior

    @Stephan From my understanding, since those are enterprise SSDs with strong error correction codes, bit flips are easily detectable and correctable by the SSD itself. If there is an uncorrectable read error, the SSD should just return a read error to the upper level, which would inform mdadm / zfs /...
  7. MDADM RAID6 - unstable behavior

    I think what I am seeing is not the effect of that patch but poor concurrency. For example, I expected to get the best results with a large group_thread_cnt, so I set it to 24, but then the performance went down by a factor of 5 during check. Setting it to 4 led to way better performance. Each SSD...
  8. MDADM RAID6 - unstable behavior

    speed_limit_max = 3000000, speed_limit_min = 200000. My issue was that writes would not pass through while a check is running. I have just changed speed_limit_min to 10000 and now I finally see writes passing. Setting group_thread_cnt to 6 made a huge difference in write consistency during check... (a sketch of these tuning knobs follows these results)
  9. MDADM RAID6 - unstable behavior

    I am in the process of testing the worst-case scenario, and all applications are now running on software RAID. Previously I had hardware RAID and it just worked, but with NVMe it's harder to build. Output of the commands is below. From what I notice, it's scrubbing the data in the foreground, freezing all writes...
  10. MDADM RAID6 - unstable behavior

    Nothing useful in dmesg. No cronjobs for scrubbing that I am aware of; the OS is freshly installed. Leaving aside the invitation for disaster, is there any known issue in general with RAID6 aside from needing a write cache? Did an early test with ZFS (zfs-0.8.3-1ubuntu12.9). It ate 90 cores (I have...
  11. MDADM RAID6 - unstable behavior

    Hello, I have a big fat server with 22 x 15.36TB Micron 9300 Pro SSDs connected directly to the motherboard via PCIe lanes. I have set up a RAID6 configuration without any write caching, getting an array of about 307.2 TB. On top of the array I have a big fat MySQL instance and what I have... (a creation sketch for this array follows these results)
  12. Tyan Server failed IPMI update

    Hello, I have a Tyan server (B7109F77DV14HR-2T-N) and I recently tried to update the IPMI interface. The update procedure was interrupted somewhere in the middle, and now I am no longer able to update the interface (from version 1 to version 6). I always get failures during the update, which now...
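
For the 4K random-read IOPS figures in result 1, a minimal fio invocation for that kind of test might look like the sketch below; the target device and job parameters are illustrative assumptions, not the poster's exact command.

    # Hypothetical 4K random-read test against the md array
    fio --name=randread4k --filename=/dev/md0 --direct=1 \
        --rw=randread --bs=4k --ioengine=io_uring \
        --iodepth=64 --numjobs=16 --group_reporting \
        --runtime=60 --time_based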
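
For the write-hole question in result 4, one way the planned 2 x P5800X could be used (an assumption, not confirmed by the snippet) is as a mirrored md write journal attached when the RAID6 is created; device names here are hypothetical.

    # Mirror the two Optane drives (hypothetical device names),
    # then attach the mirror as the RAID6 write journal
    mdadm --create /dev/md/journal --level=1 --raid-devices=2 \
          /dev/nvme22n1 /dev/nvme23n1
    mdadm --create /dev/md0 --level=6 --raid-devices=22 \
          --write-journal /dev/md/journal \
          /dev/nvme{0..21}n1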
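
Results 7 and 8 refer to the md background-check tuning knobs; a minimal sketch of where they live is below. The array name /dev/md0 is an assumption, and the values are the ones quoted in the snippets.

    # System-wide resync/check rate limits (KiB/s)
    sysctl dev.raid.speed_limit_min=10000
    sysctl dev.raid.speed_limit_max=3000000

    # Per-array stripe-handling threads (md0 assumed)
    echo 6 > /sys/block/md0/md/group_thread_cnt

    # Start a consistency check and watch its progress
    echo check > /sys/block/md0/md/sync_action
    cat /proc/mdstat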
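
The array in result 11 works out arithmetically: RAID6 keeps two devices' worth of parity, so 20 x 15.36 TB of the 22 devices gives the quoted ~307.2 TB usable. A rough creation sketch, with assumed device and array names, is below.

    # Hypothetical creation of the 22-device RAID6 with no write journal/cache
    # Usable capacity: (22 - 2) * 15.36 TB = 307.2 TB
    mdadm --create /dev/md0 --level=6 --raid-devices=22 /dev/nvme{0..21}n1

    cat /proc/mdstat          # watch the initial resync
    mdadm --detail /dev/md0   # confirm level, size and member devices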