storage: so, everything was working fine until...

Discussion in 'DIY Server and Workstation Builds' started by thisnewaccount, May 28, 2019.

  1. thisnewaccount

    thisnewaccount New Member

    Joined:
    Oct 20, 2018
    Messages:
    24
    Likes Received:
    1
    a follow-up to my previous messages, re. buying a direct-attached jbod array (quantum dxi6500, aka: supermicro cse-826, with sas826el1 backplane) off ebay and attaching it to my "backup" server running unraid v6.x via a dell h200e ...

    after running out of drives in my backup server, i expanded externally to keep up with the need for space. everything has been running fine as new drives were added to shares as required. (*) anyway, after basically filling the 8th drive in the das chassis, i added a 9th drive... that cannot be seen by unraid. (currently: v6.7.0)

    checked the drive (it's ok), looked around in unraid if there was an overlooked setting that could limit the number of data drives at 21 for some reason (didn't see any), checked documentation for jumpers on the backplane (nothing stood out). right now i want to eliminate the "simple, obvious & dumb reason" that prevents the 9th external drive from being recognized by unraid before i look into swapping the backplane in case it is defective. i prefer to make sure nothing is staring me in the face (current gut feeling), because not all problems are "sexy".

    and frankly, i would like to avoid unracking the jbod if at all possible.

    so has anyone else seen anything like this? does anyone else have an idea what could be going on?


    thanks in advance,


    (*) btw, this storage array totally convinced me that esata is not a good idea for servers and to stick with multilane (sff-8088, etc.) cables for reliability and performance.
     
    #1
  2. Fritz

    Fritz Well-Known Member

    Joined:
    Apr 6, 2015
    Messages:
    1,877
    Likes Received:
    377
    Did you try removing one of the working drives and see if the 9th drive shows up?
     
    #2
  3. thisnewaccount

    @Fritz: No, i did not try to remove one of the working drives, but i did try putting the new one into another bay to see if the 9th one was defective.
    oddly enough, it made a few other drives disappear, as far as unraid was concerned.
    (don't remember if the new unit appeared or not.)

    putting back the new unit in its original slot brought the server back to a working configuration (no missing drives... except the new one, of course).

    i now realize i should try to get into the h200e's bios to check if it sees the new drive.
    after that, i will try removing one of the working units to see what happens.

    cheers.
     
    #3
  4. thisnewaccount

    @Fritz:
    a quick update before going to bed (can't call in sick tomorrow am!):
    (1) removed the new drive from the jbod chassis, moved one of the existing discs into the 9th bay and it shows up in unraid. so the bay itself is not defective.
    (2) put back the existing drive into its normal bay, put new drive in the 9th bay and checked the HBA's bios. all 9 drives are listed in the 'sas topology'.
    so at the lower levels, all seems to be working.
    i'll continue testing tomorrow after work.
    cheers.

    p.s.: edit to add this screenshot.
    IMG_20190529_000306.(ed).jpg
     
    #4
    Last edited: May 28, 2019
  5. thisnewaccount

    quick update:

    was not able to perform any significant testing this evening, but managed to check the unraid syslog to see if there would be any error message, anything that could yield a clue why unraid is not seeing the new drive that was added to the external jbod array.
    whilst the h200e does see the new drive (see the photo in the previous post, it's drive 0 (zero) in the 'sas topology'), there is absolutely no trace of the new 2tb disc in the syslog.
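One way to make the "no trace in the syslog" check repeatable is to pull out every disk the kernel says it attached and count them. This is only a minimal sketch: it assumes the log contains the usual Linux `sd ... [sdX] Attached SCSI disk` lines, and the function name `attached_disks` is made up for illustration.

```python
import re

# Assumed line shape: "... sd 1:0:8:0: [sdw] Attached SCSI disk"
ATTACHED = re.compile(r"sd \d+:\d+:\d+:\d+: \[(sd[a-z]+)\] Attached SCSI disk")


def attached_disks(log_text):
    """Return the de-duplicated device names reported as attached.

    Names are ordered by length first so that sdaa sorts after sdz.
    """
    names = set(ATTACHED.findall(log_text))
    return sorted(names, key=lambda name: (len(name), name))
```

Feeding it the contents of a saved syslog and taking `len()` of the result gives a number to compare against the drive count shown in the hba's 'sas topology' screen.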

    not sure where and how the disc is getting lost.
    the only certainty i have at the moment is that this install of unraid v6.7 appears to have a problem dealing with more than 22 drives (14 internal + 8 external) total.

    for reference, the main server is running 24 drives with absolutely no issue (norco 4224 chassis).

    to be continued.
     
    #5
  6. Fritz

    Nice detective story. Waiting to see what the culprit is.
     
    #6
  7. thisnewaccount

    my gut feeling is telling me the cause of my troubles is not sexy at all, that it is blindingly obvious and rather stupid.
    i'll see if i can boot that box with a live distro (something like 'gparted live') to see what it tells me (dmesg, syslog, etc.).

    come to think of it, i did make a copy of the syslog, i'll check it to see if it contains messages concerning:
    (1) the h200e, maybe the card's driver said something useful;
    (2) the jbod's backplane -- assuming 'lsilogic sasx28 a.1' is the main backplane controller (referred to as the 'primary expander chip' in the backplane documentation) and that it is somewhat visible by unraid.
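The two checks above could be done in one pass over the saved syslog. The sketch below groups lines by keyword; the keyword list is an assumption (`mpt2sas`/`mpt3sas` as likely driver prefixes for an lsi-based card like the h200e, plus the expander name from the 'sas topology' screen), so adjust it to whatever your own log actually contains.

```python
# Assumed search terms, not taken from a real log: likely driver prefixes
# and words that tend to show up in expander/enclosure messages.
KEYWORDS = ("mpt2sas", "mpt3sas", "sasx28", "enclosure", "expander")


def lines_by_keyword(log_text, keywords=KEYWORDS):
    """Group log lines under each keyword they contain (case-insensitive)."""
    hits = {keyword: [] for keyword in keywords}
    for line in log_text.splitlines():
        lowered = line.lower()
        for keyword in keywords:
            if keyword in lowered:
                hits[keyword].append(line)
    return hits
```

An empty list for every keyword would suggest the driver never logged anything about the card or the backplane, which is a finding in itself.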
     
    #7
  8. thisnewaccount

    another evening where external obligations didn't leave me time to do extensive testing.

    i did manage to get that 2nd server to boot under gparted live, which didn't see the new drive either (22 drives only).
    and to add insult to injury, could not find a way to get any log file off the box. things are going well -- not.

    rebooted with the usb thumbdrive containing unraid and ... boot media not found.
    huh, okay, turn off server, put thumbdrive in other front usb port, turn server back on... boot media not found.
    huh, not okay, reset box and go into motherboard's bios... notice it wants to boot the thumbdrive with uefi, so let's try a boot override (non-uefi)...
    and the unraid boot menu comes up. much better.

    once i got into the unraid web gui, i clicked on the drop-down menu beside the new drive slot ... and the new drive shows up?!?!?
    for the life of me, i haven't the foggiest how it could be possible and/or what had changed.

    after assigning the new drive to its slot in the main menu, i did download the file created by /tools/diagnostics but am unsure how good / useful it could be.
    am running out of time & steam tonight to elaborate (and edit this) any further, though i could upload the file tomorrow if anyone asks.

    i also have a copy of the previous syslog (from yesterday) that i could also upload, if asked.
    drive clearing is still in progress as i'm typing this, so things appear to be stable.

    cheers.
     
    #8
  9. thisnewaccount

    (1st update of 2)

    after yesterday's unexpected improvements of sorts, i booted that server again tonight to continue the process of adding the new drive (shut it down after it finished clearing the disk, was now going to format it and the rest), only to have the ~"boot media not found" error anew. after some cajoling, got unraid to boot... only to discover the new drive is again invisible to unraid.

    i am now dealing with two problems:
    (1) running gparted apparently did something to the motherboard's bios, since i cannot boot unraid like before (bootable thumbdrive not recognized).
    (2) that missing/invisible drive, as far as unraid is concerned.

    not sure why & how, but there appears to be an issue with the motherboard's bios, so i might want to look into Tom's warning about bioses ('update your bios!') even though i wasn't affected in any way until that 9th drive in the external jbod chassis.

    ... to be continued...
     
    #9
  10. thisnewaccount

    (2nd update of 2)

    curiouser and curiouser...
    after taking care of the homefront (kitchen, etc.), i sat down to check motherboard & bios information via the dashboard, only to end up seeing the new drive visible again. so within an hour or so, it's as if the drive, somehow, decided to wake up and be recognized by unraid.

    i don't believe in problems that sort of fix themselves without any human intervention.
    and last i checked, i don't have a "more magic" switch on the chassis.

    i flipped between screens/tabs just in case it was the browser acting up and displaying random incorrect stuff. even closed it and restarted it.
    nope, chrome is not going non-linear on me and the drive is still visible.
    started the array, and the drive formatted ok.
    was even able to create a new share and add the drive to it.

    all this is rather bewildering.
    i did run /tools/diagnostics and attached the resulting file to this message, in case someone else can see something in there that i'm totally missing. i'm sure the root cause of all this is staring me in the face, but i just can't see it for now.

    i will take a step back, try to go over everything i've done to see if i can remember something useful, and go from there.
    btw, updating the bios is not an option, i already have the latest & greatest.

    cheers.
     

    Attached Files:

    #10
  11. thisnewaccount

    Due to an illness in the family, was not able to beat on this situation as much as i would like.
    I did observe something, though. It means something, unsure what it is exactly.
    it does indeed look as if you wait long enough (roughly 1.5 hours), the missing drive does become visible & usable in unraid.

    not sure if this makes any sense, but one might think there is a standoff of sorts, with two or more devices waiting for each other before initializing. or that there is a timer (?) that is triggered and prevents the new drive from completing its initialization.
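To pin down the "roughly 1.5 hours" more precisely, the device list could be polled and every change logged with its elapsed time. This is a rough sketch, assuming a Linux box where disks show up as `sd*` entries under `/sys/block`; the function names and the defaults (one poll per minute, two hours total) are arbitrary choices for illustration.

```python
import os
import time


def list_disks(sys_block="/sys/block"):
    """Return the set of sd* block devices the kernel currently exposes."""
    return {name for name in os.listdir(sys_block) if name.startswith("sd")}


def watch_for_changes(lister=list_disks, interval=60, max_polls=120,
                      sleep=time.sleep, clock=time.time):
    """Poll the device list; record (elapsed_seconds, added, removed) per change."""
    start = clock()
    known = lister()
    events = []
    for _ in range(max_polls):
        sleep(interval)
        current = lister()
        if current != known:
            elapsed = round(clock() - start)
            events.append((elapsed, sorted(current - known), sorted(known - current)))
            known = current
    return events
```

Started right after boot, the first tuple in the result would show how many seconds passed before the late drive appeared, and whether any other drive dropped out at the same moment.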

    I did notice what i assume is the H200's bios taking some time to do its post (the spinner that advances, as dots appear -- sorry this is the best description i can formulate right now), more than before the new drive was added to the external storage array. or is this a red herring?
     
    #11
  12. Fritz

    Bad cable maybe?
     
    #12
  13. thisnewaccount

    update:
    having to deal with a family member who is sadly in his sunset days has kept my attention away in the last week or two, so i haven't been able to continue this thread as diligently as i should have.

    first, @Fritz: i would say that a bad cable or an improperly seated one would have resulted in flakier, more unpredictable behaviour. but since the drive does plug directly into a backplane (with a built-in sas expander, btw), that does eliminate that hypothesis.

    secondly, there have been some developments:
    apparently, according to someone else's examination of the system logs, the added disc drive that is giving me grief does indeed appear to power on much later, after the others have. i am not sure what could be causing this delayed event, i will have to re-read the backplane's documentation to find anything that could be causing this. but from my previous reading of it, i don't remember anything to be configurable except too many jumpers that are of the "don't touch this, leave as-is" kind. (that's when you wonder "why put in jumpers i can't use?")
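The "powers on much later" observation can be turned into numbers by comparing when each disk was attached. A minimal sketch, assuming a traditional syslog timestamp (`May 29 20:00:05 ...`) and the usual `Attached SCSI disk` kernel lines; the year is not in that timestamp format, so it is passed in as an assumption, and both function names are made up.

```python
import re
from datetime import datetime

# Assumed line shape: "May 29 20:00:05 host kernel: sd 1:0:0:0: [sdb] Attached SCSI disk"
LINE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) .*\[(sd[a-z]+)\] Attached SCSI disk"
)


def attach_times(log_text, year=2019):
    """Map each device name to the time of its first 'Attached SCSI disk' line."""
    times = {}
    for line in log_text.splitlines():
        match = LINE.match(line)
        if match:
            stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            times.setdefault(match.group(2), stamp)
    return times


def lag_behind_first(times):
    """Seconds each device attached after the earliest one, slowest first."""
    if not times:
        return []
    first = min(times.values())
    lags = [(dev, int((t - first).total_seconds())) for dev, t in times.items()]
    return sorted(lags, key=lambda item: item[1], reverse=True)
```

A disk sitting at the top of that list with a lag in the thousands of seconds, while the rest are within a few seconds of each other, would match what the log examination described. Note that `%b` depends on the locale, so this assumes english month names.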

    i think i might also want to look into the hba's bios documentation to see if i did not accidentally toggle something on/off that i should not have.

    That's it for now, to be continued.
     
    #13
  14. BeTeP

    BeTeP Active Member

    Joined:
    Mar 23, 2019
    Messages:
    242
    Likes Received:
    97
    Update the backplane firmware. If that does not fix your issue, it's time to buy a new SAS2 backplane.

    SAS2-826EL1 sells for about $80 shipped.

    But if you just want to get to the bottom of it out of curiosity, try connecting a bunch of SAS drives instead. SAS 3Gbps expanders are known for their compatibility issues with SATA drives.
     
    #14
    Last edited: Jun 7, 2019
  15. mrkrad

    mrkrad Well-Known Member

    Joined:
    Oct 13, 2012
    Messages:
    1,238
    Likes Received:
    50
    make sure each drive's wwid is unique!
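A quick way to check this on a Linux host is to read the identifier the kernel exposes for each disk and look for repeats. This is only a sketch: it assumes the disks appear as `sd*` entries under `/sys/block` with a `device/wwid` attribute, and the helper names are made up for illustration.

```python
import os
from collections import defaultdict


def read_wwids(sys_block="/sys/block"):
    """Read device/wwid for each sd* entry; devices without the file are skipped."""
    wwids = {}
    for name in sorted(os.listdir(sys_block)):
        if not name.startswith("sd"):
            continue
        try:
            with open(os.path.join(sys_block, name, "device", "wwid")) as fh:
                wwids[name] = fh.read().strip()
        except OSError:
            continue
    return wwids


def duplicate_wwids(wwids):
    """Return {wwid: [devices]} for every identifier shared by more than one device."""
    groups = defaultdict(list)
    for dev, wwid in wwids.items():
        groups[wwid].append(dev)
    return {wwid: sorted(devs) for wwid, devs in groups.items() if len(devs) > 1}
```

`duplicate_wwids(read_wwids())` returning an empty dict would mean no two visible disks share an identifier. It cannot say anything about a disk the kernel never sees in the first place.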
     
    #15
  16. thisnewaccount

    am back from dealing with the unpleasantness that life throws at us. so, here goes, quickly:

    @BeTeP : i have searched for updated firmware on the supermicro site, but haven't found it. must be too obvious. but first, how do i check the backplane's fw version? secondly, how would i do it? the jbod is racked in the basement and is not connected to a windows box.
    lastly, where did you see those units sold for 80$ shipped? looked on ebay (where i bought the unit) and they are notably more expensive...

    @mrkrad : as i'm using plain old sata drives and not sas ones, i don't think this applies to my situation. but i might be wrong there.

    i think going brute-force method on my situation and swapping the backplane might be a quicker solution.
    also, having spares is never a bad idea, especially on now discontinued devices.
     
    #16
  17. BeTeP

    I just entered "BPN-SAS2-826EL1" in the ebay search box and the first result was "SuperMicro BPN-SAS2-826EL1 12-Port Bay SAS & SATA Expander Backplane SAS2 | eBay", for under $80 shipped at the time of posting (it "mysteriously" went up to $95 soon after). There are still other listings around $80 shipped, though.

    I can't help you with locating the firmware for the old backplane. It used to be available on the supermicro site. But it seems they removed it.
    For flashing and checking the current version you can use SAS Expander Xtools Lite

    ftp://ftp.supermicro.com/utility/ExpanderXtools_Lite/
     
    #17
    Last edited: Aug 7, 2019
  18. thisnewaccount

    @BeTeP : thanks for the info.
    over the week-end, ordered a rev. 1.02 backplane, for roughly 100cad, shipped.

    there must be places other than ebay for techies to order life-cycled/retired storage equipment.
    my employer cannot be the only one that offloads large quantities of gear onto the used market and there are only so many buyers of, say, external jbod chassis, esp. the rack-mountable kind.

    this being said, i also might want to look into a complete, new storage chassis, but i wonder what is easily available over the internet that would not cost a second mortgage. i think norco used to sell such a chassis, same for lenovo. the biggest problem is that not every seller ships to canada. or if they do, it's at exaggerated cost. will see what i can find.
     
    #18
  19. thisnewaccount

    very, very delayed update to my situation.

    so i did order a spare backplane off e-bay (seller was in latvia) some time ago and i finally was able to do the swap today.
    the situation suddenly took a turn for the worse after inexplicably getting somewhat better for a brief period of time: all drives in the jbod chassis decided to go awol, a good motivator to do some surgery.
    this chassis is not easy to work with, i'd say. lots of swearing involved.

    anyway, if the faster boot-up and the fact that all drives are visible *and* that a new drive is being cleared for use right now mean anything, it's safe to say that the old backplane was indeed defective. the new one appears to run much better than the old one.
    i hope this keeps up running without any issues for the foreseeable future.
     
    #19
    Last edited: Oct 27, 2019