
Sun T2000 failed to boot – fc-fabric Method or service exit timed out.

June 20, 2008

In February of this year, after installing some new LTO-4 tape drives in our Quantum (ADIC) i2000, we began having problems with our primary Networker 7.3.3 server, a Sun T2000. It refused to boot while the switch ports to the i2000 robot were enabled. This was the error message we saw:

NIS domain name is chi.navtech.com
svc.startd[7]: svc:/system/device/fc-fabric:default: Method or service exit timed out. Killing contract 31.
svc.startd[7]: svc:/system/device/fc-fabric:default: Method "/lib/svc/method/fc-fabric" failed due to signal KILL.
svc.startd[7]: svc:/system/device/fc-fabric:default: Method or service exit timed out. Killing contract 34.
svc.startd[7]: svc:/system/device/fc-fabric:default: Method "/lib/svc/method/fc-fabric" failed due to signal KILL.
svc.startd[7]: svc:/system/device/fc-fabric:default: Method or service exit timed out. Killing contract 36.
svc.startd[7]: svc:/system/device/fc-fabric:default: Method "/lib/svc/method/fc-fabric" failed due to signal KILL.
svc.startd[7]: system/device/fc-fabric:default failed: transitioned to maintenance (see 'svcs -xv' for details)
Requesting System Maintenance Mode
(See /lib/svc/share/README for more information.)
Console login service(s) cannot run.

After a while, Sun sent us the following workaround:

After you add the latest patches and reboot the host, and assuming your problem hasn’t been resolved, do the following:

Get the host booted into multi-user by disabling the involved switch ports or removing the fiber connections.

# cd /etc/cfg/fp

I noticed from the Explorer run on January 15 that you currently have in
/etc/cfg/fp:

fabric_WWN_map fabric_WWN_map.old fabric_WWN_map.old2

You need to rename them all to something that doesn’t begin with the word “fabric” or remove them.
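An aside that is not part of Sun's instructions: the rename can be scripted. This is a minimal sketch, assuming a POSIX shell (ksh or bash rather than the old Solaris /bin/sh); the function name `rename_fabric_maps` and the `disabled.` prefix are my own choices, picked only so the names no longer begin with “fabric”.

```shell
#!/bin/sh
# Rename every file whose name begins with "fabric" in the given directory
# so that it no longer matches fabric*. The "disabled." prefix is an
# arbitrary, hypothetical choice for this sketch.
rename_fabric_maps() {
    dir=$1
    for f in "$dir"/fabric*; do
        [ -e "$f" ] || continue      # the glob matched nothing
        base=${f##*/}                # strip the directory part
        mv "$f" "$dir/disabled.$base"
    done
}

# On the affected host (as root, with the switch ports still disabled):
# rename_fabric_maps /etc/cfg/fp
```

Renaming rather than deleting keeps the old map files around in case they need to be compared later.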

Remove device paths:

# rm /dev/rmt/*

# rm /dev/dsk/c4*; rm /dev/rdsk/c4*; rm /dev/cfg/c4*

# rm /dev/dsk/c5*; rm /dev/rdsk/c5*; rm /dev/cfg/c5*

# rm /dev/dsk/c6*; rm /dev/rdsk/c6*; rm /dev/cfg/c6*

# rm /dev/dsk/c7*; rm /dev/rdsk/c7*; rm /dev/cfg/c7*

# mv /etc/path_to_inst /etc/old_path

Make sure there are no files in /etc that start with the word “path”.
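Again as an aside rather than part of Sun's text: the four nearly identical rm lines differ only in the controller number, so they can be generated from a list. The sketch below deliberately prints the commands instead of running them, since they are destructive; `print_cleanup_cmds` is a hypothetical helper name, and the controller numbers 4 through 7 are simply the ones from our host.

```shell
#!/bin/sh
# Print (do not execute) the device-link cleanup commands for the
# controller numbers given as arguments.
print_cleanup_cmds() {
    echo "rm /dev/rmt/*"
    for c in "$@"; do
        echo "rm /dev/dsk/c${c}*; rm /dev/rdsk/c${c}*; rm /dev/cfg/c${c}*"
    done
}

# Controllers c4 to c7, as on our T2000; adjust for your own host.
print_cleanup_cmds 4 5 6 7
```

Reviewing the printed list before pasting it into a root shell is a cheap guard against removing links for the wrong controller.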

Re-enable switch ports (or re-attach fiber cables)

# luxadm -e forcelip /devices/pci@7c0/pci@0/pci@8/QLGC,qlc@0/fp@0,0:devctl
# luxadm -e forcelip /devices/pci@7c0/pci@0/pci@9/QLGC,qlc@0/fp@0,0:devctl

# luxadm insert

Rebuild device paths:

# devfsadm -p /etc/path_to_inst

# cfgadm -c configure <c#> (for each controller)

# cfgadm -al

If it doesn’t look right at this point, reboot -- -r.
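One more aside of my own: the “for each controller” step is a loop in practice. This sketch only prints the commands. The controller names c4 to c7 are an assumption carried over from the device links removed earlier, and `print_configure_cmds` is a hypothetical helper name; check `cfgadm -al` on your own host for the real list.

```shell
#!/bin/sh
# Print (do not execute) one "cfgadm -c configure" command per controller
# name, followed by the listing command used to check the result.
print_configure_cmds() {
    for c in "$@"; do
        echo "cfgadm -c configure $c"
    done
    echo "cfgadm -al"
}

# Assumed controller names; substitute whatever your host reports.
print_configure_cmds c4 c5 c6 c7
```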

This procedure was used multiple times over the last four months. In fact, it was necessary each time we needed to reboot the Networker server.

Further prodding and a furious email exchange over several weeks resulted in this:

The failure is against LUN=ff (255). Please check with tape vendor
to see if it’s possible to unmap LUN 255 from the tape library.
The error which is preventing boot is due to SFK’s inability to configure
LUN 255 (which is of type raid-ctrl and not recognized by Solaris).

So I contacted Quantum’s tech support to get their opinion on the matter. The Quantum tech I spoke with stated that, in our environment, our NetApps require the Control LUN on the Quantum to be at LUN 255. He also stated that the ANSI standards require it to be at LUN 255.

I reported this to the Sun engineer, and he replied:

escalation engineer reply:

Task Summary: unable to Boot Bug re-occurring need faster solution than previous case 65808366
Note:         Update:

- Since LUN 255 (of type 'array_ctrl') is causing cfgadm to fail because
  there is no target driver for array_ctrl, besides unmapping this LUN
  from the Storage, another workaround is to bind this LUN to our sgen
  (Generic SCSI) driver.

- Disclaimer: I haven't tried this out as I don't have a 3rd party
  storage that maps a LUN of type 'array_ctrl' to Solaris host.

Action(s):

- Customer to try binding "array_ctrl" luns to sgen driver.

  1. Add the following line to /kernel/drv/sgen.conf to allow array_ctrl
     LUNs to be bound to sgen (Generic SCSI) driver.

     device-type-config-list="array_ctrl";

  2. Add the following line to /etc/driver_aliases

     sgen "scsiclass,0c"

  3. Reboot the machine to see if the problem is resolved.
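For anyone scripting steps 1 and 2, here is a minimal sketch of my own (not from Sun) that appends each line only if it is not already there, so it is safe to run twice. It assumes a POSIX grep with -F, -q and -x (on Solaris 10 that would be /usr/xpg4/bin/grep, not /usr/bin/grep); `append_once` is a hypothetical helper name. It also assumes sgen.conf does not already set device-type-config-list; if it does, the new type would have to be merged into that existing list by hand instead.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
append_once() {
    file=$1
    line=$2
    grep -Fqx -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# On the affected host (as root, after backing up both files):
# append_once /kernel/drv/sgen.conf 'device-type-config-list="array_ctrl";'
# append_once /etc/driver_aliases   'sgen "scsiclass,0c"'
```

The whole-line match (-x) means a commented-out copy of the line does not count as already present.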

After following these instructions and rebooting the Sun T2000 several times I discovered that the problem was fixed!
