Dot Hill R/Evolution 2000 Family Release Notes for Software Versions J202P01, J212P01, and J302P01

Description

Product models

Operating systems

Fixes and enhancements

Installation instructions

Installation notes and best practices
Installation instructions using RAIDar
Installation instructions using FTP

Known issues and workarounds

Effective date

Version:

J202P01 (Fibre Channel)
J212P01 (iSCSI)
J302P01 (SAS)

Update recommendation:

Immediate. An issue exists on Dot Hill R/Evolution 2000 family products running firmware versions J200P46, J210P22, or J300P22 that will eventually cause controller configuration information to be lost, with subsequent loss of management capability from that controller. Array management, event messaging, and logging will cease functioning, but host I/O will continue to operate normally. This issue affects the ability to manage the array from the affected controller only; if a partner controller is available, the array can be managed through the partner controller. Because configuration information is stored in non-volatile memory, resetting or powering off the controller will not clear this error. If the issue occurs, the controller must be replaced. This failure mode is time sensitive and Dot Hill recommends immediately upgrading firmware on all 2000 family controllers. This is not a hardware issue and proactive replacement of a controller is not a solution. To avoid this condition, you must upgrade your controller to the latest version of firmware.

Versions of firmware that will resolve this issue are in the following table.

Product	Corrected firmware	Affected firmware
2730 / 2730T FC	J202P01, J202R10, J201R10, J201R09, J200P50	J200P46
2330 iSCSI	J212P01, J212R10, J211R10, J211R09, J210P23	J210P22
2530 SAS	J302P01, J302R10, J301R10, J301R09, J300P23	J300P22
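
As a quick way to apply the table above, the following is a minimal sketch that classifies a firmware version string as affected, corrected, or unknown. The function name `firmware_status` is hypothetical and is not part of any Dot Hill tool; the version strings are taken only from the table.

```shell
# Sketch only: classify a firmware version using the table in these notes.
# firmware_status is a hypothetical helper, not a Dot Hill command.
firmware_status() {
    case "$1" in
        # Affected versions (FC, iSCSI, SAS)
        J200P46|J210P22|J300P22) echo "affected" ;;
        # Corrected Fibre Channel versions
        J202P01|J202R10|J201R10|J201R09|J200P50) echo "corrected" ;;
        # Corrected iSCSI versions
        J212P01|J212R10|J211R10|J211R09|J210P23) echo "corrected" ;;
        # Corrected SAS versions
        J302P01|J302R10|J301R10|J301R09|J300P23) echo "corrected" ;;
        # Anything not listed in the table
        *) echo "unknown" ;;
    esac
}
```

For example, `firmware_status J210P22` prints `affected`. A version reported as unknown is simply not listed in the table; the sketch makes no claim about it.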

Supersedes: All previously released firmware versions.

..Description

These release notes are for the firmware indicated above, which adds improvements and corrects issues found during use and additional qualification testing after initial product release.

Firmware can be installed from any computer running a supported operating system with an Ethernet connection to the storage array system. For supported operating systems, see Operating systems.

Installation procedures vary, depending on connection environment and user preference. For details, see Installation instructions.

..Product models

R/Evolution 2730 (Fibre Channel)
R/Evolution 2730T (Fibre Channel)
R/Evolution 2330 (iSCSI)
R/Evolution 2530 (SAS)

..Operating systems

Operating systems supported for use with Dot Hill R/Evolution 2000 family controllers (and when installing the binary firmware package):

Microsoft Windows Server 2008 x64 - All Editions
Microsoft Windows Server 2008 W32 - All Editions
Microsoft Windows Server 2003 x64 Edition (Including R2 & Base Edition)
Microsoft Windows Server 2003 - All Editions (Including R2 & Base Edition)
Red Hat Enterprise Linux 5 Server (x86-64)
Red Hat Enterprise Linux 5 Server (x86)
Red Hat Enterprise Linux 4 (AMD64/EM64T)
Red Hat Enterprise Linux 4 (x86)
SUSE LINUX Enterprise Server 10 (AMD64/EM64T)
SUSE LINUX Enterprise Server 10 (x86)
SUSE LINUX Enterprise Server 9 (AMD64/EM64T)
SUSE LINUX Enterprise Server 9 (x86)
VMware ESX/ESXi 4.1
VMware ESX/ESXi 4.0
VMware ESX/ESXi Server 3.5
VMware ESX Server 3.0

..Fixes and enhancements

The following fixes and enhancements were incorporated in J202P01, J212P01, and J302P01 firmware:

In a Windows cluster environment, there was a possibility that a scrub, reconstruction, or volume creation could cause a controller to crash.
In the Windows Server 2008 R2 environment, cluster validation failed during the cluster creation.
iSCSI IQN/host mapping failed when uppercase characters were used in the IQN string.
Enclosure ID numbers were not updated when an additional drive enclosure was added to an array running on a single controller.
In RAIDar, the enclosure status displayed a false red alert status following a firmware update when the status was actually OK.
In the CLI, for the set advanced-settings command, the single-controller on parameter was added to set Single Controller redundancy mode for a single installed controller.

The following fixes and enhancements were incorporated in J202R10, J212R10, and J302R10 firmware:

Scrub caused controllers to halt.
Identical vdisks created through the CLI and RAIDar reported different volume sizes.
Power Supply and I/O module statuses were reported differently on Controller A and Controller B.
Controller halted when a vdisk expansion started.
In dual-controller configurations, if one controller halted, the Fibre Channel host links did not fail over to the surviving controller.
Medium errors on drives in a RAID 6 vdisk caused another vdisk to report a critical state.
The controller halted when clearing metadata of a leftover disk.
Due to loss of heartbeat between the two controllers, one of the controllers halted.
The event log was not updated when a drive was removed (or marked as “down”) while a utility, such as verify, was running.
RAID 6 reconstruct caused partner controller to halt.
Volumes became inaccessible when converting a master volume to a standard volume.
Drives in non-fault-tolerant vdisks did not report unrecovered media error as a warning.
Heavy RSR load caused a controller to halt.
Added “year” to the critical error log.
Controller halted when utility timing was in conflict.
Controller halted when, under heavy I/O loads, a RAID 6 vdisk had one or more failed drives.
There were verification errors after an internal error recovery process completed.
After a failover, an incorrect vdisk utility status was reported in the CLI and RAIDar.
Host lost access when a large vdisk was being rebuilt.
The spare drive was not activated when the vdisk passed into a critical state.
Updated scrub utility for improved behavior.
RAID 6 reconstruct reported incorrectly when an additional failure occurred.
Drive LED behavior was inconsistent.
A controller halted due to excessive retries when a drive that was being reconstructed to had a failure.
Enhanced the Power Supply module Voltage/Fan Fault/Service Required (bottom) LED. It illuminates solid amber during an under-voltage condition and will now remain illuminated even after the current returns to normal and the power supply is replaced or power cycled.
A controller halted during an update of an expansion controller.
Both controllers halted during a failover.
Chassis failure caused data access problems and/or data loss.
Multiple systems presented the same WWPN.
Improved logging with better historical information.
Scrub log entries did not properly display the parity error count after a failover event.
In both RAIDar and CLI, metadata was not cleared from all of the selected drives when commanded to do so.
The controller stalled when a vdisk with snapshot and replication volumes was deleted.
A duplicate vdisk was reported after a halted controller recovered.
LUNs were not properly remapped after changing vdisk ownership.
The wrong drive was marked as “down”.
Improved management of cache flushing.
Added ability to allow flow control on iSCSI ports.
Removed unused debug agent component.
RAID data logs were not flushed after an extended power off or when a controller was restarted but failover did not occur.
RAID 50 error reporting did not report errors during verify.
Improved performance on RAID 10 vdisks.
When a controller was removed and I/O was in process, the transaction was held in the cache.
Scrubbing message was unclear when the vdisk was owned by the other controller.
After extensive runtime and I/Os, the system could stall during a shutdown procedure.
Background vdisk scrub stopped with no warning.
If a controller was restarting or failing over at the same time that a snapshot was being deleted, there was a possibility of the snapshot becoming inconsistent.
Disk loss during an expansion controller upgrade.
A power supply module failure event was not included in the system event logs.
Vdisk scrub failed ungracefully on disk unrecoverable error (URE).
Reduced the I/O delay when a disk has failed and data needs to be reconstructed by the RAID engine.
There was a disk channel error when using SATA drives.
The event log did not properly report when both controllers were restarted.
LEDs of all drives in enclosure 1 were amber and SMI-S reported them as failed.
Management controller hung after recovery.
Disconnecting a back end cable caused a controller to halt.
Volume was not accessible after converting it from a Master volume to a Standard volume.
An unknown setting was reported set on one disk in a system.
False under-run errors were written to the event logs.
A controller halt reported Double IOB to same Nexus in the event logs.
The power supply module was incorrectly identified in the event logs.
Incorrectly reported the components were in a degraded state.
Could not collect a complete set of logs from an array.

CLI-specific fixes and enhancements:

trust command: Updated CLI help example.
trust vdisk command: When run on a vdisk that was online, it reported success when it should have reported failure.
set vdisk command: When changing the vdisk name, the name was rejected as being invalid.
set debug log parameters command: Returned an error message and did not perform the requested action.
expand snap-pool size max command and variable: Returned an error message.
clear events command: Improved online help.
set host-wwn-name command: Setting the host-wwn-name did not work as expected.
set iscsi-host host <host> <new-nickname> command: Was unable to enter an IQN alias name.
show host-wwn-names: Did not work as expected.

RAIDar-specific fixes and enhancements:

When a dedicated spare of a vdisk was deleted, the drive was marked as “leftover.”

Firmware update-specific fixes and enhancements:

All drives in enclosures 1 and 3 were reported as unknown following a firmware update.
Partner Firmware Update (PFU) did not properly update the firmware on Controller B.
After a failover, vdisk ownership did not change to the operating controller.
After performing a firmware upgrade, some drives were erroneously reported as duplicate/leftover drives.
After a firmware update, multiple drives were marked as “leftover.”
After upgrading firmware, the array had to be restarted.
After a firmware upgrade, the controller stalled.
A firmware upgrade failed.

..Installation instructions



	WARNING! Do not cycle power or restart devices during a firmware update. If the update is interrupted or there is a power failure, the module could become inoperative. If this occurs, contact technical support. The module may need to be returned to the factory for reprogramming.



	CAUTION: Before upgrading controller firmware, ensure that the storage system configuration is stable and is not being reconfigured or changed in any way. If configuration changes are in progress, monitor them and wait until they are completed before proceeding with the upgrade.



	IMPORTANT: As with any firmware upgrade, it is a recommended best practice to ensure that you have a full backup prior to the upgrade.



	NOTE: To install this firmware, you must download the firmware package from the Dot Hill Customer Resource Center at http://crc.dothill.com and save the file to your local filesystem.

Installation notes and best practices

When planning for a firmware upgrade, select and schedule an appropriate time to perform an online upgrade:
- For single domain systems, I/O must be halted.
- For dual domain systems, selecting the appropriate time is essential. Because the online firmware upgrade is performed while host I/Os are being serviced, the I/O load can impact the upgrade process. Selecting a period of low I/O activity will ensure the upgrade completes as quickly as possible and will avoid disruptions to hosts and applications due to timeouts.
In single domain systems, approximately 30–60 minutes are required for the firmware to load, plus an additional 15–30 minutes for the system to automatically restart.
In dual domain systems, an additional 30–60 minutes is required for the second update, plus an additional 15–30 minutes for the second module to automatically restart.
Set the Partner Firmware Update option so that, in dual-controller systems, both controllers are updated. When the Partner Firmware Update option is enabled, after the installation process completes and restarts the first controller, the system automatically installs the firmware and restarts the second controller. If Partner Firmware Update is disabled, after updating software on one controller, you must manually update the second controller.
During the installation process, monitor the system display to determine update status and to know when the update is complete.
After the installation process is complete and all systems have automatically restarted, use RAIDar to verify system status and to confirm that the new firmware version is listed as installed. Review system event logs.
Updating array controller firmware may result in new event messages that are not described in earlier versions of documentation. For comprehensive event message documentation, see the most current version of the Dot Hill R/Evolution Event Descriptions Reference Guide.
When reverting to a previous version of firmware, ensure that both Ethernet connections are available and accessible before downgrading the firmware. Manually disable the Partner Firmware Update (PFU) and then downgrade the firmware on each controller separately (one after the other).
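
The timing figures above can be combined into a rough maintenance window. The sketch below simply adds the load time (30–60 minutes) and the restart time (15–30 minutes) per controller. It assumes that, in a dual-controller system, Partner Firmware Update is enabled so the second update follows the first. The function name `update_window` is hypothetical.

```shell
# Sketch only: estimate the update window in minutes from the figures above.
# Usage: update_window <number-of-controllers: 1 or 2>
update_window() {
    controllers=$1
    # Per controller: 30-60 minutes to load plus 15-30 minutes to restart.
    min=$(( (30 + 15) * controllers ))
    max=$(( (60 + 30) * controllers ))
    echo "${min}-${max} minutes"
}
```

With these figures, `update_window 1` prints `45-90 minutes` and `update_window 2` prints `90-180 minutes`. Actual times depend on I/O load, as noted above.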

Installation instructions using RAIDar



	WARNING! Do not cycle power or restart devices during a firmware update. If the update is interrupted or there is a power failure, the module could become inoperative. If this occurs, contact technical support. The module may need to be returned to the factory for reprogramming.

Place the downloaded firmware package in a temporary directory and extract the contents.
Locate the firmware file in the extracted folder.
In single-domain environments, halt I/O to vdisks before starting the firmware update.
Log in to RAIDar and select Manage > Update Software > Controller Software.
A table displays currently installed firmware versions.
Click Browse and then select the firmware file to install.
Click Load Software Package File.

Allow approximately 30–60 minutes for the firmware to load, plus an additional 15–30 minutes for the automatic restart to complete on the controller you are connected to. Wait for the progress messages to specify that the update has completed.

In dual-controller systems with Partner Firmware Update enabled, allow an additional 30–60 minutes for the second update, plus an additional 15–30 minutes for the second module to automatically restart.
In the RAIDar display, verify that the proper firmware version appears for each module.

Installation instructions using FTP



	WARNING! Do not cycle power or restart devices during a firmware update. If the update is interrupted or there is a power failure, the module could become inoperative. If this occurs, contact technical support. The module may need to be returned to the factory for reprogramming.



	NOTE: Allow sufficient time for the firmware to load and for the automatic restart to complete. Progress messages are displayed in the FTP interface during that time. Wait for the progress messages to specify that the firmware load has completed. If the system `Partner Firmware Update` (PFU) option is enabled, no messages are displayed in the FTP interface during PFU.

Place the downloaded firmware package in a temporary directory and extract its contents.
Locate the firmware file in the extracted folder.
Using RAIDar, prepare to use FTP:
1. Determine the network-port IP addresses of the system controllers.
2. Verify that the system FTP service is enabled.
3. Verify that the user you will log in as has permission to use the FTP interface and has manage access rights.
In single-domain environments, halt I/O to vdisks before starting the firmware update.
Open a command prompt (Windows) or a terminal window (UNIX), and navigate to the directory containing the firmware file to load.
1. Enter ftp <controller-network-address>, where <controller-network-address> represents the IP address.
2. Log in as an FTP user (user = ftp, password = flash).
3. Enter put <firmware-file> flash, where <firmware-file> represents the name of the firmware file.
Allow approximately 30–60 minutes for the firmware to load, plus an additional 15–30 minutes for the automatic restart to complete on the controller you are connected to. Wait for the progress messages to specify that the update has completed.
In dual-controller systems with Partner Firmware Update enabled, allow an additional 30–60 minutes for the second update, plus an additional 15–30 minutes for the second module to automatically restart.
If needed, repeat these steps to load the firmware on additional modules.
Quit the FTP session.
In the RAIDar display, or using the CLI, verify that the proper firmware version appears for each module.
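
For reference, the FTP login and put steps above can be written as a short command script. This is a minimal sketch, not a supported procedure: the helper name `ftp_firmware_script` is hypothetical, and the file name and address in the usage note are placeholders. It only prints the commands; it does not contact the array.

```shell
# Sketch only: print the ftp commands that mirror the manual steps above.
# ftp_firmware_script is a hypothetical helper name.
ftp_firmware_script() {
    firmware_file=$1
    # Log in as the FTP user given in the steps (user = ftp, password = flash).
    printf 'user ftp flash\n'
    # Load the firmware file to the "flash" destination.
    printf 'put %s flash\n' "$firmware_file"
    # Close the session once the put command returns.
    printf 'quit\n'
}
```

On a host with a standard command-line ftp client, the output could be piped as `ftp_firmware_script <firmware-file> | ftp -n <controller-network-address>`, where `-n` disables auto-login so that the `user` line supplies the credentials. Whether a scripted session shows the same progress messages as an interactive one is not confirmed here, so the waiting times and verification steps above still apply.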

Installation troubleshooting

If you experience issues during the installation process, do the following:

When viewing system version information in RAIDar's System Overview panel, if an hour has elapsed and the components do not show that they were updated to the new firmware version, refresh the web browser. If version information is still incorrect, proceed to the next troubleshooting step.
If version information does not show that the new firmware has been installed, even after refreshing the browser, restart all system controllers. For example, in the CLI, enter the restart mc both command. After the controllers have restarted, one of three things happens:
- Updated system version information is displayed and the new firmware version shows that it was installed.
- The Partner Firmware Update process automatically begins and installs the firmware on the second controller. When complete, the versions should be correct.
- System version information is still incorrect. If system version information is still incorrect, proceed to the next troubleshooting step.
Verify that all system controllers are operating properly. For example, in the CLI, enter the show disks command and read the display to confirm that the information displayed is correct.
- If the show disks command fails to display the disks correctly, communications within the controller have failed. To reestablish communication, cycle power on the system and repeat the show disks command. (Do not restart the controllers; cycle power on the controller enclosure.)
- If the show disks command from all controllers is successful, perform the firmware update process again.

..Known issues and workarounds

This is a cumulative list of known issues and workarounds since the initial firmware release.

How to get out of failure mode:
1. Pull host cables.
2. Power cycle raid-head enclosure.
3. After reboot, wait for disk lights to stop flashing as this indicates de-stage is complete.
4. Plug host cables back in and reconfigure the host to do I/O.
SSH access to the CLI may fail on repetitive attempts to open and close the connection.
When using telnet and secure shell (SSH) to access the command line interface (CLI), the connection may fail when multiple sequences or commands are sent from a script. The issue does not occur if a delay, for example 0.25 to 1 second, is inserted between the ssh close and the subsequent ssh open commands in the script.
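
A minimal sketch of the delay workaround follows. It opens one SSH connection per CLI command and sleeps between connections. The function name `run_cli_commands` and the `CLI_CONNECT` override (used here only for dry runs) are hypothetical; the login name and address are placeholders.

```shell
# Sketch only: send CLI commands one connection at a time with a pause
# between the ssh close and the next ssh open.
# CLI_CONNECT is a hypothetical override for dry runs; it defaults to ssh.
run_cli_commands() {
    target=$1      # for example <user>@<controller-network-address>
    delay=$2       # seconds to wait between connections, for example 1
    shift 2
    for cmd in "$@"; do
        ${CLI_CONNECT:-ssh} "$target" "$cmd"
        sleep "$delay"
    done
}
```

A dry run such as `CLI_CONNECT=echo run_cli_commands <user>@<address> 1 "show disks" "show enclosures"` prints what would be sent without opening a connection. Fractional delays such as 0.25 depend on the local `sleep` implementation.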
An initializing vdisk is accessible immediately, but is not fault tolerant.
The statement in the "Virtual Disk Initialization" section of the Dot Hill R/Evolution 2000 Series Administrator's Guide that "If the virtual disk is initializing online, you can start using it immediately" may be misleading. As shown in the "Virtual Disk Icons" section in the guide, the vdisk is NOT fault tolerant while the vdisk is initializing or in a critical state.
MPIO reporting path fail-over to single LUN on Windows Server 2008 host.
A Windows Server 2008 host may occasionally lose a single path to a single LUN. The Windows Server 2008 MPIO reports a path fail-over; however, the path may not come back. This issue is fixed with Microsoft QFE KB957316, which is available from the Microsoft support website at http://support.microsoft.com/kb/957316. Review the information and download the appropriate QFE for the Windows Server 2008 operating system. If Microsoft QFE KB957316 is not installed, the system must be rebooted to correct the issue.
A failed drive may not be displayed on the Enclosure View page in RAIDar.
Should this condition occur, check the drive LED indicator for a solid amber light indicating a failed drive. If the drive was configured for SMART detection, check for entries in the array system event log. Note: The failed drive status displays as Missing using the show disks encl command from the command line interface (CLI).
Removing all the drives from a JBOD enclosure causes the enclosure to be removed from RAIDar's Enclosure View page.
To check the status of the enclosure, access the array using one of the command line interface (CLI) methods and run the show enclosures command.
Windows Server 2003 SP2 fails to hibernate causing the system to lock up.
Applying Microsoft hot fix KB940467 corrects the issue for Windows Server 2003 SP2.
I/O may not resume to SUSE and Red Hat Linux hosts upon cable reinsertion.
The likelihood of this issue occurring increases with the number of LUNs configured on the storage array and load. The failover/failback process is working correctly at the multipath driver level. At the multipath application level, multipath maps are not getting updated. To update the maps at application level, run the command multipath -v0. This command may take a few minutes with heavy I/O running on the system.
Windows Server 2003 host may hang after failure of a vdisk.
The start menu bar goes away, applications may become slow to close, and the system will not reboot (shuts down all programs and closes network connections but hangs at gray screen with mouse cursor still active). Microsoft is working on a QFE hotfix. Power cycling the host corrects the issue.
After a cable move, LSI Logic MPT SAS BIOS loads incorrectly, showing initialization twice on the server.
After moving SAS cables to a different SAS host bus adapter (HBA) on the server, the MPT SAS BIOS may incorrectly load and initialize SAS HBA cards, after which the 2530 stops accepting I/O requests from the server. Use RAIDar to reset the host port interface on both 2530 storage controllers. Log into RAIDar and navigate to the MANAGE > UTILITIES > host utilities > reset host channel page and click the Reset Host Port button to initiate the action. Although the web page will return immediately with a response, it may take up to 1 minute for the 2530 storage controller to process the request and make the host ports ready for initialization by the SAS HBA card. A reboot of the server is not required.
An unanticipated path change may occur on Red Hat Linux 4 Update 6 hosts using HPDM Multipath software.
In a multipath configuration using HPDM software, Red Hat Linux 4 Update 6 hosts may experience an unanticipated path change due to a SCSI I/O timeout during periods of heavy I/O load. This issue does not occur in single-path configurations. To reduce the likelihood of occurrence, ensure the storage array is properly configured and has a balanced I/O load.
Drives may not be seen by the LSI Logic MPT SAS BIOS when more than one SAS HBA card is installed on some servers.
After making some configuration changes, the MPT SAS BIOS may not list all available drives for boot during startup. This message may appear during startup: Adapter configuration may have changed, reconfiguration is suggested. The MPT SAS BIOS setup utility, accessed by pressing F8 during boot-up and selecting the SAS Configuration Tool, can be used to add the HBA back to the boot list. A reboot of the server is required.
A newly created snapshot volume on a single controller array presented and mapped to a Windows host may not be detected by the Windows operating system.
Use the Rescan Disks command in Windows Disk Management to force detection of the newly created snapshot volume. Rescanning disks can take several minutes, depending on the number of hardware devices installed.
RAIDar may hang after drive failures.
If this condition occurs, access the array using the command line interface (CLI) and restart the management controller (MC) using the command: restart mc a (single controller) or restart mc both (dual controller).
Issuing the CLI command set cache-parameters read-ahead maximum does not change the setting although the message indicates success.
When the number of volumes is large, or the available read cache is small, setting read-ahead to "maximum" may display a smaller value than expected. The maximum read-ahead cache is calculated by dividing the available read cache by the number of volumes presented.
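
The calculation described above can be illustrated with a small sketch. The function name `max_read_ahead_kb` is hypothetical, the sizes in the usage note are illustrative values rather than figures for any particular array, and integer division is an assumption of this sketch.

```shell
# Sketch only: maximum read-ahead = available read cache / number of volumes.
# Sizes are in KB; integer division is an assumption of this sketch.
max_read_ahead_kb() {
    available_read_cache_kb=$1
    volume_count=$2
    echo $(( available_read_cache_kb / volume_count ))
}
```

For instance, `max_read_ahead_kb 524288 64` prints `8192`, while the same cache shared by 256 volumes gives `2048`, which shows why "maximum" shrinks as volumes are added.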
An NTP server IP address change on one controller does not propagate to the other controller.
Disable NTP on both controllers, set the IP address of the intended NTP server, and re-enable NTP.
When a controller is powered off or in a failed state, the Link LED remains ON although the host indicates a link down state.
Disregard the Link LED when the controller removal LED is illuminated.
On the RAIDar Manage > Scheduler page, the Snapshot Prefix field allows up to 31 characters to be input but then displays an error message indicating the prefix can have 1-14 characters.
Only type 1 to 14 characters in the Snapshot Prefix field.

On Linux, a scan may be required to detect a newly presented Volume/LUN.

A Volume/LUN newly presented to a Linux host may not be detected by the Linux operating system. This is a Linux issue, not an array issue. A Linux scan can be used to force detection of the newly presented Volume/LUN.

The scan command syntax is:

echo "<Channel> <Target identifier> <LUN>" > /sys/class/scsi_host/host<Host number>/scan

Where <Channel>, <Target identifier>, and <LUN> can have a - wild card; <Host number> can be 0, 1, 2, or 3.

Examples

echo "- - -" > /sys/class/scsi_host/host1/scan # scan all Channels, all Targets, and all LUNs of Host 1.
echo "0 - -" > /sys/class/scsi_host/host1/scan # scan Channel 0, all Targets, and all LUNs of Host 1.
echo "0 0 -" > /sys/class/scsi_host/host1/scan # scan Channel 0, Target 0, and all LUNs of Host 1.
echo "0 0 0" > /sys/class/scsi_host/host1/scan # scan Channel 0, Target 0, and LUN 0 of Host 1.



	NOTE: For multipath/dual HBA configurations connected to the same array, run the same command for both HBAs. For example: `echo "- - -" > /sys/class/scsi_host/host1/scan` and `echo "- - -" > /sys/class/scsi_host/host2/scan`. For HPDM Multipath configuration, after running the scan command, run the following commands to update the multipath maps in the kernel: `/etc/init.d/multipathd restart` and `/sbin/multipath -v3`.
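
To reduce quoting mistakes when scanning several hosts, the scan command can be generated rather than typed. The following is a minimal sketch: the function name `scan_command` is hypothetical, and it only prints the command line so that it can be reviewed before being run as root.

```shell
# Sketch only: build (but do not run) the scan command for one SCSI host.
# Usage: scan_command <host-number> [channel] [target] [lun]
# Omitted fields default to the "-" wild card.
scan_command() {
    host_number=$1
    channel=${2:--}
    target=${3:--}
    lun=${4:--}
    printf 'echo "%s %s %s" > /sys/class/scsi_host/host%s/scan\n' \
        "$channel" "$target" "$lun" "$host_number"
}
```

For a dual HBA configuration, calling `scan_command 1` and `scan_command 2` prints the wild-card scan command for each host. The host numbers present on a given system may differ from these examples.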

There is an issue with DMS where I/O can time out.
- The following conditions need to be true in order for this to occur:
  - DMS is enabled.
  - Snap Pool (aka Backing Store) and Master Volume are on the same vdisk.
  - Host I/Os are running. In attempts to reproduce the issue, it appeared only under heavy host I/O. Because too few runs were made for this to be statistically significant, treat heavy I/O as an observation rather than a confirmed condition.
  - There is a failover operation (as in a controller failure).

If the surviving controller crashes during a controller shutdown or code update, follow these steps:
1. Unplug all host interfaces from both controllers.
2. Reboot the crashed controller and perform the firmware update procedure again if applicable.
3. Once the code has been updated, or both controllers are now operational, plug the host cables back in.
4. Bring the storage array back online to the host(s).