Dot Hill AssuredSAN 3003 Family Release Notes for Software Version TN230P008

Microsoft Windows 2003 and 2003 R2 (IA32 and x64)
Microsoft Windows Server 2008 (IA32 and x64, Standard, Enterprise, and Datacenter editions)
Microsoft Windows Server 2008 R2 (x64, Standard, Enterprise, and Datacenter editions)
Microsoft Windows Server 2008 and 2008 R2 x64 Hyper-V
Red Hat Enterprise Linux 4.8, 4.9, 5.4, 5.5, and 5.6 (IA32, x64)
SuSE Linux Enterprise Server 10.3 and 11.1 SP1 (x64 and IA32)
VMware ESX 4.0 and 4.1
Apple OS X (Snow Leopard 10.6)

..Enhancements and fixes

The following enhancement was incorporated in TN230P008:

Added new mode page 0 to set vendor-specific fields and ensure correct drive mode page 0 settings for Command Aging Limit, Read Reporting Threshold, and Write Reporting Threshold.

The following fixes were incorporated in TN230P008:

Page fault causes controller crash.
Fixed the code to generate an event when a drive drops out of a multilevel container.
When a drive is pulled from a vdisk, instead of retrying the pending queue and retry queue, fail the commands if the error is E_NO_TARGET.
Fixed incorrect reporting of supercapacitor end of life.
Page fault on both controllers while trying to get memory from the Read Memory Region (RMM).
Lengthen the periodic monitoring 'time-out' for the compact flash.

The following fixes were incorporated in TN230P006:

Add display information values for Gelding and Shorty midplane types to the "show enclosures" CLI command.
Add improvements to the Customization Tool Kit.

The following fixes were incorporated in TN230P005:

Improve no-mirror write performance while preserving NVCE data.
Improve drive element SES status on drive removal.
Generate an event when a drive drops out of a vdisk.
Use the on-die CPU temperature sensor to control fan speed.
Synchronize AutoFlush, enable PRM backup, and ensure that the SCAPs and CF are fully functional to improve performance and reliability.
A SATA drive is still displayed in RAIDar after it is removed.
With certain chassis, vdisks in the QTOF, CRIT, FTDN, and OFFL states, and disks, are no longer displayed after a power cycle.
Prevent a system crash on RAID-6 partial stripe reads with two failed drives or errors on two drives.
Mappings are removed by reset snapshot.
A vdisk is in a quarantined state even though drives are available in rescan.
Client hangs when QTOF Vdisk returns a check condition.
Correct an eventlog event so the PSU side is displayed correctly.
Prevent an SES PHY enable event for a drive from resetting the EC.
Setting expander-phy to enable or disable returns a success message but does not change PHY settings for enclosure or drives on the partner controller.
Generate an event when a drive drops out of a vdisk to prevent a vdisk from going critical.
Use the on-die CPU temperature sensor to control fan speed after a CPU Overtemp warning.
Data corruption can occur after a power cycle when volumes are operating in no-mirror mode.

The following fixes were incorporated in TN230P002:

Fix erroneous sense data reporting for >2TB drives.

The following fixes were incorporated in TN230R029:

Fix for fsw-destages that get read from RAID errors causing controller asserts or hangs.
Skip the URE drive and total I/O count while reconstructing data from remaining drives to fix an unrecoverable error during a partial stripe read.
Allow more DMAs to be sent between the two controllers for forwarded I/O.
When in no-mirror mode, do not turn off the broadcast bus.

..Important firmware notes

If you intend to use Windows Dynamic Disk (software RAID) on top of your hardware RAID, there are some cautions to consider. For more information, see the section "Real World: Dynamic versus Basic Disks" in the topic at http://technet.microsoft.com/en-us/library/dd163558.aspx.
Failover and failback times are affected by the number of system volumes; the more volumes there are on the system, the more time is required for failover and failback to complete.
When changing a replication set (for example, adding or removing a replication volume, or deleting the replication set), do so from the source system; when aborting, suspending or resuming a replication, do so from the destination system.
When changing the external-view volume of a replication set, do so from the destination system first, then perform the change on the source system.

..Installation instructions

The following sections discuss installing the firmware:

Installation notes and best practices
Installation troubleshooting
Installation instructions using FTP

Installation notes and best practices



	WARNING! Do not cycle power or restart devices during a firmware update. If the update is interrupted or there is a power failure, the module could become inoperative. If this occurs, contact technical support. The module may need to be returned to the factory for reprogramming.



	CAUTION: Before upgrading firmware, ensure that the system is stable and is not being reconfigured or changed in any way. If changes are in progress, monitor them and wait until they are completed before proceeding with the upgrade.



	NOTE: To install this firmware, you must download the firmware package from the Dot Hill Customer Resource Center and save the file to your local filesystem. Ensure that FTP and telnet are enabled.

As with any firmware upgrade, it is a recommended best practice to ensure that you have a full backup prior to the upgrade.
When planning for a firmware upgrade, schedule an appropriate time to perform an online upgrade.
- For single domain systems, I/O must be halted.
- For dual domain systems, because the online firmware upgrade is performed while host I/Os are being processed, I/O load can impact the upgrade process. Select a period of low I/O activity to ensure the upgrade completes as quickly as possible and avoid disruptions to hosts and applications due to timeouts.
When planning for a firmware upgrade, allow sufficient time for the update.
- In single-controller systems, it takes approximately 10 minutes for the firmware to load and for the automatic controller restart to complete.
- In dual-controller systems, the second controller usually takes an additional 20 minutes to update, but may take as long as one hour.
During the installation process, monitor the system to determine update status and know when the update is complete.
After the installation process is complete and all controllers have automatically restarted, power cycle your storage system. Then verify the system status in RAIDar or the CLI to confirm that the new firmware version is displayed as running on all controllers.
Updating array controller firmware may result in new event messages that are not described in earlier versions of documentation. For comprehensive event message documentation, see the current version of the Event Descriptions Reference Guide.

Installation troubleshooting

If you experience issues during the installation process, do the following:

When viewing system version information in the RAIDar System Overview panel, if an hour has elapsed and the components do not show that they were updated to the new firmware version, refresh the web browser. If version information is still incorrect, proceed to the next troubleshooting step.
If version information does not show that the new firmware has been installed, even after refreshing the browser, restart all system controllers. For example, in the CLI, enter the restart mc both command. After the controllers have restarted, one of three things will happen:
- Updated system version information is displayed and the new firmware version shows that it was installed.
- The Partner Firmware Update process will automatically begin and will install the firmware on the second controller. When complete, the versions should be correct.
- System version information is still incorrect. If system version information is still incorrect, proceed to the next troubleshooting step.
Verify that all system controllers are operating properly. For example, in the CLI, enter the show disks command and read the display to confirm that the information displayed is correct.
- If the show disks command fails to display the disks correctly, communications within the controller have failed. To reestablish communication, cycle power on the system and repeat the show disks command. (Do not restart the controllers; cycle power on the controller enclosure.)
- If the show disks command from all controllers is successful, perform the firmware update process again.

Installation instructions using FTP



	NOTE: It takes approximately 10 minutes for the firmware to load and for the automatic restart to complete. Progress messages are displayed in the FTP interface during that time. Wait for the progress messages to specify that the firmware load has completed. If the system Partner Firmware Update (PFU) option is enabled, allow an additional 20 minutes for the partner controller to be updated. No messages are displayed in the FTP interface during PFU.

A controller enclosure can contain one or two controller modules. In a dual-controller system, both controllers should run the same firmware version. Storage systems in a replication set must run the same firmware version. You can update the firmware in each controller module by loading a firmware file obtained from the enclosure vendor.

If you have a dual-controller system and the Partner Firmware Update option is enabled, when you update one controller the system automatically updates the partner controller. If Partner Firmware Update is disabled, after updating firmware on one controller you must log into the partner controller’s IP address and perform this firmware update on that controller also.

For best results, the storage system should be in a healthy state before starting firmware update.

Firmware update via FTP is supported from the following version: TN230R029.

Place the downloaded firmware package in a temporary directory.
Locate the firmware file in the extracted folder.
Using RAIDar, prepare to use FTP:
1. Determine the network-port IP addresses of the system controllers.
2. Verify that the system FTP service is enabled.
3. Verify that the user you will log in as has permission to use the FTP interface and has manage access rights.
In single-domain environments, halt I/O to vdisks before starting the firmware update.
Restart the Management Controller (MC) in the controller to be updated; or if PFU is enabled, restart the MCs in both controllers.
Open a command prompt (Windows) or a terminal window (UNIX), and navigate to the directory containing the firmware file to load.
1. Enter a command using the following syntax: ftp <controller-network-address>.
2. Log in as an FTP user (user = ftp, password = flash).
3. Enter a command using the following syntax: put <firmware-file> flash.
If needed, repeat these steps to load the firmware on additional modules.
Quit the FTP session.

If the Storage Controller cannot be updated, the update operation is cancelled. If the FTP prompt does not return, quit the FTP session and log in again. Verify that you specified the correct firmware file and repeat the update. If this problem persists, contact technical support.

When firmware update on the local controller is complete, the message Operation Complete is printed, the FTP session returns to the ftp> prompt, and the FTP session to the local MC is closed.
Power cycle your storage system.
Clear your web browser’s cache and then sign in to RAIDar. In the RAIDar display, verify that the proper firmware version appears for each module. If PFU is running on the controller you sign in to, a dialog box shows PFU progress and prevents you from performing other tasks until PFU is complete.
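The FTP upload step above can also be scripted. The following is a minimal sketch, not a vendor-supplied tool: it assumes Python's standard ftplib module, and the function name upload_firmware and its parameters are hypothetical. The login (ftp / flash) and the remote name flash come from the steps above; an interactive `put <firmware-file> flash` corresponds to an FTP STOR to the remote name flash. No socket timeout is set because the firmware load takes approximately 10 minutes.

```python
import ftplib
from pathlib import Path


def upload_firmware(host, firmware_path, user="ftp", password="flash",
                    ftp_factory=ftplib.FTP, timeout=None):
    """Send one firmware file to one controller and return the server reply.

    Hypothetical helper: mirrors `ftp <controller-network-address>`,
    the login step, and `put <firmware-file> flash` from the procedure.
    """
    firmware_path = Path(firmware_path)
    if not firmware_path.is_file():
        raise FileNotFoundError(firmware_path)

    # timeout=None means blocking sockets with no timeout, so the client
    # keeps waiting while the controller loads the firmware.
    ftp = ftp_factory(host, timeout=timeout)
    try:
        ftp.login(user, password)
        with firmware_path.open("rb") as fh:
            # Equivalent of the interactive command: put <firmware-file> flash
            reply = ftp.storbinary("STOR flash", fh)
    finally:
        # close() rather than quit(): the controller may already have
        # closed the session once the load completes.
        ftp.close()
    return reply
```

Calling upload_firmware("<controller-network-address>", "<firmware-file>") once per controller matches the manual procedure when Partner Firmware Update is disabled. The script does not replace the power cycle and version check described in the steps above.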



	NOTE: If you attempt to load an incompatible firmware version, the message `* Code Load Fail. Bad format image. *` is displayed and after a few seconds the FTP prompt is redisplayed. The code is not loaded.



	NOTE: If you are using a Windows FTP client, a client-side FTP application issue can cause the FTP session to be aborted during firmware update. If this issue persists, use another client or another FTP application.



	NOTE: After firmware update has completed on both controllers, if the system health is Degraded and the health reason indicates that the firmware version is incorrect, verify that you specified the correct firmware file and repeat the update. If this problem persists, contact technical support.

..Known issues and workarounds

Issue: When creating a user or modifying user information in RAIDar, there is no option to enable the SMI-S interface for the user.

Workaround: Add the SMI-S interfaces to the user using the set user interfaces CLI command, making sure to include all appropriate interfaces.

Issue: During path failover events on systems with large LUN counts, Red Hat Enterprise Linux 4 iSCSI clients may report issues accessing devices.

Workaround: Increase the disk timeout to a minimum of 60 seconds.

Issue: When creating a volume set with the volumes mapped to LUNs, if there is a LUN conflict, the array will stop mapping volumes to LUNs, but will create the volumes as requested.

Workaround: Ensure that there are no LUN conflicts before creating the volume set with mapping or map the remaining volumes to LUNs after the conflict.
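A pre-check for LUN conflicts can be sketched as follows. This is an illustration only: it assumes the volume set is mapped to consecutive LUNs starting at a base LUN, and that you have already collected the list of LUNs in use (for example, from the existing mappings). The function name and parameters are hypothetical and are not part of the array CLI.

```python
def find_lun_conflicts(used_luns, base_lun, volume_count):
    """Return the requested LUNs that are already in use.

    Assumption: the new volumes would take LUNs base_lun,
    base_lun + 1, ..., base_lun + volume_count - 1.
    """
    used = set(used_luns)
    requested = range(base_lun, base_lun + volume_count)
    return [lun for lun in requested if lun in used]
```

An empty result means the requested range is free. A non-empty result lists the LUNs to avoid, or the point after which the remaining volumes would need to be mapped manually.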

Issue: In dual-controller systems, accessing RAIDar on a controller does not display vdisks owned by the other controller.

Workaround: Restart the other controller.

Issue: In the CLI, the TakeSnapshot task returns the message: “Unable to validate retention limits. – The user is not recognized on this system.”

Workaround: Restart the controller accessed.

Issue: A failed drive becomes available when the vdisk that it was a member of is deleted.

Workaround: Do not use the drive.

Issue: The array will incorrectly accept a DNS name for the address of the NTP server in RAIDar. The array does not use DNS, and will translate the name into an invalid “255.255.255.255” IP address.

Workaround: Instead of a network name, enter the NTP server IP address.

Issue: When using the CLI, the array incorrectly accepts a DNS name for the addresses of the SNMP, SMTP, and NTP servers. The array does not use DNS, and will not be able to connect to the server correctly.

Workaround: Instead of network names, enter the IP addresses for the servers.

Issue: RHEL 4.8 may not discover all multipath devices and partitions during boot or reboot.

Workaround: This issue is addressed by applying the updated device-mapper-multipath package described in Red Hat Bug Fix Advisory RHBA-2009:1524-1, available at http://rhn.redhat.com/errata/RHBA-2009-1524.html.

Issue: RHEL 5.4 may not discover all multipath devices and partitions during boot or reboot.

Workaround: This issue is addressed by applying the updated device-mapper-multipath packages described in Red Hat Bug Fix Advisory RHBA-2010:0255-1, available at https://rhn.redhat.com/errata/RHBA-2010-0255.html.

Issue: Under rare circumstances, some events from one controller are not seen on the other controller.

Workaround: Review the events from both controllers.

Issue: During a firmware upgrade, the firmware bundle version may show incorrectly.

Workaround: Wait until the firmware upgrade process is complete before checking the firmware bundle version.

Issue: JavaScript issues are seen when using Microsoft Internet Explorer in multi-byte language locales, resulting in truncated messaging and hung pop-up windows. These issues will be resolved in a future firmware release.

Workaround: This is a display problem only. When a pop-up window remains on screen with no update for a prolonged period, close and then re-open the browser. The Internet Explorer English locale and the Firefox browser will not exhibit the issues.

Issue: SLES 11 may require several minutes (approximately 15) to create all multipath devices during boot. This typically involves a system with a large number of LUNs and multiple LUN paths.

Workaround: None. Wait for the system to complete LUN and path discovery.

Issue: SLES 11 SP1 may not create all devices during boot. This typically involves a system with a large number of LUNs, multiple LUN paths, and the SLES 11 SP1 open-iscsi utilities.

Workaround: Do one of the following:

Install the following Novell patches:
- kpartx-0.4.8-40.23.1.x86_64.rpm
- libvolume_id1-128-13.11.4.x86_64.rpm
- multipath-tools-0.4.8-40.23.1.x86_64.rpm
- open-iscsi-2.0.871-0.22.1.x86_64.rpm
- udev-128-13.11.4.x86_64.rpm
Run the /sbin/multipath -v 0 command to force multipathd to rescan all LUNs and paths and create any devices that were not correctly created before.

Issue: In rare conditions, the array controller may report that a supported 10GbE SFP+, 10GbE Copper Cable, or 10GbE Direct Attach Cable is unsupported. This condition is most likely to occur when an SFP+, Copper Cable, or Direct Attach Cable is hot-plugged into the controller while the controller is running. When this occurs, the following Warning message will be recorded in the event logs: “An unsupported cable or SFP was inserted”. At the same time, the host port will not show a status of “Down”.

Workaround: Do the following:

Verify that the SFP+, Copper Cable, or Direct Attach Cable is a supported component.
If the components are supported, remove and reinsert the SFP+, Copper Cable, or Direct Attach Cable.
If this does not correct the issue with the SFP+, Copper Cable, or Direct Attach Cable connected to the controller, either remove and reinsert the controller or power down and reapply power to the array.

Issue: USB CLI becomes unusable after a Management Controller reboot in Windows environments.

Workaround:

Close the terminal application (for example, HyperTerminal).
Open Device Manager and disable the “Disk Array USB Port” under Ports (COM & LPT).
Re-enable the “Disk Array USB Port”. If the problem persists, reboot the host.

Issue: The mini-USB CLI port on the array controller does not work.

Workaround: Install a device driver for the mini-USB CLI port. This driver is shipped on CD with the system and is also available for download on the Dot Hill Customer Resource Center.

Issue: When using SMI-S, the ExtentStatus property in StorageVolume is always set to “2”.

Workaround: None.

Issue: There is no indication that a LUN has failed over to the other controller.

Workaround: Using RAIDar, open up system events and scan for failover events. When using the CLI, use the show events command.

Issue: When accessing the Command Line Interface (CLI), output from commands scrolls by with no option to page through the output.

Workaround: Change the window height or scrollback value of the command window to a larger value and use the scroll bar on the command window to scroll through the CLI output.

Issue: A replication is initiated, but only a snapshot on the primary volume occurs, or the replication is queued.

Workaround: Ensure that all systems involved have valid replication licenses installed and that all volumes and vdisks involved in the replication have started, are attached, and are in good health, including vdisks that contain the snap pools for the volumes involved. A replication normally queues when a previous replication involving the same volumes is active.

Issue: A replication set was deleted, but is later shown with the primary volume status of “Offline” and the status-reason is record-missing.

Workaround: This generally occurs when the secondary volume was detached and its vdisk stopped when the replication set was deleted, and the vdisk of the secondary volume was then restarted. To correct this issue, reattach the secondary volume, set it as the primary volume, and delete the replication set.

Issue: An error message indicating “controller busy” occurs while creating a replication set.

Workaround: Creating a replication set immediately after creating another replication set may result in a “controller busy” error. This is expected behavior. Wait and try the operation again later.

Issue: In RAIDar, the Vdisk > Provisioning > Create Multiple Snapshots task allows a secondary volume to be selected, but fails the operation.

Workaround: User-initiated snapshots are not allowed on secondary volumes. Do not select a secondary volume.

Issue: A scheduled replication is missing or replications are queued, but don’t complete.

Workaround: A best practice is to schedule no more than four volumes to start replicating at the same time and for those replications to recur no less than 30 minutes apart. If you schedule more replications to start at the same time or schedule replications to start more frequently, some scheduled replications may not have time to complete.
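The scheduling guidance above can be checked before schedules are created. The sketch below is illustrative only: the schedule representation (volume name, first start time, recurrence interval) and the function name are hypothetical, and it compares only the first start times, not later recurrences that might coincide.

```python
from collections import Counter
from datetime import datetime, timedelta

MAX_SIMULTANEOUS_STARTS = 4              # from the best practice above
MIN_RECURRENCE = timedelta(minutes=30)   # from the best practice above


def check_replication_schedules(schedules):
    """Return a list of human-readable problems; empty if none are found.

    schedules: iterable of (volume_name, start: datetime, interval: timedelta).
    """
    schedules = list(schedules)
    problems = []

    # Count how many volumes share each first start time.
    starts = Counter(start for _, start, _ in schedules)
    for start, count in sorted(starts.items()):
        if count > MAX_SIMULTANEOUS_STARTS:
            problems.append(
                f"{count} volumes start at {start:%Y-%m-%d %H:%M}; "
                f"the suggested limit is {MAX_SIMULTANEOUS_STARTS}"
            )

    # Flag any volume that recurs more often than every 30 minutes.
    for volume, _, interval in schedules:
        if interval < MIN_RECURRENCE:
            problems.append(
                f"{volume} recurs every {interval}; "
                f"the suggested minimum is {MIN_RECURRENCE}"
            )
    return problems
```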

Issue: Unable to perform a local replication (a replication where the external view volume and the destination volume reside on the same system) with a single vdisk.

Workaround: Create a second vdisk for the destination volume.

Issue: Deleting the replication-set from the destination system fails.

Workaround: Delete the replication set from the source system (the system where the external view volume resides).

Issue: A replication set is missing the primary volume and the replication set cannot be deleted.

Workaround: Set the primary volume to the remaining volume in the set. You should then be able to delete the replication set.

Issue: On rare occasions, deleting a vdisk when volumes are in the process of rolling back may cause communications issues between the management controller and the storage controller.

Workaround: Cycle power on the array to resolve the issue. To avoid this situation, allow the rollbacks to complete or delete the volumes before deleting the vdisk.

Issue: Scheduled tasks are not occurring, and there is no indication of a problem in the schedules or the tasks.

Workaround: Restart both management controllers (MC) of the array(s) involved in the tasks.

Issue: Cannot schedule volume copy operations, or scheduled volume copy operations for snapshots and standard volumes do not occur.

Workaround: Perform the volume copy manually. Scheduled volume copies of master volumes should complete successfully, if the schedule permits completion of the volume copy before the next occurrence.

Issue: Debug logs are incomplete.

Workaround: Determine whether the logs are complete by unzipping the log file retrieved from the array and examining the end of the store_YYYY_MM_DD__HH_MM_SS.logs file for the lines: End of Data ]]></LOG_CONTENT></RESPONSE>. If the file contains these two lines at the end, it is complete and you can forward it to your service support organization for analysis. If it does not, it is incomplete and may not be useful. In this case, repeat the log collection process after a 5-minute delay. If the second collection contains these lines at the end of the file, send it to your service support organization for analysis along with the first set of logs. If the second file also lacks these lines, reboot the system and try once more to collect the logs. Be sure to send all of the collected logs to your service support organization with a brief note explaining the actions you took and the result.
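The end-of-file check can be automated with a short script. This sketch assumes the two closing lines are "End of Data" followed by "]]></LOG_CONTENT></RESPONSE>", and that trailing blank lines and surrounding whitespace can be ignored. Both points are assumptions about the log layout, and the function name is hypothetical.

```python
from pathlib import Path

# Assumed split of the closing marker into its two lines.
EXPECTED_TAIL = ["End of Data", "]]></LOG_CONTENT></RESPONSE>"]


def log_is_complete(path):
    """Return True if the last two non-blank lines match the closing marker."""
    text = Path(path).read_text(errors="replace")
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[-2:] == EXPECTED_TAIL
```

Run it against the unzipped store_YYYY_MM_DD__HH_MM_SS.logs file. A False result corresponds to the "incomplete" case, in which the collection should be repeated as described above.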

Issue: In a dual controller system, login to one of the controllers fails, but login to the other controller succeeds.

Workaround: Log in to the other controller and restart the inaccessible management controller using the CLI restart mc command or the RAIDar Tools > Shut Down or Restart Controller page.

Issue: IOPs and Bytes per second may be lower or higher than expected for the workload.

Workaround: This is a reporting issue and not a performance issue. The correct values can be calculated by using the change in the Number of Reads and Number of Writes over time to determine IOPs, and the change in Data Read and Data Written over time to determine Bytes per second.
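The calculation described in the workaround is a difference of two counter samples divided by the elapsed time. A minimal sketch follows. The dictionary keys are hypothetical stand-ins for the Number of Reads, Number of Writes, Data Read, and Data Written values, and the byte counters are assumed to be already expressed in bytes.

```python
def rates_from_counters(first, second, elapsed_seconds):
    """Return (iops, bytes_per_second) from two cumulative counter samples.

    first, second: dicts with keys "reads", "writes",
    "bytes_read", "bytes_written" (hypothetical names).
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")

    # Change in I/O counts over the interval gives IOPs.
    io_delta = (second["reads"] - first["reads"]) + \
               (second["writes"] - first["writes"])
    # Change in data transferred over the interval gives bytes per second.
    byte_delta = (second["bytes_read"] - first["bytes_read"]) + \
                 (second["bytes_written"] - first["bytes_written"])

    return io_delta / elapsed_seconds, byte_delta / elapsed_seconds
```

For example, if the two samples are taken 60 seconds apart and the counters advance by 600 reads, 300 writes, and 4,500,000 bytes in total, the result is 15 IOPs and 75,000 bytes per second.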

Issue: The array controller may interpret a switch login as an HBA login and erroneously present the switch port as a discovered host. This does not affect storage functionality.

Workaround: Either identify the erroneous host and do not attempt to use it, or disable Device Scan on switch ports connected to the array controller and restart the array controller.

Issue: The FC link is unstable when the link speed is forced to 4Gb with an Emulex Corporation Helios-X LightPulse Fibre Channel Host Adapter.

Workaround: Use an 8Gb HBA or set link speed to Auto.

Issue: Controller firmware upgrade requires a system reboot.

Workaround: Power cycle the storage system after the firmware upgrade is complete, as described in the Installation instructions above.

Issue: In a VMware environment, if no LUN 0 exists, host drivers might be unable to find a LUN to which they should have access.

Workaround: Use the CLI set advanced-settings command or RAIDar's Advanced Settings to set Missing LUN Response to illegal (Illegal Request).

..Supersedes history

Firmware version	Release date
TN230P008	September 2013
TN230P006	September 2012
TN230P005	June 2012
TN230R029	March 2012
TN230R028	January 2012

..Effective date

September 2013

Rev. A

Part number: 83-00006106-17-01