Dynamic Queue Handling Of SCSI Devices
This project aims to design and implement an adaptive and efficient algorithm that prevents requests sent to SCSI disks (targets) in a SAN (storage area network) from being aborted. Every SCSI device has a queue that stores the requests (commands) generated by initiators. When an initiator pushes N commands to the target simultaneously, the time needed to serve all of them can exceed the timeout of an individual command. The commands at the tail of the queue then cannot be served before their timeout expires, so the initiator decides that they are stuck on the target and aborts them, which severely reduces SAN performance in seek-intensive workloads. A better approach is to increase or decrease the queue depth on the target dynamically, depending on how slow or fast the backing storage is compared with the target link. We designed an algorithm that uses a small set of parameters and rules to maintain the queue depth dynamically. The algorithm aims to serve all commands without aborting them, while avoiding the extra CPU and memory cost of a permanently large queue, and so to make Linux a stronger storage OS.
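To make the timeout argument concrete, the following Python sketch models the simplest case under stated assumptions: the back storage serves queued commands one at a time, every command takes the same service time, and all N commands are queued at once. The function names and the 30 s timeout and 150 ms service time are hypothetical illustration values, not measurements.

```python
def tail_completion_ms(n_commands: int, service_ms: int) -> int:
    """Time until the last of n queued commands completes, assuming the back
    storage serves them one at a time with a constant service time each."""
    return n_commands * service_ms


def max_safe_depth(timeout_ms: int, service_ms: int) -> int:
    """Largest queue depth whose last command still completes within the
    timeout, under the same serialized-service assumption."""
    return timeout_ms // service_ms


if __name__ == "__main__":
    timeout_ms = 30_000  # hypothetical command timeout (30 s)
    service_ms = 150     # hypothetical seek-bound service time per command
    print(tail_completion_ms(256, service_ms))     # 38400: exceeds the timeout
    print(max_safe_depth(timeout_ms, service_ms))  # 200
```

Under this model the 256th command completes after 38.4 s and times out, while a depth of 200 keeps every command within the timeout. Real service times vary with seek distance and load, which is why the depth has to be adapted at run time rather than computed once.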
Users request data through read and write operations, which are translated into SCSI commands. These SCSI commands are exchanged between the initiator and the target.
A SCSI initiator is a host device that can initiate SCSI operations, i.e. send SCSI commands.
A SCSI target is a device that responds to SCSI commands sent by initiators. Hard drives, tape drives, printers, and scanners typically act as SCSI target devices.
SCST is a SCSI target mid-level subsystem for Linux. It is an interface between SCSI target drivers and the Linux kernel.
Multiple SCSI commands arriving at the target device at the same time are stored in a queue. One queue, called the device queue, is maintained per physical device. The default queue depth is generally 256. This default depth is not enough in large networks where multiple initiators are active: when the number of commands on a single device exceeds the default queue depth, further commands are aborted. Simply increasing the queue depth is not an efficient solution, because it increases CPU load and memory requirements. This problem becomes a bottleneck in commercial storage network deployments. It motivated us to design and implement an adaptive algorithm that we expect to be a milestone for the commercial use of SCST.
- Current Work in this Area
Some schemes have been developed to handle queue full conditions. In one simple scheme, the initiator and target agree on a fixed queue depth, so the number of commands sent by the initiator never exceeds that depth. SCST implements this scheme as a session queue, which works only in a single-initiator environment. In multiple-initiator environments, one or more initiators accessing the target under consistently heavy load lead to low performance and starvation of the other initiators. Moreover, the device queue is shared among all initiators active on that device, so queue full conditions still occur when several initiators are active. These drawbacks emphasize the need for dynamic methods of managing the device queue.
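The following Python sketch illustrates why fixed per-session limits alone do not prevent QUEUE FULL on the shared device queue, assuming every initiator fills its session queue at the same moment. The function name and the session depth of 128 are hypothetical; 256 is the default device queue depth stated earlier.

```python
def excess_commands(session_depths: list[int], device_depth: int = 256) -> int:
    """Number of commands that do not fit in the shared device queue when
    every initiator fills its per-session queue at the same time."""
    return max(0, sum(session_depths) - device_depth)


if __name__ == "__main__":
    print(excess_commands([128]))      # 0: a single initiator fits
    print(excess_commands([128] * 4))  # 256: four initiators overflow the device queue
```

Each session individually respects its agreed depth, yet their sum is unbounded as initiators are added, so the device queue itself needs its own, load-dependent limit.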
- Details of the Design:
Platform: Linux, SCST (Generic SCSI target subsystem)
Design of the algorithm:
We have designed the following algorithm:
- Each SCST command has a timeout value, which is set by the corresponding dev handler. The SCST core should keep each device's queue depth at a level at which the worst command execution time stays safely below that timeout, so the worst execution time has to be monitored.
- P - load watch period, during which all statistics are gathered and processed.
- MN - underload ratio divisor; sets the underload portion of the timeout. If the longest execution time among all commands completed during period P is below timeout/MN, the corresponding device is considered underloaded.
- MX - overload ratio divisor; sets the overload portion of the timeout. If the longest execution time among all commands completed during period P is above timeout/MX, the corresponding device is considered overloaded.
- I - the queue size is increased by I if the device is considered underloaded.
- D - the queue size is decreased by D if the device is considered overloaded.
- QI - quick fall interval, the minimum time between two consecutive quick falls.
- Q - quick fall ratio divisor. If the execution time of a completed command is above timeout/Q and the time since the previous quick fall is greater than QI, the corresponding device is considered heavily overloaded. This is needed to handle sudden increases in the load on the device.
- QD - divisor by which the device's queue size is reduced if the device is considered heavily overloaded.
- max_exec_ratio - the worst command execution time observed on the device, expressed as a percentage of the command timeout.
- A command's execution time is measured from the moment the command is received until it is finished.
- If the number of commands on a device equals the device's queue depth, the device's queue_was_full flag is set to true.
- When a command finishes, the device's max_exec_ratio is set to the maximum of its current value and (execution time of the command * 100) / (timeout of the command).
- After a command finishes, if its execution time is above timeout/Q and the time since the latest quick fall is above QI:
- The device's queue_depth is set to max(1, queue_depth/QD).
- The flow control period is reset.
- A work item runs once every P seconds, checks max_exec_ratio, and then:
- If the device is neither underloaded nor overloaded, i.e. max_exec_ratio lies between the limits defined by MN and MX, do nothing.
- If the device was underloaded: if queue_was_full is false, do nothing; if queue_was_full is true, set the device's queue_depth to min(default queue depth, queue_depth + I).
- If the device was overloaded, set queue_depth to max(1, queue_depth - D).
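The rules above can be modelled in user space with the following Python sketch. It is not SCST kernel code: the class and function names, the default parameter values, and the choice to clear the per-period statistics after every periodic check are assumptions made for illustration. Execution times and the current time are passed in by the caller so the behaviour is deterministic. D is applied as a decrement and QD as a divisor, following their definitions in the parameter list.

```python
from dataclasses import dataclass


@dataclass
class Params:
    """Tuning parameters of the algorithm; all default values are hypothetical."""
    mn: float = 10.0          # MN: below timeout/MN -> underloaded (10 %)
    mx: float = 3.0           # MX: above timeout/MX -> overloaded (33.3 %)
    q: float = 2.0            # Q: above timeout/Q -> heavily overloaded (50 %)
    qi: float = 5.0           # QI: minimum seconds between two quick falls
    qd: int = 2               # QD: divisor applied on a quick fall
    inc: int = 4              # I: increment when underloaded
    dec: int = 8              # D: decrement when overloaded
    default_depth: int = 256  # upper bound for the queue depth


@dataclass
class DeviceState:
    queue_depth: int = 256
    max_exec_ratio: float = 0.0  # worst exec time this period, % of timeout
    queue_was_full: bool = False
    last_quick_fall: float = float("-inf")

    def reset_period(self) -> None:
        # Assumption: a period reset clears the statistics gathered so far.
        self.max_exec_ratio = 0.0
        self.queue_was_full = False


def note_queue_level(dev: DeviceState, commands_on_device: int) -> None:
    """Set queue_was_full when the device queue has reached its depth."""
    if commands_on_device >= dev.queue_depth:
        dev.queue_was_full = True


def on_command_finished(dev: DeviceState, p: Params, exec_time: float,
                        timeout: float, now: float) -> None:
    """Update statistics for a finished command and apply a quick fall if needed."""
    dev.max_exec_ratio = max(dev.max_exec_ratio, exec_time * 100.0 / timeout)
    if exec_time > timeout / p.q and now - dev.last_quick_fall > p.qi:
        dev.queue_depth = max(1, dev.queue_depth // p.qd)
        dev.last_quick_fall = now
        dev.reset_period()


def periodic_check(dev: DeviceState, p: Params) -> None:
    """Called once every P seconds."""
    if dev.max_exec_ratio < 100.0 / p.mn:      # underloaded
        if dev.queue_was_full:
            dev.queue_depth = min(p.default_depth, dev.queue_depth + p.inc)
    elif dev.max_exec_ratio > 100.0 / p.mx:    # overloaded
        dev.queue_depth = max(1, dev.queue_depth - p.dec)
    # Between the MN and MX limits: do nothing.
    dev.reset_period()  # assumption: statistics restart every period


if __name__ == "__main__":
    p = Params()
    dev = DeviceState(queue_depth=64)
    on_command_finished(dev, p, exec_time=20.0, timeout=30.0, now=100.0)
    print(dev.queue_depth)  # 32: quick fall, since 20 s > 30 s / Q
    on_command_finished(dev, p, exec_time=20.0, timeout=30.0, now=102.0)
    periodic_check(dev, p)
    print(dev.queue_depth)  # 24: within QI, so only the overload step applies
```

In the demo, a command that uses 20 s of a 30 s timeout triggers a quick fall from 64 to 32; a second slow command 2 s later falls within QI, so only the regular overload step (32 to 24) applies at the next period. Growth is gated on queue_was_full so that an idle device does not keep enlarging a queue nobody fills. In the kernel, periodic_check() would correspond to a delayed work item rather than a direct call.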
- Components from other projects:
We have used the following components:
On the target side: scst-126.96.36.199
iscsi-scst -188.8.131.52 (target driver)
On the initiator side: open-iscsi 2.0-871
Platform: Linux Vanilla Kernel-184.108.40.206
SCST is an interface between SCSI target drivers and the Linux kernel. It is designed to simplify target driver development and to make Linux compatible with various target drivers. It provides all the necessary functionality of the Linux SCSI midlayer, residing in the upper layer of the SCSI subsystem. It offers a simple command processing path that allows the maximum possible performance and scalability.
- Many operating systems, such as Sun Solaris, HP-UX, and Linux, offer kernel-based tuning parameters that are used to adjust the queue depth of SCSI devices. However, none of them offers a mechanism to adjust the queue depth dynamically.
- If the system has a combination of devices that support smaller and larger queue depths, the queue depth has to be set to a value that works for most devices. With the above algorithm, the queue depth of each device is adjusted individually, so every device works at its best without a compromise on queue depth.
- Setting the queue depth to a value larger than the device can handle will result in I/Os being held off once a QUEUE FULL condition exists on the disk.
Our algorithm provides a mechanism that lowers the queue depth of a device under overload, avoiding persistent QUEUE FULL conditions on that device. This is also beneficial for load balancing when the connected SCSI devices support smaller queue depths.
- Practical Application:
- If you have multiple active paths to a SCSI device (LUN), you might need to manage your device queue depths to maximize the device's performance. This is particularly true with dynamic multipathing applications, such as EMC's PowerPath, which allow all paths to a LUN to be in use simultaneously.
- In a SAN (storage area network), even in single-path or static multipath environments, managing device queue depths can be important to maximize the performance, throughput, and reliability of the storage device.
- The market demands longer-distance connectivity, greater peripheral connection capability, and dynamic configuration features. Today, SCSI is the technology of choice for the vast majority of server and high-performance PC environments. This algorithm is expected to improve SAN efficiency considerably and to improve the performance of SCSI storage devices.