DiskShadow TM - Technical Manual
It is recommended that this technical description be read after the "DiskShadow 4.24
User Manual".
DiskShadow uses the QNX prefix adoption mechanism to create a DiskShadow directory.
All I/O requests that 'resolve' to this adopted directory will be sent to DiskShadow
by the operating system. These requests present themselves to the driver as messages
with structures defined in the /usr/include/sys/*_msg.h header files. A write request
to a file within the DiskShadow directory will cause a write to be performed to the
two files on the destination directories/disks.

QNX gives a process the ability to 'adopt' a path name in the file system. This is called a 'prefix'. System requests that refer to a prefix name are resolved against the prefix table maintained by the Process Manager (Proc); please refer to the Systems Administration and System Architecture manuals. The system request message is passed to the process that owns the prefix with the longest match. In the case of DiskShadow, these are File System related messages.
Names that already exist cannot be re-adopted. By default the DiskShadow driver adopts the
'/shadow' directory on the current node. Therefore a utility that refers to the file
'/shadow/test' (for example) will cause I/O messages to be sent to the DiskShadow driver.
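The longest-match rule can be illustrated with a small Python model. This is only a sketch of the idea, not Proc's implementation: the `resolve` function, the table contents and the choice to match on whole path components are assumptions made for illustration.

```python
def resolve(prefix_table, path):
    """Return (owner, prefix) for the longest adopted prefix matching path.

    prefix_table maps an adopted prefix to the name of the owning process.
    Returns None when no prefix matches.
    """
    best = None
    for prefix, owner in prefix_table.items():
        # Assumption: match whole path components only, so that '/shadow'
        # does not claim an unrelated name such as '/shadowx'.
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best[1]):
                best = (owner, prefix)
    return best


# Illustrative table: the file system owns '/', DiskShadow has adopted '/shadow'.
table = {"/": "Fsys", "/shadow": "DiskShadow"}
print(resolve(table, "/shadow/test"))
print(resolve(table, "/home/user"))
```

With this table, a request for '/shadow/test' goes to DiskShadow because '/shadow' is a longer match than '/', while '/home/user' falls through to the file system.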
In the case of a DiskShadow disk, the 'real' source or destination of data is two directories/disks. This causes a problem for some I/O functions such as iostat, where the status may be different for the two 'real' files.

Primary Path: Certain operations (such as stat) will only be performed on this path, as there is no way to return the status of both files (both should be the same anyway). All other operations (open/read/write, etc.) will be performed on both paths. Any errors that are returned for a request will be taken from the Primary path.

Secondary Path: This path is considered to be the shadow image of the first path. If an operation can be performed on the Primary path but not on the Secondary, a warning will be issued to inform the user that the data being DiskShadowed may not be as secure as the user intended. This problem should only occur in abnormal circumstances: when the Secondary disk is full, or when the Secondary directory structure doesn't match the Primary's structure.

As a performance improvement, DiskShadow will perform reads from the local node; the
remote node's file position will be updated to match the read operation.
To be sure of the integrity of the DiskShadowed file data, checks are done following most of these I/O operations to determine if the action returned the same result from both disks. If an operation resulted in two different replies or error returns then an internal 'insecure data' flag is set. When the 'insecure' file is closed a message reporting the possible problem is output on stderr or to the active console. There are no other tests or indications for the security or otherwise of a file. For normal operation and recovery situations the Primary and Secondary should not get out of step.
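The check described above can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not DiskShadow's code: the `ShadowedFile` class and its `_both` helper are hypothetical names, and ordinary local files stand in for the Primary and Secondary paths.

```python
import os


class ShadowedFile:
    """Applies each operation to two files and flags differing results."""

    def __init__(self, primary_path, secondary_path):
        self.primary = open(primary_path, "w+b")
        self.secondary = open(secondary_path, "w+b")
        self.insecure = False  # internal 'insecure data' flag

    def _both(self, op):
        # Run the operation on the Primary first, then the Secondary,
        # recording either the return value or the error number.
        results = []
        for f in (self.primary, self.secondary):
            try:
                results.append(("ok", op(f)))
            except OSError as exc:
                results.append(("err", exc.errno))
        if results[0] != results[1]:
            self.insecure = True
        # The reply (or error) handed back is taken from the Primary.
        status, value = results[0]
        if status == "err":
            raise OSError(value, os.strerror(value))
        return value

    def write(self, data):
        return self._both(lambda f: f.write(data))

    def close(self):
        for f in (self.primary, self.secondary):
            f.close()
        # The caller would report a possible problem if this is True.
        return self.insecure
```

The flag is only inspected at close time, which mirrors the behaviour described above: the operation itself still succeeds or fails according to the Primary.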
If DiskShadow is started WITHOUT any of the three update options then there is a
possibility that files will not match exactly. Reading or other operations on these
files will probably result in an 'insecure data' message. Manually changing the state
of files in the Primary or Secondary directory may also result in 'insecure' files.
Since all file operations are being performed on two disks, it is important that the two directory structures match. Any creation of directories by DiskShadow will cause them to be created on both disks so they will match. At startup, the DiskShadowing software may optionally update the Secondary disk from the Primary. Three modes of update are provided. These modes are used in :

Multiple Processes and Files

DiskShadow conforms to the 'namespace' prefixing mechanism described in the QNX 'System Architecture' manual (Chapter 4). The manual refers to the differences between file descriptor namespace and pathname filespace. The 'Open Control Block' (OCB) structure discussed there is implemented in this driver. This allows multiple processes to open the same file twice and make calls such as lseek without affecting each other.

DiskShadow creates a new OCB for each open request by a process. Each OCB is 'linked
to' by a link block. If the process duplicates its file descriptors then a new link block
will be created and linked to the existing OCB. OCBs are keyed by the process id and the
process's file descriptor.
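The relationship between OCBs and link blocks might be modelled as below. This is an illustrative Python sketch, not the driver's data structures: the `OCB` and `OpenTable` names and their fields are assumptions, and a dictionary entry keyed by (pid, fd) plays the role of a link block.

```python
from dataclasses import dataclass


@dataclass
class OCB:
    """Open Control Block: per-open state such as the file position."""
    path: str
    offset: int = 0
    links: int = 0  # number of link blocks pointing at this OCB


class OpenTable:
    def __init__(self):
        self.links = {}  # (pid, fd) -> OCB; each entry acts as a link block

    def open(self, pid, fd, path):
        # Every open request gets its own OCB, even for the same path.
        ocb = OCB(path, links=1)
        self.links[(pid, fd)] = ocb
        return ocb

    def dup(self, pid, fd, new_fd):
        # Duplicating a descriptor adds a link block to the existing OCB.
        ocb = self.links[(pid, fd)]
        ocb.links += 1
        self.links[(pid, new_fd)] = ocb
        return ocb

    def lseek(self, pid, fd, offset):
        self.links[(pid, fd)].offset = offset
        return offset

    def close(self, pid, fd):
        # Returns True when the last link is gone and the OCB can be freed.
        ocb = self.links.pop((pid, fd))
        ocb.links -= 1
        return ocb.links == 0
```

Two processes that open the same file get separate OCBs, so an lseek by one does not move the other's position, whereas a duplicated descriptor shares the position of the descriptor it was copied from.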
DiskShadow not only mirrors to two directories, it also provides the means to manage those directories. By using DiskCtrl, the user of DiskShadow may declare a directory (Primary or Secondary) unavailable and disable its use. DiskShadow will then access only the remaining open directory. If the user attempts to close a directory when a directory is already closed then an error is returned to DiskCtrl.
The user may nominate a new disk or directory to replace the disabled one or re-open the
previous directory and update it (See DiskCtrl -N). When opening (enabling) a new disk or
directory the user may specify the method to be used to update that directory. The update
methods available are:
These activities take place without substantially disturbing the normal operations of file access through DiskShadow. Delays will be experienced while the new directory is being updated. DiskShadow will re-create files according to the update method chosen. Files that are open when the request is sent will be created and opened to exactly the same position as that in the 'other' path.
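The three update methods, as characterised by the descriptions of the -L0, -L1 and -L2 startup options elsewhere in this manual, can be sketched as a planning step that decides which files to remove and which to copy. The Python below is a hypothetical model only: `plan_update` and its dictionary inputs (file name mapped to modification time) are assumptions, not DiskShadow's actual algorithm.

```python
def plan_update(mode, primary, secondary):
    """Return (to_remove, to_copy) for updating the Secondary from the Primary.

    primary and secondary map a file name to its modification time.
    mode 0: copy files that are missing from, or older on, the Secondary.
    mode 1: always copy every Primary file.
    mode 2: remove the Secondary's files, then copy every Primary file.
    """
    if mode == 0:
        to_remove = []
        to_copy = [name for name, mtime in primary.items()
                   if name not in secondary or secondary[name] < mtime]
    elif mode == 1:
        to_remove = []
        to_copy = list(primary)
    elif mode == 2:
        to_remove = list(secondary)
        to_copy = list(primary)
    else:
        raise ValueError("update mode must be 0, 1 or 2")
    return sorted(to_remove), sorted(to_copy)


# Illustrative data: 'a' is up to date, 'b' is stale, 'c' is missing,
# and 'old' exists only on the Secondary.
pri = {"a": 10, "b": 20, "c": 30}
sec = {"a": 10, "b": 5, "old": 1}
for mode in (0, 1, 2):
    print(mode, plan_update(mode, pri, sec))
```

Note that in this model only mode 2 touches files that exist solely on the Secondary.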
DiskShadow may be started with either the Primary or Secondary disabled.
DiskShadow is issued with a serial number and a license number. The serial number identifies the customer's version of this software. The license number defines the maximum number of nodes on which DiskShadow may simultaneously run. As described in the User Manual, DiskShadow's use depends on your acceptance of the licensing and disclaimer conditions.
Each copy of DiskShadow is issued with an inbuilt license number. Additional licenses are
added by appending new license records to the license file ("/.ds_license"). This file
resides in the root directory and is scanned by DiskShadow at startup to determine the
total number of licenses and the customer's acceptance of the license. If DiskShadow is
being run without a valid root directory (DiskShadow will adopt the root) then it will
look for the license file in the Primary directory. In this case the file must be copied
to the Primary directory.
DiskShadow is able to detect and recover from either Primary or Secondary network paths that are lost due to network failure. If the "-a" input parameter is specified then DiskShadow will test the return of each I/O request to see if the network has failed. If an EHOSTUNREACH error is received then that path is disabled and the error is reported to stdout / history or the console. From then on DiskShadow will test the failed path (every -T nn interval) to determine whether it has become usable again. If 'notification' has been requested by a process then that proxy is triggered. No significant loss of performance should be noticed.

Once the failed node becomes accessible, DiskShadow will attempt to recover that node and directory using the update method specified with the "-L" input parameter (0 | 1 | 2).
During the restore process DiskShadowing will be 'stalled'. DiskShadow will report the
progress of the recovery on the history log. Apart from the time taken there will be
little or no indication that a recovery is in progress. If 'notification' has been
requested by a process then that proxy is triggered.
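The detect, retry and recover cycle described above amounts to a small state machine. The Python sketch below is a hedged illustration, not DiskShadow's implementation: `PathMonitor`, its `probe` and `rebuild` callables and the `events` list (standing in for the history log and proxy trigger) are all hypothetical.

```python
import errno


class PathMonitor:
    """Tracks one path (Primary or Secondary) across network failures."""

    def __init__(self, retry_interval, probe, rebuild):
        self.retry_interval = retry_interval  # seconds, as given with -T
        self.probe = probe                    # callable: True if path is reachable
        self.rebuild = rebuild                # callable: runs the -L update
        self.enabled = True
        self.next_probe = None
        self.events = []                      # stand-in for history log / proxy

    def on_io_result(self, err, now):
        # Called with the error number of each completed I/O request.
        if self.enabled and err == errno.EHOSTUNREACH:
            self.enabled = False
            self.next_probe = now + self.retry_interval
            self.events.append("failed")

    def tick(self, now):
        # Called periodically; re-tests a failed path once the interval expires.
        if self.enabled or now < self.next_probe:
            return
        if self.probe():
            self.rebuild()
            self.enabled = True
            self.events.append("recovered")
        else:
            self.next_probe = now + self.retry_interval
```

Passing the current time in as `now` keeps the sketch deterministic; a real driver would use its own timer.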
A client process may send messages directly to DiskShadow requesting changes in status and also requesting a status structure in reply. In this way the user can 'poll' DiskShadow to detect status changes or to monitor its current state. A user who doesn't wish to 'poll' may use DiskShadow's ability to 'notify' client processes whenever there is a significant change in the status of any node (failed or recovered). The user provides a proxy for DiskShadow to trigger and sends a message to DiskShadow containing the proxy pid.
Sample code is included in the release software set that 1) 'notifies' DiskShadow then
waits and 2) 'polls' DiskShadow every second and displays the returned data. The
structures used are contained in the header file DiskCtrl.h (also included in the release
file set).
Status- DiskShadow V4.24A SN100/32 10/11/98
Debug        Y
Enabled      Y
Pri Enabled  Y
Sec Enabled  Y
Pri Net ok?  Y
Sec Net ok?  Y
Strict       N
AutoRebuild  Y
Retry Time   010
Similarly, a history file may be enabled. Normally DiskShadow V4.24A will log errors such
as "insecure file" to stderr. Each message is preceded by a unique error number (e.g.
00020 [primary_filename][secondary filename][error]). You may then write your own monitoring
process to read the history file and take appropriate action based on the error code and
filenames in the error message.
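A monitoring process of the kind suggested above could start from a parser like the following Python sketch. It assumes that each history line looks exactly like the example (a numeric code, a space, then three fields in literal square brackets); the real history file layout may differ, and `parse_history_line` and the sample line are hypothetical.

```python
import re

# Assumed layout: "<code> [primary][secondary][error]"
_HISTORY_LINE = re.compile(
    r"^(?P<code>\d+)\s+"
    r"\[(?P<primary>[^\]]*)\]"
    r"\[(?P<secondary>[^\]]*)\]"
    r"\[(?P<error>[^\]]*)\]\s*$"
)


def parse_history_line(line):
    """Return a dict with code, primary, secondary and error, or None."""
    match = _HISTORY_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["code"] = int(fields["code"])
    return fields


# Made-up sample line, for illustration only.
sample = "00020 [/dir1/data.db][//2/dir2/data.db][insecure file]"
print(parse_history_line(sample))
```

Lines that do not fit the assumed layout yield None, so a monitor can skip them rather than fail.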
Pipes are not supported by DiskShadow. It is recommended that the user invoke the QNX pipe
utility using 'Pipe &' in the sysinit file. Support for pipes may be added in future, but
for the present there is little or no need for DiskShadow to deal with them.
Fifos are also not supported by DiskShadow.
This software requires the following to run :
* Report on memory usage and open file status.
* Enabling and disabling of startup options such as verbose, strictness, Primary or Secondary directories.
* Applications may be notified of significant DiskShadow status changes via proxies.
* Open or Close a new Primary or Secondary while specifying the update mode.
* Cleanly terminate the DiskShadow process.

DiskShadow Control/Report Utility - DiskCtrl
DiskCtrl allows you to control the behavior of DiskShadow via a number of parameters.
One of the more useful scenarios for using these control variables is for on-line backup.
The procedure for this would be as follows:
Note that while the update/sync is occurring, all requests to DiskShadow will be stalled.
When the update is finished all pending requests will then be processed. Thus a short down
time will be experienced.
DiskShadow /dir1 //2/dir2 &
Creates a '/shadow' directory and uses /dir1 and //2/dir2 for the Primary and Secondary drives. /dir1 and //2/dir2 will still be available as "unshadowed" directories. No update will be done. Operations done on the "/shadow" directory will be copied (shadowed) in the /dir1 and //2/dir2 directories.

DiskShadow -m /alice //3/qny //4/qny &
Creates the directory "/alice" which will DiskShadow the two following paths. 'cp's of
files to /alice will result in the files being created on //3/qny AND //4/qny. Note that
in this example the two sub-directories are on remote nodes. The DiskShadow directory
"/alice" can only exist on the same node as the running DiskShadow process.

DiskShadow -D -L0 //2/work //3/work &
Will create '/shadow' on the current node. The "-D" option means that any errors will be displayed on the active console using the display_msg function. The startup option '-L0' will cause any files that are older or not existing on the Secondary drive to be copied from the Primary drive. See DiskShadow startup options for details of the -L parameters.

DiskShadow -D -L1 //2/work //3/work &
This example is similar to the previous one, except that //2/work files and directories will always be copied to //3/work. Note: '-L2' will cause the files in //3/work to be removed then copied from //2/work.
The following sequence may be used to perform a non-invasive, on-line backup of files.
NON-DiskShadowed File Test Results

sin times
SID   PID PROGRAM      PRI   UTIME   STIME  CUTIME  CSTIME
  0     1 sys/Proc32   30f  16.380   9.626   1.070   0.350
  0     4 /bin/Fsys    22r  48.210  21.412   0.000   0.000

Disk Tester : standard file
File created in 0 seconds, 500 writes of 512 bytes
File read sequentially in 0 seconds, 500 reads of 512 bytes
File read randomly in 1 seconds, 500 reads of 512 bytes
File butterfly read in 3 seconds, 500 reads of 512 bytes
File rewritten in 0 seconds, 500 writes of 512 bytes
File rewritten randomly in 0 seconds, 500 writes of 512 bytes
File butterfly rewrite in 1 seconds, 500 writes of 512 bytes
Test time : 6.47s real 0.12s user 0.06s system

Disk Tester : DiskShadowed file
File created in 3 seconds, 500 writes of 512 bytes
File read sequentially in 0 seconds, 500 reads of 512 bytes
File read randomly in 1 seconds, 500 reads of 512 bytes
File butterfly read in 0 seconds, 500 reads of 512 bytes
File rewritten in 1 seconds, 500 writes of 512 bytes
File rewritten randomly in 0 seconds, 500 writes of 512 bytes
File butterfly write in 0 seconds, 500 writes of 512 bytes
Test time : 7.40s real 0.11s user 0.06s system

Disk Tester : shared file
Sync flag set
File created in 34 seconds, 500 writes of 512 bytes
File read sequentially in 8 seconds, 500 reads of 512 bytes
File read randomly in 13 seconds, 500 reads of 512 bytes
File butterfly read in 12 seconds, 500 reads of 512 bytes
File rewritten in 29 seconds, 500 writes of 512 bytes
File rewritten randomly in 28 seconds, 500 writes of 512 bytes
File butterfly write in 29 seconds, 500 writes of 512 bytes
Test time : 154.68s real 0.05s user 0.09s system

sin times
SID   PID PROGRAM      PRI   UTIME   STIME  CUTIME  CSTIME
  0     1 sys/Proc32   30f  16.430   9.676   1.070   0.350
  0     4 /bin/Fsys    22r  52.330  22.502   0.000   0.000

DiskShadowed File Test Results

sin times
SID   PID PROGRAM      PRI   UTIME   STIME  CUTIME  CSTIME
  0     1 sys/Proc32   30f  16.620   9.756   1.070   0.350
  0     4 /bin/Fsys    22r  53.210  23.212   0.000   0.000
  1   588 DiskShadow   10o   1.200   0.870   0.000   0.000

Disk Tester : standard file
File created in 1 seconds, 250 writes of 512 bytes
File read sequentially in 0 seconds, 250 reads of 512 bytes
File read randomly in 2 seconds, 250 reads of 512 bytes
File butterfly read in 0 seconds, 250 reads of 512 bytes
File rewritten in 1 seconds, 250 writes of 512 bytes
File rewritten randomly in 0 seconds, 250 writes of 512 bytes
File butterfly rewrite in 0 seconds, 250 writes of 512 bytes
Test time : 5.84s real 0.07s user 0.04s system

Disk Tester : shared file
File created in 0 seconds, 250 writes of 512 bytes
File read sequentially in 1 seconds, 250 reads of 512 bytes
File read randomly in 1 seconds, 250 reads of 512 bytes
File butterfly read in 1 seconds, 250 reads of 512 bytes
File rewritten in 1 seconds, 250 writes of 512 bytes
File rewritten randomly in 0 seconds, 250 writes of 512 bytes
File butterfly write in 0 seconds, 250 writes of 512 bytes
Test time : 5.87s real 0.01s user 0.09s system

Disk Tester : shared file
Sync flag set
File created in 35 seconds, 250 writes of 512 bytes
File read sequentially in 9 seconds, 250 reads of 512 bytes
File read randomly in 13 seconds, 250 reads of 512 bytes
File butterfly read in 14 seconds, 250 reads of 512 bytes
File rewritten in 29 seconds, 250 writes of 512 bytes
File rewritten randomly in 29 seconds, 250 writes of 512 bytes
File butterfly write in 28 seconds, 250 writes of 512 bytes
Test time : 158.58s real 0.06s user 0.05s system

sin times
SID   PID PROGRAM      PRI   UTIME   STIME  CUTIME  CSTIME
  0     1 sys/Proc32   30f  17.130  10.136   1.070   0.350
  0     4 /bin/Fsys    22r  56.880  24.552   0.000   0.000
  1   588 DiskShadow   10o   1.740   1.240   0.000   0.000

DiskShadowed Big File write/read test

Test Conditions: DiskShadow running in Node 2.
Node 2: 486 33MHz, 8Mb memory.
Node 1: 586 133MHz, 8Mb memory.
Network is 10Mb Ethernet, Ne1000 (node 1) and wd8003 (node 2).

The following tests were intended to provide some performance comparison between not using DiskShadow and using it over a network. The test involves :
5 times :- open a file (create it),
do 5 writes of 65520 bytes to the file, then
read all 5 x 65520 bytes back from the file and test the contents.
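The test procedure above can be approximated by the following Python sketch. It is not the original test program: `big_file_test`, the fill pattern and the use of a temporary file are assumptions, and the timing it prints is not comparable with the figures in this manual.

```python
import os
import tempfile
import time

CHUNK = 65520   # bytes per write, as in the test description
WRITES = 5      # writes per pass
PASSES = 5      # number of create/write/read passes


def big_file_test(path):
    """Create the file, write it, read it back and verify; True on success."""
    pattern = bytes(i % 251 for i in range(CHUNK))  # arbitrary fill pattern
    for _ in range(PASSES):
        with open(path, "wb") as f:       # open (create) the file
            for _ in range(WRITES):
                f.write(pattern)
        with open(path, "rb") as f:       # read everything back and check it
            for _ in range(WRITES):
                if f.read(CHUNK) != pattern:
                    return False
            if f.read(1) != b"":          # no unexpected trailing data
                return False
    return True


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "bigfile.dat")
    start = time.perf_counter()
    ok = big_file_test(target)
    print("passed" if ok else "FAILED", f"{time.perf_counter() - start:.2f}s real")
```

To compare a direct path with a shadowed one, the same function would be pointed first at a file on an ordinary directory and then at a file under the DiskShadow directory.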
Direct to a disk
3.03s real 1.54s user 0.00s system
3.91s real 1.54s user 0.01s system
3.12s real 1.54s user 0.01s system
3.51s real 1.47s user 0.01s system
3.85s real 1.44s user 0.02s system
2.81s real 1.45s user 0.01s system
3.95s real 1.43s user 0.00s system
3.22s real 1.45s user 0.00s system
2.92s real 1.56s user 0.01s system
3.53s real 1.59s user 0.00s system

Both Primary and Secondary are Remote.
9.02s real 1.45s user 0.04s system
9.14s real 1.50s user 0.01s system
9.10s real 1.51s user 0.01s system
9.08s real 1.49s user 0.02s system
9.08s real 1.48s user 0.03s system
9.12s real 1.52s user 0.05s system
9.08s real 1.48s user 0.03s system
9.08s real 1.54s user 0.00s system
9.16s real 1.48s user 0.03s system
9.08s real 1.48s user 0.03s system

Primary is Remote
6.43s real 1.53s user 0.01s system
6.42s real 1.50s user 0.05s system
6.49s real 1.51s user 0.01s system
6.17s real 1.54s user 0.04s system
6.34s real 1.47s user 0.03s system
6.40s real 1.49s user 0.01s system
6.49s real 1.50s user 0.00s system
6.35s real 1.54s user 0.02s system
6.39s real 1.57s user 0.03s system
6.31s real 1.45s user 0.01s system

Secondary is Remote
6.71s real 1.51s user 0.01s system
6.47s real 1.48s user 0.02s system
6.34s real 1.51s user 0.05s system
6.31s real 1.53s user 0.02s system
6.37s real 1.54s user 0.02s system
6.48s real 1.54s user 0.00s system
6.58s real 1.49s user 0.00s system
6.53s real 1.55s user 0.01s system
6.35s real 1.50s user 0.02s system
6.34s real 1.58s user 0.04s system

Primary and Secondary are Local.
5.64s real 1.52s user 0.01s system
6.29s real 1.54s user 0.03s system
6.30s real 1.50s user 0.01s system
6.22s real 1.55s user 0.02s system
5.95s real 1.53s user 0.02s system
6.13s real 1.64s user 0.00s system
6.22s real 1.54s user 0.02s system
5.92s real 1.53s user 0.02s system
6.02s real 1.60s user 0.01s system
6.35s real 1.54s user 0.02s system

Copyright 1997-1999 Symmetry Innovations Pty Ltd