DiskShadow TM - Technical Manual

It is recommended that this technical description be read after the "DiskShadow 4.24 User Manual".

Theory of Operation

DiskShadow uses the QNX prefix adoption mechanism to create a DiskShadow directory. All I/O requests that 'resolve' to this adopted directory are sent to DiskShadow by the operating system. These requests present themselves to the driver as messages with structures defined in the /usr/include/sys/*_msg.h header files. A write request to a file within the DiskShadow directory causes a write to be performed to the corresponding file in each of the two destination directories/disks.
The last two parameters in the DiskShadow driver's startup options are the disk/directory paths that will be used. The first is referred to as the 'primary' directory/drive and the second as the 'secondary'. Other startup options are available, and these provide the user with the tools to synchronize the two DiskShadow directories/disks and enable debugging.

Prefix Adoption

QNX gives a process the ability to 'adopt' a path name in the file system. This is called a 'prefix' and system requests that refer to a prefix name are resolved against the prefix table maintained by the Process Manager (Proc). Please refer to the Systems Administration and System Architecture manuals. The prefix with the longest match is passed the system request message. In the case of DiskShadow, these are File System related messages.

Names that already exist cannot be re-adopted. By default the DiskShadow driver adopts the '/shadow' directory on the current node. Therefore a utility that refers to the file '/shadow/test' (for example) will cause I/O messages to be sent to the DiskShadow driver.
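
The longest-match rule can be illustrated with a short sketch. This is not QNX code: the table contents, the resolve function and the rule that a prefix only matches on a whole path component are assumptions made for the example.

```python
# Illustrative model of longest-match prefix resolution.
# This is NOT the Proc implementation; names and matching details are assumed.

def resolve(prefix_table, path):
    """Return (owner, remainder) for the longest adopted prefix matching path."""
    best = None
    for prefix, owner in prefix_table.items():
        # Assumption: a prefix matches only on a whole path-component boundary.
        if prefix == "/":
            matches = path.startswith("/")
        else:
            matches = path == prefix or path.startswith(prefix + "/")
        if matches and (best is None or len(prefix) > len(best[0])):
            best = (prefix, owner)
    if best is None:
        raise LookupError("no prefix matches " + path)
    prefix, owner = best
    remainder = path[len(prefix):].lstrip("/")
    return owner, remainder


# Hypothetical table: Fsys owns the root, DiskShadow has adopted /shadow.
PREFIXES = {"/": "Fsys", "/shadow": "DiskShadow"}

if __name__ == "__main__":
    print(resolve(PREFIXES, "/shadow/test"))
    print(resolve(PREFIXES, "/home/test"))
```

In this model a request for '/shadow/test' goes to the DiskShadow entry because '/shadow' is a longer match than '/', while '/home/test' falls through to the root entry.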

Primary/Secondary Directories

In the case of a DiskShadow disk, the 'real' source or destination of data is a pair of directories/disks. This causes a problem for some I/O functions such as iostat, where the status may be different for the two 'real' files.

Primary Path: Certain operations (such as stat) will only be performed on this path, as there is no way to return the status of both files (both should be the same anyway). All other operations (open/read/write, etc.) will be performed on both paths. Any errors that are returned for a request will be taken from the Primary path.

Secondary Path: This path is considered to be the shadow image of the first path. If an operation can be performed on the Primary path but not on the Secondary then a warning will be issued to inform a user that the data being DiskShadowed may not be as secure as the user intended. This problem should only occur in abnormal circumstances when a Secondary disk is full, or the Secondary directory structure doesn't match the Primary's structure.

As a performance improvement, DiskShadow performs reads from the local node; the remote node's file position is then updated to match the read operation.
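
The read optimisation can be pictured as follows. This is a minimal sketch using in-memory files; shadow_read and the two file objects are hypothetical and stand in for the local and remote copies.

```python
import io


def shadow_read(local, remote, nbytes):
    """Read from the local copy only, then move the remote offset to match."""
    data = local.read(nbytes)
    # No data is fetched from the remote copy; only its position changes.
    remote.seek(local.tell())
    return data


if __name__ == "__main__":
    local = io.BytesIO(b"0123456789")
    remote = io.BytesIO(b"0123456789")
    print(shadow_read(local, remote, 4), remote.tell())
```

Keeping the two offsets equal means a later write, which goes to both copies, lands at the same position in each.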

Insecure Files

To be sure of the integrity of the DiskShadowed file data, checks are done following most of these I/O operations to determine if the action returned the same result from both disks. If an operation resulted in two different replies or error returns then an internal 'insecure data' flag is set. When the 'insecure' file is closed a message reporting the possible problem is output on stderr or to the active console. There are no other tests or indications for the security or otherwise of a file. For normal operation and recovery situations the Primary and Secondary should not get out of step.
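
A minimal sketch of this check, under stated assumptions: the ShadowedFile class, its method names and the wording of the message are invented for illustration, and an 'operation' is modelled as any callable applied to each copy.

```python
import sys


class ShadowedFile:
    """Toy model of one shadowed file: every operation runs on both copies."""

    def __init__(self, name):
        self.name = name
        self.insecure = False  # models the internal 'insecure data' flag

    @staticmethod
    def _attempt(operation, target):
        """Run the operation and capture either its result or its errno."""
        try:
            return ("ok", operation(target))
        except OSError as exc:
            return ("error", exc.errno)

    def perform(self, operation, primary, secondary):
        """Run operation on both copies; the caller gets the Primary's outcome."""
        primary_result = self._attempt(operation, primary)
        secondary_result = self._attempt(operation, secondary)
        if primary_result != secondary_result:
            self.insecure = True  # replies or error returns differ
        kind, value = primary_result
        if kind == "error":
            raise OSError(value, "primary path failed")
        return value

    def close(self, log=sys.stderr):
        """Report a possible problem only when the file is closed."""
        if self.insecure:
            print("insecure data: " + self.name, file=log)


if __name__ == "__main__":
    f = ShadowedFile("/shadow/test")
    f.perform(len, [1], [1, 2])  # the two copies disagree
    f.close()
```

The flag is only ever set, never cleared, which matches the description that the report is issued once, at close.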

If DiskShadow is started WITHOUT any of the three update options then there is a possibility that files will not match exactly. Reading or other operations on these files will probably result in an 'insecure data' message. Manually changing the state of files in the Primary or Secondary directory may also result in 'insecure' files.

Directory Synchronisation

Since all file operations are being performed on two disks, it is important that the two directory structures match. Any directory created through DiskShadow is created on both disks, so the structures stay matched.

At startup, DiskShadow may optionally update the Secondary disk from the Primary. Three modes of update are provided.

  • Using the internal file copy routine, copy all newest/non-existent files to the Secondary, creating sub-directories as necessary. This copy is equivalent to using the cp utility with options '-Rpnvs'.
  • Copy all files and directories to the Secondary.
  • Delete all files and directories on the Secondary and then copy all files and directories from the Primary disk to the Secondary. This ensures that the two disks are synchronised.


These modes are used in:

  • DiskCtrl (-L option) for opening and updating directories,
  • DiskShadow at startup (-L option) for the initial directory synchronization and
  • DiskShadow when automatically rebuilding recovered directories (-a option).
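
The three update modes can be sketched as below. This is an illustration only, not DiskShadow's internal copy routine: update_secondary is a hypothetical name, the mode numbers follow the -L values 0, 1 and 2, and 'newest' is assumed to mean a comparison of modification times.

```python
import os
import shutil
import tempfile


def update_secondary(primary, secondary, mode):
    """Bring secondary in line with primary and return the files copied.

    mode 0: copy files that are missing or older on the secondary
    mode 1: copy all files and directories
    mode 2: delete the secondary tree first, then copy everything
    """
    if mode not in (0, 1, 2):
        raise ValueError("mode must be 0, 1 or 2")
    if mode == 2 and os.path.isdir(secondary):
        shutil.rmtree(secondary)
    copied = []
    for root, _dirs, files in os.walk(primary):
        rel = os.path.relpath(root, primary)
        target_dir = os.path.normpath(os.path.join(secondary, rel))
        os.makedirs(target_dir, exist_ok=True)  # create sub-directories as needed
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_dir, name)
            if mode == 0 and os.path.exists(dst):
                if os.path.getmtime(dst) >= os.path.getmtime(src):
                    continue  # secondary copy is already as new as the primary
            shutil.copy2(src, dst)  # copy2 also copies timestamps, like cp -p
            copied.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(copied)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pri = os.path.join(tmp, "pri")
        sec = os.path.join(tmp, "sec")
        os.makedirs(pri)
        with open(os.path.join(pri, "a.txt"), "w") as fh:
            fh.write("data")
        print(update_secondary(pri, sec, 0))
```

Only mode 2 removes files that exist on the Secondary but not on the Primary, which is why it is the mode described as ensuring that the two disks are synchronised.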


    Multiple Processes and Files

    DiskShadow conforms to the 'namespace' prefixing mechanism described in the QNX 'System Architecture' manual (Chapter 4). The manual describes the differences between the file descriptor namespace and the pathname space. The 'Open Control Block' (OCB) structure discussed there is implemented in this driver. This allows multiple processes to open the same file, or one process to open it twice, and make calls such as lseek without affecting each other.

    DiskShadow creates a new OCB for each open request by a process. The OCB will be 'linked to' by a link block. If the process duplicates its file descriptors then a new link block will be created and linked to the existing OCB. OCBs are keyed by their process id and the process's file descriptor.
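
The relationship between OCBs and link blocks can be modelled as follows. This is a hypothetical sketch: the class names, the reference count and the use of a dictionary keyed by (pid, fd) are assumptions, not the driver's actual structures.

```python
class OCB:
    """Open Control Block: per-open state such as the file offset."""

    def __init__(self, path):
        self.path = path
        self.offset = 0
        self.links = 0  # number of link blocks pointing at this OCB


class OpenTable:
    """Maps link blocks, keyed by (pid, fd), to OCBs."""

    def __init__(self):
        self.links = {}

    def _link(self, pid, fd, ocb):
        self.links[(pid, fd)] = ocb
        ocb.links += 1

    def open(self, pid, fd, path):
        ocb = OCB(path)  # each open request gets a fresh OCB
        self._link(pid, fd, ocb)
        return ocb

    def dup(self, pid, fd, new_fd):
        ocb = self.links[(pid, fd)]  # a duplicated descriptor shares the OCB
        self._link(pid, new_fd, ocb)
        return ocb

    def lseek(self, pid, fd, offset):
        self.links[(pid, fd)].offset = offset

    def close(self, pid, fd):
        ocb = self.links.pop((pid, fd))
        ocb.links -= 1
        return ocb.links == 0  # True when the OCB itself can be released


if __name__ == "__main__":
    table = OpenTable()
    a = table.open(10, 3, "/shadow/test")
    b = table.open(11, 3, "/shadow/test")
    table.lseek(10, 3, 100)
    print(a.offset, b.offset)
```

Two separate opens of the same file keep independent offsets, while descriptors created by duplication move together because they reach the same OCB.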

    Primary or Secondary - Enabling and Disabling

    DiskShadow not only mirrors to two directories, it also provides the means to manage those directories. By using DiskCtrl the user of DiskShadow may declare a directory (Primary or Secondary) unavailable and disable its use. DiskShadow will then only access the remaining open directory. If the user attempts to close a directory when the other directory is already closed then an error is returned to DiskCtrl.

    The user may nominate a new disk or directory to replace the disabled one or re-open the previous directory and update it (See DiskCtrl -N). When opening (enabling) a new disk or directory the user may specify the method to be used to update that directory. The update methods available are:

    1. Copy only the newest files to the new directory. (option 0)
    2. Copy all files to the new directory. (option 1)
    3. Remove all files and sub-directories in the new directory then copy all files. (option 2)

    These activities take place without substantially disturbing the normal operations of file access through DiskShadow. Delays will be experienced while the new directory is being updated. DiskShadow will re-create files according to the update method chosen. Files that are open when the request is sent will be created and opened to exactly the same position as that in the 'other' path.

    DiskShadow may be started with either the Primary or Secondary disabled.

    Licensing and Distribution

    DiskShadow is issued with a serial number and a license number. The serial number identifies the customer's version of this software. The license number defines the maximum number of nodes on which DiskShadow may simultaneously run.

    As described in the User Manual, DiskShadow's use depends on your acceptance of the licensing and disclaimer conditions.

    Each copy of DiskShadow is issued with an inbuilt license number. Additional licenses are added by appending new license records to the license file ("/.ds_license"). This file resides in the root directory and is scanned by DiskShadow at startup to determine the total number of licenses and the customer's acceptance of the license. If DiskShadow is being run without a valid root directory (that is, DiskShadow will adopt the root) then it will look for the license file in the Primary directory. In this case the file must be copied to the Primary directory.

    Network Failures

    DiskShadow is able to detect and recover from either Primary or Secondary network paths that are lost due to network failure.

    If the "-a" input parameter is specified then DiskShadow will test the return of each I/O request to see if the network has failed. If an EHOSTUNREACH error is received then that path is disabled and the error reported to stdout / history or the console. From then on DiskShadow will test the failed path (every -T nn interval) to determine if it has returned to being useable. If 'notification' has been requested by a process then that proxy is triggered. No significant loss of performance should be noticed.

    Once the failed node becomes accessible then DiskShadow will attempt to recover that node and directory using the update method specified with the "-L" input parameter (0 | 1 | 2).

    During the restore process DiskShadowing will be 'stalled'. DiskShadow will report the progress of the recovery on the history log. Apart from the time taken there will be little or no indication that a recovery is in progress. If 'notification' has been requested by a process then that proxy is triggered.
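
    The detect, retry and recover cycle described above can be sketched as a small state machine. This is an assumption-laden illustration: PathState, probe and rebuild are hypothetical names, and time is passed in explicitly so that the logic can be followed without a real network.

```python
import errno


class PathState:
    """Tracks whether one path (Primary or Secondary) is usable."""

    def __init__(self, name, retry_interval):
        self.name = name
        self.retry_interval = retry_interval  # seconds, models the -T value
        self.enabled = True
        self.next_probe = None

    def check_reply(self, error_code, now):
        """Inspect the error code returned by an I/O request on this path."""
        if self.enabled and error_code == errno.EHOSTUNREACH:
            self.enabled = False
            self.next_probe = now + self.retry_interval
            return "disabled"
        return "unchanged"

    def poll(self, now, probe, rebuild):
        """Probe a failed path once its retry interval has elapsed."""
        if self.enabled or now < self.next_probe:
            return "idle"
        if not probe(self.name):
            self.next_probe = now + self.retry_interval
            return "still down"
        rebuild(self.name)  # resynchronise with the configured -L update mode
        self.enabled = True
        self.next_probe = None
        return "recovered"


if __name__ == "__main__":
    path = PathState("//3/work", retry_interval=10)
    print(path.check_reply(errno.EHOSTUNREACH, now=0))
    print(path.poll(10, probe=lambda name: True, rebuild=lambda name: None))
```

    In this model the rebuild step runs before the path is re-enabled, which corresponds to shadowing being 'stalled' while the restore is in progress.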

    Notification

    A client process may send messages directly to DiskShadow requesting changes in status and also requesting a status structure in reply. In this way the user can 'poll' DiskShadow to detect status changes or to monitor its current state. If a user doesn't wish to 'poll' then he may use DiskShadow's ability to 'notify' client processes whenever there is a significant change in the status of any nodes (failed or recovered). The user provides a proxy for DiskShadow to trigger and sends DiskShadow a message containing the proxy pid.

    Sample code is included in the release software set that 1) 'notifies' DiskShadow then waits and 2) 'polls' DiskShadow every second and displays the returned data. The message structures are contained in the header file DiskCtrl.h (also included in the release file set).

    Status and History Information

    If selected via the startup parameters, DiskShadow rewrites status information to a status file every 30 seconds. With each write the file is rewound, re-written, and flushed. This means that the data may be accessed by external monitoring processes. The form of the status file is:

    Status- DiskShadow V4.24A SN100/32 10/11/98
    Debug	Y
    Enabled	Y
    Pri Enabled	Y
    Sec Enabled	Y
    Pri Net ok?	Y
    Sec Net ok?	Y
    Strict	N
    AutoRebuild	Y
    Retry Time	010
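
    An external monitoring process could read this file along the following lines. This is a sketch only: parse_status is a hypothetical helper, and it assumes that the first line starts with "Status-" and that every other line ends with a single-token value, as in the sample above.

```python
def parse_status(text):
    """Parse status file text into (header, fields)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("Status-"):
        raise ValueError("not a DiskShadow status file")
    header = lines[0][len("Status-"):].strip()
    fields = {}
    for line in lines[1:]:
        # Names may contain spaces, so the value is taken as the last token.
        key, value = line.rsplit(None, 1)
        fields[key] = value
    return header, fields


# The sample shown in this manual, with a tab between name and value.
SAMPLE = """Status- DiskShadow V4.24A SN100/32 10/11/98
Debug\tY
Enabled\tY
Pri Enabled\tY
Sec Enabled\tY
Pri Net ok?\tY
Sec Net ok?\tY
Strict\tN
AutoRebuild\tY
Retry Time\t010
"""

if __name__ == "__main__":
    header, fields = parse_status(SAMPLE)
    print(header)
    for key, value in fields.items():
        print(key, "=", value)
```

    Because the file is rewound and rewritten in place, a reader may occasionally see a partly written file; re-reading on a parse failure is one simple way to cope with that.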
    

    Similarly a history file may be enabled. Normally DiskShadow V4.24A will log errors such as "insecure file" to stderr. Each message is preceded by a unique 4 digit number (e.g. 00020 [primary_filename][secondary filename][error]). You may then write your own monitoring process to read the history file and take appropriate action based on the error code and filenames in the error message.
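
    A monitoring process of this kind might split each history message as shown below. The layout is inferred solely from the example in the previous paragraph (a numeric code followed by three bracketed fields), so the pattern and the parse_history_line helper are assumptions rather than a documented format.

```python
import re

# Assumed layout: <digits> [primary file][secondary file][error text]
HISTORY_LINE = re.compile(
    r"^\s*(\d+)\s+\[([^\]]*)\]\s*\[([^\]]*)\]\s*\[([^\]]*)\]"
)


def parse_history_line(line):
    """Split one history message into its code, file names and error text."""
    match = HISTORY_LINE.match(line)
    if match is None:
        return None  # the line is not in the assumed layout
    code, primary, secondary, error = match.groups()
    # The code is kept as a string so that leading zeros are preserved.
    return {"code": code, "primary": primary,
            "secondary": secondary, "error": error}


if __name__ == "__main__":
    print(parse_history_line("00020 [/dir1/test][//2/dir2/test][insecure file]"))
```

    Returning None for unrecognised lines lets the monitor skip anything that does not follow the assumed layout instead of stopping.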

    Pipes

    Pipes are not supported by DiskShadow. It is recommended that the user invoke the QNX pipe utility using 'Pipe &' in the sysinit file. Support for pipes may be added in future, but for the present there is little or no need for DiskShadow to deal with them.

    FIFOs

    FIFOs are also not supported by DiskShadow.

    Specifications

    This software requires the following to run :

    QNX4.23 or later.
    156K minimum available memory.
    Features :

  • Full length path names support. (_MAX_PATH)
  • Mixed file systems are allowed to be specified for the Primary and Secondary drives.
  • User selectable DiskShadow directory name.
  • Secondary drive update options, to permit full synchronization between the two shadowed drives.
  • Support for symbolic links that were created outside the DiskShadow driver. These include linked directories and multiple levels of linking.
  • Modification and examination of DiskShadow operating parameters while operating.
    * Report on memory usage and open file status.
    * Enabling and disabling of startup options such as verbose, strictness, Primary or Secondary directories.
    * Applications may be notified of significant DiskShadow status changes via proxies.
    * Open or Close a new Primary or Secondary while specifying the update mode.
    * Cleanly terminate the DiskShadow process.
  • Automated disabling of a Primary or Secondary if it becomes inaccessible via the network. Automated recovery of the same directory once the node becomes 'alive'.


    DiskShadow Control/Report Utility - DiskCtrl

    DiskCtrl allows you to control the behavior of DiskShadow via a number of parameters.
    Usage: DiskCtrl [-n node] [-N path_Name] [-L update_mode] [-wait] -p pid REQUEST [ON|OFF|nn]
    Refer to the "DiskShadow 4.24A User Manual" for a description of the DiskCtrl options and requests.

    One of the more useful scenarios for using these control variables is for on-line backup. The procedure for this would be as follows:

    1. Disable shadowing to the secondary path (updates to the primary path still continue):
       DiskCtrl -p nnn SEC OFF
    2. Back up the secondary path.
    3. When the backup is complete, enable the secondary path:
       DiskCtrl -p nnn SEC ON
    4. Bring the secondary path up to date with the primary:
       DiskCtrl -p nnn UPDATE 0

    Note that while the update/sync is occurring, all requests to DiskShadow will be stalled. When the update is finished all pending requests will then be processed. Thus a short down time will be experienced.

    Examples

    DiskShadow /dir1 //2/dir2 &

    Creates a '/shadow' directory and uses /dir1 and //2/dir2 for the Primary and Secondary drives. /dir1 and //2/dir2 will still be available as "unshadowed" directories. No update will be done.

    Operations done on the "/shadow" directory will be copied (shadowed) in the /dir1 and //2/dir2 directories.

    DiskShadow -m /alice //3/qny //4/qny &

    Creates the directory "/alice" which will DiskShadow the two following paths. 'cp's of files to /alice will result in the files being created on //3/qny AND //4/qny. Note that in this example the two sub-directories are on remote nodes. The DiskShadow directory "/alice" can only exist on the same node as the running DiskShadow process.
    "cp -Rcnv //6/bin/test /alice/ddd/bin/test" will create the files //3/qny/ddd/bin/test and //4/qny/ddd/bin/test.

    DiskShadow -D -L0 //2/work //3/work &

    Will create '/shadow' on the current node. The "-D" option means that any errors will be displayed on the active console using the display_msg function.

    The startup option '-L' will cause any files that are older or not existing on the Secondary drive to be copied from the Primary drive. See DiskShadow startup options for details of the -L parameters.

    DiskShadow -D -L1 //2/work //3/work &

    This example is similar to the previous except that //2/work files and directories will always be copied to //3/work. Note: '-L2' will cause the files in //3/work to be removed then copied from //2/work.

    The following sequence may be used to perform a non-invasive, on-line backup of files.
    DiskCtrl close sec
    cp -Rcnv //3/work/* //3/work_copy/
    DiskCtrl -L1 open sec

    Note: The files will be copied in the state they were in at the instant the Secondary was closed; indexes may not be updated, etc. The state of the file contents depends on the application's method of operation.

      Benchmarks
      NON-DiskShadowed File Test Results
      sin times
      SID    PID    PROGRAM       PRI   UTIME    STIME    CUTIME   CSTIME
      0      1      sys/Proc32    30f   16.380   9.626    1.070    0.350
      0      4      /bin/Fsys     22r   48.210   21.412   0.000    0.000
      

      Disk Tester : standard file
      File created in 0 seconds, 500 writes of 512 bytes
      File read sequentially in 0 seconds, 500 reads of 512 bytes
      File read randomly in 1 seconds, 500 reads of 512 bytes
      File butterfly read in 3 seconds, 500 reads of 512 bytes
      File rewritten in 0 seconds, 500 writes of 512 bytes
      File rewritten randomly in 0 seconds, 500 writes of 512 bytes
      File butterfly rewrite in 1 seconds, 500 writes of 512 bytes
      Test time : 6.47s real 0.12s user 0.06s system

      Disk Tester : DiskShadowed file
      File created in 3 seconds, 500 writes of 512 bytes
      File read sequentially in 0 seconds, 500 reads of 512 bytes
      File read randomly in 1 seconds, 500 reads of 512 bytes
      File butterfly read in 0 seconds, 500 reads of 512 bytes
      File rewritten in 1 seconds, 500 writes of 512 bytes
      File rewritten randomly in 0 seconds, 500 writes of 512 bytes
      File butterfly write in 0 seconds, 500 writes of 512 bytes
      Test time : 7.40s real 0.11s user 0.06s system

      Disk Tester : shared file
      Sync flag set
      File created in 34 seconds, 500 writes of 512 bytes
      File read sequentially in 8 seconds, 500 reads of 512 bytes
      File read randomly in 13 seconds, 500 reads of 512 bytes
      File butterfly read in 12 seconds, 500 reads of 512 bytes
      File rewritten in 29 seconds, 500 writes of 512 bytes
      File rewritten randomly in 28 seconds, 500 writes of 512 bytes
      File butterfly write in 29 seconds, 500 writes of 512 bytes
      Test time : 154.68s real 0.05s user 0.09s system

      sin times
      SID    PID    PROGRAM       PRI   UTIME    STIME    CUTIME   CSTIME
      0      1      sys/Proc32    30f   16.430   9.676    1.070    0.350
      0      4      /bin/Fsys     22r   52.330   22.502   0.000    0.000
      

      DiskShadowed File Test Results

      sin times
      SID    PID    PROGRAM       PRI   UTIME    STIME    CUTIME   CSTIME
      0      1      sys/Proc32    30f   16.620   9.756    1.070    0.350
      0      4      /bin/Fsys     22r   53.210   23.212   0.000    0.000
      1      588    DiskShadow    10o   1.200    0.870    0.000    0.000
      
      
      Disk Tester : standard file
      File created in 1 seconds, 250 writes of 512 bytes
      File read sequentially in 0 seconds, 250 reads of 512 bytes
      File read randomly in 2 seconds, 250 reads of 512 bytes
      File butterfly read in 0 seconds, 250 reads of 512 bytes
      File rewritten in 1 seconds, 250 writes of 512 bytes
      File rewritten randomly in 0 seconds, 250 writes of 512 bytes
      File butterfly rewrite in 0 seconds, 250 writes of 512 bytes
      Test time : 5.84s real 0.07s user 0.04s system

      Disk Tester : shared file
      File created in 0 seconds, 250 writes of 512 bytes
      File read sequentially in 1 seconds, 250 reads of 512 bytes
      File read randomly in 1 seconds, 250 reads of 512 bytes
      File butterfly read in 1 seconds, 250 reads of 512 bytes
      File rewritten in 1 seconds, 250 writes of 512 bytes
      File rewritten randomly in 0 seconds, 250 writes of 512 bytes
      File butterfly write in 0 seconds, 250 writes of 512 bytes
      Test time : 5.87s real 0.01s user 0.09s system

      Disk Tester : shared file
      Sync flag set
      File created in 35 seconds, 250 writes of 512 bytes
      File read sequentially in 9 seconds, 250 reads of 512 bytes
      File read randomly in 13 seconds, 250 reads of 512 bytes
      File butterfly read in 14 seconds, 250 reads of 512 bytes
      File rewritten in 29 seconds, 250 writes of 512 bytes
      File rewritten randomly in 29 seconds, 250 writes of 512 bytes
      File butterfly write in 28 seconds, 250 writes of 512 bytes
      Test time : 158.58s real 0.06s user 0.05s system
      sin times
      SID    PID    PROGRAM       PRI   UTIME    STIME    CUTIME   CSTIME
      0      1      sys/Proc32    30f   17.130   10.136   1.070    0.350
      0      4      /bin/Fsys     22r   56.880   24.552   0.000    0.000
      1      588    DiskShadow    10o   1.740    1.240    0.000    0.000
      

      DiskShadowed Big File write/read test
      Test Conditions:
      DiskShadow running on Node 2: 486 33MHz, 8MB memory.
      Node 1: 586 133MHz, 8MB memory.
      Network is 10Mb Ethernet, Ne1000 (node 1) and wd8003 (node 2).
      The following tests were intended to provide some performance comparison between not using DiskShadow and using it over a network. The test involves:

      Repeat 5 times:
          open a file (create it),
          do 5 writes of 65520 bytes to the file, then
          read all 5 x 65520 bytes back from the file and test the contents.
      
      
      Direct to a disk
         3.03s real    1.54s user    0.00s system
         3.91s real    1.54s user    0.01s system
         3.12s real    1.54s user    0.01s system
         3.51s real    1.47s user    0.01s system
         3.85s real    1.44s user    0.02s system
         2.81s real    1.45s user    0.01s system
         3.95s real    1.43s user    0.00s system
         3.22s real    1.45s user    0.00s system
         2.92s real    1.56s user    0.01s system
         3.53s real    1.59s user    0.00s system
      
      Both Primary and Secondary are Remote.
         9.02s real    1.45s user    0.04s system
         9.14s real    1.50s user    0.01s system
         9.10s real    1.51s user    0.01s system
         9.08s real    1.49s user    0.02s system
         9.08s real    1.48s user    0.03s system
         9.12s real    1.52s user    0.05s system
         9.08s real    1.48s user    0.03s system
         9.08s real    1.54s user    0.00s system
         9.16s real    1.48s user    0.03s system
         9.08s real    1.48s user    0.03s system
      
      Primary is Remote
         6.43s real    1.53s user    0.01s system
         6.42s real    1.50s user    0.05s system
         6.49s real    1.51s user    0.01s system
         6.17s real    1.54s user    0.04s system
         6.34s real    1.47s user    0.03s system
         6.40s real    1.49s user    0.01s system
         6.49s real    1.50s user    0.00s system
         6.35s real    1.54s user    0.02s system
         6.39s real    1.57s user    0.03s system
         6.31s real    1.45s user    0.01s system
      
      Secondary is Remote
         6.71s real    1.51s user    0.01s system
         6.47s real    1.48s user    0.02s system
         6.34s real    1.51s user    0.05s system
         6.31s real    1.53s user    0.02s system
         6.37s real    1.54s user    0.02s system
         6.48s real    1.54s user    0.00s system
         6.58s real    1.49s user    0.00s system
         6.53s real    1.55s user    0.01s system
         6.35s real    1.50s user    0.02s system
         6.34s real    1.58s user    0.04s system
      
      Primary and Secondary are Local.
         5.64s real    1.52s user    0.01s system
         6.29s real    1.54s user    0.03s system
         6.30s real    1.50s user    0.01s system
         6.22s real    1.55s user    0.02s system
         5.95s real    1.53s user    0.02s system
         6.13s real    1.64s user    0.00s system
         6.22s real    1.54s user    0.02s system
         5.92s real    1.53s user    0.02s system
         6.02s real    1.60s user    0.01s system
         6.35s real    1.54s user    0.02s system
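
      The ten runs for each configuration can be reduced to a mean for easier comparison. The sketch below does that arithmetic on the 'real' times listed above; the summarise helper and the choice of the direct-to-disk mean as the baseline are assumptions made for this illustration.

```python
from statistics import mean

# 'real' times in seconds, copied from the ten runs listed for each case.
REAL_SECONDS = {
    "Direct to a disk": [3.03, 3.91, 3.12, 3.51, 3.85,
                         2.81, 3.95, 3.22, 2.92, 3.53],
    "Both Primary and Secondary are Remote": [9.02, 9.14, 9.10, 9.08, 9.08,
                                              9.12, 9.08, 9.08, 9.16, 9.08],
    "Primary is Remote": [6.43, 6.42, 6.49, 6.17, 6.34,
                          6.40, 6.49, 6.35, 6.39, 6.31],
    "Secondary is Remote": [6.71, 6.47, 6.34, 6.31, 6.37,
                            6.48, 6.58, 6.53, 6.35, 6.34],
    "Primary and Secondary are Local": [5.64, 6.29, 6.30, 6.22, 5.95,
                                        6.13, 6.22, 5.92, 6.02, 6.35],
}


def summarise(results):
    """Return {case: (mean real time, ratio to the direct-to-disk mean)}."""
    baseline = mean(results["Direct to a disk"])
    return {name: (mean(times), mean(times) / baseline)
            for name, times in results.items()}


if __name__ == "__main__":
    for name, (avg, ratio) in summarise(REAL_SECONDS).items():
        print(f"{name}: {avg:.3f}s mean, {ratio:.2f}x direct")
```

      On these figures the mean real time is about 3.4s direct to disk, about 6.1s with both paths local, about 6.4s with one path remote and about 9.1s with both paths remote.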
      


      Copyright 1997-1999 Symmetry Innovations Pty Ltd