Updated: 2022/Sep/29

DISK(9)                    Kernel Developer's Manual                   DISK(9)

NAME
     disk, disk_init, disk_attach, disk_begindetach, disk_detach,
     disk_destroy, disk_wait, disk_busy, disk_unbusy, disk_isbusy, disk_find,
     disk_set_info - generic disk framework

SYNOPSIS
     #include <sys/types.h>
     #include <sys/disklabel.h>
     #include <sys/disk.h>

     void
     disk_init(struct disk *, const char *name,
         const struct dkdriver *driver);

     void
     disk_attach(struct disk *);

     int
     disk_begindetach(struct disk *, int (*lastclose)(device_t),
         device_t self, int flags);

     void
     disk_detach(struct disk *);

     void
     disk_destroy(struct disk *);

     void
     disk_wait(struct disk *);

     void
     disk_busy(struct disk *);

     void
     disk_unbusy(struct disk *, long bcount, int read);

     bool
     disk_isbusy(struct disk *);

     struct disk *
     disk_find(const char *);

     void
     disk_set_info(device_t, struct disk *, const char *type);

DESCRIPTION
     The NetBSD generic disk framework is designed to provide flexible,
     scalable, and consistent handling of disk state and metrics information.
     The fundamental component of this framework is the disk structure, which
     is defined as follows:

     struct disk {
             TAILQ_ENTRY(disk) dk_link;      /* link in global disklist */
             const char      *dk_name;       /* disk name */
             prop_dictionary_t dk_info;      /* reference to disk-info dictionary */
             int             dk_bopenmask;   /* block devices open */
             int             dk_copenmask;   /* character devices open */
             int             dk_openmask;    /* composite (bopen|copen) */
             int             dk_state;       /* label state   ### */
             int             dk_blkshift;    /* shift to convert DEV_BSIZE to blks */
             int             dk_byteshift;   /* shift to convert bytes to blks */

             /*
              * Metrics data; note that some metrics may have no meaning
              * on certain types of disks.
              */
             struct io_stats *dk_stats;

             const struct dkdriver *dk_driver;       /* pointer to driver */

             /*
              * Information required to be the parent of a disk wedge.
              */
             kmutex_t        dk_rawlock;     /* lock on these fields */
             u_int           dk_rawopens;    /* # of opens of rawvp */
             struct vnode    *dk_rawvp;      /* vnode for the RAW_PART bdev */

             kmutex_t        dk_openlock;    /* lock on these and openmask */
             u_int           dk_nwedges;     /* # of configured wedges */
                                             /* all wedges on this disk */
             LIST_HEAD(, dkwedge_softc) dk_wedges;

             /*
              * Disk label information.  Storage for the in-core disk label
              * must be dynamically allocated, otherwise the size of this
              * structure becomes machine-dependent.
              */
             daddr_t         dk_labelsector;         /* sector containing label */
             struct disklabel *dk_label;     /* label */
             struct cpu_disklabel *dk_cpulabel;
     };
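
     The two shift fields can be illustrated with a small user-space sketch.
     It assumes a power-of-two sector size and a 512-byte DEV_BSIZE; the
     MODEL_DEV_BSIZE constant and the model_*() helpers are hypothetical
     names used only for illustration and are not part of the framework.

```c
#include <stdint.h>

/* Assumed stand-in for the kernel's DEV_BSIZE; illustration only. */
#define MODEL_DEV_BSIZE 512u

/* Return log2(v) if v is a power of two, otherwise -1. */
static int
model_log2(uint32_t v)
{
	int shift = 0;

	if (v == 0 || (v & (v - 1)) != 0)
		return -1;
	while ((v >>= 1) != 0)
		shift++;
	return shift;
}

/* Shift that converts a byte count into a sector count. */
static int
model_byteshift(uint32_t secsize)
{
	return model_log2(secsize);
}

/* Shift that converts a DEV_BSIZE block count into a sector count. */
static int
model_blkshift(uint32_t secsize)
{
	if (secsize < MODEL_DEV_BSIZE || model_log2(secsize) < 0)
		return -1;
	return model_log2(secsize / MODEL_DEV_BSIZE);
}
```

     Under these assumptions a 4096-byte sector gives a byte shift of 12 and
     a block shift of 3, so DEV_BSIZE block 16 corresponds to sector 2.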

     The system maintains a global linked-list of all disks attached to the
     system.  This list, called disklist, may grow or shrink over time as
     disks are dynamically added and removed from the system.  Drivers which
     currently make use of the detachment capability of the framework are the
     ccd, dm, and vnd pseudo-device drivers.

     The following is a brief description of each function in the framework:

     disk_init()        Initialize the disk structure.

     disk_attach()      Attach a disk; allocate storage for the disklabel, set
                        the "attached time" timestamp, insert the disk into
                        the disklist, and increment the system disk count.

     disk_begindetach()
                        Check whether the disk is open, and if not, return 0.
                        If the disk is open, and DETACH_FORCE is not set in
                        flags, return EBUSY.  Otherwise, call the provided
                        lastclose routine (if not NULL) and return its exit
                        code.

     disk_detach()      Detach a disk; free storage for the disklabel, remove
                        the disk from the disklist, and decrement the system
                        disk count.  If the count drops below zero, panic.

     disk_destroy()     Release resources used by the disk structure when it
                        is no longer required.

     disk_wait()        Disk timings are measured by counting the number of
                        queued requests (wait counter) and requests issued to
                        the hardware (busy counter) and keeping a timestamp
                        of when the counters change.  The time interval
                        between two changes of a counter is accumulated into
                        a total and is also multiplied by the counter value
                        and then accumulated into a sum.  Both values can be
                        used to determine how much time is spent in the
                        driver queue or in flight to the hardware, as well as
                        the average number of requests in either state.
                        disk_wait() increments the disk's wait counter and
                        handles the accumulation.

     disk_busy()        Decrements the disk's wait counter, increments the
                        disk's busy counter, and handles the accumulation for
                        both.  If the wait counter is already zero, it is
                        assumed that the driver has not been updated to call
                        disk_wait(); in that case only the values from the
                        busy counter are available.

     disk_unbusy()      Decrements the disk's busy counter and handles the
                        accumulation.  The third argument read specifies the
                        direction of I/O; if non-zero it means reading from
                        the disk, otherwise it means writing to the disk.

     disk_isbusy()      Returns true if the disk is marked as busy and false
                        if it is not.

     disk_find()        Return a pointer to the disk structure corresponding
                        to the name provided, or NULL if the disk does not
                        exist.

     disk_set_info()    Set up the disk-info dictionary and other dependent
                        values of the disk structure.  The driver must first
                        have initialized the dk_geom member of struct disk
                        with suitable values.  If type is not NULL, it is
                        added to the dictionary.
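
     The decision made by disk_begindetach() can be sketched as a user-space
     model.  This is not the kernel implementation: the open mask is passed
     in directly instead of being read from the disk structure under its
     lock, MODEL_DETACH_FORCE is a placeholder for the real DETACH_FORCE
     flag, and returning 0 when lastclose is NULL is an assumption, since the
     description above does not state the result for that case.

```c
#include <errno.h>
#include <stddef.h>

/* Placeholder for the kernel's DETACH_FORCE flag; illustration only. */
#define MODEL_DETACH_FORCE 0x01

/* Sample lastclose callback: counts its invocations and reports success. */
static int
model_lastclose(void *self)
{
	*(int *)self += 1;
	return 0;
}

/* Model of the begin-detach decision described above. */
static int
model_begindetach(int openmask, int (*lastclose)(void *), void *self,
    int flags)
{
	if (openmask == 0)
		return 0;		/* not open: nothing to do */
	if ((flags & MODEL_DETACH_FORCE) == 0)
		return EBUSY;		/* open and not forced */
	if (lastclose == NULL)
		return 0;		/* assumption: no callback, success */
	return lastclose(self);		/* forced: result of last close */
}
```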

     The functions typically called by device drivers are disk_init(),
     disk_attach(), disk_begindetach(), disk_detach(), disk_destroy(),
     disk_wait(), disk_busy(), disk_unbusy(), and disk_set_info().  The
     function disk_find() is provided as a utility function.
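
     The timing accumulation described for disk_wait(), disk_busy() and
     disk_unbusy() can be illustrated with a small user-space model.  It is a
     sketch only: the sim_*() names and structures are hypothetical,
     timestamps are plain microsecond integers supplied by the caller, and
     accumulating time only while a counter is non-zero is an assumption
     about how the totals are meant to be read.

```c
#include <stdint.h>

/* Hypothetical model of one counter (wait or busy). */
struct sim_counter {
	unsigned count;		/* requests currently in this state */
	uint64_t stamp_us;	/* time of the last counter change */
	uint64_t time_us;	/* total time spent with count > 0 */
	uint64_t sum_us;	/* sum of (interval * count) */
};

struct sim_disk {
	struct sim_counter wait;
	struct sim_counter busy;
};

/* Close the interval ending at now_us, then apply the counter change. */
static void
sim_change(struct sim_counter *c, uint64_t now_us, int delta)
{
	uint64_t dt = now_us - c->stamp_us;

	if (c->count > 0) {
		c->time_us += dt;
		c->sum_us += dt * c->count;
	}
	c->stamp_us = now_us;
	if (delta > 0)
		c->count++;
	else if (delta < 0 && c->count > 0)
		c->count--;
}

/* A request was queued. */
static void
sim_wait(struct sim_disk *d, uint64_t now_us)
{
	sim_change(&d->wait, now_us, +1);
}

/* A request was issued to the hardware. */
static void
sim_busy(struct sim_disk *d, uint64_t now_us)
{
	/* A zero wait counter means the driver never called sim_wait(). */
	if (d->wait.count > 0)
		sim_change(&d->wait, now_us, -1);
	sim_change(&d->busy, now_us, +1);
}

/* A request completed. */
static void
sim_unbusy(struct sim_disk *d, uint64_t now_us)
{
	sim_change(&d->busy, now_us, -1);
}
```

     For two requests queued at 0 and 10, issued at 30 and 40, and completed
     at 70 and 100, the model records a wait time of 40 with a wait sum of
     60, and a busy time of 70 with a busy sum of 100.  Dividing a sum by the
     matching time gives the average number of requests in that state, here
     1.5 queued and about 1.43 in flight.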

DISK IOCTLS
     The following ioctls should be implemented by disk drivers:

     DIOCGDINFO struct disklabel
             Get disklabel.

     DIOCSDINFO struct disklabel
             Set in-memory disklabel.

     DIOCWDINFO struct disklabel
             Set in-memory disklabel and write on-disk disklabel.

     DIOCGPART struct partinfo
             Get partition information.  This is used internally.

     DIOCRFORMAT struct format_op
             Read format.

     DIOCWFORMAT struct format_op
             Write format.

     DIOCSSTEP int
             Set step rate.

     DIOCSRETRIES int
             Set number of retries.

     DIOCKLABEL int
             Specify whether to keep or drop the in-memory disklabel when the
             device is closed.

     DIOCWLABEL int
             Enable or disable writing to the part of the disk that contains
             the label.

     DIOCSBAD struct dkbad
             Set kernel dkbad.

     DIOCEJECT int
             Eject removable disk.

     DIOCLOCK int
             Lock or unlock disk pack.  For devices with removable media,
             locking is intended to prevent the operator from removing the
             media.

     DIOCGDEFLABEL struct disklabel
             Get default label.

     DIOCCLRLABEL
             Clear disk label.

     DIOCGCACHE int
             Get status of disk read and write caches.  The result is a
             bitmask containing the following values:

             DKCACHE_READ     Read cache enabled.

             DKCACHE_WRITE    Write(back) cache enabled.

             DKCACHE_RCHANGE  Read cache enable is changeable.

             DKCACHE_WCHANGE  Write cache enable is changeable.

             DKCACHE_SAVE     Cache parameters may be saved, so that they
                              persist across reboots or device detach/attach
                              cycles.

     DIOCSCACHE int
             Set status of disk read and write caches.  The input is a bitmask
             in the same format as used for DIOCGCACHE.

     DIOCCACHESYNC int
             Synchronise the disk cache.  This causes information in the
             disk's write cache (if any) to be flushed to stable storage.  The
             argument specifies whether or not to force a flush even if the
             kernel believes that there is no outstanding data.

     DIOCBSLIST struct disk_badsecinfo
             Get bad sector list.

     DIOCBSFLUSH
             Flush bad sector list.

     DIOCAWEDGE struct dkwedge_info
             Add wedge.

     DIOCGWEDGEINFO struct dkwedge_info
             Get wedge information.

     DIOCDWEDGE struct dkwedge_info
             Delete wedge.

     DIOCLWEDGES struct dkwedge_list
             List wedges.

     DIOCGSTRATEGY struct disk_strategy
             Get disk buffer queue strategy.

     DIOCSSTRATEGY struct disk_strategy
             Set disk buffer queue strategy.

     DIOCGDISKINFO struct plistref
             Get disk-info dictionary.

     DIOCGMEDIASIZE off_t
             Get disk size in bytes.

     DIOCGSECTORSIZE u_int
             Get sector size in bytes.
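
     As an illustration of the DIOCGCACHE result, the following user-space
     sketch decodes such a bitmask into a readable string.  The MODEL_*
     values are placeholders chosen for this example; a real program would
     use the DKCACHE_* definitions from the system headers and obtain the
     bitmask with ioctl(2).  The model_describe_cache() helper is
     hypothetical.

```c
#include <stdio.h>

/* Placeholder bit values; real code uses the system's DKCACHE_* macros. */
#define MODEL_DKCACHE_READ	0x000001
#define MODEL_DKCACHE_WRITE	0x000002
#define MODEL_DKCACHE_RCHANGE	0x000100
#define MODEL_DKCACHE_WCHANGE	0x000200
#define MODEL_DKCACHE_SAVE	0x010000

/* Format a cache status bitmask into buf; returns snprintf()'s result. */
static int
model_describe_cache(int bits, char *buf, size_t len)
{
	return snprintf(buf, len, "read=%s%s write=%s%s save=%s",
	    (bits & MODEL_DKCACHE_READ) ? "on" : "off",
	    (bits & MODEL_DKCACHE_RCHANGE) ? " (changeable)" : "",
	    (bits & MODEL_DKCACHE_WRITE) ? "on" : "off",
	    (bits & MODEL_DKCACHE_WCHANGE) ? " (changeable)" : "",
	    (bits & MODEL_DKCACHE_SAVE) ? "yes" : "no");
}
```

     A driver or tool that wants to toggle a cache would normally check the
     matching "changeable" bit first before issuing DIOCSCACHE.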

USING THE FRAMEWORK
     This section describes basic use of the framework and gives example
     usage of its functions.  The actual implementation of a device driver
     that uses the framework may vary.

     Each device in the system uses a "softc" structure which contains
     autoconfiguration and state information for that device.  In the case of
     disks, the softc should also contain one instance of the disk structure,
     e.g.:

     struct foo_softc {
             device_t        sc_dev;         /* generic device information */
             struct  disk    sc_dk;          /* generic disk information */
             [ . . . more . . . ]
     };

     In order for the system to gather metrics data about a disk, the disk
     must be registered with the system.  The disk_attach() routine performs
     all of the functions currently required to register a disk with the
     system including allocation of disklabel storage space, recording of the
     time since boot that the disk was attached, and insertion into the
     disklist.  Note that since this function allocates storage space for the
     disklabel, it must be called before the disklabel is read from the media
     or used in any other way.  Before disk_attach() is called, portions of
     the disk structure must be initialized with data specific to that disk.
     For example, in the "foo" disk driver, the following would be performed
     in the autoconfiguration "attach" routine:

     void
     fooattach(device_t parent, device_t self, void *aux)
     {
             struct foo_softc *sc = device_private(self);
             [ . . . ]

             /* Initialize and attach the disk structure. */
             disk_init(&sc->sc_dk, device_xname(self), &foodkdriver);
             disk_attach(&sc->sc_dk);

             /* Read geometry and fill in pertinent parts of disklabel. */
             /* Initialize geometry values of the disk structure */
             [ . . . ]
             disk_set_info(self, &sc->sc_dk, type);
     }

     The foodkdriver above is the disk's "driver" switch.  This switch
     currently includes pointers to several driver entry points, where only
     the d_strategy entry point is used by the disk framework.  This switch
     needs to have global scope and should be initialized as follows:

     void    (foostrategy)(struct buf *);
     void    (foominphys)(struct buf *);
     int     (fooopen)(dev_t, int, int, struct lwp *);
     int     (fooclose)(dev_t, int, int, struct lwp *);
     int     (foo_discard)(device_t, off_t, off_t);
     int     (foo_diskstart)(device_t, struct buf *);
     void    (foo_iosize)(device_t, int *);
     int     (foo_dumpblocks)(device_t, void *, daddr_t, int);
     int     (foo_lastclose)(device_t);
     int     (foo_firstopen)(device_t, dev_t, int, int);
     int     (foo_label)(device_t, struct disklabel *);

     const struct dkdriver foodkdriver = {
             .d_open = fooopen,
             .d_close = fooclose,
             .d_strategy = foostrategy,
             .d_minphys = foominphys,
             .d_discard = foo_discard,
             .d_diskstart = foo_diskstart,   /* optional */
             .d_dumpblocks = foo_dumpblocks, /* optional */
             .d_iosize = foo_iosize,         /* optional */
             .d_firstopen = foo_firstopen,   /* optional */
             .d_lastclose = foo_lastclose,   /* optional */
             .d_label = foo_label,           /* optional */
     };

     Once the disk is attached, metrics may be gathered on that disk.  In
     order to gather metrics data, the driver must tell the framework when the
     disk queues, starts and stops operations.  This functionality is provided
     by the disk_wait(), disk_busy() and disk_unbusy() routines.  Because
     struct disk is part of the device driver's private data, it needs to be
     guarded.  disk_wait(), disk_busy() and disk_unbusy() are not thread
     safe, so the driver must provide mutual exclusion around them.  The
     disk_wait() routine should be called when a request is queued, and the
     disk_busy() routine immediately before a command is sent to the disk,
     e.g.:

     void
     foostrategy(struct buf *bp)
     {
             [ . . . ]

             mutex_enter(&sc->sc_dk_mtx);
             disk_wait(&sc->sc_dk);

             /* Put buffer onto drive's transfer queue */

             mutex_exit(&sc->sc_dk_mtx);

             foostart(sc);
     }

     void
     foostart(struct foo_softc *sc)
     {
             [ . . . ]

             /* Get buffer from drive's transfer queue. */
             [ . . . ]

             /* Build command to send to drive. */
             [ . . . ]

             /* Tell the disk framework we're going busy. */
             mutex_enter(&sc->sc_dk_mtx);
             disk_busy(&sc->sc_dk);
             mutex_exit(&sc->sc_dk_mtx);

             /* Send command to the drive. */
             [ . . . ]
     }

     The routine disk_unbusy() performs some consistency checks, such as
     ensuring that the calls to disk_busy() and disk_unbusy() are balanced.
     It also performs the final steps of the metrics calculation.  A byte count
     is added to the disk's running total, and if greater than zero, the
     number of transfers the disk has performed is incremented.  The third
     argument read specifies the direction of I/O; if non-zero it means
     reading from the disk, otherwise it means writing to the disk.

     void
     foodone(struct foo_xfer *xfer)
     {
             struct foo_softc *sc = (struct foo_softc *)xfer->xf_softc;
             struct buf *bp = xfer->xf_buf;
             long nbytes;
             [ . . . ]

             /*
              * Get number of bytes transferred.  If there is no buf
              * associated with the xfer, we are being called at the
              * end of a non-I/O command.
              */
             if (bp == NULL)
                     nbytes = 0;
             else
                     nbytes = bp->b_bcount - bp->b_resid;

             [ . . . ]

             mutex_enter(&sc->sc_dk_mtx);
             /* Notify the disk framework that we've completed the transfer. */
             disk_unbusy(&sc->sc_dk, nbytes,
                 bp != NULL ? bp->b_flags & B_READ : 0);
             mutex_exit(&sc->sc_dk_mtx);

             [ . . . ]
     }

     disk_isbusy() reports the status of the disk device: it returns true if
     the device is currently busy and false if it is not.  Like disk_wait(),
     disk_busy() and disk_unbusy(), it requires explicit locking by the
     caller.
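
     The bookkeeping described for disk_unbusy() and disk_isbusy() can be
     modelled in user space as follows.  This is a sketch under stated
     assumptions rather than the kernel code: the acct_*() names and the
     acct_stats structure are hypothetical, locking is left to the caller as
     in the examples above, and reporting an unbalanced call with a -1 return
     value is an assumption made for the model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-disk counters; not the kernel's io_stats structure. */
struct acct_stats {
	unsigned busy;		/* commands currently in flight */
	uint64_t rbytes;	/* bytes read */
	uint64_t wbytes;	/* bytes written */
	uint64_t rxfer;		/* completed read transfers */
	uint64_t wxfer;		/* completed write transfers */
};

/* A command is about to be sent to the drive. */
static void
acct_busy(struct acct_stats *s)
{
	s->busy++;
}

/*
 * A command completed.  Returns 0 on success, or -1 if there was no
 * matching acct_busy() call.
 */
static int
acct_unbusy(struct acct_stats *s, long bcount, int read)
{
	if (s->busy == 0)
		return -1;	/* unbalanced busy/unbusy calls */
	s->busy--;

	/* Only transfers that moved data count as transfers. */
	if (bcount > 0) {
		if (read) {
			s->rbytes += (uint64_t)bcount;
			s->rxfer++;
		} else {
			s->wbytes += (uint64_t)bcount;
			s->wxfer++;
		}
	}
	return 0;
}

/* True while at least one command is in flight. */
static bool
acct_isbusy(const struct acct_stats *s)
{
	return s->busy != 0;
}
```

     In this model a completed 4096-byte read adds 4096 to the read byte
     total and one to the read transfer count, while a non-I/O command
     completed with a byte count of zero changes only the busy counter.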

CODE REFERENCES
     The disk framework itself is implemented within the file
     sys/kern/subr_disk.c.  Data structures and function prototypes for the
     framework are located in sys/sys/disk.h.

     The NetBSD machine-independent SCSI disk and CD-ROM drivers use the disk
     framework.  They are located in sys/dev/scsipi/sd.c and
     sys/dev/scsipi/cd.c.

     The NetBSD ccd, dm, and vnd drivers use the detachment capability of the
     framework.  They are located in sys/dev/ccd.c,
     sys/dev/dm/device-mapper.c, and sys/dev/vnd.c.

SEE ALSO
     ccd(4), dm(4), vnd(4), dksubr(9)

HISTORY
     The NetBSD generic disk framework appeared in NetBSD 1.2.

AUTHORS
     The NetBSD generic disk framework was architected and implemented by
     Jason R. Thorpe <thorpej@NetBSD.org>.

NetBSD 10.99                     March 5, 2017                    NetBSD 10.99