Updated: 2022/Sep/29


BUFFERIO(9)                Kernel Developer's Manual               BUFFERIO(9)

NAME
     BUFFERIO, biodone, biowait, getiobuf, putiobuf, nestiobuf_setup,
     nestiobuf_done - block I/O buffer transfers

SYNOPSIS
     #include <sys/buf.h>

     void
     biodone(buf_t *bp);

     int
     biowait(buf_t *bp);

     buf_t *
     getiobuf(struct vnode *vp, bool waitok);

     void
     putiobuf(buf_t *bp);

     void
     nestiobuf_setup(buf_t *mbp, buf_t *bp, int offset, size_t size);

     void
     nestiobuf_done(buf_t *mbp, int donebytes, int error);

DESCRIPTION
     The BUFFERIO subsystem manages block I/O buffer transfers, each described
     by a struct buf.  The same structure is shared by direct users of
     BUFFERIO, by users of buffercache(9), and by block device drivers, which
     execute the transfers to physical disks.

BLOCK DEVICE USERS
     Users of BUFFERIO wishing to submit a buffer for block I/O transfer must
     obtain a struct buf, e.g. via getiobuf(), fill its parameters, and submit
     it to a block device with bdev_strategy(9), usually via VOP_STRATEGY(9).

     The parameters to an I/O transfer described by bp are specified by the
     following struct buf fields:

         bp->b_flags
                 Flags specifying the type of transfer.
                 B_READ  Transfer is read from device.  If not set, transfer
                         is write to device.
                 B_ASYNC
                         Asynchronous I/O.  Caller must not provide
                         bp->b_iodone and must not call biowait(bp).
                 For legibility, callers should indicate writes by passing the
                 pseudo-flag B_WRITE, which is zero.

         bp->b_data
                 Pointer to kernel virtual address of source/target for
                 transfer.

         bp->b_bcount
                 Nonnegative number of bytes requested for transfer.

         bp->b_blkno
                 Block number at which to do transfer.

         bp->b_iodone
                 I/O completion callback.  B_ASYNC must not be set in
                 bp->b_flags.

     Additionally, if the I/O transfer is a write associated with a vnode(9)
     vp, then before the user submits it to a block device, the user must
     increment vp->v_numoutput.  The user must not acquire vp's vnode lock
     between incrementing vp->v_numoutput and submitting bp to a block device
     -- doing so will likely cause deadlock with the syncer.

     Block I/O transfer completion may be notified by the bp->b_iodone
     callback, by signalling biowait() waiters, or not at all in the B_ASYNC
     case.

     -   If the user sets the bp->b_iodone callback to a non-NULL function
         pointer, it will be called in soft interrupt context when the I/O
         transfer is complete.  The user may not call biowait(bp) in this
         case.

     -   If B_ASYNC is set, then the I/O transfer is asynchronous and the user
         will not be notified when it is completed.  The user may not call
         biowait(bp) in this case.

     -   Otherwise, if bp->b_iodone is NULL and B_ASYNC is not specified, the
         user may wait for the I/O transfer to complete with biowait(bp).
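
     The three cases above can be summarized as a small decision function.
     This is a userland sketch, not kernel code: struct sketch_buf, the
     SKETCH_B_ASYNC value, enum completion_mode, and completion_mode() are
     hypothetical names introduced only for illustration.  The real
     definitions live in <sys/buf.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in flag value, for illustration only (not the kernel's B_ASYNC). */
#define SKETCH_B_ASYNC	0x01

/* Hypothetical, reduced stand-in for struct buf. */
struct sketch_buf {
	int	b_flags;
	void	(*b_iodone)(struct sketch_buf *);
};

enum completion_mode {
	COMPLETION_CALLBACK,	/* b_iodone is called; biowait() not allowed */
	COMPLETION_NONE,	/* B_ASYNC: no notification; biowait() not allowed */
	COMPLETION_BIOWAIT,	/* neither set: caller may biowait() */
	COMPLETION_INVALID	/* both set: forbidden combination */
};

/* Do-nothing callback, used only to make b_iodone non-NULL. */
static void
sketch_noop_iodone(struct sketch_buf *bp)
{
	(void)bp;
}

/* Classify how completion of the transfer described by bp is reported. */
static enum completion_mode
completion_mode(const struct sketch_buf *bp)
{
	int async;

	assert(bp != NULL);
	async = (bp->b_flags & SKETCH_B_ASYNC) != 0;

	if (async && bp->b_iodone != NULL)
		return COMPLETION_INVALID;
	if (bp->b_iodone != NULL)
		return COMPLETION_CALLBACK;
	if (async)
		return COMPLETION_NONE;
	return COMPLETION_BIOWAIT;
}
```

     Only the COMPLETION_BIOWAIT case permits a call to biowait(bp).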

     Once an I/O transfer has completed, its struct buf may be reused, but the
     user must first clear the BO_DONE flag of bp->b_oflags before reusing it.
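
     As an illustration of the steps in this section, the following sketch
     performs one synchronous read.  Only read_sync() mirrors what a kernel
     caller would write; its name is hypothetical.  Everything above it is a
     userland stand-in so that the sketch compiles outside the kernel: the
     B_READ value, the reduced struct buf, and the stub bodies of getiobuf(),
     putiobuf(), VOP_STRATEGY(), and biowait() are assumptions, not the real
     definitions.  Treating a short transfer as EIO is a choice made by this
     sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userland stand-ins; in the kernel these come from the system headers. */
#define B_READ	0x01			/* assumed value */
struct vnode { int v_unused; };
typedef struct buf {
	int	b_flags;
	void	*b_data;
	int	b_bcount;
	int	b_resid;
	int64_t	b_blkno;		/* daddr_t in the kernel */
	int	b_error;
	void	(*b_iodone)(struct buf *);
} buf_t;

static buf_t *
getiobuf(struct vnode *vp, bool waitok)
{
	buf_t *bp = calloc(1, sizeof(*bp));

	(void)vp;
	(void)waitok;
	if (bp == NULL)
		abort();
	return bp;
}

static void
putiobuf(buf_t *bp)
{
	free(bp);
}

/* Stub device: completes at once, filling reads with the byte 0xA5. */
static int
VOP_STRATEGY(struct vnode *devvp, buf_t *bp)
{
	(void)devvp;
	if (bp->b_flags & B_READ)
		memset(bp->b_data, 0xA5, (size_t)bp->b_bcount);
	bp->b_error = 0;
	bp->b_resid = 0;
	return 0;
}

/* Stub: the transfer has already completed, so just report its status. */
static int
biowait(buf_t *bp)
{
	return bp->b_error;
}

/* Read len bytes at block blkno of devvp into data, synchronously. */
static int
read_sync(struct vnode *devvp, int64_t blkno, void *data, int len)
{
	buf_t *bp;
	int error;

	bp = getiobuf(NULL, true);	/* waitok: may sleep */
	bp->b_flags = B_READ;		/* B_ASYNC not set */
	bp->b_data = data;
	bp->b_bcount = len;
	bp->b_blkno = blkno;
	bp->b_iodone = NULL;		/* no callback, so biowait() is allowed */

	VOP_STRATEGY(devvp, bp);
	error = biowait(bp);
	if (error == 0 && bp->b_resid != 0)
		error = EIO;		/* sketch's choice for short transfers */

	putiobuf(bp);			/* transfer has completed */
	return error;
}
```

     Because the buffer is released with putiobuf() rather than reused, the
     sketch never needs to clear BO_DONE.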

NESTED I/O TRANSFERS
     Sometimes an I/O transfer from a single buffer in memory cannot go to a
     single location on a block device: it must be split up into smaller
     transfers for each segment of the memory buffer.

     After initializing the b_flags, b_data, and b_bcount parameters of an I/O
     transfer for the buffer, called the master buffer, the user can issue
     smaller transfers for segments of the buffer using nestiobuf_setup().
     When nested I/O transfers complete, in any order, they debit from the
     amount of work left to be done in the master buffer.  If any segments of
     the buffer were skipped, the user can report this with nestiobuf_done()
     to debit the skipped part of the work.

     The master buffer's I/O transfer is complete once all nested buffers' I/O
     transfers have completed and, if any segments were skipped,
     nestiobuf_done() has been called to account for them.
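
     The debit accounting described above can be modelled in a few lines.
     This is a userland model under stated assumptions, not the kernel's
     implementation: struct master_model, master_init(), and master_debit()
     are hypothetical names, and keeping only the first reported error is a
     choice made by this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a master buffer; not the kernel's struct buf. */
struct master_model {
	size_t	resid;	/* bytes not yet accounted for */
	int	error;	/* first error reported, or 0 */
	bool	done;	/* true once the master transfer is complete */
};

/* Start with all bcount bytes of work outstanding. */
static void
master_init(struct master_model *m, size_t bcount)
{
	assert(bcount > 0);
	m->resid = bcount;
	m->error = 0;
	m->done = false;
}

/*
 * Debit donebytes from the master.  Both a completed nested transfer
 * and a report of skipped bytes are modelled by this one operation.
 */
static void
master_debit(struct master_model *m, size_t donebytes, int error)
{
	assert(!m->done);
	assert(donebytes <= m->resid);

	if (error != 0 && m->error == 0)
		m->error = error;
	m->resid -= donebytes;
	if (m->resid == 0)
		m->done = true;	/* all work accounted for */
}
```

     In this model the order of the debits does not matter: the master is
     done exactly when the debited bytes add up to the initial byte count.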

     For writes associated with a vnode vp, nestiobuf_setup() accounts for
     vp->v_numoutput, so the caller is not allowed to acquire vp's vnode lock
     before submitting the nested I/O transfer to a block device.  However,
     the caller is responsible for accounting the master buffer in
     vp->v_numoutput.  This must be done very carefully because after
     incrementing vp->v_numoutput, the caller is not allowed to acquire vp's
     vnode lock before either calling nestiobuf_done() or submitting the last
     nested I/O transfer to a block device.

     For example:

         struct buf *mbp, *bp;
         struct vnode *devvp = NULL;     /* Used after the loop, too.  */
         size_t skipped = 0;
         unsigned i;
         int error = 0;

         mbp = getiobuf(vp, true);
         mbp->b_data = data;
         mbp->b_resid = mbp->b_bcount = datalen;
         mbp->b_flags = B_WRITE;

         KASSERT(0 < nsegs);
         KASSERT(datalen == nsegs*segsz);
         for (i = 0; i < nsegs; i++) {
                 daddr_t blkno;

                 vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
                 error = VOP_BMAP(vp, i*segsz, &devvp, &blkno, NULL);
                 VOP_UNLOCK(vp);
                 if (error == 0 && blkno == -1)
                         error = EIO;
                 if (error) {
                         /* Give up early, don't try to handle holes.  */
                         skipped += datalen - i*segsz;
                         break;
                 }

                 bp = getiobuf(vp, true);
                 nestiobuf_setup(mbp, bp, i*segsz, segsz);
                 bp->b_blkno = blkno;
                 if (i == nsegs - 1)     /* Last segment.  */
                         break;
                 VOP_STRATEGY(devvp, bp);
         }

         /*
          * Account v_numoutput for master write.
          * (Must not vn_lock before last VOP_STRATEGY!)
          */
         mutex_enter(vp->v_interlock);
         vp->v_numoutput++;
         mutex_exit(vp->v_interlock);

         if (skipped)
                 nestiobuf_done(mbp, skipped, error);
         else
                 VOP_STRATEGY(devvp, bp);

BLOCK DEVICE DRIVERS
     Block device drivers implement a `strategy' method, in the d_strategy
     member of struct bdevsw (driver(9)), to queue a buffer for disk I/O.  The
     inputs to the strategy method are:

         bp->b_flags
                 Flags specifying the type of transfer.
                 B_READ  Transfer is read from device.  If not set, transfer
                         is write to device.

         bp->b_data
                 Pointer to kernel virtual address of source/target for
                 transfer.

         bp->b_bcount
                 Nonnegative number of bytes requested for transfer.

         bp->b_blkno
                 Block number at which to do transfer, relative to partition
                 start.

     If the strategy method uses bufq(9), it must additionally initialize the
     following fields before queueing bp with bufq_put(9):

         bp->b_rawblkno
                 Block number relative to volume start.

     When the I/O transfer is complete, whether it succeeded or failed, the
     strategy method must:

     -   Set bp->b_error to zero on success, or to an errno(2) error code on
         failure.

     -   Set bp->b_resid to the number of bytes remaining to transfer, whether
         on success or on failure.  If no bytes were transferred, this must be
         set to bp->b_bcount.

     -   Call biodone(bp).
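
     The three completion steps above can be gathered into one helper.  This
     is a sketch under stated assumptions: sketch_complete() and
     sketch_biodone_calls are hypothetical names, and the reduced struct buf
     and the counting biodone() stub are userland stand-ins so that the
     sketch compiles outside the kernel.

```c
#include <assert.h>
#include <errno.h>

/* Userland stand-in with only the fields this sketch touches. */
typedef struct buf {
	int	b_bcount;	/* bytes requested */
	int	b_resid;	/* bytes remaining after the transfer */
	int	b_error;	/* 0 on success, else an errno value */
} buf_t;

/* Counts calls to the biodone() stub below. */
static int sketch_biodone_calls;

/* Stub: the real biodone() notifies whoever requested the transfer. */
static void
biodone(buf_t *bp)
{
	(void)bp;
	sketch_biodone_calls++;
}

/*
 * Finish the transfer described by bp, of which `transferred' bytes
 * were moved, with status `error' (0 on success).
 */
static void
sketch_complete(buf_t *bp, int error, int transferred)
{
	assert(0 <= transferred && transferred <= bp->b_bcount);

	bp->b_error = error;
	bp->b_resid = bp->b_bcount - transferred;
	biodone(bp);
}
```

     A transfer that fails before moving any data would be finished with
     sketch_complete(bp, EIO, 0), which leaves bp->b_resid equal to
     bp->b_bcount.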

FUNCTIONS
     biodone(bp)
           Notify that the I/O transfer described by bp has completed.

           To be called by a block device driver.  Caller must first set
           bp->b_error to zero on success or to an error code on failure, and
           bp->b_resid to the number of bytes remaining to transfer.

     biowait(bp)
           Wait for the synchronous I/O transfer described by bp to complete.
           Returns the value of bp->b_error.

           To be called by a user requesting the I/O transfer.

           May not be called if bp has a callback or is asynchronous -- that
           is, if bp->b_iodone is set, or if B_ASYNC is set in bp->b_flags.

     getiobuf(vp, waitok)
           Allocate a struct buf for an I/O transfer.  If vp is non-NULL, the
           transfer is associated with it.  If waitok is false, returns NULL
           if none can be allocated immediately.

           The resulting struct buf pointer must eventually be passed to
           putiobuf() to release it.  Do not use brelse(9).

           The buffer may not be used for an asynchronous I/O transfer,
           because there is no way to know when the transfer has completed
           and the buffer may safely be passed to putiobuf().  Asynchronous
           I/O transfers are allowed only for buffers in the buffercache(9).

           May sleep if waitok is true.

     putiobuf(bp)
           Free bp, which must have been allocated by getiobuf().  Either bp
           must never have been submitted to a block device, or the I/O
           transfer must have completed.

CODE REFERENCES
     The BUFFERIO subsystem is implemented in sys/kern/vfs_bio.c.

SEE ALSO
     buffercache(9), bufq(9)

BUGS
     The BUFFERIO abstraction provides no way to cancel an I/O transfer once
     it has been submitted to a block device.

     The BUFFERIO abstraction provides no way to do I/O transfers with non-
     kernel pages, e.g. directly to buffers in userland without copying into
     the kernel first.

     The struct buf type is all mixed up with the buffercache(9).

     The BUFFERIO abstraction is a totally idiotic API design.

     The v_numoutput accounting required of BUFFERIO callers is asinine.

NetBSD 10.99                  September 12, 2019                  NetBSD 10.99