Updated: 2022/Sep/29
BUFFERIO(9)                Kernel Developer's Manual               BUFFERIO(9)

NAME
     BUFFERIO, biodone, biowait, getiobuf, putiobuf, nestiobuf_setup,
     nestiobuf_done - block I/O buffer transfers

SYNOPSIS
     #include <sys/buf.h>

     void
     biodone(buf_t *bp);

     int
     biowait(buf_t *bp);

     buf_t *
     getiobuf(struct vnode *vp, bool waitok);

     void
     putiobuf(buf_t *bp);

     void
     nestiobuf_setup(buf_t *mbp, buf_t *bp, int offset, size_t size);

     void
     nestiobuf_done(buf_t *mbp, int donebytes, int error);

DESCRIPTION
     The BUFFERIO subsystem manages block I/O buffer transfers, described by
     the struct buf structure, which serves multiple purposes between users in
     BUFFERIO, users in buffercache(9), and users in block device drivers to
     execute transfers to physical disks.

BLOCK DEVICE USERS
     Users of BUFFERIO wishing to submit a buffer for block I/O transfer must
     obtain a struct buf, e.g. via getiobuf(), fill its parameters, and submit
     it to a block device with bdev_strategy(9), usually via VOP_STRATEGY(9).

     The parameters to an I/O transfer described by bp are specified by the
     following struct buf fields:

     bp->b_flags   Flags specifying the type of transfer.

                   B_READ   Transfer is read from device.  If not set,
                            transfer is write to device.

                   B_ASYNC  Asynchronous I/O.  Caller must not provide
                            bp->b_iodone and must not call biowait(bp).

                   For legibility, callers should indicate writes by passing
                   the pseudo-flag B_WRITE, which is zero.

     bp->b_data    Pointer to kernel virtual address of source/target for
                   transfer.

     bp->b_bcount  Nonnegative number of bytes requested for transfer.

     bp->b_blkno   Block number at which to do transfer.

     bp->b_iodone  I/O completion callback.  B_ASYNC must not be set in
                   bp->b_flags.

     Additionally, if the I/O transfer is a write associated with a vnode(9)
     vp, then before the user submits it to a block device, the user must
     increment vp->v_numoutput.  The user must not acquire vp's vnode lock
     between incrementing vp->v_numoutput and submitting bp to a block device
     -- doing so will likely cause deadlock with the syncer.
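     The flag and callback rules stated on this page can be modelled in
     ordinary userland C.  The following is a minimal sketch, not kernel
     code: struct model_buf, the MODEL_B_* constants, model_noop_done(),
     model_check_submit(), and model_may_biowait() are hypothetical names
     invented for illustration, and the flag values are placeholders rather
     than the values defined in <sys/buf.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real flags; the values are placeholders. */
#define MODEL_B_WRITE 0x0u	/* pseudo-flag: zero, like B_WRITE */
#define MODEL_B_READ  0x1u
#define MODEL_B_ASYNC 0x2u

/* Hypothetical userland model of the struct buf fields listed above. */
struct model_buf {
	unsigned  b_flags;
	void     *b_data;
	long      b_bcount;
	long long b_blkno;
	void    (*b_iodone)(struct model_buf *);
};

/* Example completion callback that does nothing (illustration only). */
static void
model_noop_done(struct model_buf *bp)
{
	(void)bp;
}

/*
 * Return true if bp obeys the stated submission rules:
 * b_bcount is nonnegative, and B_ASYNC is never combined with b_iodone.
 */
static bool
model_check_submit(const struct model_buf *bp)
{
	if (bp->b_bcount < 0)
		return false;
	if ((bp->b_flags & MODEL_B_ASYNC) != 0 && bp->b_iodone != NULL)
		return false;
	return true;
}

/*
 * Return true if the submitter may wait with biowait(): only when the
 * transfer is neither asynchronous nor callback-driven.
 */
static bool
model_may_biowait(const struct model_buf *bp)
{
	return (bp->b_flags & MODEL_B_ASYNC) == 0 && bp->b_iodone == NULL;
}
```

     A check of this kind is only an aid for reasoning about which
     combinations are legal; it is not something the page says the kernel
     performs on submission.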

     Block I/O transfer completion may be notified by the bp->b_iodone
     callback, by signalling biowait() waiters, or not at all in the B_ASYNC
     case.

     -   If the user sets the bp->b_iodone callback to a non-NULL function
         pointer, it will be called in soft interrupt context when the I/O
         transfer is complete.  The user may not call biowait(bp) in this
         case.

     -   If B_ASYNC is set, then the I/O transfer is asynchronous and the
         user will not be notified when it is completed.  The user may not
         call biowait(bp) in this case.

     -   Otherwise, if bp->b_iodone is NULL and B_ASYNC is not specified, the
         user may wait for the I/O transfer to complete with biowait(bp).

     Once an I/O transfer has completed, its struct buf may be reused, but the
     user must first clear the BO_DONE flag of bp->b_oflags before reusing it.

NESTED I/O TRANSFERS
     Sometimes an I/O transfer from a single buffer in memory cannot go to a
     single location on a block device: it must be split up into smaller
     transfers for each segment of the memory buffer.

     After initializing the b_flags, b_data, and b_bcount parameters of an I/O
     transfer for the buffer, called the master buffer, the user can issue
     smaller transfers for segments of the buffer using nestiobuf_setup().
     When nested I/O transfers complete, in any order, they debit from the
     amount of work left to be done in the master buffer.  If any segments of
     the buffer were skipped, the user can report this with nestiobuf_done()
     to debit the skipped part of the work.

     The master buffer's I/O transfer is completed when all nested buffers'
     I/O transfers are completed, and if nestiobuf_done() is called in the
     case of skipped segments.

     For writes associated with a vnode vp, nestiobuf_setup() accounts for
     vp->v_numoutput, so the caller is not allowed to acquire vp's vnode lock
     before submitting the nested I/O transfer to a block device.  However,
     the caller is responsible for accounting the master buffer in
     vp->v_numoutput.
     This must be done very carefully because after incrementing
     vp->v_numoutput, the caller is not allowed to acquire vp's vnode lock
     before either calling nestiobuf_done() or submitting the last nested I/O
     transfer to a block device.

     For example:

           struct buf *mbp, *bp;
           struct vnode *devvp;    /* used again after the loop */
           size_t skipped = 0;
           unsigned i;
           int error = 0;

           mbp = getiobuf(vp, true);
           mbp->b_data = data;
           mbp->b_resid = mbp->b_bcount = datalen;
           mbp->b_flags = B_WRITE;

           KASSERT(0 < nsegs);
           KASSERT(datalen == nsegs*segsz);
           for (i = 0; i < nsegs; i++) {
                   daddr_t blkno;

                   vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
                   error = VOP_BMAP(vp, i*segsz, &devvp, &blkno, NULL);
                   VOP_UNLOCK(vp);
                   if (error == 0 && blkno == -1)
                           error = EIO;
                   if (error) {
                           /* Give up early, don't try to handle holes.  */
                           skipped += datalen - i*segsz;
                           break;
                   }

                   bp = getiobuf(vp, true);
                   /* Master buffer first, nested buffer second.  */
                   nestiobuf_setup(mbp, bp, i*segsz, segsz);
                   bp->b_blkno = blkno;
                   if (i == nsegs - 1) /* Last segment.  */
                           break;
                   VOP_STRATEGY(devvp, bp);
           }

           /*
            * Account v_numoutput for master write.
            * (Must not vn_lock before last VOP_STRATEGY!)
            */
           mutex_enter(vp->v_interlock);
           vp->v_numoutput++;
           mutex_exit(vp->v_interlock);

           if (skipped)
                   nestiobuf_done(mbp, skipped, error);
           else
                   VOP_STRATEGY(devvp, bp);

BLOCK DEVICE DRIVERS
     Block device drivers implement a `strategy' method, in the d_strategy
     member of struct bdevsw (driver(9)), to queue a buffer for disk I/O.  The
     inputs to the strategy method are:

     bp->b_flags   Flags specifying the type of transfer.

                   B_READ   Transfer is read from device.  If not set,
                            transfer is write to device.

     bp->b_data    Pointer to kernel virtual address of source/target for
                   transfer.

     bp->b_bcount  Nonnegative number of bytes requested for transfer.

     bp->b_blkno   Block number at which to do transfer, relative to
                   partition start.

     If the strategy method uses bufq(9), it must additionally initialize the
     following fields before queueing bp with bufq_put(9):

     bp->b_rawblkno
                   Block number relative to volume start.
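
     The relationship between the two block numbers can be illustrated with a
     small userland model.  This is a sketch under stated assumptions, not
     driver code: struct model_drvbuf, model_daddr_t, and model_set_rawblkno()
     are hypothetical names, and part_start (the first block of the partition
     within the volume) is assumed to be expressed in the same units as
     b_blkno.  A real driver obtains the partition offset from its own label
     or disk information, which this page does not describe.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userland stand-in for the kernel's block number type. */
typedef int64_t model_daddr_t;

/* Hypothetical model holding only the two block number fields. */
struct model_drvbuf {
	model_daddr_t b_blkno;		/* relative to partition start */
	model_daddr_t b_rawblkno;	/* relative to volume start */
};

/*
 * Fill in b_rawblkno before the buffer would be queued.
 * part_start is an assumed parameter: the partition's first block,
 * in the same units as b_blkno.
 */
static void
model_set_rawblkno(struct model_drvbuf *bp, model_daddr_t part_start)
{
	assert(bp->b_blkno >= 0);
	assert(part_start >= 0);
	bp->b_rawblkno = part_start + bp->b_blkno;
}
```

     For instance, under these assumptions a request for block 16 of a
     partition that starts at block 2048 of the volume would be queued with a
     raw block number of 2064.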

     When the I/O transfer is complete, whether it succeeded or failed, the
     strategy method must:

     -   Set bp->b_error to zero on success, or to an errno(2) error code on
         failure.

     -   Set bp->b_resid to the number of bytes remaining to transfer, whether
         on success or on failure.  If no bytes were transferred, this must be
         set to bp->b_bcount.

     -   Call biodone(bp).

FUNCTIONS
     biodone(bp)
             Notify that the I/O transfer described by bp has completed.

             To be called by a block device driver.  Caller must first set
             bp->b_error to an error code and bp->b_resid to the number of
             bytes remaining to transfer.

     biowait(bp)
             Wait for the synchronous I/O transfer described by bp to
             complete.  Returns the value of bp->b_error.

             To be called by a user requesting the I/O transfer.

             May not be called if bp has a callback or is asynchronous -- that
             is, if bp->b_iodone is set, or if B_ASYNC is set in bp->b_flags.

     getiobuf(vp, waitok)
             Allocate a struct buf for an I/O transfer.  If vp is non-NULL,
             the transfer is associated with it.  If waitok is false, returns
             NULL if none can be allocated immediately.

             The resulting struct buf pointer must eventually be passed to
             putiobuf() to release it.  Do not use brelse(9).

             The buffer may not be used for an asynchronous I/O transfer,
             because there is no way to know when it is completed and may be
             safely passed to putiobuf().  Asynchronous I/O transfers are
             allowed only for buffers in the buffercache(9).

             May sleep if waitok is true.

     putiobuf(bp)
             Free bp, which must have been allocated by getiobuf().  Either bp
             must never have been submitted to a block device, or the I/O
             transfer must have completed.

CODE REFERENCES
     The BUFFERIO subsystem is implemented in sys/kern/vfs_bio.c.

SEE ALSO
     buffercache(9), bufq(9)

BUGS
     The BUFFERIO abstraction provides no way to cancel an I/O transfer once
     it has been submitted to a block device.

     The BUFFERIO abstraction provides no way to do I/O transfers with
     non-kernel pages, e.g.
     directly to buffers in userland without copying into the kernel first.

     The struct buf type is all mixed up with the buffercache(9).

     The BUFFERIO abstraction is a totally idiotic API design.

     The v_numoutput accounting required of BUFFERIO callers is asinine.

NetBSD 10.99                  September 12, 2019                  NetBSD 10.99