Updated: 2022/Sep/29


RUMPUSER(3)                Library Functions Manual                RUMPUSER(3)

NAME
     rumpuser - rump kernel hypercall interface

LIBRARY
     rump User Library (librumpuser, -lrumpuser)

SYNOPSIS
     #include <rump/rumpuser.h>

DESCRIPTION
     The rumpuser hypercall interfaces allow a rump kernel to access host
     resources.  A hypervisor implementation must implement the routines
     described in this document to allow a rump kernel to run on the host.
     The implementation included in NetBSD is for POSIX-like hosts (*BSD,
     Linux, etc.).  This document is divided into sections based on the
     functionality group of each hypercall.

     Since the hypercall interface is a C function interface, both the rump
     kernel and the hypervisor must conform to the same ABI.  The interface
     itself attempts to assume as little as possible from the type systems,
     and for example off_t is passed as int64_t and enums are passed as ints.
     It is recommended that the hypervisor converts these to the native types
     before starting to process the hypercall, for example by assigning the
     ints back to enums.

UPCALLS AND RUMP KERNEL CONTEXT
     A hypercall is always entered with the calling thread scheduled in the
     rump kernel.  In case the hypercall intends to block while waiting for an
     event, the hypervisor must first release the rump kernel scheduling
     context.  In other words, the rump kernel context is a resource and
     holding on to it while waiting for a rump kernel event/resource may lead
     to a deadlock.  Even when there is no possibility of deadlock in the
     strict sense of the term, holding on to the rump kernel context while
     performing a slow hypercall such as reading a device will prevent other
     threads (including the clock interrupt) from using that rump kernel
     context.

     Releasing the context is done by calling the hyp_backend_unschedule()
     upcall which the hypervisor received from rump kernel as a parameter for
     rumpuser_init().  Before a hypercall returns back to the rump kernel, the
     returning thread must carry a rump kernel context.  In case the hypercall
     unscheduled itself, it must reschedule itself by calling
     hyp_backend_schedule().
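
     The release/reacquire discipline described above can be sketched as
     follows.  This is a minimal illustration, not the librumpuser code: the
     structure demo_hyperup, its simplified void(void) upcall signatures, and
     all demo_* helpers are assumptions made for this example.  The stub
     upcalls only record the order in which they are called.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for the upcall table received in
 * rumpuser_init().  The real structure and signatures may differ.
 */
struct demo_hyperup {
	void (*hyp_backend_unschedule)(void);
	void (*hyp_backend_schedule)(void);
};

static struct demo_hyperup demo_hyp;
static char demo_trace[16];	/* records the order of events */

static void
demo_note(char c)
{
	size_t n = strlen(demo_trace);

	if (n + 1 < sizeof(demo_trace))
		demo_trace[n] = c;
}

/* Stub upcalls and a stub "slow" host operation for demonstration. */
static void demo_unschedule(void) { demo_note('U'); }
static void demo_schedule(void) { demo_note('S'); }
static int demo_slow_read(void *arg) { (void)arg; demo_note('B'); return 0; }

/* Remember the upcalls at initialization time. */
static void
demo_init(const struct demo_hyperup *hyp)
{
	assert(hyp != NULL);
	demo_hyp = *hyp;
}

/* Run a possibly blocking host operation without the rump kernel context. */
static int
demo_blocking_hypercall(int (*host_op)(void *), void *arg)
{
	int rv;

	demo_hyp.hyp_backend_unschedule();	/* release the context */
	rv = host_op(arg);			/* may block freely here */
	demo_hyp.hyp_backend_schedule();	/* reacquire before returning */
	return rv;
}
```

     After demo_init() has stored the stub upcalls, one call to
     demo_blocking_hypercall(demo_slow_read, NULL) leaves "UBS" in
     demo_trace: unschedule, block, schedule.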

HYPERCALL INTERFACES
   Initialization
     int rumpuser_init(int version, struct rump_hyperup *hyp)

     Initialize the hypervisor.

     version          hypercall interface version number that the kernel
                      expects to be used.  In case the hypervisor cannot
                      provide an exact match, this routine must return a non-
                      zero value.

     hyp              pointer to a set of upcalls the hypervisor can make into
                      the rump kernel

   Memory allocation
     int rumpuser_malloc(size_t len, int alignment, void **memp)

     len              amount of memory to allocate

     alignment        size the returned memory must be aligned to.  For
                      example, if the value passed is 4096, the returned
                      memory must be aligned to a 4k boundary.

     memp             return pointer for allocated memory

     void rumpuser_free(void *mem, size_t len)

     mem              memory to free

     len              length of allocation.  This is always equal to the
                      amount the caller requested from the rumpuser_malloc()
                      which returned mem.
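
     On a POSIX-like host the allocation pair could be built on
     posix_memalign(3) and free(3).  The sketch below is one possible
     approach under that assumption; the demo_* names are hypothetical and
     the handling of small or non-positive alignments is a choice made for
     this example only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

static int
demo_malloc(size_t len, int alignment, void **memp)
{
	void *mem = NULL;
	size_t align = (size_t)alignment;
	int rv;

	assert(memp != NULL);
	/*
	 * posix_memalign() requires a power of two that is also a
	 * multiple of sizeof(void *); round tiny requests up.
	 */
	if (alignment <= 0 || align < sizeof(void *))
		align = sizeof(void *);

	rv = posix_memalign(&mem, align, len);
	if (rv == 0)
		*memp = mem;
	return rv;	/* 0 on success, otherwise a host errno value */
}

static void
demo_free(void *mem, size_t len)
{
	(void)len;	/* free() does not need the allocation length */
	free(mem);
}
```

     An allocator that tracks sizes itself can ignore len as above; one that
     does not (for example an mmap(2) based allocator) can rely on the
     guarantee that len matches the original request.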

   Files and I/O
     int rumpuser_open(const char *name, int mode, int *fdp)

     Open name for I/O and associate a file descriptor with it.  Notably,
     there need not be any mapping between name and the host's file system
     namespace.  For example, it is possible to associate the file descriptor
     with device I/O registers for special values of name.

     name             the identifier of the file to open for I/O

     mode             combination of the following:

                      RUMPUSER_OPEN_RDONLY   open only for reading

                      RUMPUSER_OPEN_WRONLY   open only for writing

                      RUMPUSER_OPEN_RDWR     open for reading and writing

                      RUMPUSER_OPEN_CREATE   do not treat missing name as an
                                             error

                      RUMPUSER_OPEN_EXCL     combined with
                                             RUMPUSER_OPEN_CREATE, flag an
                                             error if name already exists

                      RUMPUSER_OPEN_BIO      the caller will use this file for
                                             block I/O, usually used in
                                             conjunction with accessing file
                                             system media.  The hypervisor
                                             should treat this flag as
                                             advisory and possibly enable some
                                             optimizations for *fdp based on
                                             it.
                      Notably, the permissions of the created file are left up
                      to the hypervisor implementation.

     fdp              An integer value denoting the open file is returned
                      here.
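
     A hypervisor backed by host files might translate the mode bits to
     open(2) flags along the following lines.  The DEMO_OPEN_* values are
     illustrative stand-ins for the RUMPUSER_OPEN_* constants (the real
     values come from <rump/rumpuser.h>), the 0644 permission is an arbitrary
     choice, and errno translation is omitted for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/* Illustrative values only; not taken from <rump/rumpuser.h>. */
#define DEMO_OPEN_RDONLY	0x0000
#define DEMO_OPEN_WRONLY	0x0001
#define DEMO_OPEN_RDWR		0x0002
#define DEMO_OPEN_ACCMODE	0x0003
#define DEMO_OPEN_CREATE	0x0004
#define DEMO_OPEN_EXCL		0x0008
#define DEMO_OPEN_BIO		0x0010

/* Translate hypercall mode bits to host open(2) flags, or -1 if invalid. */
static int
demo_open_flags(int mode)
{
	int flags;

	switch (mode & DEMO_OPEN_ACCMODE) {
	case DEMO_OPEN_RDONLY:	flags = O_RDONLY; break;
	case DEMO_OPEN_WRONLY:	flags = O_WRONLY; break;
	case DEMO_OPEN_RDWR:	flags = O_RDWR; break;
	default:		return -1;
	}
	if (mode & DEMO_OPEN_CREATE)
		flags |= O_CREAT;
	if (mode & DEMO_OPEN_EXCL)
		flags |= O_EXCL;
	/* DEMO_OPEN_BIO is advisory; this sketch ignores it. */
	return flags;
}

static int
demo_open(const char *name, int mode, int *fdp)
{
	int flags = demo_open_flags(mode);
	int fd;

	assert(name != NULL && fdp != NULL);
	if (flags == -1)
		return EINVAL;
	fd = open(name, flags, 0644);	/* permissions: hypervisor's choice */
	if (fd == -1)
		return errno;		/* host errno, not yet translated */
	*fdp = fd;
	return 0;
}
```

     For example, demo_open_flags(DEMO_OPEN_RDWR | DEMO_OPEN_CREATE) yields
     O_RDWR | O_CREAT.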

     int rumpuser_close(int fd)

     Close a previously opened file descriptor.

     int rumpuser_getfileinfo(const char *name, uint64_t *size, int *type)

     name             file for which information is returned.  The namespace
                      is equal to that of rumpuser_open().

     size             If non-NULL, size of the file is returned here.

     type             If non-NULL, type of the file is returned here.  The
                      options are RUMPUSER_FT_DIR, RUMPUSER_FT_REG,
                      RUMPUSER_FT_BLK, RUMPUSER_FT_CHR, or RUMPUSER_FT_OTHER
                      for directory, regular file, block device, character
                      device or unknown, respectively.

     void rumpuser_bio(int fd, int op, void *data, size_t dlen, int64_t off,
     rump_biodone_fn biodone, void *donearg)

     Initiate block I/O and return immediately.

     fd               perform I/O on this file descriptor.  The file
                      descriptor must have been opened with RUMPUSER_OPEN_BIO.

     op               Transfer data from the file descriptor with
                      RUMPUSER_BIO_READ and transfer data to the file
                      descriptor with RUMPUSER_BIO_WRITE.  Unless
                      RUMPUSER_BIO_SYNC is specified, the hypervisor may cache
                      a write instead of committing it to permanent storage.

     data             memory address to transfer data to/from

     dlen             length of I/O.  The length is guaranteed to be a
                      multiple of 512.

     off              offset into fd where I/O is performed

     biodone          To be called when the I/O is complete.  Accessing data
                      is not legal after the call is made.

     donearg          opaque arg that must be passed to biodone.

     int rumpuser_iovread(int fd, struct rumpuser_iovec *ruiov, size_t iovlen,
     int64_t off, size_t *retv)

     int rumpuser_iovwrite(int fd, struct rumpuser_iovec *ruiov,
     size_t iovlen, int64_t off, size_t *retv)

     These routines perform scatter-gather I/O which is not block I/O by
     nature and therefore cannot be handled by rumpuser_bio().

     fd               file descriptor to perform I/O on

     ruiov            an array of I/O descriptors.  It is defined as follows:
                            struct rumpuser_iovec {
                                    void *iov_base;
                                    size_t iov_len;
                            };

     iovlen           number of elements in ruiov

     off              offset of fd to perform I/O on.  This can either be a
                      non-negative value or RUMPUSER_IOV_NOSEEK.  The latter
                      denotes that no attempt to change the underlying
                      object's offset should be made.  Using both types of
                      offsets on a single instance of fd results in undefined
                      behavior.

     retv             number of bytes successfully transferred is returned
                      here
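
     The read side can be sketched with read(2) and pread(2), one call per
     descriptor.  This is an illustration only: a real implementation may
     prefer a single vectored system call, DEMO_IOV_NOSEEK with the value -1
     is an assumed stand-in for RUMPUSER_IOV_NOSEEK, and errno translation is
     omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Layout as given in the parameter description above. */
struct rumpuser_iovec {
	void *iov_base;
	size_t iov_len;
};

#define DEMO_IOV_NOSEEK	(-1)	/* assumed value, for illustration */

static int
demo_iovread(int fd, struct rumpuser_iovec *ruiov, size_t iovlen,
    int64_t off, size_t *retv)
{
	size_t total = 0;
	size_t i;

	assert(ruiov != NULL && retv != NULL);
	for (i = 0; i < iovlen; i++) {
		ssize_t n;

		if (off == DEMO_IOV_NOSEEK) {
			/* Use and advance the descriptor's own offset. */
			n = read(fd, ruiov[i].iov_base, ruiov[i].iov_len);
		} else {
			/* Positioned read; the descriptor offset is untouched. */
			n = pread(fd, ruiov[i].iov_base, ruiov[i].iov_len,
			    (off_t)(off + (int64_t)total));
		}
		if (n == -1)
			return errno;
		total += (size_t)n;
		if ((size_t)n < ruiov[i].iov_len)
			break;	/* short read or end of file */
	}
	*retv = total;
	return 0;
}
```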

     int rumpuser_syncfd(int fd, int flags, uint64_t start, uint64_t len)

     Synchronizes fd with respect to backing storage.  The other arguments
     are:

     flags            controls how synchronization happens.  It must contain
                      one of the following:

                      RUMPUSER_SYNCFD_READ      Make sure that the next read
                                                sees writes from all other
                                                parties.  This is useful for
                                                example in the case that fd
                                                represents memory to which a
                                                DMA read is being performed.

                      RUMPUSER_SYNCFD_WRITE     Flush cached writes.

                      The following additional parameters may be passed in
                      flags:

                      RUMPUSER_SYNCFD_BARRIER   Issue a barrier.  Outstanding
                                                I/O operations which were
                                                started before the barrier
                                                complete before any operations
                                                after the barrier are
                                                performed.

                      RUMPUSER_SYNCFD_SYNC      Wait for the synchronization
                                                operation to fully complete
                                                before returning.  For
                                                example, this could mean that
                                                the data to be written to a
                                                disk has hit either the disk
                                                or non-volatile memory.

     start            offset into the object.

     len              the number of bytes to synchronize.  The value 0 denotes
                      until the end of the object.

   Clocks
     The hypervisor should support two clocks, one for wall time and one for
     monotonically increasing time, the latter of which may be based on some
     arbitrary time (e.g. system boot time).  If this is not possible, the
     hypervisor must make a reasonable effort to retain semantics.

     int rumpuser_clock_gettime(int enum_rumpclock, int64_t *sec, long *nsec)

     enum_rumpclock   specifies the clock type.  In case of
                      RUMPUSER_CLOCK_RELWALL the wall time should be returned.
                      In case of RUMPUSER_CLOCK_ABSMONO the time of a
                      monotonic clock should be returned.

     sec              return value for seconds

     nsec             return value for nanoseconds
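
     On a POSIX host the two clocks map naturally onto CLOCK_REALTIME and
     CLOCK_MONOTONIC.  The sketch below also shows the recommended conversion
     of the int argument back to a native enum before use.  The enum
     demo_rumpclock and its members are local stand-ins for the
     RUMPUSER_CLOCK_* values, and demo_clock_gettime() is a hypothetical
     name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Local stand-in for the clock type enum. */
enum demo_rumpclock { DEMO_CLOCK_RELWALL, DEMO_CLOCK_ABSMONO };

static int
demo_clock_gettime(int enum_rumpclock, int64_t *sec, long *nsec)
{
	/* Convert the ABI-neutral int back to the native enum first. */
	enum demo_rumpclock clk = (enum demo_rumpclock)enum_rumpclock;
	struct timespec ts;
	clockid_t id;

	assert(sec != NULL && nsec != NULL);
	switch (clk) {
	case DEMO_CLOCK_RELWALL:
		id = CLOCK_REALTIME;	/* wall time */
		break;
	case DEMO_CLOCK_ABSMONO:
		id = CLOCK_MONOTONIC;	/* arbitrary but monotonic base */
		break;
	default:
		return EINVAL;
	}
	if (clock_gettime(id, &ts) == -1)
		return errno;
	*sec = (int64_t)ts.tv_sec;	/* widen to the fixed-size type */
	*nsec = ts.tv_nsec;
	return 0;
}
```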

     int rumpuser_clock_sleep(int enum_rumpclock, int64_t sec, long nsec)

     enum_rumpclock   In case of RUMPUSER_CLOCK_RELWALL, the sleep should last
                      at least as long as specified.  In case of
                      RUMPUSER_CLOCK_ABSMONO, the sleep should last until the
                      hypervisor monotonic clock hits the specified absolute
                      time.

     sec              sleep duration, seconds.  Exact semantics depend on
                      enum_rumpclock.

     nsec             sleep duration, nanoseconds.  Exact semantics depend on
                      enum_rumpclock.

   Parameter retrieval
     int rumpuser_getparam(const char *name, void *buf, size_t buflen)

     Retrieve a configuration parameter from the hypervisor.  It is up to the
     hypervisor to decide how the parameters can be set.

     name             name of the parameter.  A name that starts with an
                      underscore denotes a mandatory parameter.  The
                      mandatory parameters are RUMPUSER_PARAM_NCPU which
                      specifies the number of virtual CPUs bootstrapped by the
                      rump kernel and RUMPUSER_PARAM_HOSTNAME which returns a
                      preferably unique instance name for the rump kernel.

     buf              buffer to return the data in as a string

     buflen           length of buffer
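
     Since the hypervisor decides how parameters are set, one simple option
     is to back optional parameters with environment variables.  The sketch
     below assumes that policy.  The name DEMO_PARAM_NCPU, its string value,
     the fixed answer of one CPU, and the chosen error codes are all
     assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a mandatory parameter name. */
#define DEMO_PARAM_NCPU	"_DEMO_NCPU"

static int
demo_getparam(const char *name, void *buf, size_t buflen)
{
	const char *val;
	size_t need;

	assert(name != NULL && buf != NULL);
	if (strcmp(name, DEMO_PARAM_NCPU) == 0)
		val = "1";		/* sketch: always one virtual CPU */
	else if (name[0] == '_')
		return EINVAL;		/* unknown mandatory parameter */
	else if ((val = getenv(name)) == NULL)
		return ENOENT;		/* optional parameter not set */

	need = strlen(val) + 1;		/* include the terminating NUL */
	if (need > buflen)
		return E2BIG;
	memcpy(buf, val, need);		/* values are returned as strings */
	return 0;
}
```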

   Termination
     void rumpuser_exit(int value)

     Terminate the rump kernel with exit value value.  If value is
     RUMPUSER_PANIC the hypervisor should attempt to provide something akin to
     a core dump.

   Console output
     Console output is divided into two routines: a per-character one and a
     printf-like one.  The former is used e.g. by the rump kernel's internal
     printf routine.  The latter can be used for direct debug prints e.g. very
     early on in the rump kernel's bootstrap or when using the in-kernel
     routine causes too much skew in the debug print results (the hypercall
     runs outside of the rump kernel and therefore does not cause any locking
     or scheduling events inside the rump kernel).

     void rumpuser_putchar(int ch)

     Output ch on the console.

     void rumpuser_dprintf(const char *fmt, ...)

     Do output based on printf-like parameters.

   Signals
     A rump kernel should be able to send signals to client programs due to
     some standard interfaces including signal delivery in their
     specifications.  Examples of these interfaces include setitimer(2) and
     write(2).  The rumpuser_kill() function advises the hypercall
     implementation to raise a signal for the process containing the rump
     kernel.

     int rumpuser_kill(int64_t pid, int sig)

     pid              The pid of the rump kernel process that the signal is
                      directed to.  This value may be used by the hypervisor
                      as a hint on how to deliver the signal.  The value
                      RUMPUSER_PID_SELF may also be specified to indicate no
                      hint.  This value will be removed in a future version of
                      the hypercall interface.

     sig              Number of the signal to raise.  The value is in NetBSD
                      signal number namespace.  In case the host has a native
                      representation for signals, the value should be
                      translated before the signal is raised.  In case there
                      is no mapping between sig and native signals (if any),
                      the behavior is implementation-defined.

     A rump kernel will ignore the return value of this hypercall.  The only
     implication of not implementing rumpuser_kill() is that some application
     programs may not experience expected behavior for standard interfaces.

     As an aside, the rump_sp(7) protocol provides equivalent functionality
     for remote clients.

   Random pool
     int rumpuser_getrandom(void *buf, size_t buflen, int flags, size_t *retp)

     buf              buffer that the randomness is written to

     buflen           number of bytes of randomness requested

     flags            The value 0 or a combination of RUMPUSER_RANDOM_HARD
                      (return true randomness instead of something from a
                      PRNG) and RUMPUSER_RANDOM_NOWAIT (do not block in case
                      the requested amount of bytes is not available).

     retp             The number of random bytes written into buf.
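
     A very small sketch of this hypercall on a host that provides
     /dev/urandom follows.  It is an assumption-laden illustration: the
     device path is host specific, the flags are accepted but not acted on,
     and demo_getrandom() is a hypothetical name.  A fuller implementation
     would honor the flags described above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

static int
demo_getrandom(void *buf, size_t buflen, int flags, size_t *retp)
{
	ssize_t n;
	int fd, error;

	(void)flags;	/* this sketch treats all requests alike */
	assert(buf != NULL && retp != NULL);

	fd = open("/dev/urandom", O_RDONLY);
	if (fd == -1)
		return errno;
	n = read(fd, buf, buflen);
	error = (n == -1) ? errno : 0;	/* save errno before close() */
	close(fd);
	if (error != 0)
		return error;
	*retp = (size_t)n;	/* may be less than buflen */
	return 0;
}
```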

   Threads
     int rumpuser_thread_create(void *(*fun)(void *), void *arg,
     const char *thrname, int mustjoin, int priority, int cpuidx,
     void **cookie)

     Create a schedulable host thread context.  The rump kernel will call this
     interface when it creates a kernel thread.  The scheduling policy for the
     new thread is defined by the hypervisor.  In case the hypervisor wants to
     optimize the scheduling of the threads, it can perform heuristics on the
     thrname, priority and cpuidx parameters.

     fun              function that the new thread must call.  This call will
                      never return.

     arg              argument to be passed to fun

     thrname          Name of the new thread.

     mustjoin         If 1, the thread will be waited for by
                      rumpuser_thread_join() when the thread exits.

     priority         The priority that the kernel requested the thread to be
                      created at.  Higher values mean higher priority.  The
                      exact kernel semantics for each value are not available
                      through this interface.

     cpuidx           The index of the virtual CPU that the thread is bound
                      to, or -1 if the thread is not bound.  The mapping
                      between the virtual CPUs and physical CPUs, if any, is
                      hypervisor implementation specific.

     cookie           In case mustjoin is set, the value returned in cookie
                      will be passed to rumpuser_thread_join().

     void rumpuser_thread_exit(void)

     Called when a thread created with rumpuser_thread_create() exits.

     int rumpuser_thread_join(void *cookie)

     Wait for a joinable thread to exit.  The cookie matches the value from
     rumpuser_thread_create().

     void rumpuser_curlwpop(int enum_rumplwpop, struct lwp *l)

     Manipulate the hypervisor's thread context database.  The possible
     operations are create, destroy, and set as specified by enum_rumplwpop:

     RUMPUSER_LWP_CREATE    Inform the hypervisor that l is now a valid thread
                            context which may be set.  A currently valid value
                            of l may not be specified.  This operation is
                            informational and does not mandate any action from
                            the hypervisor.

     RUMPUSER_LWP_DESTROY   Inform the hypervisor that l is no longer a valid
                            thread context.  This means that it may no longer
                            be set as the current context.  A currently set
                            context or an invalid one may not be destroyed.
                            This operation is informational and does not
                            mandate any action from the hypervisor.

     RUMPUSER_LWP_SET       Set l as the current host thread's rump kernel
                            context.  A previous context must not exist.

     RUMPUSER_LWP_CLEAR     Clear the context previously set by
                            RUMPUSER_LWP_SET.  The value passed in l is the
                            current thread and is never NULL.

     struct lwp * rumpuser_curlwp(void)

     Retrieve the rump kernel thread context associated with the current host
     thread, as set by rumpuser_curlwpop().  This routine may be called when a
     context is not set and the routine must return NULL in that case.  This
     interface is expected to be called very often.  Any optimizations
     pertaining to the execution speed of this routine should be done in
     rumpuser_curlwpop().
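
     One way to keep the lookup cheap is to store the context in thread-local
     storage, so that all bookkeeping happens in the set/clear path and the
     lookup is a single load.  The sketch below assumes a C11 compiler with
     _Thread_local support; the enum demo_lwpop and the demo_* functions are
     local stand-ins, not the real interface.

```c
#include <assert.h>
#include <stddef.h>

struct lwp;	/* opaque to the hypervisor */

/* Local stand-in for the operation enum. */
enum demo_lwpop {
	DEMO_LWP_CREATE,
	DEMO_LWP_DESTROY,
	DEMO_LWP_SET,
	DEMO_LWP_CLEAR
};

/* Per host thread: the rump kernel context currently set, if any. */
static _Thread_local struct lwp *demo_curlwp_tls;

static void
demo_curlwpop(int enum_rumplwpop, struct lwp *l)
{
	switch ((enum demo_lwpop)enum_rumplwpop) {
	case DEMO_LWP_CREATE:
	case DEMO_LWP_DESTROY:
		break;		/* informational; nothing to do here */
	case DEMO_LWP_SET:
		assert(demo_curlwp_tls == NULL);  /* no previous context */
		demo_curlwp_tls = l;
		break;
	case DEMO_LWP_CLEAR:
		assert(demo_curlwp_tls == l);	  /* l is the current one */
		demo_curlwp_tls = NULL;
		break;
	}
}

static struct lwp *
demo_curlwp(void)
{
	return demo_curlwp_tls;	/* NULL when no context is set */
}
```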

     void rumpuser_seterrno(int errno)

     Set an errno value in the calling thread's TLS.  Note: this is used only
     if rump kernel clients make rump system calls.

   Mutexes, rwlocks and condition variables
     The locking interfaces have standard semantics, so we will not discuss
     each one in detail.  The data types struct rumpuser_mtx, struct
     rumpuser_rw and struct rumpuser_cv used by these interfaces are opaque to
     the rump kernel, i.e. the hypervisor has complete freedom over them.

     Most of these interfaces will (and must) relinquish the rump kernel CPU
     context in case they block (or intend to block).  The exceptions are the
     "nowrap" variants of the interfaces which must not relinquish rump kernel
     context.

     void rumpuser_mutex_init(struct rumpuser_mtx **mtxp, int flags)

     void rumpuser_mutex_enter(struct rumpuser_mtx *mtx)

     void rumpuser_mutex_enter_nowrap(struct rumpuser_mtx *mtx)

     int rumpuser_mutex_tryenter(struct rumpuser_mtx *mtx)

     void rumpuser_mutex_exit(struct rumpuser_mtx *mtx)

     void rumpuser_mutex_destroy(struct rumpuser_mtx *mtx)

     void rumpuser_mutex_owner(struct rumpuser_mtx *mtx, struct lwp **lp)

     Mutexes provide mutually exclusive locking.  The flags, of which at least
     one must be given, are as follows:

     RUMPUSER_MTX_SPIN     Create a spin mutex.  Locking this type of mutex
                           must not relinquish rump kernel context even when
                           rumpuser_mutex_enter() is used.

     RUMPUSER_MTX_KMUTEX   The mutex must track and be able to return the rump
                           kernel thread that owns the mutex (if any).  If
                           this flag is not specified, rumpuser_mutex_owner()
                           will never be called for that particular mutex.

     void rumpuser_rw_init(struct rumpuser_rw **rwp)

     void rumpuser_rw_enter(int enum_rumprwlock, struct rumpuser_rw *rw)

     int rumpuser_rw_tryenter(int enum_rumprwlock, struct rumpuser_rw *rw)

     int rumpuser_rw_tryupgrade(struct rumpuser_rw *rw)

     void rumpuser_rw_downgrade(struct rumpuser_rw *rw)

     void rumpuser_rw_exit(struct rumpuser_rw *rw)

     void rumpuser_rw_destroy(struct rumpuser_rw *rw)

     void rumpuser_rw_held(int enum_rumprwlock, struct rumpuser_rw *rw,
     int *heldp)

     Read/write locks provide either shared or exclusive locking.  The
     possible values for enum_rumprwlock are RUMPUSER_RW_READER and
     RUMPUSER_RW_WRITER.  Upgrading means trying to migrate from an already
     owned shared lock to an exclusive lock and downgrading means migrating
     from an already owned exclusive lock to a shared lock.

     void rumpuser_cv_init(struct rumpuser_cv **cvp)

     void rumpuser_cv_destroy(struct rumpuser_cv *cv)

     void rumpuser_cv_wait(struct rumpuser_cv *cv, struct rumpuser_mtx *mtx)

     void rumpuser_cv_wait_nowrap(struct rumpuser_cv *cv, struct rumpuser_mtx
     *mtx)

     int rumpuser_cv_timedwait(struct rumpuser_cv *cv,
     struct rumpuser_mtx *mtx, int64_t sec, int64_t nsec)

     void rumpuser_cv_signal(struct rumpuser_cv *cv)

     void rumpuser_cv_broadcast(struct rumpuser_cv *cv)

     void rumpuser_cv_has_waiters(struct rumpuser_cv *cv, int *waitersp)

     Condition variables wait for an event.  The mtx interlock eliminates a
     race between checking the predicate and sleeping on the condition
     variable; the mutex should be released for the duration of the sleep in
     the normal atomic manner.  The timedwait variant takes a specifier
     indicating a relative sleep duration after which the routine will return
     with ETIMEDOUT.  If a timedwait is signaled before the timeout expires,
     the routine will return 0.
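
     Host primitives such as POSIX pthread_cond_timedwait() take an absolute
     deadline, so a hypervisor built on them has to convert the relative
     duration first.  The helper below sketches only that conversion;
     demo_deadline() is a hypothetical name, and the caller is assumed to
     supply the current time from the same clock the condition variable
     uses.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define DEMO_NSEC_PER_SEC	1000000000L

/* Turn "now + (sec, nsec)" into a normalized absolute timespec. */
static struct timespec
demo_deadline(struct timespec now, int64_t sec, int64_t nsec)
{
	struct timespec ts;

	assert(sec >= 0 && nsec >= 0);
	/* Fold whole seconds out of nsec so the sum below stays small. */
	sec += nsec / DEMO_NSEC_PER_SEC;
	nsec %= DEMO_NSEC_PER_SEC;

	ts.tv_sec = now.tv_sec + (time_t)sec;
	ts.tv_nsec = now.tv_nsec + (long)nsec;
	if (ts.tv_nsec >= DEMO_NSEC_PER_SEC) {
		/* Carry at most one second into tv_sec. */
		ts.tv_sec++;
		ts.tv_nsec -= DEMO_NSEC_PER_SEC;
	}
	return ts;
}
```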

     The order in which the hypervisor reacquires the rump kernel context and
     interlock mutex before returning into the rump kernel is as follows.  In
     case the interlock mutex was initialized with both RUMPUSER_MTX_SPIN and
     RUMPUSER_MTX_KMUTEX, the rump kernel context is scheduled before the
     mutex is reacquired.  In case of a purely RUMPUSER_MTX_SPIN mutex, the
     mutex is acquired first.  In all other cases the order is
     implementation-defined.

RETURN VALUES
     All routines which return an integer return an errno value.  The
     hypervisor must translate the value to the native errno namespace used by
     the rump kernel.  Routines which do not return an integer may never fail.
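
     The translation matters because errno numbers differ between hosts; for
     instance EAGAIN and ETIMEDOUT do not have the same numeric values on
     Linux and NetBSD.  A table-driven sketch follows.  The function name is
     hypothetical, only a handful of codes are covered, and the numeric
     targets are assumed to match NetBSD's <sys/errno.h>.

```c
#include <assert.h>
#include <errno.h>

/*
 * Map a host errno to the NetBSD errno number expected by the rump
 * kernel.  Only a few codes are listed; a real table would be complete.
 */
static int
demo_errno_to_rump(int host_errno)
{
	assert(host_errno >= 0);
	switch (host_errno) {
	case 0:		return 0;
	case ENOENT:	return 2;	/* NetBSD ENOENT */
	case EINVAL:	return 22;	/* NetBSD EINVAL */
	case EAGAIN:	return 35;	/* NetBSD EAGAIN */
	case ETIMEDOUT:	return 60;	/* NetBSD ETIMEDOUT */
	case ENOSYS:	return 78;	/* NetBSD ENOSYS */
	default:	return 22;	/* sketch: fall back to EINVAL */
	}
}
```

     Matching on the host's symbolic constants rather than on raw numbers
     keeps the table correct regardless of which host it is compiled on.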

SEE ALSO
     rump(3)

     Antti Kantee, "Flexible Operating System Internals: The Design and
     Implementation of the Anykernel and Rump Kernels", Aalto University
     Doctoral Dissertations, 2012, Section 2.3.2: The Hypercall Interface.

     For a list of all known implementations of the rumpuser interface, see
     https://github.com/rumpkernel/wiki/wiki/Platforms.

HISTORY
     The rump kernel hypercall API was first introduced in NetBSD 5.0.  The
     API described above first appeared in NetBSD 7.0.

NetBSD 10.99                     July 15, 2023                    NetBSD 10.99