Updated: 2021/Apr/14


MEMBAR_OPS(3)              Library Functions Manual              MEMBAR_OPS(3)

NAME
     membar_ops, membar_enter, membar_exit, membar_producer, membar_consumer,
     membar_datadep_consumer, membar_sync - memory ordering barriers

SYNOPSIS
     #include <sys/atomic.h>

     void
     membar_enter(void);

     void
     membar_exit(void);

     void
     membar_producer(void);

     void
     membar_consumer(void);

     void
     membar_datadep_consumer(void);

     void
     membar_sync(void);

DESCRIPTION
     The membar_ops family of functions prevents reordering of memory
     operations, as needed for synchronization in multiprocessor execution
     environments that have relaxed load and store order.

     In general, memory barriers must come in pairs -- a barrier on one CPU,
     such as membar_exit(), must pair with a barrier on another CPU, such as
     membar_enter(), in order to synchronize anything between the two CPUs.
     Code using membar_ops should generally be annotated with comments
     identifying how the barriers are paired.

     membar_ops affect only operations on regular memory, not on device
     memory; see bus_space(9) and bus_dma(9) for machine-independent
     interfaces for handling device memory and DMA operations in device
     drivers.

     Unlike C11, all memory operations -- that is, all loads and stores on
     regular memory -- are affected by membar_ops, not just C11 atomic
     operations on _Atomic-qualified objects.

     membar_enter()
           Any store preceding membar_enter() will happen before all memory
           operations following it.

           An atomic read/modify/write operation (atomic_ops(3)) followed by a
           membar_enter() implies a load-acquire operation in the language of
           C11.

           WARNING: A load followed by membar_enter() does not imply a
           load-acquire operation, even though membar_exit() followed by a
           store implies a store-release operation; the symmetry of these
           names and asymmetry of the semantics is a historical mistake.  In
           the NetBSD kernel, you can use atomic_load_acquire(9) for a
           load-acquire operation without any atomic read/modify/write.

           membar_enter() is typically used in code that implements locking
           primitives to ensure that a lock protects its data, and is
           typically paired with membar_exit(); see below for an example.
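
           As a rough illustration of the locking use described above, the
           following is a minimal sketch in portable C11 rather than with
           membar_ops itself.  The names lock_word, protected_counter, and
           the sketch_* functions are hypothetical, and the C11 acquire
           fence is only an approximate stand-in for membar_enter() after an
           atomic read/modify/write:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock word: 0 = free, 1 = held. */
static atomic_uint lock_word;
static unsigned protected_counter;      /* plain data guarded by the lock */

void
sketch_lock(void)
{
        /* Atomic read/modify/write, retried until we swap 0 -> 1. */
        while (atomic_exchange_explicit(&lock_word, 1,
            memory_order_relaxed) != 0)
                continue;
        /* Stand-in for membar_enter(): r/m/w + fence acts as load-acquire. */
        atomic_thread_fence(memory_order_acquire);
}

void
sketch_unlock(void)
{
        assert(atomic_load_explicit(&lock_word, memory_order_relaxed) == 1);
        /* Store-release, the C11 counterpart of atomic_store_release(9). */
        atomic_store_explicit(&lock_word, 0, memory_order_release);
}

unsigned
sketch_increment(void)
{
        unsigned v;

        sketch_lock();
        v = ++protected_counter;        /* ordered after lock acquisition */
        sketch_unlock();
        return v;
}
```

           Every thread that calls sketch_increment() then observes the
           counter updates of earlier lock holders.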

     membar_exit()
           All memory operations preceding membar_exit() will happen before
           any store that follows it.

           A membar_exit() followed by a store implies a store-release
           operation in the language of C11.  membar_exit() should only be
           used before an atomic read/modify/write operation, such as
           atomic_inc_uint(3).  For regular stores, instead of
           membar_exit(); *p = x, you should use atomic_store_release(p, x).

           membar_exit() is typically paired with membar_enter(), and is
           typically used in code that implements locking or reference
           counting primitives.  Releasing a lock or reference count should
           use membar_exit(), and acquiring a lock or handling an object after
           draining references should use membar_enter(), so that whatever
           happened before releasing will also have happened before acquiring.
           For example:

                   /* thread A -- release a reference */
                   obj->state.mumblefrotz = 42;
                   KASSERT(valid(&obj->state));
                   membar_exit();
                   atomic_dec_uint(&obj->refcnt);

                   /*
                    * thread B -- busy-wait until last reference is released,
                    * then lock it by setting refcnt to UINT_MAX
                    */
                   while (atomic_cas_uint(&obj->refcnt, 0, UINT_MAX) != 0)
                           continue;
                   membar_enter();
                   KASSERT(valid(&obj->state));
                   obj->state.mumblefrotz--;

           In this example, if the load in atomic_cas_uint() in thread B
           witnesses the store in atomic_dec_uint() in thread A setting the
           reference count to zero, then everything in thread A before the
           membar_exit() is guaranteed to happen before everything in thread B
           after the membar_enter(), as if the machine had sequentially
           executed:

                   obj->state.mumblefrotz = 42;    /* from thread A */
                   KASSERT(valid(&obj->state));
                   ...
                   KASSERT(valid(&obj->state));    /* from thread B */
                   obj->state.mumblefrotz--;
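
           For readers who want to experiment outside the kernel, the same
           release/claim pattern might be sketched in portable C11 as
           follows.  This is an analogue under stated assumptions, not the
           NetBSD implementation: struct sketch_obj and the sketch_*
           functions are hypothetical, and C11 release and acquire fences
           stand in for membar_exit() and membar_enter():

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

struct sketch_obj {
        atomic_uint     refcnt;
        int             mumblefrotz;    /* plain, non-atomic state */
};

/* Thread A's role: finish with the object, then drop a reference. */
void
sketch_release(struct sketch_obj *obj)
{
        assert(atomic_load_explicit(&obj->refcnt, memory_order_relaxed) != 0);
        obj->mumblefrotz = 42;
        /* Stand-in for membar_exit() before the atomic r/m/w. */
        atomic_thread_fence(memory_order_release);
        atomic_fetch_sub_explicit(&obj->refcnt, 1, memory_order_relaxed);
}

/* Thread B's role: one attempt to claim a drained object; 1 on success. */
int
sketch_try_claim(struct sketch_obj *obj)
{
        unsigned expected = 0;

        if (!atomic_compare_exchange_strong_explicit(&obj->refcnt, &expected,
            UINT_MAX, memory_order_relaxed, memory_order_relaxed))
                return 0;       /* references remain; caller retries */
        /* Stand-in for membar_enter() after the atomic r/m/w. */
        atomic_thread_fence(memory_order_acquire);
        obj->mumblefrotz--;
        return 1;
}
```

           A caller in thread B's position would loop on sketch_try_claim()
           until it returns 1.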

           membar_exit() followed by a store, serving as a store-release
           operation, may also be paired with a subsequent load followed by
           membar_sync(), serving as the corresponding load-acquire operation.
           However, you should use atomic_store_release(9) and
           atomic_load_acquire(9) instead in that situation, unless the store
           is an atomic read/modify/write which requires a separate
           membar_exit().
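
           The store-release/load-acquire pairing recommended above can be
           sketched with the C11 operations that correspond to
           atomic_store_release(9) and atomic_load_acquire(9).  The payload
           and ready variables and the sketch_* functions are hypothetical
           names chosen for this illustration:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;             /* plain data, written before publishing */
static atomic_bool ready;       /* hypothetical "payload is valid" flag */

/* Writer: fill in the payload, then publish it exactly once. */
void
sketch_publish(int value)
{
        assert(!atomic_load_explicit(&ready, memory_order_relaxed));
        payload = value;
        /* Store-release: the payload store is ordered before the flag. */
        atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader: returns true and fills *out only once the payload is visible. */
bool
sketch_try_read(int *out)
{
        /* Load-acquire: if we see the flag, we also see the payload. */
        if (!atomic_load_explicit(&ready, memory_order_acquire))
                return false;
        *out = payload;
        return true;
}
```

           The reader never touches payload unless it has first observed
           the flag, which is what makes the plain access safe.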

     membar_producer()
           All stores preceding membar_producer() will happen before any
           stores following it.

           membar_producer() has no analogue in C11.

           membar_producer() is typically used in code that produces data for
           read-only consumers which use membar_consumer(), such as
           `seqlocked' snapshots of statistics; see below for an example.

     membar_consumer()
           All loads preceding membar_consumer() will complete before any
           loads after it.

           membar_consumer() has no analogue in C11.

           membar_consumer() is typically used in code that reads data from
           producers which use membar_producer(), such as `seqlocked'
           snapshots of statistics.  For example:

           struct {
                   /* version number and in-progress bit */
                   unsigned        seq;

                   /* read-only statistics, too large for atomic load */
                   unsigned        foo;
                   int             bar;
                   uint64_t        baz;
           } *stats;

                   /* producer (must be serialized, e.g. with mutex(9)) */
                   stats->seq |= 1;        /* mark update in progress */
                   membar_producer();
                   stats->foo = count_foo();
                   stats->bar = measure_bar();
                   stats->baz = enumerate_baz();
                   membar_producer();
                   stats->seq++;           /* bump version number */

                   /* consumer (in parallel w/ producer, other consumers) */
           restart:
                   while ((seq = stats->seq) & 1)  /* wait for update */
                           SPINLOCK_BACKOFF_HOOK;
                   membar_consumer();
                   foo = stats->foo;       /* read out a candidate snapshot */
                   bar = stats->bar;
                   baz = stats->baz;
                   membar_consumer();
                   if (seq != stats->seq)  /* try again if version changed */
                           goto restart;
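
           A seqlock of this shape can also be approximated in portable C11,
           although, since membar_producer() and membar_consumer() have no
           exact C11 analogue, the sketch below substitutes the stronger
           release and acquire orderings and makes the data fields relaxed
           atomics to avoid data races.  struct sketch_stats and the
           sketch_* functions are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_stats {
        atomic_uint     seq;    /* odd while an update is in progress */
        atomic_uint     foo;    /* data fields use relaxed atomics */
        atomic_int      bar;
};

/* Producer; callers must serialize updates among themselves. */
void
sketch_update(struct sketch_stats *s, unsigned foo, int bar)
{
        unsigned seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

        assert((seq & 1) == 0);         /* no update already in progress */
        atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
        /* Stand-in for the first membar_producer(). */
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&s->foo, foo, memory_order_relaxed);
        atomic_store_explicit(&s->bar, bar, memory_order_relaxed);
        /* Store-release stands in for the second membar_producer(). */
        atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Consumer; returns false if the caller should retry. */
bool
sketch_snapshot(struct sketch_stats *s, unsigned *foo, int *bar)
{
        /* Load-acquire stands in for the first membar_consumer(). */
        unsigned seq = atomic_load_explicit(&s->seq, memory_order_acquire);

        if (seq & 1)
                return false;           /* update in progress */
        *foo = atomic_load_explicit(&s->foo, memory_order_relaxed);
        *bar = atomic_load_explicit(&s->bar, memory_order_relaxed);
        /* Stand-in for the second membar_consumer(). */
        atomic_thread_fence(memory_order_acquire);
        return seq == atomic_load_explicit(&s->seq, memory_order_relaxed);
}
```

           A consumer would call sketch_snapshot() in a loop until it
           returns true, discarding any candidate snapshot from a failed
           attempt.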

     membar_datadep_consumer()
           Same as membar_consumer(), but limited to loads of addresses
           dependent on prior loads, or `data-dependent' loads:

                 int **pp, *p, v;

                 p = *pp;
                 membar_datadep_consumer();
                 v = *p;
                 consume(v);

           membar_datadep_consumer() is typically paired with membar_exit() by
           code that initializes an object before publishing it.  However, you
           should use atomic_store_release(9) and atomic_load_consume(9)
           instead, to avoid obscure edge cases in case the consumer is not
           read-only.
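
           The recommended store-release/load-consume pairing can be
           illustrated with the corresponding C11 memory orders.  This is a
           sketch only: struct sketch_node, published, and the sketch_*
           functions are hypothetical, and many compilers currently treat
           memory_order_consume as memory_order_acquire:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_node {
        int     value;
};

static struct sketch_node *_Atomic published;   /* NULL until published */

/* Initialize the object fully, then publish a pointer to it. */
void
sketch_publish_node(struct sketch_node *n, int value)
{
        assert(n != NULL);
        n->value = value;
        atomic_store_explicit(&published, n, memory_order_release);
}

/* Read through the published pointer, or return fallback if none yet. */
int
sketch_consume_node(int fallback)
{
        /* The load of n->value is data-dependent on this load. */
        struct sketch_node *n =
            atomic_load_explicit(&published, memory_order_consume);

        return n != NULL ? n->value : fallback;
}
```

           The dereference of n depends on the value loaded from published,
           which is the data-dependent case this barrier is meant for.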

           membar_datadep_consumer() does not guarantee ordering of loads in
           branches, or `control-dependent' loads -- you must use
           membar_consumer() instead:

                 int *ok, *p, v;

                 if (*ok) {
                         membar_consumer();
                         v = *p;
                         consume(v);
                 }

           Most CPUs do not reorder data-dependent loads (i.e., most CPUs
           guarantee that cached values are not stale in that case), so
           membar_datadep_consumer() is a no-op on those CPUs.

     membar_sync()
           All memory operations preceding membar_sync() will happen before
           any memory operations following it.

           membar_sync() is a sequentially consistent acquire/release
           barrier, analogous to atomic_thread_fence(memory_order_seq_cst)
           in C11.

           membar_sync() is typically paired with membar_sync().

           A load followed by membar_sync(), serving as a load-acquire
           operation, may also be paired with a prior membar_exit() followed
           by a store, serving as the corresponding store-release operation.
           However, you should use atomic_load_acquire(9) instead of
           load-then-membar_sync() if it is a regular load, or membar_enter()
           instead of membar_sync() if the load is in an atomic
           read/modify/write operation.
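
           One case where membar_sync() pairs with membar_sync() is when
           each side must order a store before a later load, which none of
           the weaker barriers guarantee.  A minimal sketch using the C11
           fence named above follows; the want array and
           sketch_announce_and_check() are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool want[2];     /* hypothetical per-thread intent flags */

/* Announce intent as thread `self', then report whether the peer has too. */
bool
sketch_announce_and_check(int self)
{
        assert(self == 0 || self == 1);
        atomic_store_explicit(&want[self], true, memory_order_relaxed);
        /* Stand-in for membar_sync(): orders the store before the load. */
        atomic_thread_fence(memory_order_seq_cst);
        return atomic_load_explicit(&want[1 - self], memory_order_relaxed);
}
```

           If two threads call this concurrently with self set to 0 and 1,
           the paired fences ensure that at least one of them sees the
           other's flag set.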

SEE ALSO
     atomic_ops(3), atomic_loadstore(9)

HISTORY
     The membar_ops functions first appeared in NetBSD 5.0.  The data-
     dependent load barrier, membar_datadep_consumer(), first appeared in
     NetBSD 7.0.

NetBSD 9.99                    September 2, 2020                   NetBSD 9.99