Updated: 2022/Sep/29

NAMEI(9)                   Kernel Developer's Manual                  NAMEI(9)

NAME
     namei, NDINIT, NDAT, namei_simple_kernel, namei_simple_user, relookup,
     lookup_for_nfsd, lookup_for_nfsd_index - pathname lookup

SYNOPSIS
     #include <sys/namei.h>
     #include <sys/uio.h>
     #include <sys/vnode.h>

     NDINIT(struct nameidata *ndp, u_long op, u_long flags,
         struct pathbuf *pathbuf);

     NDAT(struct nameidata *ndp, struct vnode *dvp);

     int
     namei(struct nameidata *ndp);

     int
     namei_simple_kernel(const char *path, namei_simple_flags_t sflags,
         struct vnode **ret);

     int
     namei_simple_user(const char *path, namei_simple_flags_t sflags,
         struct vnode **ret);

     int
     relookup(struct vnode *dvp, struct vnode **vpp,
         struct componentname *cnp, int dummy);

     int
     lookup_for_nfsd(struct nameidata *ndp, struct vnode *startdir,
         int neverfollow);

     int
     lookup_for_nfsd_index(struct nameidata *ndp, struct vnode *startdir);

DESCRIPTION
     The namei interface is used to convert pathnames to file system vnodes.
     The name of the interface is actually a contraction of the words name and
     inode for name-to-inode conversion, in the days before the vfs(9)
     interface was implemented.

     All access to the namei interface must be in process context.  Pathname
     lookups cannot be done in interrupt context.

     In the general form of namei, a caller must:
     1.   Allocate storage for a struct nameidata object nd.
     2.   Initialize nd with NDINIT() and optionally NDAT() to specify the
          arguments to a lookup.
     3.   Call namei() and handle failure if it returns a nonzero error code.
     4.   Read the resulting vnode out of nd.ni_vp.  If requested with
          LOCKPARENT, read the directory vnode out of nd.ni_dvp.
     5.   For directory operations, use the struct componentname object stored
          at nd.ni_cnd.

     The other fields of struct nameidata should not be examined or altered
     directly.
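
     The steps above can be sketched as follows.  This is an illustrative
     kernel-context fragment, not a standalone program.  The function name
     example_lookup() is hypothetical, and the use of pathbuf_create() and
     pathbuf_destroy() (see pathbuf(9)) reflects one common pattern rather
     than the only correct one.

```c
#include <sys/param.h>
#include <sys/errno.h>
#include <sys/namei.h>
#include <sys/vnode.h>

/*
 * Hypothetical helper: look up a kernel-space path, following a final
 * symbolic link, and hand back the vnode locked and referenced.
 */
static int
example_lookup(const char *kpath, struct vnode **vpp)
{
	struct pathbuf *pb;
	struct nameidata nd;		/* step 1: storage */
	int error;

	pb = pathbuf_create(kpath);
	if (pb == NULL)
		return ENOMEM;

	/* Step 2: mode and flags; NDAT() is not needed here. */
	NDINIT(&nd, LOOKUP, FOLLOW | LOCKLEAF, pb);

	/* Step 3: perform the lookup. */
	error = namei(&nd);
	if (error == 0) {
		/* Step 4: the result, locked because of LOCKLEAF. */
		*vpp = nd.ni_vp;
	}

	/* The caller of namei owns the pathbuf and destroys it. */
	pathbuf_destroy(pb);
	return error;
}
```

     A caller of this sketch would eventually release the vnode, for example
     with vput(), which unlocks it and drops the reference; see vnode(9).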

     Note that the nfs(4) code misuses struct nameidata and currently has an
     incestuous relationship with the namei code.  This is gradually being
     cleaned up.

     The struct componentname type has the following layout:

     struct componentname {
             /*
              * Arguments to VOP_LOOKUP and directory VOP routines.
              */
             uint32_t        cn_nameiop;     /* namei operation */
             uint32_t        cn_flags;       /* flags to namei */
             kauth_cred_t    cn_cred;        /* credentials */
             const char      *cn_nameptr;    /* pointer to looked up name */
             size_t          cn_namelen;     /* length of looked up comp */
             /*
              * Side result from VOP_LOOKUP.
              */
             size_t          cn_consume;     /* chars to consume in lookup */
     };

     This structure contains the information about a single directory
     component name, along with certain other information required by vnode
     operations.  See vnodeops(9) for more information about these vnode
     operations.

     The members:
           cn_nameiop    The type of operation in progress; indicates the
                         basic operating mode of namei.  May be one of LOOKUP,
                         CREATE, DELETE, or RENAME.  These modes are described
                         below.
           cn_flags      Additional flags affecting the operation of namei.
                         These are described below as well.
           cn_cred       The credentials to use for the lookup or other
                         operation the componentname is passed to.  This may
                         match the credentials of the current process or it
                         may not, depending on where the original operation
                         request came from and how it has been routed.
           cn_nameptr    The name of this directory component, followed by the
                         rest of the path being looked up.
           cn_namelen    The length of the name of this directory component.
                         The name is not in general null terminated, although
                         the complete string (the full remaining path) always
                         is.
           cn_consume    This field starts at zero; it may be set to a larger
                         value by implementations of VOP_LOOKUP(9) to indicate
                         how many more characters beyond cn_namelen are being
                         consumed.  New uses of this feature are discouraged
                         and should be discussed.
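
     Because cn_nameptr is not terminated at the end of the component, any
     comparison must be bounded by cn_namelen.  The following is a minimal
     userland sketch of testing a component against a fixed name.  The type
     struct ex_cn and the function ex_cn_equals() are hypothetical stand-ins
     holding only the two relevant members; they are not part of the namei
     interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two name members of struct componentname. */
struct ex_cn {
	const char *cn_nameptr;	/* component followed by the rest of the path */
	size_t cn_namelen;	/* length of the component only */
};

/* Return true iff the current component is exactly `name'. */
bool
ex_cn_equals(const struct ex_cn *cnp, const char *name)
{
	assert(cnp != NULL && name != NULL);
	return strlen(name) == cnp->cn_namelen &&
	    memcmp(cnp->cn_nameptr, name, cnp->cn_namelen) == 0;
}
```

     With cn_nameptr pointing at "usr/share/man" and cn_namelen equal to 3,
     ex_cn_equals() matches "usr" but neither "us" nor the full string,
     whereas strcmp() on cn_nameptr would compare against the whole remaining
     path.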

   Operating modes
     Each lookup happens in one of the following modes, specified by callers
     of namei with NDINIT() and specified internally by namei to
     VOP_LOOKUP(9):
     -   Callers of namei specify the mode for the last component of a lookup.
     -   Internally, namei recursively calls VOP_LOOKUP(9) in LOOKUP mode for
         each directory component, and then finally calls VOP_LOOKUP(9) in the
         caller-specified mode for the last component.
     Each mode can fail in different ways -- for example, LOOKUP mode fails
     with ENOENT if no entry exists, but CREATE mode succeeds with a NULL
     vnode.

     LOOKUP  Yield the vnode for an existing entry.  Callers specify LOOKUP
             for operations on existing vnodes: stat(2), open(2) without
             O_CREAT, etc.

             File systems:
             -   MUST refuse if user lacks lookup permission for directory.
             -   SHOULD use namecache(9) to cache lookup results.

             [ENOENT]      No entry exists.

     CREATE  Yield the vnode for an existing entry; or, if there is none,
             yield NULL and hint that it will soon be created.  Callers
             specify CREATE for operations that may create directory entries:
             mkdir(2), open(2) with O_CREAT, etc.

             File systems:
             -   MUST refuse if user lacks lookup permission for directory.
             -   MUST refuse if no entry exists and user lacks write
                 permission for directory.
             -   MUST refuse if no entry exists and file system is read-only.
             -   SHOULD NOT use namecache(9) to cache negative lookup results.
             -   SHOULD save lookup hints internally in the directory for a
                 subsequent operation to create a directory entry.

             [EPERM]       The user lacks lookup permission for the directory.
             [EPERM]       No entry exists and the user lacks write permission
                           for the directory.
             [EROFS]       No entry exists and the file system is read-only.

     DELETE  Yield the vnode of an existing entry, and hint that it will soon
             be deleted.  Callers specify DELETE for operations that delete
             directory entries: unlink(2), rmdir(2), etc.

             File systems:
             -   MUST refuse if user lacks lookup permission for directory.
             -   MUST refuse if entry exists and user lacks write permission
                 for directory.
             -   MUST refuse if entry exists and file system is read-only.
             -   SHOULD NOT use namecache(9) to cache lookup results.
             -   SHOULD save lookup hints internally in the directory for a
                 subsequent operation to delete a directory entry.

             [ENOENT]      No entry exists.
             [EPERM]       The user lacks lookup permission for the directory.
             [EPERM]       An entry exists and the user lacks write permission
                           for the directory.
             [EROFS]       An entry exists and the file system is read-only.

     RENAME  Yield the vnode of an existing entry, and hint that it will soon
             be overwritten; or, if there is none, yield NULL, and hint that
             it will soon be created.

             Callers specify RENAME for an entry that is about to be created
             or overwritten, namely for the target of rename(2).

             File systems:
             -   MUST refuse if user lacks lookup permission for directory.
             -   MUST refuse if user lacks write permission for directory.
             -   MUST refuse if file system is read-only.
             -   SHOULD NOT use namecache(9) to cache lookup results.
             -   SHOULD save lookup hints internally in the directory for a
                 subsequent operation to create or overwrite a directory
                 entry.

             [EPERM]       The user lacks lookup permission for the directory.
             [EPERM]       The user lacks write permission for the directory.
             [EROFS]       The file system is read-only.

     If a caller decides not to perform an operation it hinted at by a
     destructive operating mode (CREATE, DELETE, or RENAME), it SHOULD call
     VOP_ABORTOP(9) to release the hints.  If a file system fails to perform
     such an operation, it SHOULD call VOP_ABORTOP(9) to release the hints.
     However, the current code is inconsistent about this, and every
     implementation of VOP_ABORTOP(9) does nothing.

   Flags
     The following flags may be specified by callers of namei, and MUST NOT be
     used by file systems:

     FOLLOW        Follow symbolic links in the last path component.  Used by
                   operations that do not address symbolic links directly,
                   such as stat(2).  (Does not affect symbolic links found in
                   the middle of a path.)

     NOFOLLOW      Do not follow symbolic links in the last path component.
                   Used by operations that address symbolic links directly,
                   such as lstat(2).

                   Note: The value of NOFOLLOW is 0.  We define the constant
                   to let callers say either FOLLOW or NOFOLLOW explicitly.

     LOCKLEAF      On successful lookup, lock the vnode, if any, in
                   ndp->ni_vp.  Without this flag, it would be unlocked.

     LOCKPARENT    On successful lookup, lock and return the directory vnode
                   in ndp->ni_dvp.  Without this flag, it is not returned at
                   all.

     TRYEMULROOT   If set, the path is looked up in the emulation root of the
                   current process first.  If that fails, the system root is
                   used.

     EMULROOTSET   Indicates that the caller has set ndp->ni_erootdir prior to
                   calling namei.  This is only useful or permitted when the
                   emulation in the current process is partway through being
                   set up.

     NOCHROOT      Bypass normal chroot(8) handling for absolute paths.

     NOCROSSMOUNT  Do not cross mount points.

     RDONLY        Enforce read-only behavior.

     CREATEDIR     Accept slashes after a component name that does not exist.
                   This only makes sense in CREATE mode and when creating a
                   directory.

     NOCACHE       Do not cache the lookup result for the last component name.
                   This is used only with the RENAME mode for the target; the
                   cache entry would be invalidated immediately.
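
     One consequence of NOFOLLOW being 0 is that it cannot be detected with a
     bitwise test; code must test for the absence of FOLLOW instead.  The
     following userland sketch illustrates this.  The EX_ constants are
     stand-ins whose values are chosen for illustration only, and
     ex_follows_final_symlink() is a hypothetical helper; real code uses the
     definitions from <sys/namei.h>.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Stand-in flag values for illustration only.  What matters here is
 * that the NOFOLLOW stand-in is 0, as the text above states.
 */
#define EX_NOFOLLOW	0x00UL
#define EX_LOCKLEAF	0x04UL
#define EX_FOLLOW	0x40UL

/* Return true iff a symbolic link in the last component is followed. */
bool
ex_follows_final_symlink(unsigned long flags)
{
	/* Only the stand-in flags above are modelled in this sketch. */
	assert((flags & ~(EX_FOLLOW | EX_LOCKLEAF)) == 0);

	/* Testing (flags & EX_NOFOLLOW) would always yield false. */
	return (flags & EX_FOLLOW) != 0;
}
```

     Writing EX_NOFOLLOW | EX_LOCKLEAF yields the same value as EX_LOCKLEAF
     alone; the explicit constant documents intent and has no other effect.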

     The following flag may be set by a caller of namei and tested by a file
     system in VOP_LOOKUP(9) or other subsequent directory operations:

     DOWHITEOUT    Allow whiteouts to be seen as objects instead of
                   functioning as "nothing there".

     The following flags are set by namei for calling VOP_LOOKUP(9):

     ISDOTDOT      The current pathname component is "..".  May be tested by
                   subsequent directory operations too.

     ISLASTCN      The current pathname component is the last component found
                   in the pathname.  Guaranteed to remain set in subsequent
                   directory operations.

     REQUIREDIR    The current object to be looked up must be a directory.
                   May not be used by subsequent directory operations.

     MAKEENTRY     The lookup result for the current pathname component should
                   be added to the namecache(9).  May be used to make
                   additional caching decisions, e.g. to store an mtime for
                   determining whether our cache for a remote vnode is stale.
                   May not be used by subsequent directory operations.

     A file system may set the following flag on return from VOP_LOOKUP(9) for
     use by namei, namecache(9), and subsequent directory operations:

     ISWHITEOUT    The object at the current pathname component is a whiteout.

     The following additional historic flags have been removed from NetBSD and
     should be handled as follows if porting code from elsewhere:

     INRENAME      Part of a misbegotten and incorrect locking scheme.  Any
                   file-system-level code using this is presumptively
                   incorrect.  File systems should use the genfs_rename(9)
                   interface to handle locking in VOP_RENAME().

     INRELOOKUP    Used at one point for signaling to puffs(3) to work around
                   a protocol deficiency that was later rectified.

     ISSYMLINK     Useless internal state.

     SAVESTART     Unclean setting affecting vnode reference counting.  Now
                   effectively never in effect.  Any code referring to this is
                   suspect.

     SAVENAME      Unclean setting relating to responsibility for freeing
                   pathname buffers in the days before the pathbuf structure.
                   Now effectively always in effect; the caller of namei owns
                   the pathbuf structure and is always responsible for
                   destroying it.

     HASBUF        Related to SAVENAME.  Any uses can be replaced with "true".

FUNCTIONS
     NDINIT(ndp, op, flags, pathbuf)
           Initialise a nameidata structure pointed to by ndp for use by the
           namei interface.  The operating mode and flags (as documented
           above) are specified by op and flags respectively.  The pathname is
           passed as a pathbuf structure, which should be initialized using
           one of the pathbuf(9) operations.  Destroying the pathbuf is the
           responsibility of the caller; this must not be done until the
           caller is finished with all of the namei results and all of the
           nameidata contents except for the result vnode.

           NDINIT() stores the credentials of the calling thread (curlwp) in
           ndp, obtaining them with kauth_cred_get(9).  In the rare case that
           another set of credentials is required for the namei operation,
           ndp->ni_cnd.cn_cred must be set manually after NDINIT().

     NDAT(ndp, dvp)
           This macro is used after NDINIT() to set the starting directory.
           This supersedes the current process's current working directory as
           the initial point of departure for looking up relative paths.  This
           mechanism is used by openat(2) and related calls.

     namei(ndp)
           Convert a pathname into a pointer to a vnode.  The nameidata
           structure pointed to by ndp should be initialized with the NDINIT()
           macro, and perhaps also the NDAT() macro.  Direct initialization of
           members of struct nameidata is not supported and may (will) break
           silently in the future.

           The vnode for the pathname is returned in ndp->ni_vp.  The parent
           directory is returned locked in ndp->ni_dvp iff LOCKPARENT is
           specified.

           Any or all of the flags documented above as set by the caller can
           be enabled by passing them (OR'd together) as the flags argument of
           NDINIT().  As discussed above, every such call should explicitly
           contain either FOLLOW or NOFOLLOW to control the behavior regarding
           final symbolic links.

     namei_simple_kernel(path, sflags, ret)
           Look up the path path and translate it to a vnode, returned in ret.
           The path argument must be a kernel (UIO_SYSSPACE) pointer.  The
           sflags argument chooses the precise behavior.  It may be set to one
           of the following symbols:
                 NSM_NOFOLLOW_NOEMULROOT
                 NSM_NOFOLLOW_TRYEMULROOT
                 NSM_FOLLOW_NOEMULROOT
                 NSM_FOLLOW_TRYEMULROOT
           These select (or not) the FOLLOW/NOFOLLOW and TRYEMULROOT flags.
           Other flags are not available through this interface, which is
           nonetheless sufficient for more than half the namei() usage in the
           kernel.  Note that the encoding of sflags has deliberately been
           arranged to be type-incompatible with anything else.  This prevents
           various possible accidents while the namei() interface is being
           rototilled.

     namei_simple_user(path, sflags, ret)
           This function is the same as namei_simple_kernel() except that the
           path argument shall be a user pointer (UIO_USERSPACE) rather than a
           kernel pointer.
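
           A simple lookup through this interface might look like the
           following kernel-context fragment, which is a sketch and not a
           standalone program.  The helper name example_is_directory() is
           hypothetical.  The sketch assumes that the vnode is returned
           referenced but unlocked, so that vrele() is the matching release;
           this is not stated above and should be checked against the
           implementation.

```c
#include <sys/param.h>
#include <sys/namei.h>
#include <sys/vnode.h>

/* Hypothetical helper: does a kernel-supplied path name a directory? */
static int
example_is_directory(const char *kpath, bool *isdirp)
{
	struct vnode *vp;
	int error;

	/* Follow a final symlink; do not consult the emulation root. */
	error = namei_simple_kernel(kpath, NSM_FOLLOW_NOEMULROOT, &vp);
	if (error)
		return error;

	*isdirp = (vp->v_type == VDIR);
	vrele(vp);	/* assumed: vp came back referenced and unlocked */
	return 0;
}
```

           For a path supplied by a user process, namei_simple_user() would
           be called in the same way with the user pointer.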

     relookup(dvp, vpp, cnp, dummy)
           Reacquire a path name component in a directory.  This is a quicker
           way to look up a pathname component when the parent directory is
           known.  The locked parent directory vnode is specified by dvp and
           the pathname component by cnp.  The vnode of the pathname is
           returned in the address specified by vpp.  The dummy argument is
           unused.  Note that one may only use relookup() to repeat a lookup
           of a final path component previously done by namei, and one must
           use the same componentname structure that call produced.  Otherwise
           the behavior is undefined and likely adverse.

     lookup_for_nfsd(ndp, startdir, neverfollow)
           This is a private entry point into namei used by the NFS server
           code.  It looks up a path starting from startdir.  If neverfollow
           is set, any symbolic link (not just at the end of the path) will
           cause an error.  Otherwise, it follows symlinks normally.  It
           should not be used by new code.

     lookup_for_nfsd_index(ndp, startdir)
           This is a (second) private entry point into namei used by the NFS
           server code.  It looks up a single path component starting from
           startdir.  It should not be used by new code.

INTERNALS
     The nameidata structure has the following layout:

     struct nameidata {
             /*
              * Arguments to namei.
              */
             struct vnode *ni_atdir;         /* startup dir, cwd if null */
             struct pathbuf *ni_pathbuf;     /* pathname container */
             char *ni_pnbuf;                 /* extra pathname buffer ref (XXX) */
             /*
              * Internal starting state. (But see notes.)
              */
             struct  vnode *ni_rootdir;      /* logical root directory */
             struct  vnode *ni_erootdir;     /* emulation root directory */
             /*
              * Results from namei.
              */
             struct  vnode *ni_vp;           /* vnode of result */
             struct  vnode *ni_dvp;          /* vnode of intermediate directory */
             /*
              * Internal current state.
              */
             size_t          ni_pathlen;     /* remaining chars in path */
             const char      *ni_next;       /* next location in pathname */
             unsigned int    ni_loopcnt;     /* count of symlinks encountered */
             /*
              * Lookup parameters: this structure describes the subset of
              * information from the nameidata structure that is passed
              * through the VOP interface.
              */
             struct componentname ni_cnd;
     };

     These fields are:
           ni_atdir      The directory to use for the starting point of
                         relative paths.  If null, the current process's
                         current directory is used.  This is initialized to
                         NULL by NDINIT() and set by NDAT().
           ni_pathbuf    The abstract path buffer in use, passed as an
                         argument to NDINIT().  The name pointers that appear
                         elsewhere, such as in the componentname structure,
                         point into this buffer.  It is owned by the caller
                         and must not be destroyed until all namei operations
                         are complete.  See pathbuf(9).
           ni_pnbuf      This is the name pointer used during namei.  It
                         points into ni_pathbuf.  It is not initialized until
                         entry into namei.
           ni_rootdir    The root directory to use as the starting point for
                         absolute paths.  This is retrieved from the current
                         process's current root directory when namei starts
                         up.  It is not initialized by NDINIT().
           ni_erootdir   The root directory to use as the emulation root, for
                         processes running in emulation.  This is retrieved
                         from the current process's emulation root directory
                         when namei starts up and not initialized by NDINIT().
                         As described elsewhere, it may be set by the caller
                         if the EMULROOTSET flag is used, but this should only
                         be done when the current process's emulation root
                         directory is not yet initialized.  (And ideally in
                         the future things would be tidied so that this is not
                         necessary.)
           ni_vp
           ni_dvp        Returned vnodes, as described above.  These only
                         contain valid values if namei returns successfully.
           ni_pathlen    The length of the full current remaining path string
                         in ni_pnbuf.  This is not initialized by NDINIT() and
                         is used only internally.
           ni_next       The remaining part of the path, after the current
                         component found in the componentname structure.  This
                         is not initialized by NDINIT() and is used only
                         internally.
           ni_loopcnt    The number of symbolic links encountered (and
                         traversed) so far.  If this exceeds a limit, namei
                         fails with ELOOP.  This is not initialized by
                         NDINIT() and is used only internally.
           ni_cnd        The componentname structure holding the current
                         directory component, and also the mode, flags, and
                         credentials.  The mode, flags, and credentials are
                         initialized by NDINIT(); the rest is not initialized
                         until namei runs.

     There is also a namei_state structure that is hidden within vfs_lookup.c.
     This contains the following additional state:
           docache        A flag indicating whether to cache the last pathname
                          component.
           rdonly         The read-only state, initialized from the RDONLY
                          flag.
           slashes        The number of trailing slashes found after the
                          current pathname component.
           attempt_retry  Set on some error cases (and not others) to indicate
                          that a failure in the emulation root should be
                          followed by a retry in the real system root.
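
     The retry governed by attempt_retry can be modelled in userland as
     follows.  This is a sketch only: the callback type ex_lookup_fn, the
     demo callback, and ex_lookup_tryemulroot() are all hypothetical, and
     retrying on ENOENT alone is an assumption of the sketch, since the text
     above says only that some error cases trigger a retry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: look up `path' under `root'; 0 or an errno. */
typedef int (*ex_lookup_fn)(const char *root, const char *path);

/* Demo callback: pretend every path exists only under the root "/". */
int
ex_only_in_sysroot(const char *root, const char *path)
{
	(void)path;
	return strcmp(root, "/") == 0 ? 0 : ENOENT;
}

/*
 * Try the emulation root first, if there is one, then fall back to the
 * system root.  Retrying only on ENOENT is an assumption of this sketch.
 */
int
ex_lookup_tryemulroot(ex_lookup_fn lookup, const char *emulroot,
    const char *sysroot, const char *path)
{
	int error;

	assert(lookup != NULL && sysroot != NULL && path != NULL);
	if (emulroot != NULL) {
		error = lookup(emulroot, path);
		if (error != ENOENT)
			return error;	/* success, or no retry */
	}
	return lookup(sysroot, path);
}
```

     Passing a null emulroot models a lookup without TRYEMULROOT: the system
     root is consulted exactly once.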

     The state in namei_state is genuinely private to namei.  Note that much
     of the state in nameidata should also be private, but is currently not
     because it is misused in some fashion by outside code, usually nfs(4).

     The control flow within the namei portions of vfs_lookup.c is as follows.

     namei()              does a complete path lookup by calling namei_init(),
                          namei_tryemulroot(), and namei_cleanup().

     namei_init()         sets up the basic internal state and makes some
                          (precondition-type) assertions.

     namei_cleanup()      makes some postcondition-type assertions; it
                          currently does nothing besides this.

     namei_tryemulroot()  handles TRYEMULROOT by calling namei_oneroot() once
                          or twice as needed, and attends to making sure the
                          original pathname is preserved for the second try.

     namei_oneroot()      does a complete path search from a single root
                          directory.  It begins with namei_start(), then calls
                          lookup_once() (and if necessary, namei_follow())
                          repeatedly until done.  It also handles returning
                          the result vnode(s) in the requested state.

     namei_start()        sets up the initial state and locking; it calls
                          namei_getstartdir().

     namei_getstartdir()  initializes the root directory state (both
                          ni_rootdir and ni_erootdir) and picks the starting
                          directory, consuming the leading slashes of an
                          absolute path and handling the magic "/../" string
                          for bypassing the emulation root.  A different
                          version namei_getstartdir_for_nfsd() is used for
                          lookups coming from nfsd(8) as those are required to
                          have different semantics.

     lookup_once()        calls VOP_LOOKUP() for one path component, also
                          handling any needed crossing of mount points (either
                          up or down) and coping with locking requirements.

     lookup_parsepath()   is called prior to each lookup_once() call to
                          examine the pathname and find where the next
                          component starts.

     namei_follow()       reads the contents of a symbolic link and updates
                          both the path buffer and the search directory
                          accordingly.
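
     The per-component loop described above can be modelled in userland as
     follows.  This is a simplified sketch, not the code in vfs_lookup.c: it
     ignores symbolic links, mount points, and locking, and the names
     ex_parse_component() and ex_count_components() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Model of one parsing step: given the start of a component, report its
 * length and return a pointer to the start of the next component, with
 * any run of separating slashes skipped.
 */
const char *
ex_parse_component(const char *p, size_t *namelenp)
{
	assert(p != NULL && namelenp != NULL);
	*namelenp = strcspn(p, "/");	/* component ends at '/' or NUL */
	p += *namelenp;
	while (*p == '/')		/* duplicate and trailing slashes */
		p++;
	return p;
}

/* Count the components of a path, one loop iteration per component. */
size_t
ex_count_components(const char *path)
{
	size_t n = 0, namelen;

	assert(path != NULL);
	while (*path == '/')		/* leading slashes of absolute path */
		path++;
	while (*path != '\0') {
		path = ex_parse_component(path, &namelen);
		n++;			/* stands in for one lookup_once() */
	}
	return n;
}
```

     For the path "/usr//share/man/" the model visits three components:
     "usr", "share", and "man".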

     As a final note, be advised that the magic return value associated with
     CREATE mode is different for namei than it is for VOP_LOOKUP().  The
     latter "fails" with EJUSTRETURN.  namei translates this into succeeding
     and returning a null vnode.
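
     That translation can be modelled in userland as follows.  This is a
     sketch under stated assumptions: EX_EJUSTRETURN is a stand-in for the
     kernel-only EJUSTRETURN code with a value chosen for illustration, and
     ex_translate_create_result() is a hypothetical function, not part of
     namei.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel-only EJUSTRETURN code; value is illustrative. */
#define EX_EJUSTRETURN	(-2)

/*
 * Model of the translation for the last component in CREATE mode: turn
 * the result of the per-component lookup into what the caller of namei
 * sees.  `*vpp' is the vnode the lookup produced, or NULL.
 */
int
ex_translate_create_result(int vop_error, void **vpp)
{
	assert(vpp != NULL);
	if (vop_error == EX_EJUSTRETURN) {
		*vpp = NULL;	/* no entry, but creation may proceed */
		return 0;
	}
	return vop_error;	/* 0 with a vnode, or a genuine error */
}
```

     A caller in CREATE mode therefore distinguishes "already exists" from
     "may be created" by checking whether the returned vnode is null, not by
     inspecting the error code.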

CODE REFERENCES
     The name lookup subsystem is implemented within the file
     sys/kern/vfs_lookup.c.

SEE ALSO
     intro(9), namecache(9), vfs(9), vnode(9), vnodeops(9)

BUGS
     There should be no such thing as operating modes.  Only LOOKUP is
     actually needed.  The behavior where removing an object looks it up
     within namei and then calls into the file system (which must look it up
     again internally or cache state from VOP_LOOKUP()) is particularly
     contorted.

     Most of the flags are equally bogus.

     Most of the contents of the nameidata structure should be private and
     hidden within namei; currently it cannot be because of abuse elsewhere.

     The EMULROOTSET flag is messy.

     There is no good way to support file systems that want to use a more
     elaborate pathname schema than the customary slash-delimited components.

NetBSD 10.99                      May 5, 2019                     NetBSD 10.99