Updated: 2025/Nov/16

MBRTOC16(3)                Library Functions Manual                MBRTOC16(3)

NAME
     mbrtoc16 - Restartable multibyte to UTF-16 conversion

LIBRARY
     Standard C Library (libc, -lc)

SYNOPSIS
     #include <uchar.h>

     size_t
     mbrtoc16(char16_t * restrict pc16, const char * restrict s, size_t n,
         mbstate_t * restrict ps);

DESCRIPTION
     The mbrtoc16 function decodes multibyte characters in the current locale
     and converts them to UTF-16, keeping state so it can restart after
     incremental progress.

     Each call to mbrtoc16:
     1.   examines up to n bytes starting at s,
     2.   yields a UTF-16 code unit if available by storing it at *pc16,
     3.   saves state at ps, and
     4.   returns either the number of bytes consumed, if any, or a special
          return value.

     Specifically:

     •   If the multibyte sequence at s is invalid after any previous input
         saved at ps, or if an error occurs in decoding, mbrtoc16 returns
         (size_t)-1 and sets errno(2) to indicate the error.

     •   If the multibyte sequence at s is still incomplete after n bytes,
         including any previous input saved in ps, mbrtoc16 saves its state in
         ps after all the input so far and returns (size_t)-2.  All n bytes of
         input are consumed in this case.

     •   If mbrtoc16 had previously decoded a multibyte character but has not
         yet yielded all the code units of its UTF-16 encoding, it stores the
         next UTF-16 code unit at *pc16 and returns (size_t)-3.  No bytes of
         input are consumed in this case.

     •   If mbrtoc16 decodes the null multibyte character, then it stores zero
         at *pc16 and returns zero.

     •   Otherwise, mbrtoc16 decodes a single multibyte character, stores the
         first (and possibly only) code unit in its UTF-16 encoding at *pc16,
         and returns the number of bytes consumed to decode the first
         multibyte character.
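
     The cases above can be combined into a loop that is fed input one chunk
     at a time.  The following is a minimal sketch, not a definitive
     implementation: feed_chunk and its parameters out and cap are
     hypothetical names that are not part of libc, and stopping at a null
     character is a policy chosen for this example.

```c
#include <stddef.h>
#include <uchar.h>
#include <wchar.h>      /* mbstate_t */

/*
 * feed_chunk is a hypothetical helper, not part of libc.  It decodes
 * the n bytes at s, appending UTF-16 code units to out[0..cap), and
 * returns how many it stored.  *ps carries any partial character or
 * pending low surrogate into the next call.  It stops early at a null
 * character and returns (size_t)-1 on an encoding error or when out
 * fills up before the input is exhausted.
 */
size_t
feed_chunk(char16_t *out, size_t cap, const char *s, size_t n,
    mbstate_t *ps)
{
        size_t nout = 0;

        while (n > 0) {
                char16_t c16;
                size_t len;

                if (nout == cap)
                        return (size_t)-1;      /* no room for a unit */
                len = mbrtoc16(&c16, s, n, ps);
                if (len == (size_t)-1)
                        return (size_t)-1;      /* errno is set */
                if (len == (size_t)-2)
                        break;          /* all n bytes saved in *ps */
                if (len == 0)
                        break;          /* null character: stop */
                out[nout++] = c16;
                if (len == (size_t)-3)
                        continue;       /* no input consumed */
                s += len;
                n -= len;
        }
        return nout;
}
```

     Because the loop only calls mbrtoc16 while input remains, a low
     surrogate that is still pending when a chunk ends is delivered by the
     next call that supplies more input.  After the final chunk, mbsinit(3)
     on the same state should report whether anything is still outstanding.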

     If pc16 is a null pointer, nothing is stored, but the effects on ps and
     the return value are unchanged.
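
     One use of a null pc16 is to measure how many UTF-16 code units a string
     needs before allocating a buffer.  The sketch below relies on the
     behavior described above; utf16_length is a hypothetical helper name,
     not a libc function.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>      /* mbstate_t */

/*
 * utf16_length is a hypothetical helper, not part of libc.  It returns
 * the number of UTF-16 code units needed to hold the null-terminated
 * multibyte string s, excluding the terminator, or (size_t)-1 if s is
 * invalid or truncated in the current locale.
 */
size_t
utf16_length(const char *s)
{
        mbstate_t st;
        size_t n = strlen(s) + 1;       /* include the terminator */
        size_t units = 0;

        memset(&st, 0, sizeof st);      /* initial conversion state */
        for (;;) {
                size_t len = mbrtoc16(NULL, s, n, &st);

                if (len == 0)
                        return units;   /* reached the terminator */
                if (len == (size_t)-1 || len == (size_t)-2)
                        return (size_t)-1;
                units++;                /* one code unit was yielded */
                if (len != (size_t)-3) {
                        s += len;
                        n -= len;
                }
        }
}
```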

     If s is a null pointer, the mbrtoc16 call is equivalent to:

           mbrtoc16(NULL, "", 1, ps)

     This always returns zero, and has the effect of resetting ps to the
     initial conversion state, without writing to pc16, even if it is nonnull.
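
     A caller that abandons a stream partway through a character can use
     this form to reuse the same state object.  A minimal sketch, in which
     reset_state is a hypothetical wrapper name:

```c
#include <stddef.h>
#include <uchar.h>
#include <wchar.h>      /* mbstate_t, mbsinit */

/*
 * reset_state is a hypothetical wrapper, not part of libc.  It returns
 * *ps to the initial conversion state, discarding any partial
 * character or pending code unit.  Returns 0 on success, -1 otherwise.
 */
int
reset_state(mbstate_t *ps)
{
        /* With s == NULL the n argument is ignored. */
        return mbrtoc16(NULL, NULL, 0, ps) == 0 ? 0 : -1;
}
```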

     If ps is a null pointer, mbrtoc16 uses an internal mbstate_t object with
     static storage duration, distinct from all other mbstate_t objects
     (including those used by mbrtoc8(3), mbrtoc32(3), c8rtomb(3),
     c16rtomb(3), and c32rtomb(3)), which is initialized at program startup to
     the initial conversion state.

IMPLEMENTATION NOTES
     On well-formed input, the mbrtoc16 function yields either a Unicode
     scalar value in the Basic Multilingual Plane (BMP), i.e., a 16-bit
     Unicode code point that is not a surrogate code point, or, over two
     successive calls, yields the high and low surrogate code points (in that
     order) of a Unicode scalar value outside the BMP.
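
     A caller that needs the scalar value can recombine the two surrogates
     with the arithmetic defined for UTF-16.  The helper names below are
     hypothetical and only illustrate that arithmetic:

```c
#include <stdint.h>
#include <uchar.h>

/* High (leading) surrogates occupy U+D800..U+DBFF. */
int
is_high_surrogate(char16_t u)
{
        return u >= 0xD800 && u <= 0xDBFF;
}

/* Low (trailing) surrogates occupy U+DC00..U+DFFF. */
int
is_low_surrogate(char16_t u)
{
        return u >= 0xDC00 && u <= 0xDFFF;
}

/*
 * Combine a high and a low surrogate into the scalar value they
 * encode.  The caller is assumed to have checked both arguments
 * with the predicates above.
 */
uint32_t
combine_surrogates(char16_t hi, char16_t lo)
{
        return 0x10000 + (((uint32_t)hi - 0xD800) << 10) +
            ((uint32_t)lo - 0xDC00);
}
```

     For example, the pair 0xD83D, 0xDE00 combines to U+1F600.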

RETURN VALUES
     The mbrtoc16 function returns:

     0                 [null] if mbrtoc16 decoded a null multibyte character.

     i                 [code unit] where 1 <= i <= n, if mbrtoc16 consumed i
                       bytes of input to decode the next multibyte character,
                       yielding a UTF-16 code unit.

     (size_t)-3        [continuation] if mbrtoc16 consumed no new bytes of
                       input but yielded a UTF-16 code unit that was pending
                       from previous input.

     (size_t)-2        [incomplete] if mbrtoc16 found only an incomplete
                       multibyte sequence after all n bytes of input and any
                       previous input, and saved its state to restart in the
                       next call with ps.

     (size_t)-1        [error] if any encoding error was detected; errno(2) is
                       set to reflect the error.
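
     Because the special values are very large size_t values, a test such
     as len > 0 does not distinguish them from a byte count; they must be
     compared explicitly.  The sketch below shows one way to do that.  The
     enum and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical labels for the five kinds of return value. */
enum mbr_result {
        MBR_NULL,               /* 0 */
        MBR_UNIT,               /* 1 <= i <= n */
        MBR_CONTINUATION,       /* (size_t)-3 */
        MBR_INCOMPLETE,         /* (size_t)-2 */
        MBR_ERROR               /* (size_t)-1 */
};

/* Classify a value returned by mbrtoc16. */
enum mbr_result
classify(size_t len)
{
        switch (len) {
        case 0:
                return MBR_NULL;
        case (size_t)-3:
                return MBR_CONTINUATION;
        case (size_t)-2:
                return MBR_INCOMPLETE;
        case (size_t)-1:
                return MBR_ERROR;
        default:
                return MBR_UNIT;
        }
}
```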

EXAMPLES
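     Print the UTF-16 code units of a multibyte string in hexadecimal text.
     The fragment assumes that <assert.h>, <errno.h>, <inttypes.h>,
     <stdio.h>, and <uchar.h> are included and that the labels out and
     readmore are defined by the surrounding code: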

           char *s = ...;
           size_t n = ...;
           mbstate_t mbs = {0};    /* initial conversion state */

           while (n) {
                   char16_t c16;
                   size_t len;

                   len = mbrtoc16(&c16, s, n, &mbs);
                   switch (len) {
                   case 0:         /* NUL terminator */
                           assert(c16 == 0);
                           goto out;
                   default:        /* scalar value or high surrogate */
                           printf("U+%04"PRIx16"\n", (uint16_t)c16);
                           break;
                   case (size_t)-3: /* low surrogate */
                           printf("continue U+%04"PRIx16"\n", (uint16_t)c16);
                           continue;       /* no input consumed */
                   case (size_t)-2: /* incomplete */
                           printf("incomplete\n");
                           goto readmore;
                   case (size_t)-1: /* error */
                           printf("error: %d\n", errno);
                           goto out;
                   }
                   s += len;
                   n -= len;
           }

ERRORS
     [EILSEQ]      The multibyte sequence cannot be decoded in the current
                   locale as a Unicode scalar value.

     [EIO]         An error occurred in loading the locale's character
                   conversions.
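
     A caller may want to recover from [EILSEQ] but give up on other errors.
     The following sketch shows one possible policy, not the only one: it
     substitutes U+FFFD, skips a single byte, and resets the state.  The
     name decode_lossy is hypothetical, and the sketch assumes pc16 is not
     null and n is at least 1.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>      /* mbstate_t */

/*
 * decode_lossy is a hypothetical wrapper, not part of libc.  It
 * behaves like mbrtoc16, except that on EILSEQ it stores U+FFFD
 * REPLACEMENT CHARACTER at *pc16, resets *ps, and reports one byte
 * consumed.  Any other error is passed through as (size_t)-1.
 */
size_t
decode_lossy(char16_t *pc16, const char *s, size_t n, mbstate_t *ps)
{
        size_t len = mbrtoc16(pc16, s, n, ps);

        if (len != (size_t)-1 || errno != EILSEQ)
                return len;     /* success, or an error left as is */

        /* Do not trust the state after an error; start afresh. */
        memset(ps, 0, sizeof *ps);
        *pc16 = 0xFFFD;
        return 1;               /* skip the offending byte */
}
```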

SEE ALSO
     c16rtomb(3), c32rtomb(3), c8rtomb(3), mbrtoc32(3), mbrtoc8(3), uchar(3)

     The Unicode Standard,
     https://www.unicode.org/versions/Unicode15.0.0/UnicodeStandard-15.0.pdf,
     The Unicode Consortium, September 2022, Version 15.0 -- Core
     Specification.

     P. Hoffman and F. Yergeau, UTF-16, an encoding of ISO 10646, Internet
     Engineering Task Force, RFC 2781,
     https://datatracker.ietf.org/doc/html/rfc2781, February 2000.

STANDARDS
     The mbrtoc16 function conforms to ISO/IEC 9899:2011 ("ISO C11").

HISTORY
     The mbrtoc16 function first appeared in NetBSD 11.0.

NetBSD 11.99                    August 14, 2024                   NetBSD 11.99