Updated: 2022/Sep/29

NLS(7)                 Miscellaneous Information Manual                 NLS(7)

NAME
     NLS - Native Language Support Overview

DESCRIPTION
     Native Language Support (NLS) provides commands for a single worldwide
     operating system base.  An internationalized system has no built-in
     assumptions or dependencies on language-specific or cultural-specific
     conventions such as:

              Character classifications
              Character comparison rules
              Character collation order
              Numeric and monetary formatting
              Date and time formatting
              Message-text language
              Character sets

     All information pertaining to cultural conventions and language is
     obtained at program run time.

     "Internationalization" (often abbreviated "i18n") refers to the process
     by which system software is developed to support multiple
     cultural-specific and language-specific conventions.  This is a
     generalization process by which the system is freed from relying only on
     English strings or other English-specific conventions.  "Localization"
     (often abbreviated "l10n") refers to the process by which the user
     environment is customized to handle input and output appropriately for
     specific language and cultural conventions.  This is a specialization
     process, by
     which generic methods already implemented in an internationalized system
     are used in specific ways.  The formal description of cultural
     conventions for some country, together with all associated translations
     targeted to the native language, is called the "locale".

     NetBSD provides extensive support to programmers and system developers to
     enable internationalized software to be developed.  NetBSD also supplies
     a large variety of locales for system localization.

   Localization of Information
     All locale information is accessible to programs at run time so that data
     is processed and displayed correctly for specific cultural conventions
     and language.

     A locale is divided into categories.  A category is a group of language-
     specific and culture-specific conventions as outlined in the list above.
     ISO C and POSIX specify the following six standard categories, which are
     supported by NetBSD:

     LC_COLLATE     string-collation order information
     LC_CTYPE       character classification, case conversion, and other
                    character attributes
     LC_MESSAGES    the format for affirmative and negative responses
     LC_MONETARY    rules and symbols for formatting monetary numeric
                    information
     LC_NUMERIC     rules and symbols for formatting nonmonetary numeric
                    information
     LC_TIME        rules and symbols for formatting time and date information
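
     As an illustration of how a single category affects a program, the
     following minimal sketch reads the decimal-point string of the
     LC_NUMERIC category through the ISO C localeconv(3) interface.  The
     helper name decimal_point_for() is hypothetical and is not part of any
     library.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: switch only the LC_NUMERIC category to the named
 * locale and copy its decimal-point string into buf.
 * Returns 0 on success, or -1 if the locale is not available or buf is
 * too small.
 */
int
decimal_point_for(const char *locale_name, char *buf, size_t buflen)
{
	const struct lconv *lc;

	if (setlocale(LC_NUMERIC, locale_name) == NULL)
		return -1;
	lc = localeconv();
	if (strlen(lc->decimal_point) >= buflen)
		return -1;
	strcpy(buf, lc->decimal_point);
	return 0;
}
```

     In the C locale the string is ".".  Other locales may yield a different
     string, provided that they are installed on the system.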

     Localization of the system is achieved by setting appropriate values in
     environment variables to identify which locale should be used.  The
     environment variables have the same names as their respective locale
     categories.  Additionally, the LANG, LC_ALL, and NLSPATH environment
     variables are used.  The NLSPATH environment variable specifies a colon-
     separated list of directory names where the message catalog files of the
     NLS database are located.  The LC_ALL and LANG environment variables also
     determine the current locale.

     The values of these environment variables are strings of the form:

           language[_territory][.codeset][@modifier]

     Valid values for the language field come from the ISO 639 standard, which
     defines two-character codes for many languages.  Some common language
     codes are:

     Language Name      Code       Language Family
     ABKHAZIAN          AB         IBERO-CAUCASIAN
     AFAN (OROMO)       OM         HAMITIC
     AFAR               AA         HAMITIC
     AFRIKAANS          AF         GERMANIC
     ALBANIAN           SQ         INDO-EUROPEAN (OTHER)
     AMHARIC            AM         SEMITIC
     ARABIC             AR         SEMITIC
     ARMENIAN           HY         INDO-EUROPEAN (OTHER)
     ASSAMESE           AS         INDIAN
     AYMARA             AY         AMERINDIAN
     BASHKIR            BA         TURKIC/ALTAIC
     BASQUE             EU         BASQUE
     BENGALI            BN         INDIAN
     BHUTANI            DZ         ASIAN
     BIHARI             BH         INDIAN
     BISLAMA            BI
     BRETON             BR         CELTIC
     BULGARIAN          BG         SLAVIC
     BURMESE            MY         ASIAN
     BYELORUSSIAN       BE         SLAVIC
     CAMBODIAN          KM         ASIAN
     CATALAN            CA         ROMANCE
     CHINESE            ZH         ASIAN
     CORSICAN           CO         ROMANCE
     CROATIAN           HR         SLAVIC
     CZECH              CS         SLAVIC
     DANISH             DA         GERMANIC
     DUTCH              NL         GERMANIC
     ENGLISH            EN         GERMANIC
     ESTONIAN           ET         FINNO-UGRIC
     FAROESE            FO         GERMANIC
     FIJI               FJ         OCEANIC/INDONESIAN
     FINNISH            FI         FINNO-UGRIC
     FRENCH             FR         ROMANCE
     FRISIAN            FY         GERMANIC
     GALICIAN           GL         ROMANCE
     GEORGIAN           KA         IBERO-CAUCASIAN
     GERMAN             DE         GERMANIC
     GREEK              EL         LATIN/GREEK
     GREENLANDIC        KL         ESKIMO
     GUARANI            GN         AMERINDIAN
     GUJARATI           GU         INDIAN
     HAUSA              HA         NEGRO-AFRICAN
     HEBREW             HE         SEMITIC
     HINDI              HI         INDIAN
     HUNGARIAN          HU         FINNO-UGRIC
     ICELANDIC          IS         GERMANIC
     INUKTITUT          IU
     INUPIAK            IK         ESKIMO
     IRISH              GA         CELTIC
     ITALIAN            IT         ROMANCE
     JAPANESE           JA         ASIAN
     KANNADA            KN         DRAVIDIAN
     KASHMIRI           KS         INDIAN
     KAZAKH             KK         TURKIC/ALTAIC
     KIRGHIZ            KY         TURKIC/ALTAIC
     KURUNDI            RN         NEGRO-AFRICAN
     KOREAN             KO         ASIAN
     KURDISH            KU         IRANIAN
     LAOTHIAN           LO         ASIAN
     LATIN              LA         LATIN/GREEK
     LATVIAN            LV         BALTIC
     LINGALA            LN         NEGRO-AFRICAN
     LITHUANIAN         LT         BALTIC
     MACEDONIAN         MK         SLAVIC
     MALAY              MS         OCEANIC/INDONESIAN
     MALAYALAM          ML         DRAVIDIAN
     MALTESE            MT         SEMITIC
     MAORI              MI         OCEANIC/INDONESIAN
     MARATHI            MR         INDIAN
     MOLDAVIAN          MO         ROMANCE
     MONGOLIAN          MN
     NAURU              NA
     NEPALI             NE         INDIAN
     NORWEGIAN          NO         GERMANIC
     OCCITAN            OC         ROMANCE
     ORIYA              OR         INDIAN
     PASHTO             PS         IRANIAN
     PERSIAN (farsi)    FA         IRANIAN
     POLISH             PL         SLAVIC
     PORTUGUESE         PT         ROMANCE
     PUNJABI            PA         INDIAN
     QUECHUA            QU         AMERINDIAN
     ROMANIAN           RO         ROMANCE
     RUSSIAN            RU         SLAVIC
     SAMOAN             SM         OCEANIC/INDONESIAN
     SANGHO             SG         NEGRO-AFRICAN
     SANSKRIT           SA         INDIAN
     SCOTS GAELIC       GD         CELTIC
     SERBIAN            SR         SLAVIC
     SESOTHO            ST         NEGRO-AFRICAN
     SETSWANA           TN         NEGRO-AFRICAN
     SHONA              SN         NEGRO-AFRICAN
     SINDHI             SD         INDIAN
     SINGHALESE         SI         INDIAN
     SISWATI            SS         NEGRO-AFRICAN
     SLOVAK             SK         SLAVIC
     SLOVENIAN          SL         SLAVIC
     SOMALI             SO         HAMITIC
     SPANISH            ES         ROMANCE
     SWAHILI            SW         NEGRO-AFRICAN
     SWEDISH            SV         GERMANIC
     TAGALOG            TL         OCEANIC/INDONESIAN
     TAJIK              TG         IRANIAN
     TAMIL              TA         DRAVIDIAN
     TATAR              TT         TURKIC/ALTAIC
     TELUGU             TE         DRAVIDIAN
     THAI               TH         ASIAN
     TIBETAN            BO         ASIAN
     TIGRINYA           TI         SEMITIC
     TONGA              TO         OCEANIC/INDONESIAN
     TSONGA             TS         NEGRO-AFRICAN
     TURKISH            TR         TURKIC/ALTAIC
     TURKMEN            TK         TURKIC/ALTAIC
     TWI                TW         NEGRO-AFRICAN
     UIGUR              UG
     UKRAINIAN          UK         SLAVIC
     URDU               UR         INDIAN
     UZBEK              UZ         TURKIC/ALTAIC
     VIETNAMESE         VI         ASIAN
     VOLAPUK            VO         INTERNATIONAL AUX.
     WELSH              CY         CELTIC
     WOLOF              WO         NEGRO-AFRICAN
     XHOSA              XH         NEGRO-AFRICAN
     YIDDISH            YI         GERMANIC
     YORUBA             YO         NEGRO-AFRICAN
     ZHUANG             ZA
     ZULU               ZU         NEGRO-AFRICAN

     For example, the locale for the Danish language spoken in Denmark using
     the ISO 8859-1 character set is da_DK.ISO8859-1.  The da stands for the
     Danish language and the DK stands for Denmark.  The short form of da_DK
     is sufficient to indicate this locale.
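
     A locale name such as da_DK.ISO8859-1 conventionally follows the pattern
     language[_territory][.codeset][@modifier].  The sketch below splits a
     name into those fields.  The structure locale_parts and the function
     parse_locale_name() are hypothetical names chosen for this example, and
     the field sizes are arbitrary.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the fields of a locale name. */
struct locale_parts {
	char language[16];
	char territory[16];
	char codeset[32];
	char modifier[16];
};

/* Copy len bytes of src into dst, truncating to fit, and NUL-terminate. */
static void
copy_field(char *dst, size_t dstlen, const char *src, size_t len)
{
	if (len >= dstlen)
		len = dstlen - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

/* Split name into its fields; absent fields become empty strings. */
void
parse_locale_name(const char *name, struct locale_parts *out)
{
	size_t n;

	memset(out, 0, sizeof(*out));

	/* The language runs up to the first '_', '.', or '@'. */
	n = strcspn(name, "_.@");
	copy_field(out->language, sizeof(out->language), name, n);
	name += n;

	if (*name == '_') {
		name++;
		n = strcspn(name, ".@");
		copy_field(out->territory, sizeof(out->territory), name, n);
		name += n;
	}
	if (*name == '.') {
		name++;
		n = strcspn(name, "@");
		copy_field(out->codeset, sizeof(out->codeset), name, n);
		name += n;
	}
	if (*name == '@')
		copy_field(out->modifier, sizeof(out->modifier), name + 1,
		    strlen(name + 1));
}
```

     For da_DK.ISO8859-1 this yields the language "da", the territory "DK",
     the codeset "ISO8859-1", and an empty modifier.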

     The environment variable settings are queried by their priority level in
     the following manner:

        -  If the LC_ALL environment variable is set, all six categories use
           the locale it specifies.

        -  If the LC_ALL environment variable is not set, each individual
           category uses the locale specified by its corresponding
           environment variable.

        -  If the LC_ALL environment variable is not set, and a value for a
           particular LC_* environment variable is not set, the value of the
           LANG environment variable specifies the default locale for all
           categories.  Only the LANG environment variable should be set in
           /etc/profile, since this makes it easiest for the user to override
           the system default using the individual LC_* variables.

        -  If the LC_ALL environment variable is not set, a value for a
           particular LC_* environment variable is not set, and the value of
           the LANG environment variable is not set, the locale for that
           specific category defaults to the C locale.  The C or POSIX locale
           assumes the ASCII character set and defines information for the
           six categories.
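
     The lookup order described above can be sketched as a small function.
     This only illustrates the priority rules and is not how the C library
     implements them; in a real program setlocale(3) performs this lookup.
     The function name effective_locale() is hypothetical, and the sketch
     assumes that an empty value counts as "not set", as POSIX specifies.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Return the locale name that applies to one category, given the name of
 * that category's environment variable (for example "LC_TIME").
 * Assumption: an empty value is treated like an unset variable.
 */
const char *
effective_locale(const char *category_var)
{
	const char *names[3];
	const char *value;
	size_t i;

	names[0] = "LC_ALL";		/* highest priority */
	names[1] = category_var;	/* the category's own variable */
	names[2] = "LANG";		/* default for all categories */

	for (i = 0; i < 3; i++) {
		value = getenv(names[i]);
		if (value != NULL && value[0] != '\0')
			return value;
	}
	return "C";			/* nothing set: the C locale */
}
```

     With only LANG set to da_DK.ISO8859-1, effective_locale("LC_TIME")
     returns that value.  Setting LC_TIME overrides it for that category, and
     setting LC_ALL overrides both.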

   Character Sets
     A character is any symbol used for the organization, control, or
     representation of data.  A group of such symbols used to describe a
     particular language makes up a character set.  It is the encoding values
     in a character set that provide the interface between the system and its
     input and output devices.

     The following character sets are supported in NetBSD:

     ASCII            The American Standard Code for Information Interchange
                      (ASCII) standard specifies 128 Roman characters and
                      control codes, encoded in a 7-bit character encoding
                      scheme.

     ISO 8859 family  Industry-standard character sets specified by the
                      ISO/IEC 8859 standard.  The standard is divided into 15
                      numbered parts, with each part specifying broad script
                      similarities.  Examples include Western European,
                      Central European, Arabic, Cyrillic, Hebrew, Greek, and
                      Turkish.  The character sets use an 8-bit character
                      encoding scheme which is compatible with the ASCII
                      character set.

     Unicode          The Unicode character set is the full set of known
                      abstract characters of all real-world scripts.  It can
                      be used in environments where multiple scripts must be
                      processed simultaneously.  Unicode is compatible with
                      ISO 8859-1 (Western European) and ASCII.  Many character
                      encoding schemes are available for Unicode, including
                      UTF-8, UTF-16 and UTF-32.  These encoding schemes are
                      multi-byte encodings.  The UTF-8 encoding scheme uses
                      8-bit, variable-width encodings and is compatible with
                      ASCII.  The UTF-16 encoding scheme uses 16-bit,
                      variable-width encodings.  The UTF-32 encoding scheme
                      uses 32-bit, fixed-width encodings.
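
     To make the variable width of UTF-8 concrete, the following sketch
     encodes one Unicode code point as one to four bytes.  It is written for
     this overview and is independent of the C library's own conversion
     routines.  The function name utf8_encode() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-8.
 * Writes 1 to 4 bytes into out and returns the number written, or 0 if cp
 * is not a valid scalar value (a surrogate, or above U+10FFFF).
 */
size_t
utf8_encode(uint32_t cp, unsigned char out[4])
{
	if (cp < 0x80) {
		/* ASCII is encoded unchanged, hence the compatibility. */
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0;	/* surrogates are not characters */
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}
```

     For example, U+0041 (LATIN CAPITAL LETTER A) becomes the single byte
     0x41, while U+20AC (EURO SIGN) becomes the three bytes 0xE2 0x82 0xAC.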

   Font Sets
     A font set contains the glyphs to be displayed on the screen for a
     corresponding character in a character set.  A display must support a
     suitable font to display a character set.  If suitable fonts are
     available to the X server, then X clients can include support for
     different character sets.  xterm(1) includes support for Unicode with
     UTF-8 encoding.  xfd(1) is useful for displaying all the characters in an
     X font.

     The NetBSD wscons(4) console provides support for loading fonts using the
     wsfontload(8) utility.  Currently, only fonts for the ISO8859-1 family of
     character sets are supported.

   Internationalization for Programmers
     To facilitate translations of messages into various languages and to make
     the translated messages available to the program based on a user's
     locale, it is necessary to keep messages separate from the programs and
     provide them in the form of message catalogs that a program can access at
     run time.

     Access to locale information is provided through the setlocale(3) and
     nl_langinfo(3) interfaces.  See their respective man pages for further
     information.
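
     A minimal sketch of the two interfaces used together follows.  It
     selects a locale for all categories and then asks for the name of its
     codeset.  The helper name codeset_of() is hypothetical.  The string
     returned for a given locale differs between systems, so it should not be
     compared against a fixed value.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: select locale_name for all categories and return
 * the name of its codeset.  An empty locale_name ("") asks setlocale() to
 * take the locale from the environment variables.
 * Returns NULL if the locale cannot be selected.
 */
const char *
codeset_of(const char *locale_name)
{
	if (setlocale(LC_ALL, locale_name) == NULL)
		return NULL;
	return nl_langinfo(CODESET);
}
```

     A program would typically call codeset_of("") once at startup, so that
     the user's environment decides which locale is used.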

     Message source files containing application messages are created by the
     programmer and converted to message catalogs.  These catalogs are used by
     the application to retrieve and display messages, as needed.

     NetBSD supports two message catalog interfaces: the X/Open catgets(3)
     interface and the Uniforum gettext(3) interface.  The catgets(3)
     interface has the advantage that it belongs to a standard which is well
     supported.  Unfortunately the interface is complicated to use and
     maintenance of the catalogs is difficult.  The implementation also
     doesn't support different character sets.  The gettext(3) interface has
     not been standardized yet; however, it is supported by an increasing
     number of systems.  It also provides many additional tools which make
     programming and catalog maintenance much easier.
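
     The following sketch shows one way to retrieve a message through the
     catgets(3) interface while falling back to a built-in string when no
     catalog is available.  The helper name catalog_message() is
     hypothetical.  The message is copied out before the catalog is closed,
     because the pointer returned by catgets(3) should not be used after
     catclose(3).

```c
#include <nl_types.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: look up message msg_id of set set_id in the named
 * catalog and copy it into buf (truncating if necessary).  If the catalog
 * or the message cannot be found, the fallback string is copied instead.
 * Returns 1 if the text came from the catalog, 0 if the fallback was used.
 */
int
catalog_message(const char *catalog, int set_id, int msg_id,
    const char *fallback, char *buf, size_t buflen)
{
	nl_catd catd;
	const char *msg = fallback;
	int found = 0;

	/* NL_CAT_LOCALE selects the catalog according to LC_MESSAGES. */
	catd = catopen(catalog, NL_CAT_LOCALE);
	if (catd != (nl_catd)-1) {
		/* catgets() returns its last argument on failure. */
		msg = catgets(catd, set_id, msg_id, fallback);
		found = (msg != fallback);
	}
	if (buflen > 0) {
		strncpy(buf, msg, buflen - 1);
		buf[buflen - 1] = '\0';
	}
	if (catd != (nl_catd)-1)
		catclose(catd);
	return found;
}
```

     A call such as catalog_message("myapp", 1, 1, "Hello", buf, sizeof(buf))
     would return the translated text if a catalog named myapp (a
     hypothetical name) had been built with gencat(1) and could be found
     through NLSPATH, and "Hello" otherwise.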

   Support for Multi-byte Encodings
     Some character sets with multi-byte encodings may be difficult to decode,
     or may contain state (i.e., adjacent characters are dependent).  ISO C
     specifies a set of functions using 'wide characters' which can handle
     multi-byte encodings properly.  The behaviour of these functions is
     affected by the LC_CTYPE category of the current locale.

     A wide character is specified in ISO C as being a fixed number of bits
     wide and is stateless.  There are two types for wide characters: wchar_t
     and wint_t.  wchar_t is a type which can contain one wide character and
     behaves as the 'char' type does for one character.  wint_t can contain one
     wide character or WEOF (wide EOF).
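
     The distinction between the two types matters most when reading input.
     In the sketch below, the result of fgetwc(3) is stored in a wint_t so
     that WEOF can be told apart from every valid wide character.  The
     function name count_wide_chars() is hypothetical.

```c
#include <stdio.h>
#include <wchar.h>

/*
 * Count the wide characters remaining on a stream.
 * fgetwc() returns WEOF at end of file and also on a read or encoding
 * error, so this sketch simply stops counting in either case.
 */
long
count_wide_chars(FILE *fp)
{
	wint_t wc;	/* wint_t, not wchar_t: it must be able to hold WEOF */
	long n = 0;

	while ((wc = fgetwc(fp)) != WEOF)
		n++;
	return n;
}
```

     A caller that needs to distinguish the end of the file from an error can
     check feof(3) and ferror(3) on the stream afterwards.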

     There are functions that operate on wchar_t, and substitute for functions
     operating on 'char'.  See wmemchr(3) and towlower(3) for details.  There
     are some additional functions that operate on wchar_t.  See wctype(3) and
     wctrans(3) for details.

     Wide characters should be used for all I/O processing which may rely on
     locale-specific strings.  The two primary issues requiring special use of
     wide characters are:

              -  All I/O is performed using multi-byte characters.  Input
                 data is converted into wide characters immediately after
                 reading, and data for output is converted from wide
                 characters to the multi-byte encoding immediately before
                 writing.  Conversion is handled by the mbstowcs(3),
                 mbsrtowcs(3), wcstombs(3), wcsrtombs(3), mblen(3),
                 mbrlen(3), and mbsinit(3) functions.

              -  Wide characters are used directly for I/O, using
                 getwchar(3), fgetwc(3), getwc(3), ungetwc(3), fgetws(3),
                 putwchar(3), fputwc(3), putwc(3), and fputws(3).  They are
                 also used with the formatted I/O functions for wide
                 characters, such as fwscanf(3), wscanf(3), swscanf(3),
                 fwprintf(3), wprintf(3), swprintf(3), vfwprintf(3),
                 vwprintf(3), and vswprintf(3), and with the wide-character
                 conversion specifiers %lc, %C, %ls, and %S of the
                 conventional formatted I/O functions.
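
     The first approach, converting at the boundaries and processing wide
     characters in between, can be sketched as follows.  The example converts
     a multi-byte string to wide characters, upper-cases each character with
     towupper(3), and converts the result back.  The helper name
     mb_toupper() is hypothetical.  The conversions follow the LC_CTYPE
     category of the current locale, so the caller is assumed to have called
     setlocale(3) beforehand if anything other than the C locale is wanted.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: upper-case the multi-byte string in, writing the
 * NUL-terminated result to out (at most outlen bytes).
 * Returns 0 on success, or -1 on invalid input, allocation failure, or
 * insufficient space in out.
 */
int
mb_toupper(const char *in, char *out, size_t outlen)
{
	wchar_t *wbuf;
	size_t n, i, written;
	int rc = -1;

	/* Each wide character needs at least one byte of input, so
	 * strlen(in) + 1 elements are always enough. */
	n = strlen(in);
	wbuf = malloc((n + 1) * sizeof(*wbuf));
	if (wbuf == NULL)
		return -1;

	/* Multi-byte to wide: done once, right after "reading". */
	n = mbstowcs(wbuf, in, n + 1);
	if (n != (size_t)-1) {
		for (i = 0; i < n; i++)
			wbuf[i] = (wchar_t)towupper((wint_t)wbuf[i]);

		/* Wide to multi-byte: done once, right before "writing".
		 * The result is NUL-terminated only if it is shorter
		 * than outlen. */
		written = wcstombs(out, wbuf, outlen);
		if (written != (size_t)-1 && written < outlen)
			rc = 0;
	}
	free(wbuf);
	return rc;
}
```

     Working on wchar_t in the middle step means that one array element is
     always one character, whatever the width of its multi-byte encoding.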

SEE ALSO
     gencat(1), xfd(1), xterm(1), catgets(3), gettext(3), nl_langinfo(3),
     setlocale(3), wsfontload(8)

BUGS
     This man page is incomplete.

NetBSD 10.99                   February 21, 2007                  NetBSD 10.99