Here is a proposal for a framework to support the auto-generation of block/character device switch tables by config(8).

All changes have been merged into the trunk. All features are available in -current. (2002/Sep/6)


The Auto-Generation of Block/Character Device Switch Tables by config(8)

1. Background

It is too painful to maintain port-dependent conf.c, conf.h and sys/conf.h.

2. Current Implementation

Currently, the block/character device switch (bdevsw, cdevsw) tables live in sys/arch/<ARCH>/<ARCH>/conf.c as statically defined arrays that are maintained by hand. Each entry is filled in with the device interface functions (open, close, etc.) using macros. Many such macros are defined for convenience in sys/conf.h and machine/conf.h.

Whether an entry is active is determined at compile time by NXXX, which is generated by config(8). In addition, many device drivers contain functions that do nothing but return an EXXX error value defined in errno.h.

I have one basic question.

What do you think of having so many per-device macros defined in sys/conf.h or machine/conf.h? Moreover, using those macros compels us to define many new functions equivalent to nullop/enodev, and to create kludgy aliases with #define. Relying on such kludges and macros just to maintain sys/arch/<ARCH>/<ARCH>/conf.c by hand is a Bad Thing, right?

3. Summary

3.1. Ideas

The framework needs to support both static and dynamic assignment of the device majors. To realize the latter, the initial bdevsw/cdevsw tables are generated automatically by config(8).

For kernel:
In the current implementation, each bdevsw/cdevsw entry (a set of device interface functions such as open, close, etc.) is embedded in the device switch definition (conf.c). Instead, each entry is moved into the corresponding device driver (backend) source as constant data. Because of this, when you write a new device driver, all you have to do in the machine-dependent part is add one line to the ``majors'' file of each port (see below) and to the config files.

The interface functions are no longer called globally but only from within their own driver source. They should always be called via the device switch; calling them directly from outside the driver is a bad idea. To get the device switch entry corresponding to a specific device, the devsw_lookup(9) functions are introduced. Similarly, the newly added devsw_lookup_major(9) functions can be used to get the major number of a specific device.

For config(8):
In order to support this feature, a new grammar keyword ``device-major'' is added to ``files''. All ``device-major'' entries are put in the new machine-dependent file ``majors.<ARCH>'' under sys/arch/<ARCH>/conf, which is included from ``files.<ARCH>''. This is the only file which contains the device number definitions.

To support the dynamic assignment of the device major, devsw_attach(9) and devsw_detach(9) are added; these can be used to attach/detach the device switch data dynamically instead of memcpy(9). These functions are useful for the LKM framework and are used only from LKMs.

These features provide greater flexibility, make device majors less painful to maintain, and remove many macros, many functions which just return an error, cdev_*_init, bdev_*_init and so on. The result is a simpler port-dependent conf.c, sys/conf.h and machine/conf.h.

IMPORTANT: I do NOT merge bdevsw/cdevsw into a single structure. I tried to merge them at first, but that had a serious and far-reaching impact. So at this time, I decided NOT to merge them. If necessary, we should discuss this in ANOTHER thread. Even if that discussion concludes that they should be merged into a single structure, my proposal is not affected.

3.2. Examples

3.2.1. Kernel

Before:

(foo.c)

foo_open(...) {
        ...
}

(bar.c)

extern foo_open();

bar() {
        ...
        foo_open(...)
        ...
}

After:

(foo.c)

const struct cdevsw foo_cdevsw = {
        foo_open, ...
};

foo_open(...) {
        ...
}

(bar.c)

bar() {
        const struct cdevsw *cdev;

        ...
        cdev = cdevsw_lookup(<dev_t>);
        (*cdev->d_open)(...)
        ...
}

If no major number is available,

extern const struct cdevsw foo_cdevsw;

bar() {
        ...
        (*foo_cdevsw.d_open)(...)
        ...
}

3.2.2. Userland - config(8)

The ``fd'' driver has device interfaces as both a block and a character device. If ``fd'' is defined in your kernel configuration file, config(8) generates the following:

(devsw.c)

extern const struct bdevsw fd_bdevsw;
extern const struct cdevsw fd_cdevsw;

const struct bdevsw *bdevsw0[] = {
        ...
        &fd_bdevsw,
        ...
};

const struct cdevsw *cdevsw0[] = {
        ...
        &fd_cdevsw,
        ...
};

const struct bdevsw **bdevsw = bdevsw0;
const struct cdevsw **cdevsw = cdevsw0;

If not, the corresponding entries are filled with NULL.

Here, fd_cdevsw/fd_bdevsw must be provided by the fd driver. So, we need to add the definitions of these data to the ``fd'' driver source. Similarly, every other device has to provide its own device switches.

The block device switch structure variable must be named foo_bdevsw by appending the letters ``_bdevsw'' to the driver's base name. The character device switch structure variable must be named foo_cdevsw by appending the letters ``_cdevsw'' to the driver's base name. This convention is mandated by the autogeneration framework.

4. Synopsis

4.1. Userland - config(8)

4.1.1. Grammar

device-major <name> char <num> [block <num>] [<rule>]

name - The prefix of bdevsw/cdevsw entry (required)
char - A character major number (required)
block - A block major number (optional)
rule - Conditions that determine whether the device switch must be attached (optional)
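
As an illustration of the grammar above, the following C sketch splits one ``device-major'' line into its fields. It is not the parser used by config(8); the structure major_line, the function parse_major_line() and the sample values in the usage note are hypothetical, and the rule is simply kept as the uninterpreted rest of the line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one parsed line (not config(8)'s struct devm). */
struct major_line {
        char    name[32];       /* prefix of the bdevsw/cdevsw entry */
        int     cmajor;         /* character major (required) */
        int     bmajor;         /* block major, -1 if not given */
        char    rule[64];       /* uninterpreted rule text, "" if not given */
};

/* Parse one line; return 0 on success, -1 if the line is malformed. */
int
parse_major_line(const char *line, struct major_line *out)
{
        const char *p;
        size_t len;
        int n = 0;

        out->bmajor = -1;
        out->rule[0] = '\0';

        /* Required part: device-major <name> char <num> */
        if (sscanf(line, " device-major %31s char %d%n",
            out->name, &out->cmajor, &n) != 2 || out->cmajor < 0)
                return -1;
        p = line + n;

        /* Optional part: block <num> */
        n = 0;
        if (sscanf(p, " block %d%n", &out->bmajor, &n) == 1) {
                if (out->bmajor < 0)
                        return -1;
                p += n;
        } else
                out->bmajor = -1;

        /* Optional part: the trimmed rest of the line is the rule. */
        p += strspn(p, " \t");
        len = strcspn(p, "\r\n");
        while (len > 0 && (p[len - 1] == ' ' || p[len - 1] == '\t'))
                len--;
        if (len >= sizeof(out->rule))
                return -1;
        memcpy(out->rule, p, len);
        out->rule[len] = '\0';
        return 0;
}
```

With a made-up line such as ``device-major fd char 9 block 2 fdc'', this sketch yields name ``fd'', character major 9, block major 2 and rule ``fdc''. A line without the ``char'' part is rejected.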

4.1.2. Structures and Variables

struct devm {
        struct devm     *dm_next;       /* linked list */
        const char      *dm_srcfile;    /* the name of the "majors" file */
        u_short         dm_srcline;     /* the line number */
        const char      *dm_name;       /* [bc]devsw name */
        int             dm_cmajor;      /* character major */
        int             dm_bmajor;      /* block major */
        struct nvlist   *dm_opts;       /* rule */
};

struct devm *alldevms; /* list of all device-major */
struct devm **nextdevm; /* to construct a linked list */

struct hashtab *alldevmtab; /* all devm lookup */
struct hashtab *cdevmtab; /* character devm lookup */
struct hashtab *bdevmtab; /* block devm lookup */

int maxcdevm; /* max number of character major */
int maxbdevm; /* max number of block major */

These are used only in config(8). They are NOT EXPORTED anywhere.

4.2. Functions

4.2.1. Kernel

const struct bdevsw *bdevsw_lookup(dev_t dev);
const struct cdevsw *cdevsw_lookup(dev_t dev);
int bdevsw_lookup_major(const struct bdevsw *devsw);
int cdevsw_lookup_major(const struct cdevsw *devsw);
dev_t devsw_name2blk(const char *name, char *devname, size_t devnamelen);
const char *devsw_blk2name(int);
dev_t devsw_chr2blk(dev_t chrdev);
dev_t devsw_blk2chr(dev_t blkdev);

int devsw_attach(const char *devname, const struct bdevsw *bdev, int *bmajor, const struct cdevsw *cdev, int *cmajor);
void devsw_detach(const struct bdevsw *bdev, const struct cdevsw *cdev);

4.2.2. Userland - config(8)

int adddevm(const char *name, int cmaj, int bmaj, struct nvlist *opts);
int mkdevsw(void);
int fixdevm(void);

5. Description

5.1. New Functionality

5.1.1. Kernel

const struct bdevsw *bdevsw_lookup(dev_t dev);
const struct cdevsw *cdevsw_lookup(dev_t dev);

Get the device switch associated with the dev_t ``dev''. Internally, the function extracts the major number from ``dev'' using major(). Return the device switch on success. Otherwise, return NULL.

int bdevsw_lookup_major(const struct bdevsw *devsw);
int cdevsw_lookup_major(const struct cdevsw *devsw);

Get the device major number associated with the device switch ``devsw''. Return the major number on success. Otherwise, return -1.
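
The two lookup directions can be sketched in userland as follows. This is only a model of the behaviour described above, not the kernel code: the mock_ names, the one-member switch structure, the three-entry table and the dev_t layout (major in bits 8-15) are all assumptions made for the example.

```c
#include <stddef.h>

/* Userland stand-in for the kernel type; only d_open is modeled. */
struct mock_cdevsw {
        int     (*d_open)(unsigned int dev);
};

/* Assumed dev_t layout for this sketch: major in bits 8-15. */
#define MOCK_MAJOR(dev)         (((dev) >> 8) & 0xffu)
#define MOCK_MAKEDEV(maj, min)  ((((unsigned int)(maj)) << 8) | (min))

static int
foo_open(unsigned int dev)
{
        (void)dev;
        return 0;
}

const struct mock_cdevsw foo_cdevsw = { foo_open };

/* Shaped like a generated table: unconfigured majors hold NULL. */
static const struct mock_cdevsw *mock_cdevsw0[] = {
        NULL,           /* major 0: not configured */
        &foo_cdevsw,    /* major 1 */
        NULL,           /* major 2: not configured */
};
#define MOCK_NCDEVSW    (sizeof(mock_cdevsw0) / sizeof(mock_cdevsw0[0]))

/* dev_t -> device switch, NULL if nothing is attached at that major. */
const struct mock_cdevsw *
mock_cdevsw_lookup(unsigned int dev)
{
        unsigned int maj = MOCK_MAJOR(dev);

        if (maj >= MOCK_NCDEVSW)
                return NULL;
        return mock_cdevsw0[maj];
}

/* device switch -> major number, -1 if the switch is not in the table. */
int
mock_cdevsw_lookup_major(const struct mock_cdevsw *devsw)
{
        size_t maj;

        if (devsw == NULL)
                return -1;
        for (maj = 0; maj < MOCK_NCDEVSW; maj++)
                if (mock_cdevsw0[maj] == devsw)
                        return (int)maj;
        return -1;
}
```

A caller is expected to check the result of the lookup for NULL before calling through it, since both an out-of-range major and an unconfigured slot give NULL.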

const char *devsw_blk2name(dev_t dev);

Convert from a block dev_t to a device name. Return a pointer to the device name string on success. Otherwise, return NULL.

dev_t devsw_name2blk(const char *name, char *devname, size_t devnamelen);

Convert from a device name to a block dev_t. Return a value other than NODEV on success. Otherwise, return NODEV. If ``devname'' is not NULL and the conversion succeeds, the device name without the unit number and partition name is stored in it. ``devname'' is always null-terminated. If the device name is longer than ``devnamelen'', it is truncated and null-terminated.
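
A userland sketch of this conversion, under stated assumptions, might look like the following. The conversion table, the limits MOCK_MAXPART and MOCK_MAXUNIT, and the way unit and partition are packed into the minor number are all hypothetical; only the stripping of the unit number and partition letter and the truncating, always-terminated copy into ``devname'' follow the description above.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MOCK_NODEV      ((unsigned int)-1)
#define MOCK_MAXPART    8       /* assumed partitions per unit */
#define MOCK_MAXUNIT    255     /* assumed upper bound on unit numbers */
#define MOCK_MAKEDEV(maj, min)  (((unsigned int)(maj) << 16) | (unsigned int)(min))

/* Hypothetical name -> block major table. */
static const struct {
        const char *name;
        int bmajor;
} mock_conv[] = {
        { "fd", 2 },
        { "sd", 4 },
};
#define MOCK_NCONV      (sizeof(mock_conv) / sizeof(mock_conv[0]))

unsigned int
mock_name2blk(const char *name, char *devname, size_t devnamelen)
{
        size_t i, len, n;
        unsigned int unit = 0, part = 0;
        const char *p;

        /* The driver name is the leading run of non-digit characters. */
        len = strcspn(name, "0123456789");
        for (i = 0; i < MOCK_NCONV; i++)
                if (strlen(mock_conv[i].name) == len &&
                    memcmp(mock_conv[i].name, name, len) == 0)
                        break;
        if (i == MOCK_NCONV)
                return MOCK_NODEV;

        /* A unit number is required; a partition letter is optional. */
        p = name + len;
        if (!isdigit((unsigned char)*p))
                return MOCK_NODEV;
        while (isdigit((unsigned char)*p)) {
                unit = unit * 10 + (unsigned int)(*p++ - '0');
                if (unit > MOCK_MAXUNIT)
                        return MOCK_NODEV;
        }
        if (*p >= 'a' && *p < 'a' + MOCK_MAXPART)
                part = (unsigned int)(*p++ - 'a');
        if (*p != '\0')
                return MOCK_NODEV;

        /* Store the bare driver name, truncated and always terminated. */
        if (devname != NULL && devnamelen > 0) {
                n = len < devnamelen - 1 ? len : devnamelen - 1;
                memcpy(devname, name, n);
                devname[n] = '\0';
        }
        return MOCK_MAKEDEV(mock_conv[i].bmajor, unit * MOCK_MAXPART + part);
}
```

In this sketch ``sd0a'' stores ``sd'' in the buffer, and a two-byte buffer receives only the first letter followed by the terminator. Names not in the table return MOCK_NODEV.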

dev_t devsw_blk2chr(dev_t blkdev);
dev_t devsw_chr2blk(dev_t chrdev);

Convert from a block dev_t to a character dev_t and vice versa. Return a value other than NODEV on success. Otherwise, return NODEV.
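
The conversion can be pictured as a table of (block major, character major) pairs that is searched in either direction while the minor number is carried over unchanged. The sketch below is a userland model with hypothetical mock_ names, a made-up pair table and an assumed dev_t layout; it is not the kernel implementation.

```c
#include <stddef.h>

#define MOCK_NODEV              ((unsigned int)-1)
#define MOCK_MAJOR(dev)         ((int)((dev) >> 16))
#define MOCK_MINOR(dev)         ((dev) & 0xffffu)
#define MOCK_MAKEDEV(maj, min)  (((unsigned int)(maj) << 16) | (unsigned int)(min))

/* Hypothetical block <-> character major pairs. */
static const struct {
        int bmajor;
        int cmajor;
} mock_conv[] = {
        { 2, 9 },
        { 4, 13 },
};
#define MOCK_NCONV      (sizeof(mock_conv) / sizeof(mock_conv[0]))

/* Block dev_t -> character dev_t, keeping the minor number. */
unsigned int
mock_blk2chr(unsigned int blkdev)
{
        size_t i;

        if (blkdev == MOCK_NODEV)
                return MOCK_NODEV;
        for (i = 0; i < MOCK_NCONV; i++)
                if (mock_conv[i].bmajor == MOCK_MAJOR(blkdev))
                        return MOCK_MAKEDEV(mock_conv[i].cmajor,
                            MOCK_MINOR(blkdev));
        return MOCK_NODEV;
}

/* Character dev_t -> block dev_t, keeping the minor number. */
unsigned int
mock_chr2blk(unsigned int chrdev)
{
        size_t i;

        if (chrdev == MOCK_NODEV)
                return MOCK_NODEV;
        for (i = 0; i < MOCK_NCONV; i++)
                if (mock_conv[i].cmajor == MOCK_MAJOR(chrdev))
                        return MOCK_MAKEDEV(mock_conv[i].bmajor,
                            MOCK_MINOR(chrdev));
        return MOCK_NODEV;
}
```

Because both directions read the same table, a device that appears in it round-trips to the dev_t it started from, and a major that is missing from it gives MOCK_NODEV in either direction.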

int devsw_attach(const char *devname, const struct bdevsw *bdev, int *bmajor, const struct cdevsw *cdev, int *cmajor);

Attach a block device switch ``bdev'' associated with the block major number ``bmajor'' and a character device switch ``cdev'' associated with the character major number ``cmajor''. If ``*bmajor'' or ``*cmajor'' is -1, a major number is assigned dynamically. Return 0 on success, or an error value otherwise.

void devsw_detach(const struct bdevsw *bdev, const struct cdevsw *cdev);

Detach a block device switch ``bdev'' and a character device switch ``cdev''.
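
The attach/detach pair can be modeled in userland roughly as below. This is a sketch under assumptions, not the kernel code: the fixed table size, the empty switch structures, the choice of EINVAL and EEXIST as error values, and the rule that a character switch is mandatory while a block switch is optional are all made up for the example. What it tries to show is the dynamic assignment for -1 and that both switches are committed together or not at all.

```c
#include <errno.h>
#include <stddef.h>

/* Userland stand-ins; the real structures hold the driver entry points. */
struct mock_bdevsw { int unused; };
struct mock_cdevsw { int unused; };

#define MOCK_NMAJOR     8       /* assumed fixed table size */
static const struct mock_bdevsw *bdev_tab[MOCK_NMAJOR];
static const struct mock_cdevsw *cdev_tab[MOCK_NMAJOR];

const struct mock_bdevsw foo_bdevsw = { 0 };
const struct mock_cdevsw foo_cdevsw = { 0 };

int
mock_devsw_attach(const char *devname,
    const struct mock_bdevsw *bdev, int *bmajor,
    const struct mock_cdevsw *cdev, int *cmajor)
{
        int b = -1, c;

        (void)devname;          /* a fuller model would record the name */
        if (cdev == NULL || cmajor == NULL ||
            (bdev != NULL && bmajor == NULL))
                return EINVAL;

        if (bdev != NULL) {
                b = *bmajor;
                if (b == -1)    /* dynamic: first free block major */
                        for (b = 0; b < MOCK_NMAJOR && bdev_tab[b] != NULL; b++)
                                continue;
                if (b < 0 || b >= MOCK_NMAJOR)
                        return EINVAL;  /* out of range or table full */
                if (bdev_tab[b] != NULL)
                        return EEXIST;
        }

        c = *cmajor;
        if (c == -1)            /* dynamic: first free character major */
                for (c = 0; c < MOCK_NMAJOR && cdev_tab[c] != NULL; c++)
                        continue;
        if (c < 0 || c >= MOCK_NMAJOR)
                return EINVAL;
        if (cdev_tab[c] != NULL)
                return EEXIST;

        /* Both checks passed: commit both switches together. */
        if (bdev != NULL) {
                bdev_tab[b] = bdev;
                *bmajor = b;
        }
        cdev_tab[c] = cdev;
        *cmajor = c;
        return 0;
}

void
mock_devsw_detach(const struct mock_bdevsw *bdev,
    const struct mock_cdevsw *cdev)
{
        int i;

        for (i = 0; i < MOCK_NMAJOR; i++) {
                if (bdev != NULL && bdev_tab[i] == bdev)
                        bdev_tab[i] = NULL;
                if (cdev != NULL && cdev_tab[i] == cdev)
                        cdev_tab[i] = NULL;
        }
}
```

Passing the majors by pointer lets the caller learn which number was picked when it asked for -1, which is one plausible reason for the pointer arguments in the prototype.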

5.1.2. Userland - config(8)

These functions are used in config(8) ONLY.

int adddevm(const char *name, int cmaj, int bmaj, struct nvlist *opts);

Add an entry to the list ``alldevms'' and to the lookup table ``alldevmtab'' for the name ``name'', the character major number ``cmaj'', the block major number ``bmaj'' and the rule ``opts''. The rule is used in fixdevm() to determine whether this device switch must be attached.
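
The list handling can be sketched as follows. This is a simplified model, not the config(8) source: the structure is cut down to four members, the rule argument is dropped, a linear scan stands in for the hash tables, and the convention of returning 0 on success and 1 on failure is an assumption. It mainly shows how a tail pointer such as ``nextdevm'' keeps appends cheap and preserves the order of the ``majors'' file.

```c
#include <stdlib.h>
#include <string.h>

/* Cut-down version of struct devm for this sketch. */
struct mock_devm {
        struct mock_devm *dm_next;      /* linked list */
        const char      *dm_name;       /* [bc]devsw name */
        int             dm_cmajor;      /* character major */
        int             dm_bmajor;      /* block major, -1 if none */
};

struct mock_devm *alldevms = NULL;              /* list head */
struct mock_devm **nextdevm = &alldevms;        /* where the next entry goes */

/* Append one entry; return 0 on success, 1 on a duplicate or on failure. */
int
mock_adddevm(const char *name, int cmaj, int bmaj)
{
        struct mock_devm *dm;

        /* Reject a name or a major number that is already taken. */
        for (dm = alldevms; dm != NULL; dm = dm->dm_next) {
                if (strcmp(dm->dm_name, name) == 0)
                        return 1;
                if (dm->dm_cmajor == cmaj)
                        return 1;
                if (bmaj != -1 && dm->dm_bmajor == bmaj)
                        return 1;
        }

        dm = malloc(sizeof(*dm));
        if (dm == NULL)
                return 1;
        dm->dm_next = NULL;
        dm->dm_name = name;     /* caller must keep the string alive */
        dm->dm_cmajor = cmaj;
        dm->dm_bmajor = bmaj;

        /* Link at the tail and advance the tail pointer. */
        *nextdevm = dm;
        nextdevm = &dm->dm_next;
        return 0;
}
```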

int mkdevsw(FILE *fp);

Generate initial bdevsw/cdevsw tables, sys_bdevsws, max_bdevsws, sys_cdevsws, max_cdevsws, swapdev, zerodev and mem_no.

sys_bdevsws, max_bdevsws - # of bdevsw (i.e. initial bdevsw table size)
sys_cdevsws, max_cdevsws - # of cdevsw (i.e. initial cdevsw table size)
swapdev - a fake swap device
zerodev - dev_t value for /dev/zero
mem_no - a memory device character major number
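
To make the generation step more concrete, here is a rough sketch of how the block table might be written out. It is not the mkdevsw() from config(8): the flattened structure mock_devm, its dm_active flag, the function name and its return value are hypothetical, and only the block table is emitted (no sys_bdevsws, swapdev and so on). The output format imitates the devsw.c excerpt shown in 3.2.2.

```c
#include <stdio.h>

/* Hypothetical, flattened view of one device-major entry. */
struct mock_devm {
        const char *dm_name;    /* [bc]devsw name prefix */
        int dm_bmajor;          /* block major, -1 if none */
        int dm_active;          /* nonzero if selected for this kernel */
};

/*
 * Emit the extern declarations and the bdevsw0[] table for majors
 * 0..maxbdevm. Return the number of non-NULL slots, or -1 on I/O error.
 */
int
mock_emit_bdevsw(FILE *fp, const struct mock_devm *devms, int ndevm,
    int maxbdevm)
{
        int i, maj, nactive = 0;

        for (i = 0; i < ndevm; i++)
                if (devms[i].dm_active && devms[i].dm_bmajor >= 0)
                        fprintf(fp, "extern const struct bdevsw %s_bdevsw;\n",
                            devms[i].dm_name);

        fprintf(fp, "\nconst struct bdevsw *bdevsw0[] = {\n");
        for (maj = 0; maj <= maxbdevm; maj++) {
                const char *name = NULL;

                for (i = 0; i < ndevm; i++)
                        if (devms[i].dm_active && devms[i].dm_bmajor == maj)
                                name = devms[i].dm_name;
                if (name != NULL) {
                        fprintf(fp, "\t&%s_bdevsw,\t/* %d */\n", name, maj);
                        nactive++;
                } else
                        fprintf(fp, "\tNULL,\t/* %d */\n", maj);
        }
        fprintf(fp, "};\n");

        return ferror(fp) ? -1 : nactive;
}
```

Every major up to the maximum gets a slot, so a device that is not selected simply leaves a NULL at its index and the table can still be indexed directly by major number.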

int fixdevm(void);

Determine which device switches must be attached.

6. Compatibility

6.1 LKM

The LKM framework defines some types of kernel modules. LM_DEV is one of them. It is for loading device drivers.

LM_DEV also has a feature to attach device switches. LM_DT_BLOCK is defined for attaching a block device switch and LM_DT_CHAR is defined for attaching a character device switch.

These features seem to be good, but they have one problem.

If the type of kernel module is LM_DEV, there is no way to attach both block and character device switches at once.

For example,

ipfilter(4) and coda(4) have ONLY a character device switch. These kernel modules are of type LM_DEV and their device switches are attached by using the LM_DT_CHAR feature.

But,

iwm_fd(4) has BOTH block and character device switches. iwm_fd(4) can use neither the LM_DT_BLOCK nor the LM_DT_CHAR feature of LM_DEV. So this kernel module is of type LM_MISC and its device switches are attached by using memcpy() with hard-coded major numbers.

It's possible to write an LKM with both device switches. But there is still one big problem: there is no way to update the block <-> character conversion table. To update it, we MUST attach both at once.

As you know, if a device driver has a block device switch, it also has a character device switch. We MUST be able to attach them at once.

This is one of the essential issues of the LKM framework, isn't it?

In the gehenna-devsw branch, this issue is solved, but I broke the backward compatibility of the LKM framework. The LKM framework is one of the features of NetBSD that is still under development. If you have an opinion or any other good solution, please let me know.

7. Implementation

All the patches are merged into the gehenna-devsw branch of the NetBSD tree. I plan to merge them into the main trunk at an early date.

8. Current Status

Almost all the work for ``gehenna-devsw'' is done. Now all I do is ``catch up with -current''. To make the merge easy, I call for reviewers.

Here is the TODO list.

  1. kernel build/run test

    Now that we have build.sh in the tree, I have tested almost all ports. But I cannot test some ports because of compiler bugs or missing toolchains. And I don't have machines for all ports, so I don't know whether kernels with my changes really work. At least, my i386 box works fine.

  2. check consistency and validity of ``majors.<arch>''

    The ``majors.<arch>'' files are converted from the machine-dependent conf.c. I may have made mistakes in doing that. I hope not.


MAEKAWA Masahide
