LXC team releases LXCFS

Cgroups have been a recurring area of focus and difficulty for the LXC team, especially in supporting unprivileged containers.

LXC uses the kernel's namespace support: PID and network namespaces for normal containers, and user namespaces for unprivileged containers. LXC also uses cgroups to control resource usage, but cgroups are not namespace aware.

This has been a perennial issue, apparently because the kernel namespace and cgroup developers do not see eye to eye. Without cgroup support for namespaces, the LXC project has had to find workarounds. One of these is CGManager, which is mainly used to support cgroups for unprivileged containers.

Because they run as an unprivileged user, unprivileged containers cannot access cgroups directly, so the CGManager daemon was developed to act as a proxy between the container and the kernel's cgroups. We have covered unprivileged containers here.

Unprivileged containers can be run by normal, non-root users. They also depend on a number of upstream features that are currently available only on the latest versions of Ubuntu, so they work well on Ubuntu for now, until these features become more widely available in other distros.

Back to normal LXC containers: the best way to get a container's resource usage information is currently the host's /sys/fs/cgroup/<controller>/lxc/<container name> directories. The /proc files inside the container show the host's values and are thus not accurate for the container. To address some of these issues, as well as issues arising from systemd being the cgroup manager, the LXC team has recently announced and released LXCFS.
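
As a rough illustration of reading a container's usage from the host, here is a minimal Python sketch. It assumes a cgroup v1 layout in which each controller has its own directory under /sys/fs/cgroup and LXC places containers under an "lxc" subdirectory. The helper names (container_cgroup_dir, memory_usage) are made up for this example; memory.usage_in_bytes and memory.limit_in_bytes are the standard cgroup v1 memory controller files.

```python
from pathlib import Path

# Assumption: cgroup v1, one directory per controller under this root.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def container_cgroup_dir(controller, container, root=CGROUP_ROOT):
    """Build the host-side cgroup directory for an LXC container (hypothetical helper)."""
    return Path(root) / controller / "lxc" / container


def read_int(path):
    """Read a cgroup file that holds a single integer."""
    return int(Path(path).read_text().strip())


def memory_usage(container, root=CGROUP_ROOT):
    """Return current memory usage and limit, in bytes, for one container."""
    cg = container_cgroup_dir("memory", container, root)
    return {
        "usage_bytes": read_int(cg / "memory.usage_in_bytes"),
        "limit_bytes": read_int(cg / "memory.limit_in_bytes"),
    }
```

On a host with a running container named, say, web1, calling memory_usage("web1") should return that container's figures, provided the layout matches the assumptions above.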

LXCFS is a FUSE filesystem designed mainly for use by LXC containers. It will be used by default in Ubuntu 15.04 to provide a virtualized view of some /proc files and filtered access to the host’s cgroup filesystems. These filtered /proc files will be bind-mounted over the container's /proc files, giving the container easier access to resource usage information. LXCFS also allows users to run systemd in both privileged and unprivileged containers.

More from Serge Hallyn's blog post:

The proc files filtered by lxcfs are cpuinfo, meminfo, stat, and uptime. These are filtered using cgroup information to show only the cpus and memory which are available to the reading task. They can be seen on the host under /var/lib/lxcfs/proc, and containers by default will bind-mount the proc files over the container’s proc files. There have been several attempts to push this virtualization into /proc itself, but those have been rejected. The proposed alternative was to write a library which all userspace would use to get filtered /proc information. Unfortunately no such effort seems to be taking off, and if it took off now it wouldn’t help with legacy containers. In contrast, lxcfs works perfectly with 12.04 and 14.04 containers.
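
To see the effect of this filtering, one can compare the kernel's /proc/meminfo with the copy served by lxcfs. The Python sketch below is only an illustration: parse_meminfo and compare_meminfo are hypothetical helper names, and the default keys are just two common meminfo fields.

```python
def parse_meminfo(text):
    """Parse meminfo-style text into a dict of integer values (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def compare_meminfo(host_text, filtered_text, keys=("MemTotal", "MemFree")):
    """Return {key: (host_value, filtered_value)} for the selected fields."""
    host = parse_meminfo(host_text)
    filtered = parse_meminfo(filtered_text)
    return {k: (host.get(k), filtered.get(k)) for k in keys}
```

Feeding it the contents of /proc/meminfo and /var/lib/lxcfs/proc/meminfo shows the two views side by side. Since the filtering depends on the cgroup of the reading task, the numbers differ only when the reader is in a memory-limited cgroup, such as inside a container.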

The cgroups are mounted per-host-mounted-hierarchy under /var/lib/lxcfs/cgroup/. When a container is started, each filtered hierarchy will be bind-mounted under /sys/fs/cgroup/* in the container. The container cannot see any information for ancestor cgroups, so for instance /var/lib/lxcfs/cgroup/freezer will contain only a directory called ‘lxc’ or ‘user.slice’.
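
The ancestor-hiding rule can be modeled in a few lines of Python. This is a simplified sketch of the behavior described above, not lxcfs's actual code: visible_children is a hypothetical function, and the cgroup paths are examples.

```python
from pathlib import PurePosixPath


def visible_children(reader_cgroup, requested, real_children):
    """Model which entries a reader sees when listing a cgroup directory.

    reader_cgroup: the cgroup of the reading task, e.g. "/lxc/c1"
    requested:     the cgroup directory being listed, e.g. "/"
    real_children: the child cgroups that actually exist under `requested`
    """
    reader = PurePosixPath(reader_cgroup).parts
    req = PurePosixPath(requested).parts
    if req[:len(reader)] == reader:
        # At or below the reader's own cgroup: everything is visible.
        return sorted(real_children)
    if reader[:len(req)] == req:
        # An ancestor: expose only the one component leading to the reader.
        return [reader[len(req)]]
    # An unrelated branch: nothing is visible.
    return []
```

For a task in /lxc/c1, listing the root of the freezer hierarchy under this model yields only "lxc", which matches the example in the paragraph above.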

Lxcfs was instrumental in allowing us to boot systemd containers, both privileged and unprivileged. It also, through its proc filtering, answers a frequent years-old request. We do hope that kernel support for cgroup namespaces will eventually allow us to drop the cgroup part of lxcfs. Since we’ll need to support LTS containers for some time, that will definitely require cgroup namespace support for non-unified hierarchies, but that’s not out of the realm of possibilities.

Lxcfs is packaged in ubuntu 15.04, the source is hosted at github.com/lxc/lxcfs, and news can be tracked at linuxcontainers.org/lxcfs