What?
Control Groups, aka cgroups (see the docs).
Resource control and monitoring.
Some examples follow for throttling I/O speed and limiting memory for a control group (cgroup) of processes.
They’re used heavily in containers (like lxc or docker) to limit things like memory use, i/o requests, network traffic etc. They can also expose some useful stats (e.g. how much disk activity / cpu time / memory a container is using/has used).
Why?
In my case, I’m trying to share resources among multiple containers (websites) … and stop any one from gobbling up all resources and making the response times of the others suck.
How?
Setting up cgroup support
If you’re using systemd, everything’s weird and messy (aka, I know they’re different, but I’ve not looked into what’s going on).
I’m used to everything being lumped together – and as on these servers I’m not using systemd, /sys/fs/cgroup contains all the types/controllers together :
mount -t cgroup cgroup /sys/fs/cgroup
There are also a couple of useful cmdline parameters that are needed to properly turn on e.g. memory tracking – so you may need to fix your kernel command line (on booting, via grub) so /proc/cmdline contains :
cgroup_enable=memory swapaccount=1
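To check whether the running kernel was actually booted with those parameters, something like the sketch below should do. The function names (has_kernel_param, check_cgroup_params) are made up for this example, and the optional file argument only exists so it can be tried against a sample file instead of /proc/cmdline.

```shell
#!/bin/sh
# has_kernel_param PARAM [FILE]: succeed if PARAM appears as a whole word
# on the kernel command line. FILE defaults to /proc/cmdline.
has_kernel_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put each parameter on its own line, then look for an exact match.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# check_cgroup_params [FILE]: print one "missing: ..." line for each of
# the two parameters mentioned above that is not on the command line.
check_cgroup_params() {
    cmdline=${1:-/proc/cmdline}
    for p in cgroup_enable=memory swapaccount=1; do
        has_kernel_param "$p" "$cmdline" || echo "missing: $p"
    done
}
```

Running check_cgroup_params with no arguments prints nothing if both parameters are present.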
Creating a cgroup
While you can create an arbitrary cgroup quite easily –
mkdir /sys/fs/cgroup/david
there’s normally no need – as things like docker or lxc will do this for you. At least for ‘lxc’ it creates them in :
/sys/fs/cgroup/lxc/<container-name>
But for the sake of demonstrating, once you’ve created your cgroup, adding a process to it is straightforward – e.g. pid 12345 –
echo 12345 >> /sys/fs/cgroup/david/tasks
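Any child processes that pid 12345 starts from then on will also be in the ‘david’ cgroup.
So, the /sys/fs/cgroup/david/tasks file lists the PIDs of all processes in that cgroup.
A process belongs to one cgroup per mounted hierarchy, so when controllers are mounted separately it can be in several cgroups at once. See: /proc/12345/cgroup which, with everything mounted together, might look like :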
1:cpuset,cpu,cpuacct,blkio,memory,devices,freezer,net_cls,perf_event,net_prio:/david
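Each line of that file is hierarchy-id:controller-list:path, so pulling out the cgroup for one controller is a small awk job. This is only a sketch: cgroup_path_for is a name invented here, and the file argument defaults to the current shell's own /proc entry.

```shell
#!/bin/sh
# cgroup_path_for CONTROLLER [FILE]: print the cgroup path that the given
# controller (e.g. memory, blkio) is attached to, reading a file in the
# /proc/<pid>/cgroup format. Prints nothing if the controller is not listed.
cgroup_path_for() {
    controller=$1
    file=${2:-/proc/$$/cgroup}
    awk -F: -v want="$controller" '
        {
            # Field 2 is a comma-separated list of controller names.
            n = split($2, names, ",")
            for (i = 1; i <= n; i++) {
                if (names[i] == want) {
                    # The path itself may contain ":" so rejoin fields 3..NF.
                    path = $3
                    for (j = 4; j <= NF; j++) path = path ":" $j
                    print path
                    exit
                }
            }
        }' "$file"
}
```

For the example line above, cgroup_path_for memory /proc/12345/cgroup would print /david.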
Disk I/O throttling (bps)
You can specify a bytes-per-second limit for a specific device, given as its major:minor number.
echo "major:minor bytes_per_second" > /sys/fs/cgroup/whatever/blkio.throttle.read_bps_device
echo "major:minor bytes_per_second" > /sys/fs/cgroup/whatever/blkio.throttle.write_bps_device
Example 1. Setting a 10 MB/s limit on writing to the device 202:8 (see ‘lsblk’ for the major:minor number of the device behind /path/to/filesystem)
echo "202:8 $(( 10 * 1024 * 1024 ))" > /sys/fs/cgroup/whatever/blkio.throttle.write_bps_device
To disable a limit, set it to zero – i.e.
echo "202:8 0" > /sys/fs/cgroup/lxc/container-name/blkio.throttle.write_bps_device
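If you set these limits a lot, the arithmetic can be wrapped up in a couple of helpers. A rough sketch follows; throttle_line and set_write_limit are names invented for this example, and the cgroup directory is passed in rather than assumed.

```shell
#!/bin/sh
# throttle_line DEVICE MB: print the "major:minor bytes_per_second" line
# that the blkio.throttle.*_bps_device files expect. MB is in units of
# 1024 * 1024 bytes; 0 means "no limit".
throttle_line() {
    echo "$1 $(( $2 * 1024 * 1024 ))"
}

# set_write_limit CGROUP_DIR DEVICE MB: apply a write limit to one device
# for the cgroup whose directory is CGROUP_DIR.
set_write_limit() {
    throttle_line "$2" "$3" > "$1/blkio.throttle.write_bps_device"
}
```

So set_write_limit /sys/fs/cgroup/whatever 202:8 10 is the same as Example 1, and passing 0 as the last argument removes the limit again. lsblk -o NAME,MAJ:MIN lists the device numbers.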
Notes:
- Most of the time the limit only becomes apparent when you call sync/fsync, e.g.
dd if=/dev/zero of=/path/to/whatever.dd bs=4k count=10240 conv=fdatasync
- Or when you outgrow the kernel’s write buffer cache – see
/proc/sys/vm/dirty_background_bytes
- Or if you drop the kernel’s page cache –
echo 3 >/proc/sys/vm/drop_caches
This matters for reads: a cached file (obviously) doesn’t lead to a disk read, so the cgroup restraint isn’t used.
Disk Weighting
Additionally, you can assign an arbitrary weight to a container and, assuming you’re using the CFQ disk scheduler, you’ll then be able to prioritise one container’s disk access over another’s, e.g. :
echo 200 > /sys/fs/cgroup/lxc/container1/blkio.weight
From memory, the default is 500.
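A small guard around the write can save some head scratching. This sketch assumes the 10 to 1000 range that the kernel’s blkio documentation gives for CFQ weights (check your own kernel), and set_blkio_weight is a name made up here.

```shell
#!/bin/sh
# set_blkio_weight CGROUP_DIR WEIGHT: write WEIGHT to blkio.weight after
# checking it is a whole number in the assumed CFQ range of 10..1000.
set_blkio_weight() {
    dir=$1
    weight=$2
    # Reject empty values and anything containing a non-digit.
    case $weight in
        ''|*[!0-9]*)
            echo "weight must be a whole number" >&2
            return 1
            ;;
    esac
    if [ "$weight" -lt 10 ] || [ "$weight" -gt 1000 ]; then
        echo "weight must be between 10 and 1000" >&2
        return 1
    fi
    echo "$weight" > "$dir/blkio.weight"
}
```

For example, set_blkio_weight /sys/fs/cgroup/lxc/container1 200 does the same as the echo above.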
Disk Usage
See /sys/fs/cgroup/lxc/container1/blkio.io_service_bytes which lists something like :
202:112 Read 106496
202:112 Write 0
202:112 Sync 0
202:112 Async 106496
202:112 Total 106496
202:80 Read 92684288
202:80 Write 59547648
202:80 Sync 59547648
202:80 Async 92684288
202:80 Total 152231936
So you can report based on disk (e.g. 202:112 is the root file system and 202:80 is what’s mounted for the website’s document root). Use ‘lsblk’ to match up to partition/device.
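For reporting, the Read and Write lines are usually all you want. Here’s a sketch that boils the file down to one line per device; io_bytes_by_device is a hypothetical helper name, and any line that doesn’t have exactly three fields (e.g. a trailing grand total, if your kernel prints one) is skipped.

```shell
#!/bin/sh
# io_bytes_by_device FILE: read a file in blkio.io_service_bytes format and
# print "major:minor read_bytes write_bytes", one line per device, sorted.
io_bytes_by_device() {
    awk '
        NF == 3 && $2 == "Read"  { r[$1] = $3; seen[$1] = 1 }
        NF == 3 && $2 == "Write" { w[$1] = $3; seen[$1] = 1 }
        END {
            # Values are printed as the original strings, so large byte
            # counts are not reformatted by awk.
            for (d in seen)
                print d, ((d in r) ? r[d] : 0), ((d in w) ? w[d] : 0)
        }
    ' "$1" | sort
}
```

Usage would be along the lines of io_bytes_by_device /sys/fs/cgroup/lxc/container1/blkio.io_service_bytes.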
Memory limiting
Useful knobs to meddle with :
- memory.limit_in_bytes – Limit the cgroup to this much RAM.
  - For a 100 MB limit : echo $(( 100 * 1024 * 1024 )) > /sys/fs/cgroup/lxc/container1/memory.limit_in_bytes
  - If memory.memsw.limit_in_bytes is larger than memory.limit_in_bytes, the container will start swapping once it reaches the RAM limit.
- memory.memsw.limit_in_bytes – Limit the cgroup to this much RAM + swap combined.
  - For a 100 MB limit : echo $(( 100 * 1024 * 1024 )) > /sys/fs/cgroup/lxc/container1/memory.memsw.limit_in_bytes
- memory.usage_in_bytes – How much RAM is currently in use.
  - So to get the usage in MB : echo $(( $(< /sys/fs/cgroup/lxc/container1/memory.usage_in_bytes ) / ( 1024 * 1024 ) ))
- memory.memsw.usage_in_bytes – How much RAM + swap is currently in use.
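Putting the usage and limit files together gives a quick "how full is this container" figure. This is a minimal sketch with invented function names (mem_usage_mb, mem_usage_percent); it assumes a limit has been set and isn’t zero.

```shell
#!/bin/sh
# mem_usage_mb CGROUP_DIR: current memory usage in whole MB (rounded down).
mem_usage_mb() {
    usage=$(cat "$1/memory.usage_in_bytes")
    echo $(( $usage / (1024 * 1024) ))
}

# mem_usage_percent CGROUP_DIR: usage as a whole-number percentage of
# memory.limit_in_bytes (rounded down). Assumes the limit is not zero.
mem_usage_percent() {
    usage=$(cat "$1/memory.usage_in_bytes")
    limit=$(cat "$1/memory.limit_in_bytes")
    echo $(( $usage * 100 / $limit ))
}
```

For example, mem_usage_percent /sys/fs/cgroup/lxc/container1. With no limit set, the limit file holds a huge number and the percentage will just come out as 0.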
Memory Fail Counters
There’s also memory.failcnt and memory.memsw.failcnt which are incremented each time the container hits a memory limit you’ve assigned.
Note that these numbers can jump quickly and don’t necessarily indicate a problem. For example, reading or writing a large file can push the cgroup’s memory usage up (the page cache it fills counts against the cgroup), causing it to hit the memory limit a few hundred thousand times in a 5 minute period.
You can reset the failcnt counters by writing 0 to them.
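Because the raw count is so noisy, it’s more useful to look at how much it grows per interval. One way to sketch that is to read the counter and zero it in one go (e.g. from cron); read_and_reset_failcnt is a name made up for this example.

```shell
#!/bin/sh
# read_and_reset_failcnt CGROUP_DIR: print the current memory.failcnt value
# and then reset it to 0, so each call reports the hits since the last call.
# Hits landing between the read and the reset are lost; fine for rough stats.
read_and_reset_failcnt() {
    counter="$1/memory.failcnt"
    cat "$counter"
    echo 0 > "$counter"
}
```

The same approach works for memory.memsw.failcnt by changing the file name.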