RE: Reduce binary size for embedded devices - Added by Robert N 4 months ago


If you don't want to take the risk of replacing all applets with newer versions, you can employ this trick: download the most recent release, configure it with "make allnoconfig", then use "make menuconfig" to switch on just the applet you want to test and maybe a couple of tuning options.

Then, on the target system, delete the old applet symlink that points to your old Busybox, and replace it with the new Busybox binary, renamed to the applet's name. Now you can test the new applet and post a more useful email to the mailing list, either "I see such and such bug even in latest release" or "I see such and such bug in release X.Y.Z, but it seems to be fixed in last release". Deleting the old symlink still leaves the old functionality in your existing old Busybox binary; you just wouldn't be using it anymore. If things get even worse with the new version, you can always restore the symlink.

The volunteers are happy to fix any bugs you point out in the current versions because doing so helps everybody and makes the project better. We want to make the current version work for you. But diagnosing, debugging, and backporting fixes to old versions isn't something we do for free, because it doesn't help anybody but you.

The cost of volunteer tech support is using a reasonably current version of the project. If you don't want to upgrade, you have the complete source code and thus the ability to fix it yourself, or hire a consultant to do it for you.

If you got your version from a vendor who still supports the older version, they can help you. But there are limits as to what the volunteers will feel obliged to do for you. As a rule of thumb, volunteers will generally answer polite questions about a given version for about three years after its release before it's so old we don't remember the answer off the top of our head.

It's also hard for us to fix a problem of yours if we can't reproduce it because we don't have any systems running an environment that old. A consultant will happily set up a special environment just to reproduce your problem, and you can always ask on the list if any of the developers have consulting rates.

Init is the first program that runs, so it might be that no programs are working on your new system because of a problem with your cross-compiler, kernel, console settings, shared libraries, root filesystem... To rule all that out, first build a statically linked version of the following "hello world" program with your cross compiler toolchain:

Did you see the hello world message? Until you do, don't bother messing with Busybox init. Once you've got it working statically linked, try getting it to work dynamically linked.

A good thing to do would be to explain in detail how this bug can be reproduced under emulation such as qemu by people who have only a typical x86 machine at their disposal. This makes it possible for other people to independently verify that the bug indeed exists, and to work on fixes for it.

Example of a recipe to reproduce a bug under qemu: Download and unpack armqarm-none-linux-gnueabi-ipc-linux-gnu. Build dynamic busybox binary using this cross-compiler: Start this kernel with this file system image under qemu:

Job control will be turned off since your shell cannot obtain a controlling terminal.

You should run your shell on a normal tty such as tty1 or ttyS0 and everything will work. Note that the above example talks about an interactive shell with PID 1. Thus, it painstakingly uses "exec If you have "sh: You will need to drop it: I recommend you instead run your shell on a real console.

In some common cases like "make allnoconfig", both. If you have a build script which modifies.

If filesystem time granularity is low (for example, 1 second), these mtimes may end up being equal, and dependent files wouldn't be rebuilt.

The workaround is to add "sleep 1" before sed.

Busybox has a feature called the "standalone shell", where the Busybox shell runs any built-in applets before checking the command path. This feature is not enabled by "make defconfig". For example, in linux Since standalone shell option is not the default, it is less thoroughly tested.

If you think you see a bug in standalone shell's behavior, first verify that the bug is indeed caused by this option: If shell's behavior has changed, report the bug to the Busybox mailing list.

Busybox has nothing to do with the timezone. Please consult your libc documentation.

Busybox aims to be the smallest and simplest correct implementation of the standard Linux command line tools. First and foremost, this means the smallest executable size we can manage.

We also want to have the simplest and cleanest implementation we can manage, be standards compliant, minimize run-time memory usage (heap and stack), run fast, and take over the world.

Busybox is like a swiss army knife: The Busybox executable can act like many different programs depending on the name used to invoke it. Normal practice is to create a bunch of symlinks pointing to the Busybox binary, each of which triggers a different Busybox function.

See getting started in the FAQ for more information on usage, and the Busybox documentation for a list of symlink names and what they do.

The "one binary to rule them all" approach is primarily for size reasons: This way Busybox only has one set of ELF headers, it can easily share code between different apps even when statically linked, it has better packing efficiency by avoiding gaps between files or compression dictionary resets, and so on. Work is underway on new options such as "make standalone" to build separate binaries for each applet, and a "libbb.

Neither is ready yet at the time of this writing.

The individual applet takes it from there. This is why calling Busybox under a different name triggers different functionality:

The applet subdirectories (archival, console-tools, coreutils, debianutils, e2fsprogs, editors, findutils, init, loginutils, miscutils, modutils, networking, procps, shell, sysklogd, and util-linux) correspond to the configuration sub-menus in menuconfig. Each subdirectory contains the code to implement the applets in that sub-menu, as well as a Config.

During the build this help text is also used to generate the Busybox documentation in html, txt, and man page formats in the docs directory. See adding an applet to Busybox for more information. Most non-setup code shared between Busybox applets lives in the libbb directory. It's a mess that evolved over the years without much auditing or cleanup.

For anybody looking for a great project to break into Busybox development with, documenting libbb would be both incredibly useful and good experience.

To conserve bytes it's good to know where they're being used, and the size of the final executable isn't always a reliable indicator of the size of the components (since various structures are rounded up, so a small change may not even be visible by itself, but many small savings add up). To use it, first build a base version with "make baseline".

Then build the new version with your changes and run "make bloatcheck" to see the size differences from the old version. The first line of output has totals: The remaining lines show each individual symbol, the old and new sizes, and the increase or decrease in size (which results are sorted by).

This is the output from the "nm --size-sort" command (see "man nm" for more information), and is the information bloat-o-meter parses to produce the comparison report above. For defconfig, this is a good way to find the largest symbols in the tree (which is a good place to start when trying to shrink the code).

To take a closer look at individual applets, configure Busybox with just one applet (run "make allnoconfig" and then switch on a single applet with menuconfig), and then use "make sizes" to see the size of that applet's components. The "showasm" command in the scripts directory produces an assembly dump of a function, providing a closer look at what changed.

Note that paying attention isn't necessarily the same thing as following it.

SUSv3 doesn't even mention things like init, mount, tar, or losetup, nor commonly used options like echo's '-e' and '-n', or sed's '-i'.

Busybox is driven by what real users actually need, not the fact the standard believes we should implement ed or sccs.

For size reasons, we're unlikely to include much internationalization support beyond UTF-8, and on top of all that, our configuration menu lets developers chop out features to produce smaller but very non-standard utilities. Also, Busybox is aimed primarily at Linux.

Unix standards are interesting because Linux tries to adhere to them, but portability to dozens of platforms is only interesting in terms of offering a restricted feature set that works everywhere, not growing dozens of platform-specific extensions. Busybox should be portable to all hardware platforms Linux supports, and any other similar operating systems that are easy to do and won't require much maintenance.

In practice, standards compliance tends to be a clean-up step once an applet is otherwise finished. When polishing and testing a Busybox applet, we ensure we have at least the option of full standards compliance, or else document where we intentionally fall short.

Busybox is a Linux project, but that doesn't mean we don't have to worry about portability. First of all, there are different hardware platforms, different C library implementations, different versions of the kernel and build toolchain...

To start with, Linux runs on dozens of hardware platforms. We try to test each release on x86, x, arm, power pc, and mips. Since qemu can handle all of these, this isn't that hard. This means we have to care about a number of portability issues like endianness, word size, and alignment, all of which belong in platform.

That header handles conditional includes and gives us macros we can use in the rest of our code. At some point in the future we might grow a platform. As long as the applets themselves don't have to care. On a related note, we made the "default signedness of char varies" problem go away by feeding the compiler -funsigned-char.

This gives us consistent behavior on all platforms, and defaults to 8-bit clean text processing which gets us halfway to UTF-8 support.

NOMMU support is less easily separated (see the tips section later in this document), but we're working on it.

Another type of portability is build environments: As for gcc, we take advantage of newer compiler optimizations to get the smallest possible size, but we also regression test against an older build environment using the Red Hat 9 image at http: This has a 2. If anyone takes an interest in older kernels you're welcome to submit patches, but the effort would probably be better spent trimming down the 2.

Older gcc versions than that are uninteresting since we now use c99 features, although tcc might be worth a look. We also test Busybox against the current release of uClibc. Older versions of uClibc aren't very interesting they were buggy, and uClibc wasn't really usable as a general-purpose C library before version 0.

Other unix implementations are mostly uninteresting, since Linux binaries have become the new standard for portable Unix programs. Specifically, the ubiquity of Linux was cited as the main reason the Intel Binary Compatibility Standard 2 died, by the standards group organized to name a successor to ibcs2: That project disbanded in with the endorsement of an existing standard: Supporting these systems is largely a question of providing a clean subset of Busybox's functionality -- whichever applets can easily be made to work in that environment.

Annotating the configuration system to indicate which applets require which prerequisites such as procfs is also welcome. Other efforts to support these systems swapping include files to build in different environments, adding adapter code to platform. Support that can be cleanly hidden in platform.

Special-case code in the body of an applet is something we're trying to avoid.

The "salt" is a bunch of random characters (generally 8) the encryption algorithm uses to perturb the password in a known and reproducible way (such as by appending the random data to the unencrypted password, or combining them with exclusive or). Salt is randomly generated when setting a password, and then the same salt value is re-used when checking the password. Salt is thus stored unencrypted.

The advantage of using salt is that the same cleartext password encrypted with a different salt value produces a different encrypted value. If each encrypted password uses a different salt value, an attacker is forced to do the cryptographic math all over again for each password they want to check. Without salt, they could simply produce a big dictionary of commonly used passwords ahead of time, and look up each password in a stolen password file to see if it's a known value.

Even if there are billions of possible passwords in the dictionary, checking each one is just a binary search against a file only a few gigabytes long. With salt they can't even tell if two different users share the same password without guessing what that password is and decrypting it.

They also can't precompute the attack dictionary for a specific password until they know what the salt value is.

On systems that haven't got a Memory Management Unit, fork is unreasonably expensive to implement (and sometimes even impossible), so a less capable function called vfork is used instead. Busybox hides the difference between fork and vfork in libbb.

Making a program daemonize is trickier. Usually, it's done by forking and exiting in the parent, leaving the child to continue to run. This is not possible with vfork, because while the child is running, the parent does not return from vfork, and therefore it can't exit. This can be worked around by execing the same program (with parameters set up so that it knows that it doesn't need to daemonize anymore) after vfork in the child.

This unblocks the parent, which can then exit. Consult comments in libbb.

Implementing fork depends on having a Memory Management Unit. With an MMU you can simply set up a second set of page tables and share the physical memory via copy-on-write. So a fork followed quickly by exec only copies a few pages of the parent's memory, just the ones it changes before freeing them.

With a very primitive MMU (using a base pointer plus length instead of page tables, which can provide virtual addresses and protect processes from each other, but no copy on write) you can still implement fork. But it's unreasonably expensive, because you have to copy all the parent process' memory into the new process (which could easily be several megabytes per fork). And you have to do this even though that memory gets freed again as soon as the exec happens. This is not just slow and a waste of space but causes memory usage spikes that can easily cause the system to run out of memory.

Without even a primitive MMU, you have no virtual addresses. Every process can reach out and touch any other process' memory, because all pointers are to physical addresses with no protection.

Even if you copy a process' memory to new physical addresses, all of its pointers point to the old objects in the old process. Searching through the new copy's memory for pointers and redirecting them to the new locations is not an easy problem.

In theory, vfork is just a fork that writeably shares process memory rather than copying it (so what one process writes the other one sees). In practice, vfork has to suspend the parent process until the child does exec, at which point the parent wakes up and resumes by returning from the call to vfork.

There's just no other way to make it work: In fact without suspending the parent there's no way to even store separate copies of the return value (the pid) from the vfork call itself: One way to understand vfork is this: It thus becomes obvious why the child should not return (this would destroy data on stack needed by parent), or modify any memory variables it doesn't want the parent to see changed when it resumes.

Note a common mistake: It means you can't have two processes sharing the same memory without stomping all over each other.

As soon as the child calls exec, the parent resumes. This avoids any atexit code that might confuse the parent.

Another thing to keep in mind is that if a vforked child allocates any memory and does not free it before exec or exit, the parent will also have this memory allocated. Unless the child takes care to record the address of these memory areas and the parent frees them, they will be leaked. This applies to "hidden" allocations as well, in particular, ones inside setenv. The prime example is the shell. Is this a real world consideration?

Note that read should never return 0 unless it has hit the end of input, and an attempt to write 0 bytes should be ignored by the OS. The writer can experience short writes, which are especially dangerous because if you don't notice them you'll discard data.

They can also happen when a system is under load and a fast process is piping to a slower one. (Such as an xterm waiting on x11 when the scheduler decides X is being a CPU hog with all that text console scrolling...)

So will data always be read from the far end of a pipe at the same chunk sizes it was written in?

Don't rely on that.

The downside of standard dynamic linking is that it results in self-modifying code. Although each executable's pages are mmaped into a process' address space from the executable file and are thus naturally shared between processes out of the page cache, the library loader ld-linux. This dirties the pages, triggering copy-on-write allocation of new memory for each process' dirtied pages.

One solution to this is Position Independent Code (PIC), a way of linking a file so all the relocations are grouped together. This dirties fewer pages (often just a single page) for each process' relocations. The downside is this results in larger executables, which take up more space on disk and a correspondingly larger space in memory. But when many copies of the same program are running, PIC dynamic linking trades a larger disk footprint for a smaller memory footprint, by sharing more pages.

A third solution is static linking. A statically linked program has no relocations, and thus the entire executable is shared between all running instances. This tends to have a significantly larger disk footprint, but on a system with only one or two executables, shared libraries aren't much of a win anyway.

Dynamic linking without fixed load addresses fundamentally requires at least one dirty page per dso that uses symbols.

Making calls (but never taking the address explicitly) to functions within the same dso does not require a dirty page by itself, but will with ELF unless you use -Bsymbolic or hidden symbols when linking.

ELF uses significant additional stack space for the kernel to pass all the ELF data structures to the newly created process image. These are located above the argument list and environment. This normally adds 1 dirty page to the process size. The ELF dynamic linker has its own data segment, adding one or more dirty pages. I believe it also performs relocations on itself. The ELF dynamic linker makes significant dynamic allocations to manage the global symbol table and the loaded dso's.

This data is never freed. It will be needed again if libdl is used, so unconditionally freeing it is not possible, but normal programs do not use libdl. Of course with glibc all programs use libdl due to nsswitch so the issue was never addressed. ELF also has the issue that segments are not page-aligned on disk.

This saves up to 4k on disk, but at the expense of using an additional dirty page in most cases, due to a large portion of the first data page being filled with a duplicate copy of the last text page.

The above is just a partial list of the tiny memory penalties of ELF dynamic linking, which eventually add up to quite a bit. The smallest I've been able to get a process down to is 8 dirty pages, and the above factors seem to mostly account for it but some were difficult to measure.

In a perfect world, applications shouldn't include these headers directly, but we don't live in a perfect world. What we actually do is check if we're building on 2. But we still need the version check, since 2.

The Busybox developers spent two years trying to figure out a clean way to do all this. The losetup in the util-linux package from kernel. The compiler flag "-Os" optimizes the binary for size instead of speed.

I'll open another thread for the segfault.

Did you compile successfully? And Could you share the 4. I have a tuner with ci slot and may try to get it working

Everything that may be exposed is simply not there or just a stub; I imagine that building for multiple languages is exposing this limitation in musl and may be the root of your problem.

Added by Robert N 4 months ago

Hey guys, I'm using tvheadend 4.

Hey Em, thanks for the tips. Disabling all EPG and transcoding-related modules will save the most space. I just tried to ldd the binary, but it doesn't seem to be linked against libiconv: