Replacing ld with gold - any experience?

你的背包 2020-11-28 04:32

Has anyone tried to use gold instead of ld?

gold promises to be much faster than ld, so it may help speed up test cycles.

8 Answers
  • 2020-11-28 05:04

    You could symlink ld to gold (in a local binary directory that precedes the system one on your PATH, to avoid overwriting the real ld):

    ln -s `which gold` ~/bin/ld
    

    or

    ln -s `which gold` /usr/local/bin/ld
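
    A quick way to sanity check the override: an executable named ld earlier on the PATH shadows the system one. A self-contained sketch of that lookup behavior (using a throwaway stand-in script instead of the real gold):

```shell
# A script named 'ld' in a directory earlier on PATH shadows the system linker.
# 'shadow-ld' is just a stand-in marker; nothing real is overwritten.
dir=$(mktemp -d)
printf '#!/bin/sh\necho shadow-ld\n' > "$dir/ld"
chmod +x "$dir/ld"
PATH="$dir:$PATH" command -v ld   # resolves to "$dir/ld"
PATH="$dir:$PATH" ld              # prints: shadow-ld
```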
    
  • 2020-11-28 05:04

    Minimal synthetic benchmark: LD vs gold vs LLVM LLD

    Outcome:

    • gold was about 3x to 4x faster for all values I've tried when using -Wl,--threads -Wl,--thread-count=$(nproc) to enable multithreading
    • LLD was about 2x faster than gold!

    Tested on:

    • Ubuntu 20.04, GCC 9.3.0, binutils 2.34 (provides ld and gold), LLD 10 (sudo apt install lld)
    • Lenovo ThinkPad P51 laptop, Intel Core i7-7820HQ CPU (4 cores / 8 threads), 2x Samsung M471A2K43BB1-CRC RAM (2x 16GiB), Samsung MZVLB512HAJQ-000L7 SSD (3,000 MB/s).

    Simplified description of the benchmark parameters:

    • 1: number of object files providing symbols
    • 2: number of symbols per symbol provider object file
    • 3: number of object files that use all of the provided symbols

    Results for different benchmark parameters:

    10000 10 10
    nogold:  wall=4.35s user=3.45s system=0.88s 876820kB
    gold:    wall=1.35s user=1.72s system=0.46s 739760kB
    lld:     wall=0.73s user=1.20s system=0.24s 625208kB
    
    1000 100 10
    nogold:  wall=5.08s user=4.17s system=0.89s 924040kB
    gold:    wall=1.57s user=2.18s system=0.54s 922712kB
    lld:     wall=0.75s user=1.28s system=0.27s 664804kB
    
    100 1000 10
    nogold:  wall=5.53s user=4.53s system=0.95s 962440kB
    gold:    wall=1.65s user=2.39s system=0.61s 987148kB
    lld:     wall=0.75s user=1.30s system=0.25s 704820kB
    
    10000 10 100
    nogold:  wall=11.45s user=10.14s system=1.28s 1735224kB
    gold:    wall=4.88s user=8.21s system=0.95s 2180432kB
    lld:     wall=2.41s user=5.58s system=0.74s 2308672kB
    
    1000 100 100
    nogold:  wall=13.58s user=12.01s system=1.54s 1767832kB
    gold:    wall=5.17s user=8.55s system=1.05s 2333432kB
    lld:     wall=2.79s user=6.01s system=0.85s 2347664kB
    
    100 1000 100
    nogold:  wall=13.31s user=11.64s system=1.62s 1799664kB
    gold:    wall=5.22s user=8.62s system=1.03s 2393516kB
    lld:     wall=3.11s user=6.26s system=0.66s 2386392kB
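
    As a sanity check, the speedups quoted in the outcome section can be recomputed from the wall times of the first table (10000 10 10) directly:

```shell
# Wall-time speedups for the 10000/10/10 run: ld vs gold and ld vs lld.
awk 'BEGIN { printf "ld/gold=%.1fx ld/lld=%.1fx\n", 4.35/1.35, 4.35/0.73 }'
# prints: ld/gold=3.2x ld/lld=6.0x
```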
    

    This is the script that generates all the objects for the link tests:

    generate-objects

    #!/usr/bin/env bash
    set -eu
    
    # CLI args.
    
    # Each of those files contains n_ints_per_file ints.
    n_int_files="${1:-10}"
    n_ints_per_file="${2:-10}"
    
    # Each function adds all ints from all files.
    # This leads to n_int_files x n_ints_per_file x n_funcs relocations.
    n_funcs="${3:-10}"
    
    # Do a debug build, since it is for debug builds that link time matters the most,
    # as the user will be recompiling often.
    cflags='-ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic'
    
    # Clean up previously generated files and objects.
    ./clean
    
    # Generate i_*.c, ints.h and int_sum.h
    rm -f ints.h
    echo 'return' > int_sum.h
    int_file_i=0
    while [ "$int_file_i" -lt "$n_int_files" ]; do
      int_i=0
      int_file="${int_file_i}.c"
      rm -f "$int_file"
      while [ "$int_i" -lt "$n_ints_per_file" ]; do
        echo "${int_file_i} ${int_i}"
        int_sym="i_${int_file_i}_${int_i}"
        echo "unsigned int ${int_sym} = ${int_file_i};" >> "$int_file"
        echo "extern unsigned int ${int_sym};" >> ints.h
        echo "${int_sym} +" >> int_sum.h
        int_i=$((int_i + 1))
      done
      int_file_i=$((int_file_i + 1))
    done
    echo '1;' >> int_sum.h
    
    # Generate funcs.h and main.c.
    rm -f funcs.h
    cat <<EOF >main.c
    #include "funcs.h"
    
    int main(void) {
    return
    EOF
    i=0
    while [ "$i" -lt "$n_funcs" ]; do
      func_sym="f_${i}"
      echo "${func_sym}() +" >> main.c
      echo "int ${func_sym}(void);" >> funcs.h
      cat <<EOF >"${func_sym}.c"
    #include "ints.h"
    
    int ${func_sym}(void) {
    #include "int_sum.h"
    }
    EOF
      i=$((i + 1))
    done
    cat <<EOF >>main.c
    1;
    }
    EOF
    
    # Generate *.o
    ls | grep -E '\.c$' | parallel --halt now,fail=1 -t --will-cite "gcc $cflags -c -o '{.}.o' '{}'"
    

    GitHub upstream.

    Note that generating the object files can be quite slow, since each C file can get very large.
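
    To see the shape of the generated C without running the whole script, here is a mini version of the symbol-provider loop (2 files x 2 ints; as in the script, each int is initialized to its file index):

```shell
# Emits the kind of definitions the full script writes into 0.c, 1.c, ...
n_files=2; n_ints=2
for f in $(seq 0 $((n_files - 1))); do
  for i in $(seq 0 $((n_ints - 1))); do
    echo "unsigned int i_${f}_${i} = ${f};"
  done
done
# prints 4 lines: i_0_0 and i_0_1 set to 0, i_1_0 and i_1_1 set to 1
```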

    Given an input of type:

    ./generate-objects [n_int_files [n_ints_per_file [n_funcs]]]
    

    it generates:

    main.c

    #include "funcs.h"
    
    int main(void) {
        return f_0() + f_1() + ... + f_<n_funcs>();
    }
    

    f_0.c, f_1.c, ..., f_<n_funcs>.c

    extern unsigned int i_0_0;
    extern unsigned int i_0_1;
    ...
    extern unsigned int i_1_0;
    extern unsigned int i_1_1;
    ...
    extern unsigned int i_<n_int_files>_<n_ints_per_file>;
    
    int f_0(void) {
        return
        i_0_0 +
        i_0_1 +
        ...
        i_1_0 +
        i_1_1 +
        ...
        i_<n_int_files>_<n_ints_per_file>
    }
    

    0.c, 1.c, ..., <n_int_files>.c

    unsigned int i_0_0 = 0;
    unsigned int i_0_1 = 0;
    ...
    unsigned int i_0_<n_ints_per_file> = 0;
    

    which leads to:

    n_int_files x n_ints_per_file x n_funcs
    

    relocations on the link.

    Then I compared:

    gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic               -o main *.o
    gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -fuse-ld=gold -Wl,--threads -Wl,--thread-count=`nproc` -o main *.o
    gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -fuse-ld=lld  -o main *.o
    

    Some limits I've been trying to mitigate when selecting the test parameters:

    • at 100k C files, both setups occasionally fail with malloc failures
    • GCC cannot compile a function with 1M additions

    I have also observed a 2x speedup in the debug build of gem5: https://gem5.googlesource.com/public/gem5/+/fafe4e80b76e93e3d0d05797904c19928587f5b5

    Similar question: https://unix.stackexchange.com/questions/545699/what-is-the-gold-linker

    Phoronix benchmarks

    Phoronix did some benchmarking in 2017 for some real world projects, but for the projects they examined, the gold gains were not so significant: https://www.phoronix.com/scan.php?page=article&item=lld4-linux-tests&num=2 (archive).

    Known incompatibilities

    • gold
      • https://sourceware.org/bugzilla/show_bug.cgi?id=23869 gold failed when I did a partial link with ld and then tried the final link with gold; lld worked on the same test case.
      • https://github.com/cirosantilli/linux-kernel-module-cheat/issues/109 my debug symbols appeared broken in some places

    LLD benchmarks

    At https://lld.llvm.org/ they give build times for a few well-known projects, with similar results to my synthetic benchmarks. Unfortunately, project and linker versions are not given. In their results:

    • gold was about 3x to 4x faster than ld
    • LLD was 3x to 4x faster than gold, a greater speedup than in my synthetic benchmark

    They comment:

    This is a link time comparison on a 2-socket 20-core 40-thread Xeon E5-2680 2.80 GHz machine with an SSD drive. We ran gold and lld with or without multi-threading support. To disable multi-threading, we added -no-threads to the command lines.

    and results look like:

    Program      | Size     | GNU ld  | gold -j1 | gold    | lld -j1 |    lld
    -------------|----------|---------|----------|---------|---------|-------
      ffmpeg dbg |   92 MiB |   1.72s |   1.16s  |   1.01s |   0.60s |  0.35s
      mysqld dbg |  154 MiB |   8.50s |   2.96s  |   2.68s |   1.06s |  0.68s
       clang dbg | 1.67 GiB | 104.03s |  34.18s  |  23.49s |  14.82s |  5.28s
    chromium dbg | 1.14 GiB | 209.05s |  64.70s  |  60.82s |  27.60s | 16.70s
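
    The 3x/4x claims can be checked against, e.g., the clang dbg row:

```shell
# ld -> gold and gold -> lld speedups for the clang dbg row of the table above.
awk 'BEGIN { printf "ld/gold=%.1fx gold/lld=%.1fx\n", 104.03/23.49, 23.49/5.28 }'
# prints: ld/gold=4.4x gold/lld=4.4x
```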
    
  • 2020-11-28 05:11

    Some projects seem to be incompatible with gold because of behavioral differences between ld and gold. Example: OpenFOAM; see http://www.openfoam.org/mantisbt/view.php?id=685 .

  • 2020-11-28 05:13

    DragonFlyBSD switched to gold as its default linker, so it seems to be ready for a wide variety of software.
    More details: http://phoronix.com/scan.php?page=news_item&px=DragonFlyBSD-Gold-Linker

  • 2020-11-28 05:15

    As it took me a little while to find out how to selectively use gold (i.e. not system-wide using a symlink), I'll post the solution here. It's based on http://code.google.com/p/chromium/wiki/LinuxFasterBuilds#Linking_using_gold .

    1. Make a directory where you can put a gold glue script. I am using ~/bin/gold/.
    2. Put the following glue script there and name it ~/bin/gold/ld:

      #!/bin/bash
      gold "$@"
      

      Obviously, make it executable, chmod a+x ~/bin/gold/ld.

    3. Change your calls to gcc to gcc -B$HOME/bin/gold, which makes gcc look in the given directory for helper programs such as ld, and thus use the glue script instead of the system default ld.

  • 2020-11-28 05:17

    I'm currently compiling bigger projects with it on Ubuntu 10.04, where you can install and integrate it easily with the binutils-gold package (if you remove that package, you get your old ld back). GCC will then use gold automatically.

    Some experiences:

    • gold doesn't search in /usr/local/lib
    • gold doesn't implicitly link libraries like pthread or rt; I had to add them by hand
    • it is faster and needs less memory (the latter matters on big C++ projects with lots of Boost etc.)
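
    The first two bullets can be compensated per project; a hedged Makefile sketch (the exact flag values here are illustrative, not from the original answer):

```make
# Make gold find /usr/local/lib and link the libraries it no longer assumes.
LDFLAGS += -fuse-ld=gold -L/usr/local/lib
LDLIBS  += -lpthread -lrt
```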

    What does not work: it cannot compile kernel code, and therefore no kernel modules. Ubuntu does this automatically via DKMS when it updates proprietary drivers like fglrx; this fails with ld-gold (you have to remove gold, restart DKMS, then reinstall ld-gold).
