How to assemble ARM SVE instructions with GNU GAS or LLVM and run it on QEMU?

孤独总比滥情好 2021-01-25 01:36

I want to play with the new ARM SVE instructions using open source tools.

As a start, I would like to assemble the minimal example presented at: https://developer.arm.com/

1 answer
  •  清酒与你
    2021-01-25 02:25

    Automated example with an assertion

    • usage
    • source

    Below I describe how that example was built.

    Assembly

    The aarch64-linux-gnu-as 2.30 shipped with Ubuntu 18.04 is already new enough for SVE, as can be seen from: https://sourceware.org/binutils/docs-2.30/as/AArch64-Extensions.html#AArch64-Extensions

    Otherwise, compiling Binutils from source is easy. On Ubuntu 16.04, just do:

    git clone git://sourceware.org/git/binutils-gdb.git
    cd binutils-gdb
    # The master commit that I tested with.
    git checkout 4de5434b694fc260d02610e8e7fec21b2923600a
    ./configure --target aarch64-elf --prefix "$(pwd)/ble"
    make -j "$(nproc)"
    make install
    

    I didn't check out a tag because the latest tag was a few months old, and I didn't feel like grepping the log messages for when SVE was introduced ;-)

    Then assemble with the freshly built as, and link with the packaged GCC on Ubuntu 16.04:

    ./binutils-gdb/ble/bin/aarch64-elf-as -c -march=armv8.5-a+sve \
        -o example1.o example1.S
    aarch64-linux-gnu-gcc -march=armv8.5-a -nostdlib -o example1 example1.o
    

    On Ubuntu 16.04, aarch64-linux-gnu-gcc 5.4 does not accept -march=armv8.5-a. Use -march=armv8-a instead. That should be fine, since GCC only links the already assembled object here. In any case, the GCC packaged in neither Ubuntu 16.04 nor 18.04 accepts -march=armv8-a+sve, which will be the best option once it is available.

    Alternatively, instead of passing -march=armv8.5-a+sve, you can also add the following to the start of the .S source code:

    .arch armv8.5-a+sve
    

    With Binutils 2.32 on Ubuntu 19.04, I also learned about and tested:

    aarch64-linux-gnu-as -march=all
    

    which also works for SVE. I think I'll use it more in the future, since it seems to enable all extensions in one go, not just SVE!

    QEMU simulation

    The procedure to step-debug it on QEMU is explained at: How to single step ARM assembly in GDB on QEMU?

    First, I turned the example into a minimal self-contained Linux executable:

    .data
        x: .double        1.5,  2.5,  3.5,  4.5
        y: .double        5.0,  6.0,  7.0,  8.0
        y_expect: .double 8.0, 11.0, 14.0, 17.0
        a: .double        2.0
        n: .word          4
    
    .text
    .global _start
    _start:
        ldr x0, =x
        ldr x1, =y
        ldr x2, =a
        ldr x3, =n
        bl daxpy
    
        /* exit */
        mov x0, #0
        mov x8, #93
        svc #0
    
    
    /* Multiply by a scalar and add.
     *
     * Operation:
     *
     *      Y += a * X
     *
     * C signature:
     *
     *      void daxpy(double *x, double *y, double *a, int *n)
     *
     * The name "daxpy" comes from LAPACK:
     * http://www.netlib.org/lapack/explore-html/de/da4/group__double__blas__level1_ga8f99d6a644d3396aa32db472e0cfc91c.html
     *
     * Adapted from: https://alastairreid.github.io/papers/sve-ieee-micro-2017.pdf
     */
    daxpy:
        ldrsw x3, [x3]
        mov x4, #0
        whilelt p0.d, x4, x3
        ld1rd z0.d, p0/z, [x2]
    .loop:
        ld1d z1.d, p0/z, [x0, x4, lsl #3]
        ld1d z2.d, p0/z, [x1, x4, lsl #3]
        fmla z2.d, p0/m, z1.d, z0.d
        st1d z2.d, p0, [x1, x4, lsl #3]
        incd x4
        whilelt p0.d, x4, x3
        b.first .loop
        ret
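
    The vector-length-agnostic loop is easier to follow as a scalar C model of what each instruction in daxpy does. This is a sketch written for this explanation, not code from Arm or from the paper. The names daxpy_sve_model, whilelt and MAX_LANES are hypothetical. The lanes parameter stands for the number of 64-bit elements in a Z register. Real hardware fixes that at VL/64, an even number from 2 to 32, while the model accepts any count so you can vary it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical upper bound: a 2048-bit SVE vector holds 32 doubles. */
#define MAX_LANES 32

/* Model of "whilelt pN.d, i, n": lane k is active iff i + k < n (signed). */
static void whilelt(bool *p, long lanes, long i, long n)
{
    for (long k = 0; k < lanes; k++)
        p[k] = i + k < n;
}

/* Scalar model of the SVE daxpy above: y[i] += a * x[i] for i < *n.
 * "lanes" plays the role of the hardware vector length in doubles. */
void daxpy_sve_model(const double *x, double *y, const double *a,
                     const int *n, long lanes)
{
    bool p[MAX_LANES];
    double z0[MAX_LANES], z1[MAX_LANES], z2[MAX_LANES];

    assert(lanes >= 1 && lanes <= MAX_LANES);

    long count = *n;              /* ldrsw x3, [x3]: sign-extend 32-bit n */
    long i = 0;                   /* mov x4, #0 */
    whilelt(p, lanes, i, count);  /* whilelt p0.d, x4, x3 */
    for (long k = 0; k < lanes; k++)
        z0[k] = p[k] ? *a : 0.0;  /* ld1rd z0.d, p0/z: broadcast a */

    do {
        for (long k = 0; k < lanes; k++) {
            if (!p[k])
                continue;         /* inactive lanes never touch memory */
            z1[k] = x[i + k];     /* ld1d z1.d, p0/z, [x0, x4, lsl #3] */
            z2[k] = y[i + k];     /* ld1d z2.d, p0/z, [x1, x4, lsl #3] */
            z2[k] += z1[k] * z0[k]; /* fmla z2.d, p0/m (real fmla is fused) */
            y[i + k] = z2[k];     /* st1d z2.d, p0, [x1, x4, lsl #3] */
        }
        i += lanes;               /* incd x4 */
        whilelt(p, lanes, i, count);
    } while (p[0]);               /* b.first: loop while lane 0 is active */
}
```

    For example, with n = 4 and lanes = 2 (128-bit vectors) the loop body runs twice. With lanes = 8 (512-bit vectors) it runs once, with the upper four lanes inactive. Either way, y ends up as {8, 11, 14, 17}, matching y_expect. The do/while mirrors the assembly, which performs the first whilelt before the loop and tests the predicate with b.first at the bottom.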
    

    You can run the example1 executable with:

    qemu-aarch64 -L /usr/aarch64-linux-gnu -E LD_BIND_NOW=1 ./example1
    

    and it exits cleanly with status 0.

    Next, we can step-debug it to confirm that the sum was actually computed:

    qemu-aarch64 -g 1234 -L /usr/aarch64-linux-gnu -E LD_BIND_NOW=1 ./example1
    

    and, from another terminal:

    ./binutils-gdb/ble/bin/aarch64-elf-gdb -ex 'file example1' \
      -ex 'target remote localhost:1234' -ex 'set sysroot /usr/aarch64-linux-gnu'
    

    Now, step until just after bl daxpy, and run:

    >>> p (double[4])y_expect
    $1 = {[0] = 8, [1] = 11, [2] = 14, [3] = 17}
    >>> p (double[4])y
    $2 = {[0] = 8, [1] = 11, [2] = 14, [3] = 17}
    

    which confirms that y now holds the expected values.

    Observing SVE registers from GDB seems to be unimplemented, as I can't find anything related under: https://github.com/qemu/qemu/tree/v3.0.0/gdb-xml. It should not be too hard to implement, though, perhaps by copying the existing FP register descriptions. I asked about it at: http://lists.nongnu.org/archive/html/qemu-discuss/2018-10/msg00020.html

    For now, you can already observe them partially and indirectly with:

    i r d0 d1 d2
    

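    This works because the low 64 bits of each SVE register zX are shared with the older dX FP register, which is itself the low half of vX. So this shows element 0 of each z register, but the p predicate registers are not visible at all.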
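
    The aliasing can be pictured with a small C model. This is a hypothetical sketch, not QEMU's actual register representation. It treats a Z register as up to 256 bytes (the 2048-bit SVE maximum) and puts element k of zN.d in bytes 8k to 8k+7. Under that model, dN is simply the first 8 bytes. The type zreg and the helpers z_set_d, z_get_d and d_view are names invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one SVE Z register: up to 2048 bits of storage. */
typedef struct {
    uint8_t bytes[256];
} zreg;

/* Write element k of zN.d (the view used by ld1d, fmla and st1d). */
static void z_set_d(zreg *z, int k, double v)
{
    assert(k >= 0 && k < 32);
    memcpy(&z->bytes[8 * k], &v, sizeof v);
}

/* Read element k of zN.d. */
static double z_get_d(const zreg *z, int k)
{
    assert(k >= 0 && k < 32);
    double v;
    memcpy(&v, &z->bytes[8 * k], sizeof v);
    return v;
}

/* What "i r dN" reports: the low 64 bits of vN, i.e. of zN. */
static double d_view(const zreg *z)
{
    return z_get_d(z, 0);
}
```

    In this model, i r d0 d1 d2 after daxpy reports only element 0 of z0 (the broadcast a), z1 and z2, as left by the last loop iteration.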
