    aria: optimize byte perms on Arm · 377b2b62
    Manuel Pégourié-Gonnard authored
    Use specific instructions for moving bytes around in a word. This speeds
    things up, and as a side-effect, slightly lowers code size.
    
    ARIA_P3 and ARIA_P1 are now 1 single-cycle instruction each (those
    instructions are available in all architecture versions starting from v6-M).
    Note: ARIA_P3 was already translated to a single instruction by Clang 3.8 and
    armclang 6.5, but not arm-gcc 5.4 nor armcc 5.06.
    
    ARIA_P2 is already efficiently translated to the minimal number of
    instructions (1 in ARM mode, 2 in Thumb mode) by all tested compilers.
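
For illustration, the three byte permutations can be sketched as below. This is a minimal sketch, not the code from this commit. It assumes that ARIA_P1 swaps the two bytes inside each 16-bit half, ARIA_P2 swaps the two halves, and ARIA_P3 is ARIA_P2 applied to ARIA_P1 (a full byte reversal). It also assumes that the "specific instructions" are `rev16` and `rev`, and that a GCC-compatible compiler is used for the inline assembly. All function names and the preprocessor guard are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Portable reference forms, written as plain shifts and masks. */
static inline uint32_t aria_p1_generic(uint32_t x)
{
    /* Swap the bytes inside each 16-bit half: ABCD -> BADC. */
    return ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
}

static inline uint32_t aria_p2(uint32_t x)
{
    /* Swap the two 16-bit halves: ABCD -> CDAB (a rotation by 16 bits). */
    return (x >> 16) | (x << 16);
}

static inline uint32_t aria_p3_generic(uint32_t x)
{
    /* Full byte reversal: ABCD -> DCBA. */
    return aria_p2(aria_p1_generic(x));
}

/* Hypothetical guard: GCC-compatible compiler targeting Arm v6 or later. */
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 6)
static inline uint32_t aria_p1(uint32_t x)
{
    uint32_t r;
    __asm__("rev16 %0, %1" : "=l"(r) : "l"(x)); /* byte-swap each halfword */
    return r;
}

static inline uint32_t aria_p3(uint32_t x)
{
    uint32_t r;
    __asm__("rev %0, %1" : "=l"(r) : "l"(x)); /* byte-swap the whole word */
    return r;
}
#else
/* Every other target falls back to the portable forms. */
static inline uint32_t aria_p1(uint32_t x) { return aria_p1_generic(x); }
static inline uint32_t aria_p3(uint32_t x) { return aria_p3_generic(x); }
#endif

/* Small self-check: both paths must agree with the portable definitions. */
static inline void aria_perm_selfcheck(void)
{
    const uint32_t x = 0x11223344u;
    assert(aria_p1(x) == aria_p1_generic(x));
    assert(aria_p3(x) == aria_p3_generic(x));
}
```

ARIA_P2 is left as plain C in this sketch because, as noted above, all tested compilers already translate it to the minimal number of instructions.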
    
    Manually compiled and inspected generated code with the following compilers:
    arm-gcc 5.4, clang 3.8, armcc 5.06 (with and without --gnu), armclang 6.5.
    
    Size reduction (arm-none-eabi-gcc -march=armv6-m -mthumb -Os): 5288 -> 5044 B
    
    Effect on execution time of self-tests on a few boards:
    FRDM-K64F   (Cortex-M4):    444 ->  385 us (-13%)
    LPC1768     (Cortex-M3):    488 ->  432 us (-11%)
    FRDM-KL64Z  (Cortex-M0):   1429 -> 1134 us (-20%)
    
    Measured using a config.h with no cipher mode and the following program with
    aria.c and aria.h copy-pasted to the online compiler:
    
    #include "mbed.h"
    #include "aria.h"
    
    int main() {
        Timer t;
        t.start();
        int ret = mbedtls_aria_self_test(0);
        t.stop();
        printf("ret = %d; time = %d us\n", ret, t.read_us());
    }