      Motivation is similar to NO_UDBL_DIVISION.
      The alternative implementation of 64-bit mult is straightforward and aims at
      obvious correctness. Also, visual examination of the generated assembly shows
      that it's quite efficient with clang, armcc5 and arm-clang. However, current
      GCC generates fairly inefficient code for it.
      I tried to rework the code in order to make GCC generate more efficient code.
      Unfortunately the only way to do that is to get rid of 64-bit add and handle
      the carry manually, but this causes other compilers to generate less efficient
      code with branches, which is not acceptable from a side-channel point of view.
      So let's keep the obvious code that works for most compilers and hope future
      versions of GCC learn to manage registers in a sensible way in that context.
      See https://bugs.launchpad.net/gcc-arm-embedded/+bug/1775263
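
      For illustration only, here is a minimal sketch of the kind of "obvious"
      32x32->64 multiplication described above, built from 16-bit halves so
      that only 32-bit multiplies are needed and the carry is absorbed by plain
      64-bit adds. The name mul64_sketch is hypothetical and the code is not
      necessarily identical to what this commit adds.

```c
#include <stdint.h>

/* Hypothetical helper (name is an assumption, not taken from the commit):
 * computes the full 64-bit product of two 32-bit values using only
 * 32x32->32 multiplications, with no branches. */
static uint64_t mul64_sketch(uint32_t a, uint32_t b)
{
    /* Split operands: a = al + 2^16 * ah, b = bl + 2^16 * bh */
    const uint32_t al = a & 0xFFFFu;
    const uint32_t ah = a >> 16;
    const uint32_t bl = b & 0xFFFFu;
    const uint32_t bh = b >> 16;

    /* a*b = al*bl + 2^16 * (ah*bl + al*bh) + 2^32 * ah*bh
     * Each partial product of two 16-bit values fits in 32 bits. */
    const uint32_t lo = al * bl;
    const uint64_t mid = (uint64_t)(ah * bl) + (uint64_t)(al * bh);
    const uint32_t hi = ah * bh;

    /* The 64-bit additions handle the carries; nothing is done by hand. */
    return (uint64_t)lo + (mid << 16) + ((uint64_t)hi << 32);
}
```

      Handling the carry manually instead of relying on the 64-bit additions
      in the return statement is the rework that was tried and rejected above.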
