Plain Diff
Merge "Speed up iht8x8 by rearranging instructions. Speed improves from 282% to 302% faster based on assembly-perf."