    Eliminate copying for FLIPADST in fwd transforms. · 01bb4a31
    Geza Lore authored
    This patch eliminates the copying of data when using FLIPADST forward
    transforms, by incorporating the necessary data flipping into the
    load_buffer_* functions of the SSE2 optimized forward transforms. The
    load_buffer_* functions are normally inlined, so the overhead of copying
    the data is removed and the overhead of flipping is minimized.
    Left-to-right flipping is still not free, because the columns must be
    shuffled in registers.
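
    As a rough illustration of the idea (not the code in this patch), the
    sketch below folds both flips into an 8x8 load. The names load_row8 and
    load_buffer_8x8_sketch and their signatures are hypothetical. Up-down
    flipping only changes which source row is read, while left-right
    flipping reverses the eight 16-bit lanes with three SSE2 shuffles. A
    scalar fallback is included so the sketch also builds without SSE2.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: load one row of 8 int16 values into dst,
 * reversing the lane order when fliplr is nonzero. */
static void load_row8(const int16_t *src, int16_t *dst, int fliplr) {
#if defined(__SSE2__)
  __m128i v = _mm_loadu_si128((const __m128i *)src);
  if (fliplr) {
    v = _mm_shufflelo_epi16(v, 0x1b); /* reverse lanes 0..3 */
    v = _mm_shufflehi_epi16(v, 0x1b); /* reverse lanes 4..7 */
    v = _mm_shuffle_epi32(v, 0x4e);   /* swap the two 64-bit halves */
  }
  _mm_storeu_si128((__m128i *)dst, v);
#else
  /* Scalar fallback with the same result. */
  for (int c = 0; c < 8; ++c) dst[c] = src[fliplr ? 7 - c : c];
#endif
}

/* Hypothetical 8x8 loader: flipud costs nothing extra because it only
 * changes the source row index; fliplr costs the shuffles above. */
static void load_buffer_8x8_sketch(const int16_t *input, int stride,
                                   int flipud, int fliplr,
                                   int16_t out[64]) {
  assert(stride >= 8);
  for (int r = 0; r < 8; ++r) {
    const int src_row = flipud ? 7 - r : r;
    load_row8(input + src_row * stride, out + r * 8, fliplr);
  }
}
```

    In a real SSE2 transform the rows would stay in registers rather than
    being stored to out; the store is here only to make the result easy to
    inspect.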
    
    To preserve identity between the C and SSE2 implementations, the
    appropriate C implementations now also do the data flipping as part of
    the transform, rather than relying on the caller for flipping the input.
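
    A minimal scalar sketch of what "flipping as part of the transform" can
    look like is given below. The enum, the function name, and the mapping
    from transform type to flip direction are assumptions made for
    illustration and are not taken from the patch: the sketch assumes the
    first half of a type name refers to the vertical transform (so a
    flipped vertical ADST reads rows bottom-up) and the second half to the
    horizontal one.

```c
#include <stdint.h>

/* Hypothetical transform-type tags, used by this sketch only. */
typedef enum {
  SK_DCT_DCT,
  SK_FLIPADST_DCT,     /* assumed: flipped ADST applied vertically */
  SK_DCT_FLIPADST,     /* assumed: flipped ADST applied horizontally */
  SK_FLIPADST_FLIPADST /* assumed: flipped in both directions */
} SketchTxType;

/* Hypothetical input stage of a 4x4 forward transform: the flip is
 * applied while reading the input into the working block, so the caller
 * passes the residual unflipped. */
static void sketch_read_input_4x4(const int16_t *input, int stride,
                                  SketchTxType tx_type, int16_t block[16]) {
  const int flipud =
      (tx_type == SK_FLIPADST_DCT || tx_type == SK_FLIPADST_FLIPADST);
  const int fliplr =
      (tx_type == SK_DCT_FLIPADST || tx_type == SK_FLIPADST_FLIPADST);
  for (int r = 0; r < 4; ++r) {
    for (int c = 0; c < 4; ++c) {
      const int sr = flipud ? 3 - r : r; /* source row */
      const int sc = fliplr ? 3 - c : c; /* source column */
      block[r * 4 + c] = input[sr * stride + sc];
    }
  }
}
```

    Keeping the flip inside the transform means the C and SIMD versions see
    the same unflipped input and can be compared output for output.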
    
    Overall encode speedup is about 1.5-2% in my tests. Note that this
    patch covers only the forward transforms; inverse transforms will
    follow in a later patch.
    
    There are also a few code hygiene changes:
    - Fixed some indents of switch statements.
    - DCT_DCT transforms now always use the vp10_fht* functions, which
      dispatch to vpx_fdct* for DCT_DCT (previously some call sites called
      vpx_fdct* directly and others called vp10_fht*).
    
    Change-Id: I93439257dc5cd104ac6129cfed45af142fb64574