    Optimize convolve8 SSSE3 and AVX2 intrinsics · ae35425a
    Kyle Siefring authored
    Changed the intrinsics to perform summation similar to the way the assembly does.
    
    The new code diverges from the assembly by preferring non-saturating additions.
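
To make the saturating versus non-saturating distinction concrete, here is a minimal scalar sketch of one 16-bit lane. It is not the code from this change: the helper names (`add_sat16`, `add_wrap16`, `sum_taps_sketch`), the pairing of the partial sums, and the rounding constant are assumptions chosen for illustration. `add_sat16` models what a saturating SIMD add such as `_mm_adds_epi16` does per lane, and `add_wrap16` models a plain `_mm_add_epi16`.

```c
#include <stdint.h>

/* Saturating signed 16-bit add: scalar model of one lane of _mm_adds_epi16. */
static int16_t add_sat16(int16_t a, int16_t b) {
  int32_t s = (int32_t)a + (int32_t)b;
  if (s > INT16_MAX) return INT16_MAX;
  if (s < INT16_MIN) return INT16_MIN;
  return (int16_t)s;
}

/* Wrapping signed 16-bit add: scalar model of one lane of _mm_add_epi16.
 * The conversion back to int16_t is implementation-defined in older C
 * standards but wraps modulo 2^16 on mainstream compilers. */
static int16_t add_wrap16(int16_t a, int16_t b) {
  return (int16_t)(uint16_t)((uint16_t)a + (uint16_t)b);
}

/* Hypothetical combination of four pairwise tap products for one pixel.
 * Assumption: the first two sums and the rounding add cannot overflow, so
 * plain adds are enough and only the final add needs to saturate. */
static int16_t sum_taps_sketch(const int16_t x[4]) {
  int16_t a = add_wrap16(x[0], x[2]);
  int16_t b = add_wrap16(x[1], x[3]);
  a = add_wrap16(a, 64);                  /* rounding offset for a 7-bit shift */
  return (int16_t)(add_sat16(a, b) >> 7); /* saturate once, then scale down */
}
```

For in-range inputs the two adds agree, so the plain add can stand in for the saturating one wherever the operands are known to be small. They differ only on overflow: `add_sat16(30000, 10000)` clamps to 32767, while `add_wrap16(30000, 10000)` wraps to -25536.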
    
    Results on Haswell:
    
    SSSE3
    Horiz/Vert  Size  Speedup
    Horiz       x4    ~32%
    Horiz       x8    ~6%
    Vert        x8    ~4%
    
    AVX2
    Horiz/Vert  Size  Speedup
    Horiz       x16   ~16%
    Vert        x16   ~14%
    
    BUG=webm:1471
    
    Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668