Add AVX2 autovectorized versions of premultiply
Following up on using GCC's autovectorizing for faster SSE4.1
premultiply, this patch adds specialized autovectorized versions
of premultiply for AVX2, giving another almost doubling in speed.
To make the speed up for AVX2 and also SSE4_1 available to non-GCC
compilers, the target-specific methods have been moved to separate
files.
Change-Id: I97ce05be67f4adeeb9a096eef80fd5fb662099f3
Reviewed-by:
Gunnar Sletta <gunnar@sletta.org>
Showing
Please register or sign in to comment