    h264: new assembly version of get_cabac for x86_64 with PIC · 82c71913
    Roland Scheidegger authored
    
    
    This adds a hand-optimized assembly version for get_cabac much like the
    existing one, but it works if the table offsets are RIP-relative.
    Compared to the non-RIP-relative version this adds 2 lea instructions
    and it needs one extra register.
    There is a surprisingly large performance improvement over the C version
    (more so than the generated assembly seems to suggest) in get_cabac itself:
    I measured roughly 40% faster for get_cabac on a K8. Overall, however, the
    difference is smaller: I measured roughly 5% on a test clip on a K8 and a
    Core2.
    Hopefully it still compiles on x86 32bit...
    Now that only one table is used, there's some chance that even Darwin's
    assembler (as) can assemble this (apparently the label arithmetic used
    previously doesn't work if it involves symbols defined in a different file;
    thanks to Ronald S. Bultje for helping me with this).
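
The single-table idea described above can be illustrated with a minimal C sketch. This is not the code from this commit: the names (`demo_tables`, `NORM_SHIFT_OFFSET`, `demo_lookup`, and so on), the sub-table sizes, and the table contents are all hypothetical placeholders. It only shows the general pattern of packing several lookup tables into one array, so that position-independent code needs the address of a single symbol and reaches each sub-table through a constant offset rather than through arithmetic between separate symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: three logical tables packed into one array.
 * The sizes are illustrative and not taken from the commit. */
enum {
    NORM_SHIFT_OFFSET = 0,
    NORM_SHIFT_SIZE   = 512,
    LPS_RANGE_OFFSET  = NORM_SHIFT_OFFSET + NORM_SHIFT_SIZE,
    LPS_RANGE_SIZE    = 512,
    MLPS_STATE_OFFSET = LPS_RANGE_OFFSET + LPS_RANGE_SIZE,
    MLPS_STATE_SIZE   = 256,
    TABLES_SIZE       = MLPS_STATE_OFFSET + MLPS_STATE_SIZE
};

/* One symbol for all tables: only this address has to be resolved. */
uint8_t demo_tables[TABLES_SIZE];

/* Fill the sub-tables with placeholder values (not real CABAC data). */
void demo_tables_init(void)
{
    size_t i;

    for (i = 0; i < NORM_SHIFT_SIZE; i++)
        demo_tables[NORM_SHIFT_OFFSET + i] = (uint8_t)(i & 7);
    for (i = 0; i < LPS_RANGE_SIZE; i++)
        demo_tables[LPS_RANGE_OFFSET + i] = (uint8_t)(100 + (i & 63));
    for (i = 0; i < MLPS_STATE_SIZE; i++)
        demo_tables[MLPS_STATE_OFFSET + i] = (uint8_t)(i ^ 1);
}

/* Look up an entry in a sub-table: base pointer plus a compile-time
 * offset plus a run-time index. */
uint8_t demo_lookup(const uint8_t *base, size_t offset, size_t index)
{
    assert(offset + index < TABLES_SIZE);
    return base[offset + index];
}
```

In hand-written x86_64 assembly the same pattern would presumably correspond to loading the base address once and then using the offsets as displacements; that mapping is an assumption of this sketch, not something the commit message spells out.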
    
    Signed-off-by: Michael Niedermayer <michaelni@gmx.at>