Commit b5f7c2ef authored by Michael Hamburg

tidy up

parent f18cf359
April 23, 2015:
Removed the original Goldilocks code; Decaf now stands on its own.
This cuts the source code approximately in half, to a still-large
13.7k wc-lines. (Most of these lines are in the arch-specific
field implementations.)
Note that the decaf_crypto routines are not intended to set
standards. They should be secure, but they're intended more as
examples of how the core ECC library could be used.
The SHAKE stuff is also mostly an experiment, particularly the
STROBE protocol/mode stuff. This is all fine, because the ECC
library itself is the core, and doesn't require the SHAKE stuff.
(Except for the C++ header, which should probably also be factored
so that it doesn't need the SHAKE stuff.)
I've started work on making a Decaf BAT, but it's not done yet.
I haven't ripped out all old multi-field code, because I intend
to add support for other fields eventually. Maybe properly this
time, instead of with a million compile flags like the original.
March 23, 2015:
I've been fleshing out Decaf, and hopefully the API is somewhere
near final. I will probably move a few things around and add a
......
Ed448-Goldilocks
Ed448-Goldilocks, Decaf version.
This software is an experimental implementation of a new 448-bit elliptic
curve called Ed448-Goldilocks. The implementation itself is based on that of
an earlier, unnamed 252-bit curve which should probably be referred to as
Ed252-MontgomeryStation. See http://eprint.iacr.org/2012/309 for details of
that implementation.
curve called Ed448-Goldilocks, with "Decaf" cofactor removal.
The source files here are all by Mike Hamburg. Most of them are (c) 2014
Cryptography Research, Inc (a division of Rambus). The cRandom
implementation is the exception: these files are from the OpenConflict video
game protection system out of Stanford, and are (c) 2011 Stanford
University. All of these files are usable under the MIT license contained in
LICENSE.txt.
The source files here are all by Mike Hamburg. Most of them are (c)
2014-2015 Cryptography Research, Inc (a division of Rambus). All of these
files are usable under the MIT license contained in LICENSE.txt.
The Makefile is set for my 2013 MacBook Air. You can `make bench` to run
a completely arbitrary set of benchmarks and tests, or `make
build/goldilocks.so` to build a stripped-down version of the library. For
non-Haswell platforms, you need to replace -mavx2 -mbmi2 by an appropriate
vector declaration. For non-Mac platforms, you won't be able to build a
library with this Makefile. This is fine, because you shouldn't be using
this for much at this stage anyway.
a completely arbitrary set of benchmarks and tests, or `make lib` to build
a stripped-down version of the library. For non-Haswell platforms, you may
need to replace -mavx2 -mbmi2 by an appropriate vector declaration.
I've attempted to protect against timing attacks and invalid point attacks,
but as of yet no attempt to protect against power analysis. This is an early
revision, so I haven't done much analysis or correctness testing of
corner-cases.
The code in ec_point.c and ec_point.h was generated with the help of a tool
written in SAGE. The field code in p448.h doesn't reduce after add/sub, and
so it requires care to prevent overflow. The SAGE tool figures out where to
put reductions and adjustments to prevent overflow. It also formally
verifies that the formulas produce points on the curve. I'm planning to add
more features to it eventually. That tool is even more experimental than
this library, though, and so I won't be releasing it just yet.
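To illustrate the point about the field code not reducing after add/sub, here is a minimal sketch in C. It assumes sixteen 28-bit limbs held in `uint32_t` (the 28-bit limb layout mentioned in TODO.txt) and the Goldilocks prime p = 2^448 - 2^224 - 1. The names `fe_t`, `fe_add_raw` and `fe_weak_reduce` are hypothetical and are not the library's API.

```c
#include <assert.h>
#include <stdint.h>

#define NLIMBS    16                      /* 16 limbs x 28 bits = 448 bits */
#define LIMB_BITS 28
#define LIMB_MASK ((1u << LIMB_BITS) - 1)

typedef struct { uint32_t limb[NLIMBS]; } fe_t;   /* hypothetical type */

/* Limb-wise add with no carry propagation: limbs may grow past 28 bits. */
static void fe_add_raw(fe_t *out, const fe_t *a, const fe_t *b) {
    for (int i = 0; i < NLIMBS; i++) {
        /* The caller must guarantee headroom; a 32-bit wraparound
         * would silently corrupt the value. */
        assert(a->limb[i] <= UINT32_MAX - b->limb[i]);
        out->limb[i] = a->limb[i] + b->limb[i];
    }
}

/* One carry pass. Because 2^448 = 2^224 + 1 (mod p), the carry out of
 * the top limb is folded back into limb 8 and limb 0. The value is
 * unchanged mod p, but the result is not fully canonical.
 * Assumes limb 8 has room for the small extra carry. */
static void fe_weak_reduce(fe_t *a) {
    uint32_t top = a->limb[NLIMBS - 1] >> LIMB_BITS;
    a->limb[NLIMBS / 2] += top;                       /* the 2^224 term */
    for (int i = NLIMBS - 1; i > 0; i--) {
        a->limb[i] = (a->limb[i] & LIMB_MASK)
                   + (a->limb[i - 1] >> LIMB_BITS);
    }
    a->limb[0] = (a->limb[0] & LIMB_MASK) + top;      /* the 2^0 term */
}
```

Several raw additions can be chained before a single weak reduce, provided every limb stays below 2^32. Tracking that bound by hand is the care the paragraph above refers to.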
but as of yet no attempt to protect against power analysis.
This software is incomplete, and lacks documentation. None of the APIs are
stable. The software is probably not secure. Please consult TODO.txt for
additional agenda items. Do not taunt happy fun ball.
yet stable, though they may be getting there. The software is probably not
secure. Please consult TODO.txt for additional agenda items. Do not taunt
happy fun ball.
Cheers,
-- Mike Hamburg
Important work items for Ed448-Goldilocks:
* Better architecture detection / factoring of arch-related headers.
[PROGRESS]
* Better factoring of high-level vs low-level library.
Important work items for Ed448-Goldilocks / decaf:
* Factor out hash, crandom from core library?
[DONE, except for C++ headers]
* Signed 32-bit NEON implementation to avoid bias/reduce after subtract
* Documentation: write high-level API docs, and internal docs to help
other implementors.
* Partial progress on Doxygenating the code.
......@@ -20,65 +14,27 @@ Important work items for Ed448-Goldilocks:
* Cleanup: rename everything consistently.
* namespace_op or op_namespace? namespace_op_type?
* We don't have to be super-careful with the namespacing, because
symbols will be scrubbed by exported.sym.
* Cleanup: hard-coded tables (probably?)
* This reduces the work required for goldilocks_init() at the expense
of library size.
* Makes error-handling and thread safety easier.
* Use the SAGE tool?
symbols will be scrubbed by visibility
* Cleanup: unify intrinsics code
* Word_t, mask_t, bigregister_t, etc.
* Generate asm intrinsics with a script?
* [DONE] Bugfix: make sure that init() and randomization are thread-safe.
* [DONE] Security: check on deserialization that points are < p.
* [NEEDS TESTING] Check also that they're nonzero or otherwise non-pathological?
* Testing:
* Corner-case testing
* More bulk random testing
* Negative testing.
* SAGE-(auto?)-generated test vectors
* Test the Barrett fields
* More testing. Testing, testing and testing.
* Test corner cases better.
* Safety: add static analysis attributes for compilers that support them
* Most functions now have warn on ignored return.
* [ MOSTLY DONE ]
* Safety:
* [DONE] Check for init() if it's still required once we've done the above
* Decide what to do about RNG failures
* abort
* return error and zeroize
* return error but continue if RNG is kind of mostly OK
* Flexibility: decide which API options are good.
* [DONE?] Eg, should functions take nbits and table sizes?
* [DONE] Remove hardcoded adjustments from comb control.
* These adjustments make the output wrong when it's not 450 bits.
* Other slow Barrett fields? Montgomery fields?
* Mid-level API
* Make it easier to work with untwisted Edwards objects.
* Probably use extended or projective, not extensible coordinates.
* Scalarmul with other cofactor modes.
* High-level API:
* SHA512 Elligator Edition? Maybe write a paper first.
* Elligator.
* Need to write Elligator inverse. Might not be Elligator-2S.
* FHMQV? Is this patented?
* What low-level APIs to expose?
* Edwards points with add, sub, scalarmul, =, ==, ser/deser?
* High-level API: [DONE]
* Portability: test and make clean with other compilers
* Using a fair amount of __attribute__ code.
......@@ -89,47 +45,14 @@ Important work items for Ed448-Goldilocks:
* I can't get a simple for-loop to autovectorize :-/
* SAGE tool?
* Portability: make the inner layers of the code 32-bit clean.
* Write new versions of the field code.
* [DONE] 28-bit limbs give less headroom for carries.
* [DONE] Now have a vectorless ARM version; need NEON.
* Improve speed of 32-bit field code.
* [DONE] Run through the SAGE tool to generate new bias & bound.
* [DONE] Portability: make the outer layers of the code 32-bit clean.
* [DONE] Performance/flexibility: decide which parameters should be hard-coded.
* Perhaps useful for comb precomputation.
* Performance: Improve SHA512.
* [DONE?] Improve portability.
* Improve speed.
* Except not, because this adds too much code size.
* Link OpenSSL if a fast SHA is desired.
* Protocol:
* Decide what things to stir into hashes for various functions.
* Performance: improve the Barrett field code.
* Support other primes?
* Capture prime shape into a struct instead of passing 3 params.
* [DONE] Make 32-bit clean.
* Automation:
* Improve the SAGE tool to cover more cases
* Real SSA classes to cover branching and looping
* Constant-time selection
* Intrinsics code
* Field code?
* SAGE tool is impossibly slow on 32-bit
* Currently stuck on Elligator after 19 hours.
* [FIXED] at least for now.
* Vector-mul-chains
* Negation "bubble pushing" optimization
* Performance: Improve SHAKE.
* Improve speed. (Maybe)
* Clear other TODO/FIXME/HACK/PERF items in the code
* [DONE?] Submit to SUPERCOP
* Submit Decaf to SUPERCOP
......@@ -4,7 +4,10 @@
* Copyright (c) 2015 Cryptography Research, Inc. \n
* Released under the MIT License. See LICENSE.txt for license information.
* @author Mike Hamburg
* @brief Decaf crypto routines.
* @brief Example Decaf crypto routines.
* @warning These are merely examples, though they ought to be secure. But real
* protocols will decide differently on magic numbers, formats, which items to
* hash, etc.
* @warning Experimental! The names, parameter orders etc are likely to change.
*/
......
......@@ -5,7 +5,7 @@
* Copyright (c) 2015 Cryptography Research, Inc. \n
* Released under the MIT License. See LICENSE.txt for license information.
* @author Mike Hamburg
* @brief Decaf crypto routines.
* @brief Example Decaf crypto routines.
*/
#include "decaf_crypto.h"
......
......@@ -1106,12 +1106,6 @@ static void gf_batch_invert (
/* const */ gf *in,
unsigned int n
) {
// if (n==0) {
// return;
// } else if (n==1) {
// field_inverse(out[0],in[0]);
// return;
// }
assert(n>1);
gf_cpy(out[1], in[0]);
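The hunk above shows only the start of `gf_batch_invert`. For readers unfamiliar with the idea, batch inversion is commonly done with Montgomery's trick: multiply everything together, invert once, then peel the individual inverses back off. The sketch below shows that technique over a small prime. It is not the library's code: `P`, `mod_pow`, `mod_inv` and `batch_invert` are hypothetical names, and this layout accepts n = 1, whereas the function in the diff asserts n > 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define P 65537u   /* small prime, chosen only for illustration */

/* Square-and-multiply exponentiation mod P. */
static uint64_t mod_pow(uint64_t b, uint64_t e) {
    uint64_t r = 1;
    b %= P;
    while (e) {
        if (e & 1) r = r * b % P;
        b = b * b % P;
        e >>= 1;
    }
    return r;
}

/* Fermat inversion x^(P-2); maps 0 to 0. */
static uint64_t mod_inv(uint64_t x) { return mod_pow(x, P - 2); }

/* out[i] = 1/in[i] mod P using a single inversion.
 * out and in must not alias. */
static void batch_invert(uint64_t *out, const uint64_t *in, size_t n) {
    assert(n > 0 && out != in);
    /* Forward pass: out[i] = in[0] * ... * in[i]. */
    out[0] = in[0] % P;
    for (size_t i = 1; i < n; i++) {
        out[i] = out[i - 1] * (in[i] % P) % P;
    }
    /* acc = 1 / (in[0] * ... * in[n-1]). */
    uint64_t acc = mod_inv(out[n - 1]);
    /* Backward pass: peel off one inverse per step. */
    for (size_t i = n - 1; i > 0; i--) {
        uint64_t inv_i = acc * out[i - 1] % P;   /* 1 / in[i] */
        acc = acc * (in[i] % P) % P;             /* 1 / (in[0..i-1]) */
        out[i] = inv_i;
    }
    out[0] = acc;
}
```

Because the single inversion maps 0 to 0, one zero input zeroes the accumulator and therefore every output in this sketch.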
......@@ -1254,7 +1248,7 @@ void decaf_448_precomputed_scalarmul (
for (k=0; k<t; k++) {
unsigned int bit = i + s*(k + j*t);
if (bit < SCALAR_WORDS * WBITS) {
if (bit < DECAF_448_SCALAR_BITS) {
tab |= (scalar1x->limb[bit/WBITS] >> (bit%WBITS) & 1) << k;
}
}
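The guard in this hunk stops the comb loop from picking up bits beyond the scalar. Below is a minimal, self-contained sketch of that bit-gathering step, assuming 64-bit limbs. `comb_index` and `scalar_bits` are hypothetical names; `i`, `j`, `k`, `s` and `t` play the same roles as in the loop above.

```c
#include <assert.h>
#include <stdint.h>

/* Gather t scalar bits, spaced s apart, into a table index.
 * Bits at or beyond scalar_bits are treated as zero and never read. */
static unsigned comb_index(const uint64_t *limb, unsigned scalar_bits,
                           unsigned i, unsigned j,
                           unsigned s, unsigned t) {
    unsigned tab = 0;
    assert(t <= 8 * sizeof(tab));            /* index must fit in tab */
    for (unsigned k = 0; k < t; k++) {
        unsigned bit = i + s * (k + j * t);
        if (bit < scalar_bits) {
            tab |= (unsigned)((limb[bit / 64] >> (bit % 64)) & 1) << k;
        }
    }
    return tab;
}
```

For example, with a one-limb scalar `0x5`, `s = 1`, `t = 4` and `i = j = 0`, the loop visits bits 0 through 3; passing `scalar_bits = 3` makes it skip bit 3 rather than read it.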
......
......@@ -24,45 +24,6 @@ typedef struct field_t field_a_t[1];
#define IF32(s)
#endif
/** @brief Bytes in a field element */
#define FIELD_BYTES (1+(FIELD_BITS-1)/8)
/** @brief Words in a field element */
#define FIELD_WORDS (1+(FIELD_BITS-1)/sizeof(word_t))
/* TODO: standardize notation */
/** @brief The number of words in the Goldilocks field. */
#define GOLDI_FIELD_WORDS DIV_CEIL(FIELD_BITS,WORD_BITS)
/** @brief The number of bits in the Goldilocks curve's cofactor (cofactor=4). */
#define COFACTOR_BITS 2
/** @brief The number of bits in a Goldilocks scalar. */
#define SCALAR_BITS (FIELD_BITS - COFACTOR_BITS)
/** @brief The number of bytes in a Goldilocks scalar. */
#define SCALAR_BYTES (1+(SCALAR_BITS)/8)
/** @brief The number of words in the Goldilocks field. */
#define SCALAR_WORDS WORDS_FOR_BITS(SCALAR_BITS)
/**
* @brief For GMP tests: little-endian representation of the field modulus.
*/
extern const uint8_t FIELD_MODULUS[FIELD_BYTES];
/**
* Copy one field element to another.
*/
static inline void
__attribute__((unused,always_inline))
field_copy (
field_a_restrict_t a,
const field_a_restrict_t b
) {
memcpy(a,b,sizeof(*a));
}
/**
* Returns 1/sqrt(+- x).
*
......@@ -76,38 +37,21 @@ field_isr (
field_a_t a,
const field_a_t x
);
/**
* Batch inverts out[i] = 1/in[i]
*
* If any input is zero, all the outputs will be zero.
*/
void
field_simultaneous_invert (
field_a_t *__restrict__ out,
const field_a_t *in,
unsigned int n
);
/**
* Returns 1/x.
*
* If x=0, returns 0.
*
* TODO: this is currently unused in Decaf, but I've left a decl
* for it because field_inverse is different (and simpler) than
* field_isqrt for 5-mod-8 fields.
*/
void
field_inverse (
field_a_t a,
const field_a_t x
);
/**
* Returns -1 if a==b, 0 otherwise.
*/
mask_t
field_eq (
const field_a_t a,
const field_a_t b
);
/**
* Square x, n times.
......@@ -135,53 +79,6 @@ field_sqrn (
}
}
static __inline__ mask_t
__attribute__((unused,always_inline))
field_high_bit (const field_a_t f) {
field_a_t red;
field_copy(red,f);
field_weak_reduce(red);
field_add_RAW(red,red,red);
field_strong_reduce(red);
return -(1&red->limb[0]);
}
static __inline__ mask_t
__attribute__((unused,always_inline))
field_make_nonzero (field_a_t f) {
mask_t z = field_is_zero(f);
field_addw( f, -z );
return z;
}
/* Multiply by signed curve constant */
static __inline__ void
field_mulw_scc (
field_a_restrict_t out,
const field_a_t a,
int64_t scc
) {
if (scc >= 0) {
field_mulw(out, a, scc);
} else {
field_mulw(out, a, -scc);
field_neg_RAW(out,out);
field_bias(out,2);
}
}
/* Multiply by signed curve constant and weak reduce if biased */
static __inline__ void
field_mulw_scc_wr (
field_a_restrict_t out,
const field_a_t a,
int64_t scc
) {
field_mulw_scc(out, a, scc);
if (scc < 0)
field_weak_reduce(out);
}
static __inline__ void
field_subx_RAW (
field_a_t d,
......@@ -214,40 +111,6 @@ field_add (
field_weak_reduce ( d );
}
static __inline__ void
field_subw (
field_a_t d,
word_t c
) {
field_subw_RAW ( d, c );
field_bias( d, 1 );
field_weak_reduce ( d );
}
static __inline__ void
field_neg (
field_a_t d,
const field_a_t a
) {
field_neg_RAW ( d, a );
field_bias( d, 2 );
field_weak_reduce ( d );
}
/**
* Negate a in place if doNegate.
*/
static inline void
__attribute__((unused,always_inline))
field_cond_neg (
field_a_t a,
mask_t doNegate
) {
field_a_t negated;
field_neg(negated, a);
constant_time_select(a, negated, a, sizeof(negated), doNegate);
}
/** Require the warning annotation on raw routines */
#define ANALYZE_THIS_ROUTINE_CAREFULLY const int ANNOTATE___ANALYZE_THIS_ROUTINE_CAREFULLY = 0;
#define MUST_BE_CAREFUL (void) ANNOTATE___ANALYZE_THIS_ROUTINE_CAREFULLY
......
......@@ -17,12 +17,6 @@ typedef struct p448_t {
extern "C" {
#endif
static __inline__ void
p448_set_ui (
p448_t *out,
uint64_t x
) __attribute__((unused,always_inline));
static __inline__ void
p448_add_RAW (
p448_t *out,
......@@ -37,24 +31,6 @@ p448_sub_RAW (
const p448_t *b
) __attribute__((unused,always_inline));
static __inline__ void
p448_neg_RAW (
p448_t *out,
const p448_t *a
) __attribute__((unused,always_inline));
static __inline__ void
p448_addw (
p448_t *a,
uint32_t x
) __attribute__((unused,always_inline));
static __inline__ void
p448_subw (
p448_t *a,
uint32_t x
) __attribute__((unused,always_inline));
static __inline__ void
p448_copy (
p448_t *out,
......@@ -70,11 +46,6 @@ void
p448_strong_reduce (
p448_t *inout
);
mask_t
p448_is_zero (
const p448_t *in
);
static __inline__ void
p448_bias (
......@@ -116,19 +87,6 @@ p448_deserialize (
/* -------------- Inline functions begin here -------------- */
void
p448_set_ui (
p448_t *out,
uint64_t x
) {
int i;
out->limb[0] = x & ((1<<28)-1);
out->limb[1] = x>>28;
for (i=2; i<16; i++) {
out->limb[i] = 0;
}
}
void
p448_add_RAW (
p448_t *out,
......@@ -165,39 +123,6 @@ p448_sub_RAW (
*/
}
void
p448_neg_RAW (
p448_t *out,
const p448_t *a
) {
unsigned int i;
for (i=0; i<sizeof(*out)/sizeof(uint32xn_t); i++) {
((uint32xn_t*)out)[i] = -((const uint32xn_t*)a)[i];
}
/*
unsigned int i;
for (i=0; i<sizeof(*out)/sizeof(out->limb[0]); i++) {
out->limb[i] = -a->limb[i];
}
*/
}
void
p448_addw (
p448_t *a,
uint32_t x
) {
a->limb[0] += x;
}
void
p448_subw (
p448_t *a,
uint32_t x
) {
a->limb[0] -= x;
}
void
p448_copy (
p448_t *out,
......
......@@ -17,12 +17,6 @@ typedef struct p448_t {
extern "C" {
#endif
static __inline__ void
p448_set_ui (
p448_t *out,
uint64_t x
) __attribute__((unused,always_inline));
static __inline__ void
p448_add_RAW (
p448_t *out,
......@@ -37,24 +31,6 @@ p448_sub_RAW (
const p448_t *b
) __attribute__((unused,always_inline));
static __inline__ void
p448_neg_RAW (
p448_t *out,
const p448_t *a
) __attribute__((unused,always_inline));
static __inline__ void
p448_addw (
p448_t *a,
uint32_t x
) __attribute__((unused,always_inline));
static __inline__ void
p448_subw (
p448_t *a,
uint32_t x
) __attribute__((unused,always_inline));
static __inline__ void
p448_copy (
p448_t *out,
......@@ -70,11 +46,6 @@ void
p448_strong_reduce (
p448_t *inout
);
mask_t
p448_is_zero (
const p448_t *in
);
static __inline__ void
p448_bias (
......@@ -116,19 +87,6 @@ p448_deserialize (
/* -------------- Inline functions begin here -------------- */
void
p448_set_ui (
p448_t *out,
uint64_t x
) {
int i;
out->limb[0] = x & ((1<<28)-1);
out->limb[1] = x>>28;
for (i=2; i<16; i++) {
out->limb[i] = 0;
}
}
void
p448_add_RAW (
p448_t *out,
......@@ -165,39 +123,6 @@ p448_sub_RAW (
*/
}
void
p448_neg_RAW (
p448_t *out,
const p448_t *a
) {
unsigned int i;
for (i=0; i<sizeof(*out)/sizeof(uint32xn_t); i++) {
((uint32xn_t*)out)[i] = -((const uint32xn_t*)a)[i];
}
/*
unsigned int i;
for (i=0; i<sizeof(*out)/sizeof(out->limb[0]); i++) {
out->limb[i] = -a->limb[i];
}
*/
}
void
p448_addw (
p448_t *a,
uint32_t x
) {
a->limb[0] += x;
}
void
p448_subw (
p448_t *a,
uint32_t x
) {
a->limb[0] -= x;
}
void
p448_copy (
p448_t *out,
......
......@@ -27,12 +27,6 @@ typedef struct p448_t {
extern "C" {
#endif
static __inline__ void
p448_set_ui (
p448_t *out,
uint64_t x
) __attribute__((unused,always_inline));
static __inline__ void
p448_add_RAW (
p448_t *out,
......@@ -47,24 +41,6 @@ p448_sub_RAW (
const p448_t *b
) __attribute__((unused,always_inline));
static __inline__ void
p448_neg_RAW (
p448_t *out,
const p448_t *a
) __attribute__((unused,always_inline));
static __inline__ void
p448_addw (
p448_t *a,
uint32_t x
) __attribute__((unused,always_inline));
static __inline__ void
p448_subw (
p448_t *a,
uint32_t x
) __attribute__((unused,always_inline));
static __inline__ void
p448_copy (
p448_t *out,
......@@ -80,11 +56,6 @@ void
p448_strong_reduce (
p448_t *inout
);
mask_t
p448_is_zero (
const p448_t *in
);
static __inline__ void
p448_bias (
......@@ -169,39 +140,6 @@ p448_sub_RAW (
*/
}
void
p448_neg_RAW (
p448_t *out,
const p448_t *a
) {
unsigned int i;
for (i=0; i<sizeof(*out)/sizeof(uint32xn_t); i++) {
((uint32xn_t*)out)[i] = -((const uint32xn_t*)a)[i];
}
/*
unsigned int i;
for (i=0; i<sizeof(*out)/sizeof(out->limb[0]); i++) {
out->limb[i] = -a->limb[i];
}
*/
}
void
p448_addw (
p448_t *a,
uint32_t x
) {
a->limb[0] += x;
}
void
p448_subw (
p448_t *a,
uint32_t x
) {
a->limb[0] -= x;
}
void
p448_copy (
p448_t *out,
......
......@@ -18,12 +18,6 @@ typedef struct p448_t {
extern "C" {
#endif
static __inline__ void
p448_set_ui (
p448_t *out,
uint64_t x
) __attribute__((unused));
static __inline__ void
p448_add_RAW (
p448_t *out,
......@@ -38,24 +32,6 @@ p448_sub_RAW (
const p448_t *b
) __attribute__((unused));