Update on category 1 security


Perlner, Ray A. (Fed)

Aug 17, 2020, 1:41:56 PM
to pqc-forum

Dear all,

 

We’ve gotten a lot of questions from submitters of lattice schemes, regarding what parameters we think meet security strength category 1. The short answer is that we aren’t sure yet – we think there are still scientific questions that need to be resolved. We're interested in having the best possible estimates of the security of each scheme; but we also realize that, to compare different schemes, we have to evaluate them using a common methodology. In round 3, we are aiming to find the right balance between these two goals. As we’ve asked submitters to be ready with tweaks by October 1, and we are aware that several of the teams are working towards better understanding the true costs of lattice reductions, we think it would be good to share our preliminary thoughts at this time.

 

At this point, we think the submitters are better placed than we are to precisely quantify the best known attacks along the various dimensions of number of bit operations (gates), total memory used, number of reads and writes, and memory access patterns (local vs. nonlocal). (Note here that we expect classical attacks to be the deciding factor for whether any of the Round 3 candidates meets category 1.) One place where we feel we might be able to add a bit to the discussion is how to combine these numbers into a single cost estimate.

 

There have been various cost metrics in the literature for doing this. The most common metrics we’ve seen are

 

  1. The RAM model (with basic operations converted to equivalent gate count)
  2. Time * Area (Circuit Model)
  3. Time * Volume (Circuit Model)

 

These models have the virtue of being fairly simple, but based on fundamental physical considerations, there are various aspects of each that suggest they may overestimate or underestimate the real-world costs of some attacks. An alternative approach, given that a category 1 classical attack is “only” something like 6 to 10 orders of magnitude more expensive than what a nation-state adversary might be capable of, is to extrapolate from current technology.

 

We can start out by assigning a dollar cost to NIST’s estimate that brute-force AES key search requires 2^143 gates. We’re pretty sure any attack on that scale would need to use custom hardware, like what’s used in bitcoin mining. By doing a little comparison shopping for bitcoin mining equipment, we estimated that the cost of 2^143 bit operations was about US $2^64. Most of this went towards paying for energy. The amount of energy involved is roughly what could be collected over the course of 500 years if the earth’s entire surface were tiled with solar panels.
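
As a rough sanity check on the energy figure (a back-of-the-envelope sketch; the $0.10/kWh electricity price, the average surface insolation, and the panel efficiency below are illustrative assumptions):

```python
# Energy implied by "about $2^64, mostly spent on electricity"
budget_usd = 2.0**64
price_per_kwh = 0.10                          # assumed electricity price, USD/kWh
energy_j = budget_usd / price_per_kwh * 3.6e6 # 1 kWh = 3.6e6 J

# Sunlight collected by panels tiling the earth's surface for 500 years
earth_surface_m2 = 5.1e14                     # land and sea
avg_insolation_w_m2 = 185.0                   # assumed average at the surface
panel_efficiency = 0.20                       # assumed
seconds_500y = 500 * 365.25 * 24 * 3600
solar_j = earth_surface_m2 * avg_insolation_w_m2 * panel_efficiency * seconds_500y

print(f"attack energy ~ {energy_j:.2e} J")
print(f"solar, 500 yr ~ {solar_j:.2e} J")
print(f"ratio         ~ {energy_j / solar_j:.1f}x")  # within a small factor
```

With these assumed figures the two totals agree to within a factor of two or so, which is all that "roughly" is meant to convey.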

 

For $2^64 over 500 years, we estimated that something like 2^95-2^100 bits of memory could be maintained. Most of this cost was for periodically replacing hardware, although the cost of power (e.g., assuming the storage technology in question was hard drives) was sometimes within an order of magnitude.
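
One way to land in this range (a rough sketch; the $400 per ~2^47-bit hard drive and the five-year replacement cycle are illustrative assumptions):

```python
from math import log2

budget_usd = 2.0**64
years = 500
drive_lifetime_years = 5                   # assumed replacement interval
generations = years / drive_lifetime_years

bits_per_drive = 2.0**47                   # roughly a 16 TB hard drive
usd_per_drive = 400.0                      # assumed street price

drives_per_generation = budget_usd / generations / usd_per_drive
bits_maintained = drives_per_generation * bits_per_drive
print(f"memory maintained ~ 2^{log2(bits_maintained):.1f} bits")  # ~2^95.7
```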

 

Given that heat dissipation considerations and the scale of the attack suggest that computing units making random access queries would be distributed around the globe, we found the limiting factor on memory access would probably be long-distance communication, e.g. via fiber optics. The cost was somewhat difficult to estimate, so we had a wide range of estimates for how much memory bandwidth could be afforded, but we think it’s fairly safe to say that it’s somewhere in the range of 2^105-2^120 bits’ worth of random access memory queries, with hardware costs exceeding energy costs by a couple of orders of magnitude. Sources of uncertainty included how much installation costs could be amortized away, how long various components last, and whether we should estimate based on market prices for commercial products (which typically transmit less than 100 Gbps on a single optical fiber) or based on wildly guessing the cost, if mass produced, of experimentally demonstrated hardware that can transmit tens of Tbps on a single-core fiber or hundreds of Tbps on a multicore fiber. Note that at the high end, local memory access speeds start to matter.
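
To give a sense of the scale, here is what the two ends of that range imply for aggregate bandwidth over the 500-year attack (a back-of-the-envelope sketch using only the per-fiber rates mentioned above):

```python
seconds_500y = 500 * 365.25 * 24 * 3600

for total_bits, per_fiber_bps, label in [
    (2.0**105, 100e9, "2^105 bits at 100 Gbps per fiber"),
    (2.0**120, 10e12, "2^120 bits at 10 Tbps per fiber"),
]:
    aggregate_bps = total_bits / seconds_500y
    fibers = aggregate_bps / per_fiber_bps
    print(f"{label}: ~{aggregate_bps:.1e} b/s aggregate, ~{fibers:.1e} fibers")
```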

 

This left us fairly confident that, for borderline category 1 attacks where memory costs make up a significant portion of the cost of an attack, the RAM model would almost always underestimate the cost, the time*area model would almost always overestimate the cost, and the time*volume model could be either an underestimate or an overestimate, depending on the details. For reference, the RAM model suggests no limit on memory size or memory bandwidth below 2^143 bits for category 1, and the time*area model suggests a limit of 2^95 bits’ worth of accesses to a memory of size 2^95 bits, or 2^105 accesses to a memory of size 2^76. The time*volume model suggests a limit of 2^107 accesses to a memory of size 2^107, or 2^110 accesses to a memory of size 2^100. Additionally, we expect both time*area and time*volume to overestimate the cost of memory access when the memory size per processing unit is large or the memory is infrequently accessed.
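
One way to reproduce these limits (a sketch, assuming that the number of random accesses is limited by bisection bandwidth, i.e. proportional to the square root of the memory per unit time in the 2-D time*area model and to the 2/3 power of the memory in the 3-D time*volume model, with time times area or volume capped at the 2^143 category 1 budget):

```python
from math import log2

BUDGET = 2.0**143   # category 1 gate budget

def max_accesses(mem_bits, dim):
    """Accesses = time * bisection bandwidth, with time * (area or volume)
    equal to BUDGET and area/volume proportional to the memory size."""
    time = BUDGET / mem_bits
    bandwidth = mem_bits ** ((dim - 1) / dim)   # sqrt(M) in 2-D, M^(2/3) in 3-D
    return time * bandwidth

for mem in (2.0**95, 2.0**76):
    print(f"time*area,   M = 2^{log2(mem):.0f}: ~2^{log2(max_accesses(mem, 2)):.1f} accesses")
for mem in (2.0**107, 2.0**100):
    print(f"time*volume, M = 2^{log2(mem):.0f}: ~2^{log2(max_accesses(mem, 3)):.1f} accesses")
```

Up to rounding, these come out at the figures quoted above.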

 

We can also consider the latency cost of random memory access on a global scale. Two random points on earth are about 10,000 km apart. Sending a request and a reply 10,000 km would take at least about 2^-4 seconds. This means that for random access memory, in 500 years we could compute to a RAM depth of up to 2^38. Suppose we instead model depth by treating each RAM access as a cubic nearest-neighbor circuit. Assuming a memory of size 2^99 bits, that model would assign the request and response a combined depth of 2^34, yielding a total depth of 2^72.
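
Spelled out (a quick sketch of the arithmetic above, using the speed of light in vacuum for the lower bound):

```python
from math import log2

c_m_per_s = 3.0e8
round_trip_m = 2 * 1.0e7                        # request + reply over ~10,000 km
latency_s = round_trip_m / c_m_per_s            # ~2^-4 s

seconds_500y = 500 * 365.25 * 24 * 3600
sequential_accesses = seconds_500y / latency_s  # ~2^38 after rounding

mem_bits = 2.0**99
side = mem_bits ** (1.0 / 3.0)                  # cube side in the 3-D model, ~2^33
per_access_depth = 2 * side                     # request + response, ~2^34

print(f"RAM depth over 500 years ~ 2^{log2(sequential_accesses):.1f}")
print(f"total 3-D circuit depth  ~ 2^{log2(sequential_accesses * per_access_depth):.1f}")
```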

 

Overall, we’re open to arguments that a given attack requiring fewer than 2^143 gates when costed in the RAM model might not contradict category 1 security if, for example, the depth of the computation (assuming a 3-D nearest-neighbor arrangement) exceeds 2^75, or the number of bits written to or read from random access memory exceeds 2^120, or the total size of the memory exceeds 2^100. We would of course need to be convinced that the attack was well optimized given the above criteria, e.g. that the attack really requires random access to memory as opposed to a more local memory access pattern. We’re also not sure how these limits should be extrapolated to the higher security strength categories. A rough guess is that everything other than depth should scale linearly in the exponent, but we’re even less sure about that than about everything we’ve previously said.

 

Attached are slides from a recent internal presentation where we discussed these issues. (Labeling has been added to slide 1, references have been added at the end, a commercial disclaimer has been added to slide 16, and a typo in an equation on slide 19 has been corrected. The slides are otherwise identical.)

 

We hope to be able to get a better understanding of the concrete analysis of lattice attacks, so we can determine how much the real world costs of these attacks are increased by memory considerations. Any comments on our thoughts thus far would be greatly appreciated.

 

Thanks,

 

NIST PQC team

 

cost models forum.pptx

daniel.apon

Aug 17, 2020, 2:19:07 PM
to pqc-forum, Perlner, Ray A. (Fed), pqc-forum
Hi Ray,
Speaking for myself.

I still think there had to have been a good slide in there somewhere where you could legitimately add an image of a Death Star.
Just saying. =)

Best,
--Daniel

D. J. Bernstein

Aug 17, 2020, 5:26:19 PM
to pqc-forum
Many (not all) numbers seem to be missing from the slides under Linux.
Can you please post a PDF?

> 2^105-2^120 bits worth of random access memory queries

To clarify, that's 2^120 bits from random spots around the globe fitting
the energy budget of 2^143 bit operations, so retrieving N bits from N
random spots costs <=2^23*N bit operations? (Let's assume big enough
chunks of bits that the routing data isn't the issue.)

Intel said in

https://www.openfabrics.org/images/eventpresos/workshops2015/DevWorkshop/Tuesday/tuesday_10.pdf

that a double-precision floating-point multiplication costs 6.4 pJ at
22nm (scaling "well with process and voltage", so actually <6.4 pJ with
a more modern process). That's around 2^14 bit operations, so 2^23*N bit
operations are around 512*6.4*N pJ (or less).

Intel also said that moving 64 bits straight through a wire costs 11.20
pJ per 5 mm ("more difficult to scale down"). Moving 18752*N bits
through the same 5mm wire thus costs 512*6.4*N pJ.
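
Spelling out the arithmetic (a sketch reproducing the numbers above; the 2^14 bit-operation equivalent of a double-precision multiplication is the rough conversion already used here):

```python
from math import ceil

fp_mul_pj = 6.4                  # Intel: one double-precision multiply at 22nm
bitops_per_fp_mul = 2**14        # rough bit-operation equivalent

# 2^143 bit operations spread over 2^120 retrieved bits = 2^23 bit ops per bit
bitops_per_bit = 2**(143 - 120)
budget_per_bit_pj = bitops_per_bit / bitops_per_fp_mul * fp_mul_pj
print(f"energy budget per retrieved bit ~ {budget_per_bit_pj:.1f} pJ (512 * 6.4 pJ)")

# Intel: moving 64 bits through 5 mm of wire costs 11.20 pJ
wire_pj_per_64bit_hop = 11.20
words = ceil(budget_per_bit_pj / wire_pj_per_64bit_hop)   # ~293 words of 64 bits
print(f"bits that cross one 5 mm hop for the same energy ~ {words * 64}")  # ~18752
```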

For comparison, a bit at some random spot on the globe is typically
around 10000000000 mm away. That's five orders of magnitude farther away
than 18752*5 mm, and it's at one of a huge number of different possible
positions. I understand that you have in mind fiber optics for the
long-distance communication, but fiber optics don't magically avoid
per-distance costs, plus there's a massive parallel routing problem at
every moment since we're talking about attacks bottlenecked by RAM
access. What's the global RAM architecture that you're referring to, and
how did you evaluate its costs?

For comparison, supercomputers on much smaller scales (tens of meters)
spend large fractions of their budgets on fiber-optic communication but
still don't have the bisection bandwidth necessary to constantly feed
new data to their processors. For example,

https://www.nextplatform.com/2018/06/26/peeling-the-covers-off-the-summit-supercomputer/

reports Summit's 200 petaflops dropping to 3 petaflops when there's lots
of communication: the bisection bandwidth is only 920 Tb/sec.
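
For scale (a back-of-the-envelope sketch, not a number from either source): feeding 2^120 random-access bits over the 500-year attack implies an aggregate bisection bandwidth vastly beyond Summit's.

```python
seconds_500y = 500 * 365.25 * 24 * 3600
aggregate_bps = 2.0**120 / seconds_500y   # ~8e25 b/s sustained

summit_bisection_bps = 920e12             # 920 Tb/sec, from the article above
print(f"required ~ {aggregate_bps:.1e} b/s")
print(f"Summits  ~ {aggregate_bps / summit_bisection_bps:.1e}")  # ~1e11 machines
```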

> the time*area model would almost always overestimate the cost

Sorry, can you please clarify the source of this claim?

---Dan

Perlner, Ray A. (Fed)

Aug 17, 2020, 6:05:16 PM
to D. J. Bernstein, pqc-forum
Here's a pdf
cost models forum.pdf

Kirk Fleming

Aug 18, 2020, 7:25:13 PM
to ray.p...@nist.gov, pqc-forum
"Perlner, Ray A. (Fed)" wrote:
> We’ve gotten a lot of questions from submitters of lattice
> schemes, regarding what parameters we think meet
> security strength category 1. The short answer is that we
> aren’t sure yet ...
 
> We hope to be able to get a better understanding of the
> concrete analysis of lattice attacks, so we can determine
> how much the real world costs of these attacks are
> increased by memory considerations.
 
It's important to avoid thinking of this as a problem just for
lattices. Many attacks against non-lattice schemes also
require exponential memory. Information set decoding is
a clear example of this.
 
The LEDAcrypt team have released code which gives
non-asymptotic cost estimates for six classical and two
quantum information set decoding algorithms. The
classical estimates are expressed in bit operations and
include a logarithmic cost for memory. The code is
available from github.com/LEDAcrypt/LEDAtools. More
details can be found in [Baldi, Barenghi, Chiaraluce,
Pelosi and Santini. "A finite regime analysis of information
set decoding algorithms." Algorithms, 2019].
 
Modifying the LEDAcrypt code so that it includes no cost,
logarithmic cost, cube root cost, or square root cost for
memory gives the following classical estimates for BJMM
with the Classic McEliece round 2 parameter sets.
 
   Parameter Set        Free    Log   Cbrt   Sqrt
   mceliece-3488-064:  142.5  148.8  156.7  160.4
   mceliece-4608-096:  182.9  189.0  198.2  202.0
   mceliece-6688-128:  253.4  260.1  275.5  279.6
   mceliece-6960-119:  252.7  259.6  276.8  281.0
   mceliece-8192-128:  285.1  292.2  313.8  318.4
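
In sketch form, one natural reading of these four conventions charges each memory access a factor of 1, log2(M), M^(1/3), or M^(1/2) for a memory of M bits (an illustrative sketch of that convention, not the LEDAcrypt code itself):

```python
from math import log2

def memory_penalty_bits(mem_bits, model):
    """Extra cost, in bits, added per memory access for a memory of mem_bits bits."""
    if model == "free":
        return 0.0
    if model == "log":
        return log2(log2(mem_bits))      # factor log2(M)
    if model == "cbrt":
        return log2(mem_bits) / 3.0      # factor M^(1/3)
    if model == "sqrt":
        return log2(mem_bits) / 2.0      # factor M^(1/2)
    raise ValueError(model)

# Example with a hypothetical 2^80-bit memory
for model in ("free", "log", "cbrt", "sqrt"):
    print(f"{model:>4}: +{memory_penalty_bits(2.0**80, model):5.1f} bits per access")
```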
 
There are a few interesting things to point out here.
 
 - mceliece-4608-096 fails to meet the classical target for
   Category 3 using any of the memory cost models.
 
 - mceliece-6688-128 and mceliece-6960-119 both appear
   to meet the classical target for Category 5 when a cube
   root or square root memory cost is included but Stern
   needs less than 2^50 memory for these parameter sets.
   Assuming a logarithmic memory cost for Stern gives
   estimates that are marginally below Category 5 (see
   Tables 4 and 5 in the paper).
 
 - Of the five Classic McEliece parameter sets only two,
   mceliece-3488-064 and mceliece-8192-128, clearly meet
   their classical targets.
 
Kirk

D. J. Bernstein

Aug 18, 2020, 8:40:42 PM
to pqc-...@list.nist.gov
Kirk Fleming writes:
> - Of the five Classic McEliece parameter sets only two,
> mceliece-3488-064 and mceliece-8192-128, clearly meet
> their classical targets.

That submission already says for (e.g.) mceliece6960119 that "Subsequent
ISD variants have reduced the number of bit operations considerably
below 2^256" but that "We expect that switching from a bit-operation
analysis to a cost analysis will show that this parameter set is more
expensive to break than AES-256 pre-quantum and much more expensive to
break than AES-256 post-quantum." The 281 that you mention matches this.

I agree with your broader point that the lack of clear, stable metrics
for NIST's category definitions isn't just a lattice issue. There's also
massive memory usage in state-of-the-art MQ attacks, SIKE attacks, the
BHT hash-collision algorithm, etc.

Why does the call for proposals say that SHA3-384 collisions require
2^210 gates, when BHT clearly does much better than this? A separate
NIST document cited my paper

https://cr.yp.to/papers.html#collisioncost

as a rationale for dismissing BHT, but that paper uses a different
metric (the 1981 Brent--Kung metric and a quantum version of it) to
account for communication costs. In other words, NIST already rejected
BHT on the basis of communication costs, but then mislabeled this as a
gates calculation, giving two contradictory messages regarding which
metrics NIST is using to define its categories.

---Dan (speaking for myself)

Perlner, Ray A. (Fed)

Aug 19, 2020, 10:53:51 AM
to D. J. Bernstein, pqc-forum
Hi Dan.

You ask:

" Why does the call for proposals say that SHA3-384 collisions require
2^210 gates, when BHT clearly does much better than this?"

The CFP specifically estimates the cost of a SHA3-384 collision as 2^210 classical gates. BHT only has fewer gates than this if its cost is evaluated in the quantum RAM model. In the circuit model, which is used in almost all concrete quantum resource estimates in the literature, BHT gives no advantage over the van Oorschot-Wiener algorithm (vOW97: https://people.scs.carleton.ca/~paulv/papers/JoC97.pdf ) -- the classical algorithm that was used to arrive at our estimate of 2^210 classical gates. I would argue that there are strong theoretical reasons to suspect that the quantum RAM model is a much less realistic model of computation than the classical RAM model, since I know of no proposal for a physically plausible implementation of a fault tolerant quantum RAM that performs significantly better than would be suggested by the circuit model cost of simulating the RAM via the methods in Beals et al. 2012 ( https://arxiv.org/abs/1207.2307 .)

A better case could be made that the hybrid classical/quantum algorithm of C, N-P, S 2017 (https://eprint.iacr.org/2017/847 ) may offer some small advantage over vOW when quantum operations adhere to a circuit model and classical operations adhere to a RAM model. But, even here, the advantage only exists when a computational depth of greater than O(2^(n/4)) for an n bit hash function is achievable. For SHA3-384, this is 2^96, ignoring constant factors which probably make it significantly worse. And, if one assumes something like Bennett's Brownian computation (See http://www.pitt.edu/~jdnorton/lectures/Rotman_Summer_School_2013/thermo_computing_docs/Bennett_1982.pdf ), where the energy cost of a gate is inversely proportional to how long it takes to perform, then any advantage from C, N-P, S 2017 disappears. An argument of this form was applied to BHT in my own paper with Yi-Kai Liu https://arxiv.org/abs/1709.10510 and it extends straightforwardly to C, N-P, S 2017.

You also mention your own paper: https://cr.yp.to/papers.html#collisioncost . I don't see the main conclusion of that paper (that vOW is at least as good as BHT) as relying on assuming communication costs proportional to the square root of the width of the computation. E.g. figure 1.2 from your paper already shows the main conclusion while assuming that communication is free.

-- Ray


D. J. Bernstein

Aug 19, 2020, 11:09:26 PM
to pqc-...@list.nist.gov
'Perlner, Ray A. (Fed)' via pqc-forum writes:
> The CFP specifically estimates the cost of a SHA3-384 collision as
> 2^210 classical gates. BHT only has fewer gates than this if its cost
> is evaluated in the quantum RAM model.

That's a standard model in the literature on quantum algorithms. The
word "gates" doesn't exclude this model. See, e.g., Ambainis's famous
element-distinctness paper https://arxiv.org/pdf/quant-ph/0311001.pdf
using "random access gates" in the model of computation.

Excluding RAM gates would be an earthquake for pretty much the entire
literature building interesting quantum-walk algorithms, including
cryptanalytic algorithms such as 2018 Kirshanova. If NIST wants to
exclude RAM gates then it needs to make this clear.

In June 2020 you announced four principles for evaluating a metric.
Three of those principles say that it's bad to switch from the metric in
Ambainis's paper to a metric that excludes RAM gates:

* "The value of the proposed metric can be accurately measured (or at
least lower bounded) for all known attacks": Quantum algorithms,
notably quantum-walk papers, are constantly using RAM gates.
Forcing these gates to be replaced by sequences of (e.g.) T, H, and
CNOT complicates the analysis and optimization. How do you expect
people to come up with measurements, or useful lower bounds, for
these algorithms within the NISTPQC timeframe?

* "We can be reasonably confident that all known attacks have been
optimized with respect to the proposed metric": Again, quantum-walk
papers are _trying_ to optimize in a model that allows low-cost RAM
gates. There's little evidence of people trying to re-optimize
these algorithms for more restricted models of computation.

* "The proposed metric will more accurately reflect the real-world
feasibility of implementing attacks with future technology": This
is the only principle that favors discarding low-cost RAM gates.

* "The proposed metric will not replace these underestimates with
overestimates": This again opposes eliminating RAM gates. People
forced to produce estimates of RAM cost in a more restricted model
of quantum computation will plug in 2018 Babbush--Gidney--Berry--
Wiebe--McClean--Paler--Fowler--Neven, which gives an upper bound on
the number of T-gates used, with free H and CNOT, to simulate a RAM
lookup of an n-bit table entry x[i] given an m-bit index i. This
gives an overestimate in the context of typical higher-level
algorithms, since it's missing all the opportunities to share work
across multiple table lookups by, e.g., sorting.

Note also that counting CNOT as _zero_ cost, as in the 2018 paper,
raises questions that show up in some algorithmic contexts but that
most algorithm designers haven't considered: how much can we reduce
multiplications if we can do massive linear maps for free? Instead
counting CNOT as, say, 1% the cost of T makes me think that
Kronrod-type algorithms would beat the 2018 paper when m and n are
both large, but I haven't seen any papers quantifying this. The
general theme here is that, no, people have not been optimizing
most attack algorithms with respect to these metrics.

It's not clear what weights NIST is applying to these principles when
they are in conflict, as in this case. (I'm a big fan of accuracy, and
I'm not a big fan of encouraging fake confidence, so I'd be happy to see
NIST throwing away RAM gates, faster-than-light communication, magical
power transport into three-dimensional parallel databases, etc.)

It's astonishing that, four years into NISTPQC, we still don't have
clear, stable definitions of the metrics that NIST is using to specify
its security "categories". It doesn't look good for NIST to be applying
definitions selectively to _some_ attack algorithms and to _some_
submissions. Clear, stable definitions would let everybody go through
the exercise of evaluating, e.g., BHT in the announced metrics and
seeing whether BHT outperforms VoW in any of those metrics.

> In the circuit model, which is used in almost all concrete quantum
> resource estimates in the literature

"The" circuit model? There are multiple circuit models in the quantum
literature, just as in the non-quantum literature. Ambainis's paper uses
a circuit model. The details matter.

I agree that there are papers quantifying gates for Shor (vs., e.g.,
RSA) and Grover (vs., e.g., AES) and Kuperberg (vs., e.g., CSIDH) while
not using RAM gates. However, the fact that a paper isn't using some
feature of a model does _not_ imply that the paper is opposing the
model. These papers are meaningful in, e.g., Ambainis's model. RAM gates
can help for a few extra optimizations in some of these algorithms but
don't seem to have the sort of dramatic impact that one sees in BHT etc.

The 2018 paper mentioned above is an exception---the paper would be
silly in a model that assigns low cost to RAM gates---but this is very
far from "almost all" papers on "concrete quantum resource estimates".

The bigger picture is that algorithm designers naturally gravitate
towards models that make their algorithms simple, such as RAM models.
This isn't the same as making the definitions simple, and it isn't the
same as reality. Single-tape Turing machines are simpler to define than
RAM models but harder to program. VLSI models are more realistic than
RAM models but harder to program. There's a subcommunity digging into
bit-level optimizations, layout optimizations, etc., but there's also a
strong argument for optimizing algorithms at the easy level first.

> I would argue that there are strong theoretical reasons to suspect
> that the quantum RAM model is a much less realistic model of
> computation than the classical RAM model

What about putting a laser into a superposition of angles, firing it
across the room at a wall of data (encoded as filters and mirrors), and
seeing what comes back?

Of course reality is more complicated: e.g., real lasers spread. The
communication costs end up being Omega(distance), just like every other
known communication mechanism. But I haven't seen any papers claiming to
prove that low-cost quantum RAM is contrary to the laws of physics.

Meanwhile you seem to have been saying that, since the laws of physics
aren't known to prevent data from moving without energy to a specified
location any distance away, we should allow this operation for free in
our algorithms. What makes this plausible and quantum RAM implausible?

When I look at a supposedly physical model of computation, I ask where
the data is stored and how errors are corrected. 2-D quantum circuits
have had clear answers to this for many years: if we can place atoms
close enough to a specified configuration then, according to everything
we know about physics, they'll run a quantum computation. Other models
that say "We don't know where the atoms should go but maybe someone will
have an idea" sound to me like "We don't know how to use less memory in
this algorithm but maybe someone will have an idea". Sure, we should
recognize the security risks of possible attack improvements, but in the
meantime let's accurately analyze the _known_ attacks.

One important purpose of accurate analyses of known algorithms is to be
able to see when algorithms are improving. ("Maturity of analysis" is
one of the NISTPQC evaluation criteria.) This evaluation is damaged if
accurate analyses are replaced by claimed or proven lower bounds: this
hides small improvements, and then people are surprised when the small
improvements add up to breaking the lower bounds, as we've seen for
lattices. It's a mistake to confuse known attacks with possible
improvements, whether the confusion is at an "algorithmic" level or an
"implementation" level or an "architectural" level.

> I don't see the main conclusion of that paper (that vOW is at least as
> good as BHT) as relying on assuming communication costs proportional
> to the square root of the width of the computation. E.g. figure 1.2
> from your paper already shows the main conclusion while assuming that
> communication is free.

Assigning realistic costs to memory has two aspects. #1 is common-sense
resource allocation, saying that someone who can afford massive storage
can also afford a maybe-not-quite-as-massive number of processing units.
#2 is locality, saying that costs increase with distance. The paper
computes all results in a #1+#2 model, while also showing results in a
#1 model to make the component effects clear. The paper doesn't endorse
leaving out #2: on the contrary, it says that leaving out #2 is making
"the naive assumption that communication is free".

(The gap between two-dimensional models such as Brent--Kung and
science-fiction three-dimensional models such as Preparata makes a
smaller difference in the numbers, basically going 1/3 of the way from
#1+#2 to #1. The paper briefly mentions three-dimensional models.)

Regarding BHT, I'm not sure what you're disagreeing with in what I
wrote. I'm disputing NIST's claim in the call for proposals that BHT
requires as many gates as VoW, around 2^210 for SHA3-384. There are
various other metrics that account for communication costs and make BHT
as expensive as VoW, and NIST rejected BHT on this basis, but NIST
mislabeled this as a gates calculation. The fact that BHT can be
rejected by #1 _or_ #1+#2 _or_ other possibilities helps illustrate the
lack of clarity regarding the definitions of the categories.

---Dan (speaking for myself)

daniel.apon

Aug 22, 2020, 4:27:43 PM
to pqc-forum, D. J. Bernstein, pqc-...@list.nist.gov
Hi Dan,
(Speaking for myself)

As is typical when you write a 14-paragraph response (sorry! -- a 26-paragraph response), I am happy to respond to whichever points seem to me to be the most important, or that otherwise best explain my current thinking (or NIST's thinking, insofar as I understand it myself as a member of the team). When the situation with some technical issue rises to a sufficient threshold, we will discuss beforehand and respond as a team.

So, it is my hope that this effort in responding to a specific point does not come across as a "lack of transparency" simply by the effort to respond, nor on the other hand, as a bad-faith "Gish gallop" argument on your part simply by the multitude of paragraphs you've presented. (Hopefully we can work through whatever issues you feel are the most important before you present another ~26 paragraphs for discussion on unrelated topics.)

Please note that we have our own work to do aside from responding to every single comment you personally make on this forum.
For example, we (as human beings) judge the importance of responding to issues raised on this forum, in part, by the presence of commentary by multiple people from the broader community as well as our own availability.
(We can all agree that it would be quite unfortunate if the NIST PQC team's entire work days were consumed by working through responses to a pqc-forum filibuster by one member of the community on the forum.)

-----

That said--


Dan wrote:
> * "The proposed metric will more accurately reflect the real-world
> feasibility of implementing attacks with future technology": This
> is the only principle that favors discarding low-cost RAM gates.  

Yes, this is the core point here.


As you wrote:
> I'm a big fan of accuracy, and I'm not a big fan of encouraging fake confidence, so
> I'd be happy to see NIST throwing away RAM gates...

it is clear that the RAM model excludes a number of real-world issues regarding the costing of memory access.
Naturally, the RAM model is most useful in the context of efficiently sorting a list of a few thousand elements on one's home PC for an introductory algorithms course assignment in undergrad or grad school.
However, as the computation becomes more complex and much larger, the RAM model no longer "properly models" what is going on in real life.

Much of our perspective on this issue so far is laid out in the slides that Ray posted in the initial thread here. Yes, the issue is complex, and we hope that submission teams independently work through these issues when they consider tweaks for the 3rd Round (the final round for many such submissions, one way or the other).
As a separate matter, it was our hope that posting the slides of our internal meeting on this topic represents an extraordinary, historically unique instance of transparency to the public. You are welcome to tweet otherwise.

-----


Further, while I'll leave the BHT vs VoW issue (or other issues) as something Ray might respond to, you wrote:
> It's astonishing that, four years into NISTPQC, we still don't have
> clear, stable definitions of the metrics that NIST is using to specify
> its security "categories". It doesn't look good for NIST to be applying
> definitions selectively to _some_ attack algorithms and to _some_ submissions.  

The definitions of the metrics NIST is using to specify security categories have not changed since the Call for Proposals. They have been stable for the entire process.

Indeed, NIST's role in measurement science includes an active effort to specialize the prior, stable definitions to the precision of very low-order targets as the process proceeds.
In particular, the definition of Category 1 as "hard to break as AES-128" has not changed since the Call for Proposals.

Nonetheless, when one specializes to the exact issue of NIST PQC lattice-based KEMs' real-world security with respect to CoreSVP strength in the context of "as real-world as possible" machine and memory modeling, there remains a difference of scientific opinion within the broader scientific/research community on the low-order details of this issue.

(This should not be a surprise to you. You've argued on this issue at length elsewhere.)

As we've indicated to you (via your private input to us) and to other teams previously, one can simply 'mask' the imprecision of this issue by increasing the CoreSVP strength of given parameter sets by a very small amount; i.e., a handful of bits, 6-7 or so. Alternatively, a team could choose to make a very rigorous argument that they can get away with 2 or 3 bits of lower CoreSVP strength by diving down the rabbit hole of machine and memory modeling. We have chosen to leave both of those options open to every team in the 3rd Round, and to make those decisions available for public vetting over the next 11-17 months.

Hope this is helpful,
--Daniel Apon

Watson Ladd

Aug 24, 2020, 9:52:27 AM
to daniel.apon, D. J. Bernstein, pqc-...@list.nist.gov
On Sat, Aug 22, 2020 at 4:27 PM 'daniel.apon' via pqc-forum
<pqc-...@list.nist.gov> wrote:
<snip>
> Indeed, NIST's role in measurement science includes an active effort in specializing the prior, stable definitions to precision of very low-order targets as the process proceeds.
> In particular, the definition of Category 1 as "hard to break as AES-128" has not changed since the Call for Proposals.
>
> Nonetheless, when one specializes to the exact issue of NIST PQC lattice-based KEMs' real-world security with respect to CoreSVP strength in the context of "as real-world as possible" machine and memory modeling, there remains a difference of scientific opinion within the broader scientific/research community on the low-order details of this issue.
>
> (This should not be a surprise to you. You've argued on this issue at length elsewhere.)
>
> As we've indicated with you (via your private input to us) and with other teams previously, one can simply 'mask' the imprecision with this issue by increasing CoreSVP strength of given parameter sets by a very small amount; i.e. a handful of bits, 6-7 or so. Alternatively, a team could choose to make a very rigorous argument that they can get away with 2 or 3 bits of lower CoreSVP strength by diving down the rabbit hole of machine and memory modeling. We have chosen to leave both of those options open to every team in the 3rd Round, and make those decisions available for public vetting over the next 11-17 months.

I think this is unfortunate from a process perspective. Ideally given
submissions A and B we'd be able to compare their security and
performance against the best attacks and relative to AES-128. But if
team B goes and adds a few more dimensions, while team A argues that the
exact same attack applying to both A and B is actually a few bits worse
than anticipated, then submission B is slower and might not get
"credit" for being more secure than submission A. This may cast doubt
on the final choice, akin to the issues with SHA3, where an
unrealistic preimage security parameter was later changed post-contest
despite influencing the performance of many submissions negatively.

Imagine we had a PreQuantum contest, and were tasked with evaluating
RSA vs prime field Diffie Hellman. The impact of parallelism and multi
targets matters for both, and does so differently from AES, and does
so differently in different models of computation. But the underlying
attacks are very similar, differing in one slight way. In this
hypothetical I think the right thing would be to look at the
similarity of the problems underlying them, and separately develop the
best attacks in the models, vs. letting the RSA team carefully
evaluate the asymptotics while the prime field DH team throws on more
bits.

Sincerely,
Watson Ladd

daniel.apon

Aug 24, 2020, 2:30:03 PM
to pqc-forum, watso...@gmail.com, D. J. Bernstein, pqc-...@list.nist.gov, daniel.apon
Hi Watson,
(Again, speaking for myself)

Thanks for your comments! (Apologies for the length of my response ahead of time. =))

Let me explain my view of the core of the situation as follows: This memory / machine modeling issue (underlying this thread, and with a plethora of discussion in previous threads that motivated creating this discussion thread) is essentially specific to the case of the real-world, concrete security of the lattice-based candidates (mainly KEMs, but also certainly relevant to signatures as well). As this issue is entirely generic, it would naturally have implications for the non-lattice candidates as well. However, for each possible comparison case involving purely non-lattice schemes, it doesn't appear that machine / memory modeling is a critical issue regarding the decision-point of whether to standardize this or that scheme. (That is, we will or will not standardize some non-lattice scheme essentially independent of this issue.)

So, in Round 3, we (still) have a plethora of lattice schemes, for which this issue is potentially relevant: DILITHIUM, Falcon, Frodo, KYBER, NTRU, sNTRUPrime, NTRULPRime, Saber.
Please allow me to point out the following issues:

1) For the most part, the lattice schemes' practical efficiency has a logarithmic dependence on CoreSVP strength. That is, if you want to increase CoreSVP strength from 2^X to 2^{X+5}, say, then roughly real-world security would increase from 2^Y to 2^{Y+5}. Correspondingly, in order to meet this goal, the dimension, modulus, and error distribution may change slightly, and the overall performance of the system (in terms of PK/SK/CT size, and KeyGen/Enc/Dec timing) would increase by, say, ~5-6%. (This is in the context of Category 1 security, where the security parameter is noticeably close to 100.)

2) While most schemes can target their choice of CoreSVP strength, there is one caveat: For most lattice schemes, there are 'algebraic constraints' on the choice of dimension, module-rank, modulus, etc. These constraints imply that certain CoreSVP values may not be naturally achievable. For example, a scheme may be able to efficiently target, say, 124 CoreSVP strength and 130 CoreSVP strength, but the scheme may not naturally be able to target something in the range 125-129 without significant additional efficiency-cost. (Many submissions include a script that allows one to find such 'equilibrium points' in their parameter selections, and then choose among those.)

Therefore, the choice for each lattice-candidate in the 3rd Round will be something as follows:

a) Hit 128/129/130 CoreSVP strength (as their parameter options allow), and definitely meet the Category 1 security target -- at the cost of definitely over-shooting the minimum real-world security for Category 1.
b) Hit (something like) 122/123/124 CoreSVP strength (as their parameter options allow), and make a simple argument that real-world memory costing makes up the 4-6 bit difference.
c) Hit (something like) 116-120 CoreSVP strength (as their parameter options allow), and then really convince everyone(!) that this is sufficient CoreSVP strength based on a strong argument of the possibilities of memory costs of future machines.
d) Hit something noticeably lower than the above thresholds, make a unique breakthrough argument, and hope that it can be formally vetted (by everyone, not just NIST) within the duration of the 3rd Round. (We'll call this the "risky option.")

Again, speaking for myself -- this is an attempt to convey information about my view of where people should target their tweaks for the 3rd Round. Historically (looking back at Round 1 and Round 2), some teams were very good about presenting exact CoreSVP strength values for each parameter set, and even fewer teams were very good about arguing the exact details of why their selection truly met the Category 1 requirements. On the other hand, some teams merely described the nature of BKZ-like attacks against their underlying computational assumptions and simply deferred to the (now somewhat outdated) "Estimate All The...!" paper/website.

Further, it's worth noting that the only "TLS-ready" (meaning, the 'closest to a drop-in replacement' for online protocols) signature schemes appear to be the lattice schemes: DILITHIUM and Falcon. At the same time, the lattice signature schemes tend to have presented parameters with noticeably lower CoreSVP strengths than the lattice KEMs. This is somewhat of a Catch-22: As a process issue, NIST would strongly prefer to standardize a TLS-ready post-quantum signature scheme at the end of the 3rd Round, and so we need to give the lattice signature teams an opportunity to re-evaluate their parameters of their own volition, and then provide a sufficient period of time for public vetting. At the same time, we need to treat the lattice KEM schemes fairly.

So, here is the final point:

The key deciding factor between this lattice KEM and that lattice KEM is extremely unlikely to be a 5% (or even 10%) difference in performance. (In that sense, the topic we're discussing now is not so important for the process.)
There are a variety of subtle, non-performance-related issues still to be resolved, which must take precedence. Here is a short list:

1) What is the relative effectiveness and cost of masking the lattice KEM candidates against side-channel attacks, like DPA/cold boot/template attacks/etc? (This requires masked implementations to be published, and then attacked or not, within the timespan of the 3rd Round..)
2) Does the side-channel situation change in the context of hardware implementations of these schemes?
3) Are there significant implementation challenges with one lattice scheme compared to another? How difficult is it for a 'reasonably-informed' engineer to produce safe, efficient code for this scheme or that?
4) Are lattices over cyclotomic number fields vulnerable to algebraic cryptanalysis, or not? (Related: Is the sNTRUPrime/NTRULPRime field vulnerable in a way that the cyclotomic rings of the Finalists are not?)
5) Are structured lattices generally vulnerable in a way unstructured lattices are not?
6) Are any of the remaining lattice KEMs overly patent-encumbered, particularly in a way that would dissuade free commercial adoption?
7) etc.

While it may end up that very specific, low-order efficiency comparisons of lattice schemes w.r.t. multi-target attack issues or parallelism or machine / memory modeling issues factor noticeably into the decision between this lattice scheme and that lattice scheme, these would be a key factor in a decision-point for NIST (speaking for myself) only if the above non-performance issues end up essentially "all equal."

Let me know if I can answer any other questions!
Best,
--Daniel

Watson Ladd

Aug 24, 2020, 9:06:33 PM
to daniel.apon, D. J. Bernstein, pqc-...@list.nist.gov
On Mon, Aug 24, 2020 at 2:30 PM daniel.apon <danie...@nist.gov> wrote:
>
> Hi Watson,
> (Again, speaking for myself)
>
> Thanks for your comments! (Apologies for the length of my response ahead of time. =))

I appreciate its thoroughness!

> Further, it's worth noting that the only "TLS-ready" (meaning, the 'closest to a drop-in replacement' for online protocols) signature schemes appear to be the lattice schemes: DILITHIUM and Falcon. At the same time, the lattice signature schemes tend to have presented parameters with noticeably lower CoreSVP strengths than the lattice KEMs. This is somewhat of a Catch-22: As a process issue, NIST would strongly prefer to standardize a TLS-ready post-quantum signature scheme at the end of the 3rd Round, and so we need to give the lattice signature teams an opportunity to re-evaluate their parameter of their own volition, and then provide a sufficient period of time for public vetting. At the same time, we need to treat the lattice KEM schemes fairly.

This makes sense to me, but I'd like to push back against the need for
a TLS-ready signature scheme.

As an occasional implementor and someone involved (albeit
peripherally) in the design of TLS 1.3, I feel this is a nonissue. TLS
worked with RSA in a KEM-like way in TLS 1.2, TLS 1.1, etc. It can
work that way in the future; there has been discussion of what is
necessary to do this because of the problems with signature schemes
already. We at $DAYJOB have deployed mechanisms that can use offline
signatures on short-lived KEM keys if for some reason certificates
cannot be made with KEMs (but again, RSA certs were a thing).
Furthermore, the timeline for authentication keys is very different
from encryption keys: data encrypted today can be read tomorrow, but
authentication today cannot be undone tomorrow. This militates in
favor of not feeling compelled to approve a high-performance signing
algorithm. Verification time matters more: we have perennial problems
with P384 for that reason.

I'd hate to see NIST enshrine a signature scheme prematurely or, even
worse, enshrine the use of SIGMA-like protocols when KEM-based ones
are more efficient in a post-quantum world.

Sincerely,
Watson Ladd

Panos Kampanakis (pkampana)

Aug 24, 2020, 11:09:03 PM
to Watson Ladd, daniel.apon, pqc-...@list.nist.gov
Hi Watson,

> I'd hate to see NIST enshrine a signature scheme prematurely

I agree. And I would like to see the PQ signature finalists come up with new
Level 1 parameters with ~128 bits of classical security.

But it would benefit the community to have a PQ signature scheme at the end of
Round 3 imo.

Below are some issues with your suggestion:

> TLS worked with RSA in a KEM-like way in TLS 1.2, TLS 1.1, etc. It can
> work that way in the future;

PQ KEMs cannot serve in place of RSAEncrypt for PKI certs and cert chains. The
verifier will not be able to decapsulate the ciphertext in the cert.

> there has been discussion of what is necessary to do this because of the
> problems with signature schemes already. We at $DAYJOB have deployed
> mechanisms that can use offline signatures on short-lived KEM keys if for
> some reason certificates cannot be made with KEMs (but again, RSA certs
> were a thing). Furthermore

I am not sure every TLS use case will want to pick the potential mechanisms
you have in mind.

Also, I am not sure about the details of these mechanisms, but on a similar
note https://eprint.iacr.org/2020/534 proposes KEMTLS with static and
ephemeral KEM keys. KEMTLS introduces changes to TLS which can definitely not
be deployed trivially, but even then the cert chain still uses PQ signatures.

Additionally, let's not forget there are other long-lived use cases that use
signatures and need PQ ones. One example is signed BIOS images, which live
without being upgraded; those users would rather see PQ signatures
standardized sooner rather than later.

Panos



Watson Ladd

Aug 25, 2020, 5:25:49 PM
to Panos Kampanakis (pkampana), daniel.apon, pqc-...@list.nist.gov


On Mon, Aug 24, 2020, 11:08 PM Panos Kampanakis (pkampana) <pkam...@cisco.com> wrote:
> Hi Watson,
>
> > I'd hate to see NIST enshrine a signature scheme prematurely
>
> I agree. And I would like to see the PQ signature finalists come up with new
> Level 1 parameters with ~128 bits of classical security.
>
> But it would benefit the community to have a PQ signature scheme at the end of
> Round 3 imo.

Nothing I said is against that.

> Below are some issues with your suggestion:
>
> > TLS worked with RSA in a KEM-like way in TLS 1.2, TLS 1.1, etc. It can
> > work that way in the future;
>
> PQ KEMs cannot serve in place of RSAEncrypt for PKI certs and cert chains. The
> verifier will not be able to decapsulate the ciphertext in the cert.

Signatures on end-entity certs are the canonical offline case. The chain isn't signed per connection. This changes the relative importance of signing and verification speed vs. the key in the end-entity cert.


> > there has been discussion of what is necessary to do this because of the
> > problems with signature schemes already. We at $DAYJOB have deployed
> > mechanisms that can use offline signatures on short-lived KEM keys if for
> > some reason certificates cannot be made with KEMs (but again, RSA certs
> > were a thing). Furthermore
>
> I am not sure every TLS use case will want to pick the potential mechanisms
> you have in mind.
>
> Also, I am not sure about the details of these mechanisms, but on a similar
> note https://eprint.iacr.org/2020/534 proposes KEMTLS with static and
> ephemeral KEM keys. KEMTLS introduces changes to TLS which can definitely not
> be deployed trivially, but even then the cert chain still uses PQ signatures.

A new signature scheme is not trivial to deploy either, although slightly easier. We've seen substantial changes to TLS over the past 5 years get deployed. Why must TLS always use online signatures when KEMs could be faster? It doesn't!


> Additionally, let's not forget there are other long-lived use cases that use
> signatures and need PQ ones. One example is signed BIOS images, which live
> without being upgraded; those users would rather see PQ signatures
> standardized sooner rather than later.

These have tight demands on verification but not signing time, and require very high confidence. That's different from online signing in TLS. My comment was purely about the need for signatures for end-entity authentication in TLS.

Panos Kampanakis (pkampana)

Aug 25, 2020, 11:38:41 PM
to Watson Ladd, daniel.apon, pqc-...@list.nist.gov

I think I understand your point better now. You were not saying that NIST should not standardize a PQ Signature scheme for TLS. You were suggesting that NIST should not feel pressure to standardize an insecure PQ Signature scheme with fast signing for TLS because we can introduce KEMs for auth instead (similar to what draft-ietf-tls-semistatic-dh does for Diffie-Hellman).

 

I don’t think PQ KEMs in TLS are straightforward, which I explain more below; but regardless of that I would rephrase your point as “NIST should not feel pressure to standardize any insecure scheme”.

 

> A new signature scheme is not trivial to deploy either, although slightly easier. We've seen substantial changes to TLS over the past 5 years get deployed. Why must TLS always use online signatures when KEMs could be faster? It doesn't!

 

I would argue that introducing PQ KEMs in the end entity cert is not as simple as with Diffie-Hellman in draft-ietf-tls-semistatic-dh. PQ KEMs in the leaf cert PK would automatically mean an extra round-trip (for the peer to get the PK before encapsulating something), an altered state machine and potentially a different security analysis for the protocol; all with doubtful benefit. The last paragraph of [1] Section VII-C talks about this briefly. Additionally, changes like the ones in [2] with KEMTLS are even more drastic imo.

 

I am pretty sure no one wants to jump into standardizing a drastically different or slower TLS handshake that uses KEMs for auth unless we definitely have to. [1] shows that there are a couple of options from the NIST Round 3 Signature candidates that could do satisfactorily in TLS (certs + SCTs + OCSP) (assuming the algorithms are secure of course).

 

Anyway, we can have this discussion in the IETF TLS WG in due time, no need to address it here.

 

[1] https://eprint.iacr.org/2020/071

[2] https://eprint.iacr.org/2020/534

Douglas Stebila

Aug 26, 2020, 10:38:57 AM
to Panos Kampanakis (pkampana), Watson Ladd, daniel.apon, pqc-...@list.nist.gov
On Tue, Aug 25, 2020 at 11:38 PM 'Panos Kampanakis (pkampana)' via
pqc-forum <pqc-...@list.nist.gov> wrote:
>
> I would argue that introducing PQ KEMs in the end entity cert is not as simple as with Diffie-Hellman in draft-ietf-tls-semistatic-dh. PQ KEMs in the leaf cert PK would automatically mean an extra round-trip (for the peer to get the PK before encapsulating something), an altered state machine and potentially a different security analysis for the protocol; all with doubtful benefit. The last paragraph of [1] Section VII-C talks about this briefly. Additionally, changes like the ones in [2] with KEMTLS are even more drastic imo.
>
> I am pretty sure no one wants to jump into standardizing a drastically different or slower TLS handshake that uses KEMs for auth unless we definitely have to. [1] shows that there are a couple of options from the NIST Round 3 Signature candidates that could do satisfactorily in TLS (certs + SCTs + OCSP) (assuming the algorithms are secure of course).

KEMTLS is certainly a different handshake than signatures+ephemeral
KEMs for TLS 1.3, but it is not dramatically slower nor does it
automatically mean an extra round trip.

In terms of performance, when using NTRU or Module-LWE/SIS schemes,
KEMTLS is faster than signatures+ephemeral KEMs (although this is
marginal, since NTRU and module-LWE schemes are already very fast),
and has smaller communication bandwidth (especially when switching
server authentication from Dilithium to Kyber). If one chooses to use
SIKE for authentication instead of a faster PQ signature, then yes, it
is slower, but it does have even smaller communication bandwidth, if
that is your priority.

In terms of number of round trips, in the standard
client-request/server-response scenario, KEMTLS allows the client to
send its request in exactly the same flight as would be the case in
TLS 1.3, and receive the response in the same flight as in TLS 1.3.
TLS 1.3 does permit the scenario where the server sends application
data a flight before this, in the flight containing its certificate.
Few applications seem to use this pattern; the SETTINGS frame in
HTTP/3 is one such example. KEMTLS does not fully support that
scenario: data sent at this time in KEMTLS could be kept confidential
against a passive adversary, but would not be authenticated like in
TLS 1.3. (See our just-posted revision which has a more extensive
discussion of the security and downgrade resilience properties of
KEMTLS keys and data. https://eprint.iacr.org/2020/534.pdf)

That we can do PQ authenticated key exchange using KEMs for end-entity
authentication rather than signatures does not mean there is no need
for PQ signatures. And indeed a design like KEMTLS is a bigger change
than just substituting PQ signatures and KEMs into signed+DH
protocols. But there are concrete benefits, and we'll be making
changes anyway to add PQ algorithms, so it is worth keeping on the
table for now.

Douglas

Patrick Longa

Dec 18, 2020, 5:06:51 PM
to Perlner, Ray A. (Fed), pqc-forum

Dear all and NIST team,

On slide 18 of Ray’s talk (“Cost Models”), it is assumed that an attacker has a budget of $2^64 over a period of 500 years. To calculate the memory budget, this becomes an annual budget of $2^64/500 that leads to having access to up to 2^96.5 bits of HDD memory over the period of 500 years.

 

However, the analysis of some (if not most) attacks in the literature assumes access to the necessary hardware from the very start, when launching the attack (otherwise the time to carry out an attack successfully would need to take into account the addition/upgrade of hardware for CPU and memory as time goes by).

 

If we fix the goal of breaking AES128 to 500 years, I estimate that one needs

 

2^128 * (12.7 * 10^-9 sec) / (500 years * 365 * 24 * 3600 sec/year)  =  2^67.89 AES engines,

 

and this number of engines costs about 2^67.89 * 11590 GEs * $6.57*10^-9/GE = $2^54.21, assuming that an AES computation runs in 12.7 nsec, occupies 11,590 gate equivalents (GEs) and that a GE costs about $6.57*10^-9 (roughly 0.66 microcents) each [1]. Therefore, an attacker would need $2^54.21 to launch the attack ($2^64/500 per year was actually not a bad approximation if we assume that’s the starting budget at day 1).

 

Now let’s look at the cost of attacking some other scheme X with this initial budget ($2^54.21). For simplicity, let’s assume that for breaking X half of the money goes to get computing power and half of the money goes to memory resources (this of course depends on the scheme and the attack). In this case, we get at most

 

$2^53.21 * (2^47 bits / $400) = 2^91.6 bits at launch time (compare to the 2^96.5 bits on slide 18). So there might be (at least) a 5-bit difference here, and this appears to affect all the memory estimates on slide 18.

 

OTOH, (back to the AES analysis) if we look at the energy costs for the 500-year run, I get

 

2^128 * 2.02*10^-6 kW * (12.7 * 10^-9 sec) / (3600 sec/hour) * $0.10/kWh = $2^67.7, which exceeds the total budget of $2^64. This doesn’t include other costs that NIST takes into account in the analysis (e.g., the cost of replacing HDD hardware every 5 years, and the energy cost of the memory units).
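
For reference, the arithmetic above in one place (a sketch that simply reproduces the numbers as quoted from [1] and [2]; the 50/50 compute/memory split and the $400 per 2^47-bit hard drive are the assumptions stated above):

```python
from math import log2

seconds_500y = 500 * 365 * 24 * 3600

# AES engines needed to run 2^128 encryptions in 500 years [2]
aes_time_s = 12.7e-9
engines = 2.0**128 * aes_time_s / seconds_500y
print(f"AES engines      ~ 2^{log2(engines):.2f}")            # ~2^67.89

# Hardware cost of those engines [1]
ge_per_engine = 11_590
usd_per_ge = 6.57e-9
hw_cost_usd = engines * ge_per_engine * usd_per_ge
print(f"hardware cost    ~ $2^{log2(hw_cost_usd):.2f}")       # ~$2^54.21

# Memory purchasable at launch if half of that budget goes to memory
bits_at_launch = (hw_cost_usd / 2) * 2.0**47 / 400
print(f"memory at launch ~ 2^{log2(bits_at_launch):.1f} bits")  # ~2^91.6

# Energy for the 500-year AES run [2], at $0.10/kWh
aes_power_kw = 2.02e-6
energy_usd = 2.0**128 * aes_power_kw * aes_time_s / 3600 * 0.10
print(f"energy cost      ~ $2^{log2(energy_usd):.1f}")        # ~$2^67.7
```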

How is NIST calculating the energy cost for AES128 exactly? It’s somewhat unclear by just looking at slide 17.

 

(*) The 2.02mW figure for AES was provided by Rei Ueno, based on their implementation [2].

 

Apparently, some differences in all these calculations can be due to the numbers we use for AES (actual AES results for area, timing and energy from the implementation in [2] versus NIST’s AES gate complexity of 2^143).

 

Overall, I don’t see huge differences between NIST’s numbers and our analysis ([1] presents an analysis of this nature focused on SIKE). But maybe the relatively small differences are still something to think about, given that only a few bits might make the difference between a parameter set matching a particular security level or not. For some schemes slight differences in the assumptions can change the results of the analysis (this is especially true for attacks that require a large shared memory and whose costs rely heavily on the CPU:memory cost ratio and the budget that are assumed).

 

Any comments?

 

Best,

Patrick

 

[1] P. Longa, W. Wang and J. Szefer. “The Cost to Break SIKE: A Comparative Hardware-Based Analysis with AES and SHA-3”, 2020. https://eprint.iacr.org/2020/1457

[2] R. Ueno, N. Homma, S. Morioka, N. Miura, K. Matsuda, M. Nagata, S. Bhasin, Y. Mathieu, T. Graba, and J.-L. Danger. “High throughput/gate AES hardware architectures based on datapath compression”. IEEE Trans. Computers, 69(4):534–548, 2020.


Perlner, Ray A. (Fed)

unread,
Dec 21, 2020, 12:49:00 PM12/21/20
to Patrick Longa, pqc-forum

Hi Patrick.

 

Thanks for checking my calculations. Overall it seems like we’re getting fairly similar numbers. I’m less convinced that a bit or two really matters here, since what we’re actually worried about is technology a couple of orders of magnitude better than current technology – i.e. technology that might make breaking a category 1 scheme close to feasible. I think we can agree that no one is seriously worried that their keys are going to be cryptanalyzed after 500 years if someone just manages the simple task of blanketing the entire planet, land and sea, with solar panels. Given that we’re really worried about hypothetical future technology, it seems entirely plausible that the cost of computation and the cost of memory could fall by different amounts. Thus I see this exercise mostly as a sanity check to see whether an amount of memory, claimed to be prohibitive for the attacks relevant to a given security strength category, is actually prohibitive.

 

On specific points: The assumption that all hardware must be purchased in the first year of an attack lasting 500 years seems really weird to me. Also, as you said, the exact portion of the budget which should be allocated to memory as opposed to computation need not be 50%. This doesn’t make as big a difference as one might think, though, since the assumption I used in my calculation was that hard drives would wear out and need to be replaced after 5 years. Additionally, re-running the numbers, I got something closer to 2^95.5 when I multiplied out the equation on slide 18, so it seems there was a typo there.

You also asked about the calculation on slide 17: this was based on the specs for the bitcoin mining platform referenced earlier in the slide and the assumption that a “hash” is a double-SHA2 operation, taken to be about 2^19 gates. This gives:

 

2^143 gates * 1 hash / 2^19 gates / ( 1.1x10^14 hash / s) * 3.25 kW * $0.1/(kW h) * (1 h / 3600s) = $2^63.92
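A quick reproduction of that calculation in Python, using only the figures stated above (2^19 gates per double-SHA2 “hash”, 1.1*10^14 hash/s at 3.25 kW, $0.10/kWh):

from math import log2

GATES = 2**143            # gate-count estimate for AES-128 key search
GATES_PER_HASH = 2**19    # assumed gate count of one double-SHA2 "hash"
HASH_PER_SEC = 1.1e14     # hash rate of the bitcoin mining platform
POWER_KW = 3.25           # power draw of that platform
USD_PER_KWH = 0.10

machine_seconds = GATES / GATES_PER_HASH / HASH_PER_SEC
energy_cost_usd = machine_seconds * POWER_KW / 3600 * USD_PER_KWH
print(f"Energy cost: 2^{log2(energy_cost_usd):.2f} USD")   # ~2^63.92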

 

Finally, I should note that while the small differences in our estimates for things like the energy consumption of a single AES computation on optimized hardware don’t make very much difference, what does make a sizable difference is the methodology for how one deals with the fact that category 1 type attacks are several orders of magnitude beyond both the budget and timeline of real world adversaries, at least with existing technology.

For example, the approach in the eprint paper you cited (https://eprint.iacr.org/2020/1457) is to fix the adversary’s hardware budget at something basically on a human scale (a million to a billion dollars) while allowing attack timelines to stretch as long as they need to in order to compensate, even if that number turns out to be unfathomably large. For reference, the shortest attack contemplated by the paper is over 250,000,000 years long, or about 1000 times the amount of time anatomically modern humans have existed.

The problem with doing things this way becomes evident when you look at how the paper combines this model with extrapolations of technological progress. The paper extrapolates technological progress out 20 years, to 2040, or about a ten-millionth the length of the shortest attack it contemplates. This brings the timeline for a billion-dollar AES128 attack down from 8 billion years to 250 million years. Obviously the attacker would be better off waiting for technology to improve. (Also, if the adversary had invested the billion dollars in the stock market rather than rushing out to buy hardware, then assuming historical rates of return, he would have something like a 4 billion dollar hardware budget in 2040 rather than 1 billion.) But I see no reason to think 2040 would be an optimal time to start buying hardware either. Most likely, it would be optimal for the attacker to wait something like 80-100 years if technological improvements and investment returns continue at the same exponential rate, or longer if they slow down, resulting in an attack that takes decades rather than millions of years.

This can significantly change the comparative cost of memory-intensive attacks like the classical attacks on SIKE. The paper says SIKE 377 would take 2^7 times as long to attack as AES 128 with technology frozen in 2020, but only 2^4 times as long with technology frozen in 2040. If we blindly extrapolate these exponential trends to 2120, we’d find that the time to attack AES with technology frozen at that point, on a billion-dollar budget, would be something like 250 years, while the time to attack SIKE 377 would be something like 1 year. (Note: this is a back-of-the-envelope calculation based on exponents rounded to the nearest whole number. I expect using more precise numbers from the paper would yield slightly different figures, but I don’t expect it to change the overall conclusion that, barring cryptanalysis improvements, SIKE 377 is likely to be within the reach of an adversary with both a reasonable budget and timeline before AES128 is.)
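For what it’s worth, this back-of-the-envelope extrapolation can be written down explicitly. The 8-billion-year and 250-million-year figures and the 2^7/2^4 SIKE ratios are the rounded numbers quoted above; the script simply assumes the same exponential improvement continues:

from math import log2

# Billion-dollar attack times (rounded): AES-128 takes ~8e9 years with 2020
# technology and ~2.5e8 years with 2040 technology; SIKE-377 takes 2^7 times
# as long as AES in 2020 and 2^4 times as long in 2040.
aes_years = {2020: 8e9, 2040: 2.5e8}
sike_years = {2020: 2**7 * 8e9, 2040: 2**4 * 2.5e8}

def extrapolate(times, year):
    # assume the same factor of improvement every 20 years continues indefinitely
    halvings_per_20y = log2(times[2020] / times[2040])
    return times[2020] / 2 ** (halvings_per_20y * (year - 2020) / 20)

print(f"AES-128,  tech frozen in 2120: ~{extrapolate(aes_years, 2120):.0f} years")   # roughly 250
print(f"SIKE-377, tech frozen in 2120: ~{extrapolate(sike_years, 2120):.1f} years")  # roughly 1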

 

Best,

Ray

Patrick Longa

unread,
Dec 21, 2020, 2:27:49 PM12/21/20
to ray.p...@nist.gov, pqc-...@list.nist.gov

Hi Ray,

Thanks for the reply.

 

> The assumption that all hardware must be purchased in the first year of an attack lasting 500 years seems really weird to me.

 

What I meant was that the hardware available on day 1 should be what is used to estimate the time to run the attack. I agree with the idea of taking into account the cost of replacing hardware that breaks as time goes by, but if you want to consider the possibility of “upgrades” (future faster hardware, more memory, etc.), then the analysis gets much more complicated.

 

> Finally, I should note that while the small differences in our estimates for things like the energy consumption of a single AES computation
> on optimized hardware don’t make very much difference, what does make a sizable difference is the methodology for how one deals with
> the fact that category 1 type attacks are several orders of magnitude beyond both the budget and timeline of real world adversaries, at least
> with existing technology.

 

Yes, this is what I meant by “for some schemes slight differences in the assumptions can change the results of the analysis”. Given how much the results diverge depending on the assumptions for a scenario that is purely hypothetical, the main question is then how to decide what those assumptions are.

In our paper, we decided to try something that we think is (slightly) closer to reality than assuming an unrealistic budget with an unrealistic timeline. Basically, the convention was to use “somewhat manageable” budgets (we evaluated everything from millions up to trillions of dollars without much change in the results) and then determine the time to break AES and SIKE, which gives unrealistic timelines, as expected. Of course, one can turn this around and decide to assume a reasonably human timeline and then determine the cost of cryptanalysis.
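As an illustration of the two directions, here is a small sketch using only the per-engine AES figures from my earlier message (12.7 ns per computation, 11,590 GEs, $6.57*10^-9 per GE); it counts hardware cost only and ignores energy, memory and replacement costs:

from math import log2

KEYS = 2**128
AES_LATENCY_S = 12.7e-9
USD_PER_ENGINE = 11_590 * 6.57e-9
SECONDS_PER_YEAR = 365 * 24 * 3600

def years_given_budget(budget_usd):
    # fix a budget, derive the attack time
    engines = budget_usd / USD_PER_ENGINE
    return KEYS * AES_LATENCY_S / engines / SECONDS_PER_YEAR

def budget_given_years(years):
    # fix a timeline, derive the up-front hardware budget
    engines = KEYS * AES_LATENCY_S / (years * SECONDS_PER_YEAR)
    return engines * USD_PER_ENGINE

print(f"Fix a $1e9 budget  -> ~{years_given_budget(1e9):.1e} years to break AES-128")
print(f"Fix a 500-year run -> ~2^{log2(budget_given_years(500)):.2f} USD up front")   # ~2^54.21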

 

> Obviously the attacker would be better off waiting for technology to improve…. If we blindly extrapolate these exponential trends to 2120, …

 

Of course. But then how is that relevant to determining the strength of AES and other algorithms today, or during the next few decades?

It should be irrelevant whether SIKEp377 is believed to be weaker than AES128 in the year 2120 (is it reasonable to try to forecast that far into the future anyway?).

 

The larger issue boils down to the need for reasonably fair metrics for attacks that require very large shared memory. Should teams avoid more realistic models that use the cost of technology to analyze attacks, or should they simplify the analysis with potentially imprecise theoretical costs (e.g. using a square-root cost for memory accesses)? The risk is penalizing otherwise strong algorithms with a strong cryptanalytic profile, whose best attacks are forced to use huge amounts of shared memory, relative to schemes that fall to attacks requiring negligible memory.

 

Best,

Patrick     

Perlner, Ray A. (Fed)

unread,
Dec 21, 2020, 3:44:09 PM12/21/20
to Patrick Longa, pqc-forum

Hi Patrick

 

I take it as axiomatic that an attack which begins in 100 years and ends in 101 is more relevant to practical security than one that begins in 20 years and ends 250 million years from now.  I could see you arguing that what we’re really worried about are scenarios where an attack might finish even sooner than 101 years from now, and that’s fair. The only scenarios I can come up with involve a larger starting budget, hijacking computation resources without purchasing them, or cryptanalysis. I would expect the first two scenarios to behave more like my 101 year scenario than your 250 million year scenario in terms of the relative costs of attacking AES128 and SIKE377. Cryptanalysis is probably the biggest concern, but it’s hard to quantify the relative likelihood of a nontrivial attack on SIKE and AES.

Patrick Longa

unread,
Dec 21, 2020, 6:10:15 PM12/21/20
to ray.p...@nist.gov, pqc-...@list.nist.gov

Hi Ray,

It seems the discussion is turning philosophical at this point 😊. The question for me is not about which scenarios we should worry about the most (because it’s clear that we assume level 1 is infeasible with current technological trends for several decades). From my viewpoint, it is more about determining (relative) security fairly, and for that goal it’s more relevant to evaluate strength as assessed today (or in the next few years), not in 100 years.

 

I suppose we agree to disagree.

 

> The larger issue boils down to the need for reasonably fair metrics for attacks that require very large shared memory. Should teams avoid
> more realistic models that use the cost of technology to analyze attacks, or should they simplify the analysis with potentially imprecise
> theoretical costs (e.g. using a square-root cost for memory accesses)? The risk is penalizing otherwise strong algorithms with a strong
> cryptanalytic profile, whose best attacks are forced to use huge amounts of shared memory, relative to schemes that fall to attacks
> requiring negligible memory.

 

Do you have any additional comments on this aspect, beyond what NIST’s been saying on this forum?

 

All the best,

Patrick
