The cr.yp.to blog



2017.11.05: Reconstructing ROCA: A case study of how quickly an attack can be developed from a limited disclosure. #infineon #roca #rsa

[Authors of this blog post, in alphabetical order: Daniel J. Bernstein and Tanja Lange.]

Nemec, Sýs, Švenda, Klinec, and Matyáš announced on 16 October that millions of supposedly high-security RSA public keys from a popular Infineon software library ("v1.02.013") were actually vulnerable to low-cost attacks. Specifically, Infineon's 1024-bit keys take "less than 3 CPU-months" to factor "on a single core of a common recent CPU", and half as much time on average. Infineon's 2048-bit keys take less than "100 CPU-years" to factor. On an Intel Haswell running at only 3GHz (similar to Amazon's "c4" cores), the 2048-bit keys take "140.8 CPU-years" to factor.

The news said "To give people time to change keys, the paper describing the factorization method isn't being published until it's presented at the conference." The authors of the paper included in their announcement only limited information about their factorization method.

We decided to see whether we could reconstruct the attack from this limited information, rather than waiting for the paper to be released. We figured out the main ideas within a day. Within a week we sent the authors our own attack software, running even faster than theirs. We certainly weren't working full time on this during the week.

We emphasize that this work was not independent of what the paper authors did, and the fact that we were able to reconstruct the attack so quickly should not be viewed as criticism of the merits of the paper. We were starting from some information released by the authors. What we are saying is that, from a security perspective, this information was in fact the critical information in the paper. Here are some questions to think about:

An August 2016 paper by Švenda, Nemec, Sekan, Kvašňovský, Formánek, Komárek, and Matyáš had already pointed out that Infineon keys were visibly non-random; see Table 6 in the paper. Attackers could already have figured out the whole attack from these details. Or attackers could have looked at Infineon keys on their own and found the same information. Our best guess is that serious attackers found the Infineon vulnerability years ago and have been quietly exploiting it since then.

Costs of an attack. We are deeply concerned by the statement "Large-scale vote fraud is not conceivable due to the considerable cost and computing power necessary" regarding the Estonian ID cards used for e-voting. This statement appears to be based on the claim that breaking all 750000 ID cards would cost 60 billion Euros, which in turn is based on the claim that breaking one card would cost 80000 Euros. Actual attack costs are thousands of times lower, for several reasons:

We are happy to see Estonia finally blocking the Infineon keys starting 3 November, but tremendous damage could already have been done.

Certification. Infineon says that its "Fast Prime" function was "certified by the BSI (Federal Office for Information Security) in Germany. No mathematical weaknesses were known, nor have been discovered during the certification processes." The Estonian authorities say that compliance with "security requirements" has been "certified by the competent German and French certification bodies". There are several BSI certifications, ANSSI certifications, and so on for security systems that include Infineon's library v1.02.013: see, e.g., the BSI certification report for Infineon Security Controller M7892 B11 based on tests by TÜV Informationstechnik GmbH.

It seems that the prime-generation algorithm used by Infineon was never subjected to public review. It also seems that this failure was encouraged by the certification process. As stated in the paper:

The certification process counter-intuitively "rewards" the secrecy of design by additional certification "points" when an implementation is difficult for potential attackers to obtain—thus favoring security by obscurity. Relevant certification bodies might want to reconsider such an approach in favor of open implementations and specifications. Secrecy may increase the difficulty of spotting a flaw (above the capability of some attackers) but may also increase the impacts of the flaw due to the later discovery thereof.

Did BSI never ask Infineon to justify its prime-generation algorithm? Or did Infineon supply a justification with flaws that BSI failed to detect? Apparently Hanno Böck asked BSI for comments on this incident and didn't receive an answer. What is the value of certification if such weaknesses are not caught?

The reconstruction process

The rest of this blog post dives into the details of (1) what information was released and (2) our process of reconstructing an attack from this information.

Divisors in residue classes. As part of their 16 October announcement, the paper authors posted a tool for checking whether a key is generated by Infineon's library. We inspected the source code of the tool to see that an Infineon key N has a limited set of remainders modulo various small primes. For example, N is guaranteed to be 1 or 10 modulo 11, and 1 or 10 or 26 modulo 37, whereas properly generated RSA keys are equally distributed across 10 different possibilities modulo 11 and 36 different possibilities modulo 37.

The numbers 1, 10, 26 modulo 37 are exactly the powers of 10 modulo 37. How could Infineon be generating primes p and q so that the product N = pq is a power of 10 modulo 37? The simplest explanation is that each of p and q is generated as a power of 10 modulo 37, forcing the product N = pq to be a power of 10 modulo 37. Similar comments apply when 37 is replaced by various other possibilities.
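These residues are easy to check directly. A minimal Python sketch (ours, not the detection tool's code): the powers of 10 modulo 37 are exactly 1, 10, 26, and the product of any two such powers is again one of them, so N = pq inherits the pattern from p and q.

```python
# Powers of 10 modulo 37: only three values ever appear.
powers_mod_37 = {pow(10, k, 37) for k in range(36)}
assert sorted(powers_mod_37) == [1, 10, 26]

# The powers form a group under multiplication mod 37, so if p and q
# are each a power of 10 mod 37, then so is N = p*q.
products = {(a * b) % 37 for a in powers_mod_37 for b in powers_mod_37}
assert products == powers_mod_37

# Same phenomenon mod 11: the only powers of 10 are 1 and 10.
assert {pow(10, k, 11) for k in range(10)} == {1, 10}
```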

Inspecting examples of Infineon primes p and q, rather than public keys N, would have immediately shown that this explanation is in fact correct. The 2016 paper mentioned above already stated that the primes were limited in the same way as N. We hadn't seen that paper, and we weren't looking at any secret keys, but we were using Occam's Razor to make reasonable guesses.

This type of information about p and q is important because the literature already explains a polynomial-time algorithm to find a prime divisor of N in a specified arithmetic progression u, u+v, u+2v, u+3v, etc., assuming that gcd(v,N) = 1 and that v ≥ N^(1/4). For v ≥ N^(1/2) this is an easy exercise. For v ≥ N^(1/3) this goes back to a classic 1984 paper "Divisors in residue classes" by Lenstra. For v ≥ N^(1/4) this is a result of Coppersmith, Howgrave-Graham, and Nagaraj, presented in Howgrave-Graham's thesis in 1998 and in a paper "Divisors in residue classes, constructively". The N^(1/4) is a soft cutoff: one can go down to N^(1/4)/poly(N), although the time goes up correspondingly.
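As a toy illustration of the easy case v ≥ N^(1/2) (a brute-force sketch, not the lattice algorithms of Lenstra or Coppersmith, Howgrave-Graham, and Nagaraj): when v exceeds the square root of N, a divisor p ≡ u (mod v) with p ≤ sqrt(N) leaves only a handful of candidates u, u+v, u+2v, ... to try. The primes below are hypothetical toy values.

```python
# Toy illustration (brute force, not the lattice algorithm) of the easy
# case v >= N^(1/2): a divisor p of N with p = u (mod v) and p <= sqrt(N)
# leaves only a handful of candidates u, u+v, u+2v, ... to try.
from math import gcd, isqrt

def divisor_in_class(N, u, v):
    """Find a divisor of N congruent to u mod v, assuming v >= sqrt(N)."""
    assert gcd(v, N) == 1 and v * v >= N
    cand = u % v
    while cand <= isqrt(N):
        if cand > 1 and N % cand == 0:
            return cand
        cand += v
    return None

p, q = 1000003, 1000033            # two primes (hypothetical toy key)
N = p * q
v = isqrt(N) + 1                   # v >= sqrt(N)
u = p % v                          # pretend we learned p mod v
assert divisor_in_class(N, u, v) == p
```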

This isn't a complete discussion of the history. The underlying algorithmic idea was developed through a series of papers by many people aiming at various applications; see the survey paper "Reducing lattice bases to find small-height values of univariate polynomials" for more information. Perhaps the idea should be called "extended linearization" ("XL"), recognizing its similarity to a technique introduced by Lazard in 1979 for solving equations in many variables over a field; the name XL for this technique goes back to a 2000 paper by Courtois, Klimov, Patarin, and Shamir. But this name isn't standard outside the multivariate context. People instead typically say that they're using "Coppersmith's method", even if they're actually using, e.g., Lenstra's algorithm.

Internally, these algorithms perform lattice-basis reduction. There are various parameters such as the dimension of the lattice and the largest power of N that is used in creating the lattice. Larger lattice dimensions use more time for reduction but handle smaller values of v.

Nemec, Sýs, Švenda, Klinec, and Matyáš had announced their paper title as "The Return of Coppersmith's Attack: Practical Factorization of Widely Used RSA Moduli". This sounded consistent with the idea that the primes are in fact known or guessed modulo some v.

Random powers modulo primes. It's normal to generate large random primes by generating large random integers and testing them for primality. It's also normal to double the success chance per integer, saving half of the randomness, by generating large odd random integers and testing them for primality. One can go further in this direction, for example saving 77% of the original randomness as follows: use CRT to choose a uniform random residue modulo 2*3*5*7 = 210 coprime to 210, i.e., combine the residue 1 modulo 2 with one of 2 nonzero residues modulo 3, one of 4 nonzero residues modulo 5, and one of 6 nonzero residues modulo 7, for 2*4*6 = 48 possibilities in total; choose the remaining bits of the candidate at random; and test the result for primality. Only 48/210, about 23%, of all integers are coprime to 210, so skipping the other residues saves 77% of the randomness.

One can replace 2*3*5*7 by an even larger product of primes. Beware that at some point this breaks down: for example, if the goal is to generate a 1024-bit prime, then one can't use the product 2*3*5*7*...*997 of all primes below 1000. (This product is nearly 2^1380, so p mod the product is almost certainly above 2^1300.)
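This candidate-generation trick can be sketched as follows (a toy illustration with hypothetical parameters, not Infineon's code; is_probable_prime is a standard Miller-Rabin test, not Infineon's primality test). Fixing the candidate's residue modulo 210 to one of the 48 classes coprime to 210 avoids wasting primality tests on multiples of 2, 3, 5, or 7:

```python
# Sketch: generate prime candidates with a residue mod 2*3*5*7 = 210
# chosen among the 48 classes coprime to 210 (a ~77% saving in wasted
# candidates). Toy 64-bit size; all parameters are illustrative.
import math, secrets

SMALL = [2, 3, 5, 7]
L = math.prod(SMALL)                              # 210
units = [r for r in range(L) if math.gcd(r, L) == 1]
assert len(units) == 48

def is_probable_prime(n, rounds=20):
    # Standard Miller-Rabin with random bases.
    if n < 2:
        return False
    for sp in SMALL + [11, 13]:
        if n % sp == 0:
            return n == sp
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = 2 + secrets.randbelow(n - 3)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def random_prime(bits=64):
    while True:
        high = secrets.randbits(bits - 8) << 8    # room for the residue
        cand = high - high % L + secrets.choice(units)
        if cand.bit_length() == bits and is_probable_prime(cand):
            return cand

p = random_prime()
assert math.gcd(p, L) == 1
```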

There is an extensive literature on generating sequences of random numbers. Lehmer in 1949 suggested generating a sequence of 8-digit random numbers by multiplying an initial number repeatedly by 23 modulo 10^8+1. What would happen if Infineon generated a sequence of possibilities for p mod 11 as, e.g., powers of 2 mod 11? This isn't inherently a problem: 2 is a generator mod 11, meaning that 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 are powers of 2 mod 11.

What would happen if Infineon instead generated a sequence of possibilities for p mod 11 as powers of 10 mod 11? Suddenly there would be only two possibilities: the only powers of 10 mod 11 are 1 and 10, matching what Infineon seemed to be doing. This could even be a deliberate weakness with plausible deniability: "Oh, sorry, nobody warned us that 2 mod 11 was okay and 10 mod 11 wasn't."
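A quick check of the difference (our sketch): 2 generates all ten units mod 11, while 10 ≡ -1 has order 2 and reaches only two of them.

```python
# 2 is a generator of (Z/11)*; 10 = -1 mod 11 has order 2.
reachable_2 = {pow(2, k, 11) for k in range(10)}
reachable_10 = {pow(10, k, 11) for k in range(10)}
assert reachable_2 == set(range(1, 11))   # all ten units
assert reachable_10 == {1, 10}            # only two units
```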

We wrote down details of our observations so far, and tweeted a hash of the details:

Had fun reverse engineering https://github.com/crocs-muni/roca/blob/master/roca/detect.py w/ @hashbreaker SHA256: 01463fbab8a8f9e345cd3f2201556a26d2f81b03cf2b8760643148b9a01255a6

Random powers modulo products of primes. The next day we sent email to the paper authors with the details and also with a simpler guess for what Infineon was doing. The simpler guess was that, instead of computing separate powers modulo 3, 5, 7, ... Infineon was computing a power of a single number g modulo the product L = 2*3*5*7*... This would save Infineon the CRT effort, and would have the same basic effect of guaranteeing that the resulting integers are coprime to L, i.e., not divisible by any of the primes dividing L. This would also force all output keys to be 1 or 10 modulo 11 if g happened to be 10 modulo 11.

This guess would also mean that Infineon was limiting the set of keys it could generate. Specifically, a standard number-theory exercise says that, if L is divisible by two different odd primes, then it's impossible to find an integer g such that all integers coprime to L are powers of g. For example, no matter what g you choose, there are at most 12 different powers of g modulo 210; recall that CRT was producing 48 different possibilities.
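This exercise is easy to verify computationally (our sketch). The maximum order of any unit modulo 210 is lambda(210) = lcm(1, 2, 4, 6) = 12, so no choice of g can reach all 48 units:

```python
# No g generates all 48 units mod 210: the maximum order is 12.
from math import gcd

L = 210
units = [r for r in range(L) if gcd(r, L) == 1]

def order(g, m):
    x, k = g % m, 1
    while x != 1:
        x = x * g % m
        k += 1
    return k

assert len(units) == 48
assert max(order(g, L) for g in units) == 12
```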

A related algorithm "GenPrime" was published by Joye and Paillier in 2006. The Joye–Paillier generator, like Lehmer's generator, starts from a new random number r coprime to L and then multiplies repeatedly by a constant g modulo L, obtaining r times a power of g. This can produce any number modulo L, and produces only a slight bias in the resulting primes. The guess was that Infineon was oversimplifying and generating merely a power of g; this produces far fewer numbers modulo L.
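The contrast can be sketched for the toy modulus L = 210 with an arbitrarily chosen unit g = 11 (illustrative values, not the actual Joye-Paillier or Infineon parameters): a fresh random r times powers of g covers the whole coset r·⟨g⟩, and the cosets together cover every unit mod L, whereas bare powers of g reach only the subgroup ⟨g⟩ itself.

```python
# Cosets r*<g> cover all units mod L; bare powers of g cover only <g>.
from math import gcd

L = 210
units = {r for r in range(L) if gcd(r, L) == 1}
g = 11

subgroup = {pow(g, k, L) for k in range(len(units))}
cosets = {frozenset(r * h % L for h in subgroup) for r in units}

assert len(subgroup) == 6              # bare powers of 11 reach 6 values
assert set().union(*cosets) == units   # cosets cover all 48 units
assert len(cosets) == len(units) // len(subgroup)
```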

The authors wrote back promptly, confirming that this guess was correct but not revealing more details.

First attack. We were busy with other things but came back to this reverse-engineering on 20 October. We downloaded a pile of PGP keys and extracted the 2048-bit keys that survived all the tests announced by the authors, presumably Infineon-generated keys. We tried these keys modulo further small primes, finding, for example, that N mod 331 was always 1 or 330. We then tried these keys modulo products of small primes, for example finding that N mod 2*11*37*97*331 was always 1, 65537, 4878941, 8942297, 14367385, or 24016035. These are exactly the powers of 65537 modulo 2*11*37*97*331. For comparison, there were 2 possibilities for N mod 11, 3 for N mod 37, 6 for N mod 97, and 2 for N mod 331, so if these possibilities had been independent then there would have been 72 possibilities for N mod 2*11*37*97*331.
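These observed residues are easy to reproduce (our sketch): the powers of 65537 modulo 2*11*37*97*331 cycle through exactly the six values listed above.

```python
# The powers of 65537 modulo M = 2*11*37*97*331 form a cycle of length 6.
M = 2 * 11 * 37 * 97 * 331          # 26135098
powers = []
x = 1
while x not in powers:
    powers.append(x)
    x = x * 65537 % M
assert sorted(powers) == [1, 65537, 4878941, 8942297, 14367385, 24016035]
```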

Further computations were consistent with the guess that p and q were being generated as powers of 65537 modulo L, where L was either the product of all primes through 691, or the product of all primes through 701.

A simple attack strategy at this point is to try all powers of 65537 modulo L, and apply one of the "divisors in residue classes" algorithms to each possibility. L is beyond the N^(1/4) cutoff mentioned above, even beyond the N^(1/3) cutoff: the two possibilities for L are roughly 2^960 and 2^970, whereas N^(1/4) is roughly 2^512.

This comparison of sizes shows that knowing p mod L is much more information than necessary. One can thus replace L by something smaller. If p is a power of 65537 modulo L then it is also a power of 65537 modulo v for any divisor v of L, and the "divisors in residue classes" algorithms efficiently find p from this power if v is above N^(1/4).
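The number of powers to try modulo a divisor v is governed by multiplicative orders, which can be sketched as follows (a toy version over a few of the small primes seen earlier, not the full list of primes through 701):

```python
# Orders of 65537 modulo a few small primes; the number of powers of
# 65537 modulo v is the lcm of the orders modulo the primes dividing v.
from math import lcm

def mult_order(a, r):
    x, k = a % r, 1
    while x != 1:
        x = x * a % r
        k += 1
    return k

orders = {r: mult_order(65537, r) for r in [11, 37, 97, 331]}
assert orders == {11: 2, 37: 3, 97: 6, 331: 2}
assert lcm(*orders.values()) == 6
```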

The number of powers to try is the order of 65537 modulo v, and one can reduce v to save time in trying powers. (The analysis of orders is similar to the analysis of a standard factorization method, the "p-1 method".) On the other hand, as v drops down towards N^(1/4), the required lattice dimension increases steeply. We presumed that L includes 701, checked the orders and basis-reduction time for some possibilities for v, and then took v as the product of all primes r through 701 such that the order of 65537 modulo r divides 2^7*3^3*5^2*7*11*13*17*19*23. We wrote an attack script using the Sage computer-algebra system, and sent the script and a description by email to the authors. We tweeted a hash of this email:

More exploration with @hyperelliptic has now produced working attack code: 3f5ba89d705a1059683c4c406dcda87f8af73f37cf0202cc74b875fcc28b3cb6

The authors confirmed that our attack script worked on real Infineon keys. They also confirmed that L did in fact include 701. We were already assuming 701 in our script, and we would have already been sure about it if we had seen some examples of Infineon secret keys (for example, by putting some computer time into factoring some public keys). Anyway, covering both possibilities in the script wouldn't have been much extra effort.

Second attack. Finally, we put some effort into speeding up our script, stopping when we had something faster than what the paper authors had announced. We sent the details by email to the authors and tweeted a hash:

Yup. Our 2048bit attack using @sagemath is now 5-25% faster than ROCA blog. 3fd6a53a3b6362248ac10de4a8108df3c839a7193a96d0991c6675990599d917

The "Yup" here was in response to a tweet by Graham Steel regarding our first tweet:

I guess that was inevitable... will they have a faster version of the attack before the paper is even released?

The biggest speedup from our first version to our second was "chaining". The authors also helpfully pointed out to us that the 2016 paper had observed a narrower range of p and q for Infineon than what we had been assuming; this provided an additional speedup. As noted above, we see many possibilities for further speedups beyond what we implemented.

[2022.01.09 update: Updated links above.]


Version: This is version 2022.01.09 of the 20171105-infineon.html web page.
