Linear Crypt Analysis
February 15, 2010
12
Similar to Differential Crypt Analysis (DCA) Linear Crypt Analysis (LCA) is a technique for attacking encryption algorithms of many rounds with subkeys entering linearly in each round. Like DCA it exploits non-uniformities in the structure of a round reducing them to linear expressions of unequal probabilities. A sufficiently large number of input/output pairs (known or chosen) can reveal the effect of these inequalities and enable conclusions about the sub-keys to be drawn.
LCA was first described by Mitsura Matsui in 1992, (Computer and Information Systems Laboratory Mitsubishi Electric Corporation, Japan ). Matsui used LCA to describe an attack on DES. The discussion that follows here is directly taken from his original paper, and uses his notation.
12.1
The general idea of LCA is to find a linear approximation to a cryptographic algorithm, which holds with a certain probability p = 12 , of the form
P[i1 i2 ...ir ] ⊕ K[l1 l2 . . . ls ] = C[j1 j2 . . . jt ]
(0.1)
Here P is the plain-text, C is the cypher-text, and K is the key (usually a subset of the sub-keys of a multi-round cypher).
The notation P [i1 i2 . . . ir ] means the XOR sum of bits i1 i2 ...ir of P (that is P [i1 ] ⊕ P [i2 ] . . . ⊕ P [ir ] = 0 or 1).
1
If such an expression as equation ?? can be found then using N known plain-texts and cypher-text pairs we would expect equation ?? to be satisfied on N p occasions (±2.58 N p(1 − p)) for one percent confidence, assuming a binomial distribution).
This enables one to decide, given sufficiently large N, what is the value of K[ ] given P [ ] and C[ ]. For example, if p > 12 so that equation ?? is on average true, but P [ ] and C[ ] are found to be on average different (example:
P [ ] = 1 and C[ ] = 0) then we can plausibly conclude that K[ ] = 1. This would give us a relationship between key-bits effectively reducing the key space by one bit. We shall see how we can estimate many key bits later.
12.2
The method for producing approximations like equation ?? is to find such approximations for single rounds and then combine all the rounds together in such a way as to eliminate all intermediate data values, leaving only P [ ],
C[ ] and key bits (from some or all of the rounds).
As an example consider the DES f () function. Permuting bits does not affect the XOR sum of their values. The only non-linear operations which need to be linearly approximated are the S-boxes. Consider S1 , and the input bit corresponding to 10HEX . (See table 11.1 of Chapter 11) The XOR of the four bits of the output of S1 (corresponding to OFHEX ) is equal to the input bit only for fourteen out of the sixty-four possible input patterns for S1 . But the input to S1 is the XOR of a data bit P [i] and a key bit
K[i]; so if P [i] = C[j1 j2 j3 j4 ] (where C[j1 j2 j3 j4 ] means the XOR of the four bit output of S1) we may assume K[i] = 1 with high probability. (It can be shown that for high confidence, if the probability of an expression is p we need Order( 21 − p)−2 observations, approximately if p is very small.)
Of course in DES, and other algorithms, one must keep track of bits as they are moved around by permutations and expansions in functions such as
DES f (). But the essential point is the fact that the S-boxes allow certain
= 21 . linear approximations with probabilities sufficiently different from 32
64
12.3
Putting together different rounds is done by assuming that the linear approximations of the individual rounds are simultaneously all true. If they are, then the linear equations for the relevant rounds may be XORed together to form a single linear approximation for the whole algorithm; which is true with probability
2
(1)/(2) + 2n−1
n
1
i=1 (pi − 2 )
(0.2)
if there are n stages and the probability of the linear approximation for the ith stage being true is pi . (This expression is the probability of the XOR sum of n random variables being zero, if their individual