SIAM J. COMPUT. (C) 1988 Society for Industrial and Applied Mathematics
Vol. 17, No. 2, April 1988
HOW TO GENERATE FACTORED RANDOM NUMBERS* ERIC BACH?
001
Abstract. This paper presents an efficient method for generating a random integer with known factoriz- ation. When given a positive integer N, the algorithm produces the prime factorization of an integer x drawn uniformly from N/2 < x <= N. The expected running time is that required for O(log N) prime tests on integers less than or equal to N.
If there is a fast deterministic algorithm for primality testing, this is a polynomial-time process. The algorithm can also be implemented with randomized primality testing; in this case, the distribution of correctly factored outputs is uniform, and the possibility of an incorrectly factored output can in practice be disregarded.
Key words, factorization, primality, random variate generation AMS(MOS) subject classifications. 1104, 11A51, 11Y05, 65C10
1. Introduction. Let N be a positive number, and suppose that we want a random integer x uniformly distributed on the interval N/2 < x <: N. Further suppose that we do not want to output x in the usual decimal form, but rather as an explicit product of primes.
This is clearly possible if we are willing to factor x. However, the best known
"/lgx/lglgx
algorithms for factorization [8], [16] require roughly O(logx)
input x, so this approach is out of the question if N is large. In contrast, the method of this paper uses primality testing rather than factorization. Since there are efficient algorithms for determining primality [1], [10], [13], [17], the method is useful even when N is so large that factorization is infeasible.
The algorithm works by assembling random primes, but it is not clear a priori with what distribution these should be selected, nor how to efficiently implement a desired distribution on the primes. Much of the paper will deal with these questions, in a rather detailed fashion. However, if one is willing to overlook these technicalities, the resulting method can be easily sketched.
It selects a factor q of x whose length is roughly uniformly distributed between 0 and the length of N, then recursively selects the factors of a number y between N/2q and N/q and sets x--y. q. It has now chosen x with a known bias; to correct this, it flips an unfair coin to decide whether to output x or repeat the whole process.
The results of this paper show not only that the distribution of x is uniform, but that this is a fast algorithm. A rough measure of its running time is the number of primality tests required; this quantity has expected value and standard deviation that are both O(log N)--the same as required to generate a random prime of the same size.
This estimate is the basis for a finer analysis of the running time, which uses some assumptions about primality testing. If there is a deterministic polynomial-time prime test, as proved under the Extended Riemann Hypothesis by Miller [10], then the
Received by the editors April 25, 1983; accepted for publication (in revised form) August 6, 1985. Sections 1-8 of this article originally appeared as Chapter 2 of the author¡¯s Ph.D. dissertation, Analytic Methods in the Analysis and Design ofNumber-Theoretic Algorithms, (C) 1985 by the Massachusetts Institute ofTechnology Press. A preliminary version ofthis article was presented at the 15th Annual ACM Symposium on the Theory of Computing 1983.
? Computer Sciences Department, University of Wisconsin, Madison, Wisconsin 53706. This research was supported by the National Science Foundation, grants MCS-8204506 and DCR-8504485, and the Wisconsin Alumni Research Foundation.
179
steps on
Downloaded 02/28/17 to 128.104.46.196. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
180 ERIC BACH
expected number of single-precision operations needed to generate x in factored form can be bounded by a polynomial in log N.
If the method uses a probabilistic polynomial-time prime test such as those of Rabin [13] and Solovay and Strassen [17], a similar result holds. In this case, the distribution of correctly factored numbers is still uniform, and the possibility of producing an incompletely factored output can in practice be disregarded- all within an expected polynomial time bound.
The method has been implemented; on a medium-sized computer, it will generate a 120-digit number in about 2 minutes.
The rest of this paper is organized as follows. Section 2 gives a heuristic derivation ofthealgorithm,and 3givesageneraldiscussionofrandomvariategeneration. Section 4 presents the algorithm in explicit form; its running time is analyzed in 5-8. Finally, 9givesexperimentalresults.
2. Heuristics. Later sections present a detailed algorithm; this one provides motiva- tion and sketches a design based on heuristic arguments.
First, what is meant by a "random factor" of a number? If we write down all numbers between N/2 to N in factored form, we will have an array that is roughly rectangular, because the juxtaposition of a number¡¯s factors is about as long as the number itself. If the factorizations are arranged one per line, and given in binary notation, the picture will look something like this:
10 10 10 10011 100111111 100001100111001 11 11 11 1011011101 10100010111000111
10 101 11010011 10111110111110101011 10111111 100000111101110001010101
10 10 11 1111111010011 100000111110011 100101 1101101 1100011111010101101
10 111 111 111 10010100011 11111101011 11 101 1011110110111011110111001111 10 10 10 10 11101 11111 11100000000111101 11010100011 11101101001011011011
Choosing a random factorization is equivalent to picking a row at random from this list; if the list were perfectly rectangular, we could do this by choosing a bit at random and taking the row in which it appears.
Now suppose that we wanted to get the effect of this process by choosing a prime factor p first and selecting one of the remaining N/p possibilities uniformly. To do this, we would pick p with probability proportional to its "surface area," that is, proportional to the total number of bits occurring in all copies of p.
This suggests selecting the first factor p with probability about log p/p log N, since p occurs in about 1/p of the numbers, and a random bit of such a number will be in p about log p/log N of the time (ignoring repeated factors).
It is instructive to see what effect this would have on the length of p. A weak form of the prime number theorem [6, Thm. 7] implies that for 0 < x < N,
log p log x p=xplogN logN"
Unlessotherwiseindicated,alllogarithmsinthispaperaretothebasee 2.718281
Downloaded 02/28/17 to 128.104.46.196. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
GENERATING FACTORED RANDOM NUMBERS 181
Therefore, if we chose p with the proposed probability, the.length of p (relative to the length of N) would be close to that produced by a uniform distribution, since for 0=__ Arv(p)>__ q<=N pNN
Iogp Iogp 1
pNN 2p log N
=-+O1( ) p<_-N2SlogN 2 logN
The independence statement is a consequence of Theorem 1.
Just as in the heuristic sketch, the factor generator is a subroutine in the main
program, presented below. This process uses a "stopping value" No, which can be any convenient number.
PROCESS R: Random factorization generator.
If N =
logx+2 F(x)logN 2
log p —< 1.04N
will not prove this as it is not needed later.
Downloaded 02/28/17 to 128.104.46.196. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
and
where
is Euler¡¯s constant and
GENERATING FACTORED RANDOM NUMBERS 185
1 logp logN-y-/3 <---- E
<_--logN,
-2logN p<=N P 0.577215
/3 logp/p =0.755366...
(these are (3.21), (3.24) and (3.35) from Rosser and Schoenfeld [14]). Using these inequalities, plus (1) and (2),
logp P
log N
NowapplythesetotheformulaFN(X)--_.q<=xAN(q)/.q<=NAN(q). ["] LEMMA2.ForN>30,
21ogNqNAN(q)>logN y /3 21ogN Similarly
21ogN A(q)__< ylgP+/3+ q<=x p<=x p
p__
logN logN+4 E[log(N/q)]<-.
Proof. The expectation can be expressed as a Stieltjes integral: N
log (N/x)dFN(X). 2-
Using integration by parts and Lemma 1, NNN
lgx+2
Ilog(N/x)dFrq(x) ;FN(X)dx I
x logN-2 x¡¯
2- 2
now computing the integral gives the result. LEMMA3.ForN>30,
(logN)2 logN+6 Elog2(N/q)<-.
Proof As in the last proof, the expectation is NN
q<=x
2 logN-2"
<_
3 logN2
<2
(N/x)dFN(X)=logN-2 x
Ilg2 2-
6. The expected number of prime-power tests. This section proves that the number of prime-power tests done by Process R on input N has expected value and standard deviation that are both O(log N).
I(logN-logx)(2+logx)dX.
Downloaded 02/28/17 to 128.104.46.196. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
186 ERIC BACH
For every N, define random variables as follows. Tu is the number of prime-power tests done by Process R on input N, and Uu, VN, and Wn count the prime-power tests done during the first call to Process F, the recursive call, and after the first return to (?), respectively.
THEOREM 4. IfNo> 106, E(T)= O(log N).
Proof Let N > No, for otherwise the theorem is true immediately. By Theorem 2, we can choose C > 0 so that Uu-<-C log N; we now prove by induction on N that
TN<=6ClogN.
Since Tu Uu+Vu+Wu, E(Tu)=E(Uu)+E(Vu)+E(Wu). By the
definition of C and the formula E(X)= E y(E(XIY)) applied to Vu, this gives
log2 E(Tn)_-
The corresponding estimate for the variance is given below. THEOREM5.IfNo>106,0″2(TN) O(1ogN)2.
Proof Let R denote the process obtained by replacing the top level stopping
condition A
ProofLetd [log2q];thennecessarilyce<_-d.Foreachsuchvalueofc,wesolve X"=q by bisection, using 0 and 2[d/]+l as starting values. This will find a solution
The total time is or prove that none exists after O(d/a) evaluations off(X)=X.
therefore at most a constant times
(log q)3y2<-- c _<-- dlg a=O(log q)3(loglog q)2. [-]
The next two results assume the Extended Riemann Hypothesis (ERH), a famous conjecture of analytic number theory. The details of this hypothesis (for which see Davenport¡¯s book [4, p. 124]) are not important here; what matters is the following consequence, first proved by Miller [10].
LEMMA9(ERH). TotestifanintegerpisprimerequiresO(logp)5operations.
ProofWewrite/t2(X forthelargestesuchthat2e[x,andZp*forthemultiplicative subgroup of integers modulo p. Then Miller¡¯s primality criterion states that p is prime
Downloaded 02/28/17 to 128.104.46.196. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
GENERATING FACTORED RANDOM NUMBERS 189
ifandonlyifeverya Zp*satisfies
(7) ap-1 1andforallk