
SIAM J. COMPUT. (C) 1988 Society for Industrial and Applied Mathematics
Vol. 17, No. 2, April 1988
HOW TO GENERATE FACTORED RANDOM NUMBERS*

ERIC BACH†
Abstract. This paper presents an efficient method for generating a random integer with known factorization. When given a positive integer N, the algorithm produces the prime factorization of an integer x drawn uniformly from N/2 < x ≤ N. The expected running time is that required for O(log N) prime tests on integers less than or equal to N. If there is a fast deterministic algorithm for primality testing, this is a polynomial-time process. The algorithm can also be implemented with randomized primality testing; in this case, the distribution of correctly factored outputs is uniform, and the possibility of an incorrectly factored output can in practice be disregarded.

Key words. factorization, primality, random variate generation

AMS(MOS) subject classifications. 1104, 11A51, 11Y05, 65C10

1. Introduction. Let N be a positive number, and suppose that we want a random integer x uniformly distributed on the interval N/2 < x ≤ N. Further suppose that we do not want to output x in the usual decimal form, but rather as an explicit product of primes. This is clearly possible if we are willing to factor x. However, the best known algorithms for factorization [8], [16] require roughly O((log x)^√(lg x/lg lg x)) steps on input x, so this approach is out of the question if N is large. In contrast, the method of this paper uses primality testing rather than factorization. Since there are efficient algorithms for determining primality [1], [10], [13], [17], the method is useful even when N is so large that factorization is infeasible. The algorithm works by assembling random primes, but it is not clear a priori with what distribution these should be selected, nor how to efficiently implement a desired distribution on the primes. Much of the paper will deal with these questions, in a rather detailed fashion. However, if one is willing to overlook these technicalities, the resulting method can be easily sketched.
It selects a factor q of x whose length is roughly uniformly distributed between 0 and the length of N, then recursively selects the factors of a number y between N/2q and N/q and sets x = y·q. It has now chosen x with a known bias; to correct this, it flips an unfair coin to decide whether to output x or repeat the whole process. The results of this paper show not only that the distribution of x is uniform, but that this is a fast algorithm. A rough measure of its running time is the number of primality tests required; this quantity has expected value and standard deviation that are both O(log N), the same as required to generate a random prime of the same size. This estimate is the basis for a finer analysis of the running time, which uses some assumptions about primality testing. If there is a deterministic polynomial-time prime test, as proved under the Extended Riemann Hypothesis by Miller [10], then the expected number of single-precision operations needed to generate x in factored form can be bounded by a polynomial in log N.

* Received by the editors April 25, 1983; accepted for publication (in revised form) August 6, 1985. Sections 1-8 of this article originally appeared as Chapter 2 of the author's Ph.D. dissertation, Analytic Methods in the Analysis and Design of Number-Theoretic Algorithms, (C) 1985 by the Massachusetts Institute of Technology Press. A preliminary version of this article was presented at the 15th Annual ACM Symposium on the Theory of Computing, 1983.

† Computer Sciences Department, University of Wisconsin, Madison, Wisconsin 53706. This research was supported by the National Science Foundation, grants MCS-8204506 and DCR-8504485, and the Wisconsin Alumni Research Foundation.

Downloaded 02/28/17 to 128.104.46.196. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
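For small N, the goal stated in the introduction can be met by the brute-force approach the paper is designed to avoid: draw x uniformly from (N/2, N] and factor it by trial division. The sketch below (in Python; the function name is ours, not the paper's) is useful only as a correctness baseline, since trial division takes time exponential in the length of N; the algorithm of this paper replaces it with primality tests.

```python
import random

def naive_factored_sample(N, rng=random):
    """Draw x uniformly from N/2 < x <= N and factor it by trial division.

    This is the brute-force baseline; it is infeasible for large N,
    which is exactly the problem the paper's algorithm solves.
    """
    x = rng.randrange(N // 2 + 1, N + 1)
    factors, m, d = [], x, 2
    while d * d <= m:
        while m % d == 0:       # strip every copy of the divisor d
            factors.append(d)
            m //= d
        d += 1
    if m > 1:                   # leftover factor is prime
        factors.append(m)
    return x, factors
```

The output pair gives x together with its prime factorization, with multiplicity, in increasing order.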
If the method uses a probabilistic polynomial-time prime test such as those of Rabin [13] and Solovay and Strassen [17], a similar result holds. In this case, the distribution of correctly factored numbers is still uniform, and the possibility of producing an incompletely factored output can in practice be disregarded, all within an expected polynomial time bound. The method has been implemented; on a medium-sized computer, it will generate a 120-digit number in about 2 minutes.

The rest of this paper is organized as follows. Section 2 gives a heuristic derivation of the algorithm, and §3 gives a general discussion of random variate generation. Section 4 presents the algorithm in explicit form; its running time is analyzed in §§5-8. Finally, §9 gives experimental results.

2. Heuristics. Later sections present a detailed algorithm; this one provides motivation and sketches a design based on heuristic arguments. First, what is meant by a "random factor" of a number? If we write down all numbers between N/2 and N in factored form, we will have an array that is roughly rectangular, because the juxtaposition of a number's factors is about as long as the number itself. If the factorizations are arranged one per line, and given in binary notation, the picture will look something like this:

    10 10 10 10011 100111111 100001100111001
    11 11 11 1011011101 10100010111000111
    10 101 11010011 10111110111110101011
    10111111 100000111101110001010101
    10 10 11 1111111010011 100000111110011
    100101 1101101 1100011111010101101
    10 111 111 111 10010100011 11111101011
    11 101 1011110110111011110111001111
    10 10 10 10 11101 11111 11100000000111101
    11010100011 11101101001011011011

Choosing a random factorization is equivalent to picking a row at random from this list; if the list were perfectly rectangular, we could do this by choosing a bit at random and taking the row in which it appears.
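The picture and the bit-picking rule can be reproduced exactly for small N. The sketch below (Python; helper names are ours, not the paper's) builds one row per x in (N/2, N] and then chooses a row with probability proportional to its number of bits, which is precisely the "choose a random bit and take its row" rule.

```python
import random

def binary_factor_rows(N):
    """One row per x in (N/2, N]: x's prime factors, in binary."""
    rows = {}
    for x in range(N // 2 + 1, N + 1):
        m, d, factors = x, 2, []
        while d * d <= m:
            while m % d == 0:
                factors.append(d)
                m //= d
            d += 1
        if m > 1:
            factors.append(m)
        rows[x] = " ".join(format(p, "b") for p in factors)
    return rows

def pick_row_by_random_bit(rows, rng=random):
    """Choose a row with probability proportional to its bit count,
    mimicking 'pick a random bit of the array and take its row'."""
    weights = {x: len(r.replace(" ", "")) for x, r in rows.items()}
    t = rng.randrange(sum(weights.values()))
    for x, w in weights.items():
        if t < w:
            return x
        t -= w
```

Because the rows are only roughly the same length, this procedure is slightly biased, and that is the bias the algorithm later corrects with a coin flip.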
Now suppose that we wanted to get the effect of this process by choosing a prime factor p first and selecting one of the remaining N/p possibilities uniformly. To do this, we would pick p with probability proportional to its "surface area," that is, proportional to the total number of bits occurring in all copies of p. This suggests selecting the first factor p with probability about log p/(p log N), since p occurs in about 1/p of the numbers, and a random bit of such a number will be in p about log p/log N of the time (ignoring repeated factors). It is instructive to see what effect this would have on the length of p. A weak form of the prime number theorem [6, Thm. 7] implies that for 0 < x < N,

    Σ_{p≤x} (log p)/(p log N) ≈ (log x)/(log N).

(Unless otherwise indicated, all logarithms in this paper are to the base e = 2.718281....) Therefore, if we chose p with the proposed probability, the length of p (relative to the length of N) would be close to that produced by a uniform distribution, since for 0 ≤ α ≤ 1,

    Σ_{p≤N^α} (log p)/(p log N) = α + O(1/log N).

The independence statement is a consequence of Theorem 1. Just as in the heuristic sketch, the factor generator is a subroutine in the main program, presented below. This process uses a "stopping value" N0, which can be any convenient number.

PROCESS R: Random factorization generator.

LEMMA 1. If N > 30 and 2 ≤ x ≤ N, then

    F_N(x) ≤ (log x + 2)/(log N − 2).

Proof. For N > 30,
we have Σ_{p≤N} log p ≤ 1.04N; we will not prove this as it is not needed later. We also use

    log N − γ − β − 1/(2 log N) ≤ Σ_{p≤N} (log p)/p ≤ log N,

where γ = 0.577215... is Euler's constant and β = Σ_p (log p)/(p(p−1)) = 0.755366... (these are (3.21), (3.24) and (3.35) from Rosser and Schoenfeld [14]). Using these inequalities, plus (1) and (2), now apply these to the formula F_N(x) = Σ_{q≤x} Λ_N(q) / Σ_{q≤N} Λ_N(q). □

LEMMA 2. For N > 30,

    Σ_{q≤N} Λ_N(q) ≥ log N − γ − β − 1/(2 log N) ≥ log N − 2,

and similarly, for 2 ≤ x ≤ N,

    Σ_{q≤x} Λ_N(q) ≤ Σ_{p≤x} (log p)/p + β + 1/(2 log N) ≤ log x + 2.
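Both Mertens-type facts used here are easy to check numerically. The sketch below (Python; function names are ours) sieves the primes up to N and verifies, for a moderate N, that Σ_{p≤N} log p/p falls between the Rosser-Schoenfeld bounds quoted above, and that the normalized sums Σ_{p≤N^α} log p/(p log N) track α to within the O(1/log N) error promised in §2.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    flags = [True] * (n + 1)
    flags[0], flags[1] = False, False
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            flags[i * i :: i] = [False] * len(flags[i * i :: i])
    return [p for p in range(2, n + 1) if flags[p]]

GAMMA = 0.5772156649  # Euler's constant
BETA = 0.7553666108   # sum over primes of log p / (p (p - 1))

def weighted_prime_sum(x):
    """Sum of log p / p over primes p <= x."""
    return sum(math.log(p) / p for p in primes_up_to(int(x)))
```

For N = 10^5 the sum is about log N − γ − β, comfortably inside the stated bounds.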
    E[log (N/q)] ≤ (log N)(log N + 4) / (2(log N − 2)).

Proof. The expectation can be expressed as a Stieltjes integral:

    ∫_2^N log (N/x) dF_N(x).

Using integration by parts and Lemma 1,

    ∫_2^N log (N/x) dF_N(x) = ∫_2^N F_N(x)/x dx ≤ ∫_2^N (log x + 2)/(x(log N − 2)) dx;

now computing the integral gives the result. □

LEMMA 3. For N > 30,
    E[log² (N/q)] ≤ (log N)²(log N + 6) / (3(log N − 2)).

Proof. As in the last proof, the expectation is

    ∫_2^N log² (N/x) dF_N(x) = ∫_2^N 2 log (N/x) F_N(x)/x dx ≤ (2/(log N − 2)) ∫_2^N (log N − log x)(2 + log x) dx/x;

computing the integral gives the result. □

6. The expected number of prime-power tests. This section proves that the number of prime-power tests done by Process R on input N has expected value and standard deviation that are both O(log N).

For every N, define random variables as follows. T_N is the number of prime-power tests done by Process R on input N, and U_N, V_N, and W_N count the prime-power tests done during the first call to Process F, the recursive call, and after the first return to (*), respectively.

THEOREM 4. If N0 > 10^6, E(T_N) = O(log N).
Proof. Let N > N0, for otherwise the theorem is true immediately. By Theorem 2, we can choose C > 0 so that E(U_N) ≤ C log N; we now prove by induction on N that E(T_N) ≤ 6C log N. Since T_N = U_N + V_N + W_N, E(T_N) = E(U_N) + E(V_N) + E(W_N). By the definition of C and the formula E(X) = E_Y(E(X|Y)) applied to V_N, this gives a recurrence for E(T_N); bounding E[log (N/q)] by the first moment estimate of §5 and the restart term E(W_N) by a constant multiple of E(T_N), one checks that for N > 10^6 the coefficient of C log N is less than 6. □
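The two moment bounds from §5 that drive this estimate reduce, after the substitution u = log x, to elementary integrals, and those can be sanity-checked by numerical quadrature. The sketch below (Python; names ours) confirms that ∫_0^L (u+2) du/(L−2) equals L(L+4)/(2(L−2)) and that 2∫_0^L (L−u)(u+2) du/(L−2) equals L²(L+6)/(3(L−2)), for L = log 10^6.

```python
import math

def riemann(f, a, b, n=200000):
    """Midpoint Riemann sum of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

L = math.log(10 ** 6)   # L plays the role of log N

# integral behind the E[log(N/q)] bound
first_moment = riemann(lambda u: (u + 2) / (L - 2), 0, L)

# integral behind the E[log^2(N/q)] bound
second_moment = riemann(lambda u: 2 * (L - u) * (u + 2) / (L - 2), 0, L)
```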
The corresponding estimate for the variance is given below.

THEOREM 5. If N0 > 10^6, σ²(T_N) = O((log N)²).
Proof. Let R′ denote the process obtained by replacing the top-level stopping condition of Process R, and choose D > 0 so that σ²(U_N) + E(U_N)² ≤ D(log N)². Furthermore, for any N,

    E(T_N²) ≤ D(log N)² + E_q(E(T²_{⌊N/q⌋})) + E(W_N²).

An argument similar to the proof of Theorem 4, using Lemma 3, will show that E(T_N²) ≤ 4D(log N)². Since σ²(T_N) ≤ E(T_N²), the result follows. □

7. A single-precision time bound. The next two sections analyze the average number of single-precision operations needed to generate a random factorization. Any serious discussion of this must answer two questions. First, the algorithm uses real numbers, which are not finite objects; how can these be simulated? Second, one might wish to use randomized prime testing; what happens when the prime tester can make mistakes? This section addresses the real-number issue, assuming perfect prime testing; probabilistic prime testing will be treated in the next section. In what follows, a "step" means one of the basic arithmetic operations (including comparison) on single-bit numbers, or a coin flip. All questions of addressing and space requirements will be ignored. The following result will be used repeatedly.

LEMMA 4. Let T_1, T_2, ... be a sequence of random variables, and let n be a positive integer-valued random variable, such that T_i = 0 for every i > n. If E(T_i | n ≥ i) ≤ A for every i and E(n) ≤ B, then

    E(Σ_{i≥1} T_i) = Σ_{i≥1} E(T_i | n ≥ i) Pr[n ≥ i] ≤ A Σ_{i≥1} Pr[n ≥ i] = A·E(n) ≤ AB. □

At several points in our algorithm we need to flip a coin with success probability θ, where θ is a fixed real number between 0 and 1. This means we compare θ with a randomly generated real value Λ, thus:

    θ = .0101010101...,
    Λ = .0110010111....

A finitary procedure with the same effect simply does the comparison bit-by-bit from the left, only generating bits as they are needed. This is clearly fast; since the bits of Λ are specified by independent coin flips, we expect to use only two of them before reaching a decision.
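The bit-by-bit comparison is easy to make concrete. In the Python sketch below (function names are ours), theta_bits supplies the binary digits of θ after the point and random_bit supplies fair bits of Λ on demand; the first position where the two streams disagree settles whether Λ < θ, so on average only two random bits are consumed.

```python
def biased_coin(theta_bits, random_bit):
    """Return True iff the lazily generated uniform value Lambda
    is below theta, comparing binary digits from the left."""
    for t in theta_bits:
        b = random_bit()
        if b != t:
            return b < t   # Lambda < theta iff Lambda's bit is 0 here
    return False           # all supplied digits equal: treat as a miss

def scripted(bits):
    """Deterministic 'random bit' source for testing."""
    it = iter(bits)
    return lambda: next(it)
```

With θ = .101 (binary, i.e. 5/8), a Λ beginning .0 or .100 is below θ, while one beginning .11 or matching .101 exactly is not.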
However, it may not be convenient to generate the true bits of θ one at a time; to avoid this difficulty, we base our procedure on approximations that might be produced by some scheme like interval arithmetic. We need the following definition. Let θ, 0 ≤ θ ≤ 1, be a real number. A k-bit approximation to θ is an integer multiple θ_k of 2^{−k} with |θ_k − θ| ≤ 2^{−k} and 0 ≤ θ_k ≤ 1. The lemma below eliminates real numbers from our algorithm; it states, in effect, that if θ is approximable by any polynomial-time procedure, then a biased coin flip with success probability θ takes constant time on the average.

LEMMA 5. Let 0 ≤ θ ≤ 1, and assume that a k-bit approximation to θ can be computed in f(k) steps, where f is a polynomial of degree m with nonnegative coefficients. Then the expected time to decide if a uniform (0,1) random variable is less than θ is at most C_m·f(1), where C_m depends only on m.

Proof. Let Λ be the uniform (0,1) value; we can assume it is irrational, since the set of rational numbers has measure 0. Consider the following procedure: for k = 1, 2, 3, ..., compute a k-bit approximation θ_k to θ and compare this to Λ_k, the number formed from the first k bits of Λ; if θ_k + 2^{−k} ≤ Λ_k or Λ_k < θ_k − 2^{−k}, terminate the process. The probability that no decision is reached after k steps is at most 2^{−k}, so the expected total time is at most a constant times

    Σ_{k≥1} f(k) 2^{1−k} = 2 Σ_{j=0}^{m} a_j Σ_{k≥1} k^j 2^{−k},

where a_0, ..., a_m are the coefficients of f. Pólya and Szegő [12, p. 9] give the formula (valid for |z| < 1)

    Σ_{k≥1} k^j z^k = g_j(z)/(1−z)^{j+1},

where g_j is a polynomial with nonnegative coefficients, satisfying g_j(1) = j!. The result follows by taking z = 1/2 and C_m = 2^{m+2} m!. □

LEMMA 6. Let p, q, N be integers with 2 ≤ p ≤ q ≤ N. Then a k-bit approximation to

    (log p/log N) · #(N/2q, N/q] · 2^{⌈lg 2q⌉}/N

can be computed in O(k log log N) steps.

Proof. Let p = 2^α σ and N = 2^β τ, where α and β are integers and 1 ≤ σ, τ < 2.

Let α ≥ 1 be an integer.
Then solving p^α = q for an integer p and the largest possible integer α can be done in O((log q)³(log log q)²) steps.

Proof. Let d = ⌈log₂ q⌉; then necessarily α ≤ d. For each such value of α, we solve X^α = q by bisection, using 0 and 2^{⌈d/α⌉+1} as starting values. This will find a solution, or prove that none exists, after O(d/α) evaluations of f(X) = X^α − q. The total time is therefore at most a constant times

    (log q)³ Σ_{2≤α≤d} (lg α)/α = O((log q)³(log log q)²). □

The next two results assume the Extended Riemann Hypothesis (ERH), a famous conjecture of analytic number theory. The details of this hypothesis (for which see Davenport's book [4, p. 124]) are not important here; what matters is the following consequence, first proved by Miller [10].

LEMMA 9 (ERH). To test if an integer p is prime requires O((log p)⁵) operations.

Proof. We write μ₂(x) for the largest e such that 2^e | x, and Z_p* for the multiplicative subgroup of integers modulo p. Then Miller's primality criterion states that p is prime if and only if every a ∈ Z_p* satisfies

(7)    a^{p−1} = 1, and for all k ≤ μ₂(p−1), gcd(a^{(p−1)/2^k} − 1, p) equals 1 or p.

The proof of Lemma 1 amounted to a lower bound on the relative probability that q ≤ N and an upper bound on the relative probability that q ≤ x. The lower bound still holds, and the new upper bound is at most

    Σ_{y≤x} (log y/log N) · #(N/2y, N/y]/N

plus a second term; the second term is at most 2/log N, and this will not cause the bound to be exceeded. For the estimate relating to incorrect output, apply Lemma 4 to the random variables X_i that are 1 if the i-th prime test is incorrect, and 0 otherwise, and use the inequality Pr[Σ X_i ≥ 1] ≤ E(Σ X_i). By Lemma 10, error probability less than N^{−2} can be obtained with about log N tests, each using O(log N) steps. This will give a polynomial time bound analogous to Theorem 6.

9. Experiments. This section has two purposes: to show how the algorithm actually behaves in practice, and to discuss what modifications are necessary to implement it efficiently.
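The bisection procedure in the proof above is short to implement. The sketch below (Python; the function name is ours) returns a pair (p, α) with p^α = q and α as large as possible, trying each candidate exponent α ≤ ⌈log₂ q⌉ and bisecting on p exactly as described.

```python
def largest_prime_power_root(q):
    """Solve p**a == q for an integer p and the largest possible a >= 1,
    by bisection on p for each candidate exponent a."""
    assert q >= 2
    d = q.bit_length()                 # a cannot exceed lg q
    for a in range(d, 0, -1):
        lo, hi = 1, 1 << (d // a + 1)  # hi**a >= q by choice of hi
        while lo < hi:                 # find smallest p with p**a >= q
            mid = (lo + hi) // 2
            if mid ** a < q:
                lo = mid + 1
            else:
                hi = mid
        if lo ** a == q:
            return lo, a
    # unreachable: a = 1 always succeeds with p = q
```

Note that the lemma solves p^α = q for an integer p, prime or not; for example 36 yields (6, 2).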
Call an arrival at (*) in Process F a probe; Theorem 4 implies that Process R requires O(log N) probes on the average. The constant implied by the "O" symbol can be estimated by the following heuristic argument. Typically the algorithm will produce factors whose lengths are 1/2, 1/4, 1/8, ... the length of N. Presumably, then, the average number of probes is close to

    2 log₂ N (1 + 1/2 + 1/4 + ···) = 4 log₂ N,

since by (5), we expect Process F to use about 2 log₂ N probes on input N. This value of 4 log₂ N is also the best bound provable as the stopping value N0 grows, and the first set of experiments were designed to see if this estimate is at all realistic when N0 is small. To do this, I coded the algorithm verbatim in C (the "real" numbers have about 15 decimal digits of precision) with N0 = 4. Table 1 gives statistics on the number of probes required to generate 100 random numbers for various values of N, together with the presumed mean value 4 log₂ N.

TABLE 1
Statistics on the number of probes.

    N       4 log₂ N   Average   Maximum   Standard deviation
    10^2    26.56      20.77     60        13.95
    10^3    39.86      35.59     218       34.31
    10^4    53.15      56.30     367       47.49
    10^5    66.44      71.30     329       62.24
    10^6    79.73      72.49     472       59.58
    10^7    93.01      74.61     346       63.43
    10^8    106.30     111.78    491       85.30
    10^9    119.59     120.82    453       96.78

It will be seen from this table that the standard deviation tends to be a bit smaller than the mean; however, I do not even have a heuristic argument to justify this observation. To test its feasibility for large N, I also coded the algorithm with multiprecise arithmetic on a DEC VAX 11-780, a machine that takes about 5 microseconds to multiply two 32-bit integers. When dealing with multi-word numbers, efficiency is of primary importance; this concern led to the following version of Process F.

1) After building the random value q, the program first checks that q ≤ N, and then computes Λ_N(q)·2^{⌈lg 2q⌉} in single precision with the formula (6), as if q were prime. Only if this exceeds the random value Λ does it subject q to a prime-power test.

2) The first part of this test sees if, for any small prime p, p | q and p² ∤ q; if this is true, q cannot be a prime power. This sieve procedure eliminates most q from further consideration, for the probability that a random number q survives this test for all p ≤ B is about

    Π_{p≤B} (1 − 1/p + 1/p²).

If we let

    a = Π_p (1 + 1/(p(p−1))) = 1.943596...,

then the survival probability is, by Mertens's theorem [6, p. 22], close to a·e^{−γ}/log B. The program used B = 1000, which screens out approximately 84% of the q's.

3) If q passes through the sieve, it is subjected to a "pseudo" test designed to cheaply eliminate numbers that are not prime powers. This checks that 2^{q−1} ≢ 1 (mod q) and gcd(2^{q−1} − 1, q) = 1; if so, q can be thrown away. The average cost of this is one modular exponentiation and one gcd computation.

4) Any number q that has survived so far is tested for primality, using (7) with a = 2, 3, 5, ..., 29 (the first ten primes). There is a slight advantage to small values of a, since about one-third of the multiplications done by the modular exponentiation algorithm will involve a single-word quantity.

5) Only if q is not declared "prime" by the above procedure does the program try to see if it is a perfect power.

(Various orderings of steps 1-5 were tried, and the one above seems to be the best.) For the multiprecise implementation, a more realistic measure of the work done is the number of times q reaches the sieve.
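Step 3's filter rests on the fact that if q = p^e is an odd prime power, then (p − 1) divides q − 1, so p divides both 2^{q−1} − 1 and q; hence a q with 2^{q−1} ≢ 1 (mod q) and gcd(2^{q−1} − 1, q) = 1 cannot be an odd prime power. A Python sketch (the function name is ours; powers of 2 would need to be screened separately):

```python
from math import gcd

def survives_pseudo_test(q):
    """Return False when q is certainly not an odd prime power:
    2**(q-1) != 1 (mod q) and gcd(2**(q-1) - 1, q) == 1.
    True means q survives and goes on to the full prime-power test."""
    r = pow(2, q - 1, q)          # one modular exponentiation
    return r == 1 or gcd(r - 1, q) > 1
```

The test is conservative: some composites that are not prime powers (Fermat-style pseudoprime situations) survive it, but they are caught by the full test in step 4.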
Statistics on these values are given in Table 2, from runs that generated 50 numbers each; the last column gives the average CPU time required per random factorization.

TABLE 2
Statistics on the number of sieve tests, and the running time in seconds.

    N         Average   Maximum   Standard deviation   Time
    10^15     29.66     67        17.70                1.51
    10^30     65.42     225       46.36                4.66
    10^60     158.92    387       95.04                20.52
    10^120    250.34    805       174.54               100.08

It is worth noting that for these values of N the average running time is only slightly worse than O((log N)²). Table 3 presents the mean values of four quantities related to the output values x: the number of prime factors of x, and the number of decimal digits in the largest three factors of x (from the same experiments). It will be seen that the average number of prime factors grows very slowly with N; the observations are close to the mean values of log log N + 1.03465... predicted by prime number theory [7, p. 346]. Finally, the average lengths of the largest three factors are roughly proportional to the length of N; again, this is predicted by theory [7, p. 343], with constants of proportionality close to 0.62, 0.21, and 0.088, respectively.

TABLE 3
Statistics on the prime factors.

                               Digits in
    N         Average number   Largest   2nd largest   3rd largest
    10^15     4.48             10.06     3.76          1.50
    10^30     5.32             19.84     6.80          2.80
    10^60     6.18             37.36     13.48         5.94
    10^120    6.70             78.54     26.56         8.46

Acknowledgments. I would like to thank the following people for offering encouragement and ideas: Silvio Micali, Manuel Blum, Martin Hellman, Michael Luby, and James Demmel. I also heartily thank the seminonymous referees, whose comments improved the paper immensely.

REFERENCES

[1] L. M. ADLEMAN, C. POMERANCE AND R. S. RUMELY, On distinguishing prime numbers from composite numbers, Ann. of Math., 117 (1983), pp. 173-206.
[2] E.
BACH, Analytic Methods in the Analysis and Design of Number-Theoretic Algorithms, MIT Press, Cambridge, 1985 (U.C. Berkeley Ph.D. dissertation, 1984).
[3] R. P. BRENT, Fast multiple-precision evaluation of elementary functions, J. Assoc. Comput. Mach., 23 (1976), pp. 242-251.
[4] H. DAVENPORT, Multiplicative Number Theory, Springer, New York, 1980.
[5] W. FELLER, An Introduction to Probability Theory and Its Applications (Volume I), John Wiley, New York, 1968.
[6] A. E. INGHAM, The Distribution of Prime Numbers, Cambridge University Press, Cambridge, 1932.
[7] D. KNUTH AND L. TRABB PARDO, Analysis of a simple factorization algorithm, Theoret. Comput. Sci., 3 (1976), pp. 321-348.
[8] H. W. LENSTRA, JR., Factoring integers with elliptic curves, Ann. of Math., 126 (1987), pp. 649-673.
[9] M. MIGNOTTE, Tests de primalité, Theoret. Comput. Sci., 12 (1980), pp. 109-117.
[10] G. L. MILLER, Riemann's hypothesis and tests for primality, J. Comput. System Sci., 13 (1976), pp. 300-317.
[11] J. VON NEUMANN, Various techniques used in connection with random digits, J. Res. Nat. Bur. Standards (Applied Mathematics Series), 12 (1951), pp. 36-38.
[12] G. PÓLYA AND G. SZEGŐ, Problems and Theorems in Analysis I, Springer, New York, 1972.
[13] M. O. RABIN, Probabilistic algorithm for testing primality, J. Number Theory, 12 (1980), pp. 128-138.
[14] J. B. ROSSER AND L. SCHOENFELD, Approximate formulas for some functions of prime numbers, Ill. J. Math., 6 (1962), pp. 64-94.
[15] B. W. SCHMEISER, Random variate generation: a survey, in Simulation with Discrete Models: A State-of-the-Art View, T. I. Oren, C. M. Shub and P. F. Roth, eds., IEEE Press, New York, 1981.
[16] C. P. SCHNORR AND H. W. LENSTRA, JR., A Monte Carlo factoring algorithm with linear storage, Math. Comp., 43 (1984), pp. 289-311.
[17] R. SOLOVAY AND V.
STRASSEN, A fast Monte-Carlo test for primality, SIAM J. Comput., 6 (1977), pp. 84-85.