Tag Archives: Primes

Project Euler in R: Problem 10

Problem:  Find the sum of all the primes below two million.

Commentary:  If you’re not careful, this problem can really get you in R.  In poking around the internet, there are a few solutions to this problem, but all the ones I’ve tested are slow.  Some, even from people who are better at R than me, are VERY slow.  That linked one in particular took 6 minutes and 8.728000 seconds to complete.  My solutions runs in under 2 seconds.

So we’ll be using that old prime algorithm I’ve already discussed to death.  Other than a few tricks in there that I never see get implemented, there’s a crucial trick to this solution.  Since we only want the sum, let’s not be slow about finding the primes we want.  I only have to find all primes below \sqrt{n}, where n=2000000.  It takes basically no time to find them.  From there, I start evaluating my sum, and like a good little R user, I vectorize by getting the possible list of possible primes above \sqrt{n}.  From there, I check if any of my known primes divide into those possible primes.  This is actually much faster than generating primes and storing them.

R Code:

ElapsedTime <- system.time({
##########################
n <- 2000000

### Function to get primes below specified n; option to get only those below sqrt(n)
PrimesBelow <- function(n, below.sqrt=FALSE){
  if (below.sqrt == TRUE){
    m <- ceiling(sqrt(n))
  } else {
    m <- n
  }

  primes <- c(2,3)
  i <- 3
  while (i < m-1){
    flag <- 0
    i <- i+2
    if (i%%6 == 3){
      flag <- 1
    }
    if (flag == 0){
      s <- sqrt(i)+1
      possible.primes <- primes[primes<s]
      for (prime in possible.primes){
        if ((i%%prime == 0)){
          flag <- 1
          break
        }
      }
      if (flag == 0){
        primes <- c(primes, i)
      }
    }
  }
  primes
}
###

primes <- PrimesBelow(n, below.sqrt=TRUE)

biglist <- ceiling(sqrt(n)):n
biglist <- biglist[-which(biglist%%2 == 0)]
biglist <- biglist[-which(biglist%%6 == 3)]

for (prime in primes){
  biglist <- biglist[which(biglist%%prime != 0)]
}

answer <- sum(primes)+sum(as.numeric(biglist))
##########################
})[3]
ElapsedMins <- floor(ElapsedTime/60)
ElapsedSecs <- (ElapsedTime-ElapsedMins*60)
cat(sprintf("\nThe answer is:  %f\nTotal elapsed time:  %d minutes and %f seconds\n", answer, ElapsedMins, ElapsedSecs))

Output:
The answer is:  142913828922.000000
Total elapsed time:  0 minutes and 1.915000 seconds


Prime testing function in R

I was hoping to begin tinkering a bit with the multicore package in R beyond some extremely trivial examples.  Thanks to a combination of R’s dumb quirkiness (for example, being worthless on loops), my poor planning, and general bad programming, my Saturday afternoon tinkering project is ultimately worthless in fulfilling that purpose.

I was really hoping to take the prime searcher I had been using to solve Project Euler problems and use it to make a prime testing function which would utilize the 4 cores on my adorable little core i5.  I succeeded in creating it, but it wasn’t until I had finished (which thankfully was only about an hour of time wasted) that I realized that my plan was stupid and I was going to have to loop over entries (or use sapply(), which is the same thing; don’t kid yourself).  This, as we all know, is where R is fantastically bad.

A very hasty test showed that relying on a single core, the original way I was doing things is a little over 2 times faster than this new implementation.  This doesn’t bode well for the possible speedup when scaling up to 4 cores, and since I have more pressing projects at the moment, I’m going to have to dump this one barring some illumination.

However, it is a workable prime tester.  And for testing against single primes, it’s honestly not that slow.  But, as pointed out above, if you’re using it to test primality of a (reasonably large) range of (reasonably large) values, then you’re going to have to loop and R is going to catch fire in the process.

The syntax is simple enough; simply conjure your integer of choice n and ask R

IsPrime(n)

which is like the old Maple syntax back when I used it extensively (which was ~ Maple 8 or 9, I believe).  Functions and sample output below:

IsPrime <- function(n){ # n=Integer you want to know if is/not prime
  if ((n-floor(n)) > 0){
    cat(sprintf("Error: function only accepts natural number inputs\n"))
  } else if (n < 1){
      cat(sprintf("Error: function only accepts natural number inputs\n"))
    } else
  # Prime list exists
  if (try(is.vector(primes), silent=TRUE) == TRUE){

    # Prime list is already big enough
    if (n %in% primes){
      TRUE
    } else
    if (n < tail(primes,1)){
      FALSE
    } else
    if (n <= (tail(primes,1))^2){
      flag <- 0
      for (prime in primes){
        if (n%%prime == 0){
          flag <- 1
          break
        }
      }
    
      if (flag == 0){
        TRUE
      }
      else {
        FALSE
      }
    }

    # Prime list is too small; get more primes
    else {
      last.known <- tail(primes,1)
      while ((last.known)^2 < n){
        assign("primes", c(primes,GetNextPrime(primes)), envir=.GlobalEnv)
        last.known <- tail(primes,1)
      }
      IsPrime(n)
    }
  } else {
    # Prime list does not exist
    assign("primes", PrimesBelow(n,below.sqrt=TRUE), envir=.GlobalEnv)
    IsPrime(n)
  }
}

# Get next prime
GetNextPrime <- function(primes){ # primes=Known prime list
  i <- tail(primes,1)
  while (TRUE){
    flag <- 0
    i <- i+2
    if (i%%6 == 3){
      flag <- 1
    }
    if (flag == 0){
      s <- sqrt(i)+1
      possible.primes <- primes[primes<s]
      for (prime in possible.primes){
        if ((i%%prime == 0)){
          flag <- 1
          break
        }
      }
      if (flag == 0){
        break
      }
    }
  }
  i
}

# Primes below specified integer n; optionally only those below sqrt(n)
PrimesBelow <- function(n, below.sqrt=FALSE){
  if (below.sqrt == TRUE){
    m <- ceiling(sqrt(n))
  } else {
    m <- n
  }
  
  primes <- c(2,3)
  i <- 3
  while (i < m-1){
    flag <- 0
    i <- i+2
    if (i%%6 == 3){
      flag <- 1
    }
    if (flag == 0){
      s <- sqrt(i)+1
      possible.primes <- primes[primes<s]
      for (prime in possible.primes){
        if ((i%%prime == 0)){
          flag <- 1
          break
        }
      }
      if (flag == 0){
        primes <- c(primes, i)
      }
    }
  }
  primes
}

Some sample output:

 > primes
Error: object 'primes' not found
> IsPrime(100)
[1] FALSE
> IsPrime(101)
[1] TRUE
> primes
[1]  2  3  5  7 11
>

After screwing around with the prime tester for a few minutes, I was able to find this adorable little gem

1> IsPrime(1234567891)
[1] TRUE

So 1234567891 is prime, but none of 1, 12, 123, 1234, 12345, 123456, 1234567, 12345678, 123456789, 12345678910, 123456789101, or 1234567891011 are (of course, nearly half of those are even, but the point stands).


Project Euler in R: Problem 7

Problem:  What is the 10001st prime number?

Commentary:  So it’s finally time to discuss my prime-searching algorithm, as promised back in Problem 5.  We will find primes in the following way:

  1. 2 and 3 are prime
  2. We examine the odd integers (having already discovered the only even prime) starting with i=5.
  3. First, check if i is of the form 6k+3 for some integer k>0.  If so, then i can’t be prime, because then i=6k+3=3(2k+1), and since k>0, 2k+1 > 1 (so i\neq 3).  In this case, don’t even bother proceeding; replace i with i+2 and start over.
  4. Calculate s(i)=\sqrt{i}.  Among all known primes below s(i)  (going no further for the same reason as that outlined in Problem 3), check if these primes divide into i.  If we find one that does, then there is no reason to proceed; i is not prime.  Go back to 3. and try again with i replced with i+2.  If none of the known primes below s(i) divide i, then stop; i must be prime.  Add it to the list of known primes.
  5. Repeat this process by replacing i with i+2 until we have 10001 primes.

It’s nothing fancy, but it works.  As for the implementation, believe it or not, I actually don’t construct a list of length 10001 and insert into it as I gather my primes like a good little R user.  See the post-code commentary for an explanation.

R Code:

ElapsedTime <- system.time({
##########################
primes <- c(2, 3)
i <- 3
while (length(primes) < 10001){
  flag <- 0
  i <- i+2
  if (i%%6 == 3){
    flag <- 1
  }
  if (flag == 0){
    s <- sqrt(i)+1
    for (prime in primes){
      if ((i%%prime == 0)){
        flag <- 1
        break
      }
      if (prime > s){
        break
      }
    }
    if (flag == 0){
      primes <- c(primes, i)
    }
  }
}

answer <- tail(primes, 1)
##########################
})[3]
ElapsedMins <- floor(ElapsedTime/60)
ElapsedSecs <- (ElapsedTime-ElapsedMins*60)
cat(sprintf("\nThe answer is:  %d\nTotal elapsed time:  %d minutes and %f seconds\n",
answer, ElapsedMins, ElapsedSecs))

Output:
The answer is:  104743
Total elapsed time:  0 minutes and 1.191000 seconds

Post-Code Commentary:  You may notice that I’m not pre-allocating the prime list and then storing new prime values as I go, like a good little R user.  The reason is that it’s actually slower to do it this way.  Now maybe I’m an idiot and I don’t really understand how R does things, but if I take this algorithm and only modify the prime storage part of it to include preallocating the list and then injecting into that list (replacing a 0) as I go, then the time is nearly a full order of magnitude slower.  Here’s the R code that you would produce if you were to do what I just described:

primes <- numeric(10001)
primes[1:2] <- c(2, 3)
i <- 3
prime.position <- 3

while (primes[10001] == 0){
  flag <- 0
  i <- i+2
  if (i%%6 == 3){
    flag <- 1
  }
  if (flag == 0){
    s <- sqrt(i)+1
    for (prime in primes[primes > 0]){
      if ((i%%prime == 0)){
        flag <- 1
        break
      }
      if (prime > s){
        break
      }
    }
    if (flag == 0){
      primes[prime.position] <- i
      prime.position<-prime.position+1
    }
  }
}

answer <- tail(primes, 1)

So how do the two run?  Well, on my work computer (which is a little slower than my home computer–also don’t tell my boss!), the original “bad R user” code runs in 1.852 seconds.  The “better” solution with pre-fetching the vector runs in 11.100 seconds.  That is a 9.248 second differential, which amounts to a nearly 500% increase in run time!

Of course, there are plenty of times when it’s objectively better to pre-allocate; but as it turns out, this isn’t one of them (though I have no idea why).


Project Euler in R: Problem 3

Problem: What is the largest prime factor of the number 600851475143 ?

Commentary: This problem is not very hard, and there is much less of interest to say about it than in problems 2 and 25. The only real hitch is knowing what a prime is and knowing a few things about factorization.

As it turns out, there is a surprising amount of research–even current day research, regarding factorizations of number-like things. Here, thankfully, the problem is easy enough. Since we’re working in the integers, we have nice things like the Fundamental Theorem of Arithmetic, which says that every natural number (1,2,3,…) greater than 1 can be written as a product of powers of primes, and that ignoring the order in which you present the factors of that factorization, that any such factorization is unique.

If you actually managed to read that sentence, then you’re probably thinking “Well of course! That’s just how things are!” But you should know better. One of the most fundamental questions you can ask in any field, but in particular mathematics, is to ask what a symbol means. What does “1” mean, or what does “\cdot” (for multiplication) mean? These are important questions that you’ve probably never had to deal with unless you’ve studied math. As it turns out, such questions aren’t irrelevant–at all.

So what is “1“? It’s a (actually THE) thing which when you take any other thing–I’ve forgotten the other thing’s name, so let’s call it x–take that other thing and do this

x\cdot 1

or this

1\cdot x

then you’re guaranteed to “get x back”.  Well, shit, now we need to know what “\cdot” is.  Our friend “\cdot” is a thing which gives you a thing when you use it on two allowed things.  Oh, but it can’t give you just any old thing.  That would be ridiculous.  We mathematicians are civilized folk, afterall, so there are rules here.  Let’s say we have three things–and I’ve forgotten their names, so let’s just call them x, y, and z.  Then the rules of the game are

  1. Whatever x\cdot y is, it had better be a thing
  2. x\cdot (y\cdot z), whatever that is, had better be the same thing as (x\cdot y)\cdot z
  3. 1 is a thing, and 1\cdot x=x and x\cdot 1=x

Ok, so what is a prime?  A prime p is a thing which isn’t 0 (the thing so that when you do x+0 or 0+x you get x–let’s not get into what + is right now, though…), it also isn’t a “unit”, meaning that there is no thing–let’s say x, so that x\cdot p=1 never happens and so that p\cdot x=1 never happens.  But wait, there’s more!  There’s one final catch, and it’s the biggie.  If you have two things, say x and y, and you form a (potentially) new thing called x\cdot y; well, if it turns out that p divides x\cdot y (meaning there is a thing z so that x\cdot y = p\cdot z), then p had better divide x or p had better divide y.

Wait, did I just say that 1 isn’t prime?  That’s right, it isn’t.  It never has been.  It never will be.  Nobody but crackpots thinks it is.  Deal with it, nerd.

Oh right, primes.  So we’ve talked about 1; what do you think 2 means?  Well, it usually means 1+1, but I don’t want to get into + right now, so let’s stick to the numbers we’re familiar with.

I pose a simple question to you.  Is 2 prime?  The answer is:  it depends (because the question is poorly worded!).  Remember what it means to be primes.  First and foremost, it isn’t 0–which, and you’re just going to have to trust me, we can check off the list–and it isn’t a unit.  So think about the collection of all integers.  The integers are the things that go 1,2,3,…, together with 0 and things like -1,-2,-3,… . As it turns out, this isn’t yet another thing your gradeschool teacher got wrong.  Our old friend 2 is prime in the integers.  But what about the collection of all rational numbers–the things which can be written as ratios of integers.  Is 2 prime in this collection?  Nope!  Well, not so long as you believe that \frac{1}{2} is a thing (in the collection of rational numbers).  Because if it is, then 2\cdot\frac{1}{2}=1, which means that 2 is a unit in the rationals.  Which means, by definition, it can’t be prime.

So now that no one is confused, let’s return to the original problem.

The R code here isn’t something I’d probably do over again, but this was the first solution (which I did slightly clean up post hoc, but the integrity of the dumbness was kept in tact).  The key here is realizing that any natural number n can have at most one prime divisor greater than \sqrt{n}.  Indeed, if it had two such, say p and q with p\neq q, then since p divides n and q divides n, by virtue of being prime (actually all we need is gcd(p,q)=1, which we certainly get from primality) we would have the product p\cdot q dividing n (note that this is the part of the argument that fails if the divisors aren’t prime; for example, 100 has several divisors greater than \sqrt{100}=10, namely 20, 25, and 50.  Of course, none of these is prime…but that was my whole point).  The easiest way to convince yourself that this is so is to think about the “general” prime factorization of n (see the Fundamental Theorem of Arithmetic).  So by definition of division in the integers, that means there exists some natural number k with k\geq 1 and n=p\cdot q\cdot k.  But then

n = p \cdot q \cdot k \\ > \sqrt{n} \cdot \sqrt{n} \cdot k \\ = n \cdot k \\ \geq n \cdot 1 \\ =n

In short, we have shown that under the assumption that n has two prime divisors each of which larger than \sqrt{n}, we are able to conclude that n>n, which is absurd.  So that original assumption must not have been a very good one; in other words, it’s not true.

The above will be useful in later problems.  Basically, all of the above is necessary to know if you’re in the game of prime finding.

So now that we’ve made the observation, all that remains is the code for the solution.  There’s lots of fussing around to do depending on a couple of things that could happen, but don’t lose sight of the forest for the trees.  The algorithm basically goes like this:  Let’s say that big old number is n.  Then we should

  • Check the odd integers from 3 up to \sqrt{n} to see if they divide n, stow these away in a list
  • Take the smallest entry from this list, say k, and replace n with \frac{n}{k}.  Make sure k^2 doesn’t divide n; also k^3, …
  • Get your new upper bound, \sqrt{n} (where n is really the original number divided by k)
  • Continue in this fashion until either n=1, in which case the smallest entry in the aforementioned list will be the largest prime dividing the original number, or until that smallest entry has broken the next upper bound, in which case your winner is n.

R Code:

ElapsedTime <- system.time({
##########################
n <- 600851475143

test <- 3

if (n%%2 == 0){
  n <- n/2
  test <- 2
}

if (n == 1){
  answer <- 2
} else {
  upper.bound <- sqrt(n)
  temp.vec <-
    (test:ceiling(upper.bound))[test:
    ceiling(upper.bound)%%2 == 1]

  while (test < upper.bound){
    temp.vec <- temp.vec[n%%temp.vec == 0]

    if (length(temp.vec) != 0){
      test <- head(temp.vec, 1)
      n <- n/test

      while (n%%test == 0){
        n <- n/test
      }
      upper.bound <- sqrt(n)
      temp.vec <-
        (test:ceiling(upper.bound))[test:
        ceiling(upper.bound)%%2 == 1]
    } else {
      test <- n
    }
  }
  if (n == 1){
    answer <- test
  } else {
    answer <- n
  }
}
##########################
})[3]
ElapsedMins <- floor(ElapsedTime/60)
ElapsedSecs <- (ElapsedTime-ElapsedMins*60)
cat(sprintf("\nThe answer is:  %d\nTotal elapsed time:  %d minutes and %f seconds\n", answer, ElapsedMins, ElapsedSecs))

Output:
The answer is:  6857
Total elapsed time:  0 minutes and 0.050000 seconds

Final Thoughts:  This solution isn’t exactly fantastic.  For one, it’s complicated to even look at.  The solution provided in the post-solution commentary for this problem is a bit more elegant, though it feels more C-ish than R-ish–and my solution is definitely unique to R and its adorable quirks.  Additionally, the arguably more elegant solution (that provided by Project Euler) doesn’t really do a whole lot better speed-wise.  Basically what I’m trying to say is that my solution will piss off programmers, but who says they don’t deserve it?