Project Euler in R: Problem 7

Problem:  What is the 10001st prime number?

Commentary:  So it’s finally time to discuss my prime-searching algorithm, as promised back in Problem 5.  We will find primes in the following way:

1. 2 and 3 are prime
2. We examine the odd integers (having already discovered the only even prime) starting with $i=5$.
3. First, check if $i$ is of the form $6k+3$ for some integer $k>0$.  If so, then $i$ can’t be prime, because then $i=6k+3=3(2k+1)$, and since $k>0$, $2k+1 > 1$ (so $i\neq 3$).  In this case, don’t even bother proceeding; replace $i$ with $i+2$ and start over.
4. Calculate $s(i)=\sqrt{i}$.  Among all known primes below $s(i)$  (going no further for the same reason as that outlined in Problem 3), check if these primes divide into $i$.  If we find one that does, then there is no reason to proceed; $i$ is not prime.  Go back to 3. and try again with $i$ replced with $i+2$.  If none of the known primes below $s(i)$ divide $i$, then stop; $i$ must be prime.  Add it to the list of known primes.
5. Repeat this process by replacing $i$ with $i+2$ until we have 10001 primes.

It’s nothing fancy, but it works.  As for the implementation, believe it or not, I actually don’t construct a list of length 10001 and insert into it as I gather my primes like a good little R user.  See the post-code commentary for an explanation.

R Code:

ElapsedTime <- system.time({
##########################
primes <- c(2, 3)
i <- 3
while (length(primes) < 10001){
flag <- 0
i <- i+2
if (i%%6 == 3){
flag <- 1
}
if (flag == 0){
s <- sqrt(i)+1
for (prime in primes){
if ((i%%prime == 0)){
flag <- 1
break
}
if (prime > s){
break
}
}
if (flag == 0){
primes <- c(primes, i)
}
}
}

##########################
})[3]
ElapsedMins <- floor(ElapsedTime/60)
ElapsedSecs <- (ElapsedTime-ElapsedMins*60)
cat(sprintf("\nThe answer is:  %d\nTotal elapsed time:  %d minutes and %f seconds\n",


Output:
Total elapsed time:  0 minutes and 1.191000 seconds

Post-Code Commentary:  You may notice that I’m not pre-allocating the prime list and then storing new prime values as I go, like a good little R user.  The reason is that it’s actually slower to do it this way.  Now maybe I’m an idiot and I don’t really understand how R does things, but if I take this algorithm and only modify the prime storage part of it to include preallocating the list and then injecting into that list (replacing a 0) as I go, then the time is nearly a full order of magnitude slower.  Here’s the R code that you would produce if you were to do what I just described:

primes <- numeric(10001)
primes[1:2] <- c(2, 3)
i <- 3
prime.position <- 3

while (primes[10001] == 0){
flag <- 0
i <- i+2
if (i%%6 == 3){
flag <- 1
}
if (flag == 0){
s <- sqrt(i)+1
for (prime in primes[primes > 0]){
if ((i%%prime == 0)){
flag <- 1
break
}
if (prime > s){
break
}
}
if (flag == 0){
primes[prime.position] <- i
prime.position<-prime.position+1
}
}
}



So how do the two run?  Well, on my work computer (which is a little slower than my home computer–also don’t tell my boss!), the original “bad R user” code runs in 1.852 seconds.  The “better” solution with pre-fetching the vector runs in 11.100 seconds.  That is a 9.248 second differential, which amounts to a nearly 500% increase in run time!

Of course, there are plenty of times when it’s objectively better to pre-allocate; but as it turns out, this isn’t one of them (though I have no idea why).

Project Euler in R: Problem 6

Problem:  Find the difference between the sum of the squares of the first one hundred natural numbers and the square of the sum.

Commentary:  I taught a variety of freshman math courses at my university for roughly 5 years.  As such, this problem is very near and dear to my heart.

I can’t tell you how many kids show up at university thinking they’re hot shits because they took Calc 1 in high school, but don’t know that

$a^2+b^2\neq (a+b)^2$

This is just one of many adorable things freshmen tend to belive, along with all their ridiculous political beliefs and thinking they’ll somehow change the world in any appreciable way.  Adorable.

Actually, students thinking that the sum of the squares is equal to the square of the sum is so common, that it has its own name.  We call it the “freshman dream”.  I’ve also heard it called “Baby’s Binomial Theorem”, which is way funnier, but much less commonly heard.  And since I guess I have to mention it, there are number-like structures out there where the freshman dream is true (fields of characteristic 2 for my algebros), but the real numbers certainly aren’t one of them.

What’s especially funny about this misunderstanding is that it’s literally almost never true (for real numbers).  Pick your favorite two numbers–they don’t even have to be integers.  They can even be the same number.  Unless one (or both) of them is zero, then tada, the sum of the squares isn’t equal to the square of the sum.

So what other crazy (mathematical) things do university students tend to come up with in freshman math class?  Just on the order of basic arithmetic, it’s almost shocking how little mastery they have over the subject.  And we’re not talking about “complicated” things like the limit, integration, or any other worthless calculus crap.  This is arithmetic.  Our university students don’t understand arithmetic.

Just off the top of my head, these are some of the most common misunderstandings, not counting that mentioned above.  Keep in mind, everything that follows is not true, and what’s more, they’re not true for practically any numbers you throw at them (try it yourself!)

$\sqrt{a^2+b^2}=a+b$

$-(a+b)=-a+b$

$\frac{a}{c}+\frac{b}{d} = \frac{a+b}{c+d}$

$\frac{a+b}{a}=b$

$\frac{a}{a+b}=\frac{1}{b}$

And that’s not even getting into logarithms or trigonometry, which none of them understand.

This has a lot to do with why I gave up teaching for the cushy office life.  I haven’t looked back since.

R Code:

ElapsedTime<-system.time({
##########################
##########################
})[3]
ElapsedMins<-floor(ElapsedTime/60)
ElapsedSecs<-(ElapsedTime-ElapsedMins*60)
cat(sprintf("\nThe answer is:  %d\nTotal elapsed time:  %d minutes and %f seconds\n", answer, ElapsedMins, ElapsedSecs))


Output:
Total elapsed time:  0 minutes and 0.000000 seconds

A Math Degree in the Kitchen

So today I was making some rice, and I decided based on calorie load that I wanted more than 1/3 cup of rice, but less than 1/2.  I decided that 5/12 cup was acceptable, but my only tools were a 1/4 cup scoop and a 1/3 cup scoop.

I didn’t have to think very hard about this; I instantly knew it was possible to get 5/12 cup of rice with these tools, because 4 and 3 are relatively prime–i.e., they share no common factors.  More on this in a minute, but first, how to do it.

It’s a very uncomplicated idea, and it should look very similar to the Diehard 3 water puzzle.  In the film, John McClain has a 3 liter jug and a 5 liter jug, and wants to get exactly 4 liters of water.  My rice puzzle is basically the same game.    To get my 5/12 cup of rice, I fill the 1/3 cup scoop and pour that rice into my rice cooker.  Then I again fill the 1/3 scoop and pour that (without spilling) into the 1/4 scoop.  How much remains in the 1/3 scoop?  Well, I had 4/12 cup of rice in it, and I took out 3/12 cup of rice.  So only 1 remains in the 1/3 cup scoop.  Add the 1/12 to the rice cooker, and since 1/3+1/12=5/12, we’re done.  And although that’s how I did it, I actually didn’t even need to get the rice cooker involved.  I can form 5/12 cups inside the scoops without having to use another vessel (though the thing holding all my rice is allowed).  I could have filled the 1/3 cup scoop, poured that into the 1/4 cup scoop, emptied the 1/4 cup scoop, poured the 1/12 cup in the 1/3 cup scoop into the 1/4 cup scoop, then filled the 1/3 cup scoop.  Tada!

I warned you that it was very uncomplicated; the above is really not very interesting on any level.  What is interesting is why this works, and how I knew so quickly that it must be possible to do it.  If you don’t know why, then all that stuff I just wrote might seem like a cute curiosity, when really there is a very basic underlying principle involved.  Instead of needing to ask “I wonder if it’s possible to get 1/6 cups of rice with those tools”, I can instantly say that the answer is “yes”.  I didn’t have to think about it at all.  Here’s how it works.

Instead of thinking of the scoops as 1/4 and 1/3 cup, I’ll do you a favor and eliminate the fractions for you.  Let’s think of them as 3 and 4 serving scoops (out of 12), respectively.  Believe it or not, we’re almost done.  As it turns out, if the greatest common divisor (written $gcd$) of these two values (in this case 3 and 4 from 1/4 cup and 1/3 cup scoops, respectively) is 1, then it’s possible to get any of 1/12, 2/12, 3/12, 4/12, 5/12, 6/12, or 7/12 cups of rice (maxing out at 3+4=7, because that’s the maximum amount of stuff our two scoops can hold).  Said another way, I can get 1, 2, 3, 4, 5, 6, or 7 servings (out of 12) in this fashion.  Why?  Because if $gcd(3,4)=1$ (and thank goodness that’s the case), then there exist integers $x$ and $y$ satisfying

$3x+4y=1$

But that isn’t what I wanted!  I wanted something like $3a+4b=5$.  But this is easy.  Multiply the above equation on both sides by 5 and we get

$5(3x+4y)=3(5x)+4(5y)=5$

So just call $5x=a$ and $5y=b$.  So it must be possible.

And people say that pure math has no practical application.

Project Euler in R: Problem 5

Problem:  What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?

Commentary:  This problem again isn’t very difficult.  Basically we just exploit the existence of prime factorizations and construct the least common multiple (lcm) in the most intuitive way possible.   You might find something a bit strange in this solution, namely in the prime-y stuff; however, I don’t really want to get into that until Problem 7.  So I won’t.

R Code:

ElapsedTime <- system.time({
##########################
###
# Function to get primes below specified n; option to get only those below sqrt(n)
PrimesBelow <- function(n,below.sqrt=FALSE){
if (below.sqrt==TRUE){
m <- ceiling(sqrt(n))
} else {
m <- n
}

primes <- c(2, 3)
i <- 3
while (i < m-1){
flag <- 0
i <- i+2
if (i%%6 == 3){
flag <- 1
}
if (flag == 0){
s <- sqrt(i)+1
possibleprimes <- primes[primes < s]
for (prime in possibleprimes){
if ((i%%prime == 0)){
flag <- 1
break
}
}
if (flag == 0){
primes <- c(primes,i)
}
}
}
primes
}

# Function to get the prime factorization of all integers below
PrimeFact <- function(n){
list <- numeric(0)
for (i in 2:n){
ps <- numeric(0)
for (prime in primes){
p <- 1
while (i%%(prime^p) == 0){
p <- p+1
}
p <- p-1
ps <- c(ps, p)
}
if (length(list) == 0){
list <- data.frame(primes, ps)
} else {
list <- data.frame(list, data.frame(ps))
}
}
list
}
###

n <- 20

primes <- PrimesBelow(n)

list <- PrimeFact(n)

expnts <- numeric(0)
for (nump in 1:length(primes)){
temp <- numeric(0)
for (i in 2:n){
temp <- c(temp, list[[i]][nump])
}
expnts <- c(expnts, sort(temp, decreasing=TRUE)[1])
}

##########################
})[3]
ElapsedMins <- floor(ElapsedTime/60)
ElapsedSecs <- (ElapsedTime-ElapsedMins*60)
cat(sprintf("\nThe answer is:  %d\nTotal elapsed time:  %d minutes and %f seconds\n",



Output:
Total elapsed time:  0 minutes and 0.008600 seconds

Project Euler in R: Problem 4

Problem 4: Find the largest palindrome made from the product of two 3-digit numbers.

Commentary:  Believe it or not, I don’t have much to say about this problem.  There are cuter, more general ways to attack this problem, but computer scientists suffer from an acute case of overgeneralizationitis.  I had a math professor once who loved to say “never do a calculation before its time”.  This is good advice, even in programming, but I think there’s a useful addendum to this.  Never generalize before it’s necessary.  Encapsulating an idea into a function is a stupid waste of time unless you’re pretty sure there’s a good reason to do it. But in my experience, most programmers don’t bother thinking much about anything.  They literally behave like computers, as though some algorithm needs to be fed to them and that is then all they are capable of doing from that point forward.

My biggest problem with programmers is they have this mindset that there’s only one possible solution.  I hesitate to say this of computer scientists in general, but it’s certainly true of people who call themselves programmers:  these people lack imagination.  They talk about “right” and “wrong” solutions, when what they really mean is “good” and “bad”, or more often, “elegant” and “inelegant”, a completely subjective, ill-defined notion.

I’m not advocating that we make every proof (or program) into a brutal slog of calculations and appeals to first principles.  But elegance without insight can be very costly.

And that is why programmers are terrible at mathematics.

R Code:

ElapsedTime <- system.time({
##########################
RearrangeVector <- function(vec, n=1){
vec
}

palindromes <- numeric(9*10*10)
i <- 1
for (a1 in 1:9){
for (a2 in 0:9){
for (a3 in 0:9){
palindromes[i] <- a1*10^5 + a2*10^4 + a3*10^3
+ a3*10^2 + a2*10^1 + a1
i <- i+1
}
}
}

palindromes <- sort(palindromes, decreasing=TRUE)

prod <- numeric(0)
temp <- 899:999
for (n in 1:100){
prod <- c(prod, temp*RearrangeVector(temp, n))
}

acceptable <- palindromes[palindromes %in% prod]

##########################
})[3]
ElapsedMins <- floor(ElapsedTime/60)
ElapsedSecs <- (ElapsedTime-ElapsedMins*60)
cat(sprintf("\nThe answer is:  %d\nTotal elapsed time:  %d minutes and %f seconds\n",


Output:
Total elapsed time:  0 minutes and 0.011000 seconds

Project Euler in R: Problem 3

Problem: What is the largest prime factor of the number 600851475143 ?

Commentary: This problem is not very hard, and there is much less of interest to say about it than in problems 2 and 25. The only real hitch is knowing what a prime is and knowing a few things about factorization.

As it turns out, there is a surprising amount of research–even current day research, regarding factorizations of number-like things. Here, thankfully, the problem is easy enough. Since we’re working in the integers, we have nice things like the Fundamental Theorem of Arithmetic, which says that every natural number (1,2,3,…) greater than 1 can be written as a product of powers of primes, and that ignoring the order in which you present the factors of that factorization, that any such factorization is unique.

If you actually managed to read that sentence, then you’re probably thinking “Well of course! That’s just how things are!” But you should know better. One of the most fundamental questions you can ask in any field, but in particular mathematics, is to ask what a symbol means. What does “$1$” mean, or what does “$\cdot$” (for multiplication) mean? These are important questions that you’ve probably never had to deal with unless you’ve studied math. As it turns out, such questions aren’t irrelevant–at all.

So what is “$1$“? It’s a (actually THE) thing which when you take any other thing–I’ve forgotten the other thing’s name, so let’s call it $x$–take that other thing and do this

$x\cdot 1$

or this

$1\cdot x$

then you’re guaranteed to “get $x$ back”.  Well, shit, now we need to know what “$\cdot$” is.  Our friend “$\cdot$” is a thing which gives you a thing when you use it on two allowed things.  Oh, but it can’t give you just any old thing.  That would be ridiculous.  We mathematicians are civilized folk, afterall, so there are rules here.  Let’s say we have three things–and I’ve forgotten their names, so let’s just call them $x$, $y$, and $z$.  Then the rules of the game are

1. Whatever $x\cdot y$ is, it had better be a thing
2. $x\cdot (y\cdot z)$, whatever that is, had better be the same thing as $(x\cdot y)\cdot z$
3. $1$ is a thing, and $1\cdot x=x$ and $x\cdot 1=x$

Ok, so what is a prime?  A prime $p$ is a thing which isn’t $0$ (the thing so that when you do $x+0$ or $0+x$ you get $x$–let’s not get into what $+$ is right now, though…), it also isn’t a “unit”, meaning that there is no thing–let’s say $x$, so that $x\cdot p=1$ never happens and so that $p\cdot x=1$ never happens.  But wait, there’s more!  There’s one final catch, and it’s the biggie.  If you have two things, say $x$ and $y$, and you form a (potentially) new thing called $x\cdot y$; well, if it turns out that $p$ divides $x\cdot y$ (meaning there is a thing $z$ so that $x\cdot y = p\cdot z$), then $p$ had better divide $x$ or $p$ had better divide $y$.

Wait, did I just say that $1$ isn’t prime?  That’s right, it isn’t.  It never has been.  It never will be.  Nobody but crackpots thinks it is.  Deal with it, nerd.

Oh right, primes.  So we’ve talked about $1$; what do you think $2$ means?  Well, it usually means $1+1$, but I don’t want to get into $+$ right now, so let’s stick to the numbers we’re familiar with.

I pose a simple question to you.  Is $2$ prime?  The answer is:  it depends (because the question is poorly worded!).  Remember what it means to be primes.  First and foremost, it isn’t $0$–which, and you’re just going to have to trust me, we can check off the list–and it isn’t a unit.  So think about the collection of all integers.  The integers are the things that go 1,2,3,…, together with 0 and things like -1,-2,-3,… . As it turns out, this isn’t yet another thing your gradeschool teacher got wrong.  Our old friend $2$ is prime in the integers.  But what about the collection of all rational numbers–the things which can be written as ratios of integers.  Is $2$ prime in this collection?  Nope!  Well, not so long as you believe that $\frac{1}{2}$ is a thing (in the collection of rational numbers).  Because if it is, then $2\cdot\frac{1}{2}=1$, which means that $2$ is a unit in the rationals.  Which means, by definition, it can’t be prime.

So now that no one is confused, let’s return to the original problem.

The R code here isn’t something I’d probably do over again, but this was the first solution (which I did slightly clean up post hoc, but the integrity of the dumbness was kept in tact).  The key here is realizing that any natural number $n$ can have at most one prime divisor greater than $\sqrt{n}$.  Indeed, if it had two such, say $p$ and $q$ with $p\neq q$, then since $p$ divides $n$ and $q$ divides $n$, by virtue of being prime (actually all we need is $gcd(p,q)=1$, which we certainly get from primality) we would have the product $p\cdot q$ dividing $n$ (note that this is the part of the argument that fails if the divisors aren’t prime; for example, 100 has several divisors greater than $\sqrt{100}=10$, namely 20, 25, and 50.  Of course, none of these is prime…but that was my whole point).  The easiest way to convince yourself that this is so is to think about the “general” prime factorization of $n$ (see the Fundamental Theorem of Arithmetic).  So by definition of division in the integers, that means there exists some natural number $k$ with $k\geq 1$ and $n=p\cdot q\cdot k$.  But then

$n = p \cdot q \cdot k \\ > \sqrt{n} \cdot \sqrt{n} \cdot k \\ = n \cdot k \\ \geq n \cdot 1 \\ =n$

In short, we have shown that under the assumption that $n$ has two prime divisors each of which larger than $\sqrt{n}$, we are able to conclude that $n>n$, which is absurd.  So that original assumption must not have been a very good one; in other words, it’s not true.

The above will be useful in later problems.  Basically, all of the above is necessary to know if you’re in the game of prime finding.

So now that we’ve made the observation, all that remains is the code for the solution.  There’s lots of fussing around to do depending on a couple of things that could happen, but don’t lose sight of the forest for the trees.  The algorithm basically goes like this:  Let’s say that big old number is $n$.  Then we should

• Check the odd integers from 3 up to $\sqrt{n}$ to see if they divide n, stow these away in a list
• Take the smallest entry from this list, say $k$, and replace $n$ with $\frac{n}{k}$.  Make sure $k^2$ doesn’t divide $n$; also $k^3$, …
• Get your new upper bound, $\sqrt{n}$ (where $n$ is really the original number divided by $k$)
• Continue in this fashion until either $n=1$, in which case the smallest entry in the aforementioned list will be the largest prime dividing the original number, or until that smallest entry has broken the next upper bound, in which case your winner is $n$.

R Code:

ElapsedTime <- system.time({
##########################
n <- 600851475143

test <- 3

if (n%%2 == 0){
n <- n/2
test <- 2
}

if (n == 1){
} else {
upper.bound <- sqrt(n)
temp.vec <-
(test:ceiling(upper.bound))[test:
ceiling(upper.bound)%%2 == 1]

while (test < upper.bound){
temp.vec <- temp.vec[n%%temp.vec == 0]

if (length(temp.vec) != 0){
n <- n/test

while (n%%test == 0){
n <- n/test
}
upper.bound <- sqrt(n)
temp.vec <-
(test:ceiling(upper.bound))[test:
ceiling(upper.bound)%%2 == 1]
} else {
test <- n
}
}
if (n == 1){
} else {
}
}
##########################
})[3]
ElapsedMins <- floor(ElapsedTime/60)
ElapsedSecs <- (ElapsedTime-ElapsedMins*60)
cat(sprintf("\nThe answer is:  %d\nTotal elapsed time:  %d minutes and %f seconds\n", answer, ElapsedMins, ElapsedSecs))


Output:
Total elapsed time:  0 minutes and 0.050000 seconds

Final Thoughts:  This solution isn’t exactly fantastic.  For one, it’s complicated to even look at.  The solution provided in the post-solution commentary for this problem is a bit more elegant, though it feels more C-ish than R-ish–and my solution is definitely unique to R and its adorable quirks.  Additionally, the arguably more elegant solution (that provided by Project Euler) doesn’t really do a whole lot better speed-wise.  Basically what I’m trying to say is that my solution will piss off programmers, but who says they don’t deserve it?

Project Euler in R: Problem 25

Problem 25:  What is the first term in the Fibonacci sequence to contain 1000 digits?

Commentary:  Well, I certainly hope you paid attention to all the boring crap in the solution to Problem 2, because you’re going to need it to understand my solution for Problem 25.

First, I’ll describe my first attempted solution, which was a “computery” approach.  My original plan was to use the closed form (discussed in the link above) to the Fibonacci Numbers, passing off all the “heavy lifting” to GNU bc.  This “works”, but is UNBELIEVABLY slow (I think I let it run for about 15 minutes before I decided to try to math my way out of things).  This problem is a good reminder (though we’ll definitely see others) of the inarguable fact that mathematicians are still kings of earth–or at least that the Project Euler people like to occasionally throw a bone to math-types instead of letting the computer science nerds have all the fun.

Ok, so, remember that weird “continued fraction” expansion of $\phi$ that I gave in the solution to Problem 2?  If you stare at it for a few seconds, then it should be obvious that

$\phi = 1+\displaystyle\frac{1}{\phi}$

and hence

$1-\phi = \displaystyle\frac{1}{\phi}$

While we’re here, it’s worth pointing out that if you define $\phi$ in terms of that continued fraction expansion, then you can deduce from that definition the numeric value of $\phi$ , since from the first equation above, if we multiply both sides by $\phi$, then rearrange things and get

$\phi^2-\phi-1=0$

Using the good old quadratic formula gives two solutions (or in other words, two possibilities for $\phi$), namely $\displaystyle\frac{1\pm\sqrt{5}}{2}$.  One of these two possibilities is positive and the other negative.  From there it isn’t hard to see which is which.

Ok, so back to Project Euler.  With the above observation regarding $1-\phi$, we can rewrite the closed form expression of the Fibonacci Numbers (using the Project Euler convention that $F_1=F_2=1$) as

$F_n = \displaystyle\frac{\phi^n-\left(\frac{1}{\phi}\right)^n}{\sqrt{5}}$

Then for sufficiently large $n$ (and thinking about the problem at hand for 3 seconds should convince you that the $n$ we’re after should do the trick), we have

$F_n\approx\displaystyle\frac{\phi^n}{\sqrt{5}}$

since of course $\displaystyle\left(\frac{1}{\phi}\right)^n$ will be quite small.  It follows that we need only find the smallest natural number $n$ satisfying

$F_n = \displaystyle\frac{\phi^n}{\sqrt{5}}>10^{999}$

And since $\log$ is an increasing function, this is equivalent to needing to find the smallest n satisfying

$\log_{10}(F_n)>999$

The problem is now trivial.  Note that from the above observation regarding $\displaystyle\left(\frac{1}{\phi}\right)^n$ and from basic properties of logarithms:

$\log_{10}(F_n) = n\log_{10}(\phi)-\frac{1}{2}\log_{10}(5)$

Whence, we wish to find the smallest natural number $n$ satisfying

$n>\displaystyle\frac{999+\frac{1}{2}\log_{10}(5)}{\log_{10}\left(\phi\right)}\approx 4781.859$

It follows that $n=4782$.

Who says a masters in math is useless?

R Code:

ElapsedTime <- system.time({
##########################
phi<-(1+sqrt(5))/2