From: [email protected] (Ted Cashel)
Date: Wed Mar 31, 1999 11:10:17 PM US/Eastern
Subject: Fwd: Neat story
Reply-To: [email protected] (Ted Cashin)
Attachments: There is 1 attachment
(Embedded image moved to file: pic09930.jpg)
[email protected] didn’t work so I am trying you here
I got this from a recent Tidbits and it reminded me of your prime number program. This guy has a more complex program (albeit nearly as pointless) but has the background to have tried it on a proud line of mainframes and modern PC’s.
Power Macintosh G3: The Cannonball Express
——————————————
by Rick Holzgrafe
The Cannonball Express was the fabled train that was so fast it
took three men to say “Here she comes,” “Here she is,” and “There
she goes.” Computers are fast too, although unlike trains, most
aren’t self-propelled. What makes a computer fast, and how much
effect does software design have? How much faster are today’s
computers than yesterday’s? Recently I revisited some of these
questions, beginning with a trip down memory lane.
**Back in the Stone Age** — Twenty years ago, I was teaching
myself programming and had access to a DEC PDP-11/60 minicomputer
on evenings and weekends. This beast was bigger than a washing
machine, and during workdays I shared it with two dozen other
technicians and engineers. I found a word puzzle in a magazine and
thought it would be fun to program the PDP to solve it. The puzzle
was as follows.
Given a phrase and a sheet of graph paper, write the phrase on the
graph paper according to these rules:
1. Write one letter per square on the graph paper, like filling in
a crossword puzzle. Ignore case and anything that’s not one of the
26 letters of the alphabet. The phrase “N. 1st Street” is thus
identical to “NSTSTREET”.
2. Put the first letter of the phrase in any square you like.
3. After writing any letter, put the next letter of the phrase in
any adjacent square. Here, “adjacent” means any of the eight
neighboring squares, up, down, left, right, or diagonally. You may
reuse a square if it is adjacent and already holds the letter you
need; otherwise you must use a blank square. You can’t use the
same square twice in a row – no “hopping in place” for double
letters.
The goal is to write the phrase inside a rectangle of the smallest
possible area. (A subtle point: you are not trying to write in a
minimal number of squares.) To score your solution, draw the
smallest enclosing rectangle you can and take its area. The
rectangle may enclose some blank squares; count them, too.
Got it? Tongue-twisters are the most fun because they have lots of
opportunities to reuse whole snaky strings of squares. The 37
letters in “Peter Piper picked a peck of pickled peppers” can be
packed into a 3 by 5 rectangle, like this (view this in a
monospaced font):
OFIPT
KCPER
LEDAS
In those days I knew computers were “fast” but had no idea how
fast. The answer turned out to be “not very.” I wrote a program to
solve these puzzles and called it Piper after the tongue-twister.
I set Piper running on a medium-length phrase on a Friday evening,
and came back on Monday to find it still running. It had found
several less-than-best solutions but hadn’t finished. Way too slow
– I found a better solution myself on paper in about half an hour.
Why did it take so long? Piper was a “brute force” program. It
tried every possible solution to the problem, one after another.
The trouble is that there are too many possible solutions. Exactly
how many depends on the phrase, but for any non-trivial phrase the
number is astronomical. I realized for the first time that “fast”
sometimes isn’t “fast enough.” This point may be obvious today,
when we all use computers and are weary of waiting for them. But
in 1979, that PDP-11 was only the second computer I had ever seen!
**What Part of Fast Don’t You Understand?** I saw that I would
have to make Piper faster. There are two basic ways to speed up a
program. Plan A is to find a better way of solving the problem,
but after twenty years I still haven’t thought of a better
solution. That leaves plan B, the classic efficiency expert’s
solution: eliminate unnecessary steps. For example, Piper created
every possible solution, then calculated the area of each. It
built each solution one letter at a time, so instead of taking the
area only for completed solutions, I changed Piper to check the
area after placing each letter. If placing a letter made the
solution-in-progress take up more space than the smallest complete
solution found so far, Piper could skip the rest of that solution
(and all other solutions that started the same way) and move right
on to the next one. This eliminated a huge amount of work and
greatly improved Piper’s speed. Finding clever ways to track the
area of a growing solution helped too, because it was faster than
calculating the area from scratch after each letter. I also found
a way to calculate a minimum size for the final solution quickly:
I couldn’t guarantee that the best solution would be that small,
but I could guarantee that it wouldn’t be smaller. If Piper got
lucky and found a solution as small as that calculated minimum, it
could stop immediately. Otherwise it would continue on after
finding the best solution, vainly seeking a still better one.
Eventually Piper became clever enough to finish that original
phrase in a reasonably short period of time. But the holy grail
continued to elude me: I wanted a solution for “How much wood
would a woodchuck chuck if a woodchuck could chuck wood?” That PDP
(and, perhaps, my cleverness) were not up to the task. I had run
out of ideas for speeding up Piper, and runs still took longer
than a weekend. But if I couldn’t improve Piper, I could at least
hope to run it on a faster computer.
**Big Iron** — People tend to think of processor speed as the
speed of a computer, but many factors affect overall performance.
Virtual memory lets you work on bigger data sets or on more
problems at a time, but it’s slow, so adding more physical RAM
helps by reducing your reliance on virtual memory. Faster disks
and I/O buses load and save data more quickly. RAM disks and disk
caching replace slow disk operations with lightning-quick RAM
access. Instruction and data caches in special super-fast RAM
offer big improvements for some programs. Well-written operating
systems and toolboxes can outrun poorly written ones.
But in the end, little of this matters to Piper. Piper has always
used only a small amount of data, doesn’t read or write the disk
after it gets going, and does little I/O of any kind. With its
small code and data set Piper can take good advantage of data and
instruction caching, but what it mostly needs is “faster hamsters”
– a faster processor to make the wheels turn more quickly.
As the years rolled on, I ran versions of Piper on my first Macs,
but in the middle 1980’s I worked for Apple Computer, and had
access to a programmer’s dream: Apple’s $15 million Cray Y-MP
supercomputer, one of only two dozen in the world and arguably the
fastest computer in existence at the time. I figured the Cray
would make short work of Piper. But the Cray was not well suited
to the problem. It could barrel through parallel-processing
floating-point matrix calculations like the Cannonball Express,
but Piper was a highly linear, non-mathematical problem. Piper
used only one of the Cray’s four processors and didn’t do the kind
of operations at which the Cray excelled. Piper wasn’t a fair test
of the Cray’s power, but the Cray was still the fastest machine
I’d ever used. The Cray succeeded where all previous machines
(that PDP, my Mac Plus, my Mac II) had failed. It solved
“woodchuck” in less than a day, taking only about 20 hours to
finish its run. I was awestruck – 20 hours?! I’d no idea that
“woodchuck” was _that_ big a problem!
**Young Whippersnappers** — I set Piper aside for many years, but
recently I began to wonder how a modern desktop box compares to
those old minicomputers and mainframes. I rewrote Piper from
memory and ran it on my new 400 MHz ice-blue Power Macintosh G3
with “woodchuck.” The output is below. Piper first reprints the
phrase, then prints solutions and elapsed times as it finds them.
Each solution is the best found so far, culminating in the best of
all. The times are in seconds from the beginning of the run; the
final time is the total run time. (Unfortunately, the best
solution for “woodchuck” is larger than Piper’s calculated
minimum, so Piper continued to run for a bit after finding the
best solution.)
Here are the results. Some intermediate solutions have been left
out for brevity, but you can see Piper finding ever smaller
solutions. In the end, the 57 letters in “woodchuck” are packed
into a 4 by 4 rectangle. Have a look at Piper’s total run time,
and the time needed to find its best solution:
How much wood would a woodchuck chuck if a woodchuck could chuck wood?
0 seconds:
ULD ADLU
HDOAIUCOHWUHOD
UCOWFKHDOMCWOW
K WO
1 seconds:
DLIFADLU
UCKOHWUHOD
OHDWOMCWOW
2 seconds:
ULCHC
HWUHODKUK
OMCWOWAFI
7 seconds:
HWUHOW
OMCWOD
IKAUCW
FLDHK
9 seconds:
HWUH
OMCW
LUOK
HDOI
CWAF
65 seconds:
HWM
OUC
IKH
FWC
ADO
LUO
67 seconds:
HWMU
OOCH
UDWK
LAFI
Total run time: 107 seconds
There you have it: a shade over a minute to find the best
solution, under two minutes to finish its run. Two minutes! So
much for the big iron of the 1980’s. My new G3 Mac finished
“woodchuck” over 600 times faster (and 5,000 times cheaper) than
that 15 megabuck Cray. (For a more realistic comparison, see this
description of UCLA’s Project Appleseed.)
www.apple.com/education/hed/aua0101/appleseed/
If you want to check Piper’s speed on your Mac, I’ve placed the
code in the public domain; it’s a 40K package.
ftp://ftp.tidbits.com/pub/tidbits/misc/piper.hqx
**The Future** — What’s yet to come? 400 MHz already looks a
little pokey. It’s the best Apple offers today, but I’ve seen
claims of 550 MHz or so from third party accelerators and over-
clocking tricks. People are predicting 1 GHz (1,000 MHz) chips for
the near future. Buses are getting faster, and caches hold more
data in less space and are moving onto the processor chip for
still more speed. (Small is fast. Did you know that the speed of
light is a serious limiting factor in modern computer design? The
closer together the components are, the faster they can signal
each other.)
And like the old Cray, multi-processor desktop systems are
starting to appear. They gang up on a problem by having separate
processors work on different parts of the problem simultaneously.
Although I didn’t try to use the Cray’s extra processors, I’ve
done a little thinking lately. Piper doesn’t have to be completely
linear. On an eight-processor system, I bet I could come close to
making Piper run in one-eighth of the time of a single processor.
Are you ready?
Here she comes –
Here she is –
There she GOES!