303 Home

Listed below are some sample programming problems that you might like to work on for your Perl project. If you'd rather tackle a different problem, please check with me to insure that your idea is workable within the scope of the assignment.

 

Rot 13

Here’s a simple encryption algorithm that you can use to encode secret messages: shift every letter in the message 13 characters towards the end of the alphabet. (“a” becomes “n”; “b” becomes “o”; “x” loops around and becomes “k”; etc.) The beauty of this algorithm is that since the alphabet has 26 letters, you can also decrypt a message by repeating the process and shifting every letter in the message 13 characters towards the end of the alphabet. It’s often called “ROT 13” because the algorithm “rotates” each letter 13 places. Write a ROT 13 program that will accept a line of text, print the encrypted message, then decrypt the message and print the result. [Test: does your program work for uppercase and lowercase letters?]

n!

Mathematicians define n! (pronounced “n factorial”) as the product of all the numbers from 1 to n. (that is, you multiply all the numbers, so 5! = 1 * 2 * 3 * 4 * 5). Write a program that will calculate n! for a value of n that the user inputs. (Don’t let the user input negative numbers.) Can your program calculate correctly that 0! = 1…? (If you’re looking for a more challenging algorithm, try a recursive subroutine.)

Fibonacci Sequence

Leonardo Pisano, better known by his nickname Fibonacci, was an Italian mathematician who lived from 1170-1250. He is famous for investigating a sequence of numbers that begins as follows:

0, 1, 1, 2, 3, 5, 8, 13 . . .

Each number in the sequence is the sum of the previous two numbers. Write a program that accepts a number, n, from the user and finds the nth Fibonacci number. Entering “4” for example, should return the output “2”; entering “8” should return “13”. Be sure to check the user’s input for validity. (Again, for a more challenging solution, try a recursive subroutine.)

Charting Random Numbers

Write a program to select 100 random numbers between 1 and 10. Count how many occurrences of each number there are, and print out a distribution chart similar to the following:

          1 = 15  |  xxxxxxxxxxxxxxx
          2 = 7   |  xxxxxxx
          3 = 10  |  xxxxxxxxxx
          4 = 11  |  xxxxxxxxxxx
          5 = 11  |  xxxxxxxxxxx
          6 = 14  |  xxxxxxxxxxxxxx
          7 = 6   |  xxxxxx
          8 = 10  |  xxxxxxxxxx
          9 = 8   |  xxxxxxxx
         10 = 8   |  xxxxxxxx

[Two hints: 1) array indexes begin at which number? 2) And “printf” may help, although you don’t necessarily need it.] For a more challenging problem, have the user input how many random numbers to select. If the user inputs 10000, how will your program handle all the X’s?

Guess a Random Number

Write a program that chooses a random number (between 1 and 50, say, or 1 and 100). Keep asking the user to guess the number until they either get it right or quit playing. (You could check the input for “q” or “e” or “x” or a different code of your own devising. On various systems, ctrl-c or ctrl-d or ctrl-z will also break out of an input loop but you don’t have to check for those.) At each guess, tell the user whether their guess is too high or too low. Count how many guesses it takes for the user to get the right answer. Can you devise a playing strategy that will always minimize your number of guesses? [If you can and you’d like another challenge, take a look at the “Devise an Algorithm” problem below.]

Estimating Pi

Although it might sound surprising, it's possible to use random numbers to learn about non-random things. Computer scientists call strategies like these "Monte Carlo" algorithms because they simulate using a roulette wheel to generate random numbers, and Monte Carlo is a famous gambling city in Monaco. Anyway, here's an example of a Monte Carlo algorithm — we can use random numbers to estimate the value of pi. Let's imagine that we have a circular, green dartboard attached to a gray square as in this diagram:

Now if we throw darts at the dartboard randomly, we'd expect some darts to fall within the green circle and others to fall on the gray square. (Since we're good dart players, none will hit the white wall.) If we count up the darts, the ratio of 'green' darts to 'gray' darts will be very close to the ratio of the area of the circle to the area of the square.

In algebra, we learned that the area of the square is the length of a side times the length of a side (as mathematicians say, "length squared" or, using superscripts, length2). The area of a circle is calculated by multiplying pi times the radius of the circle times the radius of the circle (pi * r2). Let's imagine that the radius of the circle is 1 unit; the diameter of the circle will then be 2 units — and that's precisely the same as the length of a side of our square. So the area of the square will be 22 = 4. The area of the circle will be pi * 12 = pi.

And so the ratio of circle area to square area will be pi to 4. That is to say, the probability of hitting the circle with a dart is pi / 4. Put yet another way, if you calculate the ratio of green darts to the number of darts thrown and multiply the answer by 4, you'll get an estimate of the value of pi.

Write a program to throw as many darts as the user wants, and print out the resulting estimate of pi. (Hint: you'll need to put a coordinate axis on top of your dartboard in order to calculate whether a dart falls inside or outside the circle. What problems does this cause? How did you solve them? Another hint: the equation for a circle is x2 + y2 = r2). How many darts does the computer need to throw in order to approximate pi reasonably accurately?

The Monte Carlo Gas Station

It's also possible to use these so-called 'Monte Carlo' techniques in order to create computer simulations. For example, let's say that a gas station wants to figure out what the average waiting time for each customer is. They give you some basic data:

Write a program to simulate the Monte Carlo Gas Station — select a random number for each simulated minute that the gas station is open. If the random number is less than 0.1, say that a customer has arrived (0.1 equals 10% probability). Each time your simulated clock advances one minute, decide whether a new customer has arrived.

When a customer arrives, put them into the 8-minute countdown mode. Line up new customers as they arrive, and let them pump gas when it's their turn, but be sure to calculate the length of time that they wait (that's the whole point of this problem, after all!). At the end of the day, print out a report that informs the manager how many customers arrived, and what their average waiting time was.

If you're curious, you might want to keep track of other interesting "trivia" (How long was the longest customer wait? What's the record for the longest lineup of customers? If the station adds a second pump, does that mean that the waiting time will be cut in half?). Run the simulation several times to see if the numbers vary — and they probably will because the arrival of a customer is based entirely on chance. (Scienticians use the word 'stochastic' to refer to processes that have a random component or are probabilistic like this one. When they write a Monte Carlo routine, they don't settle for just one run of the program; they'll model a stochastic process again and again. And again.)

Prime Numbers

A prime number is a number that is divisible only by itself and 1. Examples of prime numbers are 3, 5, 11, and even 999,983. On the other hand, 6 and 9 and 14 are not prime because 2 * 3 = 6 and 3 * 3 = 9 and 2 * 7 = 14. Write a program to accept a number from the user and determine whether that number is prime. Try modifying your program to calculate the nearest prime number (whether higher or lower) to the one the user inputs (if it is not already prime).

Roman Numerals

Write a program that will accept a number as input and then will translate it into Roman numerals. Remember that:

1 = I
5 = V
10 = X
50 = L
100 = C
500 = D
1000 = M

And remember that, despite what you often see on clocks, 4 = IV (not IIII). In other words, except for very large numbers (like 4000 = MMMM), Roman numerals do not have four of the same "digit" in row. Does your program accurately translate numbers like 4 (IV) and 9 (IX) and 40 (XL) and 90 (XC)? Can you add parts to your program so that it will translate Roman numerals back into Arabic? Can you make your program intelligent enough so that it will decide for itself whether its current job is to translate into Roman or into Arabic? What was the most challenging aspect about writing this algorithm? [Trivia: remember that the Arabic invention, zero, wasn't introduced into Europe until about the 12th century. Hence, the Romans did not have an equivalent for the number zero.]

Search for a String

How many times in Moby-Dick does Herman Melville use the phrase "...it was not unlike..."? Well, that case might not be particularly interesting, but try this: write a program to accept a word or a phrase from the user and then search a text file for all occurrences of that phrase. Print out a message that says "There were $num matches" and then print only the lines that contain that string, putting some sort of special delimiter characters around the match. (Remind me to give you the really good hint that's on page 36 of the Perl Pocket Reference that provides some invaluable help on this one.) Can you print the line numbers from the original file? [Other hints: can you put a variable into a regex? How do you read the contents of a file? If you need a text to search through, you might want to download a free text file from the Gutenberg Project: http://promo.net/pg/.]

Sentence Length

One of the challenges to becoming a better writer is learning to vary your sentence lengths. First-year university writers often have an average sentence length of about 10 words, whereas more advanced writers often weigh in somewhere between 20 and 30 words per sentence. Moreover, in beginning writers' essays, the shortest and longest sentence often are very close to the average sentence length, whereas more advanced writers have a much larger spread. E.B. White's "Once More to the Lake," for example, has 5 words in its shortest sentence and a whopping 125 in its longest! Noting that his average sentence length is 30 words, you'd probably guess that he's a very good writer — even before you've read his essay. Write a program to analyze a piece of text, calculating the shortest sentence, the longest sentence, and the average number of words per sentence. [Hints: How many different punctuation marks terminate a sentence? Will you be able to accommodate sentences that wrap around to new lines? I can give you a hint from page 62 of the Perl Pocket Reference that may provide some help on that dilemma, but it won't entirely solve the problem for you.]

Haiku

A haiku is a Japanese form of poetry that has three lines. The first line has five syllables, the second has seven, and the third has five. Write a program that will generate as many haiku as the user wants. There are several ways to devise an algorithm for this program. One strategy might be to create some arrays that contain words of various syllable lengths, then brew up an algorithm that will randomly combine them into patterns of the proper length. A further step might be to pre-sort the words according to their parts of speech in order to create haiku that make more semantic sense. What are the most challenging aspects of writing these algorithms? What kinds of "knowledge" do you need to include in your program to make it write higher quality haiku? (As you can see, this program actually takes one or two steps down the path towards artificial intelligence, doesn't it?) What would you add to the program had you world enough and time?

Palindromes

Palindromes are words or phrases that are spelled the same way backwards and forwards. “Dennis sinned” and “Madam, I’m Adam” are both palindromes (click here for a bigger list of 220 palindromes for you to try). Write a program that accepts a line of input from the user and decides whether the string is a palindrome or not. [Hint: does your program take spaces into account? punctuation? capitalization? Bonus points if you can do it using a recursive subroutine.]

Palindromic Square Numbers

In 1931, a 19th-century British mathematician by the name of Henry Dudeney published (posthumously) a book of puzzles. Here's one of them:

This is a curious subject for investigation — the search for square numbers the figures of which read backwards and forwards alike. Some of them are very easily found. For example, the squares of 1, 11, 111, and 1111 are respectively 1, 121, 12321, and 1234321, all palindromes, and the rule applies for any number of 1's provided the number does not contain more than nine. But there are other cases that we may call irregular, such as the square of 264 = 69696 and the square of 2285 = 5221225. Now, all the examples I have given contain an odd number of digits. Can the reader find a case where the square palindrome contains an even number of figures? (from 536 Puzzles and Curious Problems, pp. 34-35.)

Dudeney's puzzle was not a trivial one in the era before computers — it meant long, tedious hours with pencil and paper, trying all the various combinations. Write a program to solve Dudeney's problem. [Hint: how will you handle the dilemma of analyzing your data as both strings of digits and as numbers? Does Perl provide any shortcuts?] Do you notice any trends or patterns in the numbers that generate palindromes?

The Remainder Puzzle

Here's another puzzle from the pen of mathematician Henry Dudeney:

The following is a rather curious puzzle. Find the smallest number that, when divided successfully by 45, 454, 4545 and 45454, leaves the remainders 4, 45, 454, and 4,545 respectively. This is perhaps not easy but it affords a good arithmetical exercise. (from 536 Puzzles and Curious Problems, p. 37.)

And it's a good Perl exercise, too, we might add. How will you decide how to calculate a remainder? (Hint: do a web search for "modulo" or "modulus".) And if you need to search through some large numbers (you'll need to divide by 45,454 after all!), it might be a good idea to have your program print out occasional "status reports" so that you know how far it's progressing. What steps can you take to devise a smarter and more efficient algorithm? You probably won't want to search through numbers smaller than 49,999 for example (can you figure out why?). Try writing this program using several different algorithms — and count the number of tests the program makes in order to evaluate them for efficiency.

Darwin’s Elephants

Discussing population growth, Charles Darwin wrote in The Origin of Species:

The elephant is reckoned to be the slowest breeder of all known animals, and I have taken some pains to estimate its probable minimum rate of natural increase: it will be under the mark to assume that it breeds when thirty years old, and goes on breeding till ninety years old, bringing forth three pair of young in this interval; if this be so, at the end of the fifth century there would be alive fifteen million elephants, descended from the first pair. (Ch. 3, Facsimile Edition, p. 64)

Write a program to check Darwin’s numbers. [Hint: puzzling out the algorithm is the hardest part of this program; after you've got that, it's easy. Remember, Darwin published this in 1859, so he didn’t have access to all kinds of fancy equations describing population growth.]

Ping-Pong Balls and Thermodynamics

In 1907, physicist Paul Ehrenfest invented a game to help himself better understand the 2nd Law of Thermodynamics, which says that warm things inevitably grow colder. Here’s how Hans Christian von Baeyer describes an implementation of Ehrenfest’s game in his book, Warmth Disperses and Time Passes: The History of Heat:

Imagine two large urns and a hundred Ping-Pong balls numbered from one to a hundred, like the ones used to draw lottery numbers on TV. We will also need a roulette wheel with a hundred slots, or some other means of generating random integers from one to a hundred. The rules of the game are simple. To begin with, drop seventy-five randomly chosen balls into one urn, and twenty-five into the other. Then pick a number at random, fish out the corresponding ball from whichever urn you find it in, and plop it in the other urn. (p. 137)

Write a program that plays Ehrenfest’s game and finds out whether the two urns ever reach “equilibrium”—that is, whether they ever have an equal number of ping-pong balls in each. Count how many “transfers” the computer had to make to reach equilibrium. For extra credit, have the computer run the simulation as many times as the user wants, printing out the minimum, maximum, and average number of “transfers” to reach equilibrium.

Morse Code

Write a program that will convert a user-entered string into Morse code. (You can download a text file of the conversion chart here.) Can you extend your program to translate back from Morse code back into English? Which is the most challenging algorithm? Can you modify your algorithm to translate the contents of a file that may happen to sprawl across two or more lines?

Concordance (and Collocates)

Write a program to open a text file, count how many times each word in that file gets used and print out the results. (As source data, you could, for example, save a recently written essay in plain text format. For an alternate text source, see the next programming problem.) Can you sort the output by printing all the words in alphabetical order? Can you sort the output according to the number of occurrences? For an extra challenge, print out the word that precedes and the word that follows each occurrence (called “collocates” or “collocations”). [Hint: although Perl has a built-in sorting subroutine, sorting by keys is much easier than sorting by values. Come talk to me and I’ll show you how to sort by values.]

Letter Frequency Table

In English, it’s much more likely to find an “e” in a word than a “z” — but how much more likely? Write a program to read a file of text and construct a letter frequency table by counting all the occurrences of each letter. Print out a table that lists each letter and how many occurrences you found (one common format is to print how many occurrences you’d typically expect to find per thousand letters. Yes, yes, these tables are readily accessible on the internet, but you’ll learn more Perl if you try to build your own.) You will get more accurate numbers if you can run your program on a fairly large piece of text. Charles Dickens’s Bleak House, for example, works quite nicely. Find it or something just as interesting at the Gutenberg Project: http://promo.net/pg/.

Letter Frequency Decryption Program

Modify the ‘Letter Frequency Table’ program listed above to try to decrypt the following secret message (click here to get a plain text file of it) that has been encrypted with a simple substitution cipher. Have your program count the occurrences of each letter in the encrypted message. If you can sort your letter frequency table (from the previous exercise) in descending order and then sort your frequency table from the encrypted text below in descending order, you should have an almost one-to-one match. Next, build a “translation hash” that will convert each encrypted letter back into its plaintext form. Your program may not perform the decryption process perfectly, but how close can you get? Can you explain the various decryption “errors” that happen? (Perhaps surprisingly, there might be multiple causes for these errors.) [Note: everything after the line with colons is merely filler to make the frequencies match a little better. And I’ve resisted the common practice of breaking the encrypted text into 5-character chunks. The word spacings and punctuation are the same as in the original text, which should help you in the event that your frequency tables are a little off. It means, however, that you’ll have to recognize and pass the punctuation through the decryption process without modifications.]

kernndg vlk bgaagr ajlf jwk vnrz. jg zwz wa luu, lfz
wfmwfwagus onrg; lfz an awfs awo, vjn zwz fna zwg, jg vlk
l kgenfz mlajgr. jg bgelog lk dnnz l mrwgfz, lk dnnz l
olkagr, lfz lk dnnz l olf, lk ajg dnnz nuz ewas cfgv, nr
lfs najgr dnnz nuz ewas, anvf, nr bnrntdj, wf ajg dnnz nuz
vnruz. knog pgnpug ultdjgz an kgg ajg luagrlawnf wf jwo,
bta jg uga ajgo ultdj, lfz uwaaug jggzgz ajgo; mnr jg vlk
vwkg gfntdj an cfnv ajla fnajwfd gqgr jlppgfgz nf ajwk
dunbg, mnr dnnz, la vjwej knog pgnpug zwz fna jlqg ajgwr mwuu
nm ultdjagr wf ajg ntakga; lfz cfnvwfd ajla ktej lk ajgkg
vntuz bg buwfz lfsvls, jg ajntdja wa xtwag lk vguu ajla ajgs
kjntuz vrwfcug tp ajgwr gsgk wf drwfk, lk jlqg ajg olulzs wf
ugkk laarleawqg mnrok. jwk nvf jglra ultdjgz: lfz ajla vlk
xtwag gfntdj mnr jwo.

jg jlz fn mtrajgr wfagrentrkg vwaj kpwrwak, bta uwqgz tpnf
ajg analu lbkawfgfeg prwfewpug, gqgr lmagrvlrzk; lfz wa vlk
luvlsk klwz nm jwo, ajla jg cfgv jnv an cggp ejrwkaolk
vguu, wm lfs olf luwqg pnkkgkkgz ajg cfnvugzdg. ols ajla
bg artus klwz nm tk, lfz luu nm tk! lfz kn, lk awfs awo
nbkgrqgz, dnz bugkk tk, gqgrs nfg!
::::
lll
wwwww wwww
kkkkk kkkkk k
fffff fffff f
rrrrr rrrrr rrr
ttttt ttt
ooooo ooooo ooo
sssss sssss ssss
eeeee eeeee eeeee ee
ccc

Devise an Algorithm

Look back to the “guess a random number” problem earlier in these problems. If you can invent a good playing strategy for that game, see if you can extend that strategy to other situations. Try this. Put the following array definition into your program exactly as written (one blank space between each element and no newline!):

my @cards_played = qw/C10 C2 C4 C5 C6 C8 CA CQ D10 D2 D5 D6 D7 
D8 D9 DA DJ H10 H2 H3 H4 H8 H9 HA HK S10 S3 S5 S7 S8 S9 SK/;

(Note that this array is sorted in ASCIIbetical order.) Now write a program that will accept a card from the user in ‘SuitCard’ format as above and have the computer check to see if that card has already been played. If it has, print out something akin to “Yes, it’s been played!” and print out how many comparisons the computer had to make in order to find the answer. On the other hand, if it’s not in the array, print out “No, that card has not been played” and again print out how many comparisons the computer had to make to arrive at that conclusion. (Obviously, you’ll have to count these.) Your goal here is to try to minimize the amount of work the computer has to do to find the answer. [Hint: you might want to try this using two techniques: the first would search the array from the beginning to the end, in sequential fashion. For the second technique, have the computer employ the strategy that you devised in the “guess a random number” example. Play the game a few times. Which strategy takes fewer comparisons? What would be a best-case scenario and how many comparisons would it take? What would be a worst-case scenario and how many comparisons would that take? (You might want to put this simulation in a loop, keeping track of the number of comparisons each strategy takes. Can you calculate an average?) Can you arrive at a general conclusion (or even, for the ambitious, can you write an equation?!) describing the number of comparisons that each strategy takes? Double secret hint: there’s a name for this algorithm, and I’ll tell you what it is, but you’ll learn more Perl if you can wrestle with it on your own first.]