Opening and Closing Files in Perl


 

In case you need to manipulate external data files, there are four things you need to know:

  1. What's a "filehandle"?
  2. How to open and close the file.
  3. How to read from the file.
  4. Different I/O Modes.

Here's the dirt on these topics.

 



What's a "filehandle"?

In Perl, when your program "connects" to an external data file, Perl labels the connection (not the file itself!) with a label called a "filehandle." You, the programmer, get to choose the name for the filehandle, and each input to or output from your program is transferred via the connection labelled as the filehandle. It looks like you're reading and writing directly to the filehandle and not the file itself. The general scheme looks like this:

[Figure: filehandle graphic]

 

...and this is good... why?

Well, for starters, it's good because you can process a whole set of files in one fell swoop without having to change filehandles for each individual file as you process it. Your Perl program can blindly read and write to the same filehandle, no matter which file happens to be open on the other end. You might, for example, want your user to type in the name of the file to be processed. You need not change your filehandle name to accommodate this variability.
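Here's a minimal sketch of that idea: one filehandle name, reused for each file in turn. The filehandle names PLAY and OUT, the scratch directory, and the act1.txt and act2.txt files are all made up for illustration (the open and close commands themselves are covered further down).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical setup: create two small files in a scratch directory
# so the example has something to read.
my $dir      = tempdir(CLEANUP => 1);
my %contents = (
    "$dir/act1.txt" => "To be\nor not to be\n",
    "$dir/act2.txt" => "That is the question\n",
);

for my $file (sort keys %contents) {
    open OUT, ">$file" or die "Can't write $file: $!";
    print OUT $contents{$file};
    close OUT or die "Can't close $file: $!";
}

# The same filehandle name, PLAY, is connected to each file in turn.
my $total_lines = 0;
for my $file (sort keys %contents) {
    open PLAY, $file or die "Can't open $file: $!";
    while (<PLAY>) {
        $total_lines++;
    }
    close PLAY or die "Can't close $file: $!";
}

print "Read $total_lines lines through one filehandle\n";
```

The reading loop never needs to know which file is on the other end; only the open line changes.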

Moreover, Unix (and Perl, too, by extension) treats external peripheral devices (like keyboards and printers) as files, too. Unix can't tell (nor does it care!) whether the filehandle points to your term paper or to an external hard drive or to the complete works of Shakespeare or to a printer. It simply doesn't matter.

In fact, Perl has several "built-in" filehandles that you've already used, but that are invisible to you. Each time you print something to the screen, the output goes to a filehandle called STDOUT ("standard out"). Usually, the STDOUT filehandle is connected to your computer monitor, but it might just as easily point to a data file or to a laser printer. Output to all those devices should be equally easy, and filehandles make it so.

Here are two more. Your Perl program usually gathers input from STDIN ("standard in"), which by default is your keyboard. Errors are logged to the STDERR filehandle, which normally points to the same place as STDOUT unless you specify otherwise. You can change that quite easily: say, for example, you wanted a CGI script to report errors to a logfile rather than automatically dumping them to the user's browser window, where they would do you no good.
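As a rough sketch of that kind of redirection (one way to do it, not the only one), the code below points STDERR at a logfile, writes a message, and then puts STDERR back. The errors.log name, the scratch directory, and the SAVED_ERR and LOG filehandle names are assumptions made up for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical log file in a scratch directory.
my $dir     = tempdir(CLEANUP => 1);
my $logfile = "$dir/errors.log";

print STDOUT "This goes to standard output\n";

# Save a copy of the original STDERR so it can be restored later.
open SAVED_ERR, ">&STDERR" or die "Can't duplicate STDERR: $!";

# Reconnect STDERR to the log file (append mode).
open STDERR, ">>$logfile" or die "Can't redirect STDERR to $logfile: $!";
print STDERR "Something went wrong\n";
close STDERR or die "Can't close $logfile: $!";

# Put STDERR back where it was.
open STDERR, ">&SAVED_ERR" or die "Can't restore STDERR: $!";
close SAVED_ERR;

# Check what landed in the log.
open LOG, $logfile or die "Can't open $logfile: $!";
my @logged = <LOG>;
close LOG or die "Can't close $logfile: $!";

print "Log contains: $logged[0]";
```

The print STDERR line itself never changes; only the place STDERR points to does.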

 

Formatting and Naming Conventions

In Perl, it's customary to CAPITALIZE your filehandle names. That way, you will recognize the filehandle at a glance as something different from a normal variable or a function name (remember, Perl is case sensitive and its built-in function names are all lowercase, so this trick will keep your filehandles from colliding with them).

Any name will do, as long as you can easily remember it and as long as it is not one of the following six pre-existing Perl filehandles: STDIN, STDOUT, STDERR, ARGV, ARGVOUT, and DATA.

How to use them? Ah, keep reading.

 


 



How to open and close the file.

Filehandles are opened and closed with the (surprise!) "open" and "close" commands. You must specify the name of the filehandle you've chosen and tell Perl which file that filehandle should point to. The syntax looks like this:

open SPRINGFIELD, "homer.txt";

In this case, SPRINGFIELD is the name of our filehandle. Then a comma. Then the name of our file in double quotes, followed by the requisite semi-colon. That's it. From now on you read from SPRINGFIELD rather than from homer.txt itself (see the next section for more details on how to do this; writing requires one of the output modes described under "Different I/O Modes"). Now our example looks like this:

[Figure: filehandle example]

Closing the filehandle is even simpler:

close SPRINGFIELD;

Open filehandles should always be closed by your program when you're finished with them. Closing flushes any output still waiting in Perl's buffers, which prevents lost or incomplete data and other (perhaps more serious) errors.

 

...or die

What if your Perl program can't open or close the file for some reason? What if "homer.txt" has gone missing? How would you ever know? Well, Perl provides a spiffy command specifically suited to cases like this — the die command, which tells your program to terminate and to spit out an error message with its last, dying gasp. In preparation for just such an event, Perl will automatically store an error message in the special $! variable (yes, that's a real variable name), and you can append it to a message of your own choosing that the die command will print. It looks like this (I usually put it on two lines with an indent for readability):

 
    open SHAKESPEARE, "complete_works.txt"
        or die "The haunted grave of Shakespeare won't open: $!";


And it's a good idea to include the "or die" command when closing too. Notice again, though, that we do not need to specify the name of the file.

 
    close SHAKESPEARE
        or die "The walking dead will not cease and desist: $!";
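To see what a dying gasp actually looks like without killing the demonstration, here is a small sketch that tries to open a file that isn't there. Wrapping the attempt in eval is an addition for this example only (it traps the die so the message can be inspected), and the scratch directory and $missing variable are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical file name inside an empty scratch directory,
# so the file is guaranteed not to exist.
my $dir     = tempdir(CLEANUP => 1);
my $missing = "$dir/complete_works.txt";

# eval traps the die so this demonstration can keep running
# and show the message instead of exiting.
my $ok = eval {
    open SHAKESPEARE, $missing
        or die "The haunted grave of Shakespeare won't open: $!";
    close SHAKESPEARE
        or die "The walking dead will not cease and desist: $!";
    1;
};

# On failure, $@ holds the text that die would have printed.
my $error = $ok ? "" : $@;
print "Caught: $error" if $error;
```

The caught message starts with your own text, followed by whatever the operating system stored in $! (typically something along the lines of "No such file or directory") and the script name and line number that die tacks on.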

 


 



How to read from the file.

Any input/output (I/O) operations that you perform with the file go through the filehandle. Normally, you'll want to read the file one line at a time and perform some operation on it (counting it, testing it against a regex, extracting words or letters, etc.). The best way to do this is with a while loop. Just put the name of the filehandle between < and > brackets. It looks like this (remember that you need parentheses ( and ) for the while test):

while (<SHAKESPEARE>) {
    # do stuff
}

(Of course, swap out that comment for your real code.) On each pass through the loop, the line just read is available in the special $_ variable. Perl is smart enough to read through the file (or files!) connected to the filehandle until it hits its last "end of file" marker (EOF), at which point the while loop will terminate automatically and gracefully without causing any sort of ruckus at all.
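For a slightly more concrete "do stuff," here is a sketch that counts the lines in a file and tests each one against a regex. The sample text, the scratch directory, and the $lines and $matches counters are invented for the example; the loop relies on the fact that while (<SHAKESPEARE>) puts each line into $_.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical sample file so the loop has something to read.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/complete_works.txt";
open SHAKESPEARE, ">$file" or die "Can't create $file: $!";
print SHAKESPEARE "To be, or not to be\n";
print SHAKESPEARE "That is the question\n";
print SHAKESPEARE "To sleep, perchance to dream\n";
close SHAKESPEARE or die "Can't close $file: $!";

# Reopen the same filehandle, this time for reading.
open SHAKESPEARE, $file or die "Can't open $file: $!";

my $lines   = 0;
my $matches = 0;
while (<SHAKESPEARE>) {
    chomp;                   # strip the trailing newline from $_
    $lines++;
    $matches++ if /^To\b/;   # count lines that start with the word "To"
}

close SHAKESPEARE or die "Can't close $file: $!";

print "$lines lines read, $matches start with 'To'\n";
```

With the three sample lines above, this prints "3 lines read, 2 start with 'To'".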

 

 

Writing to a file

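Writing is just as easy. Simply list the filehandle immediately after the print command, with no comma between the filehandle and the text (make sure you've already opened the filehandle for output, as described under "Different I/O Modes" below, or nothing will be written):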

print SPRINGFIELD "I am evil Homer! I am evil Homer!\n";

Of course, a double-quoted string will interpolate variables too:

print SPRINGFIELD "I am the equivalent of $num evil Homers!\n";

And remember: unless you've used select to choose a different one (see the tips below), a normal print command just prints to STDOUT, no matter how many other filehandles you have open. It's not necessary to specify the filehandle, but you could if you wanted: print STDOUT "Hello, world!\n";
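Putting the pieces together, here is a small sketch that writes two lines to a file and then reads them back to check the result. The scratch directory, the value of $num, and the @written array are assumptions for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/homer.txt";   # hypothetical scratch file
my $num  = 3;

# The ">" in front of the name opens the file for output.
open SPRINGFIELD, ">$file" or die "Can't open $file for writing: $!";
print SPRINGFIELD "I am evil Homer! I am evil Homer!\n";
print SPRINGFIELD "I am the equivalent of $num evil Homers!\n";
close SPRINGFIELD or die "Can't close $file: $!";

# Read the file back to see what was written.
open SPRINGFIELD, $file or die "Can't open $file for reading: $!";
my @written = <SPRINGFIELD>;
close SPRINGFIELD or die "Can't close $file: $!";

print STDOUT "Wrote ", scalar(@written), " lines\n";
```

Note that the filehandle is closed after writing and opened again for reading; the same name is simply reconnected to the same file in a different mode.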

Two more advanced tips. If you happen to have multiple filehandles open at once (it could happen), you can use the select command to choose which one print writes to by default. The following two-liner is equivalent to the first print SPRINGFIELD one-liner above:

select SPRINGFIELD;
print "I am evil Homer! I am evil Homer!\n";

That output will go to SPRINGFIELD just as if you had put the filehandle on the print line, and so will every later bare print, until you select another filehandle such as STDOUT.

A second tip: if you forget which filehandle you've currently selected, select by itself will tell you. Putting some code like this in your program...

$handle = select;
print "handle = $handle\n";

...will output something like this:

handle = main::STDOUT

Here, the main:: part of the answer tells you that the filehandle belongs to the main package (your main program) rather than to some other module or package. But what you really wanted to know was whether you were going to STDOUT or somewhere else, and that's been answered for you.
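Both tips can be combined in one short sketch. It leans on the fact that select returns the name of the filehandle that was selected before the call, which makes it easy to switch back afterwards. The scratch directory and the $previous, $current, and @lines variables are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/homer.txt";   # hypothetical scratch file

open SPRINGFIELD, ">$file" or die "Can't open $file for writing: $!";

# select returns the previously selected filehandle, so keep it.
my $previous = select(SPRINGFIELD);
print "I am evil Homer! I am evil Homer!\n";   # goes to SPRINGFIELD
my $current = select;                           # ask which handle is selected now

# Switch back before closing, so later prints reach the screen again.
select($previous);
close SPRINGFIELD or die "Can't close $file: $!";

# Read the file back to confirm where the bare print went.
open SPRINGFIELD, $file or die "Can't open $file for reading: $!";
my @lines = <SPRINGFIELD>;
close SPRINGFIELD or die "Can't close $file: $!";

print "previous = $previous, current = $current\n";
```

Assuming nothing else had been selected beforehand, the last line should report main::STDOUT as the previous filehandle and main::SPRINGFIELD as the current one.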

 




Different I/O Modes.

Sometimes, you might want to specify a filehandle that's exclusively for input (to prevent, for example, unauthorized writing to the file). Or you might want to open a text file to which you can only append new data (in the case of a logfile, for instance). Perl can accommodate you. In that event, you can add these special bracket symbols to your open command:

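open SHELBYVILLE, "sharks"; (for reading; a bare file name means the same as "<")

open SHELBYVILLE, "<sharks"; (for reading input into your program)

open SHELBYVILLE, ">sharks"; (for output from your program; creates the file, or erases its contents if it already exists)

open SHELBYVILLE, ">>sharks"; (to append data to the end of the file)

open SHELBYVILLE, "+<sharks"; (for both read and write access)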
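The difference between ">" and ">>" is easiest to see by trying it. The sketch below writes a file, appends to it, and then opens it with ">" a second time; the scratch directory and the @after_append and @after_rewrite arrays are assumptions for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sharks";    # hypothetical scratch file

# ">" creates the file, or empties it if it already exists.
open SHELBYVILLE, ">$file" or die "Can't write $file: $!";
print SHELBYVILLE "first line\n";
close SHELBYVILLE or die "Can't close $file: $!";

# ">>" keeps the existing contents and adds to the end.
open SHELBYVILLE, ">>$file" or die "Can't append to $file: $!";
print SHELBYVILLE "second line\n";
close SHELBYVILLE or die "Can't close $file: $!";

# "<" opens the file for reading.
open SHELBYVILLE, "<$file" or die "Can't read $file: $!";
my @after_append = <SHELBYVILLE>;
close SHELBYVILLE or die "Can't close $file: $!";

# Opening with ">" again throws away both earlier lines.
open SHELBYVILLE, ">$file" or die "Can't write $file: $!";
print SHELBYVILLE "fresh start\n";
close SHELBYVILLE or die "Can't close $file: $!";

open SHELBYVILLE, "<$file" or die "Can't read $file: $!";
my @after_rewrite = <SHELBYVILLE>;
close SHELBYVILLE or die "Can't close $file: $!";

print scalar(@after_append), " lines after appending, ",
      scalar(@after_rewrite), " after rewriting\n";
```

This prints "2 lines after appending, 1 after rewriting", which is why a logfile wants ">>" and not ">".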

 
