Beginner's Introduction to Object-Oriented Programming with Perl

By chromatic
November 13, 2008 | Comments: 8

Earlier articles in this series (A Beginner's Introduction to Perl 5.10, A Beginner's Introduction to Files and Strings with Perl 5.10, and A Beginner's Introduction to Perl Regular Expressions) discussed Perl as a language for numbers, strings, and files, which was the original purpose of the language. A Beginner's Introduction to Perl Web Programming then showed how to use Perl 5.10 for web programming.

The previous articles assumed that you would write all of the code for each program yourself, in one big file per program. You don't have to. One of Perl's huge advantages is the CPAN, where over 7,000 people have contributed over 16,400 add-on libraries, or module distributions, for common tasks.

This installment explains how modules work; you'll build one, and along the way you'll learn a bit about object-oriented programming in Perl.

What Is an Object?

Think back to the first article in this series, which discussed two basic data types in Perl, strings and numbers. There's another basic data type: the object. You saw them in the previous article, A Beginner's Introduction to Perl Web Programming.

Objects are a convenient way of packaging information with the things you actually do with that information. An object contains data in its attributes or properties, and can perform actions through methods.

For example, you might have an AddressEntry object for an address book program. This object would contain properties that store a person's name, mailing address, phone number, and e-mail address; as well as methods that print a nicely formatted mailing label or allow you to change the person's phone number.

A New Goal

So far, the configuration information for the code developed in previous articles appears directly in the source code of those programs. This isn't a good approach. You may want to install a program and allow multiple users to run it, each with their own preferences, or you may want to store common groups of options for later. What you need is a configuration file to store these options.

The INI-style format is particularly easy to use; it's a simple plain-text format, which groups name and value pairs into sections. Header names in brackets delineate sections. To refer to the value of a specific key in the configuration file, use the section.name syntax. For instance, the value of author.firstname in this simple file is Doug:

[author]
firstname=Doug
lastname=Sheppard

[site]
name=Perl.com
url=http://www.perl.com/

If you used Windows in the ancient days when versions had numbers, not years, you'll recognize this format. Note also that an existing well-tested and well-maintained CPAN module already handles this format: Config::INI. In real-world code, use that module instead. Still, writing a module to handle a very basic form of the format is simple enough to make a good learning exercise.

With the real-world purpose of this module defined, it's time to think about what properties and methods it will have. What do TutorialConfig objects store, and what can you do with them?

The first part is simple: the object's properties will be the values in the configuration file.

The second part is more complex. Start by listing the two things you need to do: read a configuration file and retrieve a value from it. Call these two methods read and fetch. Finally, add another method to store or change a value from within your program. Call it store. These three methods will cover nearly everything you want to do.

Starting Off

Perl class names often use the StudlyCaps or CamelCase style, so call this class TutorialConfig. Perl finds a module by converting its name into a filename and searching for that file in the directories listed in @INC, so the file should be named TutorialConfig.pm.

Put the following into a file called TutorialConfig.pm:

package TutorialConfig;

warn "TutorialConfig is successfully loaded!\n";

1;

(I've sprinkled debugging statements throughout the code. You can take them out in practice. The warn keyword is useful to bring things to the user's attention without ending the program the way die would.)

The package keyword tells Perl the name of the class you're defining. This is generally the same as the module name. (It doesn't have to be, but it's a good idea!) The 1; will return a true value to Perl, which indicates that the module has loaded completely and successfully. If you forget this (and you will), Perl will give you an error message saying that your package did not return a true value.

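You now have a simple module called TutorialConfig, which you can load in your code with the use keyword. Save this very simple, one-line program in the same directory as TutorialConfig.pm and run it from that directory. (Perl 5.26 and later no longer search the current directory by default, so on those versions run the program with perl -I. or add use lib '.'; before the use line.)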

use TutorialConfig;

... and you should see:

TutorialConfig is successfully loaded!
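To watch both rules from this section at work (Perl turns a module name into a filename and searches @INC for it, and the file must end by returning a true value), here is a small sketch. It writes two throwaway modules, named GoodModule and BadModule purely for illustration, into a temporary directory, puts that directory at the front of @INC, and loads them with require, which is what use calls behind the scenes at compile time.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# A scratch directory standing in for a library directory; removed at exit.
my $dir = tempdir( CLEANUP => 1 );

# Write a module file named "$name.pm" containing the given source code.
sub write_module {
    my ($name, $body) = @_;
    my $path = File::Spec->catfile( $dir, "$name.pm" );
    open my $fh, '>', $path or die "Can't write $path: $!";
    print {$fh} $body;
    close $fh or die "Can't close $path: $!";
    return $path;
}

write_module( 'GoodModule', "package GoodModule;\nsub hello { 'hello' }\n1;\n" );
# This one ends with a false value instead of 1;
write_module( 'BadModule', "package BadModule;\nsub hello { 'hello' }\n0;\n" );

# Make Perl search the scratch directory first.
unshift @INC, $dir;

# require turns GoodModule into GoodModule.pm and looks for it in @INC.
require GoodModule;
print "Loaded from: $INC{'GoodModule.pm'}\n";
print 'It says: ', GoodModule::hello(), "\n";

# A module whose last statement is false fails to load.
my $ok = eval { require BadModule; 1 };
print "BadModule failed: $@" unless $ok;
```

Perl records each successfully loaded file in %INC, which is also how it avoids loading the same module twice.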

What Does an Object Do?

Before you can use an object, you have to create one. That means writing a method called new that initializes and returns an object. Some objects need initialization and some don't; either way, this method, called a constructor, is where it happens.

Add this new method to TutorialConfig.pm right after the package declaration:

sub new {
    my ($class_name) = @_;

    my $self = {};
    warn "We just created our new variable...\n";

    bless ($self, $class_name);
    warn "and now it's a $class_name object!\n";

    $self->{_created} = 1;
    return $self;
}

(Again, you won't need those warn statements in actual practice.)

First, notice that methods use the sub keyword as well. (All methods are really just a special sort of sub.) This new method takes one parameter: the name of the class to create an object of. Perl passes it automatically when you call TutorialConfig->new, and the code stores it in a lexical variable called $class_name. (You can also pass extra parameters to new if you want. Some modules use them for special initialization.)

Next, the code uses a hash-based object by creating a new anonymous hash. This works just like a regular hash, except that it has no name and you must dereference it to use it. More on that in a moment.

The bless operator takes two parameters: a variable containing a reference that you want to make into an object, and the type of object you want it to be. This is the line that makes the magic happen! All bless does is associate a class with a reference so that you can call methods on it.

The code next stores a property called _created. This property isn't really that useful, but it does show the syntax for accessing the contents of a hash reference: $object_name->{property_name}.

Finally, having made $self into a new TutorialConfig object, the code returns it.
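Here is a sketch that pulls the anonymous hash, the arrow syntax, and bless out of the constructor, so you can see what each step changes. The describe() method is hypothetical, defined only to show that method calls work once the reference is blessed; ref() is a builtin, and blessed() and reftype() come from the core Scalar::Util module.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Hypothetical method, defined only to show that method calls work after bless.
sub TutorialConfig::describe {
    my ($self) = @_;
    return 'a ' . ref($self) . ' object';
}

# {} builds an anonymous hash and returns a reference to it.
my $obj = {};

# The arrow reaches one element through the reference;
# ${$obj}{name} is the longhand spelling of the same thing.
$obj->{name} = 'Doug';
print "Name: ${$obj}{name}\n";

# %$obj dereferences the whole hash, for example to list its keys.
print 'Keys: ', join( ', ', sort keys %$obj ), "\n";

# Before bless, ref() only says what kind of reference this is.
print 'Before bless: ', ref($obj), "\n";       # HASH

bless $obj, 'TutorialConfig';

# Afterwards ref() and blessed() report the class name...
print 'After bless:  ', ref($obj), "\n";       # TutorialConfig
print 'blessed():    ', blessed($obj), "\n";   # TutorialConfig
# ...but the data underneath is still an ordinary hash.
print 'reftype():    ', reftype($obj), "\n";   # HASH
print 'describe():   ', $obj->describe(), "\n";
```

Note that bless doesn't copy or convert anything: the object is the same hash it was before, now tagged with a class name.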

A program to create a TutorialConfig object looks like:

use TutorialConfig;
my $tut = TutorialConfig->new();

In this case, new() is a class method; you call it on a class rather than on an object. Notice that the method call operator -> looks exactly like the operator used to dereference the anonymous hash. This is on purpose: all objects are references. (If you've followed closely, you may say "Wait, TutorialConfig is a bareword string there! You're not dereferencing anything!" That's true, but calling a class method on a bareword class name uses the same syntax as calling an instance method on an object. This is an important consistency worth encouraging in your programs.)
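The arrow call is largely a convenient way to pass the invocant (the class name or the object) as the first argument. This sketch makes that visible; invocant() is a made-up helper that reports what it received. The plain subroutine calls are only rough equivalents, because a real method call also searches parent classes for the method.

```perl
use strict;
use warnings;

package TutorialConfig;

sub new {
    my ($class_name) = @_;
    return bless {}, $class_name;
}

# Hypothetical helper: describes whatever arrived as the first argument.
sub invocant {
    my ($self) = @_;
    return ref($self) ? 'object of ' . ref($self) : "class $self";
}

package main;

# Class method call: Perl passes the string 'TutorialConfig' as $_[0].
my $tut = TutorialConfig->new();

# Roughly the same as calling the subroutine with the class name yourself.
my $also = TutorialConfig::new('TutorialConfig');

print 'Class call:    ', TutorialConfig->invocant(), "\n";
print 'Instance call: ', $tut->invocant(), "\n";

# Instance method call: the object itself becomes $_[0].
print 'Plain sub:     ', TutorialConfig::invocant($tut), "\n";
```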

When you run this code, you'll see:

TutorialConfig is successfully loaded!
We just created our new variable...
and now it's a TutorialConfig object!

Now that you have a class and can create objects with it, it's time to make the class do something!

The Goal, Part 2

Remember the goals for the example program? You need to write three methods for the TutorialConfig module: read, fetch, and store.

The first method, read, needs the name of a file to read, so it takes two parameters: the object itself, which Perl passes automatically as the first argument of every method call, and the filename. The return value indicates whether the method read the file successfully.

sub read {
    my ($self, $file) = @_;

    # Three-argument open; return false if the file can't be opened.
    open my $config_fh, '<', $file or return 0;

    # Store a special property containing the name of the file.
    $self->{_filename} = $file;

    my $section = '';

    while (my $line = <$config_fh>) {
        chomp $line;

        if ($line =~ /^\[(.*)\]/) {
            # A [header] line starts a new section.
            $section = $1;
        }
        elsif ($line =~ /^(?<key>[^=]+)=(?<value>.*)/) {
            # Named captures land in the %+ hash; store as "section.name".
            $self->{"$section.$+{key}"} = $+{value};
        }
    }

    close $config_fh;

    return 1;
}

Surprisingly, that code handles most of the work of the configuration object. Now that the class knows how to read a configuration file, you can add a method to retrieve a value from the object. fetch is simple:

sub fetch {
    my ($self, $key) = @_;

    return $self->{$key};
}

These two methods are really all you need to begin experimenting with our TutorialConfig object. Save the sample configuration file as tutc.txt, and then run this sample TutorialConfig program:

use 5.010;

use TutorialConfig;

my $tut = TutorialConfig->new();
$tut->read('tutc.txt') or die "Couldn't read config file: $!";

say "The author's first name is ",
         $tut->fetch('author.firstname'),
         ".";

When you run this program, you'll see something like:

TutorialConfig is successfully loaded!
We just created our new variable...
and now it's a TutorialConfig object!
The author's first name is Doug.

You now have an object that will read configuration files and show values inside those files. This is good enough, but there was one more goal: to write a store method that allows you to add or change configuration values from within a program. This is almost as simple as fetch:

sub store {
    my ($self, $key, $value) = @_;

    $self->{$key} = $value;
}

Now test it:

use 5.010;

use TutorialConfig;
my $tut = TutorialConfig->new();

$tut->read('tutc.txt') or die "Can't read config file: $!";
$tut->store('author.country', 'Canada');

say $tut->fetch('author.firstname'), " lives in ",
      $tut->fetch('author.country'), ".";

These three methods (read, fetch, and store) are everything necessary for this simple TutorialConfig.pm module. More complex modules might have dozens of methods!
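For reference, here is a sketch of the whole plain-Perl class in one self-contained file. It writes the sample configuration to a temporary file so the program runs anywhere, leaves out the debugging warn statements, and its read method uses if/elsif and a three-argument open, so no optional features need to be enabled. Treat it as a starting point rather than a finished module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

package TutorialConfig;

sub new {
    my ($class_name) = @_;
    return bless {}, $class_name;
}

# Read name=value pairs, keyed as "section.name"; returns false on failure.
sub read {
    my ($self, $file) = @_;
    open my $config_fh, '<', $file or return 0;
    $self->{_filename} = $file;

    my $section = '';
    while (my $line = <$config_fh>) {
        chomp $line;
        if ($line =~ /^\[(.*)\]/) {
            $section = $1;
        }
        elsif ($line =~ /^(?<key>[^=]+)=(?<value>.*)/) {
            $self->{"$section.$+{key}"} = $+{value};
        }
    }
    close $config_fh;
    return 1;
}

sub fetch {
    my ($self, $key) = @_;
    return $self->{$key};
}

sub store {
    my ($self, $key, $value) = @_;
    $self->{$key} = $value;
}

package main;

# Write the sample configuration to a temporary file (deleted at exit).
my ($tmp_fh, $tmp_name) = tempfile( UNLINK => 1 );
print {$tmp_fh} <<'END_CONFIG';
[author]
firstname=Doug
lastname=Sheppard

[site]
name=Perl.com
url=http://www.perl.com/
END_CONFIG
close $tmp_fh;

my $tut = TutorialConfig->new();
$tut->read($tmp_name) or die "Couldn't read config file: $!";
$tut->store('author.country', 'Canada');

print $tut->fetch('author.firstname'), ' lives in ',
      $tut->fetch('author.country'), ".\n";
```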

Encapsulation

You may be wondering why the code has fetch and store methods at all. Why use $tut->store('author.country', 'Canada') when $tut->{'author.country'} = 'Canada' works just as well? There are multiple reasons to use methods instead of playing directly with an object's properties.

First, you can generally trust that a module's methods will keep their names and behavior, no matter how much their implementation changes. Someday you might want to keep configuration information in a database such as MySQL or PostgreSQL instead of a text file. The new TutorialConfig module might have new, read, fetch, and store methods that look like:

sub new {
    my ($class) = @_;
    bless {}, $class;
}

sub read {
    my ($self, $file) = @_;
    my ($db)          = database_connect($file);
    return 0 unless $db;

    $self->{_db} = $db;
}

sub fetch {
    my ($self, $key) = @_;
    my $db           = $self->{_db};

    return database_lookup($db, $key);
}

sub store {
    my ($self, $key, $value) = @_;
    my $db                   = $self->{_db};

    return database_store($db, $key, $value);
}

(Assume that the database_connect(), database_lookup(), and database_store() routines appear elsewhere and do just what their names imply.)

Even though the entire module's source code has changed, all of the methods still have the same names and syntax. The external interface to this code remains the same. Code that uses these methods will continue working just fine, but code that directly manipulates properties will break!

Suppose that you have some code which stores a configuration value:

$tut->{'author.country'} = 'Canada';

This works fine with the original TutorialConfig, because when you call $tut->fetch('author.country'), it looks in the object's properties and returns Canada just like you expected. However, when you upgrade to the new version that uses databases, the code will no longer return the correct result. Instead of fetch() looking in the object's properties, it'll go to the database, which won't contain the correct value for author.country! If you'd used $tut->store('author.country', 'Canada') all along, things would work fine.
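The breakage is easy to reproduce with two small classes that share the fetch/store interface but keep their data in different places. HashConfig and NestedConfig are hypothetical stand-ins for the original module and its database-backed rewrite; the second one simply keeps values in an inner hash rather than talking to a real database.

```perl
use strict;
use warnings;

# Version 1: values live directly in the object's hash.
package HashConfig;
sub new   { my ($class) = @_; return bless {}, $class }
sub fetch { my ($self, $key) = @_; return $self->{$key} }
sub store { my ($self, $key, $value) = @_; $self->{$key} = $value }

# Version 2: same interface, different storage (hypothetical stand-in
# for the database backend: values live in a separate inner hash).
package NestedConfig;
sub new   { my ($class) = @_; return bless { _data => {} }, $class }
sub fetch { my ($self, $key) = @_; return $self->{_data}{$key} }
sub store { my ($self, $key, $value) = @_; $self->{_data}{$key} = $value }

package main;

for my $class (qw(HashConfig NestedConfig)) {
    my $via_method = $class->new();
    $via_method->store('author.country', 'Canada');

    my $via_hash = $class->new();
    $via_hash->{'author.country'} = 'Canada';   # bypasses store()

    printf "%-12s method: %-7s direct: %s\n", $class,
        $via_method->fetch('author.country') // 'undef',
        $via_hash->fetch('author.country')   // 'undef';
}
```

Code that went through store() gets Canada back from both classes; the direct hash assignment only works against HashConfig.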

As a module author, writing methods will let you make changes (bug fixes, enhancements, or even complete rewrites) without requiring your module's users to rewrite any of their code.

A related benefit is that you can further customize the behavior of this module through subclassing or other forms of polymorphism. Polymorphism is a four-dollar word which means "Anything that has the same interface (the same public attributes and methods) can be used the same way." That is, if you have multiple active TutorialConfig-style objects, you can treat them all alike, perhaps using one that works with INI files to read in old configuration data and another that works with a database backend to write it out.
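Here is a sketch of that idea. copy_config() is a hypothetical helper that moves values between any two objects with fetch and store methods; it checks for those methods with the standard can() method instead of asking which class it was given. HashConfig and LoggingConfig are made-up classes standing in for, say, the INI-backed and database-backed versions.

```perl
use strict;
use warnings;

package HashConfig;
sub new   { my ($class) = @_; return bless {}, $class }
sub fetch { my ($self, $key) = @_; return $self->{$key} }
sub store { my ($self, $key, $value) = @_; $self->{$key} = $value }

# A different class with the same interface; it also records each store.
package LoggingConfig;
sub new   { my ($class) = @_; return bless { data => {}, log => [] }, $class }
sub fetch { my ($self, $key) = @_; return $self->{data}{$key} }
sub store {
    my ($self, $key, $value) = @_;
    push @{ $self->{log} }, "store $key";
    $self->{data}{$key} = $value;
}
sub history { my ($self) = @_; return @{ $self->{log} } }

package main;

# Works with any objects that can fetch and store; never checks the class.
sub copy_config {
    my ($from, $to, @keys) = @_;
    for my $obj ($from, $to) {
        die ref($obj) . " can't fetch/store\n"
            unless $obj->can('fetch') && $obj->can('store');
    }
    $to->store( $_, $from->fetch($_) ) for @keys;
    return $to;
}

my $old = HashConfig->new();
$old->store( 'author.firstname', 'Doug' );
$old->store( 'site.name',        'Perl.com' );

my $new = copy_config( $old, LoggingConfig->new(), 'author.firstname', 'site.name' );
print 'Copied site name: ', $new->fetch('site.name'), "\n";
print 'History: ', join( ', ', $new->history ), "\n";
```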

Second, using methods lets you avoid impossible values. You might have an object that takes a person's age as a property. A person's age can't be negative (you can't be -2 years old, unless you have a time machine, in which case the author has a business plan and is willing to split the proceeds!), so the age() method for this object will reject negative numbers. If you bypass the method and directly manipulate $obj->{age}, you may cause problems elsewhere in the code. A routine to calculate the person's birth year, for example, might fail or produce an odd result.
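A minimal sketch of such a validating accessor, assuming a hypothetical Person class whose ages are whole numbers of years; birth_year() is likewise made up for illustration. croak comes from the core Carp module and reports the error from the caller's point of view.

```perl
use strict;
use warnings;

package Person;
use Carp qw(croak);

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    # Route the initial value through the accessor so it is checked too.
    $self->age( $args{age} ) if defined $args{age};
    return $self;
}

# Get or set the age; refuse anything that isn't a non-negative integer.
sub age {
    my ($self, $age) = @_;
    if (@_ > 1) {
        croak "Invalid age '" . ($age // 'undef') . "'"
            unless defined $age && $age =~ /\A[0-9]+\z/;
        $self->{age} = $age;
    }
    return $self->{age};
}

# Can trust age() because every change went through the check above.
sub birth_year {
    my ($self, $this_year) = @_;
    return $this_year - $self->age;
}

package main;

my $person = Person->new( age => 30 );
print 'Born in ', $person->birth_year(2008), "\n";

my $ok = eval { $person->age(-2); 1 };
print "Rejected: $@" unless $ok;
print 'Age is still ', $person->age, "\n";
```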

As a module author, you can use methods to help programmers who use your module write better software. You can write a good error-checking routine once, and reuse it many times.

Some languages, by the way, enforce encapsulation by giving you the ability to make certain properties private. Perl doesn't do this. In Perl, encapsulation isn't the law. It's just a very good idea.

Encapsulation extends even to a class's own code: use methods to access an object's properties rather than poking into the blessed hash directly. If you're writing your classes by hand, you might declare a filename() accessor method to get and set the name of the configuration file:

sub filename
{
    my ($self, $filename) = @_;
    $self->{_filename}    = $filename if defined $filename;

    return $self->{_filename};
}

... and then use $self->filename() instead of accessing $self->{_filename} directly. Not only does this keep the details of how you store the name of the configuration file in a single place (the filename() method), but it also lets you change how this class, or a polymorphic variant of it, represents and stores the filename and the object as a whole.

Writing all of those accessors by hand can be a little tedious, but this is Perl. There's more than one way to do things.
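One of those ways is to generate accessors instead of typing them. This sketch, an illustration rather than a recommended library, builds one closure per attribute name and installs it into the class's symbol table, reproducing the get/set behavior of the hand-written filename() accessor. The section attribute is hypothetical. (CPAN modules such as Class::Accessor package up this technique, and Moose and Mouse go much further.)

```perl
use strict;
use warnings;

package TutorialConfig;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Generate a get/set accessor for each listed attribute name.
for my $attr (qw(filename section)) {
    my $accessor = sub {
        my ($self, $value) = @_;
        # Same behavior as the hand-written filename(): set only if defined.
        $self->{"_$attr"} = $value if defined $value;
        return $self->{"_$attr"};
    };
    # Install the closure as a named method, e.g. TutorialConfig::filename.
    no strict 'refs';
    *{ __PACKAGE__ . "::$attr" } = $accessor;
}

package main;

my $tut = TutorialConfig->new();
$tut->filename('tutc.txt');
print 'Filename: ', $tut->filename(), "\n";
```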

Declarative Objects

If you've used objects in other languages, you may rightfully wonder "What's with all of that weird bless a reference stuff? Why can't I just declare my class and its attributes and have Perl take care of everything for me?" Fortunately, you can.

The Moose distribution from the CPAN builds on Perl 5's standard object system to provide many more features in a declarative fashion that's easier to use in many ways. Moose borrows liberally from Perl 6's object system (as well as some nice features from Smalltalk and Common Lisp). The result is much more powerful and introspective.

Moose can be intimidating; it has many features. Mouse is a similar distribution which provides a gateway to Moose. It supports the most common, basic operations in the same way, though it's easier to install and understand.

This is not to say that there's anything wrong with Perl 5's default object system. It works. It's deliberately minimal, but you can build anything you want out of it. Sometimes reaching for the Mouse or the Moose can help you write (and especially maintain) larger programs more effectively.

With Mouse and Moose, you use the module, then declare your object's attributes and some metadata about them. You get a constructor and accessor methods in return without having to write them. A TutorialConfig class might look like this instead:

package TutorialConfig;

use 5.010;
use Mouse;

has 'filename'   => ( is => 'rw' );
has 'properties' => ( is => 'ro', default => sub { {} } );

sub read
{
    my ($self, $file) = @_;

    open my $config_fh, '<', $file or return 0;

    $self->filename( $file );

    my $section = '';

    while (my $line = <$config_fh>) {
        chomp $line;

        given ($line) {
            when (/^\[(.*)\]/)                   { $section = $1 . '.' }
            when (/^(?<key>[^=]+)=(?<value>.*)/) {
                $self->store( $section . $+{key}, $+{value} );
            }
        }
    }

    return 1;
}

sub store
{
    my ($self, $key, $value)  = @_;
    $self->properties->{$key} = $value;
}

sub fetch
{
    my ($self, $key) = @_;
    return $self->properties->{$key};
}

1;

Most of the special Mouse magic is in the first few lines of the file. use Mouse; inside the TutorialConfig package turns TutorialConfig into a Mouse class and provides the has keyword. This class has two attributes, filename and properties. The has keyword (okay, it's a list-ary function, but it looks and behaves like a keyword here!) takes the name of an attribute to add and a list of the attribute's properties.

In this case, filename is an rw attribute: it's readable and writable, so Mouse will generate a read/write accessor for it, named filename.

properties holds a hash reference. This attribute is more complex; it's not a good idea to allow users to replace the properties hash, so it's a read-only attribute (thus the properties() method only returns the hash reference; you can't set it). There's also a default value: a new, empty hash reference.

Mouse exposes one drawback of Perl in the syntax to set a default value for the property. Every new TutorialConfig object needs its own properties hash, so the default value provided here is an anonymous Perl function which returns a new hash reference each time it's called. A plain default => {} would create a single hash reference when the class is declared, and every instance would share it; Mouse and Moose refuse a plain reference as a default for exactly that reason. It's a slight infelicity. (Python programmers may recognize the same trap in mutable default function arguments.)
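The difference between a shared reference and a generated one is easy to see in plain Perl. In this sketch, DefaultDemo is a hypothetical class with two constructors: one puts the same hash reference into every object, and the other calls a code reference to build a new hash per object, which is the job sub { {} } does for Mouse.

```perl
use strict;
use warnings;

package DefaultDemo;

# One hash reference, created once and handed to every object: shared.
my $shared_default = {};

# A code reference that builds a new hash each time it's called: fresh.
my $fresh_default = sub { return {} };

sub new_shared { my ($class) = @_; return bless { properties => $shared_default }, $class }
sub new_fresh  { my ($class) = @_; return bless { properties => $fresh_default->() }, $class }

sub properties { my ($self) = @_; return $self->{properties} }

package main;

my $s1 = DefaultDemo->new_shared();
my $s2 = DefaultDemo->new_shared();
$s1->properties->{'author.firstname'} = 'Doug';
print 'Shared: ', $s2->properties->{'author.firstname'} // 'undef', "\n";

my $f1 = DefaultDemo->new_fresh();
my $f2 = DefaultDemo->new_fresh();
$f1->properties->{'author.firstname'} = 'Doug';
print 'Fresh:  ', $f2->properties->{'author.firstname'} // 'undef', "\n";
```

Writing to one "shared" object shows up in the other, while the "fresh" objects stay independent.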

It's little trouble to write the equivalent code by hand, but consider all of the code you don't have to write to declare a class, its attributes, and its accessors. What's left is declarative and easy to understand. Even better, it emphasizes the essential differences between the attributes; they stand out in ways that multiple near-boilerplate accessor method declarations cannot.

You don't have to use Mouse or Moose to use objects in Perl effectively, but even for an example as simple as this, their benefits are obvious. In a more complex class, the value of this approach is even clearer.

Play Around!

There are plenty of opportunities for you to modify this code as you experiment with object-oriented Perl.

  • The TutorialConfig class could use a method that will write a new configuration file to any filename you desire. Write your own write() method (in the standard Perl object version, use keys %$self to fetch the keys of the object's properties, skip internal keys such as _filename that begin with an underscore, and split each section.name key back into its section and name). Be sure to handle the error condition if Perl cannot open the file!
  • Write a BankAccount module. Your BankAccount object should have deposit, withdraw, and balance methods. Make withdraw fail if you try to withdraw more money than the balance, and make both deposit and withdraw reject negative amounts.
  • A big advantage of using CGI objects (see the previous article) is that you can store and retrieve queries on disk. Take a look in the CGI documentation to learn how to use the save() method to store queries, and how to pass a filehandle to new to read them from disk. Try writing a CGI program that saves recently used queries for easy retrieval.
  • One reason many novices avoid writing their own Perl modules is that they don't know how to distribute them. The Module::Build distribution (a core library in Perl 5.10) helps configure, build, test, and install Perl modules. A simple Build.PL file takes only a few minutes to write. Bundle TutorialConfig and BankAccount into their own distributions with Module::Build.


8 Comments

I never considered plugging Mouse as a gateway to Moose. I'll refactor the documentation to be geared toward people being first exposed to Moose through Mouse.

If you're not sold on Moose, I urge you to give Moose::Unsweetened a read. It defines two classes, first with Moose then with plain Perl 5 OO. If nothing else, it should exemplify why people are so enthusiastic about Moose.

use 5.010?

If only perl and python had set up their scoping so that the 'self' wasn't always required to access class members. This leads to so many problems for people used to C++ or Java, especially if the language is silently creating new variables when you think you're accessing class members.

If there's one thing I'd like to see in the next major perl or python revision it is that.

@Ciaran, that's nearly impossible with the dynamic nature of Perl and Python objects. At compile time, C++ and Java can resolve named symbols to attribute access. Because Perl and Python use very late binding (and their type systems prefer polymorphic equivalence to strict taxonomic identities), there's no good way of distinguishing between free variable and attribute access at compile time.

Python has it worse; lack of variable declarations makes this impossible in the general case.


I had to replace $config_name and $config_val in this line to get it to work. Why doesn't the way presented work for me?

$self->{"$section.$+{'key'}"} = $+{'value'};

Let's see one of these articles for python!

Here is a very good perl guide for beginners:
Object-Oriented Perl Guide

use strict;
And no more silent creating new variables
