Perl Reference





The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me improve this guide. If you have something to contribute, please send it directly to me.



A must read!

This document was born because some users are reluctant to learn Perl before jumping into mod_perl. I will try to cover some of the pure Perl questions most frequently asked on the mailing list.

Update: I'm moving most of the pure Perl related topics from everywhere in the Guide to this chapter. From now on other chapters will refer to sections in this chapter if required.

Before you decide to skip this chapter, make sure you know all the information provided here. The rest of the Guide assumes that you have read this chapter and understood it.



Warnings Explained

Meta: Rewrite this section

Sometimes it's very hard to understand what a warning is complaining about. You see the source code, but you cannot understand why some specific snippet produces that warning. The mystery often results from the fact that the code can be called from different places.

Here is an example:

  local $^W=1;
  good();
  bad();
  
  sub good{
    print_value("Perl");
  }
  
  sub bad{
    print_value();
  }
  
  sub print_value{
    my $var = shift;
    print "My value is $var\n";
  }

In the code above, print_value() prints the value it was passed, good() passes the value correctly, and bad() forgets to pass it. When we run the script, we get the warning:

  Use of uninitialized value at ./warning.pl line 15.

The warning points to the line that attempts to print the undefined variable $var:

  print "My value is $var\n";

But how do we find out why it is undefined? The solution is quite simple: what we need is a full stack trace, triggered by the warning.

The Carp module comes to our aid with its cluck() function. Let's modify the script by adding a couple of lines. The rest of the script is unchanged.

  use Carp ();
  local $SIG{__WARN__} = \&Carp::cluck;
  
  local $^W=1;
  good();
  bad();
  
  sub good{
    print_value("Perl");
  }
  
  sub bad{
    print_value();
  }
  
  sub print_value{
    my $var = shift;
    print "My value is $var\n";
  }

Now when we execute it, we see:

  Use of uninitialized value at /home/httpd/perl/book/warning.pl line 18.
  Apache::ROOT::perl::book::warning_2epl::print_value() 
    called at /home/httpd/perl/book/warning.pl line 13
  Apache::ROOT::perl::book::warning_2epl::bad() 
    called at /home/httpd/perl/book/warning.pl line 6
  Apache::ROOT::perl::book::warning_2epl::handler('Apache=SCALAR(0x84b1154)') 
    called at /usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Registry.pm line 139
  eval {...} called at 
    /usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Registry.pm line 139
  Apache::Registry::handler('Apache=SCALAR(0x84b1154)') 
    called at PerlHandler subroutine `Apache::Registry::handler' line 0
  eval {...} called at PerlHandler subroutine `Apache::Registry::handler' line 0

Take a moment to understand the trace. The only part we are interested in starts where our script is called, so we can skip the Apache::Registry part of the trace. We are left with:

  Use of uninitialized value at /home/httpd/perl/book/warning.pl line 18.
  Apache::ROOT::perl::book::warning_2epl::print_value() 
    called at /home/httpd/perl/book/warning.pl line 13
  Apache::ROOT::perl::book::warning_2epl::bad() 
    called at /home/httpd/perl/book/warning.pl line 6

which tells us that the code that triggered the warning was:

  Apache::Registry code => bad() => print_value()

We look into bad() and indeed see that we forgot to pass the variable. Of course, when you write a subroutine like print_value() it is a good idea to check the passed arguments before proceeding. But the example was ``good'' enough to show you how to ease the debugging process.
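Checking the arguments could look like the following minimal sketch. It reuses print_value() from the example above and assumes that a warning issued through Carp's carp() (rather than, say, a die()) is the desired reaction to a missing value:

```perl
use strict;
use Carp ();

# print_value() from the example above, now checking its argument first
sub print_value {
  my $var = shift;
  unless (defined $var) {
    # carp() reports the problem from the caller's point of view
    Carp::carp("print_value() was called without a value");
    return;
  }
  print "My value is $var\n";
}

print_value("Perl");   # prints: My value is Perl
print_value();         # warns instead of printing an uninitialized value
```

With this check in place the warning names the offending call directly, instead of complaining about an uninitialized value deep inside print_value().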

Sure, you say, I could find that problem by simple inspection of the code. You're right, but I promise you that the task would be quite complicated and time consuming in code several thousand lines long.

Notice the local() keyword in the second line we added to the script, before setting $SIG{__WARN__}. Since %SIG is a global variable, forgetting to use local() will enforce this setting for all the scripts running under the same process. If this is the behaviour you want, for example on a development server, set it in a startup file instead, where you can easily switch the feature on and off.
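A minimal sketch of such a startup file, assuming it is loaded once per server through mod_perl's PerlRequire directive. The TRACE_WARNINGS constant is a hypothetical on/off switch, not something mod_perl provides:

```perl
# startup.pl: loaded once by the server, e.g. via PerlRequire
use strict;
use Carp ();

# hypothetical switch: set to 0 to turn the stack traces off again
use constant TRACE_WARNINGS => 1;

# no local() here on purpose: the handler should stay in effect
# for every script served by this process
$SIG{__WARN__} = \&Carp::cluck if TRACE_WARNINGS;

1;
```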

As you have noticed, warnings report the line number of the script that caused the warning. Unfortunately, certain uses of the eval operator and ``here documents'' are known to throw off Perl's line numbering, so the reported line numbers can be incorrect. (See ``Finding the Line Number the Error/Warning has been Triggered at''.)

While having warnings turned On is a must on a development server, you should turn them Off globally on a production server. If every CGI script generates just one warning per request and your server serves millions of requests per day, your log file will eat up all of your disk space and your system will die. My production servers have the following directive in httpd.conf:

  PerlWarn Off

While we are talking about control flags, another, even more important flag is -T, which turns On Taint mode. Since this is a very broad topic I won't discuss it here, but if you aren't forcing all your scripts to run under Taint mode, you are asking for trouble from malicious users. To turn it On, add to httpd.conf:

  PerlTaintCheck On



Variables globally, lexically scoped and fully qualified

 META: complete

Here is a clarification of my() vs. use vars() by Ken Williams:

  Yes, there is quite a bit of difference!  With use vars(), you are
  making an entry in the symbol table, and you are telling the
  compiler that you are going to be referencing that entry without an
  explicit package name.
  
  With my(), NO ENTRY IS PUT IN THE SYMBOL TABLE.  The compiler
  figures out _at_ _compile_time_ which my() variables (i.e. lexical
  variables) are the same as each other, and once you hit execute time
  you can not go looking those variables up in the symbol table.
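A small sketch of the difference Ken describes. The %main:: hash is the symbol table of package main, so a variable declared with use vars() shows up in it, while a my() variable does not. The variable names are made up for the example:

```perl
use strict;
use vars qw($global);   # creates an entry in the main:: symbol table

$global = "package variable";
my $lexical = "lexical variable";   # no symbol table entry is created

# %main:: is the symbol table of package main
print exists $main::{global}  ? "global: in symbol table\n"  : "global: not found\n";
print exists $main::{lexical} ? "lexical: in symbol table\n" : "lexical: not found\n";

# a symbolic (run-time) lookup by name only works for the package variable
{
  no strict 'refs';
  my $name = "main::global";
  print "Looked up by name: ${$name}\n";
}
```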

And here is Randal Schwartz on my() vs. local():

  local() creates a temporal-limited package-based scalar, array,
  hash, or glob -- when the scope of definition is exited at runtime,
  the previous value (if any) is restored.  References to such a
  variable are *also* global... only the value changes.  (Aside: that
  is what causes variable suicide. :)
  
  my() creates a lexically-limited non-package-based scalar, array, or
  hash -- when the scope of definition is exited at compile-time, the
  variable ceases to be accessible.  Any references to such a variable
  at runtime turn into unique anonymous variables on each scope exit.
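The difference between local() and my() becomes visible as soon as another subroutine looks at the variable. In this sketch (the subroutine names are hypothetical) show_x() always reads the package variable $main::x:

```perl
use strict;
use vars qw($x);

$x = "global";

# show_x() always looks at the package variable $main::x
sub show_x { return $x }

sub with_local {
  local $x = "localized";   # temporarily replaces the value of $main::x
  return show_x();          # the called sub sees the new value
}

sub with_my {
  my $x = "lexical";        # a brand new variable, visible only in this block
  return show_x();          # the called sub still sees $main::x
}

print with_local(), "\n";   # localized
print with_my(), "\n";      # global
print show_x(), "\n";       # global: local() restored the old value on scope exit
```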



Additional reading references

For more information, see ``Using global variables and sharing them between modules/packages'' and the article by Mark-Jason Dominus about how Perl handles variables and namespaces, and the difference between use vars() and my(): http://www.plover.com/~mjd/perl/FAQs/Namespaces.html



my() Scoped Variable in Nested Subroutines

Before we proceed, let's make the healthy assumption that we want to develop our code under the strict pragma and avoid global variables, using my() scoped variables whenever possible.



The Poison

Let's look at this code:

  nested.pl
  -----------
  #!/usr/bin/perl
  
  use strict;
  
  sub print_power_of_2 {
    my $x = shift;
  
    sub power_of_2 {
      return $x ** 2; 
    }
  
    my $result = power_of_2();
    print "$x^2 = $result\n";
  }
  
  print_power_of_2(5);
  print_power_of_2(6);

Don't let the weird subroutine names fool you: the print_power_of_2() subroutine should print the square (the second power) of the passed number. Let's run the code and see whether it works:

  % ./nested.pl
  
  5^2 = 25
  6^2 = 25

Ouch, something is wrong. Maybe there is a bug in Perl and it doesn't work correctly with the number 6? Let's try again using 5 and 7:

  print_power_of_2(5);
  print_power_of_2(7);

And run it:

  % ./nested.pl
  
  5^2 = 25
  7^2 = 25

Wow, does it work only for 5? How about 3 and 5:

  print_power_of_2(3);
  print_power_of_2(5);

and the result is:

  % ./nested.pl
  
  3^2 = 9
  5^2 = 9

Now we start to understand: only the first call to print_power_of_2() works correctly. This makes us think that our code somehow remembers the result of the first execution and ignores the arguments of subsequent executions.



The Diagnosis

Let's follow the guidelines and turn on warnings with the -w flag (change the first line to #!/usr/bin/perl -w). Now execute the code:

  % ./nested.pl
  
  Variable "$x" will not stay shared at ./nested.pl line 9.
  5^2 = 25
  6^2 = 25

We have never seen such a warning message before and don't quite understand what it means. The diagnostics pragma will certainly help us. Let's add it before the strict pragma in our code:

  #!/usr/bin/perl -w
  
  use diagnostics;
  use strict;

And execute it:

  % ./nested.pl
  
  Variable "$x" will not stay shared at ./nested.pl line 10 (#1)
    
    (W) An inner (nested) named subroutine is referencing a lexical
    variable defined in an outer subroutine.
    
    When the inner subroutine is called, it will probably see the value of
    the outer subroutine's variable as it was before and during the
    *first* call to the outer subroutine; in this case, after the first
    call to the outer subroutine is complete, the inner and outer
    subroutines will no longer share a common value for the variable.  In
    other words, the variable will no longer be shared.
    
    Furthermore, if the outer subroutine is anonymous and references a
    lexical variable outside itself, then the outer and inner subroutines
    will never share the given variable.
    
    This problem can usually be solved by making the inner subroutine
    anonymous, using the sub {} syntax.  When inner anonymous subs that
    reference variables in outer subroutines are called or referenced,
    they are automatically rebound to the current values of such
    variables.
    
  5^2 = 25
  6^2 = 25

Well, now everything is clear. We have the inner subroutine power_of_2() and the outer subroutine print_power_of_2() in our code.

When the inner power_of_2() subroutine is called for the first time, it sees the value of the outer print_power_of_2() subroutine's $x variable. On subsequent calls, the $x seen by power_of_2() is not updated, no matter what value $x has in the outer subroutine. That's why the $x variable is no longer shared.



The Remedy

The diagnostics pragma suggests using an anonymous subroutine (also known as a closure). Let's rewrite the code to use this technique instead:

  anonymous.pl
  --------------
  #!/usr/bin/perl
  
  use strict;
  
  sub print_power_of_2 {
    my $x = shift;
  
    my $func_ref = sub {
      return $x ** 2;
    };
  
    my $result = &$func_ref();
    print "$x^2 = $result\n";
  }
  
  print_power_of_2(5);
  print_power_of_2(6);

Now $func_ref contains a reference to an anonymous function, which we then call to compute the power of two. Since the anonymous function is created afresh every time print_power_of_2() is called, it always sees the current value of $x, and the correct answer is given. Let's verify:

  % ./anonymous.pl
  
  5^2 = 25
  6^2 = 36

Indeed, it worked correctly as advertised.



When You Cannot Get Rid of Inner Subroutine

First you might wonder why in the world someone would need to define an inner subroutine. Suppose that, to reduce the startup overhead of Perl scripts, you decide to write a daemon that compiles the scripts and modules only once and caches the pre-compiled code in memory. When some script needs to be executed, you just tell the daemon the name of the script to run and it does the rest.

It seems like an easy task, and it is. The only problem is: once the script is compiled, how do you execute it? Or, put the other way: after it has been executed for the first time and stays compiled in the daemon's memory, how do you call it again? If you could force developers to write their scripts so that each has a subroutine called run() that actually executes the script's code, you would have half of the problem solved.

But how does the daemon know which script to refer to, if they all run in the main:: namespace? The obvious solution is to ask the developers to declare a package in each and every script, with the package name derived from the script name. Moreover, since there may be more than one script with the same name residing in different directories, the directory has to be part of the package name in order to prevent namespace collisions. And don't forget that scripts can be moved from directory to directory, so you will have to make sure the package name is corrected every time a script is moved.

But why enforce these strange rules on developers, when we can arrange for our daemon to do this work? Every script that the daemon is about to execute for the first time should be wrapped inside a package whose name is constructed from the mangled path to the script, and inside a subroutine called run(). For example, if the daemon is about to execute the script /tmp/hello.pl:

  hello.pl
  --------
  #!/usr/bin/perl
  print "Hello\n";

Prior to running it, the daemon will change the code to be:

  wrapped_hello.pl
  ----------------
  package cache::tmp::hello_2epl;
  
  sub run{
    #!/usr/bin/perl 
    print "Hello\n";
  }

The package name is constructed from the prefix cache::, with each directory-separating slash replaced by :: and every other character that is not a letter, digit or underscore encoded as an underscore followed by its hexadecimal code, so the . becomes _2e.

Now when the daemon is requested to execute the script /tmp/hello.pl, all it has to do is to build the package name as before based on the location of the script and call its run() subroutine:

  # the wrapped code was already compiled by the daemon,
  # so calling its run() subroutine is all that is needed
  cache::tmp::hello_2epl::run();

We have just sketched a partial prototype of the daemon we wanted; the only missing piece is how to pass the path of the script to the daemon. This detail is left to the reader as an exercise.
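Here is a minimal sketch of the daemon's core, assuming the script path has already been received somehow (that part remains the exercise). The helpers path_to_package() and run_script() and the %compiled cache are hypothetical names. The code reads the script, wraps it into a package and a run() subroutine as described above, compiles it with a string eval() only on first use, and then calls run():

```perl
use strict;

# hypothetical helper: build a package name from a script path,
# e.g. /tmp/hello.pl => cache::tmp::hello_2epl
sub path_to_package {
  my $path = shift;
  $path =~ s{^/+}{};                                        # drop the leading slash
  $path =~ s/([^A-Za-z0-9_\/])/sprintf("_%02x", ord $1)/eg; # . => _2e, - => _2d, ...
  $path =~ s{/+}{::}g;                                      # each slash becomes ::
  return "cache::$path";
}

my %compiled;   # paths of the scripts already wrapped and compiled

# hypothetical entry point: compile the script on first use, then call run()
sub run_script {
  my $path    = shift;
  my $package = path_to_package($path);

  unless ($compiled{$path}) {
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $code = do { local $/; readline($fh) };
    close $fh;

    # wrap the script into its own package and a run() subroutine
    my $wrapped = "package $package;\nsub run {\n$code\n}\n1;\n";
    eval $wrapped or die "Cannot compile $path: $@";
    $compiled{$path} = 1;
  }

  no strict 'refs';
  return &{"${package}::run"}();
}
```

Calling run_script("/tmp/hello.pl") twice prints Hello twice, but reads and compiles the file only once.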

If you are familiar with the Apache::Registry module, you know that it works in almost the same way. It uses a different package prefix, and the generic function is called handler() rather than run(). The script to run is determined from the URI of the HTTP request.

Now you can see how your ordinary subroutines can become inner ones. If your script was simply:

  simple.pl
  ---------
  #!/usr/bin/perl 
  sub hello { print "Hello" }
  hello();

Wrapped into a run() subroutine it becomes:

  wrapped_simple.pl
  -----------------
  package cache::simple_2epl;
  
  sub run{
    #!/usr/bin/perl 
    sub hello { print "Hello" }
    hello();
  }

Therefore hello() becomes an inner subroutine, and if it used my() scoped variables that are defined and altered outside of it, it would not work correctly from the second call onward, as explained in the previous section.



Remedies working for Inner Subroutine

First of all, there is nothing to worry about: if you do happen to have ``the my() scoped variable in the inner subroutine'' problem, Perl will always alert you, as long as you remember to turn warnings On.

Suppose you have a script with this problem. What are the ways to solve it? There are many, and we will discuss some of them here.

We will use the following code to show the different solutions.

  multirun.pl
  -----------
  #!/usr/bin/perl -w
  
  use strict;
  
  for (1..3){
    print "run: [time $_]\n";
    run();
  }
  
  sub run {
  
    my $counter = 0;
  
    increment_counter();
    increment_counter();
  
    sub increment_counter{
      $counter++;
      print "Counter is equal to $counter !\n";
    }
  
  } # end of sub run

This code executes the run() subroutine three times. Each time, run() initializes the $counter variable to 0 and then calls the inner subroutine increment_counter() twice, which increments $counter and prints its value. One might expect to see the following output:

  run: [time 1]
  Counter is equal to 1 !
  Counter is equal to 2 !
  run: [time 2]
  Counter is equal to 1 !
  Counter is equal to 2 !
  run: [time 3]
  Counter is equal to 1 !
  Counter is equal to 2 !

But as we have already learned from the previous sections, this is not what we are going to see. Indeed, when we run the script we see:

  % ./multirun.pl

  Variable "$counter" will not stay shared at ./multirun.pl line 18.
  run: [time 1]
  Counter is equal to 1 !
  Counter is equal to 2 !
  run: [time 2]
  Counter is equal to 3 !
  Counter is equal to 4 !
  run: [time 3]
  Counter is equal to 5 !
  Counter is equal to 6 !

Obviously, the $counter variable seen by increment_counter() is not reinitialized on each run() execution: it stays bound to the $counter created during the first call, so it preserves its value from the previous execution and keeps incrementing it.

One workaround is to use global variables, declared with the vars pragma.

  multirun1.pl
  -----------
  #!/usr/bin/perl -w
  
  use strict;
  use vars qw($counter);
  
  for (1..3){
    print "run: [time $_]\n";
    run();
  }
  
  sub run {
  
    $counter = 0;
  
    increment_counter();
    increment_counter();
  
    sub increment_counter{
      $counter++;
      print "Counter is equal to $counter !\n";
    }
  
  } # end of sub run

If you run this or any of the other solutions offered below, the expected output is generated:

  % ./multirun1.pl
  
  run: [time 1]
  Counter is equal to 1 !
  Counter is equal to 2 !
  run: [time 2]
  Counter is equal to 1 !
  Counter is equal to 2 !
  run: [time 3]
  Counter is equal to 1 !
  Counter is equal to 2 !

By the way, the warning we saw before is gone, and so is the problem, since no my() (lexically scoped) variable is used in the nested subroutine.

Another approach is to use fully qualified variables. This one is better, since less memory will be used, but it adds typing overhead:

  multirun2.pl
  -----------
  #!/usr/bin/perl -w
  
  use strict;
  
  for (1..3){
    print "run: [time $_]\n";
    run();
  }
  
  sub run {
  
    $main::counter = 0;
  
    increment_counter();
    increment_counter();
  
    sub increment_counter{
      $main::counter++;
      print "Counter is equal to $main::counter !\n";
    }
  
  } # end of sub run

You can also pass the variable to the subroutine by value and make the subroutine return it after it has been updated. This adds time and memory overhead, so it's not a good idea if the variable can be very large.

Don't rely on the variable staying small just because it is small during development; it can grow quite big in situations you didn't expect. For example, a very simple HTML form text field can return a few megabytes of data if one of your users is bored and wants to test how good your code is. It's not uncommon to see users copy-and-paste 10MB core dump files into a form's text fields and submit them for your script to process.

  multirun3.pl
  -----------
  #!/usr/bin/perl -w
  
  use strict;
  
  for (1..3){
    print "run: [time $_]\n";
    run();
  }
  
  sub run {
  
    my $counter = 0;
  
    $counter = increment_counter($counter);
    $counter = increment_counter($counter);
  
    sub increment_counter{
      my $counter = shift || 0 ;
  
      $counter++;
      print "Counter is equal to $counter !\n";
  
      return $counter;
    }
  
  } # end of sub run

Finally, you can use references to do the job. increment_counter() accepts a reference to the $counter variable and increments the value it refers to by dereferencing it, so the $counter variable in run() is affected by this change as well.

  multirun4.pl
  -----------
  #!/usr/bin/perl -w
  
  use strict;
  
  for (1..3){
    print "run: [time $_]\n";
    run();
  }
  
  sub run {
  
    my $counter = 0;
  
    increment_counter(\$counter);
    increment_counter(\$counter);
  
    sub increment_counter{
    my $r_counter = shift;   # a reference to run()'s $counter
  
      $$r_counter++;
      print "Counter is equal to $$r_counter !\n";
    }
  
  } # end of sub run

Here is yet another, even more obscure, approach. We modify the value of $counter inside the subroutine by relying on the fact that the elements of @_ are aliases to the variables that were passed, so if you directly modify one of the elements of the array, the value of the passed variable changes.

  multirun5.pl
  -----------
  #!/usr/bin/perl -w
  
  use strict;
  
  for (1..3){
    print "run: [time $_]\n";
    run();
  }
  
  sub run {
  
    my $counter = 0;
  
    increment_counter($counter);
    increment_counter($counter);
  
    sub increment_counter{
      $_[0]++;
      print "Counter is equal to $_[0] !\n";
    }
  
  } # end of sub run
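The remedy suggested by the diagnostics pragma, an anonymous subroutine, works here as well. In this sketch (call it multirun6.pl, a name chosen only to continue the series) run() creates a new closure on every call, so each closure sees that call's own $counter. The return value of run() is an addition for illustration only:

```perl
use strict;
use warnings;

for (1..3){
  print "run: [time $_]\n";
  run();
}

sub run {

  my $counter = 0;

  # a new anonymous sub is created on every call to run(), and it
  # captures the $counter that belongs to this particular call
  my $increment_counter = sub {
    $counter++;
    print "Counter is equal to $counter !\n";
  };

  $increment_counter->();
  $increment_counter->();

  return $counter;   # added for illustration: always 2

} # end of sub run
```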

Now you have at least five workarounds to choose from.

For more information please refer to the perlref and perlsub manpages.



use(), require(), do(), %INC and @INC Explained



The @INC array

@INC is a special Perl variable that is the equivalent of the shell's PATH variable. Whereas PATH contains a list of directories to search for executables, @INC contains a list of directories from which Perl modules and libraries can be loaded.

When you use(), require() or do() a filename or a module, Perl gets a list of directories from the @INC variable to search for the file it was requested to load. If the file that you want to load is not located in one of the listed directories, you have to tell Perl where to find the file by providing it a relative path to one of the directories in @INC or a full path to the file.



The %INC hash

%INC is another special Perl variable, used to cache the names of the files and modules that were successfully loaded and compiled by use(), require() or do(). Before use() or require() attempts to load a file or a module, Perl checks whether it's already in the %INC hash. If it is, the file is neither loaded nor compiled again. Otherwise the file is loaded into memory and Perl attempts to compile it.
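You can watch this check happen by putting an entry into %INC yourself. In the following sketch Fake::Module is a hypothetical module that doesn't exist anywhere on disk:

```perl
use strict;

# pretend that the hypothetical Fake::Module has already been loaded
$INC{'Fake/Module.pm'} = '/nowhere/Fake/Module.pm';

# require() finds the key in %INC and returns without touching the disk
my $loaded = eval { require Fake::Module; 1 };
print $loaded ? "skipped: already in %INC\n" : "failed: $@";

# once the entry is removed, require() searches @INC and fails
delete $INC{'Fake/Module.pm'};
my $loaded_again = eval { require Fake::Module; 1 };
print $loaded_again ? "loaded\n" : "not found after deleting the %INC entry\n";
```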

If the file is successfully loaded and compiled, a new key-value pair is added to %INC. The key is the name of the file or module as it was passed to one of the three functions we have just mentioned, and the value is the full path to the file if it was found in any of the @INC directories other than ".".

The following examples should make this logic easier to understand.

First, let's see what the contents of @INC are on my system:

  % perl -e 'print join "\n", @INC'
  /usr/lib/perl5/5.00503/i386-linux
  /usr/lib/perl5/5.00503
  /usr/lib/perl5/site_perl/5.005/i386-linux
  /usr/lib/perl5/site_perl/5.005
  .

Notice the . (current directory) as a last directory in the list.

Now let's load the strict.pm module and see the contents of %INC:

  % perl -e 'use strict; print map {"$_ => $INC{$_}\n"} keys %INC'
  
  strict.pm => /usr/lib/perl5/5.00503/strict.pm

Since strict.pm was found in the /usr/lib/perl5/5.00503/ directory, which is part of @INC, %INC holds the full path as the value for the key strict.pm.

Now let's create the simplest module in /tmp/test.pm:

  test.pm
  -------
  1;

It does nothing but return a true value when loaded. Now let's load it in different ways:

  % cd /tmp
  % perl -e 'use test; print map {"$_ => $INC{$_}\n"} keys %INC'
  
  test.pm => test.pm

Since the file was found relative to . (the current directory), the relative path is stored as the value. But if we alter @INC by adding /tmp to the end:

  % cd /tmp
  % perl -e 'BEGIN{push @INC, "/tmp"} use test; \
  print map {"$_ => $INC{$_}\n"} keys %INC'
  
  test.pm => test.pm

we still get the relative path, since the module was found relative to "." first: /tmp comes after . in the list. But if we execute the same code from a different directory, so that "." no longer matches:

  % cd /
  % perl -e 'BEGIN{push @INC, "/tmp"} use test; \
  print map {"$_ => $INC{$_}\n"} keys %INC'
  
  test.pm => /tmp/test.pm

we get the full path. We can also prepend the path with unshift(), so that it is checked before "." and we get a full path as well:

  % cd /tmp
  % perl -e 'BEGIN{unshift @INC, "/tmp"} use test; \
  print map {"$_ => $INC{$_}\n"} keys %INC'
  
  test.pm => /tmp/test.pm

  BEGIN{unshift @INC, "/tmp"}

can be replaced with the more elegant:

  use lib "/tmp";

which is almost the same as the BEGIN block above (lib.pm also adds architecture-specific subdirectories to @INC when they exist).

These approaches to modifying @INC can be labour intensive, since if you want to move the script around in the filesystem you have to modify the path. This can be painful, for example, when you move your scripts from development to a production server.

There is a FindBin module, which solves this problem in the plain Perl world, but unfortunately it doesn't work correctly under mod_perl.

If you use this module, you don't need to write a hardcoded path. The following snippet does all the work for you (the file is /tmp/load.pl):

  load.pl
  -------
  #!/usr/bin/perl
  
  use FindBin ();
  use lib "$FindBin::Bin";
  use test;
  print "test.pm => $INC{'test.pm'}\n";

In the above example $FindBin::Bin equals /tmp. If we move the script somewhere else, e.g. to /tmp/x, $FindBin::Bin equals /tmp/x.

  % /tmp/load.pl
  
  test.pm => /tmp/test.pm

Just like with use lib, but without a hardcoded path.

As I've mentioned earlier, FindBin will not work in a mod_perl environment: like any other module, it is loaded only once. So the first script using it will have all the settings correct, but the rest of the scripts will not, if they are located in a different directory than the first one.



Modules, Libraries and Files

Before we proceed, let's define what we mean by module, library and file.



require()

require() reads a file containing Perl code and compiles it. Before attempting to load the file, it looks up its argument in %INC to see whether it was already loaded. If it was, require() just returns without doing anything. Otherwise it attempts to load and compile the file.

require() first has to find the file it was asked to load. If the argument is a full path to the file, it just tries to read it. For example:

  require "/home/httpd/perl/mylibs.pl";

If the path is relative, require() will attempt to search for the file in all the directories listed in @INC. For example:

  require "mylibs.pl";

If more than one file with the same name exists in the directories listed in @INC, the first one found is used.

The file must return TRUE from its last statement to indicate successful execution of any initialization code. Since you never know what changes the file will go through in the future, you cannot be sure that the last statement will always return TRUE. That's why the convention is to put ``1;'' at the end of the file.
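To see what happens when the last statement is false, this sketch writes a throwaway library (its name is generated by File::Temp) that ends with a false value and tries to require() it:

```perl
use strict;
use File::Temp qw(tempfile);

# write a throwaway library whose last statement evaluates to false
my ($fh, $file) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print $fh "my \$initialized = 0;\n\$initialized;\n";
close $fh;

my $ok    = eval { require $file; 1 };
my $error = $@;
print $ok ? "loaded\n" : "require() failed: $error";
```

Appending ``1;'' to the end of that file makes the same require() succeed.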

While you should use the real filename for most files, if the file is a module you may use the following convention instead:

  require My::Module;

This is equal to:

  require "My/Module.pm";

If require() fails to load the file, whether because it couldn't find the file in question, because the code failed to compile, or because the file didn't return TRUE at the end, the program will die(), unless the require() statement is enclosed in an eval() block, as in this example:

  require.pl
  ----------
  #!/usr/bin/perl -w
  
  eval { require "/file/that/does/not/exists"};
  if ($@) {
    print "Failed to load, because : $@"
  }
  print "\nHello\n";

When we execute the program:

  % ./require.pl
  
  Failed to load, because : Can't locate /file/that/does/not/exists in
  @INC (@INC contains: /usr/lib/perl5/5.00503/i386-linux
  /usr/lib/perl5/5.00503 /usr/lib/perl5/site_perl/5.005/i386-linux
  /usr/lib/perl5/site_perl/5.005 .) at require.pl line 3.
  
  Hello

We see that the program didn't die(), because Hello was printed. This trick is useful when you want to check whether a user has some module installed, but it's not critical if she hasn't: perhaps the program can run without that module with a reduced set of functionality.
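A common way to apply this trick is to record whether the optional module could be loaded and pick the code path accordingly. In this sketch Data::Dumper merely plays the role of the optional module, and dump_data() is a hypothetical function:

```perl
use strict;

# remember whether the optional module is available
my $HAVE_DUMPER = eval { require Data::Dumper; 1 } ? 1 : 0;

sub dump_data {
  my $data = shift;
  return Data::Dumper::Dumper($data) if $HAVE_DUMPER;   # full functionality
  # reduced functionality when the module is missing
  return join ', ', map { $_ . '=' . $data->{$_} } sort keys %$data;
}

print dump_data({ a => 1, b => 2 }), "\n";
```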

If we remove the eval() part and try again:

  require1.pl
  -----------
  #!/usr/bin/perl -w
  
  require "/file/that/does/not/exists";
  print "\nHello\n";

  % ./require1.pl
  
  Can't locate /file/that/does/not/exists in @INC (@INC contains:
  /usr/lib/perl5/5.00503/i386-linux /usr/lib/perl5/5.00503
  /usr/lib/perl5/site_perl/5.005/i386-linux
  /usr/lib/perl5/site_perl/5.005 .) at require1.pl line 3.

The program just die()s in the last example, which is what you want in most cases.

For more information refer to the perlfunc manpage.



use()

use(), just like require(), loads and compiles files containing Perl code, but it works with modules only. Thus the only way to specify the module to load is by its name, not by its filename. If the module is located in MyCode.pm, the correct way to use() it is:

  use MyCode;

and not:

  use "MyCode.pm";

use() translates the passed argument into a filename by replacing :: with / and appending .pm at the end, so My::Module becomes My/Module.pm.
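The translation is simple enough to sketch as a helper (module_to_file() is a hypothetical name, not a builtin). The resulting relative filename is also the key under which the loaded module appears in %INC:

```perl
use strict;

# hypothetical helper mirroring the name translation use() performs
sub module_to_file {
  my $module = shift;
  (my $file = $module) =~ s{::}{/}g;   # My::Module => My/Module
  return "$file.pm";                   # My/Module  => My/Module.pm
}

print module_to_file("My::Module"), "\n";   # My/Module.pm

# strict is loaded above, so its %INC key is strict.pm
print "strict was loaded from $INC{module_to_file('strict')}\n";
```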

use() is exactly equivalent to:

  BEGIN { require Module; import Module LIST; }

Internally it calls require() to do the loading and compilation chores; when require() finishes its job, import() is called, unless () is passed as the second argument. The following pairs are equivalent:

  use MyModule;
  BEGIN {require MyModule; import MyModule; }
  
  use MyModule qw(foo bar);
  BEGIN {require MyModule; import MyModule ("foo","bar"); }
  
  use MyModule ();
  BEGIN {require MyModule; }

When no parameters are passed to import(), it imports the default symbols, if any were defined inside the module. import() is not a builtin function; it's just an ordinary static method call into the ``MyModule'' package, telling the module to import the list of features back into the current package. See the Exporter manpage for more information.
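The following self-contained sketch defines a hypothetical MyModule inline with Exporter, and marks it as loaded in %INC so that use() doesn't search for it on disk. It shows the three forms of use() from above side by side:

```perl
use strict;

# hypothetical MyModule, set up at compile time so the use() lines below can find it
BEGIN {
  package MyModule;
  require Exporter;
  @MyModule::ISA       = qw(Exporter);
  @MyModule::EXPORT    = qw(foo);   # imported by default
  @MyModule::EXPORT_OK = qw(bar);   # imported only on request
  sub foo { return "foo" }
  sub bar { return "bar" }
  $INC{'MyModule.pm'} = __FILE__;   # mark it as loaded, so require() skips @INC
}

use MyModule;            # BEGIN { require MyModule; MyModule->import(); }
use MyModule qw(bar);    # BEGIN { require MyModule; MyModule->import('bar'); }
use MyModule ();         # BEGIN { require MyModule; }  import() is not called

print foo(), " ", bar(), "\n";   # both names now live in package main
```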

There's a corresponding ``no'' command that unimports symbols imported by use, i.e., it calls unimport Module LIST instead of import().



do()

While do() behaves almost identically to require(), it reloads the file unconditionally: it doesn't check %INC to see whether the file was already loaded.

If do() cannot read the file, it returns undef and sets $! to report the error. If do() can read the file but cannot compile it, it returns undef and sets an error message in $@. If the file is successfully compiled, do() returns the value of the last expression evaluated.
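Because do() reports failures through its return value, $@ and $! rather than by dying, it's worth checking all three. Here is a sketch of a hypothetical load_file() helper following the error-checking idiom from the perlfunc manpage; since do() doesn't consult %INC, calling it again after the file has changed picks up the new contents:

```perl
use strict;

# hypothetical helper: (re)load a Perl file and turn do()'s failures into die()
sub load_file {
  my $file   = shift;
  my $result = do $file;
  die "couldn't parse $file: $@" if $@;              # compiled with errors
  die "couldn't do $file: $!"    unless defined $result;  # couldn't read it
  die "couldn't run $file"       unless $result;     # ran, but returned false
  return $result;
}
```

For example, a configuration file whose last statement is a hash reference can be loaded with my $config = load_file("/path/to/config.pl"), where the path is only a placeholder.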



Using global variables and sharing them between modules/packages



Making the variables global

When you first wrote $x in your code, you created a global variable. It is visible everywhere in the file where you used it, or, if you defined it inside a package, it is visible inside that package. But this works only if you do not use the strict pragma, and you HAVE to use this pragma if you want to run your scripts under mod_perl. Read ``The strict pragma'' to find out why.

[TOC]


Making the variables global with strict pragma On

First you use:

  use strict;

Then you use:

  use vars qw($scalar %hash @array);

From this moment on, the variables are global in the package in which you defined them. If you want to share global variables between packages, here is what you can do.
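For example, in this sketch My::Config is a hypothetical package with two globals declared through use vars(). Under strict, code in other packages can still reach them by their fully qualified names:

```perl
use strict;

package My::Config;              # a hypothetical package
use vars qw($debug @servers);    # package globals, allowed under strict

$debug   = 1;
@servers = qw(alpha beta);

sub summary { return "debug=$debug servers=@servers" }

package main;

# from another package, the globals are reachable through their full names
print My::Config::summary(), "\n";   # debug=1 servers=alpha beta
print "Debug is on\n" if $My::Config::debug;
```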



Using Exporter.pm to share global variables

Assume that you want to share a CGI.pm object (I will call it $q) between your modules. For example, you create it in script.pl but want it to be visible in My::HTML. First, you make $q global.

  script.pl:
  ----------------
  use vars qw($q);
  use CGI;
  use lib qw(.); 
  use My::HTML qw($q); # My/HTML.pm in the same dir as script.pl
  $q = new CGI;
  
  My::HTML::printmyheader();
  ----------------

Note that we have imported $q from My::HTML. Here is My::HTML, which exports $q:

  My/HTML.pm
  ----------------
  package My::HTML;
  use strict;
  
  BEGIN {
    use Exporter ();
  
    @My::HTML::ISA         = qw(Exporter);
    @My::HTML::EXPORT      = qw();
    @My::HTML::EXPORT_OK   = qw($q);
  
  }
  
  use vars qw($q);
  
  sub printmyheader{
    # Whatever you want to do with $q... e.g.
    print $q->header();
  }
  1;
  -------------------

So $q is shared between the My::HTML package and script.pl. It works the other way around as well: you can create the object in My::HTML and use it in script.pl. This is true sharing, since both names refer to the same variable. If you change $q in script.pl, the change is visible in My::HTML as well.
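
The sharing can be checked in a single file. This sketch illustrates the idea and is not a drop-in replacement for the files above. My::HTML is defined inline, and its %INC entry is set by hand so that use My::HTML finds it already loaded. show() is a hypothetical stand-in for printmyheader().

```perl
use strict;
use warnings;

# My::HTML is defined inline for the illustration; the %INC entry makes
# "use My::HTML" below treat it as already loaded from My/HTML.pm.
BEGIN {
    package My::HTML;
    require Exporter;
    our @ISA       = qw(Exporter);
    our @EXPORT_OK = qw($q);
    $INC{'My/HTML.pm'} = __FILE__;
}

{
    package My::HTML;
    our $q;
    sub show { return "My::HTML sees: $q" }    # hypothetical helper
}

use My::HTML qw($q);    # $main::q and $My::HTML::q are now one variable

$q = 'a CGI object would go here';
my $seen = My::HTML::show();
print "$seen\n";    # prints "My::HTML sees: a CGI object would go here"
```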

What if you need to share $q between more than two packages? For example, you want My::Doc to share $q as well.

Leave My::HTML untouched and modify script.pl to include:

  use My::Doc qw($q);

Then write My::Doc exactly like My::HTML. Of course, its content will be different :).

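One possible pitfall arises when you want to use My::Doc in both My::HTML and script.pl. Each import of $q aliases the importing package's $q to the exporting package's variable, so when script.pl imports $q from both modules, the last import wins. Only if you add:

  use My::Doc qw($q);

into My::HTML will $q be shared by all three. Otherwise, My::HTML ends up with its own, separate $q. To make things clear, here is the code: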

  script.pl:
  ----------------
  use vars qw($q);
  use CGI;
  use lib qw(.); 
  use My::HTML qw($q); # My/HTML.pm in the same dir as script.pl
  use My::Doc  qw($q); # Ditto
  $q = new CGI;
  
  My::HTML::printmyheader();
  ----------------

  My/HTML.pm
  ----------------
  package My::HTML;
  use strict;
  
  BEGIN {
    use Exporter ();
  
    @My::HTML::ISA         = qw(Exporter);
    @My::HTML::EXPORT      = qw();
    @My::HTML::EXPORT_OK   = qw($q);
  
  }
  
  use vars     qw($q);
  use My::Doc  qw($q);
  
  sub printmyheader{
    # Whatever you want to do with $q... e.g.
    print $q->header();
  
    My::Doc::printtitle('Guide');
  }
  1;
  -------------------

  My/Doc.pm
  ----------------
  package My::Doc;
  use strict;
  
  BEGIN {
    use Exporter ();
  
    @My::Doc::ISA         = qw(Exporter);
    @My::Doc::EXPORT      = qw();
    @My::Doc::EXPORT_OK   = qw($q);
  
  }
  
  use vars qw($q);
  
  sub printtitle{
    my $title = shift || 'None';
    
    print $q->h1($title);
  }
  1;
  -------------------
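
The pitfall can be reproduced in a single file. This sketch defines both modules inline and sets their %INC entries by hand, which is done only to keep everything in one file. Here, My::HTML does not import $q from My::Doc, so the last use wins and My::HTML is left with an unshared $q.

```perl
use strict;
use warnings;
no warnings 'once';    # the packages' $q are mentioned only in passing

# Two hypothetical exporters defined inline, each with its own $q.
BEGIN {
    require Exporter;
    for my $pkg (qw(My::HTML My::Doc)) {
        no strict 'refs';
        @{"${pkg}::ISA"}       = ('Exporter');
        @{"${pkg}::EXPORT_OK"} = ('$q');
        (my $file = "$pkg.pm") =~ s{::}{/}g;
        $INC{$file} = __FILE__;    # pretend My/HTML.pm and My/Doc.pm are loaded
    }
}

use My::HTML qw($q);    # $main::q now aliases $My::HTML::q
use My::Doc  qw($q);    # ...and now aliases $My::Doc::q instead

$q = 'the CGI object';
print "My::HTML: ", (defined $My::HTML::q ? 'shared' : 'not shared'), "\n";   # not shared
print "My::Doc:  ", (defined $My::Doc::q  ? 'shared' : 'not shared'), "\n";   # shared
```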


Using Perl's aliasing feature to share global variables

As the title says, you can import a variable into a script or module without using Exporter.pm. I have found it useful to keep all the configuration variables in one module, My::Config. But then I have to export all the variables in order to use them in other modules, which is bad for two reasons. First, it pollutes other packages' namespaces with extra symbols, which increases the memory requirements. Second, it adds the overhead of keeping track of which variables should be exported from the configuration module and which should be imported by each particular package. I solve this problem by keeping all the variables in one hash, %c, and exporting only that. Here is an example of My::Config:

  package My::Config;
  use strict;
  use vars qw(%c);
  %c = (
    # All the configs go here
    scalar_var => 5,
  
    array_var  => [
                   'foo',
                   'bar',
                  ],
  
    hash_var   => {
                   foo => 'Foo',
                   bar => 'BARRR',
                  },
  );
  1;

Now, in packages that want to use the configuration variables, I have to either use fully qualified names like $My::Config::c{scalar_var}, which I dislike, or import them as described in the previous section. But since we have only one variable to handle, we can make things even simpler and avoid loading Exporter.pm altogether. We will use Perl's aliasing feature for exporting, and save some keystrokes as well:

  package My::HTML;
  use strict;
  use lib qw(.);
    # Global Configuration now aliased to global %c
  use My::Config (); # My/Config.pm in the same dir as script.pl
  use vars qw(%c);
  *c = \%My::Config::c;
  
    # Now you can access the variables from the My::Config
  print $c{scalar_var};
  print $c{array_var}[0];
  print $c{hash_var}{foo};

Of course, %c is global everywhere you alias it as described above. If you change it in one place, the change is visible in every other package where you have aliased %My::Config::c.

Note that aliases work only with global variables or with their local() versions. You cannot write:

  my *c = \%My::Config::c;

Which is an error. But you can:

  local *c = \%My::Config::c;
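
Here is a runnable sketch of the aliasing and of local() on a glob. My::Config is defined inline instead of being loaded from My/Config.pm. peek() is a hypothetical helper that temporarily points %c at a different hash.

```perl
use strict;
use warnings;

{
    package My::Config;          # inline stand-in for My/Config.pm
    use vars qw(%c);
    %c = (scalar_var => 5);
}

package My::HTML;
use vars qw(%c);
*c = \%My::Config::c;            # %My::HTML::c is now the same hash

$c{scalar_var} = 6;              # writes through the alias
print "$My::Config::c{scalar_var}\n";   # prints 6

sub peek {
    # A different hash, visible as %c only until this sub returns.
    local *c = { scalar_var => 'temporary' };
    return $c{scalar_var};
}
print peek(), " then $c{scalar_var}\n";  # prints "temporary then 6"
```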


The Scope of the Special Perl Variables

Special Perl variables like $| (buffering), $^T (the time the script started), $^W (warnings), $/ (input record separator), $\ (output record separator) and many more are all global variables. This means that you cannot localize them with my(). Only local() is permitted to do that. Since the child server process doesn't usually exit, if one of your scripts modifies a global variable, it stays changed for the rest of the process's life and affects all the scripts executed by the same process.

We will demonstrate the case with the input record separator variable. If you undefine this variable, the diamond operator will read the whole file in at once (if you have enough memory). Because the change persists in the process, you should never write code like the example below.

  $/ = undef; 
  open IN, "file" or die "Cannot open file: $!";
    # slurp it all into a variable
  $all_the_file = <IN>;

The proper way is to use the local() keyword when changing the special variable, like this:

  local $/ = undef; 
  open IN, "file" or die "Cannot open file: $!";
    # slurp it all into a variable
  $all_the_file = <IN>;

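But there is a catch. The value set with local() stays in effect until the end of the enclosing block, and it is also seen by any subroutines called from within that block. At the top level of a script, this means the modified value stays in effect until the script terminates, unless it is changed again somewhere else in the script.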

A cleaner approach is to enclose the whole of the code that is affected by the modified variable in a block, like this:

  {
    local $/ = undef; 
    open IN, "file" or die "Cannot open file: $!";
      # slurp it all into a variable
    $all_the_file = <IN>;
  }

That way when Perl leaves the block it restores the original value of the $/ variable, and you don't need to worry about its value anywhere else in your program.
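
Here is a self-contained version of the block above that also checks that $/ is restored afterwards. It assumes perl 5.8 or later, because it reads from an in-memory filehandle (a reference to a scalar) in place of the real "file".

```perl
use strict;
use warnings;

my $text = "line one\nline two\n";   # stands in for the contents of "file"
my $all_the_file;
{
    local $/;                        # undef inside this block only
    open my $in, '<', \$text or die "Cannot open: $!";   # in-memory file
    $all_the_file = <$in>;           # the whole "file" in one read
    close $in;
}
print length($all_the_file), " bytes read in one go\n";  # prints "18 bytes read in one go"

my $state = $/ eq "\n" ? 'restored' : 'still changed';
print "\$/ $state\n";                # prints "$/ restored"
```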


Compiled Regular Expressions

When using a regular expression that contains an interpolated Perl variable, if it is known that the variable (or variables) will not change during the execution of the program, a standard optimization technique is to add the /o modifier to the regexp pattern. This tells Perl to compile the pattern only once, for the entire lifetime of the script, rather than every time the pattern is executed. Consider:

  my $pat = '^foo$'; # likely to be input from an HTML form field
  foreach( @list ) {
    print if /$pat/o;
  }

This is usually a big win in loops over lists, or when using grep() or map() operators.

In long-lived mod_perl scripts, however, this can pose a problem if the variable changes according to the invocation. The first invocation of a fresh httpd child will compile the regex and perform the search correctly. However, all subsequent uses by the httpd child will continue to match the original pattern, regardless of the current contents of the Perl variables the pattern is dependent on. Your script will appear broken.
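
The effect can be reproduced without mod_perl. A named subroutine is compiled once, so its /o pattern stays frozen for as long as the interpreter lives, just like a script in a persistent httpd child. matches_o() and matches() are hypothetical helpers written for this sketch.

```perl
use strict;
use warnings;

# With /o, the pattern is compiled on the first call only.
sub matches_o { my ($pat, $str) = @_; return $str =~ /$pat/o ? 1 : 0 }

# Without /o, Perl recompiles the pattern whenever $pat changes.
sub matches   { my ($pat, $str) = @_; return $str =~ /$pat/  ? 1 : 0 }

print matches_o('^foo$', 'foo'), "\n";   # 1
print matches_o('^bar$', 'bar'), "\n";   # 0: still using '^foo$'
print matches('^bar$', 'bar'), "\n";     # 1
```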

There are two solutions to this problem:

The first is to use eval q// to force the code to be recompiled each time. Just make sure that the eval block covers the entire loop of processing, and not just the pattern match itself.

The above code fragment would be rewritten as:

  my $pat = '^foo$';
  eval q{
    foreach( @list ) {
      print if /$pat/o;
    }
  };
  die $@ if $@;  # eval q{} traps compile and run-time errors

Just saying:

  foreach( @list ) {
    eval q{ print if /$pat/o; };
  }

is going to be a horribly expensive proposition.

You can use this approach if you require more than one pattern match operator in a given section of code. If the section contains only one operator (be it an m// or s///), you can rely on a property of the empty pattern: it reuses the last successfully matched pattern. This leads to the second solution, which also eliminates the use of eval.

The above code fragment becomes:

  my $pat = '^foo$';
  "foo" =~ /$pat/; # dummy match (MUST NOT FAIL!)
  foreach( @list ) {
    print if //;
  }

The only gotcha is that the dummy match that boots the regular expression engine must absolutely, positively succeed, otherwise the pattern will not be cached, and the // will match everything. If you can't count on fixed text to ensure the match succeeds, you have two possibilities.

If you can guarantee that the pattern variable contains no meta-characters (things like *, +, ^, $...), you can use the dummy match:

  "$pat" =~ /\Q$pat\E/; # guaranteed if no meta-characters present

If there is a possibility that the pattern contains meta-characters, match the single \377 character against an alternation of your pattern and a branch that matches only that character, as follows:

  "\377" =~ /$pat|^[\377]$/; # succeeds even if $pat contains meta-characters

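Here is a sketch of the second solution using the \377 dummy match, so it keeps working whatever $pat contains. The sample data is made up. As a side effect, the reused pattern would also match a string consisting only of the \377 character.

```perl
use strict;
use warnings;

my $pat  = '^foo$';                 # e.g. taken from an HTML form field
my @list = qw(foo bar foobar foo);  # sample data, made up for the sketch

# Dummy match that cannot fail: the second branch matches the lone "\377",
# so the combined pattern becomes the "last successful" one.
"\377" =~ /$pat|^[\377]$/ or die "dummy match failed";

my @hits;
foreach (@list) {
    push @hits, $_ if m//;    # empty pattern (same as //): reuses it
}
print "@hits\n";              # prints "foo foo"
```
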
Another approach:

Whether it pays off depends on the complexity of the regexp you apply this technique to. One common case where a compiled regexp is usually more efficient is matching ``any one of a group of patterns'' over and over again.

Wrapping the technique in a helper routine makes it easier to remember. Here is one, slightly modified from Jeffrey Friedl's example in his book ``Mastering Regular Expressions''.

  #####################################################
  # Build_MatchMany_Function
  # -- Input:  list of patterns
  # -- Output: A code ref which matches its $_[0]
  #            against ANY of the patterns given in the
  #            "Input", efficiently.
  #
  sub Build_MatchMany_Function {
    my @R = @_;
    my $expr = join '||', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0..$#R );
    my $matchsub = eval "sub { $expr }";
    die "Failed in building regex @R: $@" if $@;
    $matchsub;
  }

Example usage:

  @some_browsers = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);
  $Known_Browser=Build_MatchMany_Function(@some_browsers);

  while (<ACCESS_LOG>) {
    # ...
    $browser = get_browser_field($_);
    if ( ! &$Known_Browser($browser) ) {
      print STDERR "Unknown Browser: $browser\n";
    }
    # ...
  }
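
For comparison, here is an alternative sketch that precompiles each pattern with qr// (available since perl 5.005), so no string eval is needed. This is a different technique, not Friedl's code. build_match_many() and the sample user-agent strings are made up for the illustration.

```perl
use strict;
use warnings;

# Alternative sketch: each pattern is compiled once, when the closure is built.
sub build_match_many {
    my @compiled = map { qr/$_/ } @_;
    return sub {
        my $str = shift;
        for my $re (@compiled) {
            return 1 if $str =~ $re;
        }
        return 0;
    };
}

my $known_browser = build_match_many(qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww));
for my $agent ('Mozilla/4.0 (compatible; MSIE 5.0)', 'Wget/1.5.3') {
    my $status = $known_browser->($agent) ? 'known' : 'unknown';
    print "$status: $agent\n";
}
```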


perldoc's Rarely Known But Very Useful Options

To find what functions perl has, you would execute:

  perldoc perlfunc

To learn the syntax of a specific function and find an example of its use, you would execute (e.g. for open()):

  perldoc -f open

There is a bug in this option: it doesn't call pod2man, so the section is displayed as raw POD. But it's still readable and very useful.

To search the Perl FAQ (perlfaq) sections, you would execute (e.g. for the open keyword):

  perldoc -q open

This returns all the matching Q&A sections, again as raw POD.



Written by Stas Bekman.
Last Modified at 12/18/1999
Use of the Camel for Perl is
a trademark of O'Reilly & Associates,
and is used by permission.