This document was born because some users are reluctant to learn Perl before jumping into mod_perl. I will try to cover some of the pure Perl questions most frequently asked on the list.
Update: I'm moving most of the pure Perl related topics from elsewhere in the Guide to this chapter. From now on, other chapters will refer to sections in this chapter when required.
Before you decide to skip this chapter, make sure you know all the information provided here. The rest of the Guide assumes that you have read this chapter and understood it.
Meta: Rewrite this section
Sometimes it's very hard to understand what a warning is complaining about. You see the source code, but you cannot understand why some specific snippet produces that warning. The mystery often results from the fact that the code can be called from different places.
Here is an example:
local $^W=1;
good();
bad();
sub good{
print_value("Perl");
}
sub bad{
print_value();
}
sub print_value{
my $var = shift;
print "My value is $var\n";
}
In the code above, print_value() prints the value passed to it, good() passes the value correctly and bad() forgets to pass it. When we run the script, we get the warning:
Use of uninitialized value at ./warning.pl line 15.
We can see the undefined variable $var at the line that attempts to print it:
print "My value is $var\n";
But how do we know why it is undefined? The solution is quite simple. What we need is a full stack trace, triggered by the warning.
The Carp module comes to our aid with its cluck()
function. Let's modify the script by adding a couple of lines. The rest of
the script is unchanged.
use Carp ();
local $SIG{__WARN__} = \&Carp::cluck;
local $^W=1;
good();
bad();
sub good{
print_value("Perl");
}
sub bad{
print_value();
}
sub print_value{
my $var = shift;
print "My value is $var\n";
}
Now when we execute it, we see:
Use of uninitialized value at /home/httpd/perl/book/warning.pl line 18.
Apache::ROOT::perl::book::warning_2epl::print_value()
called at /home/httpd/perl/book/warning.pl line 13
Apache::ROOT::perl::book::warning_2epl::bad()
called at /home/httpd/perl/book/warning.pl line 6
Apache::ROOT::perl::book::warning_2epl::handler('Apache=SCALAR(0x84b1154)')
called at /usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Registry.pm line 139
eval {...} called at
/usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Registry.pm line 139
Apache::Registry::handler('Apache=SCALAR(0x84b1154)')
called at PerlHandler subroutine `Apache::Registry::handler' line 0
eval {...} called at PerlHandler subroutine `Apache::Registry::handler' line 0
Take a moment to understand the trace. The only part that we are interested
in is the one that starts when our script is being called, so we can skip
the Apache::Registry trace part. So we are left with:
Use of uninitialized value at /home/httpd/perl/book/warning.pl line 18.
Apache::ROOT::perl::book::warning_2epl::print_value()
called at /home/httpd/perl/book/warning.pl line 13
Apache::ROOT::perl::book::warning_2epl::bad()
called at /home/httpd/perl/book/warning.pl line 6
which tells us that the code that triggered the warning was:
Apache::Registry code => bad() => print_value()
We look at bad() and indeed see that we forgot to pass the variable. Of course, when you write a subroutine like print_value() it is a good idea to check the passed arguments before starting execution. But the example was ``good'' enough to show you how to ease the debugging process.
Sure, you say, I could find that problem by simple inspection of the code. You're right, but I promise you that the task would be quite complicated and time consuming for code of some thousands of lines.
Notice the local() keyword in the second line that we added to our script, before setting $SIG{__WARN__}. Since %SIG is a global variable, forgetting to use local() will enforce this setting for all the scripts running under the same
process. If this is the behaviour you want, for example in the development
server, you should set it in a startup file, where you can easily switch
this feature on and off.
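The scoping of the handler can be sketched in a standalone script. This is only an illustration under some assumptions: collect_warnings() is a hypothetical helper name, and Carp::longmess() is used in place of cluck() so that the trace comes back as a string instead of going to STDERR.

```perl
use strict;
use warnings;
use Carp ();

# Hypothetical helper: run a code reference and return every warning it
# triggers, each one extended with a full stack trace.
sub collect_warnings {
    my $code = shift;
    my @traces;
    {
        # local() limits the handler to this block; the previous
        # $SIG{__WARN__} value is restored when the block is left.
        local $SIG{__WARN__} = sub { push @traces, Carp::longmess($_[0]) };
        $code->();
    }
    return @traces;
}

sub print_value {
    my $var = shift;
    return "My value is $var\n";    # warns when $var is undefined
}

sub bad { return print_value() }    # forgets to pass the value

my @traces = collect_warnings(\&bad);
print $traces[0];
```

Because the handler is installed with local() inside a block, code that runs after collect_warnings() returns is not affected by it.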
As you have noticed, warnings report the line number of the script which
caused the warning. Unfortunately, certain uses of the eval
operator and ``here documents'' are known to throw off Perl's line
numbering, so the line numbers are often incorrect. (See Finding the Line Number the Error/Warning has been Triggered at)
While having warning mode turned On is a must in a development server, you should turn it globally Off in a production server, since if every CGI script generates only one
warning per request, and your server serves millions of requests per day,
your log file will eat up all of your disk space and your system will die.
My production servers have the following directive in the httpd.conf:
PerlWarn Off
While we are talking about control flags, another and more important flag
is -T which turns On Taint mode. Since this is a very broad topic I'll not discuss it here, but if you
aren't forcing all your scripts to run under Taint mode you are looking for trouble from malicious users. To turn it On, add to httpd.conf:
PerlTaintCheck On
META: complete
Also see the clarification of my() vs. use vars - Ken Williams writes:
Yes, there is quite a bit of difference! With use vars(), you are making an entry in the symbol table, and you are telling the compiler that you are going to be referencing that entry without an explicit package name. With my(), NO ENTRY IS PUT IN THE SYMBOL TABLE. The compiler figures out _at_ _compile_time_ which my() variables (i.e. lexical variables) are the same as each other, and once you hit execute time you can not go looking those variables up in the symbol table.
And my() vs. local() - Randal Schwartz writes:
local() creates a temporal-limited package-based scalar, array, hash, or glob -- when the scope of definition is exited at runtime, the previous value (if any) is restored. References to such a variable are *also* global... only the value changes. (Aside: that is what causes variable suicide. :) my() creates a lexically-limited non-package-based scalar, array, or hash -- when the scope of definition is exited at compile-time, the variable ceases to be accessible. Any references to such a variable at runtime turn into unique anonymous variables on each scope exit.
For more information see: Using global variables and sharing them between modules/packages and an article by Mark-Jason Dominus about how Perl handles variables and namespaces, and the difference between use vars() and my(): http://www.plover.com/~mjd/perl/FAQs/Namespaces.html
Before we proceed, let's make the healthy assumption that we want to develop the code under the strict pragma and avoid using global variables, using my() scoped variables wherever possible.
Let's look at this code:
nested.pl
-----------
#!/usr/bin/perl
use strict;
sub print_power_of_2 {
my $x = shift;
sub power_of_2 {
return $x ** 2;
}
my $result = power_of_2();
print "$x^2 = $result\n";
}
print_power_of_2(5);
print_power_of_2(6);
Don't let the weird subroutine names fool you: the print_power_of_2() subroutine should print the square of the number passed to it. Let's run the code and see whether it works:
% ./nested.pl
5^2 = 25
6^2 = 25
Ouch, something is wrong. Maybe there is a bug in Perl and it doesn't work correctly with the number 6? Let's try again using 5 and 7:
print_power_of_2(5);
print_power_of_2(7);
And run it:
% ./nested.pl
5^2 = 25
7^2 = 25
Wow, does it work only for 5? How about using 3 and 5:
print_power_of_2(3);
print_power_of_2(5);
and the result is:
% ./nested.pl
3^2 = 9
5^2 = 9
Now we start to understand: only the first call to the print_power_of_2() function works correctly. It looks as if our code has some kind of memory for the result of the first execution and ignores the arguments of subsequent executions.
Let's follow the guidelines and use the -w flag. Now execute the code:
% ./nested.pl
Variable "$x" will not stay shared at ./nested.pl line 9.
5^2 = 25
6^2 = 25
We have never seen such a warning message before and we don't quite understand what it means. The diagnostics pragma will certainly help us. Let's add this pragma before the strict pragma in our code:
#!/usr/bin/perl -w
use diagnostics;
use strict;
And execute it:
% ./nested.pl
Variable "$x" will not stay shared at ./nested.pl line 10 (#1)
(W) An inner (nested) named subroutine is referencing a lexical
variable defined in an outer subroutine.
When the inner subroutine is called, it will probably see the value of
the outer subroutine's variable as it was before and during the
*first* call to the outer subroutine; in this case, after the first
call to the outer subroutine is complete, the inner and outer
subroutines will no longer share a common value for the variable. In
other words, the variable will no longer be shared.
Furthermore, if the outer subroutine is anonymous and references a
lexical variable outside itself, then the outer and inner subroutines
will never share the given variable.
This problem can usually be solved by making the inner subroutine
anonymous, using the sub {} syntax. When inner anonymous subs that
reference variables in outer subroutines are called or referenced,
they are automatically rebound to the current values of such
variables.
5^2 = 25
6^2 = 25
Well, now everything is clear. We have the inner subroutine power_of_2() and the outer subroutine print_power_of_2() in our code.
When the inner power_of_2() subroutine is called for the first time, it sees the value of the outer print_power_of_2() subroutine's $x variable. On subsequent calls the inner subroutine's $x variable is not updated, no matter what its value is in the outer subroutine. That's why we say that the $x variable is no longer shared.
The diagnostics pragma suggests using an anonymous subroutine, which acts as a closure over $x. Let's rewrite the code to use this technique instead:
anonymous.pl
--------------
#!/usr/bin/perl
use strict;
sub print_power_of_2 {
my $x = shift;
my $func_ref = sub {
return $x ** 2;
};
my $result = &$func_ref();
print "$x^2 = $result\n";
}
print_power_of_2(5);
print_power_of_2(6);
Now $func_ref contains a reference to an anonymous function, which we call when we need to get the power of two. Since the anonymous function is generated afresh every time print_power_of_2() is called, the correct answer will be given. Let's verify:
% ./anonymous.pl
5^2 = 25
6^2 = 36
Indeed, it worked correctly as advertised.
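The same binding behaviour can be seen when an anonymous subroutine is returned to the caller. The following is a minimal sketch; make_power() is a hypothetical helper name that does not appear in the scripts above.

```perl
use strict;
use warnings;

# Hypothetical helper: return an anonymous subroutine that raises its
# argument to the power captured in $exponent.
sub make_power {
    my $exponent = shift;
    return sub {
        my $base = shift;
        return $base ** $exponent;
    };
}

my $square = make_power(2);
my $cube   = make_power(3);

print "6^2 = ", $square->(6), "\n";
print "2^3 = ", $cube->(2),   "\n";
```

Each call to make_power() creates a fresh $exponent, and the anonymous subroutine returned from that call is bound to it, so $square and $cube do not interfere with each other.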
First you might wonder why in the world anyone would need to define an inner subroutine. Suppose that, to reduce the startup overhead of Perl scripts, you decide to write a daemon that compiles the scripts and modules only once and keeps the pre-compiled code cached in memory. When a script needs to be executed, you just tell the daemon the name of the script to run and it does the rest.
Seems like an easy task, and it is. The only problem is: once the script is compiled, how do you execute it? Or, to put it another way: after it has been executed for the first time and stays compiled in the daemon's memory, how do you call it again? If you could require developers to code their scripts so that each has a subroutine called run() that actually executes the code in the script, you would have half of the problem solved.
But how does the daemon know how to refer to a specific script if they all run in the main:: name space? The obvious solution is to ask the developers to declare a package in each and every script, with the package name derived from the script name. Moreover, since there may be more than one script with the same name residing in different directories, the directory has to be part of the package name in order to prevent namespace collisions. And don't forget that a script can be moved from directory to directory, so you would have to make sure that the package name is corrected every time the script is moved.
But why impose these strange rules on developers when we can arrange for our daemon to do this work? Every script that the daemon is about to execute for the first time should be wrapped inside a package whose name is constructed from the mangled path to the script, and inside a subroutine called run(). For example, if the daemon is about to execute the script /tmp/hello.pl:
hello.pl
--------
#!/usr/bin/perl
print "Hello\n";
Prior to running it, the daemon will change the code to be:
wrapped_hello.pl
----------------
package cache::tmp::hello_2epl;
sub run{
#!/usr/bin/perl
print "Hello\n";
}
The package name is constructed from the prefix cache::, with each directory separator slash replaced by :: and each non-alphanumeric character encoded, so that . becomes _2e.
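The naming rule can be sketched in a few lines. This is an illustration only: path_to_package() is a hypothetical helper name, and the exact set of characters that gets encoded is an assumption made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: derive a package name from the path of a script.
sub path_to_package {
    my $path = shift;
    $path =~ s{^/}{};                                          # drop the leading slash
    $path =~ s{([^A-Za-z0-9_/])}{sprintf("_%02x", ord($1))}eg; # "." => "_2e"
    $path =~ s{/}{::}g;                                        # "/" => "::"
    return "cache::$path";
}

print path_to_package("/tmp/hello.pl"), "\n";
```

For /tmp/hello.pl this prints cache::tmp::hello_2epl, the package name used in the wrapped script above.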
Now when the daemon is requested to execute the script /tmp/hello.pl, all it has to do is build the package name from the location of the script as before and call its run() subroutine:
use cache::tmp::hello_2epl;
cache::tmp::hello_2epl::run();
We have just written a partial prototype of the daemon we wanted. The only detail left undefined is how to pass the path of the script to the daemon. This is left to the reader as an exercise.
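The wrapping and compilation step might look like the sketch below. It is one possible approach, not the daemon's definitive implementation: compile_script() is a hypothetical helper name, and the script source is passed in as a string instead of being read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap the source of a script in a package and a
# run() subroutine, compile it once and return a reference to run().
sub compile_script {
    my ($package, $source) = @_;
    my $wrapped = "package $package;\nsub run {\n$source\n}\n1;\n";
    eval $wrapped or die "Cannot compile $package: $@";
    return $package->can('run');
}

my $run = compile_script('cache::tmp::hello_2epl', 'print "Hello\n";');
$run->() for 1 .. 2;    # compiled once, executed twice
```

The string eval compiles the wrapped code into the running interpreter, so later requests for the same script only need to call the stored run() reference.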
If you are familiar with the Apache::Registry module, you know that it works in almost the same way. It uses a different package prefix and the generic function is called handler() and not run(). The script to run is identified by the URI of the HTTP request.
Now you understand that there are cases where your normal subroutines can become inner ones. Suppose your script was as simple as:
simple.pl
---------
#!/usr/bin/perl
sub hello { print "Hello" }
hello();
Wrapped into a run() subroutine it becomes:
simple.pl
---------
package cache::simple_2epl;
sub run{
#!/usr/bin/perl
sub hello { print "Hello" }
hello();
}
Therefore, hello() is an inner subroutine, and if it uses my() scoped variables that are defined and altered outside of it, it won't work correctly from the second call onwards, as was explained in the previous section.
First of all, there is nothing to worry about: if you do happen to have ``the my() scoped variable in the inner subroutine'' problem, Perl will always alert you, as long as you don't forget to turn warnings On.
Suppose you have a script that has this problem. What are the ways to solve it? There are many, and we will discuss some of them here.
We will use the following code to show the different solutions.
multirun.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
print "run: [time $_]\n";
run();
}
sub run {
my $counter = 0;
increment_counter();
increment_counter();
sub increment_counter{
$counter++;
print "Counter is equal to $counter !\n";
}
} # end of sub run
This code executes the run() subroutine three times. Each time, run() initializes the $counter variable to 0 and then twice calls the inner subroutine increment_counter(), which prints $counter's value after incrementing it. One might expect to see the following output:
run: [time 1]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 2]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 3]
Counter is equal to 1 !
Counter is equal to 2 !
But as we have already learned from the previous sections, this is not what we are going to see. Indeed, when we run the script we see:
% ./multirun.pl
Variable "$counter" will not stay shared at ./multirun.pl line 18.
run: [time 1]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 2]
Counter is equal to 3 !
Counter is equal to 4 !
run: [time 3]
Counter is equal to 5 !
Counter is equal to 6 !
Obviously, the $counter variable seen by increment_counter() is not reinitialized on each execution of run(): it keeps its value from the previous execution, and the subroutine increments it to the next value.
One of the workarounds is to use globally declared variables, with the
vars pragma.
multirun1.pl
-----------
#!/usr/bin/perl -w
use strict;
use vars qw($counter);
for (1..3){
print "run: [time $_]\n";
run();
}
sub run {
$counter = 0;
increment_counter();
increment_counter();
sub increment_counter{
$counter++;
print "Counter is equal to $counter !\n";
}
} # end of sub run
If you run this and the other solutions offered below, the expected output will be generated:
% ./multirun1.pl
run: [time 1]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 2]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 3]
Counter is equal to 1 !
Counter is equal to 2 !
By the way, the warning we saw before has gone, and so has the problem, since there is no my() (lexically defined) variable used in the nested subroutine.
Another approach is to use fully qualified variables. This is a better one, since less memory will be used, but it adds a typing overhead:
multirun2.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
print "run: [time $_]\n";
run();
}
sub run {
$main::counter = 0;
increment_counter();
increment_counter();
sub increment_counter{
$main::counter++;
print "Counter is equal to $main::counter !\n";
}
} # end of sub run
You can also pass the variable to the subroutine by value and make the subroutine return it after it has been updated. This adds time and memory overheads, so it's not a good idea if the variable can be very large.
Don't rely on the fact that the variable is small during the development of the application; it can grow quite big in situations you didn't expect. For example, a very simple HTML form text entry field can return a few megabytes of data if one of your users is bored and wants to test how good your code is. It's not uncommon to see users copy and paste core dump files 10Mb in size into a form's text fields and submit them for your script to process.
multirun3.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
print "run: [time $_]\n";
run();
}
sub run {
my $counter = 0;
$counter = increment_counter($counter);
$counter = increment_counter($counter);
sub increment_counter{
my $counter = shift || 0 ;
$counter++;
print "Counter is equal to $counter !\n";
return $counter;
}
} # end of sub run
Finally, you can use references to do the job.
increment_counter() accepts a reference to a $counter variable and increments its value by first dereferencing it. The $counter variable outside gets affected by this change as well.
multirun4.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
print "run: [time $_]\n";
run();
}
sub run {
my $counter = 0;
increment_counter(\$counter);
increment_counter(\$counter);
sub increment_counter{
my $r_counter = shift;
$$r_counter++;
print "Counter is equal to $$r_counter !\n";
}
} # end of sub run
Here is yet another, even more obscure, way of using references. We modify the value of $counter inside the subroutine by using the fact that the elements of @_ are actually aliases to the passed variables, so if you directly modify one of the members of the array, the actual value of the passed variable gets changed.
multirun5.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
print "run: [time $_]\n";
run();
}
sub run {
my $counter = 0;
increment_counter($counter);
increment_counter($counter);
sub increment_counter{
$_[0]++;
print "Counter is equal to $_[0] !\n";
}
} # end of sub run
Now you have at least five workarounds to choose from.
For more information please refer to perlref and perlsub manpages.
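The anonymous subroutine technique shown earlier for print_power_of_2() applies to the counter example as well. This is a sketch of that variant and not one of the numbered multirun scripts; run() also returns the final counter value here, purely so that the result can be checked.

```perl
use strict;
use warnings;

sub run {
    my $counter = 0;

    # The anonymous subroutine is bound to the current $counter on
    # every call to run(), so no value survives from a previous run.
    my $increment_counter = sub {
        $counter++;
        print "Counter is equal to $counter !\n";
        return $counter;
    };

    $increment_counter->();
    return $increment_counter->();
}

for (1 .. 3) {
    print "run: [time $_]\n";
    run();
}
```

No "will not stay shared" warning is generated, because no named subroutine is nested inside run().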
@INC is a special Perl variable which is the equivalent of the shell's PATH variable. While PATH contains a list of directories in which executables are looked up, @INC contains a list of directories from which Perl modules and libraries can be loaded.
When you use(), require() or do() a filename or a module, Perl gets a list of directories from the @INC variable and searches them for the file it was requested to load. If the file that you want to load is not located in one of the listed directories, you have to tell Perl where to find the file by providing a path relative to one of the directories in @INC or a full path to the file.
%INC is another special Perl variable, used to cache the names of the files and the modules that were successfully loaded and compiled by the use(), require() or do() functions.
Before attempting to load a file or a module with use() or require(), Perl checks whether it's already in the %INC hash. If it's there, the loading and therefore the compilation are not performed at all. Otherwise the file is loaded into memory and an attempt is made to compile it.
If the file is successfully loaded and compiled, a new key-value pair is added to %INC. The key is the name of the file or module as it was passed to one of the three functions we have just mentioned. The value is the full path to the file in the file system if it was found in any of the @INC directories except ".", in which case the path is relative.
The following examples will make the logic described above easier to understand.
First, let's see the contents of @INC on my system:
% perl -e 'print join "\n", @INC'
/usr/lib/perl5/5.00503/i386-linux
/usr/lib/perl5/5.00503
/usr/lib/perl5/site_perl/5.005/i386-linux
/usr/lib/perl5/site_perl/5.005
.
Notice the . (current directory) as the last directory in the list.
Now let's load a module strict.pm and see the contents of %INC:
% perl -e 'use strict; print map {"$_ => $INC{$_}\n"} keys %INC'
strict.pm => /usr/lib/perl5/5.00503/strict.pm
Since strict.pm was found in /usr/lib/perl5/5.00503/ directory and /usr/lib/perl5/5.00503/ is a part of @INC--%INC includes a full path as a value for the key strict.pm.
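The lookup can also be done from inside a program. The sketch below assumes a hypothetical helper called loaded_from(); File::Spec is loaded only to have a second module to look up.

```perl
use strict;
use warnings;
use File::Spec ();

# Hypothetical helper: translate a module name into its %INC key and
# return the path it was loaded from, or undef if it is not loaded.
sub loaded_from {
    my $module = shift;
    (my $key = "$module.pm") =~ s{::}{/}g;    # My::Module => My/Module.pm
    return $INC{$key};
}

for my $module (qw(strict File::Spec No::Such::Module)) {
    my $path = loaded_from($module);
    print "$module => ", (defined $path ? $path : "not loaded"), "\n";
}
```

The paths printed for strict and File::Spec depend on the local Perl installation.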
Now let's create the simplest module in /tmp/test.pm:
test.pm
-------
1;
It does nothing, but returns a true value when loaded. Now let's load it in different ways:
% cd /tmp
% perl -e 'use test; print map {"$_ => $INC{$_}\n"} keys %INC'
test.pm => test.pm
Since the file was found relative to . (current directory) the relative path is inserted as a value, but if we
alter the @INC, by adding the /tmp to the end:
% cd /tmp
% perl -e 'BEGIN{push @INC, "/tmp"} use test; \
print map {"$_ => $INC{$_}\n"} keys %INC'
test.pm => test.pm
we still get the relative path, since the module was found first relative to ".", because /tmp came after . in the list. But if we execute the same code from a different directory, so that the "." directory doesn't match:
% cd /
% perl -e 'BEGIN{push @INC, "/tmp"} use test; \
print map {"$_ => $INC{$_}\n"} keys %INC'
test.pm => /tmp/test.pm
we get the full path. We can also prepend the path with unshift(), so it will be used for matching before "." and therefore we get a full path as well.
% cd /tmp
% perl -e 'BEGIN{unshift @INC, "/tmp"} use test; \
print map {"$_ => $INC{$_}\n"} keys %INC'
test.pm => /tmp/test.pm
BEGIN{unshift @INC, "/tmp"}
can be replaced with the more elegant:
use lib "/tmp";
which does essentially the same thing as the BEGIN block above.
These approaches to modifying @INC can be labour intensive, since if you want to move the script around in the filesystem you have to modify the path. This can be painful, for example, when you move your scripts from a development to a production server.
There is a module called FindBin which solves this problem in the plain Perl world, but unfortunately it doesn't work correctly under mod_perl.
If you use this module, you don't need to write a hardcoded path. The following snippet does all the work for you (the file is /tmp/load.pl):
load.pl
-------
#!/usr/bin/perl
use FindBin ();
use lib "$FindBin::Bin";
use test;
print "test.pm => $INC{'test.pm'}\n";
In the above example $FindBin::Bin is equal to /tmp. If we move the script somewhere else, e.g. to /tmp/x, then $FindBin::Bin will be equal to /tmp/x.
% /tmp/load.pl
test.pm => /tmp/test.pm
This is just like use lib, but with no hardcoded path required.
As I've mentioned earlier, FindBin will not work in a mod_perl environment, since it's a module and like any module it's loaded only once. So the first script using it will have all the settings correct, but the rest of the scripts will not, if they are located in a different directory from the first one.
Before we proceed, let's define what we mean by library and module.
Library: a file which contains Perl subroutines and other code.
It generally doesn't include a package declaration.
Its last statement returns true.
It can be named in any desired way, but generally it has a .pl or .ph extension.
Examples:
config.pl
----------
$dir = "/home/httpd/cgi-bin";
$cgi = "/cgi-bin";
1;
mysubs.pl
----------
sub print_header{
print "Content-type: text/plain\r\n\r\n";
}
1;
Module: a file which contains Perl subroutines and other code.
It generally declares a package name at the beginning.
Its last statement returns true.
The naming convention requires it to have a .pm extension.
Example:
My/Module.pm
------------
package My::Module;
$My::Module::VERSION = 0.01;
sub new{ return bless {}, shift;}
END { print "Quitting\n"}
1;
require() reads a file containing Perl code and compiles it. Before attempting to load the file it looks up its argument in %INC to see whether it has already been loaded. If it has, require() just returns without doing a thing. Otherwise an attempt is made to load and compile the file.
require() has to find the file it has to load. If the argument is a full path to the file, it just tries to read it. For example:
require "/home/httpd/perl/mylibs.pl";
If the path is relative, require() will search for the file in all the directories listed in @INC. For example:
require "mylibs.pl";
If there is more than one occurrence of a file with the same name in the directories listed in @INC, the first occurrence will be used.
The file must return TRUE as the last statement to indicate successful execution of any initialization code. Since you never know what changes the file will go through in the future, you cannot be sure that the last statement will always return TRUE. That's why the suggestion is to put ``1;'' at the end of the file.
You should use the real filename for most files, but if the file is a module, you may use the following convention instead:
require My::Module;
This is equal to:
require "My/Module.pm";
If require() fails to load the file, either because it couldn't find the file in question, or because the code failed to compile or didn't return TRUE at the end, the program will die(). To prevent this, the require() statement can be enclosed in an eval() block, as in this example:
require.pl
----------
#!/usr/bin/perl -w
eval { require "/file/that/does/not/exists"};
if ($@) {
print "Failed to load, because : $@"
}
print "\nHello\n";
When we execute the program:
% ./require.pl
Failed to load, because : Can't locate /file/that/does/not/exists in @INC (@INC contains: /usr/lib/perl5/5.00503/i386-linux /usr/lib/perl5/5.00503 /usr/lib/perl5/site_perl/5.005/i386-linux /usr/lib/perl5/site_perl/5.005 .) at require.pl line 3.
Hello
We see that the program didn't die(), because Hello was printed. This trick is useful when you want to check whether a user has some module installed. If she hasn't, it's not critical: maybe the program can run without this module with a reduced set of functionality.
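A check for an optional module could be sketched like this. have_module() is a hypothetical helper name, and File::Spec simply stands in for any module that is expected to be installed.

```perl
use strict;
use warnings;

# Hypothetical helper: try to load a module and report success instead
# of letting a failed require() kill the program.
sub have_module {
    my $module = shift;
    (my $file = "$module.pm") =~ s{::}{/}g;    # My::Module => My/Module.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(File::Spec No::Such::Module)) {
    print "$module is ", (have_module($module) ? "" : "not "), "available\n";
}
```

The module name is converted to a filename first because require() treats a string argument as a path and not as a module name.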
If we remove the eval() part and try again:
require1.pl
-----------
#!/usr/bin/perl -w
require "/file/that/does/not/exists";
print "\nHello\n";
% ./require1.pl
Can't locate /file/that/does/not/exists in @INC (@INC contains: /usr/lib/perl5/5.00503/i386-linux /usr/lib/perl5/5.00503 /usr/lib/perl5/site_perl/5.005/i386-linux /usr/lib/perl5/site_perl/5.005 .) at require1.pl line 3.
The program just die()s in the last example, which is what you want in most cases.
For more information refer to the perlfunc manpage.
use(), just like require(), loads and compiles files containing Perl code, but it works with modules only. The only way to pass a module to load is by its module name and not its filename. If the module is located in MyCode.pm, the correct way to use() it is:
use MyCode;
and not:
use "MyCode.pm";
use() translates the passed argument into a file name by replacing :: with / and appending .pm at the end. So My::Module becomes My/Module.pm.
use() is exactly equivalent to:
BEGIN { require Module; import Module LIST; }
Internally it calls require() to do the loading and compilation chores. When require() finishes its job, import() is called, unless () is the second argument. The following pairs are equivalent:
use MyModule;
BEGIN {require MyModule; import MyModule; }
use MyModule qw(foo bar);
BEGIN {require MyModule; import MyModule ("foo","bar"); }
use MyModule ();
BEGIN {require MyModule; }
When no parameters are passed to import(), it imports the default symbols, if any were defined inside the module. import() is not a builtin function: it's just an ordinary static method call into the ``MyModule'' package to tell the module to import the list of features back into the current package. See the Exporter manpage for more information.
There's a corresponding ``no'' command that unimports symbols imported by use(), i.e., it calls unimport Module LIST instead of import().
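The three forms of use() can be observed with a module whose import() only records its arguments. My::Demo is a hypothetical package defined inline for this sketch; the %INC assignment is needed only because the package has no file of its own.

```perl
use strict;
use warnings;

package My::Demo;    # hypothetical module, defined inline for the demo

our @calls;          # records the argument list of every import() call

sub import {
    my ($class, @args) = @_;
    push @calls, [@args];
}

package main;

# Tell require() that My/Demo.pm is already loaded, since the package
# lives in this file and not in a file of its own.
BEGIN { $INC{'My/Demo.pm'} = __FILE__ }

use My::Demo qw(foo bar);    # require, then My::Demo->import("foo", "bar")
use My::Demo ();             # require only, import() is not called
use My::Demo;                # require, then My::Demo->import()

print scalar(@My::Demo::calls), " import() calls\n";
print "first call got: @{ $My::Demo::calls[0] }\n";
```

Only two calls are recorded, because the form with the empty list skips import() altogether.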
While do() behaves almost identically to require(), it reloads the file unconditionally. It doesn't check %INC to see whether the file was already loaded.
If do() cannot read the file, it returns undef and sets $! to report the error. If do() can read the file but cannot
compile it, it returns undef and sets an error message in $@. If the file is successfully compiled, do() returns the value
of the last expression evaluated.
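The return values described above can be checked with a small wrapper. This is a sketch under stated assumptions: load_file() is a hypothetical helper name, the file is created with File::Temp only so that the example is self-contained, and a file whose last expression is genuinely undef would be reported here as unreadable.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: do() a file and return (value, error message).
sub load_file {
    my $file  = shift;
    my $value = do $file;
    return ($value, undef) if defined $value;
    return (undef, "cannot compile $file: $@") if $@;
    return (undef, "cannot read $file: $!");
}

# Create a small file whose last expression evaluates to 42.
my ($fh, $filename) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} "my \$answer = 6 * 7;\n\$answer;\n";
close $fh or die "Cannot close $filename: $!";

my ($value, $error) = load_file($filename);
print "value: $value\n";

($value, $error) = load_file('/no/such/file.pl');
print "error: $error\n";
```

Calling load_file() twice on the same file executes the file twice, since do() does not consult %INC before loading.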
When you first wrote $x in your code you created a global variable. It is visible everywhere in the file where you have used it, or, if you defined it inside a package, everywhere inside that package. But this works only if you do not use the strict pragma, and you HAVE to use this pragma if you want to run your scripts under mod_perl. Read The strict pragma to find out why.
First you use:
use strict;
Then you use:
use vars qw($scalar %hash @array);
From this moment on, the variables are global in the package where you declared them. If you want to share global variables between packages, here is what you can do.
Assume that you want to share the CGI.pm object (I will use $q) between your modules. For example, you create it in script.pl, but want it to be visible in My::HTML. First, you make $q global.
script.pl:
----------------
use vars qw($q);
use CGI;
use lib qw(.);
use My::HTML qw($q); # My/HTML.pm in the same dir as script.pl
$q = new CGI;
My::HTML::printmyheader();
----------------
Note that we have imported $q from My::HTML. And here is My::HTML, which exports $q:
My/HTML.pm
----------------
package My::HTML;
use strict;
BEGIN {
use Exporter ();
@My::HTML::ISA = qw(Exporter);
@My::HTML::EXPORT = qw();
@My::HTML::EXPORT_OK = qw($q);
}
use vars qw($q);
sub printmyheader{
# Whatever you want to do with $q... e.g.
print $q->header();
}
1;
-------------------
So $q is shared between the My::HTML package and script.pl. It will work vice versa as well, if you create the object in My::HTML but use it in script.pl. You have true sharing, since if you change $q in script.pl, it will be changed in My::HTML as well.
What if you need to share $q between more than two packages? For example you want My::Doc to share $q as well.
You leave My::HTML untouched, and modify script.pl to include:
use My::Doc qw($q);
Then you write My::Doc exactly like My::HTML, though of course the content is different :).
One possible pitfall is when you want to use My::Doc in both My::HTML and script.pl. Only if you add:
use My::Doc qw($q);
to My::HTML will $q be shared. Otherwise My::Doc will not share $q any more. To make things clear here is the code:
script.pl:
----------------
use vars qw($q);
use CGI;
use lib qw(.);
use My::HTML qw($q); # My/HTML.pm in the same dir as script.pl
use My::Doc qw($q); # Ditto
$q = new CGI;
My::HTML::printmyheader();
----------------
My/HTML.pm
----------------
package My::HTML;
use strict;
BEGIN {
use Exporter ();
@My::HTML::ISA = qw(Exporter);
@My::HTML::EXPORT = qw();
@My::HTML::EXPORT_OK = qw($q);
}
use vars qw($q);
use My::Doc qw($q);
sub printmyheader{
# Whatever you want to do with $q... e.g.
print $q->header();
My::Doc::printtitle('Guide');
}
1;
-------------------
My/Doc.pm
----------------
package My::Doc;
use strict;
BEGIN {
use Exporter ();
@My::Doc::ISA = qw(Exporter);
@My::Doc::EXPORT = qw();
@My::Doc::EXPORT_OK = qw($q);
}
use vars qw($q);
sub printtitle{
my $title = shift || 'None';
print $q->h1($title);
}
1;
-------------------
As the title says, you can import a variable into a script or module without
using Exporter.pm. I have found it useful to keep all the configuration
variables in one module, My::Config. But then I have to export all the variables in order to use them in other
modules, which is bad for two reasons: it pollutes other packages'
namespaces with extra symbols, which increases memory requirements, and it adds
the overhead of keeping track of which variables should be exported from the
configuration module and which imported by each particular package. I solve
this problem by keeping all the variables in one hash, %c, and exporting only that. Here is an example of My::Config:
package My::Config;
use strict;
use vars qw(%c);
%c = (
# All the configs go here
scalar_var => 5,
array_var => [
'foo',
'bar',
],
hash_var => {
foo => 'Foo',
bar => 'BARRR',
},
);
1;
Now in packages that want to use the configuration variables I have to either
use fully qualified names like $My::Config::test, which I dislike, or import them as described in the previous section. But
hey, since we have only one variable to handle, we can make things even
simpler and save loading the Exporter.pm package. We will use Perl's aliasing feature to export the hash and save
keystrokes:
package My::HTML;
use strict;
use lib qw(.);
# Global Configuration now aliased to global %c
use My::Config (); # My/Config.pm in the same dir as script.pl
use vars qw(%c);
*c = \%My::Config::c;
# Now you can access the variables from the My::Config
print $c{scalar_var};
print $c{array_var}[0];
print $c{hash_var}{foo};
Of course %c is global everywhere you use it as described
above, and if you change it somewhere, the change will affect every other package
to which you have aliased %My::Config::c.
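The effect of the alias can be checked with a small single-file sketch. It keeps both packages in one file purely for illustration, and the value 6 is an arbitrary assumption; in real code My::Config would live in its own file as shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Config;
use vars qw(%c);
%c = ( scalar_var => 5 );

package My::HTML;
use vars qw(%c);
# Alias %My::HTML::c to %My::Config::c: two names, one hash.
*c = \%My::Config::c;

package main;

# Change the value through the alias...
$My::HTML::c{scalar_var} = 6;

# ...and read it back through the original name.
print "My::Config sees: $My::Config::c{scalar_var}\n";
```

No data is copied here; the glob assignment only makes the second name point at the existing hash.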
Note that aliases work with either global or local() variables. You cannot write:
my *c = \%My::Config::c;
which is an error. But you can write:
local *c = \%My::Config::c;
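A minimal sketch of the local() form follows, assuming a hypothetical helper named inside() and made-up hash contents. It suggests that the alias lasts only until the enclosing block (here, the subroutine) is left, after which the previous %c is visible again.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use vars qw(%c);

# Made-up contents, standing in for a real My::Config.
%My::Config::c = ( scalar_var => 5 );
%c             = ( scalar_var => 'original' );

# Hypothetical helper: aliases %c only for the duration of the call.
sub inside {
    local *c = \%My::Config::c;
    return $c{scalar_var};
}

my $during = inside();          # read through the temporary alias
my $after  = $c{scalar_var};    # the alias is gone again

print "during: $during, after: $after\n";
```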
Special Perl variables like $| (buffering), $^T (time), $^W
(warnings), $/ (input record separator), $\ (output record separator) and many more are all global variables. This
means that you cannot scope them with my(); only
local() is permitted to do that. Since the child server
doesn't usually exit, if one of your scripts modifies a global variable,
it will stay changed for the rest of the process's life and will affect all
the scripts executed by the same process.
We will demonstrate the case with the input record separator variable. If you undefine this variable, the diamond operator will suck in the whole file at once, if you have enough memory. With that in mind, you should never write code like the example below.
$/ = undef;
open IN, "file" ....
# slurp it all into a variable
$all_the_file = <IN>;
The proper way is to put the local() keyword before the special
variable when you change it, like this:
local $/ = undef;
open IN, "file" ....
# slurp it all inside a variable
$all_the_file = <IN>;
But there is a catch. local() propagates the changed value
to all the code below it, including any subroutines called from there. The modified value stays in effect until the
enclosing block ends (at file level, until the script terminates), unless it is changed again somewhere else in the script.
A cleaner approach is to enclose the whole of the code that is affected by the modified variable in a block, like this:
{
local $/ = undef;
open IN, "file" ....
# slurp it all inside a variable
$all_the_file = <IN>;
}
That way when Perl leaves the block it restores the original value of the $/ variable, and you don't need to worry about its value anywhere else in your
program.
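One way to package the block technique is a small helper subroutine, sketched below. The name slurp_file() and the temporary file are assumptions made for the sake of a runnable example, and the sketch uses a lexical filehandle instead of the bareword IN used above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: returns the whole file as one string.
sub slurp_file {
    my $path = shift;
    open my $in, '<', $path or die "Cannot open $path: $!";
    # $/ is undefined only until this subroutine returns.
    local $/ = undef;
    my $content = <$in>;
    close $in;
    return $content;
}

# Create a throwaway file to read back.
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "line one\nline two\n";
close $fh;

my $all_the_file = slurp_file($path);
print length($all_the_file), " characters read\n";
print "separator restored\n" if defined $/ && $/ eq "\n";
```

The subroutine body plays the role of the bare block: the original value of $/ comes back as soon as the subroutine returns.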
When using a regular expression that contains an interpolated Perl
variable, if it is known that the variable (or variables) will not vary
during the execution of the program, a standard optimization technique
consists of adding the /o modifier to the regexp pattern. This directs the compiler to build the
internal table once, for the entire lifetime of the script, rather than
every time the pattern is executed. Consider:
my $pat = '^foo$'; # likely to be input from an HTML form field
foreach( @list ) {
print if /$pat/o;
}
This is usually a big win in loops over lists, or when using grep()
or map() operators.
In long-lived mod_perl scripts, however, this can pose a problem if the variable changes according to the invocation. The first invocation of a fresh httpd child will compile the regex and perform the search correctly. However, all subsequent uses by the httpd child will continue to match the original pattern, regardless of the current contents of the Perl variables the pattern is dependent on. Your script will appear broken.
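The effect can be reproduced outside mod_perl by calling the same code twice within one process. In this sketch the subroutine count_matches() and the word list are hypothetical; the second call stands in for a later request served by the same httpd child.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical routine: counts list items matching $pat, using /o.
sub count_matches {
    my ($pat, @list) = @_;
    my $n = 0;
    foreach (@list) {
        $n++ if /$pat/o;    # compiled once, on the first execution
    }
    return $n;
}

my @words = qw(foo foo bar);

my $first  = count_matches('^foo$', @words);  # pattern compiled here
my $second = count_matches('^bar$', @words);  # still uses ^foo$

print "first: $first, second: $second\n";
```

The second call is asked to count "bar" but reports the count for "foo", since the pattern compiled on the first call is reused.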
There are two solutions to this problem:
The first is to use eval q//, to force the code to be evaluated each time. Just make sure that the eval
block covers the entire processing loop, and not just the pattern match
itself.
The above code fragment would be rewritten as:
my $pat = '^foo$';
eval q{
foreach( @list ) {
print if /$pat/o;
}
};
Just saying:
foreach( @list ) {
eval q{ print if /$pat/o; };
}
is going to be a horribly expensive proposition.
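As a rough check of the first solution, the sketch below wraps the whole loop in eval q{} inside a hypothetical subroutine, count_matches_eval(), and calls it twice with different patterns. The word list is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical routine: the loop is recompiled on every call,
# so /o applies only within a single call.
sub count_matches_eval {
    my ($pat, @list) = @_;
    my $n = 0;
    eval q{
        foreach (@list) {
            $n++ if /$pat/o;
        }
    };
    die $@ if $@;
    return $n;
}

my @words = qw(foo foo bar);

my $foo_count = count_matches_eval('^foo$', @words);
my $bar_count = count_matches_eval('^bar$', @words);

print "foo: $foo_count, bar: $bar_count\n";
```

Each call pays for one compilation of the eval string, and the pattern is still built only once per loop.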
You can use this approach if you require more than one pattern match
operator in a given section of code. If the section contains only one
operator (be it an m// or s///), you can rely on the property of the null pattern, that reuses the last
pattern seen. This leads to the second solution, which also eliminates the
use of eval.
The above code fragment becomes:
my $pat = '^foo$';
"something" =~ /$pat/; # dummy match (MUST NOT FAIL!)
foreach( @list ) {
print if //;
}
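Here is a sketch of the same idea wrapped in a subroutine. The name count_matches_null() and its second argument, a string the caller knows will match the pattern, are assumptions made for illustration. The routine dies if the dummy match fails, because the empty pattern would then not refer to the intended regex.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical routine: $known_match must be a string that $pat matches.
sub count_matches_null {
    my ($pat, $known_match, @list) = @_;

    # Dummy match: makes $pat the last successfully matched pattern.
    $known_match =~ /$pat/
        or die "dummy match failed for pattern '$pat'";

    my $n = 0;
    foreach (@list) {
        $n++ if //;    # empty pattern reuses the last successful one
    }
    return $n;
}

my @words = qw(foo foo bar);

my $foo_count = count_matches_null('^foo$', 'foo', @words);
my $bar_count = count_matches_null('^bar$', 'bar', @words);

print "foo: $foo_count, bar: $bar_count\n";
```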
The only gotcha is that the dummy match that boots the regular expression
engine must absolutely, positively succeed, otherwise the pattern will not
be cached, and the // will match everything. If you can't count on fixed text to ensure the match
succeeds, you have two possibilities.
If you can guarantee that the pattern variable contains no meta-characters (things like *, +, ^, $...), you can use the dummy match:
"$pat" =~ /\Q$pat\E/; # guaranteed if no meta-characters present
If there is a possibility that the pattern can contain meta-characters, you should search for the pattern or the unsearchable \377 character as follows:
"\377" =~ /$pat|^[\377]$/; # guaranteed if meta-characters present
Another approach:
It depends on the complexity of the regexp to which you apply this technique. One common usage where a compiled regexp is usually more efficient is to "match any one of a group of patterns" over and over again.
A helper routine makes this easier to remember. Here is one, slightly modified from Jeffrey Friedl's example in his book "Mastering Regular Expressions".
#####################################################
# Build_MatchMany_Function
# -- Input: list of patterns
# -- Output: A code ref which matches its $_[0]
# against ANY of the patterns given in the
# "Input", efficiently.
#
sub Build_MatchMany_Function {
my @R = @_;
my $expr = join '||', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0..$#R );
my $matchsub = eval "sub { $expr }";
die "Failed in building regex @R: $@" if $@;
$matchsub;
}
Example usage:
@some_browsers = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);
$Known_Browser=Build_MatchMany_Function(@some_browsers);
while (<ACCESS_LOG>) {
# ...
$browser = get_browser_field($_);
if ( ! &$Known_Browser($browser) ) {
print STDERR "Unknown Browser: $browser\n";
}
# ...
}
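The usage above depends on an open ACCESS_LOG handle and a get_browser_field() routine that are not shown. For a quick try without them, the following sketch repeats the helper and feeds it a hard-coded list of user agent strings. That list is invented purely for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same helper as above, repeated so that the sketch is self-contained.
sub Build_MatchMany_Function {
    my @R = @_;
    my $expr = join '||', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0..$#R );
    my $matchsub = eval "sub { $expr }";
    die "Failed in building regex @R: $@" if $@;
    $matchsub;
}

my $Known_Browser = Build_MatchMany_Function(qw(Mozilla Lynx MSIE));

# Invented sample data standing in for fields read from a log.
my @agents = ('Mozilla/4.0 (compatible)', 'Lynx/2.8', 'SomeRobot/1.0');

# Keep only the agents that match none of the patterns.
my @unknown = grep { !$Known_Browser->($_) } @agents;

print STDERR "Unknown Browser: $_\n" for @unknown;
```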
To find out which functions Perl has, execute:
perldoc perlfunc
To learn the syntax of a specific known function and to find an example of its use,
execute (e.g. for open()):
perldoc -f open
There is a bug in this option: it doesn't call pod2man, so it displays the section as raw POD. But it's still readable and very useful.
To search the Perl FAQ (perlfaq) sections, execute (e.g. for the
open keyword):
perldoc -q open
This returns all the matching Q&A sections, still in POD.
Written by Stas Bekman.
Last Modified at 12/18/1999
Use of the Camel for Perl is a trademark of O'Reilly & Associates, and is used by permission.