This is a CHANGES file for mod_perl Guide
12.19.99 ver 1.19
* all.html has gone (all htmls in one) -- it became more than 1Mb, too
big - use the PS version instead
* reorg: moved the "perl reference" chapter to be one of the first
ones, because it should be read first. Moved the strategies and
implementations toward the middle.
* snippets: started "Code Unloading" as hinted by Doug.
* porting: updated "Output from system calls" (Doug)
* porting: fixed the "\n\n" vs. "\r\n\r\n"(Philip Newton)
* debug: added "Debugging when Server Crashes on Startup before
Writing to Log File" (Cliff Rayman)
* snippets: added "Redirecting While Maintaining Environment
Variables" (Vivek Khera)
* troubleshooting: added "libexec/libperl.so: open failed: No such
file or directory" (Christophe Dupre)
* performance: added "Upload/Download of Big Files" (Ken Williams)
* install: added a reference to "Static debian package" (by David
Huggins-Daines)
* troubleshooting: added Windows: "Apache::DBI and
PERL_STARTUP_DONE_CHECK" (Gerald Richter, Randy Kobes)
* performance: added KeepAlive notes (Craig, Pascal Eeftinck)
* performance: added HTML::Mason notes (Pascal Eeftinck)
* porting: added new "die() and mod_perl"
* porting/perl: moved most of the perl specific reference material
into perl.pod removing duplications of this material on the way and
replacing it with pointers to perl.pod
* performance: rewritten "Object Methods Calls Versus Function Calls"
* porting: FindBin is not mod_perl compatible (Andrei A. Voropaev,
Joao Fonseca)
* scenario: denoted the ProxyReceiveBufferSize limit by SO_RCVBUF in
kernel (Vivek Khera) and kern.ipc.maxsockbuf=2621440 on BSD (Oleg
Bartunov)
* snippets: added "mysql backup and restore scripts"
* snippets: added "Subclassing Apache::Request example"
* snippets: added "CGI::params in the mod_perl-ish way"
* debug: added "Using print() and Data::Dumper for Debugging"
* snippets: started a "Sending email from mod_perl" topic
* control: Preparing for Machine Reboot
* download: added more load balancing URLs
* performance: added "Tuning with httperf"
* intro: added "High-Profile Sites Running mod_perl" (Rex Staples)
* review: Ged W. Haywood was very kind to review and correct the
following chapters: start, intro, strategy, porting (!), databases,
dbm, security.
* install.pod: perl Makefile.PL troubleshooting - added "A test
compilation with your Makefile configuration failed..." and
"missing/misconfigured libgdbm.so" (Tom Brown and Steve Willer)
* install.pod: make troubleshooting "unrecognized format specifier for..."
during the build process (Scott Fagg)
* porting: a bug in a script from "Exposing Apache::Registry secrets"
spotted and fixed (John Deighan)
* install.pod: integrated the "manual mod_perl build process" remarks
and patch (Robin Berjon)
* install.pod: don't put mod_perl sources in a sub-dir of Apache
sources. It wouldn't build! (Ask Bjoern Hansen)
* review: Dale Couch was very kind to review and correct the
following chapters: porting
11.13.99 ver 1.18
* An almost complete rewrite of debug.pod:
(Integrated Doug's debugging article at perlmonth.com)
Curing The "Internal Server Error"
Helping error_log to Help Us
The Importance of Warnings
diagnostics pragma
Monitoring the error_log file
Hanging processes: Detection and Diagnostics
An Example of the Code that Might Hang the Process
Detecting hanging processes
Determination of the reason
Handling the 'User pressed Stop button' case
Detecting Aborted Connections
The Importance of Cleanup Code
Critical Section
Safe Resource Locking
Cleanup Code
Handling the server timeout cases and working with $SIG{ALRM}
Watching the server
Configuration
Usage
Compiled Registry Scripts section seems to be empty.
Sometimes script works, sometimes does not
Code Debug
Locating and correcting Syntax Errors
Using Apache::FakeRequest to Debug Apache Perl Modules
Finding the Line Number the Error/Warning has been Triggered at
Using print() Function for Debugging
The Importance of Good Coding Style and Conciseness
Introduction into Perl Debugger
Interactive Perl Debugging under mod_cgi
Non-Interactive Perl Debugging under mod_perl
Interactive Perl Debugging under mod_perl
Interactive Perl Debugging under mod_perl and ptkdb
Debugging core Dumping Code
Apache::Debug
Debugging Core Dumps
Debug Tracing
gdb says there are no debugging symbols
Debugging Signal Handlers ($SIG{FOO})
Code Profiling
Devel::Peek
How can I find if my mod_perl scripts have memory leaks
Debugging your code in Single Server Mode
* A complete rewrite of install.pod:
(Integrated the INSTALL.* docs from the mod_perl distribution)
Installing mod_perl in 10 Minutes and 10 Command Lines
The Gory Details
Sources Configuration (perl Makefile.PL ...)
Configuration parameters
APACHE_SRC
DO_HTTPD, NO_HTTPD, PREP_HTTPD
Callback Hooks
EVERYTHING
PERL_TRACE
APACHE_HEADER_INSTALL
PERL_STATIC_EXTS
PERL_MARK_WHERE
APACHE_PREFIX
APACI_ARGS
Reusing Configuration Parameters
Discovering whether some option was configured
Using an alternative Configuration file
mod_perl Building (make)
make Troubleshooting
undefined reference to 'Perl_newAV'
Built Server Testing (make test)
Manual Testing
make test Troubleshooting
make test fails
mod_perl.c is incompatible with this version of apache
make test......skipping test on this platform
Installation (make install)
Building Apache and mod_perl by Hand
Installation Scenarios for Standalone mod_perl
The All-In-One Way
The Flexible Way
Build mod_perl as DSO inside Apache source tree via APACI
Build mod_perl as DSO outside Apache source tree via APXS
Installation Scenarios for mod_perl and Other Components
mod_perl and mod_ssl (+openssl)
mod_perl and mod_ssl Rolled from RPMs
mod_perl and apache-ssl (+openssl)
mod_perl and Stronghold
Note For Solaris 2.5 users
mod_perl Installation with CPAN.pm's Interactive Shell
Installing on multiple machines
using RPM, DEB and other packages to install mod_perl
A word on mod_perl RPM packages
Getting Started
Compiling RPM source files
Mix and Match RPM and source
Installing a single apache+mod_perl RPM
Compiling libapreq (Apache::Request) with the RH 6.0 mod_perl RPM
Installing separate Apache and mod_perl RPMs
Testing the mod_perl API
Installation Without Superuser Privileges
Installing Perl Modules into a Directory of Choice
Making Your Scripts Find the Locally Installed Modules
CPAN.pm Shell and Locally Installed Modules
Making a Local Apache Installation
Actual Local mod_perl Enabled Apache Installation
Local mod_perl Enabled Apache Installation with CPAN.pm
Automating installation
How can I tell whether mod_perl is running
Testing by checking the error_log file
Testing by viewing /perl-status
Testing via telnet
Testing via a CGI script
Testing via lwp-request
General Notes
Should I rebuild mod_perl if I have upgraded my perl?
Perl installation requirements
mod_auth_dbm nuances
Stripping apache to make it almost perl-server
Saving the config.status Files with mod_perl, php, ssl and Other Components
Should I Build mod_perl with gcc or cc?
OS Related Notes
* databases: added "Debugging code which deploys DBI"
* porting: added "STDIN, STDOUT and STDERR streams"
* advocacy: added "A summary of perl/cgi discussion at slashdot.org"
* snippets: added "Terminating a child process on Request Completion"
(Doug)
* troubleshooting: added "Apache.pm failed to load!" (Doug)
* snippets: added "Reading POST Data, then Redirecting" (Doug)
* snippets: added "Cache control for regular and error modes" (Cliff
Rayman)
* performance: added "Be careful with symbolic links" (the same
script compiled twice)
* install: new "apache/mod_perl/mod_ssl Rolled from RPMs Scenario"
(Stephane Benoit)
* porting: 'use subs (exit)' typo fixed (Chris Nokleberg)
* warnings.pod was renamed to troubleshooting.pod and now it's
categorized by the following sections:
Building and Installation
Configuration and Startup
Code Parsing and Compilation
Runtime
Shutdown and Restart
* porting: the following sections were moved to debug.pod:
"Finding the Line Number the Error/Warning has been Triggered at",
"Turning warnings ON",
"diagnostics pragma"
* porting: rewritten "Command line Switches (-w, -T, etc)"
* performance: "Forking or Executing subprocesses from mod_perl"
updated with another CHLD sighandler using WNOHANG to reap zombie
processes (Lincoln Stein)
* install: updated "Testing via a CGI script" (Geoffrey S Young)
* porting: updated "Terminating requests and processes, exit()
function" with info about post_request termination,
Apache::SizeLimit and Apache::GTopLimit
* performance: links from
http://www.realtime.net/~parkerm/perl/conf98/index.htm
and http://www.realtime.net/~parkerm/perl/conf98/sld006.htm
were dead, so I removed them :( (Peter Skov)
* snippets: added "Caching the POSTed Data" (Doug)
* install: "Compiling libapreq with mod_perl RPM" reviewed and
corrected (Geoffrey S Young)
* status.pod has been eliminated and absorbed by debug.pod where it
belongs
* Fixed pod translator. Now it handles correctly C<$r-E
mod_perl Advocacy
The ``Writing Apache Modules with Perl and C'' book can be purchased online from O'Reilly and Amazon.com.
Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me to improve this guide. If you have something to contribute please send it directly to me.
How much scalability and flexibility you need depends on what you need from the web. If you want only a simple guest book or database gateway with no feature headroom, you can get away with any EASY_AND_FAST_TO_DEVELOP_TOOL (Exchange, MS IIS, Lotus Notes, etc).
Experience shows that you will soon want more functionality, and that is the point at which you'll discover the limitations of these ``easy'' tools. Gradually, your boss will ask for increasing functionality and at some point you'll realize that the tool lacks flexibility and/or scalability. Then your boss will either buy another EASY_AND_FAST_TO_DEVELOP_TOOL and repeat the process (with different unforeseen problems), or you'll start investing time in learning how to use a powerful, flexible tool to make the long-term development cycle easier.
If you and your company are serious about delivering flexible Internet functionality, do your homework. Then urge your boss to invest a little extra time and resources to choose the right tool for the job. In the long term, your Internet site will show the results.
Each developer has a boss who participates in the decision-making process. Remember that the boss considers input from sales people, developers, the media and associates before handing down large decisions. Of course, results count! A sales brochure makes very little impact compared to a working demonstration, and demonstrations of company-specific and developer-specific results count big!
Personally, when I discovered mod_perl I did a lot of testing and coding at home and at work. Once I had a working heavy application, I came to my boss with 2 URLs - one for the plain CGI server and the other for the mod_perl-enabled server. It took about 30 secs for my boss to say: ``Go with it''. Of course the moment I did it, I had to provide all the support for other developers, which is why I took the time to learn it in the first place (that is how this guide was born!).
Chances are that if you've done your homework, you've learned the tools and can deliver results, you'll have a successful project. If you convince your boss to try a tool that you don't know very well, your results may suffer. If your boss follows your development process closely and sees much worse than expected progress, he might say ``forget it'' and wish never to give mod_perl a second chance.
Advocacy is a great thing for the open-source software movement, but it's best done quietly until you have confidence that you can show productivity. If you can demonstrate to your boss a heavy CGI which is running much faster under mod_perl, that may be a strong argument for further evaluation. Your company may even sponsor a portion of your learning process.
Learn the technology by working on sample projects. Learn how to support yourself and learn how to get support from the community; then advocate your ideas to your boss. Then you'll have the knowledge; your company will have the benefit; and mod_perl will have the reputation it deserves.
There was a nice discussion of the merits of Perl in the CGI world at slashdot.org. I took the time to summarize the thread, so here is what I've got:
Perl Domination in CGI Programming? http://slashdot.org/askslashdot/99/10/20/1246241.shtml
Perl is cool and fun to code with
Perl is very fast to develop with
Perl is even faster to develop with if you know what CPAN is :)
Math-intensive code and other stuff which is faster in C/C++ can be plugged into Perl with XS/SWIG and stays transparent to the perl user of the modules.
Most CGI apps do text processing, at which perl excels
Forking and loading (unless code is shared) a C/C++ optimized CGI produces an overhead
Bandwidth is a bigger bottleneck than perl performance (vs C/C++) (not true for Intranets, and might change for Internet in a number of years)
For database driven apps, the db itself is a bottleneck. Lots of posts talk about latency vs throughput.
mod_perl, FastCGI, velocigen and perlexec are good solutions for plain mod_cgi slowness
Other light alternatives to perl and its derivatives mentioned: PHP, Python
There were almost no voices from the users of M$ and similar technologies, I guess that's because they don't read /. :)
many said that in many people's minds: 'CGI' eq 'perl' > 0 (the entropy of perl grows bigger :)
Written by Stas Bekman.
Last Modified at 11/13/1999
Use of the Camel for Perl is a trademark of O'Reilly & Associates, and is used by permission.
In a URL such as http://my.site.com/foo.pl?foo=bar&reg=foobar, some browsers will interpret &reg as a magic entity and encode it as
®, which will result in a corrupted QUERY_STRING. If you encounter this problem you should either avoid using such keys or
separate parameter pairs with ; instead of &. Both CGI.pm and
Apache::Request support a semicolon instead of an ampersand as a separator. So your URI
should look like:
http://my.site.com/foo.pl?foo=bar;reg=foobar
Note that this is only an issue when you are building your own URLs with query strings. It is not a problem when the URL is the result of submitting a form because the browsers _have_ to get that right.
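When you do build such URLs yourself, the semicolon form is easy to generate. The following is only a minimal sketch in plain Perl: the helper names uri_escape_simple and build_query are hypothetical (they are not part of CGI.pm or Apache::Request), and the escaping assumes byte strings.

```perl
use strict;
use warnings;

# Percent-encode every character except the unreserved ones.
# Hypothetical helper, assumes the input is a byte string.
sub uri_escape_simple {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

# Build a query string whose pairs are separated by ';' rather than '&'.
# Takes a flat list: key, value, key, value, ...
sub build_query {
    my (@pairs) = @_;
    my @parts;
    while (my ($key, $value) = splice(@pairs, 0, 2)) {
        push @parts, uri_escape_simple($key) . '=' . uri_escape_simple($value);
    }
    return join ';', @parts;
}

my $url = 'http://my.site.com/foo.pl?' . build_query(foo => 'bar', reg => 'foobar');
print "$url\n";    # http://my.site.com/foo.pl?foo=bar;reg=foobar
```

Because no & ever appears between the pairs, a key such as reg can no longer be mistaken for the start of an entity.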
One problem with publishing 8080 port numbers is that (so I was told) IE 4.x has a bug when re-posting data to a non-port-80 URL. It drops the port designator and uses port 80 anyway.
See Publishing port numbers different from 80
Written by Stas Bekman.
Last Modified at 07/29/1999
The next step after building and installing your new mod_perl enabled
apache server is to configure the server. To learn how to modify apache's
configuration files, please refer to the documentation included with the
apache distribution, or just view the files in the
conf directory and follow the instructions in these files - the embedded
comments within the files do a good job of explaining the options.
Before you start with mod_perl specific configuration, first configure apache, and see that it works. When done, return here to continue...
[ Note that prior to version 1.3.4, the default apache install used three configuration files -- httpd.conf, srm.conf, and access.conf. The 1.3.4 version began distributing the configuration directives in a single file -- httpd.conf. The remainder of this chapter refers to the location of the configuration directives using their historical location. ]
First, you need to specify the locations on a file-system for the scripts to be found.
Add the following configuration directives:
# for plain cgi-bin:
ScriptAlias /cgi-bin/ /usr/local/myproject/cgi/
# for Apache::Registry mode
Alias /perl/ /usr/local/myproject/cgi/
# Apache::PerlRun mode
Alias /cgi-perl/ /usr/local/myproject/cgi/
Alias provides a mapping of URL to file system object under
mod_perl. ScriptAlias is being used for mod_cgi.
Alias defines the start of the URL path to the script you are referencing.
For example, using the above configuration, fetching
http://www.nowhere.com/perl/test.pl will cause the server to look for the file test.pl at /usr/local/myproject/cgi, and execute it as an Apache::Registry script if we define Apache::Registry to be the handler of the /perl location (see below). The URL
http://www.nowhere.com/perl/test.pl will be mapped to
/usr/local/myproject/cgi/test.pl. This means that you can have all your CGIs located at the same place in
the file-system, and call the script in any of three modes simply by
changing the directory name component of the URL (cgi-bin|perl|cgi-perl). Isn't that neat? (That is the configuration you see above: all three
Aliases point to the same directory within your file system, but of course
they can be different.) If your script does not seem to be working while
running under mod_perl, you can easily call the script in straight mod_cgi
mode without making any script changes (in most cases), simply by
changing the URL you invoke it by.
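The prefix mapping described above can be illustrated with a small stand-alone simulation. This is only a sketch of the idea, not how Apache implements Alias; the map_url function and the @aliases table are hypothetical names made up for the illustration.

```perl
use strict;
use warnings;

# The three aliases from the configuration above, all pointing at one directory.
# Each entry: URL prefix, file system directory, execution mode.
my @aliases = (
    [ '/cgi-bin/',  '/usr/local/myproject/cgi/', 'mod_cgi' ],
    [ '/perl/',     '/usr/local/myproject/cgi/', 'Apache::Registry' ],
    [ '/cgi-perl/', '/usr/local/myproject/cgi/', 'Apache::PerlRun' ],
);

# Return (file, mode) for a URL path, or nothing if no alias matches.
sub map_url {
    my ($url_path) = @_;
    for my $alias (@aliases) {
        my ($prefix, $dir, $mode) = @$alias;
        if (index($url_path, $prefix) == 0) {
            return ($dir . substr($url_path, length $prefix), $mode);
        }
    }
    return;
}

for my $path ('/cgi-bin/test.pl', '/perl/test.pl', '/cgi-perl/test.pl') {
    my ($file, $mode) = map_url($path);
    print "$path -> $file ($mode)\n";
}
```

All three URLs resolve to the same file, /usr/local/myproject/cgi/test.pl, and only the execution mode differs.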
FYI: for mod_perl, ScriptAlias is the same thing as:
Alias /foo/ /path/to/foo/
SetHandler cgi-script
where SetHandler cgi-script invokes mod_cgi. The latter will be overwritten if you enable Apache::Registry. In other words,
ScriptAlias does not work for mod_perl, it only appears to work when the additional
configuration is in there. If the
Apache::Registry configuration came before the ScriptAlias, scripts would be run under mod_cgi. While handy, ScriptAlias is a known kludge; it is always better to use Alias and SetHandler.
Of course you can choose any other Alias (you will use it later in
httpd.conf), and you can choose to use all three modes or only one of them. It is
undesirable to run scripts in plain mod_cgi from a mod_perl-enabled server:
the price is too high, and it is better to run these on a plain apache server.
(See Standalone mod_perl Enabled Apache Server)
Now we will work with the httpd.conf file. I add all the mod_perl stuff at the end of the file, after the native
apache configurations.
First we add:
<Location /perl>
#AllowOverride None
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
allow from all
PerlSendHeader On
</Location>
This configuration causes all scripts that are called with a /perl
path prefix to be executed under the Apache::Registry module and as a CGI (hence the ExecCGI option; if you omit this option the script will be printed to the caller's
browser as plain text or possibly will trigger a 'Save-As' window).
PerlSendHeader On tells the server to send an HTTP header to the browser on every script
invocation. You will want to turn this off for nph (non-parsed-headers)
scripts. PerlSendHeader On means to call
ap_send_http_header() after parsing your script headers. It is only meant for CGI emulation; it's
always better to use CGI->header
from the CGI.pm module or $r->send_http_header directly.
Remember the Alias from the section above? We must use the same
Alias here. If you use a Location that does not have the same
Alias defined in srm.conf, the server will fail to locate the script in the file system. (We are
talking about script execution here -- there are cases where Location is something that is being executed by the server itself, without having
the corresponding file, like the /perl-status location.)
Note that sometimes you will have to add:
PerlModule Apache::Registry
before you specify the location that uses Apache::Registry as a
PerlHandler. Basically you can start running the scripts in the
Apache::Registry mode...
You do not need to do anything about the /cgi-bin location (mod_cgi), since it has nothing to do with mod_perl.
Here is a similar location configuration for Apache::PerlRun (More about Apache::PerlRun):
<Location /cgi-perl>
#AllowOverride None
SetHandler perl-script
PerlHandler Apache::PerlRun
Options ExecCGI
allow from all
PerlSendHeader On
</Location>
You may load modules from the config file at server startup via:
PerlModule Apache::DBI CGI DBD::mysql
There is a limit of 10 PerlModule's. If you need more to be loaded when the server starts, use one PerlModule to pull in many, or write them all in regular perl syntax and put them
into a startup file which can be loaded with the PerlRequire directive.
PerlRequire /home/httpd/perl/lib/startup.pl
Both PerlModule and PerlRequire are implemented by require(), but there is a subtle difference. PerlModule works like use(), expecting a module name without the .pm extension and without slashes.
Apache::DBI is OK, while Apache/DBI.pm is not. PerlRequire is the opposite of PerlModule -- it expects a relative or full path to the module or a filename, like in
the example above.
As with any file that is require()d, it must return a true
value. To ensure that this happens, don't forget to add 1; at the end of such files.
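Both points can be tried out in plain Perl, without a server. The sketch below first shows how a module name corresponds to the path that require() looks for, then shows what happens when a required file does not end with a true value. The helper names module_to_path, write_temp_file and try_require are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Convert a module name to the relative path that require() searches for in @INC.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

print module_to_path('Apache::DBI'), "\n";    # Apache/DBI.pm

# Write a throwaway file (removed at exit) and return its path.
sub write_temp_file {
    my ($content) = @_;
    my ($fh, $filename) = tempfile(SUFFIX => '.pl', UNLINK => 1);
    print {$fh} $content;
    close $fh or die "Cannot close $filename: $!";
    return $filename;
}

# Try to require() a file; return the error message, or '' on success.
sub try_require {
    my ($filename) = @_;
    my $ok = eval { require $filename; 1 };
    return $ok ? '' : $@;
}

my $bad  = write_temp_file("my \$loaded = 0;\n");        # last statement is false
my $good = write_temp_file("my \$loaded = 0;\n1;\n");    # ends with 1;

print "without 1; -> ", (try_require($bad)  ? "failed" : "loaded"), "\n";
print "with 1;    -> ", (try_require($good) ? "failed" : "loaded"), "\n";
```

The first file fails to load because its last statement evaluates to false; the second one loads because of the trailing 1;.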
We must stress that all the code that is run at server initialization
time is run with root privileges if you are executing it as the root user
(you have to, unless you choose an unprivileged port, 1024 or above,
which you might have to do if you don't have root access. Just
remember that you had better pick a well known port like 8000 or 8080, since
other non-standard ports might be blocked by firewalls that protect many
organizations and individuals). This means that anyone who has write access
to a script or module that is loaded by PerlModule or PerlRequire effectively has root access to the system. You might want to take a look
at the new and experimental PerlOpmask directive and PERL_OPMASK_DEFAULT
compile time option to try to disable some dangerous operators.
As you know, Apache specifies about 11 phases of the request loop, in this order: Post-Read-Request, URI Translation, Header Parsing, Access Control, Authentication, Authorization, MIME type checking, FixUp, Response (Content phase), Logging and finally Cleanup. These are the stages of a request where the Apache API allows a module to step in and do something. There is a dedicated PerlHandler for each of these stages. Namely:
PerlChildInitHandler
PerlPostReadRequestHandler
PerlInitHandler
PerlTransHandler
PerlHeaderParserHandler
PerlAccessHandler
PerlAuthenHandler
PerlAuthzHandler
PerlTypeHandler
PerlFixupHandler
PerlHandler
PerlLogHandler
PerlCleanupHandler
PerlChildExitHandler
The first 4 handlers cannot be used in <Location>,
<Directory> or <Files> sections or in .htaccess files. The main reason is that all of these require a known path to the file in
order to bind a requested path with one or more of the identifiers above.
Starting from PerlHeaderParserHandler (5th) the URI has already been mapped to a physical pathname, and thus can be used
to match the <Location>,
<Directory> or <Files> configuration section, or to look at the
.htaccess file if it exists at the specified directory in the translated path.
The Apache documentation (or even better -- the ``Writing Apache Modules with Perl and C'' book by Doug MacEachern and Lincoln Stein) will tell you all about those stages and what your modules can do. By default, these hooks are disabled at compile time, see the INSTALL document for information on enabling these hooks.
Note that by default the Perl API expects a subroutine called handler to handle the request in the registered PerlHandler module. Thus if your
module implements this subroutine, you can register the handler simply by
writing:
Perl*Handler Apache::SomeModule
Replace Perl*Handler with the name of the handler you want. mod_perl will load the specified
module for you. But if you decide to give the handler code a different
name, like my_handler, you must preload the module and write the chosen name explicitly:
PerlModule Apache::SomeModule
Perl*Handler Apache::SomeModule::my_handler
Please note that the former approach will not preload the module at the
startup, so either explicitly preload it with PerlModule
directive, add it to the startup file or use a nice shortcut the
Perl*Handler syntax suggests:
Perl*Handler +Apache::SomeModule
Notice the leading + character. It's equal to:
PerlModule Apache::SomeModule
Perl*Handler Apache::SomeModule
If a module wishes to know what handler is currently being run, it can find out with the current_callback method. This method is most useful to PerlDispatchHandlers who wish to only take action for certain phases.
if($r->current_callback eq "PerlLogHandler") {
$r->warn("Logging request");
}
With the mod_perl stacked handlers mechanism, it is possible for more than
one Perl*Handler to be defined and run during each stage of a request.
Perl*Handler directives can define any number of subroutines, e.g. (in config files)
PerlTransHandler OneTrans TwoTrans RedTrans BlueTrans
With the Apache->push_handlers() method, callbacks can be added to the stack at runtime by mod_perl
scripts.
Apache->push_handlers() takes the callback hook name as its first argument and a subroutine name or
reference as its second. e.g.:
Apache->push_handlers("PerlLogHandler", \&first_one);
$r->push_handlers("PerlLogHandler", sub {
print STDERR "__ANON__ called\n";
return 0;
});
After each request, this stack is cleared out.
All handlers will be called unless a handler returns a status other than OK or DECLINED.
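The rule about return statuses can be made concrete with a small simulation in plain Perl. This is a sketch of the behaviour described above, not mod_perl's actual implementation: run_stack is a hypothetical function, the handlers are ordinary code references, and the status values follow Apache 1.3 (OK is 0, DECLINED is -1, SERVER_ERROR is 500).

```perl
use strict;
use warnings;

# Status codes with the values used by Apache 1.3.
use constant OK           => 0;
use constant DECLINED     => -1;
use constant SERVER_ERROR => 500;

# Run the handlers in order and stop at the first status that is
# neither OK nor DECLINED. Returns the last status seen.
sub run_stack {
    my (@handlers) = @_;
    my $status = DECLINED;
    for my $handler (@handlers) {
        $status = $handler->();
        last unless $status == OK || $status == DECLINED;
    }
    return $status;
}

my @log;
my $status = run_stack(
    sub { push @log, 'first';  return OK },
    sub { push @log, 'second'; return DECLINED },
    sub { push @log, 'third';  return SERVER_ERROR },
    sub { push @log, 'fourth'; return OK },    # never reached
);
print "called: @log; final status: $status\n";
# called: first second third; final status: 500
```

The fourth callback never runs because the third one returned a status other than OK or DECLINED.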
example uses:
CGI.pm maintains a global object for its plain function interface. Since the
object is global, it does not go out of scope, DESTROY is never called. CGI->new can call:
Apache->push_handlers("PerlCleanupHandler", \&CGI::_reset_globals);
This function will be called during the final stage of a request,
refreshing CGI.pm's globals before the next request comes in.
Apache::DCELogin establishes a DCE login context which must exist for the lifetime of a
request, so the DCE::Login object is stored in a global variable. Without stacked handlers, users must
set
PerlCleanupHandler Apache::DCELogin::purge
in the configuration files to destroy the context. This is not
``user-friendly''. Now, Apache::DCELogin::handler can call:
Apache->push_handlers("PerlCleanupHandler", \&purge);
Persistent database connection modules such as Apache::DBI could push a PerlCleanupHandler handler that iterates over %Connected, refreshing connections or just checking that they have not gone stale.
Remember, by the time we get to PerlCleanupHandler, the client has what it wants and has gone away, so we can spend as much time
as we want here without slowing down response time to the client (but the
process is unavailable for serving new requests before the operation is
completed).
PerlTransHandlers may decide, based on URI or other condition, whether or not to handle a
request, e.g. Apache::MsqlProxy. Without stacked handlers, users must configure:
PerlTransHandler Apache::MsqlProxy::translate
PerlHandler Apache::MsqlProxy
PerlHandler is never actually invoked unless translate() sees the request is a proxy request ($r->proxyreq), if it is a proxy request, translate() sets $r->handler("perl-script"), only then will PerlHandler handle the request. Now, users do not have to specify PerlHandler Apache::MsqlProxy, the translate()
function can set it with push_handlers().
Includes, footers, headers, etc., piecing together a document, imagine (no need for SSI parsing!):
PerlHandler My::Header Some::Body A::Footer
A little test:
#My.pm
package My;
sub header {
my $r = shift;
$r->content_type("text/plain");
$r->send_http_header;
$r->print("header text\n");
}
sub body { shift->print("body text\n") }
sub footer { shift->print("footer text\n") }
1;
__END__
#in config
<Location /foo>
SetHandler "perl-script"
PerlHandler My::header My::body My::footer
</Location>
Parsing the output of another PerlHandler? this is a little more tricky, but consider:
<Location /foo>
SetHandler "perl-script"
PerlHandler OutputParser SomeApp
</Location>
<Location /bar>
SetHandler "perl-script"
PerlHandler OutputParser AnotherApp
</Location>
Now, OutputParser goes first, but it untie()'s *STDOUT and re-tie()'s to its own package like so:
package OutputParser;
sub handler {
my $r = shift;
untie *STDOUT;
tie *STDOUT => 'OutputParser', $r;
}
sub TIEHANDLE {
my($class, $r) = @_;
bless { r => $r}, $class;
}
sub PRINT {
my $self = shift;
for (@_) {
#do whatever you want to $_
$self->{r}->print($_ . "[insert stuff]");
}
}
1;
__END__
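The tie() technique itself can be tried outside the server. The following is a minimal stand-alone sketch, assuming no Apache at all: the UppercaseOut package is hypothetical, and instead of passing the text on to a request object it collects the transformed output in an array.

```perl
use strict;
use warnings;

package UppercaseOut;

# Remember where the transformed text should go (here: an array reference).
sub TIEHANDLE {
    my ($class, $sink) = @_;
    return bless { sink => $sink }, $class;
}

# Called for every print() on the tied handle.
sub PRINT {
    my ($self, @chunks) = @_;
    push @{ $self->{sink} }, map { uc $_ } @chunks;
    return 1;
}

package main;

my @captured;
tie *STDOUT, 'UppercaseOut', \@captured;
print "body text\n";      # goes through UppercaseOut::PRINT
print "footer text\n";
untie *STDOUT;            # restore the real STDOUT

print @captured;          # prints BODY TEXT and FOOTER TEXT on separate lines
```

Every print() issued while the handle is tied goes through PRINT, which is the same hook OutputParser uses to modify the output of the handler that runs after it.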
To build in this feature, configure with:
% perl Makefile.PL PERL_STACKED_HANDLERS=1 [PERL_FOO_HOOK=1,etc]
Another method Apache->can_stack_handlers will return TRUE if mod_perl was configured with PERL_STACKED_HANDLERS=1, FALSE otherwise.
If a Perl*Handler is prototyped with $$, this handler will be invoked as a method. e.g.
package My;
@ISA = qw(BaseClass);
sub handler ($$) {
my($class, $r) = @_;
...;
}
package BaseClass;
sub method ($$) {
my($class, $r) = @_;
...;
}
__END__
Configuration:
PerlHandler My
or
PerlHandler My->handler
Since the handler is invoked as a method, it may inherit from other classes:
PerlHandler My->method
In this case, the My class inherits this method from BaseClass.
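The inheritance part is ordinary Perl method resolution and can be checked without a server. In this sketch the request object is replaced by a plain string, which is an assumption made only so the example can run stand-alone; under mod_perl the second argument would be the real request object.

```perl
use strict;
use warnings;

package BaseClass;

sub method ($$) {
    my ($class, $r) = @_;
    return "$class handled $r via BaseClass::method";
}

package My;
our @ISA = ('BaseClass');

sub handler ($$) {
    my ($class, $r) = @_;
    return "$class handled $r via My::handler";
}

package main;

# A stand-in for the request object.
my $r = 'request';

print My->handler($r), "\n";    # My handled request via My::handler
print My->method($r),  "\n";    # My handled request via BaseClass::method
```

My does not define method itself, so the call is resolved through @ISA, while the class name passed as the first argument is still My.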
To build in this feature, configure with:
% perl Makefile.PL PERL_METHOD_HANDLERS=1 [PERL_FOO_HOOK=1,etc]
To reload PerlRequire, PerlModule, other use()'d modules and flush the Apache::Registry cache on server restart, add:
PerlFreshRestart On
Make sure you read the section ``Evil things might happen when using PerlFreshRestart''.
Apache::Status is a very useful feature: it lets you watch what happens to the perl guts of the server. Below you will find instructions for configuring and using it.
Add this to httpd.conf:
<Location /perl-status>
SetHandler perl-script
PerlHandler Apache::Status
order deny,allow
#deny from all
#allow from
</Location>
If you are going to use Apache::Status, it's important to put it as the first module in the start-up file, or in
httpd.conf (after
Apache::Registry):
# startup.pl
use Apache::Registry ();
use Apache::Status ();
use Apache::DBI ();
If you don't put Apache::Status before Apache::DBI then you don't get Apache::DBI's menu entry in status.
Assuming that your mod_perl server listens to port 81, fetch http://www.nowhere.com:81/perl-status
Embedded Perl version 5.00502 for Apache/1.3.2 (Unix) mod_perl/1.16 process 187138, running since Thu Nov 19 09:50:33 1998
This is the linked menu that you should see:
Signal Handlers
Enabled mod_perl Hooks
PerlRequire'd Files
Environment
Perl Section Configuration
Loaded Modules
Perl Configuration
ISA Tree
Inheritance Tree
Compiled Registry Scripts
Symbol Table Dump
Let's follow, for example, PerlRequire'd Files. We see:
PerlRequire                            Location
/usr/myproject/lib/apache-startup.pl   /usr/myproject/lib/apache-startup.pl
From some menus you can continue deeper to peek at the perl internals of the server, to watch the values of the global variables in the packages, to the list of cached scripts and modules and much more. Just click around...
Sometimes when you fetch /perl-status and follow the Compiled
Registry Scripts link from the status menu, you see no listing of scripts at all. This is
absolutely correct -- Apache::Status shows the registry scripts compiled in the httpd child which is serving
your request for /perl-status. If the child has not yet compiled the script you are asking for, /perl-status will just show you the main menu. This usually happens when the child was
just spawned.
PerlSetEnv key val
PerlPassEnv key
PerlPassEnv passes, PerlSetEnv sets and passes the
ENVironment variables to your scripts. You can access them in your scripts through %ENV (e.g. $ENV{"key"}).
Regarding the setting of PerlPassEnv PERL5LIB in httpd.conf: if you turn on taint checks (PerlTaintCheck On), $ENV{PERL5LIB} will be ignored (unset).
PerlSetVar is very similar to PerlSetEnv, but you extract it with another method. In <Perl> sections:
push @{ $Location{"/"}->{PerlSetVar} }, [ 'FOO' => BAR ];
and in the code you read it with:
my $r = Apache->request;
print $r->dir_config('FOO');
Since you often have to add many Perl directives to the configuration file, it can be a good idea to put all of them into one file, so that the configuration file stays cleaner. Add the following lines to httpd.conf:
# startup.pl loads all functions that we want to use within
# mod_perl
PerlRequire /path/to/startup.pl
before the rest of the mod_perl configuration directives.
You can also run perl -c startup.pl to test the file's syntax.
An example of a startup.pl file:
use strict;
# extend @INC if needed
use lib qw(/dir/foo /dir/bar);
# make sure we are in a sane environment.
$ENV{GATEWAY_INTERFACE} =~ /^CGI-Perl/
or die "GATEWAY_INTERFACE not Perl!";
# for things in the "/perl" URL
use Apache::Registry;
#load perl modules of your choice here
#this code is interpreted *once* when the server starts
use LWP::UserAgent ();
use DBI ();
# tell me more about warnings
use Carp ();
$SIG{__WARN__} = \&Carp::cluck;
# Load CGI.pm and call its compile() method to precompile
# (but not to import) its autoloaded methods.
use CGI ();
CGI->compile(':all');
Note that starting with $CGI::VERSION 2.46, the recommended method to precompile the code in CGI.pm is:
use CGI qw(-compile :all);
But the old method is still available for backward compatibility.
See also Apache::Status
Modules that are loaded at server startup are shared among the server children, so only one copy of each module is loaded, which saves a lot of RAM. Usually I put most of the code I develop into modules and preload them from here. You can even preload your CGI scripts with Apache::RegistryLoader and preopen the DB connections with Apache::DBI. (See Preload Perl modules at server startup).
Many people wonder why the use() clause has to be duplicated in both the startup file and the script itself. The question arises from a misunderstanding of use(). use() is made up of two other operations, namely require() and import(), both performed at compile time. So when you write:
use Foo qw(bar);
perl actually does:
BEGIN { require Foo; Foo->import(qw(bar)); }
When you write:
use Foo qw();
perl actually does:
BEGIN { require Foo; }
which means that import() is not called at all, because the caller does not want any symbols to be imported. Why is this important? Some modules have @EXPORT set to a list of symbols to be exported by default, so when you write:
use Foo;
and think nothing is being imported, the import() call is still executed and some symbols probably are imported. See the docs/source of the module in question to make sure you use() it correctly. When you write your own modules, always remember that it's better to use @EXPORT_OK instead of @EXPORT, since the former doesn't export symbols unless it is asked to.
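The three forms of use() can be told apart by watching when import() gets called. The following is a minimal sketch, assuming a made-up package Foo that is defined in the same file (the %INC entry only makes require() believe that Foo.pm has already been loaded):

```perl
use strict;
use warnings;

our @calls;    # one entry per import() invocation, filled at compile time

BEGIN {
    package Foo;    # hypothetical module, defined inline for the demo
    sub import {
        my ($class, @args) = @_;
        push @main::calls, [@args];
    }
    $INC{'Foo.pm'} = __FILE__;    # make require() treat Foo.pm as loaded
}

use Foo qw(bar);    # require + Foo->import('bar')
use Foo;            # require + Foo->import()
use Foo ();         # require only, import() is never called

printf "import() was called %d times\n", scalar @calls;
print "  args: (@$_)\n" for @calls;
```

Only the first two statements reach import(), so the script reports two calls; the empty-list form loads the module and touches nothing in the caller's namespace.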
Since the symbols that you import into the startup script's namespace are not visible in any other package, scripts and modules that need Foo's exported symbols have to pull them in themselves, just as if you had not preloaded Foo in the startup file. For example, just because you have use()d Apache::Constants in the startup script does not mean you can have the following handler:
package MyModule;
sub handler {
my $r = shift;
## Cool stuff goes here
return OK;
}
1;
You would either need to add:
use Apache::Constants qw( OK );
Or instead of return OK; say:
return Apache::Constants::OK;
See the manpage/perldoc on Exporter and perlmod for more on
import().
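The same effect can be reproduced outside the server. This is a sketch only: My::Constants is a hypothetical stand-in for Apache::Constants, and the Startup package plays the role of the startup file. It shows that a symbol imported into one package is not visible in another, while the fully qualified name works everywhere:

```perl
use strict;
use warnings;

BEGIN {
    package My::Constants;    # hypothetical stand-in for Apache::Constants
    use Exporter ();
    our @ISA       = ('Exporter');
    our @EXPORT_OK = qw(OK);
    sub OK () { 0 }
    $INC{'My/Constants.pm'} = __FILE__;    # pretend the file is loaded
}

package Startup;              # plays the role of the startup file
use My::Constants qw(OK);     # OK is imported into Startup only

package MyModule;             # a handler package that imported nothing
sub handler {
    return My::Constants::OK; # the fully qualified name always works
}

package main;
print "Startup sees OK:  ", (Startup->can('OK')  ? "yes" : "no"), "\n";
print "MyModule sees OK: ", (MyModule->can('OK') ? "yes" : "no"), "\n";
```

Writing a bare OK inside MyModule::handler would not even compile under use strict, which is the failure the text above describes.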
PerlRequire allows you to execute code that preloads modules and does other things.
Variables that are imported or defined there are visible in the scope of the startup file.
It is wrong to assume that global variables defined in the startup file will be accessible by child processes.
You have to define or import variables in your scripts, and they will be visible inside the child process which runs the script. They will not be shared between siblings. Remember that every script runs in a specially (uniquely) named package, so it cannot access variables from other packages unless it inherits from them or use()s them.
apachectl configtest tests the configuration file without starting the server. You can safely modify the configuration file on your production server if you run this test before you restart the server. Of course it is not 100% error-proof, but it will reveal any syntax errors you might make while editing the file.
'apachectl configtest' is the same as 'httpd -t', and it actually executes the code in startup.pl rather than just parsing it. <Perl> configuration has always started Perl during the configuration read, and Perl{Require,Module} do so as well.
If you want your startup code to have control over the -t (configtest) server launch, start the server configuration test with:
httpd -t -Dsyntax_check
and in your startup file, add (at the top):
return if Apache->define('syntax_check');
if you want to prevent the code in the file from being executed.
For PerlWarn and PerlTaintCheck see Switches -w, -T
See Tuning the Apache's configuration variables for the best performance
It is advised not to publish the 8080 (or similar) port number in URLs, but rather to use a proxying rewrite rule in the thin (httpd_docs) server:
RewriteRule .*/perl/(.*) http://my.url:8080/perl/$1 [P]
One problem with publishing port numbers such as 8080 is that, as I was told, IE 4.x has a bug when re-posting data to a non-port-80 URL: it drops the port designator and uses port 80 anyway.
With <Perl></Perl> sections, it is possible to configure your server entirely in Perl.
<Perl> sections can contain *any* and as much Perl code as you wish. These sections are compiled into a special package whose symbol table mod_perl can then walk and grind the names and values of Perl variables/structures through the apache core configuration gears. Most of the configuration directives can be represented as scalars ($scalar) or lists (@list). A @List inside these sections is simply converted into a space delimited string for you. Here is an example:
#httpd.conf
<Perl>
@PerlModule = qw(Mail::Send Devel::Peek);
#run the server as whoever starts it
$User = getpwuid($>) || $>;
$Group = getgrgid($)) || $);
$ServerAdmin = $User;
</Perl>
Block sections such as <Location>..</Location> are represented in a %Location hash, e.g.:
$Location{"/~dougm/"} = {
AuthUserFile => '/tmp/htpasswd',
AuthType => 'Basic',
AuthName => 'test',
DirectoryIndex => [qw(index.html index.htm)],
Limit => {
METHODS => 'GET POST',
require => 'user dougm',
},
};
If a directive can take two *or* three arguments you may push strings, and the lowest number of arguments will be shifted off the @List, or use an array reference to handle any number greater than the minimum for that directive:
push @Redirect, "/foo", "http://www.foo.com/";
push @Redirect, "/imdb", "http://www.imdb.com/";
push @Redirect, [qw(temp /here http://www.there.com)];
Other section counterparts include %VirtualHost, %Directory and
%Files.
To pass all environment variables to the children with a single configuration directive, rather than listing each one via PassEnv or PerlPassEnv, a <Perl> section could read in a file and:
push @PerlPassEnv, [$key => $val];
or
Apache->httpd_conf("PerlPassEnv $key $val");
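The "read in a file" step is left open above. A minimal sketch of it follows, assuming a file of KEY=VALUE lines and a helper named read_env_file (both the file format and the name are assumptions, not part of mod_perl), and assuming a Perl recent enough for lexical filehandles and three-argument open (5.6 or later):

```perl
use strict;
use warnings;

# Parse KEY=VALUE lines (an assumed file format) into [key, value] pairs.
# Blank lines and lines starting with '#' are skipped.
sub read_env_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    my @pairs;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;
        my ($key, $val) = $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/
            or next;    # ignore malformed lines
        push @pairs, [$key => $val];
    }
    close $fh;
    return @pairs;
}

# Inside a <Perl> section one could then write, following the text above
# (the path is a placeholder):
#   push @PerlPassEnv, [@$_] for read_env_file('/path/to/env.conf');
```

Keeping the parsing in a plain subroutine means it can be tested with perl from the command line before it is placed into httpd.conf.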
These are somewhat simple examples, but they should give you the basic
idea. You can mix in any Perl code your heart desires. See
eg/httpd.conf.pl and eg/perl_sections.txt in mod_perl distribution for some examples.
A tip for syntax checking outside of httpd:
<Perl>
#!perl
#... code here ...
__END__
</Perl>
Now you may run:
perl -cx httpd.conf
To enable <Perl> sections you should build mod_perl with perl
Makefile.PL PERL_SECTIONS=1.
You can see how you have configured the <Perl> sections through the /perl-status location, by choosing Perl Sections from the menu.
You can dump the configuration produced by the <Perl> sections this way:
<Perl>
use Apache::PerlSections();
...
print STDERR Apache::PerlSections->dump();
</Perl>
Alternatively you can store it in a file:
Apache::PerlSections->store("httpd_config.pl");
You can then require() that file in some other <Perl> section.
mod_macro is an Apache module written by Fabien Coelho that lets you define and use macros in the Apache configuration file.
mod_macro proves really useful when you have many virtual hosts, and each virtual host has a number of scripts/modules, most of them with a moderately complex configuration setup.
First download the latest version of mod_macro from http://www.cri.ensmp.fr/~coelho/mod_macro/ , and configure your Apache server to use this module.
Here are some useful macros for mod_perl users:
# set up a registry script
<Macro registry>
SetHandler "perl-script"
PerlHandler Apache::Registry
Options +ExecCGI
</Macro>
# example
Alias /stuff /usr/www/scripts/stuff
<Location /stuff>
Use registry
</Location>
If your registry scripts are all located in the same directory, and your aliasing rules are consistent, you can use this macro:
# set up a registry script for a specific location
<Macro registry $location $script>
Alias /$location /usr/www/scripts/$script
<Location /$location>
SetHandler "perl-script"
PerlHandler Apache::Registry
Options +ExecCGI
</Location>
</Macro>
# example
Use registry stuff stuff.pl
If you're using content handlers packaged as modules, you can use the following macro:
# set up a mod_perl content handler module
<Macro modperl $module>
SetHandler "perl-script"
Options +ExecCGI
PerlHandler $module
</Macro>
#examples
<Location /perl-status>
PerlSetVar StatusPeek On
PerlSetVar StatusGraph On
PerlSetVar StatusDumper On
Use modperl Apache::Status
</Location>
The following macro sets up a Location for use with HTML::Embperl. Here we define all ``.html'' files to be processed by Embperl.
<Macro embperl>
SetHandler "perl-script"
Options +ExecCGI
PerlHandler HTML::Embperl
PerlSetEnv EMBPERL_FILESMATCH \.html$
</Macro>
# examples
<Location /mrtg>
Use embperl
</Location>
Macros are also very useful for things that tend to be verbose, such as setting up Basic Authentication:
# Sets up Basic Authentication
<Macro BasicAuth $realm $group>
Order deny,allow
Satisfy any
AuthType Basic
AuthName $realm
AuthGroupFile /usr/www/auth/groups
AuthUserFile /usr/www/auth/users
Require group $group
Deny from all
</Macro>
# example of use
<Location /stats>
Use BasicAuth WebStats Admin
</Location>
Finally, here is a complete example that uses macros to set up simple virtual hosts. It uses the BasicAuth macro defined previously (yes, macros can be nested!).
<Macro vhost $ip $domain $docroot $admingroup>
<VirtualHost $ip>
ServerAdmin webmaster@$domain
DocumentRoot /usr/www/htdocs/$docroot
ServerName www.$domain
<Location /stats>
Use BasicAuth Stats-$domain $admingroup
</Location>
</VirtualHost>
</Macro>
# define some virtual hosts
Use vhost 10.1.1.1 example.com example example-admin
Use vhost 10.1.1.2 example.net examplenet examplenet-admin
mod_macro is also useful in a non-vhost setting. Some sites, for example, have lots of scripts which people use to view various statistics, email settings, etc. It is much easier to read things like:
use /forwards email/showforwards
use /webstats web/showstats
Check your configuration files and make sure that ``ExecCGI'' is turned on in your configuration.
<Location /perl>
SetHandler perl-script
PerlHandler Apache::Registry
Options ExecCGI
allow from all
PerlSendHeader On
</Location>
Did you put PerlSendHeader On in the configuration part of the <Location foo></Location>?
No. Any virtual host will be able to see the routines from a startup.pl loaded for any other virtual host.
You can use 'PerlSetEnv PERL5LIB ...' or a PerlFixupHandler w/ the lib pragma.
An even better way is to use Apache::PerlVINC.
This has been a bug before, last fixed in 1.15_01, i.e. if you are running 1.15, that could be the problem. You should set this variable in a startup file (PerlRequire):
$Apache::Registry::NameWithVirtualHost = 1;
But, as we know, sometimes a bug turns into a feature. If the same script is running for more than one virtual host on the same machine, this can be a waste, right? Set it to 0 in a startup script if you want to turn it off and have this bug as a feature. (This only makes sense if you are sure that there will be no other scripts named by the same path/name.) It also saves you some memory on the way.
$Apache::Registry::NameWithVirtualHost = 0;
The problem was reported by users who declared the mod_perl configuration inside a <Directory> section for all files matching *.pl. The problem went away after the mod_perl configuration was placed in a <Files> section.
It is better not to advertise the port the mod_perl server is running on to the outside world, for it creates a potential security risk by revealing which module(s) and/or OS you are running your web server on.
The more modules you have in your web server, the more complex the code in your webserver.
The more complex the code in your web server, the more chances for bugs.
The more chance for bugs, the more chance that some of those bugs may involve security.
I was never completely sure why the default of the ServerTokens directive in Apache is Full rather than Minimal. It seems like you would only set it to Full if you were debugging.
For more information see Publishing port numbers different from 80
Another approach is to modify httpd sources to reveal no unwanted
information, so if you know the port the HEAD request will return an empty or phony Server: field.
Let's say that you want all the files in a specific directory and below to be handled the same way, but a few of them to be handled somewhat differently. For example:
<Directory /home/foo>
<FilesMatch "\.(html|txt)$">
SetHandler perl-script
PerlHandler Apache::AddrMunge
</FilesMatch>
</Directory>
Alternatively you can use <Files> inside an .htaccess file.
Note that you cannot have a Files directive inside Location, but you can have Files inside Directory.
When the server is restarted, the configuration and module initialization phases are called again (twice in total). To ensure that a future restart will work out correctly, Apache actually runs these two phases twice during server startup, to check that all modules can survive a restart.
(META: And add an example that writes to the log file - I was restarted 1, 2 times)
Written by Stas Bekman.
Last Modified at 12/18/1999
Use of the Camel for Perl is a trademark of O'Reilly & Associates, and is used by permission.
All of these techniques require that you know the server PID (Process ID).
The easiest way to find the PID is to look it up in the httpd.pid file. With my configuration it exists as /usr/local/var/httpd_perl/run/httpd.pid. It's easy to discover where to look by checking the httpd.conf file. Open the file and locate the PidFile entry:
PidFile /usr/local/var/httpd_perl/run/httpd.pid
Another way is to use the ps and grep utilities:
% ps auxc | grep httpd_perl
or maybe:
% ps -ef | grep httpd_perl
This will produce a list of all httpd_perl processes (the parent and the children). You are looking for the parent process. If you run your server as root, you will easily locate it, since it belongs to root. If you run the server as a user (when you don't have root access), most likely all the processes will belong to that user (unless defined differently in httpd.conf), but it's still easy to know 'who is the parent' -- the one of the smallest size...
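The PidFile lookup can also be done from a script. Here is a minimal sketch; the helper names read_pid and is_running are made up for illustration, and the PID file path is taken from the command line rather than assumed:

```perl
use strict;
use warnings;

# Return the PID stored in a PidFile, or undef if it is missing or malformed.
sub read_pid {
    my ($pidfile) = @_;
    open my $fh, '<', $pidfile or return undef;
    my $line = <$fh>;
    close $fh;
    return undef unless defined $line && $line =~ /^\s*(\d+)\s*$/;
    return $1;
}

# kill 0 sends no signal; it only asks whether the process can be signalled.
# Note that it also fails for a live process owned by another user.
sub is_running {
    my ($pid) = @_;
    return kill(0, $pid) ? 1 : 0;
}

my $pidfile = shift @ARGV;    # e.g. the PidFile path from httpd.conf
if (defined $pidfile) {
    my $pid   = read_pid($pidfile);
    my $alive = defined $pid && is_running($pid);
    print $alive
        ? "parent httpd pid $pid is running\n"
        : "no running server found for $pidfile\n";
}
```

Run it with the PidFile path as its only argument; the PID it reports is that of the parent, which is the only process you should signal.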
You will notice many httpd_perl executables running on your system, but you should not send signals to any
of them except the parent, whose pid is in the PidFile. That is to say you shouldn't ever need to send signals to any process
except the parent. There are three signals that you can send the parent: TERM, HUP, and USR1.
We will concentrate here on the implications of sending these signals to a mod_perl enabled server. For documentation on the implications of sending these signals to a plain Apache server see http://www.apache.org/docs/stopping.html .
Sending the TERM signal to the parent causes it to immediately attempt to kill off all of its children. This process may take several seconds to complete, following which the parent itself exits. Any requests in progress are terminated, and no further requests are served.
That's the moment that the accumulated END blocks will be executed! Note that if you use Apache::Registry or Apache::PerlRun, then
END blocks are being executed upon each request (at the end).
Sending the HUP signal to the parent causes it to kill off its children like in TERM (Any requests in progress are terminated) but the parent doesn't exit. It re-reads its configuration files, and re-opens any log files. Then it spawns a new set of children and continues serving hits.
The server will reread its configuration files, flush all the compiled and preloaded modules, and rerun any startup files. It's equivalent to stopping, then restarting a server.
Note: If your configuration file has errors in it when you issue a restart then your parent will not restart but exit with an error. See below for a method of avoiding this.
The USR1 signal causes the parent process to advise the children to exit after their current request (or to exit immediately if they're not serving anything). The parent re-reads its configuration files and re-opens its log files. As each child dies off the parent replaces it with a child from the new generation of the configuration, which begins serving new requests immediately.
The only difference between USR1 and HUP is that USR1 allows children to complete any in-progress request prior to killing them off.
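The three signals can be summarized as a small lookup table. This is an illustrative sketch: the action names follow apachectl, while signal_name and signal_parent are hypothetical helpers, not part of Apache or mod_perl:

```perl
use strict;
use warnings;

# Action names follow apachectl; the signals are the three described above.
my %SIGNAL_FOR = (
    stop     => 'TERM',    # kill the children immediately, then exit
    restart  => 'HUP',     # kill the children immediately, re-read the config
    graceful => 'USR1',    # let the children finish the current request first
);

sub signal_name {
    my ($action) = @_;
    return $SIGNAL_FOR{$action};
}

# Send the signal for $action to the parent process only.
# Returns true if the signal could be sent.
sub signal_parent {
    my ($action, $parent_pid) = @_;
    my $sig = signal_name($action)
        or die "Unknown action '$action'\n";
    return kill($sig, $parent_pid);
}
```

A call such as signal_parent('graceful', $pid) is then the scripted counterpart of kill -USR1 with the parent's PID.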
By default, if a server is restarted (a la kill -USR1 `cat logs/httpd.pid` or with the HUP signal), Perl scripts and modules are not reloaded. To reload PerlRequire's, PerlModule's and other use()'d modules, and to flush the Apache::Registry cache, enable it with this directive:
PerlFreshRestart On (in httpd.conf)
Make sure you read Evil things might happen when using PerlFreshRestart.
It's worth mentioning that restart or termination can sometimes take quite
a lot of time. Check out the PERL_DESTRUCT_LEVEL=-1 option during the mod_perl perl Makefile.PL stage, which speeds this up and leads to more robust operation in the face
of problems, like running out of memory. It is only usable if no
significant cleanup has to be done by perl END
blocks and DESTROY methods when the child terminates, of course. What constitutes significant
cleanup? Any change of state outside of the current process that would not
be handled by the operating system itself. So committing database
transactions is significant but closing an ordinary file isn't.
Some folks prefer to specify signals using numerical values rather than symbolic names. If you are looking for these, check out your kill(3) man page. My page points to /usr/include/sys/signal.h, where the relevant entries are:
#define SIGHUP 1 /* hangup, generated when terminal disconnects */
#define SIGTERM 15 /* software termination signal */
#define SIGUSR1 30 /* user defined signal 1 */
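The numbers are platform dependent (SIGUSR1 is 30 in the header quoted above, but 10 on Linux, for example), so it is safer to look them up than to hardcode them. A short sketch using the signal tables that Perl's own Config module records at build time:

```perl
use strict;
use warnings;
use Config;

# Build a name => number table from the values Perl was configured with.
my @names = split ' ', $Config{sig_name};
my @nums  = split ' ', $Config{sig_num};
my %signo;
@signo{@names} = @nums;

# Print the three signals that matter for controlling the server.
printf "%-5s %d\n", $_, $signo{$_} for qw(HUP TERM USR1);
```

Perl's kill() accepts the symbolic names directly, so a table like this is only needed when a numeric value has to be passed to some other tool.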
Apache's distribution provides a nice script to control the server. It's called apachectl and it's installed into the same location as httpd. In our scenario it's /usr/local/sbin/httpd_perl/apachectl.
Start httpd:
% /usr/local/sbin/httpd_perl/apachectl start
Stop httpd:
% /usr/local/sbin/httpd_perl/apachectl stop
Restart httpd if running by sending a SIGHUP or start if not running:
% /usr/local/sbin/httpd_perl/apachectl restart
Do a graceful restart by sending a SIGUSR1 or start if not running:
% /usr/local/sbin/httpd_perl/apachectl graceful
Do a configuration syntax test:
% /usr/local/sbin/httpd_perl/apachectl configtest
Replace httpd_perl with httpd_docs in the above calls to control the httpd_docs server.
There are other options for apachectl; use the help option to see them all.
It's important to understand that this script is based on the PID file, which is PIDFILE=/usr/local/var/httpd_perl/run/httpd.pid. If you delete the file by hand, apachectl will fail to run.
Also, notice that apachectl is suitable for use from within your Unix system's startup files, so that your web server is automatically restarted upon system reboot. Either copy the apachectl file to the appropriate location (/etc/rc.d/rc3.d/S99apache works on my RedHat Linux system) or create a symlink with that name pointing to the canonical location. (If you do this, make certain that the script is writable only by root -- the startup scripts have root privileges during init processing, and you don't want to be opening any security holes.)
You have prepared a new version of the code, uploaded it to a production server, restarted it, and it doesn't work. What could be worse than that? You also cannot go back, because you have overwritten the good working code.
It's quite easy to prevent this: just don't overwrite the previous good files!
Personally I do all updates on the live server with the following sequence. Assume that the root directory lies in /home/httpd/perl/rel. When I'm about to update the files I create a new directory /home/httpd/perl/beta, copy the old files from /home/httpd/perl/rel and update it with the new files I'm about to replace. Then I do the last sanity checks: I check the file permissions (read+executable) and run perl -c on the new modules to make sure there are no errors in them. When I think I'm ready I do:
% cd /home/httpd/perl
% mv rel old && mv beta rel && stop && sleep 3 && restart && err
Let's explain what I'm doing. First, I use aliases to make things faster:
% alias | grep apachectl
graceful /usr/local/apache/bin/apachectl graceful
rehup /usr/local/apache/sbin/apachectl restart
restart /usr/local/apache/bin/apachectl restart
start /usr/local/apache/bin/apachectl start
stop /usr/local/apache/bin/apachectl stop
% alias err
tail -f /usr/local/apache/logs/error_log
So I write all the commands on one line, chained with &&, and only then press the Enter key. That ensures that if I suddenly lose the connection (sadly, that happens sometimes) I won't leave the server down with only the stop command having squeezed in.
I back up the old working directory in old, and move the new one into its place. I stop the server, give it a few seconds to shut down (it might take even longer) and then do a restart, followed immediately by a view of the tail of the error_log file in order to see that everything is OK. apachectl generates its status messages too early (e.g. on stop it says the server has been stopped while it hasn't yet), so don't rely on it; rely on the error_log file instead. Also, you have noticed that I use restart and not just start. I do this because of Apache's long stopping times (it depends on what you do with it, of course!): if you use start and Apache hasn't yet released the port it listens on, the start will fail and error_log will tell you that the port is in use, e.g.:
Address already in use: make_sock: could not bind to port 8080
But if you use restart, it will patiently wait for the server to quit and then will cleanly start it.
Now what happens if the new modules are broken? First of all, I immediately see the indication of the problems reported in the error_log file, which I tail -f immediately after the restart command. Recovery is easy, we just put everything back as it was before:
% mv rel bad && mv old rel && stop && sleep 3 && restart && err
In 99.9% of cases everything will be all right, and you will have had only about 10 seconds of downtime, which is pretty good!
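The directory swap itself can be wrapped in a script so that a half-finished rollover is undone automatically. The following is a sketch only: promote_release and rollback_release are made-up names, the rel, old, beta and bad directory names are the ones used above, and stopping and restarting the server is deliberately left out:

```perl
use strict;
use warnings;

# Swap directories under $base: rel -> old, beta -> rel.
# If the second step fails, the first one is undone so rel stays usable.
sub promote_release {
    my ($base) = @_;
    die "$base/old already exists, remove it first\n" if -e "$base/old";
    rename "$base/rel", "$base/old"
        or die "Can't move rel to old: $!\n";
    unless (rename "$base/beta", "$base/rel") {
        my $err = $!;
        rename "$base/old", "$base/rel";    # roll back the first step
        die "Can't move beta to rel: $err\n";
    }
    return 1;
}

# Undo a bad release: rel -> bad, old -> rel.
sub rollback_release {
    my ($base) = @_;
    die "$base/bad already exists, remove it first\n" if -e "$base/bad";
    rename "$base/rel", "$base/bad" or die "Can't move rel to bad: $!\n";
    rename "$base/old", "$base/rel" or die "Can't move old to rel: $!\n";
    return 1;
}
```

Calling promote_release('/home/httpd/perl') corresponds to the two mv commands of the update, and rollback_release to the two of the recovery; the refusal to overwrite an existing old or bad directory is a design choice of this sketch, not something the shell commands above do.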
What happens if you really must take down the server or disable the scripts? This situation might arise when you need to do some maintenance work on your database server, which you have to take down and which leaves all the scripts using this database server non-working. If you do nothing, users will see either the grey The Error has happened message or a better customized error message if you have added code to trap and customize the errors (see Redirecting Errors to the Client instead of error_log for the latter case).
A much more user friendly approach is to confess to your users that you are doing some maintenance work and plead for patience, promising that the services will become fully functional in X minutes (it is worth keeping the promise!). There are a few ways to do that:
The first doesn't require messing with the server, and works when you have to disable a script and not a module! Just prepare a little script like:
/home/http/perl/construction.pl
----------------------------
#!/usr/bin/perl -wT
use strict;
use CGI;
my $q = new CGI;
print $q->header,
"Sorry, the service is down for maintenance. It will be back in about 5-15 minutes. Please, bear with us. Thank you!";
And if you now have to disable a script at /home/http/perl/chat.pl, just do:
% mv /home/http/perl/chat.pl /home/http/perl/chat.pl.orig
% ln -s /home/http/perl/construction.pl /home/http/perl/chat.pl
Of course your server configuration should allow symbolic links for this trick to work. Just make sure you have
Options FollowSymLinks
directive in your <Location>/<Directory> section configuration.
When done, it's easy to restore the previous setup. Just do:
% mv /home/http/perl/chat.pl.orig /home/http/perl/chat.pl
which overwrites the symbolic link. Apache will automatically detect the change and will use the moved script instead.
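If you disable scripts often, the two steps can be scripted. A minimal sketch, with disable_script and enable_script as made-up helper names; it keeps the original under the same .orig suffix used above:

```perl
use strict;
use warnings;

# Replace $script with a symlink to $placeholder, keeping the original
# as "$script.orig".
sub disable_script {
    my ($script, $placeholder) = @_;
    rename $script, "$script.orig"
        or die "Can't rename $script: $!\n";
    unless (symlink $placeholder, $script) {
        my $err = $!;
        rename "$script.orig", $script;    # put the original back
        die "Can't create symlink: $err\n";
    }
    return 1;
}

# Put the original back; rename() replaces the symlink in one step.
sub enable_script {
    my ($script) = @_;
    die "$script is not a symlink\n" unless -l $script;
    rename "$script.orig", $script
        or die "Can't restore $script: $!\n";
    return 1;
}
```

With the paths from the example, disable_script('/home/http/perl/chat.pl', '/home/http/perl/construction.pl') takes the chat script down and enable_script('/home/http/perl/chat.pl') brings it back.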
The second approach is to change the server configuration and configure whole directories to be handled by a Construction handler that you would write, e.g. if you write something like:
Construction.pm
---------------
package Construction;
use strict;
use CGI;
use Apache::Constants;
sub handler{
my $q = new CGI;
print $q->header,
"Sorry, the service is down for maintenance.
It will be back in about 5-15 minutes.
Please, bear with us.
Thank you!";
return OK;
}
1;
and put it in a directory that is in the server's @INC, then to take down all your scripts at /perl you would replace:
<Location /perl>
SetHandler perl-script
PerlHandler Apache::Registry
[snip]
</Location>
with
<Location /perl>
SetHandler perl-script
PerlHandler Construction
[snip]
</Location>
Now restart the server and your users will be happy to know that you are working on a much better version of the service and that it is worth their while to go read slashdot.org and come back in 10 minutes.
If you need to disable a location handled by some module, the second approach would work just as well.
For those who want to use a SUID startup script, here is an example. This script is SUID to root, and should be executable only by members of some special group at your site. Note the 10th line, which ``fixes an obscure error when starting apache/mod_perl'' by setting the real to the effective UID. As others have pointed out, it is the mismatch between the real and the effective UIDs that causes Perl to croak on the -e switch.
Note that you must be using a version of Perl that recognizes and emulates
the suid bits in order for this to work. The script will do different
things depending on whether it is named start_http,
stop_http or restart_http. You can use symbolic links for this purpose.
#!/usr/bin/perl
# These constants will need to be adjusted.
$PID_FILE = '/home/www/logs/httpd.pid';
$HTTPD = '/home/www/httpd -d /home/www';
# These prevent taint warnings while running suid
$ENV{PATH}='/bin:/usr/bin';
$ENV{IFS}='';
# This sets the real to the effective ID, and prevents
# an obscure error when starting apache/mod_perl
$< = $>;
$( = $) = 0; # set the group to root too
# Do different things depending on our name
($name) = $0 =~ m|([^/]+)$|;
if ($name eq 'start_http') {
system $HTTPD and die "Unable to start HTTP";
print "HTTP started.\n";
exit 0;
}
# extract the process id and confirm that it is numeric
$pid = `cat $PID_FILE`;
$pid =~ /(\d+)/ or die "PID $pid not numeric";
$pid = $1;
if ($name eq 'stop_http') {
kill 'TERM',$pid or die "Unable to signal HTTP";
print "HTTP stopped.\n";
exit 0;
}
if ($name eq 'restart_http') {
kill 'HUP',$pid or die "Unable to signal HTTP";
print "HTTP restarted.\n";
exit 0;
}
die "Script must be named start_http, stop_http, or restart_http.\n";
When you run your own development box, it's OK to start the webserver by hand when you need it. On a production system, there is a chance that the machine the server is running on will have to be rebooted. Once the reboot is completed, who is going to remember to start the server? It's an easy task to forget, and what happens if you aren't around when the machine is rebooted?
After the server installation is complete, it's important not to forget that you need to put a script that performs the server startup and shutdown into a standard system location, like /etc/rc.d/init.d or its equivalent (it varies from OS to OS). This is the directory from which all other daemons are started and shut down.
Generally the simplest solution is to copy the apachectl script there; you will find it in the same directory as the httpd executable after the Apache installation. If you have more than one Apache server, you have to put in a script for each one, of course renaming them on the way.
For example, on a RedHat Linux machine with a two-server setup, I have the following:
/etc/rc.d/init.d/httpd_docs
/etc/rc.d/init.d/httpd_perl
/etc/rc.d/rc3.d/S86httpd_docs -> ../init.d/httpd_docs
/etc/rc.d/rc3.d/S87httpd_perl -> ../init.d/httpd_perl
/etc/rc.d/rc6.d/K86httpd_docs -> ../init.d/httpd_docs
/etc/rc.d/rc6.d/K87httpd_perl -> ../init.d/httpd_perl
The scripts themselves reside in the init.d directory. The other directories hold symbolic links to these scripts, with numbers prepended to preserve a particular order of execution.
When a machine is booted and its runlevel is set to 3 (multiuser+network), Linux goes into /etc/rc.d/rc3.d/ and executes the scripts the symbolic links point to with the start argument, so when it sees S87httpd_perl, it executes:
/etc/rc.d/init.d/httpd_perl start
When the machine is being shut down, the scripts pointed to from the /etc/rc.d/rc6.d/ directory are executed, this time with the stop argument, like:
/etc/rc.d/init.d/httpd_perl stop
Most systems come with GUI utilities to automate the creation of the symbolic links. For example, RedHat Linux includes a control-panel utility, which among other utilities includes a RunLevel Manager that will help you to properly create the symbolic links. Of course, before you use it, you should put the apachectl or similar scripts into the init.d or equivalent directory.
With mod_perl many things can happen to your server. The worst one is the possibility that the server will die when you are not around. As with any other critical service, you need to run some kind of watchdog.
One simple solution is to use a slightly modified apachectl script, which I called apache.watchdog, and to put it into the crontab to be called every 30 minutes, or even every minute if it is critical to make sure the server is up all the time.
The crontab entry:
0,30 * * * * /path/to/the/apache.watchdog >/dev/null 2>&1
The script:
#!/bin/sh
# this script is a watchdog to see whether the server is online
# It tries to restart the server if it's
# down and sends an email alert to admin
# admin's email
EMAIL=webmaster@somewhere.far
#EMAIL=root@localhost
# the path to your PID file
PIDFILE=/usr/local/var/httpd_perl/run/httpd.pid
# the path to your httpd binary, including options if necessary
HTTPD=/usr/local/sbin/httpd_perl/httpd_perl
# check for pidfile
if [ -f $PIDFILE ] ; then
PID=`cat $PIDFILE`
if kill -0 $PID; then
STATUS="httpd (pid $PID) running"
RUNNING=1
else
STATUS="httpd (pid $PID?) not running"
RUNNING=0
fi
else
STATUS="httpd (no pid file) not running"
RUNNING=0
fi
if [ $RUNNING -eq 0 ]; then
echo "$0 $ARG: httpd not running, trying to start"
if $HTTPD ; then
echo "$0 $ARG: httpd started"
mail $EMAIL -s "$0 $ARG: httpd started" </dev/null >/dev/null 2>&1
else
echo "$0 $ARG: httpd could not be started"
mail $EMAIL -s "$0 $ARG: httpd could not be started" </dev/null >/dev/null 2>&1
fi
fi
Another approach, probably even more practical, is to use the cool LWP perl package to test the server by trying to fetch some document (script) served by the server. Why is it more practical? Because while the server can be up as a process, it can be stuck and not working. So failing to get the document will trigger a restart, and ``probably'' the problem will go away. (Just replace start with restart in the $restart_command below.)
Again we put this script into a crontab to call it every 30 minutes. Personally I call it every minute, to fetch some very light script. Why so often? If your server starts to spin and trash your disk space with multiple error messages, in 5 minutes you might run out of free space, which might bring your system to its knees. And chances are that no other child will be able to serve requests, since the system will be too busy writing to the error_log file. Think big -- if you are running a heavy service, which is very fast since you are running under mod_perl, one more request every minute will not be felt by the server at all.
So we end up with crontab entry:
* * * * * /path/to/the/watchdog.pl >/dev/null 2>&1
And the watchdog itself:
#!/usr/local/bin/perl -w
use strict;
use diagnostics;
use URI::URL;
use LWP::MediaTypes qw(media_suffix);
my $VERSION = '0.01';
use vars qw($ua $proxy);
$proxy = '';
require LWP::UserAgent;
use HTTP::Status;
###### Config ########
my $test_script_url = 'http://www.stas.com:81/perl/test.pl';
my $monitor_email = 'root@localhost';
my $restart_command = '/usr/local/sbin/httpd_perl/apachectl restart';
my $mail_program = '/usr/lib/sendmail -t -n';
######################
$ua = new LWP::UserAgent;
$ua->agent("$0/Stas " . $ua->agent);
# Uncomment the proxy if you use one
# $proxy="http://www-proxy.com";
$ua->proxy('http', $proxy) if $proxy;
# If checkurl() returns 1 the server is alive and there is nothing to do
exit 0 if checkurl($test_script_url);
# We have got the problem - the server seems to be down. Try to
# restart it.
my $status = system $restart_command;
# print "Status $status\n";
my $message = ($status == 0)
? "Server was down and successfully restarted!"
: "Server is down. Can't restart.";
my $subject = ($status == 0)
? "Attention! Webserver restarted"
: "Attention! Webserver is down. can't restart";
# email the monitoring person
my $to = $monitor_email;
my $from = $monitor_email;
send_mail($from,$to,$subject,$message);
# input: URL to check
# output: 1 if success, 0 for fail
#######################
sub checkurl{
my ($url) = @_;
# Fetch document
my $res = $ua->request(HTTP::Request->new(GET => $url));
# Check the result status
return 1 if is_success($res->code);
# failed
return 0;
} # end of sub checkurl
# sends email about the problem
#######################
sub send_mail{
my($from,$to,$subject,$messagebody) = @_;
open MAIL, "|$mail_program"
or die "Can't open a pipe to a $mail_program :$!\n";
print MAIL <<__END_OF_MAIL__;
To: $to
From: $from
Subject: $subject

$messagebody
__END_OF_MAIL__
close MAIL;
}
Often while developing new code, you will want to run the server in single process mode. See Sometimes it works Sometimes it does Not and Names collisions with Modules and libs. Running in single process mode inhibits the server from ``daemonizing'', allowing you to run it more easily under debugger control.
% /usr/local/sbin/httpd_perl/httpd_perl -X
When you execute the above, the server will run in the foreground of the shell you have called it from, so to kill it you just press Ctrl-C.
Note that in -X mode the server will run very slowly while fetching images. If you use
Netscape while your server is running in single-process mode, HTTP's KeepAlive feature gets in the way. Netscape tries to open multiple connections and
keep them open. Because there is only one server process listening, each
connection has to time out before the next succeeds. Turn off
KeepAlive in httpd.conf to avoid this effect while developing, or press STOP after a few seconds (assuming you use the image size params, so that
Netscape will be able to render the rest of the page).
In addition, you should know that when running with -X you will not see any of the control messages that the parent server normally
writes to the error_log (like ``server started'', ``server stopped'', etc.).
Since
httpd -X causes the server to handle all requests itself, without forking any
children, there is no controlling parent to write status messages.
If you are the only developer working on a specific server:port, you have no problems, since you have complete control over the server. However, many times you have a group of developers who need to develop their own mod_perl scripts concurrently. This means that each one will want to have control over the server: to kill it, to run it in single server mode, to restart it again, etc., as well as control over the location of the log files and other configuration settings like MaxClients. You can work around this problem by preparing a few httpd.conf files and forcing each developer to use:
httpd_perl -f /path/to/httpd.conf
I have approached it in another way. I have used the -Dparameter
startup option of the server. I call my version of the server
% httpd_perl -Dsbekman
In httpd.conf I wrote:
# Personal development Server for sbekman
# sbekman uses the server running on port 8000
<IfDefine sbekman>
Port 8000
PidFile /usr/local/var/httpd_perl/run/httpd.pid.sbekman
ErrorLog /usr/local/var/httpd_perl/logs/error_log.sbekman
Timeout 300
KeepAlive On
MinSpareServers 2
MaxSpareServers 2
StartServers 1
MaxClients 3
MaxRequestsPerChild 15
</IfDefine>
# Personal development Server for userfoo
# userfoo uses the server running on port 8001
<IfDefine userfoo>
Port 8001
PidFile /usr/local/var/httpd_perl/run/httpd.pid.userfoo
ErrorLog /usr/local/var/httpd_perl/logs/error_log.userfoo
Timeout 300
KeepAlive Off
MinSpareServers 1
MaxSpareServers 2
StartServers 1
MaxClients 5
MaxRequestsPerChild 0
</IfDefine>
What we have achieved with this technique: full control over start/stop, number of children, a separate error log file, and port selection. This saves me from getting called every few minutes - ``Stas, I'm going to restart the server''.
To make things even easier (with the above technique you have to discover
the PID of your parent httpd_perl process, written in
/usr/local/var/httpd_perl/run/httpd.pid.userfoo), we change the
apachectl script to do the work for us. We make a copy for each developer called apachectl.username and we change 2 lines in the script:
PIDFILE=/usr/local/var/httpd_perl/run/httpd.pid.sbekman
HTTPD='/usr/local/sbin/httpd_perl/httpd_perl -Dsbekman'
Of course you might think you can use only one control file and find out who is calling by checking the uid, but since you have to be root to start the server it is not so simple.
The last thing was to give developers an option to run in single process mode with:
/usr/local/sbin/httpd_perl/httpd_perl -Dsbekman -X
In addition to making life easier, we decided to use relative links everywhere in the static docs (including the calls to CGIs). You may ask how a relative link will get to the right server. Very simple - we have used mod_rewrite to solve our problems:
In access.conf of the httpd_docs server we have the following code (you have to configure your httpd_docs server with
--enable-module=rewrite):
# sbekman's server
# port = 8000
RewriteCond %{REQUEST_URI} ^/(perl|cgi-perl)
RewriteCond %{REMOTE_ADDR} 123.34.45.56
RewriteRule ^(.*) http://nowhere.com:8000/$1 [R,L]
# userfoo's server
# port = 8001
RewriteCond %{REQUEST_URI} ^/(perl|cgi-perl)
RewriteCond %{REMOTE_ADDR} 123.34.45.57
RewriteRule ^(.*) http://nowhere.com:8001/$1 [R,L]
# all the rest
RewriteCond %{REQUEST_URI} ^/(perl|cgi-perl)
RewriteRule ^(.*) http://nowhere.com:81/$1 [R]
where the IP numbers are the IPs of the developers' client machines (where they
are running their web browsers). (I have tried to use
REMOTE_USER, since we have all the users authenticated, but it did not work for me.)
So if I have a relative URL like /perl/test.pl written in some html, or even http://www.nowhere.com/perl/test.pl, in my case (the user at sbekman's machine) it will be redirected by httpd_docs to
http://www.nowhere.com:8000/perl/test.pl.
Of course you have another problem: the CGI generates some html, which should be called again. If it generates a URL with a hard coded PORT the above scheme will not work. There are 2 solutions:
First, generate a relative URL so that it will reuse the technique above, with the
redirect (which is transparent for the user). But it will not work if you have
something to POST (the redirect loses all the data!).
Second, use a general configuration module which generates a correct full
URL according to REMOTE_USER, so if $ENV{REMOTE_USER} eq
'sbekman', I return http://www.nowhere.com:8000/perl/ as
cgi_base_url. Again this will work if the user is authenticated.
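A minimal sketch of such a configuration module follows. The package name My::DevConfig, the %PORT table and the fallback to port 81 are assumptions made for illustration; only the user names, the ports and the cgi_base_url name come from the text above.

```perl
package My::DevConfig;   # hypothetical module name
use strict;

# developer => port of the personal server (ports as in httpd.conf above)
my %PORT = (
    sbekman => 8000,
    userfoo => 8001,
);
my $DEFAULT_PORT = 81;   # assumed: the shared mod_perl server

# returns the base URL for generated links, chosen by REMOTE_USER
sub cgi_base_url {
    my $user = defined $ENV{REMOTE_USER} ? $ENV{REMOTE_USER} : '';
    my $port = exists $PORT{$user} ? $PORT{$user} : $DEFAULT_PORT;
    return "http://www.nowhere.com:$port/perl/";
}

1;
```

A script would then build its links as My::DevConfig::cgi_base_url() . "test.pl" instead of hard coding the port.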
All this is good for development. It is better to use full URLs in
production, since if you have a static form whose Action is relative, and the static document is located on another server, pressing the
form's submit button will cause a redirect to the mod_perl server, and all the form's
data will be lost during the redirect.
Many times you start off debugging your script by running it from your
favorite shell. Sometimes you encounter a very weird situation where the script
runs from the shell but dies when called as a CGI. The real problem lies in
the difference between the environment that is being used by your server
and your shell. An example can be a different perl path, or having a PERL5LIB env variable which includes paths that are not in the @INC of the perl compiled with the mod_perl server and configured during the
startup.
The best debugging approach is to write a wrapper that emulates the exact
environment of the server, by first deleting environment variables like PERL5LIB and calling the same perl binary that is being used by the server. Next,
set the environment identical to the server's by copying the perl run
directives from the server startup and configuration files. It will also allow
you to remove completely the first line of the script, since mod_perl
skips it and the wrapper knows how to call the script.
Below is an example of such a script. Note that we force the -Tw
switches when we call the real script. (I have also added the ability to pass
params, which will not happen when you call the cgi from the web.)
#!/usr/local/bin/perl -w
# This is a wrapper example
# It simulates the web server environment by setting the @INC and other
# stuff, so what will run under this wrapper will run under web and
# vice versa.
#
# Usage: wrap.pl some_cgi.pl
#
BEGIN{
  use vars qw($basedir);
  $basedir = "/usr/local";
  # we want to make a complete emulation,
  # so we must remove the user's environment
  @INC = ();
  # local perl libs (qw() does not interpolate, so prepend $basedir with map)
  push @INC,
    map { "$basedir/$_" }
    qw(lib/perl5/5.00502/aix
       lib/perl5/5.00502
       lib/perl5/site_perl/5.005/aix
       lib/perl5/site_perl/5.005
      );
}
use strict;
use File::Basename;
# process the passed params
my $cgi = shift || '';
my @params = @ARGV;
die "Usage:\n\t$0 some_cgi.pl\n" unless $cgi;
# perl's -I switch takes one directory, so build one switch per @INC entry
# (PERL5LIB cannot be used here because it is ignored in taint mode)
my @inc_switches = map { "-I$_" } @INC;
# if the path includes the directory
# we extract it and chdir there
if ($cgi =~ m|/|) {
  my $dirname = dirname($cgi);
  chdir $dirname or die "Can't chdir to $dirname: $! \n";
  $cgi = basename($cgi);
}
# run the cgi from the script's directory
# Note that we invoke warnings and Taint mode ON!!!
system "$basedir/bin/perl", @inc_switches, '-Tw', $cgi, @params;
A little bit off topic, but good to know and use with mod_perl, where your error_log can grow at a rate of 10-100Mb per day if your scripts spit out lots of warnings...
To rotate the logs do:
mv access_log access_log.renamed
kill -HUP `cat httpd.pid`
sleep 10; # allow some children to complete requests and logging
# now it's safe to use access_log.renamed
.....
The effect of SIGUSR1 and SIGHUP is detailed in: http://www.apache.org/docs/stopping.html .
I use this script:
#!/usr/local/bin/perl -Tw
# this script does a log rotation. Called from crontab.
use strict;
$ENV{PATH}='/bin:/usr/bin';
### configuration
my @logfiles = qw(access_log error_log);
umask 0;
my $server = "httpd_perl";
my $logs_dir = "/usr/local/var/$server/logs";
my $restart_command = "/usr/local/sbin/$server/apachectl restart";
my $gzip_exec = "/usr/bin/gzip";
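my ($sec,$min,$hour,$mday,$mon,$year) = localtime(time);
# localtime() returns the year minus 1900 and a zero-based month
my $time = sprintf "%0.4d.%0.2d.%0.2d-%0.2d.%0.2d.%0.2d", $year+1900,++$mon,$mday,$hour,$min,$sec;
# with $^I set, opening a file through <> renames the original to
# file.$time and starts a new empty file under the old name
$^I = ".".$time;
# rename log files
chdir $logs_dir or die "Can't chdir to $logs_dir: $!\n";
@ARGV = @logfiles;
while (<>) {
  close ARGV;
}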
# now restart the server so the logs will be restarted
system $restart_command;
# compress log files
foreach (@logfiles) {
system "$gzip_exec $_.$time";
}
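The timestamp suffix can also be produced with strftime() from the standard POSIX module, which takes care of the year and month offsets that localtime() returns. A small sketch, where log_suffix() is a name invented for this example:

```perl
use strict;
use POSIX qw(strftime);

# hypothetical helper: builds a suffix like 1999.12.18-23.59.59
# from a localtime()/gmtime() style list (defaults to the current local time)
sub log_suffix {
    my @t = @_ ? @_ : localtime;
    return strftime("%Y.%m.%d-%H.%M.%S", @t);
}

print "error_log.", log_suffix(), "\n";
```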
Randal L. Schwartz contributed this:
Cron fires off setuid script called log-roller that looks like this:
#!/usr/bin/perl -Tw
use strict;
use File::Basename;

$ENV{PATH} = "/usr/ucb:/bin:/usr/bin";

my $ROOT = "/WWW/apache";              # names are relative to this
my $CONF = "$ROOT/conf/httpd.conf";    # master conf
my $MIDNIGHT = "MIDNIGHT";             # name of program in each logdir

my ($user_id, $group_id, $pidfile);    # will be set during parse of conf
die "not running as root" if $>;

chdir $ROOT or die "Cannot chdir $ROOT: $!";

my %midnights;
open CONF, "<$CONF" or die "Cannot open $CONF: $!";
while (<CONF>) {
  if (/^User (\w+)/i) {
    $user_id = getpwnam($1);
    next;
  }
  if (/^Group (\w+)/i) {
    $group_id = getgrnam($1);
    next;
  }
  if (/^PidFile (.*)/i) {
    $pidfile = $1;
    next;
  }
  next unless /^ErrorLog (.*)/i;
  my $midnight = (dirname $1)."/$MIDNIGHT";
  next unless -x $midnight;
  $midnights{$midnight}++;
}
close CONF;

die "missing User definition" unless defined $user_id;
die "missing Group definition" unless defined $group_id;
die "missing PidFile definition" unless defined $pidfile;

open PID, $pidfile or die "Cannot open $pidfile: $!";
<PID> =~ /(\d+)/;
my $httpd_pid = $1;
close PID;
die "missing pid definition" unless defined $httpd_pid and $httpd_pid;
kill 0, $httpd_pid or die "cannot find pid $httpd_pid: $!";

for (sort keys %midnights) {
  defined(my $pid = fork) or die "cannot fork: $!";
  if ($pid) {
    ## parent:
    waitpid $pid, 0;
  } else {
    my $dir = dirname $_;
    ($(,$)) = ($group_id,$group_id);
    ($<,$>) = ($user_id,$user_id);
    chdir $dir or die "cannot chdir $dir: $!";
    exec "./$MIDNIGHT";
    die "cannot exec $MIDNIGHT: $!";
  }
}

kill 1, $httpd_pid or die "Cannot sighup $httpd_pid: $!";
And then individual MIDNIGHT scripts can look like this:
#!/usr/bin/perl -Tw
use strict;

die "bad guy" unless getpwuid($<) =~ /^(root|nobody)$/;
my @LOGFILES = qw(access_log error_log);
umask 0;
$^I = ".".time;
@ARGV = @LOGFILES;
while (<>) {
  close ARGV;
}
Can you spot the security holes? Our trusted user base can't or won't. :) But these shouldn't be used in hostile situations.
Sometimes calling an undefined subroutine in a module can cause a tight loop that consumes all memory. Here is a way to catch such errors. Define an autoload subroutine:
sub UNIVERSAL::AUTOLOAD {
  my $class = shift;
  # perl looks up DESTROY for every object, so do not report it
  return if $UNIVERSAL::AUTOLOAD =~ /::DESTROY$/;
  my ($package, $file, $line) = caller;
  warn "$class can't $UNIVERSAL::AUTOLOAD, called at $file line $line\n";
}
It will produce a nice error in error_log, giving the line number of the call and the name of the undefined subroutine.
Sometimes an error happens and causes the server to write millions of lines
into your error_log file, and in a few minutes brings your server to its knees. For example,
I sometimes get the error Callback called
exit showing up in my error_log file many times. The error_log file grows to 300 Mbytes in size in a few minutes. You should run a cron
job to make sure this does not happen, and to take care of it if it does.
Andreas J. Koenig is running this shell script every minute:
S=`ls -s /usr/local/apache/logs/error_log | awk '{print $1}'`
if [ "$S" -gt 100000 ] ; then
mv /usr/local/apache/logs/error_log /usr/local/apache/logs/error_log.old
/etc/rc.d/init.d/httpd restart
date | /bin/mail -s "error_log $S kB on inx" myemail@domain.com
fi
Note that the script renames the error_log before restarting. Without the mv it would trigger a restart every minute, since once the logfile grows past the limit (100000 blocks as reported by ls -s, roughly 100 Mb with 1 kB blocks) it stays at that size unless you remove or rename it before you restart. On my server I run a watchdog every five minutes which restarts the server if it gets stuck (it always works, since when some modperl child process goes wild the I/O it causes is so heavy that its sibling processes cannot serve requests normally). See Monitoring the Server for more hints.
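For those who prefer Perl to shell, the size test can be sketched with the -s file test, which returns the size in bytes and so avoids depending on the block size used by ls -s. The log_too_big() name and the 100 Mb limit are assumptions for illustration; the rename, restart and mail steps are left out.

```perl
use strict;

# hypothetical helper: true if $file exists and is larger than $limit bytes
sub log_too_big {
    my ($file, $limit) = @_;
    my $size = -s $file || 0;     # missing or empty file counts as 0 bytes
    return $size > $limit ? 1 : 0;
}

my $error_log = '/usr/local/apache/logs/error_log';
my $limit     = 100 * 1024 * 1024;   # assumed limit: 100 Mb
print "error_log needs attention\n" if log_too_big($error_log, $limit);
```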
Also check out the daemontools from ftp://koobera.math.uic.edu/www/daemontools.html :
cyclog writes a log to disk. It automatically synchronizes the log every 100KB (by default) to guarantee data integrity after a crash. It automatically rotates the log to keep it below 1MB (by default). If the disk fills up, cyclog pauses and then tries again, without losing any data.
Written by Stas Bekman.
Last Modified at 12/18/1999
Use of the Camel for Perl is a trademark of O'Reilly & Associates, and is used by permission.
As there is always more than one way to do it, I'm tempted to believe one must be the best. Hardly ever am I right.
This chapter has been contributed to the Guide by Andreas Koenig. You will
find the references and other related info at the bottom of this page. I'll
try to keep it in sync with the Master version which resides on CPAN. If
in doubt -- always check CPAN for
Apache::correct_headers.
If you have any questions regarding this specific document only, please refer to Andreas, since he is the guru on this subject. On any other matter please contact the mod_perl mailing list.
Dynamic Content is dynamic, after all, so why would anybody care about HTTP
headers? Header composition is an often neglected task in the CGI world.
Because pages are generated dynamically, you might believe that pages
without a Last-Modified header are fine, and that an
If-Modified-Since header in the browser's request can go by unnoticed. This laissez-faire
principle gets in the way when you try to establish a server that is
entirely driven by dynamic components and the number of hits is
significant.
If the number of hits is not significant, don't bother to read this document.
If the number of hits is significant, you might want to consider what cache-friendliness means (you may also want to read [4]) and how you can cooperate with caches to increase the performance of your site. Especially if you use a squid in accelerator mode (helpful hints for squid, see [1]), you will have a strong motivation to cooperate with it. This document may help you to do it correctly.
The HTTP standard (v 1.1 is specified in [3], v 1.0 in [2]) describes lots of headers. In this document, we only discuss those headers which are most relevant to caching.
I have grouped the headers in three groups: date headers, content headers, and the special Vary header.
Section 14.18 of the HTTP standard deals with the circumstances under
which you must or must not send a Date header. For almost everything a normal mod_perl user is doing, a Date header needs to be generated. But the mod_perl programmer doesn't have to
worry about this header, since the apache server guarantees that it is
always sent.
In http_protocol.c the Date header is set according to
$r->request_time. A modperl script can read, but not change,
$r->request_time.
Section 14.29 of the HTTP standard deals with this. The
Last-Modified header is mostly used as a so-called weak validator. I'm citing two
sentences from the HTTP specs:
A validator that does not always change when the resource changes is a "weak validator."
One can think of a strong validator as one that changes whenever the bits of an entity changes, while a weak value changes whenever the meaning of an entity changes.
This tells us that we should consider the semantics of the page we are generating and not the date when we are running. The question is, when did the meaning of this page change last time? Let's imagine the document in question is a text-to-gif renderer that takes as input a font to use, background and foreground colors, and a string to render. Although the actual image is created on-the-fly, the semantics of the page are determined by when the script was last changed, right?
Actually, there are a few more things relevant: the semantics also change a
little when you update one of the fonts that may be used or when you update
your ImageMagick or whatever program. It's something you should consider, if you want to get
it right.
If you have several components that compose a page, you should ask the question for all components, when they changed their semantic behaviour last time. And then pick the maximum of those times.
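Picking the maximum can be sketched as follows. The sketch assumes that every component is a file on disk, so that its modification time stands in for the time its semantics last changed; latest_mtime() and the file names in the usage lines are made up for the example.

```perl
use strict;
use List::Util qw(max);

# hypothetical helper: returns the newest modification time (UNIX time)
# among the given files, or undef if none of them can be stat()ed
sub latest_mtime {
    my @mtimes = grep { defined } map { (stat $_)[9] } @_;
    return @mtimes ? max(@mtimes) : undef;
}

# usage: the script itself plus assumed font and template files
my $mtime = latest_mtime($0, 'fonts/helvetica.ttf', 'templates/page.tmpl');
print "Last-Modified candidate: ", scalar gmtime($mtime), " GMT\n"
    if defined $mtime;
```

In a handler the resulting UNIX time would be the value to hand to the Last-Modified machinery.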
mod_perl offers you two convenient methods to deal with this header:
update_mtime and set_last_modified. Both these two and several more methods
are not available in the normal mod_perl environment but get added silently
when you require Apache::File. As of this writing,
Apache::File comes without a manpage, so you have to read about it in Chapter 9 of [5].
update_mtime() takes a UNIX time as argument and sets Apache's
request structure finfo.st_mtime to this value. It does so only when the
argument is greater than an already stored finfo.st_mtime.
set_last_modified() sets the outgoing header Last-Modified to the string that corresponds to the stored finfo.st_mtime. By passing a
UNIX time to set_last_modified(), mod_perl calls
update_mtime() with this argument first.
use Apache::File;
use Date::Parse;
# Date::Parse parses RCS format, Apache::Util::parsedate doesn't
$Mtime ||=
Date::Parse::str2time(substr q$Date: 1999/08/14 06:21:32 $, 6);
$r->set_last_modified($Mtime);
Section 14.21 of the HTTP standard deals with the Expires
header. The purpose of the Expires header is to determine a point in time after which the document should be
considered out of date (stale). Don't confuse this with the very different
meaning of the
Last-Modified header. The Expires header is useful to avoid unnecessary validation from now on until the
document expires, and it helps the recipients to clean up their stored
documents. A sentence from the HTTP standard:
The presence of an Expires field does not imply that the original resource will change or cease to exist at, before, or after that time.
So think before you set up a time when you believe a resource should be regarded as stale. Most of the time I can determine an expected lifetime from ``now'', that is the time of the request. I would not recommend hardcoding the date of expiry, because when you forget that you did that and the date arrives, you will serve ``already expired'' documents that cannot be cached at all by anybody. If you believe a resource will never expire, read this quote from the HTTP specs:
To mark a response as "never expires," an origin server sends an Expires date approximately one year from the time the response is sent. HTTP/1.1 servers SHOULD NOT send Expires dates more than one year in the future.
Now the code for the mod_perl programmer that wants to expire a document half a year from now:
$r->header_out('Expires',
HTTP::Date::time2str(time + 180*24*60*60));
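HTTP::Date::time2str() returns the date in the fixed-length RFC 1123 format that HTTP expects, always in GMT. If you are curious what that involves, the following sketch builds the same kind of string by hand; http_date() is a name made up for this example and is not part of any module.

```perl
use strict;

my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# hypothetical helper: formats a UNIX time as an HTTP date in GMT
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT",
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec;
}

# a document that expires half a year (180 days) from now
my $lifetime = 180 * 24 * 60 * 60;
print "Expires: ", http_date(time + $lifetime), "\n";
```

The day and month names are taken from fixed English tables on purpose, since HTTP dates must not follow the server's locale.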
A very handy alternative to this computation is available in HTTP 1.1, the
cache control mechanism. Instead of setting the Expires header you can specify a delta value in a Cache-Control header. You can do that by running just
$r->header_out('Cache-Control', "max-age=" . 180*24*60*60);
which is, of course, much cheaper than the above because perl computes the value only once at compile time and optimizes it away as a constant.
As this alternative is only available in HTTP 1.1 and old cache servers may
not understand this header, it is advisable to send both headers. In this
case the Cache-Control header takes precedence, so that the Expires header is ignored on HTTP 1.1 compliant servers. Or you could go with an
if/else clause:
if ($r->protocol =~ /(\d\.\d)/ && $1 >= 1.1){
$r->header_out('Cache-Control', "max-age=" . 180*24*60*60);
} else {
$r->header_out('Expires',
HTTP::Date::time2str(time + 180*24*60*60));
}
If you restart your apache regularly, I'd save the Expires
header in a global variable. Oh, well, this is probably over-engineered
now.
If people are determined that their document shouldn't be cached, here is
the easy way to set a suitable Expires header...
The call $r->no_cache(1) will cause apache to generate an
Expires header with the same content as the Date header in the response, so that
the document ``expires immediately''. Don't set
Expires with $r->header_out if you use $r->no_cache, because header_out takes precedence. The problem that remains is broken
browsers that ignore Expires headers.
Currently, to avoid caching altogether,
my $headers = $r->headers_out;
$headers->{'Pragma'} = $headers->{'Cache-control'} = 'no-cache';
$r->no_cache(1);
works with the major browsers.
You are most probably familiar with Content-Type. Sections 3.7, 7.2.1 and 14.17 of the HTTP specs deal with the details.
Mod_perl has the content_type() method to deal with this
header, as in
$r->content_type("image/png");
Content-Type SHOULD be included in all messages according to the specs, and apache will
generate one if you don't. It will be whatever is specified in the relevant DefaultType configuration directive or
text/plain if none is active.
According to section 14.13 of the HTTP specs, the Content-Length header is the number of octets
in the body of a message. If it can be determined prior to sending, it can
be very useful for several reasons to include it. The most important reason
is that keepalive requests only work with
responses that contain a
Content-Length header. In mod_perl you can say
$r->header_out('Content-Length', $length);
If you use Apache::File, you get the additional set_content_length method for the Apache class
which is a bit more efficient than the above. You can then say:
$r->set_content_length($length);
The Content-Length header can have an important impact on caches by invalidating cache entries
as the following citation of the specs explains:
The response to a HEAD request MAY be cacheable in the sense that the information contained in the response MAY be used to update a previously cached entity from that resource. If the new field values indicate that the cached entity differs from the current entity (as would be indicated by a change in Content-Length, Content-MD5, ETag or Last-Modified), then the cache MUST treat the cache entry as stale.
So be careful to never send a wrong Content-Length, be it in a GET or in a HEAD request.
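The header counts octets, not characters. In a perl with Unicode strings (an assumption; this postdates the perl versions discussed in the rest of this document) the two can differ, so one way to get the right number is to encode the body first. A sketch using the core Encode module, with body_octets() being an invented name and UTF-8 an assumed output encoding:

```perl
use strict;
use Encode qw(encode);

# hypothetical helper: number of octets the body occupies when sent as UTF-8
sub body_octets {
    my ($body) = @_;
    return length encode('UTF-8', $body);
}

my $body = "caf\x{e9} \x{263a}\n";    # 7 characters, two of them non-ASCII
print "characters: ", length($body), "\n";
print "octets:     ", body_octets($body), "\n";
```

The value returned by the helper, not length($body), is what belongs in the Content-Length header when the body is sent in that encoding.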
An Entity Tag is a validator that can be used instead of or in addition to the Last-Modified header. An entity tag is a quoted string that has the property to identify
different versions of a particular resource. An entity tag can be added to
the response headers like so:
$r->header_out("ETag","\"$VERSION\"");
Note: mod_perl offers the Apache::set_etag() method if you have loaded Apache::File. It is strongly recommended to not use this method unless you know what
you are doing. set_etag() is expecting that it is used in
conjunction with a static request for a file on disk that has been
stat()ed in the course of the current request. It is
inappropriate and dangerous to use it for dynamic content.
By sending an entity tag you promise to the recipient, that you will not
send the same ETag for the same resource again unless the content is equal to the one you are
sending now (see below for what equality means).
The pros and cons of using entity tags are discussed in section 13.3 of the HTTP specs. For us mod_perl programmers that discussion can be summed up as follows:
There are strong and weak validators. Strong validators change whenever a single bit changes in the response. Weak validators change when the meaning of the response changes. Strong validators are needed for caches to allow for sub-range requests. Weak validators allow a more efficient caching of equivalent objects. Algorithms like MD5 or SHA are good strong validators, but what we usually want, when we want to take advantage of caching, is a good weak validator.
A Last-Modified time, when used as a validator in a request, can be strong or weak,
depending on a couple of rules. Please refer to section 13.3.3 of the HTTP
standard to understand these rules. This is mostly relevant for range
requests as this citation of section 14.27 explains:
If the client has no entity tag for an entity, but does have a Last-Modified date, it MAY use that date in a If-Range header.
But it is not limited to range requests. Section 13.3.1 succinctly states that
The Last-Modified entity-header field value is often used as a cache validator.
The fact that a Last-Modified date may be used as a strong validator can be pretty disturbing if we are
in fact changing our output slightly without changing the semantics of the
output. To prevent such kind of misunderstanding between us and the cache
servers in the response chain, we can send a weak validator in an ETag
header. This is possible because the specs say:
If a client wishes to perform a sub-range retrieval on a value for which it has only a Last-Modified time and no opaque validator, it MAY do this only if the Last-Modified time is strong in the sense described here.
In other words: by sending them an ETag that is marked as weak we prevent them from using the Last-Modified header as a
strong validator.
An ETag value is marked as a weak validator by prepending the string W/ to the quoted string, otherwise it is strong. In perl this would mean
something like this:
$r->header_out('ETag',"W/\"$VERSION\"");
Consider carefully, which string you choose to act as a validator. You are left alone with this decision because...
... only the service author knows the semantics of a resource well enough to select an appropriate cache validation mechanism, and the specification of any validator comparison function more complex than byte-equality would open up a can of worms. Thus, comparisons of any other headers (except Last-Modified, for compatibility with HTTP/1.0) are never used for purposes of validating a cache entry.
If you are composing a message from multiple components, it may be necessary to combine some kind of version information for all components into a single string.
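One way to combine the version information is sketched here, under the assumption that each component can supply some version string (an RCS revision, a modification time, and so on). The component_etag() name and the sample version strings are invented for this example; Digest::MD5 is only used to squeeze the joined versions into a short opaque string, and the result is marked weak because it tracks versions rather than bytes.

```perl
use strict;
use Digest::MD5 qw(md5_hex);

# hypothetical helper: builds a weak entity tag from the version
# strings of all components that make up the response
sub component_etag {
    my @versions = @_;
    # join with a separator that cannot occur in the versions themselves,
    # so that ('1.1', '2') and ('1', '1.2') produce different tags
    my $digest = md5_hex(join "\0", @versions);
    return qq{W/"$digest"};
}

# sample version strings, made up for the example
my $etag = component_etag('script 1.12', 'font 2.0', 'renderer 4.2');
print "ETag: $etag\n";
```

The returned string already contains the W/ prefix and the quotes, so it can be used as the ETag header value as it is.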
If you are producing relatively big documents or contents that do not change frequently, you most likely will prefer a strong entity tag, thus giving caches a chance to transfer the document in chunks. (Anybody in the mood to add a chapter about ranges to this document?)
A particularly wonderful but unfortunately not yet widely supported feature
that was introduced with HTTP 1.1 is content negotiation. The probably most
popular usage scenario of content negotiation is language negotiation. A
user specifies in his browser preferences the languages he understands and
how well he understands them. The browser includes these settings in an Accept-Language header when it sends the request to the server and the server then chooses
among several available representations of the document the one that fits
the user's preferences best. Content negotiation is not limited to
language. Citing the specs:
HTTP/1.1 includes the following request-header fields for enabling server-driven negotiation through description of user agent capabilities and user preferences: Accept (section 14.1), Accept- Charset (section 14.2), Accept-Encoding (section 14.3), Accept- Language (section 14.4), and User-Agent (section 14.43). However, an origin server is not limited to these dimensions and MAY vary the response based on any aspect of the request, including information outside the request-header fields or within extension header fields not defined by this specification.
In order to signal to the recipient that content negotiation has been used
to determine the best available representation for a given request, the
server must include a Vary header that tells the recipient, which of the request headers have been
used to determine it. So an answer may be generated like so:
$r->header_out('Vary', join ", ", 'accept', 'accept-language',
'accept-encoding', 'user-agent');
While this may be in the header of a very cool page that greets the user with something like
Hallo Kraut, Dein NutScrape versteht zwar PNG aber leider kein GZIP.
it has the side effect of being expensive for a caching proxy. As of this writing, squid (version 2.1PATCH2) does not cache resources at all that come with a Vary header. So unless you find a clever workaround, you won't enjoy your squid accelerator for these documents :-(
Section 13.11 of the specs states that the only two cachable methods are GET and HEAD.
Among the above recommended headers, the date-related ones (Date,
Last-Modified, and Expires/Cache-Control) are usually easy to produce and thus should be computed for HEAD requests just the same as for GET requests.
The Content-Type and Content-Length headers should be exactly the same as would be supplied to the
corresponding GET request. But as it can be expensive to compute them, they can just as well
be omitted. There is nothing in the specs that forces you to compute them.
What is important for the mod_perl programmer is that the response to a HEAD request MUST NOT contain a message-body. The code in your mod_perl handler
might look like this:
# compute all headers that are easy to compute
if ( $r->header_only ){ # currently equivalent to $r->method eq "HEAD"
$r->send_http_header;
return OK;
}
If you are running a squid accelerator, it will be able to handle the whole HEAD request for you, but under some circumstances it may not be allowed to do
so.
The response to a POST request is not cachable due to an underspecification in the HTTP standards.
Section 13.4 does not forbid caching of responses to POST requests, but no other part of the HTTP standard explains how caching of POST requests could be implemented, so we are in a vacuum here and all existing
caching servers therefore refuse to implement caching of POST requests. This may change if somebody does the footwork of defining the
semantics for cache operations on POST. Note that some browsers with their more aggressive caching do implement
caching of POST requests.
Note: If you are running a squid accelerator, you should be aware that it
accelerates outgoing traffic, but does not bundle incoming traffic, so if
you have long POST requests, squid doesn't buy you anything. So always
consider using a GET instead of a POST if possible.
A normal GET is what we usually write our mod_perl programs for. Nothing special about
it. We send our headers followed by the body.
But there is a certain case that needs a workaround to achieve better cacheability. We need to deal with the ``?'' in the rel_path part of the requested URI. Section 13.9 specifies that
... caches MUST NOT treat responses to such URIs as fresh unless the server provides an explicit expiration time. This specifically means that responses from HTTP/1.0 servers for such URIs SHOULD NOT be taken from a cache.
You might be tempted to believe that, since we are using HTTP 1.1 and sending an
explicit expiration time, we're on the safe side. Unfortunately reality
is a little bit different. It has been a bad habit for quite a long time to
misconfigure cache servers such that they treat all
GET requests containing a question mark as uncacheable. People even used to
mark everything as uncacheable that contained the string
cgi-bin.
To work around this bug in the heads, I have dropped the habit of calling my
CGI directories cgi-bin and I have written the following handler, which lets me work with CGI-like
query strings without rewriting the software that deals with them, namely Apache::Request or CGI.pm.
sub handler {
my($r) = @_;
my $uri = $r->uri;
if ( my($u1,$u2) = $uri =~ / ^ ([^?]+?) ; ([^?]*) $ /x ) {
$r->uri($u1);
$r->args($u2);
} elsif ( my($u1,$u2) = $uri =~ m/^(.*?)%3[Bb](.*)$/ ) {
# protect against old proxies that escape volens nolens
# (see HTTP standard section 5.1.2)
$r->uri($u1);
$u2 =~ s/%3B/;/gi;
$u2 =~ s/%26/;/gi; # &
$u2 =~ s/%3D/=/gi;
$r->args($u2);
}
DECLINED;
}
This handler must be installed as a PerlPostReadRequestHandler.
The handler takes any request that contains no question mark but one or more semicolons, and interprets the first semicolon as a question mark and everything after it as the query string. You can now exchange the request
http://foo.com/query?BGCOLOR=blue;FGCOLOR=red
with
http://foo.com/query;BGCOLOR=blue;FGCOLOR=red
Thus it allows the co-existence of queries from ordinary forms that are being processed by a browser and predefined requests for the same resource. It has one minor bug: Apache doesn't allow percent-escaped slashes in such a querystring. So you must write
http://foo.com/query;BGCOLOR=blue;FGCOLOR=red;FONT=/font/bla
and must not say
http://foo.com/query;BGCOLOR=blue;FGCOLOR=red;FONT=%2Ffont%2Fbla
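To see what the first branch of the handler does to such a URI, the matching step can be tried outside the server. The following sketch reuses the handler's regular expression in a stand-alone function; the name split_semicolon_uri is made up for this illustration and is not part of any module.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the first branch of the handler:
# returns (path, query string), treating the first semicolon as the
# start of the query string. URIs without a semicolon, or with a
# question mark, come back unchanged with an undefined query string.
sub split_semicolon_uri {
    my ($uri) = @_;
    if ( my ($path, $args) = $uri =~ / ^ ([^?]+?) ; ([^?]*) $ /x ) {
        return ($path, $args);
    }
    return ($uri, undef);
}

my ($path, $args) = split_semicolon_uri('/query;BGCOLOR=blue;FGCOLOR=red');
print "$path\n";   # /query
print "$args\n";   # BGCOLOR=blue;FGCOLOR=red
```

Because the first group is non-greedy, only the first semicolon acts as the separator; the remaining semicolons stay in the query string, where CGI.pm and Apache::Request accept them as parameter separators.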
A rather challenging request we mod_perl programmers can get is the
conditional GET, which typically means a request with an If-Modified-Since header. The
HTTP specs have this to say:
The semantics of the GET method change to a "conditional GET" if the request message includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field. A conditional GET method requests that the entity be transferred only under the circumstances described by the conditional header field(s). The conditional GET method is intended to reduce unnecessary network usage by allowing cached entities to be refreshed without requiring multiple requests or transferring data already held by the client.
So how can we reduce the unnecessary network usage in such a case? mod_perl
makes it easy for you by offering apache's meets_conditions().
You have to set up your Last-Modified (and possibly ETag) header before running this method. If the return value of this method is
anything but OK, you should return from your handler with that return value and you're
done. Apache handles the rest for you. The following example is taken from
[5]:
if((my $rc = $r->meets_conditions) != OK) {
return $rc;
}
#else ... go and send the response body ...
If you have a squid accelerator running, it will often handle the
conditionals for you, and you can see its extremely fast responses to such
requests by reading the access.log. Just grep for
TCP_IMS_HIT/304. But as with a HEAD request, there are circumstances under which it may not be allowed to do so.
That is why the origin server (which is the server you're programming)
needs to handle conditional GETs as well even if a squid accelerator is running.
There is another approach to dynamic content that is possible with mod_perl. This approach is appropriate if the content changes relatively infrequently, if you expect lots of requests to retrieve the same content before it changes again and if it is much cheaper to test whether the content needs refreshing than it is to refresh it.
In this case a PerlFixupHandler can be installed for the relevant location. It tests whether the content is
up to date. If so it returns DECLINED and lets the apache core serve the content from a file. Otherwise, it
regenerates the content into the file, updates the $r->finfo status and again returns DECLINED so that apache serves the updated file. Updating $r->finfo can be achieved by calling
$r->filename($file); # force update of finfo
even if this seems redundant because the filename is already equal to
$file. Setting the filename has the side effect of doing a stat()
on the file. This is important because otherwise apache would use the out
of date finfo when generating the response header.
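The flow of such a fixup handler might look like the sketch below. It is only an outline under several assumptions: the freshness test (compare modification times with a source file named "$file.src"), the generator (copy the source, upper-cased) and the helper names is_stale and regenerate are all invented for the example, and DECLINED is defined locally so that the code also runs outside the server.

```perl
use strict;
use warnings;

# Stand-in so the sketch runs outside the server; under mod_perl you
# would instead say: use Apache::Constants qw(DECLINED);
use constant DECLINED => -1;

# Assumed freshness test: the served file is stale if it is missing or
# older than the source it is generated from.
sub is_stale {
    my ($file, $source) = @_;
    return 1 unless -e $file;
    return (stat $file)[9] < (stat $source)[9];   # compare mtimes
}

# Assumed generator: it merely copies the source, upper-cased.
sub regenerate {
    my ($file, $source) = @_;
    open my $in,  '<', $source or die "Cannot read $source: $!";
    open my $out, '>', $file   or die "Cannot write $file: $!";
    while ( my $line = <$in> ) {
        print {$out} uc $line;
    }
    close $out or die "Cannot close $file: $!";
    close $in;
}

sub handler {
    my ($r) = @_;
    my $file   = $r->filename;
    my $source = "$file.src";   # assumption: the source sits next to the file
    if ( is_stale($file, $source) ) {
        regenerate($file, $source);
        $r->filename($file);    # force update of finfo
    }
    return DECLINED;            # let the Apache core serve the file
}
```

The handler returns DECLINED in both cases, so the only difference between a fresh and a stale file is the regeneration step and the refreshed finfo.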
[1] Stas Bekman: Mod_perl Guide. http://perl.apache.org/guide/
[2] T. Berners-Lee et al.: Hypertext Transfer Protocol -- HTTP/1.0, RFC 1945.
[3] R. Fielding et al.: Hypertext Transfer Protocol -- HTTP/1.1, RFC 2616.
[4] Martin Hamilton: Cachebusting - cause and prevention, draft-hamilton-cachebusting-01. Also available online at http://vancouver-webpages.com/CacheNow/
[5] Lincoln Stein, Doug MacEachern: Writing Apache Modules with Perl and C, O'Reilly, 1-56592-567-X. Selected chapters available online at http://www.modperl.com . Amazon page at http://www.amazon.com/exec/obidos/ASIN/156592567X/writinapachemodu/
You're reading revision $Revision: 1.16 $ of this document, written on $Date: 1999/08/14 06:21:32 $
Andreas Koenig with helpful corrections, additions, and comments from Ask Bjoern Hansen <ask@netcetera.dk>, Frank D. Cringle <fdc@cliwe.ping.de>, Eric Cholet <cholet@logilune.com>, Mark Kennedy <mark.kennedy@gs.com>, Doug MacEachern <dougm@pobox.com>, Tom Hukins <tom@eborcom.com>, Wham Bang <wham_bang@yahoo.com> and many others.
Written by Stas Bekman.
Last Modified at 10/24/1999
Use of the Camel for Perl is a trademark of O'Reilly & Associates, and is used by permission.
Nowadays millions of people surf the Internet. There are millions of Terabytes of data lying around. To manipulate the data new smart techniques and technologies were invented. One of the major inventions was the relational database, which allows us to search and modify huge stores of data in very little time. We use SQL (Structured Query Language) to manipulate the contents of these databases.
When people started to use the web, they found that they needed to write
web interfaces to their databases. CGI is the most widely used technology
for building such interfaces. The main limitation of a CGI script driving a
database is that its database connection is not persistent - on every
request the CGI script has to initiate a connection to the database, and
when the request is completed the connection is closed. Apache::DBI was written to remove this limitation. When you use it, you have a database
connection which persists for the process' entire life. So when your
mod_perl script needs to use a database, Apache::DBI provides a valid connection immediately and your script starts work right
away without having to initiate a database connection first.
This is possible only with CGI running under a mod_perl enabled server, since in this model the child process does not quit when the request has been served.
It's almost as straightforward as it sounds; there are just a few things to know about and we will cover them in this section.
This module initiates a persistent database connection. It is possible only with mod_perl.
The DBI module can make use of the Apache::DBI module. When it loads, the
DBI module tests if the environment variable
$ENV{GATEWAY_INTERFACE} starts with CGI-Perl, and if the
Apache::DBI module has already been loaded. If so, the DBI module will forward every
connect() request to the Apache::DBI
module. Apache::DBI looks for a database handle from a
previous connect() request and uses the ping() method to test if this handle is still
valid. If these two conditions are fulfilled it just returns the database
handle.
If there is no appropriate database handle or if the ping()
method fails, Apache::DBI establishes a new connection and stores the handle for later re-use. When
the script is run again by a child that is still connected, Apache::DBI just checks the cache of open connections by matching the host, username and password parameters against it. A matching connection is returned if available or a
new one is initiated and then returned.
There is no need to delete the disconnect() statements from
your code. They won't do anything because the Apache::DBI module overrides the disconnect() method with an empty one.
When should this module be used and when should it not?
You will want to use this module if you are opening several database
connections to the server. Apache::DBI will make them persistent per child, so if you have ten children and each
opens two different connections (with different connect()
arguments) you will have in total twenty opened and persistent connections.
After the initial connect() you will save the connection time
for every connect() request from your DBI module. This can be a huge benefit for a server with a high volume of
database traffic.
You must NOT use this module if you are opening a special connection for each of your users. Each connection will stay persistent and in a short time the number of connections will be so big that your machine will scream in agony and die.
If you want to use Apache::DBI but you have both situations on one machine, at the time of writing the
only solution is to run two Apache/mod_perl servers, one which uses Apache::DBI and one which does not.
After installing this module, the configuration is simple - add the
following directive to httpd.conf
PerlModule Apache::DBI
Note that it is important to load this module before any other
Apache*DBI module and before the DBI module itself!
You can skip preloading DBI, since Apache::DBI does that. But there is no harm in leaving it in, as long as it is loaded
after
Apache::DBI.
If you want to make sure that a connection will already be opened when your
script is first executed after a server restart, then you should use the connect_on_init() method in the startup file to preload every connection you are going to
use. For example:
Apache::DBI->connect_on_init
("DBI:mysql:myDB::myserver",
"username",
"passwd",
{
PrintError => 1, # warn() on errors
RaiseError => 0, # don't die on error
AutoCommit => 1, # commit executes immediately
}
);
As noted above, use this method only if you want all of Apache to be able to connect to the database server as one user (or as a very few users).
Be warned though, that if you call connect_on_init() and your database is down, Apache children will be delayed at server
startup, trying to connect. They won't begin serving requests until either
they are connected, or the connection attempt fails. Depending on your DBD
driver, this can take several minutes!
If you are not sure this module is working as advertised, you should enable Debug mode in the startup script by:
$Apache::DBI::DEBUG = 1;
Starting with ApacheDBI-0.84, setting $Apache::DBI::DEBUG = 1
will produce only minimal output. For a full trace you set
$Apache::DBI::DEBUG = 2.
Another approach is to add to httpd.conf (which does the same):
PerlModule Apache::DebugDBI
After setting the DEBUG level you will see entries in the error_log
both when Apache::DBI initializes a connection and when it returns one from its cache. Use the
following command to view the log in real time (your error_log might be located at a different path, it is set in the Apache configuration
files):
tail -f /usr/local/apache/logs/error_log
I use alias (in tcsh) so I do not have to remember the path:
alias err "tail -f /usr/local/apache/logs/error_log"
The SQL server keeps a connection to the client open for a limited period
of time. Many developers were bitten by the so-called Morning
bug: every morning the first users to use the site received a
No Data Returned message, but after that everything worked fine. The error is caused by Apache::DBI returning a handle to an invalid connection (the server closed it because
of a timeout), and the script dying on that error. The infamous ping() method was introduced to solve this problem, but people were still being
bitten by it. Another solution was found: to increase the
timeout parameter when starting the SQL server. Currently I start up the MySQL
server with a script safe_mysql, so I have modified it to use this option:
nohup $ledir/mysqld [snipped other options] -O wait_timeout=172800
172800 seconds is equal to 48 hours. This change solves the problem.
Note that as of version 0.82, Apache::DBI implements ping() inside an eval block. This means that if the handle has timed out it should be reconnected
automatically, avoiding the morning bug.
When it receives a connection request, before it decides to use an
existing cached connection, Apache::DBI insists that the new connection be opened in exactly the same way as the
cached connection. If I have one script that sets LongReadLen and one that does not, Apache::DBI will make two different connections. So instead of having a maximum of 40
open connections, I can end up with 80.
However, you are free to modify the handle immediately after you get it
from the cache. So always initiate connections using the same parameters
and set LongReadLen (or whatever) afterwards.
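The effect can be pictured with a simplified model of a connection cache. This is not the module's actual code: the function names cache_key and cached_connect are invented, and plain strings stand in for real database handles. The point is only that the cache key is built from all connect() arguments, attributes included.

```perl
use strict;
use warnings;

my %cache;   # cache key => handle (plain strings stand in for real handles)

# Build a key from the DSN, user, password and every attribute, so that
# any difference in the connect() arguments yields a different entry.
sub cache_key {
    my ($dsn, $user, $passwd, $attr) = @_;
    $attr ||= {};
    return join "\0", $dsn, $user, $passwd,
        map { "$_=$attr->{$_}" } sort keys %$attr;
}

# Return the cached "handle" for these arguments, creating it if needed.
sub cached_connect {
    my $key = cache_key(@_);
    unless ( exists $cache{$key} ) {
        my $n = scalar(keys %cache) + 1;
        $cache{$key} = "handle#$n";
    }
    return $cache{$key};
}

my @args = ('DBI:mysql:foo', 'user', 'passwd');   # made-up connect arguments
my $h1 = cached_connect(@args, { RaiseError => 0 });
my $h2 = cached_connect(@args, { RaiseError => 0 });
my $h3 = cached_connect(@args, { RaiseError => 0, LongReadLen => 65536 });

print "same handle reused\n"   if $h1 eq $h2;
print "extra handle created\n" if $h1 ne $h3;
print scalar(keys %cache), " cached connections\n";   # 2
```

In real code the remedy is the one described above: connect with identical arguments everywhere and set the attribute on the returned handle afterwards, for example with $dbh->{LongReadLen} = 65536.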
You must use DBI->connect() as in normal DBI usage to get your $dbh database handle.
Using the Apache::DBI does not eliminate the need to write proper DBI code. As the Apache::DBI man page states, you should program as if you are not using Apache::DBI at all. Apache::DBI will override the DBI methods where necessary and return your cached
connection. Any disconnect() call will be just ignored.
Make sure you have it installed.
Make sure you configured mod_perl with EVERYTHING=1.
Use the example script eg/startup.pl (in the mod_perl distribution). Remove the comment from the line.
# use Apache::DebugDBI;
and adapt the connect string. Do not change anything in your scripts for
use with Apache::DBI.
Does your error_log look like this?
10169 Apache::DBI PerlChildInitHandler
10169 Apache::DBI skipping connection cache during server startup
Database handle destroyed without explicit disconnect at /usr/lib/perl5/site_perl/5.005/Apache/DBI.pm line 29.
If so you are trying to open a database connection in the parent httpd process. If you do, children will each get a copy of this handle, causing clashes when the handle is used by two processes at the same time. Each child must have its own, unique, connection handle.
To avoid this problem, Apache::DBI checks whether it is called during server startup. If so the module skips
the connection cache and returns immediately without a database handle.
You must use the Apache::DBI->connect_on_init() method in the startup file.
To log a trace of DBI statement execution, you must set the
DBI_TRACE environment variable. The PerlSetEnv DBI_TRACE
directive must appear before you load Apache::DBI and DBI.
For example if you use Apache::DBI, modify your httpd.conf with:
PerlSetEnv DBI_TRACE "3=/tmp/dbitrace.log"
PerlModule Apache::DBI
Replace 3 with the TRACE level you want. The traces from each request will be
appended to /tmp/dbitrace.log. Note that the logs might interleave if requests are processed
concurrently.
Within your code you can control trace generation with the
trace() method:
DBI->trace($trace_level)
DBI->trace($trace_level, $trace_filename)
0 disables the trace. 2 generates detailed call trace information including parameters and return
values.
(META: 1, 3 - no info in the manpage about these levels?)
Since many mod_perl developers use mysql as their preferred SQL engine,
these notes explain the difference between mysql_use_result() and
mysql_store_result(). The two influence the speed and size of the processes.
The DBD::mysql (version 2.0217) documentation includes the following snippet:
mysql_use_result attribute: This forces the driver to use mysql_use_result rather than mysql_store_result. The former is faster and less memory consuming, but tends to block other processes. (That's why mysql_store_result is the default.)
Think about it in client/server terms. When you ask the server to
spoon-feed you the data as you use it, the server process must buffer the
data, tie up that thread, and possibly keep any database locks open for a
long time. So if you read a row of data and ponder it for a while, the
tables you have locked are still locked, and the server is busy talking to
you every so often. That is mysql_use_result().
If you just suck down the whole dataset to the client, then the server is
free to go about its business serving other requests. This results in
parallelism since the server and client are doing work at the same time,
rather than blocking on each other doing frequent I/O. That is
mysql_store_result().
As the mysql manual suggests: you should not use mysql_use_result()
if you are doing a lot of processing for each row on the client side. This
can tie up the server and prevent other threads from updating the tables.
In this section you will find scripts, modules and code snippets to help
you get started using relational Databases with mod_perl scripts. Note that
I work with mysql ( http://www.mysql.com ), so the code
you find here will work out of box with mysql. If you use some other SQL
engine, it might work for you or it might need some changes. YMMV.
Having to write many queries in my CGI scripts persuaded me to write a stand-alone module that saves me a lot of time in coding and debugging. It also makes my scripts much smaller and easier to read. I will present the module here, with examples following:
Notice the DESTROY block at the end of the module, which makes various cleanups and allows
this module to be used under mod_perl and
mod_cgi as well. Note that you will not get the benefit of persistent database
handles with mod_cgi.
(Note that you will not find this on CPAN, at least not yet :)
package My::DB;
use strict;
use 5.004;
use DBI;
use vars qw(%c);
%c =
(
# DB debug
#db_debug => 1,
db_debug => 0,
db => {
DB_NAME => 'foo',
SERVER => 'localhost',
USER => 'put_username_here',
USER_PASSWD => 'put_passwd_here',
},
);
use Carp qw(croak verbose);
#local $SIG{__WARN__} = \&Carp::cluck;
# untaint the path by explicit setting
local $ENV{PATH} = '/bin:/usr/bin';
#######
sub new {
my $proto = shift;
my $class = ref($proto) || $proto;
my $self = {};
# connect to the DB, Apache::DBI takes care of caching the connections
# save into a dbh - Database handle object
$self->{dbh} = DBI->connect("DBI:mysql:$c{db}{DB_NAME}::$c{db}{SERVER}",
$c{db}{USER},
$c{db}{USER_PASSWD},
{
PrintError => 1, # warn() on errors
RaiseError => 0, # don't die on error
AutoCommit => 1, # commit executes immediately
}
)
or croak "Cannot connect to database: $DBI::errstr";
# we want to die on errors if in debug mode
$self->{dbh}->{RaiseError} = 1 if $c{'db_debug'};
# init the sth - Statement handle object
$self->{sth} = '';
bless ($self, $class);
$self;
} # end of sub new
######################################################################
###################################
### ###
### SQL Functions ###
### ###
###################################
######################################################################
# print debug messages
sub d{
# we want to print the trace in debug mode
print "<DT><B>".join("<BR>", @_)."</B>\n" if $c{'db_debug'};
} # end of sub d
######################################################################
# return a count of matched rows, by conditions
#
# $count = sql_count_matched($table_name,\@conditions);
#
# conditions must be an array so we can pass more than one column with
# the same name.
#
# @conditions = ( column => ['comp_sign','value'],
# foo => ['>',15],
# foo => ['<',30],
# );
#
# The sub knows automatically to detect and quote strings
#
##########################
sub sql_count_matched{
my $self = shift;
my $table = shift || '';
my $r_conds = shift || [];
# we want to print the trace in debug mode
d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
# build the query
my $do_sql = "SELECT COUNT(*) FROM $table ";
my @where = ();
for(my $i=0;$i<@{$r_conds};$i=$i+2) {
push @where, join " ",
$$r_conds[$i],
$$r_conds[$i+1][0],
sql_quote(sql_escape($$r_conds[$i+1][1]));
}
# Add the where clause if we have one
$do_sql .= "WHERE ". join " AND ", @where if @where;
d("SQL: $do_sql");
# do query
$self->{sth} = $self->{dbh}->prepare($do_sql);
$self->{sth}->execute();
my ($count) = $self->{sth}->fetchrow_array;
d("Result: $count");
$self->{sth}->finish;
return $count;
} # end of sub sql_count_matched
######################################################################
# return a single (first) matched value or undef, by conditions and
# restrictions
#
# sql_get_matched_value($table_name,$column,\@conditions,\@restrictions);
#
# column is a name of the column
#
# conditions must be an array so we can pass more than one column with
# the same name.
# @conditions = ( column => ['comp_sign','value'],
# foo => ['>',15],
# foo => ['<',30],
# );
# The sub knows automatically to detect and quote strings
#
# restrictions is a list of restrictions like ('order by email')
#
##########################
sub sql_get_matched_value{
my $self = shift;
my $table = shift || '';
my $column = shift || '';
my $r_conds = shift || [];
my $r_restr = shift || [];
# we want to print the trace in debug mode
d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
# build the query
my $do_sql = "SELECT $column FROM $table ";
my @where = ();
for(my $i=0;$i<@{$r_conds};$i=$i+2) {
push @where, join " ",
$$r_conds[$i],
$$r_conds[$i+1][0],
sql_quote(sql_escape($$r_conds[$i+1][1]));
}
# Add the where clause if we have one
$do_sql .= " WHERE ". join " AND ", @where if @where;
# restrictions (DONT put commas!)
$do_sql .= " ". join " ", @{$r_restr} if @{$r_restr};
d("SQL: $do_sql");
# do query
return $self->{dbh}->selectrow_array($do_sql);
} # end of sub sql_get_matched_value
######################################################################
# return a single row of first matched rows, by conditions and
# restrictions. The row is being inserted into @results_row array
# (value1,value2,...) or empty () if none matched
#
# sql_get_matched_row(\@results_row,$table_name,\@columns,\@conditions,\@restrictions);
#
# columns is a list of columns to be returned (username, fname,...)
#
# conditions must be an array so we can pass more than one column with
# the same name.
# @conditions = ( column => ['comp_sign','value'],
# foo => ['>',15],
# foo => ['<',30],
# );
# The sub knows automatically to detect and quote strings
#
# restrictions is a list of restrictions like ('order by email')
#
##########################
sub sql_get_matched_row{
my $self = shift;
my $r_row = shift || [];
my $table = shift || '';
my $r_cols = shift || [];
my $r_conds = shift || [];
my $r_restr = shift || [];
# we want to print the trace in debug mode
d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
# build the query
my $do_sql = "SELECT ";
$do_sql .= join ",", @{$r_cols} if @{$r_cols};
$do_sql .= " FROM $table ";
my @where = ();
for(my $i=0;$i<@{$r_conds};$i=$i+2) {
push @where, join " ",
$$r_conds[$i],
$$r_conds[$i+1][0],
sql_quote(sql_escape($$r_conds[$i+1][1]));
}
# Add the where clause if we have one
$do_sql .= " WHERE ". join " AND ", @where if @where;
# restrictions (DONT put commas!)
$do_sql .= " ". join " ", @{$r_restr} if @{$r_restr};
d("SQL: $do_sql");
# do query
@{$r_row} = $self->{dbh}->selectrow_array($do_sql);
} # end of sub sql_get_matched_row
######################################################################
# return a ref to hash of single matched row, by conditions
# and restrictions. return undef if nothing matched.
# { column1 => value1, column2 => value2 }
#
# sql_get_hash_ref($table_name,\@columns,\@conditions,\@restrictions);
#
# columns is a list of columns to be returned (username, fname,...)
#
# conditions must be an array so we can pass more than one column with
# the same name.
# @conditions = ( column => ['comp_sign','value'],
# foo => ['>',15],
# foo => ['<',30],
# );
# The sub knows automatically to detect and quote strings
#
# restrictions is a list of restrictions like ('order by email')
#
##########################
sub sql_get_hash_ref{
my $self = shift;
my $table = shift || '';
my $r_cols = shift || [];
my $r_conds = shift || [];
my $r_restr = shift || [];
# we want to print the trace in debug mode
d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
# build the query
my $do_sql = "SELECT ";
$do_sql .= join ",", @{$r_cols} if @{$r_cols};
$do_sql .= " FROM $table ";
my @where = ();
for(my $i=0;$i<@{$r_conds};$i=$i+2) {
push @where, join " ",
$$r_conds[$i],
$$r_conds[$i+1][0],
sql_quote(sql_escape($$r_conds[$i+1][1]));
}
# Add the where clause if we have one
$do_sql .= " WHERE ". join " AND ", @where if @where;
# restrictions (DONT put commas!)
$do_sql .= " ". join " ", @{$r_restr} if @{$r_restr};
d("SQL: $do_sql");
# do query
$self->{sth} = $self->{dbh}->prepare($do_sql);
$self->{sth}->execute();
return $self->{sth}->fetchrow_hashref;
} # end of sub sql_get_hash_ref
######################################################################
# returns a reference to an array, matched by conditions and
# restrictions, which contains one reference to array per row. If
# there are no rows to return, returns a reference to an empty array:
# [
# [array1],
# ......
# [arrayN],
# ];
#
# $ref = sql_get_matched_rows_ary_ref($table_name,\@columns,\@conditions,\@restrictions);
#
# columns is a list of columns to be returned (username, fname,...)
#
# conditions must be an array so we can pass more than one column with
# the same name. @conditions are being concatenated with AND
# @conditions = ( column => ['comp_sign','value'],
# foo => ['>',15],
# foo => ['<',30],
# );
# results in
# WHERE foo > 15 AND foo < 30
#
# to make an OR logic use (then ANDed )
# @conditions = ( column => ['comp_sign',['value1','value2']],
# foo => ['=',[15,24] ],
# bar => ['=',[16,21] ],
# );
# results in
# WHERE (foo = 15 OR foo = 24) AND (bar = 16 OR bar = 21)
#
# The sub knows automatically to detect and quote strings
#
# restrictions is a list of restrictions like ('order by email')
#
##########################
sub sql_get_matched_rows_ary_ref{
my $self = shift;
my $table = shift || '';
my $r_cols = shift || [];
my $r_conds = shift || [];
my $r_restr = shift || [];
# we want to print the trace in debug mode
d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
# build the query
my $do_sql = "SELECT ";
$do_sql .= join ",", @{$r_cols} if @{$r_cols};
$do_sql .= " FROM $table ";
my @where = ();
for(my $i=0;$i<@{$r_conds};$i=$i+2) {
if (ref $$r_conds[$i+1][1] eq 'ARRAY') {
# multi condition for the same field/comparator to be ORed
push @where, map {"($_)"} join " OR ",
map { join " ",
$r_conds->[$i],
$r_conds->[$i+1][0],
sql_quote(sql_escape($_));
} @{$r_conds->[$i+1][1]};
} else {
# single condition for the same field/comparator
push @where, join " ",
$r_conds->[$i],
$r_conds->[$i+1][0],
sql_quote(sql_escape($r_conds->[$i+1][1]));
}
} # end of for(my $i=0;$i<@{$r_conds};$i=$i+2
# Add the where clause if we have one
$do_sql .= " WHERE ". join " AND ", @where if @where;
# restrictions (DONT put commas!)
$do_sql .= " ". join " ", @{$r_restr} if @{$r_restr};
d("SQL: $do_sql");
# do query
return $self->{dbh}->selectall_arrayref($do_sql);
} # end of sub sql_get_matched_rows_ary_ref
######################################################################
# insert a single row into a DB
#
# sql_insert_row($table_name,\%data,$delayed);
#
# data is hash of type (column1 => value1 ,column2 => value2 , )
#
# $delayed: 1 => do delayed insert, 0 or none passed => immediate
#
# * The sub knows automatically to detect and quote strings
#
# * If the insert is delayed, the user will not wait until the insert
# is completed, if many select queries are running
#
##########################
sub sql_insert_row{
my $self = shift;
my $table = shift || '';
my $r_data = shift || {};
my $delayed = (shift) ? 'DELAYED' : '';
# we want to print the trace in debug mode
d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
# build the query
my $do_sql = "INSERT $delayed INTO $table ";
$do_sql .= "(".join(",",keys %{$r_data}).")";
$do_sql .= " VALUES (";
$do_sql .= join ",", sql_quote(sql_escape( values %{$r_data} ) );
$do_sql .= ")";
d("SQL: $do_sql");
# do query
$self->{sth} = $self->{dbh}->prepare($do_sql);
$self->{sth}->execute();
} # end of sub sql_insert_row
######################################################################
# update rows in a DB by condition
#
# sql_update_rows($table_name,\%data,\@conditions,$delayed);
#
# data is hash of type (column1 => value1 ,column2 => value2 , )
#
# conditions must be an array so we can pass more than one column with
# the same name.
# @conditions = ( column => ['comp_sign','value'],
# foo => ['>',15],
# foo => ['<',30],
# );
#
# $delayed: 1 => do low-priority update, 0 or none passed => immediate
#
# * The sub knows automatically to detect and quote strings
#
#
##########################
sub sql_update_rows{
my $self = shift;
my $table = shift || '';
my $r_data = shift || {};
my $r_conds = shift || [];
my $delayed = (shift) ? 'LOW_PRIORITY' : '';
# we want to print in the trace debug mode
d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
# build the query
my $do_sql = "UPDATE $delayed $table SET ";
$do_sql .= join ",",
map { "$_=".join "",sql_quote(sql_escape($$r_data{$_})) } keys %{$r_data};
my @where = ();
for(my $i=0;$i<@{$r_conds};$i=$i+2) {
push @where, join " ",
$$r_conds[$i],
$$r_conds[$i+1][0],
sql_quote(sql_escape($$r_conds[$i+1][1]));
}
# Add the where clause if we have one
$do_sql .= " WHERE ". join " AND ", @where if @where;
d("SQL: $do_sql");
# do query
$self->{sth} = $self->{dbh}->prepare($do_sql);
$self->{sth}->execute();
# my ($count) = $self->{sth}->fetchrow_array;
#
# d("Result: $count");
} # end of sub sql_update_rows
######################################################################
# delete rows from DB by condition
#
# sql_delete_rows($table_name,\@conditions);
#
# conditions must be an array so we can pass more than one column with
# the same name.
# @conditions = ( column => ['comp_sign','value'],
# foo => ['>',15],
# foo => ['<',30],
# );
#
# * The sub knows automatically to detect and quote strings
#
#
##########################
sub sql_delete_rows{
my $self = shift;
my $table = shift || '';
my $r_conds = shift || [];
# we want to print in the trace debug mode
d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
# build the query
my $do_sql = "DELETE FROM $table ";
my @where = ();
for(my $i=0;$i<@{$r_conds};$i=$i+2) {
push @where, join " ",
$$r_conds[$i],
$$r_conds[$i+1][0],
sql_quote(sql_escape($$r_conds[$i+1][1]));
}
# Must be very careful with deletes, imagine somehow @where is
# not getting set, "DELETE FROM NAME" deletes the contents of the table
warn("Attempt to delete a whole table $table from DB\n!!!"),return unless @where;
# Add the where clause if we have one
$do_sql .= " WHERE ". join " AND ", @where;
d("SQL: $do_sql");
# do query
$self->{sth} = $self->{dbh}->prepare($do_sql);
$self->{sth}->execute();
} # end of sub sql_delete_rows
######################################################################
# executes the passed query and returns a reference to an array which
# contains one reference per row. If there are no rows to return,
# returns a reference to an empty array.
#
# $r_array = sql_execute_and_get_r_array($query);
#
#
##########################
sub sql_execute_and_get_r_array{
my $self = shift;
my $do_sql = shift || '';
# we want to print in the trace debug mode
d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
d("SQL: $do_sql");
$self->{dbh}->selectall_arrayref($do_sql);
} # end of sub sql_execute_and_get_r_array
#
#
# return current date formatted for a DATE field type
# YYYYMMDD
#
############
sub sql_date{
my $self = shift;
my ($mday,$mon,$year) = (localtime)[3..5];
return sprintf "%0.4d%0.2d%0.2d",1900+$year,++$mon,$mday;
} # end of sub sql_date
#
#
# return current date formatted for a DATE field type
# YYYYMMDDHHMMSS
#
############
sub sql_datetime{
my $self = shift;
my ($sec,$min,$hour,$mday,$mon,$year) = localtime();
return sprintf "%0.4d%0.2d%0.2d%0.2d%0.2d%0.2d",1900+$year,++$mon,$mday,$hour,$min,$sec;
} # end of sub sql_datetime
# Quote the list of parameters. Parameters consisting entirely of
# digits (i.e. integers) are unquoted.
# print sql_quote("one",2,"three"); => 'one', 2, 'three'
#############
sub sql_quote{ map{ /^(\d+|NULL)$/ ? $_ : "\'$_\'" } @_ }
# Escape the list of parameters (all unsafe chars like ",' are escaped)
# We make a copy of @_ since we might try to change the passed values,
# producing an error when modification of a read-only value is attempted
##############
sub sql_escape{ my @a = @_; map { s/([\'])/\\$1/g;$_} @a }
# DESTROY makes all kinds of cleanups if the functions were interrupted
# before their completion and haven't had a chance to clean up.
###########
sub DESTROY{
my $self = shift;
$self->{sth}->finish if defined $self->{sth} and $self->{sth};
$self->{dbh}->disconnect if defined $self->{dbh} and $self->{dbh};
} # end of sub DESTROY
# Don't remove
1;
To use My::DB in your script, you first have to create a My::DB
object:
use Carp;
use vars qw($db_obj);
$db_obj = new My::DB or croak "Can't initialize My::DB object: $!\n";
Now you can use any of My::DB's methods. Assume that we have a table called tracker where we store the names of the users and what they are doing at each and
every moment (think of an online community program).
I will start with a very simple query: I want to know where the users are
and produce statistics. tracker is the name of the table.
# fetch the statistics of where users are
my $r_ary = $db_obj->sql_get_matched_rows_ary_ref
("tracker",
[qw(where_user_are)],
);
my %stats = ();
my $total = 0;
foreach my $r_row (@$r_ary){
$stats{$r_row->[0]}++;
$total++;
}
Now let's count how many users we have (in table users):
my $count = $db_obj->sql_count_matched("users");
Check whether a user exists:
my $username = 'stas';
my $exists = $db_obj->sql_count_matched
("users",
[username => ["=",$username]]
);
Check whether a user is online, and get the time since she went online (since is a column in the tracker table, it tells us when a user went online):
my @row = ();
$db_obj->sql_get_matched_row
(\@row,
"tracker",
['UNIX_TIMESTAMP(since)'],
[username => ["=",$username]]
);
if (@row) {
my $idle = int( (time() - $row[0]) / 60);
return "Current status: Is Online and idle for $idle minutes.";
}
A complex query. I join two tables, and I want a reference to an array
which will store a slice of the matched query (LIMIT $offset,$hits) sorted by username. Each row in the array is to include the fields from the users table, but only those listed in @verbose_cols. Then we print it out.
my $r_ary = $db_obj->sql_get_matched_rows_ary_ref
(
"tracker STRAIGHT_JOIN users",
[map {"users.$_"} @verbose_cols],
[],
["WHERE tracker.username=users.username",
"ORDER BY users.username",
"LIMIT $offset,$hits"],
);
foreach my $r_row (@$r_ary){
print ...
}
Another complex query. The user checks checkboxes to be queried by, selects
from lists and types in match strings. We process the input and build the @where array. Then we want to get the number of matches and the matched rows as
well.
META: Add what the tables contain
my @where = ();
# Process the checkboxes - we turn them into a regular expression
foreach (keys %search_keys) {
next unless defined $q->param($_) and $q->param($_);
my $regexp = "[".join("",$q->param($_))."]";
push @where, ($_ => ['REGEXP',$regexp]);
}
# Add the items selected by the user from our lists
# selected => exact match
push @where,(country => ['=',$q->param('country')]) if $q->param('country');
# Add the parameters typed by the user
foreach (qw(city state)) {
push @where,($_ => ['LIKE',$q->param($_)]) if $q->param($_);
}
# Count all that matched the query
my $total_matched_users = $db_obj->sql_count_matched
(
"users",
\@where,
);
# Now process the orderby
my $orderby = $q->param('orderby') || 'username';
# Do the query and fetch the data
my $r_ary = $db_obj->sql_get_matched_rows_ary_ref
(
"users",
\@display_columns,
\@where,
["ORDER BY $orderby",
"LIMIT $offset,$hits"],
);
sql_get_matched_rows_ary_ref can handle both ORed and
ANDed params. This example shows how to use OR on parameters:
This snippet is an implementation of a watchdog. Our users want to know
when their colleagues go online. They register the usernames of the people
they want to know about. We have to make two queries: one to get a list of
usernames, the second to find out whether any of these users is online. In
the second query we use the OR keyword.
# check who we are looking for
$r_ary = $db_obj->sql_get_matched_rows_ary_ref
("watchdog",
[qw(watched)],
[username => ['=',$username],
],
);
# put them into an array
my @watched = map {$_->[0]} @{$r_ary};
my %matched = ();
# Does the user have some registered usernames?
if (@watched) {
# Try to fetch all the users who match the usernames exactly.
# Put it into an array and compare it with a hash!
$r_ary = $db_obj->sql_get_matched_rows_ary_ref
("tracker",
[qw(username)],
[username => ['=',\@watched],
]
);
map {$matched{$_->[0]} = 1} @{$r_ary};
}
# Now %matched includes the usernames of the users who are being
# watched by $username and currently are online.
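To see what kind of WHERE clause the conditions arrays in the examples above turn into, here is a standalone sketch. The two quoting helpers are copied from the My::DB listing; build_where() is a hypothetical helper that is not part of My::DB, and the way it parenthesizes and joins ORed values with " OR " is an assumption about the part of the listing that handles array values.

```perl
use strict;
use warnings;

# Copies of the two helpers from the My::DB listing
sub sql_quote  { map { /^(\d+|NULL)$/ ? $_ : "\'$_\'" } @_ }
sub sql_escape { my @a = @_; map { s/([\'])/\\$1/g; $_ } @a }

# build_where() is hypothetical: it mirrors the loop over the
# (column => [comparator, value]) pairs used by the My::DB methods
sub build_where {
    my @conds = @_;
    my @where;
    for (my $i = 0; $i < @conds; $i += 2) {
        my $col = $conds[$i];
        my ($cmp, $val) = @{ $conds[$i + 1] };
        if (ref $val eq 'ARRAY') {
            # several values for the same column: OR them (assumed format)
            my @ored = map { join " ", $col, $cmp, sql_quote(sql_escape($_)) } @$val;
            push @where, "(" . join(" OR ", @ored) . ")";
        } else {
            # single condition for the column
            push @where, join " ", $col, $cmp, sql_quote(sql_escape($val));
        }
    }
    return @where ? "WHERE " . join(" AND ", @where) : "";
}

print build_where(
    username => ['=', [qw(stas doug)]],
    age      => ['>', 15],
), "\n";
# prints: WHERE (username = 'stas' OR username = 'doug') AND age > 15
```

Because the conditions are a flat list rather than a hash, the same column can appear more than once, which is what makes range checks such as foo > 15 AND foo < 30 possible.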
Written by Stas Bekman.
Last Modified at 12/17/1999
Use of the Camel for Perl is a trademark of O'Reilly & Associates, and is used by permission.
If you need a light database, with an easy API, using simple key-value pairs to store and manipulate the records, this is a solution that should be amongst the first you consider. The maximum practical size of a dbm database depends on your hardware and the desired response times of course, but as a rough guide consider 5000 to 10000 records to be reasonable.
Some of the earliest databases implemented on Unix were dbm files, and many are still in use today. As of this writing the Berkeley DB is the most powerful dbm implementation.
With dbm, the whole database is rarely read into memory. Combine this feature with the use of smart storage techniques, and dbm files can be manipulated much faster than their flat file brothers. Flat file databases can become very slow on insert, update and delete operations, especially when the number of records exceeds a couple of thousand. The situation is worse if you need to run a sort algorithm on a flat file.
Several different indexing algorithms can be used with dbm:
The HASH algorithm gives an O(1) complexity of search and update, fast insert and delete, but a slow sort.
(You have to do it yourself.)
The BTREE algorithm allows arbitrary key/value pairs to be stored in a sorted,
balanced binary tree, which allows us to get a sorted sequence of data
pairs in O(1), but at the expense of much slower insert, update and delete operations than
is the case with HASH.
The RECNO algorithm is more complicated, and enables both fixed-length and
variable-length flat text files to be manipulated using the same key/value
pair interface as in HASH and BTREE. In this case the key will consist of a record (line) number.
Most often you will want to use the HASH method, but your choice depends very much on your application.
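The difference in key ordering between the two main algorithms can be seen with a small sketch. It assumes DB_File is installed and uses its documented feature of creating an in-memory database when the filename is undef; keys_in_order() is a hypothetical helper written for this illustration.

```perl
use strict;
use warnings;
use DB_File;

# keys_in_order() is hypothetical: it stores the given keys in an
# in-memory DB_File database of the requested type and returns the
# keys in the order the database hands them back.
sub keys_in_order {
    my ($type, @keys) = @_;
    my %db;
    # an undef filename asks DB_File for an in-memory database
    tie %db, 'DB_File', undef, O_RDWR|O_CREAT, 0660, $type
        or die "Can't tie in-memory db: $!";
    $db{$_} = 1 for @keys;
    my @order = keys %db;
    untie %db;
    return @order;
}

my @btree = keys_in_order($DB_BTREE, qw(pear apple fig));
my @hash  = keys_in_order($DB_HASH,  qw(pear apple fig));

print "BTREE: @btree\n";   # lexically sorted by the default comparison
print "HASH:  @hash\n";    # order depends on the hash function
```

With BTREE no separate sort step is needed when you walk the keys, while with HASH you would have to sort them yourself.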
dbm databases are not limited to storing key/value pairs. They can store more
complicated data structures with the help of the MLDBM
module. This module can dump and restore the whole symbol table of your
script, including arrays, hashes and other complicated data structures.
It is important to note that you cannot simply switch a dbm file from one storage algorithm to another. The only way to change the algorithm is to dump the data to a flat file and then restore it using the new storage method. You can use a script like this:
#!/usr/bin/perl -w
#
# This script takes as parameters one or more Berkeley DB files stored
# with the DB_BTREE algorithm. It backs each one up as filename.btree
# and creates in its place a db with the same records, stored with the
# DB_HASH algorithm
#
# Usage: btree2hash.pl filename(s)
use strict;
use DB_File;
use File::Copy;
# Do checks
die "Usage: btree2hash.pl filename(s)\n" unless @ARGV;
foreach my $filename (@ARGV) {
die "Can't find $filename: $!\n" unless -e $filename and -r $filename;
# First backup the file
move("$filename","$filename.btree")
or die "can't move $filename $filename.btree:$!\n";
my %btree;
my %hash;
# tie both dbs (db_hash is a fresh one!)
tie %btree , 'DB_File',"$filename.btree", O_RDWR|O_CREAT,
0660, $DB_BTREE or die "Can't tie %btree";
tie %hash , 'DB_File',"$filename" , O_RDWR|O_CREAT,
0660, $DB_HASH or die "Can't tie %hash";
# copy DB
%hash = %btree;
# untie
untie %btree ;
untie %hash ;
}
Note that some dbm implementations come with other conversion utilities as well.
Where does mod_perl fit into the picture?
If you are using a read only dbm file you can have it work faster if you keep it open (tied) all the time, so when your CGI script wants to access the database it is already tied and ready to be used. It will work with dynamic (read/write) databases as well but you need to use locking and data flushing to avoid data corruption.
Although mod_perl and dbm can give huge performance gains to your CGI
scripts, you should be very careful. You need to consider locking, and the
consequences of die() and unexpected process deaths.
If your locking mechanism cannot handle dropped locks, a stale lock can deactivate your whole site. You can enter a deadlock situation if two processes simultaneously try to acquire locks on two separate databases. Each has locked only one of the databases, and cannot continue without locking the second. Yet this will never be freed because it is locked by the other process. If your processes all ask for their DB files in the same order, this situation cannot occur.
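The "same order" rule can be sketched as follows. lock_all() and unlock_all() are hypothetical helpers, and the use of external ".lock" files next to each database is an assumption made for this illustration. Sorting the paths before locking means every process acquires its locks in the same sequence.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

# lock_all() is hypothetical: it takes exclusive locks on the lock
# files of several databases, always in sorted order, so that two
# processes can never each hold one lock while waiting for the other.
sub lock_all {
    my @paths = sort @_;
    my @fhs;
    for my $path (@paths) {
        open my $fh, '>>', "$path.lock"
            or die "Cannot open $path.lock: $!";
        flock $fh, LOCK_EX
            or die "Cannot lock $path.lock: $!";
        push @fhs, $fh;
    }
    return \@fhs;
}

# closing a filehandle releases its lock
sub unlock_all {
    my $fhs = shift;
    close $_ for reverse @$fhs;
}

# usage: the caller may list the files in any order
my $dir   = tempdir(CLEANUP => 1);
my $locks = lock_all("$dir/users.db", "$dir/tracker.db");
# ... work with both databases here ...
unlock_all($locks);
```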
If you modify the DB you should make very sure that you flush the data and synchronize it, especially when the process serving your CGI unexpectedly dies. In general your application should be tested very thoroughly before you put it into production to handle important data.
Let's make the lock status a global variable, so it will persist from request to request. If we request a lock - READ (shared) or WRITE (exclusive), we obtain the current lock status first.
If we are making a READ lock request, it is granted as soon as the file becomes unlocked or if it is already READ locked. The lock status becomes READ on success.
If we make a WRITE lock request, it is granted as soon as the file becomes unlocked. The lock status becomes WRITE on success.
The treatment of the WRITE lock request is most important.
If the DB is READ locked, a process that makes a WRITE request will poll until there are no reading or writing processes left. Lots of processes can successfully read the file, since they do not block each other. This means that a process that wants to write to the file (so first it needs to obtain an exclusive lock) may never get a chance to squeeze in. The following diagram represents a possible scenario where everybody can read but no one can write:
[-p1-] [--p1--]
[--p2--]
[---------p3---------]
[------p4-----]
[--p5--] [----p5----]
The result is a starving process, which will time out the request and fail to update the DB. This is a good reason not to cache the dbm handle with dynamic dbm files. It will work perfectly with static DBM files without any need to lock files at all.
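A writer that polls for the lock and eventually gives up might look like the sketch below. write_lock_or_give_up() is a hypothetical helper, and the lock file name, the number of tries and the pause are arbitrary values chosen for illustration. It uses the non-blocking LOCK_NB flag of flock() so that the process can count its attempts instead of blocking forever.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);
use Time::HiRes qw(sleep);

# write_lock_or_give_up() is hypothetical: it polls for an exclusive
# lock and returns the locked filehandle, or nothing if every attempt
# found the file still locked by other processes.
sub write_lock_or_give_up {
    my ($lockfile, $max_tries, $pause) = @_;
    open my $fh, '>>', $lockfile
        or die "Cannot open $lockfile: $!";
    for my $try (1 .. $max_tries) {
        # LOCK_NB makes flock() return at once instead of blocking
        return $fh if flock $fh, LOCK_EX | LOCK_NB;
        sleep $pause;
    }
    close $fh;
    return;    # starved: the caller has to report the failure
}

my $dir = tempdir(CLEANUP => 1);
my $fh  = write_lock_or_give_up("$dir/test.lock", 5, 0.1);
print $fh ? "got the write lock\n" : "gave up waiting\n";
close $fh if $fh;
```

Polling does not cure the starvation described above, it only makes the failure explicit so that the request can be answered with an error instead of hanging.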
Ken Williams solved the above problem with his Tie::DB_Lock module, which I will present in the next section.
Tie::DB_Lock ties hashes to databases using shared and exclusive locks. This module, by
Ken Williams, solves the problems raised in the previous section.
The main difference from what I have described above is that
Tie::DB_Lock copies a dbm file on read. Reading processes do not have to keep the file
locked while they read it, and writing processes can still access the file
while others are reading. This works best when you have lots of
long-duration reading, and a few short bursts of writing.
The drawback of this module is the heavy IO performed when every reader makes a fresh copy of the DB. With big dbm files this can be quite a disadvantage and can slow the server down considerably.
An alternative would be to have one copy of the dbm image shared by all the reading processes. This can cut the number of files that are copied, and puts the responsibility of copying the read-only file on the writer, not the reader. It would need some care to make sure it does not disturb readers when putting a new read-only copy into place.
Caution: The suggested locking methods in the Camel book and DB_File man page (at least before the version 1.72) are flawed. If you use them in an environment where more than one process can modify the dbm file, it can get corrupted!!! The following is an explanation of why this happens.
You may not use a tied file's filehandle for locking, since you get the filehandle after the file has already been tied. It's too late to lock. The problem is that the database file is locked after it is opened. When the database is opened, the first 4k (in my dbm library) are read and then cached in memory. Therefore, a process can open the database file, cache the first 4k, and then block while another process writes to the file. If the second process modifies the first 4k of the file, when the original process gets the lock it now has an inconsistent view of the database. If it writes using this view it may easily corrupt the database on disk.
This problem can be difficult to trace because it does not cause corruption every time a process has to wait for a lock. One can do quite a bit of writing to a database file without actually changing the first 4k. But once you suspect this problem you can easily reproduce it by making your program modify the records in the first 4k of the DB.
On some Operating Systems like FreeBSD, it's possible to lock on tie:
tie my %t, 'DB_File', $TOK_FILE, O_RDWR | O_EXLOCK, 0664;
and only release the lock by untieing the file. Notice the O_EXLOCK
flag, which is not available on all Operating Systems.
Here is DB_File::Lock which does the locking by using an external lockfile. This allows you to
gain the lock before the file is tied. Note that it's not yet on CPAN and
so is listed here in its entirety. Note also that this code still needs
some testing, so be
careful if you use it on a production machine.
package DB_File::Lock;
require 5.004;
use strict;
BEGIN {
# RCS/CVS compliant: must be all one line, for MakeMaker
$DB_File::Lock::VERSION = do { my @r = (q$Revision: 1.5 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r };
}
use DB_File ();
use Fcntl qw(:flock O_RDWR O_CREAT);
use Carp qw(croak carp verbose);
use Symbol ();
@DB_File::Lock::ISA = qw( DB_File );
%DB_File::Lock::lockfhs = ();
use constant DEBUG => 0;
# file creation permissions mode
use constant PERM_MODE => 0660;
# file locking modes
%DB_File::Lock::locks =
(
read => LOCK_SH,
write => LOCK_EX,
);
# SYNOPSIS:
# tie my %mydb, 'DB_File::Lock', $filepath,
# ['read' || 'write', 'HASH' || 'BTREE'];
# while (my($k,$v) = each %mydb) {
# print "$k => $v\n";
# }
# untie %mydb;
#########
sub TIEHASH {
my $class = shift;
my $file = shift;
my $lock_mode = lc shift || 'read';
my $db_type = shift || 'HASH';
die "Dunno about lock mode: [$lock_mode].\n
Valid modes are 'read' or 'write'.\n"
unless $lock_mode eq 'read' or $lock_mode eq 'write';
# Critical section starts here if in write mode!
# create an external lock
my $lockfh = Symbol::gensym();
open $lockfh, ">$file.lock" or die "Cannot open $file.lock for writing: $!\n";
unless (flock $lockfh, $DB_File::Lock::locks{$lock_mode}) {
croak "cannot flock: $lock_mode => $DB_File::Lock::locks{$lock_mode}: $!\n";
}
my $self = $class->SUPER::TIEHASH
($file,
O_RDWR|O_CREAT,
PERM_MODE,
($db_type eq 'BTREE' ? $DB_File::DB_BTREE : $DB_File::DB_HASH )
);
# remove the package name in case re-blessing occurs
(my $id = "$self") =~ s/^[^=]+=//;
# cache the lock fh
$DB_File::Lock::lockfhs{$id} = $lockfh;
return $self;
} # end of sub TIEHASH
# DESTROY is automatically called when a tied variable
# goes out of scope, on explicit untie() or when the program is
# interrupted, e.g. with a die() call.
#
# It unties the db by forwarding it to the parent class,
# unlocks the file and removes it from the cache of locks.
###########
sub DESTROY{
my $self = shift;
$self->SUPER::DESTROY(@_);
# now it is safe to unlock the file (close() unlocks as well). Since
# the object has gone we remove its lock filehandle entry
# from the cache.
(my $id = "$self") =~ s/^[^=]+=//; # see 'sub TIEHASH'
close delete $DB_File::Lock::lockfhs{$id};
# Critical section ends here if in write mode!
print "Destroying ".__PACKAGE__."\n" if DEBUG;
}
####
END {
print "Calling the END from ".__PACKAGE__."\n" if DEBUG;
}
1;
And you use it like this:
use DB_File::Lock ();
A simple tie, READ lock and untie
use DB_File::Lock ();
my $dbfile = "/tmp/test";
tie my %mydb, 'DB_File::Lock', $dbfile, 'read';
print $mydb{foo} if exists $mydb{foo};
untie %mydb;
You can even skip the untie() call. When %mydb goes out of scope everything will be done automatically. However it is
better to use the explicit call, to make sure the critical sections between
lock and unlock are as short as possible. This is especially important when
requesting an exclusive (write) lock.
The following example shows how it might be convenient to skip the explicit untie(). In this example, we don't need to save the intermediate result, we just
return and the cleanup is done automatically.
use DB_File::Lock ();
my $dbfile = "/tmp/test";
print user_exists("stas") ? "Yes" : "No";
sub user_exists{
my $username = shift || '';
warn("No username passed\n"), return 0 unless $username;
tie my %mydb, 'DB_File::Lock', $dbfile, 'read';
# if we match the username return 1, else 0
return $mydb{$username} ? 1 : 0;
} # end of sub user_exists
Now let's write all the upper case characters and their respective ASCII values to a dbm file. Then read the file and print the contents of the DB, unsorted.
use DB_File::Lock ();
my $dbfile = "/tmp/test";
# write
tie my %mydb, 'DB_File::Lock', $dbfile,'write';
for (0..25) {
$mydb{chr(65+$_)} = 65+$_;
}
untie %mydb;
# now, read them and printout (unsorted)
tie %mydb, 'DB_File::Lock', $dbfile;
while (my($k,$v) = each %mydb) {
print "$k => $v\n";
}
untie %mydb;
If your CGI was interrupted in the middle, the DESTROY block will take care of unlocking the dbm file and flushing any changes. So
your DB will be safe against possible corruption caused by unclean program
termination.
Written by Stas Bekman.
Last Modified at 12/19/1999
You have just installed this new CGI script and when you try it out you see the grey screen of death saying ``Internal Server Error''... Or even worse, you have had a script running on a production server for a long time without problems, when suddenly the same grey screen occasionally shows up.
What are you going to do? How can you find out what the problem is? You have coded in Perl for years, and whenever an error occurred you always saw it displayed in the same terminal window you started the script from. But when you work with a webserver, there is no terminal to look for errors in, since the server in most cases has no terminal to send the error messages to.
Actually, the error messages don't disappear; they end up in the
error_log file, whose location is specified by the
ErrorLog directive in the httpd.conf file. The default setting is generally:
ErrorLog /usr/local/apache/logs/error_log
So whenever you see the "Internal Server Error" it's time to look at this file. We have solved the first problem: where to look for error messages.
There is a chance that seeing the error message doesn't really help you to spot and fix the error. The error message can be of immediate help, but it might not help at all. The usefulness of the error message depends solely on the programmer's coding style.
Let's take an example of the call to a function that opens a file passed as a parameter and does nothing with it. The first version of the code:
my $r = shift;
$r->send_http_header('text/plain');
sub open_file{
my $filename = shift || '';
die "No filename passed!" unless $filename;
open FILE, $filename or die;
}
open_file("/tmp/test.txt");
I assume that /tmp/test.txt doesn't exist, so open() will fail to open the file. When
we call this script from our browser, the browser returns an "internal error" message and we see the following error at the end of the error_log file:
Died at /home/httpd/perl/test.pl line 9.
So we use the hint Perl kindly gave us to find where in the code
die() was called. What we still don't know is which filename
passed to this subroutine caused the program termination. When we
have only one function call, as in the example above, the task of
finding the problematic file is trivial.
Now let's add two more open_file() function calls and assume
that among the three files only /tmp/test2.txt exists:
open_file("/tmp/test.txt");
open_file("/tmp/test2.txt");
open_file("/tmp/test3.txt");
When you execute the above calls, you will see the same error message twice:
Died at /home/httpd/perl/test.pl line 9.
Died at /home/httpd/perl/test.pl line 9.
Based on this error message, can you tell which files your program failed to
open? Probably not. Let's fix it by passing to die() the name
of the file in question.
sub open_file{
my $filename = shift || '';
die "No filename passed!" unless $filename;
open FILE, $filename or die "failed to open $filename";
}
open_file("/tmp/test.txt");
When we execute the above code, we see:
failed to open /tmp/test.txt at /home/httpd/perl/test.pl line 9.
Which makes a big difference, since now we know which file we should be checking.
By the way, if you append a newline to the end of the message you pass to
die(), perl won't report the line number at which the error
happened, so if you code:
open FILE, $filename or die "failed to open a file\n";
The error message in case of failure would be:
failed to open a file
Which gives you no debug information at all. It's very hard to debug this kind of code.
The warn() function, a kinder sister of die(),
which logs the message but doesn't terminate the program, behaves
in the same way: if you don't add a newline at the end of the message,
the line number at which warn() was called is logged;
otherwise it isn't.
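The effect of the trailing newline is easy to check from the command line. The following is a minimal sketch; died_with() and warned_with() are hypothetical helpers that capture what die() and warn() would have sent to the error_log.

```perl
use strict;
use warnings;

# Run die() inside eval and return the message perl produced
sub died_with {
    my $msg = shift;
    eval { die $msg };
    return $@;
}

# Same for warn(), captured through a local __WARN__ handler
sub warned_with {
    my $msg  = shift;
    my $seen = '';
    local $SIG{__WARN__} = sub { $seen .= $_[0] };
    warn $msg;
    return $seen;
}

# With a newline the message is left exactly as written
print died_with("failed to open a file\n");
print warned_with("failed to open a file\n");

# Without it perl appends " at FILE line N." and a newline
print died_with("failed to open a file");
print warned_with("failed to open a file");
```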
You might want to use warn() instead of die() if
the failure to open the file isn't critical. Consider the following code:
if(open FILE, $filename){
# do something with file
} else {
warn "failed to open a $filename";
}
# more code here...
So, we improved our code to report to us the names of the problematic
files, but we still don't know the reason for open()'s
failure. Let's try to improve the warn() example:
if(-r $filename){
open FILE, $filename;
# do something with file
} else {
warn "$filename doesn't exist or is not readable";
}
We see the warning in the error_log file:
/tmp/test.txt doesn't exist or is not readable at /home/httpd/perl/test.pl line 9.
It tells us the reason for the failure, so we don't have to go to the
code and check what it was trying to do with the file: open it for writing,
reading or something else. The -r operator tests whether the file is readable.
Explaining every possible failure that way could be quite an overhead. But
why reinvent the wheel, when we already have the reason for the failure stored
in the $! variable? Let's go back to the open_file() function:
sub open_file{
my $filename = shift || '';
die "No filename passed!" unless $filename;
open FILE, $filename or die "failed to open $filename: $!";
}
open_file("/tmp/test.txt");
We see:
failed to open /tmp/test.txt: No such file or directory at /home/httpd/perl/test.pl line 9.
Now we have all the information we need to debug this problem: we
know which line of code triggered die(), we know which file the
code attempted to open and, last but not least, the reason, which the
operating system gladly tells us through the $! variable.
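When the failure is not fatal, the same three pieces of information can be handed back to the caller instead of being passed to die(). The following is a sketch only; open_or_explain() is a hypothetical helper and the path is made up for the example.

```perl
use strict;
use warnings;

# open_or_explain() is hypothetical: it returns the filehandle and an
# empty string on success, or undef and a message that names the file
# and includes the system's reason from $! on failure.
sub open_or_explain {
    my $filename = shift;
    open my $fh, '<', $filename
        or return (undef, "failed to open $filename: $!");
    return ($fh, '');
}

my ($fh, $error) = open_or_explain("/tmp/no-such-dir/test.txt");
if ($fh) {
    # do something with the file
    close $fh;
} else {
    # no trailing newline, so warn() adds the file name and line number
    warn $error;
}
```

Note that $! is only meaningful immediately after the failed call, which is why the message is built on the same statement as open().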
Now let's create the /tmp/test.txt file, so that it exists:
% touch /tmp/test.txt
Now when we execute the latest version of the code, we see:
failed to open /tmp/test.txt: Permission denied at /home/httpd/perl/test.pl line 9.
We see a different reason: I've created a file that doesn't belong to the user nobody, which the server runs as, so the server has no permission to read the file.
Now you understand that it's much easier to debug your code if you validate
the return values of the system calls, and properly code the arguments to
die() and warn() calls. The open()
function is just one of the many system calls perl provides for your
convenience.
So now you can code and debug CGI scripts and modules as easily as if they were plain perl scripts that you execute from a shell.
It's a good idea to keep the error_log file open all the time in a dedicated terminal with the help of tail -f:
% tail -f /usr/local/apache/logs/error_log
So you will see all the errors and warnings as they happen.
Another tip is to create a shell alias, to make it easier to execute the above command. In tcsh you would do:
% alias err "tail -f /usr/local/apache/logs/error_log"
and from now on, in the shell you set the alias in, executing err will call tail -f /usr/local/apache/logs/error_log. Since you want this alias to be available to you all the time, you should put it into the .tcshrc file or its equivalent if you don't use tcsh (.bashrc for bash users).
Just like errors, perl's mandatory warnings go to the
error_log file, if they are enabled.
The code you write lives a dual life. In the first life it's being written, tested, debugged, improved, tested, debugged, rewritten, tested, debugged. In the second life it's being used, period.
A significant part of its first life the script spends on the developer's machine, the machine of its personal God. The other part is spent on the production server, where the developer's creature is supposed to be perfect, since it was created in his own image...
So when you develop the code you want all the help in the world to
spot possible problems, and that's where enabling warnings is a
must. It's very important to get rid of all, or at least most, of
the warnings that appear in the error_log file. Why?
If there are warnings, your code is not clean, and if they are waved away, expect them to hit back on the production server, when it's too late.
The other, no less important, reason is that when each script invocation generates more than 5 lines of warnings, it's very hard to catch real problems, as you just cannot see them among all the warnings you believe are unimportant.
On the other hand, on a production server, you really *want* to turn warnings off. And there are good reasons for that:
There is no added value in having the same warning show up when
triggered by thousands of script invocations. If your code isn't very clean
and generates even a single warning per script invocation, you will end up
with a huge error_log file in a short time on a heavily loaded server. Imagine what happens
when you've got more than one warning appended to the log file per invocation. The
warnings elimination phase is supposed to be a part of the development
process, and should be done before the code goes live.
Enabling runtime warning checking has a small performance impact (in any perl script, not just under mod_perl).
mod_perl gives you a very simple solution to this warnings saga: don't enable warnings in the scripts unless you really have to. Let mod_perl control this mode globally. All it takes is having a:
PerlWarn On
directive added to httpd.conf on your development machine and having a:
PerlWarn Off
directive in the live box's configuration file.
If there is a piece of code that generates warnings and you really want to
disable them only in this code, you can do that too. The $^W special variable allows you to dynamically turn the warnings
mode on and off. So just enclose the code in a block, and disable the warnings
within the scope of this block. The original value of $^W will be restored upon exit from the block.
{
local $^W=0;
# some code that generates warnings
}
Again, unless you have a really good reason, for your own sake the advice is to avoid this workaround.
Don't forget the local() operator: if you do, $^W will affect all the requests processed by the same process that globally
changed this variable.
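The scoping behaviour of local $^W can be verified with a short standalone script. This is a sketch under the assumption that no lexical warnings pragma (use warnings) is in effect, since that pragma takes precedence over $^W; concat_undef() and the @warnings array are names invented for the example.

```perl
use strict;

# collect warnings instead of letting them go to STDERR
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# turn the warnings mode on, as PerlWarn On or -w would
$^W = 1;

# concat_undef() is hypothetical: using an undefined value in a
# concatenation triggers a warning whenever warnings are enabled
sub concat_undef {
    my $x;
    return "value: " . $x;
}

concat_undef();          # warns

{
    local $^W = 0;       # silenced only until the end of this block
    concat_undef();      # no warning
}

concat_undef();          # warns again: $^W was restored to 1

print scalar(@warnings), " warnings caught\n";
```

Had the block used a plain $^W = 0 without local(), the third call would have stayed silent as well, and under mod_perl so would every later request served by the same process.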
The diagnostics pragma can shed more light on the errors and warnings, as you will see in a
moment.
This module extends the terse diagnostics normally emitted by both the perl
compiler and the perl interpreter, augmenting them with the more
explicative and endearing descriptions found in the perldiag
manpage. Like the other pragmata, it affects the compilation phase of your
program rather than merely the execution phase.
To use in your program as a pragma, merely invoke
use diagnostics;
at the start (or near the start) of your program. This pragma turns on the -w mode as well.
Note that this pragma is generally useful when you are new to Perl and want a better explanation of the errors and warnings, or when you encounter a warning you have never seen before, e.g. one that was introduced in a newer version of Perl.
If leaving warnings On on a production server can consume your hard disk space quickly, with the diagnostics pragma you will run out of space about ten times faster if your code generates warnings, since for each line of text generated in plain warnings mode, diagnostics generates about ten times more.
The other reason is the huge performance overhead it adds compared with just having warnings On. Let's see some numbers. We will run the same benchmark twice, once with diagnostics disabled and once with it enabled, on a subroutine test_code. The subroutine does nothing useful: it squares numbers in a loop, performs a numeric comparison of two strings and, depending on the result, assigns one string to another. The wrong choice of comparison operator is intentional, and you will understand it in a second. By the way, the rest of the code inside test_code was chosen at random.
use Benchmark;
use diagnostics;
my $count = 10000;
disable diagnostics;
$t1 = timeit($count,\&test_code);
enable diagnostics;
$t2 = timeit($count,\&test_code);
print "Diagnostics off:",timestr($t1),"\n";
print "Diagnostics on :",timestr($t2),"\n";
sub test_code{
for my $i (1..10) {
my $j = $i**2;
}
$a = "Hi";
$b = "Bye";
if ($a == $b) {
$c = $a;
}
}
For only a few lines of code we get:
Diagnostics off: 2 wallclock secs ( 1.77 usr + 0.02 sys = 1.79 CPU)
Diagnostics on :17 wallclock secs (13.16 usr + 0.08 sys = 13.24 CPU)
Result: with diagnostics enabled the code runs about seven times slower (13.24 versus 1.79 CPU seconds)!
Now let's fix the comparison the way it should be, by replacing
== with eq, so we get:
$a = "Hi";
$b = "Bye";
if ($a eq $b) {
$c = $a;
}
and run the same benchmark again:
Diagnostics off: 1 wallclock secs ( 1.43 usr + 0.01 sys = 1.44 CPU)
Diagnostics on : 2 wallclock secs ( 1.41 usr + 0.01 sys = 1.42 CPU)
Amazing, but now there is no overhead at all. Why is that? As we have found out, the diagnostics pragma slows things down only when warnings are actually generated, that is, when something is wrong with the code.
This was just a little example, and obviously you wouldn't benchmark all your scripts to check whether you have to remove this pragma or not. Just remember to remove it when your code goes live.
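One way to make sure the pragma never reaches the live server is to load it only when a development switch is set. This is a sketch under assumptions: the environment variable name MYAPP_DEVEL is hypothetical, and loading the pragma conditionally inside a BEGIN block is a suggestion, not something this guide prescribes.

```perl
use strict;

# MYAPP_DEVEL is a made-up name for this sketch; use whatever switch
# distinguishes your development machine from the live box.
BEGIN {
    if ($ENV{MYAPP_DEVEL}) {
        require diagnostics;
        diagnostics->import;     # roughly what "use diagnostics;" does
    }
}

print "diagnostics loaded: ", ($INC{'diagnostics.pm'} ? "yes" : "no"), "\n";
```

With the variable unset, the module is never compiled, so the live server pays nothing for it.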
While debugging my mod_perl and general CGI code, I keep the
error_log file open in a dedicated terminal window (xterm), so I can see errors and warnings as soon as they are appended to the
file. I do it with:
tail -f /usr/local/apache/logs/error_log
which shows new lines as they are appended to the file.
If you cannot access your error_log file because you are unable to telnet to your machine (generally the case
with ISPs who provide user CGI support but no telnet access), you
might want to use a CGI script I wrote to fetch the latest lines from the
file (with a bonus of colored output for easier reading). You might need
to ask your ISP to install this script for general use. See Watching the error_log file without telneting to the server.
Sometimes an httpd process might hang in the middle of processing a request, either because there is a bug in your code (e.g. the code is stuck in a while loop, blocked in some system call, or waiting because of a resource deadlock) or for some other reason. There are two things we want to know: when and why this happens.
# META: handle this
#=head1 Spinning httpds
#To see where an httpd is "spinning", try adding this to your script or
#a startup file:
# use Carp ();
# $SIG{'USR1'} = sub {
#     Carp::confess("caught SIGUSR1!");
# };
#Then issue the command line:
# kill -USR1 <spinning_httpd_pid>
Just to give you an idea of what kind of bug might cause the code to hang,
let's look at the following example. Your process has to gain a lock on some
resource (e.g. a file) before it continues, so it makes an attempt, and if
that fails (no lock gained), it sleep()s for a second and increments
the counter of attempts.
until(gain_lock()){
$tries++;
sleep 1;
}
This loop may spin for a very long time, either because many processes compete for this resource or because there is a deadlock. A deadlock is a situation where two processes X and Y both need resources A and B to continue, while X holds A and Y holds B. Y cannot continue before X releases A, but X cannot release A before it gets B. Neither process can ever proceed, which is why this situation is known as a deadlock.
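A simple guard against polling forever is to give up after a fixed number of attempts. The sketch below assumes that the resource is a file locked with flock(); the function name lock_with_retries and the limit of five attempts are illustrative, and LOCK_NB is used so that each attempt returns immediately instead of blocking.

```perl
use strict;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Try to gain an exclusive lock, sleeping one second between attempts.
# Returns the number of the successful attempt, or 0 if all failed.
sub lock_with_retries {
    my ($fh, $max_tries) = @_;
    for my $try (1 .. $max_tries) {
        return $try if flock $fh, LOCK_EX | LOCK_NB;
        sleep 1;
    }
    return 0;
}

my ($fh, $filename) = tempfile(UNLINK => 1);   # scratch file for the demo
my $tries = lock_with_retries($fh, 5);
print $tries ? "locked after $tries attempt(s)\n" : "giving up\n";
flock $fh, LOCK_UN;
close $fh;
```

A caller that gets 0 back can report the failure to the user instead of hanging. This bounds the waiting time but does not cure a deadlock, which still has to be fixed in the locking order.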
A real world situation that you may encounter very often is exclusive lock starvation. Generally there are two lock types in use: a SHARED lock, which allows many processes to perform READ operations simultaneously, and an EXCLUSIVE lock, which grants access to a single process and thereby makes a safe WRITE operation possible.
You can lock any kind of resource; in our example we talk about files.
If there is a READ lock request, it is granted as soon as the file becomes unlocked, or immediately if the file is already READ locked. The lock status becomes READ on success.
If there is a WRITE lock request, it is granted as soon as the file becomes unlocked. The lock status becomes WRITE on success.
What happens to the WRITE lock request is the most important part. If the file is READ locked, a process that requests to write has to wait until no reading or writing process is left. Lots of processes can successfully read the file, since they do not block each other from doing so. This means that a process that wants to write to the file (first obtaining an exclusive lock) may never get a chance to squeeze in. The following diagram represents a possible scenario where everybody reads but no one can write:
[-p1-] [--p1--]
[--p2--]
[---------p3---------]
[------p4-----]
[--p5--] [----p5----]
Let's look at the real code and see it in action. The following script
imports the flock() related constants from the Fcntl module, opens the file that will be locked and sets two
variables, $lock_type and $lock_type_verbose. They are set to LOCK_EX
and EX if the first command line argument ($ARGV[0]) is defined
and equal to w, indicating that this process will try to gain a
WRITE (exclusive) lock; otherwise the two are set to LOCK_SH and
SH for a SHARED (read) lock.
Once the variables are set, we enter the never ending while(1) loop, which attempts to lock the file in the mode set in $lock_type, reports success and the type of lock that was gained, then sleeps for a random
period of 0 to 9 seconds and unlocks the file. Then the loop starts
from the beginning.
lock.pl
-------------------
#!/usr/bin/perl -w
use Fcntl qw(:flock);
$lock = "/tmp/lock";
open LOCK, ">$lock" or die "Cannot open $lock for writing: $!";
my $lock_type = LOCK_SH;
my $lock_type_verbose = 'SH';
if (defined $ARGV[0] and $ARGV[0] eq 'w'){
$lock_type = LOCK_EX;
$lock_type_verbose = 'EX';
}
while(1){
flock LOCK,$lock_type;
# start of critical section
print "$$: $lock_type_verbose\n";
sleep int(rand(10));
# end of critical section
flock LOCK, LOCK_UN;
}
close LOCK;
When you spawn a few of the above scripts simultaneously, making sure that the first processes to start are READ processes and that they are the majority, it's very easy to see the starvation of the WRITE processes. Execute three read processes and one write process like this:
% ./lock.pl r & ; ./lock.pl r & ; ./lock.pl r & ; ./lock.pl w &
You see something like:
24233: SH
24232: SH
24232: SH
24233: SH
24232: SH
24233: SH
24231: SH
24231: SH
24231: SH
and not a single EX line... When you kill off the reading processes, the write lock will
be gained. Note that this is a rough example, since I've used the
sleep() function. To emulate a real situation you need to use the Time::HiRes module, which allows you to sleep for fractions of a second.
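As a minimal sketch of the finer-grained sleep, the snippet below uses usleep() and the floating point time() from Time::HiRes. The chosen interval of 50 to 250 milliseconds is an arbitrary assumption for the illustration.

```perl
use strict;
use Time::HiRes qw(usleep time);

my $start = time;                         # floating point seconds
usleep(50_000 + int(rand(200_000)));      # 50 to 250 milliseconds
my $elapsed = time - $start;
printf "slept for %.3f seconds\n", $elapsed;
```

Replacing the sleep int(rand(10)) call in lock.pl with such a usleep() call should give critical sections closer to what a real script produces.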
The interval between lock and unlock is called a Critical Section. It should be kept as short as possible in terms of time, not in terms of the amount of code. As you just saw, a single sleep statement can make the critical section long.
To summarize the presented case: if you have a script that uses both READ and WRITE locks and the critical section isn't very short, the writing process might be starved. After a while the browser that initiated the request will time out the connection and abort the request, but it's more likely that the user will press the Stop or Reload button before that happens. Since the process in question just waits, there is no way for Apache to know that the request was aborted. The process will hang until the lock is gained, and only when a write to the client's broken connection is attempted will Apache terminate the script.
So this was one example of how a process can hang.
It's not so easy to detect a hanging process. There is no way you can
tell how long a request has been in processing by using plain system
utilities like ps and top. The reason is that
each Apache process serves many requests without quitting. System utilities
can tell how long the process has been running since its creation, but this
information is useless in our case, since a long running Apache process
is normal and expected behavior.
However there are a few approaches that can help to detect the hanging process.
If the process hangs and demands lots of resources, it's quite easy to spot
it by monitoring the output of the top utility. You will see the
same process show up in the first few lines of the automatically refreshed
report. But often the hanging process uses little or close to zero
resources, e.g. when waiting for some event to happen.
Another case that is easy to spot is when some process floods the error_log
and writes millions of error messages there. Generally such a process uses
lots of resources and is spotted by using top as described
above.
What we have to use are the tools that report the status of the Apache
processes. You can use either the mod_status module, which is usually accessed
from the /server_status location, or the Apache::VMonitor
module. Both tools provide counters of processed requests per Apache
process. So what you can do is watch the report for about 5-10 minutes,
spotting which process keeps the same number of processed requests
while its status is 'W' (which means that it hangs). But when you have
about 50 processes, it's quite hard to spot such a process. So let's write
a watchdog to do the work for us:
.....META??? Apache::SafeHang code
When you've got a real problem and the processes hang one after the other,
the moment comes when the number of hanging processes becomes equal to the
value of the MaxClients directive, which means that no more processes will be spawned and your
service is halted from the user's point of view. It is easy to detect this, attempt
to resolve it and notify the administrator with a simple crontab watchdog that
requests some very light script every minute or so. (See Monitoring the Server. A watchdog.)
In the watchdog you set a timeout you think is appropriate for your service, which may vary between a few seconds and one minute. If the server fails to respond before the timeout expires, the watchdog has spotted trouble and attempts to restart the server. After the restart, an email report is sent to the administrator, saying first that there was a problem and second whether the restart was successful.
If you get such reports constantly, something is wrong with your web service and you should revise your code. Note that it's possible that your server is simply overloaded, being hit by more requests than it can handle, so that requests are queued and not processed for a while, which triggers the watchdog's alarm. If this is the case, you need to add more servers or memory, and probably to split the load of your single machine across a cluster of webserver machines.
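The core of such a watchdog, the timed check, might look like the sketch below. It is an assumption-laden illustration and not the watchdog from the referenced section: the function name server_responds, the host, port, path and timeout values are all hypothetical, and the restart and email steps are left out on purpose.

```perl
use strict;
use IO::Socket::INET;

# Return 1 if the server answers an HTTP request with a 2xx status
# within $timeout seconds, 0 otherwise.
sub server_responds {
    my ($host, $port, $path, $timeout) = @_;
    my $ok = 0;
    eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        my $sock = IO::Socket::INET->new(
            PeerAddr => $host,
            PeerPort => $port,
            Proto    => 'tcp',
            Timeout  => $timeout,
        ) or die "connect failed\n";
        print $sock "GET $path HTTP/1.0\r\n\r\n";
        my $status_line = <$sock>;
        $ok = 1 if defined $status_line
                && $status_line =~ m{^HTTP/\d\.\d\s+2\d\d};
        close $sock;
        alarm 0;
    };
    alarm 0;      # make sure no alarm is left pending
    return $ok;
}

# Hypothetical values: a light URL on the local server, 10 seconds.
if (server_responds('127.0.0.1', 80, '/', 10)) {
    print "server is alive\n";
} else {
    # A real watchdog would restart the server and mail the
    # administrator here; this sketch only reports.
    print "server did not respond in time\n";
}
```

Run from crontab every minute or so, the script keeps the check itself cheap, so that the watchdog does not add noticeable load to the server it monitors.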
Given the process PID, there are two ways to find out where it's hanging.
Depending on the operating system you should have either the truss
or the strace utility available among your development tools. The usage is
simple:
% truss -p PID
or
% strace -p PID
Replace PID with the process number you want to check on.
Let's write a program that hangs and use strace to find out the point it hangs at:
hangme.pl
---------
$|=1;
my $r = shift;
$r->send_http_header('text/plain');
print "PID = $$\n";
while(1){
$i++;
sleep 1;
}
The reason this simple code hangs is obvious from examining it: the program never breaks out of the while loop. As you have noticed, I print the PID of the current process to the browser, to learn which process to look at. Of course in a real situation you cannot do the same trick. In the previous section I presented a few ways to detect runaway processes and their PIDs.
I save the above code in a file and execute it from the browser. Note that
I've made STDOUT unbuffered with $|=1; so I would immediately see the process ID. Once the browser makes the request,
the script prints its PID and obviously hangs. So we press the
'Stop' button, but the process continues to hang in this code. Isn't Apache
supposed to detect the broken connection and abort the request processing?
Yes and no; you will understand soon what's really happening.
First let's attach to the process and see what it's doing. I use the PID the script printed to the browser, which is 10045 in this case:
% strace -p 10045
[...truncated an identical output...]
SYS_175(0, 0xbffff41c, 0xbffff39c, 0x8, 0) = 0
SYS_174(0x11, 0, 0xbffff1a0, 0x8, 0x11) = 0
SYS_175(0x2, 0xbffff39c, 0, 0x8, 0x2) = 0
nanosleep(0xbffff308, 0xbffff308, 0x401a61b4, 0xbffff308, 0xbffff41c) = 0
time([940973834]) = 940973834
time([940973834]) = 940973834
[...truncated the identical output...]
That's not what we expected to see, is it? These are system calls that don't appear in our little example. What we actually see is how Perl translates our code into system calls. Since we know that our code hangs in this snippet:
while(1){
$i++;
sleep 1;
}
We can figure out that all of these calls belong to sleep 1, since $i++ is pure computation and needs no system call at all. The nanosleep() call does the actual sleeping, the two time() calls most likely come from Perl's sleep(), which reports the number of seconds slept, and the unnamed SYS_174 and SYS_175 calls that precede nanosleep() are most likely the signal bookkeeping that the C library performs around it.
Generally the situation is the opposite. You detect the hanging process, attach to it and watch the trace of the calls it makes (or the last call, if the process hangs waiting for something, e.g. when blocking on a file lock request). From watching the trace you should figure out what it's actually doing and, hopefully, find the corresponding lines in your Perl code. For example let's see how one process "hangs" while requesting an exclusive lock on a file exclusively locked by another process:
excl_lock.pl
---------
use Fcntl qw(:flock);
use Symbol;
if ( fork() ) {
my $fh = gensym;
open $fh, ">/tmp/lock" or die "cannot open /tmp/lock $!";
print "$$: I'm going to obtain the lock\n";
flock $fh, LOCK_EX;
print "$$: I've got the lock\n";
sleep 20;
close $fh;
} else {
my $fh = gensym;
open $fh, ">/tmp/lock" or die "cannot open /tmp/lock $!";
print "$$: I'm going to obtain the lock\n";
flock $fh, LOCK_EX;
print "$$: I've got the lock\n";
sleep 20;
close $fh;
}
The code is simple. The process executing the code forks a second process, and both do the same thing: generate a unique symbol to be used as a filehandle, open the lock file for writing using the generated symbol, lock the file in exclusive mode, sleep for 20 seconds pretending to do some lengthy operation, and close the lock file, which also unlocks it.
The gensym function is imported from the Symbol module. The Fcntl module provides us with the symbolic constant
LOCK_EX, which is imported with the :flock tag along with the other flock() constants.
The code used by both processes is identical, therefore we cannot predict
which one will get its hands on the lock file and succeed in locking it first,
so we add print() statements to find out the PID of the
process that blocks on the lock request.
When the above code is executed from the command line, we see that one of the processes gets the lock:
% ./excl_lock.pl
3038: I'm going to obtain the lock
3038: I've got the lock
3037: I'm going to obtain the lock
We see that process 3037 is blocking (waiting to get the lock), so we attach to it:
% strace -p 3037
about to attach c10
flock(3, LOCK_EX
It's clear from the above trace that the process waits for an exclusive lock.
The more you watch traces of different processes, the easier it becomes to understand what actually happens.
Another approach to see a different kind of trace of the running code is to use gdb (the GNU debugger) or another debugger. It's supposed to work on any platform
the GNU development tools have been ported to. Its purpose is to allow you to
see what is going on ``inside'' another program while it executes--or what
another program was doing at the moment it crashed. gdb requires the path to the binary program that the process you want to
examine is executing, in addition to the process ID. In the case of Perl code
it's /usr/bin/perl or a different path; for an httpd process it would be the path to your httpd
executable. I will show a few examples of using gdb to get a better
understanding.
For example let's go back to our last locking example, execute it as before and attach to the process that didn't get the lock and waits:
% gdb /usr/bin/perl 3037
Once the debugger has started, we execute the where command to see the trace:
(gdb) where
#0 0x40131781 in __flock ()
#1 0x80a5421 in Perl_pp_flock ()
#2 0x80b148d in Perl_runops_standard ()
#3 0x80592b8 in perl_run ()
#4 0x805782f in main ()
#5 0x400a6cb3 in __libc_start_main (main=0x80577c0 <main>, argc=2,
argv=0xbffff7f4, init=0x8056af4 <_init>, fini=0x80b14fc <_fini>,
rtld_fini=0x4000a350 <_dl_fini>, stack_end=0xbffff7ec)
at ../sysdeps/generic/libc-start.c:78
Again, that's not what we expected to see, and now it's a different
trace. #0 tells us the most recent call that was executed, which is the C language
level implementation of flock(). The previous call (#1) isn't print(), as we might expect, but the higher level
Perl internal function that implements flock(). If we follow the trace of calls, what
we actually see is the chain of nested C function calls, which can be better presented as:
__libc_start_main
main ()
perl_run ()
Perl_runops_standard ()
Perl_pp_flock ()
__flock ()
So I would say that it's less useful than strace, since it's almost impossible to know which of the flock()s
was called if there is more than one in the code. strace solves this by showing the sequence of the system calls being executed,
so using the sequence we can locate the corresponding lines in the code.
(META: the above is wrong - you can ask to display the previous command! What is it?)
For your information, when you attach to a running process with a debugger,
the program stops executing and control over the program is
passed to the debugger. You can continue the normal program run with the continue command, or execute it step by step with the next and step commands that you type at the gdb
prompt (next steps over any function calls in the line, while
step steps into them).
C/C++ debuggers are a very large topic and I won't discuss them in the
scope of this document, but the gdb man page is quite a good document to
start with. You might also want to check ddd (Data Display Debugger), which provides a visual interface to gdb and other debuggers. It can even debug Perl programs!
For completeness let's see the gdb trace of the httpd process that still
hangs in the while(1) loop of the first example in this section.
% gdb /usr/local/apache/bin/httpd 1005
(gdb) where
#0 0x4014a861 in __libc_nanosleep ()
#1 0x4014a7ed in __sleep (seconds=1) at ../sysdeps/unix/sysv/linux/sleep.c:78
#2 0x8122c01 in Perl_pp_sleep ()
#3 0x812b25d in Perl_runops_standard ()
#4 0x80d3721 in perl_call_sv ()
#5 0x807a46b in perl_call_handler ()
#6 0x8079e35 in perl_run_stacked_handlers ()
#7 0x8078d6d in perl_handler ()
#8 0x8091e43 in ap_invoke_handler ()
#9 0x80a5109 in ap_some_auth_required ()
#10 0x80a516c in ap_process_request ()
#11 0x809cb2e in ap_child_terminate ()
#12 0x809cd6c in ap_child_terminate ()
#13 0x809ce19 in ap_child_terminate ()
#14 0x809d446 in ap_child_terminate ()
#15 0x809dbc3 in main ()
#16 0x400d3cb3 in __libc_start_main (main=0x809d88c <main>, argc=1,
argv=0xbffff7e4, init=0x80606f8 <_init>, fini=0x812b33c <_fini>,
rtld_fini=0x4000a350 <_dl_fini>, stack_end=0xbffff7dc)
at ../sysdeps/generic/libc-start.c:78
Just as before, we can see a complete trace leading to the last executed call.
As you have noticed, I still haven't provided the promised explanation of why
the request hanging in the while(1) loop wasn't aborted by Apache. The next section covers
this case.
#=head1 Examples of strace (or truss) usage
#(META: below are some snippets of strace outputs from list's emails)
#[there was a talk about Streaming LWP through mod_perl and the topic
#was suggested optimal buffer size]
#Optimal buffer size depends on your system configuration, watch
#apache with strace -p (or truss) when its sending a static file, here
#perlfunc.pod on my laptop (linux 2.2.7):
# writev(4, [{"HTTP/1.1 200 OK\r\nDate: Wed, 02"..., 289}, {"=head1 NAME\n\nperlfunc - Perl b"..., 32768}], 2) = 33057
# alarm(300) = 300
# write(4, "m. In older versions of Perl, i"..., 32768) = 32768
# alarm(300) = 300
# write(4, "hout waiting for the user to hit"..., 32768) = 32768
# alarm(300) = 300
# write(4, ">&STDOUT") || die "Can't dup "..., 32768) = 32768
# alarm(300) = 300
# write(4, "LEHANDLE is supplied. This has "..., 32768) = 32768
# alarm(300) = 300
# write(4, "ite,\nseek, tell, or eo"..., 25657) = 25657
When a user presses the STOP or RELOAD button, Apache detects this event via a SIGPIPE signal (Broken pipe), ceases the script execution and performs all the
cleanup it has to do. It's important to stress that SIGPIPE is triggered only when the process that handles the broken
connection attempts to send some data to the client (browser). If the
script is doing some lengthy operation without writing anything to the
client, it won't be stopped until the operation is completed and
at least one character is sent back to the client.
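The same behavior can be reproduced outside Apache with an ordinary pipe. This is a standalone sketch that only imitates the situation: the pipe stands in for the client connection, and SIGPIPE is ignored here so that the failed write can be inspected instead of terminating the demo.

```perl
use strict;
use Errno qw(EPIPE);

pipe(my $reader, my $writer) or die "pipe failed: $!";
close $reader;                    # the "client" goes away

# Ignore SIGPIPE so that the failed write returns an error
# instead of killing this demo process.
local $SIG{PIPE} = 'IGNORE';

# Nothing has told us about the broken pipe so far...
my $written = syswrite($writer, "\0");

# ...only the write attempt reveals it.
my $broken = !defined($written) && $! == EPIPE;
print $broken ? "write failed: broken pipe\n" : "write succeeded\n";
```

However long the script waits between closing the reading end and the syswrite() call, the writer learns about the broken pipe only at the moment of the write.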
This has changed in Apache 1.3.6 and later: the server core no longer catches SIGPIPE, and mod_perl handles broken connections much better. Here is a snippet from the Apache 1.3.6 CHANGES file:
*) SIGPIPE is now ignored by the server core. The request write routines (ap_rputc, ap_rputs, ap_rvputs, ap_rwrite, ap_rprintf, ap_rflush) now correctly check for output errors and mark the connection as aborted. Replaced many direct (unchecked) calls to ap_b* routines with the analogous ap_r* calls. [Roy Fielding]
Since Apache version 1.3.6:
$r->print returns true on success, false on failure (broken connection).
If you want the old SIGPIPE semantics, simply configure:
PerlFixupHandler Apache::SIG
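The return value of $r->print described above can be used to stop working for a client that has gone away. The following is only a sketch: the helper name send_chunks is made up, and it relies on nothing but the documented behavior that print() returns false on a broken connection, so it accepts any object with such a print() method.

```perl
use strict;

# Send chunks until one of them cannot be delivered.
# $r is expected to be the request object; per the text, its
# print() method returns false on a broken connection.
sub send_chunks {
    my ($r, @chunks) = @_;
    my $sent = 0;
    for my $chunk (@chunks) {
        $r->print($chunk) or last;   # stop once the client has left
        $sent++;
    }
    return $sent;                    # number of chunks delivered
}
```

Inside a script the call would look like send_chunks($r, @data). Note that small writes may sit in Apache's buffer, so a failure is noticed only when data is actually pushed to the client.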
Let's use the knowledge we have acquired before to trace the execution of the code and see all the events as they are happening.
Let's take a little script that obviously ``hangs'' the server:
my $r = shift;
$r->send_http_header('text/plain');
print "PID = $$\n";
$r->rflush;
while(1){
$i++;
sleep 1;
}
The script gets a request object $r by shift()ing it from the @_
argument list passed by the handler() subroutine. (The magic
is done by Apache::Registry, of course.) Then the script sends a
Content-type header, telling the client that we are going to send plain text.
We print out a single line telling us the ID of the process that handles this request, which we need to know in order to run the tracing utility. Then we flush Apache's buffer, since if we don't we will never see the line printed. That's because the output we print is very short, and the buffer isn't flushed before it becomes full or the request is over. Since our script intentionally hangs, we have to force the buffer to be flushed.
Then we enter a never ending while(1) loop, which just increments a dummy $i variable and sleeps for a second, repeating the two operations
again and again.
Running strace -p PID, where PID is the process ID as printed to the browser, we see the following output
printed every second:
SYS_175(0, 0xbffff41c, 0xbffff39c, 0x8, 0) = 0
SYS_174(0x11, 0, 0xbffff1a0, 0x8, 0x11) = 0
SYS_175(0x2, 0xbffff39c, 0, 0x8, 0x2) = 0
nanosleep(0xbffff308, 0xbffff308, 0x401a61b4, 0xbffff308, 0xbffff41c) = 0
time([941281947]) = 941281947
time([941281947]) = 941281947
Let's leave strace running and press the STOP button now. Has anything changed? No, the same trace is printed every second.
This means that Apache didn't detect the broken connection, which confirms
the statement that the script has to write something to trigger the SIGPIPE event.
Let's try to write a NULL (\0) character to the client, so that detection becomes possible as soon as the Stop button is pressed:
while(1){
$r->print("\0");
last if $r->connection->aborted;
$i++;
sleep 1;
}
We add a print() statement to print a NULL character and then
we check whether the connection was aborted. If it was, we break from the
loop.
But if we run this script and strace it as before, we see that it still doesn't work. What's missing is flushing the buffer. Let's add it:
my $r = shift;
$r->send_http_header('text/plain');
print "PID = $$\n";
$r->rflush;
while(1){
$r->print("\0");
$r->rflush;
last if $r->connection->aborted;
$i++;
sleep 1;
}
Watching strace's output on the running process and pressing the Stop button, we see:
SYS_175(0, 0xbffff41c, 0xbffff39c, 0x8, 0) = 0
SYS_174(0x11, 0, 0xbffff1a0, 0x8, 0x11) = 0
SYS_175(0x2, 0xbffff39c, 0, 0x8, 0x2) = 0
nanosleep(0xbffff308, 0xbffff308, 0x401a61b4, 0xbffff308, 0xbffff41c) = 0
time([941284358]) = 941284358
write(4, "\0", 1) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) ---
select(5, [4], NULL, NULL, {0, 0}) = 1 (in [4], left {0, 0})
time(NULL) = 941284358
write(17, "127.0.0.1 - - [30/Oct/1999:13:52"..., 81) = 81
gettimeofday({941284359, 39113}, NULL) = 0
times({tms_utime=9, tms_stime=8, tms_cutime=0, tms_cstime=0}) = 41551400
close(4) = 0
SYS_174(0xa, 0xbffff4e0, 0xbffff454, 0x8, 0xa) = 0
SYS_174(0xe, 0xbffff46c, 0xbffff3e0, 0x8, 0xe) = 0
fcntl(18, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}
Apache detects the broken pipe, as we see from this snippet:
write(4, "\0", 1) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) ---
Then it stops the script and does all the cleanup work, like access logging:
write(17, "127.0.0.1 - - [30/Oct/1999:13:52"..., 81) = 81
That's what we see in the access_log file; 17 is the file descriptor of this file in this process. We will
talk about cleanups shortly, since it's a very critical issue with
aborted scripts. But first let's see how we can make the code more generic.
Apache::SIG comes to help us: the following script doesn't need to check for aborted
connections.
use Apache::SIG ();
Apache::SIG->set;
my $r = shift;
$r->send_http_header('text/plain');
print "PID = $$\n";
$r->rflush;
while(1){
$r->print("\0");
$r->rflush;
$i++;
sleep 1;
}
META: it kills the server!!! ???
Apache::SIG installs a SIGPIPE handler that stops the script's execution for us.
If you would like to log in your Apache access_log when a request was canceled by a SIGPIPE,
you can declare Apache::SIG as a handler (any Perl*Handler will do, as long as it is run before
PerlHandler, e.g. PerlFixupHandler), and you must also define a custom LogFormat in your httpd.conf, like this:
PerlFixupHandler Apache::SIG
LogFormat "%h %l %u %t \"%r\" %s %b %{SIGPIPE}e"
If the server has noticed that the request was canceled via a
SIGPIPE, then the log line will end with 1, otherwise it will just be a dash.
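Assuming the LogFormat shown above, in which the last field is 1 for a canceled request and a dash otherwise, canceled requests can be counted with a few lines of Perl. This is a sketch only: the helper name is_canceled is made up, and the two sample log lines are invented for the illustration, not taken from a real log.

```perl
use strict;

# True if the last whitespace separated field of the log line is "1",
# which under the assumed LogFormat marks a request canceled by SIGPIPE.
sub is_canceled {
    my ($line) = @_;
    return $line =~ /\s1\s*$/ ? 1 : 0;
}

# Invented sample lines in the assumed format.
my @sample = (
    qq{127.0.0.1 - - [30/Oct/1999:13:52:39 +0200] "GET /perl/test.pl HTTP/1.0" 200 12 1\n},
    qq{127.0.0.1 - - [30/Oct/1999:13:53:02 +0200] "GET /perl/test.pl HTTP/1.0" 200 12 -\n},
);

my $canceled = grep { is_canceled($_) } @sample;
print "canceled requests: $canceled of ", scalar(@sample), "\n";
```

To process a real log, the lines would be read from the access_log file instead of the @sample array.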
Now the question is what happens to the locked resources, if there are any. Will they be freed or not? If they are not, any script using these resources and the same advisory locking scheme will be unable to run and will hang, waiting for the resource to become free, something that will never happen.
Under mod_cgi this was a problem only if you happened to use external lock
files for lock indication, instead of using flock(). (There
are systems where flock(2) is unavailable, and you can use Perl's
emulation of this function.) If the script was aborted between the lock
and unlock code, and you didn't bother to write cleanup code to remove
the locks, or code to break locks that are too old and
suspected to be dead, you are in big trouble.
With mod_cgi you can create an END block, and put the cleanup code there:
END{
# some code that ensures that locks are removed
}
When the script is aborted, the END blocks will be run. But if you use flock() things are much simpler, since all opened files will be closed when the process exits and all the
locks will be released, because when a file is
closed, its lock is removed as well.
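The coupling between closing a file and releasing its flock() lock can be checked with a small standalone script. The sketch below makes several assumptions: it uses lexical filehandles and three-argument open (Perl 5.6 or later), a scratch file created with File::Temp, a made-up helper named can_lock, and a platform such as Linux or BSD where flock() locks belong to the open file, so that two handles within one process can conflict.

```perl
use strict;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

my ($tmp_fh, $file) = tempfile(UNLINK => 1);   # scratch lock file
close $tmp_fh;

# Try to take an exclusive lock through a fresh handle, without blocking.
sub can_lock {
    my ($path) = @_;
    open my $probe, '>>', $path or die "Cannot open $path: $!";
    my $got_it = flock($probe, LOCK_EX | LOCK_NB) ? 1 : 0;
    close $probe;                 # releases the probe's own lock, if any
    return $got_it;
}

my ($while_held, $after_close);
{
    open my $fh, '>>', $file or die "Cannot open $file: $!";
    flock $fh, LOCK_EX or die "Cannot lock $file: $!";
    $while_held = can_lock($file);    # the lock is held through $fh
    close $fh;                        # closing the file drops the lock
}
$after_close = can_lock($file);
print "lockable while held: $while_held, after close: $after_close\n";
```

As long as the first handle stays open and locked, the probe fails; as soon as the handle is closed, the probe succeeds.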
Things are more complex with mod_perl. Unless you explicitly
close() the files, they won't be closed automatically,
since the processes don't exit when the processing of a single request is completed.
Let's see what problems we might encounter and possible solutions for them.
I want to step aside for a moment and discuss the ``critical section'' issue before we continue.
Let's start with resource locking scheme. A schematic representation of a proper locking technique is as follows:
1. lock a resource
<critical section starts>
2. do something with the resource
<critical section ends>
3. unlock the resource
If the locking is exclusive, only one process can hold the resource at any given time, which means that all the other processes will have to wait, and this code snippet becomes a so called bottleneck. That's why the section of the code where the resource is locked is called critical and you must make it as short as possible.
In a shared locking scheme, where many processes can concurrently access the resource, it's important to keep the critical section as short as possible as well, if there are processes that sometimes want to get an exclusive lock. This code uses a shared lock, but has a non-optimized critical section:
use Fcntl qw(:flock);
use Symbol;
my $fh = gensym;
open $fh, "filename" or die "$!";
flock $fh, LOCK_SH;
# start critical section
seek $fh, 0, 0;
my @lines = <$fh>;
for(@lines){
print if /foo/;
}
# end critical section
close $fh; # close unlocks the file
It opens the file for reading, locks it and rewinds to the start, reads all the lines in and prints out the lines that include the string foo. Once the file has been read, we don't need it opened and locked anymore, and since the loop might take some time to complete, we move it after the point where the resource is freed:
use Fcntl qw(:flock);
use Symbol;
my $fh = gensym;
open $fh, "filename" or die "$!";
flock $fh, LOCK_SH;
# start critical section
seek $fh, 0, 0;
my @lines = <$fh>;
# end critical section
close $fh; # close unlocks the file
for(@lines){
print if /foo/;
}
This is another very similar script, but now using an exclusive lock. It reads in a file and writes it back, prepending a number of new text lines to the head of the file.
use Fcntl qw(:flock);
use Symbol;
my $fh = gensym;
open $fh, "+>>filename" or die "$!";
flock $fh, LOCK_EX;
# start critical section
seek $fh, 0, 0;
my @add_lines =
(
qq{Complete documentation for Perl, including FAQ lists,\n},
qq{should be found on this system using `man perl' or\n},
qq{`perldoc perl'. If you have access to the Internet, point\n},
qq{your browser at http://www.perl.com/, the Perl Home Page.\n},
);
my @lines = (@add_lines, <$fh>);
seek $fh, 0, 0;
truncate $fh, 0;
print $fh @lines;
# end critical section
close $fh; # close unlocks the file
First let's explain how the code works. I will discuss in a minute why I use the Symbol module to generate the file handle variables.
Since we want to read the file, modify it and write it back without anyone changing it on the way, we open it for reading and writing with the help of +>> (you could get away with +< as well; see perldoc -f open or the perlfunc manpage for more information about the open() function) and lock it with an exclusive lock. You cannot safely accomplish this task by opening the file first for reading and then reopening it for writing, since another process might change the file between the two stages.
Next the code prepares the lines of text it wants to prepend to the head of the file, and assigns them and the content of the file to the @lines array. Now that the data is ready to be written back, the file is rewound to the start with seek() and truncate()d to zero size. The truncation is redundant only when a file opened with +< never shrinks, as in our example where the file only grows; in append mode (+>>) writes generally go to the end of the file wherever you seek(), so there the truncate() is what lets us rewrite the file from the start. It is a must whenever there is a chance that the file will shrink. It's better to always use truncate(), as you never know what changes your code might undergo in the future, and this operation is not one that you will blame for a performance overhead.
Finally we write the data to the file and close it, which unlocks it as well. Did you notice that we created the text lines to be prepended as close to the place of usage as possible? That follows the locality-of-code style, which is good, but it makes the critical section longer. In places like this you should sacrifice any style rules you are used to, in order to make the critical section as short as possible. A corrected version of this script with a shorter critical section looks like:
use Fcntl qw(:flock);
use Symbol;
my @lines =
(
qq{Complete documentation for Perl, including FAQ lists,\n},
qq{should be found on this system using `man perl' or\n},
qq{`perldoc perl'. If you have access to the Internet, point\n},
qq{your browser at http://www.perl.com/, the Perl Home Page.\n},
);
my $fh = gensym;
open $fh, "+>>filename" or die "$!";
flock $fh, LOCK_EX;
# start critical section
seek $fh, 0, 0;
push @lines, <$fh>;
seek $fh, 0, 0;
truncate $fh, 0;
print $fh @lines;
# end critical section
close $fh; # close unlocks the file
The difference is that the text lines are prepared before the file is locked, and the contents of the file are then appended to the @lines array with push(), instead of building a new array inside the critical section and copying into it the lines that were already available before the lock was taken, as in the original example.
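The corrected script can also be packaged as a subroutine. What follows is only a sketch: prepend_lines() is a hypothetical name, and it assumes that the file already exists, so it opens it with +< rather than +>>.

```perl
use strict;
use Fcntl qw(:flock);
use Symbol;

# prepend_lines() is a hypothetical helper, not part of any module
sub prepend_lines {
    my ($filename, @lines) = @_;    # the new lines are ready before we lock

    my $fh = gensym;
    open $fh, "+<$filename" or die "Cannot open $filename: $!";
    flock $fh, LOCK_EX      or die "Cannot lock $filename: $!";

    # start critical section
    seek $fh, 0, 0;
    push @lines, <$fh>;             # append the current contents of the file
    seek $fh, 0, 0;
    truncate $fh, 0;
    print $fh @lines;
    # end critical section

    close $fh or die "Cannot close $filename: $!";   # close unlocks the file
    return scalar @lines;           # number of lines now in the file
}
```

Everything that can be computed in advance is passed in as arguments, so the only work done while the lock is held is reading, rewinding and writing.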
Let's get back to the main issue of this section, which is safe locking. If you don't make a habit of closing all the files that you open, you will run into a lot of trouble, unless you use the Apache::PerlRun handler, which does the cleanup for you. If you open a file but don't close it, you leak a file descriptor. Since the number of file descriptors available is finite, at some point you will run out of them and your service will cease to operate.
This is bad, but you can live with it until you run out of file descriptors, which of course will happen much faster on a heavily used server. But it is nothing compared to the trouble you get yourself into if you lock files and forget to unlock or close them. Since close() always unlocks the file, you don't have to unlock files explicitly. A locked file that was never closed will stay locked after your code has finished, and all the other scripts requesting a lock on the same resource (file) will wait indefinitely for it to become unlocked. Since that will not happen until the server is restarted, all these processes will hang. This is the offending code:
open IN, "+>>filename" or die "$!";
flock IN, LOCK_EX;
# do something
# quit without closing and unlocking the file
OK, so let's add the close():
open IN, "+>>filename" or die "$!";
flock IN, LOCK_EX;
# start critical section
# do something
# end critical section
# close and unlock the file
close IN;
Is this code safe now? Unfortunately it is not. If the user aborts the request by pressing the Stop or Reload button in the middle of the critical section, there is a chance that the script will be aborted before it has had a chance to close() the file, which brings us back to the situation where we forgot to close the file in the first place.
What is the remedy for this poison? There are a few approaches to solving this problem. If you are running under Apache::Registry and friends, an END block will perform the cleanup work for you, the same way you might use it in scripts running under mod_cgi or in plain perl scripts. Just add the cleanup code to this block and you are safe. If you are writing your own handlers, register_cleanup() allows you to register code that plays the role of an END block, since in handlers END blocks are executed only when a process exits, and not after a request completes.
We will see a few examples later. Now I want to show a much easier safe locking solution. The problem we have encountered actually lies in the fact that file handles like IN are global variables. If we could make them lexically scoped, all our worries would go away. You know that lexically scoped variables (declared with my()) are automatically destroyed when they go out of scope, so when the program quits all the lexical variables will be destroyed, since they leave the file scope. When the variable holding an open file handle is destroyed, the file is automatically closed.
So if you use this technique to work with files, you don't even have to close them! You still want to close them as soon as possible, though, if you recall the critical section discussion. In addition to this safe file handling, lexically scoped file handles protect you from name collisions. With global handles, whenever you open more than one file you have to make sure that you haven't used the same name somewhere else in the code for a file that might still be open. To emphasize the risk of collisions, think of a subroutine that opens a file for you:
sub open_file{
my $filename = shift;
open FILE, ">$filename" or die "$!";
return \*FILE;
}
my $fh1 = open_file("/tmp/x");
my $fh2 = open_file("/tmp/y");
print $fh1 "X";
print $fh2 "Y";
Obviously this code doesn't do what you think it should do. Instead of writing the character X to /tmp/x and Y to /tmp/y, what you see is that /tmp/x is empty and /tmp/y contains the string XY. Why is that? Because you have used the same global variable, and when you called open_file() for the second time, it opened a different file using the same variable. Since open_file() returns a reference to a file handle, and it is the same global variable every time, both $fh1 and $fh2 point to it.
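For comparison, here is a sketch of the same subroutine with a separate handle for every call. It relies on Symbol::gensym(), which was used in the locking examples earlier; write_pair() is a hypothetical driver that exists only for this illustration.

```perl
use strict;
use Symbol;

# the same open_file(), but each call gets its own anonymous glob
sub open_file {
    my $filename = shift;
    my $fh = gensym;
    open $fh, ">$filename" or die "Cannot open $filename: $!";
    return $fh;
}

# write_pair() is a hypothetical driver used only in this illustration
sub write_pair {
    my ($file_x, $file_y) = @_;
    my $fh1 = open_file($file_x);
    my $fh2 = open_file($file_y);
    print $fh1 "X";
    print $fh2 "Y";
    close $fh1 or die "Cannot close $file_x: $!";
    close $fh2 or die "Cannot close $file_y: $!";
}
```

Because the two handles no longer share a global variable, the second open_file() call cannot close the file opened by the first one.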
However, as you just saw, we can generate unique file handles that can be lexically scoped with the Symbol module. Symbol::gensym() creates an anonymous glob and returns a reference to it. Such a glob reference can be used as a file or directory handle.
use Symbol;
my $fh = gensym;
open $fh, "+>>filename" or die "$!";
flock $fh, LOCK_EX;
# do something
Now the file will always be unlocked at the end of the request's processing. Instead of close() you might use a block:
use Symbol;
{
my $fh = gensym;
open $fh, "+>>filename" or die "$!";
flock $fh, LOCK_EX;
# do something
}
# the file will be automatically closed and unlocked at this point
But this is not so obvious to the reader of the code so you might want to avoid the last technique.
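If you want to convince yourself that leaving the scope really releases the lock, a small experiment along the following lines might help. It is a sketch under assumptions: try_lock() and demo() are hypothetical names, and the behaviour inside the block depends on the platform's flock() implementation (where flock() is emulated with fcntl() locks, a process does not conflict with itself).

```perl
use strict;
use Fcntl qw(:flock);
use Symbol;

# try_lock() is a hypothetical helper: it returns 1 if an exclusive lock
# could be taken right now without waiting (LOCK_NB), and 0 otherwise
sub try_lock {
    my $filename = shift;
    my $fh = gensym;
    open $fh, "+>>$filename" or die "Cannot open $filename: $!";
    my $got_it = flock($fh, LOCK_EX | LOCK_NB) ? 1 : 0;
    close $fh;
    return $got_it;
}

# demo() is hypothetical too: it probes the lock inside and after a block
sub demo {
    my $filename = shift;
    my ($during, $after);
    {
        my $fh = gensym;
        open $fh, "+>>$filename" or die "Cannot open $filename: $!";
        flock $fh, LOCK_EX       or die "Cannot lock $filename: $!";
        $during = try_lock($filename);   # probe while $fh holds the lock
    }
    # $fh went out of scope here: the file was closed and unlocked
    $after = try_lock($filename);
    return ($during, $after);
}
```

On a system with native flock() semantics the first probe should fail and the second should succeed, which is exactly the effect the block technique relies on.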
You can use the IO::* modules as well, such as IO::File or IO::Dir, but these are much bigger than the Symbol module, and are worth using for opening files or directories only if you are already using them for the other features they provide. As a matter of fact, these modules use the Symbol module themselves. Examples of their usage:
use IO::File;
my $fh = new IO::File "> filename";
# the rest is as before
and:
use IO::Dir;
my $dh = new IO::Dir "dirname";
Finally, let's see when we do need special cleanup code. As you just saw, we have solved the problem of file handles by lexically scoping them. There are situations, however, where you must write cleanup code. A good example is a tied dbm file.
A reminder: a dbm file is a simple database, which allows you to store pairs of keys and values. As of this writing Berkeley DB is the most advanced dbm implementation, and allows you to store key/value pairs using the HASH, BTREE and RECNO algorithms (refer to the DB_File man page for more info). The DB_File module provides a Perl interface to the 1.x versions of Berkeley DB. (The BerkeleyDB module should handle the more recent Berkeley DB versions 2 and 3.)
Working with dbm files is very simple, because with the help of the TIE interface they are represented in Perl as simple hash variables, and they behave exactly like hashes. In order to access a dbm file you have to tie it first:
use Fcntl qw(O_RDWR O_CREAT);
use DB_File;
my $filename = "/tmp/mydb";
my %hash;
tie %hash, 'DB_File', $filename, O_RDWR|O_CREAT, 0660, $DB_HASH
or die "Can't tie %hash : $!";
The first argument to tie() is the hash variable we want the dbm file to be tied to. The following arguments are the name of the module that provides an interface to the dbm implementation we want to use (DB_File in our case), the name of the file the dbm resides in, the Fcntl flags, the file permissions and finally the access method ($DB_HASH, $DB_BTREE or $DB_RECNO) to be used.
From now on we use %hash to read from and write to the dbm file, like:
my $name = $hash{foo};
$hash{foo} = "Larry Wall";
The only nuance is that when we modify the hash by assigning values to it, the changes are not written to the file immediately, but are cached to improve performance. The cache buffers are flushed when they become full, when the sync() method is called on the database handle, or when the hash is untied (closed). So if the program quits abnormally, the dbm file might get corrupted.
To untie the dbm file, you simply call:
untie %hash;
To get access to the sync() method, you should save the database handle returned by tie():
my $dbh = tie %hash, 'DB_File', $filename, O_RDWR|O_CREAT, 0660, $DB_HASH
or die "Can't tie %hash : $!";
Now you can flush the cache with:
$hash{foo} = "Larry Wall";
$dbh->sync;
Important: if you have saved a copy of the object returned from tie(), the underlying database file will not be closed until both the tied variable is untied and all copies of the saved object are destroyed. We do it as follows:
undef $dbh;
untie %hash;
Of course, you have to lock the dbm file exactly like any other resource if some script modifies its contents. Refer to Locking dbm handlers for more info.
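The whole life cycle described above (tie, modify, sync, destroy the saved object, untie) can be put together as follows. This is a sketch that assumes DB_File is installed; store_and_reload() is a hypothetical helper, and locking is left out to keep it short.

```perl
use strict;
use Fcntl qw(O_RDWR O_CREAT);
use DB_File;

# store_and_reload() is a hypothetical helper used only for illustration
sub store_and_reload {
    my ($filename, $key, $value) = @_;

    my %hash;
    my $dbh = tie %hash, 'DB_File', $filename, O_RDWR|O_CREAT, 0660, $DB_HASH
        or die "Can't tie %hash : $!";
    $hash{$key} = $value;
    $dbh->sync;      # flush the cache buffers to the file

    undef $dbh;      # destroy the saved object first...
    untie %hash;     # ...then untie, which closes the file

    # tie the file again to check that the value really reached it
    my %copy;
    tie %copy, 'DB_File', $filename, O_RDWR|O_CREAT, 0660, $DB_HASH
        or die "Can't tie %copy : $!";
    my $stored = $copy{$key};
    untie %copy;
    return $stored;
}
```

The second tie() is there only for the demonstration; in real code you would normally keep working with the first tied hash.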
OK, enough introduction, let's get to the point. As long as both %hash and $dbh are lexically scoped variables, they will always be destroyed, no matter whether you forgot to untie() or the request was aborted before the untie() call was reached.
Suppose that you want to take advantage of mod_perl's persistent global variables in each process and use this feature to create persistent dbm hashes. You tie them only once per process, and save the time needed to tie() and untie() on every request. The idea is good, provided that you remember to flush the cache buffers with the sync() method whenever you modify the hash that represents the dbm file.
Let's code the idea:
use strict;
use vars qw($dbh %hash);
use Fcntl qw(:flock O_RDWR O_CREAT);
use DB_File;
use Symbol;
We declare $dbh and %hash as global variables, then pull in the Fcntl module and import the symbols we are going to use. Of the symbols provided by the :flock tag we actually need only LOCK_EX. We also pull in the DB_File and Symbol modules.
my $r = shift;
$r->send_http_header('text/plain');
$r->print("PID $$\n");
Send the Content-type header of plain text type and tell the user the PID of the process that serves the request.
my $filename = "/tmp/mydb";
my $lockfile = "$filename.lock";
Configure the location of the dbm file and its lock file.
my $fh = gensym;
open $fh, ">$lockfile" or die "Cannot open $lockfile: $!";
flock $fh, LOCK_EX;
Generate a unique anonymous glob, store it in the lexically scoped variable $fh, open the lock file and lock it. This works as an advisory lock on the dbm file, which can now be tied safely: to reach the code that follows, every other copy of this script has to acquire the lock on the lock file first, and since the lock is exclusive, only one copy of the script at a time will be able to tie the dbm file.
$dbh ||= tie %hash, 'DB_File', $filename, O_RDWR|O_CREAT, 0660, $DB_HASH
or die "Can't tie %hash : $!";
This code snippet demands a deeper explanation.
$a ||= $b;
is the same as:
$a = $a || $b;
The || check is a boolean one (it tests for truth), and it makes no distinction between undefined values and other false values, since undef is false in Perl. So what it does is: leave $a unmodified if it holds a true value, otherwise assign the value of $b to $a, whatever that value is. Note that 0 and "" (the empty string) are both defined but false values, so they are replaced as well. (Refer to the perlop(1) manpage for more info about the || operator.)
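A few lines of plain Perl are enough to check this behaviour. The names below (expensive(), $cached and so on) are made up for the illustration.

```perl
use strict;

# expensive() stands for any costly initialization, such as a tie() call;
# $calls counts how many times it is actually executed
my $calls = 0;
sub expensive { $calls++; return "handle" }

my $cached;
$cached ||= expensive() for 1 .. 3;   # the right side runs only the first time

# defined but false values do not survive ||=
my $zero = 0;
$zero ||= 5;                          # $zero is now 5

my $empty = "";
$empty ||= "default";                 # $empty is now "default"
```

After the first pass $cached holds a true value, so the two remaining passes never call expensive() again. This is the property the tie() snippet depends on.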
Back to our tie() snippet. When this code is executed for the first time in a given mod_perl process, the $dbh variable is undefined, therefore the right-hand side of the statement is executed, which tie()s the dbm file. On every subsequent execution of the code in the same process, $dbh already contains a database handle, which is a true value, so the tie() call is skipped.
$hash{int rand 10} = (qw(a b c d))[int rand 4];
Fill the dbm file with random keys and values. Each invocation of the code either generates a new key/value pair or overrides an old one, if rand() happens to choose an existing key.
$dbh->sync();
The most important part of the code is to flush the modifications.
# unlock the db
close $fh;
Now it's safe to unlock the dbm file. Please refer to Locking dbm handlers to learn why you should not use the dbm file's own file descriptor to lock it. To make a long explanation short: it may get your dbm file corrupted.
# print out the contents of the dbm file
print map {"$_ => $hash{$_}\n"} sort keys %hash;
After we leave the critical section, we can take our time and print out the current contents of the dbm file.
Here is the same code in one piece:
use strict;
use vars qw($dbh %hash);
use Fcntl qw(:flock O_RDWR O_CREAT);
use DB_File;
use Symbol;
my $r = shift;
$r->send_http_header('text/plain');
$r->print("PID $$\n");
my $filename = "/tmp/mydb";
my $lockfile = "$filename.lock";
my $fh = gensym;
open $fh, ">$lockfile" or die "Cannot open $lockfile: $!";
# must lock the db file before opening it
flock $fh, LOCK_EX;
$dbh ||= tie %hash, 'DB_File', $filename, O_RDWR|O_CREAT, 0660, $DB_HASH
or die "Can't tie %hash : $!";
# fill the dbmfile with random keys values
$hash{int rand 10} = (qw(a b c d))[int rand 4];
# sync the DB
$dbh->sync();
# unlock the db
close $fh;
# print out the contents of the dbm file
print map {"$_ => $hash{$_}\n"} sort keys %hash;
Well, if you run this code, you will soon figure out that it doesn't do what we thought it would. What happens is that each process keeps its own copy of %hash and modifies it. When a process calls the sync() method, the dbm file is updated and becomes equal to the contents of the %hash of that process. If the next request is handled by a process that has not yet tie()d %hash, its %hash is initialized from whatever the last process to call sync() wrote to the dbm file. But if the request is handled by a process that has already tied %hash, it does not read the contents from the dbm file and uses its private copy of %hash instead.
In reality things are even more complicated. The above scenario is true only while the data is smaller than the buffer of the dbm file; when it grows bigger than the buffer, its contents are flushed. So when you call keys %hash, all the keys have to be brought from the dbm file, which causes the process to read the values saved by previous sync() calls and by the automatic flushes caused by buffer overflow. This creates one big mess with the data and makes the whole idea impractical.
But since we have come this far, let's see what else is flawed in this code. It's the sync() call. If the script is stopped before sync() is called, the dbm file will be unlocked, since $fh is lexically scoped, but it won't be properly sync()ed, which at some point will corrupt the dbm file.
The solution is quite simple -- write an END block to sync the file:
END{
# make sure that the DB is flushed
$dbh->sync();
}
The above will work only for Apache::Registry scripts; otherwise the END block will be postponed until the process terminates. If you write a handler using the Perl API, use the register_cleanup() method instead. It accepts a reference to a subroutine as an argument:
$r->register_cleanup(sub { $dbh->sync() });
Even more correct code would check whether the connection was aborted. If you don't check, the cleanup code will always be executed, which may be unwanted for scripts that finish normally.
$r->register_cleanup
(sub {
$dbh->sync() if Apache->request->connection->aborted();
});
So in the case of END block usage you would use:
END{
# make sure that the DB is flushed
$dbh->sync() if Apache->request->connection->aborted();
}
Note that if you use register_cleanup(), it should be called at the beginning of the script, or as soon as the variables you want to use in the cleanup code become available. If you call it at the end of the script, and the script is aborted before this code is reached, no cleanup will be performed.
For example CGI.pm registers the cleanup subroutine in its new() method:
sub new {
# code snipped
if ($MOD_PERL) {
Apache->request->register_cleanup(\&CGI::_reset_globals);
undef $NPH;
}
# more code snipped
}
There is another way to register cleanup code for Perl API handlers. You may use a PerlCleanupHandler in the configuration file, like:
<Location /foo>
SetHandler perl-script
PerlHandler Apache::MyModule
PerlCleanupHandler Apache::MyModule::cleanup()
Options ExecCGI
</Location>
where Apache::MyModule::cleanup() is supposed to perform a cleanup.
A situation similar to the pressed Stop button disease happens when the client (browser) times out the connection (is it about 2 minutes?). There are cases when your script is about to perform a very long operation and there is a chance that it will take longer than the client's timeout. One case I can think of is database interaction, where the DB engine hangs or needs a lot of time to return results. If this is the case, use $SIG{ALRM} to give up before the client times out:
$timeout = 10; # seconds
eval {
local $SIG{ALRM} =
sub { die "Sorry timed out. Please try again\n" };
alarm $timeout;
... db stuff ...
alarm 0;
};
die $@ if $@;
But, as was discovered lately, local $SIG{'ALRM'} does not restore the original underlying C handler. This was fixed in mod_perl 1.19_01 (CVS version). As a matter of fact, none of the local $SIG{FOO} assignments restore the original C handler. Read Debugging Signal Handlers ($SIG{FOO}) for a debugging technique and a possible workaround.
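The eval/alarm pattern shown above can be wrapped in a small helper so that it is written only once. This is a plain Perl sketch: run_with_timeout() is a hypothetical name, and it does nothing about the C handler issue just described.

```perl
use strict;

# run_with_timeout() is a hypothetical helper, not part of mod_perl.
# It returns 1 if $code finished in time and 0 if it timed out;
# any other exception raised by $code is re-thrown.
sub run_with_timeout {
    my ($timeout, $code) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timed out\n" };
        alarm $timeout;
        $code->();
        alarm 0;
        1;
    };
    return 1 if $ok;
    alarm 0;                              # cancel an alarm that may be pending
    die $@ unless $@ eq "timed out\n";    # not our timeout: pass it on
    return 0;
}
```

A call such as run_with_timeout(10, sub { ... db stuff ... }) then tells you whether the operation completed, and the caller decides what to report to the user.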
Apache::Status is a very useful feature. It lets you watch what happens to the Perl parts of the server. Here are the instructions for configuring and using it:
Add this to httpd.conf:
<Location /perl-status>
SetHandler perl-script
PerlHandler Apache::Status
order deny,allow
#deny from all
#allow from
</Location>
If you are going to use Apache::Status, it's important to put it as the first module in the start-up file, or in httpd.conf:
# startup.pl
use Apache::Registry ();
use Apache::Status ();
use Apache::DBI ();
If you don't put Apache::Status before Apache::DBI, you won't get Apache::DBI's menu entry in the status page.
For more about Apache::DBI see Persistent DB Connections.
Assuming that your mod_perl server listens on port 81, fetch http://www.myserver.com:81/perl-status
Embedded Perl version 5.00502 for Apache/1.3.2 (Unix) mod_perl/1.16 process 187138, running since Thu Nov 19 09:50:33 1998
Below, all the sections should be links:
Signal Handlers
Enabled mod_perl Hooks
PerlRequire'd Files
Environment
Perl Section Configuration
Loaded Modules
Perl Configuration
ISA Tree
Inheritance Tree
Compiled Registry Scripts
Symbol Table Dump
Let's follow, for example, PerlRequire'd Files. We see:
PerlRequire                    Location
/home/perl/apache-startup.pl   /home/perl/apache-startup.pl
From some menus you can continue deeper to peek into the internals of the server, to see the values of the global variables in the packages, to the cached scripts and modules, and much more. Just click around...
Sometimes when you fetch /perl-status and follow the Compiled Registry Scripts link you see no listing of scripts at all. This is absolutely correct: Apache::Status shows the registry scripts compiled in the httpd child which is serving your request for /perl-status. If that child has not yet compiled the script you are asking about, /perl-status will just show you the main menu.
See Sometimes it Works Sometimes it does Not
When the code doesn't do what it is expected to, either never or just sometimes, we say that it requires debugging. There are a few levels of debugging complexity.
The basic level is when perl terminates the program at the compilation stage, before it has started to run. Usually this happens when there are syntax errors or some module is missing. Sometimes it takes an effort to solve such problems, since code that uses Apache core modules generally won't compile when executed from the shell. We will learn how to solve syntax problems in mod_perl code quite easily.
Once the program compiles and begins to run, there might be logical (algorithmic) problems, when the program doesn't do what you meant it to do. These are somewhat harder to solve, especially when there is a lot of code to be inspected and reviewed, but it's just a matter of time. Perl helps a lot in locating typos when you enable warnings. For example, it warns you about places where you wanted to compare two numbers but omitted the second '=' character, so you end up with something like if $yes = 1 instead of if $yes == 1.
The next level is when the program does what it is expected to most of the time, but occasionally misbehaves by doing something different. Inspecting the code generally doesn't help, and either print() statements or the perl debugger come to the rescue. Many times it's quite easy to debug with print(), but sometimes typing the debug messages becomes very tedious, especially when you haven't yet spotted the lines where the bug hides. That's where the perl debugger helps.
While print() statements always work, running the perl debugger for CGI scripts might be quite a challenge. But with the right knowledge and tools in hand the debugging process becomes much easier. Unfortunately there is no way to ease the debugging of the program itself, as it depends on the code you wrote, and it can be quite a nightmare to debug really complex code.
The worst thing you can think of is when the process terminates in the middle of processing a request and dumps core. The operating system dumps core (read: creates a file called core in the directory the process was running in) when the program tries to access a memory area that doesn't belong to it, which generally happens when there is a bug. This is something that you would almost never see with plain perl scripts, but it can easily happen if you use modules whose guts are written in C or C++ and something goes wrong with them. Occasionally there is a bug in the underlying C code of mod_perl itself that was in a deep slumber before your code woke it up.
In the following sections we will go through each of these problems in detail, discuss them thoroughly and present a few techniques to solve them.
While developing code we often make syntax mistakes, like forgetting to put a semicolon at the end of a statement (it is not required at the end of a block, but it is better to use it there too, since there is a chance that you will add more code at the end, and when you do, you might forget to add the missing semicolon), or a comma in a list (for the same reason, more items might be added to the list, and unlike some other languages perl has no problem when you finish the list with a comma), and so on.
One approach to locating syntactically incorrect code is to execute the script from the shell with the -c flag, which only validates the syntax and doesn't run the code. (Actually, it will execute BEGIN and END blocks and use() calls, because these are considered as occurring outside the execution of your program.) It's also a good idea to add the -w switch to enable warnings:
perl -cw test.pl
If there are errors in the code, perl will report them together with the corresponding line numbers in the script.
The next step is to execute the script, since besides syntax errors there are run time errors. These are the errors that cause the "Internal Server Error" when the script is executed from the browser. With plain CGI scripts it's the same as running plain perl scripts: just execute them and see that they work.
However, the whole thing is quite different with scripts that use Apache::* modules, which can be used only from within the mod_perl server. They rely on code and circumstances that aren't available when you attempt to execute the script from the shell, since there is no Apache request object available to the code.
If you have problems with the code, you can either watch the errors and warnings as they are logged to the error_log file when you make a request to the script from the browser, or use the Apache::FakeRequest module written by Doug MacEachern and Andrew Ford.
Apache::FakeRequest is used to set up an empty Apache request object that can be used for
debugging. The Apache::FakeRequest
methods just set internal variables of the same name as the method and
return the value of the internal variables. Initial values for methods can
be specified when the object is created. The print method prints to STDOUT.
Subroutines for Apache constants are also defined so that using
Apache::Constants while debugging works, although the values of the constants are hard-coded
rather than extracted from the Apache source code.
Let's write a very simple module, which prints "OK" to the client's browser:
package Apache::Example;
use Apache::Constants;
sub handler{
my $r = shift;
$r->send_http_header('text/plain');
print "You are OK ", $r->get_remote_host, "\n";
return OK;
}
1;
You cannot debug this module unless you configure the server to call its
handler from some location. But with help of
Apache::FakeRequest you can write a little script that will emulate a request and return the
expected output.
#!/usr/bin/perl
use Apache::FakeRequest ();
use Apache::Example ();
my $r = Apache::FakeRequest->new('get_remote_host'=>'www.foo.com');
Apache::Example::handler($r);
When you execute the script from the command line, you will see the following output:
You are OK www.foo.com
Apache::Registry, Apache::PerlRun and other modules that compile code via eval confuse the line numbering. Modules that are read normally by Perl from disk have no problem with file names and line numbers.
If you compile with the experimental PERL_MARK_WHERE=1, it shows you almost the exact line number where the problem occurs. Generally the compiler ends up with a shift in its line counter. You can always stuff your code with special compiler directives, which reset the counter to the value you specify. At the beginning of a line you should write (with the '#' in column 1):
#line 298 myscript.pl
or
#line 890 some_label_to_be_used_in_the_error_message
The label is optional; the filename of the script will be used by default. The directive specifies the line number of the following line, not of the line the directive is on. You can use a little script to stuff every N lines of your code with these directives, but then you will have to rerun this script every time you add or remove code lines. The script:
#!/usr/bin/perl
# Puts Perl line markers in a Perl program for debugging purposes.
# Also takes out old line markers.
die "No filename to process.\n" unless @ARGV;
my $filename = $ARGV[0];
my $lines = 100;
open IN, $filename or die "Cannot open file: $filename: $!\n";
open OUT, ">$filename.marked"
or die "Cannot open file: $filename.marked: $!\n";
my $counter = 1;
while (<IN>) {
# drop old markers first, so they are neither copied nor counted
next if /^#line /;
print OUT "#line $counter\n" unless $counter++ % $lines;
print OUT $_;
}
close OUT;
close IN;
chmod 0755, "$filename.marked";
Also notice that another solution is to move most of the code into separate modules, which ensures that the line numbers will be reported correctly.
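To see the effect of the #line directive without touching a real script, you can compile a snippet from a string. The helper below is a sketch; error_location() is a hypothetical name.

```perl
use strict;

# error_location() is a hypothetical helper: it puts a #line directive
# in front of a failing statement, compiles and runs the snippet with
# eval, and returns the error message that perl produced
sub error_location {
    my ($line, $label) = @_;
    my $code = "#line $line $label\ndie 'oops';\n";
    eval $code;
    return $@;
}

print error_location(298, "myscript.pl");
```

The message names myscript.pl and line 298, although the die() call is the second line of an eval'ed string. This is the same mechanism the marker script uses.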
To have a complete trace of calls add:
use Carp ();
local $SIG{__WARN__} = \&Carp::cluck;
The universal debugging tool across nearly all platforms and programming
languages is the printf() or equivalent output function, which
can send data to the console, a file, application window and so on. In perl
we generally use the print() function. With an idea of where
and when the bug is triggered, a developer can insert print()
statements in the source code to examine the value of data at certain
points of execution.
However, it is rather difficult to anticipate all the possible directions a program might take and what data to suspect of causing trouble. In addition, inline debugging code tends to add bloat and degrade the performance of an application. So you have to comment out or remove the debug printing when you think that you have solved the problem. But if you later discover that you need to debug the same code again, then in the best case you have to uncomment the debug lines, and in the worst case you have to write them from scratch.
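One common way to soften this trade-off, offered here only as a sketch, is to leave the debug statements in the code and guard them with a compile-time constant. DEBUG, debug() and add() are names made up for this example.

```perl
use strict;
use constant DEBUG => 0;    # set to 1 while hunting a bug

# debug() is a hypothetical helper; it writes to STDERR, which under
# the server ends up in error_log rather than in the response
sub debug { print STDERR "[debug] ", @_, "\n" }

sub add {
    my ($x, $y) = @_;
    debug("add called with $x and $y") if DEBUG;   # skipped when DEBUG is 0
    return $x + $y;
}
```

Since DEBUG is a constant, perl can decide at compile time that the guarded statement will never run, so the leftover debug lines should cost next to nothing while the flag is off.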
Let's see a few examples where we use print() to debug some
problem. In one of my applications I wrote a function that returns the date
that was a week ago. Here it is:
print "Content-type: text/plain\n\n";
print "A week ago date was ",date_a_week_ago(),"\n";
# return a date one week ago as a string in format: MM/DD/YYYY
####################
sub date_a_week_ago{
my @month_len = (31,28,31,30,31,30,31,31,30,31,30,31);
my ($day,$month,$year) = (localtime)[3..5];
for (my $j = 0; $j < 7; $j++) {
$day--;
if ($day == 0) {
$month--;
if ($month == 0) {
$year--;
$month = 12;
}
# there are 29 days in February in a leap year
# (localtime() counts years from 1900, so add it back for the test)
$month_len[1] =
((($year + 1900) % 4 or ($year + 1900) % 100 == 0) and ($year + 1900) % 400 )
? 28 : 29;
# set $day to be the last day of the previous month
$day = $month_len[$month - 1];
} # end of if ($day == 0)
} # end of for ($j = 0;$j < 7;$j++)
return sprintf "%02d/%02d/%04d",$month,$day,$year+1900;
}
This code is pretty straightforward. Get today's date and subtract one from the value of the day, updating the month and the year on the way if the boundaries are crossed (end of month, end of year). Do it seven times in a loop, and at the end you should get the date that was a week ago.
Note that localtime() returns the year as the four-digit year minus 1900, which means that we don't have a century boundary to worry about: if we are in the middle of the first week of the year 2000, the value of the year returned by localtime() will be 100, and not 0 as you might mistakenly assume. So when the code does $year-- it becomes 99 and not -1. At the end we add 1900 and get back the correct four-digit year.
Also note that we have to cover the case of a leap year, where there are 29 days in February. For the other months we have prepared an array of month lengths.
Now when we run this code and check the result, we see that something is wrong. For example, if today is 10/23/1999 we expect the above code to print 10/16/1999, but it prints 09/16/1999, which means that we have lost a month. The above code is buggy.
Let's put a few debug print() statements in the code near the $month variable:
sub date_a_week_ago{
my @month_len = (31,28,31,30,31,30,31,31,30,31,30,31);
my ($day,$month,$year) = (localtime)[3..5];
print "[set] month : $month\n";
for (my $j = 0; $j < 7; $j++) {
$day--;
if ($day == 0) {
$month--;
if ($month == 0) {
$year--;
$month = 12;
}
print "[loop $j] month : $month\n";
# there are 29 days in February in a leap year
# $year counts from 1900, so test the four-digit year
$month_len[1] =
((($year + 1900) % 4 or ($year + 1900) % 100 == 0) and ($year + 1900) % 400)
? 28 : 29;
# set $day to be the last day of the previous month
$day = $month_len[$month - 1];
} # end of if ($day == 0)
} # end of for ($j = 0;$j < 7;$j++)
return sprintf "%02d/%02d/%04d",$month,$day,$year+1900;
}
When we run it we see:
[set] month : 9
This is supposed to be the number of the current month (10), but it is not. We have spotted a bug, and since the only code that sets the $month variable is the call to localtime(), did we find a bug in Perl? Let's look at the manpage of the localtime() function:
% perldoc -f localtime
Converts a time as returned by the time function to a 9-element array with the time analyzed for the local time zone. Typically used as follows:
# 0 1 2 3 4 5 6 7 8
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
localtime(time);
All array elements are numeric, and come straight out of a struct tm. In particular this means that $mon has the range 0..11 and $wday has the range 0..6 with Sunday as day 0. Also, $year is the number of years since 1900, that is, $year is 123 in year 2023, and not simply the last two digits of the year. If you assume it is, then you create non-Y2K-compliant programs--and you wouldn't want to do that, would you?
[more info snipped]
This tells us that we have to increment the value of $month if we want to count months from 1 to 12 and not from 0 to 11. Among other interesting facts about localtime() we also see an explanation of $year, which, as I've mentioned before, is set to the number of years since 1900.
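The offsets described in the manpage excerpt can be checked with a small sketch. It uses gmtime() instead of localtime() only so that the result does not depend on the time zone of the machine; both functions return the same nine fields. The helper name human_date() is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn broken-down time fields into MM/DD/YYYY.
# Expects the nine-element list returned by localtime() or gmtime().
sub human_date {
    my ($mday, $mon, $year) = @_[3..5];
    # $mon is 0..11 and $year counts from 1900, so adjust both
    return sprintf "%02d/%02d/%04d", $mon + 1, $mday, $year + 1900;
}

# Epoch second 0 is January 1st 1970 in UTC
my @fields = gmtime(0);
print "raw month: $fields[4], raw year: $fields[5]\n";  # 0 and 70
print human_date(@fields), "\n";                        # 01/01/1970
```

Keeping the +1 and +1900 adjustments in one place, right where the value is formatted, makes it harder to forget one of them.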
Thus we have found the bug in our code and learned new things about
localtime(). To correct the above code we just add a month's
increment after we call localtime():
my ($day,$month,$year) = (localtime)[3..5];
$month++;
META: continue (unfinished)!!!
Now let's see some code including conditional and loop statements.
for my $i (1..31)
if( $day > 20) {
}
Sometimes you need to peek into complex data structures, and trying to print them out can be a non-trivial task. That's where Data::Dumper comes to the rescue. For example, if we create this complex data structure:
$data =
{
array => [qw(a b c d)],
hash => {
foo => "oof",
bar => "rab",
},
};
How do we print it out? Very easily:
use Data::Dumper;
print Dumper \$data;
What we get is a pretty printed $data:
$VAR1 = \{
'hash' => {
'foo' => 'oof',
'bar' => 'rab'
},
'array' => [
'a',
'b',
'c',
'd'
]
};
While writing this example I made a mistake and wrote qw(a b c d) instead of [qw(a b c d)]. When I pretty printed the contents of $data I immediately saw my mistake:
$VAR1 = \{
'b' => 'c',
'd' => 'hash',
'HASH(0x80cd79c)' => undef,
'array' => 'a'
};
That's not what I wanted, of course. I've spotted the bug and corrected it, as you saw in the original example above.
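When you compare dumps between runs it helps to make the output stable. The sketch below is one possible setup, not the only one: it relies on the $Data::Dumper::Sortkeys and $Data::Dumper::Indent configuration variables and on the Dump() class method, which lets you name the dumped variable. Sortkeys needs a reasonably recent Data::Dumper, so treat its availability on your installation as an assumption.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # same key order on every run
$Data::Dumper::Indent   = 1;   # fixed, compact indentation

my $data = {
    array => [qw(a b c d)],
    hash  => {
        foo => "oof",
        bar => "rab",
    },
};

# Pass the reference itself (not a reference to it) and give it a name,
# so the output starts with '$data = {' instead of '$VAR1 = \{'
my $dump = Data::Dumper->Dump([$data], ['data']);
print $dump;
```

With sorted keys two dumps of the same structure can be compared with diff, which is handy when the structure is too big to eyeball.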
META: rewrite: blabla about -- very hard to find bugs and even understand the code below because of its obscurity. The example from the previous section is hard to debug too, because there is too much redundancy in it, you should develop a good coding style by creating a concise code but which is easy to understand (See the example below)...
it's much easier to find bugs
A shrunken version of the main loop, which does not make the code any easier to understand, looks like:
for (1..7) {
next if --$day;
$year--,$month=12 unless --$month;
$day = $month != 2 ? $month_len[$month-1] : $year % 4 ? 28 : 29;
}
Don't do that at home :)
Why did I actually present the latter version? The shrunken version is too obfuscated, which makes it hard to understand and maintain. On the other hand, parts of this code are easier to understand.
Larry Wall, the author of Perl, is a linguist, so he tried to define the syntax in a way that makes coding in Perl much like writing in English. So it is a very good idea to learn Perl coding idioms, which might seem inconvenient in the beginning, but once you get used to them you will wonder how you ever lived without them. I'll show just a few of the most used Perl coding style idioms. It's a good idea to write code that is more readable but avoids redundancy. For example, instead of writing:
if ($i == 0) ...
it's better to write:
unless ($i)
Use a much more concise perlish style:
for my $j (0..6) {
instead of the syntax you are used to from other languages:
for (my $j=0; $j<7; $j++) {
It's much simpler to write and comprehend code like:
print "something" if $debug;
rather than:
if($debug){
print "something";
}
A good style that improves understanding and readability, and reduces the chance of introducing a bug, is shown below in the form of yet another rewrite of the original version of the code:
for (1..7) {
  $day--;
  next if $day;
  $month--;
  unless ($month){
    $year--;
    $month=12;
  }
  if($month == 2){
    $day = $year % 4 ? 28 : 29;
  } else {
    $day = $month_len[$month-1];
  }
}
which is a golden mean between the too verbose style of the first example and the too obfuscated second example.
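To try the loop out without waiting for the end of a month, it can be wrapped in a function that takes the starting date as arguments. This is only a sketch under my own assumptions: the name week_before() is made up, the month is expected to be 1-based and the year to have four digits, and the leap year test is the full Gregorian rule rather than the simplified $year % 4 check.

```perl
use strict;
use warnings;

# Hypothetical helper: the date seven days before the given one.
# Arguments: day (1..31), month (1..12), four-digit year.
sub week_before {
    my ($day, $month, $year) = @_;
    my @month_len = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);

    for (1..7) {
        next if --$day;              # still inside the same month

        unless (--$month) {          # crossed the start of the year
            $year--;
            $month = 12;
        }

        # full leap year rule, applied to the four-digit year
        my $leap = ($year % 4 == 0 && $year % 100 != 0) || $year % 400 == 0;

        # continue from the last day of the previous month
        $day = ($month == 2 && $leap) ? 29 : $month_len[$month - 1];
    }

    return sprintf "%02d/%02d/%04d", $month, $day, $year;
}

print week_before(23, 10, 1999), "\n";   # 10/16/1999
print week_before(3,  1,  2000), "\n";   # 12/27/1999
print week_before(5,  3,  2000), "\n";   # 02/27/2000
```

Because the date is passed in, the month, year and leap year boundaries can each be exercised with a fixed input.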
And of course a two-liner, which is much faster and easier to understand, is:
sub date_a_week_ago{
my ($day,$month,$year) = (localtime(time-604800))[3..5];
return sprintf "%02d/%02d/%04d",$month+1,$day,$year+1900;
}
Just take the current date in seconds since the epoch, as time() returns it, subtract a week in seconds (7*24*60*60 = 604800) and feed the result to localtime(). Voila, we've got the date of a week ago!
Why is the last version important when the first one works just fine? Not because of performance (although the last one is twice as fast), but because there are more chances that you have a bug in the first version than in the last one.
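The same idea can be written so that it is easy to test with a fixed date. The sketch below is my own variation, not a replacement for the two-liner: the optional epoch argument and the WEEK_IN_SECONDS constant are assumptions made for the illustration, and strftime() from the POSIX module does the formatting. Keep in mind that subtracting a fixed number of seconds can be an hour off around a daylight saving time switch, which matters only if you call it close to midnight.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

use constant WEEK_IN_SECONDS => 7 * 24 * 60 * 60;   # 604800

# Takes an optional epoch time so the function can be checked against
# a known date; without an argument it uses the current time.
sub date_a_week_ago {
    my ($now) = @_;
    $now = time unless defined $now;
    # strftime() takes care of the month and year offsets for us
    return strftime "%m/%d/%Y", localtime($now - WEEK_IN_SECONDS);
}

print "A week ago date was ", date_a_week_ago(), "\n";
```

Calling date_a_week_ago() without arguments behaves like the two-liner, while passing an epoch value gives a reproducible result for a given time zone.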
As we saw, it's almost always possible to debug code with the help of print(). However, it is rather difficult to anticipate all the possible directions a program might take and what data to suspect of causing trouble. In addition, inline debugging code tends to add bloat and degrade the performance of an application, although most applications offer inline debugging as a compile-time option to avoid these hits. In any case, this information tends to be useful only to the programmer who added the trace statement in the first place.
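One common way to get such a compile-time switch in Perl is a constant. When the constant is false, perl can optimize away a statement of the form "... if DEBUG" at compile time, so the trace costs nothing in production. The sketch below is only an illustration: the MYAPP_DEBUG environment variable and the trace() and month_from_localtime() helpers are invented names.

```perl
use strict;
use warnings;

# Assumed switch: evaluated once, at compile time, from a made-up
# environment variable.
use constant DEBUG => $ENV{MYAPP_DEBUG} ? 1 : 0;

# Hypothetical helper: format a trace line with the caller's position,
# so the message is useful to more than just its author.
sub trace {
    my ($fmt, @args) = @_;
    my (undef, $file, $line) = caller;
    return sprintf "[%s:%d] $fmt\n", $file, $line, @args;
}

# Returns the month as 1..12 from a localtime()-style list of fields.
sub month_from_localtime {
    my @t = @_ ? @_ : localtime;
    my $month = $t[4] + 1;
    print STDERR trace("month : %d", $month) if DEBUG;
    return $month;
}

print month_from_localtime(0, 0, 0, 23, 9, 99), "\n";   # 10
```

Run the script with MYAPP_DEBUG=1 in the environment to see the trace line on STDERR.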
Sometimes you have to debug a Perl application of tens of thousands of lines, and while you may be a very experienced Perl programmer who can understand Perl code quite well by just looking at it, no mere mortal can begin to understand what will actually happen in such a large application until the code is running. You just don't know where to start adding your trusty print() statements to see what is happening inside.
The most effective way to track down a bug is running the program with an interactive debugger. The majority of programming languages have such a tool available that allows one to see what is happening inside an application while it is running. Basic features of an interactive debugger allow you to:
* Stop at a certain point in the code, based on a routine name or specific source file and line number
* Stop at a certain point in the code, based on specific conditions such as the value of a given variable
* Perform an action without stopping, based on the same criteria above
* View and modify the value of variables at any given point
* Provide context information such as stack traces and source windows
It does take practice to learn the most effective ways of using an interactive debugger, but the time and effort will be paid back many-fold in the long run.
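With the Perl debugger, stopping on a condition can also be requested from inside the program itself: the perldebug manpage describes the $DB::single variable, which makes the debugger stop before the next statement when it is set to 1. Outside the debugger the assignment does nothing visible. The process() function below is just a stand-in for real code.

```perl
use strict;
use warnings;

# Stand-in for a routine we want to inspect for one particular input
sub process {
    my ($n) = @_;
    {
        no warnings 'once';           # $DB::single appears only here
        $DB::single = 1 if $n == 3;   # stop here only under "perl -d"
    }
    return $n * 2;
}

my @out = map { process($_) } 1 .. 5;
print "@out\n";   # 2 4 6 8 10
```

Run it with perl -d and enter c at the first prompt: the debugger should run freely and stop inside process() only on the call where $n is 3.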
Most C and C++ programmers are familiar with the interactive GNU debugger (gdb). gdb is a stand-alone program that requires your code to be compiled with
debugging symbols to be useful. While gdb
can be used to debug the perl interpreter program itself, it cannot be used
to debug your own Perl programs. Not to worry, Perl provides its own
interactive debugger, called perldb. Giving control of your Perl program to the interactive debugger is simply
a matter of specifying the -d command line switch. When this switch is used, Perl will insert debugging
hooks into the program syntax tree, but leaves the actual job of debugging
to a Perl module outside of the perl binary program itself.
I will start by introducing a few basic concepts and commands of the Perl interactive debugger. These basic warm-up examples are all run from the command line, outside of mod_perl, but are all still relevant once we do go inside Apache.
You may want to keep the perldebug manpage handy for reference while reading this section and for future debugging sessions on your own.
The interactive debugger will attach to the current terminal and present you with a prompt just before the first program statement is executed. For example:
% perl -d -le 'print "mod_perl rules the world"'
Loading DB routines from perl5db.pl version 1.0402
Emacs support available.
Enter h or `h h' for help.
main::(-e:1): print "mod_perl rules the world"
DB<1>
The source line shown is the one which Perl is about to execute. The next command (or just n) will cause this line to be executed and stop again right before the next line:
main::(-e:1): print "mod_perl rules the world"
DB<1> n
mod_perl rules the world
Debugged program terminated. Use q to quit or R to restart,
use O inhibit_exit to avoid stopping after program termination,
h q, h R or h O to get additional info.
DB<1>
In this case, our example code is only one line long, so we are done interacting after the first line of code is executed. Let's try again with a bit longer example which is the following script:
my $word = 'mod_perl';
my @array = qw(rules the world);

print "$word @array\n";
Save the script in a file named domination.pl and run with the
-d switch:
% perl -d domination.pl
main::(domination.pl:1): my $word = 'mod_perl';
DB<1> n
main::(domination.pl:2): my @array = qw(rules the world);
DB<1>
At this point, the first line of code has been executed and the variable $word has been assigned the value mod_perl. We can check that assumption by using the p command (shorthand for print, the two are interchangeable):
main::(domination.pl:2): my @array = qw(rules the world);
DB<1> p $word
mod_perl
The print command works just like Perl's built-in print() function, but adds a trailing newline and outputs to the $DB::OUT file handle, which is normally opened to the terminal from which perl was launched. Let's carry on:
DB<2> n
main::(domination.pl:4): print "$word @array\n";
DB<2> p @array
rulestheworld
DB<3> n
mod_perl rules the world
Debugged program terminated. Use q to quit or R to restart,
use O inhibit_exit to avoid stopping after program termination,
h q, h R or h O to get additional info.
Ouch, p @array printed rulestheworld and not rules the world, as you might have expected, but this is absolutely normal. If you print an array without first interpolating it into a string, it is printed without spaces (or whatever the $" variable contains, otherwise known as $LIST_SEPARATOR if the English pragma is being used) between the members of the array. If you do:
print "@array";
you get the rules the world output, since the default value of the $" variable is a single space.
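The difference is easy to see in a few lines of plain Perl. This sketch builds the strings instead of printing the array directly, so that each variant can be inspected; the variable names are mine.

```perl
use strict;
use warnings;

my @array = qw(rules the world);

my $listed       = join '', @array;  # what print @array shows while $, is unset
my $interpolated = "@array";         # elements joined with the value of $"

my $custom;
{
    local $" = '-';                  # change the separator inside this block only
    $custom = "@array";
}

print "$listed\n";        # rulestheworld
print "$interpolated\n";  # rules the world
print "$custom\n";        # rules-the-world
```

Using local keeps the change to $" confined to the block, so the rest of the program still sees the default single space.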
You should notice by now, there is some valuable information to the left of each executable statement:
main::(domination.pl:4): print "$word @array\n";
DB<2>
First is the current package name, in this case main::. Next is the current filename and statement line number, domination.pl and 4 in the example above. The number presented at the prompt is the command number, which can be used to recall commands from the session history with the help of the ! command followed by this number. For example, !1 would repeat the first command:
% perl -d -e0
main::(-e:1): 0
DB<1> p $]
5.00503
DB<2> !1
p $]
5.00503
DB<3>
Here $] is perl's version number. As you can see, !1 prints the value of $], preceded by the command that was executed.
Things start to get more interesting as the code does. In the example script below (save it in a file named test.pl) we've increased the number of source files and packages by including the standard Symbol module, along with invoking its gensym() function:
use Symbol ();

my $sym = Symbol::gensym();

print "$sym\n";
% perl -d test.pl
main::(test.pl:3): my $sym = Symbol::gensym();
DB<1> n
main::(test.pl:5): print "$sym\n";
DB<1> n
GLOB(0x80c7a44)
First, notice that the debugger did not stop at the first line of the file. This is because use ... is a compile-time statement, not a run-time statement. Also notice that there was more work going on than the debugger revealed. That's because the next command does not enter subroutine calls. To step into the subroutine code use the step command (or s):
% perl -d test.pl
main::(test.pl:3): my $sym = Symbol::gensym();
DB<1> s
Symbol::gensym(/usr/lib/perl5/5.00503/Symbol.pm:86):
86: my $name = "GEN" . $genseq++;
DB<1>
Notice the source line information has changed to the
Symbol::gensym package and the Symbol.pm file. We can carry on by hitting the return key at each prompt, which
causes the debugger to repeat the last step or next command. It wouldn't repeat a
print command for example. The debugger will return out of the subroutine and
back to our main program:
DB<1>
Symbol::gensym(/usr/lib/perl5/5.00503/Symbol.pm:87):
87: my $ref = \*{$genpkg . $name};
DB<1>
Symbol::gensym(/usr/lib/perl5/5.00503/Symbol.pm:88):
88: delete $$genpkg{$name};
DB<1>
Symbol::gensym(/usr/lib/perl5/5.00503/Symbol.pm:89):
89: $ref;
DB<1>
main::(test.pl:5): print "$sym\n";
DB<1>
GLOB(0x80c7a44)
Our line-by-line debugging approach has served us well for this small program, but imagine the time it takes to step through a large application at the same pace. There are several ways to speed up a debugging session, one of which is known as setting a breakpoint. The breakpoint command (b) can be used to instruct the debugger to stop at a named subroutine or at a line of a given file. In this example session, we will set a breakpoint at the Symbol::gensym subroutine at the first prompt, telling the debugger to stop at the first line of this routine when it is called. Rather than move along with next or step, we enter the continue command (c), which tells the debugger to execute each line without stopping until it reaches a breakpoint:
% perl -d test.pl
main::(test.pl:3): my $sym = Symbol::gensym();
DB<1> b Symbol::gensym
DB<2> c
Symbol::gensym(/usr/lib/perl5/5.00503/Symbol.pm:86):
86: my $name = "GEN" . $genseq++;
Now let's pretend we are debugging a large application where
Symbol::gensym might be called in various places. When the subroutine breakpoint is
reached, the debugger does not reveal where it was called from by default.
One way to find out this information is with the Trace command (T):
DB<2> T
$ = Symbol::gensym() called from file `test.pl' line 3
In this example, the call stack is only one level deep, so only that line is printed. We'll look at an example with a deeper stack later. The leftmost character reveals the context in which the subroutine was called: $ represents a scalar context, and in other cases you may see @, which represents a list context, or ., which represents a void context. In our case we have called:
my $sym = Symbol::gensym();
which calls Symbol::gensym() in a scalar context.
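A subroutine can find out its own calling context with the built-in wantarray(), which returns true in a list context, false but defined in a scalar context, and undef in a void context. The sketch below records the context using the same three characters that the Trace command prints; report_context() and @seen are names invented for the example.

```perl
use strict;
use warnings;

my @seen;   # contexts observed so far, in call order

# Hypothetical helper: remember the context we were called in, using
# the same characters as the debugger's stack trace.
sub report_context {
    my $ctx = wantarray          ? '@'
            : defined(wantarray) ? '$'
            :                      '.';
    push @seen, $ctx;
    return;
}

my $scalar = report_context();   # scalar context
my @list   = report_context();   # list context
report_context();                # void context

print "@seen\n";   # $ @ .
```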
Below we've made our test.pl example a little more complex. First, we've added a My::World package declaration at the top of the script, so we are no longer working
in the main:: package. Next, we've added a subroutine named do_work() which
invokes the familiar
Symbol::gensym, along with another function called
Symbol::qualify and returns a hash reference of the results. The do_work()
routine is invoked inside a for loop which will be run twice:
package My::World;
use Symbol ();
for (1,2) {
do_work("now");
}
sub do_work {
my($var) = @_;
return undef unless $var;
my $sym = Symbol::gensym();
my $qvar = Symbol::qualify($var);
my $retval = {
'sym' => $sym,
'var' => $qvar,
};
return $retval;
}
We'll start by setting a few breakpoints and then use the List command (L) to display them:
% perl -d test.pl
My::World::(test.pl:5): for (1,2) {
DB<1> b Symbol::qualify
DB<2> b Symbol::gensym
DB<3> L
/usr/lib/perl5/5.00503/Symbol.pm:
86: my $name = "GEN" . $genseq++;
break if (1)
95: my ($name) = @_;
break if (1)
The filename and line number of the breakpoint are displayed just before the source line itself. Since both breakpoints are located in the same file, the filename is displayed only once. After the source line we see the condition on which to stop. In our case, as the constant value 1 indicates, we will always stop at these breakpoints. Later on you'll see how to specify a certain condition.
When the continue command is executed, the normal flow of the program stops at one of these breakpoints, at line 86 or line 95 of the /usr/lib/perl5/5.00503/Symbol.pm file, whichever is reached first. The displayed code lines are the first lines of the two subroutines from Symbol.pm. Breakpoints cannot be set on empty lines or comments; there must be code on the line.
In our example the List command shows the lines the breakpoints were set on, but we cannot tell which breakpoint belongs to which subroutine. There are two ways to find out. One is to run the continue command and, when it stops, execute the Trace command we saw before:
DB<3> c
Symbol::gensym(/usr/lib/perl5/5.00503/Symbol.pm:86):
86: my $name = "GEN" . $genseq++;
DB<3> T
$ = Symbol::gensym() called from file `test.pl' line 14
. = My::World::do_work('now') called from file `test.pl' line 6
So we see that it was Symbol::gensym. The other way is to ask for a listing of a range of lines. For example, let's check which subroutine line 86 is a part of. We use the list (lowercase!) command (l), which displays parts of the code. Among the various arguments it accepts, there is one that we want to use here, a range of lines. Since the breakpoint is at line 86, let's print a few lines around it:
DB<3> l 85-87
85 sub gensym () {
86==>b my $name = "GEN" . $genseq++;
87: my $ref = \*{$genpkg . $name};
Now we know it's the gensym sub, and we also see the breakpoint displayed with the help of the ==>b markup. We could also use the name of the sub to display its code:
DB<4> l Symbol::gensym
85 sub gensym () {
86==>b my $name = "GEN" . $genseq++;
87: my $ref = \*{$genpkg . $name};
88: delete $$genpkg{$name};
89: $ref;
90 }
The delete command (d) is used to remove a breakpoint by specifying its line number. Let's remove the breakpoint at line 95:
DB<5> d 95
The Delete command (D, with a capital `D') removes all currently installed breakpoints.
Now let's look again at the trace produced at the breakpoint:
DB<3> c
Symbol::gensym(/usr/lib/perl5/5.00503/Symbol.pm:86):
86: my $name = "GEN" . $genseq++;
DB<3> T
$ = Symbol::gensym() called from file `test.pl' line 14
. = My::World::do_work('now') called from file `test.pl' line 6
As you can see, the stack trace prints the values which are passed into the
subroutine. Ah, and perhaps we've found our first bug, as we can see
do_work() was called in a void context, so the return value
was lost into thin air. Let's change the for loop logic to check the return
value of do_work():
for (1,2) {
my $stuff = do_work("now");
if ($stuff) {
print "work is done\n";
}
}
In this session we will set a breakpoint at line 7 of test.pl where we check the return value of do_work():
% perl -d test.pl
My::World::(test.pl:5): for (1,2) {
DB<1> b 7
DB<2> c
My::World::(test.pl:7): if ($stuff) {
DB<2>
Our program is still small, but it is getting more difficult to understand the context of just one line of code. The window command (w) will list a few lines of code that surround the current line:
DB<2> w
4
5: for (1,2) {
6: my $stuff = do_work("now");
7==>b if ($stuff) {
8: print "work is done\n";
9 }
10 }
11
12 sub do_work {
13: my($var) = @_;
The arrow points to the line which is about to be executed and also
contains a 'b' indicating we have set a breakpoint at this line. The breakable lines of
code include a `:' just after the line number.
Now, let's take a look at the value of the $stuff variable with the trusty old print command:
DB<2> p $stuff
HASH(0x82b89b4)
That's not very useful information. Remember, the print command works just as the built-in print() function does. The x command evaluates a given expression and prints the results in a ``pretty''
fashion:
DB<3> x $stuff
0 HASH(0x82b89b4)
'sym' => GLOB(0x826a944)
-> *Symbol::GEN0
'var' => 'My::World::now'
There, things seem to be okay. Let's double check by calling do_work() with a different value and printing the results:
DB<4> x do_work('later')
0 HASH(0x82bacc8)
'sym' => GLOB(0x818f16c)
-> *Symbol::GEN1
'var' => 'My::World::later'
We can see the symbol was incremented from GEN0 to GEN1 and the variable later was qualified, as expected.
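What the x command displayed can also be reproduced in a plain script, which is a handy way to check your understanding of the two Symbol functions. In this sketch the explicit package passed to the second qualify() call is an arbitrary name, and the number in the generated glob name depends on how many globs were generated earlier in the same process.

```perl
package My::World;
use strict;
use warnings;
use Symbol ();

my $sym   = Symbol::gensym();           # reference to a fresh anonymous glob
my $qvar  = Symbol::qualify('later');   # prefixed with the caller's package
my $other = Symbol::qualify('later', 'Some::Package');   # explicit package

# A dereferenced glob stringifies to its full name
my $glob_name = "" . *{$sym};

print ref($sym), "\n";   # GLOB
print "$glob_name\n";    # something like *Symbol::GEN0
print "$qvar\n";         # My::World::later
print "$other\n";        # Some::Package::later
```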
Now let's change the test program a little to iterate over a list of
arguments held in @args and print a slightly different message:
package My::World;
use Symbol ();
my @args = qw(now later);
for my $arg (@args) {
my $stuff = do_work($arg);
if ($stuff) {
print "do your work $arg\n";
}
}
sub do_work {
my($var) = @_;
return undef unless $var;
my $sym = Symbol::gensym();
my $qvar = Symbol::qualify($var);
my $retval = {
'sym' => $sym,
'var' => $qvar,
};
return $retval;
}
There are only two arguments in the list, so stopping to look at each one
isn't too time consuming, but consider the debugging pace with a large list
of 100 or so entries. It is possible to customize breakpoints by specifying
a condition. Each time a breakpoint is reached, the condition is evaluated,
stopping only if the condition is true. In the session below the window command shows breakable lines and we set a breakpoint at line 7 with the condition $arg eq 'later'. As we continue, the breakpoint is skipped when $arg has the value of now and the debugger stops when it has the value of later:
% perl -d test.pl
My::World::(test.pl:5): my @args = qw(now later);
DB<1> w
2
3: use Symbol ();
4
5==> my @args = qw(now later);
6: for my $arg (@args) {
7: my $stuff = do_work($arg);
8: if ($stuff) {
9: print "do your work $arg\n";
10 }
11 }
The ==> symbol shows us the line of code that's about to be executed.
DB<1> b 7 $arg eq 'later'
DB<2> c
do your work now
My::World::(test.pl:7): my $stuff = do_work($arg);
DB<2> n
My::World::(test.pl:8): if ($stuff) {
DB<2> x $stuff
0 HASH(0x82b90e4)
'sym' => GLOB(0x82b9138)
-> *Symbol::GEN1
'var' => 'My::World::later'
DB<5> c
do your work later
Debugged program terminated. Use q to quit or R to restart,
There are plenty more tricks left to pull from the perldb bag, but you should understand enough about the debugger to try them on your own with the perldebug manpage by your side. Quick online help can be reached by typing the h command. It will display a list of the most useful commands and a short explanation of what they do.
Devel::ptkdb is a visual Perl debugger that uses perlTk for a user interface.
To debug a plain perl script with it, invoke it as:
% perl -d:ptkdb myscript.pl
A Tk application will be loaded. Now you can do most of the debugging you did with the standard command-line Perl debugger, but using a simple GUI to set/remove breakpoints, browse the code, step through it and more.
With the help of ptkdb you can debug your CGI scripts running under mod_cgi. Be sure that your web server's perl installation includes the Tk package. In order to enable the debugger you should change your:
#! /usr/local/bin/perl -wT
to
#! /usr/local/bin/perl -wTd:ptkdb
You can debug scripts remotely if you're using a Unix based server and the machine where you are authoring the script has an X server. The X server can be another Unix workstation, or a Macintosh or Win32 platform with an appropriate X Window System package. In your script insert the following BEGIN subroutine:
sub BEGIN {
$ENV{'DISPLAY'} = "myHostname:0.0" ;
}
You can use either IP (123.123.123.123:0.0) or DNS convention (myhost.com:0.0). Be sure that your web server has permission to open windows on your Xserver (see the xhost manpage for more info).
Access your web page with your browser and submit the script as normal. The ptkdb window should appear on your monitor if you have correctly set the $ENV{'DISPLAY'} variable. At this point you can start debugging your script. Be aware that your browser may time out waiting for the script to run.
To expedite debugging you may want to setup your breakpoints in advance
with a .ptkdbrc file and use the $DB::no_stop_at_start
variable. NOTE: for debugging web scripts you may have to have the
.ptkdbrc file installed in the server account's home directory (~www) or whatever
username your webserver is running under. Also try installing a .ptkdbrc file in the same directory as the target script.
META: insert snapshots of ptkdb screen
To debug scripts running under mod_perl either use Apache::DB (interactive Perl debugging) or an older non-interactive method as described below.
The NonStop debugger option enables us to get some decent debug info when running under mod_perl. For example, before starting the server:
% setenv PERL5OPT -d
% setenv PERLDB_OPTS "NonStop=1 LineInfo=db.out AutoTrace=1 frame=2"
Now watch db.out for line:filename info. This is most useful for tracking those core dumps that normally leave us guessing, even with a stack trace from gdb. db.out will show you what Perl code triggered the core dump. See the perldebug manpage for more PERLDB_OPTS. Note that Perl will ignore PERL5OPT if PerlTaintCheck is On.
Now we'll turn to looking at how the interactive debugger is used in a
mod_perl environment. The Apache::DB module available from CPAN provides a wrapper around perldb for debugging Perl code running under mod_perl.
The server must be run in non-forking mode to use the interactive debugger. This mode is turned on by passing the -X flag to the httpd executable. It is convenient to use an IfDefine section around the Apache::DB configuration; the example below does this using the name PERLDB. With this setup, debugging is only turned on when starting the server with the httpd -D PERLDB command.
This section should be at the top of the Perl configuration section of the configuration file, before any Perl code is pulled in, so that debugging symbols will be inserted into the syntax tree, triggered by the call to Apache::DB->init. The Apache::DB::handler can be configured using any of the Perl*Handler directives. In this case we use a PerlFixupHandler, so handlers in the response phase will bring up the debugger prompt:
<IfDefine PERLDB>
<Perl>
use Apache::DB ();
Apache::DB->init;
</Perl>
<Location />
PerlFixupHandler Apache::DB
</Location>
</IfDefine>
Since we have used / as the argument to the Location directive, the debugger will be invoked for any kind of request, even for static objects (images, static documents). Of course it will quit immediately, unless there is some Perl module registered to handle these static objects.
In our first example, we will debug the standard Apache::Status
module, which is configured like so:
PerlModule Apache::Status
<Location /perl-status>
PerlHandler Apache::Status
SetHandler perl-script
</Location>
When the server is started with the debugging flag, a notice will be printed to the console:
% httpd -X -D PERLDB
[notice] Apache::DB initialized in child 950
The debugger prompt will not be available until the first request is made, in our case to http://localhost/perl-status. Once we are at the prompt, all the standard debugging commands are available. First we run the window command for some context of the code being debugged, then move to the next statement after $r has been assigned and print the request URI. If no breakpoints are set, the continue command will give control back to Apache and the request will finish with the Apache::Status main menu showing up in the browser window:
Loading DB routines from perl5db.pl version 1.0402
Emacs support available.
Enter h or `h h' for help.
Apache::Status::handler(/usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Status.pm:55):
55: my($r) = @_;
DB<1> w
52 }
53
54 sub handler {
55==> my($r) = @_;
56: Apache->request($r); #for Apache::CGI
57: my $qs = $r->args || "";
58: my $sub = "status_$qs";
59: no strict 'refs';
60
61: if($qs =~ s/^(noh_\w+).*/$1/) {
DB<1> n
Apache::Status::handler(/usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Status.pm:56):
56: Apache->request($r); # for Apache::CGI
DB<1> p $r->uri
/perl-status
DB<2> c
All the techniques we saw while debugging plain perl scripts can be applied to this debugging session.
Debugging Apache::Registry scripts is somewhat different, because the handler routine does quite a bit
of work before it reaches your script. In this example, we make a request
for /perl/test.pl, which consists of this code:
use strict;
my $r = shift;
$r->send_http_header('text/plain');
print "mod_perl rules";
When a request is issued, the debugger stops at line 28 of Apache/Registry.pm. We set a breakpoint at line 140, which is the line that actually calls the script wrapper subroutine. The continue command will bring us to that line, where we can step into the script handler:
Apache::Registry::handler(/usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Registry.pm:28):
28: my $r = shift;
DB<1> b 140
DB<2> c
Apache::Registry::handler(/usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Registry.pm:140):
140: eval { &{$cv}($r, @_) } if $r->seqno;
DB<2> s
Apache::ROOT::perl::test_2epl::handler((eval 87):3):
3: my $r = shift;
Notice the funny package name. It's generated from the URI of the request for namespace protection. The filename is not displayed, since the code was compiled via eval(), but the print command can be used to show you $r->filename:
DB<2> n
Apache::ROOT::perl::test_2epl::handler((eval 87):4):
4: $r->send_http_header('text/plain');
DB<2> p $r->filename
/home/httpd/perl/test.pl
The line number might seem off too, but the window command will give you a better idea where you are:
DB<4> w
1: package Apache::ROOT::perl::test_2epl;use Apache qw(exit);sub handler { use strict;
2
3: my $r = shift;
4==> $r->send_http_header('text/plain');
5
6: print "mod_perl rules";
7
8 }
9 ;
The code from the test.pl file is between lines 2 and 7, the rest is the Apache::Registry magic to cache your code inside a
handler subroutine.
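To get a feeling for where the funny package name comes from, here is a simplified sketch of turning a URI into a package name. It is an approximation written for this explanation and not the code from Apache::Registry, which handles more cases; the function name uri_to_package() is made up. Characters that are not valid in a package name are replaced by an underscore followed by their hex code, and slashes become package separators.

```perl
use strict;
use warnings;

# Simplified illustration only, not the actual Apache::Registry code
sub uri_to_package {
    my ($uri) = @_;

    # escape everything that cannot appear in a package name,
    # e.g. '.' (hex 2e) becomes '_2e'
    (my $name = $uri) =~ s/([^A-Za-z0-9_\/])/sprintf("_%02x", ord $1)/eg;

    # each run of slashes becomes a package separator
    $name =~ s{/+}{::}g;

    return "Apache::ROOT$name";
}

print uri_to_package('/perl/test.pl'), "\n";   # Apache::ROOT::perl::test_2epl
```

Since every script gets a package of its own this way, two scripts that define a subroutine with the same name don't trample on each other.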
It will always take some practice and patience when putting together debugging strategies that make effective use of the interactive debugger for various situations. Once you do have a good strategy in mind, bug squashing can actually be quite a bit of fun!
As we saw earlier, you can use the ptkdb visual debugger to debug CGI scripts running under mod_cgi. It won't work for mod_perl, though, using the same configuration as used for mod_cgi. We have to tweak the Apache/DB.pm module to use Devel/ptkdb.pm instead of Apache/perl5db.pl.
Open the file in your favorite editor and replace:
require 'Apache/perl5db.pl';
with:
require 'Devel/ptkdb.pm';
Now when you use the interactive mod_perl debugger configuration from the previous section and issue a request, the ptkdb visual debugger will be loaded.
If you are debugging Apache::Registry scripts, then exactly as in the terminal debugging example you should go to line 140, or whatever line the eval { &{$cv}($r, @_) } if $r->seqno; statement is located at, and step in to enter your script.
Note that you can work with ptkdb in plain multi-server mode, so you don't have to start the server with the -X option.
META: One caveat:
* When the request is completed, ptkdb would hang. Anyone knows what code should be registered for it to exit on
completion? To replace the original Apache::DB cleanup code, as:
if (ref $r) {
$SIG{INT} = \&DB::catch;
$r->register_cleanup(sub {
$SIG{INT} = \&DB::ApacheSIGINT();
});
}
Any Perl/Tk guru to assist???
If your server crashes on startup, you need to start it under gdb and ask it to generate the stack trace.
I'll emulate a faulty server by putting a dump() call into the startup file:
startup.pl
----------
dump;
1;
and requiring this file from the httpd.conf:
PerlRequire /path/to/startup.pl
Make sure no server is running on port 80 or use an alternate config with an alternate port if you are on a production server.
% gdb /path/to/httpd
(gdb) set args -X
Use:
set args -X -f /path/to/alternate/serverconfig_ifneeded.conf
if you want the server to start from an alternative configuration file.
Now run the program:
(gdb) run
Starting program: /usr/local/apache/bin/httpd -X
Program received signal SIGABRT, Aborted.
0x400da4e1 in __kill () from /lib/libc.so.6
At this point the server should die (because of dump()) and
when it happens we ask for a stack trace (using bt or where commands):
(gdb) where
#0 0x400da4e1 in __kill () from /lib/libc.so.6
#1 0x80d43bc in Perl_my_unexec ()
#2 0x8119544 in Perl_pp_goto ()
#3 0x8118990 in Perl_pp_dump ()
#4 0x812b2ad in Perl_runops_standard ()
#5 0x80d3a9c in perl_eval_sv ()
#6 0x807ef1c in perl_do_file ()
#7 0x807ef4f in perl_load_startup_script ()
#8 0x807b7ec in perl_cmd_require ()
#9 0x8092af7 in ap_clear_module_list ()
#10 0x8092f43 in ap_handle_command ()
#11 0x8092fd7 in ap_srm_command_loop ()
#12 0x80933e0 in ap_process_resource_config ()
#13 0x8093ca2 in ap_read_config ()
#14 0x809db63 in main ()
#15 0x400d41eb in __libc_start_main (main=0x809d8dc <main>, argc=2,
argv=0xbffffab4, init=0x80606f8 <_init>, fini=0x812b38c <_fini>,
rtld_fini=0x4000a610 <_dl_fini>, stack_end=0xbffffaac)
at ../sysdeps/generic/libc-start.c:90
If you cannot tell what this trace says, send it to the mod_perl mailing list. Make sure to include the versions of Apache, mod_perl and Perl.
In our case we already know that the server is supposed to die when compiling the startup file, and we can clearly see that from the trace. We always read it from its end upward:
We are in config file:
#13 0x8093ca2 in ap_read_config ()
We do require:
#8 0x807b7ec in perl_cmd_require ()
We load the file and compile it:
#6 0x807ef1c in perl_do_file ()
#5 0x80d3a9c in perl_eval_sv ()
dump() gets executed:
#3 0x8118990 in Perl_pp_dump ()
dump() calls __kill():
#0 0x400da4e1 in __kill () from /lib/libc.so.6
META: incomplete
mod_perl comes with a number of useful gdb macros to ease the debugging process. You will find them in the .gdbinit file of the mod_perl source distribution (mod_perl-x.xx/.gdbinit). You might want to modify the macro definitions.
In order to use them you need to compile mod_perl with
PERL_DEBUG=1.
To debug the server, start it:
% httpd -X
Issue a request to the offending script that hangs. Find the PID of the process that hangs.
Go to the root of the server:
% cd /usr/local/apache
Now attach to it with gdb (replace PID with actual PID number) and load the macros from .gdbinit:
% gdb /path/to/httpd PID
(gdb) source /usr/src/mod_perl-x.xx/.gdbinit
Now you can start the server (httpd below is a gdb macro):
(gdb) httpd
Now run the curinfo macro:
(gdb) curinfo
It should tell you the line/filename of the offending Perl code.
Add this to the .gdbinit:
define longmess
set $sv = perl_eval_pv("Carp::longmess()", 1)
printf "%s\n", ((XPV*) ($sv)->sv_any )->xpv_pv
end
and when you reload the macros, run:
(gdb) longmess
to produce a Perl stacktrace.
$ perl -e dump
Abort(coredump)
META: should I move the Apache::StatINC here? (I think not, since it relates to other topics like reloading config files, but you should mention it here with a pointer to it)
(META: to be written)
use Apache::Debug ();
Apache::Debug::dump($r, SERVER_ERROR, "Uh Oh!");
This module sends what may be helpful debugging info to the client rather than to the error log.
Also, you could try using a larger emergency pool. Try this instead of Apache::Debug:
$^M = 'a' x (1<<18); # 256K buffer
use Carp ();
$SIG{__DIE__} = \&Carp::confess;
eval { Carp::confess("init") };
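To see what the __DIE__ handler above buys you, here is a minimal standalone sketch that runs with plain Perl outside the server. The subs inner() and outer() are made up for illustration and are not part of any module. With the handler installed, the error trapped by eval carries the whole call chain instead of a single line.

```perl
use strict;
use warnings;
use Carp ();

# Turn every die() into a full stack trace, as in the snippet above.
local $SIG{__DIE__} = \&Carp::confess;

# Hypothetical call chain, only here to give the trace some depth.
sub inner { die "something broke\n" }
sub outer { inner() }

# Trap the error so the script keeps running and we can look at it.
my $trace = '';
eval { outer(); 1 } or $trace = $@;

print $trace;
```

The printed text starts with the original message and is followed by one line per caller, which is usually enough to find out how the failing code was reached.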
To enable mod_perl debug tracing configure mod_perl with the PERL_TRACE option:
perl Makefile.PL PERL_TRACE=1
The trace levels can then be enabled via the MOD_PERL_TRACE
environment variable which can contain any combination of:
d - Trace directive handling during configuration read
s - Trace processing of perl sections
h - Trace Perl*Handler callbacks
g - Trace global variable handling, interpreter construction, END blocks, etc.
all - all of the above
add to httpd.conf:
PerlSetVar MOD_PERL_TRACE all
For example if you want to see a trace of the PerlRequire's and PerlModule's as they are loaded, use:
PerlSetVar MOD_PERL_TRACE d
As you know, you need an unstripped executable to be able to debug it. While
you can compile mod_perl with -g (or PERL_DEBUG=1), the Apache install strips the symbols.
Makefile.tmpl contains a line:
IFLAGS_PROGRAM = -m 755 -s
Removing the -s does the trick.
The current Perl implementation does not restore Apache's original C
handler when you use a local $SIG{FOO} clause. While the save/restore of
$SIG{ALRM} was fixed in mod_perl 1.19_01 (CVS version), other signals are not yet
fixed. The real fix should probably be in Perl itself.
Until recently, local $SIG{ALRM} restored the SIGALRM handler to Perl's handler, not to the handler that was there in the first place
(Apache's alrm_handler()). If you build mod_perl with PERL_TRACE=1 and set the MOD_PERL_TRACE environment variable to g, you will see this in the error_log file:
mod_perl: saving SIGALRM (14) handler 0x80b1ff0
mod_perl: restoring SIGALRM (14) handler from: 0x0 to: 0x80b1ff0
If nobody touched $SIG{ALRM}, 0x0 would be the same address as the others.
If you work with signal handlers, take a look at the Sys::Signal module, which solves the problem:
Sys::Signal - Set signal handlers with restoration of existing C sighandler. Get it
from CPAN.
The usage is simple, if the original code was:
eval {
local $SIG{ALRM} = sub { die "timeout\n" };
alarm $timeout;
... db stuff ...
alarm 0;
};
die $@ if $@;
If a timeout happens, SIGALRM is delivered, the pending alarm is used up and the handler's die() leaves the eval block; otherwise alarm 0 is reached and the timer is cancelled. Either way no alarm is left pending.
Now you would write:
use Sys::Signal ();
eval {
my $h = Sys::Signal->set(ALRM => sub { die "timeout\n" });
alarm $timeout;
... do something that may timeout ...
alarm 0;
};
die $@ if $@;
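If you use this idiom in more than one place, it can be wrapped in a small helper. The following is only a sketch: with_timeout() is a made-up name, and it uses the plain local $SIG{ALRM} form so that it runs with nothing but core Perl. Under mod_perl you would swap that one line for the Sys::Signal->set() call shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, giving up after $timeout seconds.
# Returns (1, result) on success and (0, error) on failure.
# Plain local $SIG{ALRM} is used here so the sketch runs anywhere;
# under mod_perl use Sys::Signal->set(ALRM => ...) instead.
sub with_timeout {
    my ($timeout, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        $result = $code->();
        alarm 0;    # finished in time: cancel the pending alarm
        1;
    };
    my $err = $@;
    alarm 0;        # also cancel if $code died for some other reason
    return $ok ? (1, $result) : (0, $err);
}

# A call that returns at once, and one that would block for ten seconds.
# Four-argument select() is used as the delay so that we do not mix
# alarm() with sleep(), which some systems implement using alarm().
my ($ok_fast, $value) = with_timeout(5, sub { 21 * 2 });
my ($ok_slow, $error) = with_timeout(1, sub { select(undef, undef, undef, 10); "never returned" });

print "fast: ok=$ok_fast value=$value\n";
print "slow: ok=$ok_slow error=$error";
```

The second alarm 0 after the eval is there because the code being run may die for reasons of its own, in which case the alarm set before it would otherwise still be pending.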
(Meta: duplication??? I've started to write about profiling somewhere in this file)
It is possible to profile code run under mod_perl with the
Devel::DProf module available on CPAN. However, you must have Apache version 1.3b3 or
higher and the PerlChildExitHandler
enabled. When the server is started, Devel::DProf installs an
END block to write the tmon.out file, which will be run when the server is shut down. Here's how to start
and stop a server with the profiler enabled:
% setenv PERL5OPT -d:DProf
% httpd -X -d `pwd` &
... make some requests to the server here ...
% kill `cat logs/httpd.pid`
% unsetenv PERL5OPT
% dprofpp
See also: Apache::DProf
Devel::Peek - A data debugging tool for the XS programmer
Let's see an example of Perl allocating the buffer only once, regardless
of my() scoping, although it will realloc() if
the size is greater than SvLEN:
use Devel::Peek;
for (1..3) {
foo();
}
sub foo {
my $sv;
Dump $sv;
$sv = 'x' x 100_000;
$sv = "";
}
The output:
SV = NULL(0x0) at 0x8138008
REFCNT = 1
FLAGS = (PADBUSY,PADMY)
SV = PV(0x80e5794) at 0x8138008
REFCNT = 1
FLAGS = (PADBUSY,PADMY)
PV = 0x815f808 ""\0
CUR = 0
LEN = 100001
SV = PV(0x80e5794) at 0x8138008
REFCNT = 1
FLAGS = (PADBUSY,PADMY)
PV = 0x815f808 ""\0
CUR = 0
We can see that on subsequent calls (after the first one) $sv
already has preallocated memory.
So, if you can afford the memory, a larger buffer means fewer
brk() syscalls. If you watch that example with strace, you will only see calls to brk() the first time through the loop. So this is a case where your module
might want to pre-allocate the buffer, for example for LWP, in a file-scoped
lexical, like so:
package Your::Proxy;
my $buffer = ' ' x 100_000;
$buffer = "";
This way, only the parent has to brk() at server startup. Each
child will already have an allocated buffer; just reset it to "" when you are done.
Apache::Leak (derived from Devel::Leak) should help you with this task. Example:
use Apache::Leak;
my $global = "FooAAA";
leak_test {
$$global = 1;
++$global;
};
The argument to leak_test() is an anonymous sub, so you can just wrap it around any code you suspect
might be leaking. Beware, it will run the code twice: new SVs are created the first
time in, but that does not mean you are leaking, and the second pass gives
better evidence. You do not need to be inside mod_perl to use it. From the
command line, the above script outputs:
ENTER: 1482 SVs
new c28b8 :
new c2918 :
LEAVE: 1484 SVs
ENTER: 1484 SVs
new db690 :
new db6a8 :
LEAVE: 1486 SVs
!!! 2 SVs leaked !!!
Build a debuggable perl to see dumps of the SVs. The simple way to have both a normal perl and a debuggable perl is to
follow the hints in the
SUPPORT doc for building libperld.a. When that is built, copy the
perl from that directory to your perl bin directory, but name it
dperl.
Leak explanation: $$global = 1; creates a new global variable named
FooAAA with a value of 1, which will not be destroyed until this module is destroyed.
Apache::Leak is not very user-friendly; have a look at
B::LexInfo. You'll see that what might appear to be a leak is actually just a Perl
optimization. For example, consider this code:
sub foo {
my $string = shift;
}
foo("a string");
B::LexInfo will show you that Perl does not release the value from $string unless you
undef() it. This is because Perl anticipates that the memory will
be needed for another string the next time the subroutine is entered.
You'll see similar behavior for @array length, %hash keys, and scratch areas of the pad-list for OPs such as join(), `.', etc.
Apache::Status now includes a new StatusLexInfo option.
Apache::Leak works better if you've built a libperld.a (see
SUPPORT document) and given PERL_DEBUG=1 to mod_perl's
Makefile.PL.
Running in httpd -X mode. (good only for testing during development phase).
You want to test that your application correctly handles global variables
(if you have any; the fewer you have the better, but sometimes you
just can't do without them). It's hard to test with multiple servers serving
your script, since each child has a different value for its global variables.
Imagine that you have a random()
sub that returns a random number and you have the following script.
use vars qw($num);
$num ||= random();
print ++$num;
This script initializes the variable $num with a random value, then increments it on each request and prints it out.
Running this script in a multiple server environment will result in
something like 1,
9, 4, 19 (one number per reload), since each time your script will be served by a
different child. (On some OSes, AIX for example, the parent httpd process will assign all of
the requests to the same child process if all of the children are idle.)
But if you run in httpd -X
single server mode you will get 2, 3, 4, 5... (assuming that random() returned 1 at the first call).
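The single server behaviour is easy to reproduce outside the server. The sketch below is only a stand-in: handle_request() is a made-up sub playing the role of the handler that Apache::Registry wraps around the script, and random() is replaced by the constant 1 so that the result is predictable.

```perl
use strict;
use warnings;
use vars qw($num);

# Made-up stand-in for the cached script body; each call is one request
# served by the same process.
sub handle_request {
    $num ||= 1;       # 1 plays the part of random() on the first call
    return ++$num;    # the global keeps its value between calls
}

my @seen = map { handle_request() } 1 .. 4;
print join(", ", @seen), "\n";
```

Since all four calls run in one process, this prints the 2, 3, 4, 5 sequence described above.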
But do not get too obsessive with this mode, since working only in single server mode sometimes hides problems that show up when you switch to a normal (multi) server mode. Consider an application that allows you to change the configuration at run time.
Let's say the script produces a form to change the background color of the page. It's not a good design, but for the sake of demonstrating the potential problem, we will assume that our script doesn't write the changed background color to the disk, but simply changes it in memory, like:
use vars qw($bgcolor);
# assign default value at first invocation
$bgcolor ||= "white";
# modify the color if requested to
$bgcolor = $q->param('bgcolor') || $bgcolor;
So you have typed in a new color, and in response, your script prints back the html with a new color - you think that's it! It was so simple. And if you keep running in single server mode you will never notice that you have a problem...
If you run the same code in the normal server mode, after you submit the color change you will get the result as expected, but when you call the same URL again (not reload!) chances are that you will get back the original default color (white in our case), since no process except the child that processed the color change request knows about the change to its global variable. Just remember that children can't share information, other than what they inherited from their parent when they were spawned. Of course you should use a hidden variable for the color to be remembered, or store it on the server side (database, shared memory, etc).
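You can watch this isolation happen with a few lines of plain Perl and fork(). This is a sketch, not server code: the forked child stands in for the httpd child that handled the color change, the parent stands in for every other process, and the pipe exists only so that the child can report what it sees.

```perl
use strict;
use warnings;

# Stand-in for the script's global variable.
our $bgcolor = "white";

# Pipe so the child can report its view back to the parent.
pipe(my $reader, my $writer) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: behaves like the httpd child that served the color change.
    close $reader;
    $bgcolor = "blue";
    print {$writer} $bgcolor;
    close $writer;
    exit 0;
}

# Parent: stands in for all the other processes, which never see the change.
close $writer;
my $child_view = do { local $/; <$reader> };
close $reader;
waitpid($pid, 0);

print "child saw:  $child_view\n";
print "parent has: $bgcolor\n";
```

The child reports blue while the parent still holds white: the assignment happened in the child's own copy of the variable and nowhere else.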
Also note that since the server is running in single-process mode, if the output
returns HTML with <IMG> tags, then loading these will take a lot of time. If you use
Netscape while your server is running in single-process mode, HTTP's KeepAlive feature gets in the way. Netscape tries to open multiple connections and
keep them open. Because there is only one server process listening, each
connection has to time out before the next succeeds. Turn off
KeepAlive in httpd.conf to avoid this effect while developing, or press STOP after a few seconds (assuming you use the image size params, so that
Netscape will be able to render the rest of the page).
In addition you should know that when running with -X you will not see any of the control messages that the parent server normally
writes to the error_log (like ``server started'', ``server stopped'', etc.).
Since
httpd -X causes the server to handle all requests itself, without forking any
children, there is no controlling parent to write status messages.
Written by Stas Bekman.
Last Modified at 12/18/1999
Use of the Camel for Perl is a trademark of O'Reilly & Associates, and is used by permission.
Here you will find instructions for downloading the software and the related documentation.
Perl is most likely already installed on your machine, but you should at least check the version you are using. It is highly recommended that you have at least Perl version 5.004. You can get the latest perl version from http://www.perl.com/ . Try the direct download link http://www.perl.com/pace/pub/perldocs/latest.html . You can get Perl documentation from the same location.
Get the latest Apache webserver and documentation from http://www.apache.org . Try the direct download link http://www.apache.org/dist/ .
Get the latest mod_perl sources and documentation from http://perl.apache.org . Try the direct download link http://perl.apache.org/dist/ .
Squid Linux 2.x Redhat RPMs : http://home.earthlink.net/~intrep/linux/
http://www.acme.com/software/thttpd/
Ask Bjoern Hansen has written a mod_proxy_add_forward.c module for Apache that sets the X-Forwarded-For field when doing a ProxyPass, similar to what Squid can do. His module is
at: http://modules.apache.org/search?id=124,
at ftp://ftp.netcetera.dk/pub/apache/mod_proxy_add_forward.c
or http://www.cpan.org/authors/id/ABH/mod_proxy_add_forward.c
complete with instructions on how to compile it in and whatnot.
http://www.hpl.hp.com/personal/David_Mosberger/httperf.html
Comes with the Apache distribution.
You will find the definitive guide to load balancing techniques at the High-Availability Linux Project site -- http://www.henge.com/~alanr/ha/
More load balancing URLs:
lbnamed - a load balancing name server written in Perl, by Roland Schemers
http://www.stanford.edu/~riepel/lbnamed/
http://www.stanford.edu/~riepel/lbnamed/bof.talk/
http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html
Network Address Translation and Networks: Virtual Servers (Load Balancing) http://www.csn.tu-chemnitz.de/~mha/linux-ip-nat/diplom/node4.html#SECTION00043100000000000000
Get it from CPAN at $CPAN/authors/id/DOUGM/libapreq-x.xx.tar.gz or from http://perl.apache.org/dist/libapreq-x.xx.tar.gz . (replace x.xx with the current version)
Written by Stas Bekman.
Last Modified at 12/04/1999
This document was born because some problems come up so often on the mailing list that they should be stressed in the guide as among the most important things to read and beware of. So I have tried to list them in this document. If you think some important problem that is frequently reported on the list is covered in the guide but not included below, please tell me.
See the ``my() Scoped Variable in Nested Subroutines'' section.
See Evil things might happen when using PerlFreshRestart
Written by Stas Bekman.
Last Modified at 12/18/1999
You can invest a lot of time and money into server tuning and code rewriting according to the guidelines you have just learned, but your performance will be really bad if you do not take into account the hardware demands and do not wisely choose the operating system suited to your needs. While the tips below apply to any webserver, they are written for an administrator of a mod_perl-enabled webserver.
First let's talk about Operating Systems (OS). While I am personally a
Linux devotee, I do not want to start yet another OS war. So I
will try to define what you should be looking for; then, when you know what
you want from your OS, go find it. Visit the Web sites of the operating
systems you are interested in. You can gauge users' opinions by searching
relevant discussions in newsgroup and mailing list archives such as Deja -
http://deja.com and eGroups - http://egroups.com . I will leave this
research up to you. But I would use Linux or something from the
*BSD family.
Probably the most desired features in an OS are stability and robustness. You are in an Internet business, which does not have normal working hours like many conventional businesses you know about (9am to 5pm). You are open 24 hours a day. You cannot afford to be off-line, for your customers will go shop at another service like yours, unless you have a monopoly :) . If the OS of your choice crashes every day or so, I would throw it away, but only after doing a little investigation, for there might be another reason for a system crash, such as a runaway server that eats up all the memory and disk, and you cannot blame the OS for that. Generally, people who have used the OS for some time can tell you a lot about its stability.
You want an OS with good memory management; some OSes are well known as memory hogs. The same code can use twice as much memory on one OS compared to another. If the size of the mod_perl process is 10Mb and you have tens of these running, it definitely adds up!
Some OSes and/or their libraries (like C runtime libraries) suffer from
memory leaks. You cannot afford such a system, for you already know
that a single mod_perl process sometimes serves thousands of requests
before it terminates. So if a leak occurs on every request, your memory
demands will be huge. Of course your code can be the cause of the memory
leaks as well (check out the Apache::Leak
module). Certainly, you can lower the number of requests to be served over
the process' life, but that can degrade performance.
You want an OS with good memory sharing capabilities. As you have learned, if you preload the modules and scripts at server startup, they are shared between the spawned children, at least for a part of a process' life span, since memory pages become ``dirty'' and cease to be shared. This feature can save you a lot of memory!
If you are in a big business you probably do not mind paying another
$1000 for some fancy OS and the bundled support for it.
But if your resources are low, you will look for a cheaper or free OS. Free
does not mean bad; it can be quite the opposite, as we all either know from our
own experience or read about in the news. Free OSes can have, and do have, the
best support you can find. It is very easy to understand: most
people are not rich and will try a cheaper or free OS first if it
does the work for them. Since it really fits their needs, many people keep
using it and eventually know it well enough to be able to provide support
for others in trouble. Why would they do this for free? For the spirit of
the first days of the Internet, when there was no commercial Internet and
people helped each other, because someone had helped them in the first place. I was
there, I was touched by that spirit and I will do anything to keep that
spirit alive.
But let's get back to our world. We are living in a material world, and our bosses pay us to keep the systems running. So if you feel that you cannot provide the support yourself and you do not trust the available free resources, you must pay for an OS backed by a company, and blame them for any problem. Your boss wants to be able to sue someone if the project has a problem caused by an external product used in the project. If you buy a product and the company selling it claims to support it, you have someone to sue. If you go with Open Source and it fails, you have no one to sue and may only get yourself fired.
Also remember that if you spend less or zero money on the OS and software, you will be able to buy better and stronger hardware.
Suppose you have invested a lot of time and money into developing some proprietary software that is bundled with the OS you were developing on, for example a mod_perl handler that takes advantage of some proprietary features of the OS and will not run on any other OS. Things are under control, the performance is great and you sing from happiness. But... one day the company that wrote your beloved OS goes bankrupt, which is not unlikely to happen nowadays. You are stuck with their last masterpiece and no support! What are you going to do then? Invest more into porting the software to another OS...
Everyone can be hit by this mini-disaster, so it is better to check the background of the company when making your choice, but still you never know what will happen tomorrow. The OSes in this hazard group are those developed entirely by a single company. Free OSes are probably less susceptible to this, for development is distributed between many companies and developers, so if a person who developed a really important part of the kernel loses interest in continuing, someone else will pick up the falling flag and carry on. Of course if tomorrow some better project shows up, developers might migrate there and finally drop the development, but we are here not to let this happen.
In the final analysis, the decision is yours.
Actively developed OSes generally try to keep pace with the latest technology developments, and continually optimize the kernel and other parts of the OS to become better and faster. Nowadays, the Internet and networking in general are the hottest targets for system developers. Sometimes a simple OS upgrade to the latest stable version can save you an expensive hardware upgrade. Also, remember that when you buy new hardware, chances are that only the latest software will make the most of it. Existing software (drivers) might support the brand new product because of its backwards compatibility with previous products of the same family, yet not reap all the benefits of the new features. In that case you could have spent much less money for almost the same functionality by buying a previous model of the same product.
Since I am not fond of the idea of updating this section every time a new processor or memory type comes out, I will only hint at what you should look for, and suggest that sometimes the most expensive machine is not the one that provides the best performance.
Your demands are based on many aspects and components. Let's discuss some of them.
In the course of the discussion you might meet some unfamiliar terms; here are some of them:
Clustering - a bunch of machines connected together to perform one big or many small computational tasks in a reasonable time.
Load balancing - users can remember only the name of one of your machines, namely your server, but it cannot stand the heavy load alone, so you use a clustering approach, distributing the load over a number of machines. The central server, the one users access when they type the name of the service, works as a dispatcher, redirecting requests to the rest of the machines; sometimes it also collects the results and returns them to the users. One of the advantages is that you can take one of the machines down for repair or upgrade and your service will still work, since the main server will not dispatch requests to the machine that was taken down. I will just say that there are many load balancing techniques. (See High-Availability Linux Project for more info.)
NIC - Network Interface Card.
RAM - Random Access Memory
RAID - META
If you are building a fan site and want to amaze your friends with a
mod_perl guest book, an old 486 machine will do. If you are in a
serious business, it is very important to build a scalable server: if
your service is successful and becomes popular and your server's
traffic doubles every few days, you should be ready to add more resources
dynamically. While we could define webserver scalability more precisely,
the important thing is to make sure that you can add more power to your
webserver(s) without investing additional money into
software development (almost none, that is; you will need software to connect your
servers if you add more of them). It means that you should choose
hardware and an OS that can talk to other machines and become part of a
cluster.
On the other hand, if you prepare for heavy traffic and buy a monster to do the work for you, what happens if your service does not prove to be as successful as you thought it would be? Then you have spent too much money, and meanwhile new, faster processors and other hardware components have been released, so you lose again.
Wisdom and prophecy, that's all it takes :)
Everybody knows that the Internet is a cash hole: what you throw in hardly comes back. This is not always true, but there is a lot of wisdom in these words. While you have to invest money to build a decent service, it can be cheaper! You can spend as much as 10 times more money on a strong new machine, but get only a 10% improvement in performance. Remember that a four year old processor is still very powerful.
If you really need a lot of power do not think about a single strong machine (unless you have money to throw away), think about clustering and load balancing. For the price of one new machine you can probably buy 10 older but very cheap machines and have 8 times more power. Why is that? First, because as I mentioned before, the performance improvement of a new machine is generally marginal while the price is much higher. Second, because 10 machines will do disk I/O faster than one single machine, even if its disk is much faster. Yes, you have more administration overhead, but there is a chance you will have it anyway, for in a short time the machine you have just invested in will not stand the load and you will have to purchase more machines and think about how to implement load balancing and file system distribution.
Why am I so convinced? Facts! Look at the most used services on the Internet: search engines, email servers and the like -- most of them are using a clustering approach. While you may not always notice it, they do so by hiding the real implementation behind proxy servers.
You have the best hardware you can get, but the service is still crawling. Make sure you have a fast Internet connection. Not as fast as your ISP claims it to be, but fast as it should be. The ISP might have a very good connection to the Internet, but puts many clients on the same line. If these are heavy clients, your traffic will have to share the same line and the throughput will decline. Think about a dedicated connection and make sure it is truly dedicated. Trust the ISP but check it!
The idea of having a connection to The Internet is a little misleading. Many Web hosting and co-location companies have large amounts of bandwidth, but still have poor connectivity. The public exchanges, such as MAE-East and MAE-West, frequently become overloaded, yet many ISPs depend on these exchanges.
Private peering means that providers can exchange traffic much quicker.
Also, if your Web site is of global interest, check that the ISP has good global connectivity. If the Web site is going to be visited mostly by people in a certain country or region, your server should probably be located there.
Bad connectivity can also directly influence your machine's performance. Here is a story one of the developers told on the mod_perl mailing list:
What relationship has 10% packet loss on one upstream provider got to do with machine memory ?
Yes.. a lot. For a nightmare week, the box was located downstream of a provider who was struggling with some serious bandwidth problems of his own... people were connecting to the site via this link, and packet loss was such that retransmits and tcp stalls were keeping httpd heavies around for much longer than normal.. instead of blasting out the data at high or even modem speeds, they would be stuck at 1k/sec or stalled out... people would press stop and refresh, httpds would take 300 seconds to timeout on writes to no-one.. it was a nightmare. Those problems didn't go away till I moved the box to a place closer to some decent backbones.
Note that with a proxy, this only keeps a lightweight httpd tied up, assuming the page is small enough to fit in the buffers. If you are a busy internet site you always have some slow clients. This is a difficult thing to simulate in benchmark testing, though.
If your service is I/O bound (does a lot of read/write operations to disk; remember that relational databases sit on disk as well) you need a very fast disk. So you should not spend money on a video card and monitor (a monochrome card and a 14" B&W monitor are perfectly adequate for a server -- you will probably be telnetted or ssh-ed in most of the time), but rather look for disks with the best price/performance ratio. Of course, ask around and avoid disks that have a reputation for head crashes and other disasters.
With money in hand you should think about getting a RAID system. RAID is generally a box with many HDs. It is capable of reading and writing data much faster, and is protected against disk failures. It does this by duplicating the same data over a number of disks, so if one fails, the RAID controller detects it and the data is still correct on the duplicated disks. You must think about RAID or similar systems if you have an enormous data set to serve. (What is an enormous data set nowadays? Gigabytes, terabytes?).
Ok, we have a fast disk, what's next? You need a fast disk controller. So either you should use the one embedded on your motherboard or you should plug a controller card if the one you have onboard is not good enough.
How much RAM (Randomly Accessed Memory) do you need? Nowadays, chances are you will hear: ``Memory is cheap, the more you buy the better''. But how much is enough? The answer pretty straightforward: ``You do not want your machine to swap''. When the CPU needs to write something into memory, but notices that it is already full, it takes the least frequently used memory pages and swaps them out. Swapping out means writing the data to disk. Another process then references some of its own data, which happens to be on one of the pages that were just swapped out. The CPU, ever obliging, swaps it back in again, probably swapping out some other data that will be needed very shortly by another process. Carried to the extreme, the CPU and disk start to thrash hopelessly in circles, without getting any real work done. The less RAM there is, the more often this scenario arises. Worse, you can exhaust swap space as well, and then the troubles really set in...
How do you make a decision? You know the highest rate at which your server expects to serve pages and how long it takes to serve each one, so you can calculate how many server processes you need. Knowing the maximum size any of your server processes can reach, you then know how much memory you need. You will probably need less memory than you have calculated if your OS supports memory sharing and you know how to make the best use of this feature (by preloading the modules and scripts at server startup). Do not forget that other essential system processes need memory as well, so you should plan not only for the web server but also for the other players. Remember that requests can be queued, so you can afford to let a client wait a few moments until a server is available to serve it. This makes your numbers more realistic, since you generally do not run at the highest load, but you should still be ready to bear the peaks: reserve at least 20% of memory as free headroom for peak situations. Many sites have crashed a few moments after a big scoop about them was posted and an unexpected number of requests suddenly came in. (This is called the Slashdot effect, named after http://slashdot.org .) If you are about to announce something cool, be aware of the possible consequences.
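The sizing arithmetic described above can be sketched in a few lines of Perl. This is only an illustration: the function names and every figure (20 requests per second at peak, 0.5 seconds per request, 10 MB per process of which 4 MB is shared, 32 MB for other system processes) are hypothetical, and the handling of the 20% reserve is one reading of the advice, namely that total RAM is chosen so that 20% of it stays free. Substitute your own measurements.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(ceil);

# Number of server processes needed to sustain a peak request rate,
# assuming each request keeps one process busy for $secs_per_request.
sub processes_needed {
    my ($peak_requests_per_sec, $secs_per_request) = @_;
    return ceil($peak_requests_per_sec * $secs_per_request);
}

# Rough RAM estimate in MB. Shared memory is counted once; only the
# unshared part of each process is multiplied by the process count.
# reserve_pct is the percentage of total RAM to keep free for peaks.
sub ram_needed_mb {
    my (%arg) = @_;
    my $unshared = $arg{max_process_mb} - $arg{shared_mb};
    my $servers  = $arg{shared_mb} + $arg{processes} * $unshared;
    my $in_use   = $servers + $arg{other_mb};
    return ceil($in_use * 100 / (100 - $arg{reserve_pct}));
}

# Hypothetical figures, for illustration only.
my $processes = processes_needed(20, 0.5);
my $ram = ram_needed_mb(
    processes      => $processes,
    max_process_mb => 10,
    shared_mb      => 4,
    other_mb       => 32,
    reserve_pct    => 20,
);

print "processes: $processes\n";   # prints "processes: 10"
print "RAM (MB):  $ram\n";         # prints "RAM (MB):  120"
```

With no memory sharing (shared_mb set to 0) the same figures call for noticeably more RAM, which is why preloading modules at server startup matters for the estimate.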
The most important thing to understand is that you might use the most expensive components, but still get bad performance. Why? Let me introduce an annoying word: a bottleneck.
A machine is an aggregate of many big and small components. Each one of them may be a bottleneck. If you have a fast processor but a small amount of RAM, the processor will be under-utilized, waiting for the kernel to swap memory pages in and out, because memory is too small to hold the most used ones. If you have a lot of memory, a fast processor and a fast disk, but a slow controller, the performance will be bad and you will have wasted money.
Use a fast NIC (Network Interface Card) that does not create a bottleneck. If it is slow, the whole service is slow. This is the most important component, since web servers are much more network-bound than disk-bound!
To use your money optimally you have to understand the hardware very well, so that you know what to pick. Otherwise, you should hire a knowledgeable hardware consultant and employ him or her on a regular basis, since your demands will probably change as time goes by and your hardware will be forced to adapt with them.
Written by Stas Bekman.
Last Modified at 12/04/1999
Use of the Camel for Perl is a trademark of O'Reilly & Associates, and is used by permission.
If, after reading this guide and the other documents listed in this section, you feel that your question is not yet answered, please ask the apache/mod_perl mailing list to help you. But first try to browse the mailing list archive. Most of the time you will find the answer to your question by searching the archive, since there is a good chance that someone else has already encountered the same problem and found a solution for it. If you ignore this advice, do not be surprised if your question is left unanswered: it bores people to answer the same question more than once. This does not mean that you should avoid asking questions. Just do not abuse the available help, and RTFM before you call for HELP. (You have certainly heard the infamous fable of the shepherd boy and the wolves.)
For more information, see Get helped with mod_perl.
Hi, I wrote this document to help you with mod_perl. It does not mean that if you have any question regarding mod_perl, perl or whatever you think I might know, you should send it directly to me. Please see the Get helped with mod_perl section and follow the guidelines as prescribed there.
However, you are welcome to submit corrections and suggestions directly to me at sbekman@iname.com (subject: mod_perl guide corrections). If you are going to submit heavy corrections of the text (I love those!), please help me by downloading the source pages in POD (from the main page under the index) and editing them directly. I will use Emacs Ediff to perform an easy merge of your changes. Thank you!
PLEASE NO PERSONAL QUESTIONS, I didn't invite those by writing a guide. They will all be deleted immediately. Please ask your questions on the mod_perl list, and if someone there or I can answer your question, it will be answered. Thank you!
http://www.modperl.com is the home site of The Apache Modules Book, a book about creating Web server modules using the Apache API, written by Lincoln Stein and Doug MacEachern.
Now you can purchase the book at your local bookstore or from the online dealer. O'Reilly lists this book as:
Writing Apache Modules with Perl and C
By Lincoln Stein & Doug MacEachern
1st Edition March 1999
1-56592-567-X, Order Number: 567X
746 pages, $34.95
by Frank Cringle at http://perl.apache.org/faq/ .
by Vivek Khera at http://perl.apache.org/tuning/ .
by Doug MacEachern at http://perl.apache.org/src/mod_perl.html .
http://www.refcards.com (Apache and other refcards are available from this link)
The Apache/Perl mailing list (modperl@apache.org) is available for mod_perl users and developers to share ideas, solve problems and discuss things related to mod_perl and the Apache::* modules. To subscribe to this list, send mail to majordomo@apache.org with an empty Subject and with the Body:
subscribe modperl
A searchable mod_perl mailing list archive is available at http://forum.swarthmore.edu/epigone/modperl . We owe it to Ken Williams.
More archives available:
http://world.std.com/~swmcd/steven/perl/module_mechanics.html - This page describes the mechanics of creating, compiling, releasing and maintaining Perl modules.
http://www.singlesheaven.com/stas/TULARC/webmaster/myfaq.html
http://www.gunther.web66.com/FAQS/taintmode.html (by Gunther Birznieks)
http://www.refcards.com (Apache and other refcards are available from this link)
http://www.saturn5.com/~jwb/dbi-examples.html (by Jeffrey William Baker).
http://outside.organic.com/mail-archives/dbi-users/
http://www.xray.mpe.mpg.de/mailing-lists/dbi/
http://perl.apache.org/src/mod_perl.html#PERSISTENT_DATABASE_CONNECTIONS
Home page - http://squid.nlanr.net/
Users Guide - http://squid.nlanr.net/Squid/Users-Guide/
Mailing lists - http://squid.nlanr.net/Squid/mailing-lists.html
Written by Stas Bekman.
Last Modified at 12/18/1999