This is a CHANGES file for the mod_perl Guide

12.19.99 ver 1.19

* all.html (all the HTML pages in one file) has been removed: it grew beyond 1Mb, which is too big. Use the PS version instead.
* reorg: moved the "perl reference" chapter near the beginning, because it should be read first. Moved the strategy and implementation chapters toward the middle.
* snippets: started "Code Unloading", as suggested by Doug.
* porting: updated "Output from system calls" (Doug)
* porting: fixed the "\n\n" vs. "\r\n\r\n" issue (Philip Newton)
* debug: added "Debugging when Server Crashes on Startup before Writing to Log File" (Cliff Rayman)
* snippets: added "Redirecting While Maintaining Environment Variables" (Vivek Khera)
* troubleshooting: added "libexec/libperl.so: open failed: No such file or directory" (Christophe Dupre)
* performance: added "Upload/Download of Big Files" (Ken Williams)
* install: added a reference to "Static debian package" (by David Huggins-Daines)
* troubleshooting: added Windows: "Apache::DBI and PERL_STARTUP_DONE_CHECK" (Gerald Richter, Randy Kobes)
* performance: added KeepAlive notes (Craig, Pascal Eeftinck)
* performance: added HTML::Mason notes (Pascal Eeftinck)
* porting: added a new section "die() and mod_perl"
* porting/perl: moved most of the Perl-specific reference material into perl.pod, removing duplicates along the way and replacing them with pointers to perl.pod
* performance: rewrote "Object Methods Calls Versus Function Calls"
* porting: FindBin is not mod_perl compatible (Andrei A.
  Voropaev, Joao Fonseca)
* scenario: noted that ProxyReceiveBufferSize is limited by the kernel's SO_RCVBUF (Vivek Khera) and, on BSD, by kern.ipc.maxsockbuf=2621440 (Oleg Bartunov)
* snippets: added "mysql backup and restore scripts"
* snippets: added "Subclassing Apache::Request example"
* snippets: added "CGI::params in the mod_perl-ish way"
* debug: added "Using print() and Data::Dumper for Debugging"
* snippets: started a "Sending email from mod_perl" topic
* control: added "Preparing for Machine Reboot"
* download: added more load balancing URLs
* performance: added "Tuning with httperf"
* intro: added "High-Profile Sites Running mod_perl" (Rex Staples)
* review: Ged W. Haywood kindly reviewed and corrected the following chapters: start, intro, strategy, porting (!), databases, dbm, security.
* install.pod: 'perl Makefile.PL' troubleshooting: added "A test compilation with your Makefile configuration failed..." and "missing/misconfigured libgdbm.so" (Tom Brown and Steve Willer)
* install.pod: 'make' troubleshooting: added "unrecognized format specifier for..." during the build process (Scott Fagg)
* porting: a bug in a script from "Exposing Apache::Registry secrets" was spotted and fixed (John Deighan)
* install.pod: integrated the "manual mod_perl build process" remarks and patch (Robin Berjon)
* install.pod: don't put the mod_perl sources in a subdirectory of the Apache sources, or it won't build!
  (Ask Bjoern Hansen)
* review: Dale Couch kindly reviewed and corrected the following chapter: porting

11.13.99 ver 1.18

* An almost complete rewrite of debug.pod (integrated Doug's debugging article at perlmonth.com):
    Curing The "Internal Server Error"
    Helping error_log to Help Us
    The Importance of Warnings
    diagnostics pragma
    Monitoring the error_log file
    Hanging processes: Detection and Diagnostics
    An Example of the Code that Might Hang the Process
    Detecting hanging processes
    Determination of the reason
    Handling the 'User pressed Stop button' case
    Detecting Aborted Connections
    The Importance of Cleanup Code
    Critical Section
    Safe Resource Locking
    Cleanup Code
    Handling the server timeout cases and working with $SIG{ALRM}
    Watching the server
    Configuration
    Usage
    Compiled Registry Scripts section seems to be empty
    Sometimes script works, sometimes does not
    Code Debug
    Locating and correcting Syntax Errors
    Using Apache::FakeRequest to Debug Apache Perl Modules
    Finding the Line Number the Error/Warning has been Triggered at
    Using print() Function for Debugging
    The Importance of Good Coding Style and Conciseness
    Introduction into Perl Debugger
    Interactive Perl Debugging under mod_cgi
    Non-Interactive Perl Debugging under mod_perl
    Interactive Perl Debugging under mod_perl
    Interactive Perl Debugging under mod_perl and ptkdb
    Debugging core Dumping Code
    Apache::Debug
    Debugging Core Dumps
    Debug Tracing
    gdb says there are no debugging symbols
    Debugging Signal Handlers ($SIG{FOO})
    Code Profiling
    Devel::Peek
    How can I find if my mod_perl scripts have memory leaks
    Debugging your code in Single Server Mode
* A complete rewrite of install.pod (integrated the INSTALL.* docs from the mod_perl distribution):
    Installing mod_perl in 10 Minutes and 10 Command Lines
    The Gory Details
    Sources Configuration (perl Makefile.PL ...)
    Configuration parameters
    APACHE_SRC
    DO_HTTPD, NO_HTTPD, PREP_HTTPD
    Callback Hooks
    EVERYTHING
    PERL_TRACE
    APACHE_HEADER_INSTALL
    PERL_STATIC_EXTS
    PERL_MARK_WHERE
    APACHE_PREFIX
    APACI_ARGS
    Reusing Configuration Parameters
    Discovering whether some option was configured
    Using an alternative Configuration file
    mod_perl Building (make)
    make Troubleshooting
    undefined reference to 'Perl_newAV'
    Built Server Testing (make test)
    Manual Testing
    make test Troubleshooting
    make test fails
    mod_perl.c is incompatible with this version of apache
    make test......skipping test on this platform
    Installation (make install)
    Building Apache and mod_perl by Hand
    Installation Scenarios for Standalone mod_perl
    The All-In-One Way
    The Flexible Way
    Build mod_perl as DSO inside Apache source tree via APACI
    Build mod_perl as DSO outside Apache source tree via APXS
    Installation Scenarios for mod_perl and Other Components
    mod_perl and mod_ssl (+openssl)
    mod_perl and mod_ssl Rolled from RPMs
    mod_perl and apache-ssl (+openssl)
    mod_perl and Stronghold
    Note For Solaris 2.5 users
    mod_perl Installation with CPAN.pm's Interactive Shell
    Installing on multiple machines
    using RPM, DEB and other packages to install mod_perl
    A word on mod_perl RPM packages
    Getting Started
    Compiling RPM source files
    Mix and Match RPM and source
    Installing a single apache+mod_perl RPM
    Compiling libapreq (Apache::Request) with the RH 6.0 mod_perl RPM
    Installing separate Apache and mod_perl RPMs
    Testing the mod_perl API
    Installation Without Superuser Privileges
    Installing Perl Modules into a Directory of Choice
    Making Your Scripts Find the Locally Installed Modules
    CPAN.pm Shell and Locally Installed Modules
    Making a Local Apache Installation
    Actual Local mod_perl Enabled Apache Installation
    Local mod_perl Enabled Apache Installation with CPAN.pm
    Automating installation
    How can I tell whether mod_perl is running
    Testing by checking the error_log file
    Testing by viewing /perl-status
    Testing via telnet
    Testing via a CGI script
    Testing via lwp-request
    General Notes
    Should I rebuild mod_perl if I have upgraded my perl?
    Perl installation requirements
    mod_auth_dbm nuances
    Stripping apache to make it almost perl-server
    Saving the config.status Files with mod_perl, php, ssl and Other Components
    Should I Build mod_perl with gcc or cc?
    OS Related Notes
* databases: added "Debugging code which deploys DBI"
* porting: added "STDIN, STDOUT and STDERR streams"
* advocacy: added "A summary of perl/cgi discussion at slashdot.org"
* snippets: added "Terminating a child process on Request Completion" (Doug)
* troubleshooting: added "Apache.pm failed to load!" (Doug)
* snippets: added "Reading POST Data, then Redirecting" (Doug)
* snippets: added "Cache control for regular and error modes" (Cliff Rayman)
* performance: added "Be careful with symbolic links" (the same script can get compiled twice)
* install: added a new "apache/mod_perl/mod_ssl Rolled from RPMs Scenario" (Stephane Benoit)
* porting: fixed a 'use subs (exit)' typo (Chris Nokleberg)
* warnings.pod was renamed to troubleshooting.pod, which is now organized into the following sections:
    Building and Installation
    Configuration and Startup
    Code Parsing and Compilation
    Runtime
    Shutdown and Restart
* porting: the following sections were moved to debug.pod: "Finding the Line Number the Error/Warning has been Triggered at", "Turning warnings ON", "diagnostics pragma"
* porting: rewrote "Command Line Switches (-w, -T, etc)"
* performance: updated "Forking or Executing subprocesses from mod_perl" with another CHLD signal handler that uses WNOHANG to reap zombie processes (Lincoln Stein)
* install: updated "Testing via a CGI script" (Geoffrey S Young)
* porting: updated "Terminating requests and processes, exit() function" with info about post_request termination, Apache::SizeLimit and Apache::GTopLimit
* performance: the links to http://www.realtime.net/~parkerm/perl/conf98/index.htm and http://www.realtime.net/~parkerm/perl/conf98/sld006.htm were dead, so I removed them :( (Peter
  Skov)
* snippets: added "Caching the POSTed Data" (Doug)
* install: "Compiling libapreq with mod_perl RPM" was reviewed and corrected (Geoffrey S Young)
* status.pod has been removed and absorbed into debug.pod, where it belongs
* Fixed the pod translator: it now handles C<$r-E<gt>method> encodings correctly (Andreas Koenig)

10.16.99 ver 1.17

* intro: the CREDITS section was updated with the long list of contributors!!! Thank you all!!! If I've missed your name, please let me know!!!
* control: added "Safe Code Updates on a Live Production Server"
* control: added "An Intentional Disabling of Live Scripts"
* scenario: added a big new section "One Light and One Heavy Servers where ALL htmls are Perl-Generated" (Wesley Darlington)
* dbm: David Harris has detected that the locking methods suggested in the Camel book and the DB_File man page (at least before version 1.72) are flawed: if you use them in an environment where more than one process modifies the dbm file, the file can get corrupted!!! I've modified the DB_File::Lock module to fix the problem by integrating the previously written DB_File::Wrap with the module David wrote (David Harris)
* snippets: added "Sending multiple cookies with the Perl API" (Rick Myers)
* install: added a big section "using RPM, DEB and other packages to install mod_perl" (Geoffrey S Young, David Harris)
* install: added "Automating installation": James G Smith's Apache Builder script
* install: added a new section "using CPAN to install mod_perl"
* performance: extended "Forking or Executing subprocesses from mod_perl" with information and code for avoiding zombies.
* performance: added "Jeff's guide to mod_perl database performance", converted to pod (Jeffrey W. Baker)
* new chapter: "Correct Headers", contributed by Andreas Koenig!!!
* help: updated the link to the DBI homepage (hermetica is gone)
* performance: added size benchmarks of CGI.pm's imported symbols (CGI.pm's object method calls vs.
  function calls)
* porting: fixed a typo related to local() and special variables (Andrei A. Voropaev)
* snippets: fixed a taint problem in the sample error_log display script (John Walker)
* install: added "Should I Build mod_perl with gcc or cc" (Tom Hughes)
* warnings: added "Missing right bracket at line" to the troubleshooting section, with a link to the item in porting.pod that explains it ("__END__ and __DATA__ tokens") (Eric Strovink)
* install: added a tip on saving the config.status file of each module build (php, mod_perl, ssl) for easier reuse later (Dave Hodgkinson)
* performance: added a clarification to the "PerlSetupEnv Off" item (Doug)
* snippets: added "Passing environment variables between handlers" (Doug)
* warnings: added "Can't locate loadable object for module XXX" (Doug)
* config: corrected the section dump typo (Gerald Richter)
* scenario: corrected the snippet that extracts the client IP from the X-Forwarded-For header to use headers_in instead of the obsolete header_in (Oleg Bartunov)
* scenario: added a note about Ben Laurie's Apache-SSL setting REMOTE_ADDR instead of the X-Forwarded-For header (Jie Gao)
* performance: started "Analysis of SW and HW Requirements" (Jeffrey W. Baker)
* warnings: clarified "rwrite returned -1" (Eric Cholet)
* warnings: added "Invalid command 'PerlHandler'" (Doug)
* debug: started "Apache::Debug" and Carp::confess("init") (Doug)
* install: documented "undefined reference to 'Perl_newAV'" (Doug)
* modules: added a clarification about Apache::PerlVINC (Doug)
* warnings: updated "Callback Called Exit & -D PERL_EMERGENCY_SBRK" (Doug)
* databases: added $Apache::DBI::DEBUG = 2 (instead of 1) for version 0.84+ (Edmund Mergl)
* performance: added "Caching prepare() statements" and rolling your own Apache::DBO code (Jeffrey Baker)
* porting: replaced "Apache::Registry::handler" with "Apache::Registry", since the former caused problems for some users (Daniel W.
  Burke)
* performance: added "Increasing the shared memory with mergemem" (no real info yet, just a link to the utility's site; please take a look and submit your opinions) (Doug Bagley)
* snippets: added "Redirect a POST request, forwarding the content" (Eric Cholet, Doug)
* performance: extended "Using $|=1 under mod_perl and better print() techniques" with notes about rflush()
* shuffled many items around to make them easier to find
* performance: added "Cached stat() calls"

09.26.99 ver 1.16

* Many little things were fixed or rewritten; not worth listing them all here.
* index.html: added another search box, provided by perlreference.com, that searches only the mod_perl FAQs and the guide
* config: added a note about Apache restarting twice on start
* warnings: added "syntax error at /dev/null" (a broken /dev/null) (Doug)
* porting: added "Special Perl Variables" (using local())
* multiuser: added reasons not to let users execute their CGI scripts inside the mod_perl server: file permissions (a non-mod_perl problem) and the possibility of hijacking a DBI connection from the Apache::DBI pool of cached connections (Peter Galbavy)
* install: added "Is it possible to tell whether some option was included" with nm() hints (Doug)
* performance: added a new section "PerlSetupEnv Off" (Doug)
* porting: added a new section "Passing and preserving custom data structures between handlers" (Ken Williams)
* security: "OK, AUTH_REQUIRED and FORBIDDEN" in the authentication phase (Eric Cholet, Jason Bodnar)
* porting: rewrote the "Generating correct HTTP Headers" section to cover HEAD requests, PerlSendHeader, the Perl API for header generation, Cookie headers, and closure methods for sending headers only once.
* Purifications: I'm very grateful to the people who take the time to help me improve the guide's readability. This time Richard A. Wells and Frank Schoeters submitted a few corrections to the text. Keep these corrections coming. Thanks!
* porting: extended the "Forking and Starting Sub-processes with mod_perl" section (Les Mikesell, Randal L. Schwartz)
* porting: wrote a whole new section, "Configuration Files: Writing, Modifying and Reloading", which consists of 3 big parts:
    Writing Configuration Files
    Reloading Configuration Files
    Dynamically updating configuration files
* scenario: updated the X-Forwarded-For section with notes on its unreliability (Ask Bjoern Hansen, Vivek Khera)
* porting: started the "Sharing variables between processes" section (Eric Cholet)
* config: dumping the configuration by sections (Eric Cholet)
* performance: prepare_cached() in persistent connections
* help.pod: updated a link to Jeffrey W. Baker's DBI examples (Jeffrey W. Baker)
* the list of mailing list archives was updated (Andreas J. Koenig, Jan Peter Hecking, Matthew Darwin, Christof Damian, Geoffrey S Young)
* debug.pod: added the "Spinning httpds" section from mod_perl.pod
* config.pod: stole the sections "PERL METHOD HANDLERS", "STACKED HANDLERS" and "Perl*Handlers" from mod_perl.pod
* performance.pod: noted the -DTWO_POT_OPTIMIZE and -DPACK_MALLOC Perl options from perl5004delta.pod that are relevant to mod_perl
* config.pod: wrote the sections "PerlModule and PerlRequire directives" and "Perl*Handlers"
* install.pod: explained "skipping test on this platform" during 'make test' (Doug)
* warnings.pod: explained "syntax error at /dev/null" (Doug)
* started working on intro.pod to clarify the differences between the Perl API, Apache::Registry and Apache::PerlRun.
* install.pod: added "mod_auth_dbm nuances", an old note from the mod_perl_traps page
* porting.pod: added an explanation of why you cannot use C<__END__> or C<__DATA__> within C<Apache::Registry> scripts.
* Removed the cyan background from the PostScript version of the guide. I liked the light grey background when the guide was printed on a B&W printer, but yes, it uses too much toner, so it's gone :)

08.17.99 ver 1.15

* Richard A.
  Wells has kindly reviewed and corrected the following pods: advocacy.pod, download.pod, snippets.pod, status.pod, browserbugs.pod, intro.pod, start.pod
* security.pod: added the "Forcing reauthenticating" section
* index.pod: added a link to http://www.perlreference.com/mod_perl/
* help.pod: added links to modperl.sourcegarden.org and http://www.perlreference.com/mod_perl/
* performance.pod: a little fix to the crashme script (Jay J)
* Updated the porting.pod sections "Sometimes it works, sometimes doesn't", "Script's name space" and others as well
* config.pod: updated the sections on how to dump the sections (Eric Cholet) and how to use /perl-status for doing that
* hardware.pod: David Landgren did a great job of reviewing, suggesting changes to and correcting the OS/Hardware chapter!
* Andreas J. Koenig pointed out that it's unfair to mention eddieware without the others. I agree, Andreas! hardware.pod and download.pod were updated to point to the "High-Availability Linux Project" site... Eddieware was removed :)
* download.pod: the guide now hints at where to find Apache::Request (libapreq-x.xx.tar.gz), at Philip Jacob's request
* config.pod: fixed a few small typos (John Milton)
* databases.pod: Matt Arnold pointed out a problem with connect_on_init if the database server is down. I've added a warning.
* porting.pod: cleared up the confusion around the StatINC and @INC issue
* perl.pod: added a section that reveals useful perldoc options
* performance.pod: added an explanation of the Apache::Leak example (Cliff Rayman)
* databases.pod: added an explanation of "skipping connection cache during server startup", which happens when a connection is attempted in the parent process (Edmund Mergl)
* debug.pod: started the "Debugging Core Dumps" item (Doug)
* performance.pod: added a reference to Apache::RegistryBB, for those who want to save the small overhead of the stat() call that Apache::Registry performs.
  (Doug)
* modules.pod: added Apache::RegistryBB (Doug)
* porting.pod: covered the collision of Apache/Work/Foo/Bar.pm with Apache/Work/Foo.pm when the former is loaded first (Doug)
* Apache::Leak is considered unfriendly; added a reference to B::LexInfo (Doug)
* porting.pod: added clarifications to "Passing ENV variables to CGI" about the %ENV setting/passing mechanism in mod_perl (Doug)
* performance.pod: started a new subsection on shared memory (what, how much, where)
* modules.pod and porting.pod: added an Apache::LogSTDERR module to solve the syslog problem (Doug)
* porting.pod: added the reloading handlers trick (Doug)

07.3.99 ver 1.14

* porting.pod: added "Exposing Apache::Registry secrets, closures, multiserver mode".
* A complete review, including corrections, verifications, extensions and clarifications, was done on the following pods while preparing the tutorial for the 3rd Apache conference: start.pod, intro.pod, porting.pod, performance.pod, strategy.pod, scenario.pod, config.pod, install.pod, control.pod, databases.pod, multiuser.pod, help.pod

06.19.99 ver 1.13

* While working on the presentation I discovered the wonderful 'html2ps' utility (http://www.tdb.uu.se/~jan/html2psug.html), so now we have a real mod_perl book in PostScript!!! (cross references aren't working yet)
* hardware: added a reference to eddieware
* performance.pod: extended the Apache::Resource section (Doug)
* performance.pod: added a reference to the httperf benchmark tool.
* I made many little changes all over the guide while preparing a subset of material for the upcoming Apache/Perl conference tutorial
* performance.pod: added some clarifications regarding CGI::compile to the "Preload Perl modules at server startup" section
* advocacy.pod: a complete rewrite to communicate the ideas differently. It now presents a positive, motivational and concise perspective on the same ideas.
  (by Randy Harmon)
* strategy.pod: modifications related to memory sharing with Apache and to the mod_proxy section (Ask Bjoern Hansen)
* scenario.pod: added the missing implementation of "Standalone mod_perl Enabled Apache Server" plus its configuration, which is temporarily located in the same chapter.
* More pods have been purified by Steve Reppucci (performance.pod).

06.05.99 ver 1.12

* install.pod: added "Should I rebuild mod_perl if I have upgraded my perl?"
* scenario.pod: explained the long-standing bug with APACI_ARGS (a csh vs. sh issue).
* databases.pod: added "mysql_use_result vs. mysql_store_result" (Michael Hall, Ken Williams, Vivek Khera)
* config.pod: added "Logical grouping of Location, Directory and FilesMatch directives" (Daniel Koch)
* config.pod: added "The confusion with use() clause in startup file" (Mike Fletcher)
* config.pod: added "The confusion with defining globals in startup file"
* performance.pod: extended the Devel::DProf notes with Apache::DProf
* started a new advocacy.pod: mod_perl advocacy
* install.pod: added "Stripping apache to make it almost perl-server" (Jeffrey W. Baker, Randal L. Schwartz, Robin Berjon)
* modules.pod, config.pod: added Doug's Apache::PerlVINC to set a different @INC per location
* install.pod: covered the installation problem "mod_perl.c is incompatible with this version of apache" (Doug)
* databases.pod: added "Opening a connection with different parameters" (Jauder Ho, Edmund Mergl)
* performance.pod, modules.pod: added Apache::GzipChain to cut down download times
* debug.pod: added some snippets from Doug's replies showing strace and Devel::Peek in action
* updated obvious.html#Reloading_only_specific_files with some code improvements (Ken Williams)
* debug.pod: added "Debugging Signal Handlers ($SIG{FOO})", which covers the latest $SIG{ALRM} changes and Doug's Sys::Signal module to overcome the handler restore problem with other signals.
  (Doug)
* strategy.pod: the mod_proxy and http accelerator sections were extended with notes from Joshua Chamas.
* warning.pod: noted a 'rwrite returned -1' fix in the CVS version
* obvious.pod: added "Additional reading references" to "my() scoped variable in nested subroutines", including a pointer to an article by Mark-Jason Dominus about how Perl handles variables and namespaces, and the difference between `use vars' and `my'.
* hardware.pod: applied some additions and changes.
* debug.pod: added "Monitoring error_log file"
* scenario.pod: added a complete definition of ProxyReceiveBufferSize and its buffering functionality.
* More pods have been purified by Steve Reppucci (hardware.pod, strategy.pod). Thanks to Steve, English speakers can read my scribbles as well :o)

05.17.99 ver 1.11

* new hardware.pod: added "Operating System and Hardware Demands" (Dean Fitz reviewed it and made lots of fixes!!! Thanks)
* started a new security.pod, "Protecting Your Site", to explain security hazards and to show some configuration setups and code snippets.
* security.pod: explained the terms Authentication and Authorization
* security.pod: "Non authenticated access for internal IPs, but authenticated by external IPs" (Eric Cholet)
* scenario.pod: added "HTTP Authentication with 2 servers + proxy" (Mark Mills, Russell D. Weiss)
* scenario.pod: added some DSO building notes (Guy K. McArthur)
* porting.pod: added "Generating correct HTTP MIME Headers", as suggested by Alex Krohn
* config.pod: added "Running 'apachectl configtest' or 'httpd -t'" (Doug)
* porting.pod: added "Passing ENV variables to CGI" (Doug)
* Updated "Finding the line number the error/warning has been triggered at" in porting.pod
* Added info about ProxyReceiveBufferSize to the mod_proxy section of scenario.pod.
  (Rauznitz Balazs)
* added to config.pod: Configuration Security Concerns (Gunther Birznieks)
* completely rewrote start.pod (the English was horrible :(
* updated help.pod with squid help URLs

05.08.99 ver 1.10

* control.pod: SUID start-up scripts (Lincoln Stein)
* porting.pod: Forking subprocesses from mod_perl (Philip Gwyn)
* added to performance.pod: CGI.pm's object methods calls vs. function calls
* new pod: browserbugs.pod, workarounds for some known bugs in browsers:
    added: Preventing QUERY_STRING from getting corrupted with &entity key names
    added: IE 4.x does not re-post data to a non-port-80 URL
* strategy.pod: updated notes about squid (Andreas J. Koenig)
* strategy.pod and scenario.pod: started the ProxyPass sections (Mark Mills, Ken Williams, Ask Bjoern Hansen)
* wrote code to validate pod L<> directives by first building a hash of all available anchors and a hash of all L<> directives, then reporting the broken links! This is cool! TomC will never accept the patch to his Pod2Html.pm :( So there are no broken links anymore, unless I forgot to run the checker :)
* start.pod now contains an overview of the guide. The previous content migrated to install.pod and download.pod. Part of scenario.pod moved to install.pod.
* People still report problems with the CSS I use, so I did more tweaking by deleting almost all styles. It seems people are missing some basic font families and complaining about being unable to read the text.
* strategy.pod: using thttpd instead of plain apache (Rauznitz Balazs)
* scenario.pod was split into strategy.pod and scenario.pod. strategy.pod now only talks about different approaches, while scenario.pod provides the building and configuration details.
  strategy.pod tries to clearly state the pros and cons of each approach (please review)
* Introduced a new dbm.pod: mod_perl and dbm files (please review)
* Introduced a new databases.pod: mod_perl and Relational Databases (please review)
* The whole expanded table of contents can now be found in index.html (index.html is now generated by a script). This should make navigation much easier.
* The last HTML source file is gone; now all source files are pods.
* Improved search engine requirements: Extended and
* Improved navigation: added Next, Main and Previous links.
* Added another list archive (help.html): http://www.geocrawler.com/lists/3/web/182/0/ (Eric Cholet)
* obvious.pod: "Setting environment variables for scripts called from CGI." (Lincoln Stein, Doug MacEachern)
* extended "Using $|=1 under mod_perl and better print() techniques" in performance.pod.

04.19.99 ver 1.09 (1/2)

* guide.tar.gz and guide-src.tar.gz were outdated; now they are synced
* Fixed a huge number of typos (with the help of a speller :). I'm sure there are still many that the speller didn't catch; I guess people are used to reading badly written textbooks, since just a few told me about them :( If you spot any, please don't hesitate to tell me!
* Lupe Christoph suggested changes to the main page. It's done. Also, as Lupe suggested, linked the text "Writing Apache Modules with Perl and C" to http://www.modperl.com/ .
* Andreas J. Koenig spotted numerous typos, which gave me the idea to run a speller :)

04.17.99 ver 1.09

* added to warnings.pod: explained "incorrect line number reporting in error/warn log messages"
* added to scenario.pod: a clarification about the 2 different config files in the 2 servers scenario (David Livingstone)
* added to scenario.pod: started the "mod_perl as DSO" section; it's still almost empty :( Anyone with DSO experience?
* added to config.pod: started the mod_perl as DSO section
* updated config.pod: added how the $Apache::Registry::NameWithVirtualHost bug in older versions can be turned into a feature (Doug)
* added to performance.pod: Memory sharing (Leslie Mikesell)
* updated warning.pod: server reached MaxClients setting
* updated performance.pod: MaxClients reached (Nick Tonkin)
* updated start.pod: added httpd -l to "How can I tell whether mod_perl is really installed"
* modified scenario.pod: made little changes to make the installation process less confusing (Pete Harlan)
* obvious.pod: updated "Handling the server timeout cases": $SIG{ALRM} does not restore the original underlying C handler; pointed to Sys::Signal as a remedy to try (Doug)
* new in multiuser.pod: ISPs providing mod_perl services: a fantasy or reality? (notes from Mark Mills, Russell D. Weiss)
* new in multiuser.pod: Virtual Hosts in the guide
* new pod: multiuser.pod, mod_perl for ISPs. mod_perl and Virtual Hosts.
* Added links to the new book on the O'Reilly and Amazon.com sites.
* debug.pod: added Apache::DB coverage
* performance.pod: "Why you should not use $|=1 under mod_perl" (Doug, Randal)
* debug.pod: "gdb says there are no debugging symbols" (Michael Hall)
* config.pod: "the server no longer retrieves the DirectoryIndex files for a directory" (Andreas Grupp)
* scenario.pod: added 'make test fails' when people use PREP_HTTPD=1 or don't use DO_HTTPD=1 (Doug)
* removed the 'Mini' part from the guide's name, since it has grown too big to be called mini any more.
* modules.pod: added Apache::Request
* modules.pod: added Apache::DBI
* modules.pod: added Apache::Session (Jeffrey Baker)
* new pod: modules.pod, introducing Apache::* modules with small examples to raise curiosity to read the whole man page
* new in scenario.pod: "mod_perl and proxy server"
    Incentives
    Squid proxy server in httpd accelerator mode
    Running a squid and 2 webservers scenario
    Running a squid and 1 mod_perl apache server scenario
  (reviewed and modified according to notes by Richard Dice, Andreas J. Koenig, Eric Cholet, Jeremy Bailin, David Landgren)
* Added to scenario.pod: 'Publishing port numbers different from 80' (originally by Ken Williams, forwarded by Eric Strovink)
* config.pod: new section "Configuring Apache + mod_perl with mod_macro", contributed entirely by Eric Cholet (I have edited it a bit :).

04.03.99

* Rewrote the CREDITS section of intro.html. I hope I didn't miss anyone; if I did, please tell me. Let's feed the ego :)
* The guide now looks much better with StyleSheets (Nathan Vonnahme)
* added to porting.pod: Filehandlers and locks leakages (Ken Williams, Doug)
* added to obvious.pod: Handling the server timeout cases (Doug)
* created a new pod, perl.pod, to cover some all-too-frequently asked pure Perl questions; it opens with "Using global variables and sharing them between modules/packages"
* The pod sources are now available online, along with the resulting HTML files and the scripts that generate them.
* Added a summary of various mod_perl deployment schemas (1/1, 2/2, DSO and proxy): /scenario.html#More_mod_perl_deploying_schemas (Mark Mills)
* created a new frequent.pod for "Frequent mod_perl problems", as suggested by Eric Cholet, who said that problems like 'my() scoped variable in nested subroutines' come up so often on the list that they should be stressed in the guide as one of the most important things to read/beware of. Since it currently has only a few problems, please suggest what others should go here.
* obvious.pod: rewrote "my() scoped variable in nested subroutines" (Eric Cholet)
* some typo fixes in intro.html, start.pod and scenario.pod (Garr Updegraff)
* snippets.pod: Cookie handling code (Ed Park)
* obvious.pod: updated "Handling the 'User pressed Stop button' case" with more hints (Eric Strovink) and Apache 1.3.6 news (Henrique Pantarotto)
* scenario.pod: added "Is it possible to determine which options were given to modperl's Makefile.PL"
* More pods have been purified by Steve Reppucci (warning.pod, obvious.pod and porting.pod). He did so much work to make them readable that I'm afraid new changes will break all the beauty he created :) Thanks, Steve!

03.15.99

* Added a downloadable guide.tar.gz, as someone requested
* snippets.pod: Accessing variables from the caller's package (Ken Williams)
* porting.pod: Redirecting mod_perl error_log messages to the browser: added an extensive example
* control.pod: added hints on preventing a mod_perl process from eating up all the disk space when it goes wild (Andreas J. Koenig, Ulrich Pfeifer)
* performance.pod: clarified where one can get the 'ab' Apache Benchmark utility
* warning.pod: covered "Evil things might happen when using PerlFreshRestart" (Doug)
* status.pod: covered "Compiled Registry Scripts section seems to be empty" (Radu Greab)
* warning.pod: covered "RegistryLoader: Cannot translate the URI..."
* scenario.pod: added a note: when using USE_APACI and APACHE_PREFIX, make install will also run make install in Apache's source tree... (Doug)
* debug.pod: Getting some decent debug info when running under mod_perl (Doug)
* ScriptAlias vs. Alias updated and explained in config.pod (Doug, Ask and Eric)
* scenario.pod, intro.html, config.pod, control.pod and start.pod were purified by Steve Reppucci. Steve fixed my incorrect English expressions and tenses and corrected some technical details! Enormous help, Steve! Thanks!
If you see some incorrect English in the guide, don't hesitate to send an email to me. Thanks! 01.22.99 * new obvious.pod: Where do the warnings/errors go? * new index.html: added a search box * new snippets.pod: added error_log.pl script to fetch the latest logs from the server without telneting there * new snippets.pod: How to avoid printing the header more than once. * new snippets.pod: More on relative paths * upd start.pod: removed all 'latest version is', so the guide will not misguide people (Ken Williams) * upd config.html: removed redundant ;; (Ken Williams) * upd config.html: fixed the question/answer 'Is there a way to provide a different startup.pl file for each individual virtual' (Ken Williams) * upd help.html: a few links fixed (Peter Skov (UNIT)) * upd porting.pod: CORE::exit vs Apache::exit section update (Doug) * upd scenario.pod: note about importance make clean execution, because of possible binary incompability (1.3.3 vs 1.3.4) (Doug) * upd porting.pod: switches -w, -T in the shebang line (Doug) * upd debug.pod: tracing the PerlRequire's and PerlModule's as they are loaded (Doug) * add config.pod: Sometimes script from one virtual host calls the script with the same path from the second virtual host (Doug) * add performance.pod: how can I find if my modperl scripts have memory leaks (and where). (Doug) * help.html: added a section for DBI help (Jeffrey W. Baker) 12.28.98 * Updated the "Client hit STOP or Netscrape bit it" section, with new warning "[modperl] caught SIGPIPE in process" for ver 1.17 (new Apache::SIG) * Richard A. Soderberg spotted a few problems with name anchors in start.html (pod converter doesn't resolve the problem correctly) and ScriptAlias typos at config.html. 
* broken link to www-security-faq was spotted by Gunther Birznieks 12.20.98 * fixed: @INC vs %INC obvious.html#Using_Apache_StatINC (thank to Ken Williams) * new: Apache::SpeedLimit added to performance.html#Limiting_the_request_rate_speed_ * modified: register_cleanup in Registry scripts (END{} blocks) based on the last week tread * new: obvious.html#Handling_the_User_pressed_Stop_ * Found a bug in Pod::Html - it tries to convert HTTP::Foo alike tokens into hypertext link which breaks the code in the resulting html. Applied the patch to Pod::Html::VERSION 1.01 1119a1120 > (?! :) # don't convert HTTP::Foo and alike * Discovered that the guide is being searchable thru the http://www.apache.org/search.html added a link to index.html * Extended the control|Log_Rotation section (+Script from Randal) * control: HUP vs TERM vs USR1. I have asked for validation of this section, but received none... Added a note about slowness of termination (Robin Berjon) and possible way to speed it up (Frank D. Cringle). Added a mneumonics => numbers for SIGs (Marshall Dudley) * added the missing USE_APACI=1 in start.html#Mod_Perl (Thanks to Tzvetan Stoyanov) 12.13.98 * covered warning: rwrite returned -1 * covered warning: Client hit STOP or Netscrape bit it! 
* covered warning: Can't load '.../auto/DBI/DBI.so' for module DBI * covered porting: using format() * covered warning: child process 30388 did not exit, sending another SIGHUP * extended warning: Callback called exit All the above are based on the Doug's answers this weekend :) * new: config: Tuning MinSpareServers MaxSpareServers StartServers MaxClients MaxRequestsPerChild (actually a pointer to the next item) * new: performance: Tuning the Apache's configuration variables for the best performance Tuning with ab - ApacheBench Tuning with crashme script Choosing MaxClients Choosing MaxRequestsPerChild Choosing MinSpareServers, MaxSpareServers and StartServers Summary of Benchmarking to tune all 5 parameters ########################################################################## 12.08.98 * Lots of "little typos" fixed. Thanks to Evan A. Zacks, Eric Cholet and Nancy Lin ! * added a quote from DBI page, why $sth-rows; can't be used for rows counting. * fixed obvious.html#Compiled_Regular_Expressions href at porting.html Thanks to Richard Dice! * lots of little changes and add ons... 12.07.98 * Run a spell check. ispell and WWWebster were quite helpful :) * Added Richard Dice's notes about ways to see whether or not mod_perl is actually compiled into the server and working. "check the error_log file" (installation) * Added 'Is it possible to install mod_perl without root access?' section into Server Installation (scenario) page. * Added Perrin Harkins and Jonathan Peterson's notes about apache/mod_perl/embperl/DBI vs IIS/ASP/ADO * Added a CHANGES file (this one) * Added an 'all in one page', suitable for printing. Currently it's just an ordered cat(). In the future it might change :) 12.03.98 * First Release guide/advocacy.html0100644000000000000000000002163707027225633013350 0ustar rootroot mod_perl guide: mod_perl Advocacy

mod_perl Advocacy


The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me to improve this guide. If you have something to contribute please send it directly to me.


Thoughts about scalability and flexibility

How much scalability and flexibility you need depends on what you need from the web. If you want only a simple guest book or database gateway with no feature headroom, you can get away with any EASY_AND_FAST_TO_DEVELOP_TOOL (Exchange, MS IIS, Lotus Notes, etc.).

Experience shows that you will soon want more functionality, and that's when you'll discover the limitations of these ``easy'' tools. Gradually, your boss will ask for more and more functionality, and at some point you'll realize that the tool lacks flexibility and/or scalability. Then your boss will either buy another EASY_AND_FAST_TO_DEVELOP_TOOL and repeat the process (with different unforeseen problems), or you'll start investing time in learning how to use a powerful, flexible tool to make the long-term development cycle easier.

If you and your company are serious about delivering flexible Internet functionality, do your homework. Then urge your boss to invest a little extra time and resources in choosing the right tool for the job. In the long run, your Internet site will show the results.


The boss, the developer and advocacy

Each developer has a boss who participates in the decision-making process. Remember that the boss considers input from sales people, developers, the media and associates before handing down large decisions. Of course, results count! A sales brochure makes very little impact compared to a working demonstration, and demonstrations of company-specific and developer-specific results count big!

Personally, when I discovered mod_perl I did a lot of testing and coding at home and at work. Once I had a heavy application working, I came to my boss with 2 URLs: one for the plain CGI server and the other for the mod_perl-enabled server. It took about 30 seconds for my boss to say ``Go with it''. Of course, from that moment on I had to provide all the support for the other developers, which is why I took the time to learn it in the first place (and that is how this guide was born!).

Chances are that if you've done your homework, learned the tools and can deliver results, you'll have a successful project. If you convince your boss to try a tool that you don't know very well, your results may suffer. If your boss follows your development process closely and sees much worse progress than expected, he might say ``forget it'' and never want to give mod_perl a second chance.

Advocacy is a great thing for the open-source software movement, but it's best done quietly until you are confident that you can show productivity. If you can show your boss a heavy CGI script that runs much faster under mod_perl, that may be a strong argument for further evaluation. Your company may even sponsor a portion of your learning process.

Learn the technology by working on sample projects. Learn how to support yourself and learn how to get support from the community; then advocate your ideas to your boss. Then you'll have the knowledge; your company will have the benefit; and mod_perl will have the reputation it deserves.


A summary of perl/cgi discussion at slashdot.org

There was a nice discussion of the merits of Perl in the CGI world. I took the time to summarize the thread, and here is what I've got:

Perl Domination in CGI Programming? http://slashdot.org/askslashdot/99/10/20/1246241.shtml

  • Perl is cool and fun to code with

  • Perl is very fast to develop with

  • Perl is even faster to develop with if you know what CPAN is :)

  • Math-intensive code and other code that runs faster in C/C++ can be plugged into Perl with XS/SWIG, transparently to the Perl users of the modules.

  • Most CGI apps do text processing, at which Perl excels

  • Forking and loading a C/C++-optimized CGI (unless the code is shared) adds overhead

  • Bandwidth is a bigger bottleneck than Perl performance (vs. C/C++) (not true for intranets, and this might change for the Internet in a number of years)

  • For database-driven apps, the database itself is the bottleneck. Lots of posts talk about latency vs. throughput.

  • mod_perl, FastCGI, velocigen and perlexec are good solutions to plain mod_cgi slowness

  • Other light alternatives to Perl and its derivatives mentioned: PHP, Python

  • There were almost no voices from users of M$ and similar technologies; I guess that's because they don't read /. :)

  • Many said that in many people's minds, 'CGI' eq 'perl' > 0 (the entropy of perl grows bigger :)


Written by Stas Bekman.
Last Modified at 11/13/1999
Use of the Camel for Perl is
a trademark of O'Reilly & Associates,
and is used by permission.

Workarounds for some known bugs in browsers.



Preventing QUERY_STRING from getting corrupted because of &entity key names.

In a URL such as http://my.site.com/foo.pl?foo=bar&reg=foobar, some browsers will interpret &reg as the HTML character entity &reg; (the registered sign) and convert it, which results in a corrupted QUERY_STRING. If you encounter this problem, you should either avoid using such keys or separate parameter pairs with ; instead of &. Both CGI.pm and Apache::Request support a semicolon instead of an ampersand as a separator. So your URI should look like: http://my.site.com/foo.pl?foo=bar;reg=foobar.

Note that this is only an issue when you are building your own URLs with query strings. It is not a problem when the URL is the result of submitting a form because the browsers _have_ to get that right.
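To see why the ; separator is safe, here is a minimal Perl sketch of a query-string parser that, like CGI.pm and Apache::Request, accepts both & and ; between pairs, plus a helper that builds links joined with ;. The names parse_query() and build_query() are hypothetical and the decoding is deliberately simplified (repeated keys overwrite each other); in real code use CGI.pm or Apache::Request.

```perl
use strict;
use warnings;

# Parse a query string into a hash, accepting either '&' or ';' between
# pairs. Simplified sketch: repeated keys overwrite each other.
sub parse_query {
    my ($qs) = @_;
    my %param;
    for my $pair (split /[&;]/, $qs) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $value = '' unless defined $value;
        for ($key, $value) {
            tr/+/ /;                                # '+' means a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # decode %XX escapes
        }
        $param{$key} = $value;
    }
    return \%param;
}

# Build a query string joined with ';', so a key like "reg" can never be
# read as the &reg; entity when the URL sits inside an HTML page.
# Keys and values are assumed to be URL-escaped already.
sub build_query {
    my (%param) = @_;
    return join ';', map { "$_=$param{$_}" } sort keys %param;
}

my $url = 'http://my.site.com/foo.pl?' . build_query(foo => 'bar', reg => 'foobar');
print "$url\n";
```

The script prints http://my.site.com/foo.pl?foo=bar;reg=foobar, and parse_query() returns the same pairs whether the string uses ; or &.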


IE 4.x does not re-post data to a non-port-80 URL

One problem with publishing port numbers such as 8080 is that (so I was told) IE 4.x has a bug when re-posting data to a non-port-80 URL: it drops the port designator and uses port 80 anyway.

See Publishing port numbers different from 80


Written by Stas Bekman.
Last Modified at 07/29/1999

mod_perl Configuration



mod_perl Specific Configuration

The next step after building and installing your new mod_perl-enabled Apache server is to configure it. To learn how to modify Apache's configuration files, please refer to the documentation included with the Apache distribution, or just look at the files in the conf directory and follow the instructions in them; the comments embedded in the files do a good job of explaining the options.

Before you start with the mod_perl-specific configuration, configure Apache itself and make sure that it works. When done, return here to continue...

[ Note that prior to version 1.3.4, the default apache install used three configuration files -- httpd.conf, srm.conf, and access.conf. The 1.3.4 version began distributing the configuration directives in a single file -- httpd.conf. The remainder of this chapter refers to the location of the configuration directives using their historical location. ]


Alias Configurations

First, you need to specify where in the file system the scripts are to be found.

Add the following configuration directives:

  # for plain cgi-bin:
  ScriptAlias /cgi-bin/ /usr/local/myproject/cgi/

  # for Apache::Registry mode
  Alias /perl/ /usr/local/myproject/cgi/

  # for Apache::PerlRun mode
  Alias /cgi-perl/ /usr/local/myproject/cgi/

Alias maps a URL to a file system object under mod_perl; ScriptAlias is used for mod_cgi.

Alias defines the start of the URL path to the script you are referencing. For example, with the above configuration, fetching http://www.nowhere.com/perl/test.pl will cause the server to look for the file test.pl in /usr/local/myproject/cgi and, if we define Apache::Registry to be the handler of the /perl location (see below), to execute it as an Apache::Registry script. In other words, the URL http://www.nowhere.com/perl/test.pl is mapped to /usr/local/myproject/cgi/test.pl. This means that you can keep all your CGI scripts in the same place in the file system and call a script in any of the three modes simply by changing the directory component of the URL (cgi-bin|perl|cgi-perl). Isn't this neat? (That is the configuration you see above: all three Aliases point to the same directory in your file system, but of course they can be different.) If your script does not seem to work under mod_perl, you can easily call it in straight mod_cgi mode without changing the script (in most cases), simply by changing the URL you invoke it with.
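The prefix matching described above can be modelled in a few lines of Perl. This is only an illustration of the mapping (map_uri() and the @aliases table are hypothetical, and this is not how Apache implements Alias): each URL prefix maps to the same directory but selects a different execution mode.

```perl
use strict;
use warnings;

# Toy model of the three Alias/ScriptAlias lines: same directory,
# different execution mode per URL prefix.
my @aliases = (
    [ '/cgi-bin/',  '/usr/local/myproject/cgi/', 'mod_cgi'          ],
    [ '/perl/',     '/usr/local/myproject/cgi/', 'Apache::Registry' ],
    [ '/cgi-perl/', '/usr/local/myproject/cgi/', 'Apache::PerlRun'  ],
);

# Return (file path, mode) for a URL path, or an empty list if no alias matches.
sub map_uri {
    my ($uri) = @_;
    for my $alias (@aliases) {
        my ($prefix, $dir, $mode) = @$alias;
        # The prefix must match at the very start of the URL path.
        if (index($uri, $prefix) == 0) {
            return ($dir . substr($uri, length $prefix), $mode);
        }
    }
    return;
}

my ($file, $mode) = map_uri('/perl/test.pl');
print "$file handled by $mode\n";
```

Running it prints /usr/local/myproject/cgi/test.pl handled by Apache::Registry; asking for /cgi-perl/test.pl yields the same file with Apache::PerlRun.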

FYI: for mod_perl, ScriptAlias is the same thing as:

  Alias /foo/ /path/to/foo/
  <Location /foo/>
    SetHandler cgi-script
  </Location>

where SetHandler cgi-script invokes mod_cgi. The latter will be overridden if you enable Apache::Registry. In other words, ScriptAlias does not work for mod_perl; it only appears to work when the additional configuration is in there. If the Apache::Registry configuration came before the ScriptAlias, scripts would be run under mod_cgi. While handy, ScriptAlias is a known kludge; it's always better to use Alias and SetHandler.

Of course, you can choose any other Alias (you will use it later in httpd.conf), and you can use all three modes or only one of them. It is undesirable to run scripts under plain mod_cgi from a mod_perl-enabled server: the price is too high, and it is better to run them on a plain Apache server. (See Standalone mod_perl Enabled Apache Server)


Location Configuration

Now we will work with the httpd.conf file. I add all the mod_perl stuff at the end of the file, after the native apache configurations.

First we add:

  <Location /perl>
    #AllowOverride None
    SetHandler perl-script
    PerlHandler Apache::Registry
    Options ExecCGI
    allow from all
    PerlSendHeader On
  </Location>

This configuration causes all scripts that are called with a /perl path prefix to be executed under the Apache::Registry module and as CGI scripts (hence the ExecCGI option; if you omit it, the script will be sent to the caller's browser as plain text or may trigger a 'Save As' window).

PerlSendHeader On tells the server to send an HTTP header to the browser on every script invocation. You will want to turn this off for nph (non-parsed-headers) scripts. PerlSendHeader On means that ap_send_http_header() is called after your script's headers have been parsed. It is only meant for CGI emulation; it's always better to use CGI->header from the CGI.pm module or $r->send_http_header directly.

Remember the Alias from the section above? We must use the same Alias here: if you use a Location that does not have the same Alias defined in srm.conf, the server will fail to locate the script in the file system. (We are talking about script execution here; there are cases where a Location is handled by the server itself without a corresponding file, like the /perl-status location.)

Note that sometimes you will have to add:

  PerlModule Apache::Registry

before you specify the location that uses Apache::Registry as a PerlHandler. Now you can start running scripts in Apache::Registry mode...

You don't need to do anything about the /cgi-bin location (mod_cgi), since it has nothing to do with mod_perl.

Here is a similar location configuration for Apache::PerlRun (More about Apache::PerlRun):

  <Location /cgi-perl>
    #AllowOverride None
    SetHandler perl-script
    PerlHandler Apache::PerlRun
    Options ExecCGI
    allow from all
    PerlSendHeader On
  </Location>


PerlModule and PerlRequire directives

You may load modules from the config file at server startup via:

    PerlModule Apache::DBI CGI DBD::mysql

There is a limit of 10 PerlModules. If you need to load more when the server starts, either use one PerlModule that pulls in many others, or write them all in regular Perl syntax in a startup file, which can be loaded with the PerlRequire directive:

    PerlRequire  /home/httpd/perl/lib/startup.pl

Both PerlModule and PerlRequire are implemented by require(), but there is a subtle difference. PerlModule works like use(), expecting a module name without the .pm extension and without slashes: Apache::DBI is OK, while Apache/DBI.pm is not. PerlRequire is the opposite of PerlModule: it expects a relative or full path to the module or a file name, like in the example above.
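The two directives mirror the two forms of Perl's own require(): a bareword module name versus a file name. This small sketch shows the name-to-path conversion and that both forms end up in the same %INC entry. module_to_file() is a hypothetical helper, and List::Util is used only because it ships with Perl.

```perl
use strict;
use warnings;

# Turn a PerlModule-style name into the relative file name that
# require() looks up in @INC, e.g. Apache::DBI -> Apache/DBI.pm.
sub module_to_file {
    my ($module) = @_;
    (my $file = $module) =~ s{::}{/}g;
    return "$file.pm";
}

# The PerlModule style: a bareword module name.
require List::Util;

# The PerlRequire style: a file name. This second require is a no-op,
# because %INC already records List/Util.pm as loaded.
my $file = module_to_file('List::Util');
require $file;

print module_to_file('Apache::DBI'), "\n";
print "List::Util loaded from $INC{'List/Util.pm'}\n";
```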

As with any file that is require()d, it must return a true value; to ensure that this happens, don't forget to add 1; at the end of such files.
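Here is a quick way to see this rule in action outside the server. The sketch writes two throwaway files with File::Temp (write_temp_file() is a hypothetical helper): one ends with 1; and loads fine, while the other ends with a false value, so require() dies with a "did not return a true value" error.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write some Perl code to a temporary file and return its path.
sub write_temp_file {
    my ($code) = @_;
    my ($fh, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
    print {$fh} $code;
    close $fh or die "close: $!";
    return $path;
}

# A startup-style file that ends with 1; loads fine...
my $good = write_temp_file("our \$good_loaded = 1;\n1;\n");
require $good;

# ...while one whose last statement is false makes require() die.
my $bad = write_temp_file("our \$bad_loaded = 1;\n0;\n");
my $loaded = eval { require $bad; 1 };
my $error  = $@;
print $loaded ? "loaded\n" : "require failed: $error";
```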

We must stress that all the code that runs at server initialization time runs with root privileges if you start the server as root. (You have to, unless you choose an unprivileged port above 1024, which you might have to do if you don't have root access. In that case, you had better pick a well-known port like 8000 or 8080, since other non-standard ports might be blocked by the firewalls that protect many organizations and individuals.) This means that anyone who has write access to a script or module that is loaded by PerlModule or PerlRequire effectively has root access to the system. You might want to take a look at the new and experimental PerlOpmask directive and the PERL_OPMASK_DEFAULT compile-time option to try to disable some dangerous operators.


Perl*Handlers

As you know, Apache specifies about 11 phases of the request loop, in this order: Post-Read-Request, URI Translation, Header Parsing, Access Control, Authentication, Authorization, MIME type checking, FixUp, Response (Content phase), Logging and finally Cleanup. These are the stages of a request where the Apache API allows a module to step in and do something. There is a dedicated Perl*Handler directive for each of these stages, namely:

    PerlChildInitHandler
    PerlPostReadRequestHandler
    PerlInitHandler
    PerlTransHandler
    PerlHeaderParserHandler
    PerlAccessHandler
    PerlAuthenHandler
    PerlAuthzHandler
    PerlTypeHandler
    PerlFixupHandler
    PerlHandler
    PerlLogHandler
    PerlCleanupHandler
    PerlChildExitHandler

The first 4 handlers cannot be used in <Location>, <Directory> or <Files> sections or in .htaccess files, mainly because all of these require a known path to the file in order to bind the requested path to one or more of these sections. Starting from PerlHeaderParserHandler (the 5th), the URI has already been mapped to a physical pathname, so it can be used to match a <Location>, <Directory> or <Files> configuration section, or to look for an .htaccess file in the directories of the translated path.

The Apache documentation (or even better, the ``Writing Apache Modules with Perl and C'' book by Doug MacEachern and Lincoln Stein) will tell you all about those stages and what your modules can do. By default, these hooks are disabled at compile time; see the INSTALL document for information on enabling them.

Note that by default the Perl API expects a subroutine called handler to handle the request in the registered PerlHandler module. Thus, if your module implements this subroutine, you can register the handler simply by writing:

  Perl*Handler Apache::SomeModule

Replace Perl*Handler with the name of the wanted handler directive. mod_perl will load the specified module for you when it is first needed. But if you decide to give the handler subroutine a different name, like my_handler, you must preload the module and write the chosen name explicitly:

  PerlModule Apache::SomeModule
  Perl*Handler Apache::SomeModule::my_handler

Please note that the first approach will not preload the module at server startup, so either preload it explicitly with the PerlModule directive, add it to the startup file, or use the nice shortcut that the Perl*Handler syntax provides:

  Perl*Handler +Apache::SomeModule

Notice the leading + character. It's equivalent to:

  PerlModule Apache::SomeModule
  Perl*Handler Apache::SomeModule
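To make the naming rules concrete, here is a toy resolver written only for illustration; it is not mod_perl's code. resolve_handler() and the My::Handler package (standing in for Apache::SomeModule) are hypothetical. A bare package name resolves to its handler() subroutine, a fully qualified name resolves to that subroutine, and a leading + is reported as a request to preload.

```perl
use strict;
use warnings;

# A hypothetical module standing in for Apache::SomeModule.
package My::Handler;
sub handler    { return 'default handler' }
sub my_handler { return 'custom handler' }

package main;

# Toy resolver for a Perl*Handler argument:
#   +Pkg            -> preload Pkg, then call Pkg::handler
#   Pkg             -> call Pkg::handler
#   Pkg::some_name  -> call that fully qualified subroutine
sub resolve_handler {
    my ($spec) = @_;
    my $preload = ($spec =~ s/^\+//) ? 1 : 0;
    if (my $code = $spec->can('handler')) {
        return ($code, $preload);
    }
    my ($pkg, $sub) = $spec =~ /^(.+)::(\w+)$/ or return;
    return ($pkg->can($sub), $preload);
}

for my $spec ('My::Handler', '+My::Handler', 'My::Handler::my_handler') {
    my ($code, $preload) = resolve_handler($spec);
    print "$spec -> ", $code->(), ($preload ? ' (preloaded)' : ''), "\n";
}
```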

If a module wishes to know which handler is currently being run, it can find out with the current_callback method. This method is most useful to PerlDispatchHandlers that wish to take action only for certain phases.

 if($r->current_callback eq "PerlLogHandler") {
     $r->warn("Logging request");
 }


Stacked Handlers

With the mod_perl stacked handlers mechanism, it is possible for more than one Perl*Handler to be defined and run during each stage of a request.

Perl*Handler directives can define any number of subroutines, e.g. (in the config files):

 PerlTransHandler OneTrans TwoTrans RedTrans BlueTrans

With the Apache->push_handlers() method, mod_perl scripts can add callbacks to the stack at runtime.

Apache->push_handlers() takes the callback hook name as its first argument and a subroutine name or reference as its second. e.g.:

 Apache->push_handlers("PerlLogHandler", \&first_one);
 
 $r->push_handlers("PerlLogHandler", sub {
     print STDERR "__ANON__ called\n";
     return 0;
 });

After each request, this stack is cleared out.

All handlers will be called unless a handler returns a status other than OK or DECLINED.
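The stopping rule can be illustrated with a small pure-Perl model. This is a toy, not mod_perl's dispatcher: run_stack() is hypothetical, the constants use the same values as Apache's OK (0) and DECLINED (-1), and SERVER_ERROR (500) stands in for "any other status". The toy simply reports the last status it saw.

```perl
use strict;
use warnings;

# Same values as Apache's OK and DECLINED; 500 stands in for any other status.
use constant { OK => 0, DECLINED => -1, SERVER_ERROR => 500 };

# Toy model of one stacked phase: run the handlers in order and stop at
# the first one that returns something other than OK or DECLINED.
# Returns the last status seen and the names of the handlers that ran.
sub run_stack {
    my @stack  = @_;
    my $status = DECLINED;
    my @ran;
    for my $entry (@stack) {
        my ($name, $code) = @$entry;
        push @ran, $name;
        $status = $code->();
        last unless $status == OK || $status == DECLINED;
    }
    return ($status, @ran);
}

my ($status, @ran) = run_stack(
    [ OneTrans => sub { DECLINED } ],
    [ TwoTrans => sub { SERVER_ERROR } ],
    [ RedTrans => sub { OK } ],          # never reached
);
print "status $status after: @ran\n";
```

Here TwoTrans returns 500, so RedTrans is never called and the script prints "status 500 after: OneTrans TwoTrans".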

Example uses:

CGI.pm maintains a global object for its plain function interface. Since the object is global, it does not go out of scope, so DESTROY is never called. CGI->new can call:

 Apache->push_handlers("PerlCleanupHandler", \&CGI::_reset_globals);

This function will be called during the final stage of a request, refreshing CGI.pm's globals before the next request comes in.

Apache::DCELogin establishes a DCE login context which must exist for the lifetime of a request, so the DCE::Login object is stored in a global variable. Without stacked handlers, users must set

 PerlCleanupHandler Apache::DCELogin::purge

in the configuration files to destroy the context. This is not ``user-friendly''. Now, Apache::DCELogin::handler can call:

 Apache->push_handlers("PerlCleanupHandler", \&purge);

Persistent database connection modules such as Apache::DBI could push a PerlCleanupHandler that iterates over %Connected, refreshing connections or just checking that they have not gone stale. Remember, by the time we get to PerlCleanupHandler, the client has what it wants and has gone away, so we can spend as much time as we want here without slowing down the response time to the client (although the process is unavailable for serving new requests until the operation is completed).

PerlTransHandlers may decide, based on URI or other condition, whether or not to handle a request, e.g. Apache::MsqlProxy. Without stacked handlers, users must configure:

 PerlTransHandler Apache::MsqlProxy::translate
 PerlHandler      Apache::MsqlProxy

PerlHandler is never actually invoked unless translate() sees that the request is a proxy request ($r->proxyreq). If it is a proxy request, translate() sets $r->handler("perl-script"), and only then will PerlHandler handle the request. With stacked handlers, users do not have to specify PerlHandler Apache::MsqlProxy; the translate() function can set it with push_handlers().

Piecing together a document from includes, headers, footers, etc. Imagine (no need for SSI parsing!):

 PerlHandler My::Header Some::Body A::Footer

A little test:

 #My.pm
 package My;
 use strict;
 use Apache::Constants qw(OK);

 sub header {
     my $r = shift;
     $r->content_type("text/plain");
     $r->send_http_header;
     $r->print("header text\n");
     return OK;
 }
 sub body {
     my $r = shift;
     $r->print("body text\n");
     return OK;
 }
 sub footer {
     my $r = shift;
     $r->print("footer text\n");
     return OK;
 }
 1;
 __END__

 #in config
 <Location /foo>
 SetHandler "perl-script"
 PerlHandler My::header My::body My::footer   
 </Location>

Parsing the output of another PerlHandler? This is a little trickier, but consider:

 <Location /foo>
   SetHandler "perl-script"
   PerlHandler OutputParser SomeApp
 </Location>
 
 <Location /bar>
   SetHandler "perl-script"
   PerlHandler OutputParser AnotherApp
 </Location>

Now OutputParser goes first, but it untie()s *STDOUT and re-tie()s it to its own package, like so:

 package OutputParser;
 use strict;
 use Apache::Constants qw(OK);

 sub handler {
     my $r = shift;
     untie *STDOUT;
     tie *STDOUT => 'OutputParser', $r;
     return OK;
 }

 sub TIEHANDLE {
     my($class, $r) = @_;
     bless { r => $r }, $class;
 }

 sub PRINT {
     my $self = shift;
     for (@_) {
         # do whatever you want to $_
         $self->{r}->print($_ . "[insert stuff]");
     }
 }

 1;
 __END__
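The tie() mechanism itself can be tried without a server. In this sketch a hypothetical UpcaseFilter class plays the role of OutputParser, a separate FILTERED handle is tied instead of STDOUT (so the demo does not hijack its own output), and a scalar buffer replaces the $r object.

```perl
use strict;
use warnings;

package UpcaseFilter;

# The object remembers where filtered output should go: a reference to a
# scalar here, standing in for the $r request object in the code above.
sub TIEHANDLE {
    my ($class, $sink) = @_;
    return bless { sink => $sink }, $class;
}

# Every print() on the tied handle ends up here; transform the text and
# pass it on.
sub PRINT {
    my $self = shift;
    ${ $self->{sink} } .= uc(join('', @_));
    return 1;
}

package main;

my $buffer = '';
tie *FILTERED, 'UpcaseFilter', \$buffer;
print FILTERED "body text\n";
print FILTERED "footer text\n";
untie *FILTERED;

print $buffer;
```

It prints BODY TEXT and FOOTER TEXT on two lines: everything printed to the tied handle passed through PRINT() first, which is exactly the hook OutputParser uses.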

To build in this feature, configure with:

 % perl Makefile.PL PERL_STACKED_HANDLERS=1 [PERL_FOO_HOOK=1,etc]

Another method, Apache->can_stack_handlers, returns TRUE if mod_perl was configured with PERL_STACKED_HANDLERS=1, and FALSE otherwise.


Perl Method Handlers

If a Perl*Handler is prototyped with $$, this handler will be invoked as a method, e.g.:

 package My;
 @ISA = qw(BaseClass);
  
 sub handler ($$) {
     my($class, $r) = @_;
     ...;
 }
  
 package BaseClass;
  
 sub method ($$) {
     my($class, $r) = @_;
     ...;
 }
 __END__

Configuration:

 PerlHandler My

or

 PerlHandler My->handler

Since the handler is invoked as a method, it may inherit from other classes:

 PerlHandler My->method

In this case, the My class inherits this method from BaseClass.
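Outside mod_perl, the same inheritance can be seen with an ordinary class-method call. In this sketch the string 'request' stands in for a real $r object. The ($$) prototype is what mod_perl looks for; plain Perl ignores prototypes on method calls, so here it only documents intent.

```perl
use strict;
use warnings;

package BaseClass;

# Invoked as a method, so the first argument is the class name the call
# was made through, and the second stands in for $r.
sub method ($$) {
    my ($class, $r) = @_;
    return "$class->method answered by BaseClass for $r";
}

package My;
our @ISA = ('BaseClass');

sub handler ($$) {
    my ($class, $r) = @_;
    return "$class->handler answered by My for $r";
}

package main;

# "PerlHandler My->method" boils down to a class-method call that is
# resolved through @ISA.
print My->method('request'), "\n";
print My->handler('request'), "\n";
```

The first line printed is "My->method answered by BaseClass for request": the call went through My, but the code came from BaseClass.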

To build in this feature, configure with:

 % perl Makefile.PL PERL_METHOD_HANDLERS=1 [PERL_FOO_HOOK=1,etc]


PerlFreshRestart

To reload PerlRequire, PerlModule, other use()'d modules and flush the Apache::Registry cache on server restart, add:

  PerlFreshRestart On

Make sure you read Evil things might happen when using PerlFreshRestart.


/perl-status location

This is a very useful feature: you can watch what happens to the Perl guts of the server. Below you will find instructions for configuring and using it.


Configuration

Add this to httpd.conf:

  <Location /perl-status>
    SetHandler perl-script
    PerlHandler Apache::Status
    order deny,allow
    #deny from all
    #allow from 
  </Location>

If you are going to use Apache::Status, it's important to load it as the first module in the startup file or in httpd.conf (right after Apache::Registry):

  # startup.pl
  use Apache::Registry ();
  use Apache::Status ();
  use Apache::DBI ();

If you don't load Apache::Status before Apache::DBI, you won't get Apache::DBI's menu entry in the status page.


Usage

Assuming that your mod_perl server listens on port 81, fetch http://www.nowhere.com:81/perl-status and you will see something like:

  Embedded Perl version 5.00502 for Apache/1.3.2 (Unix) mod_perl/1.16 
  process 187138, running since Thu Nov 19 09:50:33 1998

This is the linked menu that you should see:

  Signal Handlers
  Enabled mod_perl Hooks
  PerlRequire'd Files
  Environment
  Perl Section Configuration
  Loaded Modules
  Perl Configuration
  ISA Tree
  Inheritance Tree
  Compiled Registry Scripts
  Symbol Table Dump

Let's follow, for example, PerlRequire'd Files. We see:

  PerlRequire                          Location
  /usr/myproject/lib/apache-startup.pl /usr/myproject/lib/apache-startup.pl

From some menus you can go deeper and peek at the Perl internals of the server: the values of the global variables in the packages, the list of cached scripts and modules, and much more. Just click around...


Compiled Registry Scripts section seems to be empty.

Sometimes when you fetch /perl-status and follow the Compiled Registry Scripts link from the status menu, you see no listing of scripts at all. This is absolutely correct: Apache::Status shows the registry scripts compiled in the httpd child that is serving your request for /perl-status. If that child has not yet compiled the script you are asking about, /perl-status will just show you the main menu. This usually happens when the child has just been spawned.


PerlSetVar, PerlSetEnv and PerlPassEnv

  PerlSetEnv key val
  PerlPassEnv key

PerlPassEnv passes, and PerlSetEnv sets and passes, environment variables to your scripts. You can access them in your scripts through %ENV (e.g. $ENV{"key"}).

Regarding setting PerlPassEnv PERL5LIB in httpd.conf: if you turn on taint checks (PerlTaintCheck On), $ENV{PERL5LIB} will be ignored (unset).

PerlSetVar is very similar to PerlSetEnv, but you read the value with a different method. In httpd.conf you write PerlSetVar FOO BAR; in <Perl> sections:

  push @{ $Location{"/"}->{PerlSetVar} }, [ 'FOO' => 'BAR' ];

and in the code you read it with:

  my $r = Apache->request;
  print $r->dir_config('FOO');


perl-startup file

Since you often have to add many Perl directives to the configuration file, it can be a good idea to put them all into one file, so that the configuration file stays cleaner. Add the following line to httpd.conf:

  # startup.pl loads all the functions that we want to use
  # within mod_perl
  PerlRequire /path/to/startup.pl

before the rest of the mod_perl configuration directives.

You can also run perl -c /path/to/startup.pl to test the file's syntax.


Sample perl-startup file

An example of a perl-startup file:

  use strict;

  # extend @INC if needed
  use lib qw(/dir/foo /dir/bar);

  # make sure we are in a sane environment.
  $ENV{GATEWAY_INTERFACE} =~ /^CGI-Perl/
     or die "GATEWAY_INTERFACE not Perl!";

  # for things in the "/perl" URL
  use Apache::Registry ();

  # load perl modules of your choice here;
  # this code is interpreted *once* when the server starts
  use LWP::UserAgent ();
  use DBI ();

  # tell me more about warnings
  use Carp ();
  $SIG{__WARN__} = \&Carp::cluck;

  # Load CGI.pm and call its compile() method to precompile
  # (but not to import) its autoloaded methods.
  use CGI ();
  CGI->compile(':all');

  # a require()d file must return a true value
  1;

Note that starting with $CGI::VERSION 2.46, the recommended method to precompile the code in CGI.pm is:

  use CGI qw(-compile :all);

But the old method is still available for backward compatibility.

See also Apache::Status

[TOC]


What modules should you add to the startup file and why.

Modules that are loaded at server startup are shared among the server children, so only one copy of each module is loaded, which can save you a lot of RAM. Usually I put most of the code I develop into modules and preload them from here. You can even preload your CGI scripts with Apache::RegistryLoader and preopen the DB connections with Apache::DBI. (See Preload Perl modules at server startup.)

[TOC]


The confusion with use() clause at the server startup?

Many people wonder why a use() statement needs to appear both in the startup file and in the script itself. The question arises from a misunderstanding of what use() does. use() is a compile-time combination of two other operations, require() and import(). So when you write:

  use Foo qw(bar);

perl actually does (at compile time):

  BEGIN { require Foo; Foo->import(qw(bar)); }

When you write:

  use Foo qw();

perl actually does only:

  BEGIN { require Foo; }

import() is not called at all, which means that the caller does not want any symbols to be imported. Why is this important? Because some modules set @EXPORT to a list of symbols to be exported by default, so when you write:

  use Foo;

and think that nothing is imported, import() is still called and some symbols probably are being imported. Check the documentation or source of the module in question to make sure you use() it correctly. When you write your own modules, remember that it's better to use @EXPORT_OK instead of @EXPORT, since the former doesn't export symbols unless they are asked for.
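To see the difference concretely, here is a small self-contained sketch. Foo is a hypothetical module defined inline (so there is no require() step), with one symbol in @EXPORT and one in @EXPORT_OK; each bare block calls Foo->import() from inside a package, the way the corresponding use() line would, and the last block skips import() just as an empty import list does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical module Foo, defined inline so no require() is needed.
package Foo;
use Exporter 'import';          # borrow Exporter's import() method
our @EXPORT    = qw(bar);       # imported by a plain "use Foo;"
our @EXPORT_OK = qw(baz);       # imported only when asked for by name
sub bar { 'bar' }
sub baz { 'baz' }

package main;

# Each bare block mimics the import() half of a use() statement,
# executed from inside the named package.
{ package WithDefault; Foo->import();      }   # use Foo;
{ package WithList;    Foo->import('baz'); }   # use Foo qw(baz);
{ package WithNothing;                     }   # use Foo qw(); (no import() call)

# Return the Foo functions that ended up defined in a given package.
sub imported_into {
    my ($pkg) = @_;
    no strict 'refs';
    return grep { defined &{"${pkg}::$_"} } qw(bar baz);
}

for my $pkg (qw(WithDefault WithList WithNothing)) {
    my @subs = imported_into($pkg);
    printf "%-12s %s\n", $pkg, @subs ? "@subs" : '(nothing)';
}
```

The plain use() form pulls in bar() even though the caller never mentioned it, which is exactly the surprise described above.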

Symbols that you import into the startup file's namespace are visible only in that package, not in your scripts or handlers, which are compiled into their own packages. So code that needs symbols exported by Foo has to import them itself, just as if you had not preloaded Foo in the startup file. For example, just because you have use()d Apache::Constants in the startup file does not mean you can have the following handler:

  package MyModule;
  
  sub handler {
    my $r = shift;
  
    ## Cool stuff goes here
  
    return OK;
  }

  1;

You would need either to add:

  use Apache::Constants qw( OK );

or, instead of return OK;, to write:

  return Apache::Constants::OK;
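The same rule can be demonstrated outside of Apache. In this sketch the core POSIX module stands in for Apache::Constants and floor() stands in for OK: importing floor() into package main (playing the part of the startup file) does nothing for code compiled in package MyModule, while the fully qualified name always works.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The "startup file": POSIX is loaded and floor() imported into main only.
use POSIX qw(floor);

package MyModule;

# A fully qualified call works no matter where the module was imported.
sub qualified { return POSIX::floor(2.7) }

# An unqualified call looks up &MyModule::floor, which was never imported,
# so it dies at run time; eval {} turns the death into a message.
sub unqualified {
    my $result = eval { floor(2.7) };
    return defined $result ? $result : "error: $@";
}

package main;

print MyModule::qualified(), "\n";   # prints 2
print MyModule::unqualified();       # prints the "Undefined subroutine" error
```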

See the manpage/perldoc on Exporter and perlmod for more on import().

[TOC]


The confusion with defining globals in startup

PerlRequire lets you execute code that preloads modules and does other setup. Variables imported or defined there are visible in the scope of the startup file, which is usually package main. It is a wrong assumption that your scripts will see them just because they were defined in the startup file.

You have to define or import variables in your scripts; they will then be visible inside the child process that runs the script. They will not be shared between siblings: each child works with its own copy. Remember that every Apache::Registry script runs in a specially (uniquely) named package, so it cannot access variables from other packages by their unqualified names unless it imports them (e.g. with use()) or refers to them by a fully qualified name such as $main::foo.
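A quick way to convince yourself that children do not share variables is this Unix-only sketch, which calls fork() directly instead of going through Apache. A global set before forking is copied into each child; one child's change to its copy is seen neither by the parent nor by the next child. The child_sees() helper is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Set in the parent before forking, as a startup file would do.
our $counter = 0;

# Fork a child that increments $counter and reports back what it sees.
sub child_sees {
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                  # in the child
        close $reader;
        $counter++;                   # changes only this child's copy
        print {$writer} $counter;
        close $writer;
        POSIX::_exit(0);              # skip END blocks and parent buffers
    }
    close $writer;                    # in the parent
    my $seen = <$reader>;
    close $reader;
    waitpid($pid, 0);
    return $seen;
}

my $first  = child_sees();
my $second = child_sees();
print "first child: $first, second child: $second, parent: $counter\n";
```

Both children report 1 and the parent still has 0, because each process increments its own private copy.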

[TOC]


Running 'apachectl configtest' or 'httpd -t'

apachectl configtest tests the configuration file without starting the server. You can safely modify the configuration file on your production server if you run this test before you restart the server. The test is not foolproof, but it will reveal any syntax errors you might make while editing the file.

'apachectl configtest' is the same as 'httpd -t', and it actually executes the code in startup.pl, not just parses it. <Perl> configuration sections have always started Perl while the configuration is read, and PerlRequire and PerlModule do so as well.

If you want your startup code to get control over a -t (configtest) server launch, start the server configuration test with:

  httpd -t -Dsyntax_check

and in your startup file, add (at the top):

  return 1 if Apache->define('syntax_check');

if you want to prevent the rest of the file from being executed. Return a true value as shown: a file loaded with PerlRequire, like one loaded with require(), must return true, or loading it fails.

[TOC]


Perl behavior controls

For PerlWarn and PerlTaintCheck see Switches -w, -T

[TOC]


Tuning MinSpareServers MaxSpareServers StartServers MaxClients MaxRequestsPerChild

See Tuning the Apache's configuration variables for the best performance

[TOC]


Publishing port numbers different from 80

It is better not to publish the 8080 (or similar) port number in URLs, but rather to use a proxying rewrite rule in the thin (httpd_docs) server:

  RewriteRule .*/perl/(.*) http://my.url:8080/perl/$1 [P]

One reported problem with publishing 8080 port numbers is that IE 4.x has a bug when re-posting data to a non-port-80 URL: it drops the port designator and uses port 80 anyway.

[TOC]


Perl Sections

With <Perl></Perl> sections, it is possible to configure your server entirely in Perl.

<Perl> sections can contain *any* Perl code, as much as you wish. These sections are compiled into a special package, whose symbol table mod_perl then walks, feeding the names and values of the Perl variables and structures it finds through Apache's core configuration machinery. Most configuration directives can be represented as scalars ($Scalar) or lists (@List). An @List inside these sections is simply converted into a space-delimited string for you. Here is an example:

  #httpd.conf
  <Perl>
  @PerlModule = qw(Mail::Send Devel::Peek);
 
  #run the server as whoever starts it
  $User  = getpwuid($>) || $>;
  $Group = getgrgid($)) || $); 
 
  $ServerAdmin = $User;
 
  </Perl>

Block sections such as <Location>...</Location> are represented in a %Location hash, e.g.:

  $Location{"/~dougm/"} = {
    AuthUserFile => '/tmp/htpasswd',
    AuthType => 'Basic',
    AuthName => 'test',
    DirectoryIndex => [qw(index.html index.htm)],  
    Limit => {
    METHODS => 'GET POST',
    require => 'user dougm',
    },
  };

If a directive can take two *or* three arguments, you may push strings onto the @List and the lowest number of arguments will be shifted off it, or you may push an array reference to pass any number of arguments greater than the minimum for that directive:

  push @Redirect, "/foo", "http://www.foo.com/";
  
  push @Redirect, "/imdb", "http://www.imdb.com/";
  
  push @Redirect, [qw(temp /here http://www.there.com)];

Other section counterparts include %VirtualHost, %Directory and %Files.
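To make the "walk the symbol table" idea concrete, here is a toy sketch in plain Perl. It is not mod_perl's actual implementation: My::PerlSection is a hypothetical stand-in for the package a <Perl> section is compiled into, and the hypothetical walk_section() turns the scalars, arrays and hashes it finds there into httpd.conf-style lines, joining lists with spaces as described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the package a <Perl> section is compiled into.
package My::PerlSection;
our $User       = 'nobody';
our @PerlModule = qw(Mail::Send Devel::Peek);
our %Location   = (
    '/~dougm/' => {
        AuthType       => 'Basic',
        DirectoryIndex => [qw(index.html index.htm)],
    },
);

package main;

# Lists become space-delimited strings; plain values pass through.
sub value_str {
    my ($v) = @_;
    return ref $v eq 'ARRAY' ? "@$v" : $v;
}

# Walk the package's symbol table and emit httpd.conf-style lines.
sub walk_section {
    my ($pkg) = @_;
    my @lines;
    no strict 'refs';
    for my $name (sort keys %{"${pkg}::"}) {
        my $fq = "${pkg}::$name";
        if (defined ${$fq}) {                 # $Directive = 'value'
            push @lines, "$name " . value_str(${$fq});
        }
        elsif (@{$fq}) {                      # @Directive = (list)
            push @lines, "$name " . value_str([ @{$fq} ]);
        }
        elsif (%{$fq}) {                      # %Section = (arg => {...})
            my %section = %{$fq};
            for my $arg (sort keys %section) {
                my $block = $section{$arg};
                push @lines, "<$name $arg>";
                push @lines, map { "  $_ " . value_str($block->{$_}) } sort keys %$block;
                push @lines, "</$name>";
            }
        }
    }
    return @lines;
}

print "$_\n" for walk_section('My::PerlSection');
```

The real module handles many more cases (nested blocks such as Limit, directive argument rules, error reporting); the sketch only shows the general mechanism.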

To set a whole batch of environment variables for the children from one place, rather than writing a separate SetEnv or PerlSetEnv line for each, a <Perl> section could read them in from a file and, for each key/value pair, do:

  push @PerlSetEnv, [$key => $val];

or

  Apache->httpd_conf("PerlSetEnv $key $val");
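Reading "a file" could look like the sketch below. The KEY=VALUE file format and the read_env_pairs() helper are assumptions for illustration; an in-memory string stands in for the real file, and each resulting pair can then be pushed onto the directive array exactly as in the push example above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse KEY=VALUE lines (hypothetical file format), skipping blank
# lines and # comments. Returns a list of [key, value] pairs.
sub read_env_pairs {
    my ($fh) = @_;
    my @pairs;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;
        my ($key, $val) = $line =~ /^\s*([A-Za-z_]\w*)\s*=\s*(.*?)\s*$/
            or die "Bad line in env file: $line\n";
        push @pairs, [ $key => $val ];
    }
    return @pairs;
}

# Example input; inside a <Perl> section you would open a real file instead.
my $text = "# site settings\nORACLE_HOME=/opt/oracle\n\nTZ = UTC\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @pairs = read_env_pairs($fh);
close $fh;

print "$_->[0] $_->[1]\n" for @pairs;
```

Dying on a malformed line is deliberate: a broken environment file is better caught at configuration time than discovered later by a failing script.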

These are somewhat simple examples, but they should give you the basic idea. You can mix in any Perl code your heart desires. See eg/httpd.conf.pl and eg/perl_sections.txt in mod_perl distribution for some examples.

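A tip for syntax checking outside of httpd:

  <Perl>
  #!perl
  
  #... code here ...
  
  __END__
  </Perl>

Now you may run the following command. The -x switch makes perl skip everything before the #!perl line, and __END__ stops it before </Perl>:

  perl -cx httpd.conf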

To enable <Perl> sections you should build mod_perl with perl Makefile.PL PERL_SECTIONS=1.

You can see how you have configured the <Perl> sections through the /perl-status location, by choosing Perl Sections from the menu.

You can dump the configuration defined by <Perl> sections this way:

  <Perl>
  use Apache::PerlSections();
  ...
  print STDERR Apache::PerlSections->dump();
  </Perl>

Alternatively you can store it in a file:

  Apache::PerlSections->store("httpd_config.pl");

You can then require() that file in some other <Perl> section.

[TOC]


Configuring Apache + mod_perl with mod_macro

mod_macro is an Apache module written by Fabien Coelho that lets you define and use macros in the Apache configuration file.

mod_macro is especially useful when you have many virtual hosts, each with a number of scripts/modules, most of them with a moderately complex configuration setup.

First download the latest version of mod_macro from http://www.cri.ensmp.fr/~coelho/mod_macro/ , and configure your Apache server to use this module.

Here are some useful macros for mod_perl users:

        # set up a registry script
        <Macro registry>
        SetHandler "perl-script"
        PerlHandler Apache::Registry
        Options +ExecCGI
        </Macro>

        # example
        Alias /stuff /usr/www/scripts/stuff
        <Location /stuff>
        Use registry
        </Location>

If your registry scripts are all located in the same directory, and your aliasing rules consistent, you can use this macro:

        # set up a registry script for a specific location
        <Macro registry $location $script>
        Alias $location /usr/www/scripts/$script
        <Location $location>
        SetHandler "perl-script"
        PerlHandler Apache::Registry
        Options +ExecCGI
        </Location>
        </Macro>

        # example
        Use registry /stuff stuff.pl

If you're using content handlers packaged as modules, you can use the following macro:

        # set up a mod_perl content handler module
        <Macro modperl $module>
        SetHandler "perl-script"
        Options +ExecCGI
        PerlHandler $module
        </Macro>

        #examples
        <Location /perl-status>
        PerlSetVar StatusPeek On
        PerlSetVar StatusGraph On
        PerlSetVar StatusDumper On
        Use modperl Apache::Status
        </Location>

The following macro sets up a Location for use with HTML::Embperl. Here we define all ``.html'' files to be processed by Embperl.

        <Macro embperl>
        SetHandler "perl-script"
        Options +ExecCGI
        PerlHandler HTML::Embperl
        PerlSetEnv EMBPERL_FILESMATCH \.html$
        </Macro>

        # examples
        <Location /mrtg>
        Use embperl
        </Location>

Macros are also very useful for things that tend to be verbose, such as setting up Basic Authentication:

        # Sets up Basic Authentication
        <Macro BasicAuth $realm $group>
        Order deny,allow
        Satisfy any
        AuthType Basic
        AuthName $realm
        AuthGroupFile /usr/www/auth/groups
        AuthUserFile /usr/www/auth/users
        Require group $group
        Deny from all
        </Macro>

        # example of use
        <Location /stats>
        Use BasicAuth WebStats Admin
        </Location>

Finally, here is a complete example that uses macros to set up simple virtual hosts. It uses the BasicAuth macro defined previously (yes, macros can be nested!).

        <Macro vhost $ip $domain $docroot $admingroup>
        <VirtualHost $ip>
        ServerAdmin webmaster@$domain
        DocumentRoot /usr/www/htdocs/$docroot
        ServerName www.$domain
        <Location /stats>
        Use BasicAuth Stats-$domain $admingroup
        </Location>
        </VirtualHost>
        </Macro>

        # define some virtual hosts
        Use vhost 10.1.1.1 example.com example example-admin
        Use vhost 10.1.1.2 example.net examplenet examplenet-admin

mod_macro is also useful outside of virtual host setups. For example, some sites have lots of scripts that people use to view various statistics, email settings, etc. It is much easier to read configuration lines like:

  use /forwards email/showforwards
  use /webstats web/showstats

[TOC]


General pitfalls

[TOC]


My cgi/perl code is being returned as a plain text instead of being executed by the webserver?

Check your configuration files and make sure that the ``ExecCGI'' is turned on in your configurations.

  <Location /perl>
    SetHandler perl-script
    PerlHandler Apache::Registry
    Options ExecCGI
    allow from all
    PerlSendHeader On
  </Location>

[TOC]


My script works under cgi-bin, but when called via mod_perl I see A 'Save-As' prompt

Did you put PerlSendHeader On in the configuration part of the <Location foo></Location>?

[TOC]


Is there a way to provide a different startup.pl file for each individual virtual host

No. Any virtual host will be able to see the routines from a startup.pl loaded for any other virtual host.

[TOC]


Is there a way to modify @INC on a per-virtual-host or per-location basis.

You can use 'PerlSetEnv PERL5LIB ...' or a PerlFixupHandler with the lib pragma.

An even better way is to use Apache::PerlVINC.

[TOC]


A Script from one virtual host calls a script with the same path from the other virtual host

This was a bug before, last fixed in 1.15_01, so if you are running 1.15, that could be the problem. You should set this variable in a startup file (PerlRequire):

  $Apache::Registry::NameWithVirtualHost = 1;

But, as we know, sometimes a bug turns into a feature. If the same script runs for more than one virtual host on the same machine, compiling it separately for each host is a waste, right? Set the variable to 0 in a startup file if you want to turn this off and keep the bug as a feature. (This only makes sense if you are sure that no other scripts share the same path/name.) It also saves you some memory on the way.

  $Apache::Registry::NameWithVirtualHost = 0;
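Why does the setting matter? Apache::Registry compiles each script into a package whose name is derived from the request URI, and NameWithVirtualHost decides whether the virtual host's name becomes part of it. The sketch below is a simplified approximation of that name mangling, not Apache::Registry's exact code; script_package() and the host names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified approximation of how Apache::Registry derives a package
# name from a URI (hypothetical helper; the real code handles more
# cases, such as path segments that start with a digit).
sub script_package {
    my ($uri, $host, $name_with_vhost) = @_;
    my $name = $name_with_vhost ? "$host$uri" : $uri;
    # Escape characters that are not legal in Perl identifiers.
    $name =~ s/([^A-Za-z0-9_\/])/sprintf("_%2x", ord $1)/eg;
    # Turn path separators into package separators.
    $name =~ s{/+}{::}g;
    return "Apache::ROOT$name";
}

for my $host (qw(www.foo.com www.bar.com)) {
    printf "%-12s off: %s\n", $host, script_package('/perl/test.pl', $host, 0);
    printf "%-12s on:  %s\n", $host, script_package('/perl/test.pl', $host, 1);
}
```

With the flag off, both hosts map /perl/test.pl to Apache::ROOT::perl::test_2epl, so they share one compiled copy; with it on, each host gets its own package.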

[TOC]


The server no longer retrieves the DirectoryIndex files for a directory

The problem was reported by users who declared the mod_perl configuration inside a <Directory> section for all files matching *.pl. The problem went away after placing the mod_perl configuration in a <Files> section.

[TOC]


Configuration Security Concerns

It is better not to advertise to the outside world the port your mod_perl server is running on, because doing so creates a potential security risk by revealing which module(s) and/or OS your web server is running.

The more modules you have in your web server, the more complex its code.

The more complex the code in your web server, the more chances for bugs.

The more chances for bugs, the more chance that some of those bugs may involve security.

I was never completely sure why the default of Apache's ServerTokens directive is Full rather than Minimal. It seems you would only want Full when debugging.

For more information see Publishing port numbers different from 80

Another approach is to modify the httpd sources to reveal no unwanted information, so that even someone who knows the port will get an empty or phony Server: field in response to a HEAD request.
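To check what your server actually reveals, you can send it a HEAD request and look at the Server: header. This sketch uses the core IO::Socket::INET module; the host and port come from the command line (e.g. localhost 8080), and server_header() and head_request() are hypothetical helpers.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Extract the Server: header from a raw HTTP response (undef if absent).
sub server_header {
    my ($response) = @_;
    my ($head) = split /\r?\n\r?\n/, ($response // ''), 2;
    $head //= '';
    for my $line (split /\r?\n/, $head) {
        return $1 if $line =~ /^Server:\s*(.*?)\s*$/i;
    }
    return;
}

# Send a HEAD request and return the raw response.
sub head_request {
    my ($host, $port) = @_;
    my $sock = IO::Socket::INET->new(PeerAddr => $host, PeerPort => $port,
                                     Proto => 'tcp', Timeout => 10)
        or die "Cannot connect to $host:$port: $@";
    print {$sock} "HEAD / HTTP/1.0\r\nHost: $host\r\n\r\n";
    local $/;                         # read until the server closes
    my $response = <$sock>;
    close $sock;
    return $response;
}

# Usage: perl check_server.pl localhost 8080
if (@ARGV == 2) {
    my $server = server_header(head_request(@ARGV));
    print defined $server ? "Server: $server\n" : "No Server: header\n";
}
```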

[TOC]


Logical grouping of Location, Directory and FilesMatch directives

Let's say that you want all the files in a specific directory and below to be handled the same way, but a few of them to be handled somewhat differently. For example:

  <Directory /home/foo>
    <FilesMatch "\.(html|txt)$">
      SetHandler perl-script
      PerlHandler Apache::AddrMunge
    </FilesMatch>
  </Directory>

Alternatively you can use <Files> inside an .htaccess file.

Note that you cannot have a Files directive inside Location, but you can have Files inside Directory.

[TOC]


Apache restarts twice on start

When the server is restarted, the configuration and module initialization phases are run again. To make sure that future restarts will work correctly, Apache actually runs these two phases twice during server startup, to check that all modules can survive a restart.

(META: And add an example that writes to the log file - I was restarted 1, 2 times)

[TOC]


The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me to improve this guide. If you have something to contribute please send it directly to me.

Written by Stas Bekman.
Last Modified at 12/18/1999
Use of the Camel for Perl is
a trademark of O'Reilly & Associates,
and is used by permission.
Controlling and Monitoring the Server

[TOC]


Restarting techniques

All of these techniques require that you know the server PID (Process ID). The easiest way to find the PID is to look it up in the httpd.pid file. With my configuration it lives at /usr/local/var/httpd_perl/run/httpd.pid. To find out where it is on your system, open httpd.conf and locate the PidFile entry:

  PidFile /usr/local/var/httpd_perl/run/httpd.pid

Another way is to use the ps and grep utilities:

  % ps auxc | grep httpd_perl

or maybe:

  % ps -ef | grep httpd_perl

This will produce a list of all httpd_perl processes (the parent and the children). You are looking for the parent process. If you run your server as root, you will easily locate it, since it is the one that belongs to root. If you run the server as an ordinary user (when you don't have root access), most likely all the processes will belong to that user (unless defined differently in httpd.conf), but it's still easy to tell which one is the parent: it is usually the smallest one, and in ps -ef output it is the one listed as the parent (PPID) of all the others.

You will notice many httpd_perl executables running on your system, but you should never need to send signals to any of them except the parent, whose PID is in the PidFile. There are three signals that you can send to the parent: TERM, HUP, and USR1.
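Finding and checking the parent's PID can be scripted as well. This is a sketch, assuming the PidFile path from the configuration above (or one passed on the command line); read_pid() and is_running() are hypothetical helpers. kill with signal 0 delivers nothing; it only reports whether the process exists and you are allowed to signal it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the parent httpd PID from a PidFile and make sure it is numeric.
sub read_pid {
    my ($pid_file) = @_;
    open my $fh, '<', $pid_file or die "Cannot open $pid_file: $!";
    my $line = <$fh>;
    close $fh;
    defined $line && $line =~ /^\s*(\d+)\s*$/
        or die "$pid_file does not contain a PID\n";
    return $1;
}

# Signal 0 delivers nothing; it only checks that the process exists
# and that we are allowed to signal it.
sub is_running {
    my ($pid) = @_;
    return kill(0, $pid) ? 1 : 0;
}

my $pid_file = @ARGV ? $ARGV[0] : '/usr/local/var/httpd_perl/run/httpd.pid';
if (-e $pid_file) {
    my $pid = read_pid($pid_file);
    printf "parent httpd PID %d is %s\n", $pid,
        is_running($pid) ? 'running' : 'not running (stale PID file?)';
    # A graceful restart would then be: kill 'USR1', $pid;
}
```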

[TOC]


Implications of sending TERM, HUP, and USR1 to the server

We will concentrate here on the implications of sending these signals to a mod_perl enabled server. For documentation on the implications of sending these signals to a plain Apache server see http://www.apache.org/docs/stopping.html .

TERM Signal: stop now

Sending the TERM signal to the parent causes it to immediately attempt to kill off all of its children. This process may take several seconds to complete, following which the parent itself exits. Any requests in progress are terminated, and no further requests are served.

That's the moment that the accumulated END blocks will be executed! Note that if you use Apache::Registry or Apache::PerlRun, then END blocks are being executed upon each request (at the end).

HUP Signal: restart now

Sending the HUP signal to the parent causes it to kill off its children like in TERM (Any requests in progress are terminated) but the parent doesn't exit. It re-reads its configuration files, and re-opens any log files. Then it spawns a new set of children and continues serving hits.

The server will reread its configuration files, flush all the compiled and preloaded modules, and rerun any startup files. It's equivalent to stopping, then restarting a server.

Note: If your configuration file has errors in it when you issue a restart then your parent will not restart but exit with an error. See below for a method of avoiding this.

USR1 Signal: graceful restart

The USR1 signal causes the parent process to advise the children to exit after their current request (or to exit immediately if they're not serving anything). The parent re-reads its configuration files and re-opens its log files. As each child dies off the parent replaces it with a child from the new generation of the configuration, which begins serving new requests immediately.

The only difference between USR1 and HUP is that USR1 allows children to complete any in-progress request prior to killing them off.

By default, if a server is restarted (as with kill -USR1 `cat logs/httpd.pid` or with the HUP signal), Perl scripts and modules are not reloaded. To reload PerlRequire's, PerlModule's and other use()'d modules, and to flush the Apache::Registry cache, enable it with this directive:

 PerlFreshRestart On              (in httpd.conf) 

Make sure you read Evil things might happen when using PerlFreshRestart.

It's worth mentioning that restart or termination can sometimes take quite a lot of time. Check out the PERL_DESTRUCT_LEVEL=-1 option during the mod_perl perl Makefile.PL stage, which speeds this up and leads to more robust operation in the face of problems, like running out of memory. It is only usable if no significant cleanup has to be done by perl END blocks and DESTROY methods when the child terminates, of course. What constitutes significant cleanup? Any change of state outside of the current process that would not be handled by the operating system itself. So committing database transactions is significant but closing an ordinary file isn't.

Some folks prefer to specify signals using numerical values rather than symbolic names. If you are looking for these, check your kill(3) manpage. Mine points to /usr/include/sys/signal.h; the relevant entries are:

  #define SIGHUP     1    /* hangup, generated when terminal disconnects */ 
  #define SIGTERM   15    /* software termination signal */
  #define SIGUSR1   30    /* user defined signal 1 */

Note that these numbers are platform-specific: the values above come from a BSD-derived system, while on Linux on x86, for example, SIGUSR1 is 10.
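Rather than digging through header files, you can ask Perl for the numbers used on your platform. This sketch builds a name-to-number map from the sig_name and sig_num entries of the core Config module; signal_number() is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Build a signal-name => number map from the values perl was built with.
my @names = split ' ', $Config{sig_name};
my @nums  = split ' ', $Config{sig_num};
my %signo;
@signo{@names} = @nums;

sub signal_number { my ($name) = @_; return $signo{$name} }

printf "%-5s %d\n", $_, signal_number($_) for qw(HUP TERM USR1);
```

In scripts, though, the symbolic form (kill 'USR1', $pid) is usually the better choice, since Perl maps the name to the right number for you.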

[TOC]


Using apachectl to control the server

Apache's distribution provides a nice script to control the server. It's called apachectl and it's installed in the same directory as httpd. In our scenario it's /usr/local/sbin/httpd_perl/apachectl.

Start httpd:

  % /usr/local/sbin/httpd_perl/apachectl start 

Stop httpd:

  % /usr/local/sbin/httpd_perl/apachectl stop

Restart httpd if running by sending a SIGHUP or start if not running:

  % /usr/local/sbin/httpd_perl/apachectl restart

Do a graceful restart by sending a SIGUSR1 or start if not running:

  % /usr/local/sbin/httpd_perl/apachectl graceful    

Do a configuration syntax test:

  % /usr/local/sbin/httpd_perl/apachectl configtest 

Replace httpd_perl with httpd_docs in the above calls to control the httpd_docs server.

There are other options for apachectl; use the help option to see them all.

It's important to understand that this script is based on the PID file which is PIDFILE=/usr/local/var/httpd_perl/run/httpd.pid. If you delete the file by hand - apachectl will fail to run.

Also, notice that apachectl is suitable for use from within your Unix system's startup files, so that your web server is automatically restarted upon system reboot. Either copy the apachectl file to the appropriate location (/etc/rc.d/rc3.d/S99apache works on my RedHat Linux system) or create a symlink with that name pointing to the canonical location. (If you do this, make certain that the script is writable only by root: the startup scripts have root privileges during init processing, and you don't want to open any security holes.)

[TOC]


Safe Code Updates on a Live Production Server

You have prepared a new version of your code, uploaded it to the production server, restarted the server, and it doesn't work. What could be worse than that? You also cannot go back, because you have overwritten the good working code.

It's quite easy to prevent this: just don't overwrite the previous good files!

Personally I do all updates on the live server with the following sequence. Assume that the root directory is /home/httpd/perl/rel. When I'm about to update the files, I create a new directory /home/httpd/perl/beta, copy the old files from /home/httpd/perl/rel into it, and update it with the new files. Then I do the last sanity checks: file permissions (read+executable), and perl -c on the new modules to make sure there are no errors in them. When I think I'm ready I do:

  % cd /home/httpd/perl
  % mv rel old && mv beta rel && stop && sleep 3 && restart && err

Let me explain what I'm doing. First, I use aliases to make things faster:

  % alias | grep apachectl
  graceful        /usr/local/apache/bin/apachectl graceful
  rehup   /usr/local/apache/sbin/apachectl restart
  restart /usr/local/apache/bin/apachectl restart
  start   /usr/local/apache/bin/apachectl start
  stop    /usr/local/apache/bin/apachectl stop
  
  % alias err
  tail -f /usr/local/apache/logs/error_log

So I write all the commands on one line, joined with &&, and only then press Enter. That ensures that if my connection is suddenly lost (sadly, that happens sometimes), I won't leave the server down because only the stop command got through.

I back up the old working directory as old and move the new one into its place. I stop the server, give it a few seconds to shut down (it might take even longer), then run restart, immediately followed by a view of the tail of the error_log file to see that everything is OK. apachectl reports its status too early (e.g. on stop it says the server has been stopped while it is still shutting down), so don't rely on it; rely on the error_log file instead. You may also have noticed that I use restart and not just start. I do this for the same reason: Apache can take a long time to stop (depending, of course, on what you do with it), so if you use start and Apache hasn't yet released the port it listens on, the start will fail and error_log will tell you that the port is in use, e.g.:

  Address already in use: make_sock: could not bind to port 8080

But if you use restart, it will patiently wait for the server to quit and then will cleanly start it.

Now what happens if the new modules are broken? First of all, I immediately see the problems reported in the error_log file, which I tail -f right after the restart command. Recovering is easy; we just put everything back as it was before:

  % mv rel bad && mv old rel && stop && sleep 3 && restart && err

And 99.9% of the time everything will be all right, with only about 10 seconds of downtime, which is pretty good!
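The perl -c sanity check of the beta directory can be automated before the swap. This is a sketch, not the author's own script: check_syntax() is a hypothetical helper that runs perl -c (via $^X, the running perl) on every .pl and .pm file under a directory and returns the files that fail. Keep in mind that perl -c still executes BEGIN blocks and use() statements, so code that needs mod_perl at compile time may fail here even though it works under the server.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Run "perl -c" on every .pl/.pm file under $dir; return the failures.
sub check_syntax {
    my ($dir) = @_;
    my @files;
    find(sub { push @files, $File::Find::name if -f $_ && /\.p[lm]$/ }, $dir);
    my @failed;
    for my $file (sort @files) {
        # -I$dir lets modules in the release tree find each other.
        system($^X, "-I$dir", '-c', $file) == 0
            or push @failed, $file;
    }
    return @failed;
}

my $dir = @ARGV ? $ARGV[0] : '/home/httpd/perl/beta';
if (-d $dir) {
    my @failed = check_syntax($dir);
    print @failed ? "Syntax errors in:\n" . join('', map { "  $_\n" } @failed)
                  : "All files passed perl -c\n";
}
```

Only when this reports no failures would you go on to the mv/stop/restart line.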

[TOC]


An Intentional Disabling of Live Scripts

What happens if you really must take down the server or disable the scripts? This situation might happen when you need to do some maintenance work on your database server, which you have to take down, making all the scripts that use it fail. If you do nothing, users will see either the grey 'An Error has happened' page or a better customized error message if you have added code to trap and customize the errors (see Redirecting Errors to the Client instead of error_log for the latter case).

A much more user-friendly approach is to confess to your users that you are doing some maintenance work and plead for patience, promising that the services will become fully functional in X minutes (it's worth keeping the promise!). There are a few ways to do that:

The first doesn't require messing with the server configuration, and works when you have to disable a script rather than a module. Just prepare a little script like:

  /home/http/perl/construction.pl
  ----------------------------
  #!/usr/bin/perl -wT
  
  use strict;
  use CGI;
  my $q = new CGI;
  print $q->header,
  "Sorry, the service is down for maintenance. 
   It will be back in about 5-15 minutes.
   Please, bear with us.
   Thank you!";

And if you now have to disable the script at /home/http/perl/chat.pl, just do:

  % mv /home/http/perl/chat.pl /home/http/perl/chat.pl.orig
  % ln -s /home/http/perl/construction.pl /home/http/perl/chat.pl

Of course your server configuration must allow symbolic links for this trick to work. Just make sure you have the

  Options FollowSymLinks

directive in your <Location>/<Directory> section configuration.

When done, it's easy to restore the previous setup. Just do:

  % mv /home/http/perl/chat.pl.orig /home/http/perl/chat.pl

which overwrites the symbolic link. Apache will detect the change and use the restored script instead.

The second approach is to change the server configuration so that whole directories are handled by a Construction handler that you write, e.g.:

  Construction.pm
  ---------------
  package Construction;
  use strict;
  use CGI;
  use Apache::Constants qw(OK);
  sub handler {
    my $q = new CGI;
    print $q->header,
    "Sorry, the service is down for maintenance. 
     It will be back in about 5-15 minutes.
     Please, bear with us.
     Thank you!";
    return OK;
  }
  1;

and put it in a directory that is in the server's @INC. Then, to take down all your scripts at /perl, you would replace:

  <Location /perl>
    SetHandler perl-script
    PerlHandler Apache::Registry
    [snip]
  </Location>

with

  <Location /perl>
    SetHandler perl-script
    PerlHandler Construction
    [snip]
  </Location>

Now restart the server, and your users will be happy to know that you are working on a much better version of the service and that it's worth going to read slashdot.org and coming back in 10 minutes.

If you need to disable a location handled by some module, the second approach would work just as well.

[TOC]


SUID start-up scripts

For those who want to use a SUID startup script, here is an example. This script is SUID to root, and should be executable only by members of some special group at your site. Note the line $< = $>;, which ``fixes an obscure error when starting apache/mod_perl'' by setting the real UID to the effective UID. As others have pointed out, it is the mismatch between the real and the effective UIDs that causes Perl to croak on the -e switch.

Note that you must be using a version of Perl that recognizes and emulates the suid bits in order for this to work. The script will do different things depending on whether it is named start_http, stop_http or restart_http. You can use symbolic links for this purpose.

 #!/usr/bin/perl
 
 # These constants will need to be adjusted.
 $PID_FILE = '/home/www/logs/httpd.pid';
 $HTTPD = '/home/www/httpd -d /home/www';
 
 # These prevent taint warnings while running suid
 $ENV{PATH}='/bin:/usr/bin';
 $ENV{IFS}='';
 
 # This sets the real to the effective ID, and prevents
 # an obscure error when starting apache/mod_perl
 $< = $>;
 $( = $) = 0; # set the group to root too
 
 # Do different things depending on our name
 ($name) = $0 =~ m|([^/]+)$|;
 
 if ($name eq 'start_http') {
     system $HTTPD and die "Unable to start HTTP";
     print "HTTP started.\n";
     exit 0;
 }
 
 # extract the process id and confirm that it is numeric
 $pid = `cat $PID_FILE`;
 $pid =~ /(\d+)/ or die "PID $pid not numeric";
 $pid = $1;
 
 if ($name eq 'stop_http') {
     kill 'TERM',$pid or die "Unable to signal HTTP";
     print "HTTP stopped.\n";
     exit 0;
 }
 
 if ($name eq 'restart_http') {
     kill 'HUP',$pid or die "Unable to signal HTTP";
     print "HTTP restarted.\n";
     exit 0;
 }
 
 die "Script must be named start_http, stop_http, or restart_http.\n";

[TOC]


Preparing for Machine Reboot

When you run your own development box, it's OK to start the web server by hand when you need it. On a production system, there is a chance that the machine the server is running on will have to be rebooted. Once the reboot is completed, who is going to remember to start the server? It's an easy task to forget, and what happens if you aren't around when the machine is rebooted?

After the server installation is complete, don't forget to put a script that performs the server startup and shutdown into a standard system location, like /etc/rc.d/init.d or its equivalent (this varies from OS to OS). This is the directory from which all other daemons are started and shut down.

Generally the simplest solution is to copy the apachectl script there; you will find it in the same directory as the httpd executable after the Apache installation. If you have more than one Apache server, you need a script for each one, renaming them on the way, of course.

For example, on a RedHat Linux machine with a two-server setup, I have the following:

  /etc/rc.d/init.d/httpd_docs
  /etc/rc.d/init.d/httpd_perl
  /etc/rc.d/rc3.d/S86httpd_docs -> ../init.d/httpd_docs
  /etc/rc.d/rc3.d/S87httpd_perl -> ../init.d/httpd_perl
  /etc/rc.d/rc6.d/K86httpd_docs -> ../init.d/httpd_docs
  /etc/rc.d/rc6.d/K87httpd_perl -> ../init.d/httpd_perl

The init.d directory holds the scripts themselves. The other directories hold symbolic links to these scripts, whose names are prefixed with S (start) or K (kill) and a number that determines the order of execution.

When a machine is booted and its runlevel is set to 3 (multiuser + network), Linux goes into /etc/rc.d/rc3.d/ and executes the scripts that the symbolic links point to with the start argument, so when it sees S87httpd_perl, it executes:

  /etc/rc.d/init.d/httpd_perl start

When the machine is rebooted (runlevel 6), the scripts linked from the /etc/rc.d/rc6.d/ directory are executed, this time with the stop argument:

  /etc/rc.d/init.d/httpd_perl stop

Most systems come with GUI utilities that automate the creation of the symbolic links. For example, RedHat Linux includes a control-panel utility which, among other tools, offers a RunLevel Manager that will help you create the symbolic links properly. Of course, before you use it you should put the apachectl or similar scripts into the init.d or equivalent directory.
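Where no such utility is available, the links can be created by hand. The sketch below is a hypothetical helper, not part of any distribution: make_rc_links() creates the rc3.d start links and rc6.d kill links for the layout shown above. The $root argument is an assumption added so the sketch can be tried in a scratch directory; on a real system it would be / and the script would run as root.

```perl
use strict;
use warnings;
use File::Path qw(mkpath);

# Hypothetical helper: create rc3.d start links (S<num><name>) and
# rc6.d kill links (K<num><name>) pointing at ../init.d/<name>.
# $root is "/" on a real system; a scratch directory is handy for testing.
sub make_rc_links {
    my ($root, %order) = @_;    # %order maps script name => sequence number
    my @made;
    for my $level ([rc3 => 'S'], [rc6 => 'K']) {
        my ($rc, $prefix) = @$level;
        my $dir = "$root/etc/rc.d/$rc.d";
        mkpath($dir) unless -d $dir;
        for my $name (sort keys %order) {
            my $link = "$dir/$prefix$order{$name}$name";
            next if -l $link;            # leave existing links alone
            symlink("../init.d/$name", $link)
                or die "Cannot create $link: $!";
            push @made, $link;
        }
    }
    return @made;                        # the links actually created
}
```

For the two-server setup above, the call would be make_rc_links('/', httpd_docs => 86, httpd_perl => 87). Existing links are left untouched, so running it twice is harmless.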


Monitoring the Server. A watchdog.

With mod_perl many things can happen to your server. The worst is the possibility that the server will die when you are not around. As with any other critical service, you need to run some kind of watchdog.

One simple solution is to use a slightly modified apachectl script, which I called apache.watchdog, and to put it into the crontab to be called every 30 minutes, or even every minute if it is critical to keep the server up all the time.

The crontab entry:

  0,30 * * * * /path/to/the/apache.watchdog >/dev/null 2>&1

The script:

  #!/bin/sh
  
  # this script is a watchdog to see whether the server is online
  # It tries to restart the server if it's
  # down and sends an email alert to admin
  
  # admin's email
  EMAIL=webmaster@somewhere.far
  #EMAIL=root@localhost
  
  # the path to your PID file
  PIDFILE=/usr/local/var/httpd_perl/run/httpd.pid
  
  # the path to your httpd binary, including options if necessary
  HTTPD=/usr/local/sbin/httpd_perl/httpd_perl
  
  # check for pidfile
  if [ -f $PIDFILE ] ; then
    PID=`cat $PIDFILE`
  
    # kill -0 sends no signal, it only tests that the process exists
    if kill -0 $PID 2>/dev/null; then
      STATUS="httpd (pid $PID) running"
      RUNNING=1
    else
      STATUS="httpd (pid $PID?) not running"
      RUNNING=0
    fi
  else
    STATUS="httpd (no pid file) not running"
    RUNNING=0
  fi
  
  if [ $RUNNING -eq 0 ]; then
    echo "$0: $STATUS, trying to start"
    if $HTTPD ; then
      echo "$0: httpd started"
      mail -s "$0: httpd started" $EMAIL </dev/null >/dev/null 2>&1
    else
      echo "$0: httpd could not be started"
      mail -s "$0: httpd could not be started" $EMAIL </dev/null >/dev/null 2>&1
    fi
  fi
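The heart of the script is the kill -0 test, which sends no signal and only checks whether the process exists. For comparison, here is the same check as a Perl function: a sketch that assumes the PID file format used above (the PID is the first number in the file). server_status() is a hypothetical name. Unlike the plain shell test, it treats EPERM (the process exists but belongs to another user) as running, which matters if the watchdog does not run as root.

```perl
use strict;
use warnings;
use Errno qw(EPERM);

# Hypothetical helper mirroring the shell watchdog's PID check.
# Returns (1, $status) if the server is running, (0, $status) otherwise.
sub server_status {
    my ($pidfile) = @_;
    open my $fh, '<', $pidfile
        or return (0, "httpd (no pid file) not running");
    my $line = <$fh>;
    close $fh;
    my ($pid) = defined $line ? ($line =~ /(\d+)/) : ();
    return (0, "httpd (bad pid file) not running") unless $pid;

    # kill 0 sends no signal; it only checks that the process exists
    return (1, "httpd (pid $pid) running") if kill 0, $pid;
    # EPERM: the process exists, but we are not allowed to signal it
    return (1, "httpd (pid $pid) running") if $! == EPERM;
    return (0, "httpd (pid $pid?) not running");
}
```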

Another approach, probably even more practical, is to use the LWP Perl package to test the server by trying to fetch some document (script) served by it. Why is it more practical? Because while the server can be up as a process, it can be stuck and not working. Failing to get the document will trigger a restart, and ``probably'' the problem will go away. This is why the $restart_command below uses restart rather than start.

Again, we put this script into a crontab to call it every 30 minutes. Personally I call it every minute, fetching some very light script. Why so often? If your server starts to spin and trash your disk space with multiple error messages, in five minutes you might run out of free space, which might bring your system to its knees. Most likely no other child will be able to serve requests either, since the system will be too busy writing to the error_log file. Think big: if you are running a heavy service, which is very fast since you are running under mod_perl, one more request every minute will not be noticed by the server at all.

So we end up with crontab entry:

  * * * * * /path/to/the/watchdog.pl >/dev/null 2>&1

And the watchdog itself:

  #!/usr/local/bin/perl -w
  
  use strict;
  use LWP::UserAgent;
  use HTTP::Request;
  use HTTP::Status;
  
  my $VERSION = '0.01';
  use vars qw($ua $proxy);
  $proxy = '';
  
  ###### Config ########
  my $test_script_url = 'http://www.stas.com:81/perl/test.pl';
  my $monitor_email   = 'root@localhost';
  my $restart_command = '/usr/local/sbin/httpd_perl/apachectl restart';
  my $mail_program    = '/usr/lib/sendmail -t -n';
  ######################
  
  $ua = LWP::UserAgent->new;
  $ua->agent("$0/Stas " . $ua->agent);
  # give up on a stuck server after 30 seconds (the default is 180)
  $ua->timeout(30);
  # Uncomment and set the proxy if you use one!
  #  $proxy = "http://www-proxy.com";
  $ua->proxy('http', $proxy) if $proxy;
  
  # checkurl() returns 1 if the server is alive
  exit 0 if checkurl($test_script_url);
  
  # We have got a problem - the server seems to be down. Try to
  # restart it.
  my $status = system $restart_command;
  #  print "Status $status\n";
  
  my $message = ($status == 0)
              ? "Server was down and successfully restarted!"
              : "Server is down. Can't restart.";
  
  my $subject = ($status == 0)
              ? "Attention! Webserver restarted"
              : "Attention! Webserver is down. can't restart";
  
  # email the monitoring person
  my $to = $monitor_email;
  my $from = $monitor_email;
  send_mail($from,$to,$subject,$message);
  
  # input:  URL to check
  # output: 1 for success, 0 for failure
  #######################
  sub checkurl{
    my ($url) = @_;
  
    # Fetch document
    my $res = $ua->request(HTTP::Request->new(GET => $url));
  
    # Check the result status
    return 1 if is_success($res->code);
  
    # failed
    return 0;
  } #  end of sub checkurl
  
  # sends email about the problem
  #######################
  sub send_mail{
    my($from,$to,$subject,$messagebody) = @_;
  
    open MAIL, "|$mail_program"
        or die "Can't open a pipe to $mail_program: $!\n";
  
    print MAIL <<__END_OF_MAIL__;
  To: $to
  From: $from
  Subject: $subject
  
  $messagebody
  
  __END_OF_MAIL__
  
    close MAIL;
  }


Running the server in single process mode

Often while developing new code, you will want to run the server in single process mode. See ``Sometimes it works Sometimes it does Not'' and ``Names collisions with Modules and libs''. Running in single process mode inhibits the server from ``daemonizing'', allowing you to run it more easily under debugger control.

  % /usr/local/sbin/httpd_perl/httpd_perl -X

When you execute the above, the server runs in the foreground of the shell you called it from, so you can kill it with Ctrl-C.

Note that in -X mode the server will be very slow when fetching images. If you use Netscape while your server is running in single-process mode, HTTP's KeepAlive feature gets in the way. Netscape tries to open multiple connections and keep them open. Because there is only one server process listening, each connection has to time out before the next one succeeds. Turn off KeepAlive in httpd.conf to avoid this effect while developing, or press STOP after a few seconds (assuming you use the image size parameters, so Netscape will be able to render the rest of the page).

In addition, you should know that when running with -X you will not see any of the control messages that the parent server normally writes to the error_log (such as ``server started'' and ``server stopped''). Since httpd -X causes the server to handle all requests itself, without forking any children, there is no controlling parent to write status messages.


Starting a personal server for each developer

If you are the only developer working on the specific server:port, you have no problems, since you have complete control over the server. However, often a group of developers need to develop their own mod_perl scripts concurrently. Each one will want control over the server: to kill it, to run it in single server mode, to restart it again, etc., as well as control over the location of the log files and other configuration settings like MaxClients. You can work around this problem by preparing a few httpd.conf files and having each developer use:

  httpd_perl -f /path/to/httpd.conf  

I have approached it in another way, using the server's -Dparameter startup option. I start my version of the server with:

  % httpd_perl -Dsbekman

In httpd.conf I wrote:

  # Personal development Server for sbekman
  # sbekman uses the server running on port 8000
  <IfDefine sbekman>
  Port 8000
  PidFile /usr/local/var/httpd_perl/run/httpd.pid.sbekman
  ErrorLog /usr/local/var/httpd_perl/logs/error_log.sbekman
  Timeout 300
  KeepAlive On
  MinSpareServers 2
  MaxSpareServers 2
  StartServers 1
  MaxClients 3
  MaxRequestsPerChild 15
  </IfDefine>
  
  # Personal development Server for userfoo
  # userfoo uses the server running on port 8001
  <IfDefine userfoo>
  Port 8001
  PidFile /usr/local/var/httpd_perl/run/httpd.pid.userfoo
  ErrorLog /usr/local/var/httpd_perl/logs/error_log.userfoo
  Timeout 300
  KeepAlive Off
  MinSpareServers 1
  MaxSpareServers 2
  StartServers 1
  MaxClients 5
  MaxRequestsPerChild 0
  </IfDefine>

What we have achieved with this technique: full control over start/stop, the number of children, a separate error log file, and port selection. This saves me from getting called every few minutes with ``Stas, I'm going to restart the server''.

To make things even easier: in the above technique you have to discover the PID of your parent httpd_perl process, which is written in /usr/local/var/httpd_perl/run/httpd.pid.userfoo. So we change the apachectl script to do the work for us. We make a copy for each developer, called apachectl.username, and change two lines in the script:

  PIDFILE=/usr/local/var/httpd_perl/run/httpd.pid.sbekman
  HTTPD='/usr/local/sbin/httpd_perl/httpd_perl -Dsbekman'
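Making these copies by hand for every developer is easy to get wrong. A small Perl sketch can generate them: personal_apachectl() rewrites the PIDFILE= and HTTPD= assignment lines of a stock apachectl, assuming (as in the stock script) that each assignment starts at the beginning of its line, and write_personal_copies() writes apachectl.username next to the original. Both function names are hypothetical; the paths are the ones used in this chapter.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of apachectl.<user>, built from
# the stock apachectl by rewriting the PIDFILE= and HTTPD= lines.
sub personal_apachectl {
    my ($template, $user) = @_;
    my $pidfile = "/usr/local/var/httpd_perl/run/httpd.pid.$user";
    my $httpd   = "'/usr/local/sbin/httpd_perl/httpd_perl -D$user'";
    (my $text = $template) =~ s/^PIDFILE=.*$/PIDFILE=$pidfile/m;
    $text =~ s/^HTTPD=.*$/HTTPD=$httpd/m;
    return $text;
}

# Hypothetical helper: write "$src.$user" for each developer.
sub write_personal_copies {
    my ($src, @users) = @_;
    open my $in, '<', $src or die "Cannot read $src: $!";
    my $template = do { local $/; <$in> };    # slurp the whole script
    close $in;
    for my $user (@users) {
        my $dst = "$src.$user";
        open my $out, '>', $dst or die "Cannot write $dst: $!";
        print $out personal_apachectl($template, $user);
        close $out or die "Cannot close $dst: $!";
        chmod 0755, $dst;                      # keep it executable
    }
}
```

Run from the directory holding apachectl, write_personal_copies('apachectl', qw(sbekman userfoo)) would produce apachectl.sbekman and apachectl.userfoo.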

You might think you could use a single control script and find out who is calling it by uid, but since you have to be root to start the server, it is not so simple.

The last thing was to give developers the option of running in single process mode:

  /usr/local/sbin/httpd_perl/httpd_perl -Dsbekman -X

In addition to making life easier, we decided to use relative links everywhere in the static docs (including the calls to CGIs). You may ask how a relative link will get you to the right server. Very simply: we used mod_rewrite to solve the problem.

In access.conf of the httpd_docs server we have the following code (you have to configure your httpd_docs server with --enable-module=rewrite):

  RewriteEngine on
  
  # sbekman's server
  # port = 8000
  RewriteCond  %{REQUEST_URI} ^/(perl|cgi-perl)
  RewriteCond  %{REMOTE_ADDR} ^123\.34\.45\.56$
  RewriteRule  ^/(.*)          http://nowhere.com:8000/$1 [R,L]
  
  # userfoo's server
  # port = 8001
  RewriteCond  %{REQUEST_URI} ^/(perl|cgi-perl)
  RewriteCond  %{REMOTE_ADDR} ^123\.34\.45\.57$
  RewriteRule  ^/(.*)          http://nowhere.com:8001/$1 [R,L]
  
  # all the rest
  RewriteCond  %{REQUEST_URI} ^/(perl|cgi-perl)
  RewriteRule  ^/(.*)          http://nowhere.com:81/$1 [R]
  

where the IP numbers are those of the developers' client machines (where they run their web browsers). I tried to use REMOTE_USER instead, since all our users are authenticated, but it did not work for me.

So if I have a relative URL like /perl/test.pl written in some HTML, or even http://www.nowhere.com/perl/test.pl, and I request it from my own machine (user sbekman), httpd_docs will redirect it to http://www.nowhere.com:8000/perl/test.pl.

Of course, there is another problem: the CGI generates some HTML which should call the server again. If it generates a URL with a hard-coded port, the above scheme will not work. There are two solutions:

First, generate relative URLs, so they will reuse the technique above with a redirect (which is transparent to the user). This will not work if you have something to POST, because the redirect loses all the data!

Second, use a general configuration module which generates a correct full URL according to REMOTE_USER: if $ENV{REMOTE_USER} eq 'sbekman', it returns http://www.nowhere.com:8000/perl/ as cgi_base_url. Again, this works only if the user is authenticated.
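A minimal sketch of such a configuration module follows. The package name My::DevConfig and the %dev_port table are hypothetical; the host and ports come from the examples above, and unknown or unauthenticated users fall back to the shared server on port 81.

```perl
package My::DevConfig;    # hypothetical module name
use strict;
use warnings;

# Developers' personal servers (ports from the httpd.conf example above)
my %dev_port = (
    sbekman => 8000,
    userfoo => 8001,
);
my $host         = 'www.nowhere.com';
my $default_port = 81;    # the shared mod_perl server

# Return the base URL for generated links, chosen by REMOTE_USER.
# Unknown or unauthenticated users get the shared server.
sub cgi_base_url {
    my $user = defined $ENV{REMOTE_USER} ? $ENV{REMOTE_USER} : '';
    my $port = exists $dev_port{$user} ? $dev_port{$user} : $default_port;
    return "http://$host:$port/perl/";
}

1;
```

A script would then build its links as My::DevConfig::cgi_base_url() . 'test.pl' instead of hard-coding the port.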

All this is good for development. In production it is better to use full URLs: if you have a static form whose Action is relative, but the static document is located on another server, submitting the form will cause a redirect to the mod_perl server, and all the form's data will be lost during the redirect.


Wrapper to emulate the server environment

Many times you start off debugging your script by running it from your favorite shell. Sometimes you encounter a very weird situation where a script runs from the shell but dies when called as a CGI. The real problem lies in the difference between the environment used by your server and the one used by your shell. For example, the shell may use a different perl binary, or have a PERL5LIB environment variable that includes paths that are not in the @INC of the perl compiled with the mod_perl server and configured during startup.

The best debugging approach is to write a wrapper that emulates the exact environment of the server: first delete environment variables like PERL5LIB and call the same perl binary that the server uses. Next, make the environment identical to the server's by copying the perl run directives from the server startup and configuration files. This also allows you to remove the first line of the script completely, since mod_perl skips it and the wrapper knows how to call the script.

Below is an example of such a script. Note that we force -Tw when we call the real script. I have also added the ability to pass parameters, which will not happen when you call the CGI from the web.

  #!/usr/local/bin/perl -w
  
  # This is a wrapper example
  
  # It simulates the web server environment by setting @INC and other
  # stuff, so what runs under this wrapper will run under the web
  # server and vice versa.
  
  #
  # Usage: wrap.pl some_cgi.pl [params...]
  #
  
  BEGIN{
    use vars qw($basedir);
    $basedir = "/usr/local";
  
    # we want to make a complete emulation,
    # so we must remove the user's environment
    delete @ENV{qw(PERL5LIB PERLLIB)};
    @INC = ();
  
    # local perl libs (qw() does not interpolate $basedir,
    # so the full paths are built with double-quoted strings)
    push @INC, map { "$basedir/lib/perl5/$_" }
      qw(5.00502/aix
         5.00502
         site_perl/5.005/aix
         site_perl/5.005
        );
  }
  
  use strict;
  use File::Basename;
  
    # process the passed params
  my $cgi = shift || '';
  
  die "Usage:\n\t$0 some_cgi.pl [params...]\n" unless $cgi;
  
    # if the path includes the directory
    # we chdir there and keep only the file name
  if ($cgi =~ m|/|) {
    my $dirname = dirname($cgi);
    chdir $dirname or die "Can't chdir to $dirname: $! \n";
    $cgi = basename($cgi);
  }
  
    # run the cgi from the script's directory
    # Note that we invoke warnings and Taint mode ON!!!
    # Each @INC entry needs its own -I flag: -I does not split
    # on ':' and PERL5LIB is ignored under -T anyway.
  system "$basedir/bin/perl", (map { "-I$_" } @INC), '-Tw', $cgi, @ARGV;


Log Rotation

A little bit off topic, but good to know and use with mod_perl, where your error_log can grow at a rate of 10-100Mb per day if your scripts spit out lots of warnings...

To rotate the logs do:

  mv access_log access_log.renamed
  kill -HUP `cat httpd.pid`
  sleep 10; # allow some children to complete requests and logging
  # now it's safe to use access_log.renamed
  .....

The effect of SIGUSR1 and SIGHUP is detailed in http://www.apache.org/docs/stopping.html.

I use this script:

  #!/usr/local/bin/perl -Tw
  
  # this script does a log rotation. Called from crontab.
  
  use strict;
  $ENV{PATH}='/bin:/usr/bin';
  
  ### configuration
  my @logfiles = qw(access_log error_log);
  umask 0;
  my $server = "httpd_perl";
  my $logs_dir = "/usr/local/var/$server/logs";
  my $restart_command = "/usr/local/sbin/$server/apachectl restart";
  my $gzip_exec = "/usr/bin/gzip";
  
  # a sortable timestamp: year.month.day-hour.min.sec
  # (localtime returns years since 1900 and months from 0)
  my ($sec,$min,$hour,$mday,$mon,$year) = localtime(time);
  my $time = sprintf "%04d.%02d.%02d-%02d.%02d.%02d",
                     $year+1900, $mon+1, $mday, $hour, $min, $sec;
  
  # with $^I set, reading each file through <> renames it to
  # file.$time and opens a new, empty file under the old name
  $^I = ".".$time;
  
  # rename log files
  chdir $logs_dir or die "Cannot chdir $logs_dir: $!";
  @ARGV = @logfiles;
  while (<>) {
    close ARGV;
  }
  
  # now restart the server so the logs will be restarted
  system $restart_command;
  
  # compress log files
  foreach (@logfiles) {
      system $gzip_exec, "$_.$time";
  }
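The $^I trick is compact but a little magical. If you prefer to spell it out, the sketch below does the same renaming with rename() and creates empty replacement files. rotate_logs() is a hypothetical helper; as with the script above, the server still has to be restarted afterwards so it reopens its logs.

```perl
use strict;
use warnings;

# Hypothetical helper: rename each log in $dir to "<log>.<suffix>" and
# leave an empty file under the old name, as the $^I loop above does.
# Missing logs are skipped.  Returns the names of the rotated files.
sub rotate_logs {
    my ($dir, $suffix, @logs) = @_;
    my @rotated;
    for my $log (@logs) {
        my $path = "$dir/$log";
        next unless -e $path;
        rename $path, "$path.$suffix"
            or die "Cannot rename $path: $!";
        open my $fh, '>', $path or die "Cannot create $path: $!";
        close $fh;
        push @rotated, "$path.$suffix";
    }
    return @rotated;
}

# Usage, e.g.:
#   use POSIX qw(strftime);
#   rotate_logs("/usr/local/var/httpd_perl/logs",
#               strftime("%Y.%m.%d-%H.%M.%S", localtime),
#               qw(access_log error_log));
```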

Randal L. Schwartz contributed this:

Cron fires off a setuid script called log-roller, which looks like this:

    #!/usr/bin/perl -Tw
    use strict;
    use File::Basename;
    
    $ENV{PATH} = "/usr/ucb:/bin:/usr/bin";
    
    my $ROOT = "/WWW/apache"; # names are relative to this
    my $CONF = "$ROOT/conf/httpd.conf"; # master conf
    my $MIDNIGHT = "MIDNIGHT";  # name of program in each logdir
    
    my ($user_id, $group_id, $pidfile); # will be set during parse of conf
    die "not running as root" if $>;
    
    chdir $ROOT or die "Cannot chdir $ROOT: $!";
    
    my %midnights;
    open CONF, "<$CONF" or die "Cannot open $CONF: $!";
    while (<CONF>) {
      if (/^User (\w+)/i) {
        $user_id = getpwnam($1);
        next;
      }
      if (/^Group (\w+)/i) {
        $group_id = getgrnam($1);
        next;
      }
      if (/^PidFile (.*)/i) {
        $pidfile = $1;
        next;
      }
      next unless /^ErrorLog (.*)/i;
      my $midnight = (dirname $1)."/$MIDNIGHT";
      next unless -x $midnight;
      $midnights{$midnight}++;
    }
    close CONF;
    
    die "missing User definition" unless defined $user_id;
    die "missing Group definition" unless defined $group_id;
    die "missing PidFile definition" unless defined $pidfile;
    
    open PID, $pidfile or die "Cannot open $pidfile: $!";
    <PID> =~ /(\d+)/;
    my $httpd_pid = $1;
    close PID;
    die "missing pid definition" unless defined $httpd_pid and $httpd_pid;
    kill 0, $httpd_pid or die "cannot find pid $httpd_pid: $!";
    
    
    for (sort keys %midnights) {
      defined(my $pid = fork) or die "cannot fork: $!";
      if ($pid) {
        ## parent:
        waitpid $pid, 0;
      } else {
        my $dir = dirname $_;
        ($(,$)) = ($group_id,$group_id);
        ($<,$>) = ($user_id,$user_id);
        chdir $dir or die "cannot chdir $dir: $!";
        exec "./$MIDNIGHT";
        die "cannot exec $MIDNIGHT: $!";
      }
    }
    
    kill 1, $httpd_pid or die "Cannot sighup $httpd_pid: $!";

And then individual MIDNIGHT scripts can look like this:

    #!/usr/bin/perl -Tw
    use strict;
    
    die "bad guy" unless getpwuid($<) =~ /^(root|nobody)$/;
    my @LOGFILES = qw(access_log error_log);
    umask 0;
    $^I = ".".time;
    @ARGV = @LOGFILES;
    while (<>) {
      close ARGV;
    }

Can you spot the security holes? Our trusted user base can't or won't. :) But these shouldn't be used in hostile situations.


Preventing mod_perl processes from going wild

Sometimes calling an undefined subroutine in a module can cause a tight loop that consumes all memory. Here is a way to catch such errors. Define an autoload subroutine:

  use Carp ();
  sub UNIVERSAL::AUTOLOAD {
    my $class = shift;
    return if $UNIVERSAL::AUTOLOAD =~ /::DESTROY$/; # not an error
    Carp::cluck("$class can't $UNIVERSAL::AUTOLOAD!");
  }

It will produce a nice error in error_log, giving the line number of the call and the name of the undefined subroutine.

Sometimes an error causes the server to write millions of lines into your error_log file and, within a few minutes, brings your server to its knees. For example, I have seen the error ``Callback called exit'' show up in my error_log file so many times that the file grew to 300 Mbytes in a few minutes. You should run a cron job to make sure this does not happen, and to take care of it if it does. Andreas J. Koenig runs this shell script every minute:

  S=`ls -s /usr/local/apache/logs/error_log | awk '{print $1}'`
  if [ "$S" -gt 100000 ] ; then
    mv  /usr/local/apache/logs/error_log /usr/local/apache/logs/error_log.old
    /etc/rc.d/init.d/httpd restart
    date | /bin/mail -s "error_log $S kB on inx" myemail@domain.com
  fi

Note that the script renames error_log before it restarts the server. Without that step the file would stay above the limit (ls -s reports the size in disk blocks, usually 1 kB each, which is why the email says kB), and the script would trigger a restart every minute. On my server I run a watchdog every five minutes, which restarts the server if it gets stuck. This always works, because when some mod_perl child process goes wild, the I/O it causes is so heavy that its sibling processes cannot serve requests normally. See Monitoring the Server for more hints.
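The same size check can be written in Perl with the -s file test, which returns the size in bytes rather than blocks. rotate_if_too_big() is a hypothetical helper, sketched under the assumption that moving the log to error_log.old is acceptable; restarting the server and sending the mail are left to the caller, as in the shell version.

```perl
use strict;
use warnings;

# Hypothetical helper: if $log is larger than $limit_kb kilobytes, move
# it to "$log.old" and return its size in kB; otherwise return 0.
# Note: -s gives bytes, whereas "ls -s" reports disk blocks.
sub rotate_if_too_big {
    my ($log, $limit_kb) = @_;
    my $bytes = -s $log or return 0;    # missing or empty file
    my $kb = int($bytes / 1024);
    return 0 if $kb <= $limit_kb;
    rename $log, "$log.old" or die "Cannot rename $log: $!";
    return $kb;
}
```

A cron job would then restart the server and send the alert only when the returned value is non-zero.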

Also check out daemontools at ftp://koobera.math.uic.edu/www/daemontools.html:

  ,-----
  | cyclog writes a log to disk. It automatically synchronizes the log
  | every 100KB (by default) to guarantee data integrity after a crash. It
  | automatically rotates the log to keep it below 1MB (by default). If
  | the disk fills up, cyclog pauses and then tries again, without losing
  | any data.
  `-----


The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me to improve this guide. If you have something to contribute please send it directly to me.

Written by Stas Bekman.
Last Modified at 12/18/1999
Use of the Camel for Perl is
a trademark of O'Reilly & Associates,
and is used by permission.

Correct Headers - A quick guide for mod_perl users



SYNOPSIS

As there is always more than one way to do it, I'm tempted to believe one must be the best. Hardly ever am I right.


The origin of this chapter

This chapter has been contributed to the Guide by Andreas Koenig. You will find the references and other related info at the bottom of this page. I'll try to keep it in sync with the master version, which resides on CPAN. If in doubt, always check CPAN for Apache::correct_headers.

If you have any questions regarding this specific document only, please refer to Andreas, since he is the guru on this subject. On any other matter please contact the mod_perl mailing list.


DESCRIPTION


1) Why headers

Dynamic Content is dynamic, after all, so why would anybody care about HTTP headers? Header composition is an often neglected task in the CGI world. Because pages are generated dynamically, you might believe that pages without a Last-Modified header are fine, and that an If-Modified-Since header in the browser's request can go by unnoticed. This laissez-faire principle gets in the way when you try to establish a server that is entirely driven by dynamic components and the number of hits is significant.

If the number of hits is not significant, don't bother to read this document.

If the number of hits is significant, you might want to consider what cache-friendliness means (you may also want to read [4]) and how you can cooperate with caches to increase the performance of your site. Especially if you use squid in accelerator mode (for helpful hints on squid, see [1]), you will have a strong motivation to cooperate with it. This document may help you do it correctly.


2) Which Headers

The HTTP standard (v 1.1 is specified in [3], v 1.0 in [2]) describes lots of headers. In this document, we only discuss those headers which are most relevant to caching.

I have grouped the headers into three groups: date headers, content headers, and the special Vary header.


2.1) Date related headers


2.1.1) Date

Section 14.18 of the HTTP standard deals with the circumstances under which you must or must not send a Date header. For almost everything a normal mod_perl user does, a Date header needs to be generated. But the mod_perl programmer doesn't have to take care of this header: the Apache server guarantees that it is sent.

In http_protocol.c the Date header is set according to $r->request_time. A modperl script can read, but not change, $r->request_time.


2.1.2) Last-Modified

Section 14.29 of the HTTP standard deals with this. The Last-Modified header is mostly used as a so-called weak validator. I'm citing two sentences from the HTTP specs:

  A validator that does not always change when the resource
  changes is a "weak validator."

  One can think of a strong validator as one that changes
  whenever the bits of an entity changes, while a weak value
  changes whenever the meaning of an entity changes.

This tells us that we should consider the semantics of the page we are generating and not the date when we are running. The question is: when did the meaning of this page last change? Let's imagine the document in question is a text-to-gif renderer that takes as input a font to use, background and foreground colors, and a string to render. Although the actual image is created on the fly, the semantics of the page are determined by the last time the script was changed, right?

Actually, a few more things are relevant: the semantics also change a little when you update one of the fonts that may be used, or when you update your ImageMagick or whatever program you use. It's something you should consider if you want to get it right.

If you have several components that compose a page, you should ask, for each component, when its semantic behaviour last changed, and then pick the maximum of those times.
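Picking the maximum is straightforward when each component is a file. The sketch below assumes the components are files on disk (the script itself, fonts, templates); newest_mtime() is a hypothetical helper, and which files you pass in is up to you. Its result is a UNIX time, the kind of value the methods described next accept.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: the newest modification time among all the
# component files of a page, or undef if none of them exist.
sub newest_mtime {
    my @files  = @_;
    my @mtimes = map { (stat $_)[9] } grep { -e $_ } @files;  # [9] = mtime
    return @mtimes ? max(@mtimes) : undef;
}
```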

mod_perl offers you two convenient methods to deal with this header: update_mtime and set_last_modified. These two methods, and several more, are not available in the normal mod_perl environment but get added silently when you require Apache::File. As of this writing, Apache::File comes without a manpage, so you have to read about it in Chapter 9 of [5].

update_mtime() takes a UNIX time as argument and sets Apache's request structure finfo.st_mtime to this value. It does so only when the argument is greater than an already stored finfo.st_mtime.

set_last_modified() sets the outgoing Last-Modified header to the string that corresponds to the stored finfo.st_mtime. If you pass a UNIX time to set_last_modified(), it calls update_mtime() with this argument first.

  use Apache::File;
  use Date::Parse;
  # Date::Parse parses RCS format, Apache::Util::parsedate doesn't
  $Mtime ||=
    Date::Parse::str2time(substr q$Date: 1999/08/14 06:21:32 $, 6);
  $r->set_last_modified($Mtime);


2.1.3) Expires and Cache-Control

Section 14.21 of the HTTP standard deals with the Expires header. The meaning of the Expires header is to determine a point in time after which this document should be considered out of date (stale). Don't confuse this with the very different meaning of the Last-Modified. The Expires header is useful to avoid unnecessary validation from now on until the document expires and it helps the recipient to clean up his stored documents. A sentence from the HTTP standard:

  The presence of an Expires field does not imply that the
  original resource will change or cease to exist at, before, or
  after that time.

So think before you pick the time after which you believe a resource should be regarded as stale. Most of the time I can determine an expected lifetime from ``now'', that is, the time of the request. I would not recommend hardcoding the expiry date, because if you forget that you did so and the date arrives, you will serve ``already expired'' documents that cannot be cached at all by anybody. If you believe a resource will never expire, read this quote from the HTTP specs:

  To mark a response as "never expires," an origin server sends an
  Expires date approximately one year from the time the response is
  sent. HTTP/1.1 servers SHOULD NOT send Expires dates more than one
  year in the future.

Now the code for the mod_perl programmer that wants to expire a document half a year from now:

  use HTTP::Date ();
  $r->header_out('Expires',
                 HTTP::Date::time2str(time + 180*24*60*60));

A very handy alternative to this computation is available in HTTP 1.1, the cache control mechanism. Instead of setting the Expires header you can specify a delta value in a Cache-Control header. You can do that by running just

  $r->header_out('Cache-Control', "max-age=" . 180*24*60*60);

which is, of course, much cheaper than the above, because perl computes the value only once at compile time and optimizes it away as a constant.

As this alternative is only available in HTTP 1.1, and old cache servers may not understand this header, it is advisable to send both headers. In this case the Cache-Control header takes precedence, so the Expires header is ignored by HTTP 1.1 compliant caches. Or you could go with an if/else clause:

  if ($r->protocol =~ /(\d\.\d)/ && $1 >= 1.1){
    $r->header_out('Cache-Control', "max-age=" . 180*24*60*60);
  } else {
    $r->header_out('Expires',
                   HTTP::Date::time2str(time + 180*24*60*60));
  }
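If you send both headers, as suggested above, it helps to compute them together. The sketch below builds the values for a given lifetime; http_date() and expiry_headers() are hypothetical helpers. http_date() writes the RFC 1123 date format that HTTP requires (the same format HTTP::Date::time2str returns) using fixed English names, so the result does not depend on the locale, and the lifetime is capped at one year, following the advice quoted above.

```perl
use strict;
use warnings;

my @DAY      = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON      = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my $ONE_YEAR = 365*24*60*60;

# RFC 1123 date in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT"
sub http_date {
    my ($time) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime $time;
    return sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT",
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec;
}

# Both freshness headers for a document that stays fresh for $lifetime
# seconds from $now (default: the current time), capped at one year.
sub expiry_headers {
    my ($lifetime, $now) = @_;
    $now = time unless defined $now;
    $lifetime = $ONE_YEAR if $lifetime > $ONE_YEAR;
    return (
        'Cache-Control' => "max-age=$lifetime",
        'Expires'       => http_date($now + $lifetime),
    );
}
```

Under mod_perl, each returned pair could then be passed to $r->header_out.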

If you restart your apache regularly, I'd save the Expires header in a global variable. Oh, well, this is probably over-engineered now.

If people are determined that their document shouldn't be cached, here is the easy way to set a suitable Expires header...

The call $r->no_cache(1) will cause Apache to generate an Expires header with the same content as the Date header in the response, so that the document ``expires immediately''. Don't set Expires with $r->header_out if you use $r->no_cache, because header_out takes precedence. The problem that remains is broken browsers that ignore Expires headers.

Currently, to avoid caching altogether, the following code

  my $headers = $r->headers_out;
  $headers->{'Pragma'} = $headers->{'Cache-control'} = 'no-cache';
  $r->no_cache(1);

works with the major browsers.


2.2) Content related headers


2.2.1) Content-Type

You are most probably familiar with Content-Type. Sections 3.7, 7.2.1 and 14.17 of the HTTP specs deal with the details. mod_perl has the content_type() method to deal with this header, as in

  $r->content_type("image/png");

Content-Type SHOULD be included in all messages according to the specs, and apache will generate one if you don't. It will be whatever is specified in the relevant DefaultType configuration directive or text/plain if none is active.


2.2.2) Content-Length

The Content-Length header, according to section 14.13 of the HTTP specs, is the number of octets in the body of a message. If it can be determined prior to sending, it is very useful to include it, for several reasons. The most important is that keepalive requests only work with responses that contain a Content-Length header. In mod_perl you can say

  $r->header_out('Content-Length', $length);

If you use Apache::File, you get the additional set_content_length method for the Apache class which is a bit more efficient than the above. You can then say:

  $r->set_content_length($length);
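Note that the length has to be counted in octets. With a Perl that has character strings (5.8 and later), length() counts characters, so a body containing non-ASCII text must be encoded before it is measured. A sketch, assuming the body is sent as UTF-8; body_and_length() is a hypothetical helper:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a character string as UTF-8 and return
# the octets together with their count, the value Content-Length needs.
sub body_and_length {
    my ($text) = @_;
    my $octets = encode('UTF-8', $text);
    return ($octets, length $octets);
}
```

Send exactly the returned octets as the body, and pass the returned length to set_content_length().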

The Content-Length header can have an important impact on caches by invalidating cache entries as the following citation of the specs explains:

  The response to a HEAD request MAY be cacheable in the sense that
  the information contained in the response MAY be used to update a
  previously cached entity from that resource. If the new field values
  indicate that the cached entity differs from the current entity (as
  would be indicated by a change in Content-Length, Content-MD5, ETag
  or Last-Modified), then the cache MUST treat the cache entry as
  stale.

So be careful never to send a wrong Content-Length, whether in response to a GET or to a HEAD request.



2.2.3) Entity Tags

An entity tag is a validator that can be used instead of, or in addition to, the Last-Modified header. It is a quoted string that identifies a particular version of a resource. An entity tag can be added to the response headers like so:

  $r->header_out("ETag","\"$VERSION\"");

Note: mod_perl offers the Apache::set_etag() method if you have loaded Apache::File. It is strongly recommended not to use this method unless you know what you are doing. set_etag() expects to be used in conjunction with a static request for a file on disk that has been stat()ed in the course of the current request. It is inappropriate and dangerous to use it for dynamic content.

By sending an entity tag you promise the recipient that you will not send the same ETag for the same resource again unless the content is equal to the content you are sending now (see below for what equality means).

The pros and cons of using entity tags are discussed in section 13.3 of the HTTP specs. For us mod_perl programmers that discussion can be summed up as follows:

There are strong and weak validators. Strong validators change whenever a single bit of the response changes. Weak validators change when the meaning of the response changes. Caches need strong validators to serve sub-range requests. Weak validators allow more efficient caching of equivalent objects. Digests such as MD5 or SHA of the response make good strong validators, but what we usually want, when we want to take advantage of caching, is a good weak validator.
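As an illustration of a strong validator, here is a minimal sketch that derives an entity tag from the MD5 digest of the exact response body, using Digest::MD5 (core since Perl 5.8, available from CPAN before that). The helper name strong_etag() is hypothetical, and it assumes the whole body is in memory as octets before the headers go out:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: a strong entity tag computed from the exact bytes
# of the response body. Changing any byte yields a different tag.
sub strong_etag {
    my ($body_octets) = @_;
    return '"' . md5_hex($body_octets) . '"';
}
```

It would be sent with $r->header_out('ETag', strong_etag($body)). Because any changed byte yields a different tag, it is a strong validator, which is also why it is usually too strict for output that changes only cosmetically.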

A Last-Modified time, when used as a validator in a request, can be strong or weak, depending on a couple of rules. Please refer to section 13.3.3 of the HTTP standard to understand these rules. This is mostly relevant for range requests, as this quote from section 14.27 explains:

  If the client has no entity tag for an entity, but does have a
  Last-Modified date, it MAY use that date in a If-Range header.

But it is not limited to range requests. Section 13.3.1 succinctly states that

  The Last-Modified entity-header field value is often used as a
  cache validator.

The fact that a Last-Modified date may be used as a strong validator can be pretty disturbing if we are in fact changing our output slightly without changing its semantics. To prevent this kind of misunderstanding between us and the cache servers in the response chain, we can send a weak validator in an ETag header. This is possible because the specs say:

  If a client wishes to perform a sub-range retrieval on a value for
  which it has only a Last-Modified time and no opaque validator, it
  MAY do this only if the Last-Modified time is strong in the sense
  described here.

In other words: by sending an ETag that is marked as weak, we prevent caches from using the Last-Modified header as a strong validator.

An ETag value is marked as a weak validator by prepending the string W/ to the quoted string; otherwise it is strong. In Perl this would look something like this:

  $r->header_out('ETag',"W/\"$VERSION\"");

Consider carefully which string you choose to act as a validator. You are on your own with this decision because...

  ... only the service author knows the semantics of a resource
  well enough to select an appropriate cache validation
  mechanism, and the specification of any validator comparison
  function more complex than byte-equality would open up a can
  of worms. Thus, comparisons of any other headers (except
  Last-Modified, for compatibility with HTTP/1.0) are never used
  for purposes of validating a cache entry.

If you are composing a message from multiple components, it may be necessary to combine some kind of version information for all components into a single string.
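One way to do this, sketched here under the assumption that each component has a version string (for example a template revision and a data timestamp), is to sort the versions, join them, and hash the result into a weak entity tag. The helper combined_weak_etag() and the component names are hypothetical:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: fold the version strings of all components that
# make up a page into a single weak entity tag. The tag changes whenever
# any component's version changes and stays stable otherwise.
sub combined_weak_etag {
    my (%versions) = @_;             # component name => version string
    # Sort the names so the tag does not depend on hash ordering.
    my $key = join "\0", map { "$_=$versions{$_}" } sort keys %versions;
    return 'W/"' . md5_hex($key) . '"';
}
```

The sort makes the tag independent of argument order, and the W/ prefix marks it as weak, so a cosmetic change that leaves all component versions alone keeps the same tag.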

If you are producing relatively big documents, or content that does not change frequently, you will most likely prefer a strong entity tag, thus giving caches a chance to transfer the document in chunks. (Anybody in the mood to add a chapter about ranges to this document?)



2.3) Content Negotiation

A particularly wonderful, but unfortunately not yet widely supported, feature introduced with HTTP/1.1 is content negotiation. Probably the most popular use of content negotiation is language negotiation. A user specifies in the browser preferences the languages he understands and how well he understands them. The browser includes these settings in an Accept-Language header when it sends the request, and the server then chooses, among several available representations of the document, the one that best fits the user's preferences. Content negotiation is not limited to language. Citing the specs:

  HTTP/1.1 includes the following request-header fields for enabling
  server-driven negotiation through description of user agent
  capabilities and user preferences: Accept (section 14.1), Accept-
  Charset (section 14.2), Accept-Encoding (section 14.3), Accept-
  Language (section 14.4), and User-Agent (section 14.43). However, an
  origin server is not limited to these dimensions and MAY vary the
  response based on any aspect of the request, including information
  outside the request-header fields or within extension header fields
  not defined by this specification.
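To make server-driven language negotiation concrete, here is a simplified sketch that picks the best of the available languages for a given Accept-Language header. It handles quality values, exact tags, primary-tag matches (a range en matches a tag en-gb) and the * wildcard, but ignores the finer points of the specs; best_language() is a hypothetical helper, not an Apache or mod_perl API:

```perl
use strict;
use warnings;

# Hypothetical helper: choose among @available for an Accept-Language
# header such as "de;q=0.9, en;q=0.5, *;q=0.1". Returns undef if none
# of the available languages is acceptable.
sub best_language {
    my ($header, @available) = @_;
    my %quality;                              # language range => q value
    for my $item (split /\s*,\s*/, $header || '') {
        my ($range, @params) = split /\s*;\s*/, $item;
        next unless defined $range && length $range;
        my $q = 1;                            # default quality is 1
        for (@params) { $q = $1 if /^q\s*=\s*([\d.]+)$/ }
        $quality{lc $range} = $q;
    }
    my ($best, $best_q) = (undef, 0);
    for my $lang (@available) {
        my $tag = lc $lang;
        (my $primary = $tag) =~ s/-.*//;      # "en-gb" -> "en"
        my $q = exists $quality{$tag}     ? $quality{$tag}
              : exists $quality{$primary} ? $quality{$primary}
              : exists $quality{'*'}      ? $quality{'*'}
              :                             0;
        ($best, $best_q) = ($lang, $q) if $q > $best_q;  # q=0 never wins
    }
    return $best;
}
```

With Accept-Language: de;q=0.9, en;q=0.5, best_language($header, 'en', 'de') returns 'de'. A response chosen this way should also carry a Vary header naming accept-language, as the next section explains.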



2.3.1) Vary

In order to signal to the recipient that content negotiation has been used to determine the best available representation for a given request, the server must include a Vary header that tells the recipient which of the request headers were used to determine it. So an answer may be generated like so:

  $r->header_out('Vary', join ", ", 'accept', 'accept-language',
                 'accept-encoding', 'user-agent');

Such a header may accompany a very cool page that greets a German user with something like (translated from German)

  Hello Kraut, your NutScrape does understand PNG, but
  unfortunately not GZIP.

but it has the side effect of being expensive for a caching proxy. As of this writing, squid (version 2.1PATCH2) does not cache resources that come with a Vary header at all. So unless you find a clever workaround, you won't enjoy your squid accelerator for these documents :-(



3) Requests

Section 13.11 of the specs states that the only two cacheable methods are GET and HEAD.



3.1) HEAD

Among the above recommended headers, the date-related ones (Date, Last-Modified, and Expires/Cache-Control) are usually easy to produce and thus should be computed for HEAD requests just the same as for GET requests.

The Content-Type and Content-Length headers should be exactly the same as would be supplied in response to the corresponding GET request. But as they can be expensive to compute, they may just as well be omitted: nothing in the specs forces you to compute them.

What is important for the mod_perl programmer is that the response to a HEAD request MUST NOT contain a message-body. The code in your mod_perl handler might look like this:

  # compute all headers that are easy to compute
  if ( $r->header_only ){ # currently equivalent to $r->method eq "HEAD"
    $r->send_http_header;
    return OK;
  }

If you are running a squid accelerator, it will be able to handle the whole HEAD request for you, but under some circumstances it may not be allowed to do so.



3.2) POST

The response to a POST request is not cacheable, due to an underspecification in the HTTP standards. Section 13.4 does not forbid caching of responses to POST requests, but no other part of the HTTP standard explains how such caching could be implemented, so we are in a vacuum here, and all existing caching servers therefore refuse to cache responses to POST requests. This may change if somebody does the footwork of defining the semantics of cache operations on POST. Note that some browsers, with their more aggressive caching, do cache responses to POST requests.

Note: if you are running a squid accelerator, be aware that it accelerates outgoing traffic but does not bundle incoming traffic, so if you have long POST requests, squid doesn't buy you anything. So always consider using GET instead of POST where possible.



3.3) GET

A normal GET is what we usually write our mod_perl programs for. Nothing special about it. We send our headers followed by the body.

But there is one case that needs a workaround to achieve better cacheability: we need to deal with the ``?'' in the rel_path part of the requested URI. Section 13.9 specifies that

  ... caches MUST NOT treat responses to such URIs as fresh unless
  the server provides an explicit expiration time. This specifically
  means that responses from HTTP/1.0 servers for such URIs SHOULD NOT
  be taken from a cache.

You may be tempted to believe that, since we are using HTTP/1.1 and sending an explicit expiration time, we're on the safe side. Unfortunately, reality is a little different. For quite a long time it has been a bad habit to misconfigure cache servers so that they treat all GET requests containing a question mark as uncacheable. People even used to mark as uncacheable everything that contained the string cgi-bin.

To work around this bug in people's heads, I have dropped the habit of calling my CGI directories cgi-bin, and I have written the following handler, which lets me work with CGI-like query strings without rewriting the software that deals with them, namely Apache::Request or CGI.pm.

  use Apache::Constants qw(DECLINED);

  sub handler {
    my($r) = @_;
    my $uri = $r->uri;
    my($u1,$u2);
    if ( ($u1,$u2) = $uri =~ / ^ ([^?]+?) ; ([^?]*) $ /x ) {
      # the first semicolon plays the role of the question mark
      $r->uri($u1);
      $r->args($u2);
    } elsif ( ($u1,$u2) = $uri =~ m/^(.*?)%3[Bb](.*)$/ ) {
      # protect against old proxies that escape the separators
      # whether they should or not (see HTTP standard section 5.1.2)
      $r->uri($u1);
      $u2 =~ s/%3B/;/gi;
      $u2 =~ s/%26/;/gi; # & becomes ;, which also separates arguments
      $u2 =~ s/%3D/=/gi;
      $r->args($u2);
    }
    return DECLINED;
  }

This handler must be installed as a PerlPostReadRequestHandler.

The handler takes any request that contains no question mark but one or more semicolons, and interprets the first semicolon as a question mark and everything after it as the query string. You can now replace the request

  http://foo.com/query?BGCOLOR=blue;FGCOLOR=red

with

  http://foo.com/query;BGCOLOR=blue;FGCOLOR=red

Thus it allows queries from ordinary forms submitted by a browser to coexist with predefined requests for the same resource. It has one minor bug: Apache doesn't allow percent-escaped slashes in such a query string, because to Apache it is still part of the URL path. So you must write

  http://foo.com/query;BGCOLOR=blue;FGCOLOR=red;FONT=/font/bla

and must not say

  http://foo.com/query;BGCOLOR=blue;FGCOLOR=red;FONT=%2Ffont%2Fbla
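To check the rewriting rules outside Apache, the URI logic of the handler above can be factored into a plain function. A minimal sketch; split_semicolon_uri() is a hypothetical name, and it returns the new path and query string instead of calling $r->uri and $r->args:

```perl
use strict;
use warnings;

# Hypothetical pure-Perl version of the handler's URI logic.
# Returns ($path, $args), or an empty list if the URI is left alone.
sub split_semicolon_uri {
    my ($uri) = @_;
    if ( my ($path, $args) = $uri =~ / ^ ([^?]+?) ; ([^?]*) $ /x ) {
        return ($path, $args);            # first ';' acts as the '?'
    }
    if ( my ($path, $args) = $uri =~ m/^(.*?)%3[Bb](.*)$/ ) {
        # the separators were percent-escaped by an old proxy
        $args =~ s/%3B/;/gi;
        $args =~ s/%26/;/gi;              # escaped '&' becomes ';'
        $args =~ s/%3D/=/gi;
        return ($path, $args);
    }
    return;
}
```

split_semicolon_uri('/query;BGCOLOR=blue;FGCOLOR=red') returns ('/query', 'BGCOLOR=blue;FGCOLOR=red'), the same split the handler performs.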



3.4) Conditional GET

A rather challenging request we mod_perl programmers can get is the conditional GET, which typically means a request with an If-Modified-Since header. The HTTP specs have this to say:

  The semantics of the GET method change to a "conditional GET"
  if the request message includes an If-Modified-Since,
  If-Unmodified-Since, If-Match, If-None-Match, or If-Range
  header field. A conditional GET method requests that the
  entity be transferred only under the circumstances described
  by the conditional header field(s). The conditional GET method
  is intended to reduce unnecessary network usage by allowing
  cached entities to be refreshed without requiring multiple
  requests or transferring data already held by the client.

So how can we reduce the unnecessary network usage in such a case? mod_perl makes it easy by giving you access to Apache's meets_conditions() method (added to the Apache class when you load Apache::File). You have to set up your Last-Modified (and possibly ETag) header before calling this method. If its return value is anything but OK, return from your handler with that value and you're done: Apache handles the rest for you. The following example is taken from [5]:

  if((my $rc = $r->meets_conditions) != OK) {
     return $rc;
  }
  #else ... go and send the response body ...
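For the If-Modified-Since case, the core of the decision is a date comparison. The following simplified sketch only illustrates that comparison; it is not how Apache implements meets_conditions(). It understands only the RFC 1123 date format, uses the core Time::Local module, and the name not_modified() is hypothetical:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MON = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
           Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical: true if a response last modified at $last_modified
# (epoch seconds) may be answered with 304 for the given
# If-Modified-Since header. Anything unparsable means "send it all".
sub not_modified {
    my ($ims, $last_modified) = @_;
    return 0 unless defined $ims;
    my ($d, $mon, $y, $H, $M, $S) =
        $ims =~ /^\w{3}, (\d\d) (\w{3}) (\d{4}) (\d\d):(\d\d):(\d\d) GMT$/
        or return 0;
    return 0 unless exists $MON{$mon};
    my $since = timegm($S, $M, $H, $d, $MON{$mon}, $y);
    return $last_modified <= $since ? 1 : 0;
}
```

When it returns true, a handler would answer 304 Not Modified without a body, which is what returning the value of meets_conditions() achieves for you.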

If you have a squid accelerator running, it will often handle the conditionals for you, and you can watch its extremely fast responses to such requests in its access.log: just grep for TCP_IMS_HIT/304. But as with a HEAD request, there are circumstances under which it may not be allowed to do so. That is why the origin server (the server you're programming) needs to handle conditional GETs as well, even if a squid accelerator is running.



3.5) Avoiding dealing with them

There is another approach to dynamic content that is possible with mod_perl. This approach is appropriate if the content changes relatively infrequently, if you expect lots of requests to retrieve the same content before it changes again and if it is much cheaper to test whether the content needs refreshing than it is to refresh it.

In this case a PerlFixupHandler can be installed for the relevant location. It tests whether the content is up to date. If so, it returns DECLINED and lets the Apache core serve the content from a file. Otherwise, it regenerates the content into the file, updates the $r->finfo status and again returns DECLINED so that Apache serves the updated file. Updating $r->finfo can be achieved by calling

  $r->filename($file); # force update of finfo

even if this seems redundant because the filename is already equal to $file. Setting the filename has the side effect of doing a stat() on the file. This is important because otherwise Apache would use the out-of-date finfo when generating the response headers.
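The freshness test and regeneration step of such a fixup handler could look like the following minimal sketch. It assumes the source data can report a modification time ($source_mtime) and that a code reference produces the content; refresh_if_stale() is a hypothetical helper. Writing to a temporary file and renaming it into place is a design choice, so that a concurrent request never serves a half-written file:

```perl
use strict;
use warnings;

# Hypothetical helper: make sure $file is at least as new as
# $source_mtime, regenerating it with $generate (a code reference that
# returns the content) only when it is missing or stale.
# Returns 1 if the file was regenerated, 0 if it was already fresh.
sub refresh_if_stale {
    my ($file, $source_mtime, $generate) = @_;
    my @st = stat $file;
    return 0 if @st && $st[9] >= $source_mtime;   # cheap test: mtimes

    # rename() is atomic on POSIX filesystems when both paths are on
    # the same filesystem, so readers see the old or the new file.
    my $tmp = "$file.tmp.$$";
    open my $fh, '>', $tmp or die "Cannot write $tmp: $!";
    print {$fh} $generate->() or die "Cannot write $tmp: $!";
    close $fh or die "Cannot close $tmp: $!";
    rename $tmp, $file or die "Cannot rename $tmp to $file: $!";
    return 1;
}
```

In the fixup handler you would call it, then $r->filename($file) as described above, and return DECLINED.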



References and other literature



[1]

Stas Bekman: Mod_perl Guide. http://perl.apache.org/guide/



[2]

T. Berners-Lee et al.: Hypertext Transfer Protocol -- HTTP/1.0, RFC 1945.



[3]

R. Fielding et al.: Hypertext Transfer Protocol -- HTTP/1.1, RFC 2616.



[4]

Martin Hamilton: Cachebusting - cause and prevention, draft-hamilton-cachebusting-01. Also available online at http://vancouver-webpages.com/CacheNow/



[5]

Lincoln Stein, Doug MacEachern: Writing Apache Modules with Perl and C, O'Reilly, 1-56592-567-X. Selected chapters are available online at http://www.modperl.com. Amazon page: http://www.amazon.com/exec/obidos/ASIN/156592567X/writinapachemodu/



VERSION

You're reading revision $Revision: 1.16 $ of this document, written on $Date: 1999/08/14 06:21:32 $



AUTHOR

Andreas Koenig, with helpful corrections, additions and comments from Ask Bjoern Hansen <ask@netcetera.dk>, Frank D. Cringle <fdc@cliwe.ping.de>, Eric Cholet <cholet@logilune.com>, Mark Kennedy <mark.kennedy@gs.com>, Doug MacEachern <dougm@pobox.com>, Tom Hukins <tom@eborcom.com>, Wham Bang <wham_bang@yahoo.com> and many others.



The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me to improve this guide. If you have something to contribute please send it directly to me.

Written by Stas Bekman.
Last Modified at 10/24/1999
Use of the Camel for Perl is
a trademark of O'Reilly & Associates,
and is used by permission.

mod_perl and Relational Databases




Why Relational (SQL) Databases

Nowadays millions of people surf the Internet, and there are millions of terabytes of data lying around. To manipulate that data, new techniques and technologies were invented. One of the major inventions was the relational database, which allows us to search and modify huge stores of data in very little time. We use SQL (Structured Query Language) to manipulate the contents of these databases.

When people started to use the web, they found that they needed to write web interfaces to their databases. CGI is the most widely used technology for building such interfaces. The main limitation of a CGI script driving a database is that its database connection is not persistent: on every request the CGI script has to initiate a connection to the database, and when the request is completed the connection is closed. Apache::DBI was written to remove this limitation. When you use it, you have a database connection which persists for the entire life of the process. So when your mod_perl script needs to use a database, Apache::DBI provides a valid connection immediately, and your script starts work right away without having to initiate a database connection first.

This is possible only with CGI scripts running under a mod_perl-enabled server, since in this model the child process does not quit when the request has been served.

It's almost as straightforward as it sounds; there are just a few things to know about, and we will cover them in this section.



Apache::DBI - Initiate a persistent database connection

This module initiates a persistent database connection. It is possible only with mod_perl.



Introduction

The DBI module can make use of the Apache::DBI module. When it loads, the DBI module tests whether the environment variable $ENV{GATEWAY_INTERFACE} starts with CGI-Perl and whether the Apache::DBI module has already been loaded. If so, the DBI module forwards every connect() request to the Apache::DBI module. Apache::DBI then looks for a database handle from a previous connect() request and uses the ping() method to test whether this handle is still valid. If both conditions are fulfilled, it simply returns that database handle.

If there is no appropriate database handle, or if the ping() method fails, Apache::DBI establishes a new connection and stores the handle for later reuse. When the script is run again by a child that is still connected, Apache::DBI just checks its cache of open connections by matching the connect() parameters (data source, username, password and attributes) against it. A matching connection is returned if available; otherwise a new one is initiated and returned.
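The caching behavior described above can be modelled in a few lines of plain Perl. This is a simplified sketch, not Apache::DBI's actual code: $connect and $ping stand in for DBI->connect and $dbh->ping so that it runs without a database, and cached_connect() is a hypothetical name. It shows why every connect() parameter, attributes included, forms part of the cache key:

```perl
use strict;
use warnings;

my %cache;    # one cache per process, i.e. per Apache child

# Hypothetical model of connection caching. $connect and $ping are code
# references standing in for DBI->connect and $dbh->ping.
sub cached_connect {
    my ($connect, $ping, $dsn, $user, $pass, $attr) = @_;
    $attr ||= {};
    # The key covers every connect() argument, attributes included, so
    # connections opened with different attributes are cached separately.
    my $key = join "\0", $dsn, $user, $pass,
                         map { "$_=$attr->{$_}" } sort keys %$attr;
    my $dbh = $cache{$key};
    # Reuse the cached handle only if it still answers; the check runs
    # inside eval so a dead connection cannot kill the request.
    return $dbh if $dbh && eval { $ping->($dbh) };
    return $cache{$key} = $connect->($dsn, $user, $pass, $attr);
}
```

Because the attributes are part of the key, a script that connects with an extra attribute such as LongReadLen gets a second cached connection, which is the effect described under "Opening connections with different parameters" below.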

There is no need to delete the disconnect() statements from your code. They won't do anything, because the Apache::DBI module overrides the disconnect() method with an empty one.

When should this module be used, and when shouldn't it?

You will want to use this module if you are opening several database connections to the server. Apache::DBI will make them persistent per child, so if you have ten children and each opens two different connections (with different connect() arguments), you will have twenty open, persistent connections in total. After the initial connect(), you will save the connection time for every subsequent connect() request from your DBI module. This can be a huge benefit for a server with a high volume of database traffic.

You must NOT use this module if you are opening a special connection for each of your users. Each connection will stay persistent and in a short time the number of connections will be so big that your machine will scream in agony and die.

If you want to use Apache::DBI but you have both situations on one machine, at the time of writing the only solution is to run two Apache/mod_perl servers, one which uses Apache::DBI and one which does not.



Configuration

After installing this module, the configuration is simple: add the following directive to httpd.conf

  PerlModule Apache::DBI

Note that it is important to load this module before any other Apache*DBI module and before the DBI module itself!

You can skip preloading DBI, since Apache::DBI does that. But there is no harm in leaving it in, as long as it is loaded after Apache::DBI.



Preopening DBI connections

If you want to make sure that a connection will already be opened when your script is first executed after a server restart, then you should use the connect_on_init() method in the startup file to preload every connection you are going to use. For example:

  Apache::DBI->connect_on_init
  ("DBI:mysql:myDB::myserver",
   "username",
   "passwd",
   {
    PrintError => 1, # warn() on errors
    RaiseError => 0, # don't die on error
    AutoCommit => 1, # commit executes immediately
   }
  );

As noted above, use this method only if all of Apache connects to the database server as one user (or as very few users).

Be warned though, that if you call connect_on_init() and your database is down, Apache children will be delayed at server startup, trying to connect. They won't begin serving requests until either they are connected, or the connection attempt fails. Depending on your DBD driver, this can take several minutes!



Debugging Apache::DBI

If you are not sure this module is working as advertised, enable debug mode in the startup script:

  $Apache::DBI::DEBUG = 1;

Starting with ApacheDBI-0.84, setting $Apache::DBI::DEBUG = 1 produces only minimal output. For a full trace, set $Apache::DBI::DEBUG = 2.

Another approach, which does the same, is to add to httpd.conf:

  PerlModule Apache::DebugDBI

After setting the DEBUG level you will see entries in the error_log both when Apache::DBI initializes a connection and when it returns one from its cache. Use the following command to view the log in real time (your error_log might be located at a different path; it is set by the ErrorLog directive in the Apache configuration files):

  tail -f /usr/local/apache/logs/error_log

I use an alias (in tcsh) so I do not have to remember the path:

  alias err "tail -f /usr/local/apache/logs/error_log"



Troubleshooting



The Morning Bug

The SQL server keeps a connection to the client open only for a limited period of time. Many developers were bitten by the so-called morning bug: every morning the first users of the site received a No Data Returned message, but after that everything worked fine. The error is caused by Apache::DBI returning a handle to an invalid connection (the server closed it because of a timeout), and the script dying on that error. The infamous ping() method was introduced to solve this problem, but people were still being bitten by it. Another solution was found: increase the timeout parameter when starting the SQL server. I start the MySQL server with the safe_mysqld script, so I have modified it to use this option:

  nohup $ledir/mysqld [snipped other options] -O wait_timeout=172800

172800 seconds is equal to 48 hours. This change solves the problem.

Note that as of version 0.82, Apache::DBI calls ping() inside an eval block. This means that if the handle has timed out, it should be reconnected automatically, avoiding the morning bug.



Opening connections with different parameters

When it receives a connection request, Apache::DBI will reuse an existing cached connection only if the new connection would be opened in exactly the same way as the cached one. If I have one script that sets LongReadLen and one that does not, Apache::DBI will make two different connections. So instead of having a maximum of 40 open connections (say, one per child with 40 children), I can end up with 80.

However, you are free to modify the handle immediately after you get it from the cache. So always initiate connections using the same parameters and set LongReadLen (or whatever) afterwards.



Cannot find the DBI handler

You must use DBI->connect() as in normal DBI usage to get your $dbh database handle. Using Apache::DBI does not eliminate the need to write proper DBI code. As the Apache::DBI man page states, you should program as if you are not using Apache::DBI at all. Apache::DBI will override the DBI methods where necessary and return your cached connection. Any disconnect() call will simply be ignored.



Apache::DBI does not work

Make sure you have it installed.

Make sure you configured mod_perl with EVERYTHING=1.

Use the example script eg/startup.pl (in the mod_perl distribution) and uncomment the line

  # use Apache::DebugDBI;

and adapt the connect string. Do not change anything in your scripts for use with Apache::DBI.



Skipping connection cache during server startup

Does your error_log look like this?

  10169 Apache::DBI PerlChildInitHandler
  10169 Apache::DBI skipping connection cache during server startup
  Database handle destroyed without explicit disconnect at
  /usr/lib/perl5/site_perl/5.005/Apache/DBI.pm line 29.

If so, you are trying to open a database connection in the parent httpd process. If you do, the children will each get a copy of this handle, causing clashes when the handle is used by two processes at the same time. Each child must have its own, unique, connection handle.

To avoid this problem, Apache::DBI checks whether it is called during server startup. If so the module skips the connection cache and returns immediately without a database handle.

Instead, use the Apache::DBI->connect_on_init() method in the startup file, so that each child opens its own connection.



Debugging code which deploys DBI

To log a trace of DBI statement execution, you must set the DBI_TRACE environment variable. The PerlSetEnv DBI_TRACE directive must appear before you load Apache::DBI and DBI.

For example if you use Apache::DBI, modify your httpd.conf with:

  PerlSetEnv DBI_TRACE "3=/tmp/dbitrace.log"
  PerlModule Apache::DBI

Replace 3 with the TRACE level you want. The traces from each request will be appended to /tmp/dbitrace.log. Note that the logs might interleave if requests are processed concurrently.

Within your code you can control trace generation with the trace() method:

  DBI->trace($trace_level)
  DBI->trace($trace_level, $trace_filename)

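0 disables the trace, 1 traces DBI method calls with their results or errors, and 2 generates detailed call trace information including parameters and return values. Higher levels add progressively more internal information from the DBI and the driver; see the DBI manpage for the exact descriptions.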



mysql_use_result vs. mysql_store_result.

Since many mod_perl developers use mysql as their preferred SQL engine, these notes explain the difference between mysql_use_result() and mysql_store_result(). The two influence the speed and size of the processes.

The DBD::mysql (version 2.0217) documentation includes the following snippet:

  mysql_use_result attribute: This forces the driver to use
  mysql_use_result rather than mysql_store_result. The former is
  faster and less memory consuming, but tends to block other
  processes. (That's why mysql_store_result is the default.)

Think about it in client/server terms. When you ask the server to spoon-feed you the data as you use it, the server process must buffer the data, tie up that thread, and possibly keep any database locks open for a long time. So if you read a row of data and ponder it for a while, the tables you have locked are still locked, and the server is busy talking to you every so often. That is mysql_use_result().

If you just suck down the whole dataset to the client, then the server is free to go about its business serving other requests. This results in parallelism since the server and client are doing work at the same time, rather than blocking on each other doing frequent I/O. That is mysql_store_result().

As the mysql manual suggests: you should not use mysql_use_result() if you are doing a lot of processing for each row on the client side. This can tie up the server and prevent other threads from updating the tables.



Some useful code snippets to be used with relational Databases

In this section you will find scripts, modules and code snippets to help you get started using relational databases with mod_perl scripts. Note that I work with mysql (http://www.mysql.com), so the code you find here will work out of the box with mysql. If you use some other SQL engine, it might work for you or it might need some changes. YMMV.



Turning SQL query writing into a short and simple task

Having to write many queries in my CGI scripts persuaded me to write a standalone module that saves me a lot of time in coding and debugging. It also makes my scripts much smaller and easier to read. I will present the module here, with examples following:

Notice the DESTROY block at the end of the module, which performs various cleanups and allows this module to be used under both mod_perl and mod_cgi. Note that you will not get the benefit of persistent database handles with mod_cgi.




The My::DB module

(Note that you will not find this on CPAN, at least not yet :)

  package My::DB;
  
  use strict;
  use 5.004;
  
  use DBI;
  
  use vars qw(%c);
  
  %c =
    (
       # DB debug
     #db_debug   => 1,
     db_debug  => 0,
  
     db => {
          DB_NAME      => 'foo',
          SERVER       => 'localhost',
          USER         => 'put_username_here',
          USER_PASSWD  => 'put_passwd_here',
         },
  
    );
  
  use Carp qw(croak verbose);
  #local $SIG{__WARN__} = \&Carp::cluck;
  
  # untaint the path by explicit setting
  local $ENV{PATH} = '/bin:/usr/bin';
  
  #######
  sub new {
    my $proto = shift;
    my $class = ref($proto) || $proto;
    my $self  = {};
  
      # connect to the DB, Apache::DBI takes care of caching the connections
      # save into a dbh - Database handle object
    $self->{dbh} = DBI->connect("DBI:mysql:$c{db}{DB_NAME}::$c{db}{SERVER}",
                               $c{db}{USER},
                               $c{db}{USER_PASSWD},
                               {
                                PrintError => 1, # warn() on errors
                                RaiseError => 0, # don't die on error
                                AutoCommit => 1, # commit executes immediately
                               }
                              )
      or croak "Cannot connect to database: $DBI::errstr\n";
  
      # we want to die on errors if in debug mode
    $self->{dbh}->{RaiseError} = 1 if $c{'db_debug'};
  
      # init the sth - Statement handle object
    $self->{sth} = '';
  
    bless ($self, $class);
  
    $self;
  
  } # end of sub new
  
  
  
  ######################################################################
                 ###################################
                 ###                             ###
                 ###       SQL Functions         ###
                 ###                             ###
                 ###################################
  ######################################################################
  
  # print debug messages
  sub d{
     # we want to print the trace in debug mode
    print "<DT><B>".join("<BR>", @_)."</B>\n" if $c{'db_debug'};
  
  } # end of sub d
  
  
  ######################################################################
  # return a count of matched rows, by conditions 
  #
  #  $count = sql_count_matched($table_name,\@conditions);
  #
  # conditions must be an array so we can pass more than one column with
  # the same name.
  #
  #  @conditions =  ( column => ['comp_sign','value'],
  #                  foo    => ['>',15],
  #                  foo    => ['<',30],
  #                );
  #
  # The sub knows automatically to detect and quote strings
  #
  ##########################
  sub sql_count_matched{
    my $self    = shift;
    my $table   = shift || '';
    my $r_conds = shift || [];
  
      # we want to print the trace in debug mode
    d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
  
      # build the query
    my $do_sql = "SELECT COUNT(*) FROM $table ";
    my @where = ();
    for(my $i=0;$i<@{$r_conds};$i=$i+2) {
      push @where, join " ",
        $$r_conds[$i],
        $$r_conds[$i+1][0],
        sql_quote(sql_escape($$r_conds[$i+1][1]));
    }
      # Add the where clause if we have one
    $do_sql .= "WHERE ". join " AND ", @where if @where;
  
    d("SQL: $do_sql");
  
      # do query
    $self->{sth} = $self->{dbh}->prepare($do_sql);
    $self->{sth}->execute();
    my ($count) = $self->{sth}->fetchrow_array;
  
    d("Result: $count");
  
    $self->{sth}->finish;
  
    return $count;
  
  } # end of sub sql_count_matched
  
  
  
  ######################################################################
  # return a single (first) matched value or undef, by conditions and
  # restrictions
  #
  # sql_get_matched_value($table_name,$column,\@conditions,\@restrictions);
  #
  # column is a name of the column
  #
  # conditions must be an array so we can pass more than one column with
  # the same name.
  #  @conditions =  ( column => ['comp_sign','value'],
  #                  foo    => ['>',15],
  #                  foo    => ['<',30],
  #                );
  # The sub automatically detects and quotes strings
  #
  # restrictions is a list of restrictions like ('order by email')
  #
  ##########################
  sub sql_get_matched_value{
    my $self    = shift;
    my $table   = shift || '';
    my $column  = shift || '';
    my $r_conds = shift || [];
    my $r_restr = shift || [];
  
      # we want to print the trace in debug mode
    d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
  
      # build the query
    my $do_sql = "SELECT $column FROM $table ";
  
    my @where = ();
    for(my $i=0;$i<@{$r_conds};$i=$i+2) {
      push @where, join " ",
        $$r_conds[$i],
        $$r_conds[$i+1][0],
        sql_quote(sql_escape($$r_conds[$i+1][1]));
    }
      # Add the where clause if we have one
    $do_sql .= " WHERE ". join " AND ", @where if @where;
  
      # restrictions (DONT put commas!)
    $do_sql .= " ". join " ", @{$r_restr} if @{$r_restr};
  
    d("SQL: $do_sql");
  
      # do query
    return $self->{dbh}->selectrow_array($do_sql);
  
  } # end of sub sql_get_matched_value
  
  
  
  
  ######################################################################
  # return the first matched row, by conditions and restrictions.
  # The row is stored into the @results_row array as
  # (value1,value2,...), or () if nothing matched
  #
  # sql_get_matched_row(\@results_row,$table_name,\@columns,\@conditions,\@restrictions);
  #
  # columns is a list of columns to be returned (username, fname,...)
  #
  # conditions must be an array so we can pass more than one column with
  # the same name.
  #  @conditions =  ( column => ['comp_sign','value'],
  #                  foo    => ['>',15],
  #                  foo    => ['<',30],
  #                );
  # The sub automatically detects and quotes strings
  #
  # restrictions is a list of restrictions like ('order by email')
  #
  ##########################
  sub sql_get_matched_row{
    my $self    = shift;
    my $r_row   = shift || [];
    my $table   = shift || '';
    my $r_cols  = shift || [];
    my $r_conds = shift || [];
    my $r_restr = shift || [];
  
      # we want to print the trace in debug mode
    d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
  
      # build the query
    my $do_sql = "SELECT ";
    $do_sql .= join ",", @{$r_cols} if @{$r_cols};
    $do_sql .= " FROM $table ";
  
    my @where = ();
    for(my $i=0;$i<@{$r_conds};$i=$i+2) {
      push @where, join " ",
        $$r_conds[$i],
        $$r_conds[$i+1][0],
        sql_quote(sql_escape($$r_conds[$i+1][1]));
    }
      # Add the where clause if we have one
    $do_sql .= " WHERE ". join " AND ", @where if @where;
  
      # restrictions (DONT put commas!)
    $do_sql .= " ". join " ", @{$r_restr} if @{$r_restr};
  
    d("SQL: $do_sql");
  
      # do query
    @{$r_row} = $self->{dbh}->selectrow_array($do_sql);
  
  } # end of sub sql_get_matched_row
  
  
  
  ######################################################################
  # return a ref to hash of single matched row, by conditions
  # and restrictions, or undef if nothing matched. The hash has
  # the form (column1 => value1, column2 => value2)
  #
  # sql_get_hash_ref($table_name,\@columns,\@conditions,\@restrictions);
  #
  # columns is a list of columns to be returned (username, fname,...)
  #
  # conditions must be an array so we can pass more than one column with
  # the same name.
  #  @conditions =  ( column => ['comp_sign','value'],
  #                  foo    => ['>',15],
  #                  foo    => ['<',30],
  #                );
  # The sub automatically detects and quotes strings
  #
  # restrictions is a list of restrictions like ('order by email')
  #
  ##########################
  sub sql_get_hash_ref{
    my $self    = shift;
    my $table   = shift || '';
    my $r_cols  = shift || [];
    my $r_conds = shift || [];
    my $r_restr = shift || [];
  
      # we want to print the trace in debug mode
    d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
  
      # build the query
    my $do_sql = "SELECT ";
    $do_sql .= join ",", @{$r_cols} if @{$r_cols};
    $do_sql .= " FROM $table ";
  
    my @where = ();
    for(my $i=0;$i<@{$r_conds};$i=$i+2) {
      push @where, join " ",
        $$r_conds[$i],
        $$r_conds[$i+1][0],
        sql_quote(sql_escape($$r_conds[$i+1][1]));
    }
      # Add the where clause if we have one
    $do_sql .= " WHERE ". join " AND ", @where if @where;
  
      # restrictions (DONT put commas!)
    $do_sql .= " ". join " ", @{$r_restr} if @{$r_restr};
  
    d("SQL: $do_sql");
  
      # do query
    $self->{sth} = $self->{dbh}->prepare($do_sql);
    $self->{sth}->execute();
  
    return $self->{sth}->fetchrow_hashref;
  
  } # end of sub sql_get_hash_ref
  
  
  
  
  
  ######################################################################
  # returns a reference to an array, matched by conditions and
  # restrictions, which contains one reference to array per row. If
  # there are no rows to return, returns a reference to an empty array:
  # [
  #  [array1],
  #   ......
  #  [arrayN],
  # ];
  #
  # $ref = sql_get_matched_rows_ary_ref($table_name,\@columns,\@conditions,\@restrictions);
  #
  # columns is a list of columns to be returned (username, fname,...)
  #
  # conditions must be an array so we can pass more than one column with
  # the same name. @conditions are concatenated with AND
  #  @conditions =  ( column => ['comp_sign','value'],
  #                  foo    => ['>',15],
  #                  foo    => ['<',30],
  #                );
  # results in
  # WHERE foo > 15 AND foo < 30
  #
  #  to get OR logic, pass an array ref of values (the groups are then ANDed)
  #  @conditions =  ( column => ['comp_sign',['value1','value2']],
  #                  foo    => ['=',[15,24] ],
  #                  bar    => ['=',[16,21] ],
  #                );
  # results in
  # WHERE (foo = 15 OR foo = 24) AND (bar = 16 OR bar = 21)
  #
  # The sub automatically detects and quotes strings
  #
  # restrictions is a list of restrictions like ('order by email')
  #
  ##########################
  sub sql_get_matched_rows_ary_ref{
    my $self    = shift;
    my $table   = shift || '';
    my $r_cols  = shift || [];
    my $r_conds = shift || [];
    my $r_restr = shift || [];
  
      # we want to print the trace in debug mode
    d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
  
      # build the query
    my $do_sql = "SELECT ";
    $do_sql .= join ",", @{$r_cols} if @{$r_cols};
    $do_sql .= " FROM $table ";
  
    my @where = ();
    for(my $i=0;$i<@{$r_conds};$i=$i+2) {
  
      if (ref $$r_conds[$i+1][1] eq 'ARRAY') {
          # multi condition for the same field/comparator to be ORed
        push @where, map {"($_)"} join " OR ",
        map { join " ", 
                $r_conds->[$i],
                $r_conds->[$i+1][0],
                sql_quote(sql_escape($_));
            } @{$r_conds->[$i+1][1]};
      } else {
          # single condition for the same field/comparator
        push @where, join " ",
        $r_conds->[$i],
          $r_conds->[$i+1][0],
          sql_quote(sql_escape($r_conds->[$i+1][1]));
      }
    } # end of for(my $i=0;$i<@{$r_conds};$i=$i+2
  
      # Add the where clause if we have one
    $do_sql .= " WHERE ". join " AND ", @where if @where;
  
      # restrictions (DONT put commas!)
    $do_sql .= " ". join " ", @{$r_restr} if @{$r_restr};
  
    d("SQL: $do_sql");
  
      # do query
    return $self->{dbh}->selectall_arrayref($do_sql);
  
  } # end of sub sql_get_matched_rows_ary_ref
  
  
  
  
  ######################################################################
  # insert a single row into a DB
  #
  #  sql_insert_row($table_name,\%data,$delayed);
  #
  # data is hash of type (column1 => value1 ,column2 => value2 , )
  #
  # $delayed: 1 => do delayed insert, 0 or none passed => immediate
  #
  # * The sub automatically detects and quotes strings
  #
  # * The insert can be delayed (INSERT DELAYED), so the user will not
  # wait until the insert is completed if many SELECT queries are running
  #
  ##########################
  sub sql_insert_row{
    my $self    = shift;
    my $table   = shift || '';
    my $r_data = shift || {};
    my $delayed = (shift) ? 'DELAYED' : '';
  
      # we want to print the trace in debug mode
    d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
  
      # build the query
    my $do_sql = "INSERT $delayed INTO $table ";
    $do_sql   .= "(".join(",",keys %{$r_data}).")";
    $do_sql   .= " VALUES (";
    $do_sql   .= join ",", sql_quote(sql_escape( values %{$r_data} ) );
    $do_sql   .= ")";
  
    d("SQL: $do_sql");
  
      # do query
    $self->{sth} = $self->{dbh}->prepare($do_sql);
    $self->{sth}->execute();
  
  } # end of sub sql_insert_row
  
  
  ######################################################################
  # update rows in a DB by condition
  #
  #  sql_update_rows($table_name,\%data,\@conditions,$delayed);
  #
  # data is hash of type (column1 => value1 ,column2 => value2 , )
  #
  # conditions must be an array so we can pass more than one column with
  # the same name.
  #  @conditions =  ( column => ['comp_sign','value'],
  #                  foo    => ['>',15],
  #                  foo    => ['<',30],
  #                ); 
  #
  # $delayed: 1 => do a low priority update (UPDATE LOW_PRIORITY),
  #           0 or none passed => immediate
  #
  # * The sub automatically detects and quotes strings
  #
  #
  ##########################
  sub sql_update_rows{
    my $self    = shift;
    my $table   = shift || '';
    my $r_data = shift || {};
    my $r_conds = shift || [];
    my $delayed = (shift) ? 'LOW_PRIORITY' : '';
  
      # we want to print the trace in debug mode
    d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
  
      # build the query
    my $do_sql = "UPDATE $delayed $table SET ";
    $do_sql   .= join ",", 
      map { "$_=".join "",sql_quote(sql_escape($$r_data{$_})) } keys %{$r_data};
  
    my @where = ();
    for(my $i=0;$i<@{$r_conds};$i=$i+2) {
      push @where, join " ",
        $$r_conds[$i],
        $$r_conds[$i+1][0],
        sql_quote(sql_escape($$r_conds[$i+1][1]));
    }
      # Add the where clause if we have one
    $do_sql .= " WHERE ". join " AND ", @where if @where;
  
  
    d("SQL: $do_sql");
  
      # do query
    $self->{sth} = $self->{dbh}->prepare($do_sql);
  
    $self->{sth}->execute();
  
  #  my ($count) = $self->{sth}->fetchrow_array;
  #
  #  d("Result: $count");
  
  } # end of sub sql_update_rows
  
  
  ######################################################################
  # delete rows from DB by condition
  #
  # sql_delete_rows($table_name,\@conditions);
  #
  # conditions must be an array so we can pass more than one column with
  # the same name.
  #  @conditions =  ( column => ['comp_sign','value'],
  #                  foo    => ['>',15],
  #                  foo    => ['<',30],
  #                );
  #
  # * The sub automatically detects and quotes strings
  #
  #
  ##########################
  sub sql_delete_rows{
    my $self    = shift;
    my $table   = shift || '';
    my $r_conds = shift || [];
  
      # we want to print the trace in debug mode
    d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
  
      # build the query
    my $do_sql = "DELETE FROM $table ";
  
    my @where = ();
    for(my $i=0;$i<@{$r_conds};$i=$i+2) {
      push @where, join " ",
        $$r_conds[$i],
        $$r_conds[$i+1][0],
        sql_quote(sql_escape($$r_conds[$i+1][1]));
    }
  
      # Must be very careful with deletes: if somehow @where is not set,
      # "DELETE FROM $table" deletes the whole contents of the table
    warn("Attempt to delete the whole table $table from DB!!!\n"), return unless @where;
  
      # Add the where clause
    $do_sql .= " WHERE ". join " AND ", @where;
  
    d("SQL: $do_sql");
  
      # do query
    $self->{sth} = $self->{dbh}->prepare($do_sql);
    $self->{sth}->execute();
  
  } # end of sub sql_delete_rows
  
  
  ######################################################################
  # executes the passed query and returns a reference to an array which
  # contains one reference per row. If there are no rows to return,
  # returns a reference to an empty array.
  #
  # $r_array = sql_execute_and_get_r_array($query);
  #
  #
  ##########################
  sub sql_execute_and_get_r_array{
    my $self     = shift;
    my $do_sql   = shift || '';
  
      # we want to print the trace in debug mode
    d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
  
    d("SQL: $do_sql");
  
    $self->{dbh}->selectall_arrayref($do_sql);
  
  } # end of sub sql_execute_and_get_r_array
  
  
  
  #
  #
  # return current date formatted for a DATE field type
  # YYYYMMDD
  #
  ############
  sub sql_date{
    my $self     = shift;
  
    my ($mday,$mon,$year) = (localtime)[3..5];
    return sprintf "%0.4d%0.2d%0.2d",1900+$year,++$mon,$mday;
  
  } # end of sub sql_date
  
  #
  #
  # return current date and time formatted for a DATETIME field type
  # YYYYMMDDHHMMSS
  #
  ############
  sub sql_datetime{
    my $self     = shift;
  
    my ($sec,$min,$hour,$mday,$mon,$year) = localtime();
    return sprintf "%0.4d%0.2d%0.2d%0.2d%0.2d%0.2d",1900+$year,++$mon,$mday,$hour,$min,$sec;
  
  } # end of sub sql_datetime
  
  
  # Quote the list of parameters. Parameters consisting entirely of
  # digits (i.e. integers) and the string NULL are left unquoted.
  # print join ", ", sql_quote("one",2,"three"); => 'one', 2, 'three'
  #############
  sub sql_quote{ map{ /^(\d+|NULL)$/ ? $_ : "\'$_\'" } @_ }
  
  # Escape the list of parameters: backslashes and single quotes are
  # prefixed with a backslash, as MySQL expects inside '...' strings.
  # We work on a copy of @_, since modifying $_ in place would alter the
  # caller's variables (or die when a read-only value is passed).
  ##############
  sub sql_escape{ my @a = @_; map { s/([\\'])/\\$1/g; $_ } @a }
  
  
  # DESTROY makes all kinds of cleanups if the functions were interrupted
  # before their completion and haven't had a chance to clean up.
  ###########
  sub DESTROY{
    my $self = shift;
  
    $self->{sth}->finish     if defined $self->{sth} and $self->{sth};
    $self->{dbh}->disconnect if defined $self->{dbh} and $self->{dbh};
  
  } # end of sub DESTROY
  
  # Don't remove
  1;
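
The methods above build SQL by interpolating values that have passed through sql_escape() and sql_quote(). DBI also supports placeholders: the statement contains ? marks and the values are passed separately to execute(), so the driver does the quoting. Below is a minimal sketch, assuming a hypothetical helper build_where() (not part of My::DB) that accepts the same @conditions format, including the ORed array refs, and returns the clause plus the values to bind.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of My::DB): turn the @conditions list
# used by My::DB into a WHERE clause with ? placeholders plus the list
# of values to bind. Column names and comparison operators are still
# interpolated, so they must come from trusted code, not user input.
sub build_where {
    my @conds = @_;
    my (@where, @bind);
    for (my $i = 0; $i < @conds; $i += 2) {
        my ($column, $spec) = @conds[$i, $i + 1];
        my ($op, $value) = @$spec;
        if (ref $value eq 'ARRAY') {
            # several values for one column/operator are ORed together
            push @where, '(' . join(' OR ', map { "$column $op ?" } @$value) . ')';
            push @bind, @$value;
        }
        else {
            push @where, "$column $op ?";
            push @bind, $value;
        }
    }
    my $sql = @where ? 'WHERE ' . join(' AND ', @where) : '';
    return ($sql, @bind);
}

my ($where, @bind) = build_where(foo => ['>', 15], bar => ['=', [16, 21]]);
print "$where\n";    # WHERE foo > ? AND (bar = ? OR bar = ?)
print "@bind\n";     # 15 16 21
```

With DBI, the returned clause would be appended to the statement passed to prepare(), and @bind handed to execute() on the resulting statement handle.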


My::DB Module's Usage Examples

To use My::DB in your script, you first have to create a My::DB object:

  use Carp qw(croak);
  my $db_obj = My::DB->new or croak "Can't initialize My::DB object\n";

Now you can use any of My::DB's methods. Assume that we have a table called tracker where we store the names of the users and what they are doing at any given moment (think of an online community program).

I will start with a very simple query--I want to know where the users are and produce statistics. tracker is the name of the table.

    # fetch the statistics of where users are
  my $r_ary = $db_obj->sql_get_matched_rows_ary_ref
    ("tracker",
     [qw(where_user_are)],
    );
  
  my %stats = ();
  my $total = 0;
  foreach my $r_row (@$r_ary){
    $stats{$r_row->[0]}++;
    $total++;
  }
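
The loop above only counts. One way to turn the counts into statistics is to report each location's share of the total. A small sketch; the sample rows are made up and stand in for what sql_get_matched_rows_ary_ref() might return.

```perl
use strict;
use warnings;

# Made-up sample rows standing in for the result of
# sql_get_matched_rows_ary_ref("tracker", [qw(where_user_are)]).
my $r_ary = [ ['chat'], ['forum'], ['chat'], ['mail'] ];

my %stats = ();
my $total = 0;
foreach my $r_row (@$r_ary) {
    $stats{ $r_row->[0] }++;
    $total++;
}

# Percentage of online users in each location, most popular first.
# With no rows %stats is empty, so there is no division by zero.
my %percent = map { $_ => 100 * $stats{$_} / $total } keys %stats;
foreach my $where (sort { $stats{$b} <=> $stats{$a} || $a cmp $b } keys %stats) {
    printf "%-10s %3d users (%5.1f%%)\n", $where, $stats{$where}, $percent{$where};
}
```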

Now let's count how many users we have (in table users):

  my $count = $db_obj->sql_count_matched("users");

Check whether a user exists:

  my $username = 'stas';
  my $exists = $db_obj->sql_count_matched
  ("users",
   [username => ["=",$username]]
  );

Check whether a user is online, and get the time since she went online (the since column of the tracker table records when a user went online; UNIX_TIMESTAMP() converts it to seconds since the epoch, so it can be compared with time()):

  my @row = ();
  $db_obj->sql_get_matched_row
  (\@row,
   "tracker",
   ['UNIX_TIMESTAMP(since)'],
   [username => ["=",$username]]
  );
  
  if (@row) {
    my $idle = int( (time() - $row[0]) / 60);
    return "Current status: Is Online and idle for $idle minutes.";
  }

A more complex query: we join two tables and want a reference to an array holding a slice of the matched rows (LIMIT $offset,$hits), sorted by username. Each row contains only those fields of the users table that are listed in @verbose_cols. Since no column conditions are passed, the join condition goes into the restrictions list as a literal WHERE clause. Then we print the rows out.

  my $r_ary = $db_obj->sql_get_matched_rows_ary_ref
    (
     "tracker STRAIGHT_JOIN users",
     [map {"users.$_"} @verbose_cols],
     [],
     ["WHERE tracker.username=users.username",
      "ORDER BY users.username",
      "LIMIT $offset,$hits"],
    );
  
  foreach my $r_row (@$r_ary){
    print ...
  }

Another complex query. The user ticks checkboxes, selects values from lists and types in strings to match; we process this input and build the @where array. Then we want both the number of matches and the matched rows themselves.

META: Add what the tables contain

  my @where = ();
    # Process the checkboxes - we turn them into a regular expression
  foreach (keys %search_keys) {
    next unless defined $q->param($_) and $q->param($_);
    my $regexp = "[".join("",$q->param($_))."]";
    push @where, ($_ => ['REGEXP',$regexp]);
  }
  
    # Add the items selected by the user from our lists
    # selected => exact match
  push @where,(country => ['=',$q->param('country')]) if $q->param('country');
  
    # Add the parameters typed by the user
  foreach (qw(city state)) {
    push @where,($_ => ['LIKE',$q->param($_)]) if $q->param($_);
  }
  
     # Count all that matched the query
  my $total_matched_users =  $db_obj->sql_count_matched
    (
     "users",
     \@where,
    );
  
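    # Now process the orderby; allow only known columns, since the
    # value is interpolated into the SQL unquoted
  my $orderby = $q->param('orderby') || 'username';
  $orderby = 'username' unless grep { $_ eq $orderby } @display_columns;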
  
     # Do the query and fetch the data
  my $r_ary = $db_obj->sql_get_matched_rows_ary_ref
  (
   "users",
   \@display_columns,
   \@where,
   ["ORDER BY $orderby",
    "LIMIT $offset,$hits"],
  );

sql_get_matched_rows_ary_ref can handle both ORed and ANDed parameters. The following example shows how to use OR on parameters.

This snippet is an implementation of a watchdog. Our users want to know when their colleagues go online. They register the usernames of the people they want to know about. We have to make two queries: one to get a list of usernames, the second to find out whether any of these users is online. In the second query we use the OR keyword.

  # check who we are looking for
  $r_ary = $db_obj->sql_get_matched_rows_ary_ref
    ("watchdog",
     [qw(watched)],
     [username => ['=',$username],
     ],
    );
  
    # put them into an array
  my @watched = map {$_->[0]} @{$r_ary};
  
  my %matched = ();
    # Does the user have some registered usernames?
  if (@watched) {
  
  # Try to fetch all the users who match the usernames exactly.
  # Put it into an array and compare it with a hash!
    $r_ary = $db_obj->sql_get_matched_rows_ary_ref
      ("tracker",
       [qw(username)],
       [username => ['=',\@watched],
       ]
      );
  
    $matched{$_->[0]} = 1 for @{$r_ary};
  }
  
  # Now %matched includes the usernames of the users who are being
  # watched by $username and currently are online.


The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me to improve this guide. If you have something to contribute please send it directly to me.

Written by Stas Bekman.
Last Modified at 12/17/1999
Use of the Camel for Perl is a trademark of O'Reilly & Associates, and is used by permission.

mod_perl and dbm files


Where and Why to use dbm files

If you need a light database, with an easy API, using simple key-value pairs to store and manipulate the records, this is a solution that should be amongst the first you consider. The maximum practical size of a dbm database depends on your hardware and the desired response times of course, but as a rough guide consider 5000 to 10000 records to be reasonable.

Some of the earliest databases implemented on Unix were dbm files, and many are still in use today. As of this writing the Berkeley DB is the most powerful dbm implementation.

With dbm, the whole database is rarely read into memory. Combine this feature with smart storage techniques, and dbm files can be manipulated much faster than their flat-file counterparts. Flat file databases can become very slow on insert, update and delete operations, especially when the number of records exceeds a couple of thousand. The situation is worse if you need to run a sort algorithm on a flat file.

Several different indexing algorithms can be used with dbm:

  • The HASH algorithm gives O(1) complexity for search and update, fast insert and delete, but a slow sort. (You have to do it yourself.)

  • The BTREE algorithm allows arbitrary key/value pairs to be stored in a sorted, balanced tree, so you can retrieve the pairs in sorted order without a separate sort step, but at the expense of much slower insert, update and delete operations than is the case with HASH.

  • The RECNO algorithm is more complicated, and enables both fixed-length and variable-length flat text files to be manipulated using the same key/value pair interface as in HASH and BTREE. In this case the key will consist of a record (line) number.
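
A small sketch of the practical difference between HASH and BTREE, assuming DB_File is installed: passing undef as the filename gives an in-memory database, which is enough to compare the key order of the two methods. The stored names are arbitrary sample data.

```perl
use strict;
use warnings;
use DB_File;    # also exports the O_* flags and $DB_HASH/$DB_BTREE

# An undef filename creates an in-memory database.
tie my %btree, 'DB_File', undef, O_RDWR|O_CREAT, 0666, $DB_BTREE
    or die "Cannot tie BTREE: $!";
tie my %hash,  'DB_File', undef, O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "Cannot tie HASH: $!";

for my $name (qw(stas doug ken gerald)) {
    $btree{$name} = length $name;
    $hash{$name}  = length $name;
}

# BTREE hands back the keys already sorted (lexically, by default)...
my @btree_keys = keys %btree;
# ...while with HASH you have to sort them yourself.
my @hash_keys  = sort keys %hash;

print "@btree_keys\n";
print "@hash_keys\n";

untie %btree;
untie %hash;
```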

Most often you will want to use the HASH method, but your choice depends very much on your application.

dbm databases are not limited to storing plain string values. They can store more complicated data structures with the help of the MLDBM module, which serializes each value (using Data::Dumper, Storable or FreezeThaw), so that arrays, hashes and other nested data structures can be stored and restored.
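
MLDBM itself comes from CPAN; the sketch below only illustrates the idea it automates: a nested structure is serialized into a string before it is stored in a dbm, and restored after it is fetched. It uses the core Storable and SDBM_File modules and a temporary file. Note that SDBM limits each key/value pair to about 1 KB, which is fine for this tiny record.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use Storable qw(freeze thaw);
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
tie my %db, 'SDBM_File', "$dir/users", O_RDWR|O_CREAT, 0640
    or die "Cannot tie $dir/users: $!";

# A nested structure cannot be stored in a dbm value directly...
my $record = { name => 'stas', langs => [qw(perl c)], online => 1 };

# ...so it is frozen into a string on the way in
$db{stas} = freeze($record);

# and thawed back into a data structure on the way out
my $copy = thaw($db{stas});
print "$copy->{name} knows @{ $copy->{langs} }\n";

untie %db;
```

MLDBM performs the freeze and thaw steps transparently on every store and fetch, so the code using the tied hash never sees the serialized strings.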

It is important to note that you cannot simply switch a dbm file from one storage algorithm to another. You have to copy the data out and back in: dump it to a flat file and then restore it using the new storage method, or, as the following script does, copy every record into a fresh database tied with the new method:

  #!/usr/bin/perl -w
  
  #
  # This script takes as parameters one or more Berkeley DB files stored
  # with the DB_BTREE algorithm. It keeps a backup of each one with a
  # .btree suffix, and creates in its place a database with the same
  # records, stored with the DB_HASH algorithm.
  #
  # Usage: btree2hash.pl filename(s)
  
  use strict;
  use DB_File;
  use File::Copy;
  
    # Do checks 
  die "Usage: btree2hash.pl filename(s)\n" unless @ARGV;
  
  foreach my $filename (@ARGV) {
  
    die "Can't find $filename: $!\n" unless -e $filename and -r $filename;
  
      # First backup the file
    move("$filename","$filename.btree") 
      or die "can't move $filename $filename.btree:$!\n";
  
    my %btree;
    my %hash;
  
      # tie both dbs (db_hash is a fresh one!)
    tie %btree , 'DB_File',"$filename.btree", O_RDWR|O_CREAT, 
        0660, $DB_BTREE or die "Can't tie %btree: $!";
    tie %hash ,  'DB_File',"$filename" , O_RDWR|O_CREAT, 
        0660, $DB_HASH  or die "Can't tie %hash: $!";
  
      # copy DB
    %hash = %btree;
  
      # untie
    untie %btree ;
    untie %hash ;
  }

Note that some dbm implementations come with other conversion utilities as well.


mod_perl and dbm

Where does mod_perl fit into the picture?

If you are using a read-only dbm file, you can make it work faster by keeping it open (tied) all the time, so when your CGI script wants to access the database it is already tied and ready to be used. This works with dynamic (read/write) databases as well, but then you need to use locking and data flushing to avoid data corruption.
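
A minimal sketch of keeping a read-only dbm tied between requests: the tied hash lives in a package variable, so under mod_perl it survives for as long as the child process does, and it is tied only on first use. The package name My::ReadOnlyDB, the function get_db() and the usage path are hypothetical, and SDBM_File stands in for whatever dbm flavour you use.

```perl
use strict;
use warnings;

package My::ReadOnlyDB;    # hypothetical package name

use Fcntl qw(O_RDONLY);
use SDBM_File;

# Package global: under mod_perl it persists across the requests
# served by the same child process.
our %DB;

# Tie the dbm file on the first call only; later calls reuse the
# already open handle. Read-only use only: no locking is done here.
sub get_db {
    my $file = shift;
    unless (tied %DB) {
        tie %DB, 'SDBM_File', $file, O_RDONLY, 0
            or die "Cannot tie $file: $!";
    }
    return \%DB;
}

package main;

# Usage from a handler or script (the path is made up):
#   my $db = My::ReadOnlyDB::get_db('/path/to/data');
#   print $db->{some_key};
```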

Although mod_perl and dbm can give huge performance gains to your CGI scripts, you should be very careful. You need to consider locking, and the consequences of die() and unexpected process deaths.

If your locking mechanism cannot handle dropped locks, a stale lock can deactivate your whole site. You can also enter a deadlock if two processes try to lock the same two databases at the same time, but in different orders: each process locks one database and then waits forever for the other, which is held by the other process. If all your processes ask for their DB locks in the same order, this situation cannot occur.
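
One way to follow that rule is to funnel every multi-database lock through a single helper that always locks in the same order, here simply sorted by name. A sketch, assuming external lock files named after each database (the same approach the DB_File::Lock module uses); lock_all() and unlock_all() are made-up names.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper (not part of any module in this guide): take
# exclusive locks on several databases through external "<db>.lock"
# files, always in sorted order. When every process locks this way,
# no two processes can each hold one lock while waiting for the other.
sub lock_all {
    my @dbs = @_;
    my @handles;
    for my $db (sort @dbs) {
        # '>>' creates the lock file if needed without truncating it
        open my $fh, '>>', "$db.lock" or die "Cannot open $db.lock: $!";
        flock($fh, LOCK_EX)           or die "Cannot lock $db.lock: $!";
        push @handles, $fh;
    }
    return @handles;    # the locks are held until these handles are closed
}

# Release the locks, most recently acquired first
sub unlock_all {
    close $_ for reverse @_;
}
```

Whatever order the caller lists the databases in, the sort makes the actual locking order the same in every process.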

If you modify the DB, you should make very sure that you flush the data and synchronize it, especially when the process serving your CGI unexpectedly dies. In general, your application should be tested very thoroughly before you put it into production to handle important data.


Locking dbm handlers

Let's make the lock status a global variable, so it will persist from request to request. When we request a lock, either READ (shared) or WRITE (exclusive), we first obtain the current lock status.

If we are making a READ lock request, it is granted as soon as the file becomes unlocked or if it is already READ locked. The lock status becomes READ on success.

If we make a WRITE lock request, it is granted as soon as the file becomes unlocked. The lock status becomes WRITE on success.
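
These rules match the semantics of shared and exclusive flock() locks. A small sketch shows them with non-blocking requests (LOCK_NB) on a temporary file. It assumes a system where Perl's flock() maps to flock(2), such as Linux or BSD, because there two separate open()s of the same file lock independently, even within one process; the three handles stand in for three processes.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

my (undef, $file) = tempfile(UNLINK => 1);

# Independent handles on the same file stand in for separate processes
open my $reader1, '<',  $file or die "open: $!";
open my $reader2, '<',  $file or die "open: $!";
open my $writer,  '>>', $file or die "open: $!";

# A READ (shared) lock is granted...
my $r1 = flock($reader1, LOCK_SH | LOCK_NB) ? 1 : 0;
# ...and so is a second READ lock while the first one is held,
my $r2 = flock($reader2, LOCK_SH | LOCK_NB) ? 1 : 0;
# but a WRITE (exclusive) lock is refused while any READ lock is held.
my $w_busy = flock($writer, LOCK_EX | LOCK_NB) ? 1 : 0;

# Once both readers let go, the writer gets its exclusive lock.
flock($reader1, LOCK_UN);
flock($reader2, LOCK_UN);
my $w_free = flock($writer, LOCK_EX | LOCK_NB) ? 1 : 0;

print "read1=$r1 read2=$r2 write_while_read=$w_busy write_after=$w_free\n";
```

Without LOCK_NB the writer would block instead of failing, which is exactly the situation where a steady stream of readers can starve it.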

The treatment of the WRITE lock request is most important.

If the DB is READ locked, a process that makes a WRITE request will poll until there are no reading or writing processes left. Lots of processes can successfully read the file, since they do not block each other. This means that a process that wants to write to the file (so first it needs to obtain an exclusive lock) may never get a chance to squeeze in. The following diagram represents a possible scenario where everybody can read but no one can write:

  [-p1-]                 [--p1--]
     [--p2--]
   [---------p3---------]
                 [------p4-----]
     [--p5--]   [----p5----]

The result is a starving process, whose request will time out, and which will fail to update the DB. This is a good reason not to cache the dbm handle with dynamic dbm files. Caching works perfectly with static dbm files, without any need to lock them at all.

Ken Williams solved the above problem with his Tie::DB_Lock module, which I will present in the next section.


Tie::DB_Lock

Tie::DB_Lock ties hashes to databases using shared and exclusive locks. This module, by Ken Williams, solves the problems raised in the previous section.

The main difference from what I have described above is that Tie::DB_Lock copies a dbm file on read. Reading processes do not have to keep the file locked while they read it, and writing processes can still access the file while others are reading. This works best when you have lots of long-duration reading, and a few short bursts of writing.

The drawback of this module is the heavy IO performed when every reader makes a fresh copy of the DB. With big dbm files this can be quite a disadvantage and can slow the server down considerably.

An alternative would be to have one copy of the dbm image shared by all the reading processes. This can cut the number of files that are copied, and puts the responsibility of copying the read-only file on the writer, not the reader. It would need some care to make sure it does not disturb readers when putting a new read-only copy into place.
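
A sketch of that writer-side step, assuming the writer already holds whatever lock protects the master file: it copies the master dbm to a temporary name in the same directory and then rename()s it over the published read-only copy. On POSIX filesystems rename() replaces the target atomically, so a new reader opens either the old copy or the complete new one, and readers that already have the old copy open keep reading it. The function and file names are hypothetical, and a dbm format that keeps its data in several files (like SDBM's .dir and .pag) would need each file published this way.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical writer-side helper (the name is made up): publish a
# fresh read-only snapshot of $master as $snapshot. The caller is
# assumed to hold the lock that protects $master while this runs.
sub publish_snapshot {
    my ($master, $snapshot) = @_;

    # Write the new copy under a temporary name in the same directory,
    # so that the rename() below stays on one filesystem.
    my $tmp = "$snapshot.tmp.$$";
    copy($master, $tmp) or die "Cannot copy $master to $tmp: $!";

    # rename() replaces $snapshot atomically: readers never see a
    # half-written file.
    unless (rename $tmp, $snapshot) {
        my $err = $!;
        unlink $tmp;
        die "Cannot rename $tmp to $snapshot: $err";
    }
    return 1;
}
```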


Locking techniques that work with dbm files


Flawed methods which must not be used

Caution: The locking methods suggested in the Camel book and in the DB_File man page (at least before version 1.72) are flawed. If you use them in an environment where more than one process can modify the dbm file, it can get corrupted! The following explains why this happens.

You may not use a tied file's filehandle for locking, since you get the filehandle after the file has already been tied. By then it's too late to lock: the database file is locked only after it is opened. When the database is opened, the first 4k (in my dbm library) are read and then cached in memory. Therefore, a process can open the database file, cache the first 4k, and then block while another process writes to the file. If the second process modifies the first 4k of the file, then when the original process gets the lock it has an inconsistent view of the database. If it writes using this view, it may easily corrupt the database on disk.

This problem can be difficult to trace because it does not cause corruption every time a process has to wait for a lock. One can do quite a bit of writing to a database file without actually changing the first 4k. But once you suspect this problem you can easily reproduce it by making your program modify the records in the first 4k of the DB.
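
The fix is to take the lock before the database is opened, on a separate lock file, and to release it only after untie(). A minimal sketch of that order, using the core SDBM_File module, a temporary directory and a hypothetical counter database; the DB_File::Lock module below packages the same idea.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use SDBM_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/counter";    # hypothetical database name

# 1. Lock a separate file first, so nothing has been read from the
#    database yet while we block waiting for the lock.
open my $lock, '>>', "$file.lock" or die "Cannot open $file.lock: $!";
flock($lock, LOCK_EX)             or die "Cannot lock $file.lock: $!";

# 2. Only now open (tie) the database; whatever it caches is current.
tie my %db, 'SDBM_File', $file, O_RDWR|O_CREAT, 0640
    or die "Cannot tie $file: $!";
$db{hits}++;
my $hits = $db{hits};

# 3. Untie first (flushing the data to disk), then release the lock.
untie %db;
close $lock;

print "hits=$hits\n";
```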


Lock on tie (only supported by a few operating systems)

On some operating systems, such as FreeBSD, it's possible to lock on tie:

  tie my %t, 'DB_File', $TOK_FILE, O_RDWR | O_EXLOCK, 0664;

and release the lock only by untying the file. Notice the O_EXLOCK flag, which is not available on all operating systems.
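
Since O_EXLOCK is not defined everywhere, a script that wants to use it can check at run time and fall back to an external lock file otherwise. A small sketch of the check only; the fallback itself is left out.

```perl
use strict;
use warnings;
use Fcntl ();

# Fcntl dies when asked for a constant the OS does not define, so
# probe for O_EXLOCK inside eval instead of importing it.
my $has_exlock = eval { Fcntl::O_EXLOCK(); 1 } ? 1 : 0;

print $has_exlock
    ? "O_EXLOCK is available: lock on tie\n"
    : "No O_EXLOCK here: use an external lock file instead\n";
```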


DB_File::Lock

Here is DB_File::Lock which does the locking by using an external lockfile. This allows you to gain the lock before the file is tied. Note that it's not yet on CPAN and so is listed here in its entirety. Note also that this code still needs some testing, so be careful if you use it on a production machine.

  package DB_File::Lock;
  require 5.004;
  
  use strict;
  
  BEGIN {
      # RCS/CVS compliant:  must be all one line, for MakeMaker
    $DB_File::Lock::VERSION = do { my @r = (q$Revision: 1.5 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r };
  
  }
  
  use DB_File ();
  use Fcntl qw(:flock O_RDWR O_CREAT);
  use Carp qw(croak carp verbose);
  use Symbol ();
  
  @DB_File::Lock::ISA    = qw( DB_File );
  %DB_File::Lock::lockfhs = ();
  
  use constant DEBUG => 0;
  
    # file creation permissions mode
  use constant PERM_MODE => 0660;
  
    # file locking modes
  %DB_File::Lock::locks =
    (
     read  => LOCK_SH,
     write => LOCK_EX,
    );
  
  # SYNOPSIS:
  # tie my %mydb, 'DB_File::Lock', $filepath, 
  #     ['read' || 'write', 'HASH' || 'BTREE']
  # while (my($k,$v) = each %mydb) {
  #   print "$k => $v\n";
  # }
  # untie %mydb;
  #########
  sub TIEHASH {
    my $class     = shift;
    my $file      = shift;
    my $lock_mode = lc(shift || 'read');
    my $db_type   = shift || 'HASH';
  
    croak "Unknown lock mode: [$lock_mode]. Valid modes are 'read' or 'write'"
      unless $lock_mode eq 'read' or $lock_mode eq 'write';
  
    # Critical section starts here if in write mode!
  
      # create an external lock
    my $lockfh = Symbol::gensym();
    open $lockfh, ">$file.lock" or die "Cannot open $file.lock for writing: $!\n";
    unless (flock $lockfh, $DB_File::Lock::locks{$lock_mode}) {
      croak "cannot flock: $lock_mode => $DB_File::Lock::locks{$lock_mode}: $!\n";
    }
  
    my $self = $class->SUPER::TIEHASH
      ($file,
       O_RDWR|O_CREAT,
       PERM_MODE,
       ($db_type eq 'BTREE' ? $DB_File::DB_BTREE : $DB_File::DB_HASH )
      ) or croak "Cannot tie $file: $!";
  
      # remove the package name in case re-blessing occurs
    (my $id = "$self") =~ s/^[^=]+=//;
  
      # cache the lock fh
    $DB_File::Lock::lockfhs{$id} = $lockfh;
  
    return $self;
  
  } # end of sub TIEHASH
  
  
  # DESTROY is automatically called when a tied variable
  # goes out of scope, on explicit untie() or when the program is
  # interrupted, e.g. with a die() call.
  # 
  # It unties the db by forwarding it to the parent class,
  # unlocks the file and removes it from the cache of locks.
  ###########
  sub DESTROY{
    my $self = shift;
  
    $self->SUPER::DESTROY(@_);
  
      # Now it is safe to unlock the file (close() releases the lock).
      # Since the object is gone, we remove its lock filehandle entry
      # from the cache.
    (my $id = "$self") =~ s/^[^=]+=//; # see 'sub TIEHASH'
    close delete $DB_File::Lock::lockfhs{$id};
  
      # Critical section ends here if in write mode!
  
    print "Destroying ".__PACKAGE__."\n" if DEBUG;
  
  }
  
  ####
  END {
    print "Calling the END from ".__PACKAGE__."\n" if DEBUG;
  
  }
  
  1;

And you use it like this. First, a simple tie, READ lock and untie:

  use DB_File::Lock ();
  my $dbfile = "/tmp/test";
  tie my %mydb, 'DB_File::Lock', $dbfile, 'read';
  print $mydb{foo} if exists $mydb{foo};
  untie %mydb;

You can even skip the untie() call: when %mydb goes out of scope, everything is done automatically. However, it is better to use the explicit call to make sure the critical section between lock and unlock is as short as possible. This is especially important when requesting an exclusive (write) lock.

The following example shows when it is convenient to skip the explicit untie(). We don't need to save an intermediate result. We just return, and the cleanup is done automatically.

  use DB_File::Lock ();
  my $dbfile = "/tmp/test";
  print user_exists("stas") ? "Yes" : "No";
  sub user_exists{
    my $username = shift || '';
  
    warn("No username passed\n"), return 0 unless $username;
  
    tie my %mydb, 'DB_File::Lock', $dbfile, 'read';
  
    # if we match the username return 1, else 0
    return $mydb{$username} ? 1 : 0;
  
  } # end of sub user_exists

Now let's write all the upper case characters and their respective ASCII values to a dbm file. Then we read the file back and print the contents of the database, unsorted.

  use DB_File::Lock ();
  my $dbfile = "/tmp/test";
  
    # write 
  tie my %mydb, 'DB_File::Lock', $dbfile,'write';
  for my $char ('A'..'Z') {
    $mydb{$char} = ord $char;
  }
  untie %mydb;
  
    # now, read them and printout (unsorted)
  tie %mydb, 'DB_File::Lock', $dbfile;
  while (my($k,$v) = each %mydb) {
    print "$k => $v\n";
  }
  untie %mydb;

If your CGI script is interrupted in the middle, the DESTROY method takes care of flushing any changes and unlocking the dbm file. This protects your database against corruption caused by an unclean program termination. (DESTROY cannot run if the process is killed by a signal it doesn't handle, such as SIGKILL.)
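Testing DB_File::Lock itself needs DB_File and a Berkeley DB library. The mechanism it relies on can be tried on its own, though: an external lock file whose lock is released from DESTROY while die() unwinds the stack. In this sketch, LockGuard and lock_is_free() are hypothetical names, not part of DB_File::Lock. The second lock attempt from the same process relies on flock(2) treating separate open() calls independently, as it does for local files on Linux and the BSDs:

  use strict;
  use warnings;
  use Fcntl qw(:flock);
  use File::Temp qw(tempdir);

  {
    # Hypothetical guard class: holds an exclusive flock() on a
    # lock file and releases it from DESTROY.
    package LockGuard;
    use Fcntl qw(:flock);

    sub new {
      my ($class, $file) = @_;
      open my $fh, '>', $file or die "Cannot open $file: $!";
      flock $fh, LOCK_EX       or die "Cannot lock $file: $!";
      return bless { fh => $fh }, $class;
    }

    sub DESTROY {
      my $self = shift;
      close($self->{fh});    # close() releases the flock() lock
    }
  }

  # True if an exclusive lock on $file could be taken right now.
  sub lock_is_free {
    my $file = shift;
    open my $fh, '>', $file or die "Cannot open $file: $!";
    return flock($fh, LOCK_EX | LOCK_NB) ? 1 : 0;   # $fh closes on return
  }

  my $lockfile = tempdir(CLEANUP => 1) . "/test.lock";

  my $locked_inside;
  eval {
    my $guard = LockGuard->new($lockfile);
    $locked_inside = lock_is_free($lockfile) ? 0 : 1;
    die "simulated failure\n";     # unwinding destroys $guard
  };
  my $free_after = lock_is_free($lockfile);

  print "locked while in use: ", ($locked_inside ? "yes" : "no"), "\n";
  print "free after die():    ", ($free_after    ? "yes" : "no"), "\n";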

[TOC]


The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me to improve this guide. If you have something to contribute please send it directly to me.

Written by Stas Bekman.
Last Modified at 12/19/1999
Use of the Camel for Perl is
a trademark of O'Reilly & Associates,
and is used by permission.
mod_perl guide: Debugging mod_perl




[TOC]


Curing The "Internal Server Error"

You have just installed a new CGI script, and when you try it out you see the grey screen of death saying "Internal Server Error"... Or, even worse, a script that has been running on a production server for a long time without problems suddenly starts showing the same grey screen now and then.

What are you going to do? How can you find out what the problem is? You have been coding in Perl for years, and whenever an error occurred you saw it in the same terminal window you started the script from. But when you work with a web server there is no terminal to look at, since in most cases the server has no terminal to send the error messages to.

Actually, the error messages don't disappear: they end up in the error_log file, whose location is set by the ErrorLog directive in httpd.conf. The default setting is generally:

  ErrorLog /usr/local/apache/logs/error_log

So whenever you see the "Internal Server Error", it's time to look at this file. That solves the first problem: where to look for error messages.

Seeing the error message doesn't always help you spot and fix the error. The message can be of immediate help, or of no help at all. How useful it is depends largely on the programmer's coding style.

Let's take as an example a function that opens a file passed to it as a parameter and does nothing with it. The first version of the code:

  my $r = shift;
  $r->send_http_header('text/plain');
  
  sub open_file{
    my $filename = shift || '';
    die "No filename passed!" unless $filename;
  
    open FILE, $filename or die;
  }
  
  open_file("/tmp/test.txt");

Assume that /tmp/test.txt doesn't exist, so open() fails. When we call this script from our browser, the browser shows an "Internal Server Error" message and we see the following error at the end of the error_log file:

  Died at /home/httpd/perl/test.pl line 9.

We can use the hint Perl kindly gave us to find where in the code die() was called. What we still don't know is which filename passed to this subroutine caused the program to terminate. When there is only one function call, as in the example above, finding the problematic file is trivial.

Now let's add two more open_file() calls and assume that of the three files only /tmp/test2.txt exists:

  open_file("/tmp/test.txt");
  open_file("/tmp/test2.txt");
  open_file("/tmp/test3.txt");

Suppose each of these calls gets executed. This could happen through separate requests, since die() aborts the script at the first failure. You will see the same error message twice:

  Died at /home/httpd/perl/test.pl line 9.
  Died at /home/httpd/perl/test.pl line 9.

Based on these error messages, can you tell which files your program failed to open? Probably not. Let's fix that by passing the name of the file in question to die():

  sub open_file{
    my $filename = shift || '';
    die "No filename passed!" unless $filename;
  
    open FILE, $filename or die "failed to open $filename";
  }
  
  open_file("/tmp/test.txt");

When we execute the above code, we see:

  failed to open /tmp/test.txt at /home/httpd/perl/test.pl line 9.

This makes a big difference, since now we know which file to check.

By the way, if you append a newline to the end of the message you pass to die(), Perl won't report the line number where the error happened. So if you write:

  open FILE, $filename or die "failed to open a file\n";

The error message in case of failure would be:

  failed to open a file

This gives you no debugging information at all. You learn neither which file failed nor where the error happened. Code like this is very hard to debug.

The warn() function is die()'s kinder sister: it logs the message without terminating the program. It behaves the same way. If you don't add a newline to the end of the message, the line number at which warn() was called is logged. Otherwise it isn't.
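The difference is easy to see if you catch the messages instead of sending them to error_log. This sketch traps die() with eval {} and warn() with a $SIG{__WARN__} handler. The exact file name and line number in the output depend on where you run it:

  use strict;
  use warnings;

  # Send warn() output into an array instead of STDERR (error_log).
  my @warnings;
  local $SIG{__WARN__} = sub { push @warnings, $_[0] };

  # die() inside eval {} leaves its message in $@.
  eval { die "failed to open a file" };
  my $die_plain = $@;       # location appended: "... at FILE line N.\n"
  eval { die "failed to open a file\n" };
  my $die_newline = $@;     # exactly "failed to open a file\n"

  warn "failed to open a file";     # location appended
  warn "failed to open a file\n";   # logged as is

  print "die:  $die_plain", "die:  $die_newline";
  print "warn: $warnings[0]", "warn: $warnings[1]";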

You might want to use warn() instead of die() if the failure to open the file isn't critical. Consider the following code:

  if(open FILE, $filename){
    # do something with file
  } else {
    warn "failed to open $filename";
  } 
  # more code here...

So we have improved our code to report the names of the problematic files, but we still don't know why open() failed. Let's try to improve the warn() example:

  if (-r $filename) {
    open FILE, $filename;
    # do something with file
  } else {
    warn "$filename doesn't exist or is not readable";
  } 

We see the warning in the error_log file:

  /tmp/test.txt doesn't exist or is not readable
  at /home/httpd/perl/test.pl line 9.

Now the message tells us the reason for the failure. We don't have to go back to the code to check what it was trying to do with the file: open it for reading, writing or something else. (The -r operator tests whether the file is readable.)

Explaining each possible failure that way could be a lot of work. But why reinvent the wheel, when the reason for the failure is already stored in the $! variable? Let's go back to the open_file() function:

  sub open_file{
    my $filename = shift || '';
    die "No filename passed!" unless $filename;
  
    open FILE, $filename or die "failed to open $filename: $!";
  }
  
  open_file("/tmp/test.txt");

We see:

  failed to open /tmp/test.txt: No such file or directory
  at /home/httpd/perl/test.pl line 9.

Now we have all the information we need to debug this problem. We know which line of code triggered die() and which file we tried to open. Last but not least, we know the reason, which the operating system gladly tells us through the $! variable.
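This version of open_file() uses a lexical filehandle and the three-argument form of open(). These idioms are swapped in here and are not the guide's original code. The sketch calls the function on a file that certainly doesn't exist. The expected reason string is taken from $! itself, because the wording of system error messages can differ between operating systems:

  use strict;
  use warnings;
  use Errno qw(ENOENT);
  use File::Temp qw(tempdir);

  # Same idea as open_file() above: name the file and include $!.
  sub open_file {
    my $filename = shift
      or die "No filename passed!";
    open my $fh, '<', $filename
      or die "failed to open $filename: $!";
    return $fh;
  }

  my $missing = tempdir(CLEANUP => 1) . "/test.txt";   # never created

  eval { open_file($missing) };
  my $error = $@;
  print $error;   # on Linux, e.g. "failed to open .../test.txt: No such file or directory at ..."

  # The text $! produces for ENOENT on this system, for comparison.
  my $enoent_text = do { local $! = ENOENT; "$!" };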

Now let's create /tmp/test.txt, so that it exists:

  % touch /tmp/test.txt
  % chmod 600 /tmp/test.txt

Now when we execute the latest version of the code, we see:

  failed to open /tmp/test.txt: Permission denied
  at /home/httpd/perl/test.pl line 9.

We see a different reason. The file belongs to me, not to the user nobody that the server runs as, and its permissions don't allow other users to read it. So the server has no permission to read the file.

Now you see that it's much easier to debug your code if you validate the return values of system calls and pass informative arguments to die() and warn(). The open() function is just one of the many system calls Perl provides for your convenience.

So now you can code and debug CGI scripts and modules as easily as the plain Perl scripts you used to execute from a shell.

[TOC]


Helping error_log to Help Us

It's a good idea to keep the error_log file open all the time in a dedicated terminal, with the help of tail -f:

  % tail -f /usr/local/apache/logs/error_log

This way you see all errors and warnings as soon as they happen.

Another tip is to create a shell alias to make the above command easier to run. In tcsh you would do:

  % alias err "tail -f /usr/local/apache/logs/error_log"

From then on, in the shell where you set the alias, typing err runs tail -f /usr/local/apache/logs/error_log. You want this alias to be available all the time, so put it into your .tcshrc file, or its equivalent if you don't use tcsh. Bash users would add alias err='tail -f /usr/local/apache/logs/error_log' to .bashrc.

[TOC]


The Importance of Warnings

Just like errors, Perl's warnings go to the error_log file, if they are enabled.

The code you write lives a dual life. In the first life it's being written, tested, debugged, improved, tested, debugged, rewritten, tested, debugged. In the second life it's being used, period.

The script spends a significant part of its first life on the developer's machine, with its personal god. The other part is spent on the production server, where the developer's creature is supposed to be perfect, since it was created in his own image...

So when you develop code, you want all the help in the world to spot possible problems, and that's why enabling warnings is a must. It's very important to get rid of all, or at least most, of the warnings that appear in the error_log file. Why?

  • If there are warnings, your code is not clean. If you wave them away, expect them to hit back on the production server when it's too late.

  • The other, no less important, reason is this. When each script invocation generates more than five lines of warnings, it's very hard to catch the real problems. You just cannot see them among all the warnings you believe are unimportant.

On the other hand, on a production server you really *want* to turn warnings off. There are good reasons for that:

  • There is no added value in having the same warning show up thousands of times, once per script invocation. If your code isn't very clean and generates even a single warning per invocation, a heavily loaded server will soon produce a huge error_log file. Imagine what happens when each request appends more than one warning. Eliminating warnings is supposed to be part of the development process, and should be done before the code goes live.

  • Enabling runtime warning checking has a small performance impact (in any perl script, not just under mod_perl).

mod_perl gives you a very simple solution to this warnings saga: don't enable warnings in your scripts unless you really have to. Let mod_perl control this mode globally. All it takes is a:

  PerlWarn On

directive added to httpd.conf on your development machine and having a:

  PerlWarn Off

directive at the live box's configuration file.

If there is a piece of code that generates warnings and you really want to disable them only in this code, you can do that too. The special variable $^W lets you turn the warnings mode on and off dynamically. Just enclose the code in a block and disable warnings within the scope of that block. The original value of $^W is restored on exit from the block.

  {
    local $^W = 0;
    # some code that generates warnings
  }

Again, unless you have a really good reason, for your own sake avoid this workaround.

Don't forget the local() operator. Without it, the change to $^W affects all the requests later processed by the same process, since $^W is a global variable.
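Here is a small sketch showing that local() confines the change. Warnings are turned on globally through $^W, roughly what PerlWarn On does for the whole server. The same uninitialized-value warning is then silenced inside the block and comes back after it. The script deliberately avoids use warnings, since lexical warnings take precedence over $^W:

  # No "use warnings" on purpose: lexical warnings would override $^W.
  use strict;

  my @caught;
  $SIG{__WARN__} = sub { push @caught, $_[0] };

  $^W = 1;                            # warnings on for the whole program
  my $undef;

  my $before = "value: " . $undef;    # warns: uninitialized value
  {
    local $^W = 0;                    # off only until the block ends
    my $inside = "value: " . $undef;  # silent
  }
  my $after = "value: " . $undef;     # $^W is 1 again, so this warns

  print scalar(@caught), " warnings caught; \$^W is $^W again\n";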

The diagnostics pragma can shed more light on errors and warnings, as you will see in a moment.

[TOC]


diagnostics pragma

This module extends the terse diagnostics normally emitted by both the perl compiler and the perl interpreter, augmenting them with the more explicative and endearing descriptions found in the perldiag manpage. Like the other pragmata, it affects the compilation phase of your program rather than merely the execution phase.

To use in your program as a pragma, merely invoke

    use diagnostics;

at the start (or near the start) of your program. This pragma turns on the -w mode as well.

Note that this pragma is generally useful in two situations. One is when you are new to Perl and want a better explanation of errors and warnings. The other is when you encounter a warning you've never seen before, e.g. one introduced in a newer version of Perl.

Leaving warnings On on a production server can eat your disk space quickly. With the diagnostics pragma, if your code generates warnings, you will run out of space about ten times faster. For each line of text generated by the plain warnings mode, diagnostics generates about ten times more.

The other reason is the huge performance overhead diagnostics adds compared with just having warnings On. Let's see some numbers. We will run the same benchmark twice, once with diagnostics disabled and once with it enabled, on a subroutine test_code that does nothing useful. It squares numbers in a loop, compares two strings numerically with ==, and assigns one string to another when the comparison succeeds. Neither "Hi" nor "Bye" is a number, so both evaluate to 0 in numeric context and the condition is always true. The wrong comparison operator is intentional, and you will understand why in a second. The rest of the code inside test_code was chosen arbitrarily.

  use Benchmark;
  use diagnostics;
  
  my $count = 10000;
  
  disable diagnostics;
  $t1 = timeit($count,\&test_code);
  
  enable  diagnostics;
  $t2 = timeit($count,\&test_code);
  
  print "Diagnostics off:",timestr($t1),"\n";
  print "Diagnostics on :",timestr($t2),"\n";
  
  sub test_code{
    for my $i (1..10) {
      my $j = $i**2;
    }
    $a = "Hi";
    $b = "Bye";
    if ($a == $b) {
      $c = $a;
    }
  }

For only a few lines of code we get:

  Diagnostics off: 2 wallclock secs ( 1.77 usr +  0.02 sys =  1.79 CPU)
  Diagnostics on :17 wallclock secs (13.16 usr +  0.08 sys = 13.24 CPU)

Result: the code runs about seven times slower with diagnostics enabled!

Now let's fix the comparison the way it should be, by replacing == with eq, so we get:

    $a = "Hi";
    $b = "Bye";
    if ($a eq $b) {
      $c = $a;
    }

and run the same benchmark again:

  Diagnostics off: 1 wallclock secs ( 1.43 usr +  0.01 sys =  1.44 CPU)
  Diagnostics on : 2 wallclock secs ( 1.41 usr +  0.01 sys =  1.42 CPU)

Amazing: now there is no overhead at all. Why is that? As it turns out, the diagnostics pragma slows things down only when the code generates warnings. Comparing non-numeric strings with == produced an "isn't numeric" warning on every call.

This was just a small example, and obviously you won't benchmark all your scripts to decide whether this pragma should be removed. Just remember to remove it when your code goes live.
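If you'd rather not edit the code before every release, one option is to load the pragma conditionally. This is a sketch that assumes a hypothetical MOD_PERL_DEV environment variable, which you set only on the development machine. The BEGIN block does at compile time what use diagnostics would do:

  use strict;
  use warnings;

  # MOD_PERL_DEV is a hypothetical variable, set only on the
  # development machine. This is "use diagnostics", made conditional.
  BEGIN {
    if ($ENV{MOD_PERL_DEV}) {
      require diagnostics;
      diagnostics->import;
    }
  }

  my $state = exists $INC{'diagnostics.pm'} ? "enabled" : "not loaded";
  print "diagnostics $state\n";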

[TOC]


Monitoring the error_log file

While debugging my mod_perl and general CGI code, I keep the error_log file open in a dedicated terminal window (xterm), so I can see errors and warnings as soon as they are appended to the file. I do it with:

  tail -f /usr/local/apache/logs/error_log

which shows the lines as they are added to the file.

You may be unable to access your error_log file because you cannot telnet to your machine. This is generally the case with ISPs that provide user CGI support but no telnet access. You might then want to use a CGI script I wrote that fetches the latest lines from the file, with the bonus of colored output for easier reading. You might need to ask your ISP to install this script for general use. See Watching the error_log file without telneting to the server.
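That script isn't reproduced here. As a rough idea of its core, without the colors, this sketch returns the last N lines of a file. last_lines() is a hypothetical name. It reads the whole file, which gets slow once error_log grows large:

  use strict;
  use warnings;

  # Return the last $n lines of $file. The whole file is read, but only
  # a sliding window of $n lines is kept in memory.
  sub last_lines {
    my ($file, $n) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @window;
    while (my $line = <$fh>) {
      push @window, $line;
      shift @window if @window > $n;   # drop the oldest line
    }
    close $fh;
    return @window;
  }

  # Hypothetical usage from a CGI script, with the default log path:
  # print last_lines('/usr/local/apache/logs/error_log', 25);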

[TOC]


Hanging processes: Detection and Diagnostics

Sometimes an httpd process might hang in the middle of processing a request. This can be caused by a bug in your code (e.g. the code is stuck in a while loop, blocked in some system call, or caught in a resource deadlock) or by something else. There are two things we want to know: when this happens and why.

Spinning httpds

To see where an httpd is "spinning", try adding this to your script or a startup file:

  use Carp ();
  $SIG{'USR1'} = sub {
    Carp::confess("caught SIGUSR1!");
  };

Then issue the command line:

  % kill -USR1 <spinning_httpd_pid>

[TOC]


An Example of the Code that Might Hang the Process

To give you an idea of what kind of bug might cause the code to hang, let's look at the following example. Your process has to gain a lock on some resource (e.g. a file) before it continues. So it makes an attempt, and if it fails (no lock gained), it sleep()s for a second and increments the counter of attempts:

  until(gain_lock()){
    $tries++;
    sleep 1;
  }

This loop may never end, either because many processes are competing for this resource or because of a deadlock. In a deadlock, two processes, X and Y, both need resources A and B to continue, while X holds A and Y holds B. Y cannot continue until X releases A, but X cannot release A until it gets B, which Y holds. Neither process can proceed, which is why this situation is called a deadlock.
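Neither cause ever stops the until() loop above, since its only exit is success. A common remedy is to bound the wait. In this sketch, gain_lock_with_limit() is a hypothetical name. It uses flock() with LOCK_NB, so each attempt returns immediately instead of blocking, and it gives up after a fixed number of tries:

  use strict;
  use warnings;
  use Fcntl qw(:flock);

  # Try to get an exclusive lock on $fh, retrying once a second,
  # but give up after $max_tries attempts instead of looping forever.
  sub gain_lock_with_limit {
    my ($fh, $max_tries) = @_;
    for my $try (1 .. $max_tries) {
      return 1 if flock $fh, LOCK_EX | LOCK_NB;  # never blocks
      sleep 1 if $try < $max_tries;              # no sleep after the last try
    }
    return 0;
  }

The caller can then log an error or return a "server busy" page instead of hanging. For a deadlock, giving up helps only if the caller also releases the locks it already holds, so the other process can proceed.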

A real-world situation that you may encounter very often is exclusive lock starvation. Generally two lock types are in use. A SHARED lock allows many processes to perform READ operations simultaneously. An EXCLUSIVE lock ensures access by a single process, which makes a safe WRITE operation possible.

You can lock any kind of resource. In our example we talk about files.

If there is a READ lock request, it is granted as soon as the file is either unlocked or already READ locked. The lock status becomes READ on success.

If there is a WRITE lock request, it is granted as soon as the file becomes unlocked. The lock status becomes WRITE on success.

What happens to the WRITE lock request is the most important part. If the file is READ locked, a process that requests to write will wait until no reading or writing processes are left. Many processes can read the file at the same time, since they do not block each other. This means that a process that wants to write to the file (first obtaining an exclusive lock) may never get a chance to squeeze in. The following diagram represents a possible scenario where everybody reads but no one can write:

  [-p1-]                 [--p1--]
     [--p2--]
   [---------p3---------]
                 [------p4-----]
     [--p5--]   [----p5----]

Let's look at some real code and see it in action. The following script imports the flock() related constants from the Fcntl module and opens the file that will be locked. It then defines two variables, $lock_type and $lock_type_verbose. If the first command line argument ($ARGV[0]) is defined and equal to w, they are set to LOCK_EX and EX, indicating that this process will try to gain a WRITE (exclusive) lock. Otherwise the two are set to LOCK_SH and SH, for a SHARED (read) lock.

Once the variables are set, we enter the never-ending while(1) loop. It locks the file in the mode set in $lock_type and reports success and the type of lock gained. Then it sleeps for a random period of 0 to 9 seconds and unlocks the file. Then the loop starts again.

  lock.pl
  -------------------
  #!/usr/bin/perl -w
  use Fcntl qw(:flock);
  
  $lock = "/tmp/lock";
  
  open LOCK, ">$lock" or die "Cannot open $lock for writing: $!";
  my $lock_type         = LOCK_SH;
  my $lock_type_verbose = 'SH';
  if (defined $ARGV[0] and $ARGV[0] eq 'w'){
    $lock_type         = LOCK_EX;
    $lock_type_verbose = 'EX';
  }
  
  while(1){
    flock LOCK, $lock_type or die "Cannot lock $lock: $!";
      # start of critical section
    print "$$: $lock_type_verbose\n";
    sleep int(rand(10));
      # end of critical section
    flock LOCK, LOCK_UN;
  }
  close LOCK;

Spawn a few of these scripts simultaneously. Make sure that the first processes to start are READ processes and that they are the majority. It's then very easy to see the starvation of the WRITE process. Start three read processes and one write process:

  % ./lock.pl r & ./lock.pl r & ./lock.pl r & ./lock.pl w &

You see something like:

  24233: SH
  24232: SH
  24232: SH
  24233: SH
  24232: SH
  24233: SH
  24231: SH
  24231: SH
  24231: SH

and not a single EX line... Only when you kill off the reading processes does the writer gain its lock. Note that this is a rough example, since I've used the sleep() function, which works in whole seconds. To emulate a real situation more closely, you would use the Time::HiRes module, which lets you sleep for fractions of a second.

The interval between lock and unlock is called a critical section. It should be kept as short as possible in terms of time, not in terms of the amount of code. As you just saw, a single sleep statement can make the critical section long.

To summarize: if you have a script that uses both READ and WRITE locks and the critical section isn't very short, the writing process might starve. After a while, the browser that initiated the request will time out the connection and abort the request. More likely, the user will press the Stop or Reload button before that happens. Since the process in question just waits, Apache has no way of knowing that the request was aborted. The process will hang until it gains the lock. Only when it attempts to write to the client's broken connection will Apache terminate the script.

So this was a single example of how the process can hang.

[TOC]


Detecting hanging processes

It's not easy to detect a hanging process. Plain system utilities like ps and top cannot tell you how long a request has been processed. The reason is that each Apache process serves many requests without quitting. System utilities can tell how long a process has been running since its creation. This information is useless in our case, since long-running Apache processes are normal and expected.

However there are a few approaches that can help to detect the hanging process.

If the hanging process demands lots of resources, it's quite easy to catch by monitoring the output of the top utility. You will see the same process in the first few lines of the automatically refreshed report. But often the hanging process uses few or almost no resources, e.g. when it is waiting for some event to happen.

Another easy case to spot is a process that trashes the error_log by writing millions of error messages there. Such a process generally uses lots of resources and can be spotted with top, as described above.

What we need are tools that report the status of the Apache processes. You can use either the mod_status module, usually accessed at the /server-status location, or the Apache::VMonitor module. Both tools report the number of requests processed by each Apache process. So you can watch the report for about 5-10 minutes. Look for a process whose count of processed requests stays the same while its status stays 'W' (sending a reply), which suggests that it hangs. But when you have about 50 processes, it's quite hard to spot such a process. So let's write a watchdog to do the work for us:

.....META??? Apache::SafeHang code
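That watchdog code isn't included here. As an illustration of the comparison it would make, this sketch takes two snapshots of per-process status and request counts, taken a few minutes apart. It lists the processes stuck in 'W' whose count hasn't changed. suspected_hangs() is a hypothetical name. The snapshot layout is also an assumption: you would first have to extract it from the mod_status or Apache::VMonitor report:

  use strict;
  use warnings;

  # Each snapshot maps a child PID to a hash like
  # { status => 'W', requests => 1234 } (assumed layout).
  sub suspected_hangs {
    my ($before, $after) = @_;
    my @suspects;
    for my $pid (sort { $a <=> $b } keys %$after) {
      my $old = $before->{$pid} or next;   # new child: nothing to compare
      my $new = $after->{$pid};
      push @suspects, $pid
        if $old->{status} eq 'W'
        && $new->{status} eq 'W'
        && $old->{requests} == $new->{requests};
    }
    return @suspects;
  }

A process that is legitimately busy with one very long request is flagged as well. So treat the result as a list of processes to inspect with the tools described in the next section, not as a list to kill.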

When you have a real problem and processes hang one after the other, the number of hanging processes eventually equals the value of the MaxClients directive. No more processes will be spawned, and from the user's point of view your service is down. A simple crontab watchdog that requests some very light script every minute or so can detect this, attempt to fix it, and notify the administrator. (See Monitoring the Server. A watchdog.)

In the watchdog you set a timeout appropriate for your service, somewhere between a few seconds and a minute. If the server fails to respond before the timeout expires, the watchdog has spotted trouble and attempts to restart the server. After the restart, the watchdog emails the administrator. The email reports, first, that there was a problem and, second, whether the restart was successful.
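Here is a minimal sketch of the check part of such a watchdog. It assumes plain IO::Socket::INET rather than any particular HTTP client module. server_responds() and the /perl/light.pl URL are hypothetical. Restarting the server and mailing the administrator are left out:

  use strict;
  use warnings;
  use IO::Socket::INET;
  use IO::Select;

  # Return 1 if an HTTP server at $host:$port starts answering a GET
  # for $path within $timeout seconds, 0 otherwise.
  sub server_responds {
    my ($host, $port, $path, $timeout) = @_;
    local $SIG{PIPE} = 'IGNORE';       # a reset connection must not kill us
    my $sock = IO::Socket::INET->new(
      PeerAddr => $host,
      PeerPort => $port,
      Proto    => 'tcp',
      Timeout  => $timeout,            # limits the connect() phase
    ) or return 0;
    print $sock "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
    # Wait at most $timeout seconds for the reply to start arriving.
    return 0 unless IO::Select->new($sock)->can_read($timeout);
    my $got = sysread($sock, my $buf, 5);
    return ($got && $buf eq 'HTTP/') ? 1 : 0;
  }

  # In a cron job:
  # server_responds('localhost', 80, '/perl/light.pl', 30)
  #   or warn "server did not respond in time\n";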

If you get such reports constantly, something is wrong with your web service and you should revise your code. Note that your server may also simply be overloaded, hit by more requests than it can handle. Requests are then queued and not processed for a while, which triggers the watchdog's alarm. If this is the case, you need to add more servers or memory, and possibly split your single machine into a cluster of web server machines.

[TOC]


Determination of the reason

Given the process's PID, there are two ways to find out where it's hanging. Depending on your operating system, you should have either the truss or the strace utility available. The usage is simple:

  % truss -p PID

or

  % strace -p PID

Replace PID with a process number you want to check on.

Let's write a program that hangs and deploy strace to find out the point it hangs at:

  hangme.pl
  ---------
  $|=1;
  my $r = shift;
  $r->send_http_header('text/plain');
  
  print "PID = $$\n";
  
  while(1){
    $i++;
    sleep 1;
  }

The reason this simple code hangs is obvious from its examination -- the program never breaks from the while loop. As you have noticed, I print the PID of the current process to the browser, to learn what process to look after. Of course in a real situation, you cannot do the same trick. In the previous section I have presented a few ways to detect the runaway processes and their PIDs.

I save the above code in a file and call it from the browser. Note that I've made STDOUT unbuffered with $|=1, so I immediately see the process ID. Once we make the request, the script prints its PID and, obviously, hangs. So we press the 'Stop' button, but the process continues to hang in this code. Isn't Apache supposed to detect the broken connection and abort the request? Yes and no. You will soon understand what's really happening.

First let's attach to the process and see what it's doing. I use the PID the script printed to the browser, which is 10045 in this case:

  % strace -p 10045
  
  [...truncated an identical output...]
  SYS_175(0, 0xbffff41c, 0xbffff39c, 0x8, 0) = 0
  SYS_174(0x11, 0, 0xbffff1a0, 0x8, 0x11) = 0
  SYS_175(0x2, 0xbffff39c, 0, 0x8, 0x2)   = 0
  nanosleep(0xbffff308, 0xbffff308, 0x401a61b4, 0xbffff308, 0xbffff41c) = 0
  time([940973834])                       = 940973834
  time([940973834])                       = 940973834
  [...truncated the identical output...]

That isn't what we expected to see, is it? These are system calls we don't see in our little example. What we actually see is how Perl translates our code into system calls. We know that our code hangs in this snippet:

  while(1){
    $i++;
    sleep 1;
  }

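We can then figure out what these calls do. In fact, all of them come from sleep 1. On Linux/i386, SYS_174 and SYS_175 are rt_sigaction and rt_sigprocmask, which the C library's sleep() uses around nanosleep(). The two time() calls are Perl measuring how long it slept, since sleep returns the number of seconds slept. $i++ only changes memory, so it makes no system calls and doesn't appear in the trace at all.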

In practice the situation is usually the opposite. You detect a hanging process, attach to it and watch the trace of calls it makes. If the process hangs waiting for something, e.g. blocking on a file lock request, you see its last call. From the trace you figure out what the process is actually doing, and possibly find the corresponding lines in your Perl code. For example, let's see how one process "hangs" while requesting an exclusive lock on a file that is exclusively locked by another process:

  excl_lock.pl
  ---------
  #!/usr/bin/perl -w
  use Fcntl qw(:flock);
  use Symbol;
  
  if ( fork() ) {
    my $fh = gensym;
    open $fh, ">/tmp/lock" or die "cannot open /tmp/lock $!";
    print "$$: I'm going to obtain the lock\n";
    flock $fh, LOCK_EX;
    print "$$: I've got the lock\n";
    sleep 20;
    close $fh;
  
  } else {
    my $fh = gensym;
    open $fh, ">/tmp/lock" or die "cannot open /tmp/lock $!";
    print "$$: I'm going to obtain the lock\n";
    flock $fh, LOCK_EX;
    print "$$: I've got the lock\n";
    sleep 20;
    close $fh;
  }

The code is simple. The process executing it forks a second process, and both do the same thing. Each generates a unique symbol to be used as a filehandle and opens the lock file for writing with it. It then locks the file in exclusive mode and sleeps for 20 seconds, pretending to do some lengthy operation. Finally it closes the lock file, which also unlocks it.

The gensym function is a courtesy of the Symbol module, from which the code imports it. The Fcntl module provides the symbolic constant LOCK_EX. It is imported with the :flock tag, along with the other flock() constants.

The code used by both processes is identical, therefore we cannot predict which one will get its hands on the lock file and succeed to lock it first, so we add print() statements to find out the PID of the blocking on lock request process.

When the above code is executed from the command line, we see that one of the processes gets the lock:

  % ./excl_lock.pl
  
  3038: I'm going to obtain the lock
  3038: I've got the lock
  3037: I'm going to obtain the lock

We see that process 3037 is blocking (waiting to get the lock), so we attach to it:

  % strace -p 3037
  
  about to attach c10
  flock(3, LOCK_EX

It's clear from the above trace that the process is waiting for an exclusive lock.

The more traces of different processes you watch, the easier it becomes to understand what is actually happening.
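If you would rather not have a process hang the way 3037 did, flock() also accepts the LOCK_NB flag (imported by the same :flock tag), which makes the call fail immediately instead of waiting. The following is a minimal sketch, not part of the original example: try_exclusive_lock() and the /tmp/lock.$$ file name are hypothetical, and it assumes a system where a flock() lock held by one process conflicts with one requested by another, as with flock(2) on Linux and BSD.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Symbol;

# Try to lock $path exclusively without waiting.
# Returns the locked handle on success (keep it to hold the lock),
# or undef if another process already holds a conflicting lock.
sub try_exclusive_lock {
    my $path = shift;
    my $fh = gensym;
    # ">>" creates the file if needed without truncating it
    open $fh, ">>$path" or die "cannot open $path: $!";
    return flock($fh, LOCK_EX | LOCK_NB) ? $fh : undef;
}

my $path = "/tmp/lock.$$";    # hypothetical lock file name
my $held = try_exclusive_lock($path) or die "could not lock $path";

my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    # child: the non-blocking attempt fails at once instead of hanging
    my $got = try_exclusive_lock($path);
    print "$$: ", ($got ? "got the lock" : "lock is busy, not waiting"), "\n";
    exit($got ? 0 : 1);
}
waitpid($pid, 0);
my $child_got_lock = ($? >> 8) == 0;
close $held;                  # releases the parent's lock
unlink $path;
```

The child reports that the lock is busy and exits right away; a real script would typically retry later or report an error rather than block.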

Another way to get a different kind of trace of running code is to use gdb (the GNU debugger) or another debugger. gdb should work on any platform the GNU development tools have been ported to. Its purpose is to let you see what is going on ``inside'' another program while it executes, or what another program was doing at the moment it crashed. In addition to the process ID, gdb needs the path to the binary the process is executing: for Perl code it's /usr/bin/perl (or wherever your perl lives), and for an httpd process it's the path to your httpd executable. Here are a few examples of using gdb to get a better understanding.

For example, let's go back to our last locking example, run it as before and attach to the process that didn't get the lock and is waiting:

  % gdb /usr/bin/perl 3037

Once the debugger has started, we run the where command to see the trace:

  (gdb) where
  #0  0x40131781 in __flock ()
  #1  0x80a5421 in Perl_pp_flock ()
  #2  0x80b148d in Perl_runops_standard ()
  #3  0x80592b8 in perl_run ()
  #4  0x805782f in main ()
  #5  0x400a6cb3 in __libc_start_main (main=0x80577c0 <main>, argc=2, 
      argv=0xbffff7f4, init=0x8056af4 <_init>, fini=0x80b14fc <_fini>, 
      rtld_fini=0x4000a350 <_dl_fini>, stack_end=0xbffff7ec)
      at ../sysdeps/generic/libc-start.c:78

Again, that's not what we expected to see, and it's a different kind of trace from the one strace gives. Frame #0 is the most recent call, the C library's flock() implementation. Its caller (#1) isn't the print() we might expect, but Perl_pp_flock(), Perl's internal implementation of the flock() operator. Reading the frames from the bottom up gives the chain of calls that led to the blocked flock(), which can be presented as:

  __libc_start_main
    main ()
      perl_run () 
        Perl_runops_standard ()
          Perl_pp_flock ()
            __flock ()

So the gdb backtrace alone seems less useful here than strace: if the code calls flock() in more than one place, the backtrace doesn't tell us which of them is blocking, while strace shows the sequence of system calls leading up to it, and that sequence helps us locate the corresponding lines in the code.

(META: the above is wrong - you can ask to display the previous command! What is it?)

Note that when you attach to a running process with a debugger, the program stops executing and control passes to the debugger. You can resume the normal run with the continue command, or execute the program step by step with the next and step commands typed at the gdb prompt (next steps over any function calls in the line, while step steps into them).

C/C++ debuggers are a very large topic that I won't cover in this document, but the gdb man page is a good place to start. You might also want to check out ddd (the Data Display Debugger), which provides a visual interface to gdb and other debuggers. It can even debug Perl programs!

For completeness, let's see the gdb trace of the httpd process that still hangs in the while(1) loop of the first example in this section.

  % gdb /usr/local/apache/bin/httpd 1005
  
  (gdb) where
  #0  0x4014a861 in __libc_nanosleep ()
  #1  0x4014a7ed in __sleep (seconds=1) at ../sysdeps/unix/sysv/linux/sleep.c:78
  #2  0x8122c01 in Perl_pp_sleep ()
  #3  0x812b25d in Perl_runops_standard ()
  #4  0x80d3721 in perl_call_sv ()
  #5  0x807a46b in perl_call_handler ()
  #6  0x8079e35 in perl_run_stacked_handlers ()
  #7  0x8078d6d in perl_handler ()
  #8  0x8091e43 in ap_invoke_handler ()
  #9  0x80a5109 in ap_some_auth_required ()
  #10 0x80a516c in ap_process_request ()
  #11 0x809cb2e in ap_child_terminate ()
  #12 0x809cd6c in ap_child_terminate ()
  #13 0x809ce19 in ap_child_terminate ()
  #14 0x809d446 in ap_child_terminate ()
  #15 0x809dbc3 in main ()
  #16 0x400d3cb3 in __libc_start_main (main=0x809d88c <main>, argc=1, 
      argv=0xbffff7e4, init=0x80606f8 <_init>, fini=0x812b33c <_fini>, 
      rtld_fini=0x4000a350 <_dl_fini>, stack_end=0xbffff7dc)
      at ../sysdeps/generic/libc-start.c:78

Just as before, we see the complete chain of calls leading to the one currently executing.

As you may have noticed, I still haven't explained why Apache didn't abort the request that hangs in the while(1) loop. The next section covers this case.

#=head1 Examples of strace (or truss) usage

#(META: below are some snippets of strace outputs from list's emails)

#[there was a talk about Streaming LWP through mod_perl and the topic was suggested optimal buffer size]

#Optimal buffer size depends on your system configuration, watch apache with strace -p (or truss) when it's sending a static file, here perlfunc.pod on my laptop (linux 2.2.7):

#  writev(4, [{``HTTP/1.1 200 OK\r\nDate: Wed, 02''..., 289}, {``=head1 NAME\n\nperlfunc - Perl b''..., 32768}], 2) = 33057
#  alarm(300) = 300
#  write(4, ``m. In older versions of Perl, i''..., 32768) = 32768
#  alarm(300) = 300
#  write(4, ``hout waiting for the user to hit''..., 32768) = 32768
#  alarm(300) = 300
#  write(4, ``>&STDOUT'') || die ``Can't dup ''..., 32768) = 32768
#  alarm(300) = 300
#  write(4, ``LEHANDLE is supplied. This has ''..., 32768) = 32768
#  alarm(300) = 300
#  write(4, ``ite,\nseek, tell, or eo''..., 25657) = 25657

[TOC]


Handling the 'User pressed Stop button' case

When a user presses the STOP or RELOAD button, Apache detects this event via a SIGPIPE signal (Broken pipe), stops the script's execution and performs all the cleanup it has to do. It's important to stress that SIGPIPE is triggered only when the process handling the broken connection attempts to send some data to the client (browser). If the script is doing some lengthy operation without writing anything to the client, it won't be stopped until the operation is completed and at least one character has been sent back to the client.

This changed in Apache 1.3.6: the server no longer catches SIGPIPE, and mod_perl handles broken connections much better. Here is a snippet from the Apache 1.3.6 CHANGES file:

  *) SIGPIPE is now ignored by the server core.  The request write
  routines (ap_rputc, ap_rputs, ap_rvputs, ap_rwrite, ap_rprintf,
  ap_rflush) now correctly check for output errors and mark the
  connection as aborted.  Replaced many direct (unchecked) calls to
  ap_b* routines with the analogous ap_r* calls.  [Roy Fielding]

Since Apache version 1.3.6:

  • $r->print returns true on success, false on failure (broken connection).

  • If you want the old SIGPIPE semantics, simply configure:

      PerlFixupHandler Apache::SIG
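The mechanism behind the new behavior can be seen outside Apache as well. This plain-Perl sketch ignores SIGPIPE the way the Apache 1.3.6 core does, closes the reading end of a pipe to simulate a user who pressed Stop, and then writes. Instead of the process being killed by the signal, the write fails with EPIPE, the kind of error a checked write routine can detect and report. The write_to_vanished_peer() name is hypothetical, and the pipe is only a stand-in for the client connection.

```perl
use strict;
use warnings;
use Errno qw(EPIPE);

# Write one NUL byte to a pipe whose reading end is already closed.
# With SIGPIPE ignored, the write fails with EPIPE instead of the
# signal terminating the process, so the caller can react to it.
sub write_to_vanished_peer {
    local $SIG{PIPE} = 'IGNORE';          # like the Apache >= 1.3.6 core
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    close $reader;                        # the "browser" has gone away
    my $written = syswrite($writer, "\0", 1);
    my $errno   = $! + 0;                 # save errno before close() can change it
    close $writer;
    return 'ok' if defined $written;
    return $errno == EPIPE ? 'EPIPE' : "error $errno";
}

print write_to_vanished_peer(), "\n";
```

On Linux and BSD systems this prints EPIPE.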
    

[TOC]


Detecting Aborted Connections

Let's use the knowledge we acquired earlier to trace the execution of the code and watch all the events as they happen.

Let's take a little script that obviously ``hangs'' the server:

  my $r = shift;
  $r->send_http_header('text/plain');
  
  print "PID = $$\n";
  $r->rflush;
  
  while(1){
    $i++;
    sleep 1;
  }

The script gets the request object $r by shift()ing it from the @_ argument list passed by the handler() subroutine (Apache::Registry does this magic, of course). Then the script sends a Content-type header, telling the client that we are going to send plain text.

We print a single line telling us the ID of the process that handles this request, which we need in order to run the tracing utility. Then we flush Apache's buffer, because otherwise we would never see the line: our output is very short, so the buffer wouldn't be flushed until it became full or the request was over. Since our script intentionally hangs, we have to force the flush.

Then we enter an endless while(1) loop, which just increments a dummy $i variable and sleeps for a second, over and over again.

Running strace -p PID, where PID is the process ID as printed to the browser, we see the following output printed every second:

  SYS_175(0, 0xbffff41c, 0xbffff39c, 0x8, 0) = 0
  SYS_174(0x11, 0, 0xbffff1a0, 0x8, 0x11) = 0
  SYS_175(0x2, 0xbffff39c, 0, 0x8, 0x2)   = 0
  nanosleep(0xbffff308, 0xbffff308, 0x401a61b4, 0xbffff308, 0xbffff41c) = 0
  time([941281947])                       = 941281947
  time([941281947])                       = 941281947

Let's leave strace running and press the STOP button now. Has anything changed? No, the same trace is printed every second. This means that Apache didn't detect the broken connection, which confirms that the script has to write something to trigger the SIGPIPE event.

Let's change the loop so that it writes a NULL (\0) character to the client on every iteration, so the broken connection can be detected as soon as the Stop button is pressed:

  while(1){
    $r->print("\0");
    last if $r->connection->aborted;
    $i++;
    sleep 1;
  }

We add a print() statement that sends a NULL character and then check whether the connection was aborted. If it was, we break out of the loop.

But if we run this script and strace it as before, we see that it still doesn't work. What's missing is a flush of the buffer. Here is the script with the flush added:

  my $r = shift;
  $r->send_http_header('text/plain');
  
  print "PID = $$\n";
  $r->rflush;
  
  while(1){
    $r->print("\0");
    $r->rflush;
  
    last if $r->connection->aborted;
  
    $i++;
    sleep 1;
  }

Let's watch strace's output on the running process and press the Stop button. We see:

  SYS_175(0, 0xbffff41c, 0xbffff39c, 0x8, 0) = 0
  SYS_174(0x11, 0, 0xbffff1a0, 0x8, 0x11) = 0
  SYS_175(0x2, 0xbffff39c, 0, 0x8, 0x2)   = 0
  nanosleep(0xbffff308, 0xbffff308, 0x401a61b4, 0xbffff308, 0xbffff41c) = 0
  time([941284358])                       = 941284358
  write(4, "\0", 1)                       = -1 EPIPE (Broken pipe)
  --- SIGPIPE (Broken pipe) ---
  select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
  time(NULL)                              = 941284358
  write(17, "127.0.0.1 - - [30/Oct/1999:13:52"..., 81) = 81
  gettimeofday({941284359, 39113}, NULL)  = 0
  times({tms_utime=9, tms_stime=8, tms_cutime=0, tms_cstime=0}) = 41551400
  close(4)                                = 0
  SYS_174(0xa, 0xbffff4e0, 0xbffff454, 0x8, 0xa) = 0
  SYS_174(0xe, 0xbffff46c, 0xbffff3e0, 0x8, 0xe) = 0
  fcntl(18, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}

Apache detects the broken pipe as we see from this snippet:

  write(4, "\0", 1)                       = -1 EPIPE (Broken pipe)
  --- SIGPIPE (Broken pipe) ---

Then Apache stops the script and does all the cleanup work, such as access logging:

  write(17, "127.0.0.1 - - [30/Oct/1999:13:52"..., 81) = 81

That's the line we see in the access_log file; 17 is this file's descriptor in this process. We will talk about cleanups shortly, since they are a critical issue for aborted scripts. But first let's see how we can make the code more generic.

Apache::SIG comes to our help: the following script doesn't need to check for aborted connections.

  use Apache::SIG ();
  Apache::SIG->set;
  
  my $r = shift;
  $r->send_http_header('text/plain');
  
  print "PID = $$\n";
  $r->rflush;
  while(1){
    $r->print("\0");
    $r->rflush;
    $i++;
    sleep 1;
  }

META: it kills the server!!! ???

Apache::SIG installs a SIGPIPE handler that stops the script's execution for us.

If you would like to log in your Apache access_log when a request was canceled by a SIGPIPE, you can declare Apache::SIG as a handler (any Perl*Handler that runs before PerlHandler will do, e.g. PerlFixupHandler), and you must also define a custom LogFormat in your httpd.conf, like this:

  PerlFixupHandler Apache::SIG
  LogFormat "%h %l %u %t \"%r\" %s %b %{SIGPIPE}e"

If the server has noticed that the request was canceled via a SIGPIPE, then the log line will end with 1, otherwise it will just be a dash.

[TOC]


The Importance of Cleanup Code

Now the question is what happens to the locked resources, if there are any. Will they be freed or not? If they are not, any script that uses these resources with the same advisory locking scheme will hang, waiting for the resource to be freed, something that will never happen.

Under mod_cgi this was a problem only if you used external lock files to indicate locks instead of flock() (there are systems where flock(2) is unavailable, and you can use Perl's emulation of this function). If the script was aborted between the lock and unlock code, and you didn't bother to write cleanup code that removes the locks, or code that breaks locks that are too old and presumably dead, you were in big trouble.

With mod_cgi you can create an END block, and put the cleanup code there:

  END{
    # some code that ensures that locks are removed
  }

When the script is aborted, Apache will run the END blocks. But if you use flock(), things are much simpler: when the process exits, all open files are closed, and closing a file also removes its lock.

Things are more complex with mod_perl. Files opened through global filehandles aren't closed automatically unless you explicitly close() them, since the processes don't exit when they finish handling a request. Let's see what problems we might encounter and possible solutions for them.
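For the external lock-file approach, the END block might look like this minimal sketch. It assumes lock files are created atomically with sysopen() and O_EXCL; acquire_lockfile(), release_lockfile() and @held_lockfiles are hypothetical names. Keep in mind that END blocks run on normal exit and after die(), but not when the process is killed by a signal it doesn't handle.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

my @held_lockfiles;    # lock files this process created and still owns

# Atomically create $lockfile; returns false if it already exists,
# i.e. some other process holds the lock.
sub acquire_lockfile {
    my $lockfile = shift;
    sysopen(my $fh, $lockfile, O_WRONLY | O_CREAT | O_EXCL) or return 0;
    print $fh "$$\n";          # record the owner's PID for debugging
    close $fh;
    push @held_lockfiles, $lockfile;
    return 1;
}

sub release_lockfile {
    my $lockfile = shift;
    unlink $lockfile;
    @held_lockfiles = grep { $_ ne $lockfile } @held_lockfiles;
}

# Runs when the script exits normally or via die(), so an aborted
# run does not leave stale lock files behind.
END {
    unlink @held_lockfiles if @held_lockfiles;
}
```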

[TOC]


Critical Section

I want to step aside briefly and discuss the ``critical section'' issue before we continue.

Let's start with a resource locking scheme. A schematic representation of a proper locking technique is as follows:

  1. lock a resource
     <critical section starts>
  2. do something with the resource
     <critical section ends>
  3. unlock the resource

If the locking is exclusive, only one process can hold the resource at any given time, which means that all the other processes have to wait, and this code becomes a bottleneck. That's why the section of the code where the resource is locked is called critical, and you must make it as short as possible.

In a shared locking scheme, where many processes can access the resource concurrently, it's just as important to keep the critical section short if some processes sometimes want an exclusive lock. This code uses a shared lock, but has a non-optimized critical section:
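The three steps can be wrapped in a small helper, so that the critical section is exactly the code you pass in and nothing more. This is a sketch under assumptions: with_lock() is a hypothetical name, the lock is taken on a separate lock file, and the lock is released by close() even if the code inside die()s.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Symbol;

# Run $code while holding a $mode lock (LOCK_SH or LOCK_EX) on $lockfile.
# Only $code is inside the critical section.
sub with_lock {
    my ($lockfile, $mode, $code) = @_;
    my $fh = gensym;
    open $fh, "+>>$lockfile" or die "cannot open $lockfile: $!";
    flock $fh, $mode or die "cannot lock $lockfile: $!";   # 1. lock
    my @result = eval { $code->() };                       # 2. critical section
    my $err = $@;
    close $fh;                                             # 3. unlock (close releases the lock)
    die $err if $err;                                      # re-throw after unlocking
    return wantarray ? @result : $result[0];
}
```

A call such as with_lock($lockfile, LOCK_SH, sub { ... }) keeps only the reading inside the lock; any processing of the data belongs after the call returns.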

  use Fcntl qw(:flock);
  use Symbol;
  my $fh = gensym;
  
  open $fh, "filename" or die "$!";
  flock $fh, LOCK_SH;
  
    # start critical section
  seek $fh, 0, 0;
  my @lines = <$fh>;
  for(@lines){
    print if /foo/;
  }
    # end critical section
  
  close $fh; # close unlocks the file

It opens the file for reading, locks it, rewinds to the start, reads all the lines in and prints the lines that contain the string foo. Once the file has been read we no longer need it open and locked, and the loop might take some time to complete, so we close the file earlier and move the loop after the point where the resource is freed:

  use Fcntl qw(:flock);
  use Symbol;
  my $fh = gensym;
  
  open $fh, "filename" or die "$!";
  flock $fh, LOCK_SH;
  
    # start critical section
  seek $fh, 0, 0;
  my @lines = <$fh>;
    # end critical section
  
  close $fh; # close unlocks the file
  
  for(@lines){
    print if /foo/;
  }

Here is another similar script, but this one uses an exclusive lock. It reads in a file and writes it back, prepending a few new lines of text to the head of the file.

  use Fcntl qw(:flock);
  use Symbol;
  my $fh = gensym;
  
  open $fh, "+>>filename" or die "$!";
  flock $fh, LOCK_EX;
  
    # start critical section
  seek $fh, 0, 0;
  my @add_lines =
    (
     qq{Complete documentation for Perl, including FAQ lists,\n},
     qq{should be found on this system using `man perl' or\n},
     qq{`perldoc perl'. If you have access to the Internet, point\n},
     qq{your browser at http://www.perl.com/, the Perl Home Page.\n},
    );
  
  my @lines = (@add_lines, <$fh>);
  seek $fh, 0, 0;
  truncate $fh, 0;
  print $fh @lines;
    # end critical section
  
  close $fh; # close unlocks the file

First let's explain how the code works. I will discuss in a minute why I used the Symbol module to generate the filehandle variables.

Since we want to read the file, modify it and write it back without anyone changing it on the way, we open it for reading and writing with +>> (you could get away with +< as well, as long as the file already exists; see perldoc -f open or the perlfunc manpage for more information about the open() function) and lock it with an exclusive lock. You cannot safely accomplish this task by first opening the file for reading and then reopening it for writing, since another process might change the file between the two stages.

Next the code prepares the lines of text it wants to prepend to the head of the file, and assigns them together with the content of the file to the @lines array. Now that the data is ready to be written back, the file is rewound to the start with seek() and truncate()d to zero size. Truncation is unnecessary in our case, since the file only grows, but it's a must if there is a chance that the file will shrink. It's better to always use truncate(), as you never know what changes your code might undergo in the future, and it won't be the operation you blame for a performance overhead.

Finally we write the data to the file and close it, which unlocks it as well. Did you notice that we created the text lines to be prepended as close as possible to the place where they are used? That follows the locality-of-code style, which is usually good, but here it makes the critical section longer. In such places you should sacrifice any style rules you're used to, in order to keep the critical section as short as possible. A corrected version of this script with a shorter critical section looks like this:

  use Fcntl qw(:flock);
  use Symbol;
  
  my @lines =
    (
     qq{Complete documentation for Perl, including FAQ lists,\n},
     qq{should be found on this system using `man perl' or\n},
     qq{`perldoc perl'. If you have access to the Internet, point\n},
     qq{your browser at http://www.perl.com/, the Perl Home Page.\n},
    );
  
  my $fh = gensym;
  open $fh, "+>>filename" or die "$!";
  flock $fh, LOCK_EX;
  
    # start critical section
  seek $fh, 0, 0;
  push @lines, <$fh>;
  
  seek $fh, 0, 0;
  truncate $fh, 0;
  print $fh @lines;
    # end critical section
  
  close $fh; # close unlocks the file

The difference is that the text lines are prepared before the file is locked, and the rest of the file is then appended to the @lines array, instead of creating a new array after locking and copying into it lines that were already available before the lock was taken.

[TOC]


Safe Resource Locking

Let's get back to the main issue of this section, which is safe locking.

If you don't make a habit of closing all the files you open, you will run into lots of trouble, unless you use the Apache::PerlRun handler, which does the cleanup for you. If you open a file but don't close it, you leak a file descriptor. Since the number of available file descriptors is finite, at some point you will run out of them and your service will stop working.

This is bad, but you can live with it until you run out of file descriptors, which of course happens much faster on a heavily used server. It is nothing compared to the trouble you get into if you lock files and forget to unlock or close them. Since close() always unlocks the file, you don't have to unlock files explicitly. But an unclosed file stays locked after your code has finished, and all the other scripts requesting a lock on the same resource (file) will wait indefinitely for it to become unlocked. Since that will never happen until the server is restarted, all these processes will hang. This is the offending code:

  use Fcntl qw(:flock);
  open IN, "+>>filename" or die "$!";
  flock IN, LOCK_EX;
  # do something
  # quit without closing and unlocking the file

OK, so let's add the close():

  use Fcntl qw(:flock);
  open IN, "+>>filename" or die "$!";
  flock IN, LOCK_EX;
    # start critical section
  # do something
    # end critical section
  # close and unlock the file
  close IN;

Is the code safe now? Unfortunately it is not. If the user aborts the request by pressing the Stop or Reload button in the middle of the critical section, the script may be aborted before it has had a chance to close() the file, which brings us back to the situation where we forgot to close the file in the first place.

What is the remedy for this poison? There are a few approaches to solving this problem. If you are running under Apache::Registry and friends, an END block will perform the cleanup work for you, the same way you might use it in scripts running under mod_cgi or in plain Perl scripts. Just add the cleanup code to this block and you are safe. If you are writing your own handlers, use register_cleanup(), which lets you register code to run after the request completes, similar to an END block; END blocks in handler code are executed only when the process exits, not after each request.

We will see a few examples later. Now I want to show a much easier safe locking solution. The problem we have encountered actually lies in the fact that filehandles like IN are global variables. If we could make them lexically scoped, all our worries would go away. Lexically scoped variables (declared with the my() operator) are automatically destroyed when they go out of scope, so when the script finishes, all its file-scoped lexical variables are destroyed. When the variable holding an open filehandle is destroyed, the file is automatically closed.

So if you use this technique to work with files, you don't even have to close them! You still want to close them as soon as possible, though, if you recall the critical section discussion. Besides safe file handling, lexically scoped filehandles protect you from name collisions: when you open more than one file, you must make sure you didn't use the same name somewhere else in the code for a file that might still be open. To see the risk of collisions, think of a subroutine that opens a file for you:

  sub open_file{
    my $filename  = shift;
    open FILE, ">$filename" or die "$!";
    return \*FILE;
  }

  my $fh1 = open_file("/tmp/x");
  my $fh2 = open_file("/tmp/y");
  print $fh1 "X";
  print $fh2 "Y";

Obviously this code doesn't do what you think it should. Instead of writing the character X to /tmp/x and Y to /tmp/y, you find that /tmp/x is empty and /tmp/y contains the string XY. Why? Because you used the same global variable, so the second call to open_file() opened a different file through the same handle. Since open_file() returns a reference to the filehandle, and it's the same global glob every time, both $fh1 and $fh2 point to it.

However, as you saw earlier, the Symbol module lets us generate unique filehandles that can be lexically scoped. Symbol::gensym() creates an anonymous glob and returns a reference to it. Such a glob reference can be used as a file or directory handle.

  use Fcntl qw(:flock);
  use Symbol;
  my $fh = gensym;
  open $fh, "+>>filename" or die "$!";
  flock $fh, LOCK_EX;
  # do something

Now the file will always be unlocked at the end of the request's processing. Instead of calling close() you might use a block:

  use Fcntl qw(:flock);
  use Symbol;
  {
    my $fh = gensym;
    open $fh, "+>>filename" or die "$!";
    flock $fh, LOCK_EX;
    # do something
  }
  # the file will be automatically closed and unlocked at this point

But this is not obvious to the reader of the code, so you might want to avoid this technique.

You can use the IO::* modules as well, such as IO::File or IO::Dir, but these are much bigger than the Symbol module, and worth using for opening files or directories only if you are already using them for the other features they provide. As a matter of fact, these modules use the Symbol module themselves. Examples of their usage:

  use IO::File;
  my $fh = new IO::File "> filename";
  # the rest is as before

and:

  use IO::Dir;
  my $dh = new IO::Dir "dirname";
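Going back to the open_file() example, here is a sketch of how it might be fixed with gensym(): every call returns a reference to a brand new anonymous glob, so the two files no longer share a handle. (On Perl 5.6 and later, open(my $fh, ...) with an undefined lexical achieves the same effect without Symbol.)

```perl
use strict;
use warnings;
use Symbol;

# Open $filename for writing through a fresh anonymous glob.
# Every call gets its own handle, so handles never collide.
sub open_file {
    my $filename = shift;
    my $fh = gensym;
    open $fh, ">$filename" or die "cannot open $filename: $!";
    return $fh;
}

my $fh1 = open_file("/tmp/x");
my $fh2 = open_file("/tmp/y");
print $fh1 "X";
print $fh2 "Y";
close $fh1;
close $fh2;
```

This time /tmp/x contains X and /tmp/y contains Y.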

[TOC]


Cleanup Code

Finally, let's see when we need special cleanup code. As you just saw, we have solved the problem of filehandles by scoping them lexically. But there are situations where you must write cleanup code. A tied dbm file is a good example.

A reminder: a dbm file is a simple database that lets you store pairs of keys and values. As of this writing, Berkeley DB is the most advanced dbm implementation, and lets you store key/value pairs using the HASH, BTREE and RECNO algorithms (refer to the DB_File man page for more info). The DB_File module provides a Perl interface to version 1.x of Berkeley DB (the BerkeleyDB module should handle the more recent Berkeley DB versions 2 and 3).

Working with dbm files is very simple, because through Perl's tie interface they are represented as plain hash variables and behave exactly like hashes. To access a dbm file you have to tie it first:

  use Fcntl qw(O_RDWR O_CREAT);
  use DB_File;
  my $filename = "/tmp/mydb";
  my %hash;
  tie %hash, 'DB_File', $filename, O_RDWR|O_CREAT, 0660, $DB_HASH
     or die "Can't tie %hash : $!";

The first argument to tie() is the hash variable we want the dbm file to be tied to. The following arguments are the name of the module that provides the interface to the dbm implementation we want to use (DB_File in our case), the name of the file the dbm resides in, the Fcntl flags, the file permissions and finally the access method to be used ($DB_HASH, $DB_BTREE or $DB_RECNO).

From now on we use %hash to read from and write to a dbm file, like:

  my $name = $hash{foo};
  $hash{foo} = "Larry Wall";

The only nuance is that when we modify the hash by assigning values, the changes aren't written to the file immediately but cached to improve performance. The cache buffers are flushed when they become full, when the sync() method is called on the database object, or when the hash is untied (closed). So if the program quits abnormally, the dbm file might get corrupted.

To untie the dbm file, you simply call:

  untie %hash;

To get access to the sync() method, you should keep the database object returned by tie():

  my $dbh = tie %hash, 'DB_File', $filename, O_RDWR|O_CREAT, 0660, $DB_HASH
     or die "Can't tie %hash : $!";

Now you can flush the cache with:

  $hash{foo} = "Larry Wall";
  $dbh->sync;

Important: if you have saved a copy of the object returned from tie(), the underlying database file will not be closed until both the tied variable is untied and all copies of the saved object are destroyed. We do it as follows:

  undef $dbh;
  untie %hash;

Of course, if some script modifies the dbm file's contents, you have to lock it exactly like any other resource. Refer to Locking dbm handlers for more info.

OK, enough introduction, let's get to the point. Since both %hash and $dbh are lexically scoped variables, they will always be destroyed, no matter whether you forgot to untie() or the request was aborted before the untie() part.

Suppose that you want to take advantage of mod_perl's persistent global variables to create persistent dbm hashes: you tie them only once per process, and save the time of tie() and untie() on every request. The idea is good, provided you remember that you have to flush the cache buffers with the sync() method whenever you modify the hash that represents the dbm file.

Let's code the idea:

  use strict;
  use vars qw($dbh %hash);
  use Fcntl qw(:flock O_RDWR O_CREAT);
  use DB_File;
  use Symbol;

We declare $dbh and %hash as global variables, then pull in the Fcntl module and import the symbols we are going to use (of the constants provided by the :flock tag we actually need only LOCK_EX). Finally we pull in the DB_File and Symbol modules.

  my $r = shift;
  $r->send_http_header('text/plain');
  $r->print("PID $$\n");

Send the Content-type header for plain text and tell the user the PID of the process that serves the request.

  my $filename = "/tmp/mydb";
  my $lockfile = "$filename.lock";

Configure the location of the dbm file and its lock file.

  my $fh = gensym;
  open $fh, ">$lockfile" or die "Cannot open $lockfile: $!";
  flock $fh, LOCK_EX;

Generate a unique anonymous glob, store it in the lexically scoped variable $fh and lock the lock file. This serves as an advisory lock on the dbm file, which can now be tied safely: any other copy of this script has to acquire the lock file before it can reach the following code, and since the lock is exclusive, only one copy of the script at a time will be able to tie the dbm file.

  $dbh ||= tie %hash, 'DB_File', $filename, O_RDWR|O_CREAT, 0660, $DB_HASH
     or die "Can't tie %hash : $!";

This code snippet demands some deeper explanation.

  $a ||= $b;

is the same as:

  $a = $a || $b;

The || operator is a boolean test (testing for truth), and it doesn't treat undefined values specially, since undef is simply false in Perl. So what it does is: leave $a unmodified if it's a true value, otherwise evaluate $b and assign its value to $a. If $b is false as well, $a ends up holding that false value. (Note that 0 and "" (the empty string) are both defined but false values! Refer to the perlop(1) manpage for more info about the || operator.)

Back to our tie() snippet. When this code is executed for the first time in a mod_perl process, $dbh is undefined, so the right-hand side of the statement is executed and tie()s the dbm file. On every subsequent execution in the same process, $dbh contains a database object, which is a true value, so the tie() call is skipped.
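A quick demonstration of these rules, as a sketch: make_handle() is a hypothetical stand-in for the expensive tie() call, and it counts how many times it runs.

```perl
use strict;
use warnings;

my $init_calls = 0;

# Hypothetical stand-in for tie(): counts how often it runs.
sub make_handle { $init_calls++; return "handle #$init_calls" }

my $handle;                       # undef, like $dbh in a fresh process
$handle ||= make_handle();        # undef is false: make_handle() runs
$handle ||= make_handle();        # now true: the right side is skipped

my $count = 0;
$count ||= 10;                    # 0 is defined but false: overwritten

my $name = "";
$name ||= "anonymous";            # "" is false too: overwritten

print "$handle, calls=$init_calls, count=$count, name=$name\n";
```

Running it prints handle #1, calls=1, count=10, name=anonymous. If 0 or the empty string are legitimate values in your code, test with defined() instead (Perl 5.10 and later also offer the //= operator for this).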

  $hash{int rand 10} = (qw(a b c d))[int rand 4];

Fill the dbm file with random keys and values. Each invocation of the code either adds a new key/value pair or overwrites an old one, if rand() picks an existing key.

  $dbh->sync();

The most important part of the code is to flush the modifications.

    # unlock the db
  close $fh;

Now it's safe to unlock the dbm file. Please refer to Locking dbm handlers to learn why you shouldn't lock the dbm file through its own file descriptor. To make a long explanation short: it may get your dbm file corrupted.

    # print out the contents of the dbm file
  print map {"$_ => $hash{$_}\n"} sort keys %hash;

After we leave the critical section, we can take our time and print out the current contents of the dbm file.

Here is the same code in one piece:

  use strict;
  use vars qw($dbh %hash);
  use Fcntl qw(:flock O_RDWR O_CREAT);
  use DB_File;
  use Symbol;
  
  my $r = shift;
  $r->send_http_header('text/plain');
  $r->print("PID $$\n");
  
  my $filename = "/tmp/mydb";
  my $lockfile = "$filename.lock";
  
  my $fh = gensym;
  open $fh, ">$lockfile" or die "Cannot open $lockfile: $!";
  
    # must lock the db file before opening it
  flock $fh, LOCK_EX;
  
  $dbh ||= tie %hash, 'DB_File', $filename, O_RDWR|O_CREAT, 0660, $DB_HASH
     or die "Can't tie %hash : $!";
  
    # fill the dbm file with random keys and values
  $hash{int rand 10} = (qw(a b c d))[int rand 4];
  
    # sync the DB
  $dbh->sync();
  
    # unlock the db
  close $fh;
  
    # print out the contents of the dbm file
  print map {"$_ => $hash{$_}\n"} sort keys %hash;

Well, if you run this code, you will soon figure out that it doesn't do what we thought it would. Each process keeps its own copy of %hash and modifies it.

When a process calls the sync() method, the dbm file is updated to match the contents of that process's %hash. Suppose the next request is handled by a process that hasn't yet tie()d %hash. Its hash is initialized from the dbm file, that is, to the %hash of the last process that called sync(). But if the request is handled by a process that tied %hash earlier, that process doesn't read the contents from the dbm file. It uses its own private copy of %hash instead.

In reality things are even more complicated. The above scenario holds only while the hash is smaller than the dbm file's buffer. Once it grows bigger than the buffer, its contents are flushed to disk automatically.

So when you call keys %hash, all the keys have to be read from the dbm file. The process then picks up values saved by previous sync() calls and by the automatic flushes on buffer overflow. This creates a big mess in the data and makes the whole idea unworkable.

Having come this far, let's see what else is flawed in this code: the sync() call. Suppose the script is stopped before sync() is called. The dbm file will still be unlocked, because $fh is lexically scoped and the lock file is closed when it goes out of scope. But the file won't have been properly sync()ed, and at some point this will corrupt the dbm file.

The solution is quite simple -- write an END block to sync the file:

  END{
    # make sure that the DB is flushed
     $dbh->sync();
  }

The above works only for Apache::Registry scripts; otherwise the END block is postponed until the process exits. If you write a handler using the Perl API, use the register_cleanup() method instead. It accepts a reference to a subroutine as an argument:

  $r->register_cleanup(sub { $dbh->sync() });

Even more correct code would check whether the connection was aborted. Without that check, the cleanup code is always executed, which may be unwanted for scripts that finished normally and have already called sync().

  $r->register_cleanup(sub {
      $dbh->sync() if Apache->request->connection->aborted();
  });

Similarly, with an END block you would use:

  END{
    # make sure that the DB is flushed
     $dbh->sync() if Apache->request->connection->aborted();
  }

Note that register_cleanup() should be called at the beginning of the script, or as soon as the variables you want to use in the cleanup code become available. If you call it at the end of the script and the script is aborted before that point, no cleanup will be performed.

For example, CGI.pm registers its cleanup subroutine in its new() method:

  sub new {
    # code snipped
    if ($MOD_PERL) {   
        Apache->request->register_cleanup(\&CGI::_reset_globals);
        undef $NPH;
    }
    # more code snipped
  }

There is also another way to register cleanup code for Perl API handlers: use the PerlCleanupHandler directive in the configuration file, like this:

  <Location /foo>
    SetHandler perl-script
    PerlHandler        Apache::MyModule
    PerlCleanupHandler Apache::MyModule::cleanup
    Options ExecCGI
  </Location>

where Apache::MyModule::cleanup() is supposed to perform a cleanup.
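Let's come back to the per-process caching problem described earlier in this section. One way around it is to tie() the dbm file inside the locked section on every request, and untie() it before releasing the lock. That way each request sees what is actually on disk. This approach is our own sketch, not this Guide's code.

To keep the sketch runnable anywhere, it makes a few substitutions:

- it uses the core SDBM_File module in place of DB_File;
- it uses a temporary directory in place of /tmp/mydb;
- update_db() is a hypothetical helper.

The same lock/tie/untie/unlock order should apply to DB_File.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use SDBM_File;                      # core module, standing in for DB_File
use File::Temp qw(tempdir);

my $dir      = tempdir(CLEANUP => 1);
my $filename = "$dir/mydb";         # hypothetical location
my $lockfile = "$filename.lock";

# One simulated request: lock, tie, update, untie, unlock.
# Returns a snapshot of the dbm contents taken while the lock was held.
sub update_db {
    my ($key, $value) = @_;

    open my $lock, '>', $lockfile or die "Cannot open $lockfile: $!";
    flock $lock, LOCK_EX or die "Cannot lock $lockfile: $!";

    # tie() inside the lock, so we always start from what is on disk
    my %hash;
    tie %hash, 'SDBM_File', $filename, O_RDWR | O_CREAT, 0660
        or die "Cannot tie $filename: $!";
    $hash{$key} = $value;
    my %snapshot = %hash;
    untie %hash;                    # close the dbm file before unlocking

    close $lock;                    # releases the lock
    return \%snapshot;
}

update_db(a => 1);
my $state = update_db(b => 2);
print map { "$_ => $state->{$_}\n" } sort keys %$state;
```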



Handling the server timeout cases and working with $SIG{ALRM}

A situation similar to the Pressed Stop button disease happens when the client (browser) times out the connection (the exact timeout depends on the browser). Sometimes your script is about to perform a very long operation that may take longer than the client's timeout. One case I can think of is database interaction, where the DB engine hangs or needs a lot of time to return results. If this is the case, use $SIG{ALRM} to prevent the timeouts:

  my $timeout = 10; # seconds
  eval {
    local $SIG{ALRM} =
        sub { die "Sorry timed out. Please try again\n" };
    alarm $timeout;
    # ... db stuff ...
    alarm 0;
  };
  alarm 0; # cancel a pending alarm if the db code died early
  
  die $@ if $@;

However, it was recently discovered that local $SIG{ALRM} does not restore the original underlying C handler. This was fixed in mod_perl 1.19_01 (the CVS version). In fact, none of the local $SIG{FOO} assignments restore the original C handler. Read Debugging Signal Handlers ($SIG{FOO}) for a debugging technique and a possible workaround.
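Outside of mod_perl you can try the pattern on its own. This sketch wraps it in a hypothetical run_with_timeout() helper. It simulates a slow operation with a four-argument select() rather than sleep(), since perlfunc warns that mixing sleep() and alarm() is usually a mistake. Under mod_perl the signal-handler caveat above still applies.

```perl
use strict;
use warnings;

# Run $code, giving up after $timeout seconds.
# Returns (1, $result) on success, or (0, $error) on timeout or failure.
sub run_with_timeout {
    my ($timeout, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timed out\n" };
        alarm $timeout;
        $result = $code->();
        alarm 0;
        1;
    };
    alarm 0;    # make sure no alarm stays pending if $code died early
    return $ok ? (1, $result) : (0, $@);
}

# A fast operation finishes; a slow one (a 3 second select) is cut off.
my ($fast_ok, $fast) = run_with_timeout(2, sub { 'fast' });
my ($slow_ok, $slow) = run_with_timeout(1, sub { select(undef, undef, undef, 3); 'slow' });

printf "fast operation: %s\n", $fast_ok ? "returned $fast" : "failed";
printf "slow operation: %s", $slow_ok ? "returned $slow\n" : "gave up: $slow";
```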



Watching the server

This is a very useful feature. You can watch what happens to the Perl parts of the server. Here are the instructions for configuring and using this feature:



Configuration

Add this to httpd.conf:

  <Location /perl-status>
    SetHandler perl-script
    PerlHandler Apache::Status
    order deny,allow
    #deny from all
    #allow from 
  </Location>

If you are going to use Apache::Status, it's important to load it early in the startup file (or in httpd.conf), and in particular before Apache::DBI:

  # startup.pl
  use Apache::Registry ();
  use Apache::Status ();
  use Apache::DBI ();

If you don't load Apache::Status before Apache::DBI, you won't get Apache::DBI's menu entry on the status page.

For more about Apache::DBI see Persistent DB Connections.



Usage

Assuming that your mod_perl server listens on port 81, fetch http://www.myserver.com:81/perl-status and you will see something like:

  Embedded Perl version 5.00502 for Apache/1.3.2 (Unix) mod_perl/1.16 
  process 187138, running since Thu Nov 19 09:50:33 1998

Below that, each of the following sections should be a link:

  Signal Handlers
  Enabled mod_perl Hooks
  PerlRequire'd Files
  Environment
  Perl Section Configuration
  Loaded Modules
  Perl Configuration
  ISA Tree
  Inheritance Tree
  Compiled Registry Scripts
  Symbol Table Dump

Let's follow, for example, PerlRequire'd Files. We see:

  PerlRequire                   Location
  /home/perl/apache-startup.pl  /home/perl/apache-startup.pl

From some menus you can continue deeper to peek into the internals of the server, to see the values of the global variables in the packages, to the cached scripts and modules, and much more. Just click around...



Compiled Registry Scripts section seems to be empty.

Sometimes when you fetch /perl-status and follow the Compiled Registry Scripts link, you see no listing of scripts at all. This is absolutely correct. Apache::Status shows the registry scripts compiled in the httpd child that is serving your request for /perl-status. If that child hasn't yet compiled the script you are asking about, /perl-status will just show you the main menu.



Sometimes script works, sometimes does not

See Sometimes it Works Sometimes it does Not



Code Debug

When code doesn't do what it's expected to, either never or only sometimes, we say that it requires debugging. There are a few levels of debugging complexity.

The basic level is when perl terminates the program during the compilation stage, before it starts to run. Usually that happens when there are syntax errors or some module is missing. Solving this can take some effort, because code that uses Apache core modules generally won't compile when executed from the shell. We will learn how to solve syntax problems in mod_perl code quite easily.

Once the program compiles and begins to run, there might be logical (algorithmic) problems, when the program doesn't do what you intended. These are somewhat harder to solve, especially when there is a lot of code to observe and review, but it's just a matter of time.

Perl helps a lot in locating typos when you enable warnings. For example, it warns you when you wanted to compare two numbers but omitted the second '=' character, so that you ended up with something like if ($yes = 1) instead of if ($yes == 1).

The next level is when the program does what it's expected to most of the time, but occasionally misbehaves and does something different. Reading the code generally doesn't help, and either print() statements or the perl debugger come to the rescue. Often it's quite easy to debug with print(). But typing the debug messages can be very tedious, especially when you haven't yet spotted the lines where the bug is hiding. That's where the perl debugger helps.

While print() statements always work, running the perl debugger on CGI scripts can be quite a challenge. But with the right knowledge and tools in hand, the debugging process becomes much easier. Unfortunately, nothing can make debugging the program's own logic easy, as it depends on the code you wrote, and debugging really complex code can be quite a nightmare.

The worst case you can think of is when the process terminates in the middle of processing a request and dumps core. The operating system dumps core (read: creates a file called core in the directory the process was running in) when the program tries to access a memory area that doesn't belong to it, which generally happens because of a bug.

This is something you would almost never see with plain perl scripts. It can easily happen, though, if you use modules whose guts are written in C or C++ and something goes wrong with them. Occasionally there is a bug in the underlying C code of mod_perl itself that was in a deep slumber before your code woke it up.

In the following sections we will go through each of these problems in detail and present a few techniques for solving them.



Locating and correcting Syntax Errors

While developing code, we often make syntax mistakes, like forgetting to put a semicolon at the end of a statement or a comma in a list. A semicolon isn't required after the last statement of a block, but it's better to use one anyway. You may add more code at the end later and forget to add the missing semicolon. For the same reason it's a good idea to end a list with a comma; unlike some other languages, Perl has no problem with a trailing comma.

One way to locate syntactically incorrect code is to run the script from the shell with the -c flag, which only validates the syntax and doesn't run the code. Actually, it still executes BEGIN blocks and use() statements, because these are considered to occur outside the execution of your program. Depending on your perl version, END blocks may run as well. It's also a good idea to add the -w switch to enable warnings:

  perl -cw test.pl

If there are errors in the code, perl will report them along with the corresponding line numbers in the script.
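If the code you want to check is in a string rather than a file, a similar compile-only check can be done from within Perl. This sketch uses a hypothetical check_syntax() helper. It wraps the code in an anonymous sub and compiles it with a string eval, which builds the sub without ever calling it. As with -c, any BEGIN blocks and use() statements in the code still run.

```perl
use strict;
use warnings;

# Compile $code without running it. Returns (1, '') on success,
# or (0, $error_message) if perl reports a compile-time error.
sub check_syntax {
    my ($code) = @_;
    # The newline keeps the closing brace safe from a trailing comment.
    my $compiled = eval "sub { $code\n}";
    return defined $compiled ? (1, '') : (0, $@);
}

for my $code ('my $x = 1; print $x;', 'my $x = ;', 'die "should not run"') {
    my ($ok, $err) = check_syntax($code);
    if ($ok) {
        print "ok:     $code\n";
    } else {
        print "failed: $code\n  $err";
    }
}
```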

The next step is to execute the script, since besides syntax errors there are run-time errors. These are the errors that cause an "Internal Server Error" when the script is executed from the browser. In this respect plain CGI scripts are just plain perl scripts: execute them and see that they work.

However, things are quite different with scripts that use Apache::* modules, which can be used only from within the mod_perl server. These modules rely on code and circumstances that aren't available when you execute the script from the shell, since no Apache request object is available to the code.

If you have problems with such code, you can watch the errors and warnings as they are logged to the error_log file when you request the script from the browser. Alternatively, use the Apache::FakeRequest module written by Doug MacEachern and Andrew Ford.



Using Apache::FakeRequest to Debug Apache Perl Modules

Apache::FakeRequest is used to set up an empty Apache request object that can be used for debugging. The Apache::FakeRequest methods just set internal variables of the same name as the method and return the value of the internal variables. Initial values for methods can be specified when the object is created. The print method prints to STDOUT.

Subroutines for Apache constants are also defined so that using Apache::Constants while debugging works, although the values of the constants are hard-coded rather than extracted from the Apache source code.

Let's write a very simple module, which prints "OK" to the client's browser:

  package Apache::Example;
  use Apache::Constants;
  
  sub handler{
    my $r = shift;
    $r->send_http_header('text/plain');
    print "You are OK ", $r->get_remote_host, "\n";
    return OK;
  }
  
  1;

You cannot debug this module unless you configure the server to call its handler from some location. But with help of Apache::FakeRequest you can write a little script that will emulate a request and return the expected output.

  #!/usr/bin/perl

  use Apache::FakeRequest ();
  use Apache::Example ();
  
  my $r = Apache::FakeRequest->new('get_remote_host'=>'www.foo.com');
  Apache::Example::handler($r);

When you execute the script from the command line, you will see the following output:

  You are OK www.foo.com



Finding the Line Number the Error/Warning has been Triggered at

Apache::Registry, Apache::PerlRun and modules that compile-via-eval confuse the line numbering. Modules that are read normally by Perl from disk have no problem with file name/line number.

If you compile mod_perl with the experimental PERL_MARK_WHERE=1 option, it shows you almost the exact line number where this is happening. Generally, the line counter gets shifted for code that is compiled via eval. You can always stuff your code with special compiler directives that reset the counter to a value you specify. Write the directive at the beginning of a line (with the '#' in column 1):

  #line 298 myscript.pl
  or 
  #line 890 some_label_to_be_used_in_the_error_message

The label is optional - the filename of the script will be used by default. This specifies the line number of the following line, not the line the directive is on. You can use a little script to stuff every N lines of your code with these directives, but then you will have to rerun this script every time you add or remove code lines. The script:

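  #!/usr/bin/perl
  # Puts Perl line markers in a Perl program for debugging purposes.
  # Also takes out old line markers.
  # Note: markers inserted inside here-docs or POD become part of them.
  use strict;
  die "No filename to process.\n" unless @ARGV;
  my $filename = $ARGV[0];
  my $lines = 100;   # emit a marker every $lines lines
  open IN, $filename or die "Cannot open file: $filename: $!\n";
  open OUT, ">$filename.marked"
      or die "Cannot open file: $filename.marked: $!\n";
  my $counter = 1;
  while (<IN>) {
    # drop old markers first, so that they are neither counted
    # nor preceded by a new marker
    next if /^#line /;
    # "#line N" sets the number of the line that follows it
    print OUT "#line $counter\n" unless $counter % $lines;
    $counter++;
    print OUT $_;
  }
  close OUT;
  close IN;
  chmod 0755, "$filename.marked";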

Another solution is to move most of the code into separate modules. Modules are read from disk normally, so their line numbers are reported correctly.
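The effect of the directive can be seen in plain Perl. This sketch compiles code from a string with eval, much as Apache::Registry compiles scripts. The name myscript.pl is just an illustrative label:

```perl
use strict;
use warnings;

# Code compiled from a string is reported as "(eval N)" by default.
my $plain = "1;\ndie 'oops';\n";
eval $plain;
my $plain_err = $@;    # something like: oops at (eval 1) line 2.

# With the directive in column 1, the *following* line becomes
# line 298 of "myscript.pl".
my $marked = "1;\n#line 298 myscript.pl\ndie 'oops';\n";
eval $marked;
my $marked_err = $@;   # oops at myscript.pl line 298.

print $plain_err, $marked_err;
```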

To have a complete trace of calls add:

  use Carp ();
  local $SIG{__WARN__} = \&Carp::cluck;



Using print() Function for Debugging

The universal debugging tool across nearly all platforms and programming languages is the printf() or equivalent output function, which can send data to the console, a file, application window and so on. In perl we generally use the print() function. With an idea of where and when the bug is triggered, a developer can insert print() statements in the source code to examine the value of data at certain points of execution.

However, it is rather difficult to anticipate all the possible directions a program might take and what data to suspect of causing trouble. In addition, inline debugging code tends to add bloat and degrade the performance of an application. So you have to comment out or remove the debug print statements once you think you have solved the problem. If you later discover that you need to debug the same code again, you will at best have to uncomment those lines, or at worst write them from scratch.

Let's see a few examples where we use print() to debug some problem. In one of my applications I wrote a function that returns the date that was a week ago. Here it is:

  print "Content-type: text/plain\n\n";
  
  print "A week ago date was ",date_a_week_ago(),"\n";
  
  # return a date one week ago as a string in format: MM/DD/YYYY
  ####################
  sub date_a_week_ago{
  
    my @month_len   = (31,28,31,30,31,30,31,31,30,31,30,31);
  
    my ($day,$month,$year) = (localtime)[3..5];
    for (my $j = 0; $j < 7; $j++) {
  
      $day--;
      if ($day == 0) {
  
        $month--;
        if ($month == 0) {
          $year--;
          $month = 12;
        }
  
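          # there are 29 days in February in a leap year;
          # $year counts from 1900, so convert it for the century rules
        my $full_year = $year + 1900;
        $month_len[1] =
          (($full_year % 4 or $full_year % 100 == 0) and $full_year % 400)
        ? 28 : 29;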
  
          # set $day to be the last day of the previous month 
        $day = $month_len[$month - 1]; 
  
      }   # end of if ($day == 0)
    }     # end of for ($j = 0; $j < 7; $j++)
  
    return sprintf "%02d/%02d/%04d",$month,$day,$year+1900;
  }

This code is pretty straightforward. Get today's date and subtract one from the day, updating the month and the year along the way if a boundary is crossed (end of month, end of year). Do this seven times in a loop, and at the end you should get the date of a week ago.

Note that localtime() returns the year as the number of years since 1900 (the four-digit year minus 1900). Therefore we don't have a century boundary to worry about. If we are in the first week of the year 2000, the year returned by localtime() is 100, not 0 as you might mistakenly assume. So when the code does $year-- it becomes 99 and not -1. At the end we add 1900 and get back a correct four-digit year.

Also note that we have to cover the case of a leap year, when February has 29 days. For the rest of the months we have prepared an array of month lengths.
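The Gregorian leap-year rule is: a year divisible by 4 is a leap year, except century years, which are leap years only when divisible by 400. Here is a minimal standalone sketch of that rule. The helper names are hypothetical, and the helpers work on the full four-digit year, so you would pass them $year + 1900 rather than the raw localtime() value:

```perl
use strict;
use warnings;

# Takes a full year such as 2000, not localtime()'s years-since-1900.
sub is_leap_year {
    my ($year) = @_;
    return (($year % 4 == 0 && $year % 100 != 0) || $year % 400 == 0) ? 1 : 0;
}

sub february_length { return is_leap_year($_[0]) ? 29 : 28 }

for my $year (1900, 1999, 2000, 2024) {
    print "$year: February has ", february_length($year), " days\n";
}
```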

Now when we run this code and check the result, we see that something is wrong. For example, if today is 10/23/1999, we expect the above code to print 10/16/1999. Instead it prints 09/16/1999. We have lost a month, so the code is buggy.

Let's stuff a few debug print() statements in the code near the $month variable:

  sub date_a_week_ago{
  
    my @month_len   = (31,28,31,30,31,30,31,31,30,31,30,31);
  
    my ($day,$month,$year) = (localtime)[3..5];
    print "[set] month : $month\n";
    for (my $j = 0; $j < 7; $j++) {
  
      $day--;
      if ($day == 0) {
  
        $month--;
        if ($month == 0) {
          $year--;
          $month = 12;
        }
        print "[loop $j] month : $month\n";
  
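          # there are 29 days in February in a leap year;
          # $year counts from 1900, so convert it for the century rules
        my $full_year = $year + 1900;
        $month_len[1] =
          (($full_year % 4 or $full_year % 100 == 0) and $full_year % 400)
        ? 28 : 29;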
  
          # set $day to be the last day of the previous month 
        $day = $month_len[$month - 1]; 
  
      }   # end of if ($day == 0)
    }     # end of for ($j = 0; $j < 7; $j++)
  
    return sprintf "%02d/%02d/%04d",$month,$day,$year+1900;
  }

When we run it we see:

  [set] month : 9

We expected the number of the current month (10), but got 9, so we have spotted a bug. The only code that sets the $month variable is the call to localtime(), so did we find a bug in Perl? Let's look at the manpage of the localtime() function:

  % perldoc -f localtime
  Converts a time as returned by the time function to a 9-element array
  with the time analyzed for the local time zone. Typically used as
  follows:

    #  0    1    2     3     4    5     6     7     8
    ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
                                                localtime(time);

  All array elements are numeric, and come straight out of a struct
  tm.  In particular this means that $mon has the range 0..11
  and $wday has the range 0..6 with Sunday as day 0.  Also,
  $year is the number of years since 1900, that is, $year is
  123 in year 2023, and not simply the last two digits of the
  year.  If you assume it is, then you create non-Y2K-compliant
  programs--and you wouldn't want to do that, would you?
  [more info snipped]

This tells us that we are supposed to increment $month if we want to count months from 1 to 12 rather than from 0 to 11. Among other interesting facts about localtime(), we also see an explanation of $year. As I mentioned before, it is set to the number of years since 1900.

Thus we have found the bug in our code and learned new things about localtime(). To correct the above code we just add a month's increment after we call localtime():

    my ($day,$month,$year) = (localtime)[3..5];
    $month++;


Now let's see some code including conditional and loop statements.

  for my $i (1..31) {
    if ($day > 20) {
    }
  }



Using print() and Data::Dumper for Debugging

Sometimes you need to peek into complex data structures, and printing them out can be a tricky task. That's where Data::Dumper comes to the rescue. For example, suppose we create this complex data structure:

  $data =
    {
     array => [qw(a b c d)],
     hash  => {
               foo => "oof",
               bar => "rab",
              },
    };

How do we print it out? Very easily:

  use Data::Dumper;
  print Dumper \$data;

What we get is a pretty printed $data:

  $VAR1 = \{
            'hash' => {
                        'foo' => 'oof',
                        'bar' => 'rab'
                      },
            'array' => [
                        'a',
                        'b',
                        'c',
                        'd'
                       ]
          };

While writing this example I made a mistake and wrote qw(a b c d) instead of [qw(a b c d)]. When I pretty-printed the contents of $data, I immediately saw my mistake:

  $VAR1 = \{
            'b' => 'c',
            'd' => 'hash',
            'HASH(0x80cd79c)' => undef,
            'array' => 'a'
          };

That's of course not what I wanted. Having spotted the bug, I corrected it, as you saw in the original example above.
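The order of hash keys in Dumper()'s output is not fixed, which makes it awkward to compare dumps between runs. Reasonably recent versions of Data::Dumper provide the $Data::Dumper::Sortkeys and $Data::Dumper::Indent configuration variables for this. A short sketch:

```perl
use strict;
use warnings;
use Data::Dumper;

my $data = {
    array => [qw(a b c d)],
    hash  => { foo => "oof", bar => "rab" },
};

my $dump;
{
    local $Data::Dumper::Sortkeys = 1;  # print hash keys in sorted order
    local $Data::Dumper::Indent   = 1;  # fixed, compact indentation
    $dump = Dumper($data);
}
print $dump;
```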



The Importance of Good Coding Style and Conciseness

Obscure code makes bugs very hard to find, and can make the code itself hard to understand. The date_a_week_ago() example from the print() debugging section is hard to debug too, because there is too much redundancy in it. You should develop a good coding style, writing code that is concise yet easy to understand, as in the examples below. With such code it's much easier to find bugs.

A condensed version of the main loop, which does nothing to make the code easier to understand, looks like this:

  for (1..7) {
    next if --$day;
    $year--,$month=12 unless --$month;
    $day = $month != 2 ? $month_len[$month-1] : $year % 4 ? 28 : 29;
  }

Don't do that at home :)

Why did I present this version at all? It is too obfuscated, which makes it hard to understand and maintain. On the other hand, parts of it are easier to understand than the original.

Larry Wall, the author of Perl, is a linguist, and he tried to define the syntax so that coding in Perl reads much like English. So it's a very good idea to learn Perl coding idioms. They may seem inconvenient at first, but once you get used to them you will wonder how you ever lived without them. I'll show just a few of the most commonly used ones. Aim for code that is readable but not redundant. For example, instead of writing:

  if ($i == 0) ...

it's better to write:

  unless ($i)

Use a much more concise perlish style:

  for my $j (0..6) {

instead of the syntax you may be used to from other languages:

  for (my $j=0; $j<7; $j++) {

It's also much simpler to write and comprehend code like:

  print "something" if $debug;

rather than:

  if($debug){
    print "something";
  }

Here is yet another rewrite of the original code, in a style that improves readability and reduces the chance of a bug:

  for (1..7) {
    $day--;
    next if $day;
  
    $month--;
    unless ($month){
      $year--;
      $month=12
    }
  
    if ($month == 2) {   # February
      $day = $year % 4 ? 28 : 29;
    } else {
      $day = $month_len[$month-1];
    }
  } 

This is a golden mean between the overly verbose style of the first example and the obfuscated second one.

And of course there is a two-liner, which is much faster and easier to understand:

  sub date_a_week_ago{
    my ($day,$month,$year) = (localtime(time-604800))[3..5];
    return sprintf "%02d/%02d/%04d",$month+1,$day,$year+1900;
  }

Just take the current date in seconds since epoch as time() returns, subtract a week in seconds (7*24*60*60 = 604800) and feed the result to localtime() - voila we've got a date from a week ago!
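One caveat applies in time zones that observe daylight saving time. A calendar day there is not always 86400 seconds long. So when the two-liner runs shortly after midnight, subtracting 604800 seconds can land on the wrong date. One way around this is to anchor the calculation at noon of the current day, using the core Time::Local module. This is a sketch; date_a_week_ago_from() is a hypothetical name:

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Return the date one week before $time (default: now) as MM/DD/YYYY.
# Starting from noon keeps a one-hour DST shift from changing the date.
sub date_a_week_ago_from {
    my $time = @_ ? shift : time;
    my ($day, $month, $year) = (localtime $time)[3..5];
    my $noon = timelocal(0, 0, 12, $day, $month, $year + 1900);
    my ($d, $m, $y) = (localtime($noon - 7 * 24 * 60 * 60))[3..5];
    return sprintf "%02d/%02d/%04d", $m + 1, $d, $y + 1900;
}

print date_a_week_ago_from(), "\n";
```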

Why is the last version important, when the first one works just fine? Not because of performance, although the last one is about twice as fast. It matters because the first version is much more likely to contain a bug than the last one.



Introduction into Perl Debugger

As we saw, it's almost always possible to debug code with the help of print(). However, it is rather difficult to anticipate all the possible directions a program might take and what data to suspect of causing trouble. In addition, inline debugging code tends to add bloat and degrade the performance of an application, although many applications offer inline debugging as a compile-time option to avoid these hits. In any case, this information tends to be useful only to the programmer who added the trace statement in the first place.

Sometimes you have to debug a Perl application of tens of thousands of lines. You may be a very experienced Perl programmer who can understand Perl code quite well just by looking at it. Still, no mere mortal can begin to understand what will actually happen in such a large application until the code is running. You just don't know where to start adding trusty print() statements to see what is happening inside.

The most effective way to track down a bug is to run the program under an interactive debugger. Most programming languages have such a tool, which lets you see what is happening inside an application while it is running. The basic features of an interactive debugger allow you to:

  • Stop at a certain point in the code, based on a routine name or specific source file and line number

  • Stop at a certain point in the code, based on specific conditions such as the value of a given variable

  • Perform an action without stopping, based on the same criteria above

  • View and modify the value of variables at any given point

  • Provide context information such as stack traces and source windows

It does take practice to learn the most effective ways of using an interactive debugger, but the time and effort will be paid back many-fold in the long run.

Most C and C++ programmers are familiar with the interactive GNU debugger (gdb). gdb is a stand-alone program that requires your code to be compiled with debugging symbols to be useful. While gdb can be used to debug the perl interpreter program itself, it cannot be used to debug your own Perl programs. Not to worry, Perl provides its own interactive debugger, called perldb. Giving control of your Perl program to the interactive debugger is simply a matter of specifying the -d command line switch. When this switch is used, Perl will insert debugging hooks into the program syntax tree, but leaves the actual job of debugging to a Perl module outside of the perl binary program itself.

I will start by introducing a few basic concepts and commands of the Perl interactive debugger. These warm-up examples are all run from the command line, outside of mod_perl, but they remain relevant once we go inside Apache.

You may want to keep the perldebug manpage handy for reference while reading this section and for future debugging sessions on your own.

The interactive debugger will attach to the current terminal and present you with a prompt just before the first program statement is executed. For example:

  % perl -d -le 'print "mod_perl rules the world"'
  
  Loading DB routines from perl5db.pl version 1.0402
  
  Emacs support available.
  
  Enter h or `h h' for help.
  
  main::(-e:1):   print "mod_perl rules the world"
    DB<1>

The source line shown is the one Perl is about to execute. The next command (or just n) causes this line to be executed and stops again right before the next line:

  main::(-e:1):   print "mod_perl rules the world"
    DB<1> n
  mod_perl rules the world
  Debugged program terminated.  Use q to quit or R to restart,
  use O inhibit_exit to avoid stopping after program termination,
  h q, h R or h O to get additional info.
  DB<1>

In this case, our example code is only one line long, so we are done interacting after the first line of code is executed. Let's try again with a slightly longer example, the following script:

  my $word = 'mod_perl';
  my @array = qw(rules the world);
  
  print "$word @array\n";

Save the script in a file named domination.pl and run it with the -d switch:

  % perl -d domination.pl
  
  main::(domination.pl:1):      my $word = 'mod_perl';
    DB<1> n
  main::(domination.pl:2):      my @array = qw(rules the world);
    DB<1>

At this point, the first line of code has been executed and the variable $word has been assigned the value mod_perl. We can check that assumption with the p command (short for print; the two are interchangeable):

  main::(domination.pl:2):      my @array = qw(rules the world);
    DB<1> p $word
  mod_perl

The print command works just like Perl's built-in print() function, but it adds a trailing newline and outputs to the $DB::OUT file handle. That handle is normally opened to the terminal perl was launched from. Let's carry on:

    DB<2> n
  main::(domination.pl:4):      print "$word @array\n";
    DB<2> p @array
  rulestheworld
    DB<3> n
  mod_perl rules the world
  Debugged program terminated.  Use q to quit or R to restart,
  use O inhibit_exit to avoid stopping after program termination,
  h q, h R or h O to get additional info.  

Ouch: p @array printed rulestheworld and not rules the world, as you might have expected, but this is perfectly normal. When you print an array as a list, without interpolating it into a string first, no spaces are added between its elements. In plain Perl code, print separates list elements with the value of the $, variable (the output field separator, known as $OUTPUT_FIELD_SEPARATOR when the English module is used), which is empty by default. If you interpolate the array into a string instead:

  print "@array";

you get the rules the world output. An interpolated array is joined with the value of the $" variable ($LIST_SEPARATOR under the English module), which is a single space by default.
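The two separator variables can be compared in plain Perl, outside the debugger. This sketch uses a hypothetical print_list() helper that prints to an in-memory file handle so the output can be inspected:

```perl
use strict;
use warnings;

my @array = qw(rules the world);

# Print a list to an in-memory file handle and return what was written.
sub print_list {
    my ($separator, @items) = @_;
    my $out = '';
    open my $fh, '>', \$out or die "Cannot open in-memory handle: $!";
    {
        local $, = $separator;   # output field separator used by print
        print $fh @items;
    }
    close $fh;
    return $out;
}

my $as_list      = print_list(undef, @array);  # "rulestheworld"
my $with_comma   = print_list(',', @array);    # "rules,the,world"
my $interpolated = "@array";                   # joined with $" (a space)
my $custom = do { local $" = '-'; "@array" };  # "rules-the-world"

print "$as_list\n$with_comma\n$interpolated\n$custom\n";
```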

By now you should have noticed that there is some valuable information to the left of each executable statement:

  main::(domination.pl:4):      print "$word @array\n";
    DB<2>

First is the current package name, in this case main::. Next come the current filename and statement line number, domination.pl and 4 in the example above. The number shown at the prompt is the command number. You can use it to recall a command from the session history with the ! command followed by this number. For example, !1 would repeat the first command:

  % perl -d -e0
  
  main::(-e:1):   0
    DB<1> p $]
  5.00503
    DB<2> !1
  p $]5.00503
    DB<3> 

Here $] is perl's version number. As you can see, !1 prints the value of $], preceded by the command that was executed.

Things start to get more interesting as the code does. In the example script below (save it in a file named test.pl) we've increased the number of source files and packages by including the standard Symbol module, along with invoking its gensym() function:

  use Symbol ();
  
  my $sym = Symbol::gensym();
  
  print "$sym\n";

  % perl -d test.pl 
  
  main::(test.pl:3):      my $sym = Symbol::gensym();
    DB<1> n
  main::(test.pl:5):      print "$sym\n";
    DB<1> n
  GLOB(0x80c7a44)

First, notice that the debugger did not stop at the first line of the file; this is because use ... is a compile-time statement, not a run-time statement. Also notice that more work was going on than the debugger revealed. That's because the next command (n) does not enter subroutine calls. To step into a subroutine, use the step command (s):

  % perl -d test.pl
  
  main::(test.pl:3):      my $sym = Symbol::gensym();
    DB<1> s
  Symbol::gensym(/usr/lib/perl5/5.00503/Symbol.pm:86):
  86:         my $name = "GEN" . $genseq++;
    DB<1> 

Notice that the source line information has changed to the Symbol::gensym package and the Symbol.pm file. We can carry on by pressing the Return key at each prompt, which makes the debugger repeat the last step or next command (it won't repeat a print command, for example). Eventually the debugger returns from the subroutine back to our main program:

    DB<1> 
  Symbol::gensym(/usr/lib/perl5/5.00503/Symbol.pm:87):
  87:         my $ref = \*{$genpkg . $name};
    DB<1> 
  Symbol::gensym(/usr/lib/perl5/5.00503/Symbol.pm:88):
  88:         delete $$genpkg{$name};
    DB<1> 
  Symbol::gensym(/usr/lib/perl5/5.00503/Symbol.pm:89):
  89:         $ref;
    DB<1> 
  main::(test.pl:5):      print "$sym\n";
    DB<1> 
  GLOB(0x80c7a44)
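The debugger skipped the use line because use Module LIST is equivalent to BEGIN { require Module; Module->import(LIST) }, which runs while the file is being compiled, before the first run-time statement. A small sketch that records the order in which a BEGIN block and an ordinary statement run:

```perl
use strict;
use warnings;

our @order;    # package variable, so the BEGIN block below can use it

push @order, 'run-time statement';               # runs when the program runs

BEGIN { push @order, 'BEGIN block (like use)' }  # runs during compilation

print "$_\n" for @order;   # the BEGIN entry comes first
```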

Our line-by-line approach has served us well for this small program, but imagine how long it would take to step through a large application at the same pace. There are several ways to speed up a debugging session, one of which is setting a breakpoint. The breakpoint command (b) tells the debugger to stop at a named subroutine or at a given line of a file. In this example session we set a breakpoint at the Symbol::gensym subroutine at the first prompt, telling the debugger to stop at the first line of this routine when it is called. Rather than move along with next or step, we enter the continue command (c), which tells the debugger to execute each line without stopping until it reaches a breakpoint:

  % perl -d test.pl
  
  main::(test.pl:3):      my $sym = Symbol::gensym();
    DB<1> b Symbol::gensym
    DB<2> c
  Symbol::gensym(/usr/lib/perl5/5.00503/Symbol.pm:86):
  86:         my $name = "GEN" . $genseq++;

Now let's pretend we are debugging a large application where Symbol::gensym might be called in various places. When the subroutine breakpoint is reached, the debugger does not reveal where it was called from by default. One way to find out this information is with the Trace command (T):

    DB<2> T
  $ = Symbol::gensym() called from file `test.pl' line 3

In this example the call stack is only one level deep, so only one line is printed; we'll look at an example with a deeper stack later. The leftmost character reveals the context in which the subroutine was called: $ represents scalar context, @ represents list context and . represents void context. In our case we have called:

  my $sym = Symbol::gensym();

which calls Symbol::gensym() in scalar context.
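The same three contexts can be observed from inside a subroutine with the built-in wantarray, which returns true in list context, false (but defined) in scalar context and undef in void context. This sketch (context_marker is a made-up name) records the marker that T would show for each call:

```perl
use strict;
use warnings;

my @seen;

# Record the context of each call using T's notation: $ scalar, @ list, . void.
sub context_marker {
    my $want = wantarray;
    push @seen, !defined $want ? '.' : $want ? '@' : '$';
    return;
}

my $scalar = context_marker();   # scalar context
my @list   = context_marker();   # list context
context_marker();                # void context

print "@seen\n";                 # $ @ .
```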

Below we've made our test.pl example a little more complex. First, we've added a My::World package declaration at the top of the script, so we are no longer working in the main:: package. Next, we've added a subroutine named do_work(), which calls the familiar Symbol::gensym along with another function, Symbol::qualify, and returns a hash reference holding the results. do_work() is called inside a for loop that runs twice:

  package My::World;
  
  use Symbol ();
  
  for (1,2) {
    do_work("now");
  }
  
  sub do_work {
    my($var) = @_;
  
    return undef unless $var;
  
    my $sym  = Symbol::gensym();
    my $qvar = Symbol::qualify($var);
  
    my $retval = {
                 'sym' => $sym,
                 'var' => $qvar,
                 };
  
    return $retval;
  }

We'll start by setting a few breakpoints and then use the L command to list them:

  % perl -d test.pl
  
  My::World::(test.pl:5):   for (1,2) {
    DB<1> b Symbol::qualify
    DB<2> b Symbol::gensym
    DB<3> L
  /usr/lib/perl5/5.00503/Symbol.pm:
   86:        my $name = "GEN" . $genseq++;
     break if (1)
   95:        my ($name) = @_;
     break if (1)

The filename and line number of each breakpoint are displayed just before the source line itself. Since both breakpoints are located in the same file, the filename is displayed only once. After the source line we see the condition on which to stop; in our case the constant value 1 indicates that we will always stop at these breakpoints. Later on you'll see how to specify a condition.

When the continue command is executed, the normal flow of the program stops at one of these breakpoints, on line 86 or line 95 of the /usr/lib/perl5/5.00503/Symbol.pm file, whichever is reached first. The displayed code lines are the first lines of the two subroutines in Symbol.pm. Breakpoints can be set only on lines that contain code, not on empty lines or comments.

In our example the L command shows the lines the breakpoints were set on, but we cannot tell which breakpoint belongs to which subroutine. There are two ways to find out. One is to run the continue command and, when it stops, execute the Trace command we saw before:

    DB<3> c
  Symbol::gensym(/usr/lib/perl5/5.00503/Symbol.pm:86):
  86:         my $name = "GEN" . $genseq++;
    DB<3> T
  $ = Symbol::gensym() called from file `test.pl' line 14
  . = My::World::do_work('now') called from file `test.pl' line 6

So we see that it was Symbol::gensym. The other way is to ask for a listing of a range of code lines. For example, let's check which subroutine line 86 belongs to. We use the list command (lowercase l), which displays parts of the code. Among the various arguments it accepts is a range of lines, which is what we want here. Since the breakpoint is at line 86, let's print a line before and a line after it:

    DB<3> l 85-87
  85      sub gensym () {
  86==>b      my $name = "GEN" . $genseq++;
  87:         my $ref = \*{$genpkg . $name};

Now we know it's the gensym sub, and we also see the breakpoint marked with ==>b. We could also use the name of the sub to display its code:

    DB<4> l Symbol::gensym
  85      sub gensym () {
  86==>b      my $name = "GEN" . $genseq++;
  87:         my $ref = \*{$genpkg . $name};
  88:         delete $$genpkg{$name};
  89:         $ref;
  90      }

The delete command (d) removes a single breakpoint, given the line number it was set on. Let's remove the first one we set, the one on Symbol::qualify:

    DB<5> d 95

The Delete command (D, uppercase) removes all currently installed breakpoints.

Now let's look again at the trace produced at the breakpoint:

    DB<3> c
  Symbol::gensym(/usr/lib/perl5/5.00503/Symbol.pm:86):
  86:         my $name = "GEN" . $genseq++;
    DB<3> T
  $ = Symbol::gensym() called from file `test.pl' line 14
  . = My::World::do_work('now') called from file `test.pl' line 6

As you can see, the stack trace shows the values that were passed to each subroutine. And perhaps we've found our first bug: do_work() was called in void context, so its return value was simply thrown away. Let's change the for loop to check the return value of do_work():

  for (1,2) {
    my $stuff = do_work("now");
    if ($stuff) {
        print "work is done\n";
    }
  }

In this session we will set a breakpoint at line 7 of test.pl where we check the return value of do_work():

  % perl -d test.pl
  
  My::World::(test.pl:5):   for (1,2) {
    DB<1> b 7
    DB<2> c
  My::World::(test.pl:7):     if ($stuff) {
    DB<2>

Our program is still small, but it is already getting hard to understand a single line of code out of context. The window command (w) lists a few lines of code around the current line:

    DB<2> w
  4         
  5:        for (1,2) {
  6:          my $stuff = do_work("now");
  7==>b       if ($stuff) {
  8:              print "work is done\n";
  9           }
  10        }
  11        
  12        sub do_work {
  13:         my($var) = @_;

The arrow points to the line that is about to be executed, and the 'b' indicates that a breakpoint is set on this line. Breakable lines are marked with a `:' right after the line number.

Now, let's take a look at the value of the $stuff variable with the trusty old print command:

    DB<2> p $stuff
  HASH(0x82b89b4)

That's not very useful information. Remember, the print command works just like the built-in print() function. The x command evaluates the given expression and prints the result in a "pretty" fashion:

    DB<3> x $stuff
  0  HASH(0x82b89b4)
     'sym' => GLOB(0x826a944)
        -> *Symbol::GEN0
     'var' => 'My::World::now'

Things seem to be okay; let's double-check by calling do_work() with a different value and printing the result:

    DB<4> x do_work('later')
  0  HASH(0x82bacc8)
     'sym' => GLOB(0x818f16c)
        -> *Symbol::GEN1
     'var' => 'My::World::later'

We can see the symbol was incremented from GEN0 to GEN1 and the variable later was qualified, as expected.
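Outside the debugger you can get a similar nested view with the standard Data::Dumper module. A sketch that reuses the do_work() routine from test.pl; Sortkeys and Indent are set only to make the output stable and compact:

```perl
package My::World;

use strict;
use warnings;
use Symbol ();
use Data::Dumper ();

sub do_work {
    my ($var) = @_;
    return undef unless $var;
    my $sym  = Symbol::gensym();
    my $qvar = Symbol::qualify($var);    # qualified into the caller's package
    return { 'sym' => $sym, 'var' => $qvar };
}

my $stuff = do_work('now');

# Similar to "x $stuff" in the debugger: dump the structure recursively.
local $Data::Dumper::Sortkeys = 1;
local $Data::Dumper::Indent   = 1;
my $dump = Data::Dumper->Dump([$stuff], ['stuff']);
print $dump;
```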

Now let's change the test program a little to iterate over a list of arguments held in @args and print a slightly different message:

  package My::World;
  
  use Symbol ();
  
  my @args = qw(now later);
  for my $arg (@args) {
    my $stuff = do_work($arg);
    if ($stuff) {
        print "do your work $arg\n";
    }
  }
  
  sub do_work {
    my($var) = @_;
  
    return undef unless $var;
  
    my $sym = Symbol::gensym();
    my $qvar = Symbol::qualify($var);
  
    my $retval = {
        'sym' => $sym,
        'var' => $qvar,
    };
  
    return $retval;
  }

There are only two arguments in the list, so stopping to look at each one isn't too time-consuming, but consider the debugging pace with a list of 100 or so entries. Breakpoints can be customized by specifying a condition: each time the breakpoint is reached the condition is evaluated, and the debugger stops only if it is true. In the session below, the window command shows the breakable lines and we set a breakpoint at line 7 with the condition $arg eq 'later'. As we continue, the breakpoint is skipped when $arg is now, and the debugger stops when it is later:

  % perl -d test.pl
  
  My::World::(test.pl:5): my @args = qw(now later);
    DB<1> w
  2 
  3:      use Symbol ();
  4 
  5==>    my @args = qw(now later);
  6:      for my $arg (@args) {
  7:          my $stuff = do_work($arg);
  8:          if ($stuff) {
  9:              print "do your work $arg\n";
  10          }
  11      }

The ==> symbol marks the line of code that is about to be executed.

    DB<1> b 7 $arg eq 'later'
    DB<2> c
  do your work now
  My::World::(test.pl:7):     my $stuff = do_work($arg);
    DB<2> n
  My::World::(test.pl:8):     if ($stuff) {
    DB<2> x $stuff
  0  HASH(0x82b90e4)
     'sym' => GLOB(0x82b9138)
        -> *Symbol::GEN1
     'var' => 'My::World::later'
    DB<5> c
  do your work later
  Debugged program terminated.  Use q to quit or R to restart,
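When the condition is easier to express in code, a common alternative to a conditional breakpoint is to set the debugger's $DB::single variable from the program itself: under perl -d the debugger stops at the next statement once it is set to 1, and without -d the assignment is harmless. A sketch based on the test.pl above (the @done array exists only so the result can be checked):

```perl
package My::World;

use strict;
use warnings;
use Symbol ();

sub do_work {
    my ($var) = @_;
    return undef unless $var;
    return {
        'sym' => Symbol::gensym(),
        'var' => Symbol::qualify($var),
    };
}

my @args = qw(now later);
my @done;    # only here so the result can be checked

for my $arg (@args) {
    # Roughly the same as "b 7 $arg eq 'later'": under perl -d the debugger
    # stops at the next statement once $DB::single is set to 1.
    { no warnings 'once'; $DB::single = 1 if $arg eq 'later'; }
    my $stuff = do_work($arg);
    if ($stuff) {
        push @done, $stuff->{var};
        print "do your work $arg\n";
    }
}
```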

There are plenty more tricks left to pull from the perldb bag, but you should now know enough about the debugger to try them on your own with the perldebug manpage at your side. For quick online help, type the h command; it displays a list of the most useful commands with a short explanation of each.



Interactive Perl Debugging under mod_cgi

Devel::ptkdb is a visual Perl debugger that uses Perl/Tk for its user interface.

To debug a plain Perl script with it, invoke it as:

  % perl -d:ptkdb myscript.pl

A Tk application will be loaded. Now you can do most of what you did with the standard command-line Perl debugger, but through a simple GUI that lets you set and remove breakpoints, browse the code, step through it and more.

With the help of ptkdb you can debug your CGI scripts running under mod_cgi. Make sure that your web server's Perl installation includes the Tk package. To enable the debugger, change your:

  #! /usr/local/bin/perl -wT

to

  #! /usr/local/bin/perl -wTd:ptkdb

You can debug scripts remotely if your server runs Unix and the machine you are working on has an X server. The X server can be another Unix workstation, or a Macintosh or Win32 machine with an appropriate X Window System package. Insert the following BEGIN block into your script:

  BEGIN {
    $ENV{'DISPLAY'} = "myHostname:0.0";
  }

You can use either IP (123.123.123.123:0.0) or DNS convention (myhost.com:0.0). Be sure that your web server has permission to open windows on your Xserver (see the xhost manpage for more info).

Access your web page with your browser and submit the script as usual. The ptkdb window should appear on your monitor if you have set the $ENV{'DISPLAY'} variable correctly. At this point you can start debugging your script. Be aware that your browser may time out while waiting for the script to run.

To speed up debugging you may want to set up your breakpoints in advance with a .ptkdbrc file and use the $DB::no_stop_at_start variable. Note: when debugging web scripts you may have to install the .ptkdbrc file in the home directory of the account your web server runs under (e.g. ~www). Also try installing a .ptkdbrc file in the same directory as the target script.

META: insert snapshots of ptkdb screen



Non-Interactive Perl Debugging under mod_perl

To debug scripts running under mod_perl either use Apache::DB (interactive Perl debugging) or an older non-interactive method as described below.

The NonStop debugger option lets us get some decent debugging information when running under mod_perl. For example, before starting the server:

  % setenv PERL5OPT -d
  % setenv PERLDB_OPTS "NonStop=1 LineInfo=db.out AutoTrace=1 frame=2"

Now watch db.out for line:filename information. This is most useful for tracking down those core dumps that normally leave us guessing, even with a stack trace from gdb: db.out will show you which Perl code triggered the core dump. See the perldebug manpage for more PERLDB_OPTS options. Note that Perl ignores PERL5OPT if PerlTaintCheck is On.
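The same NonStop tracing can be tried on a plain script before involving Apache. A sketch, assuming a Unix-like system and a temporary directory whose path contains no spaces (PERLDB_OPTS values are space-separated); the traced script is a throwaway example:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Throwaway script to trace, written into a temporary directory.
my $dir    = tempdir(CLEANUP => 1);
my $script = "$dir/traced.pl";
my $trace  = "$dir/db.out";

open my $out, '>', $script or die "cannot write $script: $!";
print {$out} "my \$x = 1;\n", "\$x++;\n", "print STDERR qq{x=\$x\\n};\n";
close $out or die "cannot close $script: $!";

{
    # Same options as in the setenv example, passed to a child perl -d.
    local $ENV{PERLDB_OPTS} = "NonStop=1 LineInfo=$trace AutoTrace=1 frame=2";
    system($^X, '-d', $script) == 0 or die "perl -d failed: $?";
}

# Each executed statement is logged with its package, file and line number.
open my $in, '<', $trace or die "no trace written to $trace: $!";
my @lines = <$in>;
close $in;
print @lines;
```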



Interactive mod_perl Debugging

Now we'll turn to looking at how the interactive debugger is used in a mod_perl environment. The Apache::DB module available from CPAN provides a wrapper around perldb for debugging Perl code running under mod_perl.

The server must run in single-process (non-forking) mode to use the interactive debugger; this mode is turned on by passing the -X flag to the httpd executable. It is convenient to wrap the Apache::DB configuration in an IfDefine section; the example below does this using the name PERLDB. With this setup, debugging is turned on only when the server is started with the httpd -D PERLDB command.

This section should be at the top of the Perl part of your configuration file, before any Perl code is pulled in, so that debugging symbols are inserted into the syntax tree of the code compiled after the call to Apache::DB->init. Apache::DB::handler can be configured with any of the Perl*Handler directives; in this case we use a PerlFixupHandler, so that handlers in the response phase will bring up the debugger prompt:

  <IfDefine PERLDB>

    <Perl>
      use Apache::DB ();
      Apache::DB->init;
    </Perl>
  
    <Location />
      PerlFixupHandler Apache::DB
    </Location>
  
  </IfDefine>

Since we have used / as the argument to the Location directive, the debugger is invoked for every kind of request, even for static objects such as images and static documents. For those it quits immediately, of course, unless some Perl module is registered to handle them.

In our first example, we will debug the standard Apache::Status module, which is configured like so:

  PerlModule Apache::Status
  <Location /perl-status>
    PerlHandler Apache::Status
    SetHandler perl-script
  </Location>

When the server is started with the debugging flag, a notice will be printed to the console:

  % httpd -X -D PERLDB
  [notice] Apache::DB initialized in child 950

The debugger prompt will not be available until the first request is made, in our case to http://localhost/perl-status. Once we are at the prompt, all the standard debugging commands are available. First we run the window command for some context around the code being debugged, then move to the next statement, after $r has been assigned, and print the request URI. If no breakpoints are set, the continue command gives control back to Apache, and the request finishes with the Apache::Status main menu showing up in the browser window:

  Loading DB routines from perl5db.pl version 1.0402
  Emacs support available.
  
  Enter h or `h h' for help.
  
  Apache::Status::handler(/usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Status.pm:55): 
  55:         my($r) = @_;
    DB<1> w
  52      }
  53
  54      sub handler {
  55==>       my($r) = @_;
  56:         Apache->request($r); #for Apache::CGI
  57:         my $qs = $r->args || "";
  58:         my $sub = "status_$qs";
  59:         no strict 'refs';
  60
  61:         if($qs =~ s/^(noh_\w+).*/$1/) {
    DB<1> n
  Apache::Status::handler(/usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Status.pm:56):
  56:         Apache->request($r); #  for Apache::CGI
    DB<1> p $r->uri
  /perl-status
    DB<2> c

All the techniques we saw while debugging plain perl scripts can be applied to this debugging session.

Debugging Apache::Registry scripts is somewhat different, because the handler routine does quite a bit of work before it reaches your script. In this example, we make a request for /perl/test.pl, which consists of this code:

  use strict;
  
  my $r = shift;
  $r->send_http_header('text/plain');
  
  print "mod_perl rules";

When a request is issued, the debugger stops at line 28 of Apache/Registry.pm. We set a breakpoint at line 140, which is the line that actually calls the script wrapper subroutine. The continue command will bring us to that line, where we can step into the script handler:

  Apache::Registry::handler(/usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Registry.pm:28):
  28:         my $r = shift;
    DB<1> b 140
    DB<2> c
  Apache::Registry::handler(/usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Registry.pm:140):
  140:            eval { &{$cv}($r, @_) } if $r->seqno;
    DB<2> s
  Apache::ROOT::perl::test_2epl::handler((eval 87):3):
  3:        my $r = shift;

Notice the unusual package name: it is generated from the request URI, for namespace protection. The filename is not displayed, since the code was compiled via eval(), but the print command can show you $r->filename:

    DB<2> n
  Apache::ROOT::perl::test_2epl::handler((eval 87):4):
  4:        $r->send_http_header('text/plain');
    DB<2> p $r->filename
  /home/httpd/perl/test.pl

The line number might seem off too, but the window command will give you a better idea where you are:

    DB<4> w
  1:      package Apache::ROOT::perl::test_2epl;use Apache qw(exit);sub handler {  use strict;
  2 
  3:        my $r = shift;
  4==>      $r->send_http_header('text/plain');
  5 
  6:        print "mod_perl rules";
  7 
  8       }
  9       ;

The code from the test.pl file starts right after sub handler { on line 1 (where you can see its use strict;) and runs through line 7; the rest is the Apache::Registry magic that caches your code inside a handler subroutine.
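The package name Apache::ROOT::perl::test_2epl in the session above is derived from the script's URI. Below is a hypothetical re-implementation of that mapping, written only to match what the session shows (characters that can't appear in a Perl identifier become _ plus two hex digits, slashes become ::); it is a sketch, not Apache::Registry's actual code:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a request URI into a package name.
sub uri_to_package {
    my ($uri) = @_;
    # 1st pass: escape every character that can't appear in a Perl identifier
    # (slashes are left alone for now) as _ followed by two hex digits.
    $uri =~ s/([^A-Za-z0-9_\/])/sprintf("_%02x", ord $1)/eg;
    # 2nd pass: each run of slashes becomes '::'; a digit right after it is
    # escaped too, since a package name part can't start with a digit.
    $uri =~ s{(/+)(\d?)}{"::" . (length $2 ? sprintf("_%02x", ord $2) : "")}eg;
    return "Apache::ROOT$uri";
}

my $package = uri_to_package('/perl/test.pl');
print "$package\n";    # Apache::ROOT::perl::test_2epl
```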

It will always take some practice and patience when putting together debugging strategies that make effective use of the interactive debugger for various situations. Once you do have a good strategy in mind, bug squashing can actually be quite a bit of fun!



ptkdb and Interactive mod_perl Debugging

As we saw earlier, you can use the ptkdb visual debugger to debug CGI scripts running under mod_cgi. With the same configuration it won't work under mod_perl, though; we have to tweak the Apache/DB.pm module to use Devel/ptkdb.pm instead of Apache/perl5db.pl.

Open the file in your favorite editor and replace:

    require 'Apache/perl5db.pl';

with:

    require 'Devel/ptkdb.pm';

Now when you use the interactive mod_perl debugger configuration from the previous section and issue a request, a ptkdb visual debugger will be loaded.

If you are debugging Apache::Registry scripts, then just as in the terminal debugging example, go to line 140 (or whatever line the statement eval { &{$cv}($r, @_) } if $r->seqno; is located on) and step in to enter your script.

Note that you can work with ptkdb in plain multi-server mode, so you don't have to start the server with the -X option.

META: One caveat:

* When the request is completed, ptkdb hangs. Does anyone know what code should be registered for it to exit on completion? That is, what should replace the original Apache::DB cleanup code:

    if (ref $r) {
        $SIG{INT} = \&DB::catch;
        $r->register_cleanup(sub {
            $SIG{INT} = \&DB::ApacheSIGINT;
        });
    }

Any Perl/Tk guru to assist???



Debugging when Server Crashes on Startup before Writing to Log File

If your server crashes on startup, you need to start it under gdb and ask gdb to produce a stack trace.

I'll emulate a faulty server by putting a dump() call at the beginning of the startup file:

  startup.pl
  ----------
  dump;
  1;

and requiring this file from the httpd.conf:

  PerlRequire /path/to/startup.pl

Make sure no server is running on port 80, or use an alternate configuration with a different port if you are on a production server.

  % gdb /path/to/httpd
  (gdb) set args -X

Use:

  set args -X -f /path/to/alternate/serverconfig_ifneeded.conf

if you want the server to start from an alternative configuration file.

Now run the program:

  (gdb) run
  
  Starting program: /usr/local/apache/bin/httpd -X
  
  Program received signal SIGABRT, Aborted.
  0x400da4e1 in __kill () from /lib/libc.so.6

At this point the server dies (because of dump()), and when that happens we ask gdb for a stack trace (with the bt or where command):

  (gdb) where
  
  #0  0x400da4e1 in __kill () from /lib/libc.so.6
  #1  0x80d43bc in Perl_my_unexec ()
  #2  0x8119544 in Perl_pp_goto ()
  #3  0x8118990 in Perl_pp_dump ()
  #4  0x812b2ad in Perl_runops_standard ()
  #5  0x80d3a9c in perl_eval_sv ()
  #6  0x807ef1c in perl_do_file ()
  #7  0x807ef4f in perl_load_startup_script ()
  #8  0x807b7ec in perl_cmd_require ()
  #9  0x8092af7 in ap_clear_module_list ()
  #10 0x8092f43 in ap_handle_command ()
  #11 0x8092fd7 in ap_srm_command_loop ()
  #12 0x80933e0 in ap_process_resource_config ()
  #13 0x8093ca2 in ap_read_config ()
  #14 0x809db63 in main ()
  #15 0x400d41eb in __libc_start_main (main=0x809d8dc <main>, argc=2, 
      argv=0xbffffab4, init=0x80606f8 <_init>, fini=0x812b38c <_fini>, 
      rtld_fini=0x4000a610 <_dl_fini>, stack_end=0xbffffaac)
      at ../sysdeps/generic/libc-start.c:90

If you can't make sense of the trace, send it to the mod_perl mailing list, and make sure to include the versions of Apache, mod_perl and Perl you are using.

In our case we already know that the server is supposed to die while compiling the startup file, and we can clearly see that in the trace. A stack trace is read from the bottom up:

We are reading the configuration file:

  #13 0x8093ca2 in ap_read_config ()

We process the PerlRequire directive:

  #8  0x807b7ec in perl_cmd_require ()

We load the file and compile it:

  #6  0x807ef1c in perl_do_file ()
  #5  0x80d3a9c in perl_eval_sv ()

dump() gets executed:

  #3  0x8118990 in Perl_pp_dump ()

dump() ends up calling __kill():

  #0  0x400da4e1 in __kill () from /lib/libc.so.6



Debugging Hanging processes (continued)

META: incomplete

mod_perl comes with a number of useful gdb macros to ease the debugging process. You will find them in the .gdbinit file of the mod_perl source distribution (mod_perl-x.xx/.gdbinit). You might want to modify the macro definitions.

To use them you need to compile mod_perl with PERL_DEBUG=1.

To debug the server, start it:

  % httpd -X

Issue a request to the offending script that hangs, and find the PID of the process that hangs.

Go to the root of the server:

  % cd /usr/local/apache

Now attach to it with gdb (replace PID with the actual PID) and load the macros from .gdbinit:

  % gdb /path/to/httpd PID
  (gdb) source /usr/src/mod_perl-x.xx/.gdbinit

Now you can start the server (httpd below is a gdb macro):

  (gdb) httpd

Now run the curinfo macro:

  (gdb) curinfo

It should tell you the line/filename of the offending Perl code.

Add this to the .gdbinit:

  define longmess
    set $sv = perl_eval_pv("Carp::longmess()", 1)
    printf "%s\n", ((XPV*) ($sv)->sv_any )->xpv_pv
  end

and when you reload the macros, run:

  (gdb) longmess

to produce a Perl stack trace.



Debugging Core-Dumping Code

  $ perl -e dump
  Abort(coredump)

META: should I move the Apache::StatINC here? (I think not, since it relates to other topics like reloading config files, but you should mention it here with a pointer to it)



Apache::Debug

(META: to be written)

  use Apache::Debug ();
  Apache::Debug::dump($r, SERVER_ERROR, "Uh Oh!");

This module sends what may be helpful debugging information to the client rather than to the error log.

You could also try using a larger emergency memory pool. Try this instead of Apache::Debug ($^M takes effect only if perl was built with -DPERL_EMERGENCY_SBRK and uses Perl's own malloc):

  $^M = 'a' x (1<<18);  # 256K emergency buffer
  use Carp ();
  $SIG{__DIE__} = \&Carp::confess;
  eval { Carp::confess("init") };



Debug Tracing

To enable mod_perl debug tracing configure mod_perl with the PERL_TRACE option:

  perl Makefile.PL PERL_TRACE=1

The trace levels can then be enabled via the MOD_PERL_TRACE environment variable which can contain any combination of:

  d - Trace directive handling during configuration read
  s - Trace processing of perl sections
  h - Trace Perl*Handler callbacks
  g - Trace global variable handling, interpreter construction, END blocks, etc.
  all - all of the above

Add to httpd.conf:

  PerlSetVar MOD_PERL_TRACE all

For example, if you want to see a trace of the PerlRequire and PerlModule directives as they are processed, use:

  PerlSetVar MOD_PERL_TRACE d



gdb says there are no debugging symbols

As you know, you need an unstripped executable to be able to debug it. Even if you compile mod_perl with -g (or PERL_DEBUG=1), Apache's installation process strips the symbols.

Makefile.tmpl contains a line:

  IFLAGS_PROGRAM  = -m 755 -s 

Removing the -s (which tells install to strip the executable) does the trick.



Debugging Signal Handlers ($SIG{FOO})

The current Perl implementation does not restore Apache's original C-level handler when you use a local $SIG{FOO} clause. While saving and restoring $SIG{ALRM} was fixed in mod_perl 1.19_01 (the CVS version), the other signals are not yet fixed. The real fix probably belongs in Perl itself.

Until recently, local $SIG{ALRM} restored the SIGALRM handler to Perl's handler, not to the handler that was originally installed (Apache's alrm_handler()). If you build mod_perl with PERL_TRACE=1 and set the MOD_PERL_TRACE environment variable to g, you will see this in the error_log file:

  mod_perl: saving SIGALRM (14) handler 0x80b1ff0
  mod_perl: restoring SIGALRM (14) handler from: 0x0 to: 0x80b1ff0

If nobody touched $SIG{ALRM}, 0x0 would be the same address as the others.

If you work with signal handlers, take a look at the Sys::Signal module, which solves the problem:

Sys::Signal - Set signal handlers with restoration of existing C sighandler. Get it from CPAN.

Usage is simple. If the original code was:

  eval {
    local $SIG{ALRM} = sub { die "timeout\n" };
    alarm $timeout;
    # ... database work that may time out ...
    alarm 0;
  };
  die $@ if $@;

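If a timeout happens, SIGALRM is delivered, the handler dies and control leaves the eval block; otherwise alarm 0 is reached and the pending timer is cancelled. Either way, local restores the previous Perl-level $SIG{ALRM} value when the eval block is left, although, as explained above, under mod_perl that does not necessarily restore Apache's original C handler.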

Now you would write:

  use Sys::Signal ();
  eval {
    my $h = Sys::Signal->set(ALRM => sub { die "timeout\n" });
    alarm $timeout;
    # ... do something that may time out ...
    alarm 0;
  };
  die $@ if $@;
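In plain Perl the eval/local/alarm idiom above can be wrapped in a small reusable function. This is a sketch (run_with_timeout is a made-up name); the local still restores only the Perl-level $SIG{ALRM}, so under mod_perl the C-handler issue described above still applies and Sys::Signal remains the fix there:

```perl
use strict;
use warnings;

# Run $code with a time limit. Returns the code's result, or undef if it
# timed out; any other error is rethrown.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;                 # finished in time: cancel the timer
        1;
    };
    alarm 0;    # cancel a still-pending alarm if $code died for another reason
    return $result if $ok;
    die $@ unless $@ eq "timeout\n";
    return undef;
}

my $fast = run_with_timeout(2, sub { 42 });
# select() is used for the delay because sleep() may itself be built on alarm().
my $slow = run_with_timeout(1, sub { select(undef, undef, undef, 3); 'finished' });
```

Cancelling the alarm again after the eval keeps a stray SIGALRM from reaching the default handler (which terminates the process) when the wrapped code fails with an unrelated error.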



Code Profiling

(Meta: duplication??? I've started to write about profiling somewhere in this file)

It is possible to profile code run under mod_perl with the Devel::DProf module available on CPAN. However, you must have Apache version 1.3b3 or higher and the PerlChildExitHandler enabled. When the server is started, Devel::DProf installs an END block to write the tmon.out file, which will be run when the server is shut down. Here's how to start and stop a server with the profiler enabled:

  % setenv PERL5OPT -d:DProf
  % httpd -X -d `pwd` &
  ... make some requests to the server here ...
  % kill `cat logs/httpd.pid`
  % unsetenv PERL5OPT
  % dprofpp

See also: Apache::DProf



Devel::Peek

Devel::Peek - A data debugging tool for the XS programmer

Here is an example showing that Perl allocates the string buffer of a lexical only once, regardless of my() scoping, although it will realloc() if the required size is greater than SvLEN:

  use Devel::Peek;
  
  for (1..3) {
      foo();
  }
  
  sub foo {
      my $sv;
      Dump $sv;
      $sv = 'x' x 100_000;
      $sv = "";
  }

The output:

  SV = NULL(0x0) at 0x8138008
    REFCNT = 1
    FLAGS = (PADBUSY,PADMY)
  SV = PV(0x80e5794) at 0x8138008
    REFCNT = 1
    FLAGS = (PADBUSY,PADMY)
    PV = 0x815f808 ""\0
    CUR = 0
    LEN = 100001
  SV = PV(0x80e5794) at 0x8138008
    REFCNT = 1
    FLAGS = (PADBUSY,PADMY)
    PV = 0x815f808 ""\0
    CUR = 0

We can see that on subsequent calls (after the first one) $sv already has preallocated memory.

So, if you can afford the memory, a larger buffer means fewer brk() system calls. If you watch this example with strace, you will see calls to brk() only the first time through the loop. This is a case where your module might want to preallocate a buffer, for example for LWP, as a file-scoped lexical, like so:

  package Your::Proxy;
  
  my $buffer = ' ' x 100_000;
  $buffer = "";

This way only the parent has to call brk() at server startup, and each child will already have an allocated buffer; just reset it to "" when you are done.



How can I find if my mod_perl scripts have memory leaks

Apache::Leak (derived from Devel::Leak) should help you with this task. Example:

  use Apache::Leak;
  
  my $global = "FooAAA";
  
  leak_test {
    $$global = 1;
    ++$global;
  };

The argument to leak_test() is an anonymous sub, so you can just wrap it around any code you suspect might be leaking. Beware: it runs the code twice, because new SVs created the first time through don't necessarily mean you are leaking; the second pass gives better evidence. You don't need to be inside mod_perl to use it. From the command line, the above script outputs:

  ENTER: 1482 SVs
  new c28b8 : new c2918 : 
  LEAVE: 1484 SVs
  ENTER: 1484 SVs
  new db690 : new db6a8 : 
  LEAVE: 1486 SVs
  !!! 2 SVs leaked !!!

Build a debuggable perl to see dumps of the SVs. The simple way to have both a normal perl and a debuggable perl is to follow the hints in the SUPPORT document for building libperld.a. When that is built, copy the perl from that directory to your perl bin directory, but name it dperl.

Leak explanation: $$global = 1; creates a new global variable ($FooAAA on the first pass, $FooAAB on the next) with the value 1, and it will not be destroyed until the module is destroyed.
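To see where those new SVs come from, here is a sketch that repeats the body of the leak_test block by hand, without Apache::Leak: the symbolic reference $$global creates a new package variable on every pass, because the magic string increment turns FooAAA into FooAAB and so on.

```perl
use strict;
use warnings;

my $global = "FooAAA";
my @created;
{
    no strict 'refs';              # $$global is a symbolic reference
    for my $pass (1 .. 2) {
        $$global = 1;              # creates $main::FooAAA, then $main::FooAAB
        ++$global;                 # "FooAAA" -> "FooAAB" -> "FooAAC"
    }
    # Which of these package variables now exist?
    @created = grep { defined ${"main::$_"} } qw(FooAAA FooAAB FooAAC);
}
print "$global\n";                 # FooAAC
print "@created\n";                # FooAAA FooAAB
```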

Apache::Leak is not very user-friendly; have a look at B::LexInfo. You'll see that what might appear to be a leak is actually just a Perl optimization. For example, consider this code:

  sub foo {
    my $string = shift;
  }

  foo("a string");

B::LexInfo will show you that Perl does not release the value of $string unless you undef() it. This is because Perl anticipates that the memory will be needed for another string the next time the subroutine is entered. You'll see something similar for @array length, %hash keys, and the scratch areas of the pad list for ops such as join(), `.', etc.

Apache::Status now includes a new StatusLexInfo option.

Apache::Leak works better if you've built a libperld.a (see the SUPPORT document) and passed PERL_DEBUG=1 to mod_perl's Makefile.PL.

[TOC]


Debugging your code in Single Server Mode

Running in httpd -X mode (good only for testing during the development phase).

You want to test that your application correctly handles global variables (if you have any; the fewer you have, the better, but sometimes you just can't do without them). It's hard to test with multiple servers serving your CGI, since each child has its own value for its global variables. Imagine that you have a random() sub that returns a random number, and that you have the following script:

  use vars qw($num);
  $num ||= random();
  print ++$num;

This script initializes the variable $num with a random value, then increments it on each request and prints it out. Running this script in a multiple server environment will produce something like 1, 9, 4, 19 (one number per reload), since each request may be served by a different child. (On some OSes, AIX for example, the parent httpd process will assign all of the requests to the same child process if all of the children are idle.) But if you run in httpd -X single server mode, you will get 2, 3, 4, 5... (assuming that random() returned 1 on the first call).

But do not get too obsessive with this mode, since working only in single server mode sometimes hides problems that show up when you switch to a normal (multi) server mode. Consider an application that allows you to change the configuration at run time.

Let's say the script produces a form to change the background color of the page. It's not a good design, but for the sake of demonstrating the potential problem, we will assume that our script doesn't write the changed background color to the disk, but simply changes it in memory, like:

  use vars qw($bgcolor);
  use CGI ();
  # the CGI object holding the current request's parameters
  my $q = CGI->new;
  # assign default value at first invocation
  $bgcolor ||= "white";
  # modify the color if requested to
  $bgcolor = $q->param('bgcolor') || $bgcolor;

So you type in a new color, and in response your script prints back the HTML with the new color. You think that's it! It was so simple. And if you keep running in single server mode, you will never notice that you have a problem...

If you run the same code in normal server mode, you will get the expected result right after you submit the color change. But when you call the same URL again (not reload!), chances are that you will get back the original default color (white in our case). Only the child that processed the color change request knows that its global variable has changed. Just remember that children can't share information, other than what they inherited from their parent when they were spawned. Of course, you should either use a hidden form field to remember the color or store it on the server side (database, shared memory, etc.).
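Outside of Apache, the same effect can be demonstrated with a plain fork(). A child starts with a copy of the parent's globals, and whatever it changes stays in its own memory. This is a minimal sketch, not mod_perl code. The pipe exists only so the parent can report what the child saw, and the red/white values mirror the $bgcolor example above.

```perl
use strict;
use warnings;

our $bgcolor = 'white';    # set in the parent, inherited by the child at fork()

pipe(my $reader, my $writer) or die "pipe: $!";

my $pid = fork();
die "fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: plays the process that handled the "change color" request.
    close $reader;
    $bgcolor = 'red';    # modifies only the child's copy
    print {$writer} "$bgcolor\n";
    close $writer;
    exit 0;
}

# Parent: plays every other process, which never sees the change.
close $writer;
chomp(my $child_color = <$reader>);
close $reader;
waitpid($pid, 0);

print "child's \$bgcolor:  $child_color\n";
print "parent's \$bgcolor: $bgcolor\n";
```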

Also note that since the server is running in single-process mode, if the output returns HTML with <IMG> tags, loading the images will take a lot of time. If you use Netscape while your server is running in single-process mode, HTTP's KeepAlive feature gets in the way. Netscape tries to open multiple connections and keep them open. Because there is only one server process listening, each connection has to time out before the next one succeeds. To avoid this effect while developing, turn off KeepAlive in httpd.conf. Alternatively, press STOP after a few seconds (assuming you use the image size parameters, so that Netscape can render the rest of the page).

In addition, you should know that when running with -X you will not see any of the control messages that the parent server normally writes to the error_log (like ``server started'', ``server stopped'', etc.). Since httpd -X causes the server to handle all requests itself, without forking any children, there is no controlling parent to write status messages.

[TOC]


The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me to improve this guide. If you have something to contribute please send it directly to me.

Written by Stas Bekman.
Last Modified at 12/18/1999
Use of the Camel for Perl is
a trademark of O'Reilly & Associates,
and is used by permission.

Appendix A: Downloading software and documentation

[TOC]


Coverage

Here you will find instructions for downloading the software and the related documentation.

[TOC]


Perl

Perl is most likely already installed on your machine, but you should at least check the version you are using. It is highly recommended that you have at least Perl version 5.004. You can get the latest perl version from http://www.perl.com/ . Try the direct download link http://www.perl.com/pace/pub/perldocs/latest.html . You can get Perl documentation from the same location.

[TOC]


Apache

Get the latest Apache webserver and documentation from http://www.apache.org . Try the direct download link http://www.apache.org/dist/ .

[TOC]


mod_perl

Get the latest mod_perl sources and documentation from http://perl.apache.org . Try the direct download link http://perl.apache.org/dist/ .

[TOC]


Squid - Internet Object Cache

http://squid.nlanr.net/

Squid Linux 2.x Redhat RPMs : http://home.earthlink.net/~intrep/linux/

[TOC]


thttpd - tiny/turbo/throttling HTTP server

http://www.acme.com/software/thttpd/

[TOC]


mod_proxy_add_forward

Ask Bjoern Hansen has written a mod_proxy_add_forward.c module for Apache that sets the X-Forwarded-For field when doing a ProxyPass, similar to what Squid can do. His module, complete with instructions on how to compile it in, is available at http://modules.apache.org/search?id=124 , at ftp://ftp.netcetera.dk/pub/apache/mod_proxy_add_forward.c or at http://www.cpan.org/authors/id/ABH/mod_proxy_add_forward.c .

[TOC]


httperf - webserver Benchmarking tool

http://www.hpl.hp.com/personal/David_Mosberger/httperf.html

[TOC]


ab - ApacheBench

Comes with the Apache distribution.

[TOC]


High-Availability Linux Project

You will find the definitive guide to load balancing techniques at the High-Availability Linux Project site -- http://www.henge.com/~alanr/ha/

More load balancing URLs:

lbnamed - a load balancing name server written in Perl, by Roland Schemers http://www.stanford.edu/~riepel/lbnamed/ http://www.stanford.edu/~riepel/lbnamed/bof.talk/ http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html

Network Address Translation and Networks: Virtual Servers (Load Balancing) http://www.csn.tu-chemnitz.de/~mha/linux-ip-nat/diplom/node4.html#SECTION00043100000000000000

[TOC]


Apache::Request

Get it from CPAN at $CPAN/authors/id/DOUGM/libapreq-x.xx.tar.gz or from http://perl.apache.org/dist/libapreq-x.xx.tar.gz . (replace x.xx with the current version)

[TOC]



Written by Stas Bekman.
Last Modified at 12/04/1999

Frequent mod_perl problems

[TOC]


Coverage

This new document was born because some problems come up so often on the mailing list that they deserve to be stressed in the guide as among the most important things to read about and beware of. So I have tried to list them in this document. If you think that some important problem that is frequently reported on the list and covered in the guide is missing below, please tell me.

[TOC]


my() scoped variable in nested subroutines

See the ``my() Scoped Variable in Nested Subroutines'' section.

[TOC]


Segfaults caused by PerlFreshRestart

See the ``Evil things might happen when using PerlFreshRestart'' section.

[TOC]



Written by Stas Bekman.
Last Modified at 12/18/1999

Choosing an Operating System and Hardware

[TOC]


Is it important?

You can invest a lot of time and money into server tuning and code rewriting according to the guidelines you have just learned, but your performance will be really bad if you do not take the hardware demands into account and do not wisely choose an operating system suited to your needs. While the tips below apply to any webserver, they are written for the administrator of a mod_perl-enabled webserver.

[TOC]


Choosing an Operating System

First let's talk about Operating Systems (OS). While I am personally a Linux devotee, I do not want to start yet another OS war. Instead, I will try to define what you should be looking for; once you know what you want from your OS, go find it. Visit the Web sites of the operating systems you are interested in. You can gauge users' opinions by searching relevant discussions in newsgroup and mailing list archives such as Deja - http://deja.com and eGroups - http://egroups.com . I will leave this fan research up to you, but I would use Linux or something from the *BSD family.

[TOC]


Stability and Robustness

Probably the most desired features in an OS are stability and robustness. You are in an Internet business, which, unlike many conventional businesses you know about, does not have normal working hours (9am to 5pm). You are open 24 hours a day. You cannot afford to be off-line, for your customers will go shop at another service like yours, unless you have a monopoly :) . If the OS of your choice crashes every day or so, I would throw it away, but only after a little investigation, since the crashes might have another cause, like a runaway server that eats up all the memory and disk; you cannot blame the OS for that. Generally, people who have used an OS for some time can tell you a lot about its stability.

[TOC]


Memory Management

You want an OS with good memory management; some OSes are well known as memory hogs. The same code can use twice as much memory on one OS as on another. If the size of a mod_perl process is 10Mb and you have tens of them running, it definitely adds up!

[TOC]


Memory Leakages

Some OSes and/or their libraries (like C runtime libraries) suffer from memory leaks. You cannot afford such a system, since you already know that a single mod_perl process sometimes serves thousands of requests before it terminates. So if a leak occurs on every request, your memory demands will be huge. Of course, your code can be the cause of memory leaks as well (check out the Apache::Leak module). Certainly, you can lower the number of requests each process serves over its life (with the MaxRequestsPerChild directive), but that can degrade performance.

[TOC]


Sharing Memory

You want an OS with good memory sharing capabilities. As you have learned, if you preload the modules and scripts at server startup, they are shared between the spawned children, at least for part of each process's life span, since memory pages become ``dirty'' over time and cease to be shared. This feature can save you a lot of memory!

[TOC]


Cost and Support

If you are in a big business, you probably do not mind paying another $1000 for some fancy OS with bundled support. But if your resources are low, you will look for a cheaper or free OS. Free does not mean bad; it can be quite the opposite, as we all know from our own experience or have read in the news. Free OSes can have, and do have, the best support you can find. The reason is easy to understand: most people are not rich and will try a cheaper or free OS first if it does the work for them. Since it really fits their needs, many people keep using it and eventually know it well enough to provide support for others in trouble. Why would they do this for free? For the spirit of the early days of the Internet, when there was no commercial Internet and people helped each other, because someone had helped them in the first place. I was there, I was touched by that spirit and I will do anything to keep that spirit alive.

But let's get back to our world. We are living in a material world, and our bosses pay us to keep the systems running. So if you feel that you cannot provide the support yourself and you do not trust the available free resources, you must pay for an OS backed by a company, and blame them for any problem. Your boss wants to be able to sue someone if the project has a problem caused by an external product used in the project. If you buy a product and the company selling it claims to support it, you have someone to sue. If you go with Open Source and it fails, you have no one to sue, and you may be the one who gets fired.

Also remember that if you spend less or no money on the OS and software, you will be able to buy better and more powerful hardware.

[TOC]


Discontinued products

You have invested a lot of time and money into developing some proprietary software that is bundled with the OS you were developing on. For example, you wrote a mod_perl handler that takes advantage of some proprietary features of the OS and will not run on any other OS. Things are under control, the performance is great and you sing with happiness. But one day the company that wrote your beloved OS goes bankrupt, which is not unlikely to happen nowadays. You are stuck with their last masterpiece and no support! What are you going to do then? Invest more in porting the software to another OS...

Everyone can be hit by this mini-disaster, so it is better to check the background of the company when making your choice, but still you never know what will happen tomorrow. The OSes in this hazard group are those developed entirely by a single company. Free OSes are probably less susceptible to this, since their development is distributed among many companies and developers. If a person who developed a really important part of the kernel loses interest in continuing, someone else will pick up the falling flag and carry on. Of course, if some better project showed up tomorrow, developers might migrate there and eventually drop the development, but we are here to keep that from happening.

In the final analysis, the decision is yours.

[TOC]


OS Releases

Actively developed OSes generally try to keep pace with the latest technology developments, and continually optimize the kernel and other parts of the OS to become better and faster. Nowadays, the Internet and networking in general are the hottest targets for system developers. Sometimes a simple OS upgrade to the latest stable version can save you an expensive hardware upgrade. Also, remember that when you buy new hardware, chances are that the latest software will make the most of it. Although existing software (drivers) might support a brand new product because of its backward compatibility with previous products of the same family, it might not reap all the benefits of the new features. This means that you could spend much less money for almost the same functionality if you were to buy a previous model of the same product.

[TOC]


Choosing Hardware

Since I am not fond of the idea of updating this section every time a new processor or memory type comes out, I will only hint at what you should look for, and suggest that sometimes the most expensive machine is not the one that provides the best performance.

Your demands are based on many aspects and components. Let's discuss some of them.

In the course of the discussion you might meet some unfamiliar terms. Here are some of them:

  • Clustering - a bunch of machines connected together to perform one big or many small computational tasks in a reasonable time.

  • Load balancing - users can remember the name of only one of your machines, namely your server, but that machine cannot stand the heavy load, so you use a clustering approach and distribute the load over a number of machines. The central server, the one users access when they type the name of the service, works as a dispatcher by redirecting requests to the rest of the machines; sometimes it also collects the results and returns them to the users. One of the advantages is that you can take one of the machines down for repair or an upgrade and your service will still work, since the main server will not dispatch requests to the machine that was taken down. I will just say that there are many load balancing techniques. (See the High-Availability Linux Project for more info.)

  • NIC - Network Interface Card.

  • RAM - Random Access Memory

  • RAID - Redundant Array of Inexpensive Disks
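To make the load balancing description more concrete, here is a minimal round-robin dispatcher in Perl. It is only a sketch of one of the many techniques and assumes a fixed list of back-end machines. The RoundRobin package, its methods and the host names are all hypothetical. Machines taken down for repair are skipped, as described in the Load balancing item above.

```perl
package RoundRobin;

use strict;
use warnings;

# Create a dispatcher over a fixed list of back-end hosts.
sub new {
    my ($class, @backends) = @_;
    die "need at least one backend\n" unless @backends;
    return bless { backends => [@backends], down => {}, next => 0 }, $class;
}

# Take a machine out of rotation (e.g. for a repair or upgrade), or put it back.
sub mark_down { my ($self, $host) = @_; $self->{down}{$host} = 1; return }
sub mark_up   { my ($self, $host) = @_; delete $self->{down}{$host}; return }

# Return the next available host in turn, or undef if every host is down.
sub pick {
    my ($self) = @_;
    my $n = @{ $self->{backends} };
    for (1 .. $n) {
        my $host = $self->{backends}[ $self->{next} ];
        $self->{next} = ($self->{next} + 1) % $n;
        return $host unless $self->{down}{$host};
    }
    return undef;
}

package main;

my $lb = RoundRobin->new(qw(www1 www2 www3));
$lb->mark_down('www2');
my @picked = map { $lb->pick } 1 .. 4;
print "@picked\n";    # prints "www1 www3 www1 www3"
```

Real dispatchers also weigh machines by capacity and detect failures on their own. Round-robin is used here only because it is the simplest policy that shows the idea.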

[TOC]


Expected site traffic

If you are building a fan site but want to amaze your friends with a mod_perl guest book, an old 486 machine will do. If you are in serious business, it is very important to build a scalable server. If your service is successful and becomes popular, and your server's traffic doubles every few days, you should be ready to add more resources dynamically. While we could define webserver scalability more precisely, the important thing is to make sure that you can add more power to your webserver(s) without investing additional money in software development (almost: you will need software to connect your servers if you add more of them). This means that you should choose hardware and an OS that can talk to other machines and become part of a cluster.

On the other hand, if you prepare for heavy traffic and buy a monster to do the work for you, what happens if your service does not prove to be as successful as you thought it would be? Then you have spent too much money, and meanwhile new, faster processors and other hardware components have been released, so you lose again.

Wisdom and prophecy, that's all it takes :)

[TOC]


Cash

Everybody knows that the Internet is a cash hole: what you throw in hardly comes back. This is not always true, but there is a lot of wisdom in these words. While you have to invest money to build a decent service, it can be done more cheaply! You can spend as much as 10 times more money on a strong new machine, but get only a 10% improvement in performance. Remember that a four-year-old processor is still very powerful.

If you really need a lot of power, do not think about a single strong machine (unless you have money to throw away); think about clustering and load balancing. For the price of one new machine you can probably buy 10 older but very cheap machines and get 8 times more power. Why is that? Because, as I mentioned before, the performance improvement of the newest machine is generally marginal while its price is much higher. And because 10 machines will do disk I/O faster than a single machine, even if that machine's disk is much faster. Yes, you will have more administration overhead, but chances are you will have it anyway. Before long, the machine you have just invested in will not stand the load, and you will have to purchase more machines and think about how to implement load balancing and file system distribution.

Why am I so convinced? Facts! Look at the most used services on the Internet: search engines, email servers and the like. Most of them use a clustering approach. You may not always notice it, because they hide the real implementation behind proxy servers.

[TOC]


Internet Connection

You have the best hardware you can get, but the service is still crawling. Make sure you have a fast Internet connection: not as fast as your ISP claims it to be, but as fast as it should be. The ISP might have a very good connection to the Internet but put many clients on the same line. If these are heavy clients, your traffic will have to share the same line and the throughput will decline. Think about a dedicated connection and make sure it is truly dedicated. Trust the ISP, but check!

The idea of having a connection to The Internet is a little misleading. Many Web hosting and co-location companies have large amounts of bandwidth, but still have poor connectivity. The public exchanges, such as MAE-East and MAE-West, frequently become overloaded, yet many ISPs depend on these exchanges.

Private peering means that providers exchange traffic directly with each other rather than through the public exchanges, so the traffic moves much more quickly.

Also, if your Web site is of global interest, check that the ISP has good global connectivity. If the Web site is going to be visited mostly by people in a certain country or region, your server should probably be located there.

Bad connectivity can also directly influence your machine's performance. Here is a story one of the developers told on the mod_perl mailing list:

  What relationship has 10% packet loss on one upstream provider got
  to do with machine memory ?

  Yes.. a lot. For a nightmare week, the box was located downstream of
  a provider who was struggling with some serious bandwidth problems
  of his own... people were connecting to the site via this link, and
  packet loss was such that retransmits and tcp stalls were keeping
  httpd heavies around for much longer than normal.. instead of
  blasting out the data at high or even modem speeds, they would be
  stuck at 1k/sec or stalled out...  people would press stop and
  refresh, httpds would take 300 seconds to timeout on writes to
  no-one.. it was a nightmare.  Those problems didn't go away till I
  moved the box to a place closer to some decent backbones.

  Note that with a proxy, this only keeps a lightweight httpd tied up,
  assuming the page is small enough to fit in the buffers.  If you are
  a busy internet site you always have some slow clients.  This is a
  difficult thing to simulate in benchmark testing, though.

[TOC]


I/O performance

If your service is I/O bound (does a lot of disk reads and writes; remember that relational databases sit on disk as well), you need a very fast disk. So you should not spend money on a video card and monitor (a monochrome card and a 14-inch B&W monitor are perfectly adequate for a server, since you will probably be logged in via telnet or ssh most of the time). Instead, look for disks with the best price/performance ratio. Of course, ask around and avoid disks that have a reputation for head crashes and other disasters.

With money in hand you should think about getting a RAID system. A RAID is generally a box with many hard disks. It is capable of reading and writing data much faster, and it is protected against disk failures. It does this by spreading redundant data over a number of disks, so if one disk fails, the RAID controller detects it and the data is still available on the other disks. You must think about RAID or similar systems if you have an enormous data set to serve. (What is an enormous data set nowadays? Gigabytes, terabytes?)

OK, we have a fast disk; what's next? You need a fast disk controller. Either use the one embedded on your motherboard or, if the onboard one is not good enough, plug in a controller card.

[TOC]


Memory

How much RAM (Random Access Memory) do you need? Nowadays, chances are you will hear: ``Memory is cheap, the more you buy the better''. But how much is enough? The answer is pretty straightforward: ``You do not want your machine to swap''. When the system needs to write something into memory but notices that memory is already full, it takes the least frequently used memory pages and swaps them out. Swapping out means writing the data to disk. Another process then references some of its own data, which happens to be on one of the pages that were just swapped out. The system, ever obliging, swaps it back in again, probably swapping out some other data that will be needed very shortly by another process. Carried to the extreme, the CPU and disk start to thrash hopelessly in circles, without getting any real work done. The less RAM there is, the more often this scenario arises. Worse, you can exhaust swap space as well, and then the troubles really set in...

How do you make a decision? You know the highest rate at which your server is expected to serve pages and how long it takes to serve each one. Now you can calculate how many server processes you need. Knowing the maximum size any of your servers can reach, you know how much memory you need. You probably need less memory than you have calculated if your OS supports memory sharing and you know how to make the best use of this feature (preloading the modules and scripts at server startup). Do not forget that other essential system processes need memory as well, so you should plan not only for the web server but also take the other players into account. Remember that requests can be queued, so you can afford to let a client wait a few moments until a server is available to serve it. Your numbers will be more realistic this way, since you generally do not run at the highest load, but you should still be ready to bear the peaks. So you need to reserve at least 20% of free memory for peak situations. Many sites have crashed a few moments after a big scoop about them was posted and an unexpected number of requests suddenly came in. (This is called the Slashdot effect, named after http://slashdot.org .) If you are about to announce something cool, be aware of the possible consequences.
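The calculation above can be sketched as a small Perl function. The model is deliberately simple and rests on assumptions that are not from the text above. The number of processes is the peak request rate multiplied by the average time to serve a request (Little's law). The shared part of each process is counted only once. The 20% free memory is read as "20% of the total RAM must stay free". The function name, its parameters and all the numbers in the usage line are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical estimator for the RAM needed by a mod_perl server.
#   peak_rps  - highest expected number of requests per second
#   avg_ms    - average time to serve one request, in milliseconds
#   proc_mb   - maximum size of one httpd process, in MB
#   shared_mb - part of each process shared with the parent (preloaded code)
#   other_mb  - memory needed by the other essential system processes
#   free_pct  - percentage of the total RAM to keep free for peaks
sub estimate_ram {
    my %p = @_;
    # Concurrent processes needed = arrival rate x service time, rounded up.
    my $procs = ceil($p{peak_rps} * $p{avg_ms} / 1000);
    # Shared pages are counted once; each process adds only its unshared part.
    my $httpd_mb  = $p{shared_mb} + $procs * ($p{proc_mb} - $p{shared_mb});
    my $needed_mb = $httpd_mb + $p{other_mb};
    # Size the total so that $needed_mb uses only (100 - free_pct)% of it.
    my $total_mb = ceil($needed_mb * 100 / (100 - $p{free_pct}));
    return ($procs, $total_mb);
}

my ($procs, $ram) = estimate_ram(
    peak_rps  => 50, avg_ms   => 400, proc_mb  => 10,
    shared_mb => 4,  other_mb => 64,  free_pct => 20,
);
print "$procs processes, about $ram MB of RAM\n";
# prints "20 processes, about 235 MB of RAM"
```

With these made-up numbers, 20 processes of which 6MB each is unshared need 124MB, plus 64MB for the rest of the system, and 188MB must fit in 80% of the RAM.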

[TOC]


Bottlenecks

The most important thing to understand is that you might use the most expensive components but still get bad performance. Why? Let me introduce an annoying word: a bottleneck.

A machine is an aggregate of many big and small components. Each one of them may be a bottleneck. If you have a fast processor but a small amount of RAM (memory), the processor will be under-utilized, waiting for the kernel to swap memory pages in and out because memory is too small to hold the most used ones. If you have a lot of memory, a fast processor and a fast disk, but a slow controller, the performance will be bad and you will have wasted money.

Use a fast NIC (Network Interface Card) that does not create a bottleneck. If it is slow, the whole service is slow. This is the most important component, since webservers are much more network-bound than disk-bound!

[TOC]


Conclusion

To use your money optimally, you have to understand the hardware very well, so you will know what to pick. Otherwise, you should hire a knowledgeable hardware consultant and employ him or her on a regular basis, since your demands will probably change as time goes by and your hardware will have to adapt as well.

[TOC]



Written by Stas Bekman.
Last Modified at 12/04/1999

Getting Helped and Further Learning

[TOC]


READ ME FIRST

If, after reading this guide and the other documents listed in this section, you feel that your question is not yet answered, please ask the apache/mod_perl mailing list to help you. But first try to browse the mailing list archive. Most of the time you will find the answer to your question by searching the mailing list archive, since there is a good chance that someone else has already encountered the same problem and found a solution for it. If you ignore this advice, do not be surprised if your question is left unanswered; it bores people to answer the same question more than once. This does not mean that you should avoid asking questions. Just do not abuse the available help, and RTFM before you call for HELP. (You have certainly heard the fable of the shepherd boy who cried wolf.)

For more information see the ``Get helped with mod_perl'' section.

[TOC]


Contacting me

Hi, I wrote this document to help you with mod_perl. It does not mean that if you have any question regarding mod_perl, perl or whatever you think I might know, you should send it directly to me. Please see the Get helped with mod_perl section and follow the guidelines as prescribed there.

However, you are welcome to submit corrections and suggestions directly to me at sbekman@iname.com (please use the subject ``mod_perl guide corrections''). If you are going to submit heavy corrections of the text (I love those!), please help me by downloading the source pages in POD (from the main page under the index) and editing them directly. I will use Emacs Ediff to perform an easy merge of your changes. Thank you!

PLEASE, NO PERSONAL QUESTIONS; I didn't invite those by writing a guide. They will all be deleted immediately. Please ask your questions on the mod_perl list, and if someone (or I) can answer your question, it will be answered. Thank you!

[TOC]


Get helped with mod_perl

mod_perl home

http://perl.apache.org

mod_perl Garden project

http://modperl.sourcegarden.org

Apache Modules Book

http://www.modperl.com is the home site of The Apache Modules Book, a book about creating Web server modules using the Apache API, written by Lincoln Stein and Doug MacEachern.

Now you can purchase the book at your local bookstore or from online dealers. O'Reilly lists this book as:

          Writing Apache Modules with Perl and C
          By Lincoln Stein & Doug MacEachern
          1st Edition March 1999
          1-56592-567-X, Order Number: 567X
          746 pages, $34.95

  • mod_perl FAQ

    by Frank Cringle at http://perl.apache.org/faq/ .

  • mod_perl performance tuning guide

    by Vivek Khera at http://perl.apache.org/tuning/ .

  • mod_perl plugin reference guide

    by Doug MacEachern at http://perl.apache.org/src/mod_perl.html .

  • Quick guide for moving from CGI to mod_perl

    at http://perl.apache.org/dist/cgi_to_mod_perl.html .

  • mod_perl_traps, common traps and solutions for mod_perl users

    at http://perl.apache.org/dist/mod_perl_traps.html .

  • mod_perl Quick Reference Card

    http://www.refcards.com (Apache and other refcards are available from this link)

  • mod_perl Resources Page

    http://www.perlreference.com/mod_perl/

  • mod_perl mailing list

    The Apache/Perl mailing list (modperl@apache.org) is available for mod_perl users and developers to share ideas, solve problems and discuss things related to mod_perl and the Apache::* modules. To subscribe to this list, send mail to majordomo@apache.org with an empty Subject and the following Body:

      subscribe modperl
    

    A searchable mod_perl mailing list archive is available at http://forum.swarthmore.edu/epigone/modperl (we owe it to Ken Williams).

    More archives are available:

    [TOC]


    Get helped with Perl

    [TOC]


    Get helped with Perl/CGI

    [TOC]


    Get helped with Apache

    [TOC]


    Get helped with DBI

    [TOC]


    Get helped with Squid - Internet Object Cache

    [TOC]


    The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.

    Written by Stas Bekman.
    Last Modified at 12/18/1999
    Use of the Camel for Perl is
    a trademark of O'Reilly & Associates,
    and is used by permission.

    mod_perl Guide

    Deploying mod_perl technology to give rocket speed to your CGI/Perl scripts.

    Version 1.19, Dec 19, 1999


    Mirror readers: Make sure you read the latest copy.

    Table of Contents:







    URL: http://perl.apache.org/guide
    Copyright © 1998, 1999 Stas Bekman. All rights reserved.

    Written by Stas Bekman.
    Last Modified at 12/19/1999
    Use of the Camel for Perl is
    a trademark of O'Reilly & Associates,
    and is used by permission.

    mod_perl Installation



    Table of Contents:



    [TOC]


    Installing mod_perl in 10 Minutes and 10 Command Lines

    Did you know that it takes only about 10 minutes to build and install a mod_perl-enabled Apache on an average processor with a decent amount of memory? It goes like this:

      % cd /usr/src
      % lwp-download http://www.apache.org/dist/apache_x.x.x.tar.gz
      % lwp-download http://perl.apache.org/dist/mod_perl-x.xx.tar.gz
      % tar zvxf apache_x.x.x.tar.gz
      % tar zvxf mod_perl-x.xx.tar.gz
      % cd mod_perl-x.xx
      % perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
        DO_HTTPD=1 USE_APACI=1 PERL_MARK_WHERE=1 EVERYTHING=1
      % make && make test && make install
      % cd ../apache_x.x.x
      % make install
    

    That's all!

    * Of course, replace x.x.x and x.xx with the real version numbers of Apache and mod_perl.

    * GNU tar can uncompress the archive as well (with the z flag).

    What's left is to add a few configuration lines to httpd.conf, the Apache configuration file, start the server, and enjoy mod_perl.

    If you have stumbled upon a problem at any of the above steps, don't despair: the next section explains each step in detail.

    [TOC]


    The Gory Details

    We saw that the basic mod_perl installation is quite simple and takes about 10 commands that can be copied and pasted from these pages. However, sometimes you need to make different optimizations by passing only specific parameters (instead of EVERYTHING=1), bundle other components with mod_perl, etc. You may also want to build mod_perl as a loadable object, so that it can be upgraded without rebuilding Apache itself.

    To accomplish this you will want to understand various techniques for mod_perl configuration and building. You need to know what configuration parameters are available and when each of them should be used.

    As with Perl, simple things are simple with mod_perl, but when you need to accomplish more complicated tasks you have to invest some time in a deeper understanding of the process. This chapter takes the following route. I'll start with a detailed explanation of the four stages of the mod_perl installation process, then continue with the different paths an installation might take depending on your goal, followed by a few copy-and-paste real-world installation scenarios. Toward the end of the chapter we will see various approaches that make installation easier by automating most of its steps. Finally, I'll cover some of the common issues users stumble upon while installing mod_perl.

    We can clearly separate the installation process into the following stages: Sources Configuration, Building, Testing and Installation itself.

    [TOC]


    Sources Configuration (perl Makefile.PL ...)

    Before building and installing mod_perl you have to configure it. You configure mod_perl like any other Perl module:

      % perl Makefile.PL [parameters]
    

    In this section we will go through most of the parameters mod_perl can accept and explain each one of them.

    But first let's see what configuration mechanisms we have at hand. From the configuration point of view, each of them simply defines a particular set of parameters to be passed to perl Makefile.PL. Depending on the chosen configuration, the final product might be a stand-alone httpd binary, a loadable object, or something else.

    The source configuration mechanism in Apache 1.3 provides four major features that mod_perl can benefit from:

    Per-module configuration scripts (ConfigStart/End)

    This is a mechanism modules can use to link themselves into the configuration process. It is useful for automatically adjusting configuration and build parameters from the module's sources. It is triggered by ConfigStart/ConfigEnd sections inside modulename.module files.

    Apache Autoconf-style Interface (APACI)

    This is the new top-level configure script from Apache 1.3, which provides a GNU Autoconf-style interface. It is useful for configuring the source tree without manually editing any src/Configuration files. Any parameterization can be done via command-line options to the configure script. Internally this is just a nifty wrapper around the old src/Configure script.

    Since Apache 1.3 this has been the de facto way to install mod_perl as cleanly as possible. It is currently a Unix-only solution, because the complete Apache 1.3 source configuration mechanism works only under Unix. It doesn't work on the Win32 platform, so mod_perl cannot use it there either.

    Dynamic Shared Object (DSO) support

    Besides Windows NT support, this is one of the most interesting features in Apache 1.3. It's a way to build Apache modules as so-called dynamic shared objects (usually named modulename.so), which can be loaded via the LoadModule directive from within Apache's httpd.conf file. The benefit is that a module becomes part of the httpd program only on demand: only when the user wants the module is it loaded into the address space of the httpd program. This is useful, for instance, for reducing memory consumption and for easy upgrades.

    The DSO mechanism is provided by Apache's mod_so.c, which needs to be compiled into the httpd program. This is done automatically when DSO is enabled for a module mod_xxx via configure --enable-shared=xxx, or by explicitly adding mod_so via configure --enable-module=so.

    APache eXtenSion (APXS) support tool

    This is a new support tool from Apache 1.3 that can be used to build an Apache module as a DSO even outside the Apache source tree. One can say APXS is to Apache what MakeMaker and XS are to Perl: it knows the platform-dependent build parameters for making DSO files and provides an easy way to run the build commands with them.

    Together, these four features provide a very clean and smooth way to integrate mod_perl into Apache. No patching of the Apache source tree is needed in the standard situation, and in the APXS situation the Apache source tree itself is not needed at all.

    To benefit from the features described above, a new hybrid build environment was created for the Apache side of mod_perl. The Apache side consists of the C source files of mod_perl, which have to be compiled into the httpd program. They are usually copied to the subdirectory src/modules/perl/ in the Apache source tree. In the past, mod_perl's Makefile.PL made a lot of adjustments to integrate this subtree into the Apache build process, and it additionally controlled the Apache build process itself.

    The drawback of this approach was that it was neither clean nor flexible, because it assumed that mod_perl is the only third-party module that has to be integrated into Apache. This is very problematic.

    The new approach described below avoids these problems. It only prepares the src/modules/perl/ subtree inside the Apache source tree, without adjusting or editing anything else, so no conflicts can occur. Instead, mod_perl is activated (and then configures itself) when the Apache source tree is later configured via standard APACI calls.

    We will return to each of the above configuration mechanisms when describing the different installation paths, once the overview of the four building stages is complete.

    [TOC]


    Configuration parameters

    perl Makefile.PL accepts various parameters. In this section we will learn what they are and when they should be used.

    [TOC]


    APACHE_SRC

    You will be asked the following question during the configuration stage:

      "Configure mod_perl with ../apache_xxx/src ?"
    

    APACHE_SRC should be used to define the Apache's source tree directory. For example:

      APACHE_SRC=../apache-x.x.x/src
    

    Unless APACHE_SRC is specified, Makefile.PL makes an intelligent guess by looking at the directories at the same level as the mod_perl sources, and suggests the directory with the highest Apache version found there.

    Answering 'y' confirms either Makefile.PL's guess about the location of the tree, or the directory you have specified with APACHE_SRC.

    If you use DO_HTTPD=1 or NO_HTTPD=1, the first Apache source tree found, or the one you have defined, will be used for the rest of the build process.
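    The guess described above can be approximated in a few lines of shell, which also shows why a plain string sort is not enough (1.3.9 would beat 1.3.12). This is only a sketch of the behaviour, not the code Makefile.PL actually runs; the function name is hypothetical and it assumes the apache_x.x.x directory naming used in the ten-minute installation example.

```shell
# Sketch: pick the apache_* sibling directory with the highest version,
# as described above. Not Makefile.PL's real code; assumes apache_x.x.x names.
guess_apache_src() {
    parent=${1:-..}
    for d in "$parent"/apache_*/; do
        [ -d "${d}src" ] || continue          # only real source trees
        d=${d%/}
        # emit "version<TAB>path/src", e.g. "1.3.9<TAB>../apache_1.3.9/src"
        printf '%s\t%s\n' "${d##*/apache_}" "$d/src"
    done |
        sort -t. -k1,1n -k2,2n -k3,3n |       # numeric: 1.3.12 > 1.3.9
        tail -n 1 |
        cut -f2
}

# Usage, from inside mod_perl-x.xx:
#   perl Makefile.PL APACHE_SRC="$(guess_apache_src ..)" DO_HTTPD=1
```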

    [TOC]


    DO_HTTPD, NO_HTTPD, PREP_HTTPD

    Unless one of DO_HTTPD, NO_HTTPD or PREP_HTTPD is used, you will be prompted with the following question:

      "Shall I build httpd in ../apache-x.x.x/src for you?"
    

    Answering 'y' will make sure an httpd binary will be built in ../apache-x.x.x/src when running make.

    To avoid this prompt when the answer is Yes use:

      DO_HTTPD=1
    

    Note that if you set DO_HTTPD=1 but do not use APACHE_SRC=../apache-x.x.x/src, the first Apache source tree found will be used to configure and build against.

    PREP_HTTPD=1 just means a default answer of 'n' to the second prompt, meaning: do not build httpd (make) in the Apache source tree. But it will still ask you about Apache's source location, even if you have used the APACHE_SRC parameter; providing APACHE_SRC just saves perl Makefile.PL the need to make a guess.

    To avoid the two prompts and avoid building httpd, use:

      NO_HTTPD=1
    

    If you choose not to build the binary, you will have to do that manually. We will talk about it later. In any case, you need to run make install in the mod_perl source tree, so that the Perl side of mod_perl is installed. Of course, make test won't work until you have built the server.

    [TOC]


    Callback Hooks

    By default, all callback hooks except for PerlHandler are turned off. You may edit src/modules/perl/Makefile, or enable them when running perl Makefile.PL.

    Possible parameters are:

      PERL_POST_READ_REQUEST
      PERL_TRANS
      PERL_INIT
      PERL_HEADER_PARSER
      PERL_AUTHEN
      PERL_AUTHZ
      PERL_ACCESS
      PERL_TYPE
      PERL_FIXUP
      PERL_LOG
      PERL_CLEANUP
      PERL_CHILD_INIT
      PERL_CHILD_EXIT
      PERL_DISPATCH

      PERL_STACKED_HANDLERS
      PERL_METHOD_HANDLERS
      PERL_SECTIONS
      PERL_SSI
    

    As with any parameters that are either defined or not, use foo=1 to enable them (e.g. PERL_AUTHEN=1).

    To enable all callback hooks use:

      ALL_HOOKS=1
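    When enabling only a handful of hooks, it is easy to mistype a name. The helper below is hypothetical (it is not shipped with mod_perl): it turns a list of hook names into the NAME=1 arguments described above, and refuses any name that is not in the list of possible parameters, so that a typo is caught before the build starts.

```shell
# Hypothetical helper: turn hook names into NAME=1 arguments for
# perl Makefile.PL, rejecting names missing from the list above.
KNOWN_HOOKS="PERL_POST_READ_REQUEST PERL_TRANS PERL_INIT PERL_HEADER_PARSER
PERL_AUTHEN PERL_AUTHZ PERL_ACCESS PERL_TYPE PERL_FIXUP PERL_LOG
PERL_CLEANUP PERL_CHILD_INIT PERL_CHILD_EXIT PERL_DISPATCH
PERL_STACKED_HANDLERS PERL_METHOD_HANDLERS PERL_SECTIONS PERL_SSI"

hook_args() {
    known=" $(echo $KNOWN_HOOKS) "       # unquoted echo joins lines with spaces
    out=
    for hook in "$@"; do
        case $known in
            *" $hook "*) out="$out${out:+ }$hook=1" ;;
            *) echo "unknown hook: $hook" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "$out"
}

# Usage: perl Makefile.PL $(hook_args PERL_AUTHEN PERL_AUTHZ PERL_LOG)
```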
    

    [TOC]


    EVERYTHING

    To enable all possible hooks and features at once, set:

      EVERYTHING=1
    

    [TOC]


    PERL_TRACE

    To enable tracing, set PERL_TRACE=1.

    [TOC]


    APACHE_HEADER_INSTALL

    By default, the Apache header files are installed into the $Config{sitearchexp}/auto/Apache/include directory.

    The reason for installing the header files is to make life simple for module authors/users when building/installing a module that taps into some Apache C functions, e.g. Embperl, Apache::Peek, etc.

    If you don't want to install these files, use:

      APACHE_HEADER_INSTALL=0
    

    [TOC]


    PERL_STATIC_EXTS

    Normally, if an extension is linked statically with Perl, it is listed in Config.pm's $Config{static_ext}, in which case mod_perl will also link this extension statically with httpd. However, if an extension is linked statically with Perl after Perl is installed, it is not listed in Config.pm. You can either edit Config.pm and add these extensions, or configure mod_perl like so:

      % perl Makefile.PL "PERL_STATIC_EXTS=Something::Static Another::One"
    

    [TOC]


    PERL_MARK_WHERE

    For Apache::Registry scripts, the line numbers reported for warnings and errors that end up in the error_log file are generally not correct. This is because Apache::Registry automagically wraps the scripts running under its handler in extra code that enables caching of the compiled scripts.

    If configured with PERL_MARK_WHERE=1, mod_perl will attempt to show the exact line the error or warning happened at.

    [TOC]


    APACHE_PREFIX

    If you want to use a non-default Apache installation prefix, use the APACHE_PREFIX parameter, e.g.:

      % perl Makefile.PL APACHE_PREFIX=/usr/local/ [...]
    

    [TOC]


    APACI_ARGS

    When you use the USE_APACI=1 parameter, you can tell perl Makefile.PL to pass any arguments you want to Apache's ./configure utility, e.g.:

      % perl Makefile.PL USE_APACI=1 \
      APACI_ARGS=--sbindir=/usr/local/sbin/httpd_perl, \
             --sysconfdir=/usr/local/etc/httpd_perl, \
             --localstatedir=/usr/local/var/httpd_perl, \
             --runtimedir=/usr/local/var/httpd_perl/run, \
             --logfiledir=/usr/local/var/httpd_perl/logs, \
             --proxycachedir=/usr/local/var/httpd_perl/proxy
    

    Notice that all the APACI_ARGS (above) must be passed as one long line if you work with csh or tcsh! The split form shown above works correctly with sh and bash (breaking the long lines with '\'), but not with csh or tcsh: these shells pass the APACI_ARGS arguments to ./configure with the newlines kept but the original '\' stripped, which causes all the arguments but the first one to be ignored by the configuration process.
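    One way to avoid depending on how a particular shell treats line continuations is to build the value in a variable first, so that it is a single line however you type it. This is a sketch for sh and bash that uses the paths from the example above and assumes, as that example implies, that Makefile.PL takes a comma-separated list. The final echo only prints the command; drop it to actually run the configuration.

```shell
# Build APACI_ARGS as one comma-separated line, so no shell-specific
# line continuation is involved. Paths are those from the example above.
APACI_ARGS=--sbindir=/usr/local/sbin/httpd_perl
APACI_ARGS=$APACI_ARGS,--sysconfdir=/usr/local/etc/httpd_perl
APACI_ARGS=$APACI_ARGS,--localstatedir=/usr/local/var/httpd_perl
APACI_ARGS=$APACI_ARGS,--runtimedir=/usr/local/var/httpd_perl/run
APACI_ARGS=$APACI_ARGS,--logfiledir=/usr/local/var/httpd_perl/logs
APACI_ARGS=$APACI_ARGS,--proxycachedir=/usr/local/var/httpd_perl/proxy

# Show the command instead of running it.
echo perl Makefile.PL USE_APACI=1 "APACI_ARGS=$APACI_ARGS"
```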

    [TOC]


    Reusing Configuration Parameters

    It's quite hard to remember which parameters were used for a mod_perl build when you have to upgrade the server, so it's better to save them in a file. For example, if you create a file ~/.mod_perl_build_options with the following contents:

      APACHE_SRC=../apache_x.x.x/src DO_HTTPD=1 USE_APACI=1
      PERL_MARK_WHERE=1 EVERYTHING=1
    

    you can build the server with the following commands:

      % perl Makefile.PL `cat ~/.mod_perl_build_options`
      % make && make test && make install
    

    But wait, mod_perl has a standard method to perform the above trick. If a file named makepl_args.mod_perl, containing any of these options, is found next to the mod_perl source directory (as in the example below), it will be read in by Makefile.PL.

      % ls -1 /usr/src
      apache_x.x.x/
      makepl_args.mod_perl
      mod_perl-x.xx/
      
      % cat makepl_args.mod_perl
      APACHE_SRC=../apache_x.x.x/src
      DO_HTTPD=1
      USE_APACI=1
      PERL_MARK_WHERE=1
      EVERYTHING=1
      
      % cd mod_perl-x.xx
      % perl Makefile.PL
      % make && make test && make install
    

    Now the parameters from makepl_args.mod_perl file will be used, as if they were directly typed in.

    There is a sample makepl_args.mod_perl in the eg/ directory of the mod_perl distribution package, in which you might find a few options that enable experimental features to play with, too!

    But if you find yourself with a compiled mod_perl and no record of the parameters that were used, you can usually still recover them, provided the sources were not make clean'd. You will find the Apache-specific parameters in apache_x.x.x/config.status and the mod_perl ones in mod_perl-x.xx/apaci/mod_perl.config.

    [TOC]


    Discovering whether some option was configured

    To find out whether some parameter was included in the server, you can look at the symbols inside the httpd executable with the help of nm or a similar utility. For example, if you want to see whether you enabled PERL_AUTHEN=1 while building mod_perl, do:

      % nm httpd | grep perl_authenticate
    

    But this works only if you have an unstripped httpd binary, and make install by default strips the binary before installing it.

    Another approach is to try to use the corresponding directive in the configuration file: if the feature wasn't enabled, Apache will tell you so when you start the server, by reporting an unknown directive (e.g. when you attempt to use the PerlAuthenHandler directive without having built mod_perl with the PERL_AUTHEN=1 parameter).

    [TOC]


    Using an alternative Configuration file

    By default mod_perl provides its own copy of the Configuration file to Apache's ./configure utility. If you wish to pass it your own copy, do:

      % perl Makefile.PL CONFIG=Configuration.custom
    

    where Configuration.custom is the name of a file, relative to the Apache source tree you build against.

    [TOC]


    perl Makefile.PL Troubleshooting

    [TOC]


    "A test compilation with your Makefile configuration failed..."

    When you see this during the perl Makefile.PL stage:

      ** A test compilation with your Makefile configuration
      ** failed. This is most likely because your C compiler
      ** is not ANSI. Apache requires an ANSI C Compiler, such
      ** as gcc. The above error message from your compiler
      ** will also provide a clue.
       Aborting!
    

    you've got a problem with your compiler. There is a chance that it's improperly installed or not installed at all. Sometimes the reason is that your perl executable was built on a different machine, and the software installed on your machine is not the same. Generally this happens when you install prebuilt packages, like RPM or DEB: the dependencies weren't properly defined in the perl binary package, so you were allowed to install it while some essential package is missing.

    The most frequent pitfall is a missing gdbm library. See Missing or Misconfigured libgdbm.so for more info.

    But why guess, when we can see the real error message and understand what the real problem is? To get it, edit the Apache src/Configure script. Down around line 2140 you will see a line like:

       if ./helpers/TestCompile sanity; then
    

    change it to:

       if ./helpers/TestCompile -v sanity; then
    

    and try again. Now you should get a useful error message.
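    If you prefer not to edit src/Configure by hand, the same one-word change can be scripted. This is a minimal sketch, assuming the line looks exactly as quoted above. The function name is hypothetical, and the untouched file is kept as Configure.orig so you can restore it afterwards.

```shell
# Hypothetical helper: switch the TestCompile sanity check in Apache's
# src/Configure to verbose mode, keeping the original as Configure.orig.
verbose_sanity_check() {
    conf=${1:-src/Configure}
    cp "$conf" "$conf.orig" || return 1
    # Write through a redirect so the existing file keeps its permissions.
    sed 's/TestCompile sanity/TestCompile -v sanity/' "$conf.orig" > "$conf"
}

# Usage, from the top of the Apache source tree:
#   verbose_sanity_check src/Configure
```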

    [TOC]


    Missing or Misconfigured libgdbm.so

    On some RedHat systems you might encounter a problem during the perl Makefile.PL stage when the installed perl was built with the gdbm library, but the library isn't actually installed. If this is your situation, make sure you install it before proceeding with the build process.

    You can check how Perl was built by running the perl -V command:

      % perl -V | grep libs
    

    On my machine I get:

      libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt
    

    Sometimes the problem is even more obscure: libgdbm is installed, but not properly. Take a look at:

      % ls -l /usr/lib/libgdbm.so*
    

    If you get all three files like I do:

      lrwxrwxrwx   /usr/lib/libgdbm.so -> libgdbm.so.2.0.0
      lrwxrwxrwx   /usr/lib/libgdbm.so.2 -> libgdbm.so.2.0.0
      -rw-r--r--   /usr/lib/libgdbm.so.2.0.0
    

    you are all set. On some installations the libgdbm.so symbolic link is missing, so you get only:

      lrwxrwxrwx   /usr/lib/libgdbm.so.2 -> libgdbm.so.2.0.0
      -rw-r--r--   /usr/lib/libgdbm.so.2.0.0
    

    To fix this problem add the missing symbolic link:

      % cd /usr/lib
      % ln -s libgdbm.so.2.0.0 libgdbm.so
    

    Now you should be able to build mod_perl without any problems.
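    The check above can be wrapped in a small function so that it distinguishes the three cases: link present, only versioned files present, nothing installed. This is a sketch that assumes the /usr/lib layout shown above; the function name and return codes are made up for illustration, while perl -V:libs is a standard perl option.

```shell
# Sketch: check for the unversioned libgdbm.so link discussed above.
# Returns 0 if present, 1 if only versioned files exist, 2 if none.
check_gdbm_link() {
    libdir=${1:-/usr/lib}
    if [ -e "$libdir/libgdbm.so" ]; then
        echo "ok: $libdir/libgdbm.so"
    elif ls "$libdir"/libgdbm.so.* >/dev/null 2>&1; then
        echo "missing link: $libdir/libgdbm.so (versioned library present)"
        return 1
    else
        echo "no libgdbm found in $libdir"
        return 2
    fi
}

# Only relevant if perl was linked against gdbm:
#   perl -V:libs | grep -q -- -lgdbm && check_gdbm_link /usr/lib
```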

    [TOC]


    mod_perl Building (make)

    Once configuration is complete, you build the server by calling:

      % make
    

    which compiles the source files and creates an httpd binary and/or a separate library for each module, which can be loaded at run time or inserted into the httpd binary later, when make is called from the Apache source directory.

    Note: it's important that you don't put the mod_perl source tree inside the Apache source tree, since Apache::src does not seem to work then!

    [TOC]


    make Troubleshooting

    [TOC]


    undefined reference to 'Perl_newAV'

    This and similar error messages show up during the make process. This generally happens when you have a broken Perl installation. Make sure it's not installed from a broken RPM or another binary package; if it is, build Perl from source or use another, properly built binary package. Run perl -V to learn what version of Perl you are using and other important details.

    [TOC]


    unrecognized format specifier for...

      From: Scott Fagg <scott.fagg@arup.com.au>
      
      I'm using apache 1.3.9 , mod_fastcgi 2.2.2 and mod_perl 1.21
      
      Originally my build of these three together worked, however when i
      went to rebuild a few months later i received a lot of "unrecognized
      format specifier" errors. A search of the internet showed that i
      wasn't the only one but i couldn't find a solution mentioned.
      
      Puzzled i tried to track down the problem. Using clean source i
      could build apache/mod_perl/mod_fastcgi on my RedHat 5.2 workstation
      but never on my RedHat 5.2 server.
      
      The only tinkering i'd done with the server was to use SFIO to
      rebuild perl and get mod_fastcgi working the first time i used
      fastcgi.
      
      By removing the SFIO .h files, the apache/mod_perl compile would get
      further and the 'unrecognized format specifier' errors disappeared,
      but naturally other pieces of code refused to compile complaining
      about the missing sfio files.
      
      A quick check of the mod_fast site noted that it no longer needed
      SFIO, so i removed it and replaced my rebuilt sfio-perl binaries
      with clean ones (from a redhat RPM) and was able to rebuild apache
      with mod_perl + mod_fastcgi ( + php) All of my mod_perl stuff works
      and so too does my fastcgi.
      
      Hope that helps some one. I wasn't able to find any answers to the
      problem while searching the net.
    

    [TOC]


    Built Server Testing (make test)

    After building the server, it's a good idea to test it thoroughly by calling:

      % make test
    

    Fortunately, mod_perl comes with a bunch of tests that attempt to use all the features you asked for at the configuration stage. If any of the tests fails, the make test stage fails.

    Running make test starts a freshly built httpd on port 8529, running under the uid and gid of the perl Makefile.PL process; the httpd is terminated when the tests are finished.

    Each file in the test suite generally includes more than one test, but during testing the program reports only how many tests passed and the total number of tests defined in the file. If only some of the tests in a file fail, you want to know which ones did. To get this information, run the tests in verbose mode, which you enable with the TEST_VERBOSE parameter:

      % make test TEST_VERBOSE=1
    

    To change the default port the testing happens on (8529 as of this writing), do:

      % perl Makefile.PL PORT=xxxx
    

    To just start the newly built httpd, run:

      % make start_httpd
    

    To shut down this httpd, run:

      % make kill_httpd
    

    Note to Ben-SSL users: httpsd does not seem to accept /dev/null as the location of certain files, so you will have to change these by hand. The tests are run with the SSLDisable directive.



    Manual Testing

    Tests are invoked by running the TEST script located in the ./t directory. Use the -v option for verbose output. You can run an individual test like this:

      % t/TEST -v modules/file.t
    

    or all tests in a test sub-directory:

      % t/TEST modules
    

    The TEST script takes care of starting the server before the tests are executed. If for some reason it fails to do so, run make start_httpd to start the server explicitly.
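    To see which files in a test sub-directory fail without reading the combined output, you can run them one at a time. This is a minimal sketch, assuming it is run from the top of the mod_perl source tree and that t/TEST accepts a path relative to t/, as in the examples above. The function name run_tests_individually and the TEST_SCRIPT override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run every *.t file in one test sub-directory
# separately and report which files failed.
# Run from the top of the mod_perl source tree.
# TEST_SCRIPT (hypothetical override) defaults to t/TEST.
run_tests_individually() {
    subdir=${1:-modules}
    test_script=${TEST_SCRIPT:-t/TEST}
    failed=0
    for file in t/"$subdir"/*.t; do
        # If the glob matched nothing, the pattern is left unexpanded.
        [ -e "$file" ] || { echo "no tests found in t/$subdir" >&2; return 2; }
        rel=${file#t/}              # t/modules/file.t -> modules/file.t
        if "$test_script" -v "$rel" > /dev/null 2>&1; then
            echo "PASS $rel"
        else
            echo "FAIL $rel"
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every file passed.
    [ "$failed" -eq 0 ]
}
```

    Usage: run_tests_individually modules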



    make test Troubleshooting



    make test fails

    You cannot run make test before you build the httpd. If you told perl Makefile.PL not to build the httpd executable, there is no httpd to run the tests against. Go to the Apache source tree and run make, then return to the mod_perl source tree and continue with the server testing.



    mod_perl.c is incompatible with this version of apache

    You will see this message when you try to run an httpd if a stale Apache header layout was present in one of the include paths during the build process. Run the find (or locate) utility to locate the ap_mmn.h file. In my case I had a /usr/local/include/ap_mmn.h, which was installed by the RedHat install process. If this is your case too, get rid of it and rebuild.

    RedHat users: before building Apache yourself, run:

     % rpm -e apache
    

    to remove the pre-installed RPM package first!
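    A stale copy is easier to spot if you print the module magic number each ap_mmn.h declares and compare it with the one in your Apache source tree (apache_x.x.x/src/include/ap_mmn.h). The sketch below assumes the Apache 1.3 header layout, where ap_mmn.h defines MODULE_MAGIC_NUMBER_MAJOR. The function name find_ap_mmn and its default directory list are hypothetical; pass the include paths your build actually uses.

```shell
#!/bin/sh
# Hypothetical helper: locate every ap_mmn.h under the given directories
# (or a default list of common include paths) and print the
# MODULE_MAGIC_NUMBER_MAJOR each one declares.
find_ap_mmn() {
    [ "$#" -gt 0 ] || set -- /usr/include /usr/local/include /usr/local/apache/include
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -name ap_mmn.h -type f 2>/dev/null | while read -r hdr; do
            mmn=$(sed -n 's/^#define[[:space:]]*MODULE_MAGIC_NUMBER_MAJOR[[:space:]]*\([0-9]*\).*/\1/p' "$hdr")
            echo "$hdr ${mmn:-unknown}"
        done
    done
}
```

    On RedHat, rpm -q apache tells you whether the packaged Apache is still installed before you remove it.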



    make test......skipping test on this platform

    While running make test you may notice that some of the tests are reported as skipped. The reason is that you are missing some optional modules that these tests need in order to run. For a hint, peek at the content of each test; you will find them all in the ./t directory (mnemonic: t for tests). I'll list a few examples here, but of course the requirements might change in the future.

      modules/cookie......skipping test on this platform
    

    Install libapreq

      modules/psections...skipping test on this platform
    

    Install Devel::Symdump / Data::Dumper

      modules/request.....skipping test on this platform
    

    Install libapreq (Apache::Request)

      modules/sandwich....skipping test on this platform
    

    Install Apache::Sandwich

      modules/stage.......skipping test on this platform
    

    Install Apache::Stage

      modules/symbol......skipping test on this platform
    

    Install Devel::Symdump

    Chances are that all of these are installed if you use CPAN.pm to install Bundle::Apache.



    Installation (make install)

    After testing the server, the last step left is to install it. First install all the Perl-side files:

       % make install
    

    Then go to the Apache source tree and complete the Apache files installation (config files, httpd and other utilities):

      % cd ../apache_x.x.x
      % make install
    

    The installation is now complete. You can configure your server and start using it.



    Building Apache and mod_perl by Hand

    If you wish to build httpd separately from mod_perl, use the NO_HTTPD=1 option during the perl Makefile.PL stage, then configure various things by hand and proceed with the build process. Do not run perl Makefile.PL before following the steps described in this section.

    These are the configurations you should make before the build stage, if you choose to manually build mod_perl:

    mod_perl's Makefile

    When perl Makefile.PL is executed, $APACHE_SRC/modules/perl/Makefile will be modified to enable various options (e.g. ALL_HOOKS=1). Instead of setting these options on the perl Makefile.PL command line, you may also edit mod_perl-x.xx/src/modules/perl/Makefile before running perl Makefile.PL.

    This is an optional step.

    Configuration

    Add to apache_x.x.x/src/Configuration:

      AddModule modules/perl/libperl.a
    

    We suggest you add this entry at the end of the Configuration file if you want your callback hooks to have precedence over core handlers.

    Add the following to EXTRA_LIBS:

      EXTRA_LIBS=`perl -MExtUtils::Embed -e ldopts`
    

    Add the following to EXTRA_CFLAGS:

      EXTRA_CFLAGS=`perl -MExtUtils::Embed -e ccopts` 
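
    If you make these Configuration edits often, they can be scripted. The sketch below is one way to apply them with plain sed and grep, assuming the path to the Configuration file is passed as the first argument; the function name prep_modperl_configuration is hypothetical. It keeps the backticks literal, exactly as shown above, replaces any existing uncommented EXTRA_LIBS and EXTRA_CFLAGS assignments instead of adding duplicates (so merge by hand if you already set values there), and appends the AddModule line at the end.

```shell
#!/bin/sh
# Hypothetical helper: apply the manual-build edits to an Apache 1.3
# src/Configuration file. Existing EXTRA_LIBS/EXTRA_CFLAGS values are replaced.
prep_modperl_configuration() {
    conf=$1
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }

    # Single quotes keep the backticks literal in the written file.
    libs='EXTRA_LIBS=`perl -MExtUtils::Embed -e ldopts`'
    cflags='EXTRA_CFLAGS=`perl -MExtUtils::Embed -e ccopts`'
    addmod='AddModule modules/perl/libperl.a'

    # Overwrite existing (uncommented) assignments instead of duplicating them.
    tmp="$conf.tmp.$$"
    sed -e "s|^EXTRA_LIBS=.*|$libs|" \
        -e "s|^EXTRA_CFLAGS=.*|$cflags|" "$conf" > "$tmp" && mv "$tmp" "$conf" || return 1

    # Append whichever assignments were not present at all.
    grep -q '^EXTRA_LIBS=' "$conf" || printf '%s\n' "$libs" >> "$conf"
    grep -q '^EXTRA_CFLAGS=' "$conf" || printf '%s\n' "$cflags" >> "$conf"

    # Append AddModule at the end (if not already present) so mod_perl's
    # callback hooks take precedence over core handlers.
    if ! grep -qx "$addmod" "$conf"; then
        printf '%s\n' "$addmod" >> "$conf"
    fi
}
```

    Usage: prep_modperl_configuration ../apache_x.x.x/src/Configuration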
    

    mod_perl source files

    Return to the mod_perl directory and copy the mod_perl source files into the Apache build directory:

      % cp -r src/modules/perl ../apache_x.x.x/src/modules/
    

    When you are done with the configuration, run:

      % perl Makefile.PL NO_HTTPD=1 DYNAMIC=1 EVERYTHING=1 \
       APACHE_SRC=../apache_x.x.x/src
    

    DYNAMIC=1 enables a build of a shared mod_perl library. Add other options if required.

      % make install
    

    Now you can proceed with the plain Apache build process. Note that in order for your changes to the apache_x.x.x/src/Configuration file to take effect, you must run apache_x.x.x/src/Configure instead of the default apache_x.x.x/configure script:

      % cd ../apache_x.x.x/src
      % ./Configure
      % make
      % make install
    



    Installation Scenarios for Standalone mod_perl

    There are various ways available to build Apache with the new hybrid build environment:



    The All-In-One Way

    If your goal is just to build and install Apache with mod_perl out of their source trees, and you have no special interest in further adjusting or enhancing Apache, proceed as before:

      % tar zvxf apache_x.x.x.tar.gz
      % tar zvxf mod_perl-x.xx.tar.gz
      % cd mod_perl-x.xx
      % perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
        DO_HTTPD=1 USE_APACI=1 EVERYTHING=1
      % make && make test && make install
      % cd ../apache_x.x.x
      % make install
    

    This builds Apache statically with mod_perl, installs Apache under the default /usr/local/apache tree and mod_perl into the site_perl hierarchy of your existing Perl installation. All in one step.
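    The same steps can be wrapped in a small script so that the version numbers appear in one place and the build stops at the first failure. This is a sketch, assuming both tarballs are in the current directory; the function names all_in_one_build and run, and the DRY_RUN switch (which only prints the commands), are hypothetical. make install usually has to be run as root.

```shell
#!/bin/sh
# Hypothetical helpers: the All-In-One build with versions as parameters.
# DRY_RUN (hypothetical switch): when set, commands are printed, not run.
run() {
    echo "+ $*"
    [ -n "$DRY_RUN" ] || "$@"
}

all_in_one_build() {
    apache_ver=$1 modperl_ver=$2
    [ -n "$apache_ver" ] && [ -n "$modperl_ver" ] || {
        echo "usage: all_in_one_build APACHE_VERSION MOD_PERL_VERSION" >&2
        return 2
    }
    top=$(pwd)
    run tar zxf "apache_$apache_ver.tar.gz" || return 1
    run tar zxf "mod_perl-$modperl_ver.tar.gz" || return 1
    run cd "$top/mod_perl-$modperl_ver" || return 1
    run perl Makefile.PL APACHE_SRC="../apache_$apache_ver/src" \
        DO_HTTPD=1 USE_APACI=1 EVERYTHING=1 || return 1
    run make || return 1
    run make test || return 1
    run make install || return 1
    run cd "$top/apache_$apache_ver" || return 1
    run make install || return 1
    run cd "$top"    # return to where we started
}
```

    For example, DRY_RUN=1 all_in_one_build 1.3.9 1.21 prints the commands for Apache 1.3.9 and mod_perl 1.21 without running them. On failure the shell is left in the directory where the failing step ran.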



    The Flexible Way

    This is the standard approach when you want flexibility during the build: mod_perl is still built statically into the Apache httpd binary, but in separate steps, which gives you a chance to add other third-party Apache modules along the way.

    1. Prepare the Apache source tree

      The first step, as before, is to extract the distributions:

        % tar zvxf apache_x.x.x.tar.gz
        % tar zvxf mod_perl-x.xx.tar.gz
      

    2. Install mod_perl's Perl-side and prepare the Apache-side

      The second step is to install the Perl-side of mod_perl into the Perl hierarchy and prepare the src/modules/perl/ subdirectory inside the Apache source tree:

       $ cd mod_perl-x.xx
       $ perl Makefile.PL \
           APACHE_SRC=../apache_x.x.x/src \
           NO_HTTPD=1 \
           USE_APACI=1 \
           PREP_HTTPD=1 \
           EVERYTHING=1 \
           [...]
       $ make
       $ make test
       $ make install
       $ cd ..
      

      The APACHE_SRC option sets the path to your Apache source tree, the NO_HTTPD option forces this path and only this path to be used, the USE_APACI option triggers the new hybrid build environment, and the PREP_HTTPD option forces only a preparation of the APACHE_SRC/modules/perl/ tree, with no automatic build.

      Then the configuration process prepares the Apache-side of mod_perl in the Apache source tree but doesn't touch anything else inside it. It then just builds the Perl-side of mod_perl and installs it into the Perl installation hierarchy.

      Important: If you use PREP_HTTPD as described above, to complete the build you must go into an apache source directory and run make and make install.

    3. Additionally prepare other third-party modules

      Now you still have a chance to prepare more third-party modules. For instance, the PHP3 language can be added in a way similar to the mod_perl procedure above.

    4. Build the Apache package

      Finally, it's time to build the Apache package, and with it the Apache-side of mod_perl and any other prepared third-party modules:

       $ cd apache_x.x.x
       $ ./configure \
           --prefix=/path/to/install/of/apache \
           --activate-module=src/modules/perl/libperl.a \
           [...]
       $ make
       $ make test
       $ make install
      

      The --prefix option is needed if you want to change the default target directory of the Apache installation, and the --activate-module option activates mod_perl for the configuration process and thus also for the build process.

      The last three steps build, test and install the Apache-side of the mod_perl enabled server (probably including other third-party components; otherwise you wouldn't choose this scenario).

    The scenario we just saw lets you insert mod_perl into Apache without having to mangle the Apache source tree for mod_perl, while keeping the freedom to add more third-party modules.



    Build mod_perl as DSO inside Apache source tree via APACI

    Warning: With Apache 1.3 there is support for building modules as Dynamic Shared Objects (DSO). So there is support for DSO in mod_perl now, too. BUT THIS IS STILL EXPERIMENTAL, SO BE WARNED!

    We already said that the new mod_perl build environment is a hybrid one. What does it mean? It means for instance that the same src/modules/perl/ stuff can be used to build mod_perl as a DSO, too. And again without having to edit anything specially for this. When you want to build libperl.so (sorry for the name, libmodperl.so would be more correct, but because of historic Apache issues the name has to be libperl.so. Don't confuse this with the real libperl.a or even libperl.so from the Perl installation) all you have to do is to add one single option to the above steps.

    You have two options here, depending on which way you have chosen above: If you choose the All-In-One way from above then add:

      USE_DSO=1
    

    to the perl Makefile.PL options. If you choose the Flexible way then add:

      --enable-shared=perl
    

    to the Apache's ./configure options.

    As you can see, only an additional USE_DSO=1 or --enable-shared=perl option is needed. Everything else is done automatically: mod_so is automatically enabled, the Makefiles are adjusted automatically, and the install target from APACI now additionally installs libperl.so into the Apache installation tree. Even more: the LoadModule and AddModule directives (which dynamically load and insert mod_perl into httpd) are automatically added to the httpd.conf file.
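    A quick way to confirm that the install step really added both directives is to grep for them. The sketch assumes the usual Apache 1.3 forms, LoadModule perl_module with a path ending in libperl.so and AddModule mod_perl.c. The function name check_modperl_dso_conf is hypothetical, and the location of httpd.conf depends on your --prefix.

```shell
#!/bin/sh
# Hypothetical helper: check that httpd.conf loads and activates the
# mod_perl DSO. Commented-out lines (starting with #) do not count.
check_modperl_dso_conf() {
    conf=$1
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 2; }
    status=0
    if grep -Eq '^[[:space:]]*LoadModule[[:space:]]+perl_module[[:space:]]+[^[:space:]]*libperl\.so' "$conf"; then
        echo "found: LoadModule perl_module"
    else
        echo "missing: LoadModule perl_module ... libperl.so"
        status=1
    fi
    if grep -Eq '^[[:space:]]*AddModule[[:space:]]+mod_perl\.c' "$conf"; then
        echo "found: AddModule mod_perl.c"
    else
        echo "missing: AddModule mod_perl.c"
        status=1
    fi
    return $status
}
```

    Usage: check_modperl_dso_conf /usr/local/apache/conf/httpd.conf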



    Build mod_perl as DSO outside Apache source tree via APXS

    Above we've seen how to build mod_perl as DSO inside the Apache source tree. But there is a nifty alternative: Building mod_perl as DSO outside the Apache source tree via the new Apache 1.3 support tool apxs (APache eXtension). The advantage is obvious: You can extend an already installed Apache with mod_perl even if you don't have the sources (for instance you installed an Apache binary package from your vendor).

    Here are the building steps:

      % tar zvxf mod_perl-x.xx.tar.gz
      % cd mod_perl-x.xx
      % perl Makefile.PL \
        USE_APXS=1 \
        WITH_APXS=/path/to/bin/apxs \
        EVERYTHING=1 \
         [...]
      % make && make test && make install
    

    This will build the DSO libperl.so outside the Apache source tree with the new Apache 1.3 support tool apxs and install it into the existing Apache hierarchy.
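    To see where the DSO ended up, you can ask apxs itself: its -q option prints configuration values such as LIBEXECDIR. A minimal sketch follows; the function name show_modperl_dso is hypothetical, and the apxs path should be the same one you passed as WITH_APXS.

```shell
#!/bin/sh
# Hypothetical helper: show where apxs installs DSOs and whether
# libperl.so is there.
show_modperl_dso() {
    apxs=${1:-apxs}
    libexecdir=$("$apxs" -q LIBEXECDIR) || { echo "could not query $apxs" >&2; return 2; }
    if [ -f "$libexecdir/libperl.so" ]; then
        echo "installed: $libexecdir/libperl.so"
    else
        echo "not found: $libexecdir/libperl.so"
        return 1
    fi
}
```

    Usage: show_modperl_dso /path/to/bin/apxs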



    Installation Scenarios for mod_perl and Other Components

    (META: please send more scenarios of mod_perl + other components installation guidelines. Thanks!)

    You have seen very detailed installation scenarios. Since mod_perl is used with many other components that plug into Apache, you will definitely want to know how to build them together with mod_perl. Since all the steps are simple once you understand how the build process works, I'll show only the commands to be executed, without comments unless there is something we haven't discussed before.

    Generally, each scenario that I'm going to show consists of downloading the source distributions of the components to be used, unpacking them, configuring them and proceeding with the Apache build process using the parameters appropriate to each component, followed by make test and make install.

    All these scenarios were tested on the Linux platform; you might need to refer to a component's documentation if something doesn't work for you as described below. The intention of this section is not to show how to install other non-mod_perl components alone, but how to install them bundled with mod_perl.

    Also note that the links used below are likely to have changed by the time you read this document. That's why I have used the x.x.x convention instead of hardcoded version numbers. Remember to replace the x.x.x and x.xx placeholders with the version numbers of the distributions you are about to use. To find out the latest stable version numbers, visit the components' sites. So if I say http://perl.apache.org/dist/mod_perl-x.xx.tar.gz , go to http://perl.apache.org/dist/ in order to learn the version number.

    Unless stated otherwise, all the components install themselves into a default location. When you run make install, the installation program tells you where it's going to install the files.



    mod_perl and mod_ssl (+openssl)

    mod_ssl provides strong cryptography for the Apache 1.3 webserver via the Secure Sockets Layer (SSL v2/v3) and Transport Layer Security (TLS v1) protocols by the help of the Open Source SSL/TLS toolkit OpenSSL, which is based on SSLeay from Eric A. Young and Tim J. Hudson.

    Download the sources:

      % lwp-download http://www.apache.org/dist/apache_x.x.x.tar.gz
      % lwp-download http://perl.apache.org/dist/mod_perl-x.xx.tar.gz
      % lwp-download http://www.modssl.org/source/mod_ssl-x.x.x-x.x.x.tar.gz
      % lwp-download http://www.openssl.org/source/openssl-x.x.x.tar.gz
    

    Un-pack:

      % tar zvxf mod_perl-x.xx.tar.gz
      % tar zvxf apache_x.x.x.tar.gz
      % tar zvxf mod_ssl-x.x.x-x.x.x.tar.gz
      % tar zvxf openssl-x.x.x.tar.gz
    

    Configure, build and install openssl:

      % cd openssl-x.x.x
      % ./config
      % make && make test && make install
    

    Configure:

      % cd ../mod_ssl-x.x.x-x.x.x
      % ./configure --with-apache=../apache_x.x.x
      % cd ../mod_perl-x.xx
      % perl Makefile.PL USE_APACI=1 EVERYTHING=1 \
            DO_HTTPD=1 SSL_BASE=/usr/local/ssl \
            APACHE_PREFIX=/usr/local/apachessl \
            APACHE_SRC=../apache_x.x.x/src \
            APACI_ARGS=--enable-module=ssl,--enable-module=rewrite
    

    Build, test and install:

      % make && make test && make install
      % cd ../apache_x.x.x
      % make certificate
      % make install
    

    Now configure the mod_ssl and mod_perl parts of the server before starting it.

    When the server starts, you should see something like the following in the error_log file:

      [Fri Nov 12 16:14:11 1999] [notice] Apache/1.3.9 (Unix)
      mod_perl/1.21_01-dev mod_ssl/2.4.8 OpenSSL/0.9.4 configured
      -- resuming normal operations
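
    To check this from a script rather than by eye, you can take the most recent "resuming normal operations" line from error_log (where it is a single line) and confirm that it names both modules. A minimal sketch; the function name check_startup_banner is hypothetical, and the default log path assumes the APACHE_PREFIX=/usr/local/apachessl used above with the default logs/ layout.

```shell
#!/bin/sh
# Hypothetical helper: confirm the last startup banner in error_log
# mentions both mod_perl and mod_ssl.
check_startup_banner() {
    log=${1:-/usr/local/apachessl/logs/error_log}
    banner=$(grep 'resuming normal operations' "$log" 2>/dev/null | tail -n 1)
    [ -n "$banner" ] || { echo "no startup banner found in $log" >&2; return 2; }
    echo "$banner"
    for component in mod_perl/ mod_ssl/; do
        case $banner in
            *"$component"*) ;;
            *) echo "banner does not mention ${component%/}" >&2; return 1 ;;
        esac
    done
}
```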
    



    mod_perl and mod_ssl Rolled from RPMs

    Just as in the previous section, this one shows an installation of mod_perl and mod_ssl, but this time with all the sources and binaries coming prepackaged as RPMs.

    (As always, replace the x.x.x placeholders with the proper version numbers, and i386 with your platform if it is not x86.)

    1.   % get apache-mod_ssl-x.x.x.x-x.x.x.src.rpm
      

      Source: http://www.modssl.org

    2.   % get openssl-x.x.x.i386.rpm
      

      Source: http://www.openssl.org/

    3.   % lwp-download http://perl.apache.org/dist/mod_perl-x.xx.tar.gz
      

      Source: http://perl.apache.org/dist

    4.   % lwp-download http://www.engelschall.com/sw/mm/mm-x.x.xx.tar.gz
      

      Source: http://www.engelschall.com/sw/mm/

    5.   % rpm -ivh openssl-x.x.x.i386.rpm
      

    6.   % rpm -ivh apache-mod_ssl-x.x.x.x-x.x.x.src.rpm
      

    7.   % cd /usr/src/redhat/SPECS
      

    8.   % rpm -bp apache-mod_ssl.spec
      

    9.   % cd /usr/src/redhat/BUILD/apache-mod_ssl-x.x.x.x-x.x.x
      

    10.   % tar xvzf mod_perl-x.xx.tar.gz
      

    11.   % cd mod_perl-x.xx
      

    12.   % perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
           DO_HTTPD=1 \
           USE_APACI=1 \
           PREP_HTTPD=1 \
           EVERYTHING=1
      

      Add or remove parameters if appropriate.

    13.   % make
      

    14.   % make install
      

    15.   % cd ../mm-x.x.xx/
      

    16.   % ./configure --disable-shared
      

    17.   % make
      

    18.   % cd ../mod_ssl-x.x.x-x.x.x
      

    19.   % ./configure \
              --with-perl=/usr/bin/perl \
              --with-apache=../apache_x.x.x \
              --with-ssl=SYSTEM \
              --with-mm=../mm-x.x.xx \
              --with-layout=RedHat \
              --disable-rule=WANTHSREGEX \
              --enable-module=all \
              --enable-module=define \
              --activate-module=src/modules/perl/libperl.a \
              --enable-shared=max \
              --disable-shared=perl \
              --enable-suexec --suexec-caller=nobody \
              --suexec-uidmin=500 --suexec-gidmin=500
      

    20.   % make
      

    21.   % make certificate 
      

      with whatever option is suitable to your config.

    22.   % make install
      

    You should be all set.

    Note: if you use the standard mod_ssl configuration, don't forget to start Apache as ``httpd -DSSL''.



    mod_perl and apache-ssl (+openssl)

    Apache-SSL is a secure Webserver, based on Apache and SSLeay/OpenSSL. It is licensed under a BSD-style license, which means, in short, that you are free to use it for commercial or non-commercial purposes, so long as you retain the copyright notices.

    Download the sources:

      % lwp-download http://www.apache.org/dist/apache_x.x.x.tar.gz
      % lwp-download http://perl.apache.org/dist/mod_perl-x.xx.tar.gz
      % lwp-download http://www.apache-ssl.org/.../apache_x.x.x+ssl_x.xx.tar.gz
      % lwp-download http://www.openssl.org/source/openssl-x.x.x.tar.gz
    

    Un-pack:

      % tar zvxf mod_perl-x.xx.tar.gz
      % tar zvxf apache_x.x.x.tar.gz
      % tar zvxf openssl-x.x.x.tar.gz
    

    Configure and install openssl:

      % cd openssl-x.x.x
      % ./config
      % make && make test && make install
    

    Patch Apache with SSLeay paths

      % cd ../apache_x.x.x
      % tar -zxf ../apache_x.x.x+ssl_x.xx.tar.gz
      % FixPatch
      Do you want me to apply the fixed-up Apache-SSL patch for you? [n] y
    

      % <edit the src/Configuration file if needed>
      % cd -
    

    Configure:

      % cd ../mod_perl-x.xx
      % perl Makefile.PL USE_APACI=1 EVERYTHING=1 \
            DO_HTTPD=1 SSL_BASE=/usr/local/ssl \
            APACHE_SRC=../apache_x.x.x/src
    

    Build, test and install:

      % make && make test && make install
      % cd ../apache_x.x.x
      % make certificate
      % make install
    

    Note that you might need to adjust the make test stage, as this server takes much longer to start and make test waits only a few seconds before it times out.

    Now configure the Apache-SSL and mod_perl parts of the server configuration files before starting the server.



    mod_perl and Stronghold

    Stronghold is a secure SSL Web server for Unix which allows you to give your web site full-strength, 128-bit encryption.

    You must first build and install Stronghold without mod_perl, following Stronghold's install procedure. For more information visit: http://www.c2.net/products/sh2/ .

    Download the sources:

      % lwp-download http://perl.apache.org/dist/mod_perl-x.xx.tar.gz
    

    Un-pack:

      % tar zvxf mod_perl-x.xx.tar.gz
    

    Configure (assuming that you have the Stronghold sources extracted at /usr/local/stronghold):

      % cd mod_perl-x.xx
      % perl Makefile.PL APACHE_SRC=/usr/local/stronghold/src \
        DO_HTTPD=1 USE_APACI=1 PERL_MARK_WHERE=1 EVERYTHING=1
    

    Build:

      % make 
    

    Before running make test, you must add your StrongholdKey to t/conf/httpd.conf. If you are configuring by hand, be sure to edit src/modules/perl/Makefile and uncomment the #APACHE_SSL directive.

    Test and Install:

      % make test && make install
      % cd /usr/local/stronghold
      % make install
    



    Note For Solaris 2.5 users

    There has been a report that the regex library bundled with Stronghold causes Apache built with mod_perl to produce core dumps. To get around this:

    In $STRONGHOLD/src/Configuration, Change:

      Rule WANTHSREGEX=default
    

    To:

      Rule WANTHSREGEX=no
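
    The same edit can be made non-interactively with sed. A sketch, assuming $STRONGHOLD points at your Stronghold tree as above; the function name disable_hsregex is hypothetical, and a backup of the original file is kept as Configuration.orig. Rebuild afterwards so the change takes effect.

```shell
#!/bin/sh
# Hypothetical helper: turn off Stronghold's bundled regex library
# in src/Configuration, keeping a backup copy of the original file.
disable_hsregex() {
    conf=$1
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
    cp "$conf" "$conf.orig" || return 1
    sed 's/^Rule WANTHSREGEX=default[[:space:]]*$/Rule WANTHSREGEX=no/' "$conf.orig" > "$conf"
    # Succeed only if the rule is now switched off.
    grep -q '^Rule WANTHSREGEX=no$' "$conf"
}
```

    Usage: disable_hsregex "$STRONGHOLD/src/Configuration"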
    



    mod_perl Installation with CPAN.pm's Interactive Shell

    Installing mod_perl and all the required packages is much easier with the help of the CPAN.pm module, which, among other features, provides a shell interface to a CPAN repository (CPAN, the Comprehensive Perl Archive Network, is a repository of thousands of Perl modules, scripts and documentation; see http://cpan.org for more info).

    First, download the Apache source code and unpack it into a directory; you will need the name of this directory very soon.

    Now execute:

      % perl -MCPAN -eshell
    

    If this is the first time you use it, it will ask you about 10 questions to configure the module. This is quite easy if you follow the very helpful hints that come with the questions. When you are done, you will see the cpan prompt:

      cpan> 
    

    CPAN will download mod_perl for you, unpack it, check the prerequisites, detect any missing third-party modules, and download and install them. All you need to do to install mod_perl is to type at the prompt:

      cpan> install mod_perl
    

    You will see (I'll use x.xx instead of real version numbers, since these change very frequently):

      Running make for DOUGM/mod_perl-x.xx.tar.gz
      Fetching with LWP:
      http://www.perl.com/CPAN-local/authors/id/DOUGM/mod_perl-x.xx.tar.gz
      
      CPAN.pm: Going to build DOUGM/mod_perl-x.xx.tar.gz
      
      Enter `q' to stop search
      Please tell me where I can find your apache src
      [../apache-x.x.x/src]
    

    It will search for the latest Apache sources and suggest a directory. Unless CPAN detected and suggested the right directory, type in the directory where you unpacked Apache. The question asks for the src directory, which resides at the root level of the unpacked Apache distribution. In most cases CPAN will ``guess'' the correct directory.

      Please tell me where I can find your apache src
      [../apache-x.x.x/src] 
    

    Answer yes to all the following questions, unless you have a reason not to do that.

      Configure mod_perl with /usr/src/apache_x.x.x/src ? [y] 
      Shall I build httpd in /usr/src/apache_x.x.x/src for you? [y] 
    

    Now it will build Apache with mod_perl enabled. The only thing left to do is to go to the Apache source root directory (after you quit the CPAN shell, or using another terminal) and run:

      % make install
    

    which will complete the installation by installing the Apache headers and the binary in the appropriate directories.

    The only caveat of the process described is that you don't have control over the configuration process. Actually, this is an easy problem to solve: you can tell CPAN.pm to pass whatever parameters you want to perl Makefile.PL. You do this with the o conf makepl_arg command:

      cpan> o conf makepl_arg 'DO_HTTPD=1 USE_APACI=1 EVERYTHING=1'
    

    Just list all the parameters as you would pass them to a familiar perl Makefile.PL. If you add the APACHE_SRC=/usr/src/apache_x.x.x/src and DO_HTTPD=1 parameters, you will not be asked a single question. Of course, use the correct path to the Apache source distribution.

    Now proceed with install mod_perl, like before. When the installation is completed, remember to unset the makepl_arg variable, by executing:

      cpan> o conf makepl_arg ''
    

    If makepl_arg was previously set to some value (before you altered it for the mod_perl installation), you will probably want to save that value somewhere and restore it when you are done with the mod_perl installation. To read the original value, use:

      cpan> o conf makepl_arg
    

    You can install all the modules you might want to use with mod_perl by typing a single command:

      cpan> install Bundle::Apache
    

    It'll install mod_perl if it isn't installed yet, along with many other packages like ExtUtils::Embed, MIME::Base64, URI::URL, Digest::MD5, Net::FTP, LWP, HTML::TreeBuilder, CGI, Devel::Symdump, Apache::DB, Tie::IxHash, Data::Dumper, etc.

    A helpful hint: If you have a system with all the perl modules you use and you want to replicate them all at some other place, and if you cannot just copy the whole /usr/lib/perl5 directory because of a possible binary incompatibility of the other system, making your own bundle comes as a handy solution. To accomplish that the command autobundle can be used on the CPAN shell command line. This command writes a bundle definition file for all modules that are installed for the currently running perl interpreter.

    With a clever bundle file you can then simply say

      cpan> install Bundle::my_bundle
    

    then answer a few questions and then go out for a coffee.



    Installing on multiple machines

    You may wish to build httpd once, then copy it to other machines. The Perl side of mod_perl needs the Apache header files to compile. To avoid dragging the Apache sources around and building Apache on all your other machines, there are a few Makefile targets to help you out:

      % make tar_Apache
    

    This will tar all files mod_perl installs in your Perl's site_perl directory, into a file called Apache.tar. You can then unpack this under site_perl directory on another machine.
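    On the target machine, Perl's standard Config module reports the site library directory as $Config{sitelib}, which is a reasonable default destination for Apache.tar. The sketch below assumes the paths inside the archive are relative to that directory; list them with tar tf Apache.tar first and pass a different destination if they are not. The function name install_apache_tar is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: unpack the Apache.tar made by "make tar_Apache".
# Assumes the archive's paths are relative to the destination directory.
install_apache_tar() {
    tarball=$1
    [ -f "$tarball" ] || { echo "no such file: $tarball" >&2; return 1; }
    case $tarball in
        /*) ;;
        *) tarball=$(pwd)/$tarball ;;   # make the path absolute
    esac
    if [ -n "$2" ]; then
        dest=$2
    else
        # Default: the site library directory of the perl on PATH.
        dest=$(perl -MConfig -e 'print $Config{sitelib}')
    fi
    [ -d "$dest" ] || { echo "no such directory: $dest" >&2; return 1; }
    echo "unpacking $tarball into $dest"
    tar -xf "$tarball" -C "$dest"
}
```

    Usage: install_apache_tar Apache.tar, or install_apache_tar Apache.tar /some/other/dir.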

      % make offsite-tar
    

    This will copy all the header files from the Apache source directory you configured mod_perl against, then run make dist, which creates a mod_perl-x.xx.tar.gz ready to unpack on another machine to compile and install the Perl side of mod_perl.

    If you really want to make your life easy, you should use one of the more advanced packaging systems. For example, almost all Linux distributions use packaging tools on top of plain tar.gz, which let you track the prerequisites of each package and make installation, upgrades and cleanup easy. One of the most widely used packagers is RPM (Red Hat Package Manager); see http://www.rpm.org for more information.

    So what you have to do is to prepare a SRPM (source distribution package), then build a binary release, which then can be installed on any number of machines in a matter of seconds.

    This works even on live machines! Suppose you have two identical machines (both software and hardware; hardware is less critical, depending on your setup), one a live server and the other a development machine. If you build an RPM with a mod_perl binary distribution, install it on the development machine and find it working and stable, you can install the same RPM package on the live server without fear. Just make sure that httpd.conf is correct, since it generally includes parameters unique to the live machine, like the hostname.

    Once you have installed the package, just restart the server. It is a good idea to keep the previous package around, so that if something goes wrong you can remove the installed package and put the old one back.

    (META: Do you care to share a step by step scenario of preparation of SRPMs for mod_perl? Thanks!!!)



    Using RPM, DEB and other packages to install mod_perl

    META: meanwhile only RPM. please submit the info about DEB and other available packages.



    Static debian package

    David Huggins-Daines has built a static apache/mod_perl 1.3.9/1.21 debian package.

    David has hacked the Debian package to build with static mod_perl in the Apache binary, source and binary packages are at:

    http://elgin.plcom.on.ca/debian/dists/unstable/main/

    Or put this in your /etc/apt/sources.list and ``apt-get install apache-perl'':

      deb http://elgin.plcom.on.ca/debian unstable main
    

    (note: this server may be up and down for a bit, it's also the development machine for David's project at work that uses mod_perl...)

    These aren't official packages, of course. Hopefully the memory leakage on DSO problem can be resolved before we release potato, it is a rather severe bug IMHO.



    A word on mod_perl RPM packages

    The virtues of RPM packages are the subject of much debate among mod_perl users. While RPMs do take the pain out of package installation and maintenance for most applications, the nuances of mod_perl make RPMs somewhat less than ideal for those just getting started. The following help and advice is for those new to mod_perl, Apache, Linux, and RPMs. If you know what you are doing, this is probably old hat; contributing your past experiences is, as always, welcomed by the community.



    Getting Started

    If you are new to mod_perl and are using this Guide and the Eagle book to help you on your way, it is probably better to grab the latest Apache and mod_perl sources and compile the sources yourself. Not only will you find that this is less daunting than you suspect, but it will probably save you a few headaches down the line for several reasons.

    First, given the pace at which the open source community produces software, RPMs, especially those found on distribution CDs, are often several versions out of date. The most recent version will not only be more stable, but will likely incorporate some new functionality that you will eventually want to play with.

    It is also unlikely that the file system layout of an RPM package will match what you see in either the Eagle book or this Guide. If you are new to mod_perl, Apache, or both, you will probably want to get familiar with the file system layout used by the examples given here before trying something less standard.

    Finally, the RPMs found on typical distribution CDs use mod_perl built with Apache's Dynamic Shared Objects (DSO) support. While mod_perl can be used successfully as a DSO module, it adds a layer of complexity that you may want to live without for now.

    All that being said, should you still feel that rolling your own mod_perl enabled Apache server is not for you, here are a few helpful hints...



    Compiling RPM source files

    It is possible to compile the source files provided by RPM packages, but if you are using RPMs to ease mod_perl installation, that is not the way to do it. Both Apache and mod_perl RPMs are designed to be install-and-go. If you really want to compile mod_perl to your own specific needs, your best bet is to get the most recent sources from CPAN.



    Mix and Match RPM and source

    It is probably not the best idea to use a self-compiled Apache with a mod_perl RPM (or vice versa). Sticking with one format or the other at first will result in fewer headaches and more hair.



    Installing a single apache+mod_perl RPM

    If you use an apache+mod_perl RPM, chances are rpm -i or glint (GUI for RPM) will have you up and running immediately, no compilation necessary. If you encounter problems, try downloading from another mirror site or searching http://rpmfind.net/ for a different package - there are plenty out there to choose from.

    David Harris has started an effort to build better RPM/SRPM mod_perl packages. You will find them at: http://www.davideous.com/modperlrpm/distrib/

    Features of this RPM:

    • Installs mod_perl as an ``add in'' to the RedHat Apache package, but does not install mod_perl as a DSO and all the problems that brings.

    • Includes the four header files required for building libapreq (Apache::Request)

    • Distributes plain text forms of the pod documentation files that come with mod_perl.

    • Checks the module magic number on the existing apache package to see if things are compatible

    Notes on this unconventional RPM packaging of mod_perl

    by David Harris <dharris@drh.net> on Oct 13, 1999

    This package will install the mod_perl library files on your machine along with the following two Apache files:

      /usr/lib/apache/mod_include_modperl.so
      /usr/sbin/httpd_modperl
    

    This package does not install a complete Apache subtree built with mod_perl, but rather just the two files above, which are the only ones that differ for mod_perl. Conceptually, this treats mod_perl as a kind of ``add on'' to the regular Apache tree. However, we cannot distribute mod_perl as an actual DSO, because the mod_perl developers do not recommend it and various features must be turned off. So instead we distribute an httpd binary with mod_perl statically linked (httpd_modperl) and the specially modified mod_include.so required by this binary (mod_include_modperl.so). You can use exactly the same configuration files and other DSO modules; you just ``enable'' the mod_perl ``add on'' by following the directions below.

    To enable mod_perl, do the following:

      (1) Configure /etc/rc.d/init.d/httpd to run httpd_modperl instead of
          httpd by changing the "daemon" command line.
      (2) Replace mod_include.so with mod_include_modperl.so in the
          module loading section of /etc/httpd/conf/httpd.conf
      (3) Uncomment the "AddModule mod_perl.c" line in /etc/httpd/conf/httpd.conf
    

    Or run the first of the following commands (the second one disables mod_perl again):

      /usr/sbin/modperl-enable on
      /usr/sbin/modperl-enable off
    



    Compiling libapreq (Apache::Request) with the RH 6.0 mod_perl RPM

    There have been many reports of libapreq, which provides the Apache::Request module, not working properly with various RPM packages. However, it is possible to integrate libapreq with mod_perl RPMs; it just requires a few additional steps.

    1. Make certain you have the apache-devel-x.x.x-x.i386.rpm package installed. Also, download the latest version of libapreq from CPAN.

    2. Install the source RPM for your mod_perl RPM and then do a build prep, which unpacks the sources. From there, copy four header files (mod_perl.h, mod_perl_version.h, mod_perl_xs.h, and perl_PL.h) to /usr/include/apache.

      • 2.1 Get the SRPM from somemirror.../redhat-6.0/SRPMS/mod_perl-1.19-2.src.rpm.

      • 2.2 Install the SRPM. (This creates files in /usr/src/redhat/SPECS and /usr/src/redhat/SOURCES). Run:

         % rpm -ih mod_perl-1.19-2.src.rpm
        

      • 2.3 Do a "prep" build of the package, which just unpacks the sources and applies any patches.

          % rpm -bp /usr/src/redhat/SPECS/mod_perl.spec
          Executing: %prep
          + umask 022
          + cd /usr/src/redhat/BUILD
          + cd /usr/src/redhat/BUILD
          + rm -rf mod_perl-1.19
          + /bin/gzip -dc /usr/src/redhat/SOURCES/mod_perl-1.19.tar.gz
          + tar -xf -
          + STATUS=0
          + [ 0 -ne 0 ]
          + cd mod_perl-1.19
          ++ /usr/bin/id -u
          + [ 0 = 0 ]
          + /bin/chown -Rf root .
          ++ /usr/bin/id -u
          + [ 0 = 0 ]
          + /bin/chgrp -Rf root .
          + /bin/chmod -Rf a+rX,g-w,o-w .
          + echo Patch #0:
          Patch #0:
          + patch -p1 -b --suffix .rh -s
          + exit 0
        

        NOTE: Steps 2.1 through 2.3 were just a fancy way of unpacking the source tree that built the RPM into /usr/src/redhat/BUILD/mod_perl-1.19. You could instead unpack the mod_perl-x.xx.tar.gz file somewhere and perform the following steps on that source tree. Using the SRPM is more ``pure'', because the header files come from the same tree that built the RPM, but in practice it does not matter, since RedHat does not patch these files. So simply grabbing the mod_perl source and unpacking it to get the headers means less fuss and mess.

      • 2.4 Look at the files you will copy (not really a step, but useful to see):

          % find /usr/src/redhat/BUILD/mod_perl-1.19 -name '*.h'
          /usr/src/redhat/BUILD/mod_perl-1.19/src/modules/perl/mod_perl.h
          /usr/src/redhat/BUILD/mod_perl-1.19/src/modules/perl/mod_perl_xs.h
          /usr/src/redhat/BUILD/mod_perl-1.19/src/modules/perl/mod_perl_version.h
          /usr/src/redhat/BUILD/mod_perl-1.19/src/modules/perl/perl_PL.h
        

      • 2.5 Copy the files into /usr/include/apache.

          % find /usr/src/redhat/BUILD/mod_perl-1.19 -name '*.h' \
            -exec cp {} /usr/include/apache \;
        

        NOTE: You should not have to run:

          % mkdir /usr/include/apache
        

        because that directory should be created by apache-devel.

    3. Apply this patch to libapreq: http://www.davideous.com/modperlrpm/distrib/libapreq-0.31_include.patch

    4. Follow the libapreq directions as usual:

        % perl Makefile.PL
        % make && make test && make install
      



    Installing separate Apache and mod_perl RPMs

    If you are trying to install separate Apache and mod_perl RPMs, like those provided by RedHat distributions, you may be in for a bit of a surprise. Installing the Apache RPM will go just fine, and http://localhost will bring up some type of web page for your viewing pleasure. However, installing the mod_perl RPM and then running the tests from ``How can I tell whether mod_perl is running'' will show that Apache is not mod_perl enabled. This is because mod_perl needs to be added as a separate module using Apache's Dynamic Shared Objects support.

    To use mod_perl as a DSO, make the following modifications to your Apache configuration files:

      httpd.conf:
      ----------
      LoadModule perl_module modules/libperl.so
      AddModule mod_perl.c
    

      srm.conf (or httpd.conf in later versions of Apache):
      ----------
      PerlModule Apache::Registry
      Alias /perl/ /home/httpd/perl/
      <Location /perl>
        SetHandler perl-script
        PerlHandler Apache::Registry
        PerlSendHeader On
        Options +ExecCGI
      </Location>
    

    After a complete shutdown and startup of the server, mod_perl should be up and running.



    Testing the mod_perl API

    Some people have reported that even when the server responds positively to the tests from ``How can I tell whether mod_perl is running'', the mod_perl API does not function properly. You may want to run the script below to verify that the mod_perl API is available.

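      # Run this under Apache::Registry, e.g. from the /perl/ location above
      use strict;
      my $r = shift;                      # Apache::Registry passes the request object
      $r->send_http_header('text/html');  # send the response headers via the API
      $r->print("It worked!!!\n");        # send the body via the API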
    



    Installation Without Superuser Privileges

    As you have already learned, a mod_perl enabled Apache consists of two main components: Perl modules and Apache itself. Let's tackle them one at a time.

    I'll show a complete installation example using stas as the username, and assume that /home/stas is that user's home directory.



    Installing Perl Modules into a Directory of Choice

    Since without superuser permissions you aren't allowed to install modules into system directories like /usr/lib/perl5, you need to install them under your home directory instead. This is a very simple task.

    First you have to decide where the modules will be installed. The simplest approach is to mirror the Perl-related portion of the / file system under your home directory. Actually, we need only two directories:

      /home/stas/bin
      /home/stas/lib
    

    But we don't have to create them, since they will be created automatically when the first module is installed. 99% of the files will go into the lib directory; occasionally a module comes with Perl scripts, which will go into the bin directory, and that directory will be created if it doesn't exist yet.

    Let's install the CGI.pm package, which besides CGI.pm includes a few other CGI::* modules. As usual, download the package from CPAN, unpack it and chdir to the newly created directory.

    Now we run the standard perl Makefile.PL to prepare a Makefile, but this time we tell MakeMaker to use non-default Perl installation directories.

      % perl Makefile.PL PREFIX=/home/stas
    

    PREFIX=/home/stas is the only difference from the standard Perl module installation process. Note that if you don't like how MakeMaker chooses the rest of the directories, or if you are using an older version of it that requires an explicit declaration of all target directories, you should run:

      % perl Makefile.PL PREFIX=/home/stas \
        INSTALLPRIVLIB=/home/stas/lib/perl5 \
        INSTALLSCRIPT=/home/stas/bin \
        INSTALLSITELIB=/home/stas/lib/perl5/site_perl \
        INSTALLBIN=/home/stas/bin \
        INSTALLMAN1DIR=/home/stas/lib/perl5/man  \
        INSTALLMAN3DIR=/home/stas/lib/perl5/man3
    

    The rest is as usual:

      % make
      % make test
      % make install
    

    make install installs all the files into the private repository. Note that all missing directories are created automatically, so there is no need to create them in the first place. Here is what it does (slightly truncated output):

      Installing /home/stas/lib/perl5/CGI/Cookie.pm
      Installing /home/stas/lib/perl5/CGI.pm
      Installing /home/stas/lib/perl5/man3/CGI.3
      Installing /home/stas/lib/perl5/man3/CGI::Cookie.3
      Writing /home/stas/lib/perl5/auto/CGI/.packlist
      Appending installation info to /home/stas/lib/perl5/perllocal.pod
    

    If you have to use the explicit target parameters instead of a single PREFIX parameter, you will find it useful to create a file called, for example, ~/.perl_dirs (where ~ is /home/stas in our example) and populate it with:

        PREFIX=/home/stas \
        INSTALLPRIVLIB=/home/stas/lib/perl5 \
        INSTALLSCRIPT=/home/stas/bin \
        INSTALLSITELIB=/home/stas/lib/perl5/site_perl \
        INSTALLBIN=/home/stas/bin \
        INSTALLMAN1DIR=/home/stas/lib/perl5/man  \
        INSTALLMAN3DIR=/home/stas/lib/perl5/man3
    

    From now on any time you want to install perl modules locally you simply execute:

      % perl Makefile.PL `cat ~/.perl_dirs`
      % make
      % make test
      % make install
    

    Using this tip, you can easily maintain several Perl module repositories, for example one for production and another for development. The only difference is whether you call:

      % perl Makefile.PL `cat ~/.perl_dirs.production`
    

    or

      % perl Makefile.PL `cat ~/.perl_dirs.develop`
    



    Making Your Scripts Find the Locally Installed Modules

    Perl modules are generally installed into five main directories. To find out which ones, execute:

      % perl -V
    

    and in the generated output, among other important information about your perl installation you will see at the end:

      Characteristics of this binary (from libperl):
      Built under linux
      Compiled at Apr  6 1999 23:34:07
      @INC:
        /usr/lib/perl5/5.00503/i386-linux
        /usr/lib/perl5/5.00503
        /usr/lib/perl5/site_perl/5.005/i386-linux
        /usr/lib/perl5/site_perl/5.005
        .
    

    It shows the contents of Perl's special variable @INC, which Perl uses to look for modules, much as Unix shells use the PATH environment variable to find the binaries to execute.

    Of course, this is the information for perl 5.00503 installed on my x86 PC running Linux; that's why you see i386-linux and 5.00503. If your system runs a different operating system, processor architecture or Perl version, the directories will have different names.

    I also have perl-5.00561 installed under /usr/local/lib/, so when I run:

      % /usr/local/bin/perl5.00561 -V
    

    I see:

      @INC:
        /usr/local/lib/perl5/5.00561/i586-linux
        /usr/local/lib/perl5/5.00561
        /usr/local/lib/site_perl/5.00561/i586-linux
        /usr/local/lib/site_perl
    

    Notice that it's still Linux, but this newer Perl was built for my Pentium processor (thus i586 rather than i386 as before), which lets the compiler apply Pentium-specific optimizations when binary Perl extensions are built.

    Directories like i386-linux are where all the platform-specific files go, such as compiled C code glued to Perl with XS or SWIG.
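    To check which @INC entries on your own system are the architecture-specific ones, a short script can compare each entry with the architecture name perl was built for. This is a minimal sketch that relies only on the standard Config module, whose %Config hash records the build-time archname and version (i386-linux and 5.00503 on the machine above).

```perl
      use strict;
      use Config;

      # The architecture name perl was built for, e.g. i386-linux or i586-linux
      my $arch = $Config{archname};
      print "perl $Config{version} built for $arch\n";

      # List @INC in search order, flagging the architecture-specific entries
      for my $dir (@INC) {
          my $tag = $dir =~ m{/\Q$arch\E$} ? '  (architecture specific)' : '';
          print "  $dir$tag\n";
      }
```

    On newer systems the entries may be laid out differently, in which case nothing gets flagged; the archname line still tells you what to look for.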

    This matters to us because we have installed the Perl modules into non-standard directories, so somehow we have to tell Perl where to look for the four new directories. There are two ways to accomplish this: either set the PERL5LIB environment variable or modify the @INC variable in your scripts.

    Assuming that we use perl-5.00503, in our example the directories are:

        /home/stas/lib/perl5/5.00503/i386-linux
        /home/stas/lib/perl5/5.00503
        /home/stas/lib/perl5/site_perl/5.005/i386-linux
        /home/stas/lib/perl5/site_perl/5.005
    

    As mentioned before, you find out the exact directories by executing perl -V and replacing the base directory of the global Perl installation with your home directory.
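    This substitution can be scripted. The sketch below uses a hypothetical helper, local_dirs(), which is not part of any standard tool. It assumes that the system-wide Perl lives under /usr, maps every @INC entry below it to the same path under /home/stas, and skips entries such as . that are not under the system prefix.

```perl
      use strict;

      # Hypothetical helper: map system-wide @INC entries to their
      # counterparts under a private prefix (e.g. your home directory).
      sub local_dirs {
          my ($system_prefix, $home, @dirs) = @_;
          my @local;
          for my $dir (@dirs) {
              # skip '.' and anything not under the system prefix
              next unless index($dir, "$system_prefix/") == 0;
              (my $local = $dir) =~ s/^\Q$system_prefix\E/$home/;
              push @local, $local;
          }
          return @local;
      }

      # Assumptions: the system perl lives under /usr and the user is stas
      print "$_\n" for local_dirs('/usr', '/home/stas', @INC);
```

    On the machine above, /usr/lib/perl5/5.00503 becomes /home/stas/lib/perl5/5.00503, matching the list given earlier.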

    Modifying @INC is quite easy. The best approach is to use the lib module, adding the following snippet at the top of every script that requires the locally installed modules:

      use lib qw(/home/stas/lib/perl5/5.00503
                 /home/stas/lib/perl5/site_perl/5.005);
    

    Another way is to explicitly write the code to alter @INC:

      BEGIN {
        unshift @INC,
          qw(/home/stas/lib/perl5/5.00503
             /home/stas/lib/perl5/5.00503/i386-linux
             /home/stas/lib/perl5/site_perl/5.005
             /home/stas/lib/perl5/site_perl/5.005/i386-linux);
      }
    

    Notice that with the lib module we don't have to list the corresponding architecture-specific directories, since it adds them automatically if they exist (to be exact, when the $dir/$archname/auto directory exists).

    Also, notice that both approaches prepend the directories to be searched to @INC, which allows you to install a more recent module into your local repository and perl will use it instead of the older one installed in the main system repository.

    Both approaches modify the value of @INC at compile time; the lib module uses a BEGIN block as well, only internally.
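    The architecture-directory behavior of the lib module can be observed directly. The sketch below relies only on the core Config, File::Path, File::Temp and lib modules. It creates a throwaway directory that contains an $archname/auto subdirectory and then calls lib->import on it, which is what use lib does at compile time.

```perl
      use strict;
      use Config;
      use File::Path qw(mkpath);
      use File::Temp qw(tempdir);
      use lib ();    # load lib.pm without calling its import() yet

      # A throwaway "repository" containing an $archname/auto directory
      my $dir = tempdir(CLEANUP => 1);
      mkpath("$dir/$Config{archname}/auto");

      # This is what 'use lib $dir' does, only at run time here
      lib->import($dir);

      # The architecture-specific directory comes first, then $dir itself
      print "$INC[0]\n$INC[1]\n";
```

    Both entries end up in front of the system directories, which is why a newer module in the private repository wins over an older system-wide copy.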

    Now let's assume the following scenario. I have installed the LWP package in my local repository. Now I want to install another module (e.g. mod_perl) that lists LWP among its prerequisites. I know that I have LWP installed, but when I run perl Makefile.PL for the module I'm about to install, I'm told that I don't have LWP installed.

    If we think for a moment, there is no way for Perl to know that we have some locally installed modules. All it does is search the directories listed in @INC, and since @INC contains only the five default directories, no wonder it cannot find the locally installed LWP package. There is no script to which we could add the @INC modification code, but the PERL5LIB environment variable mentioned before solves this problem. If you use csh or tcsh for interactive work, do:

      setenv PERL5LIB /home/stas/lib/perl5/5.00503:
      /home/stas/lib/perl5/site_perl/5.005
    

    It should be a single line with directories separated by colons (:) and no spaces. If you are a bash user, do:

      export PERL5LIB=/home/stas/lib/perl5/5.00503:
      /home/stas/lib/perl5/site_perl/5.005
    

    Again, make it a single line. Actually, bash lets you split a setting across several lines with a backslash (\), so you can also set it this way:

      export PERL5LIB=/home/stas/lib/perl5/5.00503:\
      /home/stas/lib/perl5/site_perl/5.005
    

    As with use lib, perl automatically prepends the architecture specific directories to @INC if those exist.

    When you are done with this setting, verify the newly configured @INC by executing perl -V as before. You should now see the modified value of @INC:

      % perl -V
      
      Characteristics of this binary (from libperl): 
      Built under linux
      Compiled at Apr  6 1999 23:34:07
      %ENV:
        PERL5LIB="/home/stas/lib/perl5/5.00503:/home/stas/lib/perl5/site_perl/5.005"
      @INC:
        /home/stas/lib/perl5/5.00503/i386-linux
        /home/stas/lib/perl5/5.00503
        /home/stas/lib/perl5/site_perl/5.005/i386-linux
        /home/stas/lib/perl5/site_perl/5.005
        /usr/lib/perl5/5.00503/i386-linux
        /usr/lib/perl5/5.00503
        /usr/lib/perl5/site_perl/5.005/i386-linux
        /usr/lib/perl5/site_perl/5.005
        .
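    Besides reading the perl -V output, you can ask Perl itself to confirm that every PERL5LIB directory made it into @INC. This is a minimal sketch that assumes the colon-separated Unix format of PERL5LIB shown above.

```perl
      use strict;

      # Split PERL5LIB on colons, as perl does on Unix
      my @wanted = grep { length } split /:/, (defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '');
      print "PERL5LIB is not set\n" unless @wanted;

      # Report where each PERL5LIB directory ended up in @INC
      for my $dir (@wanted) {
          my ($pos) = grep { $INC[$_] eq $dir } 0 .. $#INC;
          if (defined $pos) {
              print "$dir is \$INC[$pos]\n";
          } else {
              print "$dir is MISSING from \@INC\n";
          }
      }
```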
    

    Once everything works as you want, add this setting to your .tcshrc or .bashrc file, depending on your interactive shell, so that it is already in place the next time you open a new shell.

    Note that with a PERL5LIB setting you don't need to alter @INC in your scripts, but only as long as you run them from the interactive shell or in some other way that sets the PERL5LIB variable. For example, if someone else tries to execute your scripts without this setting in her shell, Perl will fail to find your locally installed modules.

    So the best approach is to have both: the PERL5LIB environment variable and the explicit @INC extension code at the beginning of the scripts as described before.



    CPAN.pm Shell and Locally Installed Modules

    As we saw in the section describing how to install mod_perl with the CPAN.pm shell, it saves us a great deal of time by doing all the work for us, even detecting missing prerequisite modules and then fetching and installing them. So you might wonder whether you can use CPAN.pm to maintain your local repository as well. The answer is yes, and I'm going to show you how to make installing modules locally a breeze.

    The first thing is to configure CPAN.pm to use our local settings. When you start the CPAN interactive shell, it first looks for a ~/.cpan/CPAN/MyConfig.pm configuration file and then for the system-wide one. On my setup the two files are (when I'm logged in as user stas):

        /home/stas/.cpan/CPAN/MyConfig.pm
        /usr/lib/perl5/5.00503/CPAN/Config.pm
    

    If there is no CPAN configured on your system, when you start its shell for the first time, it will ask you a dozen configuration questions and create this file for you.

    If CPAN.pm is already configured, you should have a /usr/lib/perl5/5.00503/CPAN/Config.pm file. If you have a different Perl version, alter the path to use your Perl's version number. Create the directory where the local configuration file will go:

      % mkdir -p /home/stas/.cpan/CPAN
    

    mkdir -p creates the whole path at once. Now copy the system-wide configuration file to your local one:

      % cp /usr/lib/perl5/5.00503/CPAN/Config.pm /home/stas/.cpan/CPAN/MyConfig.pm
    

    The only thing left is to change the base directory of .cpan to one under your home directory. On my machine I replace /usr/.cpan (that's where my system's .cpan directory resides) with /home/stas/.cpan, using Perl of course!

      % perl -pi -e 's|/usr/\.cpan|/home/stas/.cpan|g' /home/stas/.cpan/CPAN/MyConfig.pm
    

    Now that the local configuration file is ready, whether you created it by copying the global CPAN/Config.pm or let CPAN.pm create it for you, we have to tell CPAN.pm which special parameters to pass at the perl Makefile.PL stage.

    Open the file in your favorite editor and replace the line:

      'makepl_arg' => q[],
    

    with:

      'makepl_arg' => q[PREFIX=/home/stas],
    

    That completes the configuration. Now start the shell as usual (assuming that you are logged in as the user you prepared the local installation for, stas in our example):

      % perl -MCPAN -e shell
    

    From now on, any module you install will be installed locally. If you need to install some system-wide modules, just become the superuser and install them the same way; this time the global configuration file will be used.
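    To double-check what the edited MyConfig.pm actually contains, you can load it with a short script. The file just assigns a hash reference to $CPAN::Config, so do can read it. This is a sketch: read_cpan_config() is a hypothetical helper, and the keys printed (cpan_home, build_dir, makepl_arg) are ones a typical CPAN.pm configuration contains.

```perl
      use strict;

      # Hypothetical helper: load a CPAN.pm configuration file (MyConfig.pm
      # or Config.pm) and return a copy of the hash it assigns to $CPAN::Config.
      sub read_cpan_config {
          my $file = shift;
          local $CPAN::Config;             # leave any loaded config untouched
          my $ok = do $file;
          die "can't load $file: ", ($@ || $!), "\n" unless $ok;
          die "$file did not set \$CPAN::Config\n" unless ref $CPAN::Config eq 'HASH';
          return +{ %$CPAN::Config };
      }

      # Path assumed from the layout used above
      my $file = (defined $ENV{HOME} ? $ENV{HOME} : '/home/stas')
               . '/.cpan/CPAN/MyConfig.pm';
      if (-e $file) {
          my $conf = read_cpan_config($file);
          for my $key (qw(cpan_home build_dir makepl_arg)) {
              my $value = defined $conf->{$key} ? $conf->{$key} : '(not set)';
              print "$key => $value\n";
          }
      } else {
          print "$file not found\n";
      }
```

    If cpan_home still points under /usr, the substitution step above did not match your configuration.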

    If you have used more than just the PREFIX variable, modify MyConfig.pm to use them as well. E.g., if you have used:

        perl Makefile.PL PREFIX=/home/stas \
        INSTALLPRIVLIB=/home/stas/lib/perl5 \
        INSTALLSCRIPT=/home/stas/bin \
        INSTALLSITELIB=/home/stas/lib/perl5/site_perl \
        INSTALLBIN=/home/stas/bin \
        INSTALLMAN1DIR=/home/stas/lib/perl5/man  \
        INSTALLMAN3DIR=/home/stas/lib/perl5/man3
    

    replace PREFIX=/home/stas in line:

      'makepl_arg' => q[PREFIX=/home/stas],
    

    with all the variables from above:

      'makepl_arg' => q[PREFIX=/home/stas \
        INSTALLPRIVLIB=/home/stas/lib/perl5 \
        INSTALLSCRIPT=/home/stas/bin \
        INSTALLSITELIB=/home/stas/lib/perl5/site_perl \
        INSTALLBIN=/home/stas/bin \
        INSTALLMAN1DIR=/home/stas/lib/perl5/man  \
        INSTALLMAN3DIR=/home/stas/lib/perl5/man3],
    

    If you arrange all the above parameters in one line, you can remove the backslashes (\).



    Making a Local Apache Installation

    Just like with Perl modules, when you don't have permission to install files into a system area, you have to install them locally under your home directory. It's almost the same as a plain installation, but you will have to run the server on a port number above 1024, since only root processes can listen on ports below 1024.

    Another important issue you will have to solve is how to add automatic startup and shutdown scripts to the directories used by the rest of the system services. You will have to ask your system administrator to help you with this.

    Now, to install Apache locally, all you have to do is tell the ./configure script in the Apache source directory which target directories to use. Following the convention I use, which makes your home directory look like the / (base) directory, the invocation parameters would be:

      ./configure --prefix=/home/stas
    

    Apache will then use this prefix for the rest of its target directories instead of the default /usr/local/apache. If you want to see what they are before you proceed, add the --show-layout option:

      ./configure --prefix=/home/stas --show-layout
    

    You might prefer to put all the Apache files under /home/stas/apache, following Apache's default convention. To accomplish that, run:

      ./configure --prefix=/home/stas/apache
    

    If you want to override some or all of the directory names that are derived automatically when you omit their explicit parameters, just set them to the desired values, e.g.:

      ./configure --prefix=/home/stas/apache \
        --sbindir=/home/stas/apache/sbin \
        --sysconfdir=/home/stas/apache/etc \
        --localstatedir=/home/stas/apache/var \
        --runtimedir=/home/stas/apache/var/run \
        --logfiledir=/home/stas/apache/var/logs \
        --proxycachedir=/home/stas/apache/var/proxy
    

    That's all!

    Also remember that you can start the server only as a user and group you belong to, so set the User and Group directives in httpd.conf to the appropriate values.
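    For example, the relevant httpd.conf lines for a server run by user stas might look like the sketch below. The port number 8080 and the group name stas are assumptions; use a free port above 1024 and a group you actually belong to. Comments are kept on lines of their own, because Apache does not accept them after a directive.

```apache
      # Where the locally installed server lives
      ServerRoot /home/stas/apache
      # Ports below 1024 are reserved for root, so pick a higher one
      Port 8080
      # Run as the user and group that start the server
      User stas
      Group stas
```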



    Actual Local mod_perl Enabled Apache Installation

    Now that we have learned how to install Perl modules and Apache locally, let's see how to use this knowledge to install a mod_perl enabled Apache in our home directory. It's almost as simple as doing each one separately, but there is one nuance you should know about, which I'll mention at the end of this section.

    So if you have unpacked Apache and mod_perl sources under the /home/stas/src directory and they look like:

      % ls /home/stas/src
      /home/stas/src/apache_x.x.x
      /home/stas/src/mod_perl-x.xx
    

    where x.x.x and x.xx are the version numbers as usual, and you want the Perl modules from the mod_perl package to be installed under /home/stas/lib/perl5 and the Apache files under /home/stas/apache, the following commands will do that for you:

      % perl Makefile.PL \
      PREFIX=/home/stas \
      APACHE_PREFIX=/home/stas/apache \
      APACHE_SRC=../apache_x.x.x/src \
      DO_HTTPD=1 \
      USE_APACI=1 \
      PERL_MARK_WHERE=1 \
      EVERYTHING=1
      % make && make test && make install 
      % cd ../apache_x.x.x
      % make install
    

    If you need to pass something to the ./configure script, as we saw in the previous section, use the APACI_ARGS parameter, e.g.:

      APACI_ARGS=--sbindir=/home/stas/apache/sbin, \
        --sysconfdir=/home/stas/apache/etc, \
        --localstatedir=/home/stas/apache/var, \
        --runtimedir=/home/stas/apache/var/run, \
        --logfiledir=/home/stas/apache/var/logs, \
        --proxycachedir=/home/stas/apache/var/proxy
    

    Note that the above multi-line splitting works only in the bash shell; tcsh users have to list all the parameters on a single line.

    Basically, the installation is complete. The only nuance is the @INC variable, which won't be set correctly if you rely on the PERL5LIB environment variable, unless you set it explicitly in the startup file, which has to be required before any module residing in your local repository is loaded. A much nicer approach is to use the lib pragma as we saw before, but in a slightly different way: we use it in the startup file, where it affects all the code executed under mod_perl handlers. E.g.:

      PerlRequire /home/stas/apache/perl/startup.pl
    

    where startup.pl starts with:

      use lib qw(/home/stas/lib/perl5/5.00503
                 /home/stas/lib/perl5/site_perl/5.005);
    

    Note that you can still hard-code @INC modifications in the scripts themselves, but be aware that @INC is reset to its original value after a script has been compiled for the first time, and all the hard-coded @INC settings are forgotten.

    That's because scripts modify @INC in BEGIN blocks, and mod_perl executes BEGIN blocks only when it compiles a script; so when the script is executed a second time, @INC is back to its original value.

    The only place you can alter this ``original'' value is during the server configuration stage, either in the startup file or by setting:

      PerlSetEnv PERL5LIB /home/stas/lib/perl5/5.00503/:/home/stas/lib/perl5/site_perl/5.005
    

    in the httpd.conf.
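    Putting this together, a startup.pl for the local installation could be as small as the sketch below. Because it runs at server startup, its @INC change becomes part of the ``original'' value rather than something a script's BEGIN block adds. The preload line is a commented-out illustration only, and the paths assume perl 5.00503 and user stas, as in the examples above.

```perl
      # startup.pl: loaded once at server startup via PerlRequire
      use strict;

      # Prepend the private repository to @INC for all code run by this server
      use lib qw(/home/stas/lib/perl5/5.00503
                 /home/stas/lib/perl5/site_perl/5.005);

      # Modules from the private repository could be preloaded here, e.g.:
      # use CGI ();

      1;  # make sure the file returns a true value
```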

    The rest of the mod_perl configuration and usage is exactly the same as if you were installing mod_perl as the superuser.

    One more important thing to keep in mind is system resource consumption. mod_perl is memory hungry: if you run a lot of mod_perl processes on a public, multiuser (not dedicated) machine, the system administrator will most likely ask you to use fewer resources, or even to shut down your mod_perl server and find another home for it. You have a few options:

    • Reduce resource usage (see ``Limiting the size of the processes'').

    • Ask your ISP whether they can set up a dedicated machine for you in their computer room, so you will be able to install as much memory as you need and have the ISP administer the system. If you get a dedicated machine and are able to manage the administration yourself, chances are that you will want root access, leaving only the following items on the list of the ISP's responsibilities: keeping a constant electricity supply, making sure that the network link is up, and protecting the machine from physical break-ins (someone breaking into the computer room either to steal information from your machine or to damage it physically). Another good idea is to let the ISP install security patches if you trust them, or if you are not able to do it yourself.

    • Look for another ISP with lots of resources or one that supports mod_perl. You can find a list of such ISPs at http://perl.apache.org .



    Local mod_perl Enabled Apache Installation with CPAN.pm

    Again, CPAN makes installation and upgrades simpler. You have seen how to install a mod_perl enabled server using only CPAN.pm's interactive shell, and you have seen how to install Perl modules and Apache locally. Now all that is left is to merge these techniques into a single ``local mod_perl Enabled Apache Installation with CPAN.pm'' technique.

    Assuming that you have configured CPAN.pm to install Perl modules locally, the installation is very simple. Start the CPAN.pm shell, set the arguments to be passed to perl Makefile.PL (modify the example setting to suit your needs), and tell CPAN.pm to do the rest of the work for you:

      % perl -MCPAN -eshell
      cpan> o conf makepl_arg 'DO_HTTPD=1 USE_APACI=1 EVERYTHING=1 \
            PREFIX=/home/stas APACHE_PREFIX=/home/stas/apache'
      cpan> install mod_perl
    

    When you use CPAN.pm for local installations, the value of makepl_arg should be restored to its original value once the mod_perl installation is complete. The simplest solution is to quit the interactive shell and reenter it if you need to install more modules; doing that resets makepl_arg to its original value.

    If you want to continue working with CPAN without quitting the shell, you have to remember the value of makepl_arg and restore it once the mod_perl installation is complete. As of this writing this is quite cumbersome; hopefully CPAN.pm will handle it more easily by the time you read this.

    So if you are still with me, start the shell as usual:

      % perl -MCPAN -eshell
    

    Read the value of the makepl_arg:

      cpan> o conf makepl_arg 
    

      PREFIX=/home/stas
    

    It should be something like PREFIX=/home/stas if you configured CPAN.pm to install modules locally. Save this value:

      cpan> o conf makepl_arg.save PREFIX=/home/stas
    

    Now set a new value, to be used by mod_perl.

      cpan> o conf makepl_arg 'DO_HTTPD=1 USE_APACI=1 EVERYTHING=1 \
            PREFIX=/home/stas APACHE_PREFIX=/home/stas/apache'
    

    Add or remove parameters according to your needs. Now let CPAN.pm do the rest of the work for you:

      cpan> install mod_perl
    

    Now set makepl_arg's original value back by printing the value of the saved variable and assigning it to makepl_arg.

      cpan> o conf makepl_arg.save
    

      PREFIX=/home/stas
    

      cpan> o conf makepl_arg PREFIX=/home/stas
    

    Not so neat, but a working solution.



    Automating installation

    James G Smith wrote an Apache Builder

    http://hex.tamu.edu/projects/1999/build-apache/

    (META: provide more info)



    How can I tell whether mod_perl is running

    There are a few ways. In older versions of Apache (< 1.3.6?) you could check by running httpd -v, but that no longer works; now you should use httpd -l. Note that it is not enough to have mod_perl installed: you must of course also configure the server for mod_perl and restart it.
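    The httpd -l check can also be scripted, which is handy when you rebuild the server often. This sketch assumes that the binary was installed as /home/stas/apache/bin/httpd (a hypothetical path; adjust it to your layout) and looks for mod_perl.c in the list of compiled-in modules. It only detects a statically linked mod_perl: a DSO build loaded with LoadModule, like the RPMs discussed earlier, does not appear in this list.

```perl
      use strict;

      # Hypothetical location of the server binary; adjust to your installation
      my $httpd = '/home/stas/apache/bin/httpd';

      # True if the 'httpd -l' output lines include mod_perl.c
      sub has_static_mod_perl {
          return scalar grep { /^\s*mod_perl\.c\s*$/ } @_;
      }

      if (-x $httpd) {
          my @modules = `$httpd -l`;     # lists the compiled-in modules
          if (has_static_mod_perl(@modules)) {
              print "mod_perl is compiled into $httpd\n";
          } else {
              print "mod_perl is NOT compiled into $httpd\n";
          }
      } else {
          print "$httpd not found, adjust \$httpd\n";
      }
```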

    [TOC]


    Testing by checking the error_log file

    When starting the server, just check the error_log file for the following message:

      [Thu Dec  3 17:27:52 1998] [notice] Apache/1.3.1 (Unix) mod_perl/1.15 configured 
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        -- resuming normal operations
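
    To do this check from a script, you can scan error_log for the start-up notice. In this sketch the log location in the comment is an assumption (logs/error_log under ServerRoot is the usual default), the helper name is hypothetical, and only the most recent start-up line counts, since earlier ones may come from a server built without mod_perl.

```perl
use strict;
use warnings;

# Return the mod_perl version from the most recent start-up notice in
# an error_log, or undef if that start-up did not mention mod_perl.
sub mod_perl_version_from_log {
    my ($fh) = @_;
    my $version;
    while (my $line = <$fh>) {
        next unless $line =~ /configured -- resuming normal operations/;
        ($version) = $line =~ m{\bmod_perl/(\S+)};
    }
    return $version;
}

# Against a real log (the path is an assumption):
#   open my $fh, '<', '/usr/local/apache/logs/error_log' or die $!;
#   print mod_perl_version_from_log($fh) || 'no mod_perl', "\n";
my $log = "[Thu Dec  3 17:27:52 1998] [notice] Apache/1.3.1 (Unix) "
        . "mod_perl/1.15 configured -- resuming normal operations\n";
open my $fh, '<', \$log or die "cannot open in-memory log: $!";
print mod_perl_version_from_log($fh), "\n";   # prints 1.15
```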
    

    [TOC]


    Testing by viewing /perl-status

    Assuming that you have configured the <Location /perl-status> section in the server configuration file, fetch http://www.nowhere.com/perl-status using your favorite Netscape browser :-)

    You should see something like this:

      Embedded Perl version 5.00502 for Apache/1.3.1 (Unix) mod_perl/1.19 
      process 50880, running since Tue Oct 6 14:31:45 1998
    

    [TOC]


    Testing via telnet

    Knowing the port you have configured apache to listen on, you can use telnet to talk directly to it.

    Assuming that your mod_perl enabled server listens on port 8080, telnet to your server at port 8080, type HEAD / HTTP/1.0 and then press the <ENTER> key TWICE:

      % telnet localhost 8080<ENTER>
      HEAD / HTTP/1.0<ENTER><ENTER>
    

    You should see a response like this:

      HTTP/1.1 200 OK
      Date: Tue, 01 Dec 1998 12:27:52 GMT
      Server: Apache/1.3.6 (Unix) mod_perl/1.19
      Connection: close
      Content-Type: text/html
      
      Connection closed.
    

    The line Server: Apache/1.3.6 (Unix) mod_perl/1.19 confirms that you have mod_perl installed and that its version is 1.19. In your case it will show the version you have installed.

    However, just because you have got mod_perl linked in there, that does not mean that you have configured your server to handle Perl scripts with mod_perl. You will find configuration assistance at ModPerlConfiguration.
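    The same telnet conversation can be scripted with the core IO::Socket::INET module. A minimal sketch: the host and port in the usage comment (localhost:8080) are assumptions, and head_request and server_header are hypothetical helper names. Note the explicit \r\n line endings, which HTTP expects and which telnet sends when you press <ENTER>.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Send "HEAD / HTTP/1.0" the way you would type it into telnet and
# return the raw response (status line plus headers).
sub head_request {
    my ($host, $port) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => 10,
    ) or die "cannot connect to $host:$port: $@";
    print $sock "HEAD / HTTP/1.0\r\n\r\n";   # the blank line ends the request
    local $/;                                # read until the server closes
    my $response = <$sock>;
    close $sock;
    return $response;
}

# Extract the value of the Server header from a raw response.
sub server_header {
    my ($response) = @_;
    my ($server) = $response =~ /^Server:[ \t]*([^\r\n]*)/mi;
    return $server;
}

# Usage against a live server (host and port are assumptions):
#   print server_header(head_request('localhost', 8080)), "\n";
```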

    [TOC]


    Testing via a CGI script

    Another method is to invoke a CGI script which dumps the server's environment.

    I assume you have configured the server so that scripts under the /perl/ location are handled by the Apache::Registry handler, and that the PerlSendHeader directive is set to On.

    Copy and paste the script below (there is no need for the shebang line!). Let's say you named it test.pl and saved it at the root of the CGI scripts directory, and that this directory is mapped directly to the /perl location of your server.

      print "Content-type: text/plain\n\n";
      print "Server's environment\n";
      foreach ( keys %ENV ) {
          print "$_\t$ENV{$_}\n";
      }
    

    Make it readable and executable by the server:

      % chmod a+rx test.pl
    

    (You will want to tighten the permissions on a public host.)

    Now fetch the URL http://www.nowhere.com:8080/perl/test.pl (replace 8080 with the port your mod_perl enabled server is listening on). You should see something like this (the output was trimmed):

      SERVER_SOFTWARE Apache/1.3.10-dev (Unix) mod_perl/1.21_01-dev
      GATEWAY_INTERFACE       CGI-Perl/1.1
      DOCUMENT_ROOT   /home/httpd/docs
      REMOTE_ADDR     127.0.0.1
      [more environment variables snipped]
      MOD_PERL        mod_perl/1.21_01-dev
      [more environment variables snipped]
    

    If you see that the value of GATEWAY_INTERFACE is CGI-Perl/1.1, everything is OK. If you see:

      GATEWAY_INTERFACE       CGI/1.1
    

    it means that you have configured this location to run under mod_cgi. Actually, the above script wouldn't run under mod_cgi at all, since a CGI script must have a shebang line such as #!/usr/bin/perl as its first line.

    Also note that the MOD_PERL environment variable is set when you run under a mod_perl handler, and its value is the mod_perl release you use (mod_perl/1.21_01-dev in the output above).

    Based on these differences you can write code like:

      BEGIN {
          # Auto-detect if we are running under mod_perl or CGI.
          # Use && here: "and" binds more loosely than "=", so the
          # pattern match would not be part of the assigned value.
          $USE_MOD_PERL = exists $ENV{'GATEWAY_INTERFACE'}
                          && $ENV{'GATEWAY_INTERFACE'} =~ /CGI-Perl/
                          ? 1 : 0;
          # perl5.004 is a must under mod_perl
          require 5.004 if $USE_MOD_PERL;
      }
    

    Another approach is to test for $ENV{MOD_PERL}:

      BEGIN {
          # Auto-detect if we are running under mod_perl or CGI.
        $USE_MOD_PERL = exists $ENV{'MOD_PERL'}
                        ? 1 : 0;
        require 5.004 if $USE_MOD_PERL;
      }
    

    You might wonder why in the world you would need to know which handler you are running under. For example, you will want to use Apache::exit() and not CORE::exit() in your modules, but if you think that your script might be used in both environments (mod_cgi vs. mod_perl), you will have to override the exit() subroutine and make a runtime decision about which method to use. Note that if you run scripts under the Apache::Registry handler, it takes care of overriding the exit() call for you, so it's not an issue in that case. For reasons and implementations see: Terminating requests and processes, exit() function and the whole Writing Mod Perl scripts and Porting plain CGIs to it page.
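    A minimal sketch of such an override for a module shared by both environments, assuming $ENV{MOD_PERL} is the test you use (as in the second BEGIN block above). The name my_exit is hypothetical; Apache::exit() is the mod_perl function discussed above and is called only when running under mod_perl, where it is defined.

```perl
use strict;
use warnings;

# Hypothetical helper: end the script the right way for where it runs.
# Under mod_perl, CORE::exit() would terminate the whole Apache child
# process, while Apache::exit() ends just the current request.
sub my_exit {
    my $status = @_ ? shift : 0;
    if ($ENV{MOD_PERL}) {
        Apache::exit($status);    # only defined inside mod_perl
    }
    else {
        CORE::exit($status);      # plain CGI or command line
    }
}
```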

    [TOC]


    Testing via lwp-request

    Yet another one. Why do I show all these approaches? While here they serve a very simple purpose, they can be helpful in other situations.

    Assuming you have the libwww-perl (LWP) package installed (you will need it installed in order to pass mod_perl's make test anyway):

      % lwp-request -e -d http://www.nowhere.com
    

    This will show you all the headers. (The -d option disables printing the response content.)

      % lwp-request -e -d http://www.nowhere.com | egrep '^Server:'
    

    This will show the server's version only.

    Use http://www.nowhere.com:port_number if your server is listening on a port other than the default port 80.

    [TOC]


    General Notes

    [TOC]


    Should I rebuild mod_perl if I have upgraded my perl?

    Yes, you should. You have to rebuild the mod_perl enabled server, since it has a hard-coded @INC which points to the old perl, and it is probably linked to an old libperl library. You can try to modify @INC in the startup script (if you keep the old perl version around), but it is better to build a fresh one to save yourself a mess.

    [TOC]


    Perl installation requirements

    Make sure you have perl installed: the newer the stable version you have, the better (perl 5.004 at minimum!). If you don't have it, install it. Follow the instructions in the distribution's INSTALL file. During the configuration stage (while running ./Configure), make sure you answer YES to the question:

      Do you wish to use dynamic loading? [y]
    

    Answer y so that Perl module extensions can be loaded dynamically.
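    You can check an already installed perl for both requirements with the core Config module. In this sketch, usedl is the Config entry that records whether dynamic loading was enabled (it holds 'define' when it was), and the 5.004 minimum comes from the paragraph above. The helper perl_problems is hypothetical; it takes the config hash and version as arguments so it can be exercised with made-up values.

```perl
use strict;
use warnings;
use Config;

# Return a list of reasons this perl is unsuitable for building mod_perl.
sub perl_problems {
    my ($config, $version) = @_;
    my @problems;
    push @problems, "perl $version is older than 5.004"
        if $version < 5.004;
    push @problems, "dynamic loading is disabled (usedl is not 'define')"
        unless defined $config->{usedl} && $config->{usedl} eq 'define';
    return @problems;
}

my @problems = perl_problems(\%Config, $]);
print @problems ? map("$_\n", @problems) : "this perl looks fine for mod_perl\n";
```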

    [TOC]


    mod_auth_dbm nuances

    If you are a user of mod_auth_dbm or mod_auth_db, you may need to edit Perl's Config module. When Perl is configured it attempts to find libraries for ndbm, gdbm, db, etc., for the *DBM*_File modules. By default, these libraries are linked with Perl and remembered by the Config module. When mod_perl is configured with Apache, the ExtUtils::Embed module returns these libraries to be linked with httpd so Perl extensions will work under mod_perl. However, the order in which these libraries are stored in Config.pm may confuse mod_auth_db*. If mod_auth_db* does not work with mod_perl, take a look at this order with the following command:

     % perl -V:libs
    

    If -lgdbm or -ldb comes before -lndbm, for example:

     libs='-lnet -lnsl_s -lgdbm -lndbm -ldb -ldld -lm -lc -lndir -lcrypt';
    

    Edit Config.pm and move -lgdbm and -ldb to the end of the list. Here's how to find Config.pm:

     % perl -MConfig -e 'print "$Config{archlibexp}/Config.pm\n"'
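
    If you prefer not to shuffle the flags by hand, a few lines of Perl can compute the new order. This sketch only builds the reordered string (from the example above, or from $Config{libs} on your system); you still paste the result into Config.pm yourself. The helper name reorder_libs is hypothetical.

```perl
use strict;
use warnings;

# Move -lgdbm and -ldb to the end of a libs string, keeping the
# relative order of all the other flags.
sub reorder_libs {
    my ($libs) = @_;
    my @libs  = split ' ', $libs;
    my %late  = map { $_ => 1 } qw(-lgdbm -ldb);
    my @early = grep { !$late{$_} } @libs;
    my @moved = grep {  $late{$_} } @libs;
    return join ' ', @early, @moved;
}

my $libs = '-lnet -lnsl_s -lgdbm -lndbm -ldb -ldld -lm -lc -lndir -lcrypt';
print reorder_libs($libs), "\n";
# prints: -lnet -lnsl_s -lndbm -ldld -lm -lc -lndir -lcrypt -lgdbm -ldb
```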
    

    Another solution for building Apache/mod_perl+mod_auth_dbm under Solaris is to remove the DBM and NDBM ``emulation'' from libgdbm.a. It seems Solaris already provides its own DBM and NDBM, and there's no reason to build GDBM with them (for us, anyway).

    In our Makefile for GDBM, we changed

      OBJS = $(DBM_OF) $(NDBM_OF) $(GDBM_OF)
    

    to

      OBJS = $(GDBM_OF)
    

    Rebuild libgdbm, then Apache/mod_perl.

    [TOC]


    Stripping apache to make it almost perl-server

    Since most of the functionality that various Apache mod_* modules provide is being implemented in Apache::{*} Perl modules, it was reported that one can build an Apache server with mod_perl only. If you can reduce the problems down to whatever mod_perl can handle, you can eliminate nearly every other module. Then basically you will have a Perl server, with C code to handle the tricky HTTP bits. The only module you will need to leave in is mod_actions.

    [TOC]


    Saving the config.status Files with mod_perl, php, ssl and Other Components

    Typically, when building the bloated Apache that sits behind Squid or whatever, you need mod_perl, php, mod_ssl and the rest. As you install each of them, they typically overwrite each other's config.status files. It's a good idea to save a copy after each step, so that you can reproduce and reuse the configuration later.

    [TOC]


    Should I Build mod_perl with gcc or cc?

    Since mod_perl includes C code, to make it binary compatible with Perl, on most systems the same compiler should be used as the one Perl was built with. So if your Perl was built with gcc, the same compiler will be picked when you run perl Makefile.PL .... To find out which compiler Perl was built with, run perl -V at the command prompt (perl -V:cc prints just the compiler setting).

    Sometimes Perl's configuration will choose one compiler, e.g. cc, but Apache's configuration chooses a different one, e.g. gcc. If you run into this problem, consult Perl's and Apache's INSTALL documents on how to ensure both are built with the same compiler.

    [TOC]


    OS Related Notes

    • Gary Shea <shea@xmission.com> discovered a nasty BSDI bug (seen in versions 2.1 and 3.0) related to dynamic loading and two workarounds:

      It turns out they use argv[0] to determine where to find the link tables at run-time, so if a program either changes argv[0] or does a chdir() (like Apache!), it can easily confuse the dynamic loader. The short-term solutions to the problem are pitifully simple. Either of the following will work:

      1) Call httpd with a full path, e.g. /opt/www/bin/httpd

      2) Put the httpd you wish to run in a directory in your PATH before any other directory containing a version of httpd, then call it as 'httpd' -- don't use a relative path!

    [TOC]


    The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
    Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me to improve this guide. If you have something to contribute please send it directly to me.

    Written by Stas Bekman.
    Last Modified at 12/18/1999
    Use of the Camel for Perl is
    a trademark of O'Reilly & Associates,
    and is used by permission.
    mod_perl guide: Introduction. Incentives. Credits.

    Introduction. Incentives. Credits.



    [TOC]


    What is mod_perl

    The Apache/Perl integration project brings together the full power of the Perl programming language and the Apache HTTP server. With mod_perl it is possible to write Apache modules entirely in Perl, letting you easily do things that are more difficult or impossible in regular CGI programs, such as running sub requests. In addition, the persistent Perl interpreter embedded in the server saves the overhead of starting an external interpreter, i.e. the penalty of Perl start-up time. Not least important is code caching: modules and scripts are loaded and compiled only once, and for the rest of the server's life they are served from the cache. The server therefore spends its time only on running already loaded and compiled code, which is very fast.

    The primary advantages of mod_perl are power and speed. You have full access to the inner workings of the web server and can intervene at any stage of request processing. This allows for customized processing of (to name just a few of the phases) URI->filename translation, authentication, response generation, and logging. There is very little run-time overhead. In particular, it is not necessary to start a separate process, as is often done with web-server extensions. The most widespread such extension, the Common Gateway Interface (CGI), can be replaced entirely with Perl code that handles the response generation phase of request processing. mod_perl includes two general-purpose modules for this purpose: Apache::Registry, which can transparently run existing Perl CGI scripts, and Apache::PerlRun, which does a similar job but allows you to run ``dirtier'' (to some extent) scripts.

    You can configure your httpd server and handlers in Perl (using PerlSetVar, and <Perl> sections). You can even define your own configuration directives.

    Many people wonder and ask ``How much of a performance improvement does mod_perl give?'' Well, it all depends on what you are doing with mod_perl and possibly who you ask. Developers report speed boosts from 200% to 2000%. The best way to measure is to try it and see for yourself! (See http://perl.apache.org/tidbits.html and http://perl.apache.org/stories/ for the facts.)

    [TOC]


    mod_cgi

    When you run your CGI scripts by using a configuration of:

      ScriptAlias /cgi-bin/ /home/httpd/cgi-bin/
    

    you run them under the mod_cgi handler, even though you never define it explicitly. Apache does all the configuration work behind the scenes when you use a ScriptAlias.

    By the way, don't confuse it with the ExecCGI configuration option, which is enabled so that the script will be executed rather than returned as a plain file. For example, for mod_perl and Apache::Registry you would use a configuration like:

      <Location /perl>
        SetHandler perl-script
        PerlHandler Apache::Registry
        Options ExecCGI
        PerlSendHeader On
      </Location>
    

    [TOC]


    C API

    META: complete

    [TOC]


    Perl API

    META: complete

    [TOC]


    Apache::Registry

    From the viewpoint of the Perl API, Apache::Registry is just another handler that's not conceptually different from any other handler. It reads in the file, compiles and executes it, and stores it in the cache. Since the Perl interpreter keeps running from the child process's creation to its death, any code compiled by the interpreter is not removed from memory until the child dies.

    To avoid collisions between script names, it prepends Apache::ROOT:: and the mangled path of the URI to the key of the cached script. This key is actually the name of the package the script resides in. So if you request the script /perl/project/test.pl, the script will be wrapped in code that starts with the package declaration:

      package Apache::ROOT::perl::project::test_2epl;
    

    Apache::Registry also stores the script's last modification time. Every time the script changes, the cached code is discarded and recompiled using the modified source. However, it doesn't check any of the Perl libraries the script might use.

    Apache::Registry overrides CORE::exit() with Apache::exit(), so CGI scripts that use exit() will run correctly. We will talk about all these details in depth later.

    The last thing Apache::Registry does is emulate mod_cgi's environment variables, like $ENV{SERVER_NAME}, $ENV{REMOTE_USER} and so on. PerlSetupEnv Off disables this feature and saves some memory and CPU cycles.

    From the viewpoint of the programmer, there is almost no difference between running a script as a plain CGI under mod_cgi and running it under mod_perl. There is, however, a great speed improvement, but at the expense of much heavier memory usage (there is no free lunch :).

    When they run under mod_cgi, your CGI scripts are loaded each time they are called and then they exit. Under mod_perl they are loaded once and cached. This gives a big performance boost. But because the code is cached and doesn't exit, it won't clean up memory as it would under mod_cgi. For example, global variables keep their values from one request to the next within the same child process. This can have unexpected effects.

    Your scripts will be recompiled and reloaded by mod_perl when it detects that you have changed them, but remember that any libraries that your scripts might require() or use() will not be recompiled when they are changed. You will have to take action yourself to ensure that they are recompiled.

    Of course, the guide will address all these issues in depth.

    Let's see what happens to your script when it's executed under Apache::Registry. Take the simplest code (URI /perl/project/test.pl):

      print "Content-type: text/html\n\n";
      print "It works\n";
    

    Apache::Registry will convert it into the following:

      package Apache::ROOT::perl::project::test_2epl;
      use Apache qw(exit);
      sub handler {
        print "Content-type: text/html\n\n";
        print "It works\n";
      }
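    The package name comes from the URI. The sketch below approximates that mangling; it is modeled on the Apache::Registry source rather than on any documented interface, so treat the details as assumptions: every character other than a letter, digit, underscore or slash becomes _ plus its two-digit hex code (so . becomes _2e), each run of slashes becomes ::, and a digit right after a slash is hex-encoded too, since a package name component cannot start with a digit. The name registry_package is hypothetical.

```perl
use strict;
use warnings;

# Approximate the URI -> package name mangling used by Apache::Registry.
sub registry_package {
    my ($uri) = @_;
    my $name = $uri;
    # Encode characters that cannot appear in a Perl identifier.
    $name =~ s/([^A-Za-z0-9_\/])/sprintf("_%2x", ord $1)/eg;
    # Turn slashes into ::, encoding a leading digit of each component.
    $name =~ s{(/+)(\d?)}{'::' . (length $2 ? sprintf("_%2x", ord $2) : '')}eg;
    return "Apache::ROOT$name";
}

print registry_package('/perl/project/test.pl'), "\n";
# prints Apache::ROOT::perl::project::test_2epl
```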
    

    META: Complete
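    The caching side effect described above (code that stays loaded and does not clean up after itself) can be simulated in plain Perl. In this sketch the package name stands for a hypothetical /perl/counter.pl script, and the loop stands in for three requests served by the same child process. Under mod_cgi every request would start a new process and always see counter is 1.

```perl
use strict;
use warnings;

# Stand-in for a script wrapped by Apache::Registry; the package name
# corresponds to a hypothetical /perl/counter.pl.
package Apache::ROOT::perl::counter_2epl;

our $counter;    # package global: nothing resets it between calls

sub handler {
    $counter++;
    return "counter is $counter\n";
}

package main;

# Three calls stand in for three requests served by one child process.
my @responses;
push @responses, Apache::ROOT::perl::counter_2epl::handler() for 1 .. 3;
print @responses;
```

    The fix is usually to initialize such variables explicitly at the start of every request.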

    [TOC]


    Apache::PerlRun

    META: Complete

    [TOC]


    What will you learn

    This document was written in an effort to help you start using Apache's mod_perl extension as quickly and easily as possible. It includes information about installation and configuration of Perl and the Apache web server and delves deeply into issues of writing and porting existing Perl scripts to run under mod_perl. Note that it does not attempt to enter the big world of using the Perl API or C API. You will find pointers covering these topics in the Getting Help and Further Learning section of this document. This guide tries to cover most aspects of the Apache::Registry and Apache::PerlRun modules. Along with mod_perl related topics, there are many more issues related to administering Apache servers, debugging scripts, using databases, Perl reference, code snippets and more. The Guide's Overview will help you to find your way through the guide.

    It is assumed that you know at least the basics of building and installing Perl and Apache. (If you do not, just read the INSTALL documents which are part of the distribution of each package.) However, in this guide you will find specific Perl and Apache installation and configuration notes, which will help you successfully complete the mod_perl installation and get the server running in a short time.

    If after reading this guide and other documents listed in Getting Help and Further Learning you feel that your question is still not answered, please ask the apache/mod_perl mailing list to help you. But first try to browse the mailing list archive (located at http://forum.swarthmore.edu/epigone/modperl ). Often you will find the answer to your question by searching the mailing list archive, since there is a good chance someone else has already encountered the problem and found a solution. If you ignore this advice, do not be surprised if your question goes unanswered: it bores people to answer the same question more than once (twice?). This does not mean that you should avoid asking questions; just do not abuse the available help, and RTFM before you call for HELP. (You have certainly heard the infamous fable of the shepherd boy and the wolves...)

    If you find incorrect details or mistakes in my grammar, or you want to contribute to this document please feel free to send me an email at sbekman@iname.com .

    [TOC]


    High-Profile Sites Running mod_perl

    A report prepared by Rex Staples at Thu, 14 Oct 1999:

    [TOC]


    References and Acknowledgments

    I have used the following references while writing this guide:

    As I said, I have quoted many snippets of information from FAQs and emails, and I did not credit people after each quote in the guide. I did not mean to take the credit for myself; I tried to keep track of names but got lost, so I preferred not to put credits throughout the guide, but rather to centralize them here. If you want your name to show up under your original quote, please tell me and I'll add it for you.

    Major contributors:

    • Doug MacEachern. A large part of this guide is built upon his email replies to users' questions.

    • Frank Cringle. Parts of his mod_perl FAQ have been used in this guide.

    • Vivek Khera. For his mod_perl performance tuning guide.

    • Steve Reppucci, who made a thorough review of the stuff I wrote. He fixed lots of spelling and grammar errors, and made the guide readable to English speakers :)

    • Eric Cholet, who wrote complete sections for the guide, and pointed out technical errors in it.

    • Ken Williams, who reviewed a lot of stuff in the guide. Many snippets from his emails are included in the guide.

    • Wesley Darlington for contributing a big section for the scenario chapter.

    • Geoffrey S Young and David Harris for contributing big sections about mod_perl and RPM packages.

    • Andreas J. Koenig for contributing his ``Correct HTTP headers'' document.

    • Ged W. Haywood for reviewing and fixing a big part of the guide, providing lots of constructive criticism and helping to reorganize the guide to make it more user friendly.

    • Jeffrey W. Baker for his ``guide to mod_perl database performance''.

    • Richard A. Wells for reviewing and correcting a large part of the guide.

    • Randy Harmon for rewriting the mod_perl advocacy chapter

    • Dean Fitz for reviewing the ``Operating System and Hardware Demands'' chapter.

    Credits of course go to ( alphabetically sorted ):

    I want to thank all the people who donated their time and efforts to make this amazing idea of mod_perl a reality. This includes Doug MacEachern, the author of mod_perl, and all the developers who contributed bug patches, modules and help. And of course the numerous unseen users around the world who help to promote mod_perl and to make it a better tool.

    [TOC]



    Written by Stas Bekman.
    Last Modified at 12/18/1999
    mod_perl guide: Apache::* modules

    Apache::* modules



    [TOC]


    Apache::Session - Maintain session state across HTTP requests

    This module provides the Apache/mod_perl user a mechanism for storing persistent user data in a global hash, which is independent of its real storage mechanism. Currently you can choose from these storage mechanisms: Apache::Session::DBI, Apache::Session::Win32, Apache::Session::File, Apache::Session::IPC. Read the man page of the mechanism you want to use for a complete reference.

    What Apache::Session does is provide persistence to a data structure. The data structure has an ID number, and you can retrieve it by using the ID number. In the case of Apache, you would store the ID number in a cookie or the URL to associate it with one browser, but the method of dealing with the ID is completely up to you. The flow of things is generally:

      Tie a session to Apache::Session.
      Get the ID number.
      Store the ID number in a cookie.
      End of Request 1.
    

      (time passes)
    

      Get the cookie.
      Restore your hash using the ID number in the cookie.
      Use whatever data you put in the hash.
      End of Request 2.
    

    Using Apache::Session is easy: simply tie a hash to the session object, stick any data structure into the hash, and the data you put in automatically persists until the next invocation. Here is a quick example which uses cookies to track the user's session.

      # Pull in the required packages
      use strict;
      use Apache;
      use Apache::Session::DBI;
      
      # Database credentials (placeholders: set these for your setup)
      my ($db_user, $db_pass) = ('username', 'password');
      
      # Read in the session ID if this is an old session; $id stays
      # undef if the browser sent no SESSION_ID cookie
      my $r = Apache->request;
      my ($id) = ($r->header_in('Cookie') || '') =~ /SESSION_ID=(\w+)/;
      
      # Create a session object based on the cookie we got from the
      # browser, or a new session if we got no cookie
      my %session;
      tie %session, 'Apache::Session::DBI', $id,
          {DataSource => 'dbi:mysql:sessions',
           UserName   => $db_user,
           Password   => $db_pass
          };
      
      # Might be a new session, so let's give them their cookie back
      my $session_cookie = "SESSION_ID=$session{_session_id};";
      $r->header_out("Set-Cookie" => $session_cookie);
    

    After setting this up, you can stick anything you want into %session (except file handles), and it will still be there when the user invokes the next page.
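    To see the tie-based flow in isolation, here is a toy stand-in for Apache::Session that keeps sessions in a Perl hash instead of a database. Local::ToySession is hypothetical and much simpler than the real module (for instance, it silently starts a new session for an unknown ID); the two tie/untie pairs play the roles of Request 1 and Request 2 in the flow described earlier. In a real server an in-memory store would not be shared between child processes, which is why Apache::Session uses databases, files and the like.

```perl
use strict;
use warnings;

package Local::ToySession;    # hypothetical toy, not Apache::Session
use Tie::Hash;
our @ISA = ('Tie::StdHash');  # Tie::StdHash keeps the data in the object
my %STORE;                    # session id => saved copy of the data

sub TIEHASH {
    my ($class, $id) = @_;
    # No (or an unknown) id: start a fresh session with a random id.
    $id = sprintf('%08x', int rand 0xffffffff)
        unless defined $id && exists $STORE{$id};
    my %data = (%{ $STORE{$id} || {} }, _session_id => $id);
    return bless \%data, $class;
}

# Runs when the tied object is released (after untie): save a copy.
sub DESTROY {
    my $self = shift;
    $STORE{ $self->{_session_id} } = { %$self };
}

package main;

# Request 1: no cookie yet, so a new session is created.
my %session;
tie %session, 'Local::ToySession', undef;
my $id = $session{_session_id};    # this would go into the cookie
$session{cart} = ['book', 'pen'];
untie %session;                    # end of request 1: data is saved

# Request 2: the browser sent the id back in its cookie.
my %again;
tie %again, 'Local::ToySession', $id;
my @items = @{ $again{cart} };
print "session $id holds: @items\n";
untie %again;
```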

    It is possible to write an Apache authen handler using Apache::Session. You can put your authentication token into the session. When a user invokes a page, you open their session, check to see if they have a valid token, and approve or deny their authorization based on that.

    As for IIS, let's compare. IIS's sessions are only valid on the same web server as the one that issued the session. Apache::Session's session objects can be shared amongst a farm of many machines running different operating systems, including even Win32. IIS stores session information in RAM. Apache::Session stores sessions in databases, file systems, or RAM. IIS's sessions are only good for storing scalars or arrays. Apache::Session's sessions allow you to store arbitrarily complex objects. IIS sets up the session and automatically tracks it for you. With Apache::Session, you set up and track the session yourself. IIS is proprietary. Apache::Session is open-source. Apache::Session::DBI can issue 400+ session requests per second on a light Celeron 300A running Linux. IIS?

    An alternative to Apache::Session is Apache::ASP, which has session tracking abilities. HTML::Embperl hooks into Apache::Session for you.

    [TOC]


    Apache::DBI - Initiate a persistent database connection

    See mod_perl and relational Databases

    [TOC]


    Apache::Request (libapreq) - Generic Apache Request Library

    This package contains modules for manipulating client request data via the Apache API with Perl and C. Functionality includes:

    - parsing of application/x-www-form-urlencoded data

    - parsing of multipart/form-data

    - parsing of HTTP Cookies

    The Perl modules are simply a thin XS layer on top of libapreq, making them a lighter and faster alternative to CGI.pm and CGI::Cookie. See the Apache::Request and Apache::Cookie documentation for more details and eg/perl/ for examples.

    Apache::Request and libapreq are tightly bound to the Apache API, which is not accessible from a process running under mod_cgi.

    (Apache::Request)
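    To give an idea of what the first item in the list above involves, here is a plain-Perl sketch of decoding application/x-www-form-urlencoded data. It is not libapreq's code and handles only the basics (no multipart data, no character-set handling); the name parse_urlencoded is hypothetical. For real work use Apache::Request under mod_perl, or CGI.pm elsewhere.

```perl
use strict;
use warnings;

# Decode application/x-www-form-urlencoded data into a hash of array
# refs, since a field may appear more than once (e.g. checkboxes).
sub parse_urlencoded {
    my ($data) = @_;
    my %params;
    for my $pair (split /[&;]/, $data) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        for ($name, $value) {
            next unless defined;
            tr/+/ /;                              # '+' means a space
            s/%([0-9A-Fa-f]{2})/chr hex $1/eg;    # %XX escapes
        }
        push @{ $params{$name} }, defined $value ? $value : '';
    }
    return \%params;
}

my $params = parse_urlencoded('name=Stas+Bekman&lang=perl&lang=c%2B%2B');
print "$_: @{ $params->{$_} }\n" for sort keys %$params;
```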

    [TOC]


    Apache::PerlRun - Run unaltered CGI scripts under mod_perl

    See Apache::PerlRun - a closer look.

    [TOC]


    Apache::GzipChain - compress HTML (or anything) in the OutputChain

    Have you ever served a huge HTML file (e.g. a file bloated with JavaScript code) and wondered how you could send it compressed, thus dramatically cutting down the download time? After all, Java applets can be compressed into a jar and benefit from faster download times. Why can't we do the same with plain ASCII (HTML, JS, etc.)? Plain text often compresses very well, sometimes by a factor of 10.

    Apache::GzipChain comes to help you with this task. If a client (browser) understands gzip encoding, this module compresses the output and sends it downstream. The client decompresses the data on receipt and renders the HTML as if it were a plain HTML fetch.

    For example to compress all html files on the fly, do:

      <Files *.html>
        SetHandler perl-script
        PerlHandler Apache::OutputChain Apache::GzipChain Apache::PassFile
      </Files>
    

    Remember that it will work only if the browser claims to accept compressed input through the Accept-Encoding header. Apache::GzipChain also keeps a list of user agents known to accept compressed output, so it looks at the User-Agent header as well.

    For example, if you want to return compressed files that should also pass through the Embperl module, you would write:

      <Location /test>
        SetHandler perl-script
        PerlHandler Apache::OutputChain Apache::GzipChain Apache::EmbperlChain Apache::PassFile
      </Location>
    

    Hint: watch the access_log file to see how many bytes were actually sent, and compare this with the number of bytes sent by a regular configuration.

    (See perldoc Apache::GzipChain).

    Notice that the rightmost PerlHandler must be a content producer. Use Apache::PassFile or another similar module.
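    The two ingredients, checking Accept-Encoding and gzip-compressing the body, can be tried outside Apache with the Compress::Zlib module. This sketch is not Apache::GzipChain's code: accepts_gzip is a hypothetical helper that ignores q-values (so it would wrongly accept gzip;q=0), and the sample page is made up and highly repetitive, so the size reduction you see with real pages will differ.

```perl
use strict;
use warnings;
use Compress::Zlib ();

# True if an Accept-Encoding header value lists gzip or x-gzip.
# Simplification: q-values such as "gzip;q=0" are ignored.
sub accepts_gzip {
    my ($accept_encoding) = @_;
    return defined $accept_encoding
        && $accept_encoding =~ /(?:^|[\s,])(?:x-)?gzip\b/i;
}

# A made-up, highly repetitive page compresses very well.
my $html = "<html><body>\n"
         . ("<p>Hello from mod_perl!</p>\n" x 200)
         . "</body></html>\n";
my $gzipped = Compress::Zlib::memGzip($html);

print "client accepts gzip\n" if accepts_gzip('gzip, deflate');
printf "%d bytes -> %d bytes\n", length $html, length $gzipped;
```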

    [TOC]


    Apache::PerlVINC - set a different @INC perl-location

    With that module, you can configure @INC and have modules reloaded for a given Location. Say two versions of Apache::Status are being hacked on in the same server: this fixup handler will simply delete $INC{ $filename }, unshift the preferred PerlINC path into @INC, and reload the file with require():

      PerlModule Apache::PerlVINC
    

      <Location /dougm-status>
        SetHandler perl-script
        PerlHandler Apache::Status
      
        PerlINC /home/dougm/dev/modperl/lib
        PerlVersionINC On
        PerlFixupHandler Apache::PerlVINC
        PerlRequire Apache/Status.pm
      </Location>
    

      <Location /other-status>
        SetHandler perl-script
        PerlHandler Apache::Status
      
        PerlINC /home/other/current/modperl/lib
        PerlVersionINC On
        PerlFixupHandler Apache::PerlVINC
        PerlRequire Apache/Status.pm
      </Location>
    

    It's important to stress that changed @INC is effective only inside the <Location> or a similar configuration directive. Apache::PerlVINC subclasses the PerlRequire directive, marking the file to be reloaded by the fixup handler, using the value of PerlINC for @INC. That's local to the fixup handler, so you won't actually see @INC changed in your script.

    To address possible issues of namespace clashes during reload, the handler could call $r->child_terminate() so that the next child process to load the different versions will have a fresh namespace. (Not a good idea in a high-load environment, of course.)
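    The core trick described above (forget the file in %INC, put a preferred directory in front of @INC for the duration of one require, and load the file again) can be shown in plain Perl. This is a sketch of the idea, not Apache::PerlVINC's code; the module Local::Ver, its two versions, and the helpers write_version and reload_from are made up, and the modules are written to temporary directories so the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write a tiny module Local/Ver.pm under $dir whose version() returns $n.
sub write_version {
    my ($dir, $n) = @_;
    mkdir "$dir/Local" or die "mkdir $dir/Local: $!";
    open my $fh, '>', "$dir/Local/Ver.pm" or die "open: $!";
    print $fh "package Local::Ver;\nsub version { $n }\n1;\n";
    close $fh or die "close: $!";
}

# Reload $file (e.g. 'Local/Ver.pm') with $dir searched first.
sub reload_from {
    my ($dir, $file) = @_;
    delete $INC{$file};           # forget that it was loaded
    local @INC = ($dir, @INC);    # prefer $dir for this require only
    require $file;                # compiles the file again
}

my ($old, $new) = (tempdir(CLEANUP => 1), tempdir(CLEANUP => 1));
write_version($old, 1);
write_version($new, 2);

reload_from($old, 'Local/Ver.pm');
my $first = Local::Ver::version();
reload_from($new, 'Local/Ver.pm');
my $second = Local::Ver::version();
print "version $first, then version $second\n";
```

    Because @INC is changed with local(), the caller never sees the modified path, which matches the note above that @INC is changed only inside the fixup handler.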

    If it is still absent from CPAN get it at: http://perl.apache.org/~dougm/Apache-PerlVINC-0.01.tar.gz

    [TOC]


    Apache::RegistryBB -- Apache::Registry Bare-Bones

    It works just like Apache::Registry, but does not test the x bit, only compiles the file once, and does not chdir() into the script parent directory.

    Configuration:

      PerlModule Apache::RegistryBB
      <Location /perl>
        SetHandler perl-script
        PerlHandler Apache::RegistryBB->handler
      </Location>
    

    [TOC]


    Apache::LogSTDERR

    When Apache's builtin syslog support is used, the stderr stream is redirected to /dev/null. This means Perl warnings, any messages from die(), croak(), etc., will also end up in the black hole. The HookStderr directive will hook the stderr stream to a file of your choice, the default is shown in this example:

     PerlModule Apache::LogSTDERR
     HookStderr logs/stderr_log
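
    Conceptually, hooking stderr amounts to reopening the STDERR filehandle on a file. The following plain-Perl sketch of that idea is not Apache::LogSTDERR's implementation. It uses a temporary directory in place of logs/ so that it can run anywhere.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempdir);

# Hypothetical location; HookStderr's default is logs/stderr_log
my $dir = tempdir(CLEANUP => 1);
my $log = "$dir/stderr_log";

# Reopen STDERR in append mode so warn()/die() output lands in the file
open(STDERR, '>>', $log) or die "Cannot append to $log: $!";
STDERR->autoflush(1);

warn "this warning goes to $log\n";

# Read the file back to show where the warning went
open(my $in, '<', $log) or die "Cannot read $log: $!";
my $logged = do { local $/; <$in> };
close $in;
print "log file contains: $logged";
```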
    


    The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
    Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me to improve this guide. If you have something to contribute please send it directly to me.

    Written by Stas Bekman.
    Last Modified at 10/12/1999
    Use of the Camel for Perl is
    a trademark of O'Reilly & Associates,
    and is used by permission.

    mod_perl for ISPs. mod_perl and Virtual Hosts.


    ISPs providing mod_perl services - a fantasy or reality.

    You fell in love with mod_perl at first sight, the moment you installed it on your home box. But when you wanted to convert the CGI scripts running on your favorite ISP's machine to run under mod_perl, you discovered that your ISP has either never heard of such a beast or refuses to install it for you.

    You are an old sailor in the ISP business: you have seen it all, you know how many ISPs are out there, and you know that the sales margins are too low to keep you happy. You are looking for a new service that almost no one provides, to attract more clients and, hopefully, to win a bigger slice of the market than the ISP next door.

    If you are a user asking for a mod_perl service, or an ISP considering providing this service, this section should make things clear for both of you.

    An ISP has three choices:

    1. An ISP cannot afford to let users run scripts under mod_perl on the main server, since it will die very soon for one of many reasons: sloppy programming, a user testing a just-updated script that probably has syntax errors, and so on; there is no need to explain further if you are familiar with mod_perl's peculiarities. The only scripts that CAN BE ALLOWED are the ones written by the ISP and not modifiable by users (guest books, counters, etc., the same standard scripts ISPs have provided since they were born). So you have to say NO to this choice.

      Other things to think about are file permissions (any user who is allowed to write and run a CGI script can read, if not write, any other file accessible with the web server's permissions; this has nothing to do with mod_perl, and there are solutions for it, such as suEXEC and cgiwrap) and Apache::DBI connections (by hacking the Apache::DBI code, a user can pick a connection opened by someone else from the pool of cached connections).

    2. But hey, why can't I let each user run his own server? Then I can wash my hands of it and not care how dirty and sloppy the user's code is (assuming the user runs the server under his own username).

      This option is fine as long as you take the new system requirements into account. If you have even limited experience with mod_perl, you know that although mod_perl-enabled Apache servers free up your CPU and let you run scripts much, much faster, they have huge memory demands (5-20 times what plain Apache uses). The size depends on the code length, the sloppiness of the programmer, and possible memory leaks in the code, all multiplied by the number of children each server spawns. A very simple example: a server demanding 10MB of memory that spawns 10 children already raises your memory requirements by 100MB (the real requirement is actually smaller if your OS allows code sharing between processes and the programmer exploits this in her code). Now multiply this number by the number of users you intend to have and you get the total memory requirement. Since ISPs never say no, you had better use the opposite approach: think of the largest amount of memory you can afford, divide it by one user's requirement as shown in the example, and you will know how many mod_perl users you can afford :)

      But who am I to predict how much memory your users may use? A user's requirement for a single server may be very modest, but do you know how many servers he will run? After all, the user has full control over httpd.conf, and it has to be that way, since this is essential for a user running mod_perl.

      All this rambling about memory leads to a single question: can you prevent a user from using more than X memory? Or, a variation of the question: assuming you have as much memory as you want, can you charge the user for average memory usage?

      If the answer to either question is yes, you are all set, and your clients will praise your name for letting them run mod_perl! There are tools to restrict resource usage (see, for example, the man pages for ulimit(3), getrlimit(2), setrlimit(2) and sysconf(3)).

      <META> If you have experience with any restriction techniques, please share it with us. Thank you! </META>

      If you pick this choice, you have to provide your clients with:

      • Shutdown/startup scripts installed together with the rest of your daemon startup scripts (e.g. in the /etc/rc.d directory), so that when you reboot your machine the user's server is shut down correctly and comes back online as soon as your system does. Also make sure to start each server under the username the server belongs to, unless you are looking for big trouble.

      • Proxy services (in forward or httpd accelerator mode) for the user's virtual host. Since the user will have to run her server on an unprivileged port (>1024), you will have to forward all requests for user.given.virtual.hostname:80 (that is, user.given.virtual.hostname with no port, since 80 is the default) to your.machine.ip:port_assigned_to_user, and the user will have to code her scripts so that self-referencing URLs use user.given.virtual.hostname as their base.

        Letting a user run a mod_perl server immediately adds the requirement that the user be able to configure and restart her own server. But only root can bind to port 80 (or to any other port below 1024). That is why the user has to use port numbers above 1024.

      • Another problem you will have to solve is how to assign ports to users. Since a user can pick any port above 1024 to run his server on, you will have to impose some regulation here. A simple example will stress the importance of this problem: I am a malicious user, or just a rival of some fellow who runs his own server at your ISP. All I have to do is find out which port his server is listening on (e.g. with the help of netstat(8)) and configure my own server to listen on the same port. While his server is running I am unable to bind to that port, but imagine what will happen when you reboot your system and my startup script happens to run before my rival's! I get the port first, all requests are now directed to my server, and you can let your imagination run wild about the nasty things that might happen then. Of course the ugly things will be revealed pretty soon, but the damage has already been done.

    3. A much better, but costly, solution is co-location: let the user hook her own (or the ISP's) stand-alone machine into your network, and forget about this user. Of course, either the user or you will have to do all the system administration chores, and it will cost your client more money.

      All in all, who are the people seeking mod_perl support? Those who run serious projects or businesses and can afford a stand-alone box, thus gaining autonomy and keeping their ISP happy. For them, money is not an obstacle.
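
    The memory arithmetic from choice 2 above can be sketched as follows. The 10MB-per-child and 10-children figures come from that example; the 1024MB budget is a hypothetical value. The sketch ignores memory sharing, which, as noted above, lowers the real requirement.

```perl
use strict;
use warnings;

# Figures from the example in choice 2
my $mb_per_child = 10;   # memory demanded by one server process
my $children     = 10;   # children spawned by one user's server
my $mb_per_user  = $mb_per_child * $children;   # before any sharing

# Hypothetical budget: the RAM you can dedicate to users' servers
my $budget_mb = 1024;

# Divide what you can afford by one user's requirement, rounding down
my $max_users = int($budget_mb / $mb_per_user);

printf "Per-user requirement: %d MB\n", $mb_per_user;
printf "Users you can afford: %d\n", $max_users;
```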


    Virtual Hosts in the guide

    If you are about to use Virtual Hosts, you might want to read these sections:

    Perl Sections.

    Easing the chores of configuring the virtual hosts with mod_macro

    Is there a way to provide a different startup.pl file for each individual virtual host

    Is there a way to modify @INC on a per-virtual-host basis

    Sometimes the script from one virtual host calls a script with the same path from the second virtual host

    Written by Stas Bekman.
    Last Modified at 09/26/1999

    Things obvious to others, but not to you


    Coverage

    This document describes ``special'' traps you may encounter when running your plain CGIs under Apache::Registry and Apache::PerlRun.


    Where do the warnings/errors go?

    Your CGI does not work and you want to see what the problem is. The best idea is to check out any errors that the server may be reporting. Where can I find these errors?

    Generally, all errors are logged into an error_log file. The exact file location and name are defined in the httpd.conf file: look for the ErrorLog directive. My httpd.conf says:

      ErrorLog var/logs/error_log
    

    Hey, where is the beginning of the path? There is another Apache directive called ServerRoot. Whenever Apache sees a directive value with a relative path (e.g. my.txt) rather than an absolute one (e.g. /tmp/my.txt), it prepends the value of ServerRoot to it. I have:

      ServerRoot /usr/local/apache
    

    So I will look for the error_log file at /usr/local/apache/var/logs/error_log. Of course, you can also use an absolute path to define the file's location in the file system.
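
    The path rule above can be sketched with the core File::Spec module. resolve_server_path() is a hypothetical helper, not an Apache API; it only mimics the prefixing behaviour described above (Apache does this in its own C code) and assumes Unix-style paths.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: absolute paths are used as-is,
# relative ones are taken relative to ServerRoot
sub resolve_server_path {
    my ($server_root, $path) = @_;
    return $path if File::Spec->file_name_is_absolute($path);
    return File::Spec->catfile($server_root, $path);
}

my $error_log = resolve_server_path('/usr/local/apache', 'var/logs/error_log');
print "$error_log\n";   # /usr/local/apache/var/logs/error_log

my $absolute = resolve_server_path('/usr/local/apache', '/tmp/my.txt');
print "$absolute\n";    # /tmp/my.txt
```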

    <META>: is this 100% correct?

    But there are cases when errors don't go to the error_log file. For example, some errors are printed to the console (tty) from which you started httpd (unless you redirected httpd's stderr). This happens when the server hasn't opened the error_log file for writing yet.

    For example, if you have mistakenly entered a non-existent directory path in your ErrorLog directive, the error message will be printed on the controlling tty. Or, if an error happens when the server executes a PerlRequire or PerlModule directive, you might see the errors there too.

    You are probably wondering where all the errors go when you run the server in single-process mode (httpd -X). They go to the console. That is because in single-process mode there is no parent httpd process to perform all the logging. This includes all the status messages that generally show up in the error_log file.

    </META>


    Setting environment variables for scripts called from CGI.

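    When the command you pass to system() or a piped open() contains shell syntax, Perl runs it through /bin/sh. So when you want to set an environment variable just for a script you call from your CGI, you can use the shell's VAR=value prefix:

      open UTIL, "USER=stas script.pl |" or die "...: $!\n";
    

    or

      system "USER=stas script.pl";
    

    Note that there is no semicolon between the assignment and the command: VAR=value command puts VAR into the environment of that one command, whereas VAR=value ; command only sets a shell variable, which the command will not see unless VAR was already exported.

    This is useful, for example, if you need to invoke a script that uses CGI.pm from within a mod_perl script. Under mod_perl, GATEWAY_INTERFACE is set to CGI-Perl/1.1; resetting it tricks the Perl script into thinking it is a plain CGI that is not running under mod_perl:

      open(PUBLISH, "GATEWAY_INTERFACE=CGI/1.1 script.cgi "
                  . "'param1=value1&param2=value2' |") or die "...: $!\n";
    

    Make sure that the parameters you pass are shell-safe (all ``unsafe'' characters, such as the single quote, must be properly escaped).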

    However, you are forking to run a Perl script, so you have thrown the hard-won performance out the window. Whatever script.cgi does now should be moved into a module with a subroutine you can call directly from your script, to avoid the fork.
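
    If you cannot avoid running an external program, an alternative to building a shell command string is to set the variable in %ENV with local() and use the list form of a piped open() (Perl 5.8 or later on Unix). The list form runs the program directly, without /bin/sh, so no shell quoting is needed. In this sketch a Perl one-liner stands in for script.pl; it is an illustration, not part of the original setup.

```perl
use strict;
use warnings;

my $output;
{
    # local() restores the previous value (or absence) of $ENV{USER}
    # when this block exits
    local $ENV{USER} = 'stas';

    # List-form piped open: no shell is involved. $^X is the running perl.
    open(my $child, '-|', $^X, '-e', 'print $ENV{USER}')
        or die "Cannot start child: $!";
    $output = do { local $/; <$child> };
    close $child;
}
print "child saw USER=$output\n";
```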

    Written by Stas Bekman.
    Last Modified at 12/13/1999

    Performance. Benchmarks.


    Performance: An Overall picture

    Before we dive into performance issues, there is something very important to understand. It applies to any webserver, not only Apache. All our efforts are aimed at making the user's web browsing experience swift. Among the various web site usability factors, speed is one of the most crucial. What is the correct way to measure speed? Since the user is the one who interacts with the web site, speed is the time that passes from the moment the user follows a link or presses a submit button until the resulting page is rendered by her browser. If we trace a data packet from the moment it leaves the user's machine (request sent) until the reply arrives, we see that it passes through many entities on its way. It has to make its way through the network, passing many interconnection nodes; before it enters the target machine it might go through proxy (accelerator) servers; then it is served by your server; and finally it has to make the whole way back. A webserver is only one of the elements the packet sees on its way. You could work hard to fine-tune your webserver for the best performance, but a slow NIC (Network Interface Card) or a slow network connection from your server might defeat it all. That's why it's important to think big and to be aware of possible bottlenecks between the server and the web. Of course, there is nothing you can do if the user has a slow connection on her side.

    Moreover, you might tune your scripts and webserver to process incoming requests ultra-fast, so that you need only a small number of working servers, only to find that the server processes are busy waiting for slow clients to complete their downloads. You will see more examples of this in this chapter.

    My point is that a web service is like a car: if one of its parts or mechanisms is broken, the car will not drive smoothly, and it can even stop dead if pushed further without first being fixed.


    Analysis of SW and HW Requirements

    (META: Only partial analysis. Please submit more points. Many points are scattered around the document and should be gathered here, to represent the whole picture. It also should be merged with the above item!)

    You need to analyze all of the problem's dimensions. There are several things that need to be considered:

    * How long does it take to process each request?

    * How many requests can you process simultaneously?

    * How many simultaneous requests are you planning to get?

    The first one is probably the easiest to optimize. Follow the performance optimization tips in this guide and in other documentation, and let a professional Perl (mod_perl) programmer work through your code and improve it.

    The second one is a function of RAM. How much RAM is in each box, how many boxes do you have, and how much RAM does each mod_perl process take? Multiply the first two and divide by the third. Ask yourself whether switching to another, possibly just as inefficient, language would actually cost more than throwing another Ultra 2 into the rack. Also ask yourself whether switching to another language would even help. In some applications a huge chunk of memory is needed, e.g. to link in the Oracle runtime libraries, so you would pay this price even if you switched from Perl to C.
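
    The "multiply the first two and divide by the third" rule, as a small sketch with hypothetical figures. It ignores the RAM needed by the OS and other daemons, and any savings from shared memory.

```perl
use strict;
use warnings;

# Hypothetical figures; substitute your own measurements
my $ram_per_box_mb = 1024;   # RAM in each box
my $boxes          = 2;      # number of boxes
my $mb_per_process = 16;     # RAM taken by one mod_perl process

# Multiply the first two and divide by the third
my $max_processes = int($ram_per_box_mb * $boxes / $mb_per_process);
print "You can run about $max_processes mod_perl processes\n";   # 128
```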

    The last one is important. You need to have a realistic answer. Are you really expecting 8 million hits per day? What is the expected peak load, and what kind of response time do you need to guarantee? Remember that these numbers might change drastically as you apply code changes and your site becomes more popular. Remember also that when you get a very high hit rate, the requirements may grow not linearly but exponentially!
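
    To turn a daily hit count into the number of requests in progress at once, one option is to combine an assumed peak factor and an assumed response time using Little's law (requests in progress = arrival rate times the time each request spends in the system). In this sketch the peak factor and response time are hypothetical; the response time should include the time spent feeding slow clients.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

my $hits_per_day  = 8_000_000;   # the figure questioned above
my $peak_factor   = 3;           # hypothetical: peak rate vs. daily average
my $response_secs = 0.2;         # hypothetical: average time per request

my $avg_rate  = $hits_per_day / (24 * 60 * 60);   # requests per second
my $peak_rate = $avg_rate * $peak_factor;

# Little's law: requests in progress = arrival rate * time in system
my $concurrent = ceil($peak_rate * $response_secs);

printf "average: %.1f req/s, peak: %.1f req/s, busy servers at peak: %d\n",
    $avg_rate, $peak_rate, $concurrent;
```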


    Sharing Memory

    A very important point is the sharing of memory. If your OS supports this (and most sane systems do), you might save more memory by sharing it between child processes. This is only possible when you preload code at server startup. However, during a child process's life its memory pages become unshared, and there is no way to make Perl allocate memory so that (dynamic) variables land on different memory pages than constants; that's why the copy-on-write effect (explained in a moment) will hit almost at random. If you are preloading many modules, you might be able to balance the memory that stays shared against the time for an occasional fork by tuning MaxRequestsPerChild to a point where a child restarts before too much of its memory becomes unshared. The right MaxRequestsPerChild value is very specific to your scenario. You should do some measurements to see whether this really makes a difference and what a reasonable number might be. Each time a child reaches this upper limit and restarts, it releases its unshared copies, and the new child inherits pages that stay shared until it scribbles on them.

    It is very important to understand that your goal is not to set MaxRequestsPerChild to 10000. Having a child serve 300 requests on precompiled code is already a huge speedup, so whether it is 100 or 10000 does not really matter if the lower value saves you RAM through sharing. Do not forget that if you preload most of your code at server startup, forking a new child will be very, very fast, because it inherits most of the preloaded code and the Perl interpreter from the parent process. But then, as the child works, its memory pages (which aren't really its own yet; it uses the parent's pages) get dirty (originally inherited and shared variables get updated or modified) and copy-on-write happens, which reduces the number of shared memory pages and thus increases memory demands. Killing the child and spawning a new one gets you the pristine shared memory from the parent process again.

    The conclusion is that MaxRequestsPerChild should not be too big; otherwise you lose the benefits of memory sharing.

    See Choosing MaxRequestsPerChild for more about tuning the MaxRequestsPerChild parameter.


    How Shared My Memory Is

    You've probably noticed that the word shared comes up again and again in discussions of mod_perl. Indeed, shared memory might save you a lot of money, since with sharing in place you can run many more servers than without it. See the Formula and the numbers.

    How much shared memory do you have? You can find out either with the memory utilities that come with your system or with the GTop module:

      use GTop ();
      
      print "Shared memory of the current process: ",
        GTop->new->proc_mem($$)->share, "\n";
    

      print "Total shared memory: ",
        GTop->new->mem->shared, "\n";
    

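    When you watch the output of the top utility, don't confuse the RSS (or RES) column with the SHARE column: RES is the RESident memory, i.e. the size of the pages currently held in physical memory (swapped in), while SHARE is the part of that memory which can be shared with other processes.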
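
    If GTop is not available, on Linux you can read similar numbers directly: /proc/PID/statm lists the process's total size, resident size and shared size in pages (followed by text, lib, data and dt). A minimal Linux-only sketch, assuming a /proc filesystem; statm_kb() is a hypothetical helper and has nothing to do with GTop.

```perl
use strict;
use warnings;
use POSIX ();

# Linux-only: sizes of process $pid in KB, from /proc/<pid>/statm
sub statm_kb {
    my ($pid) = @_;
    open(my $fh, '<', "/proc/$pid/statm")
        or die "Cannot read /proc/$pid/statm: $!";
    # The first three fields are size, resident and shared, in pages
    my ($size, $resident, $shared) = split ' ', scalar <$fh>;
    close $fh;
    my $page_kb = POSIX::sysconf(POSIX::_SC_PAGESIZE()) / 1024;
    return (
        size     => $size     * $page_kb,
        resident => $resident * $page_kb,
        shared   => $shared   * $page_kb,
    );
}

my %mem = statm_kb($$);
printf "size=%dK resident=%dK shared=%dK\n", @mem{qw(size resident shared)};
```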


    Preload Perl modules at server startup

    Use the PerlRequire and PerlModule directives to load commonly used modules, such as CGI.pm and DBI, when the server is started. On most systems, server children will be able to share the code space used by these modules. Just add the following directives to httpd.conf:

      PerlModule CGI
      PerlModule DBI
    

    But an even better approach is to create a separate startup file (in which you write plain Perl) and put things like these in it:

      use DBI;
      use Carp;
      1;  # a file loaded with require() must return a true value
    

    Then require() this startup file with the PerlRequire directive in httpd.conf, placing it before the rest of the mod_perl configuration directives:

      PerlRequire /path/to/start-up.pl
    

    CGI.pm is a special case. Ordinarily CGI.pm autoloads most of its functions on an as-needed basis. This speeds up the loading time by deferring the compilation phase. However, if you are using mod_perl, FastCGI or another system that uses a persistent Perl interpreter, you will want to precompile the methods at initialization time. To accomplish this, call the package function compile() like this:

        use CGI ();
        CGI->compile(':all');
    

    The arguments to compile() are a list of method names or sets, and are identical to those accepted by the use() and import() operators. Note that in most cases you will want to replace ':all' with the tag names you actually use in your code, since generally only a subset of the subs is used.

    You can also preload the Registry scripts. See Preload Registry Scripts.


    Preload Perl modules - Real Numbers

    (META: while the numbers and conclusions are mostly correct, need to rewrite the whole benchmark section using the GTop library to report the shared memory which is very important and will improve the benchmarks)

    (META: Add memory size tests with the server compiled with EVERYTHING=1 and without it: does loading everything make a big change in the memory footprint? Probably the suggestion would be as follows: for a development server use EVERYTHING=1, while for production, if your server is pretty busy and/or low on memory and every bit counts, only the required parts should be built in. BTW, remember that Apache comes with many modules that are built by default, and you might not need those!)

    I have conducted a few tests to benchmark the memory usage when some modules are preloaded. The first set of tests checks the memory use when a library Perl module (only CGI.pm) is preloaded. The second set checks the compile() method of CGI.pm. The third test checks the benefit of preloading several library modules (to see whether more memory is saved) and also the effect of precompiling the Registry scripts with Apache::RegistryLoader.

    1. In the first test, the following script was used:

      use strict;
      use CGI ();
      my $q = new CGI;
      print $q->header;
      print $q->start_html,$q->p("Hello");
    

    Server restarted

    Before the CGI.pm preload: (No other modules preloaded)

      USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
      root      87004  0.0  0.0 1060 1524      - A    16:51:14  0:00 httpd
      httpd    240864  0.0  0.0 1304 1784      - A    16:51:13  0:00 httpd
    

    After running a script which uses CGI's methods (no imports):

      USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
      root     188068  0.0  0.0 1052 1524      - A    17:04:16  0:00 httpd
      httpd     86952  0.0  1.0 2520 3052      - A    17:04:16  0:00 httpd
    

    Observation: child httpd has grown by 1268K

    Server restarted

    After the CGI.pm preload:

      USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
      root     240796  0.0  0.0 1456 1552      - A    16:55:30  0:00 httpd
      httpd     86944  0.0  0.0 1688 1800      - A    16:55:30  0:00 httpd
    

    After running a script which uses CGI's methods (no imports):

      USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
      root      86872  0.0  0.0 1448 1552      - A    17:02:56  0:00 httpd
      httpd    187996  0.0  1.0 2808 2968      - A    17:02:56  0:00 httpd
    

    Observation: child httpd has grown by 1168K, 100K less than without preload. Good!

    Server restarted

    After CGI.pm preloaded and compiled with CGI->compile(':all');

      USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
      root      86980  0.0  0.0 2836 1524      - A    17:05:27  0:00 httpd
      httpd    188104  0.0  0.0 3064 1768      - A    17:05:27  0:00 httpd
    

    After running a script which uses CGI's methods (no imports):

      USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
      root      86980  0.0  0.0 2828 1524      - A    17:05:27  0:00 httpd
      httpd    188104  0.0  1.0 4188 2940      - A    17:05:27  0:00 httpd
    

    Observation: child httpd has grown by 1172K. No change! So how does CGI->compile(':all') help? I think it's because we never use all of the methods CGI provides, so in real use it's faster. So you might want to compile only the tags you are going to use; then you will benefit for sure.

    2. I ran a second test to find out. The script:

      use strict;
      use CGI qw(:all);
      print header,start_html,p("Hello");
    

    Server restarted

    After CGI.pm was preloaded and NOT compiled with CGI->compile(':all'):

      USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
      root      17268  0.0  0.0 1456 1552      - A    18:02:49  0:00 httpd
      httpd     86904  0.0  0.0 1688 1800      - A    18:02:49  0:00 httpd
    

    After running a script which imports symbols (all of them):

      USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
      root      17268  0.0  0.0 1448 1552      - A    18:02:49  0:00 httpd
      httpd     86904  0.0  1.0 2952 3112      - A    18:02:49  0:00 httpd
    

    Observation: child httpd has grown by 1312K

    Server restarted

    After CGI.pm was preloaded and compiled with CGI->compile(':all'):

      USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
      root      86812  0.0  0.0 2836 1524      - A    17:59:52  0:00 httpd
      httpd     99104  0.0  0.0 3064 1768      - A    17:59:52  0:00 httpd
    

    After running a script which imports symbols (all of them):

      USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
      root      86812  0.0  0.0 2832 1436      - A    17:59:52  0:00 httpd
      httpd     99104  0.0  1.0 4884 3636      - A    17:59:52  0:00 httpd
    

    Observation: child httpd has grown by 1868K. Why? Isn't CGI->compile(':all') supposed to make the children share the compiled code with the parent? It does work as advertised, but note that the script calls only three of CGI.pm's methods: just saying use CGI qw(:all) doesn't compile all the available methods, it only imports their names. So this test is actually misleading. Run compile() only on the methods you actually use and you will see the difference.

    3. The third script:

      use strict;
      use CGI;
      use Data::Dumper;
      use Storable;
      # ... and many lines of code, lots of globals, so the code is huge!
    

    Server restarted

    Nothing preloaded at startup:

      USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
      root      90962  0.0  0.0 1060 1524      - A    17:16:45  0:00 httpd
      httpd     86870  0.0  0.0 1304 1784      - A    17:16:45  0:00 httpd
    

    Script using CGI (methods), Storable, Data::Dumper called:

      USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
      root      90962  0.0  0.0 1064 1436      - A    17:16:45  0:00 httpd
      httpd     86870  0.0  1.0 4024 4548      - A    17:16:45  0:00 httpd
    

    Observation: child httpd has grown by 2764K

    Server restarted

    Preloaded CGI (compiled), Storable, Data::Dumper at startup:

      USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
      root      26792  0.0  0.0 3120 1528      - A    17:19:21  0:00 httpd
      httpd     91052  0.0  0.0 3340 1764      - A    17:19:21  0:00 httpd
    

    Script using CGI (methods), Storable, Data::Dumper called

      USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
      root      26792  0.0  0.0 3124 1440      - A    17:19:21  0:00 httpd
      httpd     91052  0.0  1.0 6568 5040      - A    17:19:21  0:00 httpd
    

    Observation: child httpd has grown by 3276K. Ouch: 512K more!!!

    The reason is that when you preload all of the methods at startup, they are all precompiled; there are many of them and they take a big chunk of memory. If you don't use the compile() method, only the functions that are actually used will be compiled. Yes, this will slightly slow down the first response of each process, but the actual memory usage will be lower. BTW, if you write in the script:

      use CGI qw(:all);
    

    Only the symbols of all the functions are imported. While they take up some space, it is less than the space the compiled code of these functions would occupy.

    Server restarted

    All the above modules plus the above script precompiled with Apache::RegistryLoader at startup:

      USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
      root      43224  0.0  0.0 3256 1528      - A    17:23:12  0:00 httpd
      httpd     26844  0.0  0.0 3488 1776      - A    17:23:12  0:00 httpd
    

    Script using CGI (methods), Storable, Data::Dumper called:

      USER        PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
      root      43224  0.0  0.0 3252 1440      - A    17:23:12  0:00 httpd
      httpd     26844  0.0  1.0 6748 5092      - A    17:23:12  0:00 httpd
    

    Observation: child httpd has grown even more: 3316K! That doesn't look good!

    Summary:

    1. Library Perl Modules Preloading gave good results everywhere.

    2. CGI.pm's compile() method seems to use even more memory. That's because we never use all of the methods CGI provides. Run compile() only on the tags you are going to use, and you will save both the overhead of the first call to each not-yet-called method and memory, since the compiled code will be shared across all the children.

    3. Apache::RegistryLoader might make scripts load faster on the first request after the child has just started, but the memory usage is worse!!! See the numbers for yourself.

    HW/SW used: Apache 1.3.2 and mod_perl 1.16, running on AIX 4.1.5 on an RS6000 with 1GB of RAM.
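
    The numbers above were taken by comparing ps output before and after a request. A similar single-process check can be sketched in plain Perl on Linux, assuming a /proc filesystem. The sketch reads the resident size from /proc/self/statm, loads and exercises a module, and then reads it again. This is not how the numbers above were produced (they came from AIX). Data::Dumper is used only because it appears in the third test, and rss_kb() is a hypothetical helper.

```perl
use strict;
use warnings;
use POSIX ();

# Linux-only: resident set size of this process in KB
sub rss_kb {
    open(my $fh, '<', '/proc/self/statm')
        or die "Cannot read /proc/self/statm: $!";
    my (undef, $resident_pages) = split ' ', scalar <$fh>;
    close $fh;
    return $resident_pages * POSIX::sysconf(POSIX::_SC_PAGESIZE()) / 1024;
}

my $before = rss_kb();

# Load a module at "request time" and use it, as a non-preloaded child would
require Data::Dumper;
my $dump = Data::Dumper::Dumper([ map { +{ id => $_ } } 1 .. 5000 ]);

my $after = rss_kb();
printf "RSS before: %dK, after: %dK, grown by %dK\n",
    $before, $after, $after - $before;
```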


    Preload Registry Scripts

    Apache::RegistryLoader compiles Apache::Registry scripts at server startup. It can be a good idea to preload the scripts you are going to use as well, so that their code is shared among the children.

    Here is an example of the use of this technique. This code is included in a PerlRequire'd file, and walks the directory tree under which all registry scripts are installed. For each .pl file encountered, it calls the Apache::RegistryLoader::handler() method to preload the script in the parent server (before pre-forking the child processes):

      use File::Find 'finddepth';
      use Apache::RegistryLoader ();
      {
          my $perl_dir = "perl/";
          my $rl = Apache::RegistryLoader->new;
          finddepth(sub {
              return unless /\.pl$/;
              my $url = "/$File::Find::dir/$_";
              print "pre-loading $url\n";
      
              my $status = $rl->handler($url);
              unless($status == 200) {
                  warn "pre-load of `$url' failed, status=$status\n";
              }
          }, $perl_dir);
      }
    

    Note that we didn't use the second argument to handler() here, as the module's manpage suggests. To make the loader smarter about the URI-to-filename translation, you might need to provide a trans() function that translates the URI to a filename. URI-to-filename translation normally doesn't happen until HTTP request time, so the module is forced to roll its own translation. If the filename is omitted and a trans() routine was not defined, the loader will try using the URI relative to ServerRoot.
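    As a rough sketch of the trans() idea, the Perl snippet below maps a /perl/ URI onto a filename under a hypothetical /home/httpd/perl directory (both the URI prefix and the directory are assumptions; use whatever your Alias or Location really maps). The commented lines at the end show how such a routine would be handed to the loader, following the trans => ... form described in the Apache::RegistryLoader manpage; they only work inside a mod_perl startup file.

```perl
use strict;
use warnings;

# Hypothetical directory that the /perl/ URIs map to; adjust it to
# match your server configuration.
my $perl_root = '/home/httpd/perl';

# Translate a URI like /perl/news/news.pl into a filename, roughly what
# the server's URI translation phase would do at request time.
sub trans {
    my $uri = shift;
    (my $path = $uri) =~ s{^/perl/}{};
    return "$perl_root/$path";
}

print trans('/perl/news/news.pl'), "\n";

# In the PerlRequire'd startup file it would be used roughly like:
#   my $rl = Apache::RegistryLoader->new(trans => \&trans);
#   my $status = $rl->handler($url);   # filename comes from trans()
```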

    You have to check whether this makes any improvement for you, though. I did some testing [ Preload Perl modules - Real Numbers ], and it seems that it takes more memory than when the scripts are called from the child. This is only a first impression and needs better investigation. If you aren't concerned about the few script invocations that will take some time to respond while they load the code, you might not need it at all!

    See also BEGIN blocks


    Global vs Fully Qualified Variables

    It's always a good idea to stay away from global variables when possible. Some variables must be global so Perl can see them, such as a module's @ISA or $VERSION variables (or fully qualified @MyModule::ISA). In common practice, a combination of the strict and vars pragmas keeps modules clean and reduces a bit of noise. However, the vars pragma also creates aliases, as the Exporter does, and these eat up more memory. When possible, try to use fully qualified names instead of use vars. Example:

      package MyPackage;
      use strict;
      @MyPackage::ISA = qw(...);
      $MyPackage::VERSION = "1.00";
    

    vs.

      package MyPackage;
      use strict;
      use vars qw(@ISA $VERSION);
      @ISA = qw(...);
      $VERSION = "1.00";
    

    Also see Using global variables and sharing them


    Avoid Importing Functions

    When possible, avoid importing a module's functions into your namespace. The aliases that are created can take up quite a bit of space. Try to use method interfaces and fully qualified names like Package::function or $Package::variable instead. For benchmarks see Object Methods Calls Versus Function Calls.
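    For instance, the standard POSIX module exports a long list of functions by default. A minimal sketch of the alternative: load it with an empty import list, so POSIX->import is never called, and call what you need by its fully qualified name (POSIX is only a convenient example here).

```perl
use strict;
use warnings;

# The empty list () tells Perl not to call POSIX->import, so none of
# POSIX's many functions are aliased into the current package.
use POSIX ();

# Call the functions by their fully qualified names instead.
my $down = POSIX::floor(7.9);
my $up   = POSIX::ceil(7.1);
print "floor=$down ceil=$up\n";

# Nothing was imported: main::floor does not exist.
print defined(&main::floor) ? "floor imported\n" : "floor not imported\n";
```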

    Note: method calls are a little bit slower than function calls. You can use the Benchmark module to profile your specific code.
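    A minimal sketch of such a measurement with the core Benchmark module, comparing a method call against a plain fully qualified function call. The Counter class is hypothetical and exists only for the comparison; the absolute numbers will depend entirely on your machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# A tiny hypothetical class with one method and an equivalent function.
package Counter;
sub new    { my $class = shift; return bless { n => 0 }, $class }
sub inc    { my $self = shift; return ++$self->{n} }
sub inc_fn { my $self = shift; return ++$self->{n} }

package main;

my $obj = Counter->new;

# Run each variant for at least 1 CPU second; the 'none' style
# suppresses the per-test report, so only the comparison is printed.
my $results = timethese(-1, {
    method   => sub { $obj->inc },              # dynamic method dispatch
    function => sub { Counter::inc_fn($obj) },  # no method lookup at all
}, 'none');

cmpthese($results);
```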


    PerlSetupEnv Off

    PerlSetupEnv Off is another optimization you might consider.

    mod_perl fiddles with the environment to make it appear as if the script were being called under the CGI protocol. For example, the $ENV{QUERY_STRING} environment variable is initialized with the contents of Apache::args(), and $ENV{SERVER_NAME} is filled in from the value returned by Apache::server_hostname().

    But %ENV population is expensive. Those who have moved to the Perl Apache API no longer need this extra %ENV population and can gain by turning it Off.

    By default it is On.

    Note that you can still set environment variables, e.g. when you use the following configuration:

     <Location /perl>
       PerlSetupEnv Off
       PerlSetEnv TEST hi
       SetHandler perl-script
       PerlHandler Apache::RegistryNG->handler
       Options +ExecCGI
     </Location>
    

    A script with a print Data::Dumper::Dumper(\%ENV) line prints:

      $VAR1 = {
                'GATEWAY_INTERFACE' => 'CGI-Perl/1.1',
                'MOD_PERL' => 'mod_perl/1.21_01-dev',
                'PATH' => '/usr/lib/perl5/5.00503:... snipped ...',
                'TEST' => 'hi'
              };
    


    Proxying mod_perl server

    A proxy gives you a great performance increase in most cases. It's discussed in the Adding a Proxy Server in http Accelerator Mode section.


    Caching Components with HTML::Mason

    (META: complete the full description) HTML::Mason is a system that makes use of components to build final html pages.

    HTML::Mason can really improve the performance of your service and reduce the load on the system when most of the output is generated dynamically, but each final page can be separated into different components and those components cached.

    So if you have a page consisting of five components, each generated by an SQL query, and for four of them it's the same query for every user, you don't have to rerun those queries again and again. Only the fifth component, which is generated by a unique query every time, will not use the cache.


    KeepAlive

    If your mod_perl server's httpd.conf includes the following directives:

      KeepAlive On
      MaxKeepAliveRequests 100
      KeepAliveTimeout 15
    

    you are paying a real performance penalty, since after completing each request the process will wait for KeepAliveTimeout seconds before closing the connection, and thus won't serve other requests during that time. You will need many more processes on a server with high traffic.

    Most likely you don't want this feature enabled, so set it Off with:

      KeepAlive Off
    

    and the other two directives don't matter anymore.

    You might want to consider enabling this option if the client's browser needs to fetch more than one object from your server at once (for a single HTML page). In this situation you save the connection overhead for all requests but the first one.

    For example, if you have a page with 10 ad banners, which is not uncommon today, your server will work more effectively if a single process serves them all during a single connection. Your client will get a slightly slower response, since the banners will be fetched one at a time, rather than all together as they would be if each IMG tag opened a separate connection.

    There are definite advantages to keep-alive from a TCP perspective, since fresh connections incur not only the 3-way TCP handshake but are also penalised by slow-start. So while turning it off may help memory usage on the server, it will disadvantage the client from a network speed perspective.

    You have probably followed the advice of sending all the static object requests to a plain Apache server. Since most pages include more than one unique static image, you had better keep the default setting of the non-mod_perl server, which has the KeepAlive directive On. Reducing the number of timeout seconds a little is probably a good idea too.

    One option would be for the proxy/accelerator to keep the connection to the client open, but make individual connections to the server: read the response, buffer it for sending to the client and close the server connection (obviously making new connections to the server as the client's requests require).


    Upload/Download of Big Files

    If a particular script's main functionality is uploading or downloading big files, you probably want it to be executed on a plain Apache server under mod_cgi, provided of course that the script requires none of the functionality the mod_perl server provides, like custom authentication handlers.

    You don't want to tie up your precious mod_perl backend server children doing something as long and dumb as transferring a file.

    Also, the user won't really see any important performance benefits from mod_perl anyway, since the upload may take up to several minutes, and the overhead saved by mod_perl is typically under one second.


    Forking or Executing subprocesses from mod_perl

    Generally you should not fork from your mod_perl scripts, since when you do, you are forking the entire Apache web server, lock, stock and barrel. Not only is your Perl code being duplicated, but so are mod_ssl, mod_rewrite, mod_log, mod_proxy, mod_spelling or whatever modules you have used in your server, all the core routines and so on.

    A much wiser approach is to spawn a sub-process, hand it the information it needs to do the task, and have it detach (close x3 + setsid()). This is wise only if the parent that spawns this process continues immediately, without waiting for the sub-process to complete. This approach is suitable when you want to trigger a long-running process through the web interface, like processing some data or sending email to thousands of subscribed users. Otherwise, you should convert the code into a module and call its functions or methods from the CGI script.

    Just making a system() call defeats the whole idea behind mod_perl: the Perl interpreter and modules have to be loaded again for the external program to run.

    Basically, you would do:

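      use FreezeThaw ();
      my $params = FreezeThaw::freeze(
            [all data to pass to the other process]
            );
        # the list form of system() bypasses the shell, so the
        # frozen string arrives as one unmangled argument
      system("program.pl", $params);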
    

    and in program.pl :

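      use POSIX qw(setsid);
      use FreezeThaw ();
      my @params = FreezeThaw::thaw(shift @ARGV);
      # check that @params is ok
        # system() in the httpd waits until program.pl exits, so fork
        # and let the original process exit at once
      defined(my $pid = fork) or die "Cannot fork: $!";
      exit 0 if $pid;
      close STDIN;
      close STDOUT;
      close STDERR;
      # you might need to reopen the STDERR
      # open STDERR, ">/dev/null";
      setsid(); # to detach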
    

    At this point, program.pl is running in the ``background'' while the system() returns and permits apache to get on with life.

    This has obvious problems, not the least of which is that $params must not be bigger than whatever your architecture's limit on command-line length is (could depend on your shell).

    Also, the communication is only one way.

    However, you might be trying to do the ``wrong thing''. If what you want is to send information to the browser and then do some post-processing, look into PerlCleanupHandler.

    If you are interested in the lower-level details, this is what actually happens when you fork and make a system call, like

      system("echo Hi"),CORE::exit(0) unless fork();
    

    which might be more familiar in this form:

      if (fork){
        #do nothing
      } else {
        system("echo Hi");
        CORE::exit(0);
      }
    

    What happens is that fork() gives you 2 execution paths: the child gets virtual memory sharing a copy of the program text (read only) and sharing a copy of the data space copy-on-write (remember why you pre-load modules in mod_perl?). In the above code the parent will immediately continue with the code that follows the fork, while the forked process will execute system("echo Hi") and then terminate itself.

    Notice that I use CORE::exit and not exit, which would be automatically overridden by Apache::exit if used in conjunction with Apache::Registry and friends.

    The only work the fork itself involves is setting up the page tables for the virtual memory; then the second process goes its separate way.

    Next, Perl will find /bin/echo along the search path, and invoke it directly. Perl system() is *not* system(3) [C-library]. Only when the command has shell meta-chars does Perl invoke a real shell. That's a *very* nice optimization.
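    The rule can be observed with backticks (qx//), which should follow the same shell-or-not logic as system() and have the advantage of returning the output so it can be inspected. A minimal sketch, assuming a Unix-like system with an echo program in the PATH:

```perl
use strict;
use warnings;

# No shell metacharacters: Perl splits the string into words and
# runs echo directly, without starting /bin/sh.
my $direct = qx(echo Hi);

# ';' is a shell metacharacter, so the whole string is handed to
# /bin/sh -c, which runs two separate commands.
my $via_shell = qx(echo one; echo two);

# The list form (here with a piped open) never involves a shell:
# the ';' reaches echo as a literal argument.
open my $fh, '-|', 'echo', 'one;', 'echo', 'two' or die "Cannot run echo: $!";
my $literal = do { local $/; <$fh> };
close $fh;

print $direct, $via_shell, $literal;
```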

    Only if you do:

      system "sh -c 'echo foo'"
    

    does a shell actually parse your command: you exec() a copy of /bin/sh, but since one is almost certainly already running somewhere, the system will notice that (via the disk inode reference) and replace your virtual memory page table with one pointing at the already-loaded program code plus your own data space. Then the shell parses the passed command.

    Since it is echo, the shell will execute it as a built-in in the latter example, or /bin/echo is run directly in the former, and be done; but this is only an example. You aren't calling system("echo Hi") in your mod_perl scripts, right? Most real subprocesses (heavy programs executed as a subprocess) involve loading the specified command or script, which might require some actual demand paging from the program file if you execute new code.

    The only place you see real overhead from this scheme is when the parent process is huge (unfortunately, like mod_perl...) and the page table becomes large as a side effect. The whole point of mod_perl is to avoid having to fork() / exec() something on every hit, though; Perl can do just about anything by itself. However, you probably won't get into trouble until you hit about 30 forks/sec on a so-so Pentium.

    Now let's get to the gory details of forking. Normally, every process has a parent. Many processes are children of the init process, whose PID is 1. When you fork a process you must wait() or waitpid() for it to finish. If you don't wait for it, it becomes a zombie.

    A zombie is a process that has terminated but whose exit status has not yet been collected by its parent. When a child quits, it reports its termination to its parent. If no one wait()s to collect the exit status of the child, it becomes a ghost process that can be seen, but not killed. It will go away only when you stop the httpd process that spawned it! (Generally the top and ps utilities display these processes with a <defunct> tag, and you will see the zombie counter reported by top increase.) These zombie processes take up system resources (process table slots) and are generally undesirable.
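    The zombie state is easy to observe. A minimal sketch for a Unix-like system: the child exits at once, but until the parent calls waitpid() its PID still exists (kill 0 only tests for existence and sends no signal); after reaping, it is gone. The exit code 7 is arbitrary.

```perl
use strict;
use warnings;

$SIG{CHLD} = 'DEFAULT';          # make sure children are not auto-reaped

defined(my $kid = fork) or die "Cannot fork: $!\n";
CORE::exit(7) if $kid == 0;      # child: quit immediately

sleep 1;                         # give the child time to exit

# The child has terminated but its status was not collected yet,
# so it still occupies a process table slot: a zombie.
my $zombie_exists = kill 0, $kid;

# waitpid() collects the exit status and removes the entry.
my $reaped       = waitpid($kid, 0);
my $status       = $? >> 8;
my $exists_after = kill 0, $kid;

print "before waitpid: exists=$zombie_exists\n";
print "waitpid returned $reaped, exit status $status\n";
print "after waitpid: exists=$exists_after\n";
```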

    So the proper fork is:

      print "Content-type: text/plain\n\n";
      
      defined (my $kid = fork) or die "Cannot fork: $!\n";
      if ($kid) {
        waitpid($kid,0);
        print "Parent has finished\n";
      } else {
          # do something
          CORE::exit(0);
      }
    

    But in most cases the only reason you would want to fork is to spawn a process that will take a long time to complete. So if the server child that spawns this process has to wait for it to finish, you have gained nothing. You can neither wait for its completion (the server child would be tied up until then), nor continue without waiting (you would get yet another zombie process).

    The simplest solution is to ignore your dead children (this doesn't work everywhere, however) (META: do you know where? tell me!!! It does work with linux!):

      $SIG{CHLD} = 'IGNORE';
    

    When you set the CHLD signal handler to 'IGNORE', terminated children are reaped automatically by the system, which prevents them from becoming zombies.

    Note that you cannot localize this setting with local(); if you do, it won't have the desired effect. (META: anyone to explain why? It doesn't work...)

    The other thing you must do is close all the filehandles connected to the client socket that were opened by the parent process (STDIN and STDOUT) and inherited by the child, so that the parent will be able to complete the request and free itself to serve other requests. You may also need to close and reopen the STDERR filehandle (it's opened in append mode to the error_log file, as inherited from the parent, so chances are that you will want to leave it untouched).

    So now the code would look like:

      print "Content-type: text/plain\n\n";
      
      $SIG{CHLD} = 'IGNORE';
      
      defined (my $kid = fork) or die "Cannot fork: $!\n";
      if ($kid) {
          # no waitpid() here: with CHLD ignored it would block
          # until the long-lasting kid has finished
        print "Parent has finished\n";
      } else {
          close STDIN;
          close STDOUT;
          close STDERR;
          # do something long lasting
          CORE::exit(0);
      }
    

    Another more portable, but slightly more expensive solution is to use a double fork approach.

      print "Content-type: text/plain\n\n";
      
      defined (my $kid = fork) or die "Cannot fork: $!\n";
      if ($kid) {
        waitpid($kid,0);
      } else {
        defined (my $grandkid = fork) or die "Kid cannot fork: $!\n";
        if ($grandkid) {
          CORE::exit(0);
      
        } else {
          # code here
          close STDIN;
          close STDOUT;
          close STDERR;
          # do something long lasting
          CORE::exit(0);
        }
      }
    

    The grandkid becomes a "child of init" (its parent process ID is 1), because its own parent, the kid, has already exited.

    Note that the last two solutions don't allow you to know the exit status of the process, but in our case we don't want to know it.

    One more solution is to use a different SIGCHLD handler:

      use POSIX 'WNOHANG';
      $SIG{CHLD} = sub { while( waitpid(-1,WNOHANG)>0 ) {} };
    

    This is useful when you fork() more than one process. The handler could call wait() as well, but for a variety of reasons involving the handling of stopped processes and the rare event in which two children exit at nearly the same moment, the best technique is to call waitpid() in a tight loop with a first argument of -1 and a second argument of WNOHANG. Together these arguments tell waitpid() to reap the next child that's available, and prevent the call from blocking if there happens to be no child ready for reaping. The handler will loop until waitpid() returns a negative number or zero, indicating that no more reapable children remain.
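    A minimal, standalone sketch of that handler at work: the parent forks three children that exit immediately and carries on, while the handler reaps them as SIGCHLD arrives. The child count and the five-second safety deadline are arbitrary choices for the demo.

```perl
use strict;
use warnings;
use POSIX 'WNOHANG';

my $reaped = 0;

# Reap every child that has already exited, without ever blocking;
# several exits may be reported by a single SIGCHLD.
$SIG{CHLD} = sub {
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $reaped++;
    }
};

my $n = 3;
for (1 .. $n) {
    defined(my $pid = fork) or die "Cannot fork: $!\n";
    CORE::exit(0) if $pid == 0;      # child: finish immediately
}

# The parent is free to do other work; here it just naps, and each
# SIGCHLD interrupts the sleep so that the handler can run.
my $deadline = time() + 5;
sleep 1 while $reaped < $n && time() < $deadline;

print "reaped $reaped of $n children\n";
```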

    You will probably want to open your own log file in the spawned process and log some info so you know what has happened there, at least while debugging your code.
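    Putting these pieces together, here is a minimal sketch of a detached, double-forked worker that logs to its own file. spawn_detached() is a hypothetical helper, not part of mod_perl. Instead of just closing the standard filehandles it reopens them (STDIN from /dev/null, STDOUT and STDERR to the log), which keeps later open() calls from silently reusing those descriptors. It runs as a plain script; inside mod_perl you would still have to take care of the client connection as described above.

```perl
use strict;
use warnings;
use POSIX qw(setsid);
use File::Temp qw(tempdir);

# Hypothetical helper: run $code in a detached grandchild whose output
# goes to $logfile. The caller only waits for the short-lived middle
# process, never for $code itself.
sub spawn_detached {
    my ($code, $logfile) = @_;
    defined(my $kid = fork) or die "Cannot fork: $!\n";
    if ($kid) {
        waitpid($kid, 0);            # the kid exits almost immediately
        return;
    }
    # kid: fork the real worker, then quit so that init adopts it
    defined(my $grandkid = fork) or CORE::exit(1);
    CORE::exit(0) if $grandkid;

    # grandkid: new session, no ties to the parent's filehandles
    setsid();
    open STDIN,  '<',  '/dev/null' or CORE::exit(1);
    open STDOUT, '>>', $logfile    or CORE::exit(1);
    open STDERR, '>&', \*STDOUT    or CORE::exit(1);
    $| = 1;                          # unbuffered, so the log is current
    $code->();
    CORE::exit(0);
}

my $dir = tempdir(CLEANUP => 1);
my $log = "$dir/worker.log";

spawn_detached(sub {
    print "worker $$ started\n";
    sleep 1;                         # stands in for the long-running job
    print "worker done\n";
}, $log);

print "parent continues without waiting\n";
```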

    Also check Apache::SubProcess for better system() and exec() implementations for mod_perl (use CPAN!). META: some docs regarding this module?


    Memory leakage

    Scripts under mod_perl can very easily leak memory! Global variables stay around indefinitely; lexical variables (declared with my()) are destroyed when they go out of scope, provided there are no references to them from outside of that scope.

    Perl doesn't return the memory it acquired from the kernel. It does reuse it though!

    The first example demonstrates reading in a whole file:

      open IN, $file or die $!;
      local $/ = undef; # will read the whole file in
      $content = <IN>;
      close IN;
    

    If your file is 5Mb, the child that served that script will grow by roughly that size. Now if you have 20 children and all of them serve this CGI, they will consume an additional 20*5M = 100M of RAM! If that's the case, try other approaches to processing the file, if possible of course. Try to process a line at a time and print it back out. (If you need to modify the file itself, write to a temporary file and, when finished, overwrite the source file; make sure to provide a locking mechanism!)
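    A minimal sketch of that advice: edit_file_by_line() (a hypothetical name) reads and rewrites the file one line at a time through a temporary file in the same directory, then rename()s it over the original, holding an exclusive flock() on a separate lock file meanwhile. It assumes all writers cooperate by taking the same lock, since flock() is only advisory. Note that the rewritten file ends up with the temporary file's permissions (0600 by default).

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Apply $edit (which modifies $_ in place) to every line of $file while
# keeping only one line in memory at a time.
sub edit_file_by_line {
    my ($file, $edit) = @_;

    # serialize cooperating writers with an advisory lock on a side file
    open my $lock, '>>', "$file.lock" or die "Cannot open $file.lock: $!";
    flock($lock, LOCK_EX)             or die "Cannot lock $file.lock: $!";

    open my $in, '<', $file or die "Cannot read $file: $!";
    # same directory as $file, so that rename() stays on one filesystem
    my ($out, $tmp) = tempfile(DIR => dirname($file));
    while (<$in>) {
        $edit->();                   # edits the global $_ set by <$in>
        print {$out} $_;
    }
    close $in;
    close $out or die "Cannot write $tmp: $!";
    rename $tmp, $file or die "Cannot rename $tmp to $file: $!";
    close $lock;                     # releases the lock
}

# usage, on a throwaway file
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "foo 1\nfoo 2\n";
close $fh;
edit_file_by_line($file, sub { s/foo/bar/ });
```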

    The second example demonstrates copying variables between functions (passing variables by value). Let's use the example above, assuming we have no choice but to read the whole file before any data processing takes place. Now suppose you have some process() subroutine that processes the data and returns it. What happens if you pass $content by value? You have just copied another 5M and the child has grown by another 5M in size (watch your swap space!). Multiply it again by a factor of 20 and you have 200M of wasted RAM, which will apparently be reused, but it's a waste! Whenever you think a variable can grow bigger than a few Kb, pass it by reference!

    Once I wrote a script that passed the contents of a little flat-file database by value to a function that processed it. It worked, and it was processed fast, but over time the database became bigger, so passing it by value was overkill. I had to decide whether to buy more memory or to rewrite the code. Obviously, adding more memory would merely be a temporary solution. So it's better to plan ahead and pass variables by reference if a variable you are going to pass might be bigger than you think at the time of coding. There are a few approaches you can use to pass and use variables by reference. For example:

      my $content = qq{foobarfoobar};
      process(\$content);
      sub process{
        my $r_var = shift; 
        $$r_var =~ s/foo/bar/gs;
          # nothing returned - the variable $content outside has been
          # already modified
      }
      
        # the same idea works for other reference types:
        #   @{$var_lr} dereferences an array reference
        #   %{$var_hr} dereferences a hash reference
    

    For more info see perldoc perlref.

    Another approach is to use the @_ array directly. Its elements are aliases for the actual arguments, so using @_ directly does the job of passing by reference!

      process($content);
      sub process{
        $_[0] =~ s/foo/bar/gs;
          # nothing returned - the variable $content outside has been
          # already modified
      }
    

    From perldoc perlsub:

          The array @_ is a local array, but its elements are aliases for
          the actual scalar parameters.  In particular, if an element
          $_[0] is updated, the corresponding argument is updated (or an
          error occurs if it is not possible to update)...
    

    Be careful when you write this kind of subroutine, since it can confuse a potential user. It's not obvious that a call like process($content); modifies the passed variable. Programmers (the users of your library in this case) are used to subs that either modify variables passed by reference or return the processed variable (e.g. $content=process($content);).

    The third example demonstrates working with databases. If you do some DB processing, you will often need to read lots of records into your program and then print them to the browser after they are formatted. (I won't even mention the horrible case where programmers read in the whole DB and then use Perl to process it!!! Use a relational DB and let SQL do the job, so you get only the records you need!!!)

    We will use DBI for this (assume that we are already connected to the DB) (refer to perldoc DBI for a complete manual of the DBI module):

      my @rows;
      $sth->execute;
      while (my @row_ary = $sth->fetchrow_array) {
          push @rows, [@row_ary];   # accumulate every record in memory
      }
      # ... later, print the output using the data in @rows
    

    In the example above the httpd process will grow by the size of the variables that have been allocated for the records that matched the query. (Again, remember to multiply it by the number of children your server runs!)

    A better approach is not to accumulate the records, but rather to print them as they are fetched from the DB. Moreover, we will use the bind_columns() and $sth->fetchrow_arrayref() (aliased to $sth->fetch()) methods to fetch the data in the fastest possible way. The example below prints an HTML TABLE with the matched data; the only memory being used is the @cols array that holds the current row's values:

      my @select_fields = qw(a b c);
          # create a list of cols values
      my @cols = ();
      @cols[0..$#select_fields] = ();
      $sth = $dbh->prepare($do_sql);
      $sth->execute;
        # Bind perl variables to columns.
      $sth->bind_columns(undef,\(@cols));
      print "<TABLE>";
      while($sth->fetch) {
         print "<TR>",
               map("<TD>$_</TD>", @cols),
               "</TR>";
      }
      print "</TABLE>";
    

    Note: the above method doesn't tell you how many records have been matched. The workaround is to run, before the code above, an identical query that uses SELECT count(*) ... instead of SELECT * ... to get the number of matched records. It should be much faster, since you can remove any ORDER BY and similar clauses.

    For those who think that $sth->rows will do the job, here is the quote from the DBI manpage:

      rows();
    

      $rv = $sth->rows;
    

      Returns the number of rows affected by the last database altering
      command, or -1 if not known or not available.  Generally you can
      only rely on a row count after a do or non-select execute (for some
      specific operations like update and delete) or after fetching all
      the rows of a select statement.
    

      For select statements it is generally not possible to know how many
      rows will be returned except by fetching them all.  Some drivers
      will return the number of rows the application has fetched so far
      but others may return -1 until all rows have been fetched. So use of
      the rows method with select statements is not recommended.
    

    As a bonus, I wanted to write a single sub that flexibly processes any query, accepting: conditions, call-back closure sub, select fields and restrictions.

      # Usage:
      # $o->dump(\%conditions,\&callback_closure,\@select_fields,@restrictions);
      #
      sub dump{
        my $self = shift;
        my %param = %{+shift}; # dereference hash
        my $rsub = shift;
        my @select_fields = @{+shift}; # dereference list
        my @restrict = @_;             # e.g. 'DISTINCT'; may be empty
      
          # create a list of cols values
        my @cols = ();
        @cols[0..$#select_fields] = ();
      
        my $do_sql = '';
        my @where = ();
        my @bind  = ();
      
          # make a @where list; placeholders let DBI quote the values
        for (grep { $param{$_} } keys %param) {
          push @where, "$_ = ?";
          push @bind,  $param{$_};
        }
      
          # prepare the sql statement
        $do_sql = "SELECT ";
        $do_sql .= join(" ", @restrict) if @restrict;# append the restriction list
        $do_sql .= " " .join(",", @select_fields) ;      # append the select list 
        $do_sql .= " FROM $DBConfig{TABLE} ";         # from table
      
          # we will not add the WHERE clause if @where is empty
        $do_sql .= " WHERE " . join(" AND ", @where) if @where;
      
        print "SQL: $do_sql \n" if $debug;
      
        $dbh->{RaiseError} = 1;     # do this, or check every call for errors
        my $sth = $dbh->prepare($do_sql);
        $sth->execute(@bind);
          # Bind perl variables to columns.
        $sth->bind_columns(undef,\(@cols));
        while($sth->fetch) {
          &$rsub(@cols);
        }
          # print the tail or "no records found" message
          # according to the previous calls
        &$rsub();
      
      } # end of sub dump
    

    Now a callback closure sub can do lots of things. We need a closure so that it knows what stage we are in: header, body or tail. For example, here is a callback closure for formatting the rows to print:

      my $rsub = eval {
          # make a copy of @fields list, since it might go
          # out of scope when this closure will be called
        my @fields = @fields; 
        my @query_fields = qw(user dir tool act); # no date field!!!
        my $header = 0;
        my $tail   = 0;
        my $counter = 0;
        my %cols = (); # columns name=> value hash
      
        # Closure with the following behavior:
        # 1. Header's code will be executed on the first call only and
        #    if @_ was set
        # 2. Row's printing code will be executed on every call with @_ set
        # 3. Tail's code will be executed only if Header's code was
        #    printed and @_ isn't set
        # 4. "No record found" code will be executed if Header's code
        #    wasn't executed
      
        sub {
              # Header
            if (@_ and !$header){
              print "<TABLE>\n";
              print $q->Tr(map{ $q->td($_) } @fields );
              $header = 1; 
            }
            
              # Body
            if (@_) {
              print $q->Tr(map{$q->td($_)} @_ );
              $counter++;
              return; 
            }
            
              # Tail, will be printed only at the end
            if ($header and !($tail or @_)){
              print "</TABLE>\n $counter records found";
              $tail = 1;
              return;
            }
            
              # No record found
            unless ($header){
              print $q->p($q->center($q->b("No record was found!\n")));
            }
      
          }  #  end of sub {}
      };  #  end of my $rsub = eval {
    

    You might also want to check Limiting the size of the processes and Limiting the resources used by httpd children.


    -DTWO_POT_OPTIMIZE and -DPACK_MALLOC Perl Options

    Newer Perl versions also have build-time options to reduce runtime memory consumption. These options might shrink the size of your httpd by about 150K (quite a big number if you remember to multiply it by the number of children you use).

    The -DTWO_POT_OPTIMIZE macro improves allocations of data with sizes close to a power of two, but it only applies to big allocations (starting with 16K by default). Such allocations are typical for big hashes and special-purpose scripts, especially image processing.

    Perl memory allocation is by buckets with sizes close to powers of two. Because of this, malloc overhead may be big, especially for data whose size is exactly a power of two. If PACK_MALLOC is defined, Perl uses a slightly different algorithm for small allocations (up to 64 bytes long), which makes it possible to reduce the overhead to 1 byte for allocations which are powers of two (and these appear quite often).

    Expected memory savings (with 8-byte alignment in alignbytes) is about 20% for typical Perl usage. Expected slowdown due to additional malloc overhead is in fractions of a percent (hard to measure, because of the effect of saved memory on speed).

    You will find these and other memory improvement details in perl5004delta.pod.

    Important: both options are On by default in perl versions 5.005 and higher.


    Checking script modification times

    Under Apache::Registry the requested CGI script is always stat()'ed to check whether it was modified. This adds very little overhead, but if you are into squeezing all the juice out of the server, you might want to save this call. If you do, take a look at the Apache::RegistryBB module.


    Cached stat() calls

    When you do a stat() or one of its variations (-M for modification time, -A for last access time, -C for inode-change time, and others), the information is cached, so if you need to make an additional check on the same file, save the overhead of another stat() and use the _ filehandle instead. For example, when testing for existence and read permissions you might use:

      my $filename = "./test";
        # three stat() calls: -e, -r and -M
      print "OK\n" if -e $filename and -r $filename; 
      my $mod_time = (-M $filename) * 24 * 60 * 60;
      print "$filename was modified $mod_time seconds ago\n";
    

    or the more efficient version (two stat() syscalls saved!):

      my $filename = "./test";
        # only one stat() call; the _ tests reuse its result
      print "OK\n" if -e $filename and -r _;
      my $mod_time = (-M _) * 24 * 60 * 60;
      print "$filename was modified $mod_time seconds ago\n";
    

    Remember that with mod_perl you might get negative times when you use -M and similar file tests. -M returns the difference between the start time of the script performing the check (stored in $^T) and the file's modification time. Because $^T is not reset on each script invocation, but keeps the value it got when the process started, a file modified after that moment yields a negative value. You might want to perform:

      $^T = time();
    

    at the beginning of your scripts to get the regular Perl script behaviour of file tests.
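    A small standalone sketch of the effect, simulating a long-lived process by moving $^T one hour into the past (the one-hour offset is arbitrary):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Simulate a long-running process: pretend it started an hour ago.
$^T = time() - 3600;

my ($fh, $file) = tempfile(UNLINK => 1);   # freshly written, mtime is "now"
print {$fh} "data\n";
close $fh;

# -M is measured in days relative to $^T, so a file modified after
# $^T looks as if it was modified in the future: the value is negative.
my $stale = -M $file;
printf "-M with stale \$^T: %.4f days\n", $stale;

# Reset the start time, as suggested above.
$^T = time();
my $fresh = -M $file;
printf "-M after \$^T = time(): %.4f days\n", $fresh;
```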


    Be careful with symbolic links

    As you know, Apache::Registry caches scripts based on their URI. If the same script can be reached by different URIs, which is possible if you have used symbolic links, like:

      % ln -s /home/httpd/perl/news/news.pl /home/httpd/perl/news.pl
    

    Now the script can be reached as /news/news.pl and /news.pl URIs. It doesn't really matter until you advertise the two URIs, and users reach the same script from both of them. The moment this happens, you will get the same script cached twice!

    To detect this, use the /perl-status handler to see all the compiled scripts and their packages. In our example, when requesting http://localhost/perl-status?rgysubs you would see:

      Apache::ROOT::perl::news::news_2epl
      Apache::ROOT::perl::news_2epl
    

    after both URIs have been requested from the same child process that happened to serve your requests. To make debugging easier, run the server in single-process mode (httpd -X).



    Limiting the size of the processes

    Apache::SizeLimit allows you to kill off Apache httpd processes if they grow too large. See perldoc Apache::SizeLimit for more details.

    With this module you should be able to stop using the MaxRequestsPerChild directive, although some folks find that using both in combination works best.
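    The idea behind Apache::SizeLimit can be sketched in a few lines of plain Perl. Every N requests, look up the process size, and if it exceeds a limit, let the child exit once the current request is done. The sketch below is not the module's code. It reads VmRSS from /proc/self/status, which works only on Linux, and the limit and check interval are made-up example values.

```perl
#!/usr/bin/perl -w
use strict;

my $max_process_size = 10_000;   # KB, i.e. about 10Mb (example value)
my $check_every_n    = 2;        # check the size on every 2nd request

# Resident set size of this process in KB, read from /proc/self/status
# (Linux only; returns undef where that file does not exist).
sub process_size_kb {
    open my $fh, '<', '/proc/self/status' or return undef;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return undef;
}

# Decide, after request number $count, whether this child should exit.
sub should_exit {
    my ($count) = @_;
    return 0 if $count % $check_every_n;   # not a checking request
    my $size = process_size_kb();
    return (defined $size && $size > $max_process_size) ? 1 : 0;
}

my $size = process_size_kb();
print defined $size ? "current size: $size KB\n" : "size unknown on this OS\n";
print "exit after request 2? ", should_exit(2), "\n";
```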



    Limiting the resources used by httpd children

    Apache::Resource uses the BSD::Resource module, which uses the C function setrlimit() to set limits on system resources such as memory and CPU usage.

    To configure it, use:

      PerlModule Apache::Resource
        # set child memory limit in megabytes
        # (default is 64 Meg)
      PerlSetEnv PERL_RLIMIT_DATA 32:48
      
        # set child CPU limit in seconds
        # (default is 360 seconds)
      PerlSetEnv PERL_RLIMIT_CPU 120
      
      PerlChildInitHandler Apache::Resource
    

    If you configure Apache::Status, it will let you review the resources set this way.

    The following limit values are in megabytes: DATA, RSS, STACK, FSIZE, CORE, MEMLOCK. All others are treated in their natural unit. Prepend PERL_RLIMIT_ to each one you want to use. Refer to the setrlimit man page on your OS for other possible resources.

    If the value of the variable is of the form S:H, S is treated as the soft limit, and H is the hard limit. If it is just a single number, it is used for both soft and hard limits.
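    A small hypothetical helper can illustrate these two rules: megabyte units for some resources, and the S:H soft/hard syntax. It is not part of Apache::Resource. It only shows how the values from the configuration example above would translate into setrlimit() limits, assuming megabyte values are converted to bytes.

```perl
#!/usr/bin/perl -w
use strict;

# Resources whose PERL_RLIMIT_* values are given in megabytes.
my %in_megabytes = map { $_ => 1 } qw(DATA RSS STACK FSIZE CORE MEMLOCK);

# Hypothetical helper: turn "S:H" or "N" into (soft, hard) limits.
sub parse_rlimit {
    my ($resource, $value) = @_;
    my ($soft, $hard) = split /:/, $value;
    $hard = $soft unless defined $hard;      # a single number sets both
    if ($in_megabytes{$resource}) {
        $_ *= 1024 * 1024 for $soft, $hard;  # megabytes -> bytes
    }
    return ($soft, $hard);
}

my @data = parse_rlimit('DATA', '32:48');  # soft 32Mb, hard 48Mb
my @cpu  = parse_rlimit('CPU',  '120');    # 120 seconds for both
print "DATA: soft=$data[0] hard=$data[1] bytes\n";
print "CPU:  soft=$cpu[0] hard=$cpu[1] seconds\n";
```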

    To debug, add:

      <Perl>
        $Apache::Resource::Debug = 1;
        require Apache::Resource;
      </Perl>
      PerlChildInitHandler Apache::Resource
    

    and look in the error_log to see what it's doing.

    Refer to perldoc Apache::Resource and man 2 setrlimit for more info.



    Limiting the request rate (blocking robots)

    A limitation of using pattern matching to identify robots is that it only catches the robots that you know about, and only those that identify themselves by name. A few devious robots masquerade as users by using user agent strings that identify themselves as conventional browsers. To catch such robots, you'll have to be more sophisticated.

    Apache::SpeedLimit can help here. See:

    http://www.modperl.com/chapters/ch6.html#Blocking_Greedy_Clients
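    Speed-based blocking counts how many requests each client makes within a recent time window and refuses clients that go over a threshold, whatever User-Agent they claim. The following is a minimal sliding-window sketch of that idea in plain Perl, not the Apache::SpeedLimit code. The window length, the threshold and the client addresses are hypothetical values. A real handler would also have to keep the counts in storage shared by all the children, because a per-process hash like this one only sees the requests served by one child.

```perl
#!/usr/bin/perl -w
use strict;

my $window   = 60;   # seconds (hypothetical)
my $max_hits = 20;   # requests allowed per client per window (hypothetical)
my %hits;            # client address => [ timestamps of recent requests ]

# Record a request from $client at time $now; return true if the client
# has made more than $max_hits requests within the last $window seconds.
sub too_fast {
    my ($client, $now) = @_;
    my $times = $hits{$client} ||= [];
    @$times = grep { $_ > $now - $window } @$times;   # forget old requests
    push @$times, $now;
    return @$times > $max_hits;
}

# A robot fetching one page per second, and a human reading for 10 secs
# between pages (25 requests each, with simulated timestamps).
my ($blocked_fast, $blocked_slow) = (0, 0);
for my $i (0 .. 24) {
    $blocked_fast++ if too_fast('10.0.0.1', $i);
    $blocked_slow++ if too_fast('10.0.0.2', $i * 10);
}
print "robot: $blocked_fast blocked, human: $blocked_slow blocked\n";
```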



    Benchmarks. Impressing your Boss and Colleagues.

    How much faster is mod_perl than mod_cgi (aka plain Perl/CGI)? There are many ways to benchmark the two. I'll present a few examples and numbers below. Check out the benchmark directory of the mod_perl distribution for more examples.

    If you are going to write your own benchmarking utility, use the Benchmark module for heavy scripts and the Time::HiRes module for very fast scripts (faster than 1 sec) where you need better time precision.
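    For example, the core Benchmark module can time a piece of code over many iterations. The snippet below is only an illustration. The summing loop stands in for whatever work your script really does.

```perl
#!/usr/bin/perl -w
use strict;
use Benchmark qw(timeit timestr);

# Stand-in for the code you want to measure.
sub crunch {
    my $sum = 0;
    $sum += $_ for 1 .. 10_000;
    return $sum;
}

# Run crunch() 200 times and report the elapsed wall-clock and CPU time.
my $t = timeit(200, \&crunch);
print "200 runs took: ", timestr($t), "\n";
```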

    There is no need to write a special benchmark, though. If you want to impress your boss or colleagues, take some heavy CGI script you have (e.g. a script that crunches some data and prints the results to STDOUT). Open two xterms and call the same script in mod_perl mode in one xterm and in mod_cgi mode in the other. You can use lwp-get from the LWP package to emulate the web agent (browser). The benchmark directory of the mod_perl distribution includes such an example.

    See also two benchmarking tools: ApacheBench and the crashme test.



    Developers Talk

    Perrin Harkins writes on benchmarks or comparisons, official or unofficial:

    I have used some of the platforms you mentioned and researched others. What I can tell you for sure, is that no commercially available system offers the depth, power, and ease of use that mod_perl has. Either they don't let you access the web server internals, or they make you use less productive languages than Perl, sometimes forcing you into restrictive and confusing APIs and/or GUI development environments. None of them offer the level of support available from simply posting a message to this list, at any price.

    As for performance, beyond doing several important things (code-caching, pre-forking/threading, and persistent database connections) there isn't much these tools can do, and it's mostly in your hands as the developer to see that the things which really take the time (like database queries) are optimized.

    The downside of all this is that most manager types seem to be unable to believe that web development software available for free could be better than the stuff that cost $25,000 per CPU. This appears to be the major reason most of the web tools companies are still in business. They send a bunch of suits to give PowerPoint presentations and hand out glossy literature to your boss, and you end up with an expensive disaster and an approaching deadline.

    But I'm not bitter or anything...

    Jonathan Peterson adds:

    Most of the major solutions have something that they do better than the others, and each of them has faults. Microsoft's ASP has a very nice objects model, and has IMO the best data access object (better than DBI to use - but less portable) It has the worst scripting language. PHP has many of the advantages of Perl-based solutions, but is less complicated for developers. Netscape's Livewire has a good object model too, and provides good server-side Java integration - if you want to leverage Java skills, it's good. Also, it has a compiled scripting language - which is great if you aren't selling your clients the source code (and a pain otherwise).

    mod_perl's advantage is that it is the most powerful. It offers the greatest degree of control with one of the more powerful languages. It also offers the greatest granularity. You can use an embedding module (eg eperl) from one place, a session module (Session) from another, and your data access module from yet another.

    I think the Apache::ASP module looks very promising. It has very easy to use and adequately powerful state maintenance, a good embedding system, and a sensible object model (that emulates the Microsoft ASP one). It doesn't replicate MS's ADO for data access, but DBI is fine for that.

    I have always found that the developers available make the greatest impact on the decision. If you have a team with no Perl experience, and a small or medium task, using something like PHP, or Microsoft ASP, makes more sense than driving your staff into the vertical learning curve they'll need to use mod_perl.

    For very large jobs, it may be worth finding the best technical solution, and then recruiting the team with the necessary skills.



    Benchmarking a Graphic hits counter with Persistent DB Connection

    Here are the numbers from Michael Parker's mod_perl presentation at the Perl Conference (Aug 1998). There used to be links to the source here, but they went dead, so I removed them. The script is a standard hits counter, but it logs the counts into a mysql relational database:

        Benchmark: timing 100 iterations of cgi, perl...  [rate 1:28]
        
        cgi: 56 secs ( 0.33 usr 0.28 sys = 0.61 cpu) 
        perl: 2 secs ( 0.31 usr 0.27 sys = 0.58 cpu) 
        
        Benchmark: timing 1000 iterations of cgi,perl...  [rate 1:21]
         
        cgi: 567 secs ( 3.27 usr 2.83 sys = 6.10 cpu) 
        perl: 26 secs ( 3.11 usr 2.53 sys = 5.64 cpu)      
        
        Benchmark: timing 10000 iterations of cgi, perl   [rate 1:21]
         
        cgi: 6494 secs (34.87 usr 26.68 sys = 61.55 cpu) 
        perl: 299 secs (32.51 usr 23.98 sys = 56.49 cpu) 
    

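    We don't know what server configuration was used for these tests, but the numbers speak for themselves. Note that the usr/sys/cpu figures are those of the benchmarking process itself, which is why they are almost identical for both. The difference shows up in the wall-clock seconds.

    The source code of the script used to be available at http://www.realtime.net/~parkerm/perl/conf98/sld006.htm, but that link is dead. If you know its new location, please let me know.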



    Benchmarking scripts with execution times below 1 second :)

    As noted before, for very fast scripts you will have to use the Time::HiRes module. Its usage is similar to Benchmark's:

      use Time::HiRes qw(gettimeofday tv_interval);
      my $start_time = [ gettimeofday ];
      sub_that_takes_a_teeny_bit_of_time();
      my $end_time = [ gettimeofday ];
      my $elapsed = tv_interval($start_time, $end_time);
      print "the sub took $elapsed secs.\n";
    

    See also crashme test.
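    Sometimes a single call is too fast to time reliably even with Time::HiRes. A common trick is to time many calls in a row and divide. In this sketch the body of teeny_sub() is a placeholder and the iteration count is arbitrary:

```perl
#!/usr/bin/perl -w
use strict;
use Time::HiRes qw(gettimeofday tv_interval);

# Placeholder for the fast code you want to measure.
sub teeny_sub { return join '', map { chr(65 + $_ % 26) } 0 .. 50 }

my $iterations = 10_000;
my $start_time = [ gettimeofday ];
teeny_sub() for 1 .. $iterations;
my $elapsed = tv_interval($start_time);   # one argument: measured up to now

printf "total %.4f secs, %.2f microsecs per call\n",
    $elapsed, $elapsed / $iterations * 1_000_000;
```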



    PerlHandler's Benchmarking

    At http://perl.apache.org/dist/contrib/ you will find the Apache::Timeit package, which benchmarks PerlHandlers.



    Tuning Apache's configuration variables for the best performance

    It's very important to configure the MinSpareServers, MaxSpareServers, StartServers, MaxClients, and MaxRequestsPerChild parameters correctly. There are no universally correct values. If they are too ``low'' you will under-use the system's capabilities, and if they are too ``high'' the server is likely to bring the machine to its knees.

    All of the above parameters should be set according to the resources you have. With a plain Apache server it's no big deal if you run too many servers (within reason), since the processes are about 1Mb each and don't eat much RAM. The numbers are even smaller if memory sharing takes place. The situation is different with mod_perl: I have seen mod_perl processes of 20Mb and more. If you have MaxClients set to 50, that is 50 x 20Mb = 1Gb. Do you have 1Gb of RAM? Probably not. So how do you tune these parameters? Generally by trying different combinations and benchmarking the server. Again, mod_perl processes can be much smaller if sharing is in place.

    Before you start this task you should be armed with the proper weapon. You need a crashme utility that will load your server with the mod_perl scripts you possess. It must be able to emulate a multiuser environment, with multiple clients calling the mod_perl scripts on your server simultaneously. There are commercial solutions, but free tools do the same job: the ApacheBench ab utility that comes with the Apache distribution, a crashme script that uses LWP::Parallel::UserAgent, or httperf (see the Download page).

    Another important issue is to run the testing client (load generator) on a system that is more powerful than the system being tested. After all, we are trying to simulate Internet users, many of whom try to reach your service at once. Since the number of concurrent users can be quite large, your testing machine must be powerful enough to generate a heavy load. Of course you should not run the clients and the server on the same machine. If you do, your results will be wrong, since the clients will consume CPU and memory that should be dedicated to the server, and vice versa.

    See also two benchmarking tools: ApacheBench and the crashme test.



    Tuning with ab - ApacheBench

    ab is a tool for benchmarking your Apache HTTP server. It is designed to give you an impression of how much performance your current Apache installation can deliver. In particular, it shows how many requests per second your Apache server is capable of serving. The ab tool comes bundled with the Apache source distribution (and it's free :).

    Let's try it. We will simulate 10 users concurrently requesting a very light script at www.nowhere.com:81/test/test.pl. Each ``user'' makes 10 requests.

      % ./ab -n 100 -c 10 www.nowhere.com:81/test/test.pl
    

    The results are:

      Concurrency Level:      10
      Time taken for tests:   0.715 seconds
      Complete requests:      100
      Failed requests:        0
      Non-2xx responses:      100
      Total transferred:      60700 bytes
      HTML transferred:       31900 bytes
      Requests per second:    139.86
      Transfer rate:          84.90 kb/s received
      
      Connection Times (ms)
                    min   avg   max
      Connect:        0     0     3
      Processing:    13    67    71
      Total:         13    67    74
    

    The only numbers we really care about are:

      Complete requests:      100
      Failed requests:        0
      Requests per second:    139.86
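    If you run many ab tests, it can help to extract these three numbers from the report automatically. The following hypothetical helper does that with regular expressions. It assumes the field labels look exactly as in the ab output shown above, and other ab versions may word them differently.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: extract the interesting fields from ab's report.
sub parse_ab_report {
    my ($report) = @_;
    my %result;
    $result{complete} = $1 if $report =~ /^\s*Complete requests:\s+(\d+)/m;
    $result{failed}   = $1 if $report =~ /^\s*Failed requests:\s+(\d+)/m;
    $result{rps}      = $1 if $report =~ /^\s*Requests per second:\s+([\d.]+)/m;
    return \%result;
}

# Sample input copied from the report above; in practice this would be
# the output captured from an ab run.
my $sample = <<'END';
  Complete requests:      100
  Failed requests:        0
  Requests per second:    139.86
END

my $r = parse_ab_report($sample);
print "complete=$r->{complete} failed=$r->{failed} rps=$r->{rps}\n";
```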
    

    Let's raise the load to 100 x 10 requests (10 users, each making 100 requests):

      % ./ab -n 1000 -c 10 www.nowhere.com:81/perl/access/access.cgi
      Concurrency Level:      10
      Complete requests:      1000
      Failed requests:        0
      Requests per second:    139.76
    

    As expected, nothing changes, since we still have the same 10 concurrent users. Now let's raise the number of concurrent users to 50:

      % ./ab -n 1000 -c 50 www.nowhere.com:81/perl/access/access.cgi
      Complete requests:      1000
      Failed requests:        0
      Requests per second:    133.01
    

    We see that the server is capable of serving 50 concurrent users at an amazing 133 req/sec! Let's find the upper boundary. Using -n 10000 -c 1000 failed to produce results (Broken Pipe?). Using -n 10000 -c 500 yielded 94.82 req/sec, so the server's performance went down under the high load.

    The above tests were performed with the following configuration:

      MinSpareServers 8
      MaxSpareServers 6
      StartServers 10
      MaxClients 50
      MaxRequestsPerChild 1500
    

    Now let's kill each child after a single request, using the following configuration:

      MinSpareServers 8
      MaxSpareServers 6
      StartServers 10
      MaxClients 100
      MaxRequestsPerChild 1
    

    Simulate 50 users each generating a total of 20 requests:

      % ./ab -n 1000 -c 50 www.nowhere.com:81/perl/access/access.cgi
    

    The benchmark timed out with the above configuration. I watched the output of ps as I ran it: the parent process just wasn't capable of respawning the killed children at that rate. When I raised MaxRequestsPerChild to 10, I got 8.34 req/sec, which is very bad (about 16 times slower than the 133 req/sec above!). (You can't benchmark the importance of MinSpareServers, MaxSpareServers and StartServers with this kind of test.)

    Now let's return MaxRequestsPerChild to 1500, but lower MaxClients to 10 and run the same test:

      MinSpareServers 8
      MaxSpareServers 6
      StartServers 10
      MaxClients 10
      MaxRequestsPerChild 1500
    

    I got 27.12 req/sec. That is better, but still 4-5 times slower than the 133 req/sec we got with MaxClients set to 50.

    Summary: I have tested a few combinations of the server configuration variables (MinSpareServers, MaxSpareServers, StartServers, MaxClients and MaxRequestsPerChild). The results are as follows:

    MinSpareServers, MaxSpareServers and StartServers matter only for user response times (sometimes a user will have to wait a bit).

    The important parameters are MaxClients and MaxRequestsPerChild. MaxClients should not be too big, so that it doesn't exhaust your machine's memory. It should not be too small either, or users will be forced to wait for children to become free to serve them. MaxRequestsPerChild should be as big as possible to take full advantage of mod_perl. Watch your server at the beginning, though, to make sure your scripts are not leaking memory and causing your server (and your service) to die very fast.

    It is also important to understand that these tests did not measure response times. They measured the ability of the server to respond under a heavy load of requests. With a heavier test script the numbers would be different, but the conclusions would be very similar.

    The benchmarks were run with:

      HW: RS6000, 1Gb RAM
      SW: AIX 4.1.5, mod_perl 1.16, apache 1.3.3
      Machine running only mysql, httpd docs and mod_perl servers.
      Machine was _completely_ unloaded during the benchmarking.
    

    After each server restart following a configuration change, I made sure the scripts were preloaded by having every child fetch a script at least once.

    It is important to notice that none of the requests timed out, even when a request was kept in the server's queue for more than a minute! That is how ab works. It is fine for testing purposes but would be unacceptable in the real world: users will not wait more than 5-10 secs for a request to complete, and the client (browser) will time out in a few minutes.

    Now let's take a look at some real code whose execution time is more than a few millisecs. We will do real testing and collect the data in tables for easier viewing.

    I will use the following abbreviations:

      NR    = Total Number of Requests
      NC    = Concurrency (number of simultaneous clients)
      MC    = MaxClients
      MRPC  = MaxRequestsPerChild
      RPS   = Requests per second
    

    Running a mod_perl script with lots of mysql queries (the script under test is mysqld-bound) (http://www.nowhere.com:81/perl/access/access.cgi?do_sub=query_form), with the configuration:

      MinSpareServers        8
      MaxSpareServers       16
      StartServers          10
      MaxClients            50
      MaxRequestsPerChild 5000
    

    gives us:

         NR   NC    RPS     comment
      ------------------------------------------------
         10   10    3.33    # not statistically reliable
        100   10    3.94    
       1000   10    4.62    
       1000   50    4.09    
    

    Conclusions: Here I wanted to show what happens when the application is slow because of an external operation, like mysqld querying, rather than because of Perl loading, code compilation and execution. When such an operation is the bottleneck, it hardly matters what load we place on the server. The RPS (requests per second) is almost the same in every run, given that all the requests were served. You can queue clients, but be aware that anything in the queue means a waiting client, and a client (browser) that might time out!

    Now we will benchmark the same script without mysql, so that only the Perl code is exercised (http://www.nowhere.com:81/perl/access/access.cgi). It's the same script, but it just returns an HTML form without making any SQL queries.

      MinSpareServers        8
      MaxSpareServers       16
      StartServers          10
      MaxClients            50
      MaxRequestsPerChild 5000
    

         NR   NC      RPS   comment
      ------------------------------------------------
         10   10    26.95   # not statistically reliable
        100   10    30.88   
       1000   10    29.31
       1000   50    28.01
       1000  100    29.74
      10000  200    24.92
     100000  400    24.95
    

    Conclusions: This time the script we executed was pure Perl (not bound to I/O or mysql), so the server serves the requests much faster. The RPS is almost the same for any load, but drops when the number of concurrent clients goes beyond MaxClients. At 25 RPS, with a load of 400 concurrent clients, each client will be served within 16 secs (400/25). To be more realistic, assume a maximum concurrency of 100. At 30 RPS, each client will then be served within about 3.3 secs (100/30), which is pretty good for a highly loaded server.
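    The rule of thumb used here is that with NC clients queued at a throughput of RPS requests per second, the last client in line waits roughly NC / RPS seconds. A trivial hypothetical helper, just for checking the arithmetic, reproduces the two estimates:

```perl
#!/usr/bin/perl -w
use strict;

# Rough worst-case wait: time to drain a queue of $concurrency requests
# at $rps requests per second.
sub worst_wait { my ($concurrency, $rps) = @_; return $concurrency / $rps }

printf "400 clients at 25 RPS:  %.1f secs\n", worst_wait(400, 25);   # 16.0
printf "100 clients at 30 RPS:  %.1f secs\n", worst_wait(100, 30);   # 3.3
```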

    Now we will use the server for its full capacity, by keeping all MaxClients alive all the time and having a big MaxRequestsPerChild, so no server will be killed during the benchmarking.

      MinSpareServers       50
      MaxSpareServers       50
      StartServers          50
      MaxClients            50
      MaxRequestsPerChild 5000
      
         NR   NC      RPS   comment
      ------------------------------------------------
        100   10    32.05
       1000   10    33.14
       1000   50    33.17
       1000  100    31.72
      10000  200    31.60
    

    Conclusion: In this scenario there is no overhead involving the parent server loading new children, all the servers are available, and the only bottleneck is contention for the CPU.

    Now let's change MaxClients and watch the results. Let's reduce MC to 10.

      MinSpareServers        8
      MaxSpareServers       10
      StartServers          10
      MaxClients            10
      MaxRequestsPerChild 5000
      
         NR   NC      RPS   comment
      ------------------------------------------------
         10   10    23.87   # not statistically reliable
        100   10    32.64 
       1000   10    32.82
       1000   50    30.43
       1000  100    25.68
       1000  500    26.95
       2000  500    32.53
    

    Conclusions: Very little difference, almost no change! 10 servers were able to serve with almost the same throughput as 50 servers. Why? My guess is CPU throttling. With only 10 servers, each child received its CPU time slice about 5 times more often than in the test above with 50 servers, so each one served its requests about 5 times faster. So a big value for MaxClients doesn't mean that the performance will be better. You have just seen the numbers!

    Now we will start to drastically reduce the MaxRequestsPerChild:

      MinSpareServers        8
      MaxSpareServers       16
      StartServers          10
      MaxClients            50
      
         NR   NC    MRPC     RPS    comment
      ------------------------------------------------
        100   10      10    5.77 
        100   10       5    3.32
       1000   50      20    8.92
       1000   50      10    5.47
       1000   50       5    2.83
       1000  100      10    6.51
    

    Conclusions: When we drastically reduce MaxRequestsPerChild, the performance starts to approach that of plain mod_cgi. Just for comparison, here are the numbers for the same run under mod_cgi:

      MinSpareServers        8
      MaxSpareServers       16
      StartServers          10
      MaxClients            50
      
         NR   NC    RPS     comment
      ------------------------------------------------
        100   10    1.12
       1000   50    1.14
       1000  100    1.13
    

    Conclusion: mod_cgi is much slower :) In the NR/NC 100/10 test the RPS under mod_cgi was 1.12, versus 32 under mod_perl, which is almost 30 times faster!!! At that rate the first mod_cgi test took about 90 secs to complete (100/1.12), and the second and third took nearly 900 secs each (1000/1.14 and 1000/1.13)!



    Tuning with httperf

    httperf is a utility written by David Mosberger. Just like ApacheBench, it measures the performance of the web server.

    A sample command line is shown below:

      httperf --server hostname --port 80 --uri /test.html \
       --rate 150 --num-conn 27000 --num-call 1 --timeout 5
    

    This command causes httperf to use the web server on the host named hostname, running on port 80. The web page being retrieved is /test.html and, in this simple test, the same page is retrieved repeatedly. The rate at which requests are issued is 150 per second. The test initiates a total of 27,000 TCP connections, and on each connection one HTTP call is performed (a call consists of sending a request and receiving a reply).

    The timeout option defines the number of seconds that the client is willing to wait to hear back from the server. If this timeout expires, the tool considers the corresponding call to have failed. Note that with a total of 27,000 connections and a rate of 150 per second, the total test duration will be approximately 180 seconds (27,000/150), independent of what load the server can actually sustain. And here is a result that one might get:

         Total: connections 27000 requests 26701 replies 26701 test-duration 179.996 s
        
         Connection rate: 150.0 conn/s (6.7 ms/conn, <=47 concurrent connections)
         Connection time [ms]: min 1.1 avg 5.0 max 315.0 median 2.5 stddev 13.0
         Connection time [ms]: connect 0.3
         
         Request rate: 148.3 req/s (6.7 ms/req)
         Request size [B]: 72.0
         
         Reply rate [replies/s]: min 139.8 avg 148.3 max 150.3 stddev 2.7 (36 samples)
         Reply time [ms]: response 4.6 transfer 0.0
         Reply size [B]: header 222.0 content 1024.0 footer 0.0 (total 1246.0)
         Reply status: 1xx=0 2xx=26701 3xx=0 4xx=0 5xx=0
         
         CPU time [s]: user 55.31 system 124.41 (user 30.7% system 69.1% total 99.8%)
         Net I/O: 190.9 KB/s (1.6*10^6 bps)
         
         Errors: total 299 client-timo 299 socket-timo 0 connrefused 0 connreset 0
         Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0
    



    Tuning with crashme script

    This is another crashme suite, originally written by Michael Schilli and located at http://www.linux-magazin.de/ausgabe.1998.08/Pounder/pounder.html . I made a few modifications (mostly adding my() declarations). I also allowed it to accept more than one URL to test, since sometimes you want to test the site as a whole and not just one script.

    The tool provides the same results as ab above, but it also allows you to set a timeout value, so that requests fail if they are not served within the timeout period. You also get Latency (secs/Request) and Throughput (Requests/sec) numbers. It can give you a better picture and a more complete simulation of your favorite Netscape browser :).

    While running these two benchmarking suites I noticed that ab gave me results 2.5-3.0 times better. Both suites were run on the same machine, under the same load, with the same parameters, but their implementations are different.

    Sample output:

      URL(s):          http://www.nowhere.com:81/perl/access/access.cgi
      Total Requests:  100
      Parallel Agents: 10
      Succeeded:       100 (100.00%)
      Errors:          NONE
      Total Time:      9.39 secs
      Throughput:      10.65 Requests/sec
      Latency:         0.85 secs/Request
    

    And the code:

      #!/usr/apps/bin/perl -w
      
      use LWP::Parallel::UserAgent;
      use HTTP::Request;
      use Time::HiRes qw(gettimeofday tv_interval);
      use strict;
      
      ###
      # Configuration
      ###
      
      my $nof_parallel_connections = 10; 
      my $nof_requests_total = 100; 
      my $timeout = 10;
      my @urls = (
                'http://www.nowhere.com:81/perl/faq_manager/faq_manager.pl',
                'http://www.nowhere.com:81/perl/access/access.cgi',
               );
      
      
      ##################################################
      # Derived Class for latency timing
      ##################################################
      
      package MyParallelAgent;
      @MyParallelAgent::ISA = qw(LWP::Parallel::UserAgent);
      use strict;
      
      ###
      # Is called when connection is opened
      ###
      sub on_connect {
        my ($self, $request, $response, $entry) = @_;
        $self->{__start_times}->{$entry} = [Time::HiRes::gettimeofday];
      }
      
      ###
      # Are called when connection is closed
      ###
      sub on_return {
        my ($self, $request, $response, $entry) = @_;
        my $start = $self->{__start_times}->{$entry};
        $self->{__latency_total} += Time::HiRes::tv_interval($start);
      }
      
      sub on_failure {
        on_return(@_);  # Same procedure
      }
      
      ###
      # Access function for new instance var
      ###
      sub get_latency_total {
        return shift->{__latency_total};
      }
      
      ##################################################
      package main;
      ##################################################
      ###
      # Init parallel user agent
      ###
      my $ua = MyParallelAgent->new();
      $ua->agent("pounder/1.0");
      $ua->max_req($nof_parallel_connections);
      $ua->redirect(0);    # No redirects
      
      ###
      # Register all requests
      ###
      foreach (1..$nof_requests_total) {
        foreach my $url (@urls) {
          my $request = HTTP::Request->new('GET', $url);
          $ua->register($request);
        }
      }
      
      ###
      # Launch processes and check time
      ###
      my $start_time = [gettimeofday];
      my $results = $ua->wait($timeout);
      my $total_time = tv_interval($start_time);
      
      ###
      # Requests all done, check results
      ###
      
      my $succeeded     = 0;
      my %errors = ();
      
      foreach my $entry (values %$results) {
        my $response = $entry->response();
        if($response->is_success()) {
          $succeeded++; # Another satisfied customer
        } else {
          # Error, save the message
          $response->message("TIMEOUT") unless $response->code();
          $errors{$response->message}++;
        }
      }
      
      ###
      # Format errors if any from %errors 
      ###
      my $errors = join(',', map "$_ ($errors{$_})", keys %errors);
      $errors = "NONE" unless $errors;
      
      ###
      # Format results
      ###
      
      # every URL in @urls is requested $nof_requests_total times
      my $nof_requests_sent = $nof_requests_total * @urls;
      my @P = (
            "URL(s)"          => join("\n\t\t ", @urls),
            "Total Requests"  => "$nof_requests_sent",
            "Parallel Agents" => $nof_parallel_connections,
            "Succeeded"       => sprintf("$succeeded (%.2f%%)\n",
                                       $succeeded * 100 / $nof_requests_sent),
            "Errors"          => $errors,
            "Total Time"      => sprintf("%.2f secs\n", $total_time),
            "Throughput"      => sprintf("%.2f Requests/sec\n", 
                                       $nof_requests_sent / $total_time),
            "Latency"         => sprintf("%.2f secs/Request", 
                                       ($ua->get_latency_total() || 0) / 
                                       $nof_requests_sent),
           );
      
      
      my ($left, $right);
      ###
      # Print out statistics
      ###
      format STDOUT =
      @<<<<<<<<<<<<<<< @*
      "$left:",        $right
      .
      
      while(($left, $right) = splice(@P, 0, 2)) {
        write;
      }
    



    Choosing MaxClients

    The MaxClients directive sets the limit on the number of simultaneous requests that can be supported. No more than this number of child server processes will be created. To configure more than 256 clients, you must edit the HARD_SERVER_LIMIT entry in httpd.h and recompile. With mod_perl we want this value to be as small as possible, since that is how we bound the resources used by the server children. Because we can restrict each child's process size (see Limiting the size of the processes), the calculation of MaxClients is pretty straightforward:

                   Total RAM Dedicated to the Webserver
      MaxClients = ------------------------------------
                         MAX child's process size
    

    So if I have 400Mb left for the webserver to run with, I can set MaxClients to 40, provided I know that each child is limited to 10Mb of memory (e.g. with Apache::SizeLimit).

    You will certainly wonder what happens to your server if there are more than MaxClients concurrent users at some moment. This situation is accompanied by the following warning message in the error_log file:

      [Sun Jan 24 12:05:32 1999] [error] server reached MaxClients setting,
      consider raising the MaxClients setting
    

    There is no real problem: connection attempts over the MaxClients limit will normally be queued, up to a number based on the ListenBacklog directive. Once a child process is freed at the end of a different request, the queued connection will be served.

    But it is logged as an error because clients are put in the queue rather than served at once, even though they do not get an error response. You can let the error persist to balance available system resources against response time, but sooner or later you will need more RAM so you can start more children. The best approach is to avoid reaching this condition at all, and if you reach it often you should start to worry about it.

    It's important to understand how much real memory a child occupies. Your children can share memory between them if the OS supports it and you take action to allow the sharing to happen (see Preload Perl modules at server startup). If this is the case, chances are that your MaxClients can be even higher. But it seems that it's not so simple to calculate the absolute number. (If you come up with a solution, please let us know!) If the shared memory stayed the same size throughout the child's life, we could derive a much better formula:

                   Total_RAM + Shared_RAM_per_Child * MaxClients
      MaxClients = --------------------------------------------- - 1
                                Max_Process_Size
    

    which is:

                        Total_RAM - Max_Process_Size
      MaxClients = ---------------------------------------
                   Max_Process_Size - Shared_RAM_per_Child
    

    Let's roll some calculations:

      Total_RAM            = 500Mb
      Max_Process_Size     =  10Mb
      Shared_RAM_per_Child =   4Mb
    

                  500 - 10
     MaxClients = --------- = 81
                   10 - 4
    

    With no sharing in place

                     500
     MaxClients = --------- = 50
                     10
    

    With sharing in place you can have about 60% more servers without buying more RAM. If you improve and maintain the sharing level, let's say:

      Total_RAM            = 500Mb
      Max_Process_Size     =  10Mb
      Shared_RAM_per_Child =   8Mb
    

                  500 - 10
     MaxClients = --------- = 245
                   10 - 8
    

    That's 390% more servers! You get the point :)
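    Here is the same calculation as a small Perl helper, so you can plug in your own measurements. It is only a sketch of the formula above. max_clients_shared() is a hypothetical name, and the sizes are the example numbers from this section, not measurements:

```perl
use strict;
use warnings;

# Hypothetical helper implementing the formula above:
#
#                     Total_RAM - Max_Process_Size
#   MaxClients = ---------------------------------------
#                Max_Process_Size - Shared_RAM_per_Child
#
# All sizes must be in the same unit (Mb here). The result is rounded down.
sub max_clients_shared {
    my ($total, $max_size, $shared) = @_;
    die "shared part must be smaller than the process size\n"
        unless $shared < $max_size;
    return int(($total - $max_size) / ($max_size - $shared));
}

for my $shared (4, 8) {
    printf "Shared_RAM_per_Child=%dMb -> MaxClients=%d\n",
        $shared, max_clients_shared(500, 10, $shared);
}
```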

    [TOC]


    Choosing MaxRequestsPerChild

    The MaxRequestsPerChild directive sets the limit on the number of requests that an individual child server process will handle. After MaxRequestsPerChild requests, the child process will die. If MaxRequestsPerChild is 0, then the process will live forever.

    Setting MaxRequestsPerChild to a non-zero limit has two beneficial effects: it works around memory leaks, and it helps reduce the number of processes when the server load drops.

    The first reason is the most crucial for mod_perl, since sloppy programming will cause a child process to consume more memory after each request. If left unbounded, then after a certain number of requests the children will use up all the available memory and leave the server to die from memory starvation. Note that sometimes standard system libraries leak memory too, especially on OSes with bad memory management (e.g. Solaris 2.5 on the x86 architecture). If this is your case you can set MaxRequestsPerChild to a small number, which will allow the system to reclaim the memory a greedy child process has consumed when it exits after MaxRequestsPerChild requests. But beware: if you set this number too low, you will lose some of the speed bonus you receive with mod_perl. Consider using Apache::PerlRun if this is the case. Also, setting MaxSpareServers to a number close to MaxClients will improve the response time (but your parent process will be busy respawning new children all the time!)
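    If you can measure roughly how much a child grows per request, a little arithmetic suggests an upper bound for MaxRequestsPerChild. The sketch below uses made-up numbers: a 6Mb starting size, 4Kb leaked per request, and a 10Mb bound. Real leaks are rarely this linear, so treat the result as a starting point for testing. requests_until_limit() is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical estimate: how many requests a leaking child can serve
# before it grows from its initial size to the chosen upper bound.
# All sizes in Kb. The leak rate is something you would measure yourself.
sub requests_until_limit {
    my ($start_kb, $limit_kb, $leak_per_request_kb) = @_;
    return 0 if $start_kb >= $limit_kb;
    die "leak must be positive\n" unless $leak_per_request_kb > 0;
    return int(($limit_kb - $start_kb) / $leak_per_request_kb);
}

# a child starting at 6Mb that leaks 4Kb per request, bounded at 10Mb
printf "MaxRequestsPerChild <= %d\n", requests_until_limit(6144, 10240, 4);
```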

    Another approach is to use Apache::SizeLimit (see Limiting the size of the processes). By using this module, you should be able to stop using MaxRequestsPerChild, although for some folks using both in combination does the job.

    See also Preload Perl modules at server startup and Sharing Memory.

    [TOC]


    Choosing MinSpareServers, MaxSpareServers and StartServers

    With mod_perl enabled, it might take as much as 30 seconds from the time you start the server until it is ready to serve incoming requests. This delay depends on the OS, the number of preloaded modules and the process load of the machine. So it's best to set StartServers and MinSpareServers to high numbers, so that if you get a high load just after the server has been restarted, the fresh servers will be ready to serve requests immediately. With mod_perl, it's usually a good idea to raise all 3 variables higher than normal. To maximize the benefits of mod_perl, you don't want to kill servers when they are idle. Rather, you want them to stay up and available to handle new requests immediately. I think an ideal configuration is to set MinSpareServers and MaxSpareServers to similar values, maybe even the same. Having MaxSpareServers close to MaxClients will completely use all of your resources (if MaxClients has been chosen to take full advantage of the resources), but it'll make sure that at any given moment your system will be capable of responding to requests with the maximum speed (given that the number of concurrent requests is not higher than MaxClients).

    Let's try some numbers. For a heavily loaded web site on a dedicated machine I would consider the following (400Mb is just an example):

      Available to webserver RAM:   400Mb
      Child's memory size bounded:  10Mb
      MaxClients:                   400/10 = 40 (larger with mem sharing)
      StartServers:                 20
      MinSpareServers:              20
      MaxSpareServers:              35
    

    However, if I want to use the machine for many other tasks but still make it capable of handling a high load, I'd consider:

      Available to webserver RAM:   400Mb
      Child's memory size bounded:  10Mb
      MaxClients:                   400/10 = 40
      StartServers:                 5
      MinSpareServers:              5
      MaxSpareServers:              10
    

    (These numbers are taken off the top of my head and shouldn't be used as a rule. Treat them as examples of some possible scenarios. Use this information wisely!)
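    To keep the scenarios above straight, you could generate the corresponding httpd.conf lines from the numbers. The helper below is hypothetical and only formats text; it does not talk to a running server. The example call uses the dedicated-machine numbers from above:

```perl
use strict;
use warnings;

# Hypothetical helper: turn the example numbers into httpd.conf directives.
# MaxClients is derived from the RAM budget and the per-child size bound.
sub server_pool_config {
    my (%p) = @_;
    my @directives = (
        [ MaxClients      => int($p{ram_mb} / $p{child_mb}) ],
        [ StartServers    => $p{start} ],
        [ MinSpareServers => $p{min_spare} ],
        [ MaxSpareServers => $p{max_spare} ],
    );
    my $conf = '';
    $conf .= sprintf "%-16s %d\n", @$_ for @directives;
    return $conf;
}

# the "heavily loaded, dedicated machine" scenario
print server_pool_config(ram_mb => 400, child_mb => 10,
                         start => 20, min_spare => 20, max_spare => 35);
```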

    [TOC]


    Summary of Benchmarking to tune all 5 parameters

    OK, we've run various benchmarks -- let's summarize the conclusions:

    • MaxRequestsPerChild

      If your scripts are clean and don't leak memory, set this variable to a number as large as possible (10000?). If you use Apache::SizeLimit, you can set this parameter to 0 (equal to infinity). You will want this parameter to be smaller if your code becomes unshared over the process' life.

    • StartServers

      If you keep a small number of servers active most of the time, keep this number low, especially if MaxSpareServers is also low, since Apache will then kill the freshly started servers before they were utilized at all (if there is no load). If your service is heavily loaded, make this number close to MaxClients (and keep MaxSpareServers equal to MaxClients as well).

    • MinSpareServers

      If your server performs other work besides web serving, make this low so the memory of unused children will be freed when there is no big load. If your server's load varies (you get loads in bursts) and you want fast response for all clients at any time, you will want to make it high, so that new children will be respawned in advance and be waiting to handle bursts of requests.

    • MaxSpareServers

      The logic is the same as for MinSpareServers: low if you need the machine for other tasks, high if it's a dedicated web host and you want a minimal response delay.

    • MaxClients

      Not too low, so you don't get into a situation where clients are waiting for the server to start serving them (they might wait, but not for too long). Do not set it too high either: under a high load, if all requests are immediately granted and served, your CPU will have a hard time keeping up, and if the child's size * number of running children is larger than the total available RAM, your server will start swapping (which will slow down everything, which in turn will make things even slower, until eventually your machine will die). It's important that you take pains to ensure that swapping does not normally happen. Swap space is an emergency pool, not a resource to be used on a consistent basis. If you are low on memory and you badly need it, buy it. Memory is amazingly cheap these days.

      But based on the test I conducted above, even if you have plenty of memory like I have (1Gb), increasing MaxClients sometimes will give you no speedup. The more clients are running, the more CPU time will be required and the fewer CPU time slices each process will receive. The response latency (the time to respond to a request) will grow, so you won't see the expected improvement. The best approach is to find the minimum requirement for your kind of service and the maximum capability of your machine. Then start at the minimum and test like I did, successively raising this parameter until you find the point on the latency and/or throughput curve where the improvement becomes smaller. Stop there and use it. Of course, once you use these parameters on a production server you will be able to tune them more precisely, since then you will see the real numbers. Also don't forget that if you add more scripts, or just modify the running ones, the parameters will most probably need to be recalculated, since the processes will grow in size as you compile in more code.

    [TOC]


    Persistent DB Connections

    Another popular use of mod_perl is to take advantage of its ability to maintain persistent open database connections. The basic approach is as follows:

      # Apache::Registry script
      -------------------------
      use strict;
      use vars qw($dbh);
      
      $dbh ||= SomeDbPackage->connect(...);
    

    Since $dbh is a global variable for the child, once the child has opened the connection it will use it over and over again, unless you perform disconnect().
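    To see why the ||= idiom connects only once per child, here is a plain-perl simulation. It assumes, as Apache::Registry does, that the script body is wrapped in a subroutine that is called once per request while package globals persist between calls. fake_connect() is a hypothetical stand-in for SomeDbPackage->connect(...), so no database is needed:

```perl
use strict;
use warnings;

# Hypothetical stand-in for SomeDbPackage->connect(...):
# it only counts how often it is called.
my $connect_calls = 0;
sub fake_connect { $connect_calls++; return { id => $connect_calls } }

# The script body as Apache::Registry would run it: once per request,
# in a process where package globals such as $dbh survive between calls.
use vars qw($dbh);
sub handler_body {
    $dbh ||= fake_connect();   # connect only if we have no handle yet
    return $dbh;
}

handler_body() for 1 .. 5;     # five requests served by the same child
print "connect() was called $connect_calls time(s)\n";
```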

    Be careful to use different names for the handles if you open connections to different databases!

    Apache::DBI allows you to make a persistent database connection. With this module enabled, every connect() request to the plain DBI module will be forwarded to the Apache::DBI module. It checks whether a database handle from a previous connect() request has already been opened, and whether this handle is still valid, using the ping method. If both conditions are fulfilled, it just returns the database handle. If there is no appropriate database handle or if the ping method fails, a new connection is established and the handle is stored for later re-use. There is no need to delete the disconnect() statements from your code. They will not do a thing, as the Apache::DBI module overloads the disconnect() method with a NOP. When a child exits there is no explicit disconnect: the child dies and so does the database connection. You may leave the use DBI; statement inside the scripts as well.
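    The caching logic can be sketched in a few lines of plain Perl. This is not Apache::DBI's actual code, only an illustration of the idea. FakeHandle is a hypothetical stand-in for a DBI handle, and the cache key is built from the connect arguments:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a DBI database handle.
package FakeHandle;
sub new        { my ($class) = @_; return bless { alive => 1 }, $class }
sub ping       { return $_[0]{alive} }
sub disconnect { return 1 }    # a no-op, as Apache::DBI makes it

package main;

# One cache per child process, keyed on the connect arguments.
my %cache;
my $opened = 0;
sub cached_connect {
    my ($dsn, $user, $pass) = @_;
    my $key = join "\0", $dsn, $user, $pass;
    my $h = $cache{$key};
    return $h if $h && $h->ping;             # valid cached handle: reuse it
    $opened++;
    return $cache{$key} = FakeHandle->new;   # otherwise open a new one
}

my $h1 = cached_connect('dbi:mysql:test', 'user', 'pw');
my $h2 = cached_connect('dbi:mysql:test', 'user', 'pw');   # reused
$h2->{alive} = 0;                                          # server dropped us
my $h3 = cached_connect('dbi:mysql:test', 'user', 'pw');   # reconnects
print "opened $opened connection(s)\n";
```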

    The usage is simple -- add to httpd.conf:

      PerlModule Apache::DBI
    

    It is important to load this module before any other DBI, DBD::* and ApacheDBI* modules!

      db.pl
      ------------
      use DBI;
      use strict;
      
      my $dbh = DBI->connect( 'DBI:mysql:database', 'user', 'password',
                              { AutoCommit => 0 }   # attribute names are case-sensitive
                            ) || die $DBI::errstr;
      
      ...rest of the program
    

    [TOC]


    Preopening Connections at the Child Process' Fork Time

    If you use DBI for DB connections and Apache::DBI to make them persistent, Apache::DBI also allows you to preopen a connection to the DB for each child with the connect_on_init() method. This saves the connection overhead on the very first request of every child.

      use Apache::DBI ();
      Apache::DBI->connect_on_init("DBI:mysql:test",
                                   "login",
                                   "passwd",
                                   {
                                    RaiseError => 1,
                                    PrintError => 0,
                                    AutoCommit => 1,
                                   }
                                  );
    

    This can be used as a simple way to have Apache children establish connections on server startup. This call should be in a startup file require()d by PerlRequire, or inside a <Perl> section. It will establish a connection in each child process as that child is started. See the Apache::DBI manpage for the requirements of this method.

    [TOC]


    Caching prepare() statements

    You can also benefit from persistent connections by replacing prepare() with prepare_cached(). That way you will always be sure that you have a good statement handle, and you will get some caching benefit. The downside is that every call to prepare_cached() still costs a cache lookup (a hash lookup keyed on the SQL text), even though the statement itself is prepared only the first time.
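    The idea behind prepare_cached() can be illustrated with a plain hash. This is a sketch only; DBI's real implementation keeps its own per-handle cache. fake_prepare() is a hypothetical stand-in for an expensive prepare():

```perl
use strict;
use warnings;

# Hypothetical stand-in for an expensive prepare(): counts its calls.
my $prepares = 0;
sub fake_prepare { $prepares++; return { sql => $_[0] } }

# The expensive prepare runs once per distinct SQL string; every later
# call pays only for the hash lookup.
my %statement_cache;
sub my_prepare_cached {
    my ($sql) = @_;
    return $statement_cache{$sql} ||= fake_prepare($sql);
}

my_prepare_cached('SELECT foo FROM bar WHERE baz = ?') for 1 .. 10;
print "prepared $prepares time(s) for 10 calls\n";
```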

    Be warned that some databases don't support caching of prepared plans (e.g. PostgreSQL and Sybase). With Sybase, though, you could open multiple connections to achieve the same result (at the risk of getting deadlocks, depending on what you are trying to do!)

    [TOC]


    Handling Timeouts

    Another problem is with timeouts: some databases disconnect the client after a certain period of inactivity. This problem is known as the morning bug. The ping() method ensures that this will not happen. Some DBD drivers don't have this method; check the Apache::DBI manpage to see how to write a ping() method.

    Another approach is to change the client's connection timeout. For mysql users, starting from mysql-3.22.x you can set a wait_timeout option at mysqld server startup to change the default value. The value is given in seconds, so setting it to 36 hours (129600 seconds) would probably fix the timeout problem.

    [TOC]


    Jeff's guide to mod_perl database performance

    [TOC]


    Analysis of the Problem

    A common web application architecture is one or more application servers which handle requests from client browsers by consulting one or more database servers and performing a transform on the data. When an application must consult the database on every request, the interaction with the database server becomes the central performance issue. Spending a bit of time optimizing your database access can result in significant application performance improvements. In this analysis, a system using Apache, mod_perl, DBI, and Oracle will be considered. The application server uses Apache and mod_perl to service client requests, and DBI to communicate with a remote Oracle database.

    In the course of servicing a typical client request, the application server must retrieve some data from the database and execute a stored procedure. There are several steps that need to be done to complete the request:

     1: Connect to the database server
     2: Prepare a SQL SELECT statement
     3: Execute the SELECT statement
     4: Retrieve the results of the SELECT statement
     5: Release the SELECT statement handle
     6: Prepare a PL/SQL stored procedure call
     7: Execute the stored procedure
     8: Release the stored procedure statement handle
     9: Commit or rollback
     10: Disconnect from the database server
    

    In this document, an application will be described which achieves maximum performance by eliminating some of the steps above and optimizing others.

    [TOC]


    Optimizing Database Connections

    A naive implementation would perform steps 1 through 10 from above on every request. A portion of the source code might look like this:

      # ...
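      # RaiseError makes failures die inside the eval below, and
      # AutoCommit => 0 makes the commit/rollback calls meaningful
      my $dbh = DBI->connect('dbi:Oracle:host', 'user', 'pass',
                             { RaiseError => 1, AutoCommit => 0 })
            || die $DBI::errstr;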
      
      my $baz = $r->param('baz');
      
      eval {
            my $sth = $dbh->prepare(qq{
                    SELECT foo 
                      FROM bar 
                     WHERE baz = $baz
            });
            $sth->execute;
      
            while (my @row = $sth->fetchrow_array) {
                    # do HTML stuff
            }
            
            $sth->finish;
      
            my $sph = $dbh->prepare(qq{
                    BEGIN
                            my_procedure(
                                    arg_in => $baz
                            );
                    END;
            });
            $sph->execute;
            $sph->finish;
            
            $dbh->commit;
      };
      if ($@) {
            $dbh->rollback;
      }
      
      $dbh->disconnect;
      # ...
    

    In practice, such an implementation would have hideous performance problems. The majority of the execution time of this program would likely be spent connecting to the database. An examination shows that step 1 comprises many smaller steps:

     1: Connect to the database server
     1a: Build client-side data structures for an Oracle connection
     1b: Look up the server's alias in a file
     1c: Look up the server's hostname
     1d: Build a socket to the server
     1e: Build server-side data structures for this connection
    

    The naive implementation waits for all of these steps to happen, and then throws away the database connection when it is done! This is obviously wasteful, and easily rectified. The best solution is to hoist the database connection step out of the per-request lifecycle so that more than one request can use the same database connection. This can be done by connecting to the database server once, and then not disconnecting until the Apache child process exits. The Apache::DBI module does this transparently and automatically with little effort on the part of the programmer.

    Apache::DBI intercepts calls to DBI's connect and disconnect methods and replaces them with its own. Apache::DBI caches database connections when they are first opened, and it ignores disconnect commands. When an application tries to connect to the same database, Apache::DBI returns a cached connection, thus saving the significant time penalty of repeatedly connecting to the database. You will find a full treatment of Apache::DBI in Persistent DB Connections.

    When Apache::DBI is in use, none of the code in the example needs to change. The code is upgraded from naive to respectable with the use of a simple module! The first and biggest database performance problem is quickly dispensed with.

    [TOC]


    Utilizing the Database Server's Cache

    Most database servers, including Oracle, utilize a cache to improve the performance of recently seen queries. The cache is keyed on the SQL statement. If a statement is identical to a previously seen statement, the execution plan for the previous statement is reused. This can be a considerable improvement over building a new statement execution plan.

    Our respectable implementation from the last section is not making use of this caching ability. It is preparing the statement:

      SELECT foo FROM bar WHERE baz = $baz
    

    The problem is that $baz is being read from an HTML form, and is therefore likely to change on every request. When the database server sees this statement, it is going to look like:

      SELECT foo FROM bar WHERE baz = 1
    

    and on the next request, the SQL will be:

      SELECT foo FROM bar WHERE baz = 42
    

    Since the statements are different, the database server will not be able to reuse its execution plan, and will proceed to make another one. This defeats the purpose of the SQL statement cache.

    The application server needs to make sure that SQL statements which are the same look the same. The way to achieve this is to use placeholders and bound parameters. The placeholder is a blank in the SQL statement, which tells the database server that the value will be filled in later. The bound parameter is the value which is inserted into the blank before the statement is executed.

    With placeholders, the SQL statement looks like:

      SELECT foo FROM bar WHERE baz = :baz
    

    Regardless of whether baz is 1 or 42, the SQL always looks the same, and the database server can reuse its cached execution plan for this statement. This technique has eliminated the execution plan generation penalty from the per-request runtime. The potential performance improvement from this optimization could range from modest to very significant.
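    A quick way to see the difference is to count distinct statement texts, which is what a cache keyed on the SQL string sees. This is a plain-perl simulation with made-up request values; no database is involved:

```perl
use strict;
use warnings;

# Simulate a plan cache keyed on the exact SQL text, fed by the statements
# that three requests (hypothetical baz values) would produce.
my @requests = (1, 42, 7);

my (%plans_interpolated, %plans_placeholder);
for my $baz (@requests) {
    $plans_interpolated{"SELECT foo FROM bar WHERE baz = $baz"}++;
    $plans_placeholder{'SELECT foo FROM bar WHERE baz = :baz'}++;
}

printf "interpolated: %d distinct statements\n", scalar keys %plans_interpolated;
printf "placeholder:  %d distinct statement\n",  scalar keys %plans_placeholder;
```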

    Here is the updated code fragment which employs this optimization:

      # ...
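      # RaiseError makes failures die inside the eval below, and
      # AutoCommit => 0 makes the commit/rollback calls meaningful
      my $dbh = DBI->connect('dbi:Oracle:host', 'user', 'pass',
                             { RaiseError => 1, AutoCommit => 0 })
            || die $DBI::errstr;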
      
      my $baz = $r->param('baz');
      
      eval {
            my $sth = $dbh->prepare(qq{
                    SELECT foo 
                      FROM bar 
                     WHERE baz = :baz
            });
            $sth->bind_param(':baz', $baz);
            $sth->execute;
      
            while (my @row = $sth->fetchrow_array) {
                    # do HTML stuff
            }
            
            $sth->finish;
      
            my $sph = $dbh->prepare(qq{
                    BEGIN
                            my_procedure(
                                    arg_in => :baz
                            );
                    END;
            });
            $sph->bind_param(':baz', $baz);
            $sph->execute;
            $sph->finish;
            
            $dbh->commit;
      };
      if ($@) {
            $dbh->rollback;
      }
      # ...
    

    [TOC]


    Eliminating SQL Statement Parsing

    The example program has certainly come a long way and the performance is now probably much better than that of the first revision. However, there is still more speed that can be wrung out of this server architecture. The last bottleneck is in SQL statement parsing. Every time DBI's prepare() method is called, DBI parses the SQL command looking for placeholder strings, and does some housekeeping work. Worse, a context has to be built on the client and server sides of the connection which the database will use to refer to the statement. These things take time, and by eliminating these steps the time can be saved.

    To get rid of the statement handle construction and statement parsing penalties, we could use DBI's prepare_cached() method. This method compares the SQL statement to others that have already been executed. If there is a match, the cached statement handle is returned. But the application server is still spending time calling an object method (very expensive in Perl), and doing a hash lookup. Both of these steps are unnecessary, since the SQL is very likely to be static and known at compile time. The smart programmer can take advantage of these two attributes to gain better database performance. In this example, the database statements will be prepared immediately after the connection to the database is made, and they will be cached in package scalars to eliminate the method call.

    What is needed is a routine that will connect to the database and prepare the statements. Since the statements are dependent upon the connection, the integrity of the connection needs to be checked before using the statements, and a reconnection should be attempted if needed. Since the routine presented here does everything that Apache::DBI does, it does not use Apache::DBI and therefore has the added benefit of eliminating a cache lookup on the connection.

    Here is an example of such a package:

      package My::DB;
      
      use strict;
      use DBI;
      
      sub connect {
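            # Reuse the cached handle only if ping() says it is alive.
            # ping() normally returns false (rather than dying) on a dead
            # connection, so test its return value, not just $@.
            if (defined $My::DB::conn and eval { $My::DB::conn->ping }) {
                    return $My::DB::conn;
            }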
      
            $My::DB::conn = DBI->connect(
                    'dbi:Oracle:server', 'user', 'pass', {
                            PrintError => 1,
                            RaiseError => 1,
                            AutoCommit => 0
                    }
            ) || die $DBI::errstr; #Assume application handles this
      
            $My::DB::select = $My::DB::conn->prepare(q{
                    SELECT foo
                      FROM bar
                     WHERE baz = :baz
            });
            
            $My::DB::procedure = $My::DB::conn->prepare(q{
                    BEGIN
                            my_procedure(
                                    arg_in => :baz
                            );
                    END;
            });
      
            return $My::DB::conn;
      }
      
      1;
    

    Now the example program needs to be modified to use this package.

      # ...
      my $dbh = My::DB->connect;
      
      my $baz = $r->param('baz');
      
      eval {
            my $sth = $My::DB::select;
            $sth->bind_param(':baz', $baz);
            $sth->execute;
      
            while (my @row = $sth->fetchrow_array) {
                    # do HTML stuff
            }
      
            my $sph = $My::DB::procedure;
            $sph->bind_param(':baz', $baz);
            $sph->execute;
             
            $dbh->commit;
      };
      if ($@) {
            $dbh->rollback;
      }
      # ...
    

    Notice that several improvements have been made. Since the statement handles have a longer life than the request, there is no need for each request to prepare the statement, and no need to call the statement handle's finish method. Since Apache::DBI and the prepare_cached() method are not used, no cache lookups are needed.

    [TOC]


    Conclusion

    The number of steps needed to service the request in the example system has been reduced significantly. In addition, the hidden cost of building and tearing down statement handles and of creating query execution plans is removed. Compare the new sequence with the original:

     1: Check connection to database
     2: Bind parameter to SQL SELECT statement
     3: Execute SELECT statement
     4: Fetch rows
     5: Bind parameters to PL/SQL stored procedure
     6: Execute PL/SQL stored procedure
     7: Commit or rollback
    

    It is probably possible to optimize this example even further, but I have not tried. It is very likely that the time could be better spent improving your database indexing scheme or web server buffering and load balancing. If there are any suggestions for further optimization of the application-database interaction, please mail them to me at jwb@cp.net.

    Jeffrey Baker, 4 October 1999

    [TOC]


    Using $|=1 under mod_perl and better print() techniques.

    As you know, local $|=1; disables the buffering of the currently selected file handle (STDOUT by default). If you enable it, ap_rflush() is called after each print(), unbuffering Apache's IO.
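    Because $| is a perl special variable, local restores its previous value when the enclosing block exits. The small plain-perl check below runs outside mod_perl, so it shows only the perl side and not ap_rflush():

```perl
use strict;
use warnings;

my $before = $|;       # normally 0: STDOUT is buffered by default
my $inside;
{
    local $| = 1;      # autoflush on: each print() is flushed at once
    $inside = $|;
}
my $after = $|;        # local restored the previous value on block exit
print "before=$before inside=$inside after=$after\n";
```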

    If you generate output in a _bad_ style, with many separate print() calls, you will experience a degradation in performance. The severity depends on the number of calls you make.

    Many old CGIs were written in the style of:

      print "<BODY BGCOLOR=\"black\" TEXT=\"white\">";
      print "<H1>";
      print "Hello";
      print "</H1>";
      print "<A HREF=\"foo.html\"> foo </A>";
      print "</BODY>";
    

    This reveals the following drawbacks. The multiple print() calls degrade performance when $|=1. The backslashism makes the code less readable and makes it harder to format the HTML so that the CGI's output is easily readable. The code below solves them all:

      print qq{
        <BODY BGCOLOR="black" TEXT="white">
          <H1>
            Hello
          </H1>
          <A HREF="foo.html"> foo </A>
        </BODY>
      };
    

    I guess you see the difference. Be careful, though, when printing an <HTML> tag. The correct way is:

      print qq{<HTML>
        <HEAD></HEAD>
        <BODY>
      };
    

    If you try the following:

      print qq{
        <HTML>
        <HEAD></HEAD>
        <BODY>
      };
    

    Some older browsers might not accept the output as HTML, and might print it as plain text instead. These browsers expect the first characters after the headers and the empty line to be <HTML>, not spaces and/or an additional newline followed by <HTML>. Even if it works with your browser, it might not work for others.
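    You can check what each variant actually sends by looking at its first characters. In the plain-perl sketch below, the last lines show one way to keep the indented layout while still starting the output with <HTML>:

```perl
use strict;
use warnings;

my $good = qq{<HTML>
  <HEAD></HEAD>
  <BODY>
};
my $bad = qq{
  <HTML>
  <HEAD></HEAD>
  <BODY>
};

# Does each variant start with <HTML> right away?
for my $pair ([good => $good], [bad => $bad]) {
    my ($name, $html) = @$pair;
    printf "%-4s starts with <HTML>: %s\n",
        $name, ($html =~ /\A<HTML>/ ? "yes" : "no");
}

# To keep the indented layout, strip the leading whitespace before printing.
(my $fixed = $bad) =~ s/\A\s+//;
print "fixed starts with <HTML>: ", ($fixed =~ /\A<HTML>/ ? "yes" : "no"), "\n";
```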

    Now let's go back to the $|=1 topic. I still disable buffering, for 2 reasons: I use few print() calls, printing out multiline HTML rather than a line per print(), and I want my users to see the output immediately. So if I am about to produce the results of a DB query, which might take some time to complete, I want users to get some titles ahead of time. This improves the usability of my site. Ask yourself which you prefer: getting the output a bit slower, but steadily from the moment you've pressed the Submit button, or watching the "falling stars" for a while and then receiving the whole output at once, perhaps a few milliseconds faster (if the client (browser) has not timed out by then).

    An even better solution is to keep buffering enabled and use the Perl API rflush() call to flush the buffers when needed. This way you can accumulate the top of the page in the buffer and flush it just before you start some lengthy operation, like a DB query. You kill two birds with one stone: you show some of the data to the user immediately, so the user feels that something is actually happening, and you take almost no performance hit from disabled buffering.

      use CGI ();
      my $r = shift;              # Apache::Registry passes the request object
      my $q = CGI->new;
      print $q->header('text/html');
      print $q->start_html;
      print $q->p("Searching...Please wait");
      $r->rflush;                 # send what we have so far to the client
      # imitate a lengthy operation
      for (1..5) {
        sleep 1;
      }
      print $q->p("Done!");
      print $q->end_html;
    

    Conclusion: Do not blindly follow suggestions, but think what is best for you in every given case.
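    Outside mod_perl you can observe the same buffer-then-flush behaviour with a plain perl filehandle. The small sketch below calls IO::Handle's flush() on a temporary file. The temporary file stands in for the client connection, which is an assumption of this demo; under mod_perl you would call $r->rflush instead:

```perl
use strict;
use warnings;
use IO::Handle;                 # adds the flush() method to filehandles
use File::Temp qw(tempfile);

my $msg = "<p>Searching...Please wait</p>\n";
my ($fh, $filename) = tempfile(UNLINK => 1);

print $fh $msg;                           # sits in perl's buffer for now
my $size_before = (-s $filename) || 0;    # typically 0 bytes on disk
$fh->flush;                               # push the buffer out right now
my $size_after  = (-s $filename) || 0;    # now the whole message is there

print "on disk before flush: $size_before bytes, after: $size_after bytes\n";
close $fh;
```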

    [TOC]


    More Reducing Memory Usage Tips

    One of the important issues in improving performance is reducing memory usage. The less memory each server uses, the more server processes you can start, and thus the better the performance you get (from the user's point of view, the response speed).

    See Global vs Fully Qualified Variables

    See Memory "leakages"

    [TOC]


    Code Profiling

    Profiling helps you determine which subroutines or code snippets take the longest to execute and which subroutines are called most often. These are probably the ones you will want to optimize.

    Let's write some code to mess with:

    META: build a hash and sort it by value, key... then rewrite the comparison subroutine to use the Schwartzian transform.. and more

    Think about some more web oriented examples...!

      push @list, int rand(100) for 1..1000;
    

      sub mysort {
        map ...
      }
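    The snippet above is still a stub. The following is a hedged sketch of the kind of code the META note has in mind, with a made-up key function and data. It compares a sort whose comparator recomputes an expensive key with a Schwartzian transform that computes each key only once:

```perl
use strict;
use warnings;

# Data to sort: 1000 random integers, as in the snippet above.
my @list;
push @list, int rand(100) for 1 .. 1000;

# Hypothetical "expensive" key. Zero-padding makes string order match
# numeric order, so cmp can be used below.
sub expensive_key { my ($n) = @_; return sprintf "%05d", $n * $n }

# Naive version: the key function runs inside the comparator,
# i.e. on the order of n*log(n) times.
sub naive_sort { return sort { expensive_key($a) cmp expensive_key($b) } @_ }

# Schwartzian transform: compute each key once, sort on it, strip it off.
sub st_sort {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ expensive_key($_), $_ ] } @_;
}

my @by_naive = naive_sort(@list);
my @by_st    = st_sort(@list);
print "same order: ", ("@by_naive" eq "@by_st" ? "yes" : "no"), "\n";
```

    You could save this as, say, sort.pl (a hypothetical name) and profile it with perl -d:DProf sort.pl followed by dprofpp, the same way diagnostics.pl is profiled below.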
    

    META: remove all the diagnostics section below it's irrelevant here. (just reuse the explanations)

    In the diagnostics pragma section, I showed that leaving it in production code is a bad idea, as it significantly slows down execution. We verified that using the Benchmark module. Now let's see how to use a profiler to find which subroutines diagnostics spends most of its time in. Once those are spotted, it could be a good idea to rewrite that specific code to make it more efficient. We won't optimize the code here, since that is out of the scope of this document, and since this is a core Perl module, chances are that it's already optimized.

    If you wonder why, Devel::DProf can help us find out. Let's use this code:

      diagnostics.pl
      --------------
      use diagnostics;
      test_code();
      sub test_code{
        for my $i (1..10) {
          my $j = $i**2;
        }
        $a = "Hi"; 
        $b = "Bye";
        if ($a == $b) {
          $c = $a;
        }
      }
    

    Run it with the profiler enabled, and then create the profiling statistics with the help of dprofpp:

      % perl -d:DProf diagnostics.pl
      % dprofpp
    

      Total Elapsed Time = 0.993458 Seconds
        User+System Time = 0.933458 Seconds
      Exclusive Times
      %Time ExclSec CumulS #Calls sec/call Csec/c  Name
       81.5   0.761  0.932      1   0.7610 0.9319  main::BEGIN
       12.8   0.120  0.101   3161   0.0000 0.0000  diagnostics::unescape
       6.43   0.060  0.060      2   0.0300 0.0300  diagnostics::BEGIN
       2.14   0.020  0.020      3   0.0067 0.0067  diagnostics::transmo
       1.07   0.010  0.010      2   0.0050 0.0050  Config::FETCH
       0.00   0.000 -0.000      2   0.0000      -  Exporter::import
       0.00   0.000 -0.000      2   0.0000      -  Exporter::export
       0.00   0.000 -0.000      1   0.0000      -  Config::BEGIN
       0.00   0.000 -0.000      1   0.0000      -  diagnostics::import
       0.00   0.000  0.020      3   0.0000 0.0066  diagnostics::warn_trap
       0.00   0.000  0.020      3   0.0000 0.0066  diagnostics::splainthis
       0.00   0.000 -0.000      1   0.0000      -  Config::TIEHASH
       0.00   0.000 -0.000      3   0.0000      -  diagnostics::shorten
       0.00   0.000 -0.000      3   0.0000      -  diagnostics::autodescribe
       0.00   0.000  0.010      1   0.0000 0.0099  main::test_code
    

    It's not easy to see what is responsible for this enormous overhead, even though main::BEGIN seems to account for most of the time. To get the whole picture we must look at the call tree, which shows us who calls whom, so we run:

      % dprofpp -T
    

    and the output is:

     main::BEGIN
       diagnostics::BEGIN
          Exporter::import
             Exporter::export
       diagnostics::BEGIN
          Config::BEGIN
          Config::TIEHASH
          Exporter::import
             Exporter::export
       Config::FETCH
       Config::FETCH
       diagnostics::unescape
       .....................
       [3159 calls to diagnostics::unescape snipped]
       .....................
       diagnostics::unescape
       diagnostics::import
     diagnostics::warn_trap
       diagnostics::splainthis
          diagnostics::transmo
          diagnostics::shorten
          diagnostics::autodescribe
     main::test_code
       diagnostics::warn_trap
          diagnostics::splainthis
             diagnostics::transmo
             diagnostics::shorten
             diagnostics::autodescribe
       diagnostics::warn_trap
          diagnostics::splainthis
             diagnostics::transmo
             diagnostics::shorten
             diagnostics::autodescribe
    

    So we see that 2 executions of diagnostics::BEGIN and 3161 of diagnostics::unescape are responsible for most of the running overhead.

    META: but we see that it might be run only once in mod_perl, so the numbers are better right? check it!

    If we comment out the diagnostics module, we get:

      Total Elapsed Time = 0.079974 Seconds
        User+System Time = 0.059974 Seconds
      Exclusive Times
      %Time ExclSec CumulS #Calls sec/call Csec/c  Name
       0.00   0.000 -0.000      1   0.0000      -  main::test_code
    

    It is possible to profile code running under mod_perl with the Devel::DProf module, available on CPAN. However, you must have Apache version 1.3b3 or higher, and the PerlChildExitHandler must be enabled when httpd is built. When the server is started, Devel::DProf installs an END block that writes the tmon.out file; this block is called at server shutdown. Here is how to start and stop a server with the profiler enabled:

      % setenv PERL5OPT -d:DProf
      % httpd -X -d `pwd` &
      ... make some requests to the server here ...
      % kill `cat logs/httpd.pid`
      % unsetenv PERL5OPT
      % dprofpp
    

    The Devel::DProf package is a Perl code profiler. It collects information on the execution time of a Perl script and of the subroutines in that script. Note that built-in functions such as print() and map() are not subroutines, so DProf does not report them separately: their time is charged to the code that calls them.

    Another approach is to use Apache::DProf, which hooks Devel::DProf into mod_perl. The Apache::DProf module runs a Devel::DProf profiler inside each child server and writes the tmon.out file into the directory $ServerRoot/logs/dprof/$$ when the child shuts down (where $$ is the PID of the child process). All it takes is adding this to httpd.conf:

      PerlModule Apache::DProf
    

    Remember that any PerlHandler that was pulled in before Apache::DProf in httpd.conf or startup.pl will not have its code debugging information inserted. To run dprofpp, chdir to $ServerRoot/logs/dprof/$$ and run:

      % dprofpp
    

    [TOC]


    Object Methods Calls Versus Function Calls

    Which approach is more efficient: OOP methods or function calls? For example, CGI.pm allows you to work in both modes.

      use CGI;
      my $q = new CGI;
      $q->param('x',5);
      my $x = $q->param('x');
    

    versus

      use CGI qw(:standard);
      param('x',5);
      my $x = param('x');
    

    As usual, let's benchmark and compare:

      meth_vs_func.pl
      ---------------
      #!/usr/bin/perl
      use strict;
      use Benchmark;
      
      use CGI qw(:standard);
      $CGI::NO_DEBUG = 1;
      my $q = new CGI;
      my $x;
      timethese
        (20000, 
         {
          'Method'   => sub {$q->param('x',5); $x = $q->param('x'); },
          'Function' => sub {param('x',5); $x = param('x');},
         });
    

    The benchmark is written in such a way that all the initialization is done at the beginning, so we measure only the calls themselves. Let's run it:

      % ./meth_vs_func.pl
      
      Function: 29 wallclock secs (25.19 usr +  0.13 sys = 25.32 CPU)
        Method: 28 wallclock secs (22.94 usr +  0.10 sys = 23.04 CPU)
    

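    We compare the total CPU times rather than the wallclock seconds: the load on the system may have differed between the two tests, so the wallclock times are the wrong numbers to base our conclusions on.

    Notice that in this run the method calls actually took about 9% less CPU time than the function calls (23.04 vs. 25.32 CPU seconds). This is a quirk of CGI.pm: each of its routines can be called both as a method and as a plain function, so every call first checks whether its first argument is a CGI object and, when it isn't, substitutes a default object. That extra work on the function path masks the small cost of the method call. In general, though, a method call is slightly slower than a plain function call, because Perl has to resolve the method at run time: find the object's class and look the method up in it (and, if necessary, in its parent classes).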
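    To compare a method call with a plain function call outside of CGI.pm, here is a minimal sketch. The class My::Point and the function point_x() are hypothetical, made up only for this comparison; both do exactly the same work, so any difference comes from how they are called. Perl caches method lookups, so expect the gap to be small, and your numbers will vary from run to run.

      ```perl
      use strict;
      use warnings;
      use Benchmark;    # exports timethese()

      # Hypothetical minimal class: the object is a hash with one field.
      package My::Point;
      sub new {
          my ($class, %args) = @_;
          my $self = { %args };
          return bless $self, $class;
      }
      sub x { return $_[0]->{x} }          # accessor, called as a method

      package main;

      # The same accessor written as a plain function that takes the object.
      sub point_x { return $_[0]->{x} }

      my $p = My::Point->new(x => 5);

      timethese(200_000, {
          'Method'   => sub { my $x = $p->x },        # resolved through the object's class
          'Function' => sub { my $x = point_x($p) },  # called directly, no method lookup
      });
      ```

    As in the CGI.pm benchmark, compare the CPU times rather than the wallclock seconds.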

    If you keep the data object in a package global variable, as CGI.pm does for its function interface, you also save a little time because you don't have to pass it to the function: one parameter less to pass, fewer stack operations, less time to get to the guts of the function.

    But for most of us this small overhead is insignificant compared to the benefits the object-oriented approach brings when we have a big project to take care of, and with big projects it's much easier to use the object-oriented approach.

    In addition, there is a real memory cost when you import all of the functions into your process's memory. This can significantly enlarge memory requirements, particularly when there are many child processes.

    Besides polluting the namespace, every script that imports symbols from a module grows by the space allocated for those symbols. The more you import (e.g. qw(:all) vs. qw(:standard)), the more memory is used. Let's say the overhead is X bytes. Now take the number of scripts that use the function interface; call it Y. Finally, say you have Z processes.

    You will need X*Y*Z bytes of additional memory. Taking X=10k, Y=10 and Z=30, we get 10k*10*30 = 3Mb! Now you understand the difference.
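    The estimate is simple multiplication; here is a small sketch of it as a function, so you can plug in your own numbers. The name import_overhead() is made up for this example, and the inputs are the figures used above (X=10k bytes, Y=10 scripts, Z=30 processes), taking "k" as 1000 bytes.

      ```perl
      use strict;
      use warnings;

      # Extra memory = X * Y * Z, where
      #   X - bytes one script grows by when it imports the symbols,
      #   Y - number of scripts that import them,
      #   Z - number of server processes.
      # import_overhead() is a hypothetical helper for this estimate.
      sub import_overhead {
          my ($x_bytes, $scripts, $processes) = @_;
          return $x_bytes * $scripts * $processes;
      }

      my $bytes = import_overhead(10_000, 10, 30);
      printf "%d bytes (about %.1f MB)\n", $bytes, $bytes / 1_000_000;
      ```

    This prints 3000000 bytes (about 3.0 MB), matching the 3Mb figure above.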

    Let's measure CGI.pm's memory usage with GTop.pm, first without importing anything:

      use GTop ();
      use CGI ();
      print GTop->new->proc_mem($$)->size;
    

      1,949,696
    

    Now importing a few dozen symbols:

      use GTop ();
      use CGI qw(:standard);
      print GTop->new->proc_mem($$)->size;
    

      1,966,080
    

    And finally importing all the symbols (about 130):

      use GTop ();
      use CGI qw(:all);
      print GTop->new->proc_mem($$)->size;
    

      1,970,176
    

    Results:

      imported symbols  size (bytes)  delta vs. () (bytes)
      ----------------------------------------------------
      ()                     1949696                     0
      qw(:standard)          1966080                 16384
      qw(:all)               1970176                 20480
    

    So in my example above X=20k, giving 20k*10*30 = 6Mb: you will need 6Mb more when importing all of CGI.pm's symbols than when importing none.

    And in practice you usually have more scripts and more processes, and you probably import more symbols from the other modules you deploy.

    On the other hand, as explained above, a plain function call is in general slightly faster than a method call, because of the time it takes to resolve the method through the object's class.

    If you are optimizing for performance, consider calling functions by their fully qualified names, e.g. My::Module::my_method(), instead of importing them. You get the speed of a plain function call without the memory cost of importing, and if the function needs an object, you can still pass it explicitly as an argument.
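    You can watch a symbol table grow with imports without any extra tools. The sketch below counts the entries in the main:: symbol table before and after an import. It uses the core POSIX module instead of CGI.pm (an assumption made so the example runs anywhere), and it counts symbols rather than bytes; measuring the bytes still needs something like GTop.

      ```perl
      use strict;
      use warnings;
      use POSIX ();   # load POSIX, but import nothing yet

      # Count the entries in main::'s symbol table before and after importing.
      my $before = scalar keys %main::;
      POSIX->import;                  # import POSIX's default export list
      my $after  = scalar keys %main::;

      print "POSIX->import added ", $after - $before, " symbols to main::\n";
      ```

    Every one of those new entries is allocated in each process that runs the importing code.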
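    Here is a minimal sketch of the fully qualified style. It again uses the core POSIX module as a stand-in for "My::Module", and the My::Counter package with its bump() function is hypothetical, included only to show an object being passed explicitly to a plain function.

      ```perl
      use strict;
      use warnings;
      use POSIX ();   # empty import list: nothing is added to main::

      # Call the function by its fully qualified name instead of importing it.
      my $rounded = POSIX::floor(3.7);
      print "floor(3.7) = $rounded\n";

      # Hypothetical package: bump() takes the object as an ordinary argument.
      package My::Counter;
      sub bump { my $self = shift; return ++$self->{count} }

      package main;
      my $counter = bless { count => 0 }, 'My::Counter';
      My::Counter::bump($counter);    # plain function call, no method lookup
      print "count = $counter->{count}\n";
      ```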

    I strongly endorse Apache::Request (libapreq), the Generic Apache Request Library. Its guts are written in C, giving it a significant memory and performance benefit. It has all the functionality of CGI.pm except the HTML generation functions.

    [TOC]


    Sending plain HTML as a compressed output

    See Apache::GzipChain - compress HTML (or anything) in the OutputChain

    [TOC]


    Increasing the shared memory with mergemem

    mergemem is an experimental utility for Linux, which looks *very* interesting for us mod_perl users:

            http://mondoshawan.ml.org/mergemem/
    

    It looks like it could be run periodically on your server to find and merge duplicate pages. There are caveats: it would halt your httpds during the merge (it appears to be very fast, but still ...).

    This software comes with a utility called memcmp to tell you how much you might save.

    If you have tried this utility, please let us know what you think about it. Thanks!

    [TOC]


    The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
    Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me to improve this guide. If you have something to contribute please send it directly to me.

    Written by Stas Bekman.
    Last Modified at 12/18/1999
    Use of the Camel for Perl is
    a trademark of O'Reilly & Associates,
    and is used by permission.
    mod_perl guide: Perl Reference

    Perl Reference






    [TOC]


    A must read!

    This new chapter was born because some users are reluctant to learn Perl before jumping into mod_perl. I will try to cover some of the most frequent pure Perl questions asked on the mailing list.

    Update: I'm moving most of the pure Perl related topics from everywhere in the Guide to this chapter. From now on other chapters will refer to sections in this chapter if required.

    Before you decide to skip this chapter make sure you know all the information provided here. The rest of the Guide assumes that you read this chapter and understood it.

    [TOC]


    Warnings Explained

    Meta: Rewrite this section

    Sometimes it's very hard to understand what a warning is complaining about. You see the source code, but you cannot understand why some specific snippet produces that warning. The mystery often results from the fact that the code can be called from different places.

    Here is an example:

      local $^W=1;
      good();
      bad();
      
      sub good{
        print_value("Perl");
      }
      
      sub bad{
        print_value();
      }
      
      sub print_value{
        my $var = shift;
        print "My value is $var\n";
      }
    

    In the code above, print_value() prints the value passed to it, good() passes the value correctly, and bad() forgets to pass it. When we run the script, we get the warning:

      Use of uninitialized value at ./warning.pl line 15.
    

    We can see the undefined variable $var at the line that attempts to print it:

      print "My value is $var\n";
    

    But how do we know why it is undefined? The solution is quite simple. What we need is a full stack trace, triggered by the warning.

    The Carp module comes to our aid with its cluck() function. Let's modify the script by adding a couple of lines. The rest of the script is unchanged.

      use Carp ();
      local $SIG{__WARN__} = \&Carp::cluck;
      
      local $^W=1;
      good();
      bad();
      
      sub good{
        print_value("Perl");
      }
      
      sub bad{
        print_value();
      }
      
      sub print_value{
        my $var = shift;
        print "My value is $var\n";
      }
    

    Now when we execute it, we see:

      Use of uninitialized value at /home/httpd/perl/book/warning.pl line 18.
      Apache::ROOT::perl::book::warning_2epl::print_value() 
        called at /home/httpd/perl/book/warning.pl line 13
      Apache::ROOT::perl::book::warning_2epl::bad() 
        called at /home/httpd/perl/book/warning.pl line 6
      Apache::ROOT::perl::book::warning_2epl::handler('Apache=SCALAR(0x84b1154)') 
        called at /usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Registry.pm line 139
      eval {...} called at 
        /usr/lib/perl5/site_perl/5.005/i386-linux/Apache/Registry.pm line 139
      Apache::Registry::handler('Apache=SCALAR(0x84b1154)') 
        called at PerlHandler subroutine `Apache::Registry::handler' line 0
      eval {...} called at PerlHandler subroutine `Apache::Registry::handler' line 0
    

    Take a moment to understand the trace. The only part that we are interested in is the one that starts when our script is being called, so we can skip the Apache::Registry trace part. So we are left with:

      Use of uninitialized value at /home/httpd/perl/book/warning.pl line 18.
      Apache::ROOT::perl::book::warning_2epl::print_value() 
        called at /home/httpd/perl/book/warning.pl line 13
      Apache::ROOT::perl::book::warning_2epl::bad() 
        called at /home/httpd/perl/book/warning.pl line 6
    

    which tells us that the code that triggered the warning was:

      Apache::Registry code => bad() => print_value()
    

    We look at bad() and indeed see that we forgot to pass the variable. Of course, when you write a subroutine like print_value() it's a good idea to check the passed arguments before doing anything else. But the example was ``good'' enough to show you how to ease the debugging process.

    Sure, you say, I could find that problem by simple inspection of the code. You're right, but I promise you that the task becomes complicated and time consuming in code that is thousands of lines long.

    Notice the local() keyword in the second line that we added to our script, before setting $SIG{__WARN__}. Since %SIG is a global variable, forgetting to use local() leaves this setting in effect for all the scripts running in the same process. If that is the behaviour you want, for example on a development server, you should set it in a startup file, where you can easily switch this feature on and off.
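    Checking the arguments up front could look like this minimal sketch. It uses Carp::croak(), which reports the error from the caller's point of view, so the message points at the line in bad() that made the faulty call. The exact message text is an arbitrary choice.

      ```perl
      use strict;
      use warnings;
      use Carp ();

      sub print_value {
          my $var = shift;
          # Fail loudly, naming the caller's line, instead of printing undef.
          Carp::croak("print_value(): no value was passed") unless defined $var;
          print "My value is $var\n";
      }

      print_value("Perl");             # prints: My value is Perl
      eval { print_value() };          # dies inside print_value()
      print "Caught: $@" if $@;
      ```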
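    A startup file fragment for this might look like the sketch below. The $TRACE_WARNINGS switch is a hypothetical name; the point is that the handler is set without local(), so it stays in effect for every script the process runs, and one line turns it off.

      ```perl
      # startup.pl fragment (sketch)
      use strict;
      use warnings;
      use Carp ();

      # Hypothetical switch: set to 0 on a production server.
      my $TRACE_WARNINGS = 1;

      # No local() here: the handler stays installed for the whole process,
      # so every warning from every script comes with a full stack trace.
      $SIG{__WARN__} = \&Carp::cluck if $TRACE_WARNINGS;

      1;
      ```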

    As you have noticed, warnings report the line number of the script which caused the warning. Unfortunately, certain uses of the eval operator and ``here documents'' are known to throw off Perl's line numbering, so the line numbers are often incorrect. (See Finding the Line Number the Error/Warning has been Triggered at)

    While having warnings turned On is a must on a development server, you should turn them Off globally on a production server: if every CGI script generates just one warning per request and your server serves millions of requests per day, your log file will eat up all of your disk space and your system will die. My production servers have the following directive in httpd.conf:

        PerlWarn Off
    

    While we are talking about control flags, another and more important flag is -T which turns On Taint mode. Since this is a very broad topic I'll not discuss it here, but if you aren't forcing all your scripts to run under Taint mode you are looking for trouble from malicious users. To turn it On, add to httpd.conf:

      PerlTaintCheck On
    

    [TOC]


    Variables globally, lexically scoped and fully qualified

     META: complete
    

    Also see the clarification of my() vs. use vars - Ken Williams writes:

      Yes, there is quite a bit of difference!  With use vars(), you are
      making an entry in the symbol table, and you are telling the
      compiler that you are going to be referencing that entry without an
      explicit package name.
      
      With my(), NO ENTRY IS PUT IN THE SYMBOL TABLE.  The compiler
      figures out _at_ _compile_time_ which my() variables (i.e. lexical
      variables) are the same as each other, and once you hit execute time
      you can not go looking those variables up in the symbol table.
    

    And my() vs. local() - Randal Schwartz writes:

      local() creates a temporal-limited package-based scalar, array,
      hash, or glob -- when the scope of definition is exited at runtime,
      the previous value (if any) is restored.  References to such a
      variable are *also* global... only the value changes.  (Aside: that
      is what causes variable suicide. :)
      
      my() creates a lexically-limited non-package-based scalar, array, or
      hash -- when the scope of definition is exited at compile-time, the
      variable ceases to be accessible.  Any references to such a variable
      at runtime turn into unique anonymous variables on each scope exit.
    

    [TOC]


    Additional reading references

    For more information see: Using global variables and sharing them between modules/packages and an article by Mark-Jason Dominus about how Perl handles variables and namespaces, and the difference between use vars() and my() - http://www.plover.com/~mjd/perl/FAQs/Namespaces.html .

    [TOC]


    my() Scoped Variable in Nested Subroutines

    Before we proceed, let's make a healthy assumption: we want to develop the code under the strict pragma and avoid global variables, using my() scoped variables whenever possible.

    [TOC]


    The Poison

    Let's look at this code:

      nested.pl
      -----------
      #!/usr/bin/perl
      
      use strict;
      
      sub print_power_of_2 {
        my $x = shift;
      
        sub power_of_2 {
          return $x ** 2; 
        }
      
        my $result = power_of_2();
        print "$x^2 = $result\n";
      }
      
      print_power_of_2(5);
      print_power_of_2(6);
    

    Don't let the weird subroutine names fool you: print_power_of_2() should print the square (the power of two) of the number passed to it. Let's run the code and see whether it works:

      % ./nested.pl
      
      5^2 = 25
      6^2 = 25
    

    Ouch, something is wrong. Maybe there is a bug in Perl and it doesn't work correctly with the number 6? Let's try again using 5 and 7:

      print_power_of_2(5);
      print_power_of_2(7);
    

    And run it:

      % ./nested.pl
      
      5^2 = 25
      7^2 = 25
    

    Wow, does it work only for 5? How about using 3 and 5:

      print_power_of_2(3);
      print_power_of_2(5);
    

    and the result is:

      % ./nested.pl
      
      3^2 = 9
      5^2 = 9
    

    Now we start to understand: only the first call to the print_power_of_2() function works correctly. This makes us think that our code somehow remembers the result of the first execution and ignores the arguments of subsequent executions.

    [TOC]


    The Diagnosis

    Let's follow the guidelines and use a -w flag. Now execute the code:

      % ./nested.pl
      
      Variable "$x" will not stay shared at ./nested.pl line 9.
      5^2 = 25
      6^2 = 25
    

    We have never seen such a warning message before and we don't quite understand what it means. The diagnostics pragma will certainly help us. Let's add it before the strict pragma in our code:

      #!/usr/bin/perl -w
      
      use diagnostics;
      use strict;
    

    And execute it:

      % ./nested.pl
      
      Variable "$x" will not stay shared at ./nested.pl line 10 (#1)
        
        (W) An inner (nested) named subroutine is referencing a lexical
        variable defined in an outer subroutine.
        
        When the inner subroutine is called, it will probably see the value of
        the outer subroutine's variable as it was before and during the
        *first* call to the outer subroutine; in this case, after the first
        call to the outer subroutine is complete, the inner and outer
        subroutines will no longer share a common value for the variable.  In
        other words, the variable will no longer be shared.
        
        Furthermore, if the outer subroutine is anonymous and references a
        lexical variable outside itself, then the outer and inner subroutines
        will never share the given variable.
        
        This problem can usually be solved by making the inner subroutine
        anonymous, using the sub {} syntax.  When inner anonymous subs that
        reference variables in outer subroutines are called or referenced,
        they are automatically rebound to the current values of such
        variables.
        
      5^2 = 25
      6^2 = 25
    

    Well, now everything is clear. Our code has the inner subroutine power_of_2() and the outer subroutine print_power_of_2().

    When the inner power_of_2() subroutine is called for the first time, it sees the value of the outer print_power_of_2() subroutine's $x variable. On subsequent calls it keeps seeing the $x from the first call, no matter what value the outer subroutine's $x has now: the named inner subroutine is compiled only once, so it stays bound to the $x of the first call, while every later call to the outer subroutine gets a fresh $x. That's why the $x variable is no longer shared.

    [TOC]


    The Remedy

    The diagnostics pragma suggests using an anonymous subroutine (also known as a closure). Let's rewrite the code to use this technique instead:

      anonymous.pl
      --------------
      #!/usr/bin/perl
      
      use strict;
      
      sub print_power_of_2 {
        my $x = shift;
      
        my $func_ref = sub {
          return $x ** 2;
        };
      
        my $result = &$func_ref();
        print "$x^2 = $result\n";
      }
      
      print_power_of_2(5);
      print_power_of_2(6);
    

    Now $func_ref contains a reference to an anonymous function, which we use when we need to compute the power of two. Since the anonymous function is created afresh every time print_power_of_2() is called, it sees the current $x and gives the correct answer. Let's verify:

      % ./anonymous.pl
      
      5^2 = 25
      6^2 = 36
    

    Indeed, it worked correctly as advertised.

    [TOC]


    When You Cannot Get Rid of Inner Subroutine

    First you might wonder why in the world someone would need to define an inner subroutine. Suppose that, to reduce the startup overhead of Perl scripts, you decide to write a daemon that compiles the scripts and modules only once and keeps the precompiled code cached in memory. When a script needs to be executed, you just tell the daemon the name of the script to run and it does the rest.

    Seems like an easy task, and it is. The only problem is: once the script is compiled, how do you execute it? Or, put the other way: after it was executed for the first time and stays compiled in the daemon's memory, how do you call it again? If you could force developers to write their scripts so that each has a subroutine called run() that actually executes the code in the script, you would have half of the problem solved.

    But how does the daemon know which script to refer to, if they all run in the main:: namespace? An obvious solution is to ask the developers to declare a package in each and every script, with the package name derived from the script name. Moreover, since there may be more than one script with the same name residing in different directories, the directory has to be part of the package name to prevent namespace collisions. And don't forget that scripts can be moved from directory to directory, so you would have to make sure the package name is corrected every time a script moves.

    But why force these strange rules on developers, when our daemon can do this work itself? Every script the daemon is about to execute for the first time should be wrapped inside a package, whose name is constructed from the mangled path to the script, and inside a subroutine called run(). For example, if the daemon is about to execute the script /tmp/hello.pl:

      hello.pl
      --------
      #!/usr/bin/perl
      print "Hello\n";
    

    Prior to running it, the daemon will change the code to be:

      wrapped_hello.pl
      ----------------
      package cache::tmp::hello_2epl;
      
      sub run{
        #!/usr/bin/perl 
        print "Hello\n";
      }
    

    The package name is constructed from the prefix cache::, with each directory-separating slash replaced by :: and each character other than a letter, digit or underscore encoded as an underscore followed by its hexadecimal code, so the . becomes _2e.

    Now when the daemon is asked to execute the script /tmp/hello.pl, all it has to do is build the package name as before, based on the location of the script, and call its run() subroutine. There's no need to use() the package: its code was already compiled into memory when the script was first wrapped.

      cache::tmp::hello_2epl::run();
    

    We have just written a partial prototype of the daemon we wanted; the only missing piece is how to pass the path of the script to the daemon. This detail is left to the reader as an exercise.

    If you are familiar with the Apache::Registry module, you know that it works in almost the same way. It uses a different package prefix, and the generic function is called handler() rather than run(). The script to run is determined by the URI of the HTTP request.

    Now you understand that there are cases where your normal subroutines can become inner ones. If your script was as simple as:
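    The wrapping and calling steps can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: script_to_package() and run_script() are hypothetical names, the caller is assumed to have read the script's source already, and compilation is done with a string eval. The mangling follows the rule described above, plus one extra step (prefixing a path component that starts with a digit with an underscore) so that every generated package name is valid.

      ```perl
      use strict;
      use warnings;

      my %compiled;   # package name => 1 once that script has been compiled

      # Hypothetical helper: /tmp/hello.pl -> cache::tmp::hello_2epl
      sub script_to_package {
          my $path = shift;
          $path =~ s{^/+}{};                                          # drop the leading slash
          $path =~ s/([^A-Za-z0-9_\/])/sprintf("_%02x", ord $1)/eg;   # "." becomes "_2e"
          $path =~ s{(^|/)(\d)}{${1}_$2}g;                            # no component may start with a digit
          $path =~ s{/+}{::}g;                                        # directories become "::"
          return "cache::$path";
      }

      # Hypothetical daemon entry point: wrap, compile once, then call run().
      sub run_script {
          my ($path, $source) = @_;
          my $package = script_to_package($path);
          unless ($compiled{$package}) {
              # Wrap the script in its own package and a run() subroutine.
              eval "package $package; sub run { $source\n }; 1" or die $@;
              $compiled{$package} = 1;
          }
          no strict 'refs';
          return &{"${package}::run"}();   # later calls skip compilation
      }

      # Run the same script twice: it is compiled only on the first call.
      run_script('/tmp/hello.pl', qq{print "Hello\\n";}) for 1 .. 2;
      ```

    Notice that any named subroutine inside $source ends up nested inside run(), which is exactly the situation discussed next.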

      simple.pl
      ---------
      #!/usr/bin/perl 
      sub hello { print "Hello" }
      hello();
    

    Wrapped into a run() subroutine it becomes:

      wrapped_simple.pl
      -----------------
      package cache::simple_2epl;
      
      sub run{
        #!/usr/bin/perl 
        sub hello { print "Hello" }
        hello();
      }
    

    Therefore hello() is an inner subroutine, and if it uses my() scoped variables that are defined and altered outside of it, it won't work correctly starting from the second call, as explained in the previous section.

    [TOC]


    Remedies working for Inner Subroutine

    First of all, there is nothing to worry about: if you do happen to have ``the my() scoped variable in the inner subroutine'' problem, Perl will always alert you, as long as you remember to turn warnings On.

    Suppose you have a script with this problem. How can you solve it? There are many ways; we will discuss some of them here.

    We will use the following code to show the different solutions.

      multirun.pl
      -----------
      #!/usr/bin/perl -w
      
      use strict;
      
      for (1..3){
        print "run: [time $_]\n";
        run();
      }
      
      sub run {
      
        my $counter = 0;
      
        increment_counter();
        increment_counter();
      
        sub increment_counter{
          $counter++;
          print "Counter is equal to $counter !\n";
        }
      
      } # end of sub run
    

    This code executes the run() subroutine three times. Each time, run() initializes the $counter variable to 0 and then calls the inner subroutine increment_counter() twice, which increments $counter and prints its value. One might expect to see the following output:

      run: [time 1]
      Counter is equal to 1 !
      Counter is equal to 2 !
      run: [time 2]
      Counter is equal to 1 !
      Counter is equal to 2 !
      run: [time 3]
      Counter is equal to 1 !
      Counter is equal to 2 !
    

    But as we have already learned from the previous sections, this is not what we are going to see. Indeed, when we run the script we see:

      % ./multirun.pl
    

      Variable "$counter" will not stay shared at ./multirun.pl line 18.
      run: [time 1]
      Counter is equal to 1 !
      Counter is equal to 2 !
      run: [time 2]
      Counter is equal to 3 !
      Counter is equal to 4 !
      run: [time 3]
      Counter is equal to 5 !
      Counter is equal to 6 !
    

    Obviously, the $counter that increment_counter() sees is not reinitialized on each run() execution: the inner subroutine keeps using the $counter from the first call, so it preserves its value from the last execution and increments it further.

    One of the workarounds is to use globally declared variables, with the vars pragma.

      multirun1.pl
      -----------
      #!/usr/bin/perl -w
      
      use strict;
      use vars qw($counter);
      
      for (1..3){
        print "run: [time $_]\n";
        run();
      }
      
      sub run {
      
        $counter = 0;
      
        increment_counter();
        increment_counter();
      
        sub increment_counter{
          $counter++;
          print "Counter is equal to $counter !\n";
        }
      
      } # end of sub run
    

    If you run this script, or any of the other solutions offered below, you get the expected output:

      % ./multirun1.pl
      
      run: [time 1]
      Counter is equal to 1 !
      Counter is equal to 2 !
      run: [time 2]
      Counter is equal to 1 !
      Counter is equal to 2 !
      run: [time 3]
      Counter is equal to 1 !
      Counter is equal to 2 !
    

    By the way, the warning we saw before is gone, and so is the problem, since no my() (lexically scoped) variable is used in the nested subroutine.

    Another approach is to use fully qualified variables. This is a better one, since less memory will be used, but it adds a typing overhead:

      multirun2.pl
      -----------
      #!/usr/bin/perl -w
      
      use strict;
      
      for (1..3){
        print "run: [time $_]\n";
        run();
      }
      
      sub run {
      
        $main::counter = 0;
      
        increment_counter();
        increment_counter();
      
        sub increment_counter{
          $main::counter++;
          print "Counter is equal to $main::counter !\n";
        }
      
      } # end of sub run
    

    You can also pass the variable to the subroutine by value and make the subroutine return it after it was updated. This adds time and memory overheads, so it's not a good idea if the variable can be very large.

    Don't rely on the variable being small during development: it can grow quite big in situations you didn't expect. For example, a very simple HTML form text entry field can return a few megabytes of data if one of your users is bored and wants to test how good your code is. It's not uncommon to see users copy-and-paste 10Mb core dump files into a form's text fields and submit them for your script to process.

      multirun3.pl
      -----------
      #!/usr/bin/perl -w
      
      use strict;
      
      for (1..3){
        print "run: [time $_]\n";
        run();
      }
      
      sub run {
      
        my $counter = 0;
      
        $counter = increment_counter($counter);
        $counter = increment_counter($counter);
      
        sub increment_counter{
          my $counter = shift || 0 ;
      
          $counter++;
          print "Counter is equal to $counter !\n";
      
          return $counter;
        }
      
      } # end of sub run
    

    Finally, you can use references to do the job. increment_counter() accepts a reference to a $counter variable and increments its value by first dereferencing it. The $counter variable outside gets affected by this change as well.

      multirun4.pl
      -----------
      #!/usr/bin/perl -w
      
      use strict;
      
      for (1..3){
        print "run: [time $_]\n";
        run();
      }
      
      sub run {
      
        my $counter = 0;
      
        increment_counter(\$counter);
        increment_counter(\$counter);
      
        sub increment_counter{
          my $r_counter = shift; # a reference to run()'s $counter
      
          $$r_counter++;
          print "Counter is equal to $$r_counter !\n";
        }
      
      } # end of sub run
    

    Here is yet another, even more obscure, technique. We modify the value of $counter inside the subroutine by relying on the fact that the elements of @_ are aliases to the actual arguments, so if you directly modify one of the members of the array, the value of the passed variable changes.

      multirun5.pl
      -----------
      #!/usr/bin/perl -w
      
      use strict;
      
      for (1..3){
        print "run: [time $_]\n";
        run();
      }
      
      sub run {
      
        my $counter = 0;
      
        increment_counter($counter);
        increment_counter($counter);
      
        sub increment_counter{
          $_[0]++;
          print "Counter is equal to $_[0] !\n";
        }
      
      } # end of sub run
    

    Now you have at least five workarounds to choose from.

    For more information, please refer to the perlref and perlsub manpages.



    use(), require(), do(), %INC and @INC Explained



    The @INC array

    @INC is a special Perl variable that is the equivalent of the shell's PATH variable. Whereas PATH contains a list of directories that are searched for executables, @INC contains a list of directories from which Perl modules and libraries can be loaded.

    When you use(), require() or do() a filename or a module, Perl takes the list of directories from the @INC variable and searches them for the file it was asked to load. If the file you want to load is not located in one of the listed directories, you have to tell Perl where to find it, either by giving a path relative to one of the directories in @INC or by giving the full path to the file.
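
    The lookup can be sketched in a few lines of Perl. This is a simplified model written for illustration, not Perl's actual implementation: find_in_inc() is a hypothetical helper, and the real require() also handles code references stored in @INC and other details this sketch skips.

```perl
#!/usr/bin/perl -w
use strict;

# find_in_inc() is a hypothetical helper: a simplified model of how
# require() locates a file. It ignores code references in @INC and
# other details of the real implementation.
sub find_in_inc {
  my $file = shift;

  # Absolute paths, and paths starting with ./ or ../, are used as
  # they are instead of being searched for in @INC
  if ($file =~ m{^(?:/|\.\.?/)}) {
    return -f $file ? $file : undef;
  }

  for my $dir (@INC) {
    next if ref $dir;            # skip code-ref hooks in this sketch
    my $path = "$dir/$file";
    return $path if -f $path;    # the first match wins
  }
  return undef;                  # not found in any @INC directory
}

my $path = find_in_inc('strict.pm');
print defined $path ? "strict.pm would be loaded from $path\n"
                    : "strict.pm not found\n";
```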



    The %INC hash

    %INC is another special Perl variable, used to cache the names of the files and modules that were successfully loaded and compiled by use(), require() or do(). Before use() or require() attempts to load a file or a module, Perl checks whether it is already in the %INC hash. If it is, the file is neither loaded nor compiled again. Otherwise the file is loaded into memory and Perl attempts to compile it.

    If the file is successfully loaded and compiled, a new key-value pair is added to %INC. The key is the name of the file as it was passed to one of the three functions we have just mentioned (for a module name such as My::Module, the translated name My/Module.pm), and the value is the path under which the file was found: a full path if it was found in one of the @INC directories other than ".", and a relative path if it was found relative to ".".
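
    Here is a minimal sketch of this caching, assuming a modern perl with File::Temp (it ships with Perl). The module name My::Counted and its $main::loads counter are made up for the demonstration: the module is written to a temporary directory, require()d twice, and then require()d once more after its %INC entry has been deleted.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempdir);
use vars qw($loads);

# Create a throwaway module, My/Counted.pm (a hypothetical name), whose
# code bumps $main::loads every time it is compiled and run.
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/My" or die "mkdir: $!";
open my $fh, '>', "$dir/My/Counted.pm" or die "open: $!";
print $fh "package My::Counted;\n\$main::loads++;\n1;\n";
close $fh;

unshift @INC, $dir;
$loads = 0;

require My::Counted;   # searched in @INC, compiled, recorded in %INC
require My::Counted;   # already in %INC: returns at once
print "loads: $loads\n";                          # loads: 1
print "My/Counted.pm => $INC{'My/Counted.pm'}\n";

delete $INC{'My/Counted.pm'};   # make Perl forget the file was loaded...
require My::Counted;            # ...so it is loaded and compiled again
print "loads: $loads\n";                          # loads: 2
```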

    The following examples should make this logic easier to understand.

    First, let's look at the contents of @INC on my system:

      % perl -e 'print join "\n", @INC'
      /usr/lib/perl5/5.00503/i386-linux
      /usr/lib/perl5/5.00503
      /usr/lib/perl5/site_perl/5.005/i386-linux
      /usr/lib/perl5/site_perl/5.005
      .
    

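    Notice the . (current directory) as the last directory in the list. (Perl 5.26 and later no longer include . in @INC by default, so on a modern perl the examples below that rely on it behave differently.)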

    Now let's load the strict.pm module and look at the contents of %INC:

      % perl -e 'use strict; print map {"$_ => $INC{$_}\n"} keys %INC'
      
      strict.pm => /usr/lib/perl5/5.00503/strict.pm
    

    Since strict.pm was found in the /usr/lib/perl5/5.00503/ directory, which is part of @INC, %INC holds the full path as the value for the key strict.pm.

    Now let's create the simplest possible module, /tmp/test.pm:

      test.pm
      -------
      1;
    

    It does nothing but return a true value when loaded. Now let's load it in different ways:

      % cd /tmp
      % perl -e 'use test; print map {"$_ => $INC{$_}\n"} keys %INC'
      
      test.pm => test.pm
    

    Since the file was found relative to . (the current directory), the relative path is stored as the value. Now let's alter @INC by adding /tmp to the end:

      % cd /tmp
      % perl -e 'BEGIN{push @INC, "/tmp"} use test; \
      print map {"$_ => $INC{$_}\n"} keys %INC'
      
      test.pm => test.pm
    

    We still get the relative path, because /tmp comes after . in the list, so the module is found relative to "." first. But if we run the same code from a different directory, where "." doesn't contain test.pm:

      % cd /
      % perl -e 'BEGIN{push @INC, "/tmp"} use test; \
      print map {"$_ => $INC{$_}\n"} keys %INC'
      
      test.pm => /tmp/test.pm
    

    we get the full path. We can also prepend the directory with unshift(), so that it is searched before "."; again we get the full path:

      % cd /tmp
      % perl -e 'BEGIN{unshift @INC, "/tmp"} use test; \
      print map {"$_ => $INC{$_}\n"} keys %INC'
      
      test.pm => /tmp/test.pm
    

      BEGIN{unshift @INC, "/tmp"}
    

    can be replaced with the more elegant:

      use lib "/tmp";
    

    which does almost the same thing as the BEGIN block above: lib.pm also adds the matching architecture-specific subdirectory to @INC if one exists.
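
    Both forms take effect at compile time, which is what makes them work for use() statements: a plain unshift @INC without BEGIN runs only at run time, after every use() in the file has already been processed. A minimal sketch of the difference; the two directory names are hypothetical and don't need to exist:

```perl
#!/usr/bin/perl -w
use strict;

# Both directory names are hypothetical; they don't have to exist.
unshift @INC, "/tmp/lib-at-run-time";   # runs only when the program runs
use lib "/tmp/lib-at-compile-time";     # runs while the file is compiled

# BEGIN blocks also run during compilation: at that point only the
# use lib entry has been added
BEGIN { print "compile time: $INC[0]\n" }   # compile time: /tmp/lib-at-compile-time

print "run time:     $INC[0]\n";            # run time:     /tmp/lib-at-run-time
```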

    These approaches to modifying @INC can be labour intensive, since if you want to move the script around in the filesystem you have to modify the path. This can be painful, for example, when you move your scripts from development to a production server.

    There is a FindBin module that solves this problem in the plain Perl world, but unfortunately it doesn't work correctly under mod_perl.

    If you use this module, you don't need to write a hardcoded path. The following snippet does all the work for you (the file is /tmp/load.pl):

      load.pl
      -------
      #!/usr/bin/perl
      
      use FindBin ();
      use lib "$FindBin::Bin";
      use test;
      print "test.pm => $INC{'test.pm'}\n";
    

    In the above example $FindBin::Bin equals /tmp. If we move the script somewhere else, e.g. to /tmp/x, $FindBin::Bin equals /tmp/x.

      % /tmp/load.pl
      
      test.pm => /tmp/test.pm
    

    This works just like use lib, but no hardcoded path is required.

    As I mentioned earlier, FindBin doesn't work in the mod_perl environment. Like any other module it is loaded only once, so it computes its settings only for the first script that uses it. That script gets the correct settings, but the rest of the scripts will not, if they are located in a different directory than the first one.



    Modules, Libraries and Files

    Before we proceed, let's define what we mean by a module and by a library or file.

    • Library or File

      A file which contains perl subroutines and other code.

      It generally doesn't include a package declaration.

      Its last statement returns true.

      It can have any name, but generally it has a .pl or .ph extension.

      Examples:

        config.pl
        ----------
        $dir = "/home/httpd/cgi-bin";
        $cgi = "/cgi-bin";
        1;
      

        mysubs.pl
        ----------
        sub print_header{
          print "Content-type: text/plain\r\n\r\n";
        }
        1;
      

    • Module

      A file which contains perl subroutines and other code.

      It generally declares a package name at the beginning of it.

      Its last statement returns true.

      A naming convention requires it to have a .pm extension.

      Example:

        My/Module.pm
        ------------
        package My::Module;
        $My::Module::VERSION = 0.01;
        
        sub new{ return bless {}, shift;}
        END { print "Quitting\n"}
        1;
      



    require()

    require() reads a file containing Perl code and compiles it. Before attempting to load the file, it looks up its argument in %INC to see whether the file has already been loaded. If it has, require() just returns without doing anything. Otherwise it attempts to load and compile the file.

    require() first has to find the file it is asked to load. If the argument is a full path to the file, it just tries to read it. For example:

      require "/home/httpd/perl/mylibs.pl";
    

    If the path is relative, require() will attempt to search for the file in all the directories listed in @INC. For example:

      require "mylibs.pl";
    

    If files with the same name exist in more than one of the directories listed in @INC, the first one found is used.

    The file must return TRUE as its last statement to indicate successful execution of any initialization code. Since you never know what changes the file will go through in the future, you cannot be sure that the last statement will always return TRUE. That's why it's suggested to put ``1;'' at the end of the file.

    For most files you should use the real filename, but if the file is a module, you may use the following convention instead:

      require My::Module;
    

    This is equal to:

      require "My/Module.pm";
    

    If require() fails to load the file, either because it can't find the file in question, or because the code fails to compile or doesn't return TRUE at the end, the program die()s, unless the require() statement is enclosed in an eval() block, as in this example:

      require.pl
      ----------
      #!/usr/bin/perl -w
      
      eval { require "/file/that/does/not/exists"};
      if ($@) {
        print "Failed to load, because : $@"
      }
      print "\nHello\n";
    

    When we execute the program:

      % ./require.pl
      
      Failed to load, because : Can't locate /file/that/does/not/exists in
      @INC (@INC contains: /usr/lib/perl5/5.00503/i386-linux
      /usr/lib/perl5/5.00503 /usr/lib/perl5/site_perl/5.005/i386-linux
      /usr/lib/perl5/site_perl/5.005 .) at require.pl line 3.
      
      Hello
    

    We can see that the program didn't die(), because Hello was printed. This trick is useful when you want to check whether a user has some module installed, but its absence is not critical: perhaps the program can run without the module, with a reduced set of functionality.
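
    A minimal sketch of that idiom. Data::Dumper ships with Perl, while No::Such::Module is a made-up name standing in for a module that isn't installed. The eval block returns true only if require() succeeded, and the error is saved right away because later code may reset $@:

```perl
#!/usr/bin/perl -w
use strict;

# eval { require ...; 1 } is true only if the module was found,
# compiled and returned TRUE. No::Such::Module is a made-up name.
my $have_dumper = eval { require Data::Dumper; 1 };
my $have_fancy  = eval { require No::Such::Module; 1 };
my $fancy_error = $@;     # save it before other code can reset $@

print "Data::Dumper is ", ($have_dumper ? "available" : "missing"), "\n";
print "No::Such::Module is missing: $fancy_error" unless $have_fancy;

# Use the optional module only when it could be loaded
print Data::Dumper::Dumper([1, 2]) if $have_dumper;
```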

    If we remove the eval() part and try again:

      require1.pl
      -----------
      #!/usr/bin/perl -w
      
      require "/file/that/does/not/exists";
      print "\nHello\n";
    

      % ./require1.pl
      
      Can't locate /file/that/does/not/exists in @INC (@INC contains:
      /usr/lib/perl5/5.00503/i386-linux /usr/lib/perl5/5.00503
      /usr/lib/perl5/site_perl/5.005/i386-linux
      /usr/lib/perl5/site_perl/5.005 .) at require1.pl line 3.
    

    The program just die()s in the last example, which is what you want in most cases.

    For more information, refer to the perlfunc manpage.



    use()

    use(), just like require(), loads and compiles files containing Perl code, but it works with modules only. The module to load must therefore be given by its name, not by a filename. If the module is located in MyCode.pm, the correct way to use() it is:

      use MyCode;
    

    and not:

      use "MyCode.pm"
    

    use() translates the passed argument into a file name by replacing :: with / and appending .pm at the end. So My::Module becomes My/Module.pm.
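
    The translation is easy to reproduce, which is handy when the module name is only known at run time: require() treats a string argument as a file name, so require $name with $name set to 'Data::Dumper' would look for a file literally called Data::Dumper. A minimal sketch; module_to_file() is a hypothetical helper that handles only the :: separator:

```perl
#!/usr/bin/perl -w
use strict;

# module_to_file() is a hypothetical helper mimicking the translation
# use() performs: :: becomes / and .pm is appended.
sub module_to_file {
  my $module = shift;
  (my $file = $module) =~ s{::}{/}g;   # My::Module -> My/Module
  return "$file.pm";                   # My/Module  -> My/Module.pm
}

print module_to_file('My::Module'), "\n";   # My/Module.pm

# Loading a module whose name is only known at run time
my $class = 'Data::Dumper';
my $file  = module_to_file($class);
require $file;                              # same as: require Data::Dumper;
print "$class loaded from $INC{$file}\n";
```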

    use() is exactly equivalent to:

     BEGIN { require Module; import Module LIST; }
    

    Internally it calls require() to do the loading and compilation chores. When require() finishes its job, import() is called, unless () is passed as the second argument. The following pairs are equivalent:

      use MyModule;
      BEGIN {require MyModule; import MyModule; }
      
      use MyModule qw(foo bar);
      BEGIN {require MyModule; import MyModule ("foo","bar"); }
      
      use MyModule ();
      BEGIN {require MyModule; }
    

    When no arguments are passed to import(), it imports the default symbols, if the module defines any. import() is not a builtin function; it's just an ordinary class method call into the ``MyModule'' package, telling the module to import the list of features back into the current package. See the Exporter manpage for more information.

    There's a corresponding ``no'' command that unimports symbols imported by use, i.e., it calls unimport Module LIST instead of import().
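
    Since import() and unimport() are plain method calls, a module can define its own. A minimal sketch, using a hypothetical package My::Noisy defined in the same file; the BEGIN block records it in %INC so that the require() performed by use and no doesn't go looking for My/Noisy.pm on disk:

```perl
#!/usr/bin/perl -w
use strict;

# A hypothetical in-file package with hand-written import()/unimport()
package My::Noisy;
use vars qw(@calls);
sub import   { my $class = shift; push @calls, "import(@_)";   }
sub unimport { my $class = shift; push @calls, "unimport(@_)"; }

package main;
# Pretend My/Noisy.pm was already loaded, so require() finds it in %INC
BEGIN { $INC{'My/Noisy.pm'} = __FILE__ }

use My::Noisy qw(foo bar);   # require, then My::Noisy->import('foo', 'bar')
use My::Noisy ();            # require only: import() is not called
no  My::Noisy qw(baz);       # require, then My::Noisy->unimport('baz')

print "$_\n" for @My::Noisy::calls;
# import(foo bar)
# unimport(baz)
```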



    do()

    do() behaves almost identically to require(), except that it reloads the file unconditionally: it doesn't check %INC to see whether the file was already loaded.

    If do() cannot read the file, it returns undef and sets $! to report the error. If do() can read the file but cannot compile it, it returns undef and sets an error message in $@. If the file is successfully compiled, do() returns the value of the last expression evaluated.
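
    A minimal sketch of both points, assuming File::Temp (shipped with Perl) for a scratch directory. The file names bump.pl, no-such-file.pl and broken.pl are made up, and $main::runs is a counter used only for the demonstration:

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempdir);
use vars qw($runs);

my $dir = tempdir(CLEANUP => 1);
$runs = 0;

# bump.pl increments $main::runs; its last expression is do()'s value
open my $fh, '>', "$dir/bump.pl" or die "open: $!";
print $fh qq{\$main::runs++;\n"ran \$main::runs times";\n};
close $fh;

my $r1 = do "$dir/bump.pl";   # compiled and run
my $r2 = do "$dir/bump.pl";   # compiled and run again: %INC isn't checked
print "$r2\n";                # ran 2 times

# A file that can't be read: undef, with the reason in $!
my $missing    = do "$dir/no-such-file.pl";
my $read_error = "$!";
print "no-such-file.pl: $read_error\n" unless defined $missing;

# A file that can't be compiled: undef, with the reason in $@
open $fh, '>', "$dir/broken.pl" or die "open: $!";
print $fh "sub {\n";          # unterminated block
close $fh;
my $broken        = do "$dir/broken.pl";
my $compile_error = $@;
print "broken.pl: $compile_error" unless defined $broken;
```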



    Using global variables and sharing them between modules/packages



    Making the variables global

    When you first wrote $x in your code, you created a global variable. It is visible everywhere in the file you used it in, or, if you used it inside a package, everywhere in that package. But this works only if you don't use the strict pragma, and you HAVE to use this pragma if you want to run your scripts under mod_perl. Read The strict pragma to find out why.



    Making the variables global with strict pragma On

    First you use:

      use strict;
    

    Then you use:

     use vars qw($scalar %hash @array);
    

    From this moment on, the variables are global in the package in which you declared them. If you want to share global variables between packages, here is what you can do.
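
    A minimal sketch of what strict enforces here: globals declared with use vars, and fully qualified names, are accepted, while an undeclared global is a compile-time error. The error is shown through a string eval, which inherits the strict pragma, so that the rest of the file still compiles; $undeclared is a made-up name:

```perl
#!/usr/bin/perl -w
use strict;
use vars qw($scalar %hash @array);   # declared package globals

$scalar = 1;             # allowed: declared with use vars
@array  = (1, 2, 3);
%hash   = (one => 1);

# String eval inherits 'use strict', so the undeclared global below is
# rejected at compile time without breaking the rest of this file
my $ok    = eval q{ $undeclared = 1; 1 };
my $error = $@;
print $ok ? "compiled\n" : "rejected: $error";

print "fully qualified: $main::scalar\n";   # fully qualified: 1
```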



    Using Exporter.pm to share global variables

    Assume that you want to share a CGI.pm object (I will call it $q) between your modules. For example, you create it in script.pl, but want it to be visible in My::HTML. First, you make $q global.

      script.pl:
      ----------------
      use vars qw($q);
      use CGI;
      use lib qw(.); 
      use My::HTML qw($q); # My/HTML.pm in the same dir as script.pl
      $q = new CGI;
      
      My::HTML::printmyheader();
      ----------------
    

    Note that we have imported $q from My::HTML. Here is My::HTML, which exports $q:

      My/HTML.pm
      ----------------
      package My::HTML;
      use strict;
      
      BEGIN {
        use Exporter ();
      
        @My::HTML::ISA         = qw(Exporter);
        @My::HTML::EXPORT      = qw();
        @My::HTML::EXPORT_OK   = qw($q);
      
      }
      
      use vars qw($q);
      
      sub printmyheader{
        # Whatever you want to do with $q... e.g.
        print $q->header();
      }
      1;
      -------------------
    

    So $q is shared between the My::HTML package and script.pl. It works the other way around as well, if you create the object in My::HTML but use it in script.pl. This is true sharing: if you change $q in script.pl, it changes in My::HTML as well.
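
    The sharing can be checked in a single file. A minimal sketch, with My::HTML defined inline and a plain string standing in for the CGI object (so CGI.pm isn't needed). The BEGIN block around @ISA and @EXPORT_OK matters: use() calls import() at compile time, before ordinary assignments would have run.

```perl
#!/usr/bin/perl -w
use strict;

package My::HTML;             # normally in My/HTML.pm
use Exporter ();
use vars qw(@ISA @EXPORT_OK $q);
BEGIN {
  @ISA       = qw(Exporter);
  @EXPORT_OK = qw($q);
}
sub show { return "My::HTML sees: $q\n" }

package main;                 # plays the role of script.pl
BEGIN { $INC{'My/HTML.pm'} = __FILE__ }   # pretend My/HTML.pm was loaded
use vars qw($q);
use My::HTML qw($q);          # $main::q becomes an alias of $My::HTML::q

$q = "a string standing in for the CGI object";
print My::HTML::show();       # My::HTML sees: a string standing in for the CGI object
$My::HTML::q = "changed in My::HTML";
print "script sees: $q\n";    # script sees: changed in My::HTML
```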

    What if you need to share $q between more than 2 packages? For example you want My::Doc to share $q as well.

    You leave My::HTML untouched and modify script.pl to include:

     use My::Doc qw($q);
    

    And write My::Doc exactly like My::HTML; of course, its content is different :).

    One possible pitfall arises when you want to use My::Doc in both My::HTML and script.pl. Only if you add:

      use My::Doc qw($q);
    

    into My::HTML as well will $q be shared. Otherwise My::Doc will no longer share $q. To make things clear, here is the code:

      script.pl:
      ----------------
      use vars qw($q);
      use CGI;
      use lib qw(.); 
      use My::HTML qw($q); # My/HTML.pm in the same dir as script.pl
      use My::Doc  qw($q); # Ditto
      $q = new CGI;
      
      My::HTML::printmyheader();
      ----------------
    

      My/HTML.pm
      ----------------
      package My::HTML;
      use strict;
      
      BEGIN {
        use Exporter ();
      
        @My::HTML::ISA         = qw(Exporter);
        @My::HTML::EXPORT      = qw();
        @My::HTML::EXPORT_OK   = qw($q);
      
      }
      
      use vars     qw($q);
      use My::Doc  qw($q);
      
      sub printmyheader{
        # Whatever you want to do with $q... e.g.
        print $q->header();
      
        My::Doc::printtitle('Guide');
      }
      1;
      -------------------
    

      My/Doc.pm
      ----------------
      package My::Doc;
      use strict;
      
      BEGIN {
        use Exporter ();
      
        @My::Doc::ISA         = qw(Exporter);
        @My::Doc::EXPORT      = qw();
        @My::Doc::EXPORT_OK   = qw($q);
      
      }
      
      use vars qw($q);
      
      sub printtitle{
        my $title = shift || 'None';
        
        print $q->h1($title);
      }
      1;
      -------------------
    



    Using aliasing perl feature to share global variables

    As the title says, you can import a variable into a script or module without using Exporter.pm. I have found it useful to keep all the configuration variables in one module, My::Config. But then I have to export all the variables in order to use them in other modules, which is bad for two reasons: it pollutes other packages' namespaces with extra symbols, which increases memory requirements, and it adds the overhead of keeping track of which variables should be exported from the configuration module and which imported into each particular package. I solve this problem by keeping all the variables in one hash, %c, and exporting only that. Here is an example of My::Config:

      package My::Config;
      use strict;
      use vars qw(%c);
      %c = (
        # All the configs go here
        scalar_var => 5,
      
        array_var  => [
                       'foo',
                       'bar',
                      ],
      
        hash_var   => {
                       foo => 'Foo',
                       bar => 'BARRR',
                      },
      );
      1;
    

    Now, in packages that want to use the configuration variables, I have to either use fully qualified names like $My::Config::c{scalar_var}, which I dislike, or import them as described in the previous section. But hey, since we have only one variable to handle, we can make things even simpler and avoid loading Exporter.pm. We will use Perl's aliasing feature to do the exporting and save some keystrokes:

      package My::HTML;
      use strict;
      use lib qw(.);
        # Global Configuration now aliased to global %c
      use My::Config (); # My/Config.pm in the same dir as script.pl
      use vars qw(%c);
      *c = \%My::Config::c;
      
        # Now you can access the variables from the My::Config
      print $c{scalar_var};
      print $c{array_var}[0];
      print $c{hash_var}{foo};
    

    Of course %c is global everywhere you use it as described above, and if you change it somewhere, the change affects every other package in which you have aliased %c to %My::Config::c.

    Note that aliases work only with global or local() variables; you cannot write:

      my *c = \%My::Config::c;
    

    which is a syntax error. But you can write:

      local *c = \%My::Config::c;
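
    A single-file sketch of the whole technique, assuming the packages are defined inline instead of in My/Config.pm and My/HTML.pm. It shows that a change made through the alias is visible in My::Config, and that a local() alias disappears when the enclosing subroutine returns; first_item() is a hypothetical helper:

```perl
#!/usr/bin/perl -w
use strict;

package My::Config;            # normally in My/Config.pm
use vars qw(%c);
%c = (
  scalar_var => 5,
  array_var  => [qw(foo bar)],
  hash_var   => { foo => 'Foo', bar => 'BARRR' },
);

package My::HTML;
use vars qw(%c);
*c = \%My::Config::c;          # %My::HTML::c is now the same hash
$c{scalar_var}++;              # ...so this changes %My::Config::c too

package main;
use vars qw(%c);
sub first_item {
  local *c = \%My::Config::c;  # aliased only until the sub returns
  return $c{array_var}[0];
}

print "scalar_var: $My::Config::c{scalar_var}\n";   # scalar_var: 6
print "first item: ", first_item(), "\n";          # first item: foo
print "main's %c is ", (%c ? "not empty" : "empty"), " again\n";
# main's %c is empty again
```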
    



    The Scope of the Special Perl Variables

    Special Perl variables like $| (buffering), $^T (time), $^W (warnings), $/ (input record separator), $\ (output record separator) and many more are all global variables. This means that you cannot localize them with my(); only local() can do that. Since a child server doesn't usually exit, if one of your scripts modifies a global variable, it stays changed for the rest of the process' life and affects all the scripts executed by the same process.

    We will demonstrate the problem using the input record separator variable. If you undefine this variable, the diamond operator will read in the whole file at once, if you have enough memory. Remembering this, you should never write code like the example below.

      $/ = undef; 
      open IN, "file" or die "Can't open file: $!";
        # slurp it all into a variable
      $all_the_file = <IN>;
    

    The proper way is to have a local() keyword before the special variable is being changed, like this:

      local $/ = undef; 
      open IN, "file" or die "Can't open file: $!";
        # slurp it all inside a variable
      $all_the_file = <IN>;
    

    But there is a catch. local() keeps the changed value in effect for all the code that runs, including functions called from other modules, until the enclosing block is exited. At the top level of a script that means until the script terminates, unless the variable is changed again somewhere else in the script.

    A cleaner approach is to enclose the whole of the code that is affected by the modified variable in a block, like this:

      {
        local $/ = undef; 
        open IN, "file" or die "Can't open file: $!";
          # slurp it all inside a variable
        $all_the_file = <IN>;
      }
    

    That way when Perl leaves the block it restores the original value of the $/ variable, and you don't need to worry about its value anywhere else in your program.
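
    A runnable version of the block approach, assuming a temporary file created with File::Temp stands in for "file". After the block, $/ is back to its default newline:

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempfile);

# A temporary three-line file stands in for "file"
my ($out, $file) = tempfile(UNLINK => 1);
print $out "line 1\nline 2\nline 3\n";
close $out;

my $all_the_file;
{
  local $/ = undef;                       # slurp mode inside this block only
  open IN, '<', $file or die "Can't open $file: $!";
  $all_the_file = <IN>;                   # the whole file in one read
  close IN;
}

# Leaving the block restored the default separator, a newline
print $/ eq "\n" ? "\$/ is a newline again\n" : "\$/ is still modified\n";
print length($all_the_file), " bytes slurped\n";   # 21 bytes slurped
```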



    Compiled Regular Expressions

    When using a regular expression that contains an interpolated Perl variable, if it is known that the variable (or variables) will not vary during the execution of the program, a standard optimization technique consists of adding the /o modifier to the regexp pattern. This directs the compiler to build the internal table once, for the entire lifetime of the script, rather than every time the pattern is executed. Consider:

      my $pat = '^foo$'; # likely to be input from an HTML form field
      foreach( @list ) {
        print if /$pat/o;
      }
    

    This is usually a big win in loops over lists, or when using grep() or map() operators.

    In long-lived mod_perl scripts, however, this can pose a problem if the variable changes according to the invocation. The first invocation of a fresh httpd child will compile the regex and perform the search correctly. However, all subsequent uses by the httpd child will continue to match the original pattern, regardless of the current contents of the Perl variables the pattern is dependent on. Your script will appear broken.
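
    The effect can be reproduced without mod_perl by calling the same code twice with different patterns, just as a long-lived child does across requests. A minimal sketch; matches() is a hypothetical helper:

```perl
#!/usr/bin/perl -w
use strict;

# matches() is a hypothetical helper. The /o modifier compiles the
# pattern the first time this match runs; later calls keep using it,
# whatever $pat holds by then.
sub matches {
  my ($pat, @list) = @_;
  return grep { /$pat/o } @list;
}

my @words = qw(foo bar baz);
print join(",", matches('^f', @words)), "\n";   # foo
print join(",", matches('^b', @words)), "\n";   # foo  (not bar,baz!)
```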

    There are two solutions to this problem:

    The first is to use eval q//, which forces the code to be compiled each time it runs. Just make sure that the eval block covers the entire processing loop, and not just the pattern match itself.

    The above code fragment would be rewritten as:

      my $pat = '^foo$';
      eval q{
        foreach( @list ) {
          print if /$pat/o;
        }
      }
    

    Just saying:

      foreach( @list ) {
        eval q{ print if /$pat/o; };
      }
    

    is going to be a horribly expensive proposition.

    You can use this approach if you need more than one pattern match operator in a given section of code. If the section contains only one operator (be it an m// or s///), you can rely on a property of the empty pattern: it reuses the last successfully matched pattern. This leads to the second solution, which also eliminates the use of eval.

    The above code fragment becomes:

      my $pat = '^foo$';
      "something" =~ /$pat/; # dummy match (MUST NOT FAIL!)
      foreach( @list ) {
        print if //;
      }
    

    The only gotcha is that the dummy match that primes the regular expression engine must absolutely, positively succeed. Otherwise the pattern will not be cached, and // will reuse whatever pattern last matched successfully, or match everything if there is none. If you can't count on fixed text to ensure the match succeeds, you have two possibilities.

    If you can guarantee that the pattern variable contains no meta-characters (things like *, +, ^, $...), you can use the dummy match:

      "$pat" =~ /\Q$pat\E/; # guaranteed if no meta-characters present
    

    If there is a possibility that the pattern can contain meta-characters, you should search for the pattern or the unsearchable \377 character as follows:

      "\377" =~ /$pat|^[\377]$/; # guaranteed if meta-characters present
    

    Another approach:

    How well it works depends on the complexity of the regexp you apply it to. One common case where a compiled regexp is usually more efficient is matching ``any one of a group of patterns'' over and over again.

    A helper routine makes this easier to remember. Here is one, slightly modified from Jeffrey Friedl's example in his book ``Mastering Regular Expressions''.

      #####################################################
      # Build_MatchMany_Function
      # -- Input:  list of patterns
      # -- Output: A code ref which matches its $_[0]
      #            against ANY of the patterns given in the
      #            "Input", efficiently.
      #
      sub Build_MatchMany_Function {
        my @R = @_;
        my $expr = join '||', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0..$#R );
        my $matchsub = eval "sub { $expr }";
        die "Failed in building regex @R: $@" if $@;
        $matchsub;
      }
    

    Example usage:

      @some_browsers = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);
      $Known_Browser=Build_MatchMany_Function(@some_browsers);
    

      while (<ACCESS_LOG>) {
        # ...
        $browser = get_browser_field($_);
        if ( ! &$Known_Browser($browser) ) {
          print STDERR "Unknown Browser: $browser\n";
        }
        # ...
      }
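
    Perl 5.005 and later also offer the qr// operator, which compiles a pattern into a regexp object once and lets you reuse it without /o. A minimal sketch of the same helper built on qr// instead of eval and /o; this is a swapped-in technique, not Friedl's code, and build_match_many() is a hypothetical name. Because every call builds fresh regexp objects, a matcher created with different patterns on a later request isn't stuck with the old ones:

```perl
#!/usr/bin/perl -w
use strict;

# build_match_many() is a hypothetical qr//-based variant of
# Build_MatchMany_Function: each pattern is compiled once, when the
# matcher is built, so neither /o nor a string eval is needed.
sub build_match_many {
  my @compiled = map { qr/$_/ } @_;
  return sub {
    my $string = shift;
    for my $re (@compiled) {
      return 1 if $string =~ $re;
    }
    return 0;
  };
}

my $known_browser = build_match_many(qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww));
for my $agent ('Mozilla/4.7 [en] (X11; I; Linux)', 'Wget/1.5.3') {
  print $known_browser->($agent) ? "known:   $agent\n" : "unknown: $agent\n";
}
```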
    



    perldoc's Rarely Known But Very Useful Options

    To find what functions perl has, you would execute:

      perldoc perlfunc
    

    To learn the syntax of a specific function and find an example of its usage, you would execute (e.g. for open()):

      perldoc -f open
    

    At the time of writing this option has a bug: it doesn't call pod2man, so the section is displayed as raw POD. But it's still readable and very useful.

    To search the Perl FAQ (perlfaq) sections, you would do (e.g. for the open keyword):

      perldoc -q open
    

    This returns all the matching Q&A sections, also as raw POD.



    The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
    Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me to improve this guide. If you have something to contribute please send it directly to me.

    Written by Stas Bekman.
    Last Modified at 12/18/1999
    Use of the Camel for Perl is
    a trademark of O'Reilly & Associates,
    and is used by permission.
    CGI to mod_perl Porting. mod_perl Coding guidelines.




    Document Coverage

    This chapter is relevant to both writing a new CGI or perl handler from scratch and migrating an application from plain CGI to mod_perl.

    It also covers the case where the CGI script being ported does the job but is too dirty to be altered easily to run as a mod_perl program (a.k.a. Apache::PerlRun mode).

    If you are in the porting stage, use it as a reference for possible problems you might encounter when running an existing CGI script in the new mode.

    If your project schedule is tight, I would suggest converting to mod_perl in the following steps: initially, run all the scripts in Apache::PerlRun mode. Then, as time allows, move them into Apache::Registry mode. Later, if you need Apache Perl API functionality, you can always add it.

    If you are about to write a new CGI script from scratch, it would be a good idea to learn about the possible pitfalls and to avoid them in the first place.

    If you don't need mod_cgi compatibility, it's a good idea to start writing with the mod_perl API in the first place. This will make your application a little more efficient, and it will give you easier access to the mod_perl features that core Perl functionality can't provide.



    Before you start to code

    It can be a good idea to tighten up some of your Perl programming practices, since mod_perl doesn't tolerate sloppy programming.

    This chapter relies on a certain level of Perl knowledge. Please read through the Perl Reference chapter and make sure you know the material covered there. This will allow me to concentrate on pure mod_perl issues and make them more prominent to the experienced Perl programmer; otherwise they would be lost in a sea of Perl background notes.

    Additional resources:



    Exposing Apache::Registry secrets

    Let's start with some simple code and see what can go wrong with it, detect bugs and debug them, discuss possible pitfalls and how to avoid them.

    I will use a simple CGI script that initializes $counter to 0 and prints its value to the screen while incrementing it.

      counter.pl:
      ----------
      #!/usr/bin/perl -w
      use strict;
      
      print "Content-type: text/plain\r\n\r\n";
      
      my $counter = 0;
      
      for (1..5) {
        increment_counter();
      }
      
      sub increment_counter{
        $counter++;
        print "Counter is equal to $counter !\r\n";
      }
    

    You would expect to see the output:

      Counter is equal to 1 !
      Counter is equal to 2 !
      Counter is equal to 3 !
      Counter is equal to 4 !
      Counter is equal to 5 !
    

    And that's what you see when you execute this script the first time. But let's reload it a few times... Suddenly, after a few reloads, the counter doesn't start its count from 1 any more. As we continue to reload, it keeps on growing, but not steadily, starting almost randomly at 10, 10, 10, 15, 20... Weird...

      Counter is equal to 6 !
      Counter is equal to 7 !
      Counter is equal to 8 !
      Counter is equal to 9 !
      Counter is equal to 10 !
    

    We saw two anomalies in this very simple script: an unexpected increment of the counter beyond 5, and inconsistent growth across reloads. Let's investigate this script.



    The First Mystery

    First let's peek into the error_log file. Since we have enabled warnings, this is what we see:

      Variable "$counter" will not stay shared 
      at /home/httpd/perl/conference/counter.pl line 13.
    

    The Variable "$counter" will not stay shared warning is generated when the script contains a named nested subroutine (a named, that is non-anonymous, subroutine defined inside another subroutine) that refers to a lexically scoped variable defined outside this nested subroutine. This effect is explained in my() Scoped Variable in Nested Subroutines.

    Do you see a nested named subroutine in my script? I don't! What's going on? Maybe it's a bug? But wait, maybe the perl interpreter sees the script in a different way, maybe the code goes through some changes before it actually gets executed? The easiest way to check what's actually happening is to run the script with a debugger.

    But since we must debug it when it's being executed by the webserver, a normal debugger wouldn't help, because the debugger has to be invoked from within the webserver. Luckily Doug MacEachern wrote the Apache::DB module and we will use it to debug my script. While Apache::DB does allow you to debug the code interactively, we will do it non-interactively.

    Modify the httpd.conf file in the following way:

      PerlSetEnv PERLDB_OPTS "NonStop=1 LineInfo=/tmp/db.out AutoTrace=1 frame=2"
      PerlModule Apache::DB
      <Location /perl>
        PerlFixupHandler Apache::DB
        SetHandler perl-script
        PerlHandler Apache::Registry
        Options ExecCGI
        PerlSendHeader On
      </Location>
    

    Restart the server and issue a request to counter.pl as before. On the surface nothing has changed--we still see the correct output as before, but two things happened in the background:

    First, the file /tmp/db.out was written, with a complete trace of the code that was executed.

    Second, the error_log now contains the real code that was actually executed. This is produced as a side effect of reporting the Variable "$counter" will not stay shared at... warning that we saw earlier.

    Here is the code that was actually executed:

      package Apache::ROOT::perl::conference::counter_2epl;
      use Apache qw(exit);
      sub handler {
        BEGIN {
          $^W = 1;
        };
        $^W = 1;
        
        use strict;
        
        print "Content-type: text/plain\r\n\r\n";
        
        my $counter = 0;
        
        for (1..5) {
          increment_counter();
        }
        
        sub increment_counter{
          $counter++;
          print "Counter is equal to $counter !\r\n";
        }
      }
    

    The original code wasn't indented. I've indented it for you to stress that the code was wrapped inside the handler() subroutine.

    What do we learn from this?

    First, that every CGI script run under Apache::Registry is cached under a package whose name is formed from the Apache::ROOT:: prefix and the relative part of the script's URL (perl::conference::counter_2epl), by replacing all occurrences of / with :: and escaping characters that can't appear in a Perl identifier (the . in counter.pl becomes _2e). That's how mod_perl knows which script should be fetched from the cache: each script is just a package with a single subroutine named handler.
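    The name mangling can be approximated in a few lines of Perl. This is an illustrative sketch, not Apache::Registry's actual source: registry_package_name() is a hypothetical helper, and it assumes that every character outside [A-Za-z0-9_/] is escaped as _ plus its two-digit hex code (which is how counter.pl turns into counter_2epl), ignoring corner cases such as path components that start with a digit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper approximating how Apache::Registry derives a package
# name from a script's URI; this is not the module's real code.
sub registry_package_name {
    my ($uri) = @_;
    # Escape every character that can't appear in a Perl identifier,
    # keeping slashes for now: "." becomes "_2e".
    (my $name = $uri) =~ s/([^A-Za-z0-9_\/])/sprintf("_%02x", ord $1)/eg;
    # Turn each run of slashes into a package separator.
    $name =~ s{/+}{::}g;
    return "Apache::ROOT$name";
}

print registry_package_name('/perl/conference/counter.pl'), "\n";
print registry_package_name('/perl/test.pl'), "\n";
```

    The first call yields Apache::ROOT::perl::conference::counter_2epl, the package seen in the error_log above.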

    Second, you see now why the diagnostics pragma talked about an inner (nested) subroutine--increment_counter is actually a nested subroutine.

    With mod_perl, each subroutine in every Apache::Registry script is nested inside the handler subroutine. This is not true if the subroutine is defined in the module that is loaded from the script. We will see this later.
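    The effect can be reproduced outside mod_perl. This sketch imitates the wrapping shown above by putting the counter code inside a handler() subroutine and calling it twice, as if one child served two requests; collecting the values in @seen instead of printing them is an addition for the example. Perl prints the same "will not stay shared" warning at compile time, and the second call continues from 6 instead of starting over at 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @seen;    # values produced by increment_counter(), in order

# Imitates Apache::Registry: the whole script body ends up inside handler().
sub handler {
    my $counter = 0;
    increment_counter() for 1 .. 5;

    # A named sub nested in handler(). It stays bound to the $counter that
    # existed when the code was compiled, i.e. the one from the first call.
    sub increment_counter {
        $counter++;
        push @seen, $counter;
    }
}

handler() for 1 .. 2;    # two requests served by the same child
print "@seen\n";         # 1 2 3 4 5 6 7 8 9 10
```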

    The section ``Remedies working for Inner Subroutine'' discusses some of the possible workarounds for this problem.

    It's important to understand that the inner subroutine effect happens only with code that Apache::Registry wraps with a declaration of the handler subroutine. If you put your code into a library or module, which the main script require()'s or use()'s, this effect doesn't occur.

    For example, if we move increment_counter() into mylib.pl, save it in the same directory as the main script and require() it, the nested subroutine problem disappears. (Don't forget the 1; at the end of the library, or the require() might fail.)

      mylib.pl:
      ---------
      sub increment_counter{
        $counter++;
        print "Counter is equal to $counter !\r\n";
      }
      1;
    

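      counter.pl:
      ----------
      #!/usr/bin/perl -w
      
      use strict;
      require "./mylib.pl";
      
      print "Content-type: text/plain\r\n\r\n";
      
      # mylib.pl has no package declaration, so the $counter it uses is
      # this script's package global: declare it and reset it per request
      use vars qw($counter);
      $counter = 0;
      
      for (1..5) {
        increment_counter();
      }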
    

    Personally, unless the script is very short, I tend to write all the code in external libraries, and to have only a few lines in the main script. Generally the main script simply calls the main function of my library. Usually I call it init(). I don't worry about closure effects anymore (unless I create them myself :).
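    A minimal sketch of that layout, assuming a hypothetical My::Counter package (in practice it would live in My/Counter.pm and be loaded with use(); it is inlined here so the example runs as one file). The counter is a fresh lexical inside init() and is passed to increment_counter() by reference, and no subroutine is nested inside another, so repeated calls in the same process give identical output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Counter;    # hypothetical module, inlined for the example

sub increment_counter {
    my ($counter_ref) = @_;    # the caller's counter, passed explicitly
    $$counter_ref++;
    return "Counter is equal to $$counter_ref !\r\n";
}

sub init {
    my $counter = 0;           # a fresh lexical on every call
    my $output  = '';
    $output .= increment_counter(\$counter) for 1 .. 5;
    return $output;
}

package main;

# The main script shrinks to a single call per request.
my $first_request  = My::Counter::init();
my $second_request = My::Counter::init();
print $first_request;
print $first_request eq $second_request ? "same output\n" : "output drifted\n";
```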

    You shouldn't be intimidated by this issue at all, since Perl is your friend. Just keep the warnings mode On and Perl will gladly tell you whenever you have this effect, by saying:

      Variable "$counter" will not stay shared at ...[snipped]
    

    Just don't forget to check your error_log file, before going into production!

    By the way, the above example was pretty boring. In my first days of using mod_perl, I wrote a simple user registration program. I'll give a very simple representation of this program.

      use CGI;
      $q = new CGI;
      my $name = $q->param('name');
      print_respond();
      
      sub print_respond{
        print "Content-type: text/plain\r\n\r\n";
        print "Thank you, $name!";
      }
    

    My boss and I checked the program on the development server and it worked OK, so we decided to put it into production. Everything was fine, but my boss kept on checking by submitting variations of his profile. Imagine the surprise when, after submitting his name (let's say ``The Boss'' :), he saw the response ``Thank you, Stas Bekman!''.

    What happened is that I had tried the production system as well. I was new to mod_perl, and was so excited by the speed improvement that I didn't notice the closure problem. It hit me. At first I thought that maybe Apache had started to confuse connections, returning responses from other people's requests. I was wrong, of course.
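    The registration bug can be replayed without a web server. In this sketch buggy_handler() stands in for the wrapper Apache::Registry generates and its argument stands in for $q->param('name'); both are simplifications, and the handler names are hypothetical. The buggy version thanks the first visitor twice, while the fixed one passes $name to the subroutine explicitly, which is one common remedy.

```perl
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'closure';    # silence "will not stay shared" for the demo

my @responses;

# Buggy: print_respond() is nested and captures the first request's $name.
sub buggy_handler {
    my ($name) = @_;       # stands in for $q->param('name')
    print_respond();
    sub print_respond {
        push @responses, "Thank you, $name!";
    }
}

# Fixed: the value is passed in, so nothing is captured.
sub fixed_handler {
    my ($name) = @_;
    print_respond_fixed($name);
}
sub print_respond_fixed {
    my ($name) = @_;
    push @responses, "Thank you, $name!";
}

buggy_handler($_) for 'Stas Bekman', 'The Boss';
fixed_handler($_) for 'Stas Bekman', 'The Boss';
print "$_\n" for @responses;
```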

    Why didn't we notice this when we were trying the software on our development server? Keep reading and you will understand why.


    The Second Mystery

    Let's return to our original example and proceed with the second mystery we noticed. Why did we see inconsistent results over numerous reloads?

    That's very simple. Every time the server gets a request to process, it hands it over to one of the children, generally in a round robin fashion. So if you have 10 httpd children alive, the first 10 reloads might seem to be correct, because the closure effect only shows up when a child runs the script for the second time. Subsequent reloads then return unexpected results.

    Moreover, requests can appear at random and children don't always run the same scripts. At any given moment one of the children could have served the same script more times than any other, and another may never have run it. That's why we saw the strange behavior.
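    A toy model shows why the results look inconsistent. It assumes three children and strict round-robin dispatch (real children compete for incoming connections, so the order is less regular), with each child keeping its own copy of the variable captured by increment_counter(); it records the last value each request would print.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $children = 3;    # assumed number of httpd children
my %captured;        # child number => that child's stale $counter
my @last_printed;    # last "Counter is equal to N" value, per request

for my $request (1 .. 9) {
    my $child = ($request - 1) % $children;    # assumed round-robin dispatch
    my $value;
    $value = ++$captured{$child} for 1 .. 5;   # the five increments of one run
    push @last_printed, $value;
}
print "@last_printed\n";    # 5 5 5 10 10 10 15 15 15
```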

    Now you see why we didn't notice the problem with the user registration system in the example. First, we didn't look at the error_log. (As a matter of fact we did, but there were so many warnings in there that we couldn't tell which ones were important.) Second, we had too many server children running to notice the problem.

    To expose the problem, run the server as a single process by invoking it with the -X parameter (httpd -X). Since there are no other servers (children) running, you will see the problem on the second reload.

    But before that, let the error_log help you detect most of the possible errors. Many warnings point to real bugs, so check every warning perl reports, and ideally write your code so that no warnings appear in the error_log at all. If your error_log file fills up with hundreds of lines on every script invocation, you will have difficulty noticing and locating real problems.

    Of course none of the warnings will be reported if the warning mechanism is not turned On. Refer to the ``Warnings Explained'' section to learn about warnings in general and to the ``Warnings'' section to learn how to turn them on and off under mod_perl.


    Sometimes it Works, Sometimes it Doesn't

    When you start running your scripts under mod_perl, you might find yourself in a situation where a script seems to work, but sometimes it screws up. And the more it runs without a restart, the more it screws up. Often the problem is easily detectable and solvable. You have to test your script under a server running in single process mode (httpd -X).

    Generally the problem is the use of global variables. Because global variables keep their values from one script invocation to the next unless you change them, your scripts can do strange things.

    The first example is an alarming one: web services. Imagine that you enter a site where you have an account, perhaps a free email account. Now you want to see other users' mail.

    You type in a username you want to peek at and a dummy password and try to enter the account. On some services this will work!!!

    You ask, why in the world does this happen? The answer is simple: global variables. You have entered the account of someone whose request happened to be served by the same server child before yours. Because of sloppy programming, a global variable was not reset at the beginning of the program and voila, you can easily peek into others' email! Here is an example of sloppy code:

      use CGI;
      use vars qw($authenticated);
      my $q = new CGI;
      my $username = $q->param('username');
      my $passwd   = $q->param('passwd');
      authenticate($username,$passwd);
        # failed, break out
      die "Wrong passwd" unless $authenticated == 1;
        # user is OK, fetch user's data
      show_user($username);
      
      sub authenticate{
        my ($username,$passwd) = @_;
            # some checking
        $authenticated = 1 if (SOMETHING);
      }
    

    Do you see the catch? With the code above, I can type in any valid username and any dummy passwd and enter that user's account, provided someone has successfully entered his own account before me using the same child process! Since $authenticated is global, once it becomes 1 it stays 1 for the remainder of the child's life!!! The solution is trivial: reset $authenticated to 0 at the beginning of the program. (There are many other solutions as well.) Of course this example is trivial, but believe me, it happens!
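    Here is a self-contained replay of the authentication bug, assuming a hypothetical %passwd_of table in place of the real password check and calling the request code directly instead of through CGI. Consecutive calls share the global exactly as two requests in one child would; the fixed version resets $authenticated at the start of each request. Having authenticate() return its result instead of setting a global would work just as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use vars qw($authenticated);

# Hypothetical user database standing in for the real password check.
my %passwd_of = (boss => 'secret', stas => 'pumpkin');

sub authenticate {
    my ($username, $passwd) = @_;
    $authenticated = 1
        if exists $passwd_of{$username} and $passwd_of{$username} eq $passwd;
}

# One request, handled the sloppy way.
sub sloppy_request {
    my ($username, $passwd) = @_;
    authenticate($username, $passwd);
    return $authenticated ? "welcome, $username" : "Wrong passwd";
}

# The same request with the global reset first.
sub fixed_request {
    my ($username, $passwd) = @_;
    $authenticated = 0;
    authenticate($username, $passwd);
    return $authenticated ? "welcome, $username" : "Wrong passwd";
}

# The boss logs in, then someone tries "stas" with a dummy password,
# both requests served by the same child.
my @sloppy = (sloppy_request('boss', 'secret'), sloppy_request('stas', 'dummy'));
my @fixed  = (fixed_request('boss', 'secret'),  fixed_request('stas', 'dummy'));
print "sloppy: $sloppy[1]\nfixed:  $fixed[1]\n";
```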

    Here is another little one-liner that can spoil your day if you forget to reset the $allowed variable. It works perfectly well under plain mod_cgi:

      $allowed = 1 if $username eq 'admin';
    

    But under mod_perl, once your system administrator has logged in, anybody who is lucky enough to be served later by the same child will be allowed in too.

    Another good example is the /o regular expression modifier, which compiles a regular expression once, on its first execution, and never compiles it again. This problem can be difficult to detect: after a server restart, each request you make is likely to be served by a different child process, so the regex pattern for that child is compiled afresh. Only when you make a request that happens to be served by a child which has already cached the regex will you see the problem. Usually you miss it: when you press reload, you see that it works (with a new, fresh child). Eventually it doesn't, because you get a child that has already cached the regex and won't recompile it because of the /o modifier.

    An example of such a case would be:

      my $pat = $q->param("keyword");
      foreach( @list ) {
        print if /$pat/o;
      }
    

    To make sure you don't miss these bugs, always test your CGI scripts in single process mode.

    To solve this particular /o modifier problem refer to Compiled Regular Expressions.
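    The /o trap is easy to reproduce in a single process. In this sketch each call to search_with_o() plays the role of a request served by the same child (the function names and word list are made up for the example). Once the first call has compiled /an/, the second call keeps using it even though it asked for ch. Dropping /o, as in search_without_o(), makes perl use the current value of $pat; perl still avoids recompiling when the interpolated string has not changed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list = qw(apple banana cherry);

# The pattern is interpolated and compiled only the first time this match
# op runs; later calls silently reuse it.
sub search_with_o {
    my ($pat) = @_;
    return grep { /$pat/o } @list;
}

# Without /o the current value of $pat is used every time.
sub search_without_o {
    my ($pat) = @_;
    return grep { /$pat/ } @list;
}

# Two "requests" handled by the same process.
my @first  = search_with_o('an');      # banana
my @second = search_with_o('ch');      # still banana: /an/ was kept
my @fresh  = search_without_o('ch');   # cherry
print "@first | @second | @fresh\n";
```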


    Script's name space

    Scripts under Apache::Registry do not run in package main, they run in a unique name space based on the requested URI. For example, if your URI is /perl/test.pl the package will be called Apache::ROOT::perl::test_2epl.


    @INC and mod_perl

    The basic Perl @INC behaviour is explained in the section ``use(), require(), do(), %INC and @INC Explained''.

    When running under mod_perl, once the server is up @INC is frozen and cannot be updated. The only opportunity to temporarily modify @INC is while the script or module is loaded and compiled for the first time. After that its value is reset to the original one. The only way to change @INC permanently is to modify it at Apache startup.

    Two ways to alter @INC at server startup:

    • In the configuration file. For example add:

        PerlSetEnv PERL5LIB /home/httpd/perl
      

      or

        PerlSetEnv PERL5LIB /home/httpd/perl:/home/httpd/mymodules
      

    • In the startup file, alter @INC directly. For example:

        startup.pl
        ----------
        use lib qw(/home/httpd/perl /home/httpd/mymodules);
      

      and load the startup file from the configuration file by:

        PerlRequire /path/to/startup.pl
      


    Reloading Modules and Required Files

    You might want to read the section ``use(), require(), do(), %INC and @INC Explained'' before you proceed.

    When you develop plain CGI scripts, you can just change the code, and rerun the CGI from your browser. Since the script isn't cached in memory, the next time you call it the server starts up a new perl process, which recompiles it from scratch. The effects of any modifications you've applied are immediately present.

    The situation is different with Apache::Registry, since the whole idea is to get maximum performance from the server. By default, the server won't spend time checking whether any included library modules have been changed. It assumes that they weren't, thus saving the few milliseconds it would take to stat() each source file (multiplied by however many modules/libraries you use() and/or require() in your script).

    The only check that is done is to see whether your main script has been changed. So if you have only scripts which do not use() or require() other perl modules or packages, there is nothing to worry about. If, however, you are developing a script that includes other modules, the files you use() or require() aren't checked for modification and you need to do something about that.

    So how do we get our mod_perl-enabled server to recognize changes in library modules? Well, there are a couple of techniques:


    Restarting the server

    The simplest approach is to restart the server each time you apply some change to your code. See ``Server Restarting techniques''.

    After restarting the server about 100 times, you will tire of it and you will look for other solutions.


    Using Apache::StatINC for the Development Process

    Help comes from the Apache::StatINC module. When Perl pulls a file via require(), it stores the full pathname as a value in the global hash %INC with the file name as the key. Apache::StatINC looks through %INC and it immediately reloads any files it finds in there if it sees that they have been updated on disk.
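    The core of the idea can be sketched in plain Perl. This is a hypothetical illustration, not Apache::StatINC's source: reload_changed_files() remembers the modification time of every file listed in %INC, and when a file's time changes it deletes the %INC entry and require()s the file again so the new code is compiled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %last_mtime;    # %INC key => mtime seen at the previous check

# Hypothetical helper: re-require() every file in %INC whose mtime changed.
# The first call for a given file only records its mtime.
sub reload_changed_files {
    my @reloaded;
    for my $key (keys %INC) {          # a snapshot of the keys; %INC may change
        my $path = $INC{$key};
        next unless defined $path and -f $path;
        my $mtime    = (stat _)[9];
        my $previous = $last_mtime{$key};
        $last_mtime{$key} = $mtime;
        next if !defined $previous or $previous == $mtime;
        delete $INC{$key};             # make require() read the file again
        if (eval { require $key; 1 }) {
            push @reloaded, $key;
        }
        else {
            warn "reloading $key failed: $@";
        }
    }
    return @reloaded;
}
```

    Calling such a function at the start of every request is exactly the per-request stat() overhead that makes this approach suitable for development only.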

    To enable this module just add two lines to httpd.conf.

      PerlModule Apache::StatINC
      PerlInitHandler Apache::StatINC
    

    To be sure it really works, turn on debug mode on your development box by adding PerlSetVar StatINCDebug On to your config file. You end up with something like this:

      PerlModule Apache::StatINC
      <Location /perl>
        SetHandler perl-script
        PerlHandler Apache::Registry
        Options ExecCGI
        PerlSendHeader On
        PerlInitHandler Apache::StatINC
        PerlSetVar StatINCDebug On
      </Location>
    

    Be aware that only the modules located in @INC are reloaded on change, and you can change @INC only before the server has been started (in the startup file).

    Nothing you do in your scripts and modules which are pulled in with require() after server startup will have any effect on @INC.

    When you write:

      use lib qw(foo/bar);
    

    @INC is changed only for the time the code is being parsed and compiled. When that's done, @INC is reset to its original value.

    To make sure that you have set @INC correctly, configure the /perl-status location, fetch http://www.nowhere.com/perl-status?inc and look at the bottom of the page, where the contents of @INC will be shown.

    Notice the following trap:

    While ``.'' is in @INC, perl knows to require() files with pathnames given relative to the current (script) directory. After the script has been parsed, the server doesn't remember the path!

    So you can end up with a broken entry in %INC like this:

      $INC{'bar.pl'} eq "bar.pl"
    

    If you want Apache::StatINC to reload such a file, either modify @INC at server startup or use a full path in the require() call.
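    Since Apache::Registry makes the script's directory the current directory during the request, one way to follow the second suggestion is to turn the relative name into an absolute one with File::Spec->rel2abs() before calling require(). A minimal sketch; the temporary directory and the bar_loaded() function in bar.pl are stand-ins for a real script directory and library.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;
use File::Temp qw(tempdir);

# Throwaway directory and library so the sketch runs anywhere.
my $script_dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$script_dir/bar.pl" or die "can't write bar.pl: $!";
print $fh "sub bar_loaded { 1 }\n1;\n";
close $fh;

my $orig_dir = getcwd();
chdir $script_dir or die "can't chdir to $script_dir: $!";  # as Apache::Registry does

# Resolve the name against the current (script) directory before
# require()-ing it, so the %INC entry is an absolute path.
my $lib = File::Spec->rel2abs('bar.pl');
require $lib;

chdir $orig_dir or die "can't chdir back: $!";
print "\$INC{'$lib'} = $INC{$lib}\n";
```

    After the chdir back, the %INC entry still points at an existing file, which is what a stat()-based reloader needs.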


    Configuration Files: Writing, Dynamically Updating and Reloading

    Checking all the modules in %INC on every request can add a large overhead to server response times, and you certainly would not want the Apache::StatINC module to be enabled in your production site's configuration. But sometimes you want a configuration file reloaded when it is updated, without restarting the server.

    This is an especially important feature if, for example, you have a person who is allowed to modify some of the tool's configuration, but for security reasons it's undesirable for him to telnet to the server to restart it.


    Writing Configuration Files

    Since we are talking about configuration files, I would like to show you some good and bad approaches to configuration file writing.

    If you have a configuration file of just a few variables, it doesn't really matter how you do it. But generally this is not the case: configuration files tend to grow as a project grows. This is especially true for projects that generate HTML files, since they tend to demand many easily configurable parameters, like headers, footers, colors and so on.

    So let's start with the approach most often taken by CGI script writers: all configuration variables are defined in a separate file.

    For example:

      $cgi_dir = "/home/httpd/perl";
      $cgi_url = "/perl";
      $docs_dir = "/home/httpd/docs";
      $docs_url = "/";
      $img_dir = "/home/httpd/docs/images";
      $img_url = "/images";
      ... many more config params here ...
      $color_hint   = "#777777";
      $color_warn   = "#990066";
      $color_normal = "#000000";
    

    The use strict; pragma demands that all variables be declared. When we want to use these variables in a mod_perl script we must declare them with use vars in the script.

    So we start the script with:

      use strict;
      use vars qw($cgi_dir $cgi_url $docs_dir $docs_url 
                  ... many more config params here ....
                  $color_hint  $color_warn $color_normal
                 );
    

    It is a nightmare to maintain such a script, especially if not all the features have been coded yet. You have to keep adding and removing variable names. But that's not a big deal.

    Since we want our code clean, we start the configuration file with use strict; as well, so we have to list the variables with the use vars pragma there too. That's a second list of variables to maintain.

    If you have many scripts, you may get collisions between configuration files. One of the best solutions is to declare packages, with unique names of course. For example for our configuration file we might declare the following package name:

      package My::Config;
    

    The moment you add a package declaration and think that you are done, you realize that the nightmare has just begun. When you have declared the package, you cannot just require() the file and use the variables, since they now belong to a different package. So you have either to modify all your scripts to use a fully qualified notation like $My::Config::cgi_url instead of just $cgi_url or to import the needed variables into any script that is going to use them.

    Since you don't want to do the extra typing to make the variables fully qualified, you'd go for the importing approach. But your configuration package has to export them first. That means that you have to list all the variables again, and now you have to keep at least three variable lists updated whenever you rename configuration variables. And that's when you have only one script that uses the configuration file; in the general case you have many of them. So now our example configuration file looks like this:

      package My::Config;
      use strict;
      
      BEGIN {
        use Exporter ();
      
        @My::Config::ISA       = qw(Exporter);
        @My::Config::EXPORT    = qw();
        @My::Config::EXPORT_OK = qw($cgi_dir $cgi_url $docs_dir $docs_url
                                    ... many more config params here ....
                                    $color_hint $color_warn $color_normal);
      }
      
      use vars qw($cgi_dir $cgi_url $docs_dir $docs_url 
                  ... many more config params here ....
                  $color_hint  $color_warn $color_normal
                 );
      
      $cgi_dir = "/home/httpd/perl";
      $cgi_url = "/perl";
      $docs_dir = "/home/httpd/docs";
      $docs_url = "/";
      $img_dir = "/home/httpd/docs/images";
      $img_url = "/images";
      ... many more config params here ...
      $color_hint   = "#777777";
      $color_warn   = "#990066";
      $color_normal = "#000000";
    

    And in the code:

      use strict;
      use My::Config qw($cgi_dir $cgi_url $docs_dir $docs_url 
                        ... many more config params here ....
                        $color_hint  $color_warn $color_normal
                       );
      use vars       qw($cgi_dir $cgi_url $docs_dir $docs_url 
                        ... many more config params here ....
                        $color_hint  $color_warn $color_normal
                       );
    

    This approach is especially bad in the context of mod_perl, since exported variables add a memory overhead. The more variables exported the more memory you use. If we multiply this overhead by the number of servers we are going to run, we get a pretty big number which could be used to run a few more servers instead.

    As a matter of fact things aren't so bad. You can group your variables, and call the groups by special names called tags, which can later be used as arguments to the import() or use() calls. You are probably familiar with:

      use CGI qw(:standard :html);
    

    We can implement it quite easily with the help of export_ok_tags() from Exporter. For example:

      BEGIN {
        use Exporter ();
        use vars qw( @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS);
        @ISA         = qw(Exporter);
        @EXPORT      = qw();
        @EXPORT_OK   = qw();
        
        %EXPORT_TAGS = (
          vars => [qw($fname $lname)],
          subs => [qw(reread_conf untaint_path)],
        );
        Exporter::export_ok_tags('vars');
        Exporter::export_ok_tags('subs');
      }
    

    You export subroutines exactly like variables, since what's actually being exported is a symbol. The definition of these subroutines is not shown here.

    Notice that we didn't use export_tags(), as it exports the variables automatically, without the user asking for them in the first place, which is considered bad style. If a module automatically exports variables with export_tags(), you can avoid importing anything by passing an empty import list:

      use My::Config ();
    

    In your code you can now write:

      use My::Config qw(:subs :vars);
    

    Groups of group tags:

    The :all tag from CGI.pm is a group tag of all other groups. It will require a little more effort from you, but you can always save time by looking at the solution in the code of CGI.pm. It's just a matter of a little code to expand all the groups recursively.
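    Putting these pieces together, here is a runnable sketch of a tagged configuration module that also offers an :all tag. Defining :all as the union of the other tags' lists is a simple Exporter-based approach assumed here, not the way CGI.pm does it; full_name() is a hypothetical exported subroutine, and the module is inlined (hence the explicit import() call in a BEGIN block) so the file runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# My::Config is inlined so the sketch runs as one file; normally it lives
# in My/Config.pm and the caller says "use My::Config qw(:all);".
package My::Config;
use Exporter ();
use vars qw(@ISA @EXPORT @EXPORT_OK %EXPORT_TAGS $fname $lname);

BEGIN {
    @ISA         = qw(Exporter);
    @EXPORT      = qw();
    @EXPORT_OK   = qw();
    %EXPORT_TAGS = (
        vars => [qw($fname $lname)],
        subs => [qw(full_name)],
    );
    # A group of groups: :all is just the union of the other tags.
    $EXPORT_TAGS{all} = [ map { @{ $EXPORT_TAGS{$_} } } qw(vars subs) ];
    Exporter::export_ok_tags('vars', 'subs');
}

$fname = "Larry";
$lname = "Wall";
sub full_name { "$fname $lname" }

package main;

# Equivalent of "use My::Config qw(:all);" for the inlined package.
BEGIN { My::Config->import(qw(:all)) }

print full_name(), "\n";    # Larry Wall
```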

    After going through the pain of maintaining a list of variables in a big project with a huge configuration file (more than 100 variables) and many files actually using them, I came up with a much simpler solution: keeping all the variables in a single hash, which is built from references to other anonymous scalars, arrays and hashes.

    Now my configuration file looks like this:

      package My::Config;
      use strict;
      
      BEGIN {
        use Exporter ();
      
        @My::Config::ISA       = qw(Exporter);
        @My::Config::EXPORT    = qw();
        @My::Config::EXPORT_OK = qw(%c);
      }
      
      use vars qw(%c);
      
      %c = 
        (
         dir => {
                 cgi  => "/home/httpd/perl",
                 docs => "/home/httpd/docs",
                 img  => "/home/httpd/docs/images",
                },
         url => {
                 cgi  => "/perl",
                 docs => "/",
                 img  => "/images",
                },
         color => {
                   hint   => "#777777",
                   warn   => "#990066",
                   normal => "#000000",
                  },
        );
    

    Good perl style suggests keeping a comma at the end of lists. That's because additional items tend to be added to the end of the list. If you keep that last comma in place, you don't have to remember to add one when you add a new item.

    So now the script looks like this:

      use strict;
      use My::Config qw(%c);
      use vars       qw(%c);
      print "Content-type: text/plain\r\n\r\n";
      print "My url docs root: $c{url}{docs}\n";
    

    Do you see the difference? The whole mess has gone, there is only one variable to worry about.

    So far so good, but let's make it even better. I would like to get rid of the Exporter stuff completely. I remove all the exporting code so my config file now looks like:

      package My::Config;
      use strict;
      use vars qw(%c);
      
      %c = 
        (
         dir => {
                 cgi  => "/home/httpd/perl",
                 docs => "/home/httpd/docs",
                 img  => "/home/httpd/docs/images",
                },
         url => {
                 cgi  => "/perl",
                 docs => "/",
                 img  => "/images",
                },
         color => {
                   hint   => "#777777",
                   warn   => "#990066",
                   normal => "#000000",
                  },
        );
    

    And the code:

      use strict;
      use My::Config ();
      print "Content-type: text/plain\r\n\r\n";
      print "My url docs root: $My::Config::c{url}{docs}\n";
    

    Since we still want to save lots of typing, and since now we need to use a fully qualified notation like $My::Config::c{url}{docs}, let's use a magical perl aliasing feature. I'll modify the code to be:

      use strict;
      use My::Config ();
      use vars qw(%c);
      *c = \%My::Config::c;
      print "Content-type: text/plain\r\n\r\n";
      print "My url docs root: $c{url}{docs}\n";
    

    I have aliased the *c glob with \%My::Config::c, a reference to a hash. From now on, %My::Config::c and %c are the same hash and you can read from or modify either of them.
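    A self-contained sketch of the aliasing trick, with My::Config inlined in the same file so it runs without a separate module. Assigning a hash reference to a glob replaces only the glob's hash slot, so any $c, @c or &c in the script are unaffected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Config;    # inlined; normally loaded with "use My::Config ();"
use vars qw(%c);
%c = (
  url   => { docs => "/", cgi => "/perl", },
  color => { hint => "#777777", },
);

package main;
use vars qw(%c);
*c = \%My::Config::c;    # only the hash slot of *c is aliased

$c{color}{hint} = "#888888";                      # write through the short name
print "My url docs root: $c{url}{docs}\n";
print "hint: $My::Config::c{color}{hint}\n";      # read through the long one
```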

    Just one last little point. Sometimes you see a lot of redundancy in the configuration variables, for example:

      $cgi_dir  = "/home/httpd/perl";
      $docs_dir = "/home/httpd/docs";
      $img_dir  = "/home/httpd/docs/images";
    

    Now if you want to move the base path "/home/httpd" into a new place, it demands lots of typing. Of course the solution is:

      $base     = "/home/httpd";
      $cgi_dir  = "$base/perl";
      $docs_dir = "$base/docs";
      $img_dir  = "$docs_dir/images";
    

    You cannot do the same trick with a hash, since you cannot refer to its values before the definition is finished. So this wouldn't work:

      %c =
        (
         base => "/home/httpd",
         dir => {
                 cgi  => "$c{base}/perl",
                 docs => "$c{base}/docs",
                 img  => "$c{dir}{docs}/images",
                },
        );
    

    But nothing stops us from adding additional variables, which are lexically scoped with my(). The following code is correct.

      my $base = "/home/httpd";
      %c =
        (
         dir => {
                 cgi  => "$base/perl",
                 docs => "$base/docs",
                 img  => "$base/docs/images",
                },
        );
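    A runnable version of the lexical-variable approach. It also uses a second technique, an assumption of this sketch rather than something from the text above: a value that depends on other entries can be filled in by a separate statement once the hash has been assigned, because by then its contents are available.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Config;
use vars qw(%c);

my $base = "/home/httpd";    # file-scoped lexical, invisible to other files
%c = (
  dir => {
    cgi  => "$base/perl",
    docs => "$base/docs",
  },
);

# Once the hash exists, later statements may build on values already in it.
$c{dir}{img} = "$c{dir}{docs}/images";

package main;
print "$_ => $My::Config::c{dir}{$_}\n" for sort keys %{ $My::Config::c{dir} };
```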
    

    You have just learned how to make configuration files easily maintainable, and how to save memory by avoiding the export of variables into a script's namespace.


    Reloading Configuration Files

    First, let's look at a simple case, where we just have to look after a simple configuration file like the one below. Imagine a script that tells you who holds the patch pumpkin for the current perl release.

    Sidenote: <Pumpkin> A humorous term for the token (notional or real) that gives its possessor (the ``pumpking'' or the ``pumpkineer'') exclusive access to something, e.g. applying patches to a master copy of some source (for which the token is called the ``patch pumpkin'').

      use CGI ();
      use strict;
      
      my $fname = "Larry";
      my $lname = "Wall";
      my $q = new CGI;
      
      print $q->header(-type=>'text/html');
      print $q->p("$fname $lname holds the patch pumpkin
                   for this perl release.");
    

    The script has a hardcoded value for the name. It's very simple: initialize the CGI object, print the proper HTTP header and tell the world who is the current patch pumpkin.

    When the patch pumpkin changes we don't want to modify the script. Therefore, we put the $fname and $lname variables into a configuration file.

      $fname = "Gurusamy";
      $lname = "Sarathy";
      1;
    

    Please note that there is no package declaration in the above file, so the code will be evaluated in the caller's package, or in the main:: package if none was declared. This means that the variables $fname and $lname will override (or initialize, if they weren't set yet) the variables with the same names in the caller's namespace. This works for global variables only; you cannot update variables defined lexically (with my()) by this technique.
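    A minimal sketch of both points, using a temporary copy of the configuration file above and a hypothetical caller package Pumpkin::Script. The global $lname is overwritten by the file, while the my() variable $fname keeps its value, so the script prints Larry Sarathy.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A throwaway copy of the configuration file shown above.
my ($fh, $conf) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print $fh qq{\$fname = "Gurusamy";\n\$lname = "Sarathy";\n1;\n};
close $fh;

package Pumpkin::Script;    # hypothetical caller package

use vars qw($lname);
my $fname = "Larry";        # lexical: the config file cannot reach it
$lname    = "Wall";         # package global: the config file overwrites it

# The file has no package declaration, so it runs in Pumpkin::Script.
do $conf or die "couldn't load $conf: ", ($@ || $!), "\n";

print "$fname $lname\n";    # Larry Sarathy
```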

    You have started the server and everything is working properly. After a while you decide to modify the configuration. How do you let your running server know that the configuration was modified without restarting it? Remember, we are in production and restarting the server can be quite expensive for us. One of the simplest solutions is to poll the file's modification time by calling stat() before the script starts to do real work. If we see that the file was updated, we force a reconfiguration of the variables located in this file. We will call the function that reloads the configuration reread_conf(); it accepts a single argument, the relative path to the configuration file.

    Apache::Registry calls chdir() to change into the script's directory before it starts the script's execution. So if your CGI script is invoked under the Apache::Registry handler, you can put the configuration file in the same directory as the script. Alternatively, you can put the file in a directory below that and use a path relative to the script directory. Either way, you have to make sure that the file will be found: do() searches the directories in @INC for a relative path, unless the path starts with ./ or ../ (or is absolute).

      use vars qw(%MODIFIED);
      sub reread_conf{
        my $file = shift;
        return unless $file;
        return unless -e $file and -r _;
        # take the age now: the config file itself may run file tests
        my $mtime = -M _; # age in days, relative to $^T
        unless (defined $MODIFIED{$file} and $MODIFIED{$file} == $mtime){
          my $return;
          unless ($return = do $file) {
            warn "couldn't parse $file: $@" if $@;
            warn "couldn't do $file: $!"    unless defined $return;
            warn "couldn't run $file"       unless $return;
          }
          $MODIFIED{$file} = $mtime; # Update the MODIFICATION times
        }
      } # end of reread_conf
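    If you prefer not to depend on -M, whose value is measured relative to $^T (typically the server start time under mod_perl unless you reset it), a variant can compare the absolute modification time returned by stat(). This is a sketch, not a drop-in replacement: reread_conf_mtime() and %MTIME are hypothetical names, and the function returns 1 when the file was loaded and 0 otherwise so the caller can tell what happened.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use vars qw(%MTIME);    # config file => mtime recorded at the last load attempt

# Variant of reread_conf(): compares the absolute modification time from
# stat() instead of -M, and returns 1 if the file was loaded, 0 otherwise.
sub reread_conf_mtime {
    my $file = shift;
    return 0 unless defined $file and -e $file and -r _;
    my $mtime = (stat _)[9];    # seconds since the epoch
    return 0 if defined $MTIME{$file} and $MTIME{$file} == $mtime;
    $MTIME{$file} = $mtime;     # recorded even if loading fails, as above
    my $return = do $file;
    unless ($return) {
        warn "couldn't parse $file: $@" if $@;
        warn "couldn't do $file: $!"    unless defined $return;
        warn "couldn't run $file"       unless $return;
        return 0;
    }
    return 1;
}
```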
    

    When require(), use() or do() operators successfully return, the file that was passed as an argument is inserted into %INC (the key is the name of the file and the value the path to it). Specifically, when Perl sees require() or use() in the code, it first tests %INC to see whether it's already there and thus loaded. If the test returns true, Perl saves the overhead of code re-reading and re-compiling.

    You generally don't notice this with plain perl scripts, but in mod_perl it's used all the time: after the first request served by a process, all the files loaded by require() stay in memory. If the file is preloaded at server startup, even the first request doesn't have the loading overhead.

    We use do() to reload the code in this file, and not require(), because while do() behaves almost identically to require(), it reloads the file unconditionally. If do() cannot read the file, it returns undef and sets $! to report the error. If do() can read the file but cannot compile it, it returns undef and sets an error message in $@. If the file is successfully compiled, do() returns the value of the last expression evaluated.
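    The difference is easy to observe. In this sketch a small file (a stand-in created in a temporary directory) increments a global counter each time it is executed: require() runs it once and then finds it in %INC, while do() runs it every time. The last part triggers the two failure modes described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

use vars qw($loads);
$loads = 0;

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/counter_conf.pl";
open my $fh, '>', $file or die "can't write $file: $!";
print $fh "\$main::loads++;\n1;\n";    # counts how often the file is run
close $fh;

require $file for 1 .. 3;    # run once; the next two calls find it in %INC
my $after_require = $loads;  # 1

do $file for 1 .. 3;         # re-read, recompiled and run every time
my $after_do = $loads;       # 4

# The two failure modes:
my $missing = do "$dir/no_such_file.pl";    # undef, reason in $!
my $missing_err = "$!";

open $fh, '>', "$dir/broken.pl" or die "can't write broken.pl: $!";
print $fh "1 +;\n";                         # a syntax error
close $fh;
my $broken = do "$dir/broken.pl";           # undef, compile error in $@
my $broken_err = $@;

print "require: $after_require, then do: $after_do\n";
print "missing: $missing_err\n";
print "broken: $broken_err";
```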

    The configuration file can be broken if someone modifies it incorrectly. We don't want the whole service that uses the file to break just because of that. So we trap a possible failure of do(), warn about it and ignore the broken changes: the new modification time is recorded anyway, so the file won't be reloaded until it's modified again, and since a file that doesn't compile never runs, the variables keep their previous values. If do() fails to load the file, it might be a good idea to send an email to the system administrator about the problem.

    Notice, however, that since do() updates %INC like require() does, if you are using Apache::StatINC it will attempt to reload this file before the reread_conf() call. So if the file doesn't compile, the request will be aborted. Apache::StatINC shouldn't be used in production anyway (it slows things down by stat()'ing all the files listed in %INC on every request), so this shouldn't be a problem.

    Note that we assume the entire purpose of this function is to reload the configuration if it has changed. This is fail-safe: if something goes wrong, we just return without modifying the server configuration. The function should not be used to initialize the variables on its first invocation; to do that, you would need to replace each occurrence of return() and warn() with die(). If you do that, take a look at the section ``Redirecting Errors to the Client instead of error_log''.

    I used this approach when I had a huge configuration file that was loaded only at server startup, and another small configuration file that included only a few variables, which could be updated by hand or through a web interface. Those variables were initialized in the main configuration file. If the webmaster breaks the syntax of the dynamic file while updating it by hand, the main (write-protected) configuration file is not affected, so the programs keep running properly. Soon we will see a simple web interface which allows us to modify the configuration file without actually breaking it.

    A sample script using the presented subroutine would be:

      use vars qw(%MODIFIED $fname $lname);
      use CGI ();
      use strict;
      
      my $q = new CGI;
      print $q->header(-type=>'text/plain');
      my $config_file = "./config.pl";
      reread_conf($config_file);
      print $q->p("$fname $lname holds the patch pumpkin
                   for this perl release.");
      
      sub reread_conf{
        my $file = shift;
        return unless $file;
        return unless -e $file and -r _;
        unless ($MODIFIED{$file} and $MODIFIED{$file} == -M _){
          my $return;
          unless ($return = do $file) {
            warn "couldn't parse $file: $@" if $@;
            warn "couldn't do $file: $!"    unless defined $return;
            warn "couldn't run $file"       unless $return;
          }
          # remember the modification time; stat the file again, since
          # code run by do() may have overwritten the _ stat buffer
          $MODIFIED{$file} = -M $file;
        }
      } # end of reread_conf
    
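
    The sample script expects ./config.pl to set $fname and $lname. A minimal version of that file might look like the sketch below; the values are only placeholders. Since the file has no package declaration, its assignments should land in the package of the code that runs do() on it, where the script declares the variables with use vars, and the final 1; gives do() a true value so that reread_conf() doesn't warn.

```perl
# config.pl: hypothetical contents for the sample script above.
# No package declaration, so these assignments go into the package
# of the code that runs do() on this file.
$fname = "Larry";
$lname = "Wall";

# Return a true value so do() reports success.
1;
```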

    Remember that you should use (stat $file)[9] instead of -M $file if you modify the $^T variable. In some of my scripts, I reset $^T to the time of the script invocation with "$^T = time()". That way I can perform -M and the similar file status tests (-A, -C) relative to the script invocation time, rather than the time the process was started.
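
    To see why, note that -M reports a file's age in days measured from $^T, while (stat $file)[9] is the absolute modification time in seconds since the epoch. This small sketch moves $^T one day forward and shows that -M shifts accordingly while the stat() value stays put, which is why a timestamp cache like %MODIFIED should store (stat $file)[9] once $^T is being reset.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create an empty temporary file to test against.
my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

my $mtime      = (stat $file)[9];   # absolute mtime, seconds since the epoch
my $age_before = -M $file;          # age in days, measured from $^T

$^T = time() + 24 * 60 * 60;        # pretend the script started a day later
my $age_after  = -M $file;          # the same file now looks a day older

printf "stat mtime unchanged: %s\n", (stat $file)[9] == $mtime ? 'yes' : 'no';
printf "-M shifted by about %.2f days\n", $age_after - $age_before;
```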

    If your configuration file is more sophisticated, declaring a package and exporting variables, the above code will work just as well. You might think that you'd have to import() the variables again, but when do() recompiles the file, the variables that were originally imported get the values from the reloaded code.


    Dynamically updating configuration files

    The CGI script below allows a system administrator to update a configuration file dynamically through a web interface. Combined with the code we have just seen for reloading modified files, this gives you a system which can be reconfigured dynamically without restarting the server. Configuration can be performed from any machine that has a web browser and network access to the server.

    Let's say you have a configuration file like this:

      # MainConfig.pm (the script below loads it with: use MainConfig ();)
      package MainConfig;
      
      use strict;
      use vars qw(%c);
      
      %c = (
            name     => "Larry Wall",
            release  => "5.000",
            comments => "Adding more ways to do the same thing :)",
      
            other    => "More config values",
      
            hash     => { foo  => "bar",
                          fooo => "barr",
                        },
      
            array    => [qw(a b c)],
           );
      
      1;
    

    You want to make the variables name, release and comments dynamically configurable. You want to have a web interface with an input form that allows you to modify these variables. Once modified you want to update the configuration file and propagate the changes to all the currently running processes. Quite a simple task.

    Let's look at the main stages of the implementation:

    • Create a form with the current values of the variables preset.
    • Let the administrator modify them and submit the changes.
    • Validate the submitted information (numeric fields should carry numbers, text fields should carry words, etc.).
    • Update the configuration file.
    • Update the modified values in the memory of the current process.
    • Present the form as before, but with the updated fields, if any.

    The only part that seems complicated to implement is updating the configuration file, for a couple of reasons. If updating the file breaks it, the whole service won't work. And if the file is very big and includes comments and complex data structures, parsing it can be quite a challenge.

    So let's simplify the task. If all we want is to update a few variables, why not create a tiny configuration file with just those variables? It can be modified through the web interface and overwritten each time something needs to change. This way we don't have to parse the file before updating it. And if the main configuration file changes, we don't care, since we no longer depend on it for these variables.

    The dynamically updated variables are duplicated: they are in both the main file and the dynamic file. We do this to simplify maintenance. When a new release is installed, the dynamic configuration file won't exist at all; it will be created only after the first update. As we just saw, the only change needed in the main code is a snippet that loads this file if it exists and has changed.

    This additional code must be executed after the main configuration file has been loaded. That way the updated variables will override the default values in the main file.

    Here is the script; its comments explain each step:

      # remember to run this code under taint mode
      
      use strict;
      use vars qw($q %c $dynamic_config_file %vars_to_change %validation_rules);
      
      use CGI ();
      
      use lib qw(.);
      use MainConfig ();
      *c = \%MainConfig::c;
      
      $dynamic_config_file = "./config.pl";
      
      # load the dynamic configuration file if exists, and override the
      # default values from the main configuration file
      do $dynamic_config_file if -e $dynamic_config_file and -r _;
      
      # fields that can be changed and their titles
      %vars_to_change =
        (
         'name'     => "Patch Pumpkin's Name",
         'release'  => "Current Perl Release",
         'comments' => "Release Comments",
        );
      
      %validation_rules =
        (
         'name'     => sub { $_[0] =~ /^[\w\s\.]+$/;   },
         'release'  => sub { $_[0] =~ /^\d+\.[\d_]+$/; },
         'comments' => sub { 1;                        },
        );
      
      $q = new CGI;
      print $q->header(-type=>'text/html'),
        $q->start_html();
      
      my %updates = ();
      
      # We always rewrite the dynamic config file, so we want all the
      # vars to be passed, but to save time we will only do checking
      # of vars that were changed.  The rest will be retrieved from
      # the 'prev_foo' values.
      foreach (keys %vars_to_change) {
        # copy var so we can modify it
        my $new_val = $q->param($_) || '';
      
        # strip a possible ^M char (DOS/WIN)
        $new_val =~ s/\cM//g;
      
        # push to hash if was changed
        $updates{$_} = $new_val
          if defined $q->param("prev_".$_) and $new_val ne $q->param("prev_".$_);
      }
      
      # Note that we cannot trust the previous values of the variables,
      # since they were presented to the user as hidden form fields,
      # which the user can mangle. A mangled field could let a value skip
      # the validation rules, but it cannot inject code into the config
      # file, because every value written there has its single quotes
      # and backslashes escaped.
    

      # Process the changes, if there are any. Nothing is processed when
      # the script is invoked for the first time to display the form, or
      # when the form was submitted without modifying any values (we know
      # that by comparing them with the previous values of the variables,
      # which are kept in hidden fields of the form).
      
      # process and update the values if valid
      process_change_config(%updates) if %updates;
      
      # print the update form
      conf_modification_form();
      
      # update the config file but first validate that the values are correct ones
      #########################
      sub process_change_config{
        my %updates = @_;
      
          # we will list here all the malformatted vars
        my %malformatted = ();
      
        print $q->b("Trying to validate these values<BR>");
        foreach (keys %updates) {
          print "<DT><B>$_</B> => <PRE>$updates{$_}</PRE>";
      
          # now we have to handle each var to be changed very very carefully
          # since this file goes immediately into production!
          $malformatted{$_} = delete $updates{$_}
            unless $validation_rules{$_}->($updates{$_});
      
        } # end of foreach my $var (keys %updates)
      
        # print warnings if there are any invalid changes
        print $q->hr,
          $q->p($q->b(qq{Warning! These variables were changed
                       but found malformed, thus the original
                       values will be preserved.})
             ),
          join(",<BR>",
             map { $q->b($vars_to_change{$_}) . " : $malformatted{$_}\n"
                 } keys %malformatted)
            if %malformatted;
      
        # Now complete the vars that weren't changed from the
        # $q->param('prev_var') values
        map { $updates{$_} = $q->param('prev_'.$_) unless exists $updates{$_}
            } keys %vars_to_change;
      
        # Now we have all the data that should be written into the dynamic
        # config file
      
          # escape single quotes "'" and backslashes while creating the content
        my $content = join "\n",
          map { $updates{$_} =~ s/(['\\])/\\$1/g;
              '$c{' . $_ . "}  =  '" . $updates{$_} . "';\n"
            } keys %updates;
      
          # now add '1;' to make require() happy
        $content .= "\n1;";
      
          # compile (but don't run) the generated code, to make sure it
          # is valid Perl before it goes into production
        eval "sub { $content }";
        if ($@) {
          print qq{Warning! Something went wrong with config file
                 generation!<P> The error was : <BR><PRE>$@</PRE>};
          return;
        }
      
        print $q->hr;
      
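          # write the new content to a temporary file first
        use Symbol ();
        use Fcntl qw(:flock :seek);
        my $fh = Symbol::gensym();
          # open for appending, so the file isn't truncated before we hold the lock
        open $fh, ">>$dynamic_config_file.bak"
          or die "Can't open $dynamic_config_file.bak for writing: $!\n";
        flock $fh, LOCK_EX;      # exclusive lock
        seek $fh, 0, SEEK_SET;   # rewind to the start
        truncate $fh, 0;         # the file might shrink!
        print $fh $content;
        close $fh
          or die "Can't close $dynamic_config_file.bak: $!\n";
      
          # OK, now replace the real file in a single rename() step
        rename "$dynamic_config_file.bak", $dynamic_config_file
          or die "Can't rename $dynamic_config_file.bak: $!\n";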
      
          # Rerun it to update the variables in the current process! Note
          # that it won't update the variables in other processes; the code
          # that watches the config file's timestamp does that work in each
          # process. Since the next request will update the configuration
          # anyway, why load it here? Because we are about to fill the
          # form's input fields with the updated data.
        do $dynamic_config_file;
      
      } # end sub process_change_config
      
      ##########################
      sub conf_modification_form{
      
        print $q->center($q->h3("Update Form"));
      
        print $q->hr,
          $q->p(qq{This form allows you to dynamically update the current
                 configuration. You don't need to restart the server in
                 order for the changes to take effect.}
               );
      
          # set the previous settings in the form's hidden fields, so we
          # know whether we have to do some changes or not
        map {$q->param("prev_$_",$c{$_}) } keys %vars_to_change;
      
          # rows for the table, which go into the form
        my @configs = ();
      
          # prepare the single-line textfield entries
        push @configs,
          map {
            $q->td(
                 $q->b("$vars_to_change{$_}:"),
                ) .   # label and input cells share one table row
            $q->td(
                 $q->textfield(-name      => $_,
                               -default   => $c{$_},
                               -override  => 1,
                               -size      => 20,
                               -maxlength => 50,
                              )
                ),
              } qw(name release);
      
          # prepare multiline textarea entries
        push @configs,
          map {
            $q->td(
                 $q->b("$vars_to_change{$_}:"),
                ) .   # label and input cells share one table row
            $q->td(
                 $q->textarea(-name    => $_,
                              -default => $c{$_},
                              -override  => 1,
                              -rows    => 10,
                              -columns => 50,
                              -wrap    => "HARD",
                              )
                ),
              } qw(comments);
      
        print $q->startform('POST',$q->url),"\n",
            $q->center($q->table(map {$q->Tr($_),"\n",} @configs),
                       $q->submit('','Update!'),"\n",
                      ),
            map ({$q->hidden("prev_".$_, $q->param("prev_".$_))."\n" }
                 keys %vars_to_change), # hidden previous values
            $q->br,"\n",
            $q->endform,"\n",
            $q->hr,"\n",
            $q->end_html;
      
      } # end sub conf_modification_form
    

    After an update, the script generates a file like this:

      $c{release}  =  '5.6';
      
      $c{name}  =  'Gurusamy Sarathy';
      
      $c{comments}  =  'Perl rules the world!';
      
      1;
    
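
    Because every value is written inside single quotes, the generator escapes single quotes and backslashes first. The sketch below mirrors that logic in a standalone function (config_content() is a name made up for this example; it sorts the keys only to make the output stable, and escapes a copy instead of modifying the caller's data), then evals the generated text to check that a value containing both characters survives the round trip.

```perl
use strict;
use warnings;
use vars qw(%c);

# Hypothetical standalone version of the script's file generator.
sub config_content {
    my %updates = @_;
    my $content = join "\n",
      map { (my $value = $updates{$_}) =~ s/(['\\])/\\$1/g;   # escape ' and \
            '$c{' . $_ . "}  =  '" . $value . "';\n"
          } sort keys %updates;
    return "$content\n1;";    # end with a true value, like the real file
}

my $content = config_content(name => "O'Reilly \\ Co", release => '5.6');
print $content, "\n";

%c = ();
eval $content or die "Generated config doesn't load: $@";
print "$c{name}\n";    # prints: O'Reilly \ Co
```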


    Reloading handlers

    If you want to reload a PerlHandler module on each invocation, the following trick will do it:

      PerlHandler "sub { do 'MyTest.pm'; MyTest::handler(shift) }"
    

    do() reloads MyTest.pm on every request.
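
    For this to work, MyTest.pm must be somewhere in @INC, since do() searches @INC for it. A minimal sketch of such a module follows; the body of handler() is only a placeholder. Because the file is recompiled on every request, Perl would normally warn that handler() is being redefined each time, so the sketch turns that warning category off with no warnings 'redefine' (this assumes Perl 5.6 or later). It returns 0, the numeric value of Apache::Constants::OK, only so that it can be loaded and tried without mod_perl installed; a real handler would import OK from Apache::Constants.

```perl
package MyTest;

use strict;
use warnings;
no warnings 'redefine';   # do() recompiles this file on every request

sub handler {
    my $r = shift;
    $r->content_type('text/plain');
    $r->send_http_header;
    return 0 if $r->header_only;   # HEAD request: send the headers only
    $r->print("MyTest was (re)loaded at ", scalar(localtime), "\n");
    return 0;                      # 0 == Apache::Constants::OK
}

1;
```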


    Name collisions with Modules and libs

    This section requires an in-depth understanding of use(), require(), do(), %INC and @INC.

    To make things clear before we go into details: each child process has its own %INC hash which is used to store information about its compiled modules. The keys of the hash are the names of the modules and files passed as arguments to require() and use(). The values are the full or relative paths to these modules and files.

    Suppose we have my-lib.pl and MyModule.pm both located at /home/httpd/perl/my/.

    • /home/httpd/perl/my/ is in @INC at server startup.

        require "my-lib.pl";
        use MyModule;
        print $INC{"my-lib.pl"},"\n";
        print $INC{"MyModule.pm"},"\n";
      

      prints:

        /home/httpd/perl/my/my-lib.pl
        /home/httpd/perl/my/MyModule.pm
      

      Adding use lib:

        use lib qw(.);
        require "my-lib.pl";
        use MyModule;
        print $INC{"my-lib.pl"},"\n";
        print $INC{"MyModule.pm"},"\n";
      

      prints:

        my-lib.pl
        MyModule.pm
      

    • /home/httpd/perl/my/ isn't in @INC at server startup.

        require "my-lib.pl";
        use MyModule;
        print $INC{"my-lib.pl"},"\n";
        print $INC{"MyModule.pm"},"\n";
      

      wouldn't work, since perl cannot find the modules.

      Adding use lib:

        use lib qw(.);
        require "my-lib.pl";
        use MyModule;
        print $INC{"my-lib.pl"},"\n";
        print $INC{"MyModule.pm"},"\n";
      

      prints:

        my-lib.pl
        MyModule.pm
      
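
    You can watch %INC being filled in with a few lines of plain Perl. This sketch creates my-lib.pl and MyModule.pm in a temporary directory (standing in for /home/httpd/perl/my/), adds that directory to @INC at run time, and prints the resulting %INC entries. Each file bumps a counter (the hash %compiled, made up for this example) when it is compiled, which also shows that a second require() of the same name is skipped.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use vars qw(%compiled);             # how many times each file was compiled

my $dir = tempdir(CLEANUP => 1);    # stands in for /home/httpd/perl/my/

# Each file increments its own counter when it is compiled.
for my $file ('my-lib.pl', 'MyModule.pm') {
    open my $fh, '>', "$dir/$file" or die "Can't write $dir/$file: $!";
    print $fh "\$main::compiled{'$file'}++;\n1;\n";
    close $fh or die "Can't close $dir/$file: $!";
}

unshift @INC, $dir;    # like "use lib", but done at run time

require "my-lib.pl";   # %INC key: the string passed to require()
require MyModule;      # %INC key: MyModule.pm, derived from the module name
require "my-lib.pl";   # already in %INC, so this does nothing

print "$_ => $INC{$_}\n" for 'my-lib.pl', 'MyModule.pm';
print "my-lib.pl compiled $compiled{'my-lib.pl'} time(s)\n";
```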

    Let's look at three scenarios with faults related to namespaces. For the following discussion we will consider just one individual child process.

    Scenario 1

    First, you can't have two modules with identical names running under the same server! Only the first one found in a use() or require() statement will be compiled; the request for the other module will be skipped, since the server will think that it's already compiled. This is a direct result of using %INC, whose keys are the names of the modules: two identical names refer to the same key in the hash. (Refer to the section Watching the server to find out how you can tell what is loaded and where.)

    So suppose you have two different Foo modules in two different directories, and two scripts, tool1.pl and tool2.pl, placed like this:

      ./perl/tool1/Foo.pm
      ./perl/tool1/tool1.pl
      ./perl/tool2/Foo.pm
      ./perl/tool2/tool2.pl
    

    The sample code could be:

      ./perl/tool1/tool1.pl
      --------------------
      use Foo;
      print "Content-type: text/plain\r\n\r\n";
      print "I'm Script number One\n";
      foo();
      --------------------
    

      ./perl/tool1/Foo.pm
      --------------------
      sub foo{
        print "<B>I'm Tool Number One!</B>\n";
      }
      1;
      --------------------
    

      ./perl/tool2/tool2.pl
      --------------------
      use Foo;
      print "Content-type: text/plain\r\n\r\n";
      print "I'm Script number Two\n";
      foo();
      --------------------
    

      ./perl/tool2/Foo.pm
      --------------------
      sub foo{
        print "<B>I'm Tool Number Two!</B>\n";
      }
      1;
      --------------------
    

    Both scripts call use Foo;. Since Foo.pm doesn't declare a package, its foo() is compiled into the package of whichever script loads it first, so only that script will know about Foo. When you call the second script, it will not know about Foo at all; it's as if you had forgotten to write use Foo;. Run the server in single server mode to detect this kind of bug immediately.

    You will see the following in the error_log file:

      Undefined subroutine
      &Apache::ROOT::perl::tool2::tool2_2epl::foo called at
      /home/httpd/perl/tool2/tool2.pl line 4.
    

    Scenario 2

    If the files do not declare a package, the above is true for files you require() as well:

    Suppose the content of the scripts and config.pl files is exactly like in the example above, and you have a directory structure like this:

      ./perl/tool1/config.pl
      ./perl/tool1/tool1.pl
      ./perl/tool2/config.pl
      ./perl/tool2/tool2.pl
    

    and both scripts contain

      use lib qw(.);
      require "config.pl";
    

    The second scenario is no different from the first: there is no difference between use() and require() if you don't have to import any symbols into the calling script. Only the first script served will actually do the require(), for the same reason as in the example above: %INC already includes the key "config.pl"!

    Scenario 3

    It is interesting that the following scenario will fail too!

      ./perl/tool/config.pl
      ./perl/tool/tool1.pl
      ./perl/tool/tool2.pl
    

    where tool1.pl and tool2.pl both require() the same config.pl.

    There are three solutions for this:

    Solution 1

    The first two faulty scenarios can be solved by placing your library modules in a subdirectory structure so that they have different path prefixes. The file system layout will be something like:

      ./perl/tool1/Tool1/Foo.pm
      ./perl/tool1/tool1.pl
      ./perl/tool2/Tool2/Foo.pm
      ./perl/tool2/tool2.pl
    

    And modify the scripts accordingly (tool1.pl and tool2.pl respectively):

      use Tool1::Foo;
      use Tool2::Foo;
    

    For require() (scenario number 2) use the following:

      ./perl/tool1/tool1-lib/config.pl
      ./perl/tool1/tool1.pl
      ./perl/tool2/tool2-lib/config.pl
      ./perl/tool2/tool2.pl
    

    And each script contains respectively:

      use lib qw(.);
      require "tool1-lib/config.pl";
    

      use lib qw(.);
      require "tool2-lib/config.pl";
    

    This solution isn't good, since while it might work for you now, if you add another script that wants to use the same module or config.pl file, it would fail as we saw in the third scenario.

    Let's see some better solutions.

    Solution 2

    Another option is to use the full path to the file, so that the full path is used as the key in %INC:

      require "/full/path/to/the/config.pl";
    

    This solution solves the problem of the first two scenarios. I was surprised that it worked for the third scenario as well!

    With this solution you lose some portability. If you move the tool around in the file system, you will have to change the base directory, or write an additional script that automatically updates the hardcoded path after the move. Of course, you will have to remember to run it.

    Solution 3

    Make sure you read all of this solution.

    Declare a package in the required files! The package name should be unique among the package names you use, and if the file name follows the package name (e.g. MyProject/Foo.pm for MyProject::Foo), the key in %INC will be unique as well. It's a good idea to use at least two-level package names for your private modules, e.g. MyProject::Carp and not Carp, since the latter will collide with an existing standard package. Even if no such package exists at the time you write your code, one might enter the next Perl distribution as a standard module and break your code. Foresee problems like this and save yourself future trouble.

    What are the implications of package declaration?

    Without package declarations, it is very convenient to use() or require() files, because all the variables and subroutines are part of the main:: package and can be used as if they were part of the main script. With package declarations things are more awkward: you have to use the Package::function() syntax to call a subroutine from Package, and to access a global variable $foo inside that package you have to write $Package::foo.

    Lexically scoped variables, those declared with my() inside Package, are inaccessible from outside the file that declares them.

    You can leave your scripts unchanged if you import the names of the global variables and subroutines into the namespace of package main:: like this:

      use Module qw(:mysubs sub_b $var1 :myvars);
    

    You can export both subroutines and global variables. Note however that this method has the disadvantage of consuming more memory for the current process.

    See perldoc Exporter for information about exporting other variables and symbols.
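
    Here is a minimal sketch of a module that makes such an import line work. The package name MyProject::Module and the names sub_a, sub_b, $var1 and $var2 are made up for this example. The :mysubs and :myvars tags are defined through %EXPORT_TAGS, and everything a caller may import is listed in @EXPORT_OK. Saved as MyProject/Module.pm somewhere in @INC, it would be loaded with use MyProject::Module qw(:mysubs sub_b $var1 :myvars);.

```perl
package MyProject::Module;

use strict;
use Exporter ();
use vars qw(@ISA @EXPORT_OK %EXPORT_TAGS $var1 $var2);

@ISA         = qw(Exporter);
@EXPORT_OK   = qw(sub_a sub_b $var1 $var2);   # what callers may import
%EXPORT_TAGS = (
    mysubs => [qw(sub_a sub_b)],              # use ... qw(:mysubs)
    myvars => [qw($var1 $var2)],              # use ... qw(:myvars)
);

$var1 = 'first value';
$var2 = 'second value';

sub sub_a { return "sub_a called" }
sub sub_b { return "sub_b called" }

1;
```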

    This completely covers the third scenario. When you also use different module names in the package declarations, as explained above, you cover the first two as well.

    See also the perlmodlib and perlmod manpages.

    From the above discussion it should be clear that you cannot run development and production versions of the tools using the same Apache server! You have to run a separate server for each. They can run on the same machine, but the servers will have to use different ports.


    More package name related issues

    If you have the following:

      PerlHandler Apache::Work::Foo
      PerlHandler Apache::Work::Foo::Bar
    

    If you make a request that pulls in Apache/Work/Foo/Bar.pm first, the Apache::Work::Foo package gets defined as a side effect (its symbol table is created to hold the nested Apache::Work::Foo::Bar package), so mod_perl does not try to pull in Apache/Work/Foo.pm.
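
    The side effect is easy to reproduce without Apache. In the sketch below, compiling a sub in Apache::Work::Foo::Bar is enough to create the Apache::Work::Foo:: symbol table, so a check that only asks whether that package exists is satisfied even though Apache::Work::Foo has no handler() of its own. The handler body is a placeholder.

```perl
use strict;
use warnings;

# Only the "inner" package is compiled; Apache::Work::Foo itself never is.
package Apache::Work::Foo::Bar;
sub handler { return 'Bar handler' }

package main;

# Defining Apache::Work::Foo::Bar created the Apache::Work::Foo:: stash too.
my $stash_exists = exists $Apache::Work::{'Foo::'} ? 'yes' : 'no';
my $has_handler  = Apache::Work::Foo->can('handler') ? 'yes' : 'no';

print "Apache::Work::Foo:: symbol table exists: $stash_exists\n";
print "Apache::Work::Foo has handler(): $has_handler\n";
```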


    __END__ and __DATA__ tokens

    Apache::Registry scripts cannot contain __END__ or __DATA__ tokens.

    Why? Because Apache::Registry scripts are wrapped into a subroutine called handler. For example, take the script at URI /perl/test.pl:

      print "Content-type: text/plain\r\n\r\n";
      print "Hi";
    

    When this script is executed under the Apache::Registry handler, it actually becomes:

      package Apache::ROOT::perl::test_2epl;
      use Apache qw(exit);
      sub handler {
        print "Content-type: text/plain\r\n\r\n";
        print "Hi";
      }
    

    So if you happen to put an __END__ tag, like:

      print "Content-type: text/plain\r\n\r\n";
      print "Hi";
      __END__
      Some text that wouldn't be normally executed
    

    it will be turned into:

      package Apache::ROOT::perl::test_2epl;
      use Apache qw(exit);
      sub handler {
        print "Content-type: text/plain\r\n\r\n";
        print "Hi";
        __END__
        Some text that wouldn't be normally executed
      }
    

    When you try to execute this script, you will receive the following error:

      Missing right bracket at .... line 4, at end of line
    

    Perl ignores everything after the __END__ token, including the closing brace of handler(), hence the error. The same applies to the __DATA__ token.

    Also, remember that whatever applies to Apache::Registry scripts applies in most cases to Apache::PerlRun scripts as well.
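
    If a script needs a block of inline text, one possible workaround is to keep it in a here-document instead of after __DATA__: a here-document is ordinary code, so it survives being wrapped in the handler() subroutine. A minimal sketch, where the text itself is only an example:

```perl
use strict;
use warnings;

# Text that might otherwise live after a __DATA__ token. A here-document
# is part of the code, so wrapping the script in a sub doesn't break it.
my $data = <<'END_OF_DATA';
first line
second line
END_OF_DATA

# Process it line by line, as you would with <DATA>.
for my $line (split /\n/, $data) {
    print "got: $line\n";
}
```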


    Output from system calls

    The output of system(), exec(), and open(PIPE,"|program") calls will not be sent to the browser unless your Perl was configured with sfio.

    You can use backticks as a possible workaround:

      print `command here`;
    

    But you're throwing performance out the window either way. It's best not to fork at all if you can avoid it. See the ``Forking or Executing subprocesses from mod_perl'' section to learn about the implications of forking.


    Using format() and write()

    The interface to filehandles which are linked to variables with Perl's tie() function is not yet complete. The format() and write() functions are missing. If you configure Perl with sfio, write() and format() should work just fine.
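
    If you can't rebuild Perl with sfio, one possible workaround (a sketch, not something mod_perl provides) is to avoid write() altogether: the formline() built-in fills the $^A accumulator using the same picture syntax as format, and you then print the accumulated string yourself. The format_line() helper below is hypothetical.

```perl
use strict;
use warnings;

# Format one line with a format-style picture, without write().
# formline() appends to the $^A accumulator; we localize and clear it
# first, then return what was accumulated.
sub format_line {
    my ($picture, @values) = @_;
    local $^A = '';
    formline($picture, @values);
    return $^A;
}

# Name left-justified in 11 columns, amount right-justified in 6.
my $picture = '@<<<<<<<<<< @>>>>>' . "\n";
print format_line($picture, 'apples', 42);
print format_line($picture, 'pears',  7);
```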


    Terminating requests and processes, the exit() and child_terminate() functions

    Perl's exit() built-in function cannot be used in mod_perl scripts. Calling it causes the mod_perl process to exit (which defeats the object of using mod_perl). The Apache::exit() function should be used instead.

    You might start your scripts by overriding the exit() subroutine (if you use Apache::exit() directly, you will have a problem testing the script from the shell, unless you put use Apache (); into your code.) I use the following code:

      use vars qw($USE_MOD_PERL);
      BEGIN {
          # Auto-detect if we are running under mod_perl or CGI.
        $USE_MOD_PERL = ( (exists $ENV{'GATEWAY_INTERFACE'}
                       and $ENV{'GATEWAY_INTERFACE'} =~ /CGI-Perl/)
                          or exists $ENV{'MOD_PERL'} ) ? 1 : 0;
      }
      use subs qw(exit);
      
      # Select the correct exit function
      ########
      sub exit{
          # Apache::exit(0) stops the script but keeps the child alive.
          # Apache::exit(-2) (-2 == Apache::Constants::DONE) would make the
          # child server exit gracefully, completing the logging functions
          # and protocol requirements etc.
        $USE_MOD_PERL ? Apache::exit(0) : CORE::exit(0);
      }
    

    Now the correct exit() will be always chosen, whether you run the script under mod_perl, ordinary CGI or from the shell.

    Note that if you run the script under Apache::Registry, the Apache function exit() overrides the Perl core built-in function. While you see exit() listed in @EXPORT_OK of the Apache package, Apache::Registry does something you don't see and imports this function for you. This means that if your script is running under the Apache::Registry handler, you don't have to worry about exit(). The same applies to Apache::PerlRun.

    If you use CORE::exit() in scripts running under mod_perl, the child will exit, but neither a proper exit nor logging will happen on the way. CORE::exit() cuts off the server's legs.

    If you need to shut down the child cleanly after the request has completed, use the $r->child_terminate method. You can call it anywhere in the code, not just at the ``end''. It sets the value of the MaxRequestsPerChild configuration variable to 1 and clears the keepalive flag. After the request is serviced, the current connection is closed, because the keepalive flag was cleared, and the parent tells the child to quit cleanly, because the child has reached its MaxRequestsPerChild limit.

    You can accomplish this in two ways--in the Apache::Registry script:

      Apache->request->child_terminate;
    

    or in httpd.conf:

      PerlFixupHandler "sub { shift->child_terminate }"
    

    You would want to use the latter example only if you wanted the child to terminate every time the registered handler is called. Probably this is not what you want.

    Here is an example of registering a post-processing handler:

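      my $r = shift;
      $r->post_connection(\&exit_child);
      sub exit_child{
          # use the request object passed in as the first argument, not the
          # outer $r: a named sub would only ever see the first request's $r
        my $r = shift;
          # some logic here if needed
        $r->child_terminate;
      }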
    

    This is the technique used by the Apache::SizeLimit module, which terminates processes that grow bigger than a size you choose.

    Apache::GTopLimit (based on libgtop) is a similar module. It does the same thing, plus you can configure it to terminate processes when their shared memory shrinks below some specified size.

    As mentioned before, it is unnecessary to postpone the execution of child_terminate(). You can call it anywhere in the code, it won't terminate the child's execution until the request has been served. Don't confuse it with exit().


    die() and mod_perl

    When you write:

      open FILE, "foo" or die "Cannot open foo file for reading: $!";
    

    in a Perl script and execute it, the script will die() if it is unable to open the file: it aborts the script's execution, prints the reason for the death and quits the Perl interpreter.

    You will hardly find a properly written Perl script that doesn't have at least one die() statement in it, if it has to cope with system calls and the like.

    A CGI script running under mod_cgi exits on completion, and the Perl interpreter exits with it. So it doesn't really matter whether the interpreter quits because the script died a natural death (when its last statement was executed) or was aborted by a die() statement.

    In mod_perl we don't want the interpreter to quit. We already know that when the script completes its chores the interpreter doesn't quit, and there is no reason why it should quit when the script is stopped by die(). So calling die() doesn't quit the process.

    And this is how it works: when die() is triggered, mod_perl's $SIG{__DIE__} handler logs the error message and calls exit() instead of the real die(). Thus the script stops, but the process doesn't quit.

    This is an example of such trapping code, not the real code:

      $SIG{__DIE__} = sub { print STDERR @_; exit(); };
    
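
    When you would rather handle an error yourself than let die() end the request, wrap the code that may fail in an eval block and check $@ afterwards. The sketch below shows the pattern; first_line() and the path /no/such/file are hypothetical, chosen so that the open fails.

```perl
use strict;
use warnings;

# Return the first line of a file, die()ing if it can't be opened.
sub first_line {
    my $file = shift;
    open my $fh, '<', $file or die "Cannot open $file for reading: $!\n";
    my $line = <$fh>;
    close $fh;
    return $line;
}

# eval {} traps the die(): the error lands in $@ and execution continues.
my $line = eval { first_line('/no/such/file') };
if ($@) {
    print "Sorry, something went wrong: $@";
}
else {
    print "First line: $line";
}
```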


    Testing the Code from the Shell

    Your CGI scripts will not yet run from the command line unless you use CGI::Switch or CGI.pm and at least Perl 5.004. They must also make no direct calls to Apache Perl API methods.


    I/O is different

    If you are using Perl 5.004 or better, most CGI scripts can run under mod_perl untouched. If you're using 5.003, Perl's built-in read() and print() functions do not work as they do under CGI. If you're using CGI.pm, use $query->print instead of plain ol' print().


    STDIN, STDOUT and STDERR streams

    In mod_perl both STDIN and STDOUT are tied to the socket the request came from. STDERR is tied to the error_log file.

    To print to STDOUT you can either use a regular print() (which is automagically tied to the socket) or the $r->print method.



    Global Variables Persistence

    Since the child process generally doesn't exit before it has serviced several requests, global variables persist inside the same process from request to request. This means that you must never rely on the value of a global variable unless it was initialized at the beginning of the request processing. See ``Variables globally, lexically scoped and fully qualified'' for more info.

    You should avoid using global variables unless you can't do without them, because they make code development harder and you have to be very sure that all of them are initialized before they are used. Use my() scoped variables wherever you can.

    You should be especially careful with Perl Special Variables which cannot be lexically scoped. You have to use local() instead.
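
    To see the persistence effect without Apache, you can call the same subroutine several times in one process, the way a mod_perl child runs your code once per request. In this sketch handler_sketch() is a hypothetical name: the package global keeps counting across calls, the my() variable starts fresh each time, and the special variable $/ is restored by local() when the subroutine returns.

```perl
use strict;
use warnings;

# Simulate one long-lived child serving several requests by calling the
# same (hypothetical) handler repeatedly.
our $global_count;              # package global: survives between calls

sub handler_sketch {
    $global_count++;            # never re-initialized, so it keeps growing
    my $lexical_count = 0;      # my(): a fresh variable on every call
    $lexical_count++;
    local $/ = undef;           # special variable: local() restores it on return
    return [ $global_count, $lexical_count ];
}

my @seen = map { handler_sketch() } 1 .. 3;
printf "request %d: global=%d lexical=%d\n", $_ + 1, @{ $seen[$_] } for 0 .. $#seen;
print defined $/ ? "\$/ restored\n" : "\$/ still undef\n";
```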



    Generating correct HTTP Headers

    When writing your own handlers with the Perl API the proper way to send the HTTP Header is to set the header first and then to send it, like this:

      $r->content_type('text/html');
      $r->send_http_header;
      return OK if $r->header_only;
    

    If the client issues an HTTP HEAD request rather than the usual GET, then to comply with the HTTP protocol we should send only the HTTP header, not the document body. When Apache receives a HEAD request, it sets header_only() to true. If we see that this has happened, we return from the handler immediately with an OK status code.

    Generally, you don't need to set the content type explicitly, since Apache does it for you by matching the extension of the request URI against the MIME tables (from the mime.types file). So if the request URI is /welcome.html, the text/html content type will be picked. However, for CGI scripts or URIs that cannot be mapped by a known extension, you should set the appropriate type yourself using the content_type() method.

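    The extension lookup described above can be sketched like this, assuming a small hypothetical excerpt of mime.types rather than the real file, and a hypothetical guess_type() helper; Apache's own lookup is implemented in C and handles more cases. A URI without a known extension yields no type, which is exactly when a handler should call content_type() itself.

```perl
use strict;
use warnings;

# Hypothetical excerpt of a mime.types file: a type, then its extensions.
my @mime_types = (
    "text/html   html htm",
    "text/plain  txt",
    "image/gif   gif",
);

# Build an extension => type lookup table.
my %type_for;
for my $line (@mime_types) {
    my ($type, @extensions) = split ' ', $line;
    $type_for{ lc $_ } = $type for @extensions;
}

# Return the MIME type for a URI, or undef when the extension is unknown
# (the case where a handler should call content_type() itself).
sub guess_type {
    my ($uri) = @_;
    my ($ext) = $uri =~ /\.([^.\/]+)$/ or return;
    return $type_for{ lc $ext };
}

print "$_ => ", (guess_type($_) // 'unknown'), "\n" for qw(/welcome.html /notes.txt /perl/test.pl);
```
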
    The situation is a little bit different with Apache::Registry and similar handlers. If you take a basic CGI script like this:

      print "Content-type: text/plain\r\n\r\n";
      print "Hello world";
    

    it won't work as is, because the HTTP header will not be sent: by default, mod_perl does not send any headers itself. You may wish to change this by adding

      PerlSendHeader On
    

    in the <Location> part of your configuration. Now the response line and common headers will be sent, just as they are by mod_cgi. Also as with mod_cgi, PerlSendHeader will not send the MIME type and the terminating double newline; your script must send those itself, e.g.:

      print "Content-type: text/html\r\n\r\n";
    

    According to the HTTP specification, header lines should be terminated by a carriage return and line feed pair, which in Perl you can write as ``\cM\cJ'', ``\015\012'' or ``\x0D\x0A''. ``\r\n'' produces this pair on UNIX and MS-DOS/Windows machines. However, on a Mac ``\r\n'' eq ``\012\015'', exactly the other way around.

    Note that in most UNIX CGI scripts, developers use the simpler ``\n\n'' rather than ``\r\n\r\n''. Since there are occasions where sending ``\n'' without ``\r'' can cause problems, make it a habit to send ``\r\n'' every time.

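    A quick check of the escapes mentioned above, as a minimal sketch: the three spellings produce the same two bytes, so a constant such as the $CRLF used here (a name chosen for illustration) keeps header lines correct regardless of how the platform defines "\r" and "\n".

```perl
use strict;
use warnings;

# Three spellings of the CR LF pair that HTTP expects after each header line.
my @spellings = ("\cM\cJ", "\015\012", "\x0D\x0A");
my $all_equal = !grep { $_ ne $spellings[0] } @spellings;

# The octal (or hex) form means the same bytes on every platform, whereas
# "\r\n" depends on how the platform defines "\r" and "\n".
my $CRLF   = "\015\012";
my $header = "Content-type: text/html$CRLF$CRLF";

print "all spellings equal: ", ($all_equal ? "yes" : "no"), "\n";
print "CRLF bytes: ", join(' ', map { sprintf('%02X', ord $_) } split //, $CRLF), "\n";
```
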
    The PerlSendHeader On directive tells mod_perl to intercept anything that looks like a header line (such as Content-Type: text/plain) and automatically turn it into a correctly formatted HTTP/1.0 header, the same way it happens with CGI scripts running under mod_cgi. This allows you to keep your CGI scripts unmodified.

    You can use $ENV{PERL_SEND_HEADER} to find out whether PerlSendHeader is On or Off. You use it in your module like this:

     if($ENV{PERL_SEND_HEADER}) {
         print "Content-type: text/html\r\n\r\n";
     }
     else {
         my $r = Apache->request;
         $r->content_type('text/html');
         $r->send_http_header;
     }
    

    If you use CGI.pm's header() function to generate HTTP headers, you do not need to activate this directive because CGI.pm detects mod_perl and calls send_http_header() for you. However, it does not hurt to use this directive anyway.

    There is no free lunch--you get the mod_cgi behavior at the expense of the small but finite overhead of parsing the text that is sent. Note that mod_perl makes the assumption that individual headers are not split across print statements.

    The Apache::print() routine has to gather up the headers that your script outputs, in order to pass them to $r->send_http_header. This happens in src/modules/perl/Apache.xs (print) and Apache/Apache.pm (send_cgi_header). There is a shortcut in there, namely the assumption that each print statement contains one or more complete headers. If for example you generate a Set-Cookie header by multiple print() statements, like this:

       print "Content-type: text/html\n";
       print "Set-Cookie: iscookietext\; ";
       print "expires=Wednesday, 09-Nov-1999 00:00:00 GMT\; ";
       print "path=\/\; ";
       print "domain=\.mmyserver.com\; ";
       print "\r\n\r\n";
       print "hello";
    

    your generated Set-Cookie header is split over a number of print statements and gets lost. The above example wouldn't work! Try this instead:

       print "Content-type: text/html\r\n";
       my $cookie = "Set-Cookie: iscookietext\; ";
       $cookie .= "expires=Wednesday, 09-Nov-1999 00:00:00 GMT\; ";
       $cookie .= "path=\/\; ";
       $cookie .= "domain=\.mmyserver.com\; ";
       print $cookie;
       print "\r\n\r\n";
       print "hello";
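
    Another way to guarantee that each print() carries complete headers is to assemble the whole header block first and print it at once. The header_block() helper below is a hypothetical sketch, not part of mod_perl or CGI.pm; it takes name/value pairs in order and returns the header lines joined with CRLF and terminated by an empty line.

```perl
use strict;
use warnings;

my $CRLF = "\015\012";

# Hypothetical helper: build every header as one complete line and return
# the whole header block, so a single print() sends complete headers.
sub header_block {
    my @pairs = @_;                     # (name => value, ...) in order
    my @lines;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        push @lines, "$name: $value";
    }
    return join($CRLF, @lines) . $CRLF . $CRLF;
}

my $cookie = join '; ', 'iscookietext',
    'expires=Wednesday, 09-Nov-1999 00:00:00 GMT',
    'path=/', 'domain=.mmyserver.com';

my $headers = header_block(
    'Content-type' => 'text/html',
    'Set-Cookie'   => $cookie,
);
print $headers, "hello";
```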
    

    Sometimes when you call a script you see an ugly "Content-Type: text/html" displayed at the top of the page, and of course the rest of the HTML code won't be rendered correctly by the browser. This generally happens when your code sends the header after it has already been sent, so the second copy is rendered into the browser's page. It might happen when you call the CGI.pm $q->header method or mod_perl's $r->send_http_header after the header has already gone out.

    If you have a complicated application where the header might be generated from many different places, depending on the calling logic, you might want to write a special subroutine that sends a header and keeps track of whether the header has already been sent. Of course you can use a global variable to flag that the header has already been sent, but there is another and more elegant solution, where the closure effect is a desired feature.

    Just copy the code below, including the block's curly braces. Wherever you want to print a header in your code, call the print_header() subroutine. $need_header is the same kind of beast as a static variable in C: it remembers its value from call to call. The first time you call print_header(), the value of $need_header becomes 0, and on any subsequent call the header will not be sent.

      {
        my $need_header = 1;
        sub print_header {
          my $type = shift || "text/html";
          print("Content-type: $type\r\n\r\n"),$need_header = 0 if $need_header;
        }
      }
    

    In your code you call the above subroutine as:

      print_header();
    

    or for example if you want to override the default (text/html) MIME type:

      print_header("text/plain");
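
    Here is a runnable sketch of the same closure trick, extended with a hypothetical reset_header_flag() helper. Since a mod_perl process serves many requests, $need_header also keeps its value from one request to the next, just like the global variables discussed earlier, so the flag has to be reset at the start of each request or the header would be sent only once per process. The return value of print_header() is added here only so the effect can be observed.

```perl
use strict;
use warnings;

{
    my $need_header = 1;          # persists between calls, like a C static

    sub print_header {
        my $type = shift || "text/html";
        return 0 unless $need_header;   # header already sent: do nothing
        $need_header = 0;
        print "Content-type: $type\r\n\r\n";
        return 1;
    }

    # Hypothetical helper: under mod_perl the flag also survives from one
    # request to the next, so reset it at the start of every request.
    sub reset_header_flag { $need_header = 1 }
}

my @sent;
push @sent, print_header();              # first call prints the header
push @sent, print_header("text/plain");  # suppressed: already sent
reset_header_flag();                     # simulate the start of a new request
push @sent, print_header("text/plain");  # printed again
print "sent flags: @sent\n";
```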
    

    Let's make our smart subroutine more elaborate by taking the PerlSendHeader setting into account, so it always does the right thing. This is especially important if you write an application that you are going to distribute, hopefully as Open Source.

      {
        my $need_header = 1;
        sub print_header {
          my $type = shift || "text/html";
          return unless $need_header;
          $need_header = 0;
          if($ENV{PERL_SEND_HEADER}) {
            print "Content-type: $type\r\n\r\n";
          }
          else {
            my $r = Apache->request;
            $r->content_type($type);
            $r->send_http_header;
          }
        }
      }
    

    You can continue to improve this subroutine even further to handle additional headers, such as cookies.

    See also Correct Headers--A quick guide for mod_perl users



    NPH (Non Parsed Headers) scripts

    To run a Non Parsed Header CGI script under mod_perl, simply add to your code:

      local $| = 1;
    

    And if you normally set PerlSendHeader On, add this to your server's configuration file:

      <Files */nph-*>
        PerlSendHeader Off
      </Files>
    



    BEGIN blocks

    Perl executes BEGIN blocks as soon as possible, at the time of compiling the code. The same is true under mod_perl. However, since mod_perl normally only compiles scripts and modules once, either in the parent server or once per-child, BEGIN blocks in that code will only be run once. As the perlmod manpage explains, once a BEGIN block has run, it is immediately undefined. In the mod_perl environment, this means that BEGIN blocks will not be run during the response to an incoming request unless that request happens to be the one that causes the compilation of the code.
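
    The compile-once behavior is easy to observe in plain Perl. This sketch compiles a piece of code a single time with a string eval, much as mod_perl compiles a script or module once, and then calls it three times as if serving three requests. The counter names are hypothetical.

```perl
use strict;
use warnings;

our ($begin_runs, $body_runs) = (0, 0);

# Compile this code once, as mod_perl compiles a script or module once,
# then call it repeatedly, as if serving several requests.
my $handler = eval q{
    sub {
        BEGIN { $main::begin_runs++ }   # runs once, while this code is compiled
        $main::body_runs++;             # runs on every call
    }
} or die $@;

$handler->() for 1 .. 3;
print "BEGIN ran $begin_runs time(s), body ran $body_runs time(s)\n";
```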

    BEGIN blocks in modules and files pulled in via require() or use() will be executed:

    • Only once, if pulled in by the parent process.

    • Once per-child process if not pulled in by the parent process.

    • An additional time, once per child process if the module is pulled in off a disk again via Apache::StatINC.

    • An additional time, in the parent process on each restart if PerlFreshRestart is On.

    • Unpredictable if you fiddle with %INC yourself.

    BEGIN blocks in Apache::Registry scripts will be executed as above, plus:

    • Only once, if pulled in by the parent process via Apache::RegistryLoader.

    • Once per-child process if not pulled in by the parent process.

    • An additional time, once per child process, each time the script file changes on disk.

    • An additional time, in the parent process on each restart if pulled in by the parent process via Apache::RegistryLoader and PerlFreshRestart is On.

    Make sure you read Evil things might happen when using PerlFreshRestart.



    END blocks

    As the perlmod manpage explains, an END subroutine is executed as late as possible, that is, when the interpreter exits. In the mod_perl environment, the interpreter does not exit until the server shuts down. However, mod_perl does make a special case for Apache::Registry scripts.

    Normally, END blocks are executed by Perl during its perl_run() function. This is called once each time the Perl program is executed, i.e. under mod_cgi, once per invocation of the CGI script. However, mod_perl only calls perl_run() once, during server startup. Any END blocks encountered during main server startup, i.e. those pulled in by PerlRequire or by any PerlModule, are suspended.

    Apache versions 1.3b3 and later run the END blocks at child_exit().

    Except during the cleanup phase, any END blocks encountered during compilation of Apache::Registry scripts, including subsequent invocations when the script is cached in memory, are called after the script has completed.

    All other END blocks encountered during other Perl*Handler call-backs, e.g. PerlChildInitHandler, will be suspended while the process is running and called during child_exit() when the process is shutting down. Module authors might wish to use $r->register_cleanup() as an alternative to END blocks if this behavior is not desirable. Code registered with $r->register_cleanup() is called during the cleanup phase of each request and thus can be used to emulate plain Perl's END{} block behavior.

    The last paragraph is very important for handling the case of 'User Pressed the Stop Button'.



    Command line Switches (-w, -T, etc)

    Normally, when you run a Perl script from the command line, its first line, e.g. #!/bin/perl (sometimes referred to as the shebang line), tells the system to run it with perl. In scripts running under mod_cgi, you may use perl execution switch arguments as described in the perlrun manpage, such as -w, -T or -d. Since scripts running under mod_perl don't need the shebang line, all switches except -w are ignored by mod_perl. This feature was added for backward compatibility with CGI scripts.

    Most command line switches have a special variable equivalent. Consult the perlvar manpage for more details.
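
    For example, -w corresponds to $^W, -0 to $/, and the output side of -l to $\. In this minimal sketch, setting $\ with local() for one block appends a newline to every print() in that block and restores the old value afterwards; the output goes to an in-memory file only so it can be checked.

```perl
use strict;
use warnings;

# A few switches and their special-variable equivalents (see perlvar):
#   -w  => $^W    (warnings)
#   -l  => $\     (output record separator; -l also chomps input)
#   -0  => $/     (input record separator)
my $out = '';
{
    local $\ = "\n";                  # like the output side of -l
    open my $fh, '>', \$out or die "Cannot open in-memory file: $!";
    print {$fh} "line one";           # "\n" is appended automatically
    print {$fh} "line two";
    close $fh;
}
print $out;
```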



    Warnings

    There are three ways to enable warnings:

    Globally to all Processes

    Setting:

      PerlWarn On
    

    in httpd.conf will turn warnings On in any script.

    You can then fine tune your code, turning warnings Off and On by setting the $^W variable in your scripts.

    Locally to a script

      #!/usr/bin/perl -w
    

    will turn warnings On for the scope of the script. You can turn them Off and On in the script by setting the $^W variable as noted above.

    Locally to a block

    This code turns warnings mode On for the scope of the block.

      {
        local $^W = 1;
        # some code
      }
    

    This turns it Off:

      {
        local $^W = 0;
        # some code
      }
    

    Note that if you forget the local operator, this code will affect not only the current request but also all the subsequent requests processed by that child. Thus

      $^W = 0;
    

    will turn the warnings Off, no matter what.
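
    The difference is easy to demonstrate. In this sketch the two subroutines are hypothetical stand-ins for two requests served by the same child: the one using local() leaves $^W as it found it, while the plain assignment switches warnings off for everything that runs afterwards.

```perl
use strict;
use warnings;

# Two hypothetical "requests" served by the same process.
sub request_with_local    { local $^W = 0; return $^W }
sub request_without_local { $^W = 0;       return $^W }

$^W = 1;                          # as if PerlWarn On were set
request_with_local();
my $after_local = $^W;            # restored to 1 when the sub returned
request_without_local();
my $after_plain = $^W;            # stays 0 for everything that follows
print "after local: $after_local, after plain assignment: $after_plain\n";
```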

    If you want to turn warnings On for the scope of the whole file, as in the previous item, you can do this by adding:

      local $^W = 1;
    

    at the beginning of the file. Since a file is effectively a block, file scope behaves like a block's curly braces { } and local $^W at the start of the file will be effective for the whole file.



    Taint Mode

    Perl's -T switch enables Taint mode. (META: Link to security chapter).

    Since the -T switch doesn't have an equivalent perl variable, mod_perl provides the PerlTaintCheck directive to turn on taint checks. In httpd.conf, enable this mode with:

      PerlTaintCheck On
    

    Now any code compiled inside httpd will be taint checked.

    If you use the -T switch, Perl will warn you that you should use the PerlTaintCheck configuration directive and will otherwise ignore it.



    Other switches

    Finally, if you still need to set additional perl startup flags such as -d and -D, you can use the PERL5OPT environment variable. See Apache::PerlRun.



    The strict pragma

    It's _absolutely_ mandatory (at least for development) to start all your scripts with:

      use strict;
    

    If needed, you can always turn off the 'strict' pragma, or a part of it, inside a block, e.g.:

      {
        no strict 'refs';
        # some code
      }
    

    It's more important to have the strict pragma enabled under mod_perl than anywhere else. While it's not required by the language, its use cannot be too strongly recommended. It will save you a great deal of time. And, of course, clean scripts will still run under mod_cgi (plain CGI)!

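    A quick way to see what strict catches is to compile the same snippet with and without a my() declaration, as in this sketch. Without strict, an undeclared variable like $counter silently becomes a package global, which under mod_perl persists between requests; that is why the compile-time error is so valuable.

```perl
use strict;
use warnings;

# Compile the same snippet with and without a my() declaration and see
# which one the strict pragma rejects.
my $sloppy = eval q{ use strict; $counter = 1; 1 };
my $error  = $@;
my $clean  = eval q{ use strict; my $counter = 1; 1 };

print "undeclared global compiled: ", ($sloppy ? "yes" : "no"), "\n";
print "strict says: $error";
```
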


    Passing ENV variables to CGI

    To pass an environment variable from a configuration file, add to it:

      PerlSetEnv key val
      PerlPassEnv key
    

    e.g.:

      PerlSetEnv PERLDB_OPTS "NonStop=1 LineInfo=/tmp/db.out AutoTrace=1"
    

    will set $ENV{PERLDB_OPTS}, and it will be accessible in every child.

    %ENV is only set up for CGI emulation. If you are using the API, you should use $r->subprocess_env, $r->notes or $r->pnotes for passing data around between handlers. %ENV is slow because it must update the underlying C environment table. This also exposes the data on systems which allow users to see the environment with ps.

    In any case, %ENV and the tables used by those methods are all cleared after the request is served so that $ENV{SESSION_ID} will not be swapped or reused by different http requests.

    See also PerlSetupEnv which can enable/disable environment variables settings.



    -M and other time() file tests under mod_perl

    Under mod_perl, files created after the server's (child?) startup are reported as having a negative age by the -M (and -C, -A) file tests. This is expected once you remember that these tests measure age relative to $^T, the time the Perl interpreter started: a file created after the server was started is younger than the interpreter. It's normal Perl behavior.

    If you want to have -M report the time relative to the current request, you should reset the $^T variable just as with any other perl script. Add $^T=time; at the beginning of the script.
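
    This sketch shows the effect, using File::Temp (which ships with Perl) for a scratch file. It sets $^T one minute in the past to imitate a long-running server, creates a file, and compares -M before and after resetting $^T.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Pretend the interpreter started a minute ago, as a long-running
# mod_perl process did, and then create a file "now".
$^T = time() - 60;
my (undef, $file) = tempfile(UNLINK => 1);

my $age_before = -M $file;    # negative: the file is newer than $^T
$^T = time();                 # reset at the start of the "request"
my $age_after  = -M $file;    # about zero, no longer negative

printf "age before reset: %.6f days, after reset: %.6f days\n",
    $age_before, $age_after;
```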



    Apache and syslog

    When native syslog support is enabled, the stderr stream will be redirected to /dev/null!

    It has nothing to do with mod_perl (plain Apache does the same). Doug wrote the Apache::LogSTDERR module to work around this.



    Filehandle and lock leakages

    When you write a script running under mod_cgi, you can get away with sloppy programming, like opening a file and letting the interpreter close it for you when the script has finished its run:

      open IN, "in.txt" or die "Cannot open in.txt for reading : $!\n";
    

    For mod_perl, before the end of the script you must close() any files you opened!

      close IN;
    

    If you forget to close(), you might get file descriptor leakage and (if you flock()ed on this file descriptor) unlock problems.

    Even if you do close the files, if for some reason the interpreter is stopped before the close() call, the leakage is still there. See for example Handling the 'User pressed Stop button' case. After a long run without restarting Apache, your machine might run out of file descriptors, and worse, files might be left locked and unusable.

    What can you do? Use IO::File (and the other IO::* modules). This allows you to assign the file handle to a variable which can be my() (lexically) scoped. When this variable goes out of scope, the file or other file system entity will be properly closed (and unlocked if it was locked). Lexically scoped variables always go out of scope at the end of the script's invocation, even if it was aborted in the middle. If the variable was defined inside some internal block, it goes out of scope at the end of the block. For example:

      use IO::File;
      {
        my $fh = IO::File->new("filename") or die $!;
        # read from $fh
      } # ...$fh is closed automatically at end of block, without leaks.
    

    As I have just mentioned, you don't have to create a special block for this purpose. A script in a file is effectively written in a block with the same scope as the file, so you can simply write:

      use IO::File;
      my $fh = IO::File->new("filename") or die $!;
      # read from $fh
      # ...$fh is closed automatically at end of script, without leaks.
    

    Using a { BLOCK } makes sure that the file is closed the moment the end of the block is reached.

    An even faster and lighter technique is to use Symbol.pm:

      use Symbol ();
      my $fh = Symbol::gensym();
      open $fh, "filename" or die $!;
    

    Use these approaches to ensure you have no leakages, but don't be too lazy to write close() statements. Make it a habit.
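
    The following sketch, which assumes Perl 5.6 or later for lexical filehandles created with open my $fh, shows why lexically scoped handles protect you: the script dies before reaching close(), yet once the eval block is left the handle has been closed, so the data is flushed and the flock() lock is released. The scratch file comes from File::Temp and is only for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

my (undef, $file) = tempfile(UNLINK => 1);

# Simulate a script that is aborted half-way, e.g. by die(), before it
# reaches close(): the lexical handle still goes out of scope.
eval {
    open my $fh, '>', $file or die "Cannot open $file: $!";
    flock($fh, LOCK_EX) or die "Cannot lock $file: $!";
    print {$fh} "written before the abort\n";
    die "aborted\n";    # no close() is ever reached
};
my $error = $@;

# Leaving the eval block destroyed $fh: Perl closed it, flushing the
# buffer and releasing the lock.
open my $check, '<', $file or die "Cannot reopen $file: $!";
my $lock_free = flock($check, LOCK_SH | LOCK_NB) ? 1 : 0;
my $content   = <$check>;
close $check;
print "aborted with: $error", "content: $content", "lock free: $lock_free\n";
```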



    Code has been changed, but it seems the script is running the old code

    Files pulled in via use or require statements are not automatically reloaded when changed on disk. See Reloading Modules and Required Files for more info.



    The Script Is Too Dirty, But It Does The Job And I Cannot Afford To Rewrite It.

    You can still benefit from mod_perl.

    One approach is to replace the Apache::Registry handler with Apache::PerlRun and define a new location. The script can reside in the same directory on the disk.

      # srm.conf
      Alias /cgi-perl/ /home/httpd/cgi/
      
      # httpd.conf
      <Location /cgi-perl>
        #AllowOverride None
        SetHandler perl-script
        PerlHandler Apache::PerlRun
        Options ExecCGI
        allow from all
        PerlSendHeader On
      </Location>
    

    See Apache::PerlRun--a closer look

    Another ``bad'', but workable, method is to set MaxRequestsPerChild to 1, which forces each child to exit after serving only one request. You will get the preloaded modules, etc., but the script will be compiled for each request, then thrown away. This isn't good for ``high-traffic'' sites, as the parent server will need to fork a new child each time one is killed. You can fiddle with StartServers and MinSpareServers, so that the parent pre-spawns more servers than actually required and each killed one is immediately replaced with a fresh one. Probably that's not what you want.



    Apache::PerlRun--a closer look

    Apache::PerlRun gives you the benefit of preloaded Perl and its modules. This module's handler emulates the CGI environment, allowing programmers to write scripts that run under CGI or mod_perl without any change. Unlike Apache::Registry, the Apache::PerlRun handler does not cache the script inside a subroutine. Scripts will be ``compiled'' on each request. After the script has run, its name space is flushed of all variables and subroutines. Still, you don't have the overhead of loading the Perl interpreter and the compilation time of the standard modules. If your script is very light but uses lots of standard modules, you will see no difference between Apache::PerlRun and Apache::Registry!

    Be aware, though, that if you use packages whose internal variables have circular references, those variables will not be flushed! Apache::PerlRun only flushes your script's name space, which does not include any other required packages' name spaces. If there's a reference to a my() scoped variable that's keeping it from being destroyed after leaving the eval scope (of Apache::PerlRun), that cleanup might not be taken care of until the server is shut down and perl_destruct() is run, which always happens after running command line scripts. Consider this example:

      package Foo;
      sub new { bless {} }
      sub DESTROY {
        warn "Foo->DESTROY\n";
      }
      
      eval <<'EOF';
      package my_script;
      my $self = Foo->new;
      #$self->{circle} = $self;
      EOF
      
      print $@ if $@;
      print "Done with script\n";
    

    First you'll see:

      Foo->DESTROY
      Done with script
    

    Then, uncomment the line where $self makes a circular reference, and you'll see:

      Done with script
      Foo->DESTROY
    

    In this case, under mod_perl you wouldn't see Foo->DESTROY until the server shutdown, or until your module properly took care of things.
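
    One way for a module to "properly take care of things" is to break the cycle itself. This sketch uses weaken() from Scalar::Util, which ships with Perl 5.8 and later but may be missing from older installations; deleting $self->{circle} before the object goes out of scope would work as well. With the self-reference weakened, DESTROY runs as soon as the block ends.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my @events;

package Foo;
sub new     { return bless {}, shift }
sub DESTROY { push @events, "Foo->DESTROY" }

package main;

{
    my $self = Foo->new;
    $self->{circle} = $self;    # the circular reference from the example
    weaken($self->{circle});    # ...no longer keeps the object alive
}
push @events, "Done with script";
print "$_\n" for @events;
```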



    Sharing variables between processes

    META: to be completed

    • Global variables initialized at server startup, through the Perl startup file, can be shared between processes until one of the processes modifies them. For example, when you write:

        $My::debug = 1;
      

      all processes will read the same value. If one of the processes changes that value to 0, it will still be equal to 1 for any other process, but not for the one which actually made the change. When a process modifies a shared variable, that process gets its own private copy.

    • IPC::Shareable can be used to share variables between children.

    • libmm

    • other methods?
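
    The copy-on-modify behavior described in the first item can be observed with fork(), which is how Apache creates its children. This sketch assumes a UNIX-like system; the child changes $My::debug and reports its value through the exit status, while the parent still sees the original value.

```perl
use strict;
use warnings;

$My::debug = 1;                 # set before forking, like a startup file

my $pid = fork;
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {                # child: modifies its own copy
    $My::debug = 0;
    exit($My::debug);           # report the child's value via exit status
}
waitpid($pid, 0);
my $child_value  = $? >> 8;
my $parent_value = $My::debug;  # unchanged in the parent
print "child saw $child_value, parent still sees $parent_value\n";
```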



    The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
    Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me to improve this guide. If you have something to contribute please send it directly to me.

    Written by Stas Bekman.
    Last Modified at 12/18/1999
    Use of the Camel for Perl is
    a trademark of O'Reilly & Associates,
    and is used by permission.
    Real World Scenarios Implementation


    Standalone mod_perl Enabled Apache Server



    Installation in 10 lines

    The installation is very simple (this example shows an installation on Linux):

      % cd /usr/src
      % lwp-download http://www.apache.org/dist/apache_x.x.x.tar.gz
      % lwp-download http://perl.apache.org/dist/mod_perl-x.xx.tar.gz
      % tar zvxf apache_x.x.x.tar.gz
      % tar zvxf mod_perl-x.xx.tar.gz
      % cd mod_perl-x.xx
      % perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
        DO_HTTPD=1 USE_APACI=1 PERL_MARK_WHERE=1 EVERYTHING=1
      % make && make test && make install
      % cd ../apache_x.x.x
      % make install
    

    That's all!

    Notes: Replace x.x.x and x.xx with the real version numbers of Apache and mod_perl. GNU tar uncompresses as well (with the z flag).



    Installation in 10 paragraphs

    First download the sources of both packages; you can use the lwp-download utility to do it, for example. lwp-download is part of the LWP (or libwww) package, which you will need to have installed anyway in order for mod_perl's make test to pass. Once this package is installed, lwp-download will be available to you as well.

      % lwp-download http://www.apache.org/dist/apache_x.x.x.tar.gz
      % lwp-download http://perl.apache.org/dist/mod_perl-x.xx.tar.gz
    

    Extract both sources. I usually put all the sources in /usr/src/, but your mileage may vary: move the sources and chdir to the directory you want to put them in. The GNU tar utility can uncompress as well, with the z flag. A non-GNU tar utility is unable to decompress, so in that case you have to do it in two steps: first uncompress the packages with gzip -d apache_x.x.x.tar.gz and gzip -d mod_perl-x.xx.tar.gz, then un-tar them with tar xvf apache_x.x.x.tar and tar xvf mod_perl-x.xx.tar.

      % cd /usr/src
      % tar zvxf apache_x.x.x.tar.gz
      % tar zvxf mod_perl-x.xx.tar.gz
    

    chdir to the mod_perl source directory:

      % cd mod_perl-x.xx
    

    Now build the Makefile. For a basic, first-time installation the parameters in the example below are the only ones you need. APACHE_SRC tells where the Apache src directory is. If you have followed my suggestion and extracted both sources under the same directory (/usr/src), do:

      % perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
        DO_HTTPD=1 USE_APACI=1 PERL_MARK_WHERE=1 EVERYTHING=1
    

    There are many additional parameters; you can find some of them in the sections dedicated to configuration and elsewhere. While running perl Makefile.PL ... the process will check for prerequisites and tell you if something is missing. If you are missing some Perl packages or other software, you will have to install them before you proceed.

    Now we make the project (by building the mod_perl extension and calling make in the Apache source directory to build an httpd), test it (by running various tests) and install the mod_perl modules.

      % make && make test && make install
    

    Note that if make fails, neither make test nor make install will be executed. If make test fails, make install will not be executed.

    Now change to the Apache source directory and run make install to install Apache's headers and default configuration files, build the Apache directory tree and put the httpd there.

      % cd ../apache_x.x.x
      % make install
    

    When you execute the above command, the Apache installation process will tell you how to start the freshly built webserver (the path to apachectl, more about it later) and where the configuration files are. Remember (or even better, write down) both, since you will need this information very soon. On my machine the two important paths are:

      /usr/local/apache/bin/apachectl
      /usr/local/apache/conf/httpd.conf
    

    Now the build and the installation processes are completed. Just configure httpd.conf and start the webserver.



    Configuration Process

    The basic configuration is simple. First configure Apache as you always do (set Port, User, Group, the ErrorLog and other file paths, etc.), start the server and make sure it works. One way to start and stop the server is to use the apachectl utility:

      % /usr/local/apache/bin/apachectl start
      % /usr/local/apache/bin/apachectl stop
    

    Shut the server down, open the httpd.conf in your favorite editor and scroll to the end of the file, where we will add the mod_perl configuration directives (of course you can place them anywhere in the file).

    Add the following configuration directives:

      Alias /perl/ /home/httpd/perl/
    

    This assumes that you put all the scripts to be executed by the mod_perl enabled server under the /home/httpd/perl/ directory.

      PerlModule Apache::Registry
      <Location /perl>
        SetHandler perl-script
        PerlHandler Apache::Registry
        Options ExecCGI
        PerlSendHeader On
        allow from all
      </Location>
    

    Now put a test script into /home/httpd/perl/ directory:

      test.pl
      -------
      #!/usr/bin/perl -w
      use strict;
      print "Content-type: text/html\r\n\r\n";
      print "It worked!!!\n";
      -------
    

    Make it executable and readable by the server. If your server is running as user nobody (hint: look for the User directive in the httpd.conf file), do the following:

      % chown nobody /home/httpd/perl/test.pl
      % chmod u+rx   /home/httpd/perl/test.pl
    

    Test that the script runs from the command line by executing it:

      % /home/httpd/perl/test.pl
    

    You should see:

      Content-type: text/html
      
      It worked!!!
    

    Now it's time to test our mod_perl server. Assuming that your config file includes Port 80, start the server, then point your favorite browser (Netscape, for example) at the following URL:

      http://localhost/perl/test.pl
    

    Make sure that you have a loop-back device configured. If not, use the real server name for this test, for example:

      http://www.nowhere.com/perl/test.pl
    

    You should see:

      It worked!!!
    

    If something went wrong, go through the installation process again and make sure you didn't make a mistake. If that doesn't help, read the INSTALL pod document (perldoc INSTALL) in the mod_perl distribution directory.

    Now copy some of your Perl/CGI scripts into the /home/httpd/perl/ directory and watch them run much faster from the newly configured base URL (/perl/). Some of your scripts will not work out of the box and will need some minor tweaking or a major rewrite to work properly with a mod_perl enabled server. Chances are that if you don't practice sloppy programming techniques, the scripts will work without any modifications at all.

    The above setup is very basic. It will help you get a mod_perl enabled server running and give you a good feeling as you watch your previously slow CGIs fly.

    As with Perl, you can start benefiting from mod_perl from the very first moment you try it. When you become more familiar with mod_perl, you will want to start writing Apache handlers and use more of mod_perl's power.



    One Plain and One mod_perl-enabled Apache Server

    Since we are going to run two apache servers we will need two different sets of configuration, log and other files. We need a special directory layout. While some of the directories can be shared between the two servers (assuming that both are built from the same source distribution), others should be separated. From now on I will refer to these two servers as httpd_docs (vanilla Apache) and httpd_perl (Apache/mod_perl).

    For this illustration, we will use /usr/local as our root directory. The Apache installation directories will be stored under this root (/usr/local/bin, /usr/local/etc, etc.).

    First let's prepare the sources. We will assume that all the sources go into the /usr/src directory. It is better to use two separate copies of the Apache sources, since you will probably want to tune each Apache version separately and make some modifications and recompilations as time goes by. Having two independent source trees will prove helpful, unless you use DSO, which is covered later in this section.

    Make two subdirectories:

      % mkdir /usr/src/httpd_docs
      % mkdir /usr/src/httpd_perl
    

    Put the Apache sources into the /usr/src/httpd_docs directory:

      % cd /usr/src/httpd_docs
      % gzip -dc /tmp/apache_x.x.x.tar.gz | tar xvf -
    

    If you have GNU tar:

      % tar xvzf /tmp/apache_x.x.x.tar.gz
    

    Replace /tmp with the path to the downloaded file, and x.x.x with the version of the server you have.

      % cd /usr/src/httpd_docs
      
      % ls -l
      drwxr-xr-x  8 stas  stas 2048 Apr 29 17:38 apache_x.x.x/
    

    Now we will prepare the httpd_perl server sources:

      % cd /usr/src/httpd_perl
      % gzip -dc /tmp/apache_x.x.x.tar.gz | tar xvf -
      % gzip -dc /tmp/mod_perl-x.xx.tar.gz | tar xvf -
      
      % ls -l
      drwxr-xr-x  8 stas  stas 2048 Apr 29 17:38 apache_x.x.x/
      drwxr-xr-x  8 stas  stas 2048 Apr 29 17:38 mod_perl-x.xx/
    

    Time to decide on the desired directory layout (where the Apache files go):

      ROOT = /usr/local
    

    The two servers can share the following directories (so we will not duplicate data):

      /usr/local/bin/
      /usr/local/lib/
      /usr/local/include/
      /usr/local/man/
      /usr/local/share/
    

    Important: we assume that both servers are built from the same Apache source version.

    Each server stores its own files in either the httpd_docs or the httpd_perl subdirectories:

      /usr/local/etc/httpd_docs/
                     httpd_perl/
      
      /usr/local/sbin/httpd_docs/
                      httpd_perl/
      
      /usr/local/var/httpd_docs/logs/
                                proxy/
                                run/
                     httpd_perl/logs/
                                proxy/
                                run/
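
    Since the two servers differ only in the name of the per-server subdirectory, the layout can be derived mechanically from ROOT. The Perl sketch below (layout_args is a hypothetical helper, not part of Apache or mod_perl) prints the directory options for both servers; they are the same options passed to ./configure and APACI_ARGS later in this section.

```perl
use strict;
use warnings;

# Build the per-server APACI directory options for the layout above.
# ROOT is shared; only the httpd_docs/httpd_perl subdirectory differs.
sub layout_args {
    my ($root, $server) = @_;
    my $var = "$root/var/$server";
    return (
        "--sbindir=$root/sbin/$server",
        "--sysconfdir=$root/etc/$server",
        "--localstatedir=$var",
        "--runtimedir=$var/run",
        "--logfiledir=$var/logs",
        "--proxycachedir=$var/proxy",
    );
}

print join(" \\\n  ", "./configure --prefix=/usr/local",
           layout_args('/usr/local', $_)), "\n\n"
    for qw(httpd_docs httpd_perl);
```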
    

    After compiling and installing both servers, you will need to configure them. To make things clear before we go into details: configure /usr/local/etc/httpd_docs/httpd.conf as a plain Apache server, with the Port directive set to 80, for example, and configure /usr/local/etc/httpd_perl/httpd.conf for the mod_perl server, with a Port different from the one httpd_docs listens on (e.g. 8080). The port number issue will be discussed later.

    The next step is to configure and compile the sources. Below are the procedures for compiling both servers, using the directory layout I have just suggested.



    Configuration and Compilation of the Sources.

    Let's proceed with installation. I will use x.x.x instead of real version numbers so this document will never become obsolete :).



    Building the httpd_docs Server

    Sources Configuration:

      % cd /usr/src/httpd_docs/apache_x.x.x
      % make clean
      % env CC=gcc \
      ./configure --prefix=/usr/local \
        --sbindir=/usr/local/sbin/httpd_docs \
        --sysconfdir=/usr/local/etc/httpd_docs \
        --localstatedir=/usr/local/var/httpd_docs \
        --runtimedir=/usr/local/var/httpd_docs/run \
        --logfiledir=/usr/local/var/httpd_docs/logs \
        --proxycachedir=/usr/local/var/httpd_docs/proxy
    

    If you need some other modules, like mod_rewrite and mod_include (SSI), add them here as well:

        --enable-module=include --enable-module=rewrite
    

    Note: on AIX, gcc produces an httpd binary 100K+ smaller than cc does. Remove the env CC=gcc line if you want to use the default compiler. If you do want gcc and you are a (ba)?sh user, you do not need the env command; t?csh users have to keep it.

    Note: add --layout to see the resulting directories' layout without actually running the configuration process.

    Sources Compilation:

      % make
      % make install
    

    Rename httpd to httpd_docs:

      % mv /usr/local/sbin/httpd_docs/httpd \
      /usr/local/sbin/httpd_docs/httpd_docs
    

    Now update the apachectl utility to point to the renamed httpd, either with your favorite text editor or with perl:

      % perl -p -i -e 's|httpd_docs/httpd|httpd_docs/httpd_docs|' \
      /usr/local/sbin/httpd_docs/apachectl
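
    Note that this substitution is not idempotent: run it a second time and httpd_docs/httpd_docs becomes httpd_docs/httpd_docs_docs. A minimal sketch of a safer variant, assuming apachectl refers to the binary by its full path (rename_httpd is a hypothetical helper name), adds a word boundary so that an already renamed path is left alone:

```perl
use strict;
use warnings;

# Rewrite ".../<server>/httpd" to ".../<server>/<server>".
# The \b after "httpd" stops it from matching "httpd_docs" again,
# so a second run changes nothing.
sub rename_httpd {
    my ($line, $server) = @_;
    $line =~ s{(/\Q$server\E/)httpd\b}{$1$server}g;
    return $line;
}

my $orig  = "HTTPD=/usr/local/sbin/httpd_docs/httpd\n";
my $once  = rename_httpd($orig, 'httpd_docs');
my $twice = rename_httpd($once, 'httpd_docs');
print $once, $twice;
```

    The same rule as a one-liner: perl -p -i -e 's{(/httpd_docs/)httpd\b}{${1}httpd_docs}g' /usr/local/sbin/httpd_docs/apachectl (use httpd_perl for the other server).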
    



    Building the httpd_perl (mod_perl enabled) Server

    Before you start to configure the mod_perl sources, be aware that a few Perl modules have to be installed before building mod_perl. You will be alerted to any missing required modules when you run the perl Makefile.PL command below. If you discover that some are missing, get them from your nearest CPAN mirror (if you do not know what that is, visit http://www.perl.com/CPAN ) or run the interactive CPAN shell with perl -MCPAN -e shell.

    Make sure the sources are clean:

      % cd /usr/src/httpd_perl/apache_x.x.x
      % make clean
      % cd /usr/src/httpd_perl/mod_perl-x.xx
      % make clean
    

    It is important to run make clean, since some versions are not binary compatible (e.g. Apache 1.3.3 vs. 1.3.4), so any third-party C modules need to be recompiled against the latest header files.

    Here I did not find a way to compile with gcc: my perl was compiled with cc, and mod_perl has to be compiled with the same compiler that built perl.

      % cd /usr/src/httpd_perl/mod_perl-x.xx
    

      % /usr/local/bin/perl Makefile.PL \
      APACHE_PREFIX=/usr/local/ \
      APACHE_SRC=../apache_x.x.x/src \
      DO_HTTPD=1 \
      USE_APACI=1 \
      PERL_MARK_WHERE=1 \
      PERL_STACKED_HANDLERS=1 \
      ALL_HOOKS=1 \
      APACI_ARGS=--sbindir=/usr/local/sbin/httpd_perl, \
             --sysconfdir=/usr/local/etc/httpd_perl, \
             --localstatedir=/usr/local/var/httpd_perl, \
             --runtimedir=/usr/local/var/httpd_perl/run, \
             --logfiledir=/usr/local/var/httpd_perl/logs, \
             --proxycachedir=/usr/local/var/httpd_perl/proxy
    

    Notice that if you work with t?csh, all the APACI_ARGS (above) must be passed as one long line! The layout shown above, with the long lines broken by '\', works correctly with (ba)?sh. It does not work with t?csh, because t?csh passes the APACI_ARGS arguments to ./configure with the newlines left in place but the original '\' stripped, which breaks the configuration process.
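    If you use t?csh, one way to avoid typing the long line by hand is to let Perl assemble it. This sketch (build_cmd is a hypothetical helper) only prints the command, with APACI_ARGS joined by commas and no newlines; you then copy it and run it from the mod_perl source directory.

```perl
use strict;
use warnings;

# Join the APACI options with commas (no spaces, no newlines), as in
# the APACI_ARGS example above, and emit the whole command on one line.
sub build_cmd {
    my ($root, @apaci) = @_;
    my @args = (
        "APACHE_PREFIX=$root/",
        'APACHE_SRC=../apache_x.x.x/src',
        qw(DO_HTTPD=1 USE_APACI=1 PERL_MARK_WHERE=1
           PERL_STACKED_HANDLERS=1 ALL_HOOKS=1),
        'APACI_ARGS=' . join(',', @apaci),
    );
    return join ' ', '/usr/local/bin/perl Makefile.PL', @args;
}

my $v = '/usr/local/var/httpd_perl';
print build_cmd('/usr/local',
    '--sbindir=/usr/local/sbin/httpd_perl',
    '--sysconfdir=/usr/local/etc/httpd_perl',
    "--localstatedir=$v", "--runtimedir=$v/run",
    "--logfiledir=$v/logs", "--proxycachedir=$v/proxy"), "\n";
```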

    As with httpd_docs, you might need other modules like mod_rewrite, so add them to APACI_ARGS (comma-separated, like the other options):

             --enable-module=rewrite
    

    Note: PERL_STACKED_HANDLERS=1 is needed for Apache::DBI.

    Now, build, test and install the httpd_perl.

      % make && make test && make install
    

    Note: apache puts a stripped version of httpd at /usr/local/sbin/httpd_perl/httpd. The original version which includes debugging symbols (if you need to run a debugger on this executable) is located at /usr/src/httpd_perl/apache_x.x.x/src/httpd.

    Note: you may have noticed that we did not run make install in Apache's source directory. When USE_APACI is enabled, APACHE_PREFIX specifies the --prefix option for Apache's configure utility, i.e. the installation path for Apache. When this option is used, mod_perl's make install also runs make install on the Apache side, installing the httpd binary and support tools along with the configuration, log and document trees.

    If make test fails, look into t/logs and see what is in there. Also see the make test fails section.

    While running perl Makefile.PL, mod_perl might warn you about a missing libgdbm. Users have reported that it is actually crucial: you must have it in order to complete the mod_perl build successfully.

    Now rename the httpd to httpd_perl:

      % mv /usr/local/sbin/httpd_perl/httpd \
      /usr/local/sbin/httpd_perl/httpd_perl
    

    Update the apachectl utility to point to the renamed httpd:

      % perl -p -i -e 's|httpd_perl/httpd|httpd_perl/httpd_perl|' \
      /usr/local/sbin/httpd_perl/apachectl
    



    Configuration of the servers

    Now that we have completed the build, the last step before running the servers is to configure them.



    Basic httpd_docs Server's Configuration

    Configuring the httpd_docs server is a very easy task. Open /usr/local/etc/httpd_docs/httpd.conf in your favorite editor (starting from version 1.3.4 of Apache, there is only one file to edit) and configure it as you always do. Make sure you configure the log files and other paths according to the directory layout we decided to use.

    Start the server with:

      /usr/local/sbin/httpd_docs/apachectl start
    



    Basic httpd_perl Server's Configuration

    Here we will make a basic configuration of the httpd_perl server by editing the /usr/local/etc/httpd_perl/httpd.conf file. As with the httpd_docs server, make sure that ErrorLog and the other file location directives point to the right places according to the chosen directory layout.

    The first thing to do is to set the Port directive. It must be different from 80, since two servers cannot bind to the same port number on the same machine. Here we will use 8080. Some developers use port 81, but you can bind to it only if you have root permissions (ports below 1024 are privileged). If you are running on a multiuser machine, there is a chance that someone already uses that port or will start using it in the future, which would cause a collision. If you are the only user on your machine, you can basically pick any unused port number. Choosing a port number is a controversial topic, since many organizations use firewalls which may block some ports or allow only well-known ones. In my experience the most used port numbers are 80, 81, 8000 and 8080. Personally, I prefer port 8080. Of course, with the two-server scenario you can hide the nonstandard port number from firewalls and users, either with mod_proxy's ProxyPass or with a proxy server like squid.
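    Before settling on a port, you can check from Perl whether it is already taken and whether binding it needs root. A minimal sketch using the core IO::Socket::INET module; needs_root and port_is_free are hypothetical helpers, and the check only tries the loopback address at the moment you run it, so a server bound to some other address only, or one started later, is not detected.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Ports below 1024 are privileged on Unix: normally only root may bind them.
sub needs_root { my ($port) = @_; return $port < 1024 }

# Try to listen on the port; failure usually means someone already uses
# it (or, for a privileged port, that we are not root).
sub port_is_free {
    my ($port) = @_;
    my $sock = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => $port,
        Proto     => 'tcp',
        Listen    => 1,
    ) or return 0;
    close $sock;
    return 1;
}

for my $port (81, 8080) {
    printf "%5d: %s, %s\n", $port,
        needs_root($port)   ? 'needs root' : 'any user',
        port_is_free($port) ? 'free'       : 'in use or not permitted';
}
```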

    For more details see Publishing port numbers different from 80, Running 1 webserver and squid in httpd accelerator mode, Running 2 webservers and squid in httpd accelerator mode and Using mod_proxy.

    Now we proceed to the mod_perl specific directives. It is a good idea to add them all at the end of httpd.conf, since you are going to fiddle with them a lot at the beginning.

    First, you need to specify the location where all mod_perl scripts will be located.

    Add the following configuration directive:

      # mod_perl scripts will be called from /perl/
      Alias /perl/ /usr/local/myproject/perl/
    

    From now on, all requests starting with /perl will be executed under mod_perl and will be mapped to the files in /usr/local/myproject/perl/.

    Now we should configure the /perl location.

      PerlModule Apache::Registry
    

      <Location /perl>
        #AllowOverride None
        SetHandler perl-script
        PerlHandler Apache::Registry
        Options ExecCGI
        allow from all
        PerlSendHeader On
      </Location>
    

    This configuration causes all scripts called with a /perl path prefix to be executed under the Apache::Registry module and as CGI scripts (hence ExecCGI; if you omit this option, the script will be sent to the user's browser as plain text or may trigger a 'Save-As' window). The Apache::Registry module lets you run almost unaltered CGI/Perl scripts under mod_perl. The PerlModule directive is the equivalent of Perl's require(); we load the Apache::Registry module before using it in the PerlHandler directive of the Location section.
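    To get a feel for why Apache::Registry is fast, here is a much simplified plain-Perl model of the idea: compile each script once into a subroutine, cache it by file name and modification time, and just call the cached sub on later requests. This is not Apache::Registry's actual code; %cache and run_script are hypothetical names, and the script source is passed in as a string instead of being read from disk.

```perl
use strict;
use warnings;

my %cache;         # file name => { mtime => ..., code => CODE ref }
my $compiles = 0;  # how many times we had to compile

sub run_script {
    my ($name, $mtime, $source) = @_;
    my $entry = $cache{$name};
    unless ($entry && $entry->{mtime} == $mtime) {
        # wrap the script body in a sub, the way a Registry-style loader does
        my $code = eval "sub { $source }" or die $@;
        $entry = $cache{$name} = { mtime => $mtime, code => $code };
        $compiles++;
    }
    return $entry->{code}->();
}

my $script = 'return "It worked!!!";';
print run_script('/perl/test.pl', 100, $script), "\n" for 1 .. 3;
print "compiled $compiles time(s)\n";   # compiled 1 time(s)
```

    Changing the modification time (i.e. editing the script) triggers a recompilation; everything else reuses the already compiled code.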

    PerlSendHeader On tells the server to send an HTTP header to the browser on every script invocation. You will want to turn this off for nph (non-parsed-headers) scripts.

    This is only a very basic configuration. The Server Configuration section covers the rest of the details.

    Now start the server with:

      /usr/local/sbin/httpd_perl/apachectl start
    



    Running 2 webservers and squid in httpd accelerator mode

    While I have detailed the mod_perl server installation, you are on your own with installing the squid server (see Getting Helped for more details). I run Linux, so I downloaded the rpm package, installed it, configured /etc/squid/squid.conf, fired off the server and was all set. Basically, once you have squid installed, you just need to modify the default squid.conf as explained below, and you are ready to run it.

    First, let's understand what we have on our hands and what we want from squid. We have the httpd_docs and httpd_perl servers listening on ports 81 and 8080 respectively (we have to move the httpd_docs server to port 81, since port 80 will be taken over by squid). Both reside on the same machine as squid. We want squid to listen on port 80, forward requests for static objects to the port httpd_docs listens on, and forward dynamic requests to httpd_perl's port. Both servers return the data to the proxy server (unless it is already cached by squid), so the user never sees the other ports and never knows that there might be more than one server running. The proxy server makes all the magic behind it transparent to the user. Do not confuse this with mod_rewrite, where a server redirects the request somewhere according to the rules and forgets about it. In proxy terminology this functionality is known as httpd accelerator mode.

    You should understand that squid can be used as a straightforward proxy server, generally used at companies and ISPs to cut down incoming traffic by caching the most popular requests. However, we want to run it in httpd accelerator mode. Two directives, httpd_accel_host and httpd_accel_port, enable this mode; we will see more details shortly. If you are currently using squid in the regular proxy mode, you can extend its functionality by running both modes concurrently. To accomplish this, either extend the existing squid configuration with the httpd accelerator mode directives or create one from scratch.

    As stated before, squid now listens on port 80, so we have to move the httpd_docs server to listen on, for example, port 81 (your mileage may vary :). Modify the httpd.conf in the httpd_docs configuration directory and restart the httpd_docs server (on a production server, do this only once squid is ready to take over). And as you remember, httpd_perl listens on port 8080.

    Let's go through the changes we should make to the default configuration file. Since this file (/etc/squid/squid.conf) is huge (60k+) and we will not use 95% of it, my suggestion is to write a new one that includes only the modified directives.

    We want to enable the redirect feature, so that requests can be served by more than one server (in our case httpd_docs and httpd_perl). So we specify httpd_accel_host as virtual. This assumes that your server has multiple interfaces; squid will bind to all of them.

      httpd_accel_host virtual
    

    Then we define the default port: unless a request is redirected, httpd_docs will serve it, since we assume that most requests will be for static content. Our httpd_docs listens on port 81.

      httpd_accel_port 81
    

    And, as described before, squid listens on port 80.

      http_port 80
    

    We do not use ICP (used for cache sharing between neighbor machines), which is more relevant in proxy mode.

      icp_port 0
    

    hierarchy_stoplist defines a list of words which, if found in a URL, cause the object to be handled directly by this cache. In other words, use it to avoid querying neighbor caches for certain objects. Note that I have configured the /cgi-bin and /perl aliases for my dynamic documents; if you named them differently, make sure to use the correct aliases here.

      hierarchy_stoplist /cgi-bin /perl
    

    Now we tell squid not to cache dynamic pages.

      acl QUERY urlpath_regex /cgi-bin /perl
      no_cache deny QUERY
    

    Please note that the last two directives are controversial. If you want your scripts to comply better with the HTTP standards, their headers should carry caching directives according to the HTTP specs. You will find a complete tutorial on this topic in Tutorial on HTTP Headers for mod_perl users by Andreas J. Koenig (at http://perl.apache.org ). If you set the headers correctly, there is no need to tell the squid accelerator NOT to try to cache something. The headers I am talking about are Last-Modified and Expires. What are they good for? Squid will not bother your mod_perl server a second time if a request is (a) cachable and (b) still in the cache. Many mod_perl applications produce identical results for identical requests, at least if not much time goes by between them. So your squid might have a hit ratio of 50%, which means that the mod_perl servers will have only half as much work to do as before. This is only possible if the headers are set correctly.

    Even if you insert a user ID and the date into your page, caching can save resources when you set the expiration time to 1 second: a user might double click where a single click would do, sending two requests in parallel, and squid could serve the second one from its cache.
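    To show what setting these headers involves, here is a minimal sketch that formats Last-Modified and Expires header lines the way a CGI or Apache::Registry script would print them (with PerlSendHeader On) before the blank line that ends the headers. The helper names http_date and cache_headers are hypothetical; if libwww-perl is installed, HTTP::Date's time2str does the same date formatting.

```perl
use strict;
use warnings;
use POSIX qw(strftime setlocale LC_TIME);

setlocale(LC_TIME, 'C');   # HTTP dates need English day and month names

# Format an epoch time as an HTTP date, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
sub http_date { strftime('%a, %d %b %Y %H:%M:%S GMT', gmtime(shift)) }

# Last-Modified and Expires header lines; $ttl is how many seconds the
# response may be served from a cache such as squid.
sub cache_headers {
    my ($last_modified, $ttl, $now) = @_;
    $now = time unless defined $now;
    return "Last-Modified: " . http_date($last_modified) . "\n"
         . "Expires: "       . http_date($now + $ttl)    . "\n";
}

# e.g. a page built from data that changed a minute ago,
# cacheable for just 1 second
print cache_headers(time - 60, 1);
```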

    But if you are lazy, or just have too many things to deal with, you can leave the above directives the way I described. Just keep in mind that one day you will want to reread this snippet and Andreas' tutorial and squeeze even more power from your servers without investing in additional memory and better hardware.
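    If you do keep the QUERY acl, remember that urlpath_regex patterns are ordinary, unanchored regular expressions, and the acl matches if any one of them matches the URL path. The Perl sketch below mimics that check (acl_matches is a hypothetical helper and the sample paths are made up). It shows that /perl also catches a static file such as /docs/perlfaq.html, while an anchored ^/perl/ does not.

```perl
use strict;
use warnings;

# Squid-style check: the acl matches if ANY of its regexes matches the path.
sub acl_matches {
    my ($path, @patterns) = @_;
    for my $re (@patterns) {
        return 1 if $path =~ /$re/;
    }
    return 0;
}

my @loose = ('/cgi-bin', '/perl');      # as in the squid.conf above
my @tight = ('^/cgi-bin/', '^/perl/');  # anchored alternative

for my $path ('/perl/test.pl', '/docs/perlfaq.html', '/index.html') {
    printf "%-20s loose=%d tight=%d\n", $path,
        acl_matches($path, @loose), acl_matches($path, @tight);
}
```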

    While testing you might want to enable the debugging options and watch the log files in /var/log/squid/, but turn them off on your production server. I list the directive commented out (section 28 is access control).

      # debug_options ALL,1 28,9
    

    We need to provide a way for squid to dispatch requests to the correct servers: static object requests should be redirected to httpd_docs (unless they are already cached), while dynamic ones should go to the httpd_perl server. The configuration below tells squid to fire off 10 redirect daemons from the specified path and disables rewriting of any Host: headers in redirected requests (as suggested by squid's documentation). The redirection daemon script is listed below.

      redirect_program /usr/lib/squid/redirect.pl
      redirect_children 10
      redirect_rewrites_host_header off
    

    request_size sets the maximum allowed request size. If you are using POST to upload files, set it to the size of the largest file plus a few extra kilobytes.

      request_size 1000 KB
    

    Then we have the access permissions, which I will not explain here. You might want to read the documentation, though, to avoid any security flaws.

      acl all src 0.0.0.0/0.0.0.0
      acl manager proto cache_object
      acl localhost src 127.0.0.1/255.255.255.255
      acl myserver src 127.0.0.1/255.255.255.255
      acl SSL_ports port 443 563
      acl Safe_ports port 80 81 8080 443 563
      acl CONNECT method CONNECT
      
      http_access allow manager localhost
      http_access allow manager myserver
      http_access deny manager
      http_access deny !Safe_ports
      http_access deny CONNECT !SSL_ports
      # http_access allow all
    

    Since squid should run as a non-root user, you need these directives if you start squid as root.

      cache_effective_user squid
      cache_effective_group squid
    

    Now configure the amount of memory to be used for caching. The squid documentation warns that the actual size of the squid process can grow to three times the value you set.

      cache_mem 20 MB
    

    Keep pools of allocated (but unused) memory available for future use. Read more about it in the squid documents.

      memory_pools on
    

    Now tighten the runtime permissions of the cache manager CGI script (cachemgr.cgi, which comes bundled with squid) on your production server.

      cachemgr_passwd disable shutdown
      #cachemgr_passwd none all
    

    Now the redirection daemon script. Put it at the location you specified with the redirect_program directive above, and make it executable by the user squid runs as (cache_effective_user):

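      #!/usr/local/bin/perl -w
      use strict;
      
      # disable output buffering (see the notes below)
      $| = 1;
      
      while (<>) {
          # redirect /perl/ requests to the mod_perl server (httpd_perl),
          # replacing whatever port (if any) the URL carries
          s|^(\w+://[^/:\s]+)(?::\d+)?/perl/|$1:8080/perl/|;
      
          # anything else is sent unchanged to the plain apache server (httpd_docs)
          print;
      }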
    

    In my scenario the proxy and the Apache servers are running on the same machine, which is why I just substitute the port. In the presented squid configuration, requests that pass through squid are converted to point to localhost (127.0.0.1). The above redirector can of course be more complex, but you know Perl, right?
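    Because the rewriting is a single substitution, it can be tried out without squid. The sketch below moves a slightly stricter variant of the rule (only URLs whose path starts with /perl/ are touched, and any port is replaced) into a hypothetical rewrite_url subroutine and feeds it made-up lines. It assumes squid's redirector input format of one request per line: the URL, then the client address/FQDN, ident and method.

```perl
use strict;
use warnings;

# Point URLs whose path starts with /perl/ at port 8080 (httpd_perl);
# any port already present in the URL is replaced.
sub rewrite_url {
    my ($line) = @_;
    $line =~ s{^(\w+://[^/:\s]+)(?::\d+)?/perl/}{$1:8080/perl/};
    return $line;
}

my @requests = (
    "http://127.0.0.1:81/perl/test.pl 10.0.0.7/- - GET\n",
    "http://127.0.0.1:81/docs/perl/intro.html 10.0.0.7/- - GET\n",
);
print rewrite_url($_) for @requests;
```

    The second line is left alone because /perl/ does not start its path, whereas a plain s|(:81)?/perl/|...| would have rewritten it.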

    A few notes regarding redirector script:

    You must disable buffering; $|=1; does the job. If you do not disable it, STDOUT will be flushed only when the buffer becomes full, and its default size is about 4096 characters. So with an average URL length of 70 characters, the buffer will be flushed only after about 59 (4096/70) requests, and only then will those requests finally reach their target server. Your users will just wait until the buffer fills up.

    If you think that this is a very inefficient way to redirect, let me try to prove the opposite. Squid fires up N redirect daemons which keep running, so the Perl interpreter is not loaded per request: exactly as with mod_perl, Perl is loaded all the time and the code is already compiled, so redirection is very fast (no slower than with a redirector written in C or the like). Squid keeps an open pipe to each redirect daemon, so no new process has to be started for each request either.

    Now it is time to restart the server. On Linux I do it with:

      /etc/rc.d/init.d/squid restart
    

    Now the setup is complete ...

    Almost... When you try the new setup, you will be surprised and upset to discover port 81 showing up in the URLs of static objects (like HTML pages). Hey, we did not want the user to see port 81 and use it instead of 80! That would bypass the squid server, and all the hard work we went through would be a waste of time.

    The solution is to run both squid and httpd_docs on the same port. This can be accomplished by binding each one to a specific interface. Modify the httpd.conf in the httpd_docs configuration directory:

      Port 80
      BindAddress 127.0.0.1
      Listen 127.0.0.1:80
    

    Modify the squid.conf:

      http_port 80
      tcp_incoming_address 123.123.123.3
      tcp_outgoing_address 127.0.0.1
      httpd_accel_host 127.0.0.1
      httpd_accel_port 80
    

    Replace 123.123.123.3 with the IP address of your main server. Now restart squid and httpd_docs in whichever order you want, and voila, the port number is gone.

    You must also have the following entry in /etc/hosts (chances are it's already there):

      127.0.0.1  localhost.localdomain   localhost
    

    Now, if your scripts were generating HTML with fully qualified self-references using port 8080 or another port, you should fix them to generate links that point to port 80 (which means not specifying the port at all). If you do not, users will bypass squid, as if it were not there at all, by making direct requests to the mod_perl server's port.
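    Ideally the scripts should build such links without a port in the first place. As a stopgap, a filter along these lines can strip an explicit :8080 from absolute links to your own host. It is only a sketch: public_links is a hypothetical helper, www.example.com stands for your own host name, and only http:// links are handled.

```perl
use strict;
use warnings;

# Drop an explicit :8080 from absolute links to our own host, so they
# point at port 80 (squid) instead of straight at httpd_perl. The
# lookahead makes sure the port is exactly 8080, not e.g. 80800.
sub public_links {
    my ($html, $host) = @_;
    $html =~ s{(http://\Q$host\E):8080(?=[/"'\s>]|\z)}{$1}g;
    return $html;
}

my $page = '<a href="http://www.example.com:8080/perl/list.pl?page=2">next</a>';
print public_links($page, 'www.example.com'), "\n";
```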

    The only question left is what to do with users who bookmarked your services while the URL still had port 8080 in it. Do not worry about it. The most important thing is for your scripts to return full URLs, so if a user comes from a link with port 8080 in it, let it be; just make sure that all subsequent calls to your server are rewritten correctly. Over time users will change their bookmarks. You can also send them an email if you have their address, or leave a note on your pages asking them to update their bookmarks. You could have avoided this problem by not publishing the non-80 port in the first place. See Publishing port numbers different from 80.

    <META> Need to write up a section about server logging with squid. One thing I sure would like to know is how requests are logged with this setup. I have, as most everyone I imagine, log rotation, analysis, archiving scripts and they all assume a single log. Does one have different logs that have to be merged (up to 3 for each server + squid) ? Even when squid responds to a request out of its cache I'd still want the thing to be logged. </META>

    See Using mod_proxy for information about X-Forwarded-For.

    To save you some keystrokes, here is the whole modified squid.conf:

      http_port 80
      tcp_incoming_address 123.123.123.3
      tcp_outgoing_address 127.0.0.1
      httpd_accel_host 127.0.0.1
      httpd_accel_port 80
      
      icp_port 0
      
      hierarchy_stoplist /cgi-bin /perl
      acl QUERY urlpath_regex /cgi-bin /perl
      no_cache deny QUERY
      
      # debug_options ALL,1 28,9
      
      redirect_program /usr/lib/squid/redirect.pl
      redirect_children 10
      redirect_rewrites_host_header off
      
      request_size 1000 KB
      
      acl all src 0.0.0.0/0.0.0.0
      acl manager proto cache_object
      acl localhost src 127.0.0.1/255.255.255.255
      acl myserver src 127.0.0.1/255.255.255.255
      acl SSL_ports port 443 563
      acl Safe_ports port 80 81 8080 443 563
      acl CONNECT method CONNECT
      
      http_access allow manager localhost
      http_access allow manager myserver
      http_access deny manager
      http_access deny !Safe_ports
      http_access deny CONNECT !SSL_ports
      # http_access allow all
      
      cache_effective_user squid
      cache_effective_group squid
      
      cache_mem 20 MB
      
      memory_pools on
      
      cachemgr_passwd disable shutdown
    

    Note that all directives should start at the beginning of the line.
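    To catch accidental indentation (easy to introduce when pasting from a document like this one), a few lines of Perl can list the offending lines. A minimal sketch following the rule above; indented_directives is a hypothetical helper, and to check a real file you would pass it the lines read from /etc/squid/squid.conf:

```perl
use strict;
use warnings;

# Report squid.conf lines whose directive does not start in column 0.
# Blank lines and comment lines are ignored.
sub indented_directives {
    my @lines = @_;
    my @bad;
    for my $i (0 .. $#lines) {
        my $line = $lines[$i];
        next if $line =~ /^\s*(#|$)/;          # blank or comment line
        push @bad, $i + 1 if $line =~ /^\s/;   # 1-based line number
    }
    return @bad;
}

my @conf = ("http_port 80\n", "  icp_port 0\n", "# comment\n", "\n",
            "cache_mem 20 MB\n");
print "line $_ is indented\n" for indented_directives(@conf);
```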



    Running 1 webserver and squid in httpd accelerator mode

    When I was first told about squid, I thought: "Hey, now I can drop the httpd_docs server and have only the squid and httpd_perl servers." Since all my static objects would be cached by squid, I would not need the light httpd_docs server. But that was a wrong assumption. Why? Because you still have the overhead of loading the objects into squid the first time, and if your site has many of them, not all of them will be cached (unless you have devoted a huge chunk of memory to squid), so my heavy mod_perl servers would still have the overhead of serving static objects. How would one measure that overhead? The difference between the two servers is memory consumption; everything else (e.g. I/O) should be equal. So you have to estimate the time needed to fetch each static object for the first time at a peak period, and from that the number of additional servers you need to serve the static objects. This lets you calculate the additional memory requirements, which I imagine could be significant in some installations.
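    One way to put numbers on this estimate: the number of extra servers kept busy with first-time static fetches is roughly the peak rate of uncached static requests multiplied by the time each fetch takes (Little's law), and each of them costs the difference between a heavy and a light process. The figures below are hypothetical placeholders; measure your own.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# extra servers = uncached static requests per second * seconds per fetch
# extra memory  = extra servers * (heavy process size - light process size)
sub extra_memory_mb {
    my (%p) = @_;
    my $servers = ceil($p{static_per_sec} * $p{fetch_ms} / 1000);
    return ($servers, $servers * ($p{heavy_mb} - $p{light_mb}));
}

# hypothetical peak figures; measure your own
my ($servers, $mb) = extra_memory_mb(
    static_per_sec => 50, fetch_ms => 200, heavy_mb => 12, light_mb => 2,
);
print "$servers extra servers, about $mb MB of extra memory\n";
```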

    So I decided to accept even more administration overhead and stick with the squid, httpd_docs and httpd_perl scenario, where I can optimize and fine-tune everything. Of course this may not be your case. If you feel that the scenario from the previous section is too complicated for you, make it simpler: have only one server with mod_perl built in and let squid do most of the job that the plain light Apache used to do. As explained in the previous paragraph, you should pick this lighter setup only if you can make squid cache most of your static objects. If it cannot, your mod_perl server will do the work we do not want it to do.

    If you are still with me, install Apache with mod_perl, and squid. Then use a configuration similar to the one from the previous section, except that httpd_docs is not there anymore. We also do not need the redirector anymore, and we specify httpd_accel_host as the name of the server rather than virtual. There is no need to bind two servers to the same port, because we do not redirect, and there are no BindAddress or Listen directives in httpd.conf anymore.

    The modified configuration (see the explanations in the previous section):

      httpd_accel_host put.your.hostname.here
      httpd_accel_port 8080
      http_port 80
      icp_port 0
      
      hierarchy_stoplist /cgi-bin /perl
      acl QUERY urlpath_regex /cgi-bin /perl
      no_cache deny QUERY
      
      # debug_options ALL,1 28,9
      
      # redirect_program /usr/lib/squid/redirect.pl
      # redirect_children 10
      # redirect_rewrites_host_header off
      
      request_size 1000 KB
      
      acl all src 0.0.0.0/0.0.0.0
      acl manager proto cache_object
      acl localhost src 127.0.0.1/255.255.255.255
      acl myserver src 127.0.0.1/255.255.255.255
      acl SSL_ports port 443 563
      acl Safe_ports port 80 81 8080 443 563
      acl CONNECT method CONNECT
      
      http_access allow manager localhost
      http_access allow manager myserver
      http_access deny manager
      http_access deny !Safe_ports
      http_access deny CONNECT !SSL_ports
      # http_access allow all
      
      cache_effective_user squid
      cache_effective_group squid
      
      cache_mem 20 MB
      
      memory_pools on
      
      cachemgr_passwd disable shutdown
    



    One Light and One Heavy Server where ALL htmls are Perl-Generated

    Instead of keeping all your perl scripts in /perl and your static content everywhere else, you could keep your static content in special directories and keep your perl scripts everywhere else. You can still use the light/heavy apache separation approach described before, with a few minor modifications.



    Installation and Configuration

    First you need to compile your light Apache with mod_proxy and mod_rewrite:

      % ./configure --prefix=[snip...] --enable-module=rewrite --enable-module=proxy
    

    In the light apache's httpd.conf file, turn rewriting on:

      RewriteEngine on
    

    and list the static directories, something like this:

      RewriteRule ^/img - [L]
      RewriteRule ^/style - [L]
    

    The [L] means that the rewrite engine should stop if it has a match. This is necessary because the very last rewrite rule proxies everything to the heavy server:

      RewriteRule ^/(.*) http://www.myservername.com:8080/$1 [P]
    

    This line (which must be the last RewriteRule) is the difference between a server for which static content is the default and one for which dynamic (perlish) content is the default. (The above RewriteRule assumes the heavy server runs on the same machine as the light server. You can just insert a different URL in here if the heavy apache is elsewhere, but keeping the two servers on the one machine and treating them as one has some advantages described later.)
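    The effect of rule order can be modelled in a few lines of Perl. This is a simplified model, not mod_rewrite itself: it assumes rule processing stops at the first matching rule (as it does here because of the [L] and [P] flags), and an undef target stands for '-' (leave the URL alone). route is a hypothetical helper and www.myservername.com is the same placeholder as above.

```perl
use strict;
use warnings;

my $heavy = 'http://www.myservername.com:8080';

# Ordered rules, as in the light server's httpd.conf: first match wins.
my @rules = (
    [ qr{^/img}   => undef ],                    # '-' : serve locally
    [ qr{^/style} => undef ],
    [ qr{^/(.*)}  => sub { "$heavy/$_[0]" } ],   # [P] : proxy to heavy server
);

sub route {
    my ($uri) = @_;
    for my $rule (@rules) {
        my ($re, $target) = @$rule;
        next unless $uri =~ $re;
        return defined $target ? $target->($1) : $uri;   # stop here
    }
    return $uri;    # no rule matched
}

print "$_ -> ", route($_), "\n" for qw(/img/logo.gif /index.html /perl/x.pl);
```

    Moving the catch-all rule above the static ones would send everything, images included, to the heavy server.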

    You should also add the corresponding ProxyPassReverse directive:

      ProxyPassReverse / http://www.myservername.com/
    

    so that the user doesn't see the :8080 port number in her browser's location window.

    Of course, www.myservername.com should be replaced with *your* domain name. You *could* use localhost in the RewriteRule above if the heavy and light servers are on the same machine, but your heavy server might accidentally say localhost in a client redirect (see below), which would not be good. Also, if your heavy server understands virtual hosts, you probably don't want to use the localhost name.

    [TOC]


    Tricks, traps and gotchas

    • 'Closing your shutters' temporarily

      Very occasionally, your mod_perl server will suffer glitches. Perhaps you changed a module and restarted your mod_perl httpd when a perl -cw would have given you some very interesting information! Since *all* your html is dynamically generated, suddenly *nobody* can view *any* pages on your site. Disaster!! Worse, your users are getting a horribly cryptic ``Unable to contact upstream server'' error message on a grey background, not those nice customised error messages you generate with perl.

      If you insert a line into the light apache's httpd.conf file:

        RewriteRule ^/(.*) /sorry.html [L]
      

      *after* the list of static directories but *before* the rule that proxies everything else to the heavy apache, your users now get a (relatively) nice ``Sorry for the inconvenience'' message instead of the cryptic message described above. What's more, because this sorry.html RewriteRule is listed *after* the static directory rules, you can refer to your images in it. Now all you have to do is figure out how to fix the module you broke.

      Of course, you need to have sorry.html ready in advance. When you alter the configuration, you have to restart the light server for the change to take effect, and once you have fixed all the errors in the mod_perl server, you have to remove the rule and restart the light server again.

      This situation is easy to prevent. See Safe Code Updates on a Live Production Server for more info.

    • Logging

      There are a number of different ways to maintain logs of your hits. The easiest way is to let both apaches log to their own access_log file. Unfortunately, this means that many requests will be logged twice (which makes it tricky to merge the two logfiles, should you want to). Also, as far as the heavy apache is concerned, all requests will appear to come from the IP address of the machine on which the light apache is running. If you log IP addresses as part of your access_log, the entries written by the heavy apache will be fairly meaningless.

      One solution is to tell the heavy apache not to bother logging requests that seem to come from the light apache's machine. You might do this by installing a custom PerlLogHandler or just piping to access_log via grep -v (match all but this pattern) for the bad IP address. In this scenario, the access_log written by the light apache is *the* access_log, plus any direct accesses to the heavy server in case they bypassed the proxy server.

      Note that you don't want to pipe the access_log from the heavy apache to /dev/null. If you do this, you won't be able to see any requests that bypass the lightweight apache and come straight in on port 8080 (or whichever port the heavy apache listens on). As you will see shortly, this *can* happen, and every time you see one, you should ask yourself ``Why?'' and take steps to eliminate it.

      It's quite easy to make the heavy apache log the original client's IP rather than the IP of the proxy server. See mod_proxy_add_forward in Building and Using mod_proxy for a hint.

    • Eliminating :8080's

      There are a number of ways in which the user can somehow be directed to URLs which have a :8080 in them. (Replace 8080 with the port your mod_perl enabled Apache listens on.) If you are running the heavy apache on a different machine from the light apache, this will be less of a problem (provided the heavy apache has the same ServerName as the light apache), but this section may still apply to you...

      If the user requests a URL that maps to a directory without a trailing slash (/), apache will issue a client redirect (301 Moved Permanently) to the correct URL. Unfortunately, the Apache that issues this redirect will most likely be the heavy apache, since it answers most requests. It builds the redirect URL from *its* ServerName and *its* port, so the URL sent back to the user's browser has :8080 in it. Unless the ProxyPassReverse in the light apache's configuration file names the heavy server's URL exactly, port included, it will be unable to rewrite this redirect, and the HTML body of the redirect response, which repeats the URL, is never rewritten at all. :-(

      Since this will tend only to be a problem when the heavy and light apaches are running on the same machine (but on different ports), if the light and heavy apaches have the same DocumentRoot (/www/shop in this example) we can have the *light* apache figure out that a request is for a directory without a trailing slash and have it do the redirect itself, *before* the heavy apache finds out about it:

      # /www/shop is the DocumentRoot shared by both servers
      RewriteCond /www/shop%{SCRIPT_FILENAME} -d
      RewriteRule ^(.+[^/])$ $1/ [R]
      

      Note that these two lines should be *after* the RewriteRules for the static directories but (of course) *before* the final all-encompassing RewriteRule that proxies everything else to the heavy apache. If you put these two lines in the light httpd.conf *before* the static directories are mentioned, the light httpd may find itself in an infinite loop if somebody were to request for example /img in our setup.

      Another way in which :8080's can creep into URLs is if you have perl code which issues a redirect to http://$ENV{HTTP_HOST}/.... HTTP_HOST holds the Host header of the request the heavy server received, and for a proxied request that header can carry the heavy server's :8080. If you are migrating from one heavy server to one heavy and one light, you may find a few of these. If you replace HTTP_HOST with SERVER_NAME, all should be well. Note that you may need to do this whether or not the light and heavy servers are on the same machine.

      The :8080 effect can be insidious. Once a user gets a URL with :8080 in it, odd things will happen. If the heavy and light apaches have the same DocumentRoot (normal if they are on the same machine) and/or the heavy apache is able to deliver the same static content as the light apache, the worst that will happen is that the user's browser will display :8080 in the location box for every subsequent URL on your site until they follow an absolute link within your site (i.e. http://www.myservername.com/file/stuff as opposed to just /file/stuff). If the request is in a password-protected area, then the user may have to log in twice.

      If the heavy and light apaches do not share the same DocumentRoot directory (normal if they are on different machines), then images may not appear. This is worse than the previous scenario: if the heavy apache can serve images, your site will at least still look normal, but if the heavy apache is *unable* to serve images, all your pages will be imageless. This is a fairly compelling reason to run your light and heavy servers on the same machine and to have them share a DocumentRoot.

      Regardless of how hard you try to eliminate :8080s, they *will* crop up from time to time. You should occasionally examine the access_log of the heavy apache - any requests that appear in there (assuming you aren't bothering to log requests that come via the light apache) should be investigated.

      Interestingly, if the final catch-all RewriteRule proxies to localhost:8080, it is possible that localhost will leak into stray client redirects. Moral: use your server's name in redirects, unless you know otherwise.

    • Security

      Because all http requests will appear to your perl scripts to be coming from the light httpd, you must be careful not to authenticate based on the IP address from which a request came. This can be easy to overlook if you are moving from a single- to a dual-server scenario.

      The URLs that return the /server-status and /perl-status of your apache servers are often protected based on IP address. The /server-status URL for the heavy server is probably safe if the light apache also defines an identical /server-status URL, but the /perl-status URL should be protected.

      If you must authenticate based on IP address, you should either make sure that the light apache's IP address is not in any way privileged or you should block access to port 8080 from anywhere except the light apache's IP address.

      If your heavy and light httpds can both serve static content (so that :8080s only affect URLs, not content), then blocking port 8080 is not recommended. After all, if a user accidentally (or intentionally) gets onto port 8080 in this scenario, the worst that will happen is that URLs will look odd.

      Note that if you are using the X-Forwarded-For HTTP header, then this subsection is of limited relevance to you.
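    The Logging item above suggests dropping the heavy apache's log entries that come from the light apache, for example by piping the log through grep -v. A minimal sketch of the same filter as a small Perl piped-log program follows. The script name, the 127.0.0.1 proxy address and the log path are assumptions, and it relies on the log format starting with the client address. Direct hits on port 8080 are still written, which is exactly what you want to keep an eye on.

```perl
#!/usr/bin/perl
# filter_proxy_log.pl: hypothetical piped-log filter for the heavy apache.
# Drops access_log lines whose client address is the light apache and
# keeps everything else (including requests that bypassed the proxy).
use strict;
use warnings;

# Assumption: the light apache connects from 127.0.0.1 and the log format
# begins with the client address (as the "common" and "combined" formats do).
my $PROXY_IP = '127.0.0.1';

# True if the log line did NOT come from the proxy and should be kept.
sub keep_line {
    my ($line, $proxy_ip) = @_;
    my ($client) = $line =~ /^(\S+)/;
    return !(defined $client && $client eq $proxy_ip);
}

# Act as a filter only when given the log file to append to, e.g.
#   CustomLog "|/usr/local/bin/filter_proxy_log.pl /usr/local/apache/logs/access_log" common
if (@ARGV) {
    my $logfile = shift @ARGV;
    open my $out, '>>', $logfile or die "Cannot open $logfile: $!";
    my $old = select $out;   # unbuffer the log so entries appear immediately
    $| = 1;
    select $old;
    while (my $line = <STDIN>) {
        print {$out} $line if keep_line($line, $PROXY_IP);
    }
    close $out;
}
```

    With a CustomLog line like the one in the comment, Apache starts the program once and feeds it every log entry on its standard input.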

    [TOC]


    Building and Using mod_proxy

    To build it into apache just add --enable-module=proxy during the apache configure stage.

    Now we will talk about apache's mod_proxy and understand how it works.

    The server on port 80 answers http requests directly and proxies requests for /modperl/ to the mod_perl enabled server in the following way:

      ProxyPass        /modperl/ http://localhost:81/modperl/
      ProxyPassReverse /modperl/ http://localhost:81/modperl/
    

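    ProxyPassReverse (PPR) is the saving grace here, and it is what makes apache a win over Squid: when the back-end server issues a redirect, PPR rewrites its Location header on the way back so that it points to the original URI on the front-end server.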
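    The rewriting ProxyPassReverse performs can be modelled in a few lines of plain Perl. This is a simplified sketch for illustration, not mod_proxy's actual code: it assumes the front-end is reachable as http://www.myservername.com and only handles a Location value that starts with exactly the URL given to the directive.

```perl
use strict;
use warnings;

# Simplified model of ProxyPassReverse: if the back-end's Location header
# starts with the back-end URL named in the directive, replace that prefix
# with the front-end server's URL for the proxied path.
sub reverse_map_location {
    my ($location, $front_url, $backend_url) = @_;
    # Only an exact prefix match is rewritten (scheme, host and port must agree)
    return $location unless index($location, $backend_url) == 0;
    return $front_url . substr($location, length $backend_url);
}

# ProxyPassReverse /modperl/ http://localhost:81/modperl/
# on a front-end assumed to be http://www.myservername.com
my $front   = 'http://www.myservername.com/modperl/';
my $backend = 'http://localhost:81/modperl/';

# rewritten: prints http://www.myservername.com/modperl/login/
print reverse_map_location('http://localhost:81/modperl/login/', $front, $backend), "\n";
# no :81 in the redirect, so no match: printed unchanged
print reverse_map_location('http://localhost/modperl/login/', $front, $backend), "\n";
```

    The second call shows why the URL given to ProxyPassReverse has to match what the back-end really puts into its redirects, port number included.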

    You can control the buffering feature with ProxyReceiveBufferSize directive:

      ProxyReceiveBufferSize 16384
    

    This sets the buffer size to 16KB. If it is not set explicitly, or is set to 0, the default buffer size is used. The value may not be smaller than 512 and should be a multiple of 512.

    Both the default and the maximum possible value depend on the OS. For example, on Linux with kernel 2.2.5 the maximum and default values are either 32KB or 64KB (hint: grep the kernel sources for the SK_RMEM_MAX variable). If you set a value bigger than the limit, the default one will be used.

    Under FreeBSD it's possible to configure the kernel to have bigger socket buffers:

     % sysctl -w kern.ipc.maxsockbuf=2621440
    

    Once you tell the kernel to allow bigger socket buffers, you can set bigger values for ProxyReceiveBufferSize, e.g. 1048576 (1MB) and more.

    So, to have the mod_perl server released immediately instead of waiting for the client to take the data, ProxyReceiveBufferSize should be set to a value greater than the biggest response produced by any mod_perl script, but not bigger than the kernel limit. Even if not every response is small enough to fit into the buffer, you still gain, since the processes that generated the smaller responses will be released immediately.
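    The sizing rules above can be turned into a small calculation. The following sketch is a hypothetical helper, not part of Apache: you supply the largest response size you have measured and the kernel limit for your OS (both in bytes, both assumptions you need to check), and it rounds up to a multiple of 512 without exceeding the limit.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: suggest a ProxyReceiveBufferSize value.
# Rules taken from the text: at least 512, a multiple of 512,
# large enough for the biggest response, no bigger than the kernel limit.
sub suggest_buffer_size {
    my ($largest_response, $kernel_limit) = @_;
    # round the largest response up to the next multiple of 512
    my $size = 512 * ceil($largest_response / 512);
    $size = 512 if $size < 512;
    if ($size > $kernel_limit) {
        # cap at the largest multiple of 512 that fits under the limit
        $size = 512 * int($kernel_limit / 512);
    }
    return $size;
}

# e.g. biggest page about 15000 bytes, kernel limit assumed to be 64KB
printf "ProxyReceiveBufferSize %d\n", suggest_buffer_size(15_000, 65_536);
```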

    As the name suggests, this buffering applies only to downstream data (coming from the origin server to the proxy), not to upstream data. It does not buffer data being uploaded from the client browser to the proxy, so it will not free the httpd_perl origin server from being tied up during a large POST such as a file upload.

    mod_proxy does caching as well. This is relevant to mod_perl only if your scripts produce proper headers (such as Expires or Last-Modified), so that their output can be cached. See the apache documentation for more details on configuring this capability.

    Ask Bjoern Hansen has written a mod_proxy_add_forward module for apache that sets the X-Forwarded-For field when doing a ProxyPass, similar to what squid can do. (Its location is specified in the help section.) Basically, that module adds an extra HTTP header to proxied requests. You can read that header in the mod_perl-enabled server and set the IP of the remote client from it. You won't need to compile anything into the back-end server; if you are using Apache::Registry or Apache::PerlRun, just put something like the following into start-up.pl:

      use Apache::Constants qw(OK);
      
      sub My::ProxyRemoteAddr ($) {
          my $r = shift;
      
          # we'll only look at the X-Forwarded-For header if the request
          # comes from our proxy at localhost
          return OK unless ($r->connection->remote_ip eq "127.0.0.1");
      
          # Select the last value in the chain: the address our own proxy
          # appended (earlier values may come from the client and can be forged)
          my $xff = $r->headers_in->{'X-Forwarded-For'} || '';
          if (my ($ip) = $xff =~ /([^,\s]+)$/) {
              $r->connection->remote_ip($ip);
          }
      
          return OK;
      }
    

    And in httpd.conf:

      PerlPostReadRequestHandler My::ProxyRemoteAddr
    

    Different sites have different needs. If you're using the header to set the IP address that apache believes it is dealing with (in the logging and elsewhere), you really don't want anyone but your own system to set the header. That's why the above ``recommended code'' checks where the request is really coming from before changing the remote_ip.

    Generally you shouldn't trust the X-Forwarded-For header. You only want to rely on X-Forwarded-For headers from proxies you control yourself. If you know how to spoof a cookie, you've probably got the general idea of how to forge HTTP headers, and you can spoof the X-Forwarded-For header as well. The only address *you* can count on as being a reliable value is the one from $r->connection->remote_ip.
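    The trust check and the ``take the last entry'' rule in My::ProxyRemoteAddr can be tried outside Apache with a plain Perl sketch like the one below. The helper name and the list of trusted proxies (only 127.0.0.1 here) are assumptions; extend %trusted with the addresses of the proxies you control.

```perl
use strict;
use warnings;

# Proxies we control (assumption: only the light apache on localhost).
my %trusted = map { $_ => 1 } qw(127.0.0.1);

# Hypothetical helper mirroring My::ProxyRemoteAddr.
# $peer_ip is the address the TCP connection really came from.
sub effective_client_ip {
    my ($peer_ip, $xff) = @_;
    # Not from our own proxy: ignore the header, it may be forged
    return $peer_ip unless $trusted{$peer_ip};
    # From our proxy: use the last entry, the one our proxy appended
    if (defined $xff && $xff =~ /([^,\s]+)\s*$/) {
        return $1;
    }
    return $peer_ip;
}

# a client that tried to forge an earlier entry still gets its real address
print effective_client_ip('127.0.0.1', '6.6.6.6, 10.0.0.7'), "\n";
```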

    From that point on, the remote IP address is correct. You should be able to access REMOTE_ADDR as usual.

    It was reported that Ben Laurie's Apache-SSL does not seem to put the IPs in the X-Forwarded-For header (it does not set up such a header at all). However, the REMOTE_ADDR it sets up does contain the IP of the original client machine.

    You could do the same thing with other environment variables (though I think several of them are preserved, you will want to run some tests to see which ones).

    [TOC]


    HTTP Authentication with 2 servers + proxy

    Assuming that you have a setup of one ``front-end'' server which proxies the ``back-end'' (mod_perl) server, if you need to perform the authentication in the ``back-end'' server, it should handle all authentication itself. If apache proxies correctly, it should pass through all the authentication information, making the ``front-end'' apache somewhat ``dumb'', as it does nothing but pass the information through.

    The only possible caveat in the config file is that your Auth directives need to be in <Directory ...> ... </Directory> sections, because if you use a <Location /...> ... </Location> section, the proxypass server takes the auth info for its own authentication and would not pass it on.

    The same goes for mod_ssl: if it is plugged into the front-end server, all the SSL requests will be encrypted and decrypted properly by it.

    [TOC]


    The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
    Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me to improve this guide. If you have something to contribute please send it directly to me.

    Written by Stas Bekman.
    Last Modified at 12/18/1999
    Use of the Camel for Perl is
    a trademark of O'Reilly & Associates,
    and is used by permission.
    Protecting Your Site

    [TOC]


    The Importance of Your site's Security

    Let's face it, your site or service can easily become a target for Internet ``terrorists''. It can be because of something you said, the success of your site, or for no obvious reason whatever. If your site security is compromised, all your data can be deleted or important information can be stolen. You may risk legal action or the sack if this happens.

    Your site can be paralyzed through a *simple* denial of service (DoS) attack.

    Whatever you do, as long as you are connected to the network your site will be vulnerable. Cut the connections, turn off your machine and put it into a safe. Now it is protected--but useless.

    So what can you do?

    Let's first get acquainted with some security related terminology:

    Authentication

    When you want to make sure that a user is who she claims to be, you generally ask her for a username and a password. Once you have both, you can check them against your database of username/password pairs. If they match, the user has passed the Authentication stage. From now on, if you keep the session open, all you need to do is remember the username.

    Authorization

    You might want to allow user foo to have access to some resource, but restrict her from accessing another resource, which in turn is accessible only to user bar. The process of checking access rights is called Authorization. For Authorization all you need is an authenticated username or some other attribute which you can authorize against. For example, you can authorize based on IP address, allowing only your local users to use some service. But be warned that IP addresses or session_ids can be spoofed (forged), and that is why you should not do Authorization without Authentication.

    Actually, you have been familiar with both of these concepts for a while.

    When you telnet to your account on some machine you go through a login process (Authentication).

    When you try to read some file from your file system, the kernel checks the permissions on this file (Authorization). You may also hear the term Access control, which is another name for the same thing.

    [TOC]


    Illustrated Security Scenarios

    I am going to present some real world security requirements and their implementations.

    [TOC]


    Non authenticated access for internal IPs, Authenticated for external IPs

    An Extranet is very similar to an Intranet, but at least partly accessible from outside your organization. If you run an Extranet you might want to let your internal users have unrestricted access to your web server. If these same users call from outside your organization you might want to make sure that they are in fact your employees.

    These requirements are achieved very simply by putting the IP addresses of the organization into the configuration of a Perl access handler (in httpd.conf, as below, or in an .htaccess file). For a matching address, the handler sets the REMOTE_USER environment variable to the username configured for that address. Scripts can test the REMOTE_USER environment variable to determine whether to allow unrestricted access or else to require authentication.

    Once the user passes the authentication stage, either by bypassing it because of her IP address or by entering a correct login/password pair, the REMOTE_USER variable is set. Then we can talk about authorization.

    Let's see the implementation of the authentication stage. First we modify httpd.conf:

      PerlModule My::Auth
      
      <Location /private>
        PerlAccessHandler My::Auth::access_handler
        PerlSetVar Intranet "100.100.100.1 => userA, 100.100.100.2 => userB"
        PerlAuthenHandler My::Auth::authen_handler
        AuthName realm
        AuthType Basic
        Require valid-user
      </Location>
    

    Now the code of My/Auth.pm:

      package My::Auth;
      
      use strict;
      use Apache::Constants qw(:common);
      
      sub access_handler {
          my $r = shift;
      
          unless ($r->some_auth_required) {
              $r->log_reason("No authentication has been configured");
              return FORBIDDEN;
          }
      
          # get the list of intranet IP addresses and their usernames
          my %ips = split /\s*(?:=>|,)\s*/, $r->dir_config("Intranet");
      
          if (my $user = $ips{$r->connection->remote_ip}) {
              # update the connection record (this is what sets REMOTE_USER)
              $r->connection->user($user);
      
              # do not ask for a password
              $r->set_handlers(PerlAuthenHandler => [\&OK]);
          }
          return OK;
      }
      
      sub authen_handler {
          my $r = shift;
      
          # get the user's authentication credentials
          my ($res, $sent_pw) = $r->get_basic_auth_pw;
          return $res if $res != OK;
          my $user = $r->connection->user;
      
          # authenticate through DBI
          my $reason = authen_dbi($r, $user, $sent_pw);
      
          if ($reason) {
              $r->note_basic_auth_failure;
              $r->log_reason($reason, $r->uri);
              return AUTH_REQUIRED;
          }
          return OK;
      }
      
      # Returns 0 on success, or a string with the reason for the failure.
      sub authen_dbi {
          my ($r, $user, $sent_pw) = @_;
      
          # validate username/passwd here, e.g. against a DBI database;
          # $passed is a placeholder for the result of that check
          my $passed = 0;
      
          return 0 if $passed;
          return "Failed for X reason";
      }
      
      1;
    

    You can implement your own authen_dbi() routine, or you can replace authen_handler() with an existing authentication handler such as Apache::AuthenDBI.

    If one of the IP addresses is matched, access_handler() sets REMOTE_USER to be either userA or userB.

    If neither IP address is matched, PerlAuthenHandler will not be set to OK, and the Authentication stage will ask the user for a login and password.
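    To see what access_handler() does with the Intranet setting, here is the same parsing and lookup as a stand-alone sketch. The helper name and the external address 192.0.2.10 are made up for illustration.

```perl
use strict;
use warnings;

# Same split as access_handler(): "ip => user, ip => user" becomes a hash
sub parse_intranet {
    my ($spec) = @_;
    return split /\s*(?:=>|,)\s*/, $spec;
}

my %ips = parse_intranet("100.100.100.1 => userA, 100.100.100.2 => userB");

# 192.0.2.10 stands for some address outside the organization
for my $ip ('100.100.100.2', '192.0.2.10') {
    if (my $user = $ips{$ip}) {
        print "$ip: REMOTE_USER=$user, no password asked\n";
    } else {
        print "$ip: ask for login/password\n";
    }
}
```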

    [TOC]


    Authentication code snippets

    [TOC]


    Forcing re-authentication

    To force an authenticated user to reauthenticate, just send the following response headers to the browser:

      HTTP/1.0 401 Unauthorized
      WWW-Authenticate: Basic realm="My Realm"
    

    This will pop up (in Netscape, at least) a window saying ``Authorization Failed. Retry?'' with OK and Cancel buttons. When that window pops up you know that the password has been discarded. If the user hits the Cancel button the username will also be discarded. If she hits the OK button, the authentication window will be brought up again with the previous username already in place.

    In the Perl API you would call the note_basic_auth_failure() method and return AUTH_REQUIRED to force reauthentication.

    This may not work! The browser's behaviour is in no way guaranteed.
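    If you are not using the Perl API, for example in a CGI-style script run under Apache::Registry with PerlSendHeader On, a sketch like the following should produce the same 401 response by hand, using the CGI Status header. The helper name is hypothetical and the realm ``My Realm'' is an assumption; it should normally match the AuthName of the protected area.

```perl
use strict;
use warnings;

# Build the CGI-style header block that asks the browser to re-authenticate.
sub reauth_headers {
    my ($realm) = @_;
    (my $quoted = $realm) =~ s/(["\\])/\\$1/g;   # escape for an HTTP quoted-string
    return "Status: 401 Unauthorized\r\n"
         . qq{WWW-Authenticate: Basic realm="$quoted"\r\n}
         . "Content-Type: text/html\r\n"
         . "\r\n";
}

print reauth_headers('My Realm');
print "<HTML><BODY>Please log in again.</BODY></HTML>\n";
```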

    [TOC]


    OK, AUTH_REQUIRED and FORBIDDEN in Authentication handlers

    When your authentication handler returns OK, it means that the user has correctly authenticated, and $r->connection->user will have the username set for the later phases of the request. For Apache::Registry and friends, where the environment variable settings weren't erased, an equivalent $ENV{REMOTE_USER} variable will be available.

    The password is available only through the Perl API with the help of the get_basic_auth_pw() method.

    If authentication fails, returning AUTH_REQUIRED (after calling note_basic_auth_failure()) tells the browser to pop up the authentication window so that the user can try again. For example:

      my($status, $sent_pw) = $r->get_basic_auth_pw;
      # e.g. DECLINED if this isn't Basic authentication
      return $status unless $status == OK;
      unless($r->connection->user and $sent_pw) {
          $r->note_basic_auth_failure;
          $r->log_reason("Both a username and password must be provided");
          return AUTH_REQUIRED;
      }
    

    Let's say that you have a mod_perl authentication handler where the user's credentials are checked against a database. It returns either OK or AUTH_REQUIRED. One possible authentication failure case is when the username and password are correct, but the user's account has been temporarily suspended.

    If this is the case you would like to make the user aware of this, by displaying a page, instead of having the browser pop up the authentication dialog again. You will also refuse authentication, of course.

    The solution is to return FORBIDDEN, but before that you should set a custom error page for this specific case with the help of $r->custom_response. It looks something like this:

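      use Apache::Constants qw(:common);
      
      # $suspended is assumed to come from your database lookup; the custom
      # page must be registered for the same status code you return
      if ($suspended) {
          $r->custom_response(FORBIDDEN, "/errors/suspended_account.html");
          return FORBIDDEN;
      }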
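    The three-way decision described in this section (OK, AUTH_REQUIRED or FORBIDDEN) can be sketched without Apache. The in-memory user table and the helper below are hypothetical stand-ins for your database lookup, and the sketch returns the constant names as strings rather than the Apache::Constants values.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the database: plain-text passwords and a
# "suspended" flag, for illustration only.
my %users = (
    alice => { password => 'secret',  suspended => 0 },
    bob   => { password => 'hunter2', suspended => 1 },
);

# Return the name of the constant an authentication handler following
# the advice above would return.
sub auth_outcome {
    my ($user, $sent_pw) = @_;
    # no credentials yet: make the browser pop up the login dialog
    return 'AUTH_REQUIRED' unless defined $user && defined $sent_pw;
    my $rec = $users{$user};
    # unknown user or wrong password: ask again
    return 'AUTH_REQUIRED' unless $rec && $rec->{password} eq $sent_pw;
    # correct credentials but suspended account: show the custom page instead
    return 'FORBIDDEN' if $rec->{suspended};
    return 'OK';
}

print "$_->[0]: ", auth_outcome(@$_), "\n"
    for ['alice', 'secret'], ['alice', 'wrong'], ['bob', 'hunter2'];
```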
    

    [TOC]


    Written by Stas Bekman.
    Last Modified at 12/19/1999
    Code Snippets

    [TOC]


    Redirecting Errors to the Client instead of error_log

    To trap (almost) all Perl run-time errors and send the output to the client instead of to Apache's error_log add this line to your script:

      use CGI::Carp qw(fatalsToBrowser);
    

    Refer to the CGI::Carp man page for more detailed information.

    You can also write your own custom __DIE__ and __WARN__ signal handlers. Suppose that I don't want users to see an error message, but I want it to be emailed to me if it's severe enough. The handler traps various errors and acts according to some defined logic.

    I wrote this handler for the mod_perl environment, but it also works correctly when the script is run from the shell. A stripped-down version of the code is shown here:

      # assign the DIE sighandler to call mydie(error_message) whenever
      # die() is called. Can be added anywhere in the code.
      local $SIG{'__DIE__'} = \&mydie;
      
      # Do not forget the local(), unless you want this signal handler to
      # be invoked every time any script dies (including cases where this
      # treatment may be undesirable).
      
      # and the handler itself
      sub mydie{
        my $why = shift;
      
        my $UNDER_MOD_PERL = ( (exists $ENV{'GATEWAY_INTERFACE'}
                               and $ENV{'GATEWAY_INTERFACE'} =~ /CGI-Perl/)
                             or exists $ENV{'MOD_PERL'} ) ? 1 : 0;
      
        chomp $why;
        my $orig_why = $why;                # an ASCII copy for email report
      
        # handle the shell execution case (so we will not get all the HTML)
        print("Error: $why\n"), exit unless $UNDER_MOD_PERL;
      
        my $should_email = 0;
        my $message = '';
      
        # entity escape; capture with () rather than using $&, which
        # slows down every regex match in the program
        $why =~ s/([<&>])/"&#".ord($1).";"/ge;
      
        # Now we need to trap various kinds of errors that come from CGI.pm.
        # We don't want these errors to be emailed to us, since
        # they aren't programming errors.
        # (%c is the application's configuration hash, defined elsewhere.)
        if ($orig_why =~ /Client attempted to POST (\d+) bytes/) {
      
          $message = qq{
                      You cannot POST messages bigger than 
                      @{[1024*$c{max_image_size}]} bytes.<BR>
                      You have tried to post $1 bytes<BR>
                      If you are trying to upload an image, make sure its size is not 
                      bigger than @{[1024*$c{max_image_size}]} bytes.<P>
                      Thank you!
                     };
      
        } elsif ($orig_why =~ /Malformed multipart POST/) {
      
          $message = qq{
                      Have you tried to upload an image in the wrong way?<P>
                      To successfully upload an image you must use a browser that supports
                      image upload and use the 'Browse' button to select that image.
                      DO NOT type the path to the image into the upload field.<P>
                      Thank you!
                     };
      
        } elsif ($orig_why =~ /closed socket during multipart read/) {
      
          $message = qq{
                      Have you pressed a 'STOP' button?<BR>
                      Please try again!<P>
                      Thank you!
                     };
      
        } else {
      
          $message = qq{
                      <B>There is no action to be performed on your side, since
                      the error report has already been sent to the webmaster.</B><BR><P>
                      <B>Thank you for your patience!</B>
                     };
      
          $should_email = 1;
        }
      
      
        print qq{Content-type: text/html
      
      <HTML><BODY BGCOLOR="white">
      <B>Oops, Something went wrong.</B><P>
      $message
      </BODY></HTML>};
      
        # send email report if appropriate
        if ($should_email){
      
          # Mail is the author's own module, assumed to export
          # send_mail($from, $to, $subject, $body)
          use Mail qw(send_mail);
          # prepare the email error report:
          my $subject ="Error Report";
          my $body = qq|
        An error has happened:
      
        $orig_why
      
          |;
      
          # send error reports to admin and author
          send_mail($c{email}{'admin'},$c{email}{'admin'},$subject,$body);
          send_mail($c{email}{'admin'},$c{email}{'author'},$subject,$body);
          print STDERR "[".scalar localtime()."] [SIGDIE] Sending Error Email\n";
        }
      
        # print to error_log so we will know what happened
        print STDERR "[".scalar localtime()."] [SIGDIE] $orig_why \n";
      
        exit 1;
      }                             # end of sub mydie
      
    

    You may have noticed that I trap CGI.pm's die() calls here. I don't see any reason why my users should see ugly error messages, but that's the way CGI.pm is written. The workaround is to trap them yourself.

    Please note that as of version 2.49, CGI.pm provides a cgi_error() method that returns the error message, and it won't die() unless you want it to.

    [TOC]


    Caching the POSTed Data

    What happens if you need to access the POSTed data more than once? Maybe you want to reuse it on subsequent requests. At the low level, data can only be read from a socket once, so you have to store it the first time and make it available for reuse. There is an experimental option for Makefile.PL called PERL_STASH_POST_DATA. If you turn it on, you can get at the data again with $r->subprocess_env("POST_DATA"). This is not on by default because of the overhead it adds. And because not all POST data is read in one clump, what do we do with large multipart file uploads? It's not a problem that's easy to solve in a general way. You might try the following approach:

      <Limit POST>
         PerlFixupHandler    My::fixup_handler
      </Limit>
    

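      # M_GET is not exported by default; it lives in the :methods tag
      use Apache::Constants qw(:common :methods);
      
      sub My::fixup_handler {
        my $r = shift;
        return DECLINED unless $r->method eq "POST";
        # read the body once and keep it as the query string
        $r->args(scalar $r->content);
        # turn the request into a GET so nobody tries to read the socket again
        $r->method("GET");
        $r->method_number(M_GET);
        $r->headers_in->unset('Content-length');
        OK;
      }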
    

    Now when CGI.pm, Apache::Request or whoever parses the client data, it can do so more than once since $r->args doesn't go away (unless you make it go away).

    [TOC]


    Cache control for regular and error modes

    To disable caching you should use the headers:

      Pragma: no-cache
      Cache-control: no-cache
    

    For normally generated responses use:

      $r->header_out("Pragma","no-cache");
      $r->header_out("Cache-control","no-cache");
      $r->no_cache(1);
    

    If for some reason you need to use them in error-handling code, use err_header_out(), which sets headers that are sent even when the response is an error:

      $r->err_header_out("Pragma","no-cache");
      $r->err_header_out("Cache-control","no-cache");
    

    [TOC]


    Redirect a POST request, forwarding the content

    With mod_perl you can easily redirect a POST request to some other location. All it takes is reading in the contents, setting the method to GET, setting args to the content to be forwarded, and finally doing the redirect:

      use Apache::Constants qw(M_GET);
      
      my $r = shift;
      my $content = $r->content;
      $r->method("GET");
      $r->method_number(M_GET);
      $r->headers_in->unset("Content-length");
      $r->args($content);
      $r->internal_redirect_handler("/new/url");
    

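    The last line can be any kind of redirect, not necessarily an internal one. Keep in mind, though, that an external redirect makes the browser issue a brand new request. The forwarded content will only reach the new location if you append it to the URL in the Location header as a query string.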

    [TOC]


    Reading POST Data, then Redirecting or doing something else

    If you read POST data and then redirect, you need to do the following before the redirect, or Apache will hang:

      use Apache::Constants qw(:methods :response);  # M_GET, REDIRECT
      
      $r->method_number(M_GET);
      $r->method('GET');
      $r->headers_in->unset('Content-length');
      $r->header_out('Location' => $ENV{SCRIPT_NAME});
      $r->status(REDIRECT);
      $r->send_http_header;
    

    Once the POST data has been read, the code above prevents anything else from trying to read the request body again. It has already been consumed and cannot be read a second time.

    [TOC]


    Redirecting While Maintaining Environment Variables

    Let's say you have a module that sets some environment variables.

    With an external redirect, you are telling the web browser to fetch the new page. This makes it a totally new request, and none of the environment variables are preserved.

    However, if you're using internal_redirect(), setting the variables with subprocess_env() should do the trick. In the redirected request, though, the %ENV keys will be prefixed with REDIRECT_.
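    Inside the code that serves the redirected request, a lookup that tries the plain name first and then the REDIRECT_-prefixed one can hide this difference. This is a minimal sketch. The helper name env_or_redirect() is hypothetical, and it only looks at %ENV:

```perl
use strict;
use warnings;

# Hypothetical helper: fetch an environment variable that was set before
# an internal redirect. After internal_redirect() the value may only be
# visible under the REDIRECT_-prefixed name, so check both forms.
sub env_or_redirect {
    my ($name) = @_;
    return $ENV{$name} if exists $ENV{$name};
    return $ENV{"REDIRECT_$name"};
}
```

    For example, my $ticket = env_or_redirect('TICKET'); returns the value whether or not the request went through an internal redirect.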

    [TOC]


    Terminating a child process on Request Completion

    If you want the child process serving the current request to terminate once the request has been processed, call the following anywhere in the code:

      $r->child_terminate;
    

    Apache won't actually terminate the child until everything is done and the connection is closed.

    [TOC]


    More on relative paths

    Many people use relative paths with require, use, etc., or open files in, or relative to, the current directory. But this will fail if you don't chdir() into the correct directory first (e.g. when you call the script by its full path from some other directory). This code works:

      /home/httpd/perl/test.pl:
      -------------------------
      #!/usr/bin/perl
      open IN, "./foo.txt";
      -------------------------
    

    when we call the script like this:

      % chdir /home/httpd/perl
      % ./test.pl
    

    since foo.txt is located in the same directory the script is being called from. But if we call the script like this:

      % /home/httpd/perl/test.pl
    

    from a directory other than /home/httpd/perl, the script will fail to find foo.txt. If you don't want to hardcode directories in your scripts, the FindBin.pm package comes to the rescue:

      use FindBin qw($Bin);
      use lib $Bin;        # affects require and use, not open()
      require "foo.pl";    # found via @INC, which now includes $Bin
    

    or

      use FindBin qw($Bin);
      open IN, "$Bin/foo.txt" or die "Can't open $Bin/foo.txt: $!";
    

    Now $Bin contains the path of the directory the script resides in, so you can move the script from one directory to another and call it from anywhere. The paths will always be correct.

    This differs from using "./foo", which requires you to chdir to the script's directory first. (Think about scripts run from crontab, where the current directory is usually not the script's directory!)

    Important: FindBin does not work correctly under mod_perl. It is loaded and executed only for the first script run inside a given process. All other scripts use the cached value of $Bin, which will most probably be wrong for them.
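    One way around this is to compute the directory from the script's own path each time you need it, using the core File::Basename and File::Spec modules. This is a sketch, not a drop-in FindBin replacement. It assumes the script path is available to the code, e.g. in __FILE__ or, under Apache::Registry, from $r->filename. The name script_dir() is hypothetical:

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper: absolute directory of the given script path.
# It is recomputed on every call, so nothing is cached between scripts
# the way FindBin caches $Bin.
sub script_dir {
    my ($script_path) = @_;
    return dirname(File::Spec->rel2abs($script_path));
}

# Path to foo.txt next to this script, wherever it was called from.
my $data_file = File::Spec->catfile(script_dir(__FILE__), 'foo.txt');
```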

    [TOC]


    Watching the error_log file without telneting to the server

    I wrote this script a long time ago, when I had to debug my CGI scripts but didn't have access to the error_log file. I asked the admin to install this script and have used it happily ever since.

    If your scripts are running on one of those 'Get-free-site' servers, and you cannot debug them because you can't telnet to the server or can't see the error_log, you can ask your sysadmin to install this script.

    Note that it was written for a plain Apache and isn't prepared to handle the complex multiline error and warning messages generated by mod_perl. It also forks the tail(1) utility through a piped open() to do the main work. A pure Perl implementation would probably be more efficient (take a look at the File::Tail module). You are welcome to fix it and contribute it back to the mod_perl community. Thank you!

    Ok, here is the code:

      #!/usr/bin/perl -Tw
      
      use strict;
      use CGI;
      
      my $default   = 10;
      my $error_log = "/usr/local/apache/logs/error_log";
      
      # untaint $ENV{PATH}
      $ENV{'PATH'} = '/bin:/usr/bin';
      delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};
      
      my $q = new CGI;
      
      my $counts = (defined $q->param('count') and $q->param('count'))
        ? $q->param('count') : $default;
      
      print $q->header,
            $q->start_html(-bgcolor => "white",
                         -title   => "Error logs"),
            $q->start_form,
            $q->center(
                     $q->b('How many lines to fetch? '),
                     $q->textfield('count',10,3,3),
                     $q->submit('', 'Fetch'),
                     $q->reset,
                    ),
            $q->end_form,
            $q->hr;
      
      # untaint $counts
      $counts = ($counts =~ /(\d+)/) ? $1 : 0;
      
      print($q->b("$error_log doesn't exist!!!"), $q->end_html),exit
        unless -e $error_log;
      
      open LOG, "tail -n $counts $error_log|"
        or die "Can't open tail on $error_log: $!\n";
      my @logs = <LOG>;
      close LOG;
      
      # format and colorize each line nicely
      foreach (@logs) {
          # escape HTML special characters (e.g. < and >) in the messages
          $_ = $q->escapeHTML($_);
          s{
           \[(.*?)\]\s* # date
           \[(.*?)\]\s* # type of error 
           \[(.*?)\]\s* # client part
           (.*)         # the message
          }
          {
            "[$1] <BR> [".
            colorize($2,$2).
            "] <BR> [$3] <PRE>".
            colorize($2,$4).
            "</PRE>"
          }ex;
        print "<BR>$_<BR>"; 
      }
      
      print $q->end_html;
      
      
      
      #############
      sub colorize{
        my ($type,$context) = @_;
      
        my %colors = 
          (
           error  => 'red',
           crit   => 'black',
           notice => 'green',
           warn   => 'brown',
          );
      
        return exists $colors{$type}
            ? qq{<B><FONT COLOR="$colors{$type}">$context</FONT></B>}
            : $context;
      }
    
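    As suggested above, you can replace the call to tail(1) with Perl code. The following is a minimal sketch (the name tail_lines() is hypothetical). It reads the file once from the start and keeps only the last $count lines in memory. For very large logs, reading backwards from the end of the file would be more efficient:

```perl
use strict;
use warnings;

# Hypothetical replacement for `tail -n $count $file`: read the file
# sequentially, keeping a sliding window of at most $count lines.
sub tail_lines {
    my ($file, $count) = @_;
    open my $fh, '<', $file or die "Can't open $file: $!\n";
    my @window;
    while (my $line = <$fh>) {
        push @window, $line;
        shift @window if @window > $count;
    }
    close $fh;
    return @window;
}
```

    In the script above, my @logs = tail_lines($error_log, $counts); would take the place of the open/read/close of the tail pipe.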

    [TOC]


    Accessing variables from the caller's package

    Sometimes you want to access variables from the caller's package. One way is to do:

      my $caller = caller;
      no strict 'refs';   # needed for the symbolic reference under use strict
      print qq[$caller --- ${"${caller}::var"}];
    
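    The ${"${caller}::var"} construct is a symbolic reference. Under use strict it needs no strict 'refs', and it only sees package (global) variables, not my variables. Here is a self-contained sketch of the technique. The package names Foo and Bar are made up for the demonstration:

```perl
use strict;
use warnings;

package Foo;
our $var = 'set in Foo';
sub call_bar { return Bar::show_callers_var() }

package Bar;
# Return $var from whichever package called us.
sub show_callers_var {
    my $caller = caller;      # name of the calling package, e.g. "Foo"
    no strict 'refs';         # allow the symbolic reference below
    return ${"${caller}::var"};
}

package main;
print Foo::call_bar(), "\n";  # prints "set in Foo"
```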

    [TOC]


    Handling cookies

    If you don't use a well-known module like CGI.pm, you can handle cookies yourself.

    Cookies arrive in the $ENV{HTTP_COOKIE} variable as a single raw string.

    Here is a fairly well-known bit of code to take cookie values and put them into a hash:

      sub getCookies {
          # cookies are separated by a semicolon and a space; this
          # splits them and returns a hash of cookies
        my $raw = defined $ENV{'HTTP_COOKIE'} ? $ENV{'HTTP_COOKIE'} : '';
        my @rawCookies = split /; /, $raw;
        my %cookies;
      
        foreach (@rawCookies){
            # split on the first '=' only, values may contain '='
          my ($key, $val) = split /=/, $_, 2;
          $cookies{$key} = $val;
        }
      
        return %cookies;
      }
    
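    The function above reads %ENV directly and returns the raw values. Cookie values are often URL-encoded when they are set (CGI.pm does this, for example). A variant that takes the header string as an argument and decodes %XX escapes can be handier and easier to test. Here is a minimal sketch, with parse_cookies() as a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: parse a Cookie header such as
# "session=abc%20123; theme=dark" into a hash of decoded values.
sub parse_cookies {
    my ($header) = @_;
    my %cookies;
    for my $pair (split /;\s*/, defined $header ? $header : '') {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        $value =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;   # undo URL encoding
        $cookies{$name} = $value;
    }
    return %cookies;
}

# Under mod_perl the raw header is also available as
# $r->header_in('Cookie'); here the CGI environment variable is used.
my %cookies = parse_cookies($ENV{HTTP_COOKIE});
```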

    [TOC]


    Sending multiple cookies with Perl API

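    Assuming you have prepared your cookies in @cookies, the following will send all of them. add() is used rather than set(), since set() would replace any earlier Set-Cookie header:

      for (@cookies) {
        $r->headers_out->add('Set-Cookie' => $_);
      }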
    
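    How @cookies gets filled is up to you. As a sketch, a hypothetical format_cookie() helper can build each Set-Cookie value from a name, a value and optional attributes. It percent-encodes the value so that characters like ; and spaces cannot break the header. It assumes byte strings, and that any expires value is already formatted as an HTTP date:

```perl
use strict;
use warnings;

# Hypothetical helper: build one Set-Cookie header value.
# %attr may hold path, domain and expires; the value is percent-encoded.
sub format_cookie {
    my ($name, $value, %attr) = @_;
    (my $encoded = $value) =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf '%%%02X', ord $1/ge;
    my $cookie = "$name=$encoded";
    for my $key (qw(path domain expires)) {
        $cookie .= "; $key=$attr{$key}" if defined $attr{$key};
    }
    return $cookie;
}

my @cookies = (
    format_cookie(session => 'abc 123', path => '/'),
    format_cookie(theme   => 'dark'),
);
```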

    [TOC]


    Passing and preserving custom data structures between handlers

    Let's say that you wrote a few handlers to process a request, and they all need to share some custom Perl data structure. The pnotes() method comes to the rescue. Suppose one of the handlers stores some data in the hash %my_data before it finishes:

       # First handler:
       my %my_data = (foo => 1, bar => 2);
       $r->pnotes('my_data' => \%my_data);
    

    All the subsequent handlers will be able to retrieve the stored data with:

       # Later handler:
       my $info = $r->pnotes('my_data');
       print $info->{foo};
    

    The stored information will be destroyed at the end of the request.
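    A common slip when initializing such a hash is to write it with qw(), as in qw(foo => 1, bar => 2). Because qw() splits on whitespace only, => and the commas become list elements, and the keys come out wrong. This small check shows the difference:

```perl
use strict;
use warnings;

my %broken;
{
    no warnings 'qw';   # silence "Possible attempt to separate words with commas"
    # qw() yields ('foo', '=>', '1,', 'bar', '=>', '2')
    %broken = qw(foo => 1, bar => 2);
}

# A plain list with the fat comma gives the intended pairs.
my %my_data = (foo => 1, bar => 2);

printf "broken: foo => %s\nfixed:  foo => %s\n", $broken{foo}, $my_data{foo};
```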

    [TOC]


    Passing environment variables between handlers

    Here is a simple example of passing environment variables between handlers.

    Given the configuration:

      PerlAccessHandler My::Access
      PerlLogHandler My::Log
    

    and startup.pl:

      use Apache::Constants qw(OK);
      
      sub My::Access::handler {
        my $r = shift;
        $r->subprocess_env(TICKET => $$);
        $r->notes(TICKET => $$);
        return OK;
      }
      
      sub My::Log::handler {
        my $r = shift;
        my $env = $r->subprocess_env('TICKET');
        my $note = $r->notes('TICKET');
        warn "env=$env, note=$note\n";
        return OK;
      }
    

    Adding %{TICKET}e and %{TICKET}n to the LogFormat for access_log works fine too.

    [TOC]


    CGI::params in the mod_perl-ish way

    Extracting request params in the mod_perl-ish way:

      my $r = shift;  # or $r = Apache->request
      my %params = $r->method eq 'POST' ? $r->content : $r->args;
    

    Also take a look at Apache::Request, which provides a similar, CGI.pm-like API for extracting and setting parameters.
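    Keep in mind that assigning the key/value list to a hash keeps only the last value of a parameter that appears more than once (e.g. several checkboxes with the same name). If you need all of the values, collect the pairs into lists instead. This is a minimal sketch. pairs_to_multi() is a hypothetical helper that accepts the same flat key/value list that $r->args or $r->content return in list context:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a flat (key, value, key, value, ...) list
# into a hash of array references, preserving repeated keys in order.
sub pairs_to_multi {
    my @pairs = @_;
    my %params;
    while (my ($key, $value) = splice @pairs, 0, 2) {
        push @{ $params{$key} }, $value;
    }
    return %params;
}

# Under mod_perl this could be: pairs_to_multi($r->args)
my %params = pairs_to_multi(color => 'red', size => 'L', color => 'blue');
```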

    [TOC]


    Subclassing Apache::Request example

      package My::TestAPR;
        
      use strict;
      use vars qw/@ISA/;
      @ISA = qw/Apache::Request/;
      
      sub new {
            my ($proto, $apr) = @_;
            my $class = ref($proto) || $proto;
            bless { _r => $apr }, $class;
      }
      
      sub param {
            my ($self, $key) = @_;
            my $apr = $self->{_r};
            $apr->param($key) . '42';
      }
      
      sub sum {
            my ($self, $key) = @_;
            my $apr = $self->{_r};
            my @values = $apr->param($key);
            my $sum = 0;
            for (@values) {
                    $sum += $_;
            }
            $sum;
      }
      1;
      __END__
    

    [TOC]


    Sending email from mod_perl

    Well, there is nothing special about sending email from mod_perl. It's just that we do it a lot, and there are a few important issues to be aware of. The most widely used approach is starting a sendmail process and piping the headers and the body to it. The problem is that sendmail is a very heavy process, and it makes mod_perl processes less efficient.

    One improvement is to tell sendmail not to deliver the email in "real time", but to do it in the background or just queue the job until the next queue run. If you don't want your process to wait until delivery is complete, this can significantly reduce the time the mod_perl process spends waiting for sendmail to finish. You can specify this for all deliveries in sendmail.cf, or on each invocation on the sendmail command line: -odb (background), -odq (queue-only) or -odd (queue and also defer the DNS/NIS lookups).
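    Here is a sketch of piping a message to sendmail with queue-only delivery (-odq), so that the mod_perl process doesn't wait for the delivery itself. The name sendmail_queue() and the /usr/sbin/sendmail path are assumptions. The command can be overridden, which also lets you try the function without a mail server. The -t flag makes sendmail read the recipients from the message headers:

```perl
use strict;
use warnings;

# Hypothetical helper: hand a complete message (headers, blank line,
# body) to sendmail and return without waiting for delivery.
sub sendmail_queue {
    my ($message, @command) = @_;
    # Assumed default path; -t reads recipients from the headers,
    # -odq only queues the message for the next queue run.
    @command = ('/usr/sbin/sendmail', '-t', '-odq') unless @command;
    open my $pipe, '|-', @command
        or die "Can't start $command[0]: $!\n";
    print {$pipe} $message;
    close $pipe
        or die "$command[0] failed (status $?)\n";
    return 1;
}
```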

    Some people prefer to use a lighter mail delivery program like qmail.

    The most efficient approach is to talk directly to the SMTP server. Luckily, the Net::SMTP module makes this task very easy. The only problem arises when Net::SMTP fails to deliver the mail because the server it talks to is temporarily down. Unlike sendmail, Net::SMTP does not queue the message for a later retry. On the other hand, Net::SMTP lets you send email much faster, since you don't have to invoke a dedicated process for it. Here is an example of a subroutine that sends email:

      use Net::SMTP ();
      use Carp qw(carp verbose);
      
      #
      # Sends email by using the SMTP Server
      #
      # The SMTP server is the one defined in Net::Config,
      # or you can hardcode it here, look for $smtp_server below 
      #
      sub send_mail{
        my ($from, $to, $subject, $body) = @_;
      
        my $mail_message = <<__END_OF_MAIL__;
      To: $to
      From: $from
      Subject: $subject
      
      $body
      
      __END_OF_MAIL__
      
          # Set this parameter if you don't have a valid Net/Config.pm
          # entry for SMTP host and uncomment it in the Net::SMTP->new
          # call
        # my $smtp_server = 'localhost';
      
          # init the server
        my $smtp = Net::SMTP->new(
                                # $smtp_server,
                                Timeout => 60, 
                                Debug   => 0,
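                               );
      
          # give up if we could not connect to the SMTP server
        unless ($smtp) {
          carp("Failed to connect to the SMTP server\n");
          return;
        }
      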
        $smtp->mail($from) or carp ("Failed to specify a sender [$from]\n");
        $smtp->to($to) or carp ("Failed to specify a recipient [$to]\n");
        $smtp->data([$mail_message]) or carp ("Failed to send a message\n");
      
        $smtp->quit or carp ("Failed to quit\n");
      
      } #  end of sub send_mail
    

    [TOC]


    Code Unloading

    We recommend preloading as much code as possible at server startup. Code loaded in the parent process is shared by the children, which reduces the overall memory footprint. But sometimes you want to unload code that was loaded earlier. For example, you might load many modules to do some configuration or initialization work at server startup, while none of the children will need these modules later. In that case you can unload the code.

    For example, if you use XML::Parser only in a <Perl> section, you could remove it with:

      use Apache::PerlRun ();
      delete $INC{'XML/Parser.pm'};
      Apache::PerlRun->flush_namespace('XML::Parser');
    
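    To show what flushing a namespace amounts to, here is a simplified pure-Perl sketch. It is not the actual Apache::PerlRun code, and unload_package() is a hypothetical name. It deletes the package's symbols from its symbol table and removes its %INC entry, so that a later require would load the module again. Memory held by compiled XS libraries is not released this way:

```perl
use strict;
use warnings;

# Hypothetical helper: remove a package's symbols and its %INC entry.
sub unload_package {
    my ($package) = @_;
    no strict 'refs';                      # symbolic access to the stash
    my $stash = \%{"${package}::"};
    for my $name (keys %$stash) {
        next if $name =~ /::$/;            # leave nested packages alone
        delete $stash->{$name};
    }
    (my $file = "$package.pm") =~ s{::}{/}g;
    delete $INC{$file};
    return;
}
```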

    [TOC]


    A Simple Handler To Print The Environment Variables

    The code:

      package MyEnv;
      use strict;
      use Apache;
      use Apache::Constants qw(OK);
      sub handler{ 
        my $r = shift; 
        $r->send_http_header("text/plain"); 
        print map {"$_ => $ENV{$_}\n"} sort keys %ENV;
        return OK;
      }
      1;
    

    The configuration:

      PerlModule MyEnv
      <Location /env>
        SetHandler perl-script
        PerlHandler MyEnv
      </Location>
    

    The invocation:

      http://localhost/env
    

    [TOC]


    mysql backup and restore scripts

    Well, this is somewhat off-topic, but since many of us use mysql or another RDBMS in our work with mod_perl-driven sites, it's good to know how to back up and restore databases in case of database corruption.

    First we should tell mysql to log all the statements that modify the databases (we don't care about SELECT queries for database backups). Modify the safe_mysqld script by adding the --log-update option to the mysql server's startup parameters, and restart the server. From now on, all the non-SELECT queries will be logged into the /var/lib/mysql/www.bar.com file (your hostname will show up instead of www.bar.com).

    Now create a dump directory under /var/lib/mysql/. That's where the backups will be stored (you can name the directory as you wish of course).

    Prepare the backup script and store it in a file, e.g. /usr/local/sbin/mysql/mysql.backup.pl:

      #!/usr/bin/perl -w
      
      # this script should be run from the crontab every night or at
      # shorter intervals. It does a few things:
      # 1. dumps all the tables into separate dump files (these dump files
      #    are ready for a DB restore)
      # 2. backs up the last update log file and creates a new log file
      
      use strict;
      my $data_dir = "/var/lib/mysql";
      my $update_log = "$data_dir/www.bar.com";
      my $dump_dir  = "$data_dir/dump";
      my $gzip_exec = "/bin/gzip";
      my @db_names = qw(bugs mysql bonsai);
      my $mysql_admin_exec = "/usr/bin/mysqladmin ";
      
          # convert unix time to date + time (MM.DD.YYYY.HH:MM:SS)
      my ($sec,$min,$hour,$mday,$mon,$year) = localtime(time);
      my $time  = sprintf("%02d:%02d:%02d",$hour,$min,$sec);
      my $date  = sprintf("%02d.%02d.%04d",$mon+1,$mday,$year+1900);
      my $timestamp = "$date.$time";
      
      # dump all the DBs we want to back up
      foreach my $db_name (@db_names) {
        my $dump_file = "$dump_dir/$timestamp.$db_name.dump";
        my $dump_command = "/usr/bin/mysqldump -c -e -l -q --flush-logs $db_name > $dump_file";
        system $dump_command;
      }
      
      # move update log to backup for later restore if needed
      rename $update_log, "$dump_dir/$timestamp.log" if -e $update_log;
      
      # restart the update log to log to a new file!
      system "$mysql_admin_exec refresh";
      
      # compress all the created files
      system "$gzip_exec $dump_dir/$timestamp.*";
    

    You might need to adjust the executable paths for your system, and list the names of the databases you want to back up in the @db_names array.

    Now make the script executable and add a crontab entry to run the backup script nightly. Note that over time these backups will use lots of disk space, so you should remove the old ones. Here is a sample crontab entry that runs the script at 4am every day:

      0 4 * * * /usr/local/sbin/mysql/mysql.backup.pl > /dev/null 2>&1
    
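    Removing the old backups can be automated as well. The following is a minimal sketch, with prune_old_backups() as a hypothetical name. It deletes the plain files in the dump directory whose modification time is older than a given number of days. Run it from cron after the backup, for example as prune_old_backups('/var/lib/mysql/dump', 14), where 14 days is an arbitrary choice:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: remove plain files in $dir last modified more
# than $max_days days ago. Returns the list of removed paths.
sub prune_old_backups {
    my ($dir, $max_days) = @_;
    my $cutoff = time - $max_days * 24 * 60 * 60;
    opendir my $dh, $dir or die "Can't open $dir: $!\n";
    my @removed;
    for my $name (readdir $dh) {
        my $path = File::Spec->catfile($dir, $name);
        next unless -f $path;                  # skips ".", ".." and subdirs
        next unless (stat _)[9] < $cutoff;     # element 9 is the mtime
        if (unlink $path) {
            push @removed, $path;
        }
        else {
            warn "Can't remove $path: $!\n";
        }
    }
    closedir $dh;
    return @removed;
}
```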

    So here is what we have achieved. At any moment we have the dump of the databases from the last run of the backup script, plus the log file of all the statements that have updated the databases since then. So if the database gets corrupted, we have all the information needed to restore it without losing any data. We restore it with the following script, which I put in /usr/local/sbin/mysql/mysql.restore.pl:

      #!/usr/bin/perl -w
      
      # this script restores the DBs
      
      # Usage: mysql.restore.pl update.log.gz dump.db1.gz [... dump.dbn.gz]
      # all the dump files are expected to be compressed, with the
      # .dump.gz extension, as created by the mysql.backup.pl utility
      
      # example: 
      # % mysql.restore.pl myhostname.log.gz 12.10.1998.16:37:12.*.dump.gz
      
      use strict;
      
      use FindBin qw($Bin);
      
      my $data_dir   = "/var/lib/mysql";
      my $dump_dir   = "$data_dir/dump";
      my $gzip_exec  = "/bin/gzip";
      my $mysql_exec = "/usr/bin/mysql -f ";
      my $mysql_backup_exec = "$Bin/mysql.backup.pl";
      my $mysql_admin_exec  = "/usr/bin/mysqladmin ";
      
      my $update_log_file = '';
      my @dump_files = ();
      
      # split input files into an update log and the dump files
      foreach (@ARGV) {
        push(@dump_files, $_),next unless /\.log\.gz$/;
        $update_log_file = $_;
      }
      
      die "Usage: mysql.restore.pl update.log.gz dump.db1.gz [... dump.dbn.gz]\n" 
        unless @dump_files;
      
      # load the dump files
      foreach (@dump_files) {
      
          # check the file exists
        warn("Can't locate $_\n"),next unless -e $_;
      
          # extract the db name from the dump file name
          # (MM.DD.YYYY.HH:MM:SS.dbname.dump.gz)
        my ($db_name) = /\d\d\.\d\d\.\d{4}\.\d\d:\d\d:\d\d\.(\w+)\.dump\.gz$/;
      
        warn("Can't extract DB name from the file name $_, ",
             "probably an error in the file format\n"),
          next unless defined $db_name and length $db_name;
      
          # we want to drop the database since restore will rebuild it!
          # force to drop the db without confirmation
        my $drop_command = "$mysql_admin_exec -f drop $db_name";
        system $drop_command;
      
        my $create_command = "$mysql_admin_exec create $db_name";
        system $create_command;
      
          # build the command and execute it
        my $restore_command = "$gzip_exec -cd $_ | $mysql_exec $db_name";
        system $restore_command;
      }
      
      # now load the update_log file (update the db with the changes
      # since the last dump)
      if (-e $update_log_file) {
        my $restore_command = 
          "$gzip_exec -cd $update_log_file | $mysql_exec";
        system $restore_command;
      }
      else {
        warn "Can't locate the update log file [$update_log_file]\n";
      }
      
      # rerun mysql.backup.pl since we have reloaded the dump files
      # and the update log, and we must rebuild the backups!
      system $mysql_backup_exec;
    
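    The backup and restore scripts only work together if they agree on the dump file naming scheme (note that the year has four digits). To check that agreement in isolation, here is a sketch of both sides as small functions. The names dump_file_name() and db_name_from_dump() are hypothetical, and POSIX strftime() stands in for the sprintf() calls of the backup script:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Name used by the backup script before compression:
# "MM.DD.YYYY.HH:MM:SS.<db>.dump" (gzip later appends ".gz").
sub dump_file_name {
    my ($epoch, $db_name) = @_;
    return strftime('%m.%d.%Y.%H:%M:%S', localtime $epoch) . ".$db_name.dump";
}

# Database name from a compressed dump file name, with or without a
# leading directory. Returns undef when the name doesn't match.
sub db_name_from_dump {
    my ($file) = @_;
    my ($db_name) = $file =~ /\d\d\.\d\d\.\d{4}\.\d\d:\d\d:\d\d\.(\w+)\.dump\.gz$/;
    return $db_name;
}
```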

    These are rather quick-and-dirty scripts, but they work. If you come up with cleaner ones, please contribute them. Thanks!

    [TOC]


    The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
    Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me to improve this guide. If you have something to contribute please send it directly to me.

    Written by Stas Bekman.
    Last Modified at 12/18/1999
    Use of the Camel for Perl is
    a trademark of O'Reilly & Associates,
    and is used by permission.

    Guide's Overview





    [TOC]


    What's inside?

    Before you start with the mod_perl installation, you should have an overall picture of this wonderful technology. There is more than one way to use a mod_perl-enabled webserver, so you have to decide which mod_perl scheme you want to use. The Choosing the Right Strategy chapter presents various approaches and discusses their pros and cons.

    Once you know what fits your requirements best, you should proceed to Real World Scenarios Implementation. This chapter provides very detailed scenarios of the schemes discussed in the Choosing the Right Strategy chapter.

    The Server Installation chapter follows on from the Real World Scenarios Implementation chapter by providing more in-depth installation details.

    The Server Configuration chapter adds to the basic configurations presented in the Real World Scenarios Implementation chapter with extended configurations and various configuration examples.

    The Frequent mod_perl problems chapter just collects links to other chapters, in an attempt to highlight some of the most frequently encountered mod_perl problems. This is the first place you should check if you have a problem.

    Probably the most important chapter is CGI to mod_perl Porting. mod_perl Coding guidelines. It explains the differences between scripts running under mod_cgi and mod_perl, and what should be done in order to make existing scripts run under mod_perl. Along with the porting notes it provides guidelines for proper mod_perl programming.

    Performance. Benchmarks is the biggest and a very important chapter. It explains the details of tuning mod_perl and the scripts running under it, so you can squeeze every ounce of the power from your server. A large part of the chapter is benchmarks, the numbers that IT managers love to read. But these are different benchmarks: they are not comparing mod_perl with similar technologies, rather with different configurations of mod_perl servers, to guide you through the tuning process. I have to admit, performance tuning is a very hard task, and demands a lot of understanding and experience. But once you acquire this knowledge you can make magic with your server.

    The Things obvious to others, but not to you chapter is exactly what it claims to be. Some people have been in this business too long, and many things have become too obvious to them. This is not true for a newbie, so this chapter talks about such things.

    While developing your mod_perl applications, you will begin to understand that an error_log file is your best friend. It tells you all the intimate details of what is happening to your scripts. But the problem is that it speaks a secret language. To learn the alphabet and the grammar of this language, refer to the chapter Warnings and Errors: Where and Why.

    The Protecting Your Site chapter is all about security.

    If you are into driving relational databases with your CGI scripts, the mod_perl and Relational Databases chapter will tell you all about the database-related goodies mod_perl has prepared for you.

    If you are using good old dbm files for your databases, the mod_perl and dbm files chapter explains how to utilize them better under mod_perl.

    More and more Internet Service Providers (ISPs) are evaluating the possibility of providing mod_perl services to their users. Is this possible? Is it secure? Will it work? What resources does it take? The mod_perl for ISPs. mod_perl and Virtual Hosts chapter answers all these questions. If you want to run a mod_perl-enabled server but do not have root access, read this chapter as well, either to learn how to do it yourself, or to persuade your ISP to provide this service.

    If you have to administer your Apache mod_perl server the Controlling and Monitoring the Server chapter is for you. Among the topics are: server restarting and monitoring techniques, preventing the server from eating up all your disk space in a matter of minutes, and more.

    The mod_perl Status. Peeking into the Server's Perl Innards chapter shows you ways to peek at what is going on inside a mod_perl-enabled server while it is running: for example, the value of some global variable, which database connections are open, which modules were loaded and from which paths, what the value of @INC is, and much more.

    Every programmer needs to know how to debug her program. It is an _easy_ task with plain Perl. Just invoke the program with the -d flag and debug it. Is it possible to do the same under mod_perl? After all you cannot debug every CGI script by executing it from the command line: some scripts will not run from the command line. The Debugging mod_perl chapter proves debugging under mod_perl is possible and real.

    Sometimes browsers that interact with our servers have bugs, which cause big headaches for CGI developers. Preventing these bugs from happening is discussed in the Workarounds for some known bugs in browsers chapter.

    Many modules have been written to extend mod_perl's core functionality. Some important modules are covered in the Apache::* modules chapter.

    Some folks decide to go with mod_perl but lack a basic understanding of Perl, which mod_perl absolutely does not tolerate. If you are such a person, there is nothing to be ashamed of; we all went through this. Get a good Perl book and start reading. The Perl Reference chapter gives some basic Perl lessons, delivering the knowledge without which you cannot start to program mod_perl scripts.

    The Code Snippets chapter is just a collection of code snippets I have found useful while writing the scripts.

    The Choosing an Operating System and Hardware chapter gives you an idea of how to choose the software and hardware for the webserver.

    The mod_perl Advocacy chapter tries to make it easier to advocate mod_perl around the world.

    The Getting Help and Further Learning chapter refers you to other related information resources, like learning Perl programming and SQL, understanding security, building databases, and more.

    Appendix A: Downloading software and documentation includes pointers to the software that was explained and/or mentioned in this guide.

    [TOC]



    Written by Stas Bekman.
    Last Modified at 11/27/1999

    Choosing the Right Strategy





    [TOC]


    Do it like I do it!?

    There is no such thing as the RIGHT strategy in the web server business, although there are many wrong ones. Never believe a person who says: "Do it this way, this is the best!". As the old saying goes: "Trust but verify". There are too many technologies out there to choose from, and it would take an enormous investment of time and money to try to validate each one before deciding which is the best choice for your situation. With this in mind, I will present some ways of using standalone mod_perl, and some combinations of mod_perl and other technologies. I'll describe how these things work together, and offer my opinions on the pros and cons of each, the relative degree of difficulty in installing and maintaining them, and some hints on approaches that should be used and things to avoid.

    To be clear, I will not address all technologies and tools, but limit this discussion to those complementing mod_perl.

    Please let me stress it again: DO NOT blindly copy someone's setup and hope for a good result. Choose what is best for your situation -- it might take some effort to find out what that is.

    In this chapter we will discuss

    [TOC]


    mod_perl Deployment Overview

    There are several different ways to build, configure and deploy your mod_perl enabled server. Some of them are:

    1. Having one binary and one configuration file (one big binary for mod_perl).

    2. Having two binaries and two configuration files (one big binary for mod_perl and one small binary for static objects like images.)

    3. Having one DSO-style binary and two configuration files, with mod_perl available as a loadable object.

    4. Any of the above plus a reverse proxy server in http accelerator mode.

    If you are a newbie, I would recommend that you start with the first option and get your feet wet with Apache and mod_perl. Later, you can decide whether to move to the second option, which allows better tuning at the expense of more complicated administration; to the third option, the more state-of-the-art-yet-suspiciously-new DSO system; or to the fourth option, which gives you even more power.

    1. The first option will kill your production site if you serve a lot of static data from large (4 to 15MB) webserver processes. On the other hand, while testing you will have no other server interaction to mask or add to your errors.

    2. This option allows you to tune the two servers individually, for maximum performance.

      However, you need to choose between running the two servers on multiple ports, multiple IPs, etc., and you have the burden of administering more than one server. You have to deal with proxying or fancy site design to keep the two servers in synchronization.

    3. With DSO, modules can be added and removed without recompiling the server, and their code is even shared among multiple servers.

      You can compile just once and yet have more than one binary, by using different configuration files to load different sets of modules. The different Apache servers loaded in this way can run simultaneously to give a setup such as described in the second option above.

      On the down side, you are playing at the bleeding edge.

      You are dealing with a new solution that has weak documentation and is still subject to change. It is still somewhat platform specific. Your mileage may vary.

      The DSO module (mod_so) adds size and complexity to your binaries.

      See Build mod_perl as DSO inside Apache source tree via APACI

    4. The fourth option (proxy in http accelerator mode), once correctly configured and tuned, improves the performance of any of the above three options by caching and buffering page results.

    [TOC]


    Alternative architectures for running one and two servers

    The next part of this chapter discusses the pros and cons of each of these configurations. Real World Scenarios Implementation describes the implementation techniques for these schemes.

    We will look at the following installations:

    [TOC]


    Standalone mod_perl Enabled Apache Server

    The first approach is to implement a straightforward mod_perl server. Just take your plain apache server and add mod_perl, like you add any other apache module. You continue to run it on the port it was running on before. You probably want to try this before you proceed to more sophisticated and complex techniques.

    The advantages:

    • Simplicity. You just follow the installation instructions, configure it, restart the server and you are done.

    • No network changes. You do not have to worry about using additional ports as we will see later.

    • Speed. You get a very fast server, and you see an enormous speedup from the first moment you start to use it.

    The disadvantages:

    • The process size of a mod_perl-enabled Apache server is huge (maybe 4Mb at startup and growing to 10Mb and more, depending on how you use it) compared to the typical plain Apache. Of course if memory sharing is in place, RAM requirements will be smaller.

      You probably have a few tens of child processes. The additional memory requirements add up in direct relation to the number of child processes. Your memory demands are growing by an order of magnitude, but this is the price you pay for the additional performance boost of mod_perl. With memory prices so cheap nowadays, the additional cost is low -- especially when you consider the dramatic performance boost mod_perl gives to your services with every 100Mb of RAM you add.

      While you will be happy to have these monster processes serving your scripts with monster speed, you should be very worried about having them serve static objects such as images and html files. Each static request served by a mod_perl-enabled server means another large process running, competing for system resources such as memory and CPU cycles. The real overhead depends on the rate of requests for static objects. Remember that if your mod_perl code produces HTML code which includes images, each one will turn into another static object request. Having another plain webserver to serve the static objects solves this unpleasant obstacle. Having a proxy server as a front end, caching the static objects and freeing the mod_perl processes from this burden, is another solution. We will discuss both below.

    • Another drawback of this approach is that when serving output to a client with a slow connection, the huge mod_perl-enabled server process (with all of its system resources) will be tied up until the response is completely written to the client. While it might take a few milliseconds for your script to complete the request, the process may still be busy for seconds or even minutes if the request comes from a client with a slow connection. As in the previous drawback, a proxy solution can solve this problem. More on proxies later.

      Proxying dynamic content is not going to help much if all the clients are on a fast local net (for example, if you are administering an Intranet). On the contrary, it can decrease performance. Still, remember that some of your Intranet users might work from home through slow modem links.

    If you are new to mod_perl, this is probably the best way to get yourself started.

    And of course, if your site is serving only mod_perl scripts (close to zero static objects, like images), this might be the perfect choice for you!

    For implementation notes see the ``One Plain and One mod_perl enabled Apache Servers'' section in the implementations chapter.

    [TOC]


    One Plain Apache and One mod_perl-enabled Apache Servers

    As I have mentioned before, when running scripts under mod_perl, you will notice that the httpd processes consume a huge amount of virtual memory, from 5Mb to 15Mb and even more. That is the price you pay for the enormous speed improvements under mod_perl. (Again -- shared memory keeps the real memory that is being used much smaller :)

    Using these large processes to serve static objects like images and html documents is overkill. A better approach is to run two servers: a very light, plain apache server to serve static objects and a heavier mod_perl-enabled apache server to serve requests for dynamic (generated) objects (aka CGI).

    From here on, I will refer to these two servers as httpd_docs (vanilla apache) and httpd_perl (mod_perl enabled apache).

    The advantages:

    • The heavy mod_perl processes serve only dynamic requests, which allows the deployment of fewer of these large servers.

    • MaxClients, MaxRequestsPerChild and related parameters can now be optimally tuned for both httpd_docs and httpd_perl servers, something we could not do before. This allows us to fine tune the memory usage and get better server performance.

      Now we can run many lightweight httpd_docs servers and just a few heavy httpd_perl servers.

    An important note: When a user browses static pages and the base URL in the Location window points to the static server, for example http://www.nowhere.com/index.html, all relative URLs (e.g. <A HREF="/main/download.html">) are served by the light plain apache server. But this is not the case with dynamically generated pages. When the base URL in the Location window points to the dynamic server (e.g. http://www.nowhere.com:8080/perl/index.pl), all relative URLs in the dynamically generated HTML will be served by the heavy mod_perl processes. You must use fully qualified URLs and not relative ones! http://www.nowhere.com/icons/arrow.gif is a full URL, while /icons/arrow.gif is a relative one. Using <BASE HREF="http://www.nowhere.com/"> in the generated HTML is another way to handle this problem. Alternatively, the httpd_perl server could rewrite such requests back to httpd_docs, but this is much slower and still needs the attention of the heavy servers. This is not an issue if you hide the internal port implementations, so the client sees only one server running on port 80. (See Publishing port numbers different from 80)
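    The two fixes described above, fully qualified URLs and a <BASE HREF> tag, can be produced by a couple of small helpers in the generating script. This is a minimal sketch. It assumes the static server lives at http://www.nowhere.com, as in the example, and static_url() and base_tag() are hypothetical names that are not part of mod_perl.

```perl
use strict;
use warnings;

# Hypothetical location of the light httpd_docs server.
my $DOCS_BASE = 'http://www.nowhere.com';

# Turn a site-relative path such as "/icons/arrow.gif" into a fully
# qualified URL, so the browser fetches it from httpd_docs instead of
# the heavy mod_perl server that generated the page.
sub static_url {
    my ($path) = @_;
    $path =~ s{^/*}{/};    # ensure exactly one leading slash
    return $DOCS_BASE . $path;
}

# Alternatively, emit a <BASE HREF> tag inside <HEAD>, so every
# relative URL in the page resolves against the static server.
sub base_tag {
    return qq{<BASE HREF="$DOCS_BASE/">};
}

print qq{<IMG SRC="}, static_url('/icons/arrow.gif'), qq{">\n};
print base_tag(), "\n";
```

    The <BASE HREF> approach needs only one change per page. The per-link helper gives you finer control if some relative links really should go to the mod_perl server.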

    The disadvantages:

    • An administration overhead.

      • The need for two different sets of configuration, log and other files. We need a special directory layout to manage these. While some directories can be shared between the two servers (like the include directory, containing the apache include files -- assuming that both are built from the same source distribution), most of them should be separated and the configuration files updated to reflect the changes.

      • The need for two sets of controlling scripts (startup/shutdown) and watchdogs.

      • If you are processing log files, now you probably will have to merge the two separate log files into one before processing them.

    • Just as in the one server approach, we still have the problem of a mod_perl process spending its precious time serving slow clients, when the processing portion of the request was completed a long time ago. Deploying a proxy solves this, and will be covered in the next section.

      As with the single server approach, this is not a major disadvantage if you are on a fast network (i.e. Intranet). It is likely that you do not want a buffering server in this case.
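    A short Perl script can merge the two access logs before processing. The sketch below is one possible approach, not a tool from the mod_perl distribution. It assumes that both logs are in Common Log Format, that each log is already in chronological order, and that both servers log in the same time zone. The function names are hypothetical.

```perl
use strict;
use warnings;

my %MON = (Jan => 1, Feb => 2, Mar => 3, Apr => 4,  May => 5,  Jun => 6,
           Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12);

# Turn the [dd/Mon/yyyy:hh:mm:ss zone] field of a log line into a string
# that sorts chronologically (the zone is ignored, see assumptions above).
sub sort_key {
    my ($line) = @_;
    my ($d, $m, $y, $t) = $line =~ m{\[(\d+)/(\w+)/(\d+):(\d+:\d+:\d+)}
        or return '';
    return sprintf '%04d%02d%02d %s', $y, $MON{$m}, $d, $t;
}

# Merge two already-sorted lists of log lines into one sorted list.
sub merge_logs {
    my ($left, $right) = @_;    # array references
    my ($i, $j, @out) = (0, 0);
    while ($i < @$left && $j < @$right) {
        if (sort_key($left->[$i]) le sort_key($right->[$j])) {
            push @out, $left->[$i++];
        } else {
            push @out, $right->[$j++];
        }
    }
    # One list is exhausted; append whatever remains of the other.
    push @out, @$left[$i .. $#$left], @$right[$j .. $#$right];
    return @out;
}

# Usage: perl merge_logs.pl docs_access_log perl_access_log > merged_log
if (@ARGV == 2) {
    my @logs;
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        push @logs, [ <$fh> ];
        close $fh;
    }
    print merge_logs(@logs);
}
```

    A two-way merge keeps the work linear in the size of the logs. This only works because each server writes its own log in time order. Lines without a recognizable timestamp get an empty key and sort first.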

    Before you go on with this solution you really want to look at the Adding a Proxy Server in http Accelerator Mode section.

    For implementation notes see the ``One Plain and One mod_perl enabled Apache Servers'' section in the implementations chapter.

    [TOC]


    One light non-Apache and One mod_perl enabled Apache Servers

    If the only requirement from the light server is for it to serve static objects, then you can get away with non-apache servers that have an even smaller memory footprint. thttpd has been reported to be about 5 times faster than apache (especially under a heavy load), since it is very simple, uses almost no memory (260k) and does not spawn child processes.

    Meta: Hey, No personal experience here, only rumours. Please let me know if I have missed some pros/cons here. Thanks!

    The Advantages:

    • All the advantages of the 2 servers scenario.

    • More memory saving. Apache is about 4 times bigger than thttpd: if you spawn 30 children you use about 30M of memory, while thttpd uses only 260k, roughly 100 times less! You could use the 30M you've saved to run a few more mod_perl servers.

      The memory savings are significantly smaller if your OS supports memory sharing with Dynamically Shared Objects (DSO) and you have configured apache to use it. If you do allow memory sharing, 30 light apache servers ought to use only about 3 to 4Mb, because most of it will be shared. There is no memory sharing if apache modules are statically compiled into httpd.

    • Reported to be about 5 times faster than plain apache serving static objects.

    The Disadvantages:

    • Lacks some of apache's features, like access control, error redirection, customizable log file formats, and so on.

    [TOC]


    Adding a Proxy Server in http Accelerator Mode

    At the beginning there were 2 servers: one plain apache server, which was very light, and configured to serve static objects, the other mod_perl enabled (very heavy) and configured to serve mod_perl scripts. We named them httpd_docs and httpd_perl respectively.

    The two servers coexist at the same IP address by listening to different ports: httpd_docs listens to port 80 (e.g. http://www.nowhere.com/images/test.gif) and httpd_perl listens to port 8080 (e.g. http://www.nowhere.com:8080/perl/test.pl). Note that I did not write http://www.nowhere.com:80 for the first example, since port 80 is the default port for the http service. Later on, I will be changing the configuration of the httpd_docs server to make it listen to port 81.

    Now I am going to convince you that you want to use a proxy server (in the http accelerator mode). The advantages are:

    • Allow serving of static objects from the proxy's cache (objects that previously were entirely served by the httpd_docs server).

    • You get less I/O activity reading static objects from the disk (proxy serves the most ``popular'' objects from RAM - of course you benefit more if you allow the proxy server to consume more RAM). Since you do not wait for the I/O to be completed you are able to serve static objects much faster.

    • The proxy server acts as a sort of output buffer for the dynamic content. The mod_perl server sends the entire response to the proxy and is then free to deal with other requests. The proxy server is responsible for sending the response to the browser. So if the transfer is over a slow link, the mod_perl server is not waiting around for the data to move.

      Using numbers is always more convincing :) Let's take a user connected to your site with a 28.8 kbps (bps == bits/sec) modem. This means that the speed of the user's link is 28.8/8 = 3.6 kbytes/sec. Let's assume that an average generated HTML page is 20kb (kb == kilobytes) and that an average script generates this output in 0.5 secs. How long will the server wait before the user gets the whole response? A simple calculation reveals pretty scary numbers: it will have to wait for another 6 secs (20kb / 3.6kb per sec), during which it could have served another 12 (6/0.5) dynamic requests.

      This very simple example shows us that we need only one twelfth the number of children running, which means that we will need only one twelfth of the memory (not quite true because some parts of the code are shared).

      But you know that nowadays scripts often return pages which are blown up with javascript code and similar, which can make them 100kb in size, and the download time will be on the order of... (This calculation is left to you as an exercise :)

      Many users like to open many browser windows and do many things at once (download files and browse graphically heavy sites). So the speed of 3.6kb/sec we were assuming before may often be 5-10 times lower.

    • We are going to hide the details of the server's implementation. Users will never see ports in the URLs (more on that topic later). You can have a few boxes serving the requests, and only one serving as a front end, which spreads the jobs between the servers in a way that you can control. You can actually shut down a server without the user even noticing, because the front end server will dispatch the jobs to other servers. (This is called load balancing, and it's a pretty big issue which will not be discussed in this document. For more information see 'High-Availability Linux Project'.)

    • For security reasons, using any httpd accelerator (or a proxy in httpd accelerator mode) is essential, because you do not let your internal server get directly attacked by arbitrary packets from whomever. The httpd accelerator and the internal server communicate only through expected HTTP requests. This way only your public ``bastion'' accelerating www server can get hosed in a successful attack, while your internal data stays safe.
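    The slow-client arithmetic from the output-buffering point above can be reproduced with a few lines of Perl. This is only a back-of-the-envelope model. It uses the same assumptions as the text (a 28.8 kbps modem, a 20kb page, 0.5 secs of script time) and ignores protocol overhead and latency. slow_client_cost() is a hypothetical name.

```perl
use strict;
use warnings;

# How long is a mod_perl child tied up feeding one slow client, and how
# many other requests could it have served in that time (no proxy)?
sub slow_client_cost {
    my (%arg) = @_;    # modem_kbps, page_kb, script_secs
    my $kbytes_per_sec = $arg{modem_kbps} / 8;    # bits -> bytes
    my $transfer_secs  = $arg{page_kb} / $kbytes_per_sec;
    my $lost_requests  = $transfer_secs / $arg{script_secs};
    return ($transfer_secs, $lost_requests);
}

my ($secs, $lost) = slow_client_cost(
    modem_kbps  => 28.8,
    page_kb     => 20,
    script_secs => 0.5,
);
printf "transfer takes %.1f secs; %.1f requests could have been served\n",
    $secs, $lost;
```

    This prints about 5.6 secs and 11.1 requests. The text rounds 5.6 secs up to 6, which gives its figure of 12. Calling it with page_kb => 100 answers the 100kb exercise.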

    The disadvantages are:

    • Of course there are drawbacks. Luckily, these are not functionality drawbacks, but rather administrative hassles. You have another daemon to worry about, and while proxies are generally stable, you have to make sure to prepare proper startup and shutdown scripts, which are run at boot and reboot as appropriate. Also, you might want to set up the crontab to run a watchdog script.

    • Proxy servers can be configured to be light or heavy; the admin must decide what gives the highest performance for his application. A proxy server like squid is light in the sense that it has only one process serving all requests, but it can appear pretty heavy when it loads objects into memory for faster service.
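    A cron watchdog can be as simple as checking whether the process recorded in the server's pid file is still alive. The sketch below assumes that the pid file holds the PID on its first line, as Apache's PidFile does. server_alive() is a hypothetical helper. What to do when it returns false (restart the daemon, mail the admin) is left to your setup.

```perl
use strict;
use warnings;

# Return 1 if the process whose PID is stored in $pid_file exists, 0 otherwise.
sub server_alive {
    my ($pid_file) = @_;
    open my $fh, '<', $pid_file or return 0;
    my $line = <$fh>;
    close $fh;
    return 0 unless defined $line && $line =~ /^\s*(\d+)/;
    my $pid = $1;
    # Signal 0 delivers nothing; it only checks that the process exists.
    return 1 if kill(0, $pid);
    # EPERM: the process exists but belongs to another user (e.g. root).
    return $!{EPERM} ? 1 : 0;
}
```

    From crontab you would call server_alive() with the path of your proxy's pid file and restart the proxy with its usual startup script when the result is 0. Both the path and the restart command depend on your installation.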

    Have I succeeded in convincing you that you want a proxy server?

    If you are on a local area network (LAN), then the big benefit of the proxy buffering the output and feeding a slow client is gone. You are probably better off sticking with a straight mod_perl server in this case.

    [TOC]


    Implementations of Proxy Servers

    As of this writing, two proxy implementations are known to be widely used with mod_perl - squid proxy server and mod_proxy which is a part of the apache server. Let's compare them.

    [TOC]


    The Squid Server

    The Advantages:

    • Caching of static objects. These are served much faster, assuming that your cache size is big enough to keep the most frequently requested objects in the cache.

    • Buffering of dynamic content, by taking the burden of returning the content generated by mod_perl servers to slow clients, thus freeing mod_perl servers from waiting for the slow clients to download the data. Freed servers immediately switch to serve other requests, thus your number of required servers goes down dramatically.

    • Non-linear URL space / server setup. You can use Squid to play some tricks with the URL space and/or domain based virtual server support.

    The Disadvantages:

    • Proxying dynamic content is not going to help much if all the clients are on a fast local net. Also, a message on the squid mailing list implied that squid only buffers in 16k chunks, so it would not allow a mod_perl process to complete immediately if the output is larger.

    • Speed. Squid is not very fast today when compared with the plain file based web servers available. Only if you are using a lot of dynamic features such as mod_perl or similar is there a reason to use Squid, and then only if the application and the server are designed with caching in mind.

    • Memory usage. Squid uses quite a bit of memory.

      META: more details?

    • HTTP protocol level. Squid is pretty much an HTTP/1.0 server, which seriously limits the deployment of HTTP/1.1 features.

    • HTTP headers, dates and freshness. The squid server might give out stale pages, confusing downstream/client caches (you update some documents on the site, but squid still serves the old ones).

    • Stability. Compared to plain web servers, Squid is not the most stable.

    The pros and cons presented above lead to the idea that you might want to use squid for its dynamic content buffering features, but only if your server serves mostly dynamic requests. So in this situation, when performance is the goal, it is better to have a plain apache server serving static objects, and squid proxying the mod_perl enabled server only.

    For implementation details see the ``Running 1 webserver and squid in httpd accelerator mode'' and the ``Running 2 webservers and squid in httpd accelerator mode'' sections in the implementations chapter.

    [TOC]


    Apache's mod_proxy

    I do not think the difference in speed between apache's mod_proxy and squid is relevant for most sites, since the real value of what they do is buffering for slow client connections. However, squid runs as a single process and probably consumes fewer system resources.

    The trade-off is that mod_rewrite is easy to use if you want to spread parts of the site across different back end servers, while mod_proxy knows how to fix up redirects containing the back-end server's idea of the location. With squid you can run a redirector process to proxy to more than one back end, but there is a problem in fixing redirects in a way that keeps the client's view of both server names and port numbers in all cases.

    The difficult case is where:

    The Advantages:

    • No additional server is needed. We keep the one plain plus one mod_perl enabled apache servers. All you need is to enable mod_proxy in the httpd_docs server and add a few lines to httpd.conf file.

    • The ProxyPass and ProxyPassReverse directives allow you to hide the internal redirects, so if http://nowhere.com/modperl/ is actually http://localhost:81/modperl/, it will be absolutely transparent to the user. ProxyPass redirects the request to the mod_perl server, and when it gets the response, ProxyPassReverse rewrites the URL back to the original one, e.g.:

        ProxyPass        /modperl/ http://localhost:81/modperl/
        ProxyPassReverse /modperl/ http://localhost:81/modperl/
      

    • It does mod_perl output buffering like squid does. See the Using mod_proxy notes for more details.

    • It even does caching. You have to produce correct Content-Length, Last-Modified and Expires http headers for it to work. If some of your dynamic content does not change frequently, you can dramatically increase performance by caching it with ProxyPass.

    • ProxyPass happens before the authentication phase, so you do not have to worry about authenticating twice.

    • Apache is able to accelerate secure HTTP requests completely, while also doing accelerated HTTP. With squid you have to use an external redirection program for that.

    • The latest (apache 1.3.6 and later) Apache proxy accelerated mode is reported to be very stable.
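    As noted above, mod_proxy can cache a response only if the back-end server sends the right headers. The sketch below builds those header values in plain Perl. http_date() and cache_headers() are hypothetical helpers. The one-hour lifetime in the example is an arbitrary assumption, and the body is assumed to be a byte string, so that length() counts bytes.

```perl
use strict;
use warnings;

my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time as an HTTP date (RFC 1123 format, always GMT).
# Names are hard-coded so the result does not depend on the locale.
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime $epoch;
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec;
}

# Headers that let a caching proxy store the response:
#   body  - the generated page (a byte string)
#   mtime - when the underlying data last changed (epoch seconds)
#   now   - the current time; ttl - how long it may be cached (seconds)
sub cache_headers {
    my (%arg) = @_;
    return (
        'Content-Length' => length($arg{body}),
        'Last-Modified'  => http_date($arg{mtime}),
        'Expires'        => http_date($arg{now} + $arg{ttl}),
    );
}

my %h = cache_headers(body => "<HTML>...</HTML>\n", mtime => time - 60,
                      now  => time, ttl => 3600);
print "$_: $h{$_}\n" for sort keys %h;
```

    Inside a mod_perl 1.x handler you would copy each pair into the response with $r->header_out($name => $value) before calling $r->send_http_header.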

    The Disadvantages:

    • Users have reported that it might be a bit slow, but the latest version is fast enough.

      (META: How fast is enough? :) Any figures here?

    For implementation see the ``Using mod_proxy'' section in the implementation chapter.

    [TOC]


    The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
    Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me to improve this guide. If you have something to contribute please send it directly to me.

    Written by Stas Bekman.
    Last Modified at 12/18/1999
    Use of the Camel for Perl is
    a trademark of O'Reilly & Associates,
    and is used by permission.
    Warnings and Errors Troubleshooting Index



    [TOC]


    General Advice

    Turning warnings on helps immensely to detect possible problems. See The Importance of Warnings.

    Enabling use diagnostics; generally helps you to determine the source of the problem and how to solve it. See diagnostics pragma for more info.

    [TOC]


    Building and Installation

    See make Troubleshooting and make test Troubleshooting

    [TOC]


    Configuration and Startup

    This section talks about errors reported when you attempt to start the server.

    [TOC]


    libexec/libperl.so: open failed: No such file or directory

    If when you run the server you get the following error:

      libexec/libperl.so: open failed: No such file or directory
    

    The above error seems to indicate that perl was compiled with a shared library. mod_perl does detect this and links the apache executable to the perl shared library (libperl.so).

    First of all make sure you have perl installed on the machine, and that you have libperl.so in <perl library root>/<version>/<architecture>/CORE, for example in /usr/local/lib/perl5/5.00503/sun4-solaris/CORE.

    Then make sure that this directory is included in the LD_LIBRARY_PATH environment variable. Under normal circumstances, apache should have the path configured at compile time, but this way you can override the library path.
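    To find the directory, you can ask perl itself. This sketch uses the standard Config module. It assumes libperl lives in the CORE directory under archlibexp, which is the usual layout but not a guaranteed one: some packaged perls install libperl.so elsewhere. libperl_location() is a hypothetical helper. Run it with the same perl that mod_perl was built against.

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# This perl's CORE directory, the libperl it was built with, and
# whether it was built as a shared library (useshrplib).
sub libperl_location {
    my $core   = File::Spec->catdir($Config{archlibexp}, 'CORE');
    my $lib    = File::Spec->catfile($core, $Config{libperl});
    my $shared = ($Config{useshrplib} || '') eq 'true';
    return ($core, $lib, $shared);
}

my ($core, $lib, $shared) = libperl_location();
print "CORE directory: $core\n";
print "libperl:        $lib ", (-e $lib ? "(found)" : "(not found)"), "\n";
print "shared libperl: ", ($shared ? "yes" : "no"), "\n";
```

    If the library is found there, that directory is the one to add to LD_LIBRARY_PATH in the environment apache is started from.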

    [TOC]


    Invalid command 'PerlHandler'...

      Syntax error on line 393 of /etc/httpd/conf/httpd.conf: Invalid
      command 'PerlHandler', perhaps mis-spelled or defined by a module
      not included in the server configuration [FAILED]
    

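    This happens when you have a mod_perl enabled Apache compiled with DSO (generally an installed RPM or other binary package) and mod_perl is not loaded. You have to tell apache to load mod_perl by adding:

      LoadModule perl_module libexec/libperl.so
      AddModule mod_perl.c
    

    in your config file. The path to libperl.so depends on your installation.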

    [TOC]


    RegistryLoader: Cannot translate the URI /home/httpd/perl/test.pl

    (Meta: I've changed this warning message in the module - update it!!!)

      RegistryLoader: Cannot translate the URI /home/httpd/perl/test.pl
                  into a real path to the filename. Please refer to the
                  manpage for more information
                  or use the complete method's call like:
                  $r->handler(uri,filename);\n";
    

    This warning shows up when RegistryLoader fails to translate the URI into the corresponding filesystem path. Most failures happen when one passes a file path instead of a URI. (A reminder: /home/httpd/perl/test.pl is a file path, while /perl/test.pl is a URI.) In most cases all you have to do is pass what RegistryLoader expects to get, the URI, but there are more complex cases. RegistryLoader's man page shows how to handle these cases as well (watch for the trans() sub).
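    When you preload scripts from a startup file, you can avoid passing a file path where a URI is expected by deriving the URI from your Alias configuration. This sketch assumes a single Alias /perl/ /home/httpd/perl/ mapping. %ALIAS and path_to_uri() are hypothetical, and the mapping has to be kept in sync with httpd.conf by hand.

```perl
use strict;
use warnings;

# Hypothetical copy of the httpd.conf mapping:
#   Alias /perl/ /home/httpd/perl/
my %ALIAS = ('/home/httpd/perl/' => '/perl/');

# Translate a filesystem path into the URI that Apache maps to it.
# Longer directories are tried first so that nested aliases win.
# Returns undef (in scalar context) if no alias covers the path.
sub path_to_uri {
    my ($path) = @_;
    for my $dir (sort { length($b) <=> length($a) } keys %ALIAS) {
        if (index($path, $dir) == 0) {
            return $ALIAS{$dir} . substr($path, length $dir);
        }
    }
    return;
}
```

    With Apache::RegistryLoader you can then use the two-argument form from the warning message, $rl->handler(path_to_uri($file), $file). This way RegistryLoader does not have to guess the translation.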

    [TOC]


    "Apache.pm failed to load!"

    If your server startup fails with:

      Apache.pm failed to load!
    

    error, try adding to httpd.conf:

      PerlModule Apache
    

    directive.

    [TOC]


    Code Parsing and Compilation

    [TOC]


    Value of $x will not stay shared at - line 5

    See my() Scoped Variable in Nested Subroutines.

    [TOC]


    Value of $x may be unavailable at - line 5.

    See my() Scoped Variable in Nested Subroutines.
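    Both warnings come from the same trap. Apache::Registry wraps your script in a handler() subroutine, so a subroutine declared in the script becomes a named subroutine nested inside handler(). That nested subroutine keeps seeing the first instance of any my() variable it uses. The sketch below reproduces the effect in plain Perl. The subroutine names are made up for the demonstration, and the anonymous-sub fix is one common option, not the only one.

```perl
use strict;
use warnings;
no warnings 'closure';   # silence "will not stay shared" for this demo

# outer_broken() plays the role of Registry's handler(); the named sub
# nested inside it captures only the FIRST $x ever created.
sub outer_broken {
    my $x = shift;
    sub inner_broken { return $x }
    return inner_broken();
}

# One fix: an anonymous sub captures the current $x on every call.
sub outer_fixed {
    my $x = shift;
    my $inner = sub { return $x };
    return $inner->();
}

print join(' ', outer_broken(1), outer_broken(2)), "\n";   # prints "1 1"
print join(' ', outer_fixed(1),  outer_fixed(2)),  "\n";   # prints "1 2"
```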

    [TOC]


    Can't locate loadable object for module XXX

    No loadable object was built for this module. For example, when you see:

      Can't locate loadable object for module Apache::Util in @INC...
    

    make sure to give mod_perl's Makefile.PL PERL_UTIL_API=1, EVERYTHING=1 or DYNAMIC=1 parameters to enable and build all the components of Apache::Util.

    [TOC]


    Can't locate object method "get_handlers"...

      Can't locate object method "get_handlers" via package "Apache"
    

    You need to rebuild your mod_perl with stacked handlers, i.e. PERL_STACKED_HANDLERS=1 or more simply EVERYTHING=1.

    [TOC]


    Missing right bracket at line ...

    Most likely you really do have a syntax error. However, another possible reason is a script running under Apache::Registry that uses the __DATA__ or __END__ tokens. Learn why.

    [TOC]


    Can't load '.../auto/DBI/DBI.so' for module DBI

    Check that all your modules are compiled with the same perl that is being compiled into mod_perl. perl 5.005 and 5.004 are not binary compatible by default.

    Other known causes of this problem:

    OS distributions that ship with a (broken) binary Perl installation.

    The `perl' program and `libperl.a' library are somehow built with different binary compatibility flags.

    The solution to these problems is to rebuild Perl and any extension modules from a fresh source tree. Tip for running Perl's Configure script: use the `-des' flags to accept defaults and the `-D' flag to override certain attributes:

      % ./Configure -des -Dcc=gcc ... && make test && make install
    

    Read Perl's INSTALL doc for more details.

    Solaris OS specific:

    You get a ``Can't load'' error for DBI, IO, or whatever dynamic module mod_perl tries to pull in first. The solution is to reconfigure, rebuild and reinstall Perl and the dynamic modules, giving the following flags when Configure asks for ``additional LD flags'':

      -Xlinker --export-dynamic
    

    or

      -Xlinker -E
    

    This problem is only known to be caused by installing gnu ld under Solaris.

    [TOC]


    Runtime

    [TOC]


    Incorrect line number reporting in error/warn log messages

    See Use of uninitialized value at (eval 80) line 12.

    [TOC]


    rwrite returned -1

    That message happens when the client breaks the connection while your script is trying to write to the client. With Apache 1.3.x, you should only see the rwrite messages if LogLevel is set to debug.

    There was a bug that reported this debug message regardless of the value of the LogLevel directive. It has been fixed in mod_perl 1.19_01 (CVS version).

    LogLevel debug logs everything; info is the next level down and does not include debug messages. You shouldn't use the debug level on your production server. And as of this moment there is no way to stop users from aborting connections.

    [TOC]


    caught SIGPIPE in process

      [modperl] caught SIGPIPE in process 1234
      [modperl] process 1234 going to Apache::exit with status...
    

    That's the $SIG{PIPE} handler installed by mod_perl/Apache::SIG, called if a connection times out or the client presses the 'Stop' button. It gives you an opportunity to do cleanups if the script was aborted in the middle of its execution. See Handling the 'User pressed Stop button' case for more info.

    If your mod_perl version < 1.17 you might get the message in the following section...
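    The $SIG{PIPE} mechanism described above is ordinary Perl signal handling. The sketch below simulates a client that went away by writing to a pipe whose reading end is already closed, which makes the kernel send SIGPIPE. It assumes a Unix-like system. caught_sigpipe() is a hypothetical helper, and in a real script the cleanup would go where the flag is set.

```perl
use strict;
use warnings;

# Simulate a client that pressed Stop: write to a pipe nobody reads.
# Returns 1 if our SIGPIPE handler ran, 0 otherwise.
sub caught_sigpipe {
    my $caught = 0;
    local $SIG{PIPE} = sub { $caught = 1 };   # cleanup code would go here
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    close $reader;                            # the "client" goes away
    syswrite($writer, "some output\n");       # raises SIGPIPE, fails with EPIPE
    close $writer;
    # Perl runs signal handlers between ops ("safe signals"); this short
    # loop gives it a chance to do so before we look at the flag.
    for (1 .. 100) { last if $caught }
    return $caught;
}

print caught_sigpipe() ? "SIGPIPE caught\n" : "no SIGPIPE\n";
```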

    [TOC]


    Client hit STOP or Netscape bit it!

      Client hit STOP or Netscape bit it!
      Process 2493 going to Apache::exit with status=-2
    

    You will see this message in mod_perl < 1.17. See caught SIGPIPE in process.

    [TOC]


    Global symbol "$foo" requires explicit package name

    The script below will print a warning like the one above; moreover, it will print the whole script as part of the warning message:

      #!/usr/bin/perl -w
      use strict;
      print "Content-type: text/html\n\n";
      print "Hello $undefined";
    

    The warning:

      Global symbol "$undefined" requires explicit package name at /usr/apps/foo/cgi/tmp.pl line 4.
              eval 'package Apache::ROOT::perl::tmp_2epl;use Apache qw(exit);sub handler {
      #line 1 /usr/apps/foo/cgi/tmp.pl
      BEGIN {$^W = 1;}#!/usr/bin/perl -w
      use strict;
      print "Content-type: text/html\\n\\n";
      print "Hello $undefined";
      
      
      }
      ;' called at /usr/apps/lib/perl5/site_perl/5.005/aix/Apache/Registry.pm line 168
              Apache::Registry::compile('package
            Apache::ROOT::perl::tmp_2epl;use Apache qw(exit);sub han...') 
            called at /usr/apps/lib/perl5/site_perl/5.005/aix/Apache/Registry.pm line 121
              Apache::Registry::handler('Apache=SCALAR(0x205026c0)') called at /usr/apps/foo/cgi/tmp.pl line 4
              eval {...} called at /usr/apps/foo/cgi/tmp.pl line 4
      [Sun Nov 15 15:15:30 1998] [error] Undefined subroutine &Apache::ROOT::perl::tmp_2epl::handler called at /
      usr/apps/lib/perl5/site_perl/5.005/aix/Apache/Registry.pm line 135.
      
      [Sun Nov 15 15:15:30 1998] [error] Goto undefined subroutine &Apache::Constants::SERVER_ERROR at /usr/apps
      /lib/perl5/site_perl/5.005/aix/Apache/Constants.pm line 23.
    

    The error is simple to fix. When you use the use strict; pragma (and you should...), all variables must be declared (e.g. with my()) before being used.

    The bad thing is that sometimes the whole script (possibly thousands of lines) is printed to the error_log file as the code that the server tried to eval()uate.

    Doug's answer to this question was:

     Looks like you have a $SIG{__DIE__} handler installed (Carp::confess?).
     That's what's expected if so.
    

    It wasn't so in my case, but it may be in yours.

    Bryan Miller said:

     You might wish to try something more terse such as 
     "local $SIG{__WARN__} = \&Carp::cluck;"  The confess method is _very_
     verbose and will tell you more than you might wish to know including
     full source.
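    You can reproduce the error without Apache::Registry by compiling the offending line under use strict yourself. This sketch uses a string eval, and strict_error() is a hypothetical helper. It also shows the fix: declare the variable with my() and give it a value.

```perl
use strict;
use warnings;

# Compile (and run) a snippet under 'use strict'; return the error
# message, or '' if the code compiled and ran without dying.
sub strict_error {
    my ($code) = @_;
    return eval("use strict; $code; 1") ? '' : $@;
}

# Undeclared variable: rejected at compile time, just like the script.
print strict_error(q{my $s = "Hello $undefined"});

# Declared with my() and initialized: no error.
print "fixed version compiles\n"
    if strict_error(q{my $undefined = 'world'; my $s = "Hello $undefined"}) eq '';
```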
    

    [TOC]


    Use of uninitialized value at (eval 80) line 12.

    Your code includes some undefined variable that you have used as if it was already defined and initialized. For example:

      $param = $q->param('test');
      print $param;
    

    vs.

      $param = $q->param('test') || '';
      print $param;
    

    In the second case, $param will always be defined, whether $q->param('test') returns some value or undef.
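    One caveat with the || '' idiom: it also replaces a legitimate value of "0" with ''. The sketch below tests definedness instead and shows how to confirm that the warning is gone. param_or_empty() and warnings_from() are hypothetical helpers, and the plain scalar $missing stands in for $q->param('test').

```perl
use strict;
use warnings;

# Return $value unchanged, or '' if it is undef.
# Unlike `|| ''`, this keeps a legitimate "0".
sub param_or_empty {
    my ($value) = @_;
    return defined $value ? $value : '';
}

# Run $code and return the list of warnings it produced.
sub warnings_from {
    my ($code) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    $code->();
    return @warnings;
}

my $missing;   # what $q->param('test') returns when 'test' was not sent
print scalar(warnings_from(sub { my $s = "[$missing]" })), " warning(s)\n";
print scalar(warnings_from(sub { my $p = param_or_empty($missing); my $s = "[$p]" })),
      " warning(s)\n";
print "'", (0 || ''), "' vs '", param_or_empty(0), "'\n";
```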

    Also read about finding the Line Number the Error/Warning has been Triggered at.

    [TOC]


    Undefined subroutine &Apache::ROOT::perl::test_2epl::some_function called at

    See Names collisions with Modules and libs.

    [TOC]


    Callback called exit

    Callback called exit is just a generic message when some unrecoverable error occurs inside Perl during perl_call_sv() (which mod_perl uses to invoke all handler subroutines). Such problems seem far less common with 5.005_03 than with 5.004.

    Sometimes you discover that your server is not responding and its error_log has filled up the remaining space on the file system. When you get to see the contents of the error_log -- it includes millions of lines, like:

      Callback called exit at -e line 33, <HTML> chunk 1.
    

    Why the looping?

    The message does not mean that your code called exit(). Perl can get *very* confused inside an endless loop in your code: Perl's malloc goes haywire and calls croak(), but no memory is left to properly report the error, so Perl gets stuck in a loop writing that same message to stderr.

    Perl 5.005 or later is recommended for its improved malloc.c and other features that improve mod_perl and are turned on by default.

    See also Out of memory!

    [TOC]


    Out of memory!

    If something goes really wrong with your code, Perl may die with an ``Out of memory!'' message and/or ``Callback called exit''. Common causes of this are never-ending loops, deep recursion, or calling an undefined subroutine. Here's one way to catch the problem. See Perl's INSTALL document for this item:

      =item -DPERL_EMERGENCY_SBRK
    

      If PERL_EMERGENCY_SBRK is defined, running out of memory need not be a
      fatal error: a memory pool can allocated by assigning to the special
      variable $^M.  See perlvar(1) for more details.
    

    If you compile with that option and add 'use Apache::Debug level => 4;' to your Perl script, it will allocate the $^M emergency pool, and its $SIG{__DIE__} handler will call Carp::confess, giving you a stack trace that should reveal where the problem is. See the Apache::Resource module for a way to prevent spinning httpd processes.

    Note that perl-5.005+ has PERL_EMERGENCY_SBRK turned on by default.
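
    The two pieces that Apache::Debug is described as setting up can also be installed by hand, for example from a startup file. These are the $^M pool and a $SIG{__DIE__} handler that calls Carp::confess. The following is a minimal sketch, not Apache::Debug's actual code. The 64K pool size is an arbitrary choice borrowed from the example in perlvar.

```perl
use strict;
use warnings;
use Carp ();

# Reserve a 64K emergency pool. Perl only uses $^M when it was built with
# PERL_EMERGENCY_SBRK and uses its own malloc; otherwise this is just a string.
$^M = 'a' x (1 << 16);

# Turn every die() into a full stack trace, so that a fatal error
# shows where the code was when it happened.
$SIG{__DIE__} = sub { Carp::confess(@_) };
```

    Note that $SIG{__DIE__} is also called for die() inside eval blocks, so this handler turns every trapped exception into a long stack trace as well. It is best enabled only while hunting the problem.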

    The other trick is to have a startup script initialize Carp::confess, like so:

      use Carp ();
      eval { Carp::confess("init") };
    

    This way, when the real problem happens, Carp::confess is already initialized and doesn't eat memory from the emergency pool ($^M).
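
    Deep recursion is one of the common causes listed above. Perl already warns with "Deep recursion on subroutine" once a subroutine recurses 100 levels deep. The following sketch promotes that warning to a fatal error, so a runaway recursion dies early with a readable message instead of exhausting memory. runaway() is a hypothetical example. Legitimately deep recursive code would trip this check too.

```perl
use strict;
use warnings;
use warnings FATAL => 'recursion';   # die instead of warning on "Deep recursion"

# A hypothetical runaway function: it never reaches a base case.
sub runaway {
    my ($depth) = @_;
    return runaway($depth + 1);
}

my $survived = eval { runaway(0); 1 };
print $survived ? "returned normally\n" : "stopped: $@";
```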



    server reached MaxClients setting, consider raising the MaxClients setting

    See Choosing MaxClients.



    syntax error at /dev/null line 1, near "line arguments:"

      syntax error at /dev/null line 1, near "line arguments:"
      Execution of /dev/null aborted due to compilation errors.
      parse: Undefined error: 0
    

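    There is a chance that your /dev/null device is broken. The error shows that Perl compiled /dev/null and found text in it. That can happen when /dev/null has been replaced by a regular file that some program then wrote into. Check that it is still a character device:

      % ls -l /dev/null

    The listing should start with a c (e.g. crw-rw-rw-). If it doesn't, recreate the device as root (see mknod(1)).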
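
    The same check can be done from Perl with the -c file test, which is true only for character special files. The following is a small sketch. check_null_device() is a hypothetical helper name.

```perl
use strict;
use warnings;

# Classify a path that is supposed to be the null device.
sub check_null_device {
    my ($path) = @_;
    return 'missing'      unless -e $path;
    return 'ok'           if -c _;    # character special file, as expected
    return 'regular file' if -f _;    # replaced by a plain file; Perl would read its text
    return 'other';                   # e.g. a directory
}

print "/dev/null: ", check_null_device('/dev/null'), "\n";
```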
    



    Shutdown and Restart



    Evil things might happen when using PerlFreshRestart

    Unfortunately, not all Perl modules are robust enough to survive being reloaded, which for them is an unusual situation. PerlFreshRestart does little more than:

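      # Take a snapshot of the keys first: require() adds entries back
      # to %INC, and %INC must not grow while each() iterates over it.
      for my $file (keys %INC) {
        delete $INC{$file};
        require $file;
      }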
    

    Besides that, it flushes the Apache::Registry cache, and empties any dynamic stacked handlers (e.g. PerlChildInitHandler).

    Many segfaults and other problems have been reported by users who turned PerlFreshRestart On, and most of them went away when it was turned Off. This doesn't mean that you shouldn't use it if it works for you. Just be aware of the dragons...
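
    One simple way a module can fail to be reload-safe is load-time code that isn't idempotent. The following sketch deletes a module's %INC entry and requires it again, the same step the loop above performs for every entry. Fragile is a hypothetical module written to a temporary file, and this is not how mod_perl itself triggers the reload.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A hypothetical module whose load-time code is not idempotent:
# every compilation appends another entry to a package array.
my ($fh, $file) = tempfile(SUFFIX => '.pm', UNLINK => 1);
print $fh "package Fragile;\nour \@hooks;\npush \@hooks, 'init';\n",
          "sub hook_count { return scalar \@hooks }\n1;\n";
close $fh or die "close: $!";

require $file;              # first load, as at server startup

delete $INC{$file};         # forget that the file was loaded...
require $file;              # ...so it is compiled and run a second time

my $count = Fragile::hook_count();
print "Fragile has $count hooks registered\n";   # prints 2, not 1
```

    Real modules can suffer in the same way when they build caches or register callbacks at load time.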



    Constant subroutine XXX redefined

    That's a mandatory warning inside Perl. It happens only if you modify your script and Apache::Registry reloads it: Perl is warning you that the subroutine(s) were redefined. It is mostly harmless. If you don't like seeing these warnings, just kill -USR1 (graceful restart) Apache when you modify your scripts.

    With perl 5.004_05, 5.005 and higher you shouldn't see these warnings unless you modify the code. If you still experience the problem with code within a CGI script, moving all the code into a module (or a library) and requiring it should solve the problem.
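
    To see where the warning comes from, the following sketch compiles two versions of the same source in one package with eval STRING. This roughly mimics Apache::Registry recompiling an edited script, although Registry's own wrapping differs in detail. TIMEOUT is a hypothetical constant whose value the "edit" changes.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them immediately.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# The original script defines an inlinable constant; the edited one changes it.
my $version1 = 'sub TIMEOUT () { 30 } 1;';
my $version2 = 'sub TIMEOUT () { 60 } 1;';
eval $version1 or die $@;
eval $version2 or die $@;   # recompiling in the same package redefines TIMEOUT

print for @warnings;
```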



    Can't undef active subroutine

      Can't undef active subroutine at
      /usr/apps/lib/perl5/site_perl/5.005/aix/Apache/Registry.pm line 102.
      Called from package Apache::Registry, filename
      /usr/apps/lib/perl5/site_perl/5.005/aix/Apache/Registry.pm, line 102
    

    This problem occurs when a client drops the connection while httpd is in the middle of a write: httpd times out and sends a SIGPIPE, and Perl in that child is stuck in the middle of its eval context. This is fixed by the Apache::SIG module, which is called by default, so it should not happen unless you have code that is messing with $SIG{PIPE}. It is also triggered only when you've changed your script on disk and mod_perl is trying to reload it.
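
    If some of your code really needs a different SIGPIPE disposition, one way to avoid clobbering the handler installed for the whole process is to localize the change. This is an assumption on our part, not something Apache::SIG prescribes. local() restores the previous handler automatically when the enclosing sub returns or dies. In the following sketch, global_pipe_handler() is a hypothetical stand-in for the process-wide handler.

```perl
use strict;
use warnings;

# Stands in for a handler installed for the whole process.
sub global_pipe_handler { }
$SIG{PIPE} = \&global_pipe_handler;

# Hypothetical code that wants SIGPIPE ignored while it runs.
sub code_that_needs_sigpipe_ignored {
    local $SIG{PIPE} = 'IGNORE';   # changed only for the duration of this call
    return $SIG{PIPE};
}

my $inside = code_that_needs_sigpipe_ignored();   # 'IGNORE'
my $after  = $SIG{PIPE};                          # the original handler again
print "inside: $inside, restored: ",
      ($after == \&global_pipe_handler ? "yes" : "no"), "\n";
```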



    [warn] child process 30388 did not exit, sending another SIGHUP

    From mod_perl.pod: With Apache versions 1.3.0 and higher, mod_perl will call the perl_destruct() Perl API function during the child exit phase. This will cause proper execution of END blocks found during server startup, along with invoking the DESTROY method on global objects that are still alive. It is possible that this operation may take a long time to finish, causing problems during a restart. If your code does not contain any END blocks or DESTROY methods that need to be run during child server shutdown, this destruction can be avoided by setting the PERL_DESTRUCT_LEVEL environment variable to -1.
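
    To judge whether your code has this kind of exit-time work, note that two things run when a Perl interpreter shuts down: END blocks, and DESTROY methods on objects still referenced from globals. The END blocks run first. The following self-contained sketch shows that order by running a small hypothetical program in a separate perl process, so that its shutdown can be observed. It does not involve mod_perl or PERL_DESTRUCT_LEVEL.

```perl
use strict;
use warnings;

# A tiny program run by a separate perl; we capture what it prints at exit.
my $child_code = <<'CODE';
package Global;
sub new     { return bless {}, shift }
sub DESTROY { print "DESTROY\n" }    # runs during global destruction
package main;
our $object = Global->new;            # a global object still alive at exit
END { print "END\n" }                # END blocks run before global destruction
CODE

open(my $pipe, '-|', $^X, '-e', $child_code) or die "Can't run $^X: $!";
my $output = do { local $/; <$pipe> };
close $pipe;
print $output;
```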



    Windows OS specific notes



    Apache::DBI

    Apache::DBI causes the server to exit when it starts up, with:

      [Mon Oct 25 15:06:11 1999] file .\main\http_main.c, line 5890,
      assertion "start_mutex" failed
    

    Build mod_perl with PERL_STARTUP_DONE_CHECK set (e.g. insert

      #define PERL_STARTUP_DONE_CHECK 1
    

    at the top of mod_perl.h, or add it to the defines in the MSVC++ Options dialog).

    Apache loads all Apache modules twice, to make sure the server will successfully restart when asked to. This flag disables all PerlRequire and PerlModule statements on the first load, so they can succeed on the second load. Without that flag, the second load fails.



    The Writing Apache Modules with Perl and C book can be purchased online from O'Reilly and Amazon.com.
    Your corrections of either technical or grammatical errors are very welcome. You are encouraged to help me to improve this guide. If you have something to contribute please send it directly to me.

    Written by Stas Bekman.
    Last Modified at 12/18/1999
    Use of the Camel for Perl is
    a trademark of O'Reilly & Associates,
    and is used by permission.