
Overwhelmed by your Usenet feed? This shell script
manages the C News spool area and keeps it from swamping your
system
By Becca Thomas
Usenet news, depending on whom you talk to, can be a vital
information source or 99 percent information free. What is
certain is that its daily newsfeed can take up megabytes of space
on your system's hard disk. The key to successfully managing the
information glut is to regularly purge news files that have
become outdated.
James Hamilton contributes a script to make life easier for the
C News administrator. When properly configured, his program
allows the built-in expiration system to maintain sufficient
file-system resources automatically, even during sudden influxes of
news.
Living on the Edge
Dear Dr. Thomas:
We all love Usenet news. The only problem is that there is so
much of it! Generally, news administrators maintain enough free
space in the news-spool file system to accommodate an average
influx of news. This scheme has two undesirable consequences:
(1) Occasionally, a large quantity of news will arrive in a short
period, exhausting spool space and causing articles to be lost,
and (2) the extra space needed to accommodate an unusually large
news influx is only rarely used. No one likes to lose articles,
but I also abhor expiring articles earlier than necessary only to
maintain a largely unused buffer space.
Static news-spool space management is wasteful and means that
some articles are going to be unavailable to your users due to
premature expiration. This situation prompted the creation of
autoexpire [see Part A
of the Listing], a program that dynamically maintains the spool
for C News and allows longer expiration periods without the risk
of exhausting file-system resources.
The autoexpire program manages the news spool by
monitoring the free space and, as needed, adjusting the expiration period
defined by the ``all'' entry in the expiration-control
file (generally named explist) to keep the
news-spool free space within specified limits. When needed,
autoexpire runs the C News
doexpire program to free up file-system
resources (free blocks and inodes). Thus, you don't need
cron to run doexpire,
although continuing to do so causes no harm.
Part B shows a sample ``all''
entry from the expiration-control file. The expiry
(third) field contains up to three hyphen-separated subfields: a
retention period, an expiration period, and a purge period. The
retention period is how many days must pass after an article
arrives before it becomes a candidate for expiration (here 5 days);
the expiration period is how many days after arrival
the article is expired by default (10 days); the purge period
is how many days after arrival the article
is expired unconditionally (30 days in this example). If only two
hyphen-separated numbers appear, they are the expiration and purge
periods; a single number is the expiration period.
Although decimal fractions could be specified, our program only
works with whole numbers, which is by far the most common
usage.
I originally wrote autoexpire because I was
finding it cumbersome to maintain a balance between retaining as
many articles as possible and not running out of news spool space
and losing articles. I've been using autoexpire for
more than a year without problems, and I no longer need to monitor
my news-spool space and edit the C News expiration-control file
as news traffic patterns change.
James Hamilton / Toronto, Canada
Explanation
Lines 11-14 define several variables the user can employ to
``tune'' the operation of autoexpire. If
autoexpire can't increase the news-spool free space
beyond the lower limit specified by minFreeSpace
without setting the expiration subfield to a value less than that
defined by minDays, it mails a warning message
to the news administrator but does not reduce the
subfield further. Setting the expiration period below the
retention period does no harm, but it has no effect, because
articles are never expired before their retention period has passed.
The KandMtoUnits() function converts values
specified in kilounits or megaunits to units. For instance, an
argument of 44m (44 megabytes) is converted to
46,137,344 bytes by multiplying 44 by 1,048,576, the number of
units in a megaunit. The ErrorExit() function mails
its string argument to the news administrator and terminates the
script with an error exit status (here 100).
The ChgNewsExp() function changes the expiration
period subfield of the expiration-control file ``all'' entry to
the value specified by its invocation argument. Line 46 searches
the expiration-control file for the ``all'' entry and, if
located, places the fields in shell positional parameters. Lines
52-60 parse the expiry (third) field, accommodating all three
formats using a case statement. The expiration subfield value is
adjusted by the invocation argument value on line 62. The new
value is tested to see if it lies within the bounds defined by
the configuration section. If so, the expiry field of the
``all'' entry is reconstructed using the new expiration subfield.
Several sanity tests verify correct operation; if the new value is out of
bounds or any test fails, the news administrator is notified and the script terminates.
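The parse, adjust, check, and rebuild sequence can be sketched as a standalone function. AdjustExpiry is a hypothetical helper, not the listing's ChgNewsExp(): it operates on the expiry field string alone, assumes whole-number subfields, and uses POSIX ${var%%-*} expansions where the listing's case statement may do the splitting differently. The minDays and maxDays values shown are examples only.

```shell
#!/bin/sh
# Sketch: adjust the expiration subfield of a C News expiry field.
# minDays/maxDays mirror the article's configuration variables (example values).
minDays=${minDays:-3}
maxDays=${maxDays:-60}

AdjustExpiry() {    # usage: AdjustExpiry expiry-field delta
    field=$1 delta=$2
    case $field in
    *-*-*) ret=${field%%-*}; rest=${field#*-}        # retention-expiration-purge
           exp=${rest%%-*}; purge=${rest#*-} ;;
    *-*)   ret=; exp=${field%%-*}; purge=${field#*-} ;;  # expiration-purge
    *)     ret=; exp=$field; purge= ;;               # expiration only
    esac
    new=$(( exp + delta ))
    # Refuse to move outside the configured bounds.
    if [ "$new" -lt "$minDays" ] || [ "$new" -gt "$maxDays" ]; then
        return 1
    fi
    # Rebuild the field in the same format it arrived in.
    result=$new
    [ -n "$purge" ] && result=$result-$purge
    [ -n "$ret" ] && result=$ret-$result
    echo "$result"
}
```

For example, AdjustExpiry 5-10-30 1 prints 5-11-30. Substituting the result back into the ``all'' line of explist is left out of this sketch.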
The ``main'' portion of the program follows. First, the
current directory is changed to the news library directory, which
contains the expiration-control file. Next, all free-space
values are converted to bytes by lines 87 and 88 for easy
comparison.
The spacefor script, which is customized for the
target system during C News installation, provides a portable
method to determine if there's sufficient spool space. In
particular, the command spacefor filesize archive
reports how many objects of size
filesize can fit in the news-spool
directory. If the spacefor command on line 90
returns a value less than one, no object of size
minBytesFree can fit, which is equivalent to saying
that the free space is less than the value stored in
minBytesFree. Thus, doexpire is called
to free up more space. In the rare event that one invocation of
the expiration program isn't sufficient--say because of a sudden
news influx--another free-space test is performed (line 92), and
if there's still not enough space, the expiration-period subfield
is decremented by one and doexpire is rerun. This step
is an emergency measure, not a routine procedure.
If there's excessive free space and the expiration-control
file wasn't changed today (line 97), the expiration subfield of
the control file is incremented by one (line 98).
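The decision logic of the ``main'' section can be sketched as follows. This is an illustration under stated assumptions, not the Part A code: Fits and AutoexpireMain are hypothetical names, the SPACEFOR, DOEXPIRE, and CHGEXP variables are hypothetical hooks (set them to the full paths of spacefor and doexpire and to a ChgNewsExp-style function on a real system), and the ``changed today'' test is passed in as an argument rather than computed from the explist modification date.

```shell
#!/bin/sh
# Sketch of autoexpire's main decision logic, not the Part A listing.
# SPACEFOR, DOEXPIRE, and CHGEXP are hypothetical hooks.
SPACEFOR=${SPACEFOR:-spacefor}
DOEXPIRE=${DOEXPIRE:-doexpire}
CHGEXP=${CHGEXP:-ChgNewsExp}

# Succeed if spacefor says at least one object of $1 bytes fits.
# A failed spacefor run is treated as "no room".
Fits() {
    n=$($SPACEFOR "$1" archive) || return 1
    [ "${n:-0}" -ge 1 ]
}

# usage: AutoexpireMain minBytesFree maxBytesFree changedToday(yes|no)
AutoexpireMain() {
    if ! Fits "$1"; then
        $DOEXPIRE                   # normal expiration first
        if ! Fits "$1"; then        # still short: emergency measure
            $CHGEXP -1 && $DOEXPIRE
        fi
    elif Fits "$2" && [ "$3" = no ]; then
        $CHGEXP 1                   # ample room: keep articles a day longer
    fi
}
```

Passing the commands through variables keeps the logic testable with stand-in functions; it also makes the ``at most one increment per day'' rule an explicit input.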
Configuration and Installation
Place autoexpire in the directory used for news
administration scripts defined by the NEWSBIN
environment variable or a general local-command directory, like
/usr/local/bin.
Edit the script's configuration section to set
minFreeSpace, maxFreeSpace,
maxDays, and minDays. The minFreeSpace value is the
threshold below which autoexpire will attempt to
reduce the expiration period and run doexpire. The
maxFreeSpace value is the threshold above which
autoexpire will attempt to increase the expiration
period. The free-space values can be expressed in bytes (specify
no suffix), kilobytes (use K or k suffix), or megabytes (M or m
suffix). The minDays and maxDays
variables are both expressed in days.
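For illustration, an edited configuration section might read as follows. The figures are placeholders rather than recommendations, and the sketch assumes the section consists of plain shell variable assignments.

```shell
# Example configuration values only; choose figures that suit your spool.
minFreeSpace=20m   # below this, shorten the expiration period and run doexpire
maxFreeSpace=40m   # above this, lengthen the expiration period
minDays=3          # never set the expiration subfield below this
maxDays=60         # never set it above this
```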
Make a backup copy of your explist file--for instance,
cp $NEWSCTL/explist $NEWSCTL/explist.orig,
in case you choose to return to manual news-spool maintenance
in the future.
Change the cron configuration file so
autoexpire rather than doexpire is run
by the news account. It's a good idea to specify
the full path name for autoexpire because
cron often uses a limited search path.
Alternate Implementations
Administering a news system is full of personal choices driven
by the usage patterns of the news system and its configuration.
Consequently, the design choices used for this program won't suit
every site. Let's discuss two design alternatives and see how
the script can be modified to implement them.
Monitoring Inodes. You may have noticed that
autoexpire only monitors free space, not inodes.
Although it's easy to change the script to examine both, it's
difficult to do so portably because there's no single command (or
command line) identical on all systems that reports the free-inode
count. (The C News spacefor command can only be
used to report the free-block count portably.) Of course,
df reports both free space and inodes, but the
required command-line options and/or output format vary between
implementations and may even be vendor-specific for the same Unix
version.
For these reasons, I chose to present a script that monitors
free space portably (by using spacefor) and then to
show how to monitor inodes as a system-specific extension of the
program. To make this change, add the code shown [in Part D], which includes a new
configuration variable, function definition, and main program
replacement. You may need to customize the
GetFreeCounts() function definition to post-process
the output of your version of df, storing the free byte count
in freeBytes and available inode count in
freeInodes, which will be used later by the modified
``main'' program.
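One possible GetFreeCounts() for systems whose df accepts -P, -k, and -i is sketched below. The function name and the freeBytes and freeInodes variables come from the text; the body is an assumption. The -i option is not part of POSIX, and the column holding the free-inode count (the fourth, as in GNU coreutils df) differs on other implementations, so check your df output before relying on it.

```shell
#!/bin/sh
# Sketch of a GetFreeCounts() assuming a df that accepts -P, -k, and -i
# (GNU coreutils and some others; -i is not POSIX). Adjust for your df.
# Sets freeBytes and freeInodes for the file system holding directory $1.
GetFreeCounts() {
    dir=${1:-.}
    # -P forces one line per file system, so the data is always on line 2.
    freeKB=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
    freeInodes=$(df -P -i "$dir" | awk 'NR == 2 { print $4 }')
    freeBytes=$(( ${freeKB:-0} * 1024 ))
}
```

The byte count is computed in the shell rather than in awk because some awk implementations print very large numbers in exponential notation.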
Monitoring Individual Newsgroups. The
autoexpire script is designed to manage news space
using the ``all'' entry in the expiration-control file. However,
this approach may not be appropriate for all sites. Finer-grain
control can be obtained by modifying autoexpire as
documented [in Part E].
This version will attempt to decrease the expiration value for
all newsgroups and the ``all'' entry in the expiration-control
file when space is short. However, the expiration value isn't
decremented below the retention value because
doexpire honors the retention value, not the
expiration value, in this situation. This version assumes that
your echo command recognizes the \t
escape sequence to represent a tab. If not, you'll need to
replace the \t by an actual control-I character (for
instance, use control-V control-I in text-entry mode of
vi).
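A per-newsgroup decrement could also be written with printf, whose format string interprets \t on any POSIX system, which sidesteps the echo portability question. DecrementAll is a hypothetical sketch, not the Part E code. It reads explist-format lines on standard input and writes the modified lines on standard output (in practice you'd write to a temporary file and move it into place). It assumes four whitespace-separated fields per entry, rewrites the separators as single tabs, uses a floor of 1 day for fields without a retention subfield (an assumption), and leaves comments, blank lines, and special entries beginning with a slash, such as /expired/, untouched.

```shell
#!/bin/sh
# Sketch: shorten every entry's expiration subfield by one day, never going
# below the retention subfield (or below 1 day if there is none, an assumption).
DecrementAll() {
    set -f                                  # keep group patterns from globbing
    while IFS= read -r line; do
        case $line in
        ''|'#'*|/*) printf '%s\n' "$line"; continue ;;   # pass through
        esac
        set -- $line                        # groups flags expiry archive
        if [ $# -ne 4 ]; then printf '%s\n' "$line"; continue; fi
        case $3 in
        *-*-*) pre=${3%%-*}-; floor=${3%%-*}; rest=${3#*-}
               exp=${rest%%-*}; purge=-${rest#*-} ;;
        *-*)   pre=; floor=1; exp=${3%%-*}; purge=-${3#*-} ;;
        *)     pre=; floor=1; exp=$3; purge= ;;
        esac
        case $exp$floor in
        *[!0-9]*) printf '%s\n' "$line"; continue ;;   # e.g. decimals
        esac
        [ "$exp" -gt "$floor" ] && exp=$(( exp - 1 ))
        # printf expands \t portably, unlike some versions of echo.
        printf '%s\t%s\t%s%s%s\t%s\n' "$1" "$2" "$pre" "$exp" "$purge" "$4"
    done
    set +f
}
```

Given the line all, x, 5-10-30, - it emits the same entry with 5-9-30, while a junk entry already at 5-5-7 is left alone.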
You should periodically do a ``sanity check'' of the values in
the expiration-control file, because they can drift until they
no longer bear their original relationship to one another. For instance,
say an important newsgroup has retention and expiration subfields of 10 and 20 days
(written here as 10-20, with no purge subfield), but a ``junk''
group has 5-7. After autoexpire has run several times
to recover disk space following a large news influx, the
important newsgroup becomes 10-10 and the junk newsgroup 5-5.
After disk resources have been sufficient for a while, the
values may have been incremented--by adding, say, 20 to the
expiration values--to give 10-30 and 5-25, which means the junk
group will be retained almost as long as the important group by
default. One solution would be to use the purge subfield in all
expiry fields. You could also add code to prevent the expiration
(middle) number from increasing above the purge value (right-hand
number), analogous to the code used in Part E to prevent the expiration
period from decreasing below the retention period (left-hand
number of the triple).
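A cap on increases could look like the following sketch. IncExpiry is a hypothetical helper, not code from the listing; it handles only full retention-expiration-purge fields and refuses other formats, so it assumes you have adopted the purge subfield everywhere as suggested above.

```shell
#!/bin/sh
# Sketch: lengthen the expiration subfield of a retention-expiration-purge
# field by one day, but never beyond the purge subfield.
IncExpiry() {    # usage: IncExpiry ret-exp-purge
    case $1 in
    *-*-*) ;;
    *) echo "$1"; return 1 ;;           # only three-part fields are handled
    esac
    ret=${1%%-*}; rest=${1#*-}
    exp=${rest%%-*}; purge=${rest#*-}
    [ "$exp" -lt "$purge" ] && exp=$(( exp + 1 ))
    echo "$ret-$exp-$purge"
}
```

With a purge subfield in place, IncExpiry 10-29-30 yields 10-30-30, and a further call leaves the field at 10-30-30, so the expiration value can no longer creep past the purge limit.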
Acknowledgments
I wish to thank the following readers for their help with testing
this month's contribution: Kees Hendrikse, Echelon Consultancy,
Enschede, The Netherlands; and Paul Balyoz, Computer Science
Department, Northern Arizona University, Flagstaff, Ariz.