OPUS and ASET, ten years on

Ten years ago today, I and a few colleagues from Ulster University presented some of our work on on-line Placement Management at the ASET conference in York. At that time our system was simply called the Placement Management System or PMS, and yes of course this led to more than a few comments.

At that stage we had been working on the project for some 5 years, so it’s a useful reminder just how much time I ended up spending on that project.

Now called OPUS, that system still exists; it was released as Open Source and is still used by a number of universities. Though Ulster is developing an alternative system, it hasn’t yet subsumed all the functionality in OPUS, and I’m back to maintaining the system in a low-key way.

I recently fixed some bugs introduced by a well-meaning volunteer over two years ago, which felt quite good: while they were low in impact, they were irritating in some aspects of usage. In the process I found that our custom framework, written by myself and Gordon Crawford for version 4 of both OPUS and the PDSystem to work with the Smarty Template Engine, is broken with Smarty version 3.

I intend to fix that problem, and do what may be a last release of OPUS, which will bring some improvements in speed, and localisation and internationalisation. Of course the source is still available directly from the version control on the site, so nobody has to wait on me – but I’ve had some recent queries from HEIs in India, so there is still interest in the system and its Debian packaging.

For those wanting the walk down memory lane, and for my own archival purposes, those slides from ten years ago are here: aset-york-pms-2005-09-05.

Garbage collecting sessions in PHP

In PHP, sessions are by default stored as files in a directory. Sessions can be explicitly destroyed from within the code, for example when users log out, but frequently users do not log out. As a result session files tend to hang around, which raises the problem of how to clean them up. The standard way is to use PHP’s own garbage collection, which is normally enabled out of the box. In this scheme, configuration settings specify the maximum lifetime of a session (session.gc_maxlifetime) and, essentially, the probability that a clean-up runs on a given request (session.gc_probability divided by session.gc_divisor).

To make things more interesting, Debian doesn’t do garbage collection in this way out of the box. Instead it has a cron job that regularly erases session files in the configured session directory. But if, like me and many others, you put your session files in a different directory for each application, to avoid namespace clashes between two applications running in the same browser from the same server, you have a problem. If you forget Debian’s behaviour, the session files will just accumulate indefinitely. I had forgotten this issue and recently found over a year’s worth of session files in a directory.

Solving this problem optimally is actually quite difficult. I could create a cron job to mirror Debian’s own, but then I’d have to put the maximum lifetime in a cron job somewhere out of the way, away from the main configuration of the project, where the average sysadmin I’m working with would find it difficult to locate and change. Or I could parse this value out of the main configuration. But this leads to another problem. For some users, a 30 minute maximum idle time is acceptable (although in my case, where a suite of applications is being used as a single gestalt entity, even that can be a problem), but many of my administrator users need huge idle times, since they are used to logging in first thing and working at the application periodically through the day.
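
For illustration, a cron-style clean-up of the kind described above might look something like the following. This is only a minimal sketch, not Debian’s actual job: the function name clean_sessions, and the directory and lifetime in the usage comment, are assumptions made up for the example. It relies on PHP’s default file handler naming its files sess_<id>, and on a find that supports -mmin (GNU and BSD find both do).

```shell
#!/bin/sh
# clean_sessions DIR MINUTES  (hypothetical helper, for illustration only)
# Remove PHP session files in DIR that have not been modified for more
# than MINUTES minutes. Files not named sess_* are left alone.
clean_sessions() {
  dir=$1
  minutes=$2
  # Refuse to run against a directory that does not exist
  [ -d "$dir" ] || return 1
  find "$dir" -type f -name 'sess_*' -mmin "+$minutes" -exec rm -f {} +
}

# Example use from a cron job (path and lifetime are assumptions):
#   clean_sessions /var/lib/opus/sessions 480
```

The drawback discussed above still applies: the lifetime lives in the cron job rather than in the application’s main configuration.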

In the end I settled on changing our framework to make it easy to pass through garbage collection values. This makes an interface to the configuration really easy, but it doesn’t solve the problems of long session times that not all users need, and huge delays in garbage collection.

In my last article I talked about a Munin plugin for OPUS, but when you look at its graphs you’ll see cliff-like drops, which are caused by the garbage collection finally kicking in and removing sessions where users have not explicitly logged out. Currently, every ten minutes, OPUS runs through its user database, finds users who are allegedly online but have no active session file, and marks them offline. Then it updates the file with the online user count that Munin reads.

I suspect that eventually I will write a more sophisticated script that actually kills sessions depending upon idle time and the user class, which would make for a more accurate picture here. Any brighter ideas gratefully accepted.

My first Munin plugin

Munin is a great, really useful project for monitoring all sorts of things on servers over short and long term periods, and can help identify and even warn of undue server loads. It is also appropriately and poetically named for one of Odinn’s ravens (so I suppose I should have written this on a Wednesday).

We’ve been running Munin on one of our production servers at work for quite some time, and it gives us a lot of confidence that, to say the least, the server is running in its comfort zone around the clock. Among other bits and pieces, we run OPUS and the PDSystem on this box, two of our home grown projects that are available to the students. For some time now I’ve considered writing a plugin for OPUS to show logged in users, and I finally did this, albeit the counts are not nearly so reliable as I’d like for two reasons, but I’ll probably discuss that in another post. Anyway, I arranged for OPUS to drop a simple text file which simply contains counts of online users with the syntax

student: 10
admin: 2

and so on, for each of the categories of users. Then I needed a plugin to deal with this. I decided to write it as a simple shell script, since it’s portable and I’m not much of a Perl fan.


#!/bin/sh
# Munin plugin for OPUS showing online users
# Copyright Colin Turner
# GPL V2+

# Munin plugins, at their simplest, are run either with "config" or
# no parameters (I plan to add auto configuration later).
case $1 in
  # In config mode, we spout out details of the graphs we will have
  # I want one graph, with lots of stacked values. The first one is
  # an AREA, and the others are stacked above them. I also (-l 0)
  # make sure the graph shows everything down to zero.
  config)
	cat <<'EOM'
graph_title OPUS online users
graph_args -l 0
graph_vlabel online users
graph_info The number of online users on OPUS is shown.
student.label student
student.min 0
student.draw AREA
staff.label academic
staff.min 0
staff.draw STACK
company.label hr staff
company.min 0
company.draw STACK
supervisor.label supervisor
supervisor.min 0
supervisor.draw STACK
admin.label admin
admin.min 0
admin.draw STACK
root.label root
root.min 0
root.draw STACK
application.label application
application.min 0
application.draw STACK
EOM
	exit 0;;
esac

# Now the plugin is being run for data. Bail if the file is unavailable
if [ ! -r /var/lib/opus/online_users ] ; then
     echo "Cannot read /var/lib/opus/online_users" >&2
     exit 1
fi

# Otherwise, a quick sed converts the default format to what Munin needs
sed -e "s/:/.value/" /var/lib/opus/online_users
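
To make the final conversion step concrete, here is a small sketch that runs the same sed expression over the sample counts shown earlier. The function name to_munin is just a label for this example and is not part of the plugin.

```shell
#!/bin/sh
# to_munin: read "name: count" lines on stdin and write the
# "name.value count" lines that Munin expects from a plugin.
to_munin() {
  sed -e 's/:/.value/'
}

printf 'student: 10\nadmin: 2\n' | to_munin
# prints:
# student.value 10
# admin.value 2
```

Only the first colon on each line is replaced, which is all this file format needs.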

The plugin has now been running for several days, and you can see its output here. There are problems with it, but that’s more to do with PHP, Debian and user choice, and I’ll comment on that another time. However, already it gives me a useful feel for a lot of user behaviour.

Writing Munin plugins is easy, and Munin does so much of the hard work of turning your creation into something useful.

Importing data from lots of systems with PHP and Regular Expressions

Preamble: I found this article sitting as a draft for several years (2009), so the software release it mentions was years ago! But I figured I’d finally proof read and release the article since the problem it addresses is still a real one.

On Friday I released version 4.1.0 of OPUS which has been in the pipeline for a while. Although the previous version had been working remarkably well, I hadn’t realised just how many bugfixes and new features had accumulated in the last months, and as several Universities had just last week installed the older version, I really needed to get the new version out.

One of the features I introduced in the 4.x family, and which seems to have settled and matured nicely in 4.1 was functionality to import student data, but really, it’s useful for importing diverse CSV formats into a standard base, where the CSV conversion can have more complex problems than just shuffling columns. Our software really uses a thin layer of web services to interface to our student records system, but for the first few versions we imported data from the CSV files that our system could easily export. This is still the solution other universities will have to use unless and until they want to write their own web services layer. Even back when I needed this for internal use, this functionality was a pain, since the University was liable to change the format without notice, and even had slightly different formats for different lists. This made data validation and interpretation a real pain.

So I hit upon the idea of using Regular Expressions (regexps) to validate the data, naturally enough, but then had an idea that would go further. I thought I would define a “standard” CSV format that OPUS would expect, a very simple one. Then, for each input system, I would define three regexps.

  1. A match regular expression that would define what each line of data should look like, and also group the bits we wanted to capture in parentheses.
  2. A replacement regular expression that would be used to remap the captured data into a “standard” line.
  3. An exclude regular expression that would define lines to be explicitly exempt from parsing, particularly to deal with header rows, for example.

Since universities frequently change the content and order of their CSV files, and the files vary from institution to institution, this provides a mechanism not just to validate the input data, but to re-map data from the format of any given institution to this “common” format for use within OPUS.

Here are the bare bones of the algorithm.

// This is the pattern OPUS expects at the end of a mapping
$standard_pattern = "/^\"(.*)\",\"(.*)\",\"(.*)\",\"(.*)\",\"(.*)\",\"(.*)\",\"(.*)\",\"(.*)\",\"(.*)\"$/";
// Lines that fail at each stage are collected for later inspection
$rejected_lines = $excluded_lines = $mismapped_lines = array();
while (($line = fgets($fp, 2048)) !== false) {
  $line = trim($line);
  // Valid lines must match the normal pattern
  if (!preg_match($csvmap->pattern, $line)) {
    array_push($rejected_lines, $line);
    continue; // move on
  }
  // and not be excluded
  if (strlen($csvmap->exclude) && preg_match($csvmap->exclude, $line)) {
    array_push($excluded_lines, $line);
    continue; // move on
  }
  // Ok, do the replacement to change to standard format
  $line = preg_replace($csvmap->pattern, trim($csvmap->replacement), $line);
  // Finally extract data from the standard format to an array as if from SRS
  if (!preg_match($standard_pattern, $line, $matches)) {
    array_push($mismapped_lines, $line);
    continue; // move on
  }
  // $matches[1] to $matches[9] now hold the fields of the standard format
}

You can now analyse the arrays of bad lines to explore problems. Rejected lines indicate the fundamental regexp might be wrong. Excluded lines should be expected, but obviously not if they contain valid data. Mismapped lines indicate that the conversion (replacement) regexp is likely to be at fault. This functionality has so far allowed me to easily adapt OPUS to import data from lots of different university systems without fuss. (Although setting up the initial regexps can be a real pain.)

To give an example, I use the following regexps for the University of Ulster’s current programme listing.

// Pattern to match
/^Y([0-9]*),(B[0-9]*),"([A-Za-z'\-]*), ([A-Za-z'\-]*)",(.*),(.*),"(.*)"$/
// Pattern to replace with
// Exclusion pattern

This excludes the top, key row of labels, and reorders some of the columns appropriately. This little trick may be of value to you if you have the same problem.
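
To show the match, replace and exclude trio working end to end, here is a minimal shell sketch of the same idea. Everything in it is an assumption made up for illustration: the three-column input format (surname, forename, id), the three patterns, and the function name map_lines. It is not the Ulster mapping and not the OPUS code. As in the PHP loop above, a line must first match the main pattern, is then checked against the exclusion pattern, and is finally rewritten into the quoted “standard” order.

```shell
#!/bin/sh
# Hypothetical mapping for an input file of the form: surname,forename,id
match='^([^,]+),([^,]+),([^,]+)$'   # what a valid line looks like
replace='"\3","\2","\1"'            # reorder to: "id","forename","surname"
exclude='^Surname,'                 # skip the header row

# map_lines: read raw CSV on stdin, write mapped lines on stdout,
# labelling the lines that were rejected or excluded.
map_lines() {
  while IFS= read -r line; do
    if ! printf '%s\n' "$line" | grep -Eq "$match"; then
      printf 'REJECTED: %s\n' "$line"
    elif printf '%s\n' "$line" | grep -Eq "$exclude"; then
      printf 'EXCLUDED: %s\n' "$line"
    else
      printf '%s\n' "$line" | sed -E "s/$match/$replace/"
    fi
  done
}

printf 'Surname,Forename,Id\nTurner,Colin,B001\nbroken line\n' | map_lines
# prints:
# EXCLUDED: Surname,Forename,Id
# "B001","Colin","Turner"
# REJECTED: broken line
```

Supporting a new institution then means supplying a new set of three patterns, with no change to the loop itself.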

Fixing truncated printing with Firefox

A while ago, I discovered that my current main development project, OPUS, had an odd problem when printing from a Gecko-based browser.

It would print the first page, whether in portrait or landscape, and if there was more content, it would be abruptly truncated and the second page would contain merely the footer off the page. I’ve been meaning to try and solve the problem with a print stylesheet for a while and finally did so today.

Drupal Login Problems

So, in order to post that rant about PHP and SimpleXML I had to fix a problem that seems to have spontaneously arisen with Drupal (this content management system).

For some reason it wasn’t persisting login information, at least from firefox (sorry – iceweasel here on my Debian system). It’s interesting to note, reading about the bug, that it has been around for literally months and doesn’t seem to have been nailed.

So, anyway, I’ve installed some beta of Drupal, and yes, it now seems to be fixed… If I could only solve the problem that I can’t “uncollapse” parts of the content now.

UPDATE: OK, this seems to be a problem with firefox version 2, or probably really the CSS file for it. It works with Galeon, or when I tell firefox to fake being IE.

SimpleXML should be called BloodyAwkwardXML

Another night of coding in PHP, and I’ve officially decided that SimpleXML utterly irritates me.

I’d already discovered, much to my irritation, that it is virtually impossible to handle SimpleXML objects elegantly with the Smarty template engine, but now I discover I can’t even shove them in a PHP session without trouble. When you next visit the site you get stuff like this:

Warning: session_start() [function.session-start]: Node no longer exists

and then more trouble.

As part of a new Web Application Framework I’m working on I wanted to parse XML configuration files one time only, and then cache the results in the session. It looks like I now have to totally redesign my idea :-(. You can see the work in progress at its home page.