Category Archives: programming

Workload Allocation Modelling Update – Scalability

I have been doing some more work on my software to handle Academic Workload Modelling, developing a roadmap for two future versions, one being modifications needed to run real allocations for next year without scrapping existing data, and another being code to handle the moderation of exams and coursework (which isn’t really anything to do with workload modelling, there’s some more mission creep going on).

Improvements to Task Handling

Speaking of mission creep, I noted in the last article that I’d added some code to capture tasks that staff members would be reminded of and could self-certify as complete. I have improved this a lot, with richer detail about when tasks are overdue and some UI improvements.

I wanted to automate some batch code to send emails from the system periodically. I discovered that a Django management command provided an elegant way to bring the batch mode code into the project, so that it could be called from cron through the usual manage.py script that Django creates for handling project tasks from the command line.

It was easy to use this framework to add command switches and configuration of verbosity (you might note I haven’t disabled all output at the moment so I can monitor execution at this stage). I have set this up to email folks on a Monday morning with all the tasks, but also on Wednesday and Friday if there are urgent tasks still outstanding (less than a week to deadline).
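The schedule described above (everything on Monday, urgent items only on Wednesday and Friday) can be sketched as plain Python. This is a minimal illustration, not the code in the application: `Task`, `tasks_to_email` and `URGENT_WINDOW` are hypothetical names, and in practice logic like this would be called from the management command's `handle()` method and fed from the database.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "urgent" means less than a week to the deadline.
URGENT_WINDOW = timedelta(days=7)


@dataclass
class Task:
    """Hypothetical stand-in for a task record."""
    name: str
    deadline: date
    complete: bool = False


def tasks_to_email(tasks, today):
    """Return the tasks that belong in today's reminder email."""
    outstanding = [t for t in tasks if not t.complete]
    weekday = today.weekday()  # Monday is 0, Sunday is 6
    if weekday == 0:
        # Monday: remind about every outstanding task.
        return outstanding
    if weekday in (2, 4):
        # Wednesday and Friday: only urgent tasks (overdue ones included).
        return [t for t in outstanding if t.deadline - today < URGENT_WINDOW]
    # No email on other days.
    return []
```

An empty result simply means no email is sent that day, so the cron entry can run the command daily and leave the decision to the code.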

I’ve been using this functionality live and it has worked very well. I used Django templates to help provide the email bodies, both in HTML and plain text.

Sample Task Reminder Email

Issues of Scale

My early prototype handled data for one academic year, albeit with fields in the schema to try and solve this at a later stage. It also suffered from a problem in that if other Schools wanted to use the system, how would I disaggregate the data both for security and convenience?

In the end I hit upon a solution for both issues: a WorkPackage model that allows a range of dates (usually one academic year) and a collection of Django User Groups to be specified. All manually allocated activities and module data are tied to a package, and are therefore invisible to other packages (users in other Schools, or in other Academic Years). I was also able to put the constants I’m using to model workload into the Django model, making it easier to tweak year on year.
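As a rough illustration of the idea (not the actual Django model, whose fields are not shown here), a package can be thought of as a date range plus a set of groups. The sketch below is a plain-Python stand-in, assuming a user sees a package when they share at least one group with it; `groups`, `visible_to`, `covers` and `packages_for` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class WorkPackage:
    """Plain-Python stand-in for a package: a date range plus groups."""
    name: str
    start: date
    end: date
    groups: set = field(default_factory=set)  # stand-in for Django Groups

    def visible_to(self, user_groups):
        """A user sees the package if they share at least one group."""
        return bool(self.groups & set(user_groups))

    def covers(self, day):
        """True if the given day falls inside the package's date range."""
        return self.start <= day <= self.end


def packages_for(user_groups, packages):
    """Filter packages down to those a user is allowed to see."""
    return [p for p in packages if p.visible_to(user_groups)]
```

In a real Django project the same filtering would more naturally be a QuerySet filter on a many-to-many relation to Group.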

I’m pretty much ready to use the system for a real allocation now without having to purge the test data I used this year. I can simply create a new WorkPackage.

I need to write some functionality to allow one package’s allocations to be automatically rolled over to the next as a starting point, but I reckon that’s maybe two or three more hours.

Future Plans for the Application

The next part of planned functionality is an ability to handle coursework, examinations and the moderation process. It will be quite a big chunk of new functionality, and will move the system again towards something quite a bit bigger than just a workload allocation system.

This of course means I need a better application name (WAM isn’t so awesome anyway). Suggestions on a postcard.

Django Issues

I think I’m getting more to grips with Django all the time – although I often have the nagging feeling I’m writing several lines of code that would be simpler if I had a better feel for its syntax for dealing with QuerySets.

The big problem I hit, again, was issues in migrations. I created and executed migrations on my (SQLite) development system, but when I moved these over to production (MySQL) it barfed spectacularly.

Once again the lack of idempotent execution means you have to work out what part of the migration worked and then tag the migration as “faked” in order to move onto the next. This was sufficient this time, and I didn’t have to write custom migrations like last time, but it’s really not very reassuring.

Further Details

As before, the code is on GitHub, and the development website on foss.ulster.ac.uk, if you want more details.

Manually completing a botched django migration

I wrote a lot of code for my Workload Allocation system on Friday, and had been developing it on the machine with django’s built in lightweight web server, and a (default) sqlite database backend. In production I decided to use a MySQL backend in case sqlite was, well, too lite.

One of the things that is really neat about django, but which also profoundly scares me, is that it handles changes to the database schema automatically. I am used to doing all of this by hand. It has been a pleasant change, but I wondered what would happen if it went wrong.

Which it did on Friday. The migrations had worked perfectly well on the development server and after some testing I decided to roll the code into production, whereupon the migration failed. I’m still not sure why, but something in the django deep magic failed. To make things worse the process is, I have discovered, not idempotent, and trying to run the migration again caused it to fail in new places because some of the database schema changes had been successful; so it was now bailing out with “already exists” kind of errors.

Removing some tables and trying again didn’t quite do the trick. I thought about trying to fix the schema manually, since with the mysql command line tool I could see what fields needed to be added, but upon inspection the constraints added by django were complex and I was unsure how important they were.

So this is my clumsy workaround, that will no doubt come back to haunt me.

I used the following commands from the top of the django app directory to find the name of the migration that was failing, and then used --fake to force django to forget about having to apply it.

I then created a “manual” django migration that added the new fields.

It turns out that getting the dependency right at the top is very important: it needs to be the previous migration.

The name of this script is important: follow the naming convention of your most recent failed migration, changing auto to custom and the timestamp appropriately. I discovered that django would not run this migration. It detected a conflict with the previous migration that should have created the fields and wanted me to try and merge them. That would be pointless since the previous migration failed. I also discovered to my surprise there was no --force command line switch to override this logic, though Google perhaps suggests that previous versions of django allowed this.

So, I used django’s sqlmigrate command to output the SQL that it would produce if this migration did run. Once I got it showing in the shell, I redirected this to a file.

Finally I used the mysql command line tool

to get access to the database, and then used the following command to import and run the SQL produced above.

And so far so good. I had been getting Server Errors on pages relating to the botched model before and at the moment they seem to be behaving correctly. Hopefully this may help you and not come back to haunt me.

Workload Allocation Monitoring (WAM) Prototype

I decided to start writing a workload allocation monitoring system for Higher Education. I found one written as part of a JISC project at Cambridge, but despite my experience with PHP I found it difficult to set up, a bit crude (sorry) and hard to maintain. It was clearly very flexible, and I wanted something flexible, simple and clean.

So I decided I’d try writing something quickly using the Python django framework. This is my first web-app written in Python and so I dare say I would do some things differently with more experience, but I have now reached the point where I have a workable prototype that I can start to use myself. I’ve got to say, I found django to be pretty neat.

At its heart is a list of the loads against Academic Staff in a department or school. The idea is to try and increase transparency. There are problems with this approach: some known irregularities of loading can be for confidential reasons; small numbers of staff with key skills can cause issues as well, but it is intended to provide a basis.

Overall loads for staff.

While classically the word semester implies that there are two of them, most Universities operate a three semester system with the third covering the Summer. Unevenness in loading over the Summer is another cause of potential trouble, so the system tries to show loading as spread across semesters. A scaled column accounts for staff who do not have a 100% FTE contribution: their hours are up-scaled for comparison.
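The up-scaling is simple arithmetic. A minimal sketch, assuming the FTE is held as a percentage and the scaled figure is the allocated hours divided by that fraction (the function name `scaled_hours` is hypothetical):

```python
def scaled_hours(hours, fte_percent):
    """Scale allocated hours up to a full-time equivalent for comparison."""
    if not 0 < fte_percent <= 100:
        raise ValueError("FTE must be greater than 0 and at most 100")
    # Someone on 50% FTE with 800 hours carries the same relative
    # load as a full-time colleague with 1600 hours.
    return hours * 100 / fte_percent
```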

Naturally staff will want to see some granularity of these loads and they are broken into individual activities that are allocated to given members of staff.

Breakdown of activities for a staff member.

An individual activity can be specified as occupying a number of hours, or alternatively a percentage of a staff member’s time. It can occupy one or more semesters (in which case it is spread evenly across them). Types can be allocated for activities to help track contributions of different types. It might be that an activity is related to a module or study, or not.
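To make the hours-or-percentage rule concrete, here is a small sketch. It assumes a percentage is converted using a nominal number of contracted hours per year; the constant `NOMINAL_ANNUAL_HOURS` and its value are purely illustrative, as are the function names.

```python
# Hypothetical constant: nominal contracted hours in an academic year.
NOMINAL_ANNUAL_HOURS = 1600


def activity_hours(hours=None, percentage=None, annual_hours=NOMINAL_ANNUAL_HOURS):
    """Resolve an activity given as either hours or a percentage to hours."""
    if (hours is None) == (percentage is None):
        raise ValueError("specify exactly one of hours or percentage")
    if hours is not None:
        return float(hours)
    return annual_hours * percentage / 100


def spread_over_semesters(total_hours, semesters):
    """Spread an activity's hours evenly across the semesters it occupies."""
    if not semesters:
        raise ValueError("an activity must occupy at least one semester")
    share = total_hours / len(semesters)
    return {semester: share for semester in semesters}
```

For example, `spread_over_semesters(activity_hours(hours=90), [1, 2, 3])` gives 30 hours in each of the three semesters.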

Activities are long term parts of work allocated hours or a percentage of time.

Speaking of modules, basic information is stored for these, along with another feature I think will help: tracking the submission of exams and coursework through various QA processes.

At a glance the most recent information about the exam and coursework status can be seen.

While activities are considered to be events with long engagements, another issue for staff is the tasks that are allocated to them, usually of comparatively short duration. It can be hard for staff to remember all of these tasks, and hard for managers to follow up their completion, especially without annoying staff who have completed them already.

Tasks can be allocated against individual members of staff or groups or both.

The web-app will allow tasks to be defined against one person, many people, categories of people and so on.

A list of tasks and their deadlines.

It is possible to easily see which tasks are still open and whether their deadline has come and gone.

The staff required to complete a task are shown, and those that have indicated completion. The system politely nags those still outstanding.

A look at a given task will show who has completed it and who still needs to.

A given staff member can sign off their own task.

It is often the case that admin and clerical staff check off colleagues who have responded to a given call, so the system allows for staff with given permissions to indicate someone as having completed the task. Alternatively the member of staff can do this for themselves.

So while it is still a bit rough and ready I’ve reached the point where the system is stable enough for use. Of course the challenge comes when we consider the assumptions used to come up with the hours and percentage loading in the first place. So I hope to pick the brains of some colleagues about this and start testing the system.

I’ve yet to make a formal release, but the code is Affero GPL (you can use the code free of restrictions (and charge) but cannot deprive others of the same freedom on derivative works) so feel free to have a look at it.

My roadmap for an initial release can be found on foss.ulster.ac.uk, where I will eventually host the code as well, but at the moment it can be found at GitHub. My previous post detailed how to get the app to work with a central authentication system your University likely has, or something similar.

Yeah… design and CSS is not my strongest skill, more work to be done on that.

Share and enjoy.

Django, CAS authentication and Apache

I am certainly no stranger to Web Development, but I decided to really look at the Python web framework django in some detail last week to write a small web application for Workload Modelling for Academic Staff.

Yes, this is a geeky, programming post.

In doing so I ran into some trouble trying to get CAS authentication to work with the app. I tried using a django-cas client I found, having found no direct CAS support in django. This took a reasonable number of code modifications, in several source files (really only a pain because I would have to maintain both development code and production code on different authentication). However the critical problem was that while I could get authentication into the “userland” parts of the app, I was getting redirect issues with the django generated administration interface.

So, I found a totally different approach. Django does have generic remote user support built-in which I hadn’t initially found. There are some details here. As you can see there are only two lines of code needed to enable this support.

I found this worked without any drama when I used Apache to force the CAS authentication. So the code required (in version 1.8 of django) is simply as follows, in the settings.py file.
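Going by Django's documentation for REMOTE_USER authentication, the two additions are along the following lines. This is a sketch for Django 1.8: the rest of the middleware list is abbreviated here and will differ in your project, but RemoteUserMiddleware does need to come after AuthenticationMiddleware.

```python
# settings.py (fragment)

MIDDLEWARE_CLASSES = (
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    # Added: trust the REMOTE_USER value set by the web server (Apache + CAS).
    'django.contrib.auth.middleware.RemoteUserMiddleware',
    # ... the project's other middleware entries go here as before ...
)

# Added: authenticate against the user name supplied by the web server.
AUTHENTICATION_BACKENDS = (
    'django.contrib.auth.backends.RemoteUserBackend',
)
```

With only RemoteUserBackend listed, Django's own password login no longer applies, so authentication is entirely in the hands of the web server.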

The Apache Configuration looks something like this.

You will need to ensure you have Apache’s CAS and wsgi modules installed and enabled too.

I wasted a couple of hours going around the houses on this one, so hopefully it may save you. I will be hosting the project for my modeller on foss.ulster.ac.uk along with the code once I move it from GitHub.

OPUS and ASET, ten years on

Ten years ago today, I and a few colleagues from Ulster University presented some of our work on on-line Placement Management at the ASET conference in York. At that time our system was simply called the Placement Management System or PMS, and yes of course this led to more than a few comments.

At that stage we had been working on the project for some 5 years, so it’s a useful reminder just how much time I ended up spending on that project.

Now called OPUS that system still exists, was released as Open Source and was and is used by a number of Universities. Though Ulster is developing an alternative system it hasn’t yet subsumed all the functionality in OPUS and I’m back to maintaining the system in a low key way.

I recently fixed some bugs introduced by a well meaning volunteer over two years ago, which felt quite good – while they were low on impact they were irritating in some aspects of usage. In the process I found that our custom framework, written by myself and Gordon Crawford for version 4 of both OPUS and the PDSystem to work with the Smarty Template Engine, is broken with Smarty version 3.

I intend to fix that problem, and do what may be a last release of OPUS, which will bring some improvements in speed, and localisation and internationalisation. Of course the source is still available directly from the version control on the site, so nobody has to wait on me – but I’ve had some recent queries from HEIs in India, so there is still interest in the system and its Debian packaging.

For those wanting the walk down memory lane, and for my own archival purposes, those slides from ten years ago are here: aset-york-pms-2005-09-05.

Python script to randomise an m3u playlist

While I’m blogging scripts for playlist manipulation here is one I use in a nightly cron job to shuffle our playlists so that various devices playing from them have some daily variety. All disclaimers apply, it’s rough and ready but WorksForMe (TM).

I have an entry in my crontab like this

which takes a static playlist and produces a nightly shuffled version.
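The shuffle itself could be sketched as follows. This is a minimal illustration rather than the script referred to above, assuming an extended m3u file in which each track path may be preceded by one or more `#` lines (such as `#EXTINF`) that must travel with it; `shuffle_m3u` is a hypothetical name.

```python
import random


def shuffle_m3u(text, rng=random):
    """Shuffle the entries of an (extended) m3u playlist held in a string."""
    header = []   # the #EXTM3U marker stays at the top
    entries = []  # each entry: its '#' lines followed by the track path
    pending = []  # '#' lines waiting for the path they describe
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#EXTM3U"):
            header.append(line)
        elif line.startswith("#"):
            pending.append(line)
        else:
            entries.append(pending + [line])
            pending = []
    # Any trailing '#' lines with no track after them are dropped.
    rng.shuffle(entries)
    lines = header + [line for entry in entries for line in entry]
    return "\n".join(lines) + "\n"
```

A nightly job would read the static playlist, pass its contents through `shuffle_m3u`, and write the result to the shuffled file.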

Python script to add a file to a playlist

I have a number of playlists on Gondolin, which is a headless machine. I wanted to be able to easily add a given mp3 file to the playlists which are in m3u format. That means that each entry has both the filename and an extended line with some basic metadata, in particular the track length in seconds, the track artist and name. I wanted a script that could extract this information from the mp3 file and make adding the entry easy. So I wrote this in Python. It’s rough and ready and it is probably not very Pythonic but it’s working for me. The script should create a playlist if it doesn’t currently exist, and check for a newline at the end of the file so that the appended lines are really on a new line. ItWorksForMe (TM).

This uses the eyeD3 Python library, which on Debian is provided in python-eyed3.

My basic usage is

the last parameter is the path relative to which the mp3 filename should be written. This is useful for me because I rsync the whole tree between machines; as you will see, there are options for writing an absolute pathname if you prefer. I should probably rewrite the script to do it relative to the playlist, but that’s another day.
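The append step described above (create the playlist if needed, make sure the file ends in a newline, then write the extended line and the path) might look something like this. It is a sketch only: the metadata is passed in as arguments rather than read with eyeD3, and the names `extinf_entry` and `append_to_playlist` are hypothetical.

```python
import os


def extinf_entry(path, seconds, artist, title):
    """Build the two lines of an extended m3u entry."""
    return "#EXTINF:%d,%s - %s\n%s\n" % (seconds, artist, title, path)


def append_to_playlist(playlist, mp3_path, seconds, artist, title, relative_to=None):
    """Append one track to an m3u playlist, creating the file if needed."""
    if relative_to is not None:
        # Write the track path relative to the given directory.
        mp3_path = os.path.relpath(mp3_path, relative_to)
    new_file = not os.path.exists(playlist) or os.path.getsize(playlist) == 0
    needs_newline = False
    if not new_file:
        # Check whether the existing file already ends with a newline.
        with open(playlist, "rb") as handle:
            handle.seek(-1, os.SEEK_END)
            needs_newline = handle.read(1) != b"\n"
    with open(playlist, "a", encoding="utf-8") as handle:
        if new_file:
            handle.write("#EXTM3U\n")
        elif needs_newline:
            handle.write("\n")
        handle.write(extinf_entry(mp3_path, seconds, artist, title))
```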

Migration from Savane to Redmine

I am admin for a server at work, foss.ulster.ac.uk, which hosts our open source development work. It used to run on GNU Savane, but despite several efforts, that project is clearly dead in the ditch.

So having to change the underlying system, I decided to move to Redmine (you can see some previous discussion here). I’m recording aspects of the migration here mostly for my own sake.

This install was on Debian Squeeze. I first of all installed the relevant package

and followed the prompts for the configuration. The documentation for the Debian install is a little unhelpful about how to actually configure the web server, and while I have good experience with Apache, I have very little with Ruby on Rails.

I installed the Apache Passenger module.

and copied the example config

I then edited the newly created redmine file to look like this:

In my case I wanted Redmine on the web root, so you can see the changes I made.

I then disabled the default config and enabled this:

and restarted Apache

Now you can log in with the default username and password (admin and admin), change them, and start some configuration.

Garbage collecting sessions in PHP

In PHP, sessions are by default stored in a file in a directory. Sessions can be specifically destroyed from within the code, for example when users log out explicitly, but frequently they do not. As a result session files tend to hang around, and cause the problem of how to clean them up. The standard way is to use PHP’s own garbage collection which is normally enabled out of the box. In this, we define settings that specify the maximum lifetime for the session (session.gc_maxlifetime) and, essentially, the probability of clean up (session.gc_probability and session.gc_divisor).

To make things more interesting, Debian, out of the box, doesn’t do garbage collection in this way. It has a cron job that regularly erases session files in the defined session directory. But if, like me and many others, you put your session files in a different directory for each application to avoid clashes on namespaces for two applications running under the same browser from the same server, you have a problem. If you forget Debian’s behaviour the session files will just grow indefinitely. I had forgotten this issue and found over a year’s worth of session files in a directory recently.

Solving this problem is actually quite difficult to do optimally. I mean, I could create a cron job to mirror Debian’s own, but then I’d have to put the maximum lifetime in a cron job somewhere, out of the way, and difficult for the average sys admin I’m working with to find and deal with. (That is, away from the main configuration of the project). Or I could parse this value out of the main configuration. But this leads to another problem. For some users, a 30 minute maximum idle time is acceptable (although in my case where actually a suite of applications are being used as a single gestalt entity that can even be a problem), but for many of my administrator users you need huge idle times, since they are used to logging in first thing, and periodically working at the application through the day.
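For reference, the cron job option weighed above amounts to something like the following sketch (in Python rather than shell or PHP, purely for illustration). It assumes PHP's default file handler, which names session files `sess_<id>`, and it has exactly the drawback described: the maximum lifetime has to be supplied separately from the application's own configuration. The function name is hypothetical.

```python
import os
import time


def purge_stale_sessions(session_dir, max_lifetime_seconds, now=None):
    """Delete PHP session files not modified within the maximum lifetime."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(session_dir)):
        path = os.path.join(session_dir, name)
        # Only touch regular files that look like PHP session files.
        if not (name.startswith("sess_") and os.path.isfile(path)):
            continue
        if now - os.path.getmtime(path) > max_lifetime_seconds:
            os.remove(path)
            removed.append(name)
    return removed
```

Calling `purge_stale_sessions("/path/to/app/sessions", 30 * 60)` from a scheduled job would apply the 30 minute idle limit to one application's session directory; the path here is a placeholder.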

In the end I settled on changing our framework to make it easy to pass through garbage collection values. This makes an interface to the configuration really easy, but it doesn’t solve the problems of long session times that not all users need, and huge delays in garbage collection.

In my last article I talked about a Munin plugin for OPUS, but when you look at it you’ll see these kinds of cliff-fall drops, which are caused by the garbage collection finally kicking in and removing sessions where users have not explicitly logged out. Currently, every ten minutes, OPUS runs through its user database and finds users who are allegedly online but have no active session file and then marks them offline. Then it updates the file with the online user count that Munin reads.

I suspect eventually, I will write a more sophisticated script that actually kills sessions depending upon idle time and the user class, which would make for a more accurate picture here. Any brighter ideas gratefully accepted.

Importing data from lots of systems with PHP and Regular Expressions

Preamble: I found this article sitting as a draft for several years (2009), so the software release it mentions was years ago! But I figured I’d finally proof read and release the article since the problem it addresses is still a real one.

On Friday I released version 4.1.0 of OPUS which has been in the pipeline for a while. Although the previous version had been working remarkably well, I hadn’t realised just how many bugfixes and new features had accumulated in the last months, and as several Universities had just last week installed the older version, I really needed to get the new version out.

One of the features I introduced in the 4.x family, and which seems to have settled and matured nicely in 4.1 was functionality to import student data, but really, it’s useful for importing diverse CSV formats into a standard base, where the CSV conversion can have more complex problems than just shuffling columns. Our software really uses a thin layer of web services to interface to our student records system, but for the first few versions we imported data from the CSV files that our system could easily export. This is still the solution other universities will have to use unless and until they want to write their own web services layer. Even back when I needed this for internal use, this functionality was a pain, since the University was liable to change the format without notice, and even had slightly different formats for different lists. This made data validation and interpretation a real pain.

So I hit upon the idea of using Regular Expressions (regexps) to validate the data, naturally enough, but then had an idea that would go further. I thought I would define a “standard” CSV format that OPUS would expect, a very simple one. Then I would, for each input system define three regexps.

  1. A match regular expression that would define what each line of data should look like, and also group the bits we wanted to capture in parentheses.
  2. A replacement regular expression that would be used to remap the captured data into a “standard” line.
  3. An exclude regular expression that would define lines to be explicitly exempt from parsing, particularly to deal with header rows, for example.

Since universities frequently change the content and order of their CSV files, and these vary from institution to institution, this provides a mechanism to not just validate the input data, but to re-map data from a specified format from any given institution to this “common” format for use within OPUS.

Here are the basic bones of the algorithm.

You can now analyse the arrays of bad lines to explore problems. Rejected lines indicate the fundamental (match) regexp might be wrong. Excluded lines should be expected, but obviously not if they contain valid data. Mismapped lines indicate that the conversion (replacement) regexp is likely to be at fault. This functionality has so far allowed me to easily adapt OPUS to import data from lots of different university systems without fuss. (Although setting up the initial regexps can be a real pain.)
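The three-regexp scheme and the resulting arrays can be sketched in a few lines. OPUS itself is written in PHP; this is an illustrative Python version, not the OPUS code. It assumes a line counts as "mismapped" when the remapped result does not have the number of comma-separated fields the standard format expects; that check, and the names `import_lines` and `field_count`, are assumptions.

```python
import re


def import_lines(lines, match_re, replace_re, exclude_re, field_count):
    """Sort raw CSV lines into imported, excluded, rejected and mismapped."""
    match = re.compile(match_re)
    exclude = re.compile(exclude_re)
    imported, excluded, rejected, mismapped = [], [], [], []
    for line in lines:
        line = line.rstrip("\r\n")
        if exclude.search(line):
            # Explicitly exempt from parsing, e.g. a header row.
            excluded.append(line)
            continue
        found = match.fullmatch(line)
        if found is None:
            # The line does not look like valid data for this source.
            rejected.append(line)
            continue
        # Remap the captured groups into the "standard" line.
        standard = found.expand(replace_re)
        if len(standard.split(",")) == field_count:
            imported.append(standard)
        else:
            mismapped.append(line)
    return imported, excluded, rejected, mismapped
```

For instance, with a match regexp of `"([^"]*), ([^"]*)","([^"]*)","([^"]*)"`, a replacement of `\3,\2,\1,\4` and four expected fields, the line `"Smith, J","B001","Computing"` would be remapped to `B001,J,Smith,Computing`.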

To give an example, I use the following regexps for the University of Ulster’s current programme listing.

This excludes the top, key row of labels, and reorders some of the columns appropriately. This little trick may be of value to you if you have the same problem.