Patent issues with card storage

I read on Slashdot that Microsoft is taking legal action against TomTom over a number of alleged patent violations. Three of these are apparently targeted at the Linux kernel in use by TomTom.

Much of the discussion revolves around the use of FAT and FAT32 storage systems on media cards, and I have to confess this is an issue I hadn’t previously considered very deeply. Microsoft has patents on these filing systems, and it is important to note that Windows does not currently support any filing system that isn’t one of Microsoft’s own patent-entangled file systems.

This has very important, highly anti-competitive consequences.

  • In order for media cards to work with Windows (still a practical monopoly in the operating system market), these cards have to support an MS-entangled filing system.
  • This likely means Microsoft is obtaining licensing revenue from many of these cards and their associated readers.
  • Other operating systems, if they wish to be equally convenient for those using media cards, cameras and media players, have no choice but to try to include support for these filing systems, even if they are not technically superior.

I tend to agree with the many observers who say that litigation seems to be all that remains in a sad absence of innovation. If various legislatures have seen the browser issues as anti-competitive, I would hope that they will consider issues like these too, especially if Microsoft starts throwing its weight around. It’s not in any consumer’s interest for a single company to enjoy such dominance over such a wide range of products.


Bruce Perens has written a good, insightful analysis of this issue that also neatly encapsulates his well-thought-out viewpoints on software patents in general.

Importing data from lots of systems with PHP and Regular Expressions

Preamble: I found this article sitting as a draft for several years (since 2009), so the software release it mentions was years ago! But I figured I’d finally proofread and release the article, since the problem it addresses is still a real one.

On Friday I released version 4.1.0 of OPUS, which has been in the pipeline for a while. Although the previous version had been working remarkably well, I hadn’t realised just how many bugfixes and new features had accumulated in the last months, and as several universities had installed the older version just last week, I really needed to get the new version out.

One of the features I introduced in the 4.x family, and which seems to have settled and matured nicely in 4.1, is the ability to import student data. More generally, it is useful for importing diverse CSV formats into a standard base when the conversion involves more than just shuffling columns. Our own installation actually uses a thin layer of web services to interface to our student records system, but for the first few versions we imported data from the CSV files that our system could easily export. This is still the solution other universities will have to use unless and until they want to write their own web services layer. Even back when I needed this for internal use, the import was troublesome, since the University was liable to change the format without notice, and even had slightly different formats for different lists. This made data validation and interpretation a real pain.

So I hit upon the idea of using Regular Expressions (regexps) to validate the data, naturally enough, but then had an idea that would go further. I thought I would define a very simple “standard” CSV format that OPUS would expect. Then, for each input system, I would define three regexps.

  1. A match regular expression that would define what each line of data should look like, and also group the bits we wanted to capture in parentheses.
  2. A replacement regular expression that would be used to remap the captured data into a “standard” line.
  3. An exclude regular expression that would define lines to be explicitly exempt from parsing, particularly to deal with header rows, for example.

Since universities frequently change the content and order of their CSV files, and these vary from institution to institution, this provides a mechanism not just to validate the input data, but to remap data from any given institution’s format to this “common” format for use within OPUS.
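To make the three regexps concrete, here is a minimal sketch for an invented export format. The field layout, the sample data, the shortened four-field "standard" line and the helper name remap_line are all assumptions for illustration; they are not OPUS's real format or API.

```php
<?php
// Hypothetical mapping for an invented export format: id,"Surname, Forename",year
$csvmap = (object) [
    // 1. Match: describes a valid line and captures the fields we want
    'pattern'     => '/^(B[0-9]+),"([^,"]+), ([^,"]+)",([0-9]+)$/',
    // 2. Replacement: reorders the captures into a (shortened) standard line
    'replacement' => '"$3","$2","$1","$4"',
    // 3. Exclude: lines to skip on purpose, here the header row
    'exclude'     => '/^Student ID,/',
];

// Returns the remapped line, or null when the line does not match the pattern.
// (Hypothetical helper, not part of OPUS.)
function remap_line(object $csvmap, string $line): ?string
{
    if (!preg_match($csvmap->pattern, $line)) {
        return null;
    }
    return preg_replace($csvmap->pattern, $csvmap->replacement, $line);
}

echo remap_line($csvmap, 'B00123456,"Smith, Jane",2'), "\n";
// prints "Jane","Smith","B00123456","2"

// The header row is caught by the exclude regexp
var_dump(preg_match($csvmap->exclude, 'Student ID,Name,Year') === 1); // bool(true)
```

Because the replacement string only refers to captured groups, supporting a new institution means writing new regexps rather than new parsing code.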

Here are the bare bones of the algorithm.

// This is the pattern OPUS expects at the end of a mapping
$standard_pattern = "/^\"(.*)\",\"(.*)\",\"(.*)\",\"(.*)\",\"(.*)\",\"(.*)\",\"(.*)\",\"(.*)\",\"(.*)\"$/";
// Lines that fail at each stage are collected here for later inspection
$rejected_lines = array();
$excluded_lines = array();
$mismapped_lines = array();
// $fp is an open file handle; $csvmap holds the three regexps for this input format
while (($line = fgets($fp, 2048)) !== false) {
  $line = trim($line);
  // Valid lines must match the normal pattern
  if (!preg_match($csvmap->pattern, $line)) {
    array_push($rejected_lines, $line);
    continue; // move on
  }
  // and not be excluded
  if (strlen($csvmap->exclude) && preg_match($csvmap->exclude, $line)) {
    array_push($excluded_lines, $line);
    continue; // move on
  }
  // Ok, do the replacement to change to standard format
  $line = preg_replace($csvmap->pattern, trim($csvmap->replacement), $line);
  // Finally extract data from the standard format to an array as if from SRS
  if (!preg_match($standard_pattern, $line, $matches)) {
    array_push($mismapped_lines, $line);
    continue; // move on
  }
  // $matches[1] to $matches[9] now hold the nine standard fields
}

You can now analyse the arrays of bad lines to explore problems. Rejected lines indicate that the match regexp might be wrong. Excluded lines are to be expected, but obviously not if they contain valid data. Mismapped lines indicate that the replacement regexp is likely to be at fault. This functionality has so far allowed me to adapt OPUS to import data from lots of different university systems without fuss (although setting up the initial regexps can be a real pain).

To give an example, I use the following regexps for the University of Ulster’s current programme listing.

// Pattern to match
/^Y([0-9]*),(B[0-9]*),"([A-Za-z'\-]*), ([A-Za-z'\-]*)",(.*),(.*),"(.*)"$/
// Pattern to replace with
// Exclusion pattern

This excludes the top row of column labels, and reorders some of the columns appropriately. This little trick may be of value to you if you have the same problem.
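As a quick illustration of what the match pattern above captures, the sketch below runs it against a single sample line. The sample line and the meaning of its fields are invented for this example; they are not taken from a real Ulster export. The replacement and exclusion patterns are not reproduced here.

```php
<?php
// The match pattern from the listing above; nowdoc avoids escaping the quotes.
$pattern = <<<'REGEX'
/^Y([0-9]*),(B[0-9]*),"([A-Za-z'\-]*), ([A-Za-z'\-]*)",(.*),(.*),"(.*)"$/
REGEX;

// Invented sample line, assumed to resemble one row of the listing
$line = 'Y2,B00123456,"O\'Neill, Sean",FT,2008,"BSc Hons Computing"';

if (preg_match($pattern, $line, $matches)) {
    // $matches[0] is the whole line; the seven captured groups follow it
    echo implode('|', array_slice($matches, 1)), "\n";
    // prints 2|B00123456|O'Neill|Sean|FT|2008|BSc Hons Computing
}
```

The replacement regexp would then refer to these groups as $1 to $7 to build the standard line.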

Unusual dates

This Friday, the number of seconds since the start of the Unix Epoch will reach 1234567890, as noticed by Linux Pro Magazine, and no doubt others.
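For anyone who wants to check the moment for themselves, a short PHP sketch converts the timestamp to a readable date. It uses gmdate so that the result is in UTC and does not depend on the server's time zone; the variable names are just for illustration.

```php
<?php
// Seconds since 1970-01-01 00:00:00 UTC
$timestamp = 1234567890;

// gmdate() formats in UTC regardless of the configured time zone
$formatted = gmdate('l, j F Y H:i:s', $timestamp);

echo $formatted, " UTC\n";
// prints "Friday, 13 February 2009 23:31:30 UTC"
```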

Mathematics, being in many ways the study of patterns, is always interested in these kinds of things. We are now several years on from 19-11-1999, a date in which all the digits are odd. That was the last such date for over a thousand years, the next being 1-1-3111. On the other hand, we are right now enjoying an era with lots of dates in which all the digits are even. It started on 2-2-2000, the previous such date being 28-8-888, I guess.
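These claims are easy to check by brute force. The sketch below assumes dates are written day-month-year without zero padding, as above (a padded 01 would spoil an all-odd date). The function names are hypothetical.

```php
<?php
// True when every digit of day, month and year (no zero padding) has the
// given parity: 1 for odd, 0 for even.
function digits_have_parity(int $day, int $month, int $year, int $parity): bool
{
    foreach (str_split($day . $month . $year) as $digit) {
        if ((int) $digit % 2 !== $parity) {
            return false;
        }
    }
    return true;
}

// Scan forward through the years for the first valid calendar date whose
// digits all have the given parity; returns null if none is found.
function first_date_with_parity(int $from_year, int $to_year, int $parity): ?string
{
    for ($year = $from_year; $year <= $to_year; $year++) {
        for ($month = 1; $month <= 12; $month++) {
            for ($day = 1; $day <= 31; $day++) {
                if (checkdate($month, $day, $year)
                    && digits_have_parity($day, $month, $year, $parity)) {
                    return "$day-$month-$year";
                }
            }
        }
    }
    return null;
}

var_dump(digits_have_parity(19, 11, 1999, 1));     // bool(true)
echo first_date_with_parity(2000, 4000, 1), "\n";  // prints 1-1-3111
echo first_date_with_parity(2000, 4000, 0), "\n";  // prints 2-2-2000
```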

Cooperation with higher education in open source projects

For some time now, I have been advocating that universities should create more free and open source solutions to fill the gaps needed within the education sector itself. Every year, thousands of computer science and engineering students in the UK alone rack their brains looking for undergraduate final year projects. Most of these will be isolated standalone demonstrations of skill that will never be seen again or used for any other purpose.

Too rarely do we encourage students to contribute to an existing major software project instead. In practice, that will most often be a FOSS project. There are several reasons why we do not, and probably the most significant is that there is still a large number of staff members in education who haven’t the faintest clue about the FOSS world, and so many students are left unenlightened too. Another problem that is often cited is the difficulty of disentangling the contribution made by the student as an individual. However, with most source control systems it is trivial to extract this information, and this could easily be made a requirement of the project.

So I was very interested when a friend referred me to a letter in The Times expressing very similar opinions about major public sector projects. I agree completely: there is a major untapped resource out there in staff and students, who would gain real-life experience in enterprise projects. As the letter’s author suggests, the culture of pride in software (especially in free software) would help improve quality and educate the public in the value of free software solutions.