A solution for BIM and special characters?

The main long-term issue in the use of BIM has been students creating blog posts that contain “special” characters. This typically happens when they create their post in Word and then copy and paste it into their blog. The interaction between BIM, SimplePie and database engines has not been a good one. It results in blog posts either not being stored in the Moodle database or being cut off at the special character.

A couple of days ago I got a report of this type of problem from a course using BIM. Previously, all of the recent problems associated with special characters have been specific to this university’s version of Postgres. I couldn’t re-create the problems with my test Moodle install with MySQL or Postgres. This latest report is different. It causes problems on my install using MySQL.

More interestingly, it doesn’t cause a problem with Moodle 2’s external blog feature. This is interesting because Moodle 2’s feature uses SimplePie, just like BIM. It appears that Moodle and/or the install of Moodle 2 I have is doing something that addresses the special character problem. The aim is to investigate and identify what this is and see if it can be incorporated into BIM.

In the end, this evolved into a solution for BIM v1 that has been put into the git repository. Now onto BIM v2.

The problem

Some evidence of the problem. First, what the problem post looks like on the student’s blog. Note the little square.

The problem post

This causes a problem in BIM: once the special character is reached, nothing else is stored. Here is the same post in BIM.

The problem post in BIM

And evidence that it is working in Moodle 2. Note: Moodle 2 only stores a sub-set of the post, not the complete content. But you can see that it does include the square and some of what follows.

Problem post in Moodle 2

Is Postgres making the difference?

My Moodle 2 install is using Postgres, so there’s a chance that this is the source of the different outcome. Must test that.

No, it appears that it does fail in Moodle 1.9 + BIM + Postgres. However, it fails differently than for MySQL. As above, MySQL only cuts off everything after the first special character. It still inserts an entry for the post. Moodle 1.9 + BIM + Postgres doesn’t create an entry at all for the post.

So there appears to be a real difference in how Moodle 2 is doing this, what is it?

How does Moodle 2 do it?

So, the aim here is to go through the Moodle 2 code and try and determine what it is doing that is making this work.

Registering an external blog in Moodle 2 starts with the ~/blog/external_blog_edit.php file. This presents the form to enter the details of the external blog. It also processes the form. It uses a Moodle class that wraps around SimplePie to get the data.

The first step is to look at the data it gets, to check whether this is where the special character handling occurs.

$rss = new moodle_simplepie($data->url);

print "<xmp>";
print_object( $rss );
print "</xmp>";
die;

Without the die this code simply updates the database and reports back success on a different page without giving a chance to see the dump. Looking at the dump, you can see that SimplePie is getting the complete posts from the feed.

Which is also what happens in Moodle 1.9 + BIM + Postgres. So the question is what is different about the Moodle 2.0 database queries that result in ignoring the special characters?

This is done in the function blog_sync_external_entries which, as expected, loops through the entries in the feed and inserts them into the database. It does this by creating an object, setting up the fields and calling insert_record. This is essentially the same as BIM. So where’s the difference?

Is it in the insert_record function?

The abstract class is in ~/lib/dml/moodle_database.php. There are then separate implementations for each database type, including pgsql.

It appears to be using the PHP function pg_query_params to populate the parameters into the SQL statement, possibly handling quoting. Two questions:

  • Is this where it happens?
  • Is it used in Moodle 1.9?

The final solution?

This post has been on the go for a few days as real life intrudes on BIM v1.0 work. In the end folk needed a solution for v1 and it looks like I found one and it has been committed into git.

$content = iconv( "ISO-8859-1", "UTF-8//IGNORE", $raw_content );

Which, as I understand it, essentially ensures that all the characters in the content string (content of a blog post) are in the UTF-8 character set. i.e. the character set being used by the database. Doing this ensures that the database doesn’t complain or fail on insert.

The drawback of this solution is that when the content is displayed, it shows up (on many? all? browsers) with funny characters. The advantage is that it appears to work. Even on the problem posts used above.
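To make the trade-off a bit more concrete, here is a minimal sketch of what that iconv call does to two versions of the same word. The helper names (bim_latin1_to_utf8, bim_to_utf8_sketch) are hypothetical and are not in BIM. The second helper is only an assumption about one possible refinement, not what was committed: it skips the conversion when the content is already valid UTF-8, which might avoid some of the funny characters.

```php
<?php
// Sketch only: both function names are hypothetical, not part of BIM.

// The committed fix: treat every byte as ISO-8859-1 and re-encode it as UTF-8.
function bim_latin1_to_utf8($raw_content) {
    return iconv("ISO-8859-1", "UTF-8//IGNORE", $raw_content);
}

// Possible refinement (an assumption, not the committed code): only convert
// content that is not already valid UTF-8.
function bim_to_utf8_sketch($raw_content) {
    if (mb_check_encoding($raw_content, "UTF-8")) {
        return $raw_content;
    }
    return bim_latin1_to_utf8($raw_content);
}

$latin1 = "caf\xE9";      // "café" with a single ISO-8859-1 byte for é
$utf8   = "caf\xC3\xA9";  // "café" already encoded as UTF-8

echo bin2hex(bim_latin1_to_utf8($latin1)), "\n"; // 636166c3a9: now valid UTF-8
echo bin2hex(bim_latin1_to_utf8($utf8)), "\n";   // 636166c383c2a9: é is double-encoded
echo bin2hex(bim_to_utf8_sketch($utf8)), "\n";   // 636166c3a9: left alone
```

The second line of output is the likely source of the funny characters: content that was already UTF-8 gets encoded a second time, so é ends up displayed as “Ã©”.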

Let me know if you have any problems. Am waiting for a local university to try this in production, if it works there, I’m hoping it will work anywhere.

BIM, missing students and Moodle groups

The following is a description of a common “problem” with BIM.

Symptoms

The problem manifests itself along the lines of this:

  • There are students enrolled in the Moodle course you are using BIM for.
  • You’ve set up BIM, allocated student groups to markers.
  • However, some or all of the students aren’t showing up on the “Manage Marking” or “Your students” screens.
  • This might only be for one of the teaching staff, it might be for all.
  • The lead/coordinating teacher can probably find them using “Find student”.

Cause

The “Manage marking” and “Your students” tabs in BIM rely on students being members of Moodle groups. If the students aren’t in the Moodle groups that are allocated to the staff, then the staff can’t easily see them.

Solutions

Some of the potential steps in a solution include:

  • Check that all students are in groups.
    Go to the Moodle “Groups” functionality for your course and make sure that all the students have been allocated to the groups appropriately.

    In an institutional setting this should be done automatically, but if this process is broken you might be having problems.

  • Check that groups have been allocated to staff.
    Go (as the coordinating/lead teacher) into BIM and use the “Allocate markers” tab to allocate the groups to staff as necessary.

More problems with BIM and special characters

The following is a record of some work to investigate some more apparent problems with BIM mirroring blog posts that contain “special” characters due to a bit of copying and pasting from Word into WordPress.

An aside on supporting a tool like BIM

All of my previous software support has been around software that I (or the team I work for) have developed and for which we are also responsible for hosting and supporting the people using it. i.e. we knew exactly what was going on where.

With BIM the development, support and hosting are all separated. I write BIM, but I don’t host or support it. Which means I don’t know what the folk who are doing the hosting/supporting have done (or haven’t done). Which adds all sorts of complexity to the process.

This sort of separation is increasingly common and is often aimed at saving resources/money. But I do wonder whether, viewed from an overall perspective, it adds more cost to the whole task of supporting systems (cost in the broadest possible sense, i.e. including the hassle and inefficiency caused by the difficulty, which doesn’t typically show up in anyone’s budget bottom line).

The problem

There are at least two, almost invariably related, problems:

  • Special characters that aren’t being translated safely.
  • Some situations where Moodle/BIM on Postgres is not able to insert student posts.

The postgres problem

There appears to be a problem with Postgres/Moodle/PHP/BIM falling over when attempting to insert some posts. Maybe because of special characters. The only way I have of testing this at the moment is black box, i.e. re-creating it on a Postgres-based Moodle install. First step is to identify where this is happening and see if something can be done to make it not so catastrophic a failure.

Fail all posts, if one post fails

The error I’m seeing is

bim_process_feed: inserting bim_marking **url here**

Ok, that seems to match this bit of code

if ( ! insert_record( "bim_marking", $safe ) ) {
    mtrace( get_string( 'bim_process_feed_error', 'bim', $entry->link ) );
    return false;
}

Ok, the first thing here is that the “return false” should go. It breaks out of the whole insert process. What should happen is that the code simply moves on to the next entry and tries to insert that.
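A rough sketch of the “move on to the next entry” behaviour follows. It is not the actual BIM code: the function name, the array-based entries and the $insert callable (standing in for Moodle’s insert_record) are all made up so the example can run on its own.

```php
<?php
// Sketch only: $insert stands in for Moodle's insert_record(); the function
// name and the array-based entries are hypothetical.
function bim_insert_entries_sketch(array $entries, $insert) {
    $failed = array();
    foreach ($entries as $entry) {
        if (!call_user_func($insert, "bim_marking", $entry)) {
            // Note the failure, then carry on with the next entry
            // instead of returning false and abandoning the rest.
            $failed[] = $entry["link"];
            continue;
        }
    }
    return $failed;
}

// Fake insert that rejects any post containing byte 0xBD and
// remembers the links of the posts it accepted.
$stored = array();
$fake_insert = function ($table, $entry) use (&$stored) {
    if (strpos($entry["post"], "\xBD") !== false) {
        return false;
    }
    $stored[] = $entry["link"];
    return true;
};

$entries = array(
    array("link" => "http://example.com/1", "post" => "fine"),
    array("link" => "http://example.com/2", "post" => "bad \xBD byte"),
    array("link" => "http://example.com/3", "post" => "also fine"),
);

$failed = bim_insert_entries_sketch($entries, $fake_insert);
print_r($failed); // only the second link
print_r($stored); // the first and third links
```

The point is simply that the third post still gets stored even though the second one fails.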

Ok, that’s updated and into github.

Why are posts/inserts failing

The above was causing a problem because Postgres would report a failure when blog posts with certain characters were being inserted. The same posts are not causing a problem with MySQL.

Is this a known problem with Postgres? Are there solutions from the Postgres community that might help out here?

Doesn’t seem to be too much via Google, at least with what I was searching for. Guess I turn to exploring more the special characters in the posts that are causing problems.

More special character problems

The process here is to repeat what I did last time, modify my version of BIM to be somewhat explicit about the characters in posts it is inserting, find out which aren’t being handled, and then add some code to handle them.

The list of potentially problematic characters is:

  • a bullet point
    Turns out that this is a middot (search for 183). So, I’ve added a translation for this. This works, however, it does highlight a problem with “bad handling” described in the next section.

Bad handling

At least on MySQL, some of the changes seem to be introducing some rather weird translations. For example, a middot and some white space is translated into similar looking characters but with each surrounded by a single quote. This needs to be fixed.

Ahh, some code I lifted is using ereg_replace, which according to this is not multi-byte safe. Replace it with mb_ereg_replace, and all is good.
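A small illustration of why the byte-oriented replacement goes wrong. This is a sketch under a couple of assumptions: the post is UTF-8 (where a middot is two bytes), and str_replace stands in for ereg_replace, since the ereg functions were removed in PHP 7. mb_ereg_replace needs the mbstring extension with regex support.

```php
<?php
// Sketch only: assumes the post is UTF-8, where a middot (U+00B7)
// is stored as the two bytes 0xC2 0xB7.
$post = "one \xC2\xB7 two";

// Byte-oriented replacement: swaps out 0xB7 but leaves 0xC2 behind,
// so the result is no longer valid UTF-8.
$byte_wise = str_replace(chr(183), "*", $post);

// Multi-byte aware replacement: matches the whole character.
mb_regex_encoding("UTF-8");
$char_wise = mb_ereg_replace("\xC2\xB7", "*", $post);

var_dump(mb_check_encoding($byte_wise, "UTF-8")); // bool(false)
var_dump($char_wise);                             // string(9) "one * two"
```

The stray 0xC2 byte left behind by the byte-wise version is the kind of debris that shows up as weird extra characters around the replacement.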

This seems to be the major problem.

Will see how it goes when testing with other problem data.

BIM, blog posts and special characters

The following is a summary/explanation of a common problem with BIM and its mirroring of blog posts and a common solution. The problem is generally caused by folk creating their blog posts in Word and then copying and pasting them into the blog post. For various reasons this process brings along some “special” characters which, while they work fine in Word, screw up royally within more constrained textual representations, like those of Web browsers and XML/RSS parsing libraries.

Reported problem

A student has made a post to their blog, the teacher can see it on the student’s blog, but it’s simply not present within BIM. BIM isn’t picking it up.

Diagnosis of the problem

Steps to diagnosing the source of the problem were:

  • Login to the Moodle course site and confirm the problem.
    Yes, student has posted it to his blog, but BIM not picking it up.
  • Register the student blog with a local copy of BIM.
    Ahh, the blog post shows up on my local copy, but only the first dozen or so characters.
  • Look at the feed for the student blog.
    Find the tell-tale signs of special characters exactly where my local copy of BIM cuts off the post.

Okay, BIM currently attempts to handle special characters, obviously it is missing something.

Common solution

This appears likely to be an on-going problem, so I am going to leave a bit of commented code in place that I use to implement this “solution”. The “solution” is basically to get BIM to print out each individual character in a blog post along with its ASCII value, then use this value to modify the bim_clean_content function to remove the offending special character.

The code that implements this character by character display looks like this

# KLUDGE: simple test to find out which special characters are
#  causing problems
$contenta = str_split( $content);
print "<h1> $title </h1>";
foreach ( $contenta as $char ) {
       echo "$char .. " . ord( $char ) . "<br />";
}

For this particular problem the offending character is 189. So add the following to the function bim_clean_content. It appears that character 189 is some sort of dash.

$post = ereg_replace( chr(189), "-", $post );

Re-register a student with the same blog and 189 has been replaced. Remove the kludge and it all appears to be registered correctly.
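One caveat worth keeping in mind with the kludge: str_split and ord work on bytes, so a single multi-byte character in a UTF-8 post shows up as several values rather than one. The sketch below contrasts the two views. The function names are hypothetical, and it assumes a newer PHP than BIM v1 targets (mb_str_split needs PHP 7.4+, mb_ord needs PHP 7.2+).

```php
<?php
// Sketch only: bim_dump_bytes() and bim_dump_codepoints() are hypothetical.
// mb_str_split() needs PHP 7.4+ and mb_ord() needs PHP 7.2+.

// What the kludge shows: one number per byte.
function bim_dump_bytes($content) {
    return array_map('ord', str_split($content));
}

// One number per character, assuming the content is UTF-8.
function bim_dump_codepoints($content) {
    $points = array();
    foreach (mb_str_split($content, 1, "UTF-8") as $char) {
        $points[] = mb_ord($char, "UTF-8");
    }
    return $points;
}

$content = "a\xE2\x80\x93b"; // "a", an en dash (U+2013), "b"

echo implode(" ", bim_dump_bytes($content)), "\n";      // 97 226 128 147 98
echo implode(" ", bim_dump_codepoints($content)), "\n"; // 97 8211 98
```

If the dump shows a run of values above 127 next to each other, it may be one multi-byte character rather than several separate ones.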

CQU problem with BIM and RSS feeds

This is the first post in a new tradition. Any problems folk report with BIM and the subsequent diagnosis and solution I undertake will get reported here on the blog and hopefully mirrored in some way onto the BIM github page.

It is somewhat ironic that the first problem in this tradition comes from the institution I finished working at last Friday.

A problem with cron?

The problem becomes apparent during the BIM cron process, the error BIM is reporting is

Error getting url for feed here

Locating the problem

This appears to be a problem with the BIM mirror process (the process by which BIM checks the feed for each registered blog feed to see if there are new posts). There’s some problem retrieving the feed.

Exactly where is the problem?

$ cd moodle/mod/bim/lib
$ grep "Error getting" *
bim_rss.php:        mtrace( "Error getting $student_feed->feedurl" );

If I go look in bim_rss.php, this occurs in the function bim_process_feed. This function is called for each registered blog feed and is meant to retrieve the feed, check it for new posts and, if any are detected, handle them. It does this using the SimplePie library.

// get the RSS file 
$feed = new SimplePie();
$feed->set_feed_url( $student_feed->feedurl );
$feed->enable_cache( true );
$feed->set_cache_location( $dir );
$feed->init();

if ( $feed->error() ) {
    mtrace( "Error getting $student_feed->feedurl" );
    return false;
}

This function is not only called during cron. It is also called every time a student visits the BIM activity. In this situation, bim_process_feed is called only for the student who is visiting. This makes sure that BIM always shows the student the most up-to-date information about their feed.

Diagnosing the problem

I’m guessing that the problem is one or more of the following, all associated with the failure of SimplePie to get the feed:

  • There’s a problem with a proxy setting and the institutional network configuration preventing access to the feed url.
  • There’s a problem with file permissions or similar on the institutional server preventing SimplePie writing to the cache directory.
  • The student’s blog doesn’t exist any more.
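The three guesses above could be checked in roughly this order. This is only a sketch: bim_diagnose_feed_sketch and bim_http_status_sketch are hypothetical helpers that are not part of BIM, and the status fetcher is passed in as a callable so the logic can be tried out without touching the network.

```php
<?php
// Sketch only: both functions are hypothetical and not part of BIM.
// $fetch_status is any callable that returns the HTTP status code for a URL,
// or false when no response came back at all (e.g. blocked by a proxy).
function bim_diagnose_feed_sketch($feedurl, $cache_dir, $fetch_status) {
    if (!is_dir($cache_dir) || !is_writable($cache_dir)) {
        return "cache directory is not writable";
    }
    $status = call_user_func($fetch_status, $feedurl);
    if ($status === false) {
        return "no response: check proxy and network settings";
    }
    if ($status === 404 || $status === 410) {
        return "feed not found: the blog may have been deleted";
    }
    if ($status >= 400) {
        return "feed returned HTTP $status";
    }
    return "feed is reachable";
}

// One possible fetcher built on get_headers(). It needs allow_url_fopen,
// knows nothing about Moodle's proxy configuration, and only looks at the
// status line of the first response.
function bim_http_status_sketch($url) {
    $headers = @get_headers($url);
    if ($headers === false || !preg_match('/^HTTP\/\S+\s+(\d{3})/', $headers[0], $m)) {
        return false;
    }
    return (int) $m[1];
}

// Try the logic with a stubbed fetcher that always reports 404.
echo bim_diagnose_feed_sketch(
    "http://example.com/feed",
    sys_get_temp_dir(),
    function ($url) { return 404; }
), "\n";
```

On a real server you would pass 'bim_http_status_sketch' as the third argument. Bear in mind that some blog hosts may answer with a normal page rather than a 404 for a deleted blog, so a “reachable” result doesn’t prove the feed still exists.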

The problem

I have to admit, I was immediately thinking of some of the more complicated reasons for the problem. Then I remembered the last one above and decided I’d better type in the URL for the student’s blog and see if I could access it. The following is what I found.

Deleted blog

So it appears that the student has removed the blog, this would seem to indicate that the student needs to re-register the blog or perhaps has simply moved on.

Potential Solutions

This is up to the institutional folk. The options I can see are:

  • Ignore it.
  • Find out what course/BIM activity the blog was registered for
    Look for the feedurl in the bim_student_feeds table.
  • Check the situation with the academic in charge of the course and perhaps do one of the following
    • Turn mirroring off for the BIM activity, if the course is finished.
    • Ask the teaching staff to follow up with the student.
    • Remove or modify the feed registration.
      Check the table mdl_bim_student_feeds