Finishing tweaks to Moodle book search block

A previous post recorded some early exploration of what tweaks might be necessary to be made to the Moodle book search block. The original code for the block and the tweaks I’ve made are available via GitHub.

Current status

This is something I started before Xmas. Coming back to it in 2016, I’ve finished off item #1 of the todo list below. What I’ve done is described in more detail below.

In essence, I’ve replaced the old Book search method with the method used in the forum search mechanism.  This

  1. Removes the SQL injection problem;
  2. Improves support for standard search approaches (e.g. use of double quotes); and
  3. Slight changes the default treatment of title and content.
    i.e. Old search mechanism returned a match only if all parts of the search string appeared in either the title of the book chapter OR the content of the book chapter. (A book chapter equates more closely to a page in the book, than a collection of pages).   e.g. a search for copyright creative would only find pages where both words appear in either the title or the content.The new search mechanism returns a match if all parts of the search string are find in the title or the content. e.g. a search for copyright creative would find pages that had copyright in the title and creative in the content; copyright in the content and creative in the title; both copyright and content in the title; and, both copyright and content in the content.

Giving back to github

Bugger, didn’t have these changes managed via git.  Stick it back in my repository for this block and create a pull request for the original.

 Original to do list

The to do list I’m working from includes:

  1. Remove the sql injection problem;  DONE.
  2. Improve the search results format;  DELAYED
    e.g. as illustrated in this image.
  3. Provide a bit more scaffolding about how to use the search mechanism (e.g. use of ” + and – etc) DELAYED
  4. Provide an advanced search form/mechanism; DELAYED
    e.g. as shown in this image which is a modification of Forum search interface.

Remove the sql injection problem

As of yesterday, an initial modification had been made to the block to adopt the approach used by the Forum search block.  This needs to be further tweaked, tested and improved.  Steps include:

  1. Install a vanilla version of the Book search block for testing
  2. Move the “get identifiable” books into a function.
  3. Finalise and test the move to the “Forum search” approach.
  4. Explore what other changes might be possible

Vanilla book search for testing

Clone a version of the block straight from the original, stick it in a v_search_books directory and update the code to use this slightly different name (v = vanilla)

Can it install?  Yes.

Can I add it as a block?  Yes

Does it work? Not yet

  • Change the hard-coded URL to put to new location
  • The language strings aren’t working. Why? Need to rename the language file.

Working and it appears that they are producing the same output.  My tweaks yesterday were better than I thought.

Relocate “get identifiable” books

This is basically a cosmetic/personal preference change.

Finish the move to the “Forum search” approach

Parameters && get_in_or_equal – this is done and working.

The old search block when searching for “copyright creative” generates the following (partial) SQL

(
( bc.title ILIKE ‘%copyright%’ AND bc.title ILIKE ‘%creative%’ ) OR
( bc.content ILIKE ‘%copyright%’ AND bc.content ILIKE ‘%creative%’ )
)

The forum search method (that I’ve adapted for searching books) for the same string generates the following SQL  (I’ve manually replaced the parameters with the actual string)

OLD    ( TITLE = A && TITLE = B ) or ( CONTENT = A && CONTENT = B )

NEW ( TITLE = A or CONTENT = A ) AND ( TITLE = B or CONTENT = B ) — possibly more inclusive and better

(
 (bc.title ILIKE "%copyright%" ESCAPE E'\\') OR (bc.content ILIKE '%copyright%' ESCAPE E'\\')
 ) AND
 (
 (bc.title ILIKE '%creative%' ESCAPE E'\\') OR (bc.content '%creative' ESCAPE E'\\')
 )

Support for – and +: the existing block supports the following searches

  • “copyright creative” – search for whole strings. Old doesn’t support it. New does. NEW is better
    • old - (( bc.title ILIKE '%"copyright%' AND bc.title ILIKE '%creative"%' ) OR ( bc.content ILIKE '%"copyright%' AND bc.content ILIKE '%creative"%' ) )

      This is actually including the double quotes.

    • new – ((bc.title ILIKE ‘%copyright creative%’ ESCAPE E’\\’)
      OR (bc.content ILIKE ‘%copyright creative%’ ESCAPE E’\\’))This is what is expected from a normal search.
  • copyright +creative – to ensure that it’s a word – new is using “proper” Postgresql approach. Old is using a regular expression kludge. NEW is better
    • old

       (( bc.title ILIKE '%copyright%' AND 
             bc.title ~* '(^|[^a-zA-Z0-9])creative([^a-zA-Z0-9]|$)' ) 
      OR ( bc.content ILIKE '%copyright%' AND 
          bc.content ~* '(^|[^a-zA-Z0-9])creative([^a-zA-Z0-9]|$)' ) )
    • new
      
      ((bc.title ILIKE "%copyright%" ESCAPE E'\\') OR 
           (bc.content ILIKE "%copyright%" ESCAPE E'\\')) 
      AND ((bc.title ~* "[[:<:]]creative[[:>:]]") OR 
               (bc.content ~* "[[:<:]]creative[[:>:]]" ))
  • copyright -creative – has copyright but not creative  – largely the same.
    • old
      (( bc.title ILIKE '%copyright%' AND 
          bc.title !~* '(^|[^a-zA-Z0-9])creative([^a-zA-Z0-9]|$)' ) 
      OR ( bc.content ILIKE '%copyright%' AND 
          bc.content !~* '(^|[^a-zA-Z0-9])creative([^a-zA-Z0-9]|$)' ) )
    • new
      ((bc.title ILIKE "%copyright%" ESCAPE E'\\') OR 
          (bc.content ILIKE "%copyright%" ESCAPE E'\\')) 
      AND (NOT ((bc.title ILIKE "%creative%" ESCAPE E'\\') OR 
           (bc.content ILIKE "%creative%" ESCAPE E'\\')))  
      

 

Explore other changes

 

 

Improve results format

This particular task includes the following sub-tasks

  1. Rewrite the interface using a renderer
    I haven’t used the rendered approach and no bugger all about it.  Might be too much work for now.
  2. Improve the interface.

 

Improve the interface

The current search results look like this

Book search (existing)

The earlier mock up I produced looks like this

004_results
Some possible improvements include:

  • Nest book and chapter titles
    As shown above, the current search interface repeats the name of the book “Copyright and what you can use” for each chapter.  A different interface might be to next book, chapter, and sub-chapters.
  • Include the module/topic name in the hierarchy
    Book’s typically fit within a module/topic. Including that in the search response would likely help the user orient themselves to where the discovered books reside on the broader site.
  • Show some of the matching content.
    Provide a snippet of the content matching the search for each chapter. In much the same way that Google does.

Advanced search form

Tweaking Moodle book search

A couple of weeks ago I gave a presentation showing off some work from the Moodle open Book project. The middle of the presentation was a live demonstration of the Moodle Book and various features. At one point in the presentation members of the audience (including a number of academics who used the Book module in their Moodle sites) gave an audible gasp. This occurred when I showed off the search block for the Book module. A tool that allows the user to search the contents of all the books in a Moodle course. The gasp indicated just how much teachers and students desire this feature. A feature I’ve been calling out for quite some time.

The GitHub repository has been around for 2 years. So why isn’t this block more widely available in Moodle sites? There’s a security flaw in the code and is somewhat unfinished.

The Moodle SQL injection page outlines at least two broad steps that Moodle code should take to prevent a nefarious person from gaining inappropriate access. These are

  1. Using appropriate parameter cleaning mechanisms on data coming from outside of Moodle (e.g. search terms entered into a form).
  2. Using provided Moodle functions to retrieve data from the database (e.g. search the database for content in a Moodle book)

The Moodle book search block currently meets #1, but fails #2.

The following aims to explore and hopefully remedy this problem.

Current status is that some initial changes have been made to a local version of the block that borrows lessons from the forum search.  Need to spend a bit more time on this, but it’s on the way.

The form

The form for the block currently passes the following information.

Added by the block code

  • courseid; and
  • page.

The user is able to enter data into: bsquery

The processing

Apart from standard processing the main searching is done in a function named search which

  • Deals with some apparent differences between flavours of SQL between databases.
    Seeming to point to a problem in how it’s engaging with databases. DOES FORUM SEARCH DO THIS
  • Focuses attention on books the user is allowed to read.
  • Generates strings containing SQL statement
    See below for the format.

    • Supports + and –
  • Uses get_records_sql to retrieve any matches
    Not using placeholders, which is a problem.

A search for “copyright” generates a SQL statement similar to the following

SELECT DISTINCT bc.* FROM mdl_book_chapters bc, mdl_book b 
        WHERE b.course = 12 AND b.id IN (..long list of book ids) 
        AND bc.bookid = b.id AND bc.hidden = 0 AND 
        (( bc.title ILIKE '%copyright%' ) OR 
         ( bc.content ILIKE '%copyright%' ) ) 
        ORDER BY bc.bookid, bc.pagenum

Each word added to the search phrase adds options to the “ilikes”

( bc.title ILIKE ‘%copyright%’ AND bc.title ILIKE ‘%creative%’
AND bc.title ILIKE ‘%commons%’ )

Questions include

  • Can get_records_sql be replaced with get_records
  • Can the database dependency be removed
  • Can placeholders be used

Comparison with forum search

The forum search is an accepted part of Moodle.  Checking how it works might provide some inspiration to copy.

Has a slight be compartmentalised structure

   $forums = forum_get_readable_forums($USER-&gt;id, $courseid);

Has to deal with a lot more complexity than the book search.

Uses $DB->get_in_or_equal

Makes use of a method search_generate_SQL to do as the name suggests.  This is something that should be worked into the Book search block.

Initial testing and that is working.  Dig a bit and it makes sense.  Also seems to tidy up the code a fair bit.

 

Book github tool: producing some HTML5

Following on from the work late last week and the lovely feedback provided by @rolley it’s time to convert some plans into action.

The aim is to modify the under-development Moodle Book github tool so that when it concatenates the chapters from a Moodle Book resource into a single HTML file, the HTML file is structured HTML 5 semantic elements.

An initial version is apparently working. Though it still needs some checking and likely tweaking.

You can view the output on GitHub (at least you can at the moment) as raw HTML.

Side benefits

As part of my testing of the code, I ran the HTML produced by the tool through this free online HTML 5 outliner. The outliner parses the HTML and its semantic elements and constructs an outline of the content. In much the same way that the Moodle Book module currently does via other means.

Here’s what the Moodle Book interface looks like for my test “book”.
Moodle book ToC

It shows the Moodle Book design feature in terms of hierarchy. The Moodle Book only has two levels: chapter and sub-chapter. Won’t do three levels within a single book.

The image below is a screen capture of the output generated by the HTML 5 outliner on the single HTML file produced by the Moodle Book github tool.

Auto outline

There are more than two levels here.

The Untitled Section

First, there’s the “Untitled Section”.  I need to identify where that’s coming from something wrong with:

  1. the HTML 5 my code is producing; or,
  2. the outliner tool.

I tried a few variations to address #1, didn’t work. May need to find another auto-outliner of HTML5.

More levels

HTML 5 outlining doesn’t stop at the use of the section and article elements I’ve added in.  It also makes use of the heading elements and other parts of the HTML.

e.g. the first page in this book (titled “The perils of copyright”) includes a H3 title for a section of that page. The section is titled “How do you ask?”.  Which is the second part of the outline produces by the auto-outliner.

The next heading “It can be embarrassing if you don’t” is a sub-chapter in the Moodle book.  The trouble here is that both are showing up at the same level.  Not quite the behaviour the best fits.

However, there is some potential in this auto generation of the outline. Might be useful for other purposes

Need to

  1. Test out what’s working and what’s not here.
  2. Explore a bit more with what other tools produce HTML5 with semantic elements.
  3. Update the Moodle book github tool so it imports this content.