A couple of weeks ago I wrote this post outlining the design of an Excel spreadsheet EDC3100 students were asked to use for their first assignment. They’ll be using it to evaluate an ICT-based lesson plan. The assignment is due Tuesday and ~140 have submitted so far. It’s time to develop the code that’s going to help me analyse the student submissions.
The aim is to have a script that will extract each students responses from the spreadsheet they’ve submitted and place those responses into a database. From there the data can be analysed in a number of ways to help improve the efficiency and effectiveness of the marking process; and, explore some different practices (the earlier post has a few random ideas).
The script I’m working on here will need to
- Be given a directory path containing unpacked student assignment submissions.
- Parse the list of submitted files and identity all the spreadsheets
- Exclude those spreadsheets that have already been placed into the database.
Eventually this will need to be configurable.
- For all the new spreadsheets
- Extract the data from the spreadsheet
At this stage, I don’t need to stick the data in a database.
Code that when given a directory will extract the spreadsheet names Match the filename to a student #id. Parse an individual Excel sheet Rubric About What How Evaluation RAT
Mechanism to show the values associated with question number in the sheet.
Look at a literal data structure.
Implement a test sheet
- See which student files will give me problems.
Extract spreadsheet names
This is where the “interesting” naming scheme by the institutional system will make things interesting. The format appears to be
- SURNAME Firstname
Matches the name of the student with the provided case (e.g. “JONES David”)
Appears to be the id for this particular assignment submission.
Is a constant, there for all files.
Is the name of the file the student used on their computer. It appears likely that some students will have been “creative” with the naming schemes. Appears at least one student has a file name something.xlsx.docx
Match the filename to a student id
This is probably going to be the biggest problem area. I need to connect the file to an actual unique student id. The problem is that the filename doesn’t contain a unique id that is associated with the student (e.g. the Moodle user id for the student, or the institutional student number). All it has is the unique id for the submission.
Hence I need to rely on matching the name. This is going to cause problems if there are students with the same name, or students who have changed their name while the semester is under way. Thankfully it appears we don’t currently have that problem.
Test with 299 submitted files
Assignment due this morning – let’s test with the 299 submitted files.
Ahh, issues with people’s names: apostrophe
Apparently 18 errors out of 297 files. Where did the other 2 go?
“Bad” submissions include
- 10 with only 1 file submitted;
All 10 only submitted the checklist. Not the cover sheet or the lesson plan.
- 26 with only 2 files submitted (3 total required)
- 25 – Didn’t submit the lesson plan
- 1 – Didn’t submit the checklist
- 0 – Didn’t submit the coversheet
- 18 files that appear to have the bad xlsx version problem from below.
That implies that some of the people who submitted 3 files, didn’t submit an excel file?
Oh, quite proud in a nerdy, strange way about this
for name in `ls | cut -d_ -f2 | sort | uniq -c | sort -r | grep ' 3 ' | sed -e '1,$s/^.*[0-9] //'` do files=`ls *$name*` echo $files | grep -q ".xls" if [ $? -eq 1 ] then echo "found $name" fi done
I’m assuming there will be files that can’t be read. So what are the problems.
Seem they are all down to Microsoft’s “Composite Document File V2 Format”. These files will open in Excel, but challenge the Perl module I’m using.
Out of the 297 submitted so far, 18 have this problem. Going to leave those for another day.