Category Archives: Education

Topics related to the education sector (usually the tertiary or Higher Education sector).

Mr Grammarly writes a novel

Mr Grammarly, the Grammarly family parrot (a New Zealand Kea). Renowned for its literary abilities and loathing of the passive voice.

Grammarly is a web-based service to support writers. I use it a lot but worry that Grammarly will homogenise the literary voice until we all sound like the Grammarly family’s pet—a parrot named “Mr Grammarly”. 

Grammarly provides advice on correcting punctuation and word use, and on reducing the use of the passive voice (a challenge for anyone taught academic writing before 2005). It can also score clarity, engagement, and tone of delivery.

I find it incredibly useful, and I recommend it to my graduate students and staff. When it works, it’s fabulous, and for me, as a native English speaker, the probability of horrible failure is low. I am not obliged to take all of Grammarly’s suggestions, and I have enough of a sense of the language to know when I can break a rule or when Grammarly is wrong. Non-native English speakers may not have the same advantage and need to work harder to make those decisions. Is the suggested change of word good or bad? Is the rephrased sentence clearer?

I became so sick of reading poorly written student drafts with basic spelling and grammatical mistakes that I began telling my students that if they had not checked the text against Grammarly, I was not interested in reading it. And then I started to receive drafts with bizarre word choices and ill-phrased sentences. I ran the drafts through Grammarly, and they came through with no suggested corrections.

Lesson number one: use Grammarly, but use it with care.

I still had this nagging concern about the homogenisation of the voice, and I decided to test Grammarly against great literature. My guess (let’s call it a hypothesis) was that Grammarly would reduce poetry to blancmange. As a well-trained dust-bowl empiricist, I decided to test it.

I cut-and-pasted the first page of three novels into Grammarly.

F. Scott Fitzgerald’s The Great Gatsby received an overall score of 86. There were six hard-to-read sentences, one suggested rephrasing, and a handful of suggested corrections. The “hard to read sentences” were the most significant challenge because (in the absence of a suggested rephrasing) I needed to keep Fitzgerald’s voice but rewrite. It was easier than I anticipated. Most of the “hard to read sentences” are “hard to read” because they are long: a series of full-stop separable clauses that Fitzgerald separated with semicolons. Grammarly and I could get Fitzgerald up to an overall score of 99, and the literary world rejoiced.

Ernest Hemingway’s The Old Man and the Sea received a very creditable overall score of 92. I thought that his short, terse sentences would give him an edge over Fitzgerald, and I was right. His use of commas, however, needed work. By accepting every change and a minimal loss of poetry (those island boys needed to learn to speak better Grammarly English), I could bring Hemingway up to a perfect score.

Finally, I Grammarly-checked Douglas Stuart’s Shuggie Bain, the Booker Prize-winning novel for 2020. Straight out of the gate, he had an overall score of 99. It was the phrase “leaving him with the thankless task of running his deli counter and her rotisserie stand all alone” that denied him a perfect score. I didn’t think I could do better. Sorry, Mr Stuart. If only Fitzgerald and Hemingway had Grammarly!

Grammarly does have an in-built preference for a particular style of punctuation, the active voice, and short sentences. These three preferences make sense. Grammarly supports readability, and literature is not necessarily about readability. Ask James Joyce! Short sentences are cognitively more straightforward than are long sentences with embedded clauses. The active voice makes it more transparent who did what to whom. Consistent, rule-based punctuation also reduces the cognitive load.

Nonetheless, beyond the use of active sentences and a preference for short sentences, Grammarly is remarkably good at leaving the authorial voice untouched. That was lesson two. We were not all going to sound like the family parrot.

You will be pleased to know that this 666-word piece has a perfect score. I wrote it clearly, the delivery was “just right”, and you found it engaging. I hope the Man Booker Committee will appreciate my 2022 novel written in short, active, well-punctuated sentences.

Advice to junior academics, “Protect your CV!”

[Image: board, teaching, school, university, research. CC0 Public Domain]

Some years ago I was asked to give a staff seminar on “Developing an Academic Career.” The request came at the same time that a number of my colleagues were under significant pressure to undertake more teaching. I suggested in the seminar that the best way to advance an academic career was to allocate a minimum (but sufficient) effort to the teaching, and allocate the greatest effort towards research. The advice was framed as “protect your CV”; i.e., allocate effort to those activities that are truly rewarded within the academic world.

Since giving that seminar, I have been approached by numerous colleagues thanking me for the frank and contrary advice. I know a number of them have since been promoted and attribute that success, in some small part, to pushing back against demands for more teaching, and focusing effort on developing research.

A standard, junior academic appointment is based on a mix of research, teaching and administration. The mix is usually something like 40% Research, 40% Teaching, and 20% Administration. In a rational world this would mean allocating an appropriate amount of time to each kind of task. And in this rational world, promotion and recognition would follow accordingly.

Before I go any further, I’d like you to complete a small exercise. Name half a dozen academics who are world renowned for their teaching. You know the kind of person I mean. She is an academically sound teacher at the top of her game, who can develop curricula, and hold small groups or crowded lecture theatres in the palm of her hand. She is as comfortable in a flipped classroom as she is in a tutorial or a problem-based learning session. This person is not simply a world-class educationalist; her peers, globally, recognise her as such.

Maybe you can name one or even two of these teachers. I can’t name any. Zero. And I suspect that is true for most academics. We all know great teachers. In every university department there are one or two of them who really connect with their students. But they are unknown outside a relatively small circle of staff and students. Herein lies the problem.

World-class universities do not set out to hire world-class teachers, because there is no such thing. They want to hire adequate teachers who are world-class researchers. We know who the world-class researchers are because there are well-recognised (though admittedly flawed) metrics for evaluating this. If you want to develop a strong academic career, weight your effort towards the research and the accepted metrics of success.

I have watched worthy colleagues become suckers to an indifferent departmental system that needs someone (pretty much anyone) in a classroom. They are beseeched, cajoled and bullied to do more teaching than they should because, so the argument goes, it helps out the department. It shows what great team players they are, and it will undoubtedly be recognised and rewarded at some future time in some unspecified way. DO NOT BELIEVE IT!

You should absolutely be a team player, and do your fair share of teaching. You should also appreciate that teaching can be intrinsically rewarding and is an important part of academic life. But universities are flawed organisations that do not have good mechanisms for rewarding and promoting on the basis of teaching performance. Doing more teaching is not rewarded, and your nobility in teaching more to allow others to pursue a full academic career is likely to be a source of later regret.

Join the Q: Chasing journal indicators of academic performance

Universities are predisposed to rank each other (and be ranked) by performance, including research performance. Rankings are not merely about quality; they are about perception. And perception translates nominal prestige into cash through student fees, block grants from government, and research income. As a consequence, there is a danger that universities may chase indicators of prestige rather than thinking about the underlying data that informs the indicator, and what the underlying data might mean for understanding and improving performance.

This image comes from an article published in The Conversation under a Creative Commons license.

Ranking has become so crucial in the life of universities that it infuses the brickwork and is absorbed by us each time we brush along the walls. At one time, when evaluating research performance, an essential metric was the number of publications. That calculus has shifted and it is no longer enough to publish. Now we have to publish in Q1 journals; i.e., journals ranked by impact factor in the top 25%. Those ranked in the next quartile (the 26th to 50th percentile) are Q2, and so forth. Unfortunately, Q-ranking encourages indicator chasing. It has a level of arbitrariness that discourages thoughtful choices about where to publish, and leads to such unhelpful advice as, “publish in more Q1 journals”.

The Q-ranking game was brought home to me in a recent discussion among colleagues in medical education research about where they should publish. In this discussion BMC Medical Education was identified as a poorly ranked journal (Q3) that should not be considered.

I like and entirely approve of publishing research in the very best journals that one can, and encouraging staff to publish in high-quality journals is a good thing. “Best” and “high quality”, however, are not just about the impact factor and the Q-ranking of a journal. The best journal for an article is the journal that can create the greatest impact from the work, in the right area, be that in research, policy, or practice. Personally, I think a highly cited article in a low impact factor journal may be better than a poorly cited paper in a high impact factor journal.

Some years ago I was invited by a government research council to review the performance of a university’s Health Policy Unit. One of my fellow panel members was very focused on the poor ranking of most of the journals into which this unit was publishing. The director of the unit tried to defend the record. She argued that it was more important that the publications were policy-relevant than that they were published in a prestigious journal. The argument was cut down by the research council representative. From the representative’s point of view, the government had to allocate funds, and the journal ranking was an important mechanism for evaluating the return on investment.

I did a quick back-of-the-envelope calculation. It was true, the unit had published in some pretty ordinary journals, with not an article in The Lancet among them. However, if one treated the collection of papers published by the unit as if the unit were a stand-alone journal, the impact factor exceeded that of PLoS Medicine, a highly regarded Q1 journal. My argument softened the opposition to re-funding the unit, but it did not completely deal with it because the research council didn’t care about the individual papers. They wanted prestige, and Q-ranking marked prestige.
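For readers who want to try the same back-of-the-envelope calculation, here is a minimal sketch. It assumes the usual two-year definition of an impact factor (citations received this year to papers from the previous two years, divided by the number of those papers). The function name and the figures are hypothetical, chosen only for illustration; they are not the unit's real record.

```python
def two_year_impact_factor(citations_this_year, papers_prior_two_years):
    """Citations received this year to papers published in the previous
    two years, divided by the number of those papers."""
    if papers_prior_two_years <= 0:
        raise ValueError("need at least one paper in the two-year window")
    return citations_this_year / papers_prior_two_years


# Hypothetical figures for illustration only (not the unit's actual data):
# treat the unit's output as if it were a single stand-alone journal.
unit_if = two_year_impact_factor(citations_this_year=450,
                                 papers_prior_two_years=30)
print(unit_if)
```

The result can then be set beside the published impact factor of any journal you care to compare against.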

So, which medical education journal should you publish in? The advice was blunt. The university uses a Thomson Reuters product, Journal Citation Reports (JCR), to determine the Q-ranking of journals. BMC Medical Education ranks quite poorly (Q3), so don’t publish there. The ranking in this case, however, was based on journals bundled into a comparison pool that JCR calls “Education, Scientific Disciplines”. This comparison pool includes such probably excellent (and completely irrelevant) journals as Physical Review Special Topics-Physics Education Research and Studies in Science Education. However, if one adopts the “Social Sciences, General” pool of comparison journals, which JCR also reported, BMC Medical Education jumps from a Q3 to a Q1 journal. And this raises the obvious question: what is the true ranking of BMC Medical Education?
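The dependence of the Q-rank on the comparison pool is easy to demonstrate. The sketch below is a simplified illustration, not JCR's actual procedure: it ranks the journals in a pool by impact factor and splits them into four equal bands. The journal names and impact factors are invented for the example, and ties are not handled.

```python
import math


def q_rank(journal, pool):
    """Return the quartile (1 to 4) of `journal` within `pool`.

    `pool` maps journal name -> impact factor. Journals are ordered from
    highest to lowest impact factor; the top 25% are Q1, the next 25% Q2,
    and so on.
    """
    ranked = sorted(pool, key=pool.get, reverse=True)
    position = ranked.index(journal) + 1  # 1 = best in the pool
    return math.ceil(4 * position / len(ranked))


# Hypothetical pools: the same journal ("Journal X", impact factor 1.5)
# sits among stronger journals in pool A and weaker ones in pool B.
pool_a = {"A1": 5.0, "A2": 4.0, "A3": 3.0, "A4": 2.5,
          "A5": 2.0, "Journal X": 1.5, "A7": 1.0, "A8": 0.5}
pool_b = {"B1": 2.0, "Journal X": 1.5, "B3": 1.2, "B4": 1.0,
          "B5": 0.9, "B6": 0.8, "B7": 0.6, "B8": 0.4}

print(q_rank("Journal X", pool_a))  # Q3 in pool A
print(q_rank("Journal X", pool_b))  # Q1 in pool B
```

Nothing about Journal X changed between the two calls; only the company it keeps.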

The advice about where to publish explicitly dismissed an alternative source for the Q-ranking of journals, Scimago Journal Ranking (SJR), because it was too generous, with the implication that “generous” meant “not as rigorous”. In fact, it appears that the difference between SJR and JCR is about the pool of journals used for the comparison. Both SJR and JCR treat the pool of journals against which the chosen journal should be compared as a relatively static one. But it is not. The pool against which the journal should be compared (assuming one should do this at all) is dependent on the kind of research being reported and the audience. Consider potential journals for publishing a biomedical imaging paper. The Q-ranking pool could be (1) general medical journals, (2) journals dealing with medical imaging, (3) radiology journals, (4) radiography journals, or (5) some more refined subset of journals. As with BMC Medical Education, the Q-rank of prospective journals could be quite different in each pool.

One might reasonably, and rhetorically, ask: did the value of the science or the quality of the work change because the comparison pool changed? This leads to a small thought experiment. Imagine a world in which every journal below Q1 suddenly disappeared. The quality of the remaining journals has not changed, but three-quarters of them are suddenly Q2 and below. (As an aside, this is reminiscent of the observation that half of all doctors are below average.)
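The thought experiment can be run in a few lines. This is a toy sketch under stated assumptions: sixteen hypothetical journals with distinct, made-up impact factors, and quartiles assigned by simple rank position.

```python
import math


def quartiles(impact_factors):
    """Pair each impact factor with its quartile (1 to 4), where the
    quartile is determined purely by rank position within the list."""
    ranked = sorted(impact_factors, reverse=True)
    n = len(ranked)
    return [(value, math.ceil(4 * (i + 1) / n))
            for i, value in enumerate(ranked)]


# Sixteen hypothetical journals with impact factors 16, 15, ..., 1.
before = quartiles(range(1, 17))

# Every journal below Q1 disappears; only the old Q1 journals survive.
survivors = [value for value, q in before if q == 1]

# Re-rank the survivors among themselves.
after = quartiles(survivors)
still_q1 = [value for value, q in after if q == 1]

print(survivors)  # the four journals that used to be Q1
print(after)      # the same journals, now spread across Q1 to Q4
```

Only one of the four survivors keeps its Q1 label, although none of them published a different paper.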

If a researcher works in a single discipline, learning which are the preferred journals to publish in becomes second nature. If one works across disciplines, then the question is not as clear. The question is no longer, “which are the highest ranked journals?”, but “which are the highest ranked journals given this article, and these possible disciplinary choices?”. If the question is, “which journal will have the most significant impact of the type that I seek?”, then Q-ranking is only relevant if the outcome sought is to publish in a Q1 journal from a particular comparison pool. If one seeks some other kind of impact, like policy relevance or change in practice, then the Q-rank may be of no value.

Indicators of publishing quality should not drive strategy. Strategy should be inspired by a vision of excellence and institutional purpose. If you want an example of how chasing indicators can have a severe and negative impact, have a look at this (Q1!!!!) paper.


Prevalence of sexual assault at Australian Universities is … non-zero.

A few days ago the Australian Human Rights Commission (AHRC) launched Change the course, a national report on sexual assault and sexual harassment at Australian universities led by Commissioner Kate Jenkins. Sexual assault and sexual harassment are important social and criminal issues, and the AHRC report is misleading and unworthy of the gravity of the subject matter.

It is a statistical case study in “how not to.”

The report was released to much fanfare, receiving national media coverage including TV and newspapers, and a quick response from universities. “At a glance …” the report highlights among other things:

  • 30,000+ students responded to the survey — remember this number, because (too) much is made of it.
  • 21% of students were sexually harassed in a university setting.
  • 1.6% of students were sexually assaulted in a university setting.
  • 94% of sexually harassed and 87% of sexually assaulted students did not report the incidents.

From a reading of the survey’s methodology, any estimates of sexual harassment/assault should be taken with a shovel-full of salt and should generate no response other than that of the University of Queensland’s Vice-Chancellor, Peter Høj: that any number greater than zero is unacceptable. What we did not have before the publication of the report was a reasonable estimate of the magnitude of the problem and, notwithstanding the media hype, we still don’t. The AHRC’s research methodology was weak, and it looks like they knew the methodology was weak when they embarked on the venture.

Where does the weakness lie?  The response rate!!!

A sample of 319,252 students was invited to participate in the survey. It was estimated at the design stage that between 10% and 15% of students would respond (i.e., 85-90% would not respond) (p.225 of the report). STOP NOW … READ NO FURTHER. Why would anyone try to estimate prevalence using a strategy like this? Go back to the drawing board. Find a way of obtaining a smaller, representative sample of people who will respond to the questionnaire.

Giant samples with poor response rates are useless. They are a great way for market research companies to make money, but they do not advance knowledge in any meaningful way, and they are no basis for formulating policy. The classic example of a large sample with a poor response rate misleading researchers was the Literary Digest poll to predict the outcome of the 1936 US presidential election. They sent out 10 million surveys and received 2.3 million responses. By any measure, 2.3 million responses to a survey is an impressive number. Unfortunately for the Literary Digest, there were systematic differences between responders and non-responders. The Literary Digest predicted that Alf Landon (Who?) would win the presidency with 69.7% of the electoral college votes. He won 1.5% of the electoral college votes. This is a lesson about the US electoral college system, but it is also a significant lesson about non-response bias. The Literary Digest had a 77% non-response rate; the AHRC had a 90.3% non-response rate. Who knows how the 90.3% who did not respond compare with the 9.7% who did respond? Maybe people who were assaulted were less likely to respond and the number is a gross underestimate of assaults. Maybe they were more likely to respond and it is a gross overestimate of assaults. The point is that we are neither wiser nor better informed for reading the AHRC report.
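To see how far non-response bias alone can move an estimate, consider the following sketch. The true prevalence and the response probabilities are invented for illustration and have nothing to do with the AHRC data; the only thing borrowed from the report is an overall response rate of roughly 9.7%.

```python
def observed_prevalence(true_prevalence, response_if_affected,
                        response_if_unaffected):
    """Prevalence among responders when affected and unaffected people
    respond to the survey with different probabilities."""
    affected = true_prevalence * response_if_affected
    unaffected = (1 - true_prevalence) * response_if_unaffected
    return affected / (affected + unaffected)


def overall_response_rate(true_prevalence, response_if_affected,
                          response_if_unaffected):
    """Share of everyone invited who responds."""
    return (true_prevalence * response_if_affected
            + (1 - true_prevalence) * response_if_unaffected)


TRUE_PREVALENCE = 0.05  # hypothetical "truth", for illustration only

# Each scenario gives an overall response rate of about 10% or less,
# yet the estimate among responders differs widely.
scenarios = {
    "equal response":         (0.10, 0.10),
    "affected respond more":  (0.20, 0.092),
    "affected respond less":  (0.04, 0.10),
}
for label, (r_affected, r_unaffected) in scenarios.items():
    rate = overall_response_rate(TRUE_PREVALENCE, r_affected, r_unaffected)
    estimate = observed_prevalence(TRUE_PREVALENCE, r_affected, r_unaffected)
    print(f"{label}: response rate {rate:.1%}, estimate {estimate:.1%}")
```

With the same hypothetical truth of 5%, the survey reports roughly 2%, 5% or 10% depending on who chose to answer, and the response rate by itself does not tell you which world you are in.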

Sadly, whoever estimated the (terrible) response was, even then, overly optimistic. The response rate was significantly lower than the worst-case scenario of 10% [Response Rate = 9.7%, 95%CI: 9.6%–9.8%].
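The confidence interval around the response rate can be checked with a standard normal-approximation (Wald) interval for a proportion. This is one plausible way to arrive at the figures, sketched here from the rounded 9.7% response rate and the 319,252 students invited; the function name is my own.

```python
import math


def wald_interval(proportion, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.
    z = 1.96 corresponds to a 95% interval."""
    standard_error = math.sqrt(proportion * (1 - proportion) / n)
    return proportion - z * standard_error, proportion + z * standard_error


# 9.7% response rate among the 319,252 students invited.
low, high = wald_interval(0.097, 319_252)
print(f"{low:.1%} to {high:.1%}")  # 9.6% to 9.8%
```

The interval is tight because the sample is enormous, and its upper bound still sits below the 10% worst case. A narrow interval around the response rate says nothing, of course, about the accuracy of the prevalence estimates.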

In sharp contrast to the bad response rate of the AHRC study, the Crime Victimisation Survey (CVS) 2015-2016, conducted by the Australian Bureau of Statistics (ABS), had a nationally representative sample and a 75% response rate, fully completed! That’s a survey you could actually use for policy. The CVS is a potentially less confronting instrument, which may account for the better response rate. It seems more likely, however, that recruiting students by sending them emails is neither sophisticated enough nor adequate.

Poorly conducted crime research is not merely a waste of money; it trivialises the issue. The media splash generates an illusion of urgency and seriousness, and the poor methodology means it can be quickly dismissed.

If there is a silver lining to this cloud, it is that the AHRC has created an excellent learning opportunity for students involved in quantitative (social) research.


It was pointed out to me by Mark Diamond that a better ABS resource is the 2012 Personal Safety Survey, which tried to answer the question about the national prevalence of sexual assault. A Crime Victimisation Survey is likely to receive a better response rate than a survey looking explicitly at sexual assault. I reproduce the section on sample size from the explanatory notes because it highlights the difference between a well-conducted survey and the pile of detritus reported by the AHRC.

There were 41,350 private dwellings approached for the survey, comprising 31,650 females and 9,700 males. The design catered for a higher than normal sample loss rate for instances where the household did not contain a resident of the assigned gender. Where the household did not contain an in scope resident of the assigned gender, no interview was required from that dwelling. For further information about how this procedure was implemented refer to Data Collection.

After removing households where residents were out of scope of the survey, where the household did not contain a resident of the assigned gender, and where dwellings proved to be vacant, under construction or derelict, a final sample of around 30,200 eligible dwellings were identified.

Given the voluntary nature of the survey a final response rate of 57% was achieved for the survey with 17,050 persons completing the survey questionnaire nationally. The response comprised 13,307 fully responding females and 3,743 fully responding males, achieving gendered response rates of 57% for females and 56% for males.