Insights into the usefulness of machine grammar/spell checkers for improving research writing

A preliminary investigation comparing five machine language checkers and one human copy editor on detected errors and readability

Introduction

Grammar and spell check tools are improving every year with advances in natural language processing and machine learning, but how effective are they? We need to know what they are good at, and what they are not, to really understand their usefulness – especially for research writers who are non-native speakers of English.

In a comparative analysis of four information-dense sentences from a research manuscript written by a non-native English speaker, the revisions of a human copy editor (with more than 17 years of experience in academic research, publishing and copyediting) were compared with the suggestions of machine grammar/spell checkers. Many important mistakes (“necessary” revisions) and writing issues that affect readability were not found by the machine grammar/spell checkers (see Figure 1). This article examines these results to better understand the strengths and limitations of machine grammar/spell checkers, and also of human copy editing.

Figure 1. Comparing and evaluating the revisions made by human and machine language editors. 

English as an international language, or lingua franca (ELF), means that every day hundreds of millions of non-native speakers (NNSs) use English for written communication: emails, text messages, social media posts, report writing and much more. Machine editing tools, like the spell and grammar checkers in Google Docs and Microsoft Word, are very useful and can really help non-native writers improve the quality of their writing, acting both as assistants that identify potential writing problems and as teachers that help the writer learn.

Grammar/spell checker tools

A grammar/spell checker is a software tool that checks a written text for spelling and grammatical mistakes, punctuation problems, and issues related to sentence structure. Several of these tools are available, the most common being those built into Microsoft Word and Google Docs, the commercial tools Ginger and Grammarly, and a newer one called Trinka that focuses on academic style. These five grammar/spell checkers will be compared in this article (Ginger, Grammarly and Trinka all offer premium versions for monthly subscription fees, but here we will compare the free versions).

Grammar/spell checkers have a long history that started in the 1960s with spell checkers that used dictionaries as the comparison source. More recently, neural network-based tools also evaluate tone, style, and semantics (word use) to help improve the quality of the writing.

The language checking tools give a visual indication of spelling and grammar errors by highlighting or underlining them in different colors; Microsoft Word, for example, uses red for spelling and blue for grammar. Then, when the cursor is placed over the highlighted parts, the tool offers suggestions to correct those errors. Certain tools, like Grammarly, display a suggested corrected version with the original shown as strikeout text in an appropriate color (Figure 2). When explanations are offered, they are usually phrased with hedges, like “it seems that …” or “it appears that …” (again, see Figure 2).

Figure 2. An example of Grammarly identifying possible errors, giving a correction, and explanation.

These days, the most common types of NNS English writing are emails (business) and social media posts (personal). These texts are often short and follow similar structures. The grammar is typically simple and the tone more conversational, informal and personal because the text is directed to colleagues, contacts or social media “friends”.

In most of these everyday writing tasks, the message is simpler and the style and tone more conversational, so we tend to pay attention only to basics like spelling and grammar. For these texts, current automated language checkers are at their most useful.

Types of errors identified by grammar/spell checkers

Figure 3 below gives examples of these error types: misspellings (“mispelling” → “misspelling”), wrong word choice identified from context (effect → affect), punctuation (comma use), and grammar mistakes involving verb forms (will showed → will show) and constructions (not only … but also).

Figure 3. Common types of simple errors caught by grammar and spell checkers (Devopedia, 2021).

By using various rule-based, natural language processing and machine learning approaches, grammar/spell checkers are programmed to identify five main types of error: sentence structure, spelling, syntax (grammar), punctuation and semantics (word use). Here is a brief description of these five error types with examples (see also Figure 4 below).

  • Sentence Structure: Traditionally, rule-based approaches with a training database that assigns a part of speech (POS) to each word can identify whether the parts of speech are ordered incorrectly.

Example: “she began to singing” → “began to sing” or “began singing”.

Grammar/spell checkers can often identify a dependent clause without a main clause (sentence fragment), a sentence with two or more clauses joined without a conjunction (run-on sentence), a sentence missing a subject, and other structural errors.

  • Syntax Error: Grammar/spell checkers are programmed with grammar rules and can detect violations of them, such as subject-verb agreement errors, a wrong or missing article or preposition, a verb tense or verb form error, or a noun number error.

Example: “The elderly man often go to the park” → “often goes”.

  • Punctuation Error: This is for missing, unnecessary or wrongly placed punctuation, like commas, semi-colons, periods, exclamation marks, question marks, etc.

Example: “The results aim to assess the cause, effect and impact in …” → “cause, effect, and impact …”.

  • Spelling Error: By comparing words against a reference dictionary, words not found in the dictionary can be matched with similarly spelled words to suggest corrections.

Example: “He could not concieve the consequences” → “conceive”.

  • Semantic Error: Based on a training database, natural language processing approaches can statistically identify whether a word is wrongly used in a given context.

Example: “I am going to the library to buy a book” → “bookstore to buy a book”.

Rule-based approaches typically can’t handle semantic errors; these require statistical or machine learning approaches, which can also flag the other types of errors. Often a combination of approaches leads to a good solution; the sketch below illustrates two of the simplest techniques.

Figure 4. Types of grammar and spelling errors (Soni & Thakur, 2018, cited in Devopedia, 2021).
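To make these approaches concrete, here is a minimal, self-contained Python sketch of two classic techniques described above: dictionary lookup with similarity matching for spelling errors, and a hard-coded POS-sequence rule for a sentence structure error. The tiny dictionary and POS lexicon here are invented stand-ins for the large training databases real checkers use; this is an illustration, not any tool’s actual implementation.

```python
# A toy rule-based checker: dictionary spell check + one POS-pattern rule.
import difflib

# Stand-in reference dictionary (real tools use large wordlists).
DICTIONARY = {"he", "could", "not", "conceive", "the", "consequences",
              "she", "began", "to", "sing", "singing"}

# Tiny hand-made POS lexicon standing in for a trained tagger.
POS = {"she": "PRON", "began": "VBD", "to": "TO", "singing": "VBG", "sing": "VB"}

def spell_check(words):
    """Flag words missing from the dictionary and suggest close matches."""
    for w in words:
        if w.lower() not in DICTIONARY:
            suggestions = difflib.get_close_matches(w.lower(), DICTIONARY, n=3)
            print(f"Spelling: '{w}' -> suggestions: {suggestions}")

def structure_check(words):
    """Flag the POS sequence TO + VBG ('to singing'), a structure error."""
    tags = [POS.get(w.lower(), "UNK") for w in words]
    for i in range(len(tags) - 1):
        if tags[i] == "TO" and tags[i + 1] == "VBG":
            # Suggest the article's two corrections: "to sing" or "singing".
            print(f"Structure: 'to {words[i+1]}' -> "
                  f"'to {words[i+1][:-3]}' or '{words[i+1]}'")

spell_check("He could not concieve the consequences".split())
structure_check("she began to singing".split())
```

Running this prints a suggestion of “conceive” for “concieve” and flags “to singing”, mirroring the examples above.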

What about complicated texts?

The above five classes of errors may cover a lot of the writing issues in simpler texts like emails and messages. But what about more complicated texts written for a public audience?

For these texts, we need to be more careful, and a lot more thoughtful.

When texts get longer – reports, memorandums, articles – the ideas increase in number and complexity, often requiring careful sequencing and support to clearly inform or persuade the reader. Making the writing readable and smooth is the big challenge: breaking down complex ideas into simpler parts and connecting them smoothly and logically.

This means we need to spend more time planning, writing, and revising.

The writing of complicated texts depends largely on context, where context means the purpose of the text and its audience. Here, we also need to consider writing/reading issues more complicated than correctness (word use and grammar) and mechanics (punctuation, spelling, capitalization), such as:

  • Coherence: how ideas are logically organized in the text at the paragraph or whole text levels
  • Cohesion: how ideas stick together within or between sentences
  • Reader-writer interaction: how words and grammar can be used to influence the reader 
  • Style and register: academic/business/diplomatic style and formal/informal register.

These four issues help make a text more readable. While they sometimes operate within the sentence, they more often apply across sentences within a paragraph or across paragraphs. Unfortunately, grammar/spell checkers are most successful at identifying issues within a single sentence.

So, how helpful are grammar/spell checkers for these complicated texts?

Technical writing: grammar + spelling + more …

In my line of work as a research writer and copy editor (language editor) for scientific research manuscripts, I put a lot of time and care into making sure that sentences and paragraphs are not only free of errors but also clear and efficient in expression. Professional writers – especially technical writers – do this to ease the processing effort of the reader, which maximizes the uptake of the author’s message. This is readability.

Optimizing readability involves linguistic strategies like ensuring that

  • ideas follow a certain logical sequence,
  • supporting ideas are placed in a dependent clause,
  • contextual information is placed at the beginning of a sentence, or
  • the main idea appears at the end of the sentence for maximum impact.

These are examples of grammar and text manipulation that help the reader know what is important in the text.

Of course, not every sentence is of equal weight in a report or article. There are always some paragraphs or sentences that are key to the purpose of the text. These parts of the text in particular require careful thinking and expression.

In a recent copyediting case, a computer science research paper written by an NNS researcher, I spent a lot of time on the Abstract and the first paragraph of the Introduction. I noticed that key sentences in research papers need to be as clear and persuasive as possible, and that many NNS writers struggle with these types of sentences. I also started to wonder whether grammar/spell checkers can be useful tools to help NNSs become more independent of human copy editors.

The analysis: key sentences from a research paper

The two text sections are listed below; the first came from the Abstract, and the second from the first paragraph of the Introduction. Both of these sections are extremely important in promoting the research because they serve to convince researchers to read and use it – which is, after all, the main point of publishing research.

Two key sentences from the Abstract introducing the new system:

(1) The proposed system deploying HC AI using explainable knowledge and causations shows that our approach has outperformed the sole statistical approach by further considering meta parametric HC ethical priority, compared to the baselines in the simulated game theory environments of Deep Reinforcement Learning. (2) The experimental results aim to assess the cause, effect and impact in the multi-agent of heterogeneity in the DRL environments for general, natural and significant causal learning representations efficiently and effectively.

Two key sentences from the Introduction pointing out the need for this research:

(3) The needs of HCC and AI have significantly become important magnitudes to estimate the modern AI systems. (4) Especially, many AI systems have mainly focused on how to design and implement the system architecture for attaining the convergent goal systematically and expeditiously, but regardless of the effectiveness of human’s high-level adoption, such as heterogeneous agents, ethical transparency and safety, critical saliency and conditional situations of HC AI modeling mechanism.

The first thing you will notice about these sentences is that they are full of unfamiliar and highly technical terms and concepts. The grammar is also quite dense, and some of the sentences are quite long, which makes it even more difficult for the reader to understand and process the information.

However, in terms of their importance, the two sections are pivotal for communicating key messages about the research. The two sentences from the Abstract (1, 2) describe the AI system that will be introduced and promoted in the research paper. The two sentences from the Introduction (3, 4) are its first two and immediately set the tone for the paper: why there is a need for the proposed AI system. In short, since these two sections play an important role in influencing the reader, they require especially careful thinking and expression.

Caveat

Please note that the following analysis of five machine language checkers is based on only four sentences. It is not a thorough and carefully controlled analysis targeting many different types of word and grammar mistakes. I was simply curious to see to what extent the machine language checkers could analyze these pivotal and strategic sentences and identify 1) any mistakes and 2) any readability issues. I was also wondering how many misidentified mistakes, or false positives, they might produce.

Rating the correction suggestions

When analyzing the revision suggestions from the grammar/spell checkers, I assigned each one to one of four categories (see Table 1):

  • Necessary: If the grammar/spell checker identified a real error
  • Helpful: not an error, but can improve readability
  • Not necessary: the suggested revision would make no difference
  • Mistake: incorrectly assumed an error when there was none, or “false positive”.

The last category is the most troubling, because for NNSs whose knowledge of English is not very solid or complete, such a suggestion could introduce a mistake where the original was actually correct. For this reason, I focused on the “mistake score”, calculated as (# mistakes) / (total revision suggestions).
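Since the mistake score is just a ratio, it is trivial to compute. As a small illustration, here it is calculated in Python from the suggestion counts reported later in this article (Google Docs is omitted because it made zero suggestions, leaving the score undefined):

```python
# Mistake score = (# mistakes) / (total revision suggestions),
# using the suggestion counts reported in Tables 2, 5a, and 8.
counts = {            # tool: (mistakes, total suggestions)
    "MS Word":   (0, 1),
    "Ginger":    (4, 4),
    "Grammarly": (1, 6),
    "Trinka":    (2, 6),
}
for tool, (mistakes, total) in counts.items():
    print(f"{tool}: mistake score = {mistakes}/{total} = {mistakes/total:.2f}")
```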

Table 1. Correction categories and examples

Category: Necessary
Meaning: The original was incorrect and needs to be corrected.
Example and suggestion: “The needs of HCC and AI have significantly become important magnitudes to estimate [Gr5] the modern AI systems.” [Gr5] the modern AI systems → modern AI systems
Language issue and explanation: Definite article “the”: this suggestion is necessary because “modern AI systems” is a general noun class, so we cannot use “the”. Grammarly knew that “modern AI” were not simply adjectives specifying the noun “systems”; rather, “modern AI systems” is a complete noun phrase to be treated as a whole.

Category: Helpful
Meaning: Improves processing and readability, but is not necessary.
Example and suggestion: “The experimental results aim to assess the A, [MsW1] B and C …” [MsW1] B and C → B, and C
Language issue and explanation: Oxford comma: the added comma helps the reader group the A, B, C series together more quickly, so it helps the reader process the information.

Category: Not necessary
Meaning: The suggested revision would make no difference.
Example and suggestion: “The proposed system deploying HC AI using explainable knowledge and [Tr1] causations shows that our approach has outperformed …” [Tr1] causations → causation
Language issue and explanation: Count vs non-count nouns: although “causation” is usually a non-count noun, in this technical sense causation can be put into the plural form.

Category: Mistake
Meaning: False positive – the tool incorrectly assumed an error when there was none.
Example and suggestion: “The proposed system deploying HC AI [Gi1] using explainable knowledge and causations shows that our approach has outperformed …” [Gi1] using → uses
Language issue and explanation: Verb form: in this sentence the main verb is “shows”, not “using”, because “using” is a reduced form of “which uses …”.

Microsoft Word vs Google Docs

Let’s start with a writer’s first line of defense against mistakes: MS Word and Google Docs, the word processing applications writers use for extended texts. Table 2 shows that Microsoft Word found one issue in total and Google Docs found none. The Microsoft Word issue involves the so-called Oxford comma, used before “and” in a list of three or more items (a, b, and c). However, this is not a hard-and-fast grammar rule (contrary to the feelings of vocal proponents of the Oxford comma), and it is surprising that another instance of the missing Oxford comma later in the same sentence was not flagged.

Table 2. Comparing Microsoft Word and Google Docs on two sections of research texts

Text section 1, sentences (1)–(2): The proposed system deploying HC AI using explainable knowledge and causations shows that our approach has outperformed the sole statistical approach by further considering meta parametric HC ethical priority, compared to the baselines in the simulated game theory environments of Deep Reinforcement Learning. The experimental results aim to assess the cause, [MsW1] effect and impact in the multi-agent of heterogeneity in the DRL environments for general, natural and significant causal learning representations efficiently and effectively.
Microsoft Word (SCORE: 1/1): [MsW1] Punctuation: add a comma: “effect, and”
Google Docs: “Document looks good”
Evaluation and explanation: [MsW1] HELPFUL: “Oxford comma” – but this is not necessarily a hard-and-fast rule

Text section 2, sentences (3)–(4): The needs of HCC and AI have significantly become important magnitudes to estimate the modern AI systems. Especially, many AI systems have mainly focused on how to design and implement the system architecture for attaining the convergent goal systematically and expeditiously, but regardless of the effectiveness of human’s high-level adoption, such as heterogeneous agents, ethical transparency and safety, critical saliency and conditional situations of HC AI modeling mechanism.
Microsoft Word: “Total suggestions: 0”
Google Docs: “Document looks good”
Evaluation and explanation: (none)

I often use Google Docs when writing first drafts of almost everything, and it almost always catches my spelling and grammar typos, so I do not want to suggest that Google Docs or Word should not be used. However, many of the mistakes NNSs make may not be found so easily by these grammar/spell checkers.

Ginger vs Grammarly vs Trinka

Now, let’s compare Grammarly, Ginger and Trinka – the commercial applications that are expected to outperform the language checking tools built into word processors.

Ginger and Grammarly are the two most popular and commercially successful grammar/spell checkers. They both started at around the same time, Ginger in 2007 and Grammarly in 2009, and their logos are very similar with a stylized white “G” inside a green circle.

In 2020, Crimson Interactive released Trinka, a grammar/spell checker designed specifically for academic writing. What sets Trinka apart is that it takes into account academic style guides as well as technical spelling and phrases (see Table 3).

Table 3. Features of Trinka compared to other grammar/spell checking tools (Trinka, 2021)


“Errors” and error analysis

In a white paper on Trinka, Crimson (2020) shared the results of a comparison study with Grammarly based on 258 sentences, which were also evaluated by an experienced professional academic copy editor. In total, there were 437 errors in the 258 sentences, with Trinka achieving a higher accuracy rate (51%) than Grammarly (46%). More specifically, Trinka offered more correct (51% vs 46%) and fewer incorrect (19.7% vs 22%) suggestions, and also had fewer missed suggestions, though the missed numbers are worryingly high for both tools at about 50% (Table 4).

Table 4. Comparison of Trinka and Grammarly (from the creators of Trinka, Crimson [2020])

However, it is not clear how the experienced copy editor in the Trinka study defined the 437 “errors”. My guess is that “necessary”, “helpful”, and perhaps even “possible but not necessary” suggestions were all counted as “errors”. In my analysis, I make a clear distinction between suggestions that are “necessary” and identify clear errors or mistakes, those that are helpful for improving readability, and those that are possible but not necessary.

Ranking: #1 Grammarly, #2 Trinka, #3 Ginger

In Table 5a, I have compared the suggestions from the free versions of the three commercial grammar/spell checkers; in Table 5b, I have evaluated their suggestions. We should keep in mind that the sentences are quite dense, both in terms of technical phrases/concepts and grammar. They are difficult enough for humans to process, let alone automated AI tools. The algorithms behind these tools are clearly different because they identified different issues. Ginger found only four issues compared to six each from Grammarly and Trinka, and all four were incorrect; Grammarly made only one mistake and Trinka two.

Table 5a. Comparison of Ginger, Grammarly, and Trinka grammar/spell checkers on 2 sections of research texts

Sample texts (with suggestion markers):

(1) The proposed system deploying HC AI [Gi1] using explainable knowledge and [Tr1] causations [Gr1] shows that our approach has outperformed the sole statistical approach by further considering [Gr2][Tr2] meta parametric HC ethical priority, compared to the baselines in the simulated game theory environments of Deep [Tr3] Reinforcement Learning. (2) The experimental results [Tr4] aim to assess the cause, effect [Gr3] and impact [Gi2][Tr5] in the multi-agent of heterogeneity in the DRL environments [Gi3] for general, natural [Gr4] and significant causal learning representations efficiently and effectively.

(3) The needs of HCC and AI have significantly become important magnitudes to estimate [Gr5] the modern AI systems. (4) Especially, many AI systems have mainly focused on how to design and implement the system architecture for attaining [Tr6] the convergent goal systematically and expeditiously, but regardless of the effectiveness of human’s high-level adoption, such as heterogeneous agents, ethical transparency [Gr6] and safety, critical [Gi4] saliency and conditional situations of HC AI modeling mechanism.

Ginger (MISTAKE SCORE: 4/4):
[Gi1] using → uses
[Gi2] in → on
[Gi3] for → in
[Gi4] Didn’t recognize “saliency”

Grammarly (MISTAKE SCORE: 1/6):
[Gr1] Grammar: shows → show
[Gr2] Spelling: meta parametric → meta-parametric
[Gr3] Punctuation: add comma → “, and”
[Gr4] Punctuation: add comma → “, and”
[Gr5] Grammar: the modern → modern
[Gr6] Punctuation: add comma → “, and”

Trinka (MISTAKE SCORE: 2/6):
[Tr1] Spelling: causations → causation (“Please check the spelling”)
[Tr2] Grammar: meta parametric → meta-parametric (“Delete the space between ‘meta’ and ‘parametric’ or add a hyphen between them.”)
[Tr3] Grammar: Reinforcement Learning → reinforcement learning (“Replace ‘Reinforcement Learning’ with ‘reinforcement learning’.”)
[Tr4] Enhancement: aim to assess → assess (“Conciseness tip: Consider deleting ‘aim to’ and changing ‘assess’ as shown here.”)
[Tr5] Enhancement: in → on (“Replace ‘in’ with ‘on’.”)
[Tr6] Grammar: the → a (“Replace ‘the’ with ‘a’.”)

Table 5b. Comparing suggestion evaluations for Ginger, Grammarly, and Trinka  

Ginger (MISTAKE SCORE: 4/4):
[Gi1] MISTAKE: “uses” is not the main verb, which is “shows”
[Gi2] MISTAKE: matched “on” with “impact on”, but this is not applicable to the whole series with “cause” and “effect”
[Gi3] MISTAKE: seemed to incorrectly assume there should be an “in general” phrase
[Gi4] MISTAKE: “saliency” is a technical term

Grammarly (MISTAKE SCORE: 1/6):
[Gr1] MISTAKE: assumed “causations” to be the subject, but it is “system”
[Gr2] HELPFUL: the hyphen can make the phrase faster to process
[Gr3], [Gr4] HELPFUL: the Oxford comma is possible and makes the series faster to process
[Gr5] NECESSARY: general noun class, so no “the”
[Gr6] HELPFUL: Oxford comma (see above)

Trinka (MISTAKE SCORE: 2/6):
[Tr1] NOT NECESSARY: in this technical sense, causation can be put in the plural
[Tr2] HELPFUL: the hyphen can make the phrase faster to process
[Tr3] MISTAKE: the capital letters mark a technical term
[Tr4] NOT NECESSARY: eliminating “aim” makes the sentence more concise and direct, though “aim” is more hedged and humble – perhaps a good idea if “impact” is being evaluated
[Tr5] MISTAKE: matched “on” with “impact on”, but this is not applicable to the whole series with “cause” and “effect”
[Tr6] NECESSARY: “a goal” is more suitable because it is the first time “goal” is mentioned; “the goal” implies we know which one, but we don’t yet

Even though both Trinka and Grammarly made few mistaken suggestions, most of their corrections were not necessary. In fact, only their suggested corrections relating to the definite article (“the”) were really necessary: Grammarly [Gr5] recommended removing “the” because the noun phrase “modern AI systems” is a general class, whereas Trinka [Tr6] suggested replacing “the” with “a” before “convergent goal” because it is the first time the noun is mentioned.

So, what about the human editor?

Human language editor

Unlike the automated grammar/spell checkers, the human copy editor is first and foremost a human reader, with a trained reader’s sensibility and a professional ability to identify language issues that impede or stall the processing of the written message. And unlike automated grammar/spell checkers, this reader sensibility goes beyond the word, phrase and sentence levels. Reading is a process of moving across a text to take in information, and the talented writer can use cohesion, coherence, reader-writer interaction, and style and register to make her message meet reader expectations and be as easy to process as possible.

Of course, different writers and different copy editors will use language differently to express a message, and another writer or copy editor would likely convey this message in other ways.

With this in mind, let’s look at the original and edited versions from the human language editor (Table 6). We should keep in mind that complicated texts like research papers are written to change someone’s mind about something, and this requires rhetorical and linguistic skill to not only communicate a clear message but also persuade.

Table 6. Original version vs human copy edited version that is more readable

Original version, sentences (1)–(2): The proposed system deploying HC AI using explainable knowledge and causations shows that our approach has outperformed the sole statistical approach by further considering meta parametric HC ethical priority, compared to the baselines in the simulated game theory environments of Deep Reinforcement Learning. The experimental results aim to assess the cause, effect and impact in the multi-agent of heterogeneity in the DRL environments for general, natural and significant causal learning representations efficiently and effectively.

Revised version, sentences (1)–(2): The proposed system deploys HC AI using explainable knowledge and causations, and when it was compared to baselines in simulated game theory environments of Deep Reinforcement Learning, it outperformed the statistical approach alone by further considering meta parametric HC ethical priorities. The experimental results aim to efficiently and effectively assess the cause, effect and impact of multi-agent heterogeneity in DRL environments for general, natural and significant causal learning representations.

Original version, sentences (3)–(4): The needs of HCC and AI have significantly become important magnitudes to estimate the modern AI systems. Especially, many AI systems have mainly focused on how to design and implement the system architecture for attaining the convergent goal systematically and expeditiously, but regardless of the effectiveness of human’s high-level adoption, such as heterogeneous agents, ethical transparency and safety, critical saliency and conditional situations of HC AI modeling mechanism.

Revised version, sentences (3)–(4): The needs of HCC and AI have become important magnitudes to estimate the modern AI systems. This is especially the case for many AI systems which have mainly focused on how to design and implement the system architecture for systematically and expeditiously attaining convergent goals, but have not accounted for factors affecting high-level adoption by humans, such as heterogeneous agents, ethical transparency and safety, critical saliency and conditional situations of the HC AI modeling mechanism.

Why is the revised version more readable?

Take sentence (1), for example. The original version uses two reduced relative clauses before the main verb of the sentence (“shows”) appears. This puts a large distance between the main subject (“proposed system”) and its verb (“shows”). The two relative clauses [a, b] are written out in full in the second, unpacked version below to make this clearer:

“The proposed system [a] deploying HC AI [b] using explainable knowledge and causations shows that …”

“The proposed system [a] THAT deployS HC AI [b] WHICH usES explainable knowledge and causations shows that …”

It should not surprise us that this information-dense sentence, with two relative clauses after the sentence subject, fooled Ginger into identifying the main verb as “uses” (“The proposed system deploying HC AI USES …”), because it is rare for a relative clause after the sentence subject to contain another relative clause. Rare, but not impossible. Still, this structure is not very readable, so the human-revised sentence unpacked much of the information into a more straightforward and common subject-verb (SV) sentence:

“The proposed system deploys HC AI using explainable knowledge and causations, …”

Now, although the revised sentence is still quite long and complicated (partly because of strict word limits for Abstracts), it is better able to guide the reader through the information (“and when it was compared to …”) and to leave the main message and important information at the end of the sentence (“it outperformed the …”). The end of the sentence is where the reader pauses before starting the next sentence, which is why we should always make an effort to place the important information there.
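This parsing difficulty can be inspected directly with an off-the-shelf dependency parser. The sketch below, which assumes spaCy and its small English model are installed (pip install spacy; python -m spacy download en_core_web_sm), prints the verb the parser selects as the sentence root and any verbs it analyzes as reduced relative clauses (the “acl” relation). Depending on the model, the parse of a doubly stacked relative clause may itself be wrong – which is exactly the trap Ginger fell into.

```python
# Inspect how a dependency parser assigns the main verb of a dense sentence.
# Assumes spaCy and the en_core_web_sm model are installed.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The proposed system deploying HC AI using explainable knowledge "
          "and causations shows that our approach has outperformed the "
          "sole statistical approach.")

for token in doc:
    if token.dep_ == "ROOT":
        # Ideally this prints "shows"; a misparse would pick another verb.
        print("Main verb (root):", token.text)
    if token.dep_ == "acl":
        # Reduced relative clauses typically surface as 'acl' dependents.
        print("Reduced relative clause verb:", token.text,
              "-> modifies:", token.head.text)
```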

Similar to sentence (1), sentence (4) is quite dense and has a number of mistakes as well as awkward phrasing that slows down and confuses the reader. The first problem involves the word “especially”, which in English cannot be used as a sentence connector, as it can in Mandarin, the original author’s native language. To fix this error, we can turn the word “especially” into an introductory clause:

“This is especially the case for many AI systems …”.

The sentence then goes on to describe these “many AI systems” with a “which” relative clause, and then adds a final clause with the conjunction “but” to clarify the limitation of these “many AI systems”:

[they] “have not accounted for factors affecting high-level adoption by humans”,

with examples of these factors given at the end of the sentence. This more direct phrasing clarifies the main message of the sentence – that “many AI systems have not considered various factors” – which sets up the value of the proposed approach that will consider these factors.

Table 7 gives a more detailed overview of the revisions from the human editor, with an evaluation and explanation for each. Out of a total of 15 revisions across the four sentences, 9 were judged “Necessary” and 6 “Helpful” for easing readability and processing.

Table 7. Detailed description of the human language editor revisions

Original, sentences (1)–(2): The proposed system deploying HC AI using explainable knowledge and causations shows that our approach has outperformed the sole statistical approach by further considering meta parametric HC ethical priority, compared to the baselines in the simulated game theory environments of Deep Reinforcement Learning. The experimental results aim to assess the cause, effect and impact in the multi-agent of heterogeneity in the DRL environments for general, natural and significant causal learning representations efficiently and effectively.

Revision (numbered at each edit point): The proposed system [Hum1] deploys Human-Centered AI using explainable knowledge and causations, [Hum2] and when it was compared to baselines in simulated game theory environments of Deep Reinforcement Learning, [Hum3] it outperformed the statistical approach [Hum4] alone by further considering meta parametric Human-Centered ethical [Hum5] priorities. The experimental results aim to [Hum6] efficiently and effectively assess the cause, effect and impact [Hum7] of [Hum8] multi-agent heterogeneity in [Hum9] DRL environments for general, natural and significant causal learning representations.

Evaluation:
[Hum1] HELPFUL: a more direct sentence, with subject and verb closer together
[Hum2] HELPFUL: supporting or context info for main clause should not appear at end of sentence; main clause or idea of sentence should occur at end of sentence for maximum impact
[Hum3] HELPFUL: more direct phrasing: “the system outperformed…”
[Hum4] NECESSARY: word choice sole → alone
[Hum5] NECESSARY: plural -s
[Hum6] HELPFUL: Move adv before verb to reduce distance between adv and verb to make it faster to process the info
[Hum7] NECESSARY: Preposition 
[Hum8] NECESSARY: combine into general noun phrase “multi-agent heterogeneity”, so definite article “the” is not needed
[Hum9] NECESSARY: no definite article “the” is needed because “DRL environments” is a general noun phrase.
Original, sentences (3)–(4): The needs of HCC and AI have significantly become important magnitudes to estimate the modern AI systems. Especially, many AI systems have mainly focused on how to design and implement the system architecture for attaining the convergent goal systematically and expeditiously, but regardless of the effectiveness of human’s high-level adoption, such as heterogeneous agents, ethical transparency and safety, critical saliency and conditional situations of HC AI modeling mechanism.

Revision (numbered at each edit point): The needs of Human-Centered Computing (HCC) and AI have [Hum10] become important magnitudes to estimate the modern AI systems. [Hum11] This is especially the case for many AI systems which have mainly focused on how to design and implement the system architecture for [Hum12] systematically and expeditiously attaining convergent [Hum13] goals, but [Hum14] have not accounted for factors affecting high-level adoption by humans, such as heterogeneous agents, ethical transparency and safety, critical saliency and conditional situations of [Hum15] the HC AI modeling mechanism.

Evaluation:
[Hum10] NECESSARY: word choice – “significantly” is redundant as “important” is used two words later; in experimental research, “significantly” also has the technical meaning of statistical significance, which is not the meaning here.
[Hum11] NECESSARY: word choice – “Especially” is not a transition word that can connect two sentences (as it is in Chinese, the native language of the original author).
[Hum12] HELPFUL: In this SVO clause, moving the adverbs before the verb makes it faster for the reader to process the information.
[Hum13] NECESSARY: plural -s (alternative: “the” → “a”, as in [Tr6] above).
[Hum14] HELPFUL: Clearer and more direct phrasing
[Hum15] NECESSARY: definite article – “HC AI modeling” tells us which “mechanism”, so “the” is needed.

Comparing the 5 machine and 1 human performances

Table 8 re-presents the data from Figure 1 at the beginning of this article and compares all of the machine and human analyses. As already mentioned, the number of identification mistakes (false positives) is alarming, because without other assistance NNSs may be swayed into following these incorrect suggestions and introducing even more errors into their writing. This risk of making false-positive suggestions is not unknown to the creators of machine grammar/spell checkers, which is why Grammarly offers hedged explanations for its suggestions, like the “It appears that …” explanation for the suggestion to change the verb agreement from “shows” (for a singular noun) to “show” (for a plural noun):

The Grammarly grammar checker mistakenly identified “knowledge and causations” (plural, 2 nouns) as the subject, but actually the subject is the “proposed system” (singular noun).

In terms of identifying errors and making “necessary” suggestions, Grammarly and Trinka were each able to identify only one (Gr5, Tr6) out of a possible nine identified by the human editor. These necessary suggestions both involved articles (a/the), whereas the human editor found four article mistakes (Hum8, 9, 13, 15). The five remaining “necessary” revisions by the human editor comprised three word choice issues (“sole”, “significantly”, “especially”), one plural noun, and one preposition.

Table 8. Comparison of the four suggestion types for four key sentences

Tool (total suggestions – necessary – helpful – not necessary – mistakes):

MS Word: 1 total – 0 necessary – 1 helpful – 0 not necessary – 0 mistakes
Google Docs: 0 total – 0 – 0 – 0 – 0
Ginger: 4 total – 0 necessary – 0 helpful – 0 not necessary – 4 mistakes
Grammarly: 6 total – 1 necessary (article) – 4 helpful (1 added hyphen, 3 Oxford commas) – 0 not necessary – 1 mistake
Trinka: 6 total – 1 necessary (article) – 1 helpful (added hyphen) – 2 not necessary – 2 mistakes
Human copy editor: 15 total – 9 necessary (4 article, 3 word choice, 1 plural, 1 preposition) – 6 helpful (3 more direct phrasing, 1 main idea at end of sentence, 2 adverb before verb) – 0 not necessary – 0 mistakes

The “helpful” suggestions, whether from machine or human, can make the text easier to read and process. Both Grammarly and Trinka suggested a hyphen for the noun phrase “meta-parametric”, and Grammarly identified the missing Oxford comma that Microsoft Word flagged plus two more, for a total of three. However, neither the hyphen nor the Oxford commas were suggested by the human copy editor, for two possible reasons: 1) they are not strictly necessary, and 2) they were embedded in sentences that required major reconstruction, so they were overlooked. A human reader or copy editor has a limited amount of attention and cannot always attend to everything.

The free versions of Ginger, Grammarly and Trinka were not able to locate any major grammar or readability issues in the highly problematic sentences (1) and (4), whereas the human copy editor found five issues in each. However, the Trinka Cloud demo (of the premium version) was able to detect a problem with sentence (4), probably because of its large number of phrases and commas (Figure 5).

Figure 5. Trinka’s premium version free trial was able to find a readability problem with sentence (4).

The companies that created Trinka and the other grammar/spell checkers realize that these applications have limitations. And because Trinka comes from a company that specializes in helping researchers better communicate their research, it includes an option to “send for Professional Editing” (Figure 6) – a professional service that can identify the errors missed by Trinka as well as make the text more readable.

Figure 6. Trinka has a built-in link to human copy editors.

Conclusions and recommendations for NNS research writers

In conclusion, the results from this admittedly superficial investigation into the errors and readability issues of four key sentences in a research manuscript show that machine grammar/spell checkers have limited functionality.

Even Crimson’s own comparison study showed that Trinka could detect only 51% of the errors in their sample of 258 individual sentences from different research domains, only marginally better than Grammarly’s 46%. Of course, we do not know how simple or complicated those sentences were, or whether they meaningfully followed previous sentences. If the sentences were fairly simple, grammatically direct or self-contained, like sentences (2) and (3), then it is unlikely that they would contain issues related to readability, like cohesion, coherence, or reader-writer interaction.

These tools are useful for identifying a number of errors; they can remind and even train NNSs about correct usage and perhaps give them clues about sentences that are “difficult to read”. This is good. However, the “false positive” rates in Crimson’s (2020) study are quite high (20% for Trinka, 22% for Grammarly), and these mistaken suggestions will cause NNS writers to spend extra time mulling over the language – and perhaps even to blindly follow the suggestions, further increasing the errors in their writing.

From this brief analysis, we can draw some conclusions. The first is that research writers need to be aware which sentences in their paper communicate key information, and spend more time revising, clarifying and connecting the ideas to make them as clear and as easy to process as possible.

The second conclusion involves the use of grammar/spell checkers. In practical terms, NNS researchers can consider using several machine grammar/spell checkers together to catch more errors, but only under certain conditions:

  1. Turn on the language checkers only after you finish writing; you don’t want the suggestions to interfere with your writing and thinking process, which go hand in hand.
  2. Don’t get overwhelmed by all the suggestions – not all are correct, and not all are necessary.
  3. Only pay attention to suggestions that confirm your knowledge of correct grammar, and use them as reminders.
  4. Still get a highly proficient English user who is familiar with your field’s research writing to proofread your manuscript before sending it to a conference or journal.

References

Crimson (2020). White Paper: A Comparison of Trinka with Grammarly and LanguageTool on Academic Text. Retrieved June 17, 2021, from https://www.trinka.ai/

Devopedia (2021). “Grammar and Spell Checker.” Version 13, April 28, 2021. Retrieved June 14, 2021, from https://devopedia.org/grammar-and-spell-checker

Soni, M., & Thakur, J. S. (2018). A systematic review of automated grammar checking in English language. arXiv preprint arXiv:1804.00540.

Trinka website (2021). https://www.enago.com/academy/trinka-vs-leading-grammar-checker-tools/
