Identifying structure of student essays


Students must be able to read scientific texts with deep understanding and construct coherent explanations that connect causes to events. Can we use Natural Language Processing to evaluate whether they are doing that, based on the essays they write?

research questions

  1. Can we automatically identify important concepts in students' scientific explanations?
  2. Can we identify causal relations in their explanations?
  3. Can we use these to identify an essay's causal structure?
  4. How many training examples are necessary?
  5. Can we automatically assemble training materials for a new topic?
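RQ 1 can be framed as a sequence-labeling task: mark each token of an essay as beginning, inside, or outside a concept mention. A minimal sketch of the BIO encoding step, with an invented sentence and invented annotation spans for illustration:

```python
def bio_encode(tokens, concept_spans):
    """Convert (start, end) token spans of annotated concepts
    into one BIO label per token."""
    labels = ["O"] * len(tokens)
    for start, end in concept_spans:
        labels[start] = "B-CONCEPT"
        for i in range(start + 1, end):
            labels[i] = "I-CONCEPT"
    return labels

tokens = "warm air rises and forms clouds".split()
# Suppose an annotator marked "warm air" and "clouds" as concepts.
spans = [(0, 2), (5, 6)]
print(bio_encode(tokens, spans))
# → ['B-CONCEPT', 'I-CONCEPT', 'O', 'O', 'O', 'B-CONCEPT']
```

A tagger such as a bi-directional RNN then learns to predict these labels from the tokens.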


people

  • Simon Hughes, PhD defended in 2019.
  • Clayton Cohn, MS Thesis defended in 2020.
  • Keith Cochran, PhD student, in progress.
  • Peter Hastings, squadron leader.
  • Noriko Tomuro, associate.
  • M. Anne Britt, NIU, collaborator.


results

  1. Simon Hughes's dissertation demonstrated \(F_1\) scores averaging 0.84 for RQ 1 with a bi-directional RNN.
  2. Simon demonstrated \(F_1\) scores between .73 and .79 for causal relations using a bi-directional RNN and a novel shift-reduce parser.
  3. Simon created a re-ranking approach that scored between .75 and .83 on identifying the entire essay structure (Fig. 1).
  4. Hastings et al. showed that 100 annotated essays produced a significant portion of the performance that 1000 essays did.
  5. Working on it.
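The shift-reduce idea above can be sketched roughly as follows. This is not Simon's actual parser: the real system learned its reduce decisions from data, whereas here a fixed set of known cause pairs stands in for that classifier, and the concept strings are invented.

```python
def shift_reduce(concepts, causes):
    """Toy shift-reduce pass over a left-to-right sequence of concepts.
    `causes` is a set of (cause, effect) pairs standing in for a
    learned action classifier."""
    stack, arcs = [], []
    for c in concepts:
        stack.append(c)              # SHIFT the next concept
        while len(stack) >= 2 and (stack[-2], stack[-1]) in causes:
            effect = stack.pop()     # REDUCE: record a causal arc
            cause = stack.pop()
            arcs.append((cause, effect))
            stack.append(effect)     # keep the effect for chained causes
    return arcs

concepts = ["CO2 increases", "temperature rises", "ice melts"]
causes = {("CO2 increases", "temperature rises"),
          ("temperature rises", "ice melts")}
print(shift_reduce(concepts, causes))
# → [('CO2 increases', 'temperature rises'), ('temperature rises', 'ice melts')]
```

Chaining the arcs this way is what lets a linear pass recover a causal structure for the whole essay.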

future work

  • Application of deep learning transformers like BERT to the tasks above, and expansion to other domains.
  • Developing specialized deep learning methods for inferring text structure.
  • Exploring ensembling methods.
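One simple ensembling baseline for the tagging tasks is per-token majority voting across several taggers; this is a generic sketch, not a method from the project, and the label sequences below are invented:

```python
from collections import Counter

def vote(predictions):
    """Majority-vote ensemble over per-token label sequences from
    several taggers; ties break toward the first-seen label."""
    return [Counter(labels).most_common(1)[0][0]
            for labels in zip(*predictions)]

preds = [["B-CONCEPT", "I-CONCEPT", "O"],
         ["B-CONCEPT", "O",         "O"],
         ["B-CONCEPT", "I-CONCEPT", "O"]]
print(vote(preds))
# → ['B-CONCEPT', 'I-CONCEPT', 'O']
```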

selected publications (more)

Last updated: 2020-12-04 Fri 18:20