Stratified learning for reducing training set size
Peter Hastings, Simon Hughes, M. Anne Britt, Patricia Wallace, and Dylan Blaum. Stratified learning for reducing training set size. In Proceedings of the 13th International Conference on Intelligent Tutoring Systems, ITS 2016, LNCS 9684, pp. 341–346, Springer, Berlin, 2016.
Abstract
Educational standards put a renewed focus on strengthening students' abilities to construct scientific explanations and engage in scientific arguments. Evaluating student explanatory writing is extremely time-intensive, so we are developing techniques to automatically analyze the causal structure in student essays so that effective feedback may be provided. These techniques rely on a significant training corpus of annotated essays. Because one of our long-term goals is to make it easier to establish this approach in new subject domains, we are keenly interested in the question of how much training data is enough to support this. This paper describes our analysis of that question, and looks at one mechanism for reducing that data requirement which uses student scores on a related multiple choice test.
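To illustrate the sampling idea mentioned above, here is a minimal sketch of score-stratified subsampling in Python. It assumes essays are paired with scores from a related multiple-choice test; the variable names, score range, bin boundaries, and sampling fraction are all hypothetical choices for illustration, not details taken from the paper.

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
essays = [f"essay_{i}" for i in range(400)]        # placeholder corpus
mc_scores = rng.integers(0, 21, size=len(essays))  # hypothetical 0-20 point MC test

# Bin the multiple-choice scores into strata (low / medium / high);
# the cut points 7 and 14 are assumptions, not the paper's values.
strata = np.digitize(mc_scores, bins=[7, 14])

# Keep a stratified 25% subsample as the reduced training set, so each
# score band stays proportionally represented in the smaller corpus.
subset, _ = train_test_split(
    essays, train_size=0.25, stratify=strata, random_state=0
)
print(f"reduced training set: {len(subset)} of {len(essays)} essays")

The design point is that stratifying on an external, cheap-to-collect signal (the multiple-choice score) preserves the distribution of student ability in the annotated subsample, which is what makes a smaller training set viable.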
BibTeX
@INPROCEEDINGS{Hastings:its2016,
  author    = {Peter Hastings and Simon Hughes and M. Anne Britt and Patricia Wallace and Dylan Blaum},
  title     = {Stratified learning for reducing training set size},
  booktitle = {{Proceedings of the 13th International Conference on Intelligent Tutoring Systems, ITS 2016, LNCS 9684}},
  editor    = {A. Micarelli and J. Stamper and K. Panourgia},
  year      = 2016,
  address   = {Berlin},
  pages     = {341--346},
  publisher = {Springer},
  cvnote    = {Acceptance rate: 42\%. CORE Conference Rank: A.},
  doi       = {10.1007/978-3-319-39583-8_39},
  abstract  = {Educational standards put a renewed focus on strengthening students' abilities to construct scientific explanations and engage in scientific arguments. Evaluating student explanatory writing is extremely time-intensive, so we are developing techniques to automatically analyze the causal structure in student essays so that effective feedback may be provided. These techniques rely on a significant training corpus of annotated essays. Because one of our long-term goals is to make it easier to establish this approach in new subject domains, we are keenly interested in the question of how much training data is enough to support this. This paper describes our analysis of that question, and looks at one mechanism for reducing that data requirement which uses student scores on a related multiple choice test.}
}