MaltOptimizer: A System For MaltParser Optimization

User Guide

Using MaltOptimizer

Usage:
  java -jar MaltOptimizer.jar -p <phase number> -m <MaltParser jar path> -c <training corpus> [-v <validation method>]
Note: To use the version of MaltParser included in the MaltOptimizer-1.0.2 distribution, use -m maltparser-1.7.jar. To use another version of MaltParser, make sure that you specify the path correctly and that the version is 1.7 or later.
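For example, to run the first optimization phase on a CoNLL-formatted training file named train.conll (a placeholder name) with the bundled parser:
  java -jar MaltOptimizer.jar -p 1 -m maltparser-1.7.jar -c train.conll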

Phase 1: Data Characteristics

In the data analysis, MaltOptimizer gathers information about the following properties of the training set:

  - Number of words and sentences.
  - Existence of "covered roots" (tokens attached to the artificial root that occur inside the span of another arc).
  - Frequency of the dependency labels used for tokens attached to the artificial root.
  - Percentage of non-projective arcs and trees.
  - Existence of non-empty LEMMA and FEATS columns.
  - Identity (or not) of the CPOSTAG and POSTAG columns.
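For example, the percentage of non-projective arcs is computed from the HEAD column of the CoNLL-formatted training data. In the schematic fragment below (only the ID, FORM, HEAD, and DEPREL columns are shown, and the labels are illustrative), the arc from hearing (2) to on (5) crosses the arc from the artificial root to is (3), so it is non-projective:

  1  A          2  NMOD
  2  hearing    3  SBJ
  3  is         0  ROOT
  4  scheduled  3  VG
  5  on         2  NMOD
  6  the        7  NMOD
  7  issue      5  PMOD
  8  today      4  TMP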
Usage:
  java -jar MaltOptimizer.jar -p 1 -m <MaltParser jar path> -c <training corpus>

Phase 2: Parsing Algorithm

In the second phase, MaltOptimizer explores a subset of the parsing algorithms implemented in MaltParser, based on the results of the data analysis. In particular, if there are no non-projective dependencies in the training set, only projective algorithms are explored, including the arc-eager and arc-standard versions of Nivre's algorithm and an implementation of Covington's projective parsing algorithm. By contrast, if the training set contains a non-negligible proportion of non-projective dependencies, MaltOptimizer may also test Covington's non-projective algorithm as well as algorithms using pseudo-projective parsing or online reordering. After testing each algorithm with default settings, MaltOptimizer tunes the parameters of the best-performing algorithms and creates a new option file for the best configuration found so far. The user is given the opportunity to edit the option file (or stop the process) before optimization continues.
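Each of these experiments can also be replicated manually with MaltParser, whose -a option selects the parsing algorithm. The command below is a sketch following standard MaltParser usage, assuming the bundled jar and a placeholder training file train.conll; -c names the configuration, -m learn trains a model, and nivreeager selects Nivre's arc-eager algorithm:
  java -jar maltparser-1.7.jar -c nivreeager_test -i train.conll -m learn -a nivreeager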
Usage:
  java -jar MaltOptimizer.jar -p 2 -m <MaltParser jar path> -c <training corpus> [-v <validation method>]
For Phase 2, the usage includes an additional optional flag -v (for validation method), with default value dev (optimization against a held-out development set) and alternative value cv (cross-validation).
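For example, to run Phase 2 with the bundled parser and cross-validation instead of a development set (train.conll is again a placeholder for your training corpus):
  java -jar MaltOptimizer.jar -p 2 -m maltparser-1.7.jar -c train.conll -v cv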

Phase 3: Feature Models and Learning Algorithm

In the third phase, MaltOptimizer tries to optimize the feature model, given the parameters chosen so far (in particular the parsing algorithm). It first performs backward selection experiments to ensure that every feature in the default model for the chosen parsing algorithm actually makes a contribution. It then proceeds with forward selection experiments, trying potentially useful features one by one and in combination. Since an exhaustive search for the best possible feature model is practically impossible, the optimization strategy is based on heuristics derived from experience (see the Quick Guide to MaltParser Optimization). The major steps of the forward selection experiments are the following:

  - Tune the window of POSTAG n-grams over the parser state.
  - Tune the window of FORM features over the parser state.
  - Tune DEPREL and POSTAG features over the partially built dependency tree.
  - Add POSTAG and FORM features over the input string.
  - Add CPOSTAG, FEATS, and LEMMA features if available.
  - Add conjunctions of POSTAG and FORM features.

After the feature selection experiments are completed, MaltOptimizer tunes the parameters of the learning algorithm and creates a new option file and a new feature specification file. The user is given the opportunity to edit both of these files and continue with manual optimization.
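For reference, MaltParser declares feature models in XML feature specification files. The fragment below is a schematic sketch, not the exact default model for any algorithm: it shows POSTAG and FORM features over the parser state (Stack and Input), a DEPREL feature over the partially built tree, and a feature conjunction built with Merge:

  <featuremodels>
    <featuremodel>
      <!-- POSTAG and FORM features over the parser state -->
      <feature>InputColumn(POSTAG, Stack[0])</feature>
      <feature>InputColumn(POSTAG, Input[0])</feature>
      <feature>InputColumn(FORM, Stack[0])</feature>
      <feature>InputColumn(FORM, Input[0])</feature>
      <!-- DEPREL feature over the partially built dependency tree -->
      <feature>OutputColumn(DEPREL, ldep(Stack[0]))</feature>
      <!-- conjunction of two POSTAG features -->
      <feature>Merge(InputColumn(POSTAG, Stack[0]), InputColumn(POSTAG, Input[0]))</feature>
    </featuremodel>
  </featuremodels>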
Finally, the system stores a final MaltParser configuration file (finalOptionsFile.xml) in the MaltOptimizer installation directory. At the end of the optimization process, you can run MaltParser with this configuration as follows:
  java -jar <MaltParser jar path> -f finalOptionsFile.xml -F <path to the suggested feature model>
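For example, if the suggested feature model was stored as bestFeatureModel.xml (a placeholder name) and you are using the bundled parser:
  java -jar maltparser-1.7.jar -f finalOptionsFile.xml -F bestFeatureModel.xml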
Usage:
  java -jar MaltOptimizer.jar -p 3 -m <MaltParser jar path> -c <training corpus> [-v <validation method>]
As in Phase 2, the optional -v flag selects the validation method: dev (development set, the default) or cv (cross-validation).