Task 1: Text entailment.
Train a deep learning (DL) classifier to solve this task. Starter code in Theano is provided that you can use, unless you prefer to write your own in another DL platform; a diagram of the starter code is also provided. Another possible starter code, in TensorFlow, comes with good explanations. It uses a different set of training data, so please train on the SICK dataset (or on any training data), but submit results on the provided SICK test data.
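Whichever starter code you use, you will first need to read the SICK data. A minimal loader might look like the sketch below; the column names follow the SICK release (pair_ID, sentence_A, sentence_B, relatedness_score, entailment_judgment), but verify them against the header row of your copy of the files.

```python
import csv

def load_sick(path):
    """Read a SICK-style tab-separated file into a list of dicts.

    Assumes a header row with at least these columns (check your
    copy of the data): pair_ID, sentence_A, sentence_B,
    relatedness_score, entailment_judgment.
    """
    examples = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            examples.append({
                "pair_ID": row["pair_ID"],
                "sentence_A": row["sentence_A"],
                "sentence_B": row["sentence_B"],
                "relatedness_score": float(row["relatedness_score"]),
                "entailment_judgment": row["entailment_judgment"],
            })
    return examples
```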
For comparison to a simple baseline: a very simple baseline classifier, computed with the script compute_overlap_baseline.py from the SemEval webpage, achieves an accuracy of 56.15%, as computed with the R evaluation script. (This baseline method chooses the class ENTAILMENT if the overlap score between the two sentences is higher than a threshold, the class CONTRADICTION if it is lower than another threshold, and the class NEUTRAL if it falls between the two thresholds.)
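The two-threshold decision rule described above can be sketched as follows. The Jaccard-style word overlap and the threshold values 0.6 and 0.3 are illustrative assumptions, not the actual choices in compute_overlap_baseline.py, where the thresholds would be tuned on training data.

```python
def overlap_score(sent_a, sent_b):
    """Jaccard word overlap between two sentences (illustrative)."""
    a, b = set(sent_a.lower().split()), set(sent_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def overlap_baseline(sent_a, sent_b, high=0.6, low=0.3):
    """Two-threshold rule: high overlap -> ENTAILMENT,
    low overlap -> CONTRADICTION, otherwise NEUTRAL.

    The thresholds here are placeholder values for illustration.
    """
    score = overlap_score(sent_a, sent_b)
    if score > high:
        return "ENTAILMENT"
    if score < low:
        return "CONTRADICTION"
    return "NEUTRAL"
```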
The main evaluation measure will be the classification accuracy on the test data. Other measures you can look at are the confusion matrices, as well as the Precision, Recall, and F-measure for each class.
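All of these measures can be computed without an external library; a minimal pure-Python sketch:

```python
from collections import Counter

def evaluate(gold, pred, labels=("ENTAILMENT", "CONTRADICTION", "NEUTRAL")):
    """Accuracy, confusion matrix, and per-class Precision/Recall/F1."""
    assert len(gold) == len(pred)
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    # confusion[g][p] = count of items with gold label g predicted as p
    confusion = {g: Counter() for g in labels}
    for g, p in zip(gold, pred):
        confusion[g][p] += 1
    scores = {}
    for lab in labels:
        tp = confusion[lab][lab]
        fp = sum(confusion[g][lab] for g in labels if g != lab)
        fn = sum(confusion[lab][p] for p in labels if p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[lab] = (prec, rec, f1)
    return accuracy, confusion, scores
```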
Task 2: Semantic relatedness.
Train a deep learning classifier to solve this task. You can modify your code for task 1.
For comparison to a simple baseline: a very simple baseline classifier that computes the overlap between the two sentences achieves a Pearson correlation of 0.62. These baseline overlap scores were produced with the script compute_overlap_baseline.py, and the correlation was computed with the R evaluation script.
The main evaluation measure will be the Pearson correlation between your scores for the test data and the gold standard ratings. Additional evaluation can be the mean squared error (computed on standardized scores) and the Spearman correlation.
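The Pearson correlation and the MSE on standardized scores can also be computed directly; the sketch below z-scores each series (subtract the mean, divide by the standard deviation) before taking the MSE. With this standardization, the MSE equals 2 * (1 - r), where r is the Pearson correlation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def standardized_mse(x, y):
    """Mean squared error after z-scoring each series."""
    def z(v):
        n = len(v)
        m = sum(v) / n
        s = math.sqrt(sum((a - m) ** 2 for a in v) / n)
        return [(a - m) / s for a in v]
    zx, zy = z(x), z(y)
    return sum((a - b) ** 2 for a, b in zip(zx, zy)) / len(x)
```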
Write a report in a file named Report ( , , or .txt).
Explain what you did for task 1 and for task 2.
For task 1, report the accuracy of the classification on the test set for all the experiments that you ran.
For task 2, report Pearson correlation scores.
Discuss what classifier and what features led to your best results.
Results.txt
Submit the predictions of your best classifier on the test data in a file named Results.txt
Your file must contain the following 3 tab-delimited columns:
- pair_ID (the IDs from the test data),
- entailment_judgment (predictions of your system for task 1; possible values: ENTAILMENT, CONTRADICTION, NEUTRAL), and
- relatedness_score (numerical predictions of your system for task 2).
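Combining the predictions from both tasks, the submission file can be written as sketched below. The argument names and the dict-keyed-by-pair-ID layout are illustrative assumptions; a header row is included here, but drop it if the evaluation script expects none.

```python
def write_results(path, pair_ids, entailment_preds, relatedness_preds):
    """Write the tab-delimited Results.txt described above.

    entailment_preds and relatedness_preds are assumed to map each
    pair ID to a label / score (illustrative naming).
    """
    with open(path, "w", encoding="utf-8") as f:
        f.write("pair_ID\tentailment_judgment\trelatedness_score\n")
        for pid in pair_ids:
            f.write(f"{pid}\t{entailment_preds[pid]}\t{relatedness_preds[pid]:.3f}\n")
```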