Recently, MIT Technology Review announced the release of a new word-emotion and word-polarity lexicon developed by Saif Mohammad and Peter Turney from Canada’s National Research Council (NRC). The lexicon includes 10,000 words and their associated emotions and polarity, and according to the researchers, was built cost-effectively in the span of a few weeks using Mechanical Turk.
Companies are increasingly using software to mine data from social sites like Twitter and Facebook to understand public opinion toward brands, products, and services. However, the accuracy of these applications is limited by the software’s ability to interpret the emotion associated with each word, and most existing technologies rely on relatively small emotion lexicons.
We reached out to Mohammad and Turney to learn more about how they built their lexicon using Mechanical Turk. Here are a few kernels of advice they shared with other Requesters based on their experience:
- Design and implement an effective quality control strategy. Key to Mohammad and Turney’s quality control was their use of Known Answers. In each HIT, the researchers primed Workers by presenting a word-choice question that assessed each Worker’s familiarity with the word they would later be asked to associate with an emotion. According to Mohammad, “Prompting Workers to consider the word’s meaning at the beginning of the task put them in a better mindset to answer the following emotion association question and led to higher quality annotations.”
- Simplify HITs by breaking workflows into simple steps. To increase Workers’ speed while also improving the quality of their outputs, the researchers recommend breaking complex questions into simpler “bite-size” questions. According to Mohammad, “If it is easy for Workers to quickly comprehend questions and provide answers, the quality of Workers’ answers will be higher and more Workers will be interested in contributing to your HITs.”
- Present clear and concise instructions. Early in their HIT design process, Mohammad and Turney tested variations on the wording of their instructions. For instance, in one test they asked Workers if a word “evokes” a specific emotion, whereas in a second test they asked if a word “is associated” with a specific emotion. The test revealed that agreement between Workers was more frequent when the instructions included the term “is associated”. Mohammad explained, “The experiment confirmed our intuition that 'evokes' may encourage subjective answers whereas 'is associated' guides Turkers to more objective answers.”
- Offer appropriate compensation to target the right Workers. Setting a competitive HIT price is critical to attracting the right Workers. However, according to Mohammad, “Paying more does not yield higher quality results, but paying too little may discourage the right Workers from accepting your HITs.”
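The Known Answers strategy from the first tip can be sketched in code. The example below is a minimal, hypothetical illustration of how a Requester might grade Workers against questions with known answers and keep only annotations from Workers who meet an accuracy threshold; the function names, data shapes, and the 80% threshold are assumptions for illustration, not details from the NRC project.

```python
# Hypothetical sketch of a known-answer ("gold question") quality check.
# All names and the threshold below are illustrative assumptions.

GOLD_THRESHOLD = 0.8  # assumed minimum accuracy on known-answer questions


def score_worker(responses, gold_answers):
    """Return a Worker's accuracy on the questions that have known answers.

    responses:    dict mapping question_id -> the Worker's answer
    gold_answers: dict mapping question_id -> the known correct answer
    """
    graded = [qid for qid in gold_answers if qid in responses]
    if not graded:
        return 0.0
    correct = sum(responses[qid] == gold_answers[qid] for qid in graded)
    return correct / len(graded)


def filter_annotations(all_responses, gold_answers, threshold=GOLD_THRESHOLD):
    """Keep annotations only from Workers whose known-answer accuracy
    meets the threshold; the rest would be discarded or re-posted."""
    return {
        worker: responses
        for worker, responses in all_responses.items()
        if score_worker(responses, gold_answers) >= threshold
    }


# Example: Worker A answers both gold questions correctly and is kept;
# Worker B misses one of two and falls below the threshold.
gold = {"q1": "joy", "q2": "fear"}
responses = {
    "workerA": {"q1": "joy", "q2": "fear", "q3": "anger"},
    "workerB": {"q1": "sadness", "q2": "fear", "q3": "joy"},
}
kept = filter_annotations(responses, gold)
print(sorted(kept))  # → ['workerA']
```

Because the gold questions are mixed into ordinary HITs, a check like this scales with the project: accuracy can be re-estimated continuously as annotations arrive, which is what lets Requesters infer result quality without reviewing every answer by hand.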
For Mohammad and Turney, the tactics above significantly impacted the quality of their results and the speed with which they were able to build their lexicon. By optimizing instructions, workflow, and pricing, they found they not only attracted the best Workers for their task, but kept them engaged in their HIT. Further, by utilizing scalable quality control tools, such as Known Answers, they were able to reliably infer the accuracy of the annotations collected as their project progressed.
To learn more about their project, see Mohammad and Turney’s research paper available here.