Week 5
This past week, we experimented with human annotation. We used the same comments we had previously given to GPT-4 and set up a task on MTurk. There were two issues with this experiment: 1) each comment contained more than one masked entity; 2) the chosen comments were not balanced between ingroup and outgroup masked entities. We therefore sampled a second group of comments and set up another pilot, allowing answers of ingroup, outgroup, and unknown. Two members finished the task: worker1 guessed 53.33% of the labels correctly, worker2 guessed 60%, and the two agreed on 53.33% of the answers. On the same masked-entity annotation task, GPT-4 achieved an accuracy of 46%. One potential bias in this comparison is that, because we designed the task ourselves, the human annotators know there is no unknown label, while GPT-4 does not.
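For reference, a minimal sketch of how these accuracy and agreement numbers can be computed, assuming each annotator's answers are stored as a list of "ingroup"/"outgroup"/"unknown" labels aligned with a gold list (the variable names gold, worker1, worker2, and gpt4 are placeholders, not our actual data files):

```python
def accuracy(pred, gold):
    """Fraction of items where the predicted label matches the gold label."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

def agreement(a, b):
    """Raw pairwise agreement between two annotators."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical usage:
# accuracy(worker1, gold)    -> ~0.5333
# accuracy(worker2, gold)    -> ~0.60
# accuracy(gpt4, gold)       -> ~0.46
# agreement(worker1, worker2) -> ~0.5333
```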
In addition to the annotation task, this week I finished scraping the data, so I now know the proportion of comments that mention at least one team name. Take the Texans' statistics as an example: of the 51,861 comments scraped in total, 5,791 mention at least one team name. Moreover, since we want to know how fans of competing teams respond to the same game, we pair the game posts of the two competing teams together. This yields 527 game day pairs, 461 post game pairs, and 27 pregame pairs.
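A rough sketch of the counting and pairing steps, assuming each comment and game post is a simple dict; TEAM_NAMES and the "game_date"/"thread_type" fields are placeholders for whatever the actual scraped data contains:

```python
import re
from itertools import product

TEAM_NAMES = {"Texans", "Colts", "Titans"}  # hypothetical subset of team names

def mentions_team(text, team_names=TEAM_NAMES):
    """True if the comment text contains at least one team name."""
    return any(re.search(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE)
               for name in team_names)

def pair_game_posts(posts_a, posts_b):
    """Pair two competing teams' game posts that cover the same game and
    thread type (pregame / game day / post game)."""
    pairs = []
    for pa, pb in product(posts_a, posts_b):
        if pa["game_date"] == pb["game_date"] and pa["thread_type"] == pb["thread_type"]:
            pairs.append((pa, pb))
    return pairs

# Hypothetical usage:
# sum(mentions_team(c["body"]) for c in comments)  # e.g. 5,791 of 51,861 for the Texans
```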