Big Data & Society (Feb 2020)
How biased is the sample? Reverse engineering the ranking algorithm of Facebook’s Graph application programming interface
Abstract
Facebook research has proliferated in recent years. However, since November 2017, Facebook has limited the maximum number of page posts retrievable through its Graph application programming interface, and there is limited documentation on how these posts are selected. This paper compares two datasets of the same Facebook page, a full dataset obtained before the introduction of the limitation and a partial dataset obtained after it, and employs a bootstrapping technique to assess the bias caused by the new limitation. It demonstrates that posts with high user engagement, Photo posts, and Video posts are over-represented, while Link posts are under-represented. Top-term analysis reveals significant differences in the most prominent terms between the full and partial datasets. This paper also reverse engineers the new application programming interface’s ranking algorithm to identify the features of a post that affect its odds of being selected. Sentiment analysis reveals significant differences in sentiment word usage between the selected and non-selected posts. These findings have significant implications for the representativeness of research that uses Facebook page data collected after the introduction of the limitation.
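The bootstrapping comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's actual procedure: the function names, the percentile-interval approach, and the toy post counts are all assumptions made for illustration. The idea is to draw many random samples, each the size of the partial dataset, from the full dataset, and then check whether the share of a given post type in the partial dataset falls outside the range that random sampling would plausibly produce.

```python
import random


def bootstrap_share_interval(full_types, post_type, sample_size,
                             n_resamples=2000, alpha=0.05, seed=0):
    """Percentile interval for the share of `post_type` in random samples
    of `sample_size` posts drawn with replacement from the full dataset.
    (Hypothetical helper, not taken from the paper.)"""
    rng = random.Random(seed)
    shares = []
    for _ in range(n_resamples):
        resample = rng.choices(full_types, k=sample_size)
        shares.append(resample.count(post_type) / sample_size)
    shares.sort()
    lower = shares[int((alpha / 2) * n_resamples)]
    upper = shares[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def representation(full_types, partial_types, post_type, **kwargs):
    """Compare the observed share in the partial dataset with the
    bootstrap interval expected under purely random selection."""
    lower, upper = bootstrap_share_interval(
        full_types, post_type, len(partial_types), **kwargs)
    observed = partial_types.count(post_type) / len(partial_types)
    if observed > upper:
        return "over-represented"
    if observed < lower:
        return "under-represented"
    return "consistent with random sampling"


# Toy data (invented for illustration, not the paper's figures).
full = ["Link"] * 600 + ["Photo"] * 300 + ["Video"] * 100
partial = ["Link"] * 20 + ["Photo"] * 60 + ["Video"] * 20

for kind in ("Link", "Photo", "Video"):
    print(kind, representation(full, partial, kind))
```

With these invented counts, the share of Link posts in the partial dataset lies far below what random samples of the same size yield, while the Photo and Video shares lie above it. The same comparison could be applied to engagement measures by replacing the share with, for example, a mean number of reactions per post.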