Chinese Journal of Network and Information Security (网络与信息安全学报), Aug 2023
Twitter user geolocation method based on single-point toponym matching and local toponym filtering
Abstract
The availability of accurate toponyms in user tweets is crucial for geolocating Twitter users. However, the toponyms that existing Twitter user geolocation methods obtain are often limited in both number and reliability, which lowers geolocation accuracy. To address this issue, a Twitter user geolocation method based on single-point toponym matching and local toponym filtering was proposed. First, a toponym type discrimination algorithm based on the aggregation degree of a toponym's locations was designed and used to build a single-point toponym database, so that more reliable toponyms can be extracted from tweets. Then, a local toponym filtering algorithm based on the aggregation degree of user locations was proposed. For each toponym, it measures how tightly the users' average longitudes and latitudes cluster around the toponym's longitude and latitude, and it retains the local toponyms with a high aggregation degree, further improving the reliability of the toponyms used for geolocation. Finally, a user-toponym heterogeneous graph was constructed from users' social relationships and their mentions of toponyms, and users were located by means of graph representation learning and neural networks. Extensive user geolocation experiments were conducted on GEOTEXT and TW-US, two public datasets commonly used in this field. Compared with nine typical existing Twitter user geolocation methods, including HGNN, ReLP, and GCN, the proposed method achieves significantly higher geolocation accuracy. On GEOTEXT, it reduces the mean error by 7.3–342.8 km and the median error by 2.4–354.4 km, and it improves large-area-level geolocation accuracy by 1.3%–26.3%. On TW-US, it reduces the mean error by 8.6–246.6 km and the median error by 5.7–149.7 km, and it improves large-area-level geolocation accuracy by 1.5%–20.5%.
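The two aggregation-degree steps can be illustrated with a minimal sketch. The abstract does not give the exact aggregation-degree formula, so the code below assumes one: the fraction of points that lie within a fixed radius of a centre, measured with the haversine great-circle distance. The helper names (`aggregation_degree`, `is_single_point`, `filter_local_toponyms`), the parameters `radius_km` and `threshold`, and their default values are hypothetical illustrations, not the paper's definitions.

```python
import math
from statistics import mean

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))  # clamp float drift


def aggregation_degree(center, points, radius_km):
    """ASSUMED definition: share of points within radius_km of center (0.0 if no points)."""
    if not points:
        return 0.0
    lat0, lon0 = center
    inside = sum(1 for lat, lon in points if haversine_km(lat0, lon0, lat, lon) <= radius_km)
    return inside / len(points)


def centroid(points):
    """Plain average of latitudes and longitudes (ignores antimeridian wrap-around)."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))


def is_single_point(candidate_coords, radius_km=50.0):
    """Hypothetical rule: a toponym is single-point if all of its gazetteer
    candidate locations fall within radius_km of their centroid."""
    return aggregation_degree(centroid(candidate_coords), candidate_coords, radius_km) == 1.0


def filter_local_toponyms(toponym_coords, user_mentions, user_coords,
                          radius_km=100.0, threshold=0.5):
    """Keep toponyms whose mentioning users' average locations cluster near the toponym.

    toponym_coords: {toponym: (lat, lon)}
    user_mentions:  {user: set of toponyms the user mentions}
    user_coords:    {user: list of known (lat, lon) points, averaged per user}
    Returns {toponym: aggregation degree} for the retained toponyms.
    """
    user_avg = {u: centroid(pts) for u, pts in user_coords.items() if pts}
    kept = {}
    for name, center in toponym_coords.items():
        pts = [user_avg[u] for u, names in user_mentions.items()
               if name in names and u in user_avg]
        degree = aggregation_degree(center, pts, radius_km)
        if degree >= threshold:
            kept[name] = degree
    return kept


if __name__ == "__main__":
    toponyms = {"Brooklyn": (40.6782, -73.9442), "Paris": (48.8566, 2.3522)}
    mentions = {"u1": {"Brooklyn"}, "u2": {"Brooklyn", "Paris"}, "u3": {"Paris"}}
    coords = {"u1": [(40.71, -74.00)],
              "u2": [(40.73, -73.99), (40.65, -73.95)],
              "u3": [(34.05, -118.24)]}
    print(filter_local_toponyms(toponyms, mentions, coords))
```

In this toy input, both users who mention "Brooklyn" live near it, so it is kept as a local toponym, while "Paris" is mentioned only by users in New York and Los Angeles and is filtered out. A fixed-radius share is used here only because it is simple to read; the paper may weight distances or choose radii differently.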
Keywords