Journal of King Saud University: Computer and Information Sciences (Apr 2019)
Broken link repairing system for constructing contextual information portals
Abstract
The web is an extremely powerful resource with the potential to improve education and health, and it enables access to new markets. There are, however, fundamental problems with web access in emerging regions. The primary issue is that internet connectivity is not keeping up with the growing complexity and size of the web. Contextual information portals (CIPs) were recently developed to mitigate the effect of low connectivity. A CIP provides an offline, searchable and browsable information portal whose content is composed of vertical slices of the internet about specific topics. It is an ideal tool for developing regions with limited internet access, and it can be used in schools and colleges to enhance lesson plans and educational material. Although a standalone CIP provides an interactive searching and browsing interface that enables a web-like experience, a fundamental problem that users face is broken links. Crawling the web to construct a CIP collection makes only a portion of webpages available, not every document those pages link to, so many links in the collection are broken. To address this problem, we develop a broken link repairing system (brLinkRepair). brLinkRepair is useful when a user tries to navigate between pages through links whose target pages are missing from the CIP. It is an information retrieval system: for each broken link, it recommends related pages that are similar to the missing target page. To further improve the effectiveness of the system, we combine all information sources using a learning to rank approach. Our results indicate that learning to rank, by combining information sources, improves effectiveness.
Keywords: Information retrieval, Machine learning, Broken links, Learning to rank, Contextual information portals for intermittent networks
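As a rough illustration of the idea summarized in the abstract, the following minimal sketch ranks locally available pages for one broken link by combining similarity scores from several information sources. It is not the brLinkRepair implementation: the abstract does not name the information sources, so the `anchor` and `url` fields, the `recommend` function, the sample pages, and the fixed weights are all hypothetical. In a learning to rank setting the weights would be learned from training data rather than set by hand.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def recommend(broken_link, pages, weights, k=3):
    """Return the ids of the top-k local pages for one broken link.

    broken_link: dict with the hypothetical fields 'anchor' and 'url'.
    pages: dict mapping a local page id to its text.
    weights: dict with one weight per information source (assumed values;
             a learning to rank model would learn these).
    """
    sources = {
        "anchor": tokenize(broken_link["anchor"]),
        "url": tokenize(broken_link["url"]),
    }
    scored = []
    for page_id, text in pages.items():
        page_tokens = tokenize(text)
        # Weighted linear combination of the per-source similarity scores.
        score = sum(
            weights[name] * cosine(tokens, page_tokens)
            for name, tokens in sources.items()
        )
        scored.append((score, page_id))
    # Highest score first; ties are broken by page id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [page_id for _, page_id in scored[:k]]


# Hypothetical offline collection and broken link.
pages = {
    "malaria.html": "malaria symptoms treatment and prevention with mosquito nets",
    "algebra.html": "introduction to algebra equations and variables",
    "dengue.html": "dengue fever is spread by mosquito bites prevention tips",
}
link = {
    "anchor": "malaria prevention",
    "url": "http://example.org/health/malaria-prevention.html",
}
ranking = recommend(link, pages, weights={"anchor": 0.7, "url": 0.3}, k=2)
print(ranking)
```

The linear combination is only one simple way to merge sources; it is used here because it makes the role of the learned weights easy to see.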