Dianxin kexue (Mar 2013)
Constructing View of Uncertain Data Provenance for Scientific Workflow in Cloud Computing
Abstract
A view of data provenance in a scientific workflow provides data abstraction and encapsulation by partitioning the tasks in the data provenance graph (DPG) into a set of composite modules according to the data flow relations among them, which reduces both the effort researchers spend analyzing data provenance and the time needed for data querying. Nevertheless, developing and applying scientific workflow systems in cloud computing environments suffers from uncertainty caused by inaccurate data collection and by unreliable data servers distributed across the Internet. Focusing on this scenario, definitions of the uncertain DPG and its sound view were presented first, and then a method for detecting unsound views of a DPG was proposed. A method for constructing a sound, high-support view was also presented, based on the data flow relations among the tasks and their first-order preceding tasks in the graph and on the local expected support of the composite modules. A polynomial-time algorithm was designed, and its worst-case time complexity was analyzed. Additionally, an example and comprehensive experiments were given to show the feasibility and effectiveness of the method.
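To make the notion of an unsound view concrete, the following minimal sketch checks a view against a plain (certain) provenance graph. It assumes one common criterion from the workflow-view literature, which may differ from the definition used in this paper: a composite module is sound if every task in it that receives data from outside the module can reach, through tasks inside the module, every task in it that sends data outside. The function name `unsound_modules`, the input encoding, and the sample graph are hypothetical, and edge uncertainty and expected support are deliberately left out because the abstract does not define them.

```python
from collections import defaultdict


def unsound_modules(edges, view):
    """Return the names of composite modules that fail the assumed soundness check.

    edges: iterable of (u, v) task pairs, meaning data flows from u to v.
    view:  dict mapping a module name to the set of tasks it groups.
    """
    edges = list(edges)
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)

    bad = []
    for name, tasks in view.items():
        # Input tasks receive data from outside the module;
        # output tasks send data to tasks outside the module.
        inputs = {v for u, v in edges if v in tasks and u not in tasks}
        outputs = {u for u, v in edges if u in tasks and v not in tasks}
        for src in inputs:
            # Depth-first search restricted to tasks inside the module.
            seen, stack = {src}, [src]
            while stack:
                node = stack.pop()
                for nxt in succ[node]:
                    if nxt in tasks and nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
            # Some output is not fed by this input: the module would
            # suggest a data dependency that the graph does not contain.
            if not outputs <= seen:
                bad.append(name)
                break
    return bad


# Hypothetical graph with two independent pipelines: s1 -> a -> t1 and s2 -> b -> t2.
sample_edges = [("s1", "a"), ("a", "t1"), ("s2", "b"), ("b", "t2")]

# Grouping a and b wrongly suggests that t2 depends on s1.
merged_view = {"M": {"a", "b"}}
# Grouping each source with its own successor keeps the data flow intact.
split_view = {"A": {"s1", "a"}, "B": {"s2", "b"}}

print(unsound_modules(sample_edges, merged_view))
print(unsound_modules(sample_edges, split_view))
```

In the sample graph, the module that merges `a` and `b` is reported because `a` receives data from `s1` but cannot reach `b`, whose output goes to `t2`. Each module is checked with one graph search per input task, so this simplified check runs in polynomial time in the size of the graph.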