Towards a NoOps Model for WLCG

Gardner Robert; Bryant Lincoln; McKee Shawn; Stephen Judith; Vukotic Ilija; Weaver Christopher; Wu Wenjing

doi:10.1051/epjconf/202024507024

EPJ Web of Conferences (Jan 2020)

Affiliations

Gardner Robert: Enrico Fermi Institute, University of Chicago
Bryant Lincoln: Enrico Fermi Institute, University of Chicago
McKee Shawn: Physics Department, University of Michigan
Stephen Judith: Enrico Fermi Institute, University of Chicago
Vukotic Ilija: Enrico Fermi Institute, University of Chicago
Weaver Christopher: Enrico Fermi Institute, University of Chicago
Wu Wenjing: Physics Department, University of Michigan

DOI: https://doi.org/10.1051/epjconf/202024507024
Journal volume & issue: Vol. 245, p. 07024

Abstract

One of the most costly factors in providing a global computing infrastructure such as the WLCG is the human effort required to deploy, integrate, and operate the distributed services that support collaborative computing, data sharing and delivery, and analysis of extreme-scale datasets. Furthermore, the time required to roll out global software updates, introduce new service components, or prototype novel systems requiring coordinated deployments across multiple facilities is often increased by communication latencies, limited staff availability, and, in many cases, the expertise required to operate bespoke services. While the WLCG (like the distributed systems implemented throughout HEP) is a global service platform, it lacks the capability and flexibility of a modern platform-as-a-service, including continuous integration/continuous delivery (CI/CD) methods, development-operations capabilities (DevOps, where developers assume a more direct role in the actual production infrastructure), and automation. Most importantly, tooling that reduces the required training, bespoke service expertise, and operational effort throughout the infrastructure, most notably at the resource endpoints (sites), is entirely absent in the current model. In this paper, we explore ideas and questions around potential NoOps models in this context. What is realistic given organizational policies and constraints? How should operational responsibility be organized across teams and facilities? What are the technical gaps? What are the social and cybersecurity challenges? Conversely, what advantages does a NoOps model deliver for innovation and for accelerating the delivery of new services needed for the HL-LHC era? We describe initial work along these lines in the context of providing a data delivery network supporting IRIS-HEP DOMA R&D.

Published in EPJ Web of Conferences

ISSN: 2100-014X (Online)
Publisher: EDP Sciences
Country of publisher: France
LCC subjects: Science: Physics
Website: http://www.epj-conferences.org/
