Ain Shams Engineering Journal (Mar 2022)

An efficient GPU acceleration technique for CBCT based on memory aware optimization scheme

  • Hassan Youness,
  • Marwa Abbas,
  • Ammar Hassan

Journal volume & issue
Vol. 13, no. 2
p. 101567

Abstract

Among the many image reconstruction techniques, the Feldkamp-Davis-Kress (FDK) algorithm is considered the most practical algorithm used in clinical cone-beam Computed Tomography (CT) technology. In this paper, we present a full memory-aware management optimization scheme, based on an efficient GPU acceleration technique, to reconstruct high-resolution volumes from a large number of X-ray projection images. The proposed scheme accelerates FDK through four strategies: (1) overcoming limited device memory by evaluating four different configurations for migrating input and output data in the CPU/GPU system; (2) reducing data transfer time during FDK execution by using a Unified Memory (UM) model with data prefetching for data migration between host and device during reconstruction; (3) accelerating back-projection, the bottleneck of the algorithm, by mitigating its computational cost; and (4) optimizing the projection data layout before processing on the GPU, which improves device memory access and mitigates the memory latency problem. Experimental results show that the proposed method took 21.17 s and 101.76 s to reconstruct high-resolution 1024³- and 2048³-voxel volumes from 1800 projections of 1024² and 2048² pixels, respectively. This performance qualifies as on-the-fly (real-time) reconstruction: the method can process 85 projections per second for high-resolution volumes from a large number of projections, achieving accuracy and resolution in the reconstructed volume. In terms of speedup, the proposed method achieves 1.72× for the low-resolution 512³ volume, 1.82× for the 1024³ volume, and 1.70× for the highest-resolution 2048³ volume over the baseline implementation, and it improves on previous FDK-based reconstruction methods.
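
The throughput figure in the abstract can be checked against the reported reconstruction times. The short sketch below recomputes projections per second from the numbers quoted above and, as a rough illustration of why device memory is a constraint, estimates the size of each output volume. The 4-byte-per-voxel (float32) storage is an assumption made here for illustration; the abstract does not state the voxel data type, and the function and constant names are hypothetical.

```python
# Figures quoted in the abstract.
NUM_PROJECTIONS = 1800
RECON_SECONDS = {1024: 21.17, 2048: 101.76}  # volume side length -> seconds

# Assumption (not stated in the abstract): one 4-byte float per voxel.
BYTES_PER_VOXEL = 4


def projections_per_second(side: int) -> float:
    """Throughput implied by the reported reconstruction time."""
    return NUM_PROJECTIONS / RECON_SECONDS[side]


def volume_gib(side: int) -> float:
    """Size of a side^3 volume in GiB under the float32 assumption."""
    return side ** 3 * BYTES_PER_VOXEL / 2 ** 30


for side in sorted(RECON_SECONDS):
    print(f"{side}^3: {projections_per_second(side):.2f} projections/s, "
          f"{volume_gib(side):.0f} GiB volume")
```

Under these numbers, the figure of about 85 projections per second corresponds to the 1024³ case (1800 / 21.17 s), while the 2048³ case works out to roughly 17.7 projections per second. If voxels are indeed stored as 4-byte floats, the two volumes would occupy about 4 GiB and 32 GiB, which suggests why data migration between host and device is central to the scheme.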

Keywords