5th Annual Conference of IT4Innovations

Europe/Prague
atrium, IT4Innovations
Studentská 1B, 708 33 Ostrava-Poruba
Vít Vondrák (IT4Innovations)
Conference Programme
Participants
  • Alena Vašatová
  • Alexandra Fröml Gréková
  • Alexandros Markopoulos
  • Aleš Vítek
  • Alice Lipinová
  • Branislav Jansík
  • Darek Sovadina
  • David Horák
  • David Hrbáč
  • Dominik Legut
  • Ekaterina Grakova
  • Frantisek Grezl
  • Hana Valouchova
  • Jakub Kruzik
  • Jan Martinovič
  • Jan Przezwiecki
  • Jaromír Pištora
  • Jiri Jaros
  • Jiří Blahuta
  • Jiří Hanzelka
  • Jiří Starý
  • Jiří Ševčík
  • Jonáš Tokarský
  • Karina Pešatová
  • Katerina Slaninova
  • Kateřina Janurová
  • Kateřina Niesnerová
  • Kateřina Sciglová
  • Kristina Motyčková
  • Lenka Dulaiová
  • Lubomir Riha
  • Lucie Valečková
  • Lukas Maly
  • Lukas Topiarz
  • Lukas Vojacek
  • Lukáš Krupčík
  • Lukáš Rapant
  • Marek Klemenc
  • Marek Lampart
  • Martin Drahansky
  • Martin Golasowski
  • Martin Palkovič
  • Martin Čermák
  • Martin Šviček
  • Michaela Terkovičová
  • Michal Podhoranyi
  • Milan Lazecky
  • Nora Palovská
  • Ondrej Jakl
  • Pavel Marsalek
  • Pavla Jirůtková
  • Peter Arbenz
  • Petr Cintula
  • Petr Sosík
  • Petra Murinová
  • Radek Tomis
  • Radim Blaheta
  • Radim Briš
  • Radim Sojka
  • Radim Vavřík
  • Rajko Ćosić
  • Robert Skopal
  • Stanislav Bohm
  • Stanislav Sysala
  • Tomáš Brzobohatý
  • Tomáš Karásek
  • Tomáš Kozubek
  • Tomáš Ligurský
  • Tomáš Luber
  • Tomáš Martinovič
  • Ulrich Bodenhofer
  • Vaclav Ryska
  • Vaclav Satek
  • Veronika Korbelová
  • Vilém Novák
  • Vit Ptosek
  • Václav Hapla
  • Václav Snášel
  • Václav Svatoň
  • Vít Vondrak
  • Zbyněk Schmejkal
    • 09:00–09:15
      Registration of attendees (atrium)
    • 09:15–09:30
      Martin Palkovič: IT4Innovations in 2016 (atrium)
    • 09:30–09:45
      Branislav Jansík: IT4I's Infrastructure (atrium)
    • 09:45–10:00
      Jan Martinovič: Research Programme 1 Summary (atrium)
    • 10:00–10:15
      Radim Blaheta: Research Programme 2 Summary (atrium)
    • 10:15–10:30
      Tomáš Kozubek: Research Programme 3 Summary (atrium)
    • 10:30–11:00
      Coffee Break 30m (atrium)
    • 11:00–11:15
      Jaromír Pištora: Research Programme 4 Summary (atrium)
    • 11:15–11:30
      Václav Snášel: Research Programme 5 Summary (atrium)
    • 11:30–11:45
      Vilém Novák: Research Programme 6 Summary (atrium)
    • 11:45–12:00
      František Grézl: Research Programme 7 Summary (atrium)
    • 12:00–12:15
      Martin Drahanský: Research Programme 8 Summary (atrium)
    • 12:15–13:30
      Break for lunch (lunch will not be provided by the organizers) 1h 15m
    • 13:30–14:00
      Keynote 1 (atrium)
      • 13:30
        Model-based treatment planning for cancer therapy using high-intensity focused ultrasound 30m
        **High-intensity focused ultrasound**

        High-intensity focused ultrasound (HIFU) is a non-invasive therapy method which does not require puncturing the skin and typically has minimal or no side effects. In HIFU therapy, focused ultrasound beams are used to create a rapid temperature rise at the focal point, which results in irreversible tissue damage due to coagulative thermal necrosis. HIFU therapy can be used clinically to treat cancerous tissue in organs such as the kidney, liver, or prostate, but the oncological outcomes have been variable. The major challenge is to ensure that the ultrasound focus is accurately placed at the desired target, that a sufficient amount of energy is delivered to this point, and that no harm is caused to healthy tissue.

        **Clinical problem and treatment planning**

        Due to the deep location of most cancerous lesions, several tissue layers, including skin, fat, muscle, soft tissue, or bone, lie in front of the treatment targets. These layers may significantly reduce the intensity of the ultrasound field due to attenuation. The effect of attenuation can be significant especially in the nonlinear case (high acoustic pressure levels), where the higher harmonic frequencies generated during HIFU therapy are more strongly attenuated. In addition to attenuation, the defocusing of ultrasound due to refraction and the reflections at tissue interfaces may result in a significant loss of HIFU energy. The key factor in ensuring high success rates of HIFU treatment is patient-specific treatment planning based on CT or MR scans.

        **Numerical model**

        Personalised HIFU treatment plans can be obtained with coupled acoustic and thermal numerical models implemented in the k-Wave toolbox. Since the first beta release in 2010, k-Wave has rapidly become the de facto standard software in the field, with almost 8,000 registered users in 60 countries (from both academia and industry). The toolbox now underpins a wide range of international research in ultrasound and photoacoustics, ranging from the reconstruction of clinical photoacoustic images to fundamental studies into the design of ultrasound transducers. The acoustic model of ultrasound wave propagation in soft tissue is based on a generalised version of the Westervelt equation which accounts for nonlinearity, material heterogeneities, and power-law absorption. The governing equations are solved using a k-space pseudospectral approach, where the Fourier collocation spectral method is used to calculate spatial gradients (a minimal sketch of this step follows the entry below) and a k-space corrected finite difference scheme is used to integrate forwards in time. The thermal model is based on Pennes' bioheat equation solved by the same approach. The thermal model is fed by the acoustic intensity computed by the acoustic model during a single sonication. Consequently, the formation of the lesion is predicted.

        **High-performance implementations**

        The computational challenge in creating personalised treatment plans arises from the scale of the ultrasound simulation. The primary factors are the wavelength of the highest harmonics needed (sub-millimetre), the size of the simulation domain, which encompasses the target region, coupling medium, and ultrasound transducer (approx. 20 cm), and the number of grid points per wavelength (5 to 10) required to ensure numerical stability. The typical simulation domain size is between 1152 × 1152 × 1152 and 3072 × 3072 × 3072 grid points, requiring 1.4–6.6 TB of RAM and between 50k and 250k core hours.

        **Discussion and outlook**

        Personalised HIFU treatment planning provides a very powerful tool that can be leveraged for a range of tasks, including patient selection (determining whether a patient is a good candidate for a particular procedure based on their individual anatomy), treatment verification (determining the cause of adverse events or treatment failures), and model-based treatment planning (determining the best transducer position and sonication parameters to deliver the ultrasound energy to the planning target volume). The main challenge is still satisfying the computational requirements of the simulation toolbox. We anticipate that to deliver the treatment plan within 24 hours, a supercomputer with more than 10,000 compute cores is needed.
        Speaker: Dr Jiri Jaros (Brno University of Technology)
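        The k-space pseudospectral scheme above rests on computing spatial gradients spectrally. The following minimal NumPy sketch (our illustration, not k-Wave code; the grid size and pulse width are invented for the example) shows the 1D Fourier collocation derivative that generalises to the 3D grids quoted above:

        ```python
        import numpy as np

        # 1D periodic grid; sizes are illustrative, not a real treatment plan
        N, L = 256, 0.2                          # grid points, domain size [m]
        dx = L / N
        x = np.arange(N) * dx
        k = 2 * np.pi * np.fft.fftfreq(N, d=dx)  # angular wavenumbers

        # smooth pressure-like Gaussian pulse centred in the domain
        sigma = 0.01
        u = np.exp(-((x - L / 2) ** 2) / (2 * sigma ** 2))

        # Fourier collocation: differentiate by multiplying by i*k in k-space
        du_dx = np.real(np.fft.ifft(1j * k * np.fft.fft(u)))

        # compare with the analytic derivative of the Gaussian
        exact = -(x - L / 2) / sigma ** 2 * u
        print(np.max(np.abs(du_dx - exact)))     # tiny for this resolved pulse
        ```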
    • 14:00–15:00
      Plenary talks 1 (atrium)
      • 14:00
        Solving quadratic programming problems using PERMON 20m
        Our novel software toolbox PERMON makes use of results in quadratic programming and domain decomposition methods. It is built on top of the PETSc framework for numerical computations. We will present its fundamental packages and show their applications. We will focus on contact problems of mechanics decomposed by means of a FETI-type non-overlapping domain decomposition method implemented in the PermonFLLOP package. These problems lead to inequality-constrained quadratic programming problems that can be solved by the PermonQP package. A toy example of such a bound-constrained QP is sketched after this entry.
        Speaker: Dr David Horák (IT4Innovations & Dep. of Applied Math., VSB-TUO)
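        To illustrate the kind of inequality-constrained QP that PermonQP targets, the sketch below solves a bound-constrained QP with a plain projected-gradient iteration. This is a toy stand-in for PERMON's actual algorithms; the obstacle problem and all names here are invented for the example.

        ```python
        import numpy as np

        def projected_gradient_qp(A, b, lb, tol=1e-8, max_it=20000):
            """Minimise 1/2 x^T A x - b^T x subject to x >= lb by projected
            gradient descent (a toy stand-in for PermonQP's solvers)."""
            x = np.maximum(lb, np.zeros_like(b))
            step = 1.0 / np.linalg.norm(A, 2)         # safe step length
            for _ in range(max_it):
                g = A @ x - b                          # QP gradient
                x_new = np.maximum(lb, x - step * g)   # project onto the bounds
                if np.linalg.norm(x_new - x) < tol:
                    break
                x = x_new
            return x

        # toy 1D contact-like problem: discrete Laplacian, load, obstacle below
        n = 50
        A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
        b = np.full(n, -0.01)                          # downward load
        lb = np.full(n, -0.05)                         # non-penetration bound
        x = projected_gradient_qp(A, b, lb)
        print(x.min() >= lb[0] - 1e-12)                # True: bound respected
        ```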
      • 14:20
        Computing betweenness centrality and combination with flood prediction 20m
        Betweenness centrality is a widely used graph metric for finding the most significant vertices (or edges) in a graph. Betweenness quantifies the number of times a node acts as a bridge along the shortest path between two other vertices. Therefore, by computing betweenness over a road network represented by a weighted, oriented graph, we are able to identify places that will most likely become traffic bottlenecks. Combining this approach with flood prediction, we are able to simulate and monitor how these bottlenecks will move in case of flooding. In this talk, our parallel implementation of the betweenness algorithm will be presented, along with results from combining this approach with flood prediction. A small single-machine illustration of the metric follows this entry.
        Speaker: Mr Robert Skopal (IT4Innovations National Supercomputing Centre)
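        The talk presents a custom parallel implementation; purely to illustrate the metric itself, the snippet below computes weighted betweenness on a small invented directed road network using NetworkX on a single machine.

        ```python
        import networkx as nx

        # toy weighted, directed road network; weights are travel times
        G = nx.DiGraph()
        G.add_weighted_edges_from([
            ("A", "B", 2.0), ("B", "C", 1.0), ("A", "C", 5.0),
            ("C", "D", 1.0), ("B", "D", 4.0), ("D", "A", 3.0),
        ])

        # betweenness counts how often a node lies on shortest paths;
        # high values flag potential traffic bottlenecks
        bc = nx.betweenness_centrality(G, weight="weight")
        print(max(bc, key=bc.get))  # the most "bridge-like" node
        ```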
      • 14:40
        Task scheduling on HPC infrastructure 20m
        ExCAPE is a Horizon 2020 project focused on computational chemogenomics on a large scale. This includes the execution of machine learning algorithms in an HPC environment. One of the major challenges is the efficient execution of many relatively small interdependent tasks. We are therefore developing a programming model and scheduler that can handle this workload. A key aspect is low overhead, which we want to achieve through an open scheduler architecture that allows existing code to be integrated directly into the scheduling infrastructure. A toy dependency-aware scheduler is sketched after this entry.
        Speaker: Dr Stanislav Böhm (IT4Innovations)
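        As a minimal sketch of executing many interdependent tasks (this is not the ExCAPE scheduler; the task names and graph are invented), Python's standard library already provides the core ingredients, namely topological ordering plus a worker pool:

        ```python
        import graphlib  # stdlib topological sorting (Python 3.9+)
        from concurrent.futures import ThreadPoolExecutor

        def run_task_graph(tasks, deps, workers=4):
            """Run callables respecting dependencies; a toy version of
            dependency-aware task scheduling, not the ExCAPE scheduler."""
            sorter = graphlib.TopologicalSorter(deps)
            sorter.prepare()
            with ThreadPoolExecutor(max_workers=workers) as pool:
                while sorter.is_active():
                    ready = list(sorter.get_ready())   # tasks whose deps are done
                    futures = {pool.submit(tasks[t]): t for t in ready}
                    for fut, t in futures.items():
                        fut.result()                   # wait, then mark done
                        sorter.done(t)

        tasks = {"load": lambda: print("load"),
                 "train": lambda: print("train"),
                 "eval": lambda: print("eval")}
        deps = {"train": {"load"}, "eval": {"train"}}  # eval <- train <- load
        run_task_graph(tasks, deps)
        ```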
    • 15:00–15:30
      Coffee Break 30m (atrium)
    • 15:30–16:00
      Keynote 2 (atrium)
      • 15:30
        ESPRESO - ExaScale PaRallel FETI SOlver 30m
        ESPRESO is a sparse solver library developed at the Czech national supercomputing centre IT4Innovations. The main focus of the development team is to create a highly efficient parallel solver which contains several FETI-based algorithms, including the Hybrid Total FETI method suitable for parallel machines with tens or hundreds of thousands of cores. The solver is based on a highly efficient communication layer on top of pure MPI. The main focus of the CPU version is the development of an MPI-based communication layer designed particularly for FETI that enables the scalability of the solver. It is not just an MPI code; as in many modern parallel applications, it uses hybrid parallelization. The three levels of parallelization are message passing, threading using Cilk++, and vectorization using Intel MKL and Cilk++.

        Within the IPCC (Intel Parallel Computing Centre) project, a new approach has been developed for the acceleration of FETI methods in general, not just Hybrid FETI, by Intel Xeon Phi accelerators. An Oak Ridge National Laboratory (ORNL) director's discretionary project has been used to develop GPU acceleration of the solver. This project also allowed for tuning and scalability tests of the communication layer on a Titan-size machine. The Titan machine has 18,688 compute nodes, of which 95% have been successfully used for solving a problem of up to 120 billion unknowns arising from the discretization of the Laplace equation in 3D.

        ESPRESO contains not only the linear solvers but also QP solvers designed for contact problems (quadratic programming problems with inequality and equality constraints), and several finite element (FEM) and boundary element (BEM) preprocessing tools designed particularly for FETI solvers (the algebraic structure these solvers work with is sketched after this entry). The BEM support was produced in collaboration with the developers of the BEM4I library ([bem4i.it4i.cz][1]). The preprocessor supports FEM and BEM discretization for the advection-diffusion equation, Stokes flow, and structural mechanics. Real engineering problems can be imported from Ansys Workbench or OpenFOAM. In addition, a C API allows ESPRESO to be used as a solver library by third-party applications. This has been used for integration with CSC ELMER. For large-scale tests, the preprocessor also contains a multi-block benchmark generator. The post-processing and visualization are based on the VTK library and ParaView, including ParaView Catalyst for in-situ visualization. This summer, the alpha version of ESPRESO was released to the public and can be downloaded from the project website [espreso.it4i.cz][2].

        [1]: http://bem4i.it4i.cz
        [2]: http://espreso.it4i.cz
        Speaker: Dr Lubomír Říha (IT4Innovations)
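        To make the FETI structure concrete, here is a tiny NumPy sketch (ours, not ESPRESO code) of two 1D Poisson subdomains glued by a continuity constraint via a Lagrange multiplier. Real FETI solves the dual problem iteratively with subdomain pseudoinverses instead of assembling the saddle-point system directly as done here.

        ```python
        import numpy as np

        def stiffness(n, h, free):
            """1D Laplacian: Dirichlet at one end, free interface node at `free`."""
            K = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
            K[free, free] = 1.0 / h
            return K

        n, h = 10, 0.05
        K1 = stiffness(n, h, free=n - 1)   # subdomain 1: interface on the right
        K2 = stiffness(n, h, free=0)       # subdomain 2: interface on the left
        K = np.block([[K1, np.zeros((n, n))], [np.zeros((n, n)), K2]])
        f = np.full(2 * n, h)              # unit body load

        B = np.zeros((1, 2 * n))           # gluing: u1[interface] = u2[interface]
        B[0, n - 1], B[0, n] = 1.0, -1.0

        # KKT (saddle-point) system [[K, B^T], [B, 0]] [u; lam] = [f; 0]
        S = np.block([[K, B.T], [B, np.zeros((1, 1))]])
        sol = np.linalg.solve(S, np.concatenate([f, [0.0]]))
        u, lam = sol[:2 * n], sol[2 * n]
        print(abs(u[n - 1] - u[n]))        # continuity across the cut: ~1e-15
        ```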
    • 16:00–17:00
      Plenary talks 2 (atrium)
      • 16:00
        Magneto-optical spectroscopy at the soft x-ray range 20m
        Graphene bears huge potential for spintronic applications, in which graphene's contact with ferromagnetic metals plays a crucial role. We present magneto-optical investigations of graphene on Co by means of resonant magnetic reflection spectroscopy, the transversal magneto-optical Kerr effect (T-MOKE), and x-ray magnetic circular dichroism (XMCD). Exploiting linearly polarized synchrotron radiation in the soft x-ray region across the carbon 1s edge, the π- and σ-bondings of graphene could be excited individually to test their importance for the magnetic coupling between graphene and the substrate. A broad magnetic signal was obtained over a wide energy range from 255 eV to 340 eV, with enhanced T-MOKE peak values of 1.5% at the π-resonance energy near 285 eV (the standard asymmetry definition behind these values is recalled after this entry). From data of the corresponding T-MOKE spectra across the 2p edge of the Co substrate, we deduce an induced magnetic moment on carbon of 0.05–0.065 μB. This is slightly larger than the magnetic moment of amorphous carbon induced by Fe in an Fe/C multilayer [1]. Using T-MOKE spectroscopy, a hysteresis curve was recorded at the C 1s edge, demonstrating ferromagnetic behavior of graphene on Co. An identical hysteresis curve was obtained at the Co 2p edges, showing that the magnetism on the carbon atoms is induced by the ferromagnetic Co substrate. From the energy and polarization dependence we conclude that the magnetism in graphene is carried by the π-orbitals, which is confirmed by XMCD spectra. These show a strong resonant peak of 4% at the π-energy and negligible contributions at the σ-energy. Furthermore, the difference in the spin-polarized density of states (DOS) of graphene could be deduced from the XMCD signal. Reproducing the experimental recordings is a challenge for the first-principles calculations.
        Speaker: Dr Dominik Legut (IT4I)
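        For reference, T-MOKE values such as the 1.5% peak quoted above are conventionally reported as the reflected-intensity asymmetry between opposite in-plane magnetisation directions; this standard definition (not spelled out in the abstract itself) is

        $$A = \frac{I(+M) - I(-M)}{I(+M) + I(-M)},$$

        where $I(\pm M)$ is the reflected intensity for the two magnetisation states.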
      • 16:20
        Linguistic Characterization of Natural Data by Applying Intermediate Quantifiers on Fuzzy Association Rules 20m
        Extended Abstract

        The main goal of this talk is to put together theoretical results on intermediate quantifiers, which were proposed in several papers (see e.g. [1, 2, 3, 4]), with the Fuzzy GUHA method [5], and to introduce a linguistic characterization of natural data using generalized intermediate quantifiers. The theory of intermediate quantifiers was introduced by Novák in [3] and is now a constituent of the theory of Fuzzy Natural Logic (FNL), which is a mathematical counterpart of the concept of Natural Logic introduced by Lakoff [6]. This theory is based on Łukasiewicz fuzzy type theory (Ł-FTT) [4], which is one of the existing higher-order fuzzy logics. Fuzzy GUHA is a special method for the automated search of association rules in numerical data. Generally, the obtained associations are in the form A s B, meaning that the occurrence of A is associated with the occurrence of B, where A and B are formulae created from objects' attributes. As proposed by Hájek et al. [5], the original GUHA method allowed only boolean attributes to be involved. Parts of their approach were independently re-invented by Agrawal [7] many years later and are also known as the mining of association rules or market basket analysis. A detailed book on the GUHA method is [8], where one can find distinct statistically approved associations between attributes of given objects. Fuzzy GUHA is an extension of the classical GUHA method to fuzzy data. In this paper, we work with associations in the form of IF-THEN rules composed of evaluative linguistic expressions, which allow quantities to be characterized with vague linguistic terms such as "very small", "big", "medium", etc. To measure the interestingness of a rule, many numerical characteristics or indices have been proposed (see [9, 10] for a nice overview). As a supplement to them, we try to utilize the theory of intermediate quantifiers to characterize the intensity of association, which allows us to use linguistic characterizations such as "almost all", "most", "some", or "few" (a toy evaluation of one such quantifier is sketched after this entry). As a result, we may automatically obtain the following sentences from numerical bio-statistical data: Almost all people who suffer from atopic tetter, live in an area affected by heavy industry, and smoke, suffer from asthma. Most people who smoke and suffer from respiratory diseases also suffer from ischemic disease of the leg. In practice, it is often the case that some data are not available, e.g. due to measurement errors, missing results, or a respondent who is not willing to answer or has no opinion on the given subject. We can completely remove the cases with missing values to obtain clean data, but this can result in an excessive loss of information. Alternatively, we can handle missing values by using fuzzy partial logics, which were proposed by Běhounek and Novák in [11]. They provide a formal apparatus for several types of missing information, such as an "unknown" or "undefined" (i.e. not meaningful) value. Basically, the semantics of these logics, formed by algebras of truth values, is extended by a special value.
        Speaker: Dr Petra Murinová (Institute for Research and Application of Fuzzy Modelling)
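        A minimal numerical sketch of evaluating one such quantifier follows. The trapezoidal membership and the data are invented for illustration; they are not the exact quantifier semantics from the cited intermediate-quantifier theory.

        ```python
        import numpy as np

        def almost_all(r):
            """Toy membership for "almost all" on the relative cardinality
            r in [0, 1] (an illustrative trapezoid, not the exact model)."""
            return np.clip((r - 0.8) / (0.95 - 0.8), 0.0, 1.0)

        # fuzzy association A => B over 10 records: degrees of membership
        A = np.array([0.9, 0.8, 1.0, 0.7, 0.95, 0.85, 0.9, 1.0, 0.6, 0.9])
        B = np.array([0.9, 0.9, 1.0, 0.8, 1.00, 0.90, 0.9, 1.0, 0.5, 0.9])

        # relative cardinality of "A and B" within "A" (Lukasiewicz t-norm)
        and_AB = np.maximum(A + B - 1.0, 0.0)
        r = and_AB.sum() / A.sum()
        print(f"'Almost all A are B' holds to degree {almost_all(r):.2f}")
        ```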
      • 16:40
        Using an Adaptive and Time-Predictable Runtime System for Power-Aware HPC-Oriented Applications 20m
        High-performance modeling and simulation play a driving role in decision making and forecasting. For time-critical emergency support applications, such as severe weather forecasting or flood modeling, late results can be unusable. Forecast models must be executed, and their data analyzed, while the predictions can still be applied. These on-demand large-scale computations cannot wait indefinitely in a job queue for supercomputer resources to become available. Neither can the community keep multimillion-dollar infrastructures idle until required by an urgent computation. Even if it rarely happens, best effort may not be sufficient when critical applications are competing in the job queue with other users. Specific support is needed to provide computing resources quickly, automatically, and reliably. An increasing number of high-performance applications demand some form of time predictability, in particular in scenarios where correctness depends on both performance and timing requirements, and the failure to meet either of them is critical. Consequently, a more predictable HPC system is required, particularly for an emerging class of adaptive real-time HPC applications. Here we present our runtime approach, which produces results in a predictable time with a minimized allocation of hardware resources. The paper describes the advantages regarding execution-time reliability and the trade-offs regarding power/energy consumption and temperature of the system compared to the current GNU/Linux governors.
        Speaker: Dr Antoni Portero (IT4Innovations, National Supercomputing Center)
    • 17:00–19:00
      Poster session & Snack (atrium)
      • 17:00
        Automatization of inventory control process 1h
        **Abstract** We present an automated application for inventory optimization based on sales forecasts. Inventory stock optimization has been in high demand by companies in recent years; however, inventory models depend on sales expectations. The problem of optimizing inventory stock is therefore divided into two parts: sales forecasting and setting the optimal inventory. We describe an automated solution for model selection for the sales forecast and for setting the inventory based on those predictions. Finally, we present our validation of the system through historical simulation, comparing the simulation results against real inventory levels. Due to the large number of time series of different lengths, this simulation was run in parallel on a cluster and was parallelized in R. The algorithms were developed and tested on inventory time series from real data sets of the K2 atmitec company. (A generic forecast-driven stocking rule is sketched after this entry.) **Keywords:** inventory optimization, sales forecasting, model selection, parallelization
        Speaker: Tomáš Martinovič (IT4Innovations, VŠB - Technical University of Ostrava)
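        As a generic illustration of how a sales forecast drives an inventory level, here is a textbook base-stock rule under normally distributed forecast errors. This is not K2 atmitec's production model; the numbers and the service level are invented for the example.

        ```python
        import numpy as np
        from scipy import stats

        def order_up_to_level(forecast_mean, forecast_sd, service_level=0.95):
            """Base-stock level from a sales forecast: expected demand plus
            safety stock sized by the forecast error (a textbook rule)."""
            z = stats.norm.ppf(service_level)   # safety factor
            return forecast_mean + z * forecast_sd

        # e.g. forecast of 120 units/week with sd 30 -> stock up to ~169
        print(round(order_up_to_level(120, 30)))
        ```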
      • 17:00
        Constitutive solution scheme for Mohr-Coulomb plasticity in 3D 1h
        This poster summarizes an implicit constitutive solution scheme for the elastoplastic problem containing the Mohr-Coulomb yield criterion, a nonassociative flow rule, and nonlinear isotropic hardening. The scheme builds upon the subdifferential formulation of the flow rule, leading to several improvements. For example, one can a priori detect the position of the unknown stress tensor within the Mohr-Coulomb pyramid or simplify the construction of the consistent tangent operator. The presented scheme was implemented in a vectorized MATLAB code and used for solving a slope stability problem in 3D. (A textbook form of the yield check is sketched after this entry.)
        Speaker: Dr Stanislav Sysala (Institute of Geonics of the CAS)
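        For orientation, the elastic/plastic decision at the core of any return-mapping scheme checks the yield function on the trial stress. The sketch below uses the textbook Mohr-Coulomb yield function in principal stresses, not the poster's subdifferential formulation; the trial stress and material parameters are invented.

        ```python
        import numpy as np

        def mohr_coulomb_yield(sigma, c, phi):
            """Mohr-Coulomb yield function in principal stresses (tension
            positive): f <= 0 is elastic, f > 0 triggers the plastic
            corrector. c = cohesion, phi = friction angle [rad]."""
            s = np.sort(np.linalg.eigvalsh(sigma))[::-1]   # s1 >= s2 >= s3
            s1, s3 = s[0], s[-1]
            return (s1 - s3) + (s1 + s3) * np.sin(phi) - 2.0 * c * np.cos(phi)

        # elastic predictor stress for a toy triaxial state (MPa)
        sigma_trial = np.diag([-1.0, -2.0, -5.0])
        f = mohr_coulomb_yield(sigma_trial, c=0.5, phi=np.radians(30.0))
        print("plastic step" if f > 0 else "elastic step", f)
        ```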
      • 17:00
        How to Detect and Analyze Atherosclerotic Plaques in B-MODE Ultrasound Images: A Pilot Study of Reproducibility of Computer Analysis 1h
        This pilot study is focused on the recognition and digital analysis of atherosclerotic plaques in ultrasound B-images. The plaques are displayed as differently echogenic regions depending on plaque composition. The first goal is to find significant features, e.g. homogeneity, shape, and size, for plaque analysis from digitized ultrasound images. We developed software for finding hyperechogenicity of the substantia nigra for Parkinson's disease evaluation in B-images. Currently we are trying to discover how to use this software also for atherosclerotic plaque analysis to estimate the risk level of ischemic stroke. The software has a built-in function of intelligent brightness detection. We use a set of 23 images, each of which was analyzed five times. The primary goal is to verify the reproducibility of this software for atherosclerotic plaque analysis in medical practice, with evaluation by an experienced sonographer. All images used in this study have the same initial settings of gamma, contrast, and brightness. (Two illustrative region features are sketched after this entry.)
        Speaker: Mr Jiří Blahuta (Silesian University in Opava)
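        For illustration only (the study uses its own software with built-in intelligent brightness detection), here are two simple region features on a synthetic B-mode-like image; the image, mask, and feature definitions are invented for the sketch.

        ```python
        import numpy as np

        def plaque_features(image, mask):
            """Mean brightness and a simple homogeneity measure for a
            plaque region in a B-mode image (illustrative features only)."""
            roi = image[mask]
            mean_brightness = roi.mean()            # echogenicity proxy
            homogeneity = 1.0 / (1.0 + roi.std())   # higher = more uniform
            return mean_brightness, homogeneity

        # toy 8-bit image with a brighter elliptical "plaque"
        img = np.random.default_rng(0).integers(20, 60, (128, 128)).astype(float)
        yy, xx = np.mgrid[:128, :128]
        mask = ((yy - 64) / 20) ** 2 + ((xx - 64) / 35) ** 2 <= 1.0
        img[mask] += 80                             # hyperechogenic region
        print(plaque_features(img, mask))
        ```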
      • 17:00
        Modest but effective system for satellite-based identification of land processes 1h
        This topic covers the current implementation of satellite radar interferometry techniques for the routine identification of land processes such as slow landslides, subsidence, or slow movement of various objects of infrastructure. Previous work utilizing IT4Innovations capacities through a virtual machine with the commercial software SARPROZ demonstrated the practical applicability of its Persistent Scatterers technique in monitoring bridges, dams, houses standing on moving hill slopes, etc. With the emergence of the European Copernicus programme, the need for efficiency in satellite Big Data processing has increased. Currently there are two Sentinel-1 satellites observing the Earth with a 6-day revisit time, sending 100 TB of data daily to be archived. Recently, CESNET accepted the role of assessing the Copernicus Ground Segment programme for Czechia. A database mirroring Sentinel data over Czechia is to be established. Mimicking TU Leeds' LiCS project for early identification of land issues (observing tectonic and volcanic movements in selected areas worldwide), a potential service based on interferometric processing of Sentinel-1 data from this database has been prepared. As a basis of the system, several open-source projects were deployed, including a MySQL-based burst metadatabase (TU Leeds), the ISCE TOPS processor (NASA/JPL), the doris coregistration algorithm (TU Delft), and the StaMPS Small Baselines processor (Stanford University). Though more functionality can be rapidly developed, incorporating some of our own post-processing algorithms, even the current early version of the system can yield interesting results by a fully automatic processing chain (the only user input is a box of coordinates of the area to be processed). Several examples will be given in this contribution, explaining both the potential of the technique (e.g. to generate a nationwide landslide map or precisely observe changes in subsidence velocity in mining areas) and the system architecture. (The basic phase-to-displacement conversion underlying these measurements is recalled after this entry.)
        Speaker: Dr Milan Lazecky (IT4Innovations)
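        For context, the conversion from unwrapped interferometric phase to line-of-sight displacement used by such systems is a standard formula (the sign convention varies between processors):

        ```python
        import numpy as np

        WAVELENGTH = 0.05546   # m, Sentinel-1 C-band radar wavelength

        def los_displacement(unwrapped_phase_rad):
            """Line-of-sight displacement from unwrapped phase: one full
            fringe (2*pi) corresponds to half a wavelength (~2.77 cm)."""
            return -WAVELENGTH * unwrapped_phase_rad / (4 * np.pi)

        print(los_displacement(np.array([0.0, np.pi, 2 * np.pi])))  # metres
        ```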
      • 17:00
        MOLDIME - Platform for Massive Parallel Sequencing Data Analysis 1h
        Massive parallel sequencing (MPS) data analysis tasks are often computationally demanding, and their execution would take too long on a standard computing machine. There is thus a need to parallelize these tasks and to execute them on sufficiently powerful computing machines. The presentation will describe a newly created platform for the analysis of MPS data, ready to deploy on HPC infrastructure, both from the architectural point of view and from the viewpoint of the currently available functionality, with respect to straightforward, user-friendly processing of MPS data obtained primarily from the human genome in clinical laboratories.
        Speaker: Mr Václav Svatoň (IT4Innovations)
      • 17:00
        Spectral Domain Decomposition Using Local Fourier Basis: Application to Ultrasound Simulation on a Cluster of GPUs 1h
        **Introduction**

        The simulation of ultrasound wave propagation through biological tissue has a wide range of practical applications, including planning therapeutic ultrasound treatments of various brain disorders such as brain tumours, essential tremor, and Parkinson's disease. The major challenge is to ensure the ultrasound focus is accurately placed at the desired target within the brain, because the skull can significantly distort it. Performing accurate ultrasound simulations, however, requires the simulation code to be able to exploit thousands of processor cores and work with TBs of data while delivering the output within 24 hours. We have recently developed an efficient full-wave ultrasound model (the parallel k-Wave toolbox) enabling realistic problems to be solved within a week using the pseudospectral model and a global slab domain decomposition (GDD). Unfortunately, GDD limits scaling by the number of 2D slabs, which is usually below 2048. Moreover, since the method relies on the fast 3D Fourier transform, the all-to-all communications concealed in matrix transpositions significantly deteriorate the performance. The imbalance between communication and computation is even more striking when graphics processing units (GPUs) are used, as the raw performance of GPUs is an order of magnitude above current central processing units (CPUs). In addition, transfers over the peripheral component interconnect express (PCI-E) bus have to be considered as another source of communication overhead. The most efficient implementation to our knowledge, proposed by Gholami, reveals the fundamental communication problem of distributed GPU FFTs. For a $1024^3$ FFT calculated using 128 GPUs, the communication overhead accounts for 99% of the total execution time. Although the execution time reduces by 8.6$\times$ for a 32$\times$ increase in the number of GPUs (giving a parallel efficiency of 27%), this overhead may not be acceptable in many applications.

        **Proposed method**

        This paper presents a novel multi-GPU implementation of the Fourier spectral method using domain decomposition based on local Fourier basis (the neighbour-exchange pattern it relies on is sketched after this entry). The fundamental idea behind this work is the replacement of the global all-to-all communications introduced by the FFT (used to calculate spatial derivatives) with direct neighbour exchanges. By doing so, the communication burden can be significantly reduced, at the expense of a slight reduction in numerical accuracy. The error is shown to depend on the overlap (halo) size, to be independent of the local domain size, and to increase linearly with the number of domain cuts an acoustic wave must traverse. For an overlap (halo) size of 16 grid points, the error is on the order of $10^{-3}$, which is comparable to the error introduced by the perfectly matched layer (which ensures signal attenuation at the domain boundaries and enforces periodicity). Consequently, the level of parallelism achievable in practice is not limited by the reduction in accuracy due to the use of local Fourier basis. Strong scaling results demonstrate that the code scales with reasonable parallel efficiency, reaching 50% for large simulation domain sizes. However, the small amount of on-board memory ultimately limits the global domain size for a given number of GPUs. A 1D decomposition is shown to be the most efficient unless the local subdomain becomes too thin. Beyond that point, it is useful to exploit a half-2D or 3D decomposition with only a single neighbour in a given direction, to limit the number of MPI transfers.
        An overlap size of 16 grid points is shown to be a good trade-off between speed and accuracy, with larger overlaps becoming impractical due to the overhead imposed by large MPI transfers. Compared to the CPU implementation using global domain decomposition, the GPU version is always faster for an equivalent number of nodes. For production simulations executed as part of ultrasound treatment planning, the GPU implementation reduces the simulation time by a factor of 7.5 and the simulation cost by a factor of 3.8. This is a promising result, given that the GPUs utilised are now almost decommissioned.
        Speaker: Dr Jiri Jaros (Brno University of Technology)
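        The sketch below shows only the communication pattern at the heart of the method, for two "ranks" on a periodic 1D domain: each subdomain pads itself with a halo copied point-to-point from its neighbours, replacing the FFT's global all-to-all. It deliberately omits the smooth bell (window) functions the real method applies before the local FFTs to retain accuracy; all sizes are invented.

        ```python
        import numpy as np

        HALO = 16                     # overlap size discussed in the abstract
        N = 512
        u = np.sin(np.linspace(0, 2 * np.pi, N, endpoint=False))

        # split the periodic domain between two "ranks"
        left, right = u[: N // 2], u[N // 2 :]

        def with_halo(local, prev_nb, next_nb):
            """Pad a local block with HALO points from each neighbour,
            as a direct point-to-point exchange would."""
            return np.concatenate([prev_nb[-HALO:], local, next_nb[:HALO]])

        left_ext = with_halo(left, right, right)    # periodic neighbours
        right_ext = with_halo(right, left, left)
        print(left_ext.shape)  # (N//2 + 2*HALO,) ready for a local FFT step
        ```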