Samuel Pichardo's webpage

High performance computing (HPC) for HIFU

Modeling of sound propagation and bio-effects of ultrasound are constantly used to explore new applications, optimize designs of devices and offer robust treatment planning tools. Such as in other areas of science, Linux clusters have become an indispensable tool that allows using precise models where constraints introduced by the anatomy and physiological conditions can be considered.

As part of my activities, I'm constantly developing, updating and using software implementations of diverse models for sound propagation and bio-effects.

An example of a problem

Let says we want to focus ultrasound inside the brain. This means we need to cross the skull bone before reaching a particular target.

In therapy with ultrasound, sooner or later we need to implement some solution to predict the sound propagation that we could expect inside biological tissue. For the most, the degree of complexity of the solution depends of the conditions where ultrasound is being propagated. The most critical condition is the nature of the medium in terms of its intrinsic acoustic impedance Z. Acoustic impedance is given by the product of speed of sound propagation and the density. Changes in Z can modify the propagation of sound in two ways:

  • Any change of the speed of sound changes the phase of the sound wave and can alter the focusing effect of HIFU

  • Any change of Z implies the generation of a scattered wave which implies a loss of energy for the propagated wave.

If many discontinuities in Z are found along the way of the wave, the multiple scattered wave can recombine and generate a "echo" strong enough to be detected by a sensor. This is a basic principle for ultrasound imaging (but that is another story).

For HIFU, a medium showing a heterogeneous distribution of Z imply that the ultrasound phase will be distorted and the convergence of sound in the desired location can be compromised. The mentioned situation of heterogeneous nature in Z applies for the skull bone. This organ shows important variations of its density and, to make things even more difficult, the speed of sound for this organ depends also of this density.

If we are dealing with a medium showing such degree of heterogeneous nature in Z , then it is probable we need to solve a "full" equation such as the Westervelt equation using a Finite-Difference Time-Difference (FDTD) scheme.

A FDTD solution has the advantage of being precise enough and allows modeling the transient wave. The big "problem" is that often we need solutions that require large matrices and the third or higher derivatives in time are required. We can always select a volume of interest and be smart enough to select the time scale for just what we need. Still, after reducing the size of the problem, we deal often with a problem that remains computational intensive. This become more critical if some kind of optimization or parametric study is involved, since this would mean hundreds or thousands of calculations of acoustic fields. Different approaches can be used to speed-up things such he use of solution based on spectral decomposition, steady-state or even sometimes on fully analytical models. However, at some point it is common that some level of simplification has to be done and some compromise has to take place, with the impelling risk of over-simplification. This is not a problem for a medium modeled exclusively of homogeneous soft tissue, but for other medium such as the bone this can not be easily applied. High performance computing allow us to avoid us such kind of over simplification when it is considered we must keep as much of detail as possible.

Development of HPC infrastructure at TBRRI for HIFU

I have accumulated in the last years a set of libraries that I use in function of a specific project such the one mentioned above. We count with tools that allows us to calculate models of sound propagation and several of its bio-effects such as

  • Westervelt

  • KZK

  • Rayleigh-Sommerfeld integral

  • Bioheat transfer equation

  • Cavitation (coupled with sound propagation)

At some point, most of this models reduce to some sort of spectral solution, finite-difference time-difference solution, numerical integration, Gaussian decomposition, etc, etc, and it will require a considerable amount of processing power, even in modern desktop computers. We are constantly working in creating a sustainable platform for modeling and solving problems related to HIFU. What we have currently is an seamless environment in which we can test from:

  • Low scale problems, using multi-core technology and OpenMP.

  • Mid scale problems, using GPGPU and CUDA

  • Large scale problems, using our own Linux cluster and MPI

  • Very large scale problems, using SHARCNET and MPI

Our constant goal is to switch from one environment to another in the most transparent way possible. For the most, we use Matlab or Octave to prepare the data and a specific function is executed either locally or submitted to a cluster facility.

One design to rule them all and every tool for what is for

Most of people would agree that there is nothing can't be done with a single tool or language (for the record, longtime ago I worked with compilers written in Cobol and yes, that's scary) but also most of people recognizes that some tools are much better than other ones for specific tasks. By example: Matlab and Octave (among many other things) are excellent for data preparation and results analysis; if you really want crunch numbers really fast at some point you need C or Fortran; MPI (using an API based on C/C++/Fortran) has been since several years the defacto standard to use distributed computing facilities; CUDA has been reshaping a lot the scientific computing scene by leveraging a considerable and cost-effective existing hardware.

The big challenge is putting all these together in a coherent way and that is what we do everyday here at TBRRI. For the most, we use Matlab or Octave to prepare the data and in function of the size of the problem a specific implementation is called either to be run locally either to be submitted to a Linux cluster. When the "crunching" part has to be done, a C/Fortran program or function is called. This program can imply the of use of one or more of several technologies such as OpenMP, CUDA or MPI. The final data is analyzed in Matlab or Octave. This is a simple design and I'm pretty sure that many people out there follow a similar approach. What we have been particularly successful is to facilitate the calling convention to ensure that the interface to each function remains exactly the same no matter in which platform the code will be executed. I can choose with a single parameter if I want to run locally using OpenMP, or using my GPGPU with CUDA or using my own 108-cores cluster. This scenario allows us to go extremely easy back and forth between small size and large scale problems.

MPITB in SHARCNET

Sometimes some people are quite successful to diminish the boundaries about what different tools can do. From a design point of view, there is nothing that should avoid to use the same MPI specification under Octave or Matlab rather than inside a C/C++/Fortran program. Well, Javier Fernández Baldomero from the University of Granada in Spain decided to spent some time to do exactly that. What he developed was a wrapper to most of the MPI functionality to be called directly inside Matlab (at the very beginning) and Octave (most recently).

His library is called MPITB (MPI Toolbox for Octave) and it is available under GPL (original link, which by some weird reason sometimes is impossible to access, cache copy in Google if first link does not work). MPITB allows to reduce a lot the coding and simplifies everything since a MPI_Send reduces from

MPI_Send(MyData, HowMany,MPI_DOUBLE,0, 1, MPI_COMM_WORLD);

to

MPI_Send(MyData, 0, 1, MPI_COMM_WORLD);

Any C/Fortran MPI programmer would tell that the data type and number of elements are missing, and any Octave programmer would tell that we can always ask for the size and data class of any variable. We can mix the nice flexibility and robustness of language such Octave to prepare, re-index, or whatever you need, your data and still split the process across a cluster. A MPI program in Octave looks just as any C/Fortran counterpart, with its

function MyOctaveMPIFunction(param1, param2)
info = MPI_Init;
[info rank] = MPI_Comm_rank (MPI_COMM_WORLD);
[info sizecluster] = MPI_Comm_size (MPI_COMM_WORLD);
...(and so on)

We only require to have an installation of Octave, MPITB and, of course, a MPI implementation.

Well, almost

True is that MPITB supports officially LAM/MPI and OpenMPI implementations. This is fine since there are many installations out there using one of these implementations of MPI. When I discovered MPITB, it just took a couple of hours to install the right version of LAM/MPI in my own cluster and everything worked perfectly. When Javier moved from Matlab to Octave (and then to OpenMPI), I just installed OpenMPI and everything kept working nicely. Then I told myself it would be great of having this running in SHARCNET, since Octave would help a lot to run, by example, very complex parametric studies without the excessive coding effort required by using C/C++. Anyways, at some point I always use a C-MEX file to run the most demanding calculations, but with MPITB I can use each language for what it is best for.

A nice job for a summer student

Months passed and I just did not have enough time to port MPITB to the specific environment in SHARCNET. Benjamin Zaporzan joined me as summer student in May 2009 and he took the job of studying and modifying the required files to port MPITB to SHARCNET. This is the link to modified source code

mpitbMPICH2_SHARCNET.tar.gz

The code has been tested in several of the sites in SHARCNET, but mostly in whale, requin and narwal systems. Please let me know of your experience on other systems. Our goal was to ensure that MPITB could run on the top of the default MPI implementation in SHARCNET (hpMPI).

Quick installation

  1. The obvious, untar the file. In my case I also created an octave sub-directory under my work directory and I untar-ed the file there. I also created a symbolic link mpitb pointing to mpitbMPICH2_SHARCNET. The directory /work/<my_user>/octave/mpitb will be referred as MPITB_ROOT.

  2. Go to MPITB_ROOT/src and do a "make"

  3. To allow your Octave programs be ready to use MPI you need run an initial script to setup paths and some variables. The startups are located under MPITB_ROOT/startups. In my case I copied MPITB_ROOT/startups/startup_MPITB.m to a given directory where I have my Octave projects and I created under that directory a .octaverc file with the code

    %Toolbox startup M-file, if it exists.
    warning("off","all");
    if exist('startup_MPITB.m','file')
    startup_MPITB
    end

    By using this every time I start Octave in that directory I'm sure that MPITB is ready to be used if required.

  4. I had to modify startup_MPITB.m to look under my work directory instead of my home directory. The main reason of installing MPITB under work is to avoid conflicts with versions of libraries since MPITB is compiled to use dynamic libraries.

  5. That's all. You are ready to use MPI under Octave. To submit your programs you can call Octave with sqsub and specify that octave should start your MPI program

    sqsub -q mpi -n 4 -r 180 octave --eval 'MyOctaveMPIFunction(12,"abc")'

Final thoughts

These modifications to Javier's code are given "AS IS". As you can imagine, I can't commit to offer a formal support if you deal with some problems. Nevertheless, in the limit of my schedule, I'm always open to try to give some help.

Javier included a quite comprehensible help info for every MPI function (using the standard "help" command inside Octave) and an amazing tutorial. Just take a look at the directory MPITB_ROOT for the tutorial. Some functionality specific to OpenMPI is not available and the changes done to Javier's original version are detailed in the file MPITB_ROOT/src/changesBZ.txt. One particular point I haven't tested yet is the auto-spawn functionality, Javier have some insights about this in his tutorial and website.

I hope that this work will be useful for someone else in the SHARCNET community. I can't give enough prise to Javier's work and certainly I'm not the only one who appreciates his effort. Of course MPITB would not be the ideal solution for some people and certainly it adds some extra software layer. Nevertheless, for many users out there MPITB can be a nice addition to their set of tools.

I just want to thank Benjamin Zaporzan, undergrad student in engineering at Lakehead University, for his talented work to help make this happen finally.