I am a computational scientist and work in high performance computing (HPC). I am currently employed at the Numerical Algorithms Group in Oxford and work on all things HPC and Fortran. I’ll put my musings, thoughts and interesting views about computational science and HPC here. The topics below are some of the areas that I am interested in and will be mainly Fortran (90+) centric although I intend to put a few bits of Python too. All development work presented in this blog will be done in the Linux environment. I would also like to create a forum for dialogue and discussion, so please let me know your thoughts! I will be slowly populating the sections below.
I am one of the founders of the Fortran Modernisation Workshop, which is aimed at Fortran developers. It encourages the use of modern Fortran standards, the writing of optimised and efficient code, and good software engineering techniques. If you would like me to host this workshop at your site, please do get in touch via my NAG email address. The workshop was developed with Fatima Chami (Durham University) and Filippo Spiga (Cambridge University).
Computational Science Workflow
Before I discuss the computational science workflow, I would like to define what computational science is. It is the overlap of applied science, (numerical) mathematics and computer science, as shown in the Venn diagram below:
Below is a diagram showing the computational science workflow:
- A natural phenomenon is observed which is of interest to a scientist, e.g. weather modelling. The value of science is being able to predict this phenomenon of interest;
- This phenomenon is so complex that it is difficult to model exactly or within certain error bounds, so an idealised model is created. This makes certain simplifications of the real model, e.g. incompressibility is assumed for modelling fluid flow even though we know that everything is compressible, including steel. This is the area of applied mathematics;
- The continuum model cannot be solved on a digital computer, so it is discretised. The numerical method should be written with parallelism in mind so it can exploit parallel HPC architectures. This is the area of numerical methods;
- The numerical method is then implemented in a computer programming language, e.g. Fortran or C. This is the area of computer science;
- The parallel code is then executed on a HPC machine with parameters set to reflect a physical phenomenon, e.g. modelling the weather of the UK at a certain time. This includes setting (a) the geometry and (b) the initial conditions of the physics, chemistry and biology. In addition, parameters of the parallel run are also set, e.g. the number of processes required;
- The run is executed and creates data which is usually stored on a parallel file system. This is the area of scientific data management;
- The data is then visualised to gain some sort of scientific insight. The purpose of computing is insight, not numbers!
- When the correct scientific insight is gained, it is communicated to others in the form of an academic paper.
The purpose of computational science is to accelerate this process.
Software Engineering for Computational Science
Computational software has traditionally been written in procedural languages such as C and Fortran, and there is some discussion as to whether object oriented programming should be used for computational software. The book “Scientific Software Design: The Object-Oriented Way” by D. Rouson is a clear advocate of OOP in computational software, but the answer isn’t so obvious. The major drawback of using OOP in computational software is the potential performance degradation, but D. Rouson promotes the idea of using OOP as a “glue”, with the computationally intensive parts of the code not using OOP features. Whilst this makes sense, a general rule I can suggest is:
If your code is single physics and single scale, then there is no obvious benefit to using OOP. If your code is multi physics and multi scale, then it makes sense to use OOP, but it should be used in an intelligent way that does not degrade performance drastically. Anything in between will require a judgement call, but a procedural language will usually suffice.
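As a sketch of the “OOP as glue” idea (my own illustration, not taken from the book – the type and routine names are invented), a Fortran derived type can provide the high-level interface while the numerically intensive kernel stays a plain procedure operating on arrays:

```fortran
module field_mod
  implicit none
  private
  public :: field_t

  ! High-level "glue": a derived type with a type-bound procedure
  type :: field_t
     real, allocatable :: values(:)
   contains
     procedure :: scale => field_scale
  end type field_t

contains

  ! The type-bound wrapper delegates to a plain procedural kernel,
  ! so the hot loop itself carries no OOP machinery
  subroutine field_scale(this, factor)
    class(field_t), intent(inout) :: this
    real, intent(in) :: factor
    call scale_kernel(this%values, factor)
  end subroutine field_scale

  ! Computationally intensive part: plain arrays, no OOP features
  subroutine scale_kernel(a, factor)
    real, intent(inout) :: a(:)
    real, intent(in) :: factor
    a = a * factor
  end subroutine scale_kernel

end module field_mod
```

The derived type gives callers a clean object-like interface, while the kernel can be optimised (or parallelised) independently as ordinary procedural Fortran.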
An important point to note when writing computational code, whether single scale/single physics or multi scale/multi physics, is that each scale and physics should be componentised or abstracted behind clear APIs. This idea has been promoted by the Common Component Architecture forum, which seems a bit dormant now. Nevertheless, its philosophy is very applicable when writing large computational codes.
Individual components can be implemented using Fortran modules, which provide pseudo-OOP features and are a valuable language feature. Modules were introduced in the Fortran 90 standard and should be used even for small codes, as they provide a convenient mechanism for abstraction. In addition, if each module is placed in a separate file, it allows collaborative code development with minimal conflicts, as each developer can work on his/her own module.
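A minimal sketch of a component implemented as a module (the module and routine names here are hypothetical): everything is private by default and only the API is exposed, so the internals can change without disturbing other components.

```fortran
! solver_mod.f90 -- one component per file, with a clear public API
module solver_mod
  implicit none
  private                      ! hide all internals by default
  public :: solve_diagonal     ! the component's API

contains

  ! Solve D x = b where the diagonal matrix D is stored as a vector d
  subroutine solve_diagonal(d, b, x)
    real, intent(in)  :: d(:), b(:)
    real, intent(out) :: x(:)
    x = b / d
  end subroutine solve_diagonal

end module solver_mod
```

Other components access it via use association (`use solver_mod`), which also gives compile-time checking of the interface for free.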
Management of Scientific and Technical Organisations and Teams
Scientific and/or technical organisations are unique and are dissimilar to typical organisations. In this section I will share views and thoughts on how scientific/technical organisations should be managed to ensure the greatest levels of collaboration and productivity.
The Open Organization
A book that I came across recently is “The Open Organization” by the RedHat CEO Jim Whitehurst. It advocates an open and transparent organisational model based on participation and community, one which seeks the consent and participation of every member of the organisation/team before decisions are made. This is in stark contrast to top-down hierarchical organisational structures which, in my view, inhibit creativity and productivity. Many hierarchical organisations would absolutely love to know what workers really think, but have incredible difficulty in obtaining their opinions and views. Why is this so? A lack of trust in middle and higher management. To build trust and engagement, an open organisational model is required, and the RedHat experience will be valuable for other organisations and teams.

Why am I referring to such a book? Science is a highly creative and collaborative endeavour and requires the participation of every member of the team, where rank and position are irrelevant. Achieving a high level of participation requires an open, transparent and honest way of working. This will lead to better decision making in organisations where hundreds of decisions are made every day. Imagine if each and every such decision were the best one; imagine the benefits the organisation would reap. That is the message of the book and I would definitely recommend it. From reading it, the following are the characteristics that a leader in an open organisation would require:
- Humility – they must be humble enough to admit that they don’t know the answers and they are on the same journey as everyone else. They must be able to stifle their ego and rank, and work with everyone else to achieve the aims and objectives of the organisation. They have to earn the respect of workers and don’t automatically expect it based on their rank;
- Patience – during the consent phase, they must be patient. They must allocate time to engage with each and every team member before making decisions. It is initially a slow process, but is accelerated once consent is reached;
- High emotional quotient (EQ) – there is a high emphasis on IQ (intelligence) but often very little on emotional intelligence. To get people to work with you and do amazing things, you must engage and connect with them on an emotional and social level. Social activities should be built into teams, e.g. group coffee;
- A change in mindset – throughout history, the hierarchical model has prevailed. The open model will require a huge shift in thinking and an open mind, and be willing to embrace new thinking.
The structure of the open model is shown below which is contrasted with the conventional model:
I’ve worked in a number of HPC teams, most of which have been amazing to work with. A HPC service is a tool for computational science and should be viewed as a scientific service, comparable to, say, a synchrotron or a particle accelerator. Too often, a HPC service is viewed as purely an IT service, which is incorrect, as such a view completely detaches it from the scientific workflow described above. As a consequence, such HPC teams are dominated by computer scientists who have no idea of the science being conducted on their HPC service. How does one develop a HPC service that is a scientific tool? The answer is the scientific diversity of the HPC team: every team member should come from a different scientific discipline while also being knowledgeable about computer science. For example, one team member may be a chemist, another a physicist, another a biologist, another a mathematician. This diversity will help better serve the needs of the HPC users and can even encourage new science to be conducted. Below is a suggestion for the structure of a HPC service:
There are four components of the HPC service:
- Application Support:
- Involved in application installs, parallelisation and optimisation;
- Researching applications/libraries that can be useful for the scientific community;
- Code modernisation service – researching new and novel HPC architectures;
- Provide a software engineering service – encourage good practice for scientific codes;
- Collaborative model should be based on CoDesign for HPC.
- HPC Vendors: application support liaise with the vendors when exploring new and novel architectures for the codes being run on the HPC service;
- System Administrators:
- Ensuring operational continuity of the service;
- Deal with hardware and system issues;
- Manage the resource management system, e.g. Platform LSF, PBS Pro;
- Authentication, authorisation and accounting;
- Securing the service using secure protocols and firewalls;
- Ensuring energy efficiency of the service;
- KPI: number of academic papers per unit power;
- Manage the parallel IO and hierarchical storage system. This should also provide a data management service.
- User Education and Outreach: this team provides extensive HPC user education programmes that include (but are not limited to) code parallelisation, software engineering practices and data management techniques. HPC service providers should not assume that users already know how to use the service efficiently. A user education programme is essential to the success and efficiency of the service.
As with a lot of teams, a team can sometimes become so insular that it is dismissive and defensive towards external criticism. It is important that the HPC team, like any other kind of team, does not develop such defensive behaviours over time. External criticism, particularly from HPC users, can be a highly valuable form of feedback. How does one protect one’s team from defensive and negative behaviours? Develop an open and collaborative approach to science, particularly when computational science is such a collaborative endeavour. View the HPC team as a collaborative partner with HPC users, not just as a HPC service provider. Also, develop collaborative links with other HPC service providers and ensure HPC team members attend key HPC events and workshops, e.g. Supercomputing. Attending HPC events of interest ensures that HPC team members continue to develop their collaborative mindsets and keep an open mind on better ways of doing things. This way, they will be more willing to take on criticism to advance science and research, which is the primary objective of a HPC service.
When offering a HPC service, it is essential that the users are educated on how to utilise the service properly and efficiently. Providing just a bare metal HPC service is woefully inadequate and will result in less than optimal usage of the resources and reduced scientific productivity. In addition, HPC users, namely scientists and academics, should view HPC education as an integral part of their research and not something that is peripheral to it.
For this, the service will require an education programme which covers the following topics:
- how to write codes in a programming/scripting language;
- using parallel communication and parallel I/O libraries;
- software engineering, including parallel profiling and debugging;
- domain specific libraries, e.g. MKL, PETSc, etc. There are numerous libraries available, most of which are open source;
- using the job scheduler, e.g. LSF, PBS, etc;
- scientific workflow managers, e.g. Pegasus;
- data management, e.g. metadata and data provenance;
- parallel data visualisation, e.g. Paraview, VisIt;
- parallel data analysis, e.g. HDF5 FastQuery, SciDB.
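As a flavour of the parallel communication topic, a minimal MPI program in Fortran might look like the following. This is only a sketch: it requires an MPI library, is typically compiled with an MPI wrapper such as mpif90 and launched with mpirun, and the exact build commands depend on the installation.

```fortran
program hello_mpi
  use mpi            ! the MPI Fortran module
  implicit none
  integer :: ierr, rank, nprocs

  ! Initialise the MPI environment and query this process's identity
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  print '(a,i0,a,i0)', 'Hello from rank ', rank, ' of ', nprocs

  call MPI_Finalize(ierr)
end program hello_mpi
```

Even a toy program like this exercises several of the syllabus items at once: the compiler toolchain, the communication library and the job scheduler used to launch it.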
The last two topics, visualisation and analysis, are an important part of the research workflow, as gaining scientific insight requires a holistic view of the data, which is captured by visualisation and analysis. It could also be stated that producing the data is half the research – the other half is interpreting it correctly, which can be challenging and is made easier by using the right set of parallel tools.
Below are some topics on HPC education that I presented during my time at the University of East Anglia:
- Debugging, Profiling and Optimisation (of sequential codes);
- Scientific Parallel Programming;
- Parallelising Scientific Codes using OpenMP;
- Parallelising Scientific Codes using Graphics Processing Units (GPUs) with OpenACC;
- Parallelising Scientific Codes using MPI.
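As a small taster of the OpenMP material, the sketch below parallelises a reduction (the module and function names are invented for illustration). A nice property of OpenMP directives is that when OpenMP is not enabled, they are treated as comments and the loop simply runs serially, giving the same result.

```fortran
module sum_mod
  implicit none
contains

  ! Sum the integers 1..n. With OpenMP enabled, the directive
  ! distributes the loop across threads and combines the partial
  ! sums via reduction(+:total); otherwise the loop runs serially.
  function parallel_sum(n) result(total)
    integer, intent(in) :: n
    real :: total
    integer :: i
    total = 0.0
    !$omp parallel do reduction(+:total)
    do i = 1, n
       total = total + real(i)
    end do
    !$omp end parallel do
  end function parallel_sum

end module sum_mod
```

Compiled with, for example, gfortran -fopenmp the loop runs in parallel; without the flag it compiles as ordinary serial Fortran, which makes OpenMP a gentle first step into parallel programming for course attendees.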