The aim of this proposal is to enable the MOLUSC model, implemented in Simile, to be run on the ECDF (Edinburgh Compute and Data Facility) cluster. Running many instances of the model in parallel will support various analyses of the model, including sensitivity analysis, parameter estimation, and the investigation of conditional probabilistic futures.
The attached Appendix considers the technical issues and options involved in porting the MOLUSC model to the ECDF.
Proposed work
- Export the shared object;
- Create a program ("molusc5d.cpp", a modified version of grabavalue.cpp) to run the model;
- Enable data to be imported from the spf into the model;
- Provide a sample bash shell script to control the running of the model instances for a sample analysis;
- Provide documentation explaining how to:
  - load subsequent versions of MOLUSC onto the cluster;
  - adapt the bash shell script for other analyses.
Time: 10 days
Cost: 10 days @ £500 per day = £5000
Completion date: 30th April 2008 or before (to be confirmed)
Appendix: Technical issues and options
Getting MOLUSC running on the cluster using the 5D interface involves two stages:
- Creating a standalone C++ executable that loads and runs molusc.so (created in Simile), then
- Copying this executable to EDDIE and setting up the runs with different initial parameters.
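As an illustration of what loading molusc.so from a standalone program could look like, the sketch below wraps the POSIX dlopen/dlsym calls in a small C++ class. It is not the 5D interface itself: the class name SharedObject is hypothetical, any symbol name passed to it is an assumption, and grabavalue.cpp may well load the model in a different way.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>

// Hypothetical helper: owns a handle to a shared object opened at run time.
// The real 5D interface may expose the model differently.
class SharedObject {
public:
    explicit SharedObject(const std::string& path)
        : handle_(dlopen(path.c_str(), RTLD_NOW)) {
        if (handle_ == nullptr) {
            // dlerror() describes the most recent dlopen/dlsym failure.
            const char* message = dlerror();
            error_ = message ? message : "unknown dlopen error";
        }
    }

    ~SharedObject() {
        if (handle_ != nullptr) dlclose(handle_);
    }

    // The handle has a single owner, so copying is disabled.
    SharedObject(const SharedObject&) = delete;
    SharedObject& operator=(const SharedObject&) = delete;

    bool loaded() const { return handle_ != nullptr; }
    const std::string& error() const { return error_; }

    // Look up an exported symbol by name.
    // Returns nullptr if the library is not loaded or the symbol is missing.
    void* symbol(const std::string& name) const {
        assert(!name.empty());
        return handle_ ? dlsym(handle_, name.c_str()) : nullptr;
    }

private:
    void* handle_;
    std::string error_;
};
```

A caller would construct SharedObject("./molusc.so"), check loaded(), and cast the result of symbol(...) to the function pointer type that the model actually exports. On older Linux systems the program may need to be linked with -ldl.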
Stage 1: Create an executable that runs the model
- Export the shared object (trivial), and possibly export the Prolog
- Create a modified version of grabavalue.cpp called molusc5d.cpp to run the model (easy/hard, see below)
- Import the data from the spf into the model (difficult)
By this point we have a runnable executable that can be copied to the cluster and used.
In the 1st step we may need the Prolog to generate an interface to the model (see next paragraph).
The difficulty of the 2nd step depends on how general the solution is. If there is a set of output variables that will always be displayed, then they can be hard coded into the modified version of grabavalue. Otherwise, we need to parse the model Prolog to generate some sort of interface to the model. Whilst there is code to do this as part of SimileWeb, it is not fully tested and may take some time. For this project, we will take the first option, i.e. develop a version of grabavalue that is specific to the MOLUSC model.
The 3rd step involves parsing the spf, extracting any data that may be in it and using it to find out what data to extract from the csv files, then parsing these files. Each part of the process is in place, but it needs testing.
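The csv-parsing part of the 3rd step could look roughly like the following sketch. It assumes a simple layout that the proposal does not specify: one header line followed by rows of plain numbers, with no quoted fields. The names splitCsvLine, CsvTable and readCsv are hypothetical, and the spf parsing itself is not shown.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split one line of comma-separated text into fields.
// Assumes no quoted fields and no commas inside a field.
std::vector<std::string> splitCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream in(line);
    while (std::getline(in, field, ',')) {
        fields.push_back(field);
    }
    // std::getline drops an empty field after a trailing comma; restore it.
    if (!line.empty() && line.back() == ',') fields.push_back("");
    return fields;
}

// A numeric table: column names plus rows of values.
struct CsvTable {
    std::vector<std::string> header;
    std::vector<std::vector<double>> rows;
};

// Read a header line followed by numeric rows.
// std::stod throws std::invalid_argument on a non-numeric or empty field.
CsvTable readCsv(std::istream& in) {
    CsvTable table;
    std::string line;
    bool firstLine = true;
    while (std::getline(in, line)) {
        // Tolerate Windows line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;
        if (firstLine) {
            table.header = splitCsvLine(line);
            firstLine = false;
            continue;
        }
        std::vector<double> row;
        for (const std::string& f : splitCsvLine(line)) {
            row.push_back(std::stod(f));
        }
        assert(!row.empty());  // a non-empty line always yields a field
        table.rows.push_back(row);
    }
    return table;
}
```

Passing a std::ifstream opened on one of the csv files to readCsv returns the column names and values in memory, ready to be handed to the model.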
Stage 2: Get this executable running on the cluster with different initial parameters
- Copy the executable built from molusc5d.cpp (together with molusc.so) across to EDDIE
- Decide how to run the model with different parameters using the Sun gridengine software (different spfs/data files for each job? normal distribution for some params? random params? searching over an array of values?)
- Decide how to generate/store/access data produced.
- Run!
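Two of the options listed above, searching over an array of values and drawing parameters from a normal distribution, can be sketched as follows. The sketch assumes the runs are submitted as a Grid Engine array job, in which case each task sees its 1-based task number in the SGE_TASK_ID environment variable. All function names are hypothetical, and how the chosen values are passed to the model is left open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <random>
#include <vector>

// Convert a zero-based run index into one position per parameter axis
// (mixed-radix decoding; the first axis varies fastest).
std::vector<std::size_t> decodeRunIndex(std::size_t run,
                                        const std::vector<std::size_t>& axisSizes) {
    std::vector<std::size_t> choice;
    for (std::size_t size : axisSizes) {
        assert(size > 0);
        choice.push_back(run % size);
        run /= size;
    }
    return choice;
}

// Pick the parameter values for one run from a grid of candidate values.
std::vector<double> gridPoint(std::size_t run,
                              const std::vector<std::vector<double>>& axes) {
    std::vector<std::size_t> sizes;
    for (const auto& axis : axes) sizes.push_back(axis.size());
    const std::vector<std::size_t> choice = decodeRunIndex(run, sizes);
    std::vector<double> values;
    for (std::size_t i = 0; i < axes.size(); ++i) {
        values.push_back(axes[i][choice[i]]);
    }
    return values;
}

// Draw one normally distributed value, seeded by the run index so that
// rerunning the same task reproduces the same parameter.
double sampleNormal(std::size_t run, double mean, double standardDeviation) {
    std::mt19937 generator(static_cast<std::mt19937::result_type>(run));
    std::normal_distribution<double> distribution(mean, standardDeviation);
    return distribution(generator);
}

// Read the 1-based array task number and return it as a zero-based index.
// Falls back to 0 when the variable is unset or not a positive number.
std::size_t runIndexFromEnvironment() {
    const char* id = std::getenv("SGE_TASK_ID");
    if (id == nullptr) return 0;
    const long n = std::strtol(id, nullptr, 10);
    return n > 0 ? static_cast<std::size_t>(n - 1) : 0;
}
```

With a grid of 3 values for one parameter and 2 for another, an array job of 6 tasks covers every combination once: each task calls gridPoint(runIndexFromEnvironment(), axes) and runs the model with the result.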
A lot of the decisions about how best to set up the different runs will be easier to make once there is something to experiment with, so it probably makes sense to do stage 1 before starting on stage 2.
Terminology
- 5D interface: C++ API for running a model shared object generated by Simile and getting values from it
- spf: Simile parameter file, used to load data into models
- csv: Comma separated values
- grabavalue.cpp: program that uses 5d interface to run a simple model
- EDDIE: ECDF cluster computer
- gridengine: open source Sun software running on EDDIE