How to submit a job that uses TensorFlow
This tutorial shows how to submit a job that uses TensorFlow to the batch cluster.
The following steps run a Python script that uses TensorFlow and other Python libraries.
Copy the project folder to the cluster
[mcastro@fedora ~]$ scp -r -J mcastro@fw03 /home/mcastro/my_project/ mcastro@cirrus01:
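You can avoid passing -J on every command by declaring the jump host once in the OpenSSH client configuration. The sketch below assumes OpenSSH 7.3 or later, which introduced ProxyJump. It also assumes that the user and host names match the examples in this tutorial. The function name add_cirrus_host is hypothetical. After it has run, "scp -r /home/mcastro/my_project/ cirrus01:" and "ssh cirrus01" should both go through fw03 without the -J option.

```shell
#!/usr/bin/env bash
# Sketch: append an OpenSSH "Host" entry so that ssh/scp to cirrus01 go
# through the fw03 jump host automatically (hypothetical helper name).
# User and host names are taken from this tutorial; adjust them to your account.
add_cirrus_host() {
    local config="${1:-$HOME/.ssh/config}"
    mkdir -p -m 700 "$(dirname "$config")"
    # Leave the file untouched if an entry for cirrus01 already exists.
    if [ -f "$config" ] && grep -qE '^[[:space:]]*Host[[:space:]]+cirrus01([[:space:]]|$)' "$config"; then
        return 0
    fi
    cat >> "$config" <<'EOF'

Host cirrus01
    User mcastro
    ProxyJump mcastro@fw03
EOF
    # ssh refuses config files that are writable by other users.
    chmod 600 "$config"
}
```

Run add_cirrus_host with no argument to update ~/.ssh/config. Running it a second time does not add a duplicate entry.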
Access the cluster
[mcastro@fedora ~]$ ssh -J mcastro@fw03 mcastro@cirrus01
Clone the reference repository
[mcastro@cirrus01]$ git clone https://gitlab.com/lip-computing/computing/tf_run_job.git
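Running git clone a second time fails, because the tf_run_job folder already exists. The helper below clones the repository on the first run and fast-forwards it to the latest version on later runs. It is a sketch, and get_tf_run_job is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: clone the reference repository if it is missing, otherwise update it
# (hypothetical helper name; the repository URL is the one from this tutorial).
get_tf_run_job() {
    local repo="https://gitlab.com/lip-computing/computing/tf_run_job.git"
    local dest="${1:-$HOME/tf_run_job}"
    if [ -d "$dest/.git" ]; then
        # Already cloned: pull new commits, refusing to create a merge commit.
        git -C "$dest" pull --ff-only
    else
        git clone "$repo" "$dest"
    fi
}
```

Called with no argument, it uses ~/tf_run_job, the location expected by the sbatch command in the next step.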
Submit the job from inside the project folder, passing the Python script with --input and the data files with --file. In this example, the datasets are in the my_datasets subfolder.
[mcastro@cirrus01]$ cd my_project
[mcastro@cirrus01 my_project]$ sbatch ~/tf_run_job/run_job --input my_python_script.py --file my_datasets/dataset1.csv my_datasets/dataset2.csv
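When a dataset folder holds many files, listing each one after --file becomes tedious. The sketch below passes every .csv file in a folder to run_job and prints the Slurm job ID. The helper name submit_tf_job is hypothetical. As the command above suggests, it assumes that run_job accepts several paths after --file. The --parsable option makes sbatch print only the job ID, followed by ";cluster" on multi-cluster setups.

```shell
#!/usr/bin/env bash
# Sketch: submit run_job with a Python script and every .csv file in a dataset
# folder, then print the Slurm job ID (hypothetical helper name).
# Assumes run_job accepts several paths after --file.
submit_tf_job() {
    local script="$1" data_dir="$2"
    local restore_glob out
    local -a files
    # Make *.csv expand to nothing (not to the literal pattern) if nothing matches,
    # then restore the caller's nullglob setting.
    restore_glob=$(shopt -p nullglob) || true
    shopt -s nullglob
    files=("$data_dir"/*.csv)
    eval "$restore_glob"
    if [ "${#files[@]}" -eq 0 ]; then
        echo "submit_tf_job: no .csv files in $data_dir" >&2
        return 1
    fi
    # --parsable: sbatch prints "jobid" or "jobid;cluster" instead of a sentence.
    out=$(sbatch --parsable "$HOME/tf_run_job/run_job" \
        --input "$script" --file "${files[@]}") || return 1
    echo "${out%%;*}"
}
```

For example, jobid=$(submit_tf_job my_python_script.py my_datasets) submits the same job as the command above when my_datasets contains only the two datasets. You can then run squeue -j "$jobid" to check whether the job is still pending or running.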
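Once the job has completed, the console log with the program messages is written to the file slurm-<JOB_ID>.out in the directory the job was submitted from. The results are written to a folder in the user's home directory, whose path is printed on the last line of the log.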
[mcastro@cirrus01 my_project]$ cat slurm-124811.out
* ----------------------------------------------------------------
* Running PROLOG for run_job on Tue Nov 17 17:22:01 WET 2020
* PARTITION : gpu
* JOB_NAME : run_job
* JOB_ID : 124811
* USER : mcastro
* NODE_LIST : hpc050
* SLURM_NNODES : 1
* SLURM_NPROCS :
* SLURM_NTASKS :
* SLURM_JOB_CPUS_PER_NODE : 1
* WORK_DIR : /users/hpc/mcastro/my_project
* ----------------------------------------------------------------
Info: deleting container: 61fb9513-b33d-3b7f-85ed-25db26202b61
7f5d9200-712f-3134-a470-defdffb21e81
Warning: non-existing user will be created
##############################################################################
# #
# STARTING 7f5d9200-712f-3134-a470-defdffb21e81 #
# #
##############################################################################
executing: bash
Results available on workdir: /home/hpc/mcastro/Job.ZlV3RW
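The results folder name (Job.ZlV3RW in the example) changes from job to job, so it is easier to read it from the log than to copy it by hand. The sketch below assumes that the log keeps the "Results available on workdir:" line format shown above. The function name results_dir is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the results folder reported by run_job in a Slurm log file,
# based on the "Results available on workdir: <path>" line (hypothetical helper).
results_dir() {
    local log="$1"
    # Keep the path from the last matching line; exit non-zero if there is none.
    awk '/^Results available on workdir:/ { dir = $NF }
         END { if (dir == "") exit 1; print dir }' "$log"
}
```

For example, cd "$(results_dir slurm-124811.out)" && ls lists the output files of the job above.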
For additional support with this procedure, or to use different requirements for the provided TensorFlow Docker image, contact helpdesk@incd.pt.