How to find available resources

Global parameters

  • HPC users are allowed to submit to the hpcgrid queue
  • Job duration is limited to 96h (total elapsed time). INCD allows longer jobs, but users must request special permission first. Please send an email to: helpdesk at incd pt
  • The maximum number of cores allowed is 64

Global resources

  • Users can query the batch system to find out the total available resources:
qstat -g c -q hpcgrid
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
hpcgrid                           0.55    160      0     48    264      0     56 
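  • CQLOAD: average normalized load of the hosts in the queue (0.55 means about 55% loaded)
  • USED: number of cores in use
  • RES: total number of cores reserved
  • AVAIL: total number of free cores (TOTAL - USED - aoACDS - cdsuE - RES)
  • TOTAL: total number of cores in the queue (it can include compute nodes that are temporarily unavailable)
  • aoACDS, cdsuE: number of cores that are unavailable because their queue instance is in one of those states (for example alarm, disabled, suspended, unknown or error)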
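
The AVAIL formula can be checked against the other columns. The following is a minimal sketch, assuming the column order shown in the listing above; avail_cores is a hypothetical helper name, not part of the batch system.

```shell
# Recompute AVAIL from the other columns of `qstat -g c` output read on stdin.
# Assumed column order (as in the listing above):
#   CLUSTER QUEUE, CQLOAD, USED, RES, AVAIL, TOTAL, aoACDS, cdsuE
avail_cores() {
    # $1 = cluster queue name, e.g. hpcgrid
    # Fields: $3 = USED, $4 = RES, $6 = TOTAL, $7 = aoACDS, $8 = cdsuE
    awk -v q="$1" '$1 == q { print $6 - $3 - $7 - $8 - $4 }'
}

# Usage on a login node:
#   qstat -g c -q hpcgrid | avail_cores hpcgrid
```

With the sample line above this gives 264 - 160 - 0 - 56 - 0 = 48, which matches the AVAIL column.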

Special resources

  • Users can also query the batch system for other available resources (memory, cores, etc.)
  • For example, to list the compute nodes with at least 300G of total memory:
 qhost -l mem_total=300g 
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
hpc044                  lx-amd64       64    2   64   64 60.40  378.6G   20.3G  128.0G   34.6M
hpc045                  lx-amd64       64    2   64   64 64.34  378.6G   20.3G  447.1G   44.3M
hpc046                  lx-amd64      128    2   64  128  0.03  377.6G    6.4G  128.0G     0.0
hpc047                  lx-amd64      128    2   64  128  0.04  377.6G    5.6G  128.0G     0.0
hpc048                  lx-amd64      128    2  128  128 77.03  378.6G    6.5G  128.0G     0.0
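
A listing like this can be filtered further, for example to pick out the lightly loaded nodes. The sketch below assumes that LOAD is the 7th column, as in the output above (other Grid Engine versions may print fewer columns); idle_hosts and its threshold argument are hypothetical names used only for illustration.

```shell
# Print the hostnames whose LOAD is below a threshold, reading `qhost`
# output on stdin. Assumes LOAD is the 7th column, as in the listing above.
idle_hosts() {
    # $1 = load threshold, e.g. 1.0
    # NR > 2 skips the header and the separator line; the regex skips
    # rows where LOAD is not a number (such as "-").
    awk -v max="$1" 'NR > 2 && $7 ~ /^[0-9.]+$/ && $7 + 0 < max + 0 { print $1 }'
}

# Usage on a login node:
#   qhost -l mem_total=300g | idle_hosts 1.0
```

For the sample output above and a threshold of 1.0, only hpc046 and hpc047 are selected.
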
  • List of important parameters:

    • cpu: total number of CPUs
    • m_core: total number of cores available for job submission (m_core and cpu may differ)
    • mem_total: total memory available for jobs
    • virtual_total: total memory + swap
    • slots: total available resources (1 slot = 1 CPU)
  • Full list of parameters for each compute node (long output):

qhost -F
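
Because the full listing is long, it can help to extract a single parameter per node. The sketch below assumes that host lines start in the first column and that resource lines look like "hl:mem_total=378.6G" (sometimes preceded by "Host Resource(s):"); the exact layout may differ between Grid Engine versions, and host_resource is a hypothetical helper name.

```shell
# Print "host value" pairs for one resource from `qhost -F` output on stdin.
# Assumed layout: host lines are unindented; resource lines contain
# entries such as "hl:mem_total=378.6G".
host_resource() {
    # $1 = resource name, e.g. mem_total
    awk -v res="$1" '
        # Unindented line (other than a "Host Resource(s):" line): new host.
        /^[^ \t]/ && $1 != "Host" { host = $1; next }
        {
            for (i = 1; i <= NF; i++) {
                n = index($i, "=")
                if (n == 0) continue
                key = substr($i, 1, n - 1)
                sub(/^[a-z][a-zA-Z]:/, "", key)   # drop a prefix such as "hl:"
                if (key == res) print host, substr($i, n + 1)
            }
        }'
}

# Usage on a login node:
#   qhost -F | host_resource mem_total
```

The qhost -F option also accepts a comma-separated list of resource names (for example qhost -F mem_total,slots), which shortens the output before any filtering.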