overview of the resources offered

`sinfo` : overview of the resources offered by the cluster

By default, `sinfo` lists each partition's name, availability, time limit, number of nodes, node state, and node list. A partition is a set of compute nodes.

The default output of `sinfo`:

$ sinfo

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
all*         up   infinite      5  down* wn[075,096,105,110,146]
all*         up   infinite      6  drain wn[077,091,101,117,143,148]
all*         up   infinite      2    mix wn[079,097]
all*         up   infinite     33  alloc wn[081-089,092-095,099-100,104,108,112,115,118,124,135-139,144-145,151,155-158]
all*         up   infinite     40   idle wn[071-073,076,080,090,098,102-103,106-107,109,111,113-114,116,120-123,125-128,130-134,140-142,147,149-150,152-154,159-160]
all*         up   infinite      4   down wn[074,078,119,129]
debug        up   infinite      8   idle wn[060-063,065-067,069]
debug        up   infinite      3   down wn[064,068,070]

The command `sinfo --Node` lists each node individually, together with its current state.

$ sinfo --Node

NODELIST   NODES PARTITION STATE
wn071          1      all* alloc
wn072          1      all* drain
wn073          1      all* alloc
wn074          1      all* down
wn075          1      all* down*
wn076          1      all* alloc
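The node-oriented listing is easy to post-process. The following is a minimal sketch, not part of the cluster tooling: `nodes_in_state` is a hypothetical helper name, and it assumes the four-column layout shown above (NODELIST, NODES, PARTITION, STATE). It reads the listing on standard input so it can be tried without access to the cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the names of the nodes whose STATE column
# matches the given state exactly (so "down" does not match "down*").
nodes_in_state() {
    # $1: state as printed by sinfo, e.g. alloc, idle, down
    awk -v want="$1" 'NR > 1 && $4 == want { print $1 }'
}

# Sample copied from the listing above.
# On the cluster, this would be: sinfo --Node | nodes_in_state alloc
nodes_in_state alloc <<'EOF'
NODELIST   NODES PARTITION STATE
wn071          1      all* alloc
wn072          1      all* drain
wn073          1      all* alloc
wn074          1      all* down
wn075          1      all* down*
wn076          1      all* alloc
EOF
```

With the sample input, this prints wn071, wn073 and wn076, one per line.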

The command `sinfo --summarize` reports node counts per partition in the form "allocated/idle/other/total" (A/I/O/T).

$ sinfo --summarize

PARTITION AVAIL  TIMELIMIT   NODES(A/I/O/T)  NODELIST
all*         up   infinite       36/7/47/90  wn[071-160]
debug        up   infinite         2/6/3/11  wn[060-070]
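The combined A/I/O/T field can be split into separate counts when a script needs them. This is a rough sketch under the assumption that the summary keeps the five-column layout shown above; `summarize_counts` is a hypothetical name, and the function reads the summary on standard input.

```shell
#!/bin/sh
# Hypothetical helper: split the NODES(A/I/O/T) column into named counts.
summarize_counts() {
    awk 'NR > 1 && NF >= 5 {
        split($4, n, "/")   # $4 holds allocated/idle/other/total
        printf "%s allocated=%s idle=%s other=%s total=%s\n", $1, n[1], n[2], n[3], n[4]
    }'
}

# Sample copied from the output above.
# On the cluster, this would be: sinfo --summarize | summarize_counts
summarize_counts <<'EOF'
PARTITION AVAIL  TIMELIMIT   NODES(A/I/O/T)  NODELIST
all*         up   infinite       36/7/47/90  wn[071-160]
debug        up   infinite         2/6/3/11  wn[060-070]
EOF
```

For the sample, the output is `all* allocated=36 idle=7 other=47 total=90` followed by `debug allocated=2 idle=6 other=3 total=11`.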

The command `sinfo --long` shows more columns than plain `sinfo`, including the OverSubscribe setting (OVERSUBS). All queues are defined with OVERSUBS=NO, so no partition (queue) allows requests beyond the limit of its consumable resources.

$ sinfo --long

PARTITION AVAIL  TIMELIMIT   JOB_SIZE ROOT OVERSUBS     GROUPS  NODES       STATE NODELIST
all*         up   infinite 1-infinite   no       NO        all      5       down* wn[075,096,105,110,146]
all*         up   infinite 1-infinite   no       NO        all     38     drained wn[072-073,076-077,080,090-091,098,101-103,106-107,109,113-114,116-117,120-123,125-128,130,133-134,136,140-141,143,147-148,150,152,159]
all*         up   infinite 1-infinite   no       NO        all      4       mixed wn[079,094,097,137]
all*         up   infinite 1-infinite   no       NO        all     32   allocated wn[071,081-089,092-093,095,099-100,104,108,112,115,118,124,131-132,135,138-139,144,151,155-158]
all*         up   infinite 1-infinite   no       NO        all      7        idle wn[111,142,145,149,153-154,160]

`sinfo` can also filter nodes and partitions by state. This example lists only the nodes that are idle or down:

$ sinfo --states=idle,down

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
all*         up   infinite      5  down* wn[075,096,105,110,146]
all*         up   infinite      8   idle wn[113,116,121-122,126,140-141,143]
all*         up   infinite      4   down wn[074,078,119,129]
debug        up   infinite      7   idle wn[060-063,065-067]
debug        up   infinite      3   down wn[064,068,070]
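To get one total per state from a filtered listing like the one above, the NODES column can be summed. The sketch below is only an illustration: `count_by_state` is a hypothetical name, it assumes the default six-column layout, and it folds `down*` into `down` by dropping the trailing `*` (which, according to the sinfo manual, marks nodes that are not responding). A node that belongs to several partitions would be counted once per partition.

```shell
#!/bin/sh
# Hypothetical helper: sum the NODES column per state across all partitions.
count_by_state() {
    awk 'NR > 1 && NF >= 6 {
        sub(/\*$/, "", $5)      # treat "down*" as "down"
        total[$5] += $4
    }
    END { for (s in total) print s, total[s] }' | sort
}

# Sample copied from the output above.
# On the cluster, this would be: sinfo --states=idle,down | count_by_state
count_by_state <<'EOF'
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
all*         up   infinite      5  down* wn[075,096,105,110,146]
all*         up   infinite      8   idle wn[113,116,121-122,126,140-141,143]
all*         up   infinite      4   down wn[074,078,119,129]
debug        up   infinite      7   idle wn[060-063,065-067]
debug        up   infinite      3   down wn[064,068,070]
EOF
```

For the sample, this prints `down 12` and `idle 15`.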

For more detailed information, see the manual page: `man sinfo`

Node states:

mix : consumable resources partially allocated
idle : available for requests of consumable resources
drain : unavailable for use per system administrator request
drng : currently executing a job, but will not be allocated to additional jobs. The node changes to state DRAINED when the last job on it completes.
alloc : consumable resources fully allocated
down : unavailable for use. Slurm can automatically place nodes in this state if a failure occurs.
