overview of the resources offered

sinfo : overview of the resources offered by the cluster

By default, sinfo lists each partition's name, availability, time limit, number of nodes, node state and node list. A partition is a set of compute nodes. An asterisk after a partition name (all*) marks the default partition.

Running sinfo without options gives:

$ sinfo

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
all*         up   infinite      5  down* wn[075,096,105,110,146]
all*         up   infinite      6  drain wn[077,091,101,117,143,148]
all*         up   infinite      2    mix wn[079,097]
all*         up   infinite     33  alloc wn[081-089,092-095,099-100,104,108,112,115,118,124,135-139,144-145,151,155-158]
all*         up   infinite     40   idle wn[071-073,076,080,090,098,102-103,106-107,109,111,113-114,116,120-123,125-128,130-134,140-142,147,149-150,152-154,159-160]
all*         up   infinite      4   down wn[074,078,119,129]
debug        up   infinite      8   idle wn[060-063,065-067,069]
debug        up   infinite      3   down wn[064,068,070]

The command sinfo --Node lists each node individually with its current state.

$ sinfo --Node

NODELIST   NODES PARTITION STATE
wn071          1      all* alloc
wn072          1      all* drain
wn073          1      all* alloc
wn074          1      all* down
wn075          1      all* down*
wn076          1      all* alloc

The command sinfo --summarize reports the node counts of each partition in the form "allocated/idle/other/total" (A/I/O/T).

$ sinfo --summarize

PARTITION AVAIL  TIMELIMIT   NODES(A/I/O/T)  NODELIST
all*         up   infinite       36/7/47/90  wn[071-160]
debug        up   infinite         2/6/3/11  wn[060-070]
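The A/I/O/T column is easy to split in a script. The following is a minimal sketch, not part of Slurm: the function name idle_per_partition is hypothetical, and it assumes the column layout shown above (the counts in the fourth field). It reads the output of sinfo --summarize on standard input and prints the partition name, the idle node count and the total node count.

```shell
# Hypothetical helper: reads `sinfo --summarize` output on stdin and prints
# "<partition> <idle> <total>" for every partition line.
# Assumes the default column layout, where field 4 holds A/I/O/T.
idle_per_partition() {
    awk '$1 != "PARTITION" && NF >= 5 {
        split($4, c, "/")   # c[1]=allocated c[2]=idle c[3]=other c[4]=total
        print $1, c[2], c[4]
    }'
}

# Usage on a machine with Slurm installed:
#   sinfo --summarize | idle_per_partition
```

With the sample output above, this would print "all* 7 90" and "debug 6 11".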


The command sinfo --long shows more detail than plain sinfo, including the OverSubscribe setting (OVERSUBS). All the queues are defined with OVERSUBS=NO: no partition (queue) accepts requests beyond the limit of its consumable resources, so allocated resources are not shared between jobs.

$ sinfo --long

PARTITION AVAIL  TIMELIMIT   JOB_SIZE ROOT OVERSUBS     GROUPS  NODES       STATE NODELIST
all*         up   infinite 1-infinite   no       NO        all      5       down* wn[075,096,105,110,146]
all*         up   infinite 1-infinite   no       NO        all     38     drained wn[072-073,076-077,080,090-091,098,101-103,106-107,109,113-114,116-117,120-123,125-128,130,133-134,136,140-141,143,147-148,150,152,159]
all*         up   infinite 1-infinite   no       NO        all      4       mixed wn[079,094,097,137]
all*         up   infinite 1-infinite   no       NO        all     32   allocated wn[071,081-089,092-093,095,099-100,104,108,112,115,118,124,131-132,135,138-139,144,151,155-158]
all*         up   infinite 1-infinite   no       NO        all      7        idle wn[111,142,145,149,153-154,160]


sinfo can also filter nodes and partitions by state. This example lists only the nodes that are idle or down:

$ sinfo --states=idle,down

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
all*         up   infinite      5  down* wn[075,096,105,110,146]
all*         up   infinite      8   idle wn[113,116,121-122,126,140-141,143]
all*         up   infinite      4   down wn[074,078,119,129]
debug        up   infinite      7   idle wn[060-063,065-067]
debug        up   infinite      3   down wn[064,068,070]
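The state filter can be combined with other sinfo options. The sketch below wraps one such combination in a shell function; the name idle_nodes_in is hypothetical, while --partition, --states, --noheader and --format (with %N for the node list) are options described in man sinfo. It is meant as an illustration of combining filters, not as a site-recommended tool.

```shell
# Hypothetical helper: print the idle nodes of a single partition.
#   --partition  restricts the report to the named partition
#   --states     keeps only nodes in the given state(s)
#   --noheader   suppresses the header line
#   --format=%N  prints only the node list
idle_nodes_in() {
    partition=$1
    sinfo --partition="$partition" --states=idle --noheader --format="%N"
}

# Usage on a machine with Slurm installed:
#   idle_nodes_in debug
```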


For more details, see the manual page: man sinfo

Node states (compact form; sinfo --long prints the full names, such as mixed, drained and allocated):

  • mix : consumable resources partially allocated
  • idle : available for requests of consumable resources
  • drain : unavailable for use per system administrator request
  • drng : currently executing a job, but will not be allocated to additional jobs. The node changes to state DRAINED when the last job on it completes
  • alloc : consumable resources fully allocated
  • down : unavailable for use. Slurm can automatically place nodes in this state if some failure occurs
  • a trailing * (as in down*) : the node is currently not responding
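
When scripting around sinfo output, the compact codes above can be turned back into readable text. The following is a small sketch under that assumption; describe_state is a hypothetical helper, not a Slurm command, and it covers only the codes listed on this page. A trailing "*", which sinfo appends to the state of a node that is not responding, is stripped before the lookup.

```shell
# Hypothetical helper: map a compact sinfo state code to a short description.
# Only the states listed on this page are covered.
describe_state() {
    # Strip a trailing "*" (sinfo's marker for a non-responding node).
    _state=${1%\*}
    case "$_state" in
        mix)   echo "consumable resources partially allocated" ;;
        idle)  echo "available for requests" ;;
        drain) echo "unavailable per system administrator request" ;;
        drng)  echo "draining: running a job, accepts no new jobs" ;;
        alloc) echo "consumable resources fully allocated" ;;
        down)  echo "unavailable for use" ;;
        *)     echo "unknown state: $1" ;;
    esac
}

# Usage:
#   describe_state 'down*'
```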