How to interpret the job status

  • A job can be in one of several states. To check the status of your jobs:

qstat

  • The main states are:
Status       Meaning
qw           job is in the queue, waiting for available resources
hqw          job is in the queue, waiting, and on system hold
r            job is running
t            job is running and transferring
Eqw, Ehqw    job is pending in an error state
dr           job is running or suspended and marked for deletion
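
The table above can be turned into a small lookup for use in scripts. The following is a minimal sketch, not an INCD-provided tool: the function names describe_state and count_states are hypothetical, and count_states assumes the default qstat listing layout (two header lines, state code in the fifth column), which may differ on your installation.

```shell
#!/bin/sh
# describe_state CODE
# Hypothetical helper: print the meaning of a job state code,
# following the status table in this page.
describe_state() {
    case "$1" in
        qw)                 echo "in queue, waiting for available resources" ;;
        hqw)                echo "in queue, waiting, on system hold" ;;
        r|R)                echo "running" ;;
        t)                  echo "running and transferring" ;;
        Eqw|Ehqw|eqw|ehqw)  echo "pending in error state" ;;
        dr)                 echo "running or suspended, marked for deletion" ;;
        *)                  echo "unknown state: $1" ;;
    esac
}

# count_states
# Hypothetical helper: read a qstat listing on stdin and print
# "STATE COUNT" pairs, one per line, sorted by state code.
# Assumption: the first two lines are headers and the state code
# is the fifth whitespace-separated column.
count_states() {
    awk 'NR > 2 && NF >= 5 { n[$5]++ }
         END { for (s in n) print s, n[s] }' | sort
}
```

Possible usage on a submit host: `qstat | count_states` for a per-state summary, or `describe_state hqw` to look up a single code.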

Full details

  • If you want to know the full details of your job:
qstat -j jobID

or, to page through long output:

qstat -j jobID | less 

Example

qstat -j 190407
  • output:
job_number:                 190407    (jobID number)
exec_file:                  job_scripts/190407  
submission_time:            Thu May  9 13:01:35 2019  (submission jobTIME)
owner:                      biomed015 (username)
uid:                        3060015  (userID)
group:                      biomed (userGROUP)
gid:                        3060000  (usergroupID)
sge_o_home:                 /home/biomed/biomed015  (home of the username)
sge_o_log_name:             biomed015  (SGE internal parameters)
sge_o_path:                 /opt/sge/bin/lx-amd64:/sbin:/bin:/usr/sbin:/usr/bin (SGE internal parameters)
sge_o_shell:                /sbin/nologin   (SGE internal parameters)
sge_o_workdir:              /var/tmp  (SGE internal parameters)
sge_o_host:                 ce06 (name of the submitting host)
account:                    sge  (SGE internal parameters)
mail_list:                  biomed015@ce06.ncg.ingrid.pt   (SGE internal parameters)
notify:                     FALSE (email job notifier ON / OFF) -> at INCD this feature is not available
job_name:                   cream_940554311 (name of job: job submission script) 
jobshare:                   0 (jobSHARE: Equal to everyone)
hard_queue_list:            hpc (queue name)
shell_list:                 NONE:/bin/bash 
env_list:                   ..... (very long list). ....
script_file:                /tmp/cream_940554311 (SGE internal parameters)
project:                    BiomedGrid 
binding:                    NONE 
job_type:                   NONE 
scheduling info:            queue instance "csyslip@wn216.ncg.ingrid.pt" dropped because it is temporarily not available 

Explanation

job_number:                 jobID number
exec_file:                  scripts needed to run the job; created by SGE, no action required from the user
submission_time:            time at which the job was submitted
owner:                      username
uid:                        user ID
group:                      user group
gid:                        user group ID
sge_o_home:                 home directory of the user
sge_o_log_name:             SGE internal parameters
sge_o_path:                 SGE internal parameters
sge_o_shell:                SGE internal parameters
sge_o_workdir:              SGE internal parameters
sge_o_host:                 hostname from which the job was submitted
account:                    SGE internal parameters
mail_list:                  SGE internal parameters
notify:                     email notification when the job finishes (ON / OFF) -> at INCD this feature is not available
job_name:                   name of the job (the job submission script)
jobshare:                   job share: equal for everyone
hard_queue_list:            name of the queue to which the job was submitted
shell_list:                 shell used (taken from the user account information)
env_list:                   full list of environment variables
script_file:                path of the job script (SGE internal parameter)
project:                    name of the project to which the user is assigned (SGE internal parameter)
binding:                    not used at INCD 
job_type:                   not used at INCD
scheduling info:            long list of resources available to SGE. Note that SGE first looks at ALL the resources available to it and only later checks whether the user is entitled to run there.
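
When only one of these fields is needed, the "key: value" layout of the qstat -j output can be filtered rather than read in full. The following is a minimal sketch under that assumption: the function name qstat_field is hypothetical, it reads the output on stdin, and it returns only the first line of a field (a multi-line value such as scheduling info would be truncated).

```shell
#!/bin/sh
# qstat_field KEY
# Hypothetical helper: read "key: value" lines (as printed by
# `qstat -j jobID`) on stdin and print the value of the first
# line whose key equals KEY. Lines without a colon are skipped.
qstat_field() {
    awk -v key="$1" '
        {
            pos = index($0, ":")          # first colon ends the key
            if (pos == 0) next
            name = substr($0, 1, pos - 1)
            if (name == key) {
                value = substr($0, pos + 1)
                sub(/^[ \t]+/, "", value) # trim leading blanks
                sub(/[ \t]+$/, "", value) # trim trailing blanks
                print value
                exit
            }
        }'
}
```

Possible usage: `qstat -j 190407 | qstat_field hard_queue_list` would print the queue name of that job, and `qstat -j 190407 | qstat_field owner` its username.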