Testing condor_ssh_to_job

Author: Fabio Andrijauskas - University of California San Diego
Date: 12/23/2024

HTCondor allows users to establish an interactive session with any running job they own. This is equivalent to opening an SSH session that connects to the job, in which the user can inspect the job logs, intermediate results, and other files. The command involved is condor_ssh_to_job. The objective of this activity was to document how reliable this functionality is on the OSPool.

Executive summary

Our tests show that only 30 of the 56 tested sites available through the OSPool provided full condor_ssh_to_job functionality. Another 20 provided partial functionality (scripted commands worked, interactive sessions did not), and 6 sites provided no condor_ssh_to_job functionality at all. We did not see any difference between vanilla and container-based jobs.

The error reported in the vast majority of failed attempts was “Can't setns to user namespace: Invalid argument”. This error appears to be fatal for fully interactive access but is treated only as a warning for scripted access.

Recommendations

We recommend that the HTCondor team handle the reported namespace-related failure mode gracefully, since the failing namespace operation does not seem to be essential for the functionality of condor_ssh_to_job.

Further details

The tests were run from ap21.uc.osg-htc.org. The list of available OSPool sites was obtained with condor_status -pool factory-1.osg-htc.org -any -const 'MyType=="glide factory"' -af GLIDEIN_Site. Each site was tested 10 times with a single script that ran both condor_ssh_to_job jobid 'hostname' (command only) and condor_ssh_to_job jobid (interactive session). The methodology checked whether it was possible to log into the job, list files, and change to any directory, using both vanilla and container jobs. We saw no difference between vanilla and container jobs; interactive logins, however, failed significantly more often than command invocations through condor_ssh_to_job. All tests were executed between 11/15/24 and 12/10/24.
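The probing procedure described above could be sketched roughly as follows. This is only an illustration, not the script used for the tests: the helper names (site_query_command, unique_sites, probe_commands, build_plan) and the job id in the usage note are hypothetical, the one-site-per-line output format is an assumption, and only the command lines themselves are taken from the text.

```python
import shlex

ATTEMPTS_PER_SITE = 10  # the text states that each site was tested 10 times


def site_query_command():
    """Return the site-discovery command quoted in the text as an argv list."""
    return [
        "condor_status", "-pool", "factory-1.osg-htc.org", "-any",
        "-const", 'MyType=="glide factory"', "-af", "GLIDEIN_Site",
    ]


def unique_sites(condor_status_output):
    """Reduce the query output (assumed: one site name per line) to a sorted, de-duplicated list."""
    names = {line.strip() for line in condor_status_output.splitlines()}
    names.discard("")  # ignore blank lines
    return sorted(names)


def probe_commands(job_id):
    """Return the two invocations exercised for a running job."""
    return {
        "command_only": ["condor_ssh_to_job", job_id, "hostname"],
        "interactive": ["condor_ssh_to_job", job_id],
    }


def build_plan(job_id):
    """List every command line that would be run against one job on one site."""
    plan = []
    for attempt in range(1, ATTEMPTS_PER_SITE + 1):
        for mode, argv in probe_commands(job_id).items():
            plan.append((attempt, mode, shlex.join(argv)))
    return plan
```

For a hypothetical job id such as "1234.0", build_plan("1234.0") yields 20 entries: 10 attempts, each with one command-only and one interactive invocation. Actually driving the interactive case would additionally require a pseudo-terminal, which this sketch leaves out.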

Tables 1a and 1b summarize the results: 56 sites were reached, and 30 of them provided an interactive SSH session with file listing and directory change. The remaining 26 sites could not provide an interactive SSH session. More details about each site can be found in the appendix (https://drive.google.com/file/d/17ASdER6tz5D8_kVwgIR42uWlAKWZrpFR/view?usp=sharing). Nodes reporting “Can't setns to user namespace: Invalid argument” could not provide shell sessions, but it was still possible to run commands on them.
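The different treatment of the namespace error in the two modes can be expressed as a small classification rule. The sketch below is an assumption about how one attempt might be scored, not the logic of the test script or of condor_ssh_to_job itself; the function name, the mode labels, and the completed flag are hypothetical, and only the error string comes from the observations.

```python
SETNS_USER_ERROR = "Can't setns to user namespace: Invalid argument"


def classify_attempt(mode, output, completed):
    """Classify one attempt as 'success', 'success_with_warning', or 'failure'.

    mode      -- "interactive" or "command_only" (labels chosen for this sketch)
    output    -- text captured from the condor_ssh_to_job attempt
    completed -- True if the shell opened or the remote command returned its result
    """
    has_setns_error = SETNS_USER_ERROR in output
    if mode == "interactive" and has_setns_error:
        # Observed to be fatal: no shell session is provided.
        return "failure"
    if not completed:
        return "failure"
    # In command mode the same message was only a warning.
    return "success_with_warning" if has_setns_error else "success"
```

Under this rule, a command-only attempt that prints the namespace message and still returns the hostname counts as a success with a warning, while an interactive attempt showing the same message counts as a failure.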

Table 1a: Number of sites where interactive ssh-to-job works.

| Status | Quantity | Comments |
|--------|----------|----------|
| Works  | 30       | 3 sites with slow SSH: NotreDame, UConn-HP, UIUC-TGI-RAILS |
| Broken | 26       | 21 sites: “Can't setns to user namespace: Invalid argument” |
|        |          | 1 site: “$GahpVersion: 1.8.0 Mar 31 2008 INFN\ blahpd\ (poly,new_esc_format) $” |

Table 1b: Number of sites where “command only” ssh-to-job works.

| Status | Quantity | Comments |
|--------|----------|----------|
| Works  | 50       | 21 sites: warning for SSH commands, “Can't setns to user namespace: Invalid argument” |
| Broken | 6        | 1 site: “/usr/bin/ssh-keygen: No such file or directory” |
|        |          | 1 site: “Failed, because sshd not correctly configured (SSH_TO_JOB_SSHD=/usr/sbin/sshd): No such file or directory” |
|        |          | 1 site: “Connection to condor-job.node827.dcs.ligo-wa.caltech.edu closed by remote host.” |
|        |          | 1 site: “/bin/bash: Permission denied” |
|        |          | 1 site: “Connection closed by UNKNOWN port 65535” |
|        |          | 1 site: “$GahpVersion: 1.8.0 Mar 31 2008 INFN\ blahpd\ (poly,new_esc_format) $” |
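The relationship between the “Works” counts in Tables 1a and 1b and the full/partial/none figures in the executive summary is simple arithmetic, sketched below. The function name and dictionary keys are hypothetical, and the sketch assumes that every site with a working interactive session also runs commands.

```python
def summarize(total_sites, interactive_ok, command_ok):
    """Derive the summary categories from the 'Works' counts of Tables 1a and 1b.

    Assumes every site with a working interactive session also runs commands.
    """
    return {
        "full": interactive_ok,                    # interactive sessions and commands work
        "partial": command_ok - interactive_ok,    # commands only
        "none": total_sites - command_ok,          # neither works
        "interactive_broken": total_sites - interactive_ok,
    }


summary = summarize(total_sites=56, interactive_ok=30, command_ok=50)
print(summary)
# {'full': 30, 'partial': 20, 'none': 6, 'interactive_broken': 26}
```

The three categories add up to the 56 tested sites, and the 26 sites without interactive access are the 20 partial sites plus the 6 sites with no functionality.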


 

Table 2 shows the percentage of successes and failures for each site. Most sites behaved homogeneously across their nodes, but a few (for example Alabama-CHPC and UChicago) showed errors on only some of the nodes.

Table 2: Percentage of successes and failures for each site.

| Site | Command only: success | Command only: failure | Interactive session: success | Interactive session: failure | Status |
|------|-----------------------|-----------------------|------------------------------|------------------------------|--------|
| AMNH | 100% | 0% | 0% | 100% | ‘Can't setns to user namespace: Invalid argument’ |
| Alabama-CHPC | 100% | 0% | 90% | 10% | ‘Can't setns to user namespace: Invalid argument’ |
| BEOCAT-SLATE | 0% | 100% | 0% | 100% | ‘$GahpVersion: 1.8.0 Mar 31 2008 INFN\ blahpd\ (poly,new_esc_format) $’ |
| CHTC | 100% | 0% | 0% | 100% | ‘Can't setns to user namespace: Invalid argument’ and ‘memory usage exceeded request_memory’ |
| CHTC-Spark | 100% | 0% | 100% | 0% | |
| Clemson-Palmetto | 100% | 0% | 100% | 0% | |
| Colorado | 100% | 0% | 0% | 100% | ‘Can't setns to user namespace: Invalid argument’ |
| Duke-NCShare | 100% | 0% | 100% | 0% | |
| ELSA | 100% | 0% | 0% | 100% | ‘Can't setns to user namespace: Invalid argument’ |
| FANDM-ITS | 100% | 0% | 100% | 0% | |
| FNAL | 0% | 100% | 0% | 100% | ‘/usr/bin/ssh-keygen: No such file or directory’ |
| FNAL_GPGrid | 0% | 100% | 0% | 100% | ‘Failed, because sshd not correctly configured (SSH_TO_JOB_SSHD=/usr/sbin/sshd): No such file or directory’ |
| GATech | 100% | 0% | 100% | 0% | |
| GRID_ce2 | 100% | 0% | 0% | 100% | ‘Can't setns to user namespace: Invalid argument’ |
| GSU-ACIDS | 100% | 0% | 100% | 0% | |
| Hawaii-Koa | 100% | 0% | 100% | 0% | slow |
| LIGO-WA | 0% | 100% | 0% | 100% | ‘Connection to condor-job.node827.dcs.ligo-wa.caltech.edu closed by remote host.’ |
| Lehigh - Hawk | 100% | 0% | 0% | 100% | ‘Can't setns to user namespace: Invalid argument’ |
| Langston-Lion | 100% | 0% | 0% | 100% | ‘Can't setns to user namespace: Invalid argument’ |
| Lafayette-Firebird | 100% | 0% | 100% | 0% | |
| LSU-Deep_Bayou | 100% | 0% | 100% | 0% | |
| LSUHSC-Tigerfish | 100% | 0% | 0% | 100% | ‘Can't setns to user namespace: Invalid argument’ |
| MI-HORUS | 0% | 100% | 0% | 100% | ‘/bin/bash: Permission denied’ |
| MSU-DataMachine | 100% | 0% | 0% | 100% | ‘Can't setns to pid namespace: Operation not permitted’ |
| MTState-Tempest | 100% | 0% | 0% | 100% | ‘Can't setns to user namespace: Invalid argument’ |
| New Mexico State Discovery | 100% | 0% | 0% | 100% | ‘Can't setns to user namespace: Invalid argument’ |
| NCSU-OSG | 100% | 0% | 100% | 0% | |
| NotreDame | 100% | 0% | 100% | 0% | |
| ORU-Titan | 100% | 0% | 0% | 100% | ‘Can't setns to user namespace: Invalid argument’ |
| OSG_US_FSU_HNPGRID | 100% | 0% | 100% | 0% | slow |
| PSU-LIGO | 100% | 0% | 100% | 0% | |
| Purdue-Anvil | 100% | 0% | 100% | 0% | |
| PuertoRico | 100% | 0% | 100% | 0% | |
| PDX-Coeus | 100% | 0% | 100% | 0% | |
| SIUE-CC-production | 100% | 0% | 100% | 0% | |
| SPRACE | 100% | 0% | 100% | 0% | |
| SU-ITS | 100% | 0% | 0% | 100% | ‘Can't setns to user namespace: Invalid argument’ |
| Swarthmore-Firebird | 100% | 0% | 100% | 0% | |
| ODU-Ubuntu | 100% | 0% | 0% | 100% | ‘Can't setns to pid namespace: Operation not permitted’ or timeout |
| UAH-Voyager | 100% | 0% | 100% | 0% | |
| UC-Denver | 100% | 0% | 0% | 100% | ‘Can't setns to user namespace: Invalid argument’ |
| UWEC-BOSE | 100% | 0% | 100% | 0% | |
| UChicago | 60% | 40% | 60% | 40% | ‘Connection closed by UNKNOWN port 65535’ |
| UConn | 100% | 0% | 100% | 0% | |
| UConn-HPC | 100% | 0% | 100% | 0% | |
| UCR-HPCC | 100% | 0% | 0% | 100% | ‘Can't setns to user namespace: Invalid argument’ |
| UIUC-TGI-RAILS | 100% | 0% | 100% | 0% | slow |
| UMT-Hellgate | 100% | 0% | 100% | 0% | |
| UNR-CC | 100% | 0% | 0% | 100% | ‘Can't setns to user namespace: Invalid argument’ |
| UUCHPC | 100% | 0% | 0% | 100% | ‘Can't setns to user namespace: Invalid argument’ |
| UND-Talon | 100% | 0% | 100% | 0% | |
| UWM-Mortimer | 100% | 0% | 0% | 100% | ‘Can't setns to user namespace: Invalid argument’ |
| UW-IT | 100% | 0% | 100% | 0% | |
| Wisconsin | 100% | 0% | 100% | 0% | |
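The percentages in Table 2 follow from the attempts made at each site. A minimal sketch of that calculation is shown below; the function names and the list-of-booleans representation of attempts are assumptions made for illustration.

```python
def success_rate(outcomes):
    """Return the share of successful attempts as a whole-number percentage string."""
    if not outcomes:
        raise ValueError("at least one attempt is required")
    return f"{100 * sum(outcomes) / len(outcomes):.0f}%"


def is_mixed(outcomes):
    """True when a site both succeeded and failed, i.e. its nodes did not behave uniformly."""
    return any(outcomes) and not all(outcomes)
```

For instance, 9 successful interactive attempts out of 10 give the 90% reported for Alabama-CHPC, and such a site is flagged as mixed, whereas a site with 10 successes out of 10 is not.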