CRIU - Checkpoint Restore in Userspace
By Fabio Andrijauskas
Scientific applications sometimes have to stop mid-run because of hardware problems or because the job reaches its end of life. Some applications can write a set of files that save their current state so that a later restore process can load it. However, most applications do not have this feature. CRIU (Checkpoint Restore in Userspace, pronounced kree-oo) is a tool for checkpointing and restoring applications in GNU/Linux environments. With CRIU it is possible to stop an application, save its working memory to disk, and later restore that state. As the OSPool is built around the notion of pre-emptable resources, this could be very useful for jobs that get pre-empted or exceed their allocated runtime.
Executive summary
A checkpoint/restore tool is vital for time-consuming scientific applications and for applications that must stop due to pre-emption, maintenance, or other problems. About CRIU:
- CRIU can provide several options to stop and restore applications.
- It can checkpoint and restore applications with multiple threads and processes while maintaining network connections.
- CRIU supports containers through Docker and Podman.
- It requires a “form” of root access: sudo, the SUID bit, or kernel capabilities.
- Using kernel capabilities requires specific versions of CRIU and of the Linux kernel.
- Restoring a previously checkpointed process requires that the same directory paths are used during restore as during checkpointing.
- There is no support for GPUs.
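The two preconditions for the capability-based route (a recent enough kernel and the checkpoint/restore capability) can be checked with a few lines of C. The following is a minimal sketch, not part of the original tests: the function names are hypothetical, the 5.9 threshold is the one quoted in the test section, and capability number 40 is the value of CAP_CHECKPOINT_RESTORE in the Linux headers. Note that when the capability is granted with setcap on the criu binary, it appears in the criu process itself, not in the shell that launches it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* CAP_CHECKPOINT_RESTORE is capability number 40 (added in Linux 5.9). */
#define CAP_CHECKPOINT_RESTORE_BIT 40

/* Return 1 if a kernel release string such as "5.14.0-70.el9" is at least
 * major.minor, 0 if it is older, and -1 if it cannot be parsed. */
int kernel_at_least(const char *release, int major, int minor)
{
    int maj = 0, min = 0;
    assert(release != NULL);
    if (sscanf(release, "%d.%d", &maj, &min) != 2)
        return -1;
    return (maj > major) || (maj == major && min >= minor);
}

/* Return 1 if bit `cap` is set in a capability mask, 0 otherwise. */
int mask_has_cap(uint64_t mask, int cap)
{
    assert(cap >= 0 && cap < 64);
    return (int)((mask >> cap) & 1u);
}

/* Read the effective capability mask of the calling process from the
 * "CapEff:" line of /proc/self/status. Returns 0 on success, -1 on error. */
int read_effective_caps(uint64_t *mask)
{
    char line[256];
    int found = -1;
    FILE *f;
    assert(mask != NULL);
    f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        unsigned long long value;
        if (sscanf(line, "CapEff: %llx", &value) == 1) {
            *mask = (uint64_t)value;
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}
```

A caller would pass the `release` field filled in by `uname(2)` to `kernel_at_least(release, 5, 9)` and then test `mask_has_cap(mask, CAP_CHECKPOINT_RESTORE_BIT)` on the mask returned by `read_effective_caps`.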
Tests using CRIU
The table below shows the tests run with CRIU. Non-root CRIU requires kernel version 5.9 or later. In the results, “root access” means any way to obtain root privileges or the needed capabilities.
Test | CRIU 3.17.1 | CRIU Branch non-root/criu dev (setcap cap_checkpoint_restore+eip) | Comments |
Simple C code: writing on the terminal. Using the same operating system and hardware | Successful with root access | Successful without root access | The test was successful, requiring only a few options on dump and restore
LAMMPS: read and write files. Using the same operating system and hardware | Successful with root access | Successful without root access | If CRIU detects that any file or directory has changed location, the restore fails. To restore, the paths must be the same as they were at the time of the CRIU dump.
Simple C code: dump from Intel hardware to AMD hardware. | Successful with root | Successful without root access | The restore fails if any kind of processor-specific code optimization is used; all the system libraries must also be the same
Simple C code: dump from AMD hardware to Intel hardware. | Successful with root | Successful without root access | The restore fails if any kind of processor-specific code optimization is used; all the system libraries must also be the same
Open files and dump the software | Successful with root | Successful without root access | The file structure must be the same on dump and restore
TCP Network connections, one machine - dump and restore the client on the same computer. | Successful with root | Successful without root access | The test used two programs: a TCP server and a TCP client. It requires a few options on CRIU dump and restore. CRIU uses an advanced mechanism to put the socket in a wait mode - https://www.criu.org/TCP_connection The application can be restored only once
TCP Network connections, one machine - dump and restore the server on the same computer. | Successful with root | Successful without root access | The application can be restored only once
TCP Network connections, one machine - dump and restore the server and the client on the same computer. | Successful with root | Successful without root access | The application can be restored only once
TCP Network connections, two machines, server and client on each machine respectively - dump and restore the client. | Successful with root | Successful without root access | The application can be restored only once
TCP Network connections, two machines, server and client on each machine respectively - dump and restore the server. | Successful with root | Successful without root access | The application can be restored only once
TCP Network connections, two machines, server and client on each machine respectively - dump and restore the client executed first on the other machine | Unsuccessful | Unsuccessful | |
TCP Network connections, two machines, server and client on each machine respectively - dump and restore the server executed first on the other machine | Unsuccessful | Unsuccessful | |
UDP Network connections, one machine - dump and restore the client on the same computer. | Successful with root | Successful without root access | |
UDP Network connections, one machine - dump and restore the server on the same computer. | Successful with root | Successful without root access | |
UDP Network connections, one machine - dump and restore the server and the client on the same computer. | Successful with root | Successful without root access | |
UDP Network connections, two machines, server, and client on each machine respectively - dump and restore the client. | Successful with root | Successful without root access | The application can be restored only once
UDP Network connections, two machines, server, and client on each machine respectively - dump and restore the server. | Successful with root | Successful without root access | The application can be restored only once
UDP Network connections, two machines, server, and client on each machine respectively - dump and restore the client executed first on the other machine | Unsuccessful | Unsuccessful | |
UDP Network connections, two machines, server, and client on each machine respectively - dump and restore the server executed first on the other machine | Unsuccessful | Unsuccessful | |
Nvidia GPUs are not supported at all | Unsuccessful | Unsuccessful | |
Singularity – Test to dump and restore inside the container | Unsuccessful | Unsuccessful | |
Singularity – Test to dump and restore outside the container | Unsuccessful | Unsuccessful | |
Podman – Test to dump and restore outside the container | Successful with root | Successful with root | Using the podman/CRIU integration interface, it is possible to checkpoint and restore containers with the podman command.
Podman – Test to dump and restore inside the container | Unsuccessful | Unsuccessful | |
Docker – Test to dump and restore outside the container | Successful with root | Successful with root access | Using the docker/CRIU integration interface, it is possible to checkpoint and restore containers with the docker command.
Docker – Test to dump and restore inside the container | Unsuccessful | Unsuccessful | CRIU requires a specific integration, like the ones provided for podman and docker
Dumping and restoring a POD using CVMFS | Successful with root | Successful without root access | It is possible to recover the CVMFS. |
Dumping and restoring a process with FORK | Successful with root access | Successful with root access | The restore process requires root access |
Dumping and restoring a process with MPI | Unsuccessful | Unsuccessful | Several tests resulted in “hung” dump or restore |
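Several rows above fail or succeed depending on whether the files and directories seen at dump time are still at the same paths at restore time. A minimal sketch of a pre-restore check follows; it is an illustration only, the function name is hypothetical, and the list of paths is assumed to have been recorded by the user when the dump was taken (CRIU itself performs its own checks during restore).

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Return the index of the first path that does not exist on this machine,
 * or -1 if every path in the list is present. */
int first_missing_path(const char *const *paths, size_t count)
{
    size_t i;
    assert(paths != NULL || count == 0);
    for (i = 0; i < count; i++) {
        /* F_OK only tests for existence, not for read or write access. */
        if (access(paths[i], F_OK) != 0)
            return (int)i;
    }
    return -1;
}
```

Running such a check before calling the restore makes a path mismatch visible up front instead of surfacing as a failed restore.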
Recommendations
Based on the tests executed with CRIU, we have some recommendations on how to use CRIU and on future work with it.
- Work with the Apptainer developers to create an interface with CRIU. That would make it possible to checkpoint and restore software executed by Apptainer.
- Add CRIU support in HTCondor for non-pilot use cases. CRIU can be used in scenarios where some form of “root” access is available.
- Define a policy that will minimize application errors due to changes in hardware (e.g., Intel to AMD). CRIU cannot reliably restore applications across different CPU architectures.
- Add CRIU support in pilots, but only when using containers (see also (E)). Using CRIU in container scenarios is possible, but some configuration is necessary.
- Re-evaluate CRIU non-root capabilities a few months from now. It would be worth checking whether the non-root capabilities are present in a new CRIU release.
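One possible building block for the hardware policy mentioned above is a comparison of CPU feature flags between the machine where the dump was taken and the machine where the restore will run. The sketch below assumes the flags are given as space-separated words, as in the "flags" line of /proc/cpuinfo; the function names are hypothetical, and a matching flag set is a necessary condition under this assumption, not a guarantee that the restore will succeed.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if the word flag[0..flag_len) appears as a whole
 * space-separated word in `list`, 0 otherwise. */
int has_flag(const char *list, const char *flag, size_t flag_len)
{
    const char *p = list;
    while (*p != '\0') {
        size_t len;
        while (*p == ' ')
            p++;                      /* skip separators */
        len = strcspn(p, " ");        /* length of the current word */
        if (len == flag_len && strncmp(p, flag, len) == 0)
            return 1;
        p += len;
    }
    return 0;
}

/* Return 1 if every flag of the dump host is also present on the restore
 * host, 0 if at least one flag is missing. */
int flags_compatible(const char *dump_host, const char *restore_host)
{
    const char *p = dump_host;
    assert(dump_host != NULL && restore_host != NULL);
    while (*p != '\0') {
        size_t len;
        while (*p == ' ')
            p++;
        len = strcspn(p, " ");
        if (len > 0 && !has_flag(restore_host, p, len))
            return 0;
        p += len;
    }
    return 1;
}
```

A scheduler-side policy could use such a check to send a checkpointed job only to machines whose flag set is a superset of the original one.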
Tests and Description
During the tests we found a bug and reported it to the CRIU development team: https://github.com/checkpoint-restore/criu/issues/2032
Codes 1 and 2 show the TCP client and server; Codes 3 and 4 show the UDP client and server.
Code 1: TCP Client.
#include <arpa/inet.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <sys/socket.h>
#include <unistd.h>
#define MAX 80
#define PORT 8090
#define SA struct sockaddr
// Chat loop: send one line typed by the user, then print the server's reply.
void func(int sockfd)
{
    char buff[MAX];
    int n, c;
    for (;;) {
        bzero(buff, sizeof(buff));
        printf("Enter the string : ");
        n = 0;
        // Read one line, keeping room for the terminating '\0'
        while (n < MAX - 1 && (c = getchar()) != EOF) {
            buff[n++] = (char)c;
            if (c == '\n')
                break;
        }
        if (n == 0)
            break; // end of input
        write(sockfd, buff, sizeof(buff));
        bzero(buff, sizeof(buff));
        if (read(sockfd, buff, sizeof(buff)) <= 0)
            break; // connection closed or read error
        buff[MAX - 1] = '\0';
        printf("From Server : %s", buff);
        if ((strncmp(buff, "exit", 4)) == 0) {
            printf("Client Exit...\n");
            break;
        }
    }
}
int main()
{
    int sockfd;
    struct sockaddr_in servaddr;
    // Create the TCP socket
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        printf("socket creation failed...\n");
        exit(EXIT_FAILURE);
    }
    else
        printf("Socket successfully created..\n");
    bzero(&servaddr, sizeof(servaddr));
    // Server IP address and port
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = inet_addr("169.228.130.112");
    servaddr.sin_port = htons(PORT);
    // Connect the client socket to the server socket
    if (connect(sockfd, (SA*)&servaddr, sizeof(servaddr)) != 0) {
        printf("connection with the server failed...\n");
        exit(EXIT_FAILURE);
    }
    else
        printf("connected to the server..\n");
    func(sockfd);
    close(sockfd);
    return 0;
}
Code 2: TCP server
#include <stdio.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h> // bzero()
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h> // read(), write(), close()
#define MAX 80
#define PORT 8090
#define SA struct sockaddr
// Function designed for chat between client and server.
void func(int connfd)
{
    char buff[MAX];
    int n, c;
    // infinite loop for chat
    for (;;) {
        bzero(buff, MAX);
        // read the message from client and copy it in buffer
        if (read(connfd, buff, sizeof(buff)) <= 0)
            break; // connection closed or read error
        buff[MAX - 1] = '\0';
        // print buffer which contains the client contents
        printf("From client: %s\t To client : ", buff);
        bzero(buff, MAX);
        n = 0;
        // copy server message in the buffer, keeping room for '\0'
        while (n < MAX - 1 && (c = getchar()) != EOF) {
            buff[n++] = (char)c;
            if (c == '\n')
                break;
        }
        if (n == 0)
            break; // end of input
        // and send that buffer to client
        write(connfd, buff, sizeof(buff));
        // if msg starts with "exit" then server exit and chat ended.
        if (strncmp("exit", buff, 4) == 0) {
            printf("Server Exit...\n");
            break;
        }
    }
}
// Driver function
int main()
{
    int sockfd, connfd;
    socklen_t len;
    struct sockaddr_in servaddr, cli;
    // socket create and verification
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        printf("socket creation failed...\n");
        exit(EXIT_FAILURE);
    }
    else
        printf("Socket successfully created..\n");
    bzero(&servaddr, sizeof(servaddr));
    // assign IP, PORT
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(PORT);
    // Binding newly created socket to given IP and verification
    if ((bind(sockfd, (SA*)&servaddr, sizeof(servaddr))) != 0) {
        printf("socket bind failed...\n");
        exit(EXIT_FAILURE);
    }
    else
        printf("Socket successfully binded..\n");
    // Now server is ready to listen and verification
    if ((listen(sockfd, 5)) != 0) {
        printf("Listen failed...\n");
        exit(EXIT_FAILURE);
    }
    else
        printf("Server listening..\n");
    len = sizeof(cli);
    // Accept the connection from client and verification
    connfd = accept(sockfd, (SA*)&cli, &len);
    if (connfd < 0) {
        printf("server accept failed...\n");
        exit(EXIT_FAILURE);
    }
    else
        printf("server accept the client...\n");
    // Function for chatting between client and server
    func(connfd);
    // After chatting close both sockets
    close(connfd);
    close(sockfd);
    return 0;
}
Code 3: UDP client.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#define PORT 8080
#define MAXLINE 1024
// Driver code
int main() {
    int sockfd;
    char buffer[MAXLINE];
    //char *hello = "Hello from client";
    char *hello = calloc(100, sizeof(char));
    struct sockaddr_in servaddr;
    if (hello == NULL) {
        perror("calloc failed");
        exit(EXIT_FAILURE);
    }
    // Creating socket file descriptor
    if ( (sockfd = socket(AF_INET, SOCK_DGRAM, 0)) < 0 ) {
        perror("socket creation failed");
        exit(EXIT_FAILURE);
    }
    memset(&servaddr, 0, sizeof(servaddr));
    // Filling server information
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(PORT);
    servaddr.sin_addr.s_addr = inet_addr("169.228.130.115");
    ssize_t n;
    socklen_t len;
    while (1)
    {
        printf("Type msg:.\n");
        // Stop on end of input
        if (fgets(hello, 10, stdin) == NULL)
            break;
        sendto(sockfd, (const char *)hello, strlen(hello),
               MSG_CONFIRM, (const struct sockaddr *) &servaddr,
               sizeof(servaddr));
        printf("Hello message sent.\n");
        // len is value/result: it must hold the address size before the call
        len = sizeof(servaddr);
        n = recvfrom(sockfd, (char *)buffer, MAXLINE - 1,
                     MSG_WAITALL, (struct sockaddr *) &servaddr,
                     &len);
        if (n < 0) {
            perror("recvfrom failed");
            break;
        }
        buffer[n] = '\0';
        printf("Server : %s\n", buffer);
    }
    free(hello);
    close(sockfd);
    return 0;
}
Code 4: UDP server
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#define PORT 8080
#define MAXLINE 1024
// Driver code
int main() {
    int sockfd;
    char buffer[MAXLINE];
    char *hello = "Hello from server";
    struct sockaddr_in servaddr, cliaddr;
    // Creating socket file descriptor
    if ( (sockfd = socket(AF_INET, SOCK_DGRAM, 0)) < 0 ) {
        perror("socket creation failed");
        exit(EXIT_FAILURE);
    }
    memset(&servaddr, 0, sizeof(servaddr));
    memset(&cliaddr, 0, sizeof(cliaddr));
    // Filling server information
    servaddr.sin_family = AF_INET; // IPv4
    servaddr.sin_addr.s_addr = INADDR_ANY;
    servaddr.sin_port = htons(PORT);
    // Bind the socket with the server address
    if ( bind(sockfd, (const struct sockaddr *)&servaddr,
              sizeof(servaddr)) < 0 )
    {
        perror("bind failed");
        exit(EXIT_FAILURE);
    }
    socklen_t len;
    ssize_t n;
    while (1)
    {
        len = sizeof(cliaddr); // len is value/result, reset before each call
        // Leave one byte free for the terminating '\0'
        n = recvfrom(sockfd, (char *)buffer, MAXLINE - 1,
                     MSG_WAITALL, ( struct sockaddr *) &cliaddr,
                     &len);
        if (n < 0) {
            perror("recvfrom failed");
            continue;
        }
        buffer[n] = '\0';
        printf("Client : %s\n", buffer);
        // Reply to the address the datagram came from
        sendto(sockfd, (const char *)hello, strlen(hello),
               MSG_CONFIRM, (const struct sockaddr *) &cliaddr,
               len);
        printf("Hello message sent.\n");
    }
    return 0;
}
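The report does not list the exact CRIU options used to dump and restore the programs above. As a hedged illustration, the sketch below builds command lines with two documented CRIU options that fit these tests: `--shell-job` for a process started from a terminal and `--tcp-established` for a process with an open TCP connection. The helper name, the PID, and the image directory are hypothetical, and the options actually used in the tests may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a "criu dump" (restore == 0) or "criu restore" (restore != 0)
 * command line into `out`. `pid` is only used for dump. Returns 0 on
 * success, -1 if the buffer is too small or the directory contains a
 * space (this sketch does no shell quoting). */
int build_criu_cmd(char *out, size_t size, int restore, int pid,
                   const char *images_dir, int tcp)
{
    int n;
    const char *tcp_opt = tcp ? " --tcp-established" : "";
    assert(out != NULL && images_dir != NULL);
    if (strchr(images_dir, ' ') != NULL)
        return -1;
    if (restore)
        n = snprintf(out, size, "criu restore -D %s --shell-job%s",
                     images_dir, tcp_opt);
    else
        n = snprintf(out, size, "criu dump -t %d -D %s --shell-job%s",
                     pid, images_dir, tcp_opt);
    /* snprintf reports the length it needed; reject truncated output. */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For the UDP programs the `tcp` argument would be 0, since no established TCP connection has to be preserved.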