1.
Introduction
The X-Ray Imaging and Spectroscopy Mission (XRISM) is the seventh Japanese X-ray observatory and the successor of the Hitomi X-ray satellite. It was launched at 8:42 a.m. Japan Standard Time on September 7, 2023, aboard H-IIA Launch Vehicle No. 47 from the Tanegashima Space Center. The project is led by the Japan Aerospace Exploration Agency (JAXA) and the National Aeronautics and Space Administration (NASA) in collaboration with the European Space Agency (ESA) and other partners. XRISM carries a revolutionary X-ray microcalorimeter array with high energy resolution (Resolve) and an X-ray CCD camera with a large field of view (Xtend). The first light observations of the supernova remnant N132D in the Large Magellanic Cloud1 and the active galactic nucleus NGC 41512 with Resolve, and of the galaxy cluster Abell 23193 with Xtend, promise that XRISM will provide a leap in our understanding of the formation and evolution of the Universe, galaxies, compact objects, and supernovae.
The mission schedule is as follows: the critical operation and "initial phase commissioning period" (the "commissioning period" hereafter; lasting three months after the launch), the "nominal phase initial calibration and performance verification period" (the "PV period" hereafter; lasting seven months after the commissioning period), and the "nominal phase nominal observation period" (the "nominal observation period" hereafter; lasting 26 months after the PV period). The nominal observation period will conclude in September 2026, after which XRISM will undergo a review to evaluate whether it can transition into the late phase, that is, the extended observation phase. As of January 2025, XRISM is in the general observer program cycle 1 (GO-1) during the nominal observation period.
The Science Operations Center (SOC) at JAXA, the Science Data Center (SDC) at NASA, and the European Space Astronomy Centre (ESAC) at ESA share the science operation tasks of XRISM, and the first two centers are responsible for data processing and distribution.4 XRISM adopts a "two-stage" data reduction: a pre-pipeline (PPL) process at SOC followed by a pipeline (PL) process at SDC. Although the software was designed carefully prior to the launch, further improvements are still underway; downlinked observation data are regularly processed with the versions of PPL and PL that are the latest at that time. When a significant update to either PPL or PL occurs during a mission period (this has happened a few times), the quality of the products distributed to scientists changes from that point on. Because end users cannot run PPL processing themselves, all data must be reprocessed with the latest software at certain milestones, such as the ends of the commissioning and PV periods of the mission (approximately once a year).
Completing these reprocessing tasks within a realistic time (less than one week) requires high-performance computing (HPC), which in turn requires additional work to port the software to the HPC system. To this end, we developed new and efficient porting methods that utilize Singularity,5 which is a container platform. Containers are lightweight virtualization technologies (VTs) compared with virtual machines (VMs). Packaging our ordinary PPL environment, together with core system libraries such as the GNU C Library (glibc), into a Singularity disk image allows the PPL software to run on the JAXA "TOKI-RURI" HPC system6 with minimal modifications. Herein, we elaborate on our methods and the processing results obtained with the HPC system.
The remainder of this paper is organized as follows. Section 2 briefly describes regular data processing using PPL and PL. Section 3 summarizes VTs in contrast to VMs and containers, Singularity, and our motivations and methods. Section 4 discusses the processing results of TOKI-RURI. Section 5 summarizes this study.
2.
Ordinary Data Processing for Pipeline Products
Figure 1 shows a schematic of the data reduction for pipeline products in XRISM (see Refs. 4 and 7 for details). Data from the payload instruments, including the X-ray detectors, accumulate in an onboard data recorder (DR) in the form of Spacecraft Monitor and Control Protocol (SMCP)8 messages and are downlinked to the tracking and control stations on the ground in the space packet format defined by the Consultative Committee for Space Data Systems (CCSDS). These packets are stored, along with information on the communication antennae, in the Scientific Information Retrieval and Integrated Utilization System (SIRIUS),9 a telemetry database operated by the Science Satellite Operation and Data Archive Unit (C-SODA) in JAXA. In regular data processing, we use the "merged mode" of SIRIUS, in which SIRIUS ignores the antenna information and behaves as if all packets in the system had been downlinked through one virtual antenna. In this mode, SIRIUS also deduplicates the packets: the same memory addresses in the DR can be read multiple times during daily operations for robustness, and this functionality removes the resulting duplicate packets. Explicitly specifying an antenna to SIRIUS allows us to access the removed packets.
Fig. 1
Schematic of data processing in XRISM. Telemetry data downlinked from XRISM are converted into FITS products with the pre-pipeline and pipeline software in SOC and SDC, respectively. Both centers distribute a set of SFFs and FITS products to the principal investigator of the observation.
The first stage of the PPL software, managed by SOC, runs on "Reformatter," a standard Linux (RedHat Enterprise Linux 7) VM with four-core virtual central processing units (CPUs) and 64 GB of virtual memory, hosted in C-SODA on a physical server with dual Intel Xeon E5-2698 v4 processors and 256 GB of memory. In this stage, the CCSDS packets corresponding to a single observation, identified by an OBSID (a nine-digit unique integer whose highest digit indicates the observation type, such as commissioning (=0) and calibration (=1) observations), are dumped into a single Flexible Image Transport System (FITS)10,11 file with variable-length arrays called the "raw packet telemetry" (RPT). In the latter stage of PPL, the CCSDS packets in the RPT are reconstructed into SMCP messages and are then compiled in stages into essential raw values by referencing the definition files for telemetry and commands (SIB28,12; confidential) and satellite and observation information. These raw values are then reduced to multiple "first FITS files" (FFFs). The typical PPL processing time is . The FFFs are transferred to SDC.
The PL software at SDC calibrates the raw values in the FFFs into physical quantities, such as coordinates and pulse-height invariants, in stages. Referencing the calibration database, PL creates "second FITS files" (SFFs). PL also generates ready-for-analysis FITS products (cleaned-event FITS files) by filtering out low-quality events in the SFFs and applying good-time intervals. A set of SFFs and FITS products is distributed to the principal investigator of the observation, whereas the RPT and FFFs are kept as files internal to the XRISM project; all columns in the FFFs are carried over to the SFFs, so users can access all the information in the FFFs through the corresponding SFFs. In other words, the only difference between FFFs and SFFs is whether the calibrated values have been filled into the FITS tables; they are of the same size, except for the HISTORY and COMMENT keywords in their respective FITS headers. No data, including intermediate information, are lost during PPL and PL processing. From the perspective of the principal investigator as an end user, only a few columns ( MiB in size) in the FITS products (a few tens of GiB) are necessary for analysis.
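For instance, the handful of columns actually used in an analysis can be read directly from a cleaned-event FITS product. The following is a minimal sketch with astropy, in which the file name is a placeholder and "EVENTS," "TIME," and "PI" are the conventional names for X-ray event lists rather than values taken from this paper.

```python
# Minimal sketch: reading a few analysis columns from a cleaned-event FITS
# product with astropy. The file name is a placeholder; "EVENTS", "TIME",
# and "PI" are conventional names for X-ray event lists and serve only as
# an illustration.
from astropy.io import fits

with fits.open("cleaned_events.fits") as hdul:   # hypothetical file name
    events = hdul["EVENTS"].data                 # binary table of X-ray events
    print(events.columns.names)                  # all columns carried over from the SFF
    time = events["TIME"]                        # photon arrival times
    pi = events["PI"]                            # pulse-height invariant (energy channel)
    print(f"{len(pi)} events loaded; columns used: TIME, PI")
```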
3.
Motivations and Strategies for the HPC Version of PPL
After the launch and the subsequent short critical operation period, XRISM went through two essential milestones: the commissioning and PV periods. The PPL and PL software was improved aggressively during these two periods because it was fundamental for the instrument teams to examine whether their instruments were functioning properly. This also implies that all observation data had to be reprocessed with the latest versions of PPL and PL at the end of each period to homogenize the quality of the PPL and PL products in preparation for scientific publication. There were 80 and 161 observation IDs (OBSIDs) at the end of the commissioning and PV periods, respectively, and reprocessing them was an infeasible task for our ordinary VMs (a temporary change to the VM settings for the PPL tasks was not accommodated by C-SODA). This strongly motivated SOC to explore a method to boost the PPL tasks on an HPC system such as a supercomputer. Given that SIB2 is classified information and that XRISM's operation is scheduled to continue until September 2026 (as of January 2025), it was essential to develop a rapid method that could be implemented with resources available within JAXA. RedHat Enterprise Linux 7 or an equivalent was also required for PPL because some programs can only be compiled with and tested using the GNU Compiler Collection (GCC) 4.8. We required parallelization only in units of OBSIDs and did not intend to implement parallel algorithms that use Open Multiprocessing (OpenMP), the Message Passing Interface (MPI), or the like in PPL itself.
3.1.
JAXA Supercomputer System Generation 3
The JAXA Supercomputer System Generation 3 (JSS3),6 which began operating in 2020, is the infrastructure for numerical simulations and large-scale data analyses. It consists of
• TOKI-SORA: the leading computing platform, equipped with A64FX13-based CPUs and a theoretical peak performance of 19.4 PFLOPS,
• TOKI-RURI and TOKI-TRURI: x86-64 architecture clusters running Rocky Linux 8 for general-purpose computing, including machine learning, with theoretical peak performances of 1.24 PFLOPS and 145 TFLOPS, respectively,
• TOKI-FS: a storage system with 10-PB nonvolatile memory express (NVMe) solid-state drives (SSDs) and 40-PB hard disks (HDDs) in total, powered by the Lustre distributed parallel file system14 (FS),
• J-SPACE: the archiving infrastructure.
There is also a 0.4-PB storage system called TOKI-TFS, which connects to TOKI-TRURI. For TOKI-SORA and TOKI-RURI, TOKI-FS serves as the sole hot storage; all input and output files must reside in TOKI-FS. It should be noted that these systems are completely isolated from the Internet. To use them, one should log in to one of the login nodes through a virtual private network (VPN). Furthermore, only SSH communication from the Internet to the login node is permitted. The firewall prohibits communication in the reverse direction (see the bottom half of Fig. 2).
TOKI-RURI is the best choice for our purpose because each node can be regarded as a standard personal computer (PC) running Linux. TOKI-RURI is further divided into four types, as summarized in Table 1. In total, the TOKI-RURI system has 14,976 physical CPU cores and 104.8 TiB of memory; MPI enables parallel computations across multiple nodes, even if they are of different types. The login nodes for TOKI-RURI consist of eight servers, each equipped with dual Intel Xeon Gold 6240 processors, 384 GiB of memory, and one NVIDIA Quadro P4000. These login nodes are also used for small-scale testing and debugging. The sequential write speed to an HDD region of TOKI-FS from a login node is .
Table 1
Specifications of the TOKI-RURI system.
| Type | TOKI-XM | TOKI-LM | TOKI-ST | TOKI-GP |
|---|---|---|---|---|
| Number of nodes | 2 | 7 | 375 | 32 |
| CPU (Intel Xeon) | Gold 6240L () | Gold 6240L () | Gold 6240 () | Gold 6240 () |
| Memory | 6 TiB | 1.5 TiB | 192 GiB | 384 GiB |
| GPGPU (NVIDIA) | Quadro P4000 () | Quadro P4000 () | — | Tesla V100 SXM2 () |
To execute commands (or programs) in parallel on TOKI-RURI, users need to write and submit a bash script, often referred to as a "job script," that defines necessary computing resources such as the number of CPU cores, memory capacity, and computation time (Code 1). By considering the resource requirements outlined in the job script and ensuring fairness among users, a job scheduler allocates the appropriate computing nodes from the four types of TOKI-RURI listed in Table 1 and determines the execution timing (fair-share scheduling). This ensures that the job runs without being affected by performance degradation caused by other jobs. However, the job will be deferred when sufficient resources are not available at that moment. This implies that the resource requirements of the job script must always exceed the actual usage for the task. If the actual usage slightly exceeds that defined in the job script, the job is forcibly terminated. In addition, the user does not know exactly when the job will start ( jobs started execution immediately after submitting the job scripts in our case). Unless an exception request for special circumstances is submitted, the computing time is limited to 24 h.
Code 1
Example of a job script for the JSS3 system. Resource requirements for a batch job are provided through comments that start with the special prefix "#JX."
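For illustration, a job script along these lines might look like the following minimal sketch; the resource values are arbitrary, and the directive names after the "#JX" prefix are placeholders rather than the actual JSS3 options.

```bash
#!/bin/bash
# Minimal sketch of a JSS3 job script. Only the "#JX" prefix is taken from
# the text; the directive names and values below are placeholders.
#JX -L node=1                  # placeholder: request one node (one CPU core suffices for PPL)
#JX -L elapse=12:00:00         # placeholder: wall-clock limit, below the 24-h cap
#JX -L memory=32GiB            # placeholder: memory request with margin

# The body is an ordinary bash script executed on the allocated node.
echo "Job started on $(hostname) at $(date)"
./run_task.sh                  # placeholder for the actual workload
```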
Because PPL does not have parallel execution parts, we can reliably request a single CPU core as a constant value for the job script. To adapt PPL for TOKI-RURI, we need to derive formulas that estimate the memory capacity and computing time required for each OBSID based on its observation duration and prepare a metascript to incorporate these values into a job script template.
3.2.
Virtualization Technologies on Linux and Singularity
VMs such as Oracle VirtualBox15 are among the best-known VTs. VMs were invented in the late 1960s and are categorized into type 1 and type 2 hypervisors.16 The former runs directly on PC hardware, whereas the latter runs on a host operating system (OS). Both types of hypervisor can emulate commonly used PC hardware to run arbitrary OSes. Although PC users can simultaneously execute both Linux and Windows applications, for instance, on their laptops by employing VMs, running multiple VMs on respective high-performance servers increases their density in a data center (as is the case for Reformatter), thus maximizing the hardware resource usage. In return, there are overheads due to hardware emulation, which are significant for file input/output (I/O) to virtual disks.
Another VT isolates a process (an execution unit of a program) and its subprocesses from the others on an OS. It originates from the chroot command/system call of Version 7 Unix, released in 1979, which led to the jail mechanism of FreeBSD introduced in 2000.17 The same feature was implemented as a set of patches to the Linux kernel, "Linux-VServer," announced in 2001.18 On Linux, the process isolation technology known today as "containers" is based on two Linux kernel features: cgroups19 and namespaces.20 The former hierarchically groups the respective processes and limits their hardware usage, such as CPU time, I/O bandwidth, and memory capacity, whereas the latter constrains their visibility of software resources such as mount points, the hostname, and user/group IDs. As a program inside a container instance runs directly on the host kernel, the overhead is marginal, which is an advantage over VMs. On the other hand, note that a container on a Linux system accepts only Linux programs, unlike a VM.
Container platforms, such as Docker21 and Singularity (and its descendant Apptainer22), wrap cgroups and namespaces in an easy-to-use form and package all necessary software into a container image by resolving their mutual dependencies. Both enable users to build an original container image from a seed image on Docker Hub,23 which covers the major Linux distributions, and make package management tools, such as the apt, dnf, pacman, and zypper commands, available inside the container instance. Hence, VMs and containers are similar in terms of practical usage. Because a Docker instance occupies its image with a writable layer, essentially one instance is created from one Docker image; Docker is frequently used to develop and deploy server applications. Singularity instead mounts a container image read-only, which means that multiple instances can be spawned from a single container image. Singularity is often used in HPC systems, and version 3.10 is available on TOKI-RURI.
3.3.
Obstacles to the HPC Version of PPL and the Key to the Solution
In the early design phase of PPL, we fixed all software versions, including the operating system (RedHat Enterprise Linux 7 with glibc 2.17), the compilers (GCC 4.8 and 8), and the scripting languages (Perl 5.16, Python 3.8, and Ruby 2.6). We also decided that PPL should run only on Reformatter and would not be guaranteed to work in any other environment. These compromises in the PPL specifications clarified and simplified the development goals. The latest version of PPL, its related software, and the auxiliary files are deployed in a subdirectory named after the eight-digit release date under the appropriate directory and are identified as "latest" by a symbolic link (note that we refer to deployment here; the source code is naturally version-controlled with Git). There are also cross-references between the directories through symbolic links. Although everything worked perfectly inside Reformatter, these tricks seemed to hinder the execution of PPL on the TOKI-RURI system because manually building all the software that PPL depends on and fitting PPL into the system's directory structure would have been time consuming.
However, based on our knowledge of containers and their technological background, we arrived at an elegant solution for porting PPL to TOKI-RURI: manipulating the mount namespace so that everything inside a container instance appears identical to Reformatter. If this is possible, all we have to do is copy the binaries compiled on Reformatter into a container image and the other materials onto TOKI-FS; Singularity provides the "--bind" option for this purpose. For example,
--bind /host/path/to/A:/container/path/to/B
makes the directory /host/path/to/A on the host appear as /container/path/to/B inside the container. This option also accepts a loopback device (or disk image) formatted as ext3.
--bind /host/loop/ext3.img:/container/path/to/B:image-src=/
mounts the root directory (/) of the host-side disk image /host/loop/ext3.img at /container/path/to/B inside the container.
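In practice, both forms can be combined in a single container invocation; the following is a hypothetical example in which the image name and the executed script are placeholders.

```bash
# Hypothetical invocation combining both forms of the "--bind" option:
# a host-directory mapping and an ext3 disk-image mapping.
singularity exec \
  --bind /host/path/to/A:/container/path/to/B \
  --bind /host/loop/ext3.img:/container/work:image-src=/ \
  ppl.sif /container/path/to/B/run_ppl.sh   # ppl.sif and run_ppl.sh are placeholders
```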
3.4.
Estimating Computing Resources on Reformatter
During the commissioning period, we needed to profile PPL on Reformatter. Because no profilers were installed there and we did not have root privileges for Reformatter, we wrote a small profiler in Python. The usage and functionality of the script are as follows.
We start two terminals and execute the top-level script of PPL in one of them. Immediately afterward, we run the ps command in the other terminal to find the process ID of the top-level script and execute the profiler, specifying that process ID as an argument. Every minute, the profiler runs the following shell one-liner, where pid_parent represents the process ID of the top-level script, to obtain a list of the process IDs of all descendant processes:
pstree -p (pid_parent) | perl -ne 'print "$1 " while /\((\d+)\)/g'
Next, the profiler immediately executes the following command to obtain each process's physical memory usage (resident set size; RSS), reserved memory size (virtual memory size; VSZ), and command name with its arguments, where pid_list is a comma-separated list of the aforementioned process IDs:
ps -p (pid_list) -o pid,rss,vsz,command
The profiler appends these values and the time required for each process to a log file as tab-delimited text (one line for each process). The profiler also records the totals of RSS and VSZ at that moment as the consumption of a virtual "TOTAL" command (the name has no other meaning than as a flag) in the log file. When the top-level script terminates, the profiler loads the log file, groups the data using the command names, computes the maxima of RSS and VSZ for each command, and stores the results in a summary file. We consider the maximum VSZ value of the "TOTAL" command in the summary file to represent the memory consumption of PPL; we define the execution time of PPL as the last time recorded in the log file minus the first time.
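A minimal sketch of such a profiler is shown below. The one-minute sampling, the pstree/ps invocations, and the virtual "TOTAL" record follow the description above, whereas the log file name, the output layout, and the omission of the summary step are illustrative.

```python
#!/usr/bin/env python3
"""Minimal sketch of the PPL profiler described above (illustrative)."""
import re
import subprocess
import sys
import time

def descendant_pids(pid_parent):
    # pstree -p PID, then extract every "(digits)" occurrence
    tree = subprocess.run(["pstree", "-p", pid_parent],
                          capture_output=True, text=True).stdout
    return re.findall(r"\((\d+)\)", tree)

def snapshot(pids):
    # ps -p PID,PID,... -o pid,rss,vsz,command
    return subprocess.run(["ps", "-p", ",".join(pids),
                           "-o", "pid,rss,vsz,command"],
                          capture_output=True, text=True).stdout

if __name__ == "__main__":
    pid_parent = sys.argv[1]                    # process ID of the top-level PPL script
    with open("ppl_profile.log", "a") as log:   # illustrative log file name
        while True:
            pids = descendant_pids(pid_parent)
            if not pids:                        # the top-level script has terminated
                break
            stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
            total_rss = total_vsz = 0
            for line in snapshot(pids).splitlines()[1:]:   # skip the ps header line
                pid, rss, vsz, command = line.split(None, 3)
                total_rss += int(rss)
                total_vsz += int(vsz)
                log.write(f"{stamp}\t{pid}\t{rss}\t{vsz}\t{command}\n")
            # Record the totals as the consumption of a virtual "TOTAL" command.
            log.write(f"{stamp}\t-\t{total_rss}\t{total_vsz}\tTOTAL\n")
            log.flush()
            time.sleep(60)                      # sample once per minute
    # Grouping the log by command name and taking the RSS/VSZ maxima
    # (the summary step described in the text) is omitted here.
```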
We selected five OBSIDs that were likely to cover a range of future observation durations and measured the execution time () and memory usage () of PPL for each OBSID using the aforementioned method. Note that Reformatter is noisy regarding these measurements because programs related to XRISM other than PPL are running. By linear fitting, we obtained
Eq. (1)
Eq. (2)
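For reference, such linear relations can be obtained with an ordinary least-squares fit. The sketch below uses numpy; the five data points are placeholders rather than the measured values behind Eqs. (1) and (2).

```python
# Minimal sketch of the linear fits behind Eqs. (1) and (2).
# All numbers below are placeholders, not the actual measurements.
import numpy as np

t_obs = np.array([10.0, 30.0, 60.0, 120.0, 240.0])   # observation durations (ks), placeholders
t_ppl = np.array([1.0, 1.8, 3.1, 5.6, 10.4])          # measured PPL execution times (h), placeholders
m_ppl = np.array([4.0, 4.5, 5.3, 6.8, 9.7])           # measured peak memory usage (GiB), placeholders

a, b = np.polyfit(t_obs, t_ppl, 1)   # Eq. (1): execution time as a linear function of duration
c, d = np.polyfit(t_obs, m_ppl, 1)   # Eq. (2): memory usage as a linear function of duration
print(f"T_PPL ~ {a:.3f} * t_obs + {b:.3f}")
print(f"M_PPL ~ {c:.3f} * t_obs + {d:.3f}")
```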
C-SODA provides a program in the PPL chain that converts telemetry packets into raw values in a FITS file. We observed anomalous behavior of this program in 2020. An analysis using Valgrind revealed that its initial memory size and increment were not optimized for the typical size of an XRISM RPT file, wasting approximately half of the program's execution time () and of memory. Although we reported this issue to C-SODA and it was later resolved, the update was not reflected in PPL because we had already fixed the software versions at that time and did not require absolute performance. Considering that the program is called multiple times in the PPL chain, the constants in Eqs. (1) and (2) can be attributed to this overhead.
3.5.
Porting Strategies
Figures 2 and 3 illustrate the schematic network setup and sequence diagram of the HPC version of PPL, respectively. Reformatter has a (sequential write) network-attached storage (NAS) that stores all input and output data. We created a directory on the NAS to accumulate all the materials necessary for PPL processing on TOKI-RURI (the "delivery directory" hereafter). Because TOKI-RURI cannot communicate with SIRIUS, we wrote a small patch (25 lines) to PPL that pauses the PPL processing when the corresponding RPT file is generated on Reformatter and resumes the remaining processing on TOKI-RURI.
Fig. 2
Network overview between Reformatter and TOKI-RURI during HPC PPL processing. An operator uses a local PC on the Internet to log in to Reformatter and one of TOKI-RURI’s login nodes through SSH. Reformatter and the TOKI-RURI login node are isolated from each other, and file transfer between the NAS for Reformatter and TOKI-FS is relayed by the relay server over a one-way SSH session from Reformatter to the login node, even when the outputs of HPC PPL on TOKI-RURI are rsynced to the NAS (the file transfer direction is opposite to that of the SSH session in this case).
Because we do not have root privileges for Reformatter, we created the container image on a PC other than Reformatter, running Ubuntu 24.04 with version 3.10.5 of the Singularity Community Edition installed to match the version on TOKI-RURI. In the definition file (a recipe for creating the container image), a CentOS 7 seed image is pulled from Docker Hub and all software updates are applied; all software packages installed on Reformatter with the yum command are installed similarly in the container. At the end of the definition file, the binaries of the patched PPL built on Reformatter are copied into the container image, and all mount points for the "--bind" option are created (Code 2). The image ( GiB) is transferred to the delivery directory on Reformatter via the scp command.
Code 2
Example of a Singularity definition file. A CentOS 7 image is pulled from Docker Hub, all software updates are applied, and development tools are installed. When a container instance starts, environment variables for Japanese are set, and echo commands are run.
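Along the lines of Code 2, a definition file might look like the following minimal sketch, where the package names, locale variables, and messages are illustrative; the image is then built, for example, with sudo singularity build ppl.sif ppl.def (file names are placeholders).

```
# Minimal sketch of a Singularity definition file along the lines of Code 2.
# Package names, locale variables, and messages are illustrative.
Bootstrap: docker
From: centos:7

%post
    yum -y update                                  # apply all software updates
    yum -y groupinstall "Development Tools"        # install development tools
    mkdir -p /container/path/to/B /container/work  # mount points for the "--bind" option

%environment
    export LANG=ja_JP.UTF-8                        # environment variables for Japanese
    export LC_ALL=ja_JP.UTF-8

%runscript
    echo "PPL container instance started"          # illustrative echo command
```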
On Reformatter, the time calibration table, orbit, and attitude files are copied to the delivery directory. A small Python script reads the list of OBSIDs, sequentially creates the respective RPT files, and copies them to the delivery directory. The directory is archived into a tarball with Zstandard compression () and copied to a TOKI-RURI login node using the rsync command, allowing file transfer to resume in case of unexpected network interruptions. The bandwidth of rsync was .
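The archiving and transfer steps might look like the following sketch, in which the paths and the host name are placeholders; only the use of Zstandard compression and of rsync is taken from the text.

```bash
# Illustrative archiving and transfer commands (paths and host name are
# placeholders). tar is combined with Zstandard, and rsync can resume the
# transfer after an unexpected interruption.
tar -I zstd -cf delivery.tar.zst /nas/delivery/           # Zstandard-compressed tarball
rsync -av --partial --progress delivery.tar.zst \
    user@toki-ruri-login:/lustre/work/                    # resumable copy to a login node
```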
When the file transfer is complete, the tarball is manually extracted into the working directory on TOKI-FS (on HDDs, not SSDs). As described in Sec. 3.1, we require a job script that defines the computing resources necessary to run the containerized PPL as a batch job on TOKI-RURI. The key point here is not that the estimate be accurate but that the actual resource consumption never exceed it. By considering the difference in CPU clock frequencies between Reformatter and TOKI-RURI, the overhead of the Lustre FS (Sec. 4.3), and margins, we estimate the computing time and memory requirement on TOKI-RURI as follows:
Eq. (3)
Eq. (4)
A Python script reads the list of OBSIDs with their observation start and end times, computes the respective estimates [Eqs. (3) and (4)], builds the argument for the "--bind" option, which maps paths on the native FS to paths inside the container on TOKI-RURI, and writes them into the corresponding job scripts. The Python script then automatically submits these job scripts. When all jobs are completed, the resultant FFFs and log files are manually archived into a tarball with Zstandard compression. The tarball is sent back to Reformatter with the rsync command and extracted into a specified directory on Reformatter.
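A minimal sketch of such a metascript is given below. The functions estimate_time() and estimate_memory() stand in for Eqs. (3) and (4) with made-up coefficients, and the OBSID list format, the paths, the "#JX" directive names, and the submission command are placeholders.

```python
#!/usr/bin/env python3
"""Minimal sketch of the metascript described above (illustrative)."""
import subprocess

JOB_TEMPLATE = """#!/bin/bash
#JX -L elapse={hours}:00:00
#JX -L memory={mem_gib}GiB
singularity exec \\
  --bind /lustre/work/{obsid}:/data \\
  --bind /lustre/images/work_{obsid}.img:/work:image-src=/ \\
  ppl.sif /opt/ppl/run_ppl.sh {obsid}
"""

def estimate_time(duration_s):
    """Placeholder for Eq. (3): wall-clock hours as a function of duration."""
    return max(1, int(duration_s / 3600 * 0.1) + 2)    # illustrative coefficients

def estimate_memory(duration_s):
    """Placeholder for Eq. (4): memory requirement in GiB."""
    return max(8, int(duration_s / 3600 * 0.05) + 8)   # illustrative coefficients

with open("obsid_list.txt") as f:                      # "OBSID START END" per line (placeholder format)
    for line in f:
        obsid, start, end = line.split()
        duration = float(end) - float(start)
        script = JOB_TEMPLATE.format(obsid=obsid,
                                     hours=estimate_time(duration),
                                     mem_gib=estimate_memory(duration))
        job_file = f"job_{obsid}.sh"
        with open(job_file, "w") as out:
            out.write(script)
        subprocess.run(["jxsub", job_file])            # placeholder submission command
```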
4.
Results and Discussion
4.1.
Reprocessing with HPC PPL Version 1
In March 2024, we reprocessed the 80 OBSIDs observed during the commissioning period using the initial version of the HPC PPL. The results are summarized in the first row of Table 2. It was estimated to take 218.3 h to complete the reprocessing tasks on Reformatter with no parallelization; the time required to create the RPT files was estimated to be of the total. The HPC PPL completed them in 26.3 h (the latest log file update time among all 80 jobs minus the earliest creation time), although it failed to process 11 OBSIDs (see Sec. 4.2 for the cause), which were later reprocessed on Reformatter. We define the speedup as
Eq. (5)

$$\mathrm{speedup} \equiv \frac{T_{\mathrm{Reformatter}}}{T_{\mathrm{TOKI\text{-}RURI}}} = \frac{218.3\ \mathrm{h}}{26.3\ \mathrm{h}} \approx 8.3.$$
Table 2
Number of OBSIDs for reprocessing and the statistics.
Project period | HPC PPL versionb | OBSIDsc | Reformatter (h)d | TOKI-RURI (h)e | Failed OBSIDsf |
---|---|---|---|---|---|
Commissioning | 1 | 80 | 218.3 | 26.3 | 11 |
Commissioning + PV | 2 | 161 | 515.5 | 15.2a | 0 |
aDisk images formatted to ext3 were used for working areas.
bThe version number of HPC PPL.
cThe number of OBSIDs for reprocessing.
dEstimated total computing time in units of hours on Reformatter.
eActual total computing time in units of hours on TOKI-RURI.
fThe number of OBSIDs that failed to be reprocessed.
4.2.
Cause of the Failures
Shortly after completing the reprocessing tasks, we investigated the causes of the failures in detail. We successfully reproduced the errors on the PC used for container image creation. To convert the CCSDS packets in an RPT file into raw values, we use a set of programs provided by C-SODA. For some of them, no source code was accessible to SOC, and some of those without source code were 32-bit binaries. One of these 32-bit programs invokes the others and communicates with them through standard input/output. Because these 32-bit programs cannot handle the 64-bit inode numbers used by XFS and the Lustre FS, they fail to read their configuration files, to open input and output files, and to spawn the other programs on these FSes when the inode numbers exceed the 32-bit range. One of the advantages of containers is that we can perform such detailed analyses on our own PCs, on which various debugging tools are installed.
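The triggering condition can be checked directly on the file system in question; the following one-liner (with a placeholder path) prints the inode number of a file, and any value above 2^32 − 1 breaks 32-bit binaries that were built without the 64-bit file interfaces.

```bash
# Illustrative check (the path is a placeholder): print the file name and
# inode number. On XFS or Lustre, inode numbers above 4294967295 cause
# stat() in 32-bit binaries built without 64-bit file interfaces to fail
# with EOVERFLOW.
stat -c '%n %i' /lustre/work/000123456/input.rpt
```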
4.3.
Improvements Toward Version 2
Our solution to this problem is straightforward: we use disk images formatted to ext3, in which the inode numbers stay within the 32-bit range, as working areas. We attach two working disk images to each container instance with the "--bind" option, one of 64 GiB for the PPL software and another of 512 GiB for the RPT and FFFs, because the option does not accept mapping a single disk image on the host onto two different directories inside the container instance. Although copying the PPL software from the container image onto an ext3 disk image for each container instance is somewhat wasteful, it reduces human errors during container image creation. Because PPL is required to process continuous observations of up to 10 days and we have never encountered such a case, we could not determine whether 512 GiB is excessive.
Using a disk image as the working area is also expected to improve the I/O performance on the Lustre FS, which consists of metadata servers (MDSes) and object storage servers (OSSes). When an append operation occurs in an existing file, the MDSes search for adequate blocks among the storage devices mounted by the OSSes and update the metadata shared among the MDSes; when small appends occur extensively in a single file, which is the case for log and FITS file manipulations, the performance of the Lustre FS is significantly degraded.24 However, when partial changes occur within a file for which space has already been allocated, the computation for block allocation can be omitted, leading to minimal performance degradation.
Although the dd command can create a disk image with general user permissions, formatting the image to ext3 with the mkfs.ext3 command requires root permission. Hence, we format the images on the local PC (see the sketch below) and compress them with the xz command with the "-9" option; this reduces the 64- and 512-GiB images to 10 and 77 MiB, respectively. The compressed working disks and container images are transferred to the delivery directory on the NAS attached to Reformatter. When a container instance is invoked on TOKI-RURI, the PPL software inside the container image and the RPT file on the host are copied to their respective ext3 working disks. After the PPL process is completed and immediately before the container instance is terminated, the FFFs as products and the log files are synchronized back to the host FS. Considering the time required for synchronization, Eq. (3) is modified as
Eq. (6)
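The disk-image preparation on the local PC mentioned above might look like the following sketch (file names are placeholders; the 512-GiB image is prepared analogously).

```bash
# Illustrative preparation of the 64-GiB working disk image on a local PC.
# File names are placeholders.
dd if=/dev/zero of=work64.img bs=1M count=65536   # create the image as a general user
sudo mkfs.ext3 -F work64.img                      # formatting to ext3 requires root permission
xz -9 work64.img                                  # compress to a few tens of MiB for transfer
```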
4.4.
Reprocessing with HPC PPL Version 2
In September 2024, soon after the PV period, we reprocessed the 161 OBSIDs observed during the commissioning and PV periods using the second version of the HPC PPL. Because running 161 jobs at once could hit the disk quota of the working directory on TOKI-RURI, we submitted them in two batches of 100 and 61; the latter batch was submitted when 90% of the former was completed. The results are summarized in the second row of Table 2. Although it was estimated to take 515.5 h to complete the reprocessing tasks on Reformatter with no parallelization, the HPC PPL completed them within 15.2 h; it is noteworthy that the operator did not constantly monitor the execution status. The speedup is
Eq. (7)

$$\mathrm{speedup} = \frac{515.5\ \mathrm{h}}{15.2\ \mathrm{h}} \approx 33.9.$$
No errors were observed. From this, the following conclusions were drawn:
• We successfully ported the PPL to the HPC system using Singularity and its "--bind" option.
• A speedup at least was accomplished; 3-week jobs were reduced to about half a day.
• Using a working disk image formatted to ext3 maximizes the Lustre FS’s performance even for software designed for a standard Linux system, not intended for an HPC system; Eq. (3) can finally be reduced to .
4.5.
Bottleneck: Bandwidth of the File Transfers
Finally, we consider the file transfers during the reprocessing in September 2024. The size of the tarball that included all RPT files created on Reformatter, as well as the Singularity and working disk images, was 1.6 TiB. The size of the tarball containing all the products generated on TOKI-RURI was 1.2 TiB. Assuming a bandwidth of 50 MiB/s, the file transfers take 9.3 and 7.0 h, respectively. Even when these are added to the computing time on TOKI-RURI, we can still conclude that the 3-week jobs were reduced to 2 days. A file transfer with parallel TCP streams using the bbcp25 command would shorten this time; however, such connections are currently blocked completely by the firewall on the JSS3 side. Reducing the file transfer time will require coordination between departments within JAXA, which we expect to be realized in the future.
5.
Summary
We briefly described the regular data processing in XRISM, which consists of PPL and PL. Because both sets of software are still being improved and scientists cannot apply the latest versions of PPL and CALDB to their data themselves owing to the nonpublicity of the FFFs, SOC has to reprocess all the observation data at each major transition during the mission period. We developed porting methods that enable PPL to run on the TOKI-RURI HPC system using Singularity, a container platform, to boost these reprocessing tasks. With these methods, PPL, which officially supports only RedHat Enterprise Linux 7, can operate on Rocky Linux 8 (TOKI-RURI) and Ubuntu 24.04 (the PC used to create the container image). Using disk images formatted to ext3 as working areas, we obtained a speedup even on the Lustre FS. File transfer over the Internet between Reformatter and TOKI-RURI is currently a bottleneck; however, parallel TCP connections can be used as a solution.
The total amount of observational data is proportional to the duration of the mission. Although we anticipate that XRISM will continue its observations for over three years, no mission schedule has been finalized beyond September 2026. Therefore, there should be OBSIDs, and the reprocessing time on Reformatter is estimated to exceed two months; our methods will reduce it to one week. Furthermore, the proposed methods are highly versatile; building on them, it may become commonplace for observational data obtained with scientific satellites to be processed on HPC systems.