
Performance monitoring with PCP

Performance Co-Pilot (PCP) is an open source framework and toolkit for monitoring, analyzing, and responding to details of live and historical system performance. Its pluggable and distributed design can be extended into very complex and robust monitoring architectures. However, in the embedded Automotive use case, monitoring is often focused on an isolated time period of data from nodes where you might not want to run ongoing performance metric captures, and the overhead of a complete PCP deployment may be unnecessary.

This guide provides a high-level summary for getting started with local data collection using PCP, including installation and setup dependencies, but then focuses on a quick-start method using a containerized PCP instance, which may be more suitable to the relevant use cases.

Installing and running PCP

For CentOS, Fedora, and RHEL-based systems, the primary pcp package installs all critical dependencies for monitoring. Two additional packages, pcp-system-tools and pcp-export-pcp2json, provide tools for post-processing and reading the PCP data.

dnf -y install pcp pcp-system-tools pcp-export-pcp2json

After you install PCP, you must start the pmcd daemon. Typically you start the daemon with systemd, but in some cases you might want to start it manually.

Start pmcd manually
/usr/libexec/pcp/lib/pmcd start
Start and enable pmcd with systemd
systemctl enable --now pmcd
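
To confirm that pmcd is running and serving metrics, you can query it with the standard PCP client tools, for example:

pcp
pminfo -f kernel.all.load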

Running pmlogger

The pmlogger creates archive files of performance metric values that can be read and “played back” after the data collection. This is the key tool that you use to monitor a system. Data is collected into the archive files for as long as pmlogger is running.

Note

The pmlogger uses a configuration file that controls the content of the archive. In most cases, the default configuration will likely suffice, and modifying the configuration file is an advanced topic that is outside the scope of this document.

The pmlogger can also be started either manually or via systemd. Typically you want fine-grained control over when pmlogger starts and stops so that you can isolate the time frame of your data collection and define the location of the output archive file. Starting pmlogger manually is the best way to achieve this control.

Start the pmlogger manually
/usr/bin/pmlogger pmlogger-out
The pmlogger in this case runs interactively and can be stopped with a keyboard interrupt signal (Ctrl-c).
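
If you already know how long you want to collect data, you can also pass pmlogger a sampling interval and a run duration on the command line instead of stopping it interactively. A minimal sketch (see the pmlogger(1) man page for the exact option semantics):

/usr/bin/pmlogger -t 1sec -T 60sec pmlogger-out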

Start the pmlogger with systemd
systemctl start pmlogger
Stop the pmlogger with systemd
systemctl stop pmlogger

The pmlogger generates its primary output files as index-numbered volumes, for instance pmlogger-out.0 and pmlogger-out.1 based on the manual example above. When leveraging any of the tools to read the archive, simply refer to the archive by its root file name, pmlogger-out in this example.

Reading the pmlogger archive

A number of tools are available to post-process and display the data of the pmlogger archive files. Most of these tools leverage a pmrep configuration file (or directory of configuration files). These configuration files are designed to manipulate and format the archive data into particular structures. The RPM installation of PCP includes a very robust set of configuration files under /etc/pcp/pmrep/, which provide output formats matching other well-known tools like mpstat, sar, and collectl.

Note

Customizing the pmrep configuration file is an advanced topic that is outside the scope of this document.
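
The pmrep configuration files use an INI-style layout in which each section defines a named view, so a rough way to list the view names available for the :name syntax is to print the section headers (a minimal sketch; the [options] sections contain pmrep settings rather than views):

grep -h '^\[' /etc/pcp/pmrep/*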

As an example, you can get sar-formatted output from the archive file with something like this:

pmrep -a pmlogger-out :sar
                CPU     %user     %nice   %system   %iowait    %steal     %idle
13:10:53          =      3.91      0.00      0.30      0.40      0.00     95.03
13:10:54          =      3.92      0.00      0.29      0.41      0.00     95.03
13:10:55          =      3.91      0.00      0.30      0.40      0.00     95.03
13:10:56          =      3.91      0.00      0.30      0.40      0.00     95.03

Or you can request individual metrics of your choosing without any format post-processing with something like this:

pmrep -a pmlogger-out kernel.cpu.util.user kernel.cpu.util.sys kernel.cpu.util.wait kernel.cpu.util.idle
  k.c.u.user  k.c.u.sys  k.c.u.wait  k.c.u.idle
       3.908      0.300       0.400      95.033
       3.917      0.292       0.408      95.033
       3.908      0.300       0.400      95.033
       3.908      0.300       0.400      95.033

Other archive processing tools can convert the data into useful machine-readable formats such as CSV or JSON while leveraging the same pmrep configuration files:

pcp2csv -l, -a pmlogger-out :sar
Time,"kernel.cpu.util.user","kernel.cpu.util.nice","kernel.cpu.util.sys","kernel.cpu.util.wait","kernel.cpu.util.steal","kernel.cpu.util.idle"
2024-10-07 13:10:53,3.91,0.00,0.30,0.40,0.00,95.03
2024-10-07 13:10:54,3.92,0.00,0.29,0.41,0.00,95.03
2024-10-07 13:10:55,3.91,0.00,0.30,0.40,0.00,95.03
2024-10-07 13:10:56,3.91,0.00,0.30,0.40,0.00,95.03
pcp2json -a pmlogger-out kernel.cpu.util.user kernel.cpu.util.sys kernel.cpu.util.wait kernel.cpu.util.idle
{
"@pcp": {
    "@hosts": [
    {
        "@host": "a0f9c1690267",
        "@metrics": [
        {
            "@interval": "60",
            "@timestamp": "2024-10-07 13:11:51",
            "kernel": {
            "cpu": {
                "util": {
                "idle": {
                    "value": "95.033"
                },
                "sys": {
                    "value": "0.299"
                },
                "user": {
                    "value": "3.911"
                },
                "wait": {
                    "value": "0.403"
                }
                }
            }
            }
        }
        ],
        "@source": "pmlogger-out",
        "@timezone": "UTC+0"
    }
    ]
}
}

Quick start: running PCP containerized

Because the installation and configuration above can be cumbersome, particularly in environments that are regularly re-provisioned or are otherwise ephemeral, you can instead use a containerized PCP deployment that lets you concentrate immediately on collecting and processing the data without having to worry about setting up PCP.

This containerized method uses an Arcaflow plugin and focuses on being a machine-executable and machine-readable process, expecting YAML or JSON input and providing YAML output. The full documentation for the plugin, including its input and output API spec, is available in the plugin's GitHub repository. The container build of the plugin is maintained on quay.io for multiple architectures.

Create the input file

The plugin input file defines how you want to collect the data from PCP and how to display it. Typically, it is a good idea to “flatten” the output, which makes the YAML format more compatible with archiving and indexing systems, and when running interactively you might also want to generate the CSV output. Additionally, define the pmrep metrics that you want to display and the collection interval for the metrics.

Example input.yaml file
flatten: true
generate_csv: true
pmlogger_interval: 1.0
pmlogger_metrics: |
  kernel.uname, hinv.ncpu, mem.physmem, disk.dev.scheduler, kernel.percpu.cpu.vuser,
  kernel.percpu.cpu.nice, kernel.percpu.cpu.sys, kernel.percpu.cpu.wait,
  kernel.percpu.cpu.steal, kernel.percpu.cpu.idle, mem.util.dirty, swap.in,
  swap.pagesin, swap.out, swap.pagesout,mem.freemem, mem.util.available,
  mem.util.used, mem.util.bufmem, mem.util.cached, mem.util.active, mem.util.inactive,
  kernel.cpu.util.user, kernel.cpu.util.nice, kernel.cpu.util.sys,
  kernel.cpu.util.wait, kernel.cpu.util.steal, kernel.cpu.util.idle, disk.all.total,
  disk.all.read, disk.all.write, disk.all.blkread, disk.all.blkwrite,
  network.interface.in.packets, network.interface.out.packets,
  network.interface.in.bytes, network.interface.out.bytes

Tip

You can also provide a timeout value in seconds if you know explicitly how long you want pmlogger to collect data. However, you generally want to run this interactively and stop the collection with a keyboard interrupt signal (Ctrl-c).

Run the PCP plugin

The plugin itself is written in Python, and it expects the input file within the context of the container. You can accomplish this by redirecting the input from the host system into the container. This requires running the container in interactive mode (-i) and passing - as the file input parameter (-f) to indicate the redirected input. You must also tell the plugin which step to run (-s), which in this case is the run-pcp step.

With the example input file above, the data collection will run indefinitely until the interactive container receives a keyboard interrupt signal (Ctrl-c). The requested CSV format will be provided in the debug output of the plugin after the formatted YAML output.

Note

The example output below is truncated for brevity.

podman run -i --rm quay.io/arcalot/arcaflow-plugin-pcp:latest -s run-pcp -f - < input.yaml
^Coutput_id: success
output_data:
  pcp_output:
  - Time: '2024-10-07T14:27:14.722408+00:00'
    kernel.cpu.util.user: '0.616667'
    kernel.cpu.util.nice: '0.000000'
    kernel.cpu.util.sys: '0.283333'
    kernel.cpu.util.wait: '0.808333'
    kernel.cpu.util.steal: '0.000000'
    kernel.cpu.util.idle: '98.108333'
    ...
  - Time: '2024-10-07T14:27:15.722408+00:00'
    kernel.cpu.util.user: '4.966667'
    kernel.cpu.util.nice: '0.000000'
    kernel.cpu.util.sys: '0.816667'
    kernel.cpu.util.wait: '0.025000'
    kernel.cpu.util.steal: '0.000000'
    kernel.cpu.util.idle: '92.866667'
    ...
debug_logs: 'Generating default pmlogger configuration file

  Using default /etc/pcp/pmrep configuration directory

  Gathering data... Use Ctrl-C to stop.

  Received keyboard interrupt; Stopping data collection.

  Reporting metrics for: kernel.uname, hinv.ncpu, mem.physmem, disk.dev.scheduler,
  kernel.percpu.cpu.vuser,
  kernel.percpu.cpu.nice, kernel.percpu.cpu.sys, kernel.percpu.cpu.wait,
  kernel.percpu.cpu.steal, kernel.percpu.cpu.idle, mem.util.dirty, swap.in,
  swap.pagesin, swap.out, swap.pagesout,mem.freemem, mem.util.available,
  mem.util.used, mem.util.bufmem, mem.util.cached, mem.util.active, mem.util.inactive,
  kernel.cpu.util.user, kernel.cpu.util.nice, kernel.cpu.util.sys,
  kernel.cpu.util.wait, kernel.cpu.util.steal, kernel.cpu.util.idle, disk.all.total,
  disk.all.read, disk.all.write, disk.all.blkread, disk.all.blkwrite,
  network.interface.in.packets, network.interface.out.packets,
  network.interface.in.bytes, network.interface.out.bytes

  Time,"kernel.cpu.util.user","kernel.cpu.util.nice","kernel.cpu.util.sys","kernel.cpu.util.wait","kernel.cpu.util.steal","kernel.cpu.util.idle",...
  2024-10-07T14:27:14.722408+00:00,0.616667,0.000000,0.283333,0.808333,0.000000,98.108333,...
  2024-10-07T14:27:15.722408+00:00,4.966667,0.000000,0.816667,0.025000,0.000000,92.866667,...
  '

Tip

Often you might want to “wrap” the PCP data collection around a workload. In this case, you can start the plugin container as above, collect the PID of the PCP container, run the workload that you are interested in, and then send the keyboard interrupt signal from outside with:

kill -SIGINT $PID

This ends up being very straightforward to incorporate into a script.
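
A minimal sketch of such a wrapper script follows. The file names and the workload command are placeholders, and the sketch assumes podman's default signal proxying so that the interrupt reaches the plugin:

#!/bin/bash
# Start the containerized PCP collection in the background and capture its PID
podman run -i --rm quay.io/arcalot/arcaflow-plugin-pcp:latest \
    -s run-pcp -f - < input.yaml > pcp-output.yaml &
PCP_PID=$!

# Give pmlogger a moment to start collecting before the workload begins
sleep 10

# Run the workload of interest (placeholder command)
./my-workload.sh

# Allow a trailing collection window, then stop PCP and wait for its output
sleep 10
kill -SIGINT $PCP_PID
wait $PCP_PID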

Container privileges

Most of the relevant system metrics are recorded from the /proc and /sys kernel file systems, which are by default readable by an unprivileged container. This means that the PCP plugin container typically does not need to run with any escalated privileges in order to report host system metrics accurately. That said, in some cases you may need to increase privileges and/or adjust namespaces for the PCP plugin container in order to access the metrics you need; your mileage may vary depending on your environment.
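
As an illustrative sketch only (not required in the typical case described above), podman provides flags to relax the container's isolation when particular metrics call for it:

# Share the host PID and network namespaces with the plugin container
podman run -i --rm --pid=host --network=host \
    quay.io/arcalot/arcaflow-plugin-pcp:latest -s run-pcp -f - < input.yaml

# Or, as a last resort, run the container fully privileged
podman run -i --rm --privileged \
    quay.io/arcalot/arcaflow-plugin-pcp:latest -s run-pcp -f - < input.yaml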

Example: remote workload and PCP metrics collection

In this example, you can find a complete performance testing and data collection solution for one or more remote nodes. The performance workloads used here are designed to stress CPU, memory, and storage subsystems by leveraging the stress-ng and fio tools. PCP is run concurrently with the workloads to collect the system metric time series data. This is all orchestrated through an Arcaflow workflow for maintainability and portability.

Prerequisites

Remote execution leverages the podman remote functionality, which allows a local client to interact with a Podman backend node through a RESTful API tunneled over an SSH connection.

  1. On the target remote nodes, start the podman socket:

    systemctl start podman.socket
    
  2. Establish SSH key-based authentication between the local host and the target remote nodes:

    ssh-copy-id -f -i <path to your ssh public key> <username>@<remote host>
    
  3. Add the remote podman connections to the local host’s containers.conf file (usually /etc/containers/containers.conf) for persistence (the example remote_node_1 name here is arbitrary and user-defined). Repeat this for every target remote node:

    cat << EOF >> containers.conf
    [engine.service_destinations.remote_node_1]
    uri="ssh://<username>@<remote host>/run/podman/podman.sock"
    identity="<path to your ssh private key>"
    EOF
    
  4. Check that the remote podman connections are configured correctly:

    podman system connection list
    
  5. On the local host, download the Arcaflow engine binary for your platform from the Arcaflow engine releases page.

Warning

The Arcaflow plugins are not compatible with a read-only container configuration. In the containers.conf files on the target remote nodes, ensure that read_only=true is not set.

Considerations for an ASIL-B + QM architecture

There are numerous configurations in which ASIL-B and QM environments might be deployed together on a platform. In virtually all of those configurations, a strict level of separation is expected between the two environments, such that each environment operates in its own operating system resource context with its own network namespace. For the sake of remote performance testing, this means that you must treat each environment as an independent system. You must follow the Prerequisites for each environment, defining podman remote connections to each of them individually. As an example, your local containers.conf might include something like this:

containers.conf
[engine.service_destinations.remote_asil_b]
uri="ssh://root@asil_b.example.com/run/podman/podman.sock"
identity="/root/.ssh/id_rsa"

[engine.service_destinations.remote_qm]
uri="ssh://root@qm.example.com/run/podman/podman.sock"
identity="/root/.ssh/id_rsa"

Then as you will see later in the Arcaflow config file section, you must choose which of the remote connections you want to target for performance testing by its destination name defined in the containers.conf file (remote_asil_b or remote_qm in this example).

Nested container special case

One recommended architecture for deploying the ASIL-B and QM environments together is to use systemd slices and nested containers, such that the QM is deployed as an isolated container within the parent operating system occupied by the ASIL-B environment. In this case, it is possible that observing system metrics in the ASIL-B environment will show the resource utilization of workloads running in the QM environment, because the QM environment is using a constrained subset of the parent resources of the ASIL-B environment. The opposite is likely not true in this case: the QM environment would not be able to see any resource utilization from the ASIL-B environment.

Example ASIL-B and QM Nested Container Architecture

Arcaflow workflow

The Arcaflow workflow files define the steps taken to execute the performance tests and collect the data, as well as the input schema and the output structure. All steps are run as containers, and all data is returned in YAML format. This is a highly configurable structure, and the example provided here can be extended to your needs.

Tip

A much more in-depth workflow with more workloads and detailed metadata collection is maintained in GitLab by the Red Hat Performance & Scale team: https://gitlab.com/redhat/edge/tests/perfscale/arcaflow-workflow-auto-perf

The primary workflow.yaml file defines the outer structure of the workflow. Within it, two inner loops are defined: one for the stress-ng workloads and one for the fio workloads.

Note

The fio_loop step has a wait_for dependency that ensures this loop only starts after the stressng_loop has completed. Both loops also have a parallelism: 1 parameter that ensures each loop iteration completes before the next one starts. These options together ensure that all workloads are run serially.

workflow.yaml
version: v0.2.0

steps:
  # Run the loop of stress-ng sub-workflows
  stressng_loop:
    kind: foreach
    items: !expr 'bindConstants($.input.stressng_tests, $.input.pcp_params)'
    workflow: workflow-stressng.yaml
    parallelism: 1

  # Run the loop of fio sub-workflows
  fio_loop:
    kind: foreach
    items: !expr 'bindConstants($.input.fio_tests, $.input.pcp_params)'
    workflow: workflow-fio.yaml
    parallelism: 1
    # Don't start this step until after the stressng_loop step completes
    wait_for: !expr $.steps.stressng_loop.outputs

input:
  root: WorkflowInput
  objects:
    WorkflowInput:
      id: WorkflowInput
      properties:
        pcp_params:
          type:
            id: PcpInputParams
            type_id: ref
            namespace: $.steps.stressng_loop.execute.inputs.items.constant
        stressng_tests:
          type:
            type_id: list
            items:
              id: StressNGParams
              type_id: ref
              namespace: $.steps.stressng_loop.execute.inputs.items.item
        fio_tests:
          type:
            type_id: list
            items:
              id: FioInput
              type_id: ref
              namespace: $.steps.fio_loop.execute.inputs.items.item

outputs:
  success:
    stressng_workload: !expr $.steps.stressng_loop.outputs.success.data
    fio_workload: !expr $.steps.fio_loop.outputs.success.data

The inner workflow for the stress-ng workload loop is defined in the workflow-stressng.yaml file. This workflow wraps the PCP data collection around the stress-ng load with some wait times added to ensure that the time series data collection is complete.

workflow-stressng.yaml
version: v0.2.0

steps:
  # Start the PCP data collection
  pcp:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-pcp:0.10.0
    step: run-pcp
    input: !expr $.input.constant
    closure_wait_timeout: 60000
    # Stop the PCP data collection after the post_wait step completes
    stop_if: !expr $.steps.post_wait.outputs

  # Wait the specified milliseconds before starting the stress-ng workload
  pre_wait:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-utilities:0.6.0
    step: wait
    input:
      wait_time_ms: 10000
    # Don't start this step until after the pcp step has started
    wait_for: !expr $.steps.pcp.starting.started

  # Start the stress-ng workload
  stressng:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-stressng:0.8.0
    step: workload
    input: !expr $.input.item
    # Don't start this step until after the pre_wait has completed
    wait_for: !expr $.steps.pre_wait.outputs

  # Wait the specified milliseconds after the stress-ng workload succeeds
  post_wait:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-utilities:0.6.0
    step: wait
    input:
      wait_time_ms: 10000
    # Don't start this step until after the stressng step completes
    wait_for: !expr $.steps.stressng.outputs

input:
  root: StressNGParams__PcpInputParams
  objects:
    StressNGParams__PcpInputParams:
      id: StressNGParams__PcpInputParams
      properties:
        constant:
          display:
            description: The parameters for the PCP workload
            name: PCP parameters
          type:
            type_id: ref
            id: PcpInputParams
            namespace: $.steps.pcp.starting.inputs.input
        item:
          display:
            description: The parameters for the stressng workload
            name: stressng parameters
          type:
            type_id: ref
            id: StressNGParams
            namespace: $.steps.stressng.starting.inputs.input
          required: true

outputs:
  success:
    test_results: !expr $.steps.stressng.outputs.success
    pcp_time_series: !expr $.steps.pcp.outputs.success.pcp_output

The inner workflow for the fio workload loop defined in workflow-fio.yaml is nearly identical to the stress-ng workload loop.

workflow-fio.yaml
version: v0.2.0

steps:
  # Start the PCP data collection
  pcp:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-pcp:0.10.0
    step: run-pcp
    input: !expr $.input.constant
    closure_wait_timeout: 60000
    # Stop the PCP data collection after the post_wait step completes
    stop_if: !expr $.steps.post_wait.outputs

  # Wait the specified milliseconds before starting the fio workload
  pre_wait:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-utilities:0.6.0
    step: wait
    input:
      wait_time_ms: 10000
    # Don't start this step until after the pcp step has started
    wait_for: !expr $.steps.pcp.starting.started

  # Start the fio workload
  fio:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-fio:0.4.0
    input: !expr $.input.item
    # Don't start this step until after the pre_wait has completed
    wait_for: !expr $.steps.pre_wait.outputs

  # Wait the specified milliseconds after the fio workload succeeds
  post_wait:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-utilities:0.6.0
    step: wait
    input:
      wait_time_ms: 10000
    # Don't start this step until after the fio step completes
    wait_for: !expr $.steps.fio.outputs

input:
  root: FioInput__PcpInputParams
  objects:
    FioInput__PcpInputParams:
      id: FioInput__PcpInputParams
      properties:
        constant:
          display:
            description: The parameters for the PCP workload
            name: PCP parameters
          type:
            type_id: ref
            id: PcpInputParams
            namespace: $.steps.pcp.starting.inputs.input
        item:
          display:
            description: The parameters for the fio workload
            name: fio parameters
          type:
            type_id: ref
            id: FioInput
            namespace: $.steps.fio.starting.inputs.input
          required: true

outputs:
  success:
    test_results: !expr $.steps.fio.outputs.success
    pcp_time_series: !expr $.steps.pcp.outputs.success.pcp_output

Arcaflow input

In order to run the Arcaflow workflow, you must provide input that matches the input: section schema definition in the workflow.yaml file. This example input.yaml file sets a variety of PCP parameters to collect, and defines several stress-ng and fio tests to run. These input parameters are tunable to your needs.

Tip

The stressng_tests and fio_tests inputs are list objects, and each item in the list is a completely separate run of the respective sub-workflow.

input.yaml
pcp_params:
  flatten: true
  pmlogger_interval: 1.0
  pmlogger_metrics: |
    swap.in, swap.pagesin, swap.out, swap.pagesout,
    mem.util.dirty, mem.util.available, mem.util.used, mem.util.bufmem,
    mem.util.cached, mem.util.active, mem.util.inactive,
    kernel.cpu.util.user, kernel.cpu.util.nice, kernel.cpu.util.sys,
    kernel.cpu.util.wait, kernel.cpu.util.steal, kernel.cpu.util.idle,
    disk.all.read, disk.all.write, disk.all.blkread, disk.all.blkwrite,
    network.interface.in.packets, network.interface.out.packets,
    network.interface.in.bytes, network.interface.out.bytes
stressng_tests:
  # CPU tests
  - timeout: 60
    stressors:
      - stressor: cpu
        workers: 1
  - timeout: 60
    stressors:
      - stressor: cpu
        workers: 2
  - timeout: 60
    stressors:
      - stressor: cpu
        workers: 4
  # Memory tests
  - timeout: 60
    page-in: True
    stressors:
      - stressor: vm
        workers: 1
        vm-bytes: 12.5%
      - stressor: mmap
        workers: 1
        mmap-bytes: 12.5%
  - timeout: 60
    page-in: True
    stressors:
      - stressor: vm
        workers: 2
        vm-bytes: 25%
      - stressor: mmap
        workers: 2
        mmap-bytes: 25%
  - timeout: 60
    page-in: True
    stressors:
      - stressor: vm
        workers: 4
        vm-bytes: 37.5%
      - stressor: mmap
        workers: 4
        mmap-bytes: 37.5%
fio_tests:
  - jobs:
    - name: sensor-data-logs-small-files
      params:
        ioengine: libaio
        direct: 1
        runtime: 60
        time_based: 1
        size: 1024k
        readwrite: randrw
        blocksize: 4k
        rwmixread: 70
        iodepth: 16
        numjobs: 4
  - jobs:
    - name: multimedia-data-large-files
      params:
        ioengine: libaio
        direct: 1
        runtime: 60
        time_based: 1
        size: 1024k
        readwrite: write
        blocksize: 1m
        iodepth: 64
        numjobs: 2
  - jobs:
    - name: navigation-data-sequential-read
      params:
        ioengine: libaio
        direct: 1
        runtime: 60
        time_based: 1
        size: 1024k
        readwrite: read
        blocksize: 512k
        iodepth: 32
        numjobs: 2
  - jobs:
    - name: unpredictable-application-behavior
      params:
        ioengine: libaio
        direct: 1
        runtime: 60
        time_based: 1
        size: 1024k
        readwrite: randrw
        blocksize_range: 8k-128k
        rwmixread: 50
        iodepth: 32
        numjobs: 4

Testing the workflow locally

The default deployment target for Arcaflow’s step containers is the local host’s podman runtime. You can run the workflow locally, from the directory where your YAML files are stored, as shown below.

Tip

The YAML output of the workflow is extensive, containing all of the collected PCP data along with all of the individual workload output. It is recommended to redirect or tee the output of the workflow.

arcaflow --input input.yaml | tee output.yaml

Arcaflow config file

You can use the config.yaml file to override Arcaflow’s default configuration. In this example, use the config file to set the deployment target to a podman remote connection that you defined in the Prerequisites section. Also enable workflow and plugin debug logging for more runtime feedback.

Note

The connectionName in the config file must match a [engine.service_destinations.<connection name>] that you defined in the local containers.conf file in the Prerequisites section.

config.yaml
log:
  level: debug
logged_outputs:
  error:
    level: debug
deployers:
  image:
    deployer_name: podman
    podman:
      connectionName: remote_node_1

Running the workflow remotely

You can now pass the config file defined above to the arcaflow command, which will result in the use of the defined remote podman connection as the deployment target.

arcaflow --input input.yaml --config config.yaml | tee output.yaml

Arcaflow output

The output of the Arcaflow workflow is sent to stdout as YAML. What is returned and its structure is determined by the configurations of the individual steps and the definitions in the outputs: sections of the workflow files. The example here returns two objects in the output_data output: stressng_workload and fio_workload. Each of those contains a list with one entry per workload loop iteration, and each list entry contains two sub-objects: test_results and pcp_time_series.

Each test_results sub-object has the output of the particular workload, either stress-ng or fio. Each pcp_time_series sub-object has a time-series list of the selected PCP system metrics from the input.yaml file in the time resolution also defined there (1.0 second in this example). As machine-readable structured output, the PCP time-series data can be extracted and fed into your preferred plotting or dashboarding system.

output.yaml structure
output_data:
    stressng_workload:
        - test_results:
            ...
          pcp_time_series:
            - ...
    fio_workload:
        - test_results:
            ...
          pcp_time_series:
            - ...
output_id: success
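
As a small sketch of that extraction step (assuming the Go-based yq v4 is available on the local host; the pcp-series.json file name is arbitrary), you can pull a single loop iteration's time series out of the workflow output for downstream processing:

# Extract the PCP time series of the first stress-ng loop iteration as JSON
yq -o=json '.output_data.stressng_workload[0].pcp_time_series' output.yaml > pcp-series.json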

Appendix: Arcaflow workflow diagrams

parent workflow

%% Mermaid markdown workflow
flowchart LR
%% Success path
input-->steps.fio_loop.execute
input-->steps.stressng_loop.execute
steps.fio_loop.closed-->steps.fio_loop.closed.result
steps.fio_loop.disabled-->steps.fio_loop.disabled.output
steps.fio_loop.enabling-->steps.fio_loop.closed
steps.fio_loop.enabling-->steps.fio_loop.disabled
steps.fio_loop.enabling-->steps.fio_loop.enabling.resolved
steps.fio_loop.enabling-->steps.fio_loop.execute
steps.fio_loop.execute-->steps.fio_loop.outputs
steps.fio_loop.outputs-->steps.fio_loop.outputs.success
steps.fio_loop.outputs.success-->outputs.success
steps.stressng_loop.closed-->steps.stressng_loop.closed.result
steps.stressng_loop.disabled-->steps.stressng_loop.disabled.output
steps.stressng_loop.enabling-->steps.stressng_loop.closed
steps.stressng_loop.enabling-->steps.stressng_loop.disabled
steps.stressng_loop.enabling-->steps.stressng_loop.enabling.resolved
steps.stressng_loop.enabling-->steps.stressng_loop.execute
steps.stressng_loop.execute-->steps.stressng_loop.outputs
steps.stressng_loop.outputs-->steps.fio_loop.execute
steps.stressng_loop.outputs-->steps.stressng_loop.outputs.success
steps.stressng_loop.outputs.success-->outputs.success

stress-ng sub-workflow

%% Mermaid markdown workflow
flowchart LR
%% Success path
input-->steps.pcp.starting
input-->steps.stressng.starting
steps.pcp.cancelled-->steps.pcp.closed
steps.pcp.cancelled-->steps.pcp.outputs
steps.pcp.closed-->steps.pcp.closed.result
steps.pcp.deploy-->steps.pcp.closed
steps.pcp.deploy-->steps.pcp.starting
steps.pcp.disabled-->steps.pcp.disabled.output
steps.pcp.enabling-->steps.pcp.closed
steps.pcp.enabling-->steps.pcp.disabled
steps.pcp.enabling-->steps.pcp.enabling.resolved
steps.pcp.enabling-->steps.pcp.starting
steps.pcp.outputs-->steps.pcp.outputs.success
steps.pcp.outputs.success-->outputs.success
steps.pcp.running-->steps.pcp.closed
steps.pcp.running-->steps.pcp.outputs
steps.pcp.starting-->steps.pcp.closed
steps.pcp.starting-->steps.pcp.running
steps.pcp.starting-->steps.pcp.starting.started
steps.pcp.starting.started-->steps.pre_wait.starting
steps.post_wait.cancelled-->steps.post_wait.closed
steps.post_wait.cancelled-->steps.post_wait.outputs
steps.post_wait.closed-->steps.post_wait.closed.result
steps.post_wait.deploy-->steps.post_wait.closed
steps.post_wait.deploy-->steps.post_wait.starting
steps.post_wait.disabled-->steps.post_wait.disabled.output
steps.post_wait.enabling-->steps.post_wait.closed
steps.post_wait.enabling-->steps.post_wait.disabled
steps.post_wait.enabling-->steps.post_wait.enabling.resolved
steps.post_wait.enabling-->steps.post_wait.starting
steps.post_wait.outputs-->steps.pcp.cancelled
steps.post_wait.outputs-->steps.post_wait.outputs.success
steps.post_wait.running-->steps.post_wait.closed
steps.post_wait.running-->steps.post_wait.outputs
steps.post_wait.starting-->steps.post_wait.closed
steps.post_wait.starting-->steps.post_wait.running
steps.post_wait.starting-->steps.post_wait.starting.started
steps.pre_wait.cancelled-->steps.pre_wait.closed
steps.pre_wait.cancelled-->steps.pre_wait.outputs
steps.pre_wait.closed-->steps.pre_wait.closed.result
steps.pre_wait.deploy-->steps.pre_wait.closed
steps.pre_wait.deploy-->steps.pre_wait.starting
steps.pre_wait.disabled-->steps.pre_wait.disabled.output
steps.pre_wait.enabling-->steps.pre_wait.closed
steps.pre_wait.enabling-->steps.pre_wait.disabled
steps.pre_wait.enabling-->steps.pre_wait.enabling.resolved
steps.pre_wait.enabling-->steps.pre_wait.starting
steps.pre_wait.outputs-->steps.pre_wait.outputs.success
steps.pre_wait.outputs-->steps.stressng.starting
steps.pre_wait.running-->steps.pre_wait.closed
steps.pre_wait.running-->steps.pre_wait.outputs
steps.pre_wait.starting-->steps.pre_wait.closed
steps.pre_wait.starting-->steps.pre_wait.running
steps.pre_wait.starting-->steps.pre_wait.starting.started
steps.stressng.cancelled-->steps.stressng.closed
steps.stressng.cancelled-->steps.stressng.outputs
steps.stressng.closed-->steps.stressng.closed.result
steps.stressng.deploy-->steps.stressng.closed
steps.stressng.deploy-->steps.stressng.starting
steps.stressng.disabled-->steps.stressng.disabled.output
steps.stressng.enabling-->steps.stressng.closed
steps.stressng.enabling-->steps.stressng.disabled
steps.stressng.enabling-->steps.stressng.enabling.resolved
steps.stressng.enabling-->steps.stressng.starting
steps.stressng.outputs-->steps.post_wait.starting
steps.stressng.outputs-->steps.stressng.outputs.success
steps.stressng.outputs.success-->outputs.success
steps.stressng.running-->steps.stressng.closed
steps.stressng.running-->steps.stressng.outputs
steps.stressng.starting-->steps.stressng.closed
steps.stressng.starting-->steps.stressng.running
steps.stressng.starting-->steps.stressng.starting.started

fio sub-workflow

%% Mermaid markdown workflow
flowchart LR
%% Success path
input-->steps.fio.starting
input-->steps.pcp.starting
steps.fio.cancelled-->steps.fio.closed
steps.fio.cancelled-->steps.fio.outputs
steps.fio.closed-->steps.fio.closed.result
steps.fio.deploy-->steps.fio.closed
steps.fio.deploy-->steps.fio.starting
steps.fio.disabled-->steps.fio.disabled.output
steps.fio.enabling-->steps.fio.closed
steps.fio.enabling-->steps.fio.disabled
steps.fio.enabling-->steps.fio.enabling.resolved
steps.fio.enabling-->steps.fio.starting
steps.fio.outputs-->steps.fio.outputs.success
steps.fio.outputs-->steps.post_wait.starting
steps.fio.outputs.success-->outputs.success
steps.fio.running-->steps.fio.closed
steps.fio.running-->steps.fio.outputs
steps.fio.starting-->steps.fio.closed
steps.fio.starting-->steps.fio.running
steps.fio.starting-->steps.fio.starting.started
steps.pcp.cancelled-->steps.pcp.closed
steps.pcp.cancelled-->steps.pcp.outputs
steps.pcp.closed-->steps.pcp.closed.result
steps.pcp.deploy-->steps.pcp.closed
steps.pcp.deploy-->steps.pcp.starting
steps.pcp.disabled-->steps.pcp.disabled.output
steps.pcp.enabling-->steps.pcp.closed
steps.pcp.enabling-->steps.pcp.disabled
steps.pcp.enabling-->steps.pcp.enabling.resolved
steps.pcp.enabling-->steps.pcp.starting
steps.pcp.outputs-->steps.pcp.outputs.success
steps.pcp.outputs.success-->outputs.success
steps.pcp.running-->steps.pcp.closed
steps.pcp.running-->steps.pcp.outputs
steps.pcp.starting-->steps.pcp.closed
steps.pcp.starting-->steps.pcp.running
steps.pcp.starting-->steps.pcp.starting.started
steps.pcp.starting.started-->steps.pre_wait.starting
steps.post_wait.cancelled-->steps.post_wait.closed
steps.post_wait.cancelled-->steps.post_wait.outputs
steps.post_wait.closed-->steps.post_wait.closed.result
steps.post_wait.deploy-->steps.post_wait.closed
steps.post_wait.deploy-->steps.post_wait.starting
steps.post_wait.disabled-->steps.post_wait.disabled.output
steps.post_wait.enabling-->steps.post_wait.closed
steps.post_wait.enabling-->steps.post_wait.disabled
steps.post_wait.enabling-->steps.post_wait.enabling.resolved
steps.post_wait.enabling-->steps.post_wait.starting
steps.post_wait.outputs-->steps.pcp.cancelled
steps.post_wait.outputs-->steps.post_wait.outputs.success
steps.post_wait.running-->steps.post_wait.closed
steps.post_wait.running-->steps.post_wait.outputs
steps.post_wait.starting-->steps.post_wait.closed
steps.post_wait.starting-->steps.post_wait.running
steps.post_wait.starting-->steps.post_wait.starting.started
steps.pre_wait.cancelled-->steps.pre_wait.closed
steps.pre_wait.cancelled-->steps.pre_wait.outputs
steps.pre_wait.closed-->steps.pre_wait.closed.result
steps.pre_wait.deploy-->steps.pre_wait.closed
steps.pre_wait.deploy-->steps.pre_wait.starting
steps.pre_wait.disabled-->steps.pre_wait.disabled.output
steps.pre_wait.enabling-->steps.pre_wait.closed
steps.pre_wait.enabling-->steps.pre_wait.disabled
steps.pre_wait.enabling-->steps.pre_wait.enabling.resolved
steps.pre_wait.enabling-->steps.pre_wait.starting
steps.pre_wait.outputs-->steps.fio.starting
steps.pre_wait.outputs-->steps.pre_wait.outputs.success
steps.pre_wait.running-->steps.pre_wait.closed
steps.pre_wait.running-->steps.pre_wait.outputs
steps.pre_wait.starting-->steps.pre_wait.closed
steps.pre_wait.starting-->steps.pre_wait.running
steps.pre_wait.starting-->steps.pre_wait.starting.started
