
Upgrade to 3.4.1 and 7.4.1

1. Upgrade From 7.3/7.4 to 7.4.1

Kubernetes deployment:

RDAF Infra Upgrade: From 1.0.2 to 1.0.3, 1.0.3.1 (haproxy)

RDAF Platform: From 3.3 to 3.4.1

AIOps (OIA) Application: From 7.3 to 7.4.1

RDAF Deployment rdafk8s CLI: From 1.1.10 to 1.2.1

RDAF Client rdac CLI: From 3.3 to 3.4.1

Non-Kubernetes deployment:

RDAF Infra Upgrade: From 1.0.3 to 1.0.3.1 (haproxy)

RDAF Platform: From 3.4 to 3.4.1

OIA (AIOps) Application: From 7.4 to 7.4.1

RDAF Deployment rdaf CLI: From 1.2.0 to 1.2.1

RDAF Client rdac CLI: From 3.4 to 3.4.1

1.1. Prerequisites

Before proceeding with this upgrade, please verify that the below prerequisites are met.

  • RDAF Deployment CLI version: 1.1.10

  • Infra Services tag: 1.0.2,1.0.2.1(nats, haproxy)

  • Platform Services and RDA Worker tag: 3.3

  • OIA Application Services tag: 7.3,7.3.0.1(event_consumer),7.3.2(alert-ingester)

  • CloudFabrix recommends taking VMware VM snapshots where RDA Fabric infra/platform/applications are deployed

  • Delete the alert-model dataset from the Datasets reports on the UI before starting the upgrade

  • On an HA setup, check that all MariaDB nodes are in sync using the below commands before starting the upgrade

Danger

Upgrading both the kafka and mariadb infra services requires downtime for the RDAF platform and application services.

Please proceed to the below steps only after scheduled downtime is approved.

Tip

Please run the below commands on the VM host where the RDAF deployment CLI was installed and the rdafk8s setup command was run. The mariadb configuration is read from the /opt/rdaf/rdaf.cfg file.

MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`
MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`
MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`

mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -e "show status like 'wsrep_local_state_comment';"

Please verify that the mariadb cluster is in the Synced state.

+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+

Please run the below command and verify that the mariadb cluster size is 3.

mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'";
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
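As a sketch, the sync check can be wrapped in a small helper that succeeds only when the node reports Synced. `assert_synced` is an illustrative name (not part of the RDAF tooling), and the sample line stands in for the real mysql query output:

```shell
# Succeeds only when the wsrep state reports Synced; pipe the output of
# the "show status like 'wsrep_local_state_comment'" query into it.
assert_synced() { grep -q "Synced"; }

# Sample of the mysql tabular output standing in for a live query:
printf 'wsrep_local_state_comment\tSynced\n' | assert_synced \
  && echo "cluster synced" || echo "cluster NOT synced"
```

On a real node, pipe the mysql command from the previous step into `assert_synced` instead of the printf sample.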
  • RDAF Deployment CLI version: 1.2.0

  • Infra Services tag: 1.0.3

  • Platform Services and RDA Worker tag: 3.4

  • OIA Application Services tag: 7.4

  • CloudFabrix recommends taking VMware VM snapshots where RDA Fabric infra/platform/applications are deployed

Useful Information

Danger

In this release, all of the RDAF Infrastructure services are upgraded. Therefore, it is mandatory to take a VM-level snapshot before proceeding with the upgrade process.

Warning

Make sure all of the above pre-requisites are met before proceeding with the upgrade process.

Warning

Kubernetes: Though Kubernetes based RDA Fabric deployment supports zero downtime upgrade, it is recommended to schedule a maintenance window for upgrading RDAF Platform and AIOps services to newer version.

Important

Please make sure full backup of the RDAF platform system is completed before performing the upgrade.

Kubernetes: Please run the below backup command to take the backup of application data.

rdafk8s backup --dest-dir <backup-dir>

Note: Please make sure this backup-dir is mounted across all infra and CLI VMs.

Run the below commands on the RDAF Management system and make sure the Kubernetes PODs are NOT restarting (applicable only to Kubernetes environments)

kubectl get pods -n rda-fabric -l app_category=rdaf-infra
kubectl get pods -n rda-fabric -l app_category=rdaf-platform
kubectl get pods -n rda-fabric -l app_component=rda-worker 
kubectl get pods -n rda-fabric -l app_name=oia 
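To spot pods stuck in a restart loop, the RESTARTS column of the above output can be screened with a small awk filter. This is a sketch: `check_restarts` is an illustrative helper, and the printf lines stand in for real `kubectl get pods` output:

```shell
# Print any pod whose RESTARTS column (4th field) is nonzero.
check_restarts() { awk 'NR>1 && $4+0 > 0 {print $1 " restarted " $4 " times"}'; }

# Sample standing in for: kubectl get pods -n rda-fabric | check_restarts
printf 'NAME READY STATUS RESTARTS AGE\nrda-api-0 1/1 Running 0 4h\nrda-worker-1 1/1 Running 3 4h\n' \
  | check_restarts
```

An empty result means no pod has restarted; any printed line warrants investigation before starting the upgrade.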

Danger

In this release, all of the RDAF Infrastructure services are upgraded. Therefore, it is mandatory to take a VM-level snapshot before proceeding with the upgrade process.

Warning

Make sure all of the above pre-requisites are met before proceeding with the upgrade process.

Warning

Non-Kubernetes: Upgrading RDAF Platform and AIOps application services is a disruptive operation. Schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.

Important

Please make sure full backup of the RDAF platform system is completed before performing the upgrade.

Non-Kubernetes: Please run the below backup command to take the backup of application data.

rdaf backup --dest-dir <backup-dir>
Note: Please make sure this backup-dir is mounted across all infra and CLI VMs.

  • Verify that the RDAF deployment rdaf CLI version is 1.2.0, or the rdafk8s CLI version is 1.1.10, on the VM where the CLI was installed for the docker on-prem registry and for managing Kubernetes or Non-Kubernetes deployments.
rdafk8s --version
rdaf --version
  • On-premise docker registry service version is 1.0.2
docker ps | grep docker-registry
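The running registry version can be read off the image tag in the `docker ps` output. A sketch: `registry_tag` is an illustrative helper, and the sample line (including the registry address) stands in for real output:

```shell
# Print the tag of the docker-registry image from docker ps output.
registry_tag() { awk '/docker-registry/ {n=split($2,a,":"); print a[n]}'; }

# Sample line standing in for: docker ps | registry_tag
printf 'ab12cd34  192.168.125.140:5000/docker-registry:1.0.2  "/entrypoint.sh"  Up 4 hours\n' \
  | registry_tag
```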
  • RDAF Infrastructure services version is 1.0.2 (rda-nats service version is 1.0.2.1 and rda-minio service version is RELEASE.2022-11-11T03-44-20Z)

Run the below command to get RDAF Infra services details

rdafk8s infra status
  • RDAF Platform services version is 3.3

Run the below command to get RDAF Platform services details

rdafk8s platform status
  • RDAF OIA Application services version is 7.3/7.3.0.1/7.3.2

Run the below command to get RDAF App services details

rdafk8s app status

Run the below command to get RDAF Infra services details

rdaf infra status
  • RDAF Platform services version is 3.4

Run the below command to get RDAF Platform services details

rdaf platform status
  • RDAF OIA Application services version is 7.4

Run the below command to get RDAF App services details

rdaf app status

RDAF Deployment CLI Upgrade:

Please follow the steps below.

Note

Upgrade RDAF Deployment CLI on both on-premise docker registry VM and RDAF Platform's management VM if provisioned separately.

Log in to the VM where the rdaf & rdafk8s deployment CLI was installed for the docker on-prem registry and for managing Kubernetes or Non-Kubernetes deployments.

  • Download the RDAF Deployment CLI's newer version 1.2.1 bundle.
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/rdafcli-1.2.1.tar.gz
  • Upgrade the rdaf & rdafk8s CLI to version 1.2.1
pip install --user rdafcli-1.2.1.tar.gz
  • Verify the installed rdaf & rdafk8s CLI version is upgraded to 1.2.1
rdaf --version
rdafk8s --version
  • Download the RDAF Deployment CLI's newer version 1.2.1 bundle and copy it to the RDAF management VM on which the rdaf & rdafk8s deployment CLI was installed.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/offline-rhel-1.2.1.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-rhel-1.2.1.tar.gz
  • Change the directory to the extracted directory
cd offline-rhel-1.2.1
  • Upgrade the rdaf CLI to version 1.2.1
pip install --user rdafcli-1.2.1.tar.gz  -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/offline-ubuntu-1.2.1.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-ubuntu-1.2.1.tar.gz
  • Change the directory to the extracted directory
cd offline-ubuntu-1.2.1
  • Upgrade the rdaf CLI to version 1.2.1
pip install --user rdafcli-1.2.1.tar.gz  -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version
  • Download the RDAF Deployment CLI's newer version 1.2.1 bundle
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/rdafcli-1.2.1.tar.gz
  • Upgrade the rdaf CLI to version 1.2.1
pip install --user rdafcli-1.2.1.tar.gz
  • Verify the installed rdaf CLI version is upgraded to 1.2.1
rdaf --version
  • Download the RDAF Deployment CLI's newer version 1.2.1 bundle and copy it to RDAF management VM on which rdaf & rdafk8s deployment CLI was installed.
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/offline-rhel-1.2.1.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-rhel-1.2.1.tar.gz
  • Change the directory to the extracted directory
cd offline-rhel-1.2.1
  • Upgrade the rdaf CLI to version 1.2.1
pip install --user rdafcli-1.2.1.tar.gz -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version
wget  https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/offline-ubuntu-1.2.1.tar.gz
  • Extract the rdaf CLI software bundle contents
tar -xvzf offline-ubuntu-1.2.1.tar.gz
  • Change the directory to the extracted directory
cd offline-ubuntu-1.2.1
  • Upgrade the rdaf CLI to version 1.2.1
pip install --user rdafcli-1.2.1.tar.gz  -f ./ --no-index
  • Verify the installed rdaf CLI version
rdaf --version
rdafk8s --version

1.2. Download the new Docker Images

Download the new docker image tags for RDAF Platform and OIA Application services and wait until all of the images are downloaded.

Run the below command to upgrade the registry

rdaf registry upgrade --tag 1.0.3
To fetch the registry images, please use the below command

rdaf registry fetch --tag 1.0.3,1.0.3.1,3.4.1,7.4.1,3.4.1.2,7.4.1.2
rdaf registry fetch --minio-tag RELEASE.2023-09-30T07-02-29Z

Run the below command to upgrade the registry

rdaf registry upgrade --tag 1.0.3
To fetch the registry images, please use the below command

rdaf registry fetch --tag 1.0.3.1,3.4.1,7.4.1,3.4.1.2,7.4.1.2

Run the below command to verify that the above-mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.

rdaf registry list-tags 

Please make sure 1.0.3.1 image tag is downloaded for the below RDAF Infra service.

  • rda-platform-haproxy

Please make sure 1.0.3 image tag is downloaded for the below RDAF Infra service.

  • rda-platform-haproxy
  • rda-platform-kafka
  • rda-platform-zookeeper
  • rda-platform-mariadb
  • rda-platform-opensearch
  • rda-platform-nats
  • rda-platform-busybox
  • rda-platform-nats-box
  • rda-platform-nats-boot-config
  • rda-platform-nats-server-config-reloader
  • rda-platform-prometheus-nats-exporter
  • rda-platform-redis
  • rda-platform-redis-sentinel
  • rda-platform-arangodb-starter
  • rda-platform-kube-arangodb
  • rda-platform-arangodb
  • rda-platform-kubectl
  • rda-platform-logstash
  • rda-platform-fluent-bit

Please make sure RELEASE.2023-09-30T07-02-29Z image tag is downloaded for the below RDAF Infra service.

  • minio

Please make sure 3.4.1 image tag is downloaded for the below RDAF Platform services.

  • rda-client-api-server
  • rda-registry
  • rda-scheduler
  • rda-collector
  • rda-identity
  • rda-fsm
  • rda-stack-mgr
  • rda-access-manager
  • rda-resource-manager
  • rda-user-preferences
  • onprem-portal
  • onprem-portal-nginx
  • rda-worker-all
  • onprem-portal-dbinit
  • cfxdx-nb-nginx-all
  • rda-event-gateway
  • rda-chat-helper
  • rdac
  • rdac-full
  • cfxcollector

Please make sure 3.4.1.2 image tag is downloaded for the below RDAF Platform services.

  • rda-client-api-server
  • rda-scheduler
  • onprem-portal
  • onprem-portal-nginx

Please make sure 7.4.1 image tag is downloaded for the below RDAF OIA Application services.

  • rda-app-controller
  • rda-alert-processor
  • rda-file-browser
  • rda-smtp-server
  • rda-ingestion-tracker
  • rda-reports-registry
  • rda-ml-config
  • rda-event-consumer
  • rda-webhook-server
  • rda-irm-service
  • rda-alert-ingester
  • rda-collaboration
  • rda-notification-service
  • rda-configuration-service
  • rda-alert-processor-companion

Please make sure 7.4.1.2 image tag is downloaded for the below RDAF OIA Application services.

  • rda-event-consumer
  • rda-webhook-server
  • rda-irm-service
  • rda-smtp-server

Downloaded Docker images are stored under the below path.

/opt/rdaf/data/docker/registry/v2 or /opt/rdaf-registry/data/docker/registry/v2

Run the below command to check the filesystem's disk usage on which docker images are stored.

df -h /opt

Optionally, if required, older image tags that are no longer used can be deleted to free up disk space using the below command.

rdaf registry delete-images --tag <tag1,tag2>

1.3. Upgrade Steps

1.3.1 Upgrade RDAF Infra Services

1.3.1.1 Update RDAF Infra/Platform Services Configuration

Please download the below python script (rdaf_upgrade_1110_121.py)

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/rdaf_upgrade_1110_121.py

Warning

Please verify the python binary version using which RDAF deployment CLI was installed.

ls -l /home/rdauser/.local/lib --> this will show the python version as a directory name (e.g. python3.7 or python3.8)

python --version --> The major version (e.g. Python 3.7.4 or 3.8.10) should match the output from the above.

If it doesn't match, please run the below commands.

sudo mv /usr/bin/python /usr/bin/python_backup

sudo ln -s /usr/bin/python3.7 /usr/bin/python --> Please choose the python binary version using which the RDAF deployment CLI was installed. In this example, the python3.7 binary was used.

Note: If the python version is not either 3.7.x or 3.8.x, please stop the upgrade and contact CloudFabrix support for additional assistance.

Please run the downloaded python upgrade script rdaf_upgrade_1110_121.py as shown below.

The below step will generate *values.yaml.latest files for all RDAF Infrastructure services under /opt/rdaf/deployment-scripts directory.

python rdaf_upgrade_1110_121.py upgrade --no-kafka-upgrade

Please run the below commands to take a backup of the values.yaml files of the Infrastructure and Application services.

cp /opt/rdaf/deployment-scripts/values.yaml /opt/rdaf/deployment-scripts/values.yaml.backup
cp /opt/rdaf/deployment-scripts/nats-values.yaml /opt/rdaf/deployment-scripts/nats-values.yaml.backup
cp /opt/rdaf/deployment-scripts/minio-values.yaml /opt/rdaf/deployment-scripts/minio-values.yaml.backup
cp /opt/rdaf/deployment-scripts/mariadb-values.yaml /opt/rdaf/deployment-scripts/mariadb-values.yaml.backup
cp /opt/rdaf/deployment-scripts/opensearch-values.yaml /opt/rdaf/deployment-scripts/opensearch-values.yaml.backup
cp /opt/rdaf/deployment-scripts/kafka-values.yaml /opt/rdaf/deployment-scripts/kafka-values.yaml.backup
cp /opt/rdaf/deployment-scripts/redis-values.yaml /opt/rdaf/deployment-scripts/redis-values.yaml.backup
cp /opt/rdaf/deployment-scripts/arangodb-operator-values.yaml /opt/rdaf/deployment-scripts/arangodb-operator-values.yaml.backup
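The eight cp commands above can equivalently be generated in one loop, shown here as a dry run that echoes each command (drop the `echo` to execute on the real system):

```shell
# Dry run: print the backup command for each infra values file.
DIR=/opt/rdaf/deployment-scripts
for f in values nats-values minio-values mariadb-values \
         opensearch-values kafka-values redis-values \
         arangodb-operator-values; do
  echo cp "$DIR/$f.yaml" "$DIR/$f.yaml.backup"
done
```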

Update NATs configuration:

Run the below command to copy the upgraded NATs configuration from nats-values.yaml.latest to nats-values.yaml

cp /opt/rdaf/deployment-scripts/nats-values.yaml.latest /opt/rdaf/deployment-scripts/nats-values.yaml

Please update the memory limit value (below highlighted parameters) in /opt/rdaf/deployment-scripts/nats-values.yaml by copying the current value from /opt/rdaf/deployment-scripts/nats-values.yaml.backup file.
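The memory values to carry over can be pulled out of the backup file with a one-liner. A sketch: `mem_values` is an illustrative helper, and the inline sample stands in for the real nats-values.yaml.backup content:

```shell
# List every memory value in a values file; on the real system, run it
# against /opt/rdaf/deployment-scripts/nats-values.yaml.backup instead.
mem_values() { awk '/memory:/ {print $2}'; }

printf 'resources:\n  requests:\n    memory: 4Gi\n  limits:\n    memory: 12Gi\n' | mem_values
```

The same one-liner works for the minio and opensearch carry-over steps below.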

Note: The below values are for reference only.

nats-values.yaml.backup (existing config):

---
nats:
  image: 192.168.125.140:5000/rda-platform-nats:1.0.2.1
  pullPolicy: Always
  limits:
    pingInterval: 15s
    maxPings: 2
    maxPayload: 8MB
  tls:
    secret:
      name: rdaf-certs
    ca: ca.cert
    cert: rdaf.cert
    key: rdaf.key
  selectorLabels:
    app: rda-fabric-services
    app_category: rdaf-infra
    app_component: rda-nats
  resources:
    requests:
      memory: 4Gi
    limits:
      memory: 12Gi
bootconfig:
  image: 192.168.125.140:5000/rda-platform-nats-boot-config:1.0.2
natsbox:
  image: 192.168.125.140:5000/rda-platform-nats-box:1.0.2
  nodeSelector:
    rdaf_infra_nats: allow

nats-values.yaml (updated config):

---
global:
  image:
    pullSecretNames:
    - cfxregistry-cred
  labels:
    app: rda-fabric-services
    app_category: rdaf-infra
    app_component: rda-nats
tlsCA:
  enabled: true
  secretName: rdaf-certs
  key: ca.cert
....
....    
container:
  image:
    repository: 192.168.125.140:5000/rda-platform-nats
    tag: 1.0.3
    pullPolicy: IfNotPresent
  merge:
    livenessProbe:
      initialDelaySeconds: 10
      timeoutSeconds: 5
      periodSeconds: 30
      successThreshold: 1
      failureThreshold: 3
    readinessProbe:
      initialDelaySeconds: 10
      timeoutSeconds: 5
      periodSeconds: 10
      successThreshold: 1
      failureThreshold: 3
    resources:
      requests:
        memory: 4Gi
      limits:
        memory: 12Gi
....
....

Update Minio configuration:

Run the below command to copy the upgraded Minio configuration from minio-values.yaml.latest to minio-values.yaml

cp /opt/rdaf/deployment-scripts/minio-values.yaml.latest /opt/rdaf/deployment-scripts/minio-values.yaml

Please update the memory limit value (below highlighted parameters) in /opt/rdaf/deployment-scripts/minio-values.yaml by copying the current value from /opt/rdaf/deployment-scripts/minio-values.yaml.backup file.

Note: The below values are for reference only.

minio-values.yaml.backup (existing config):

image:
  repository: 192.168.125.140:5000/minio
  tag: RELEASE.2022-11-11T03-44-20Z
  pullPolicy: Always
imagePullSecrets: []
mcImage:
  repository: 192.168.125.140:5000/mc
  tag: RELEASE.2022-11-07T23-47-39Z
  pullPolicy: Always
service:
  type: NodePort
  nodePort: 30443
resources:
  requests:
    memory: 2Gi
  limits:
    memory: 8Gi
persistence:
  enabled: true
  size: 50Gi
  storageClass: "local-storage"
mode: standalone
....
....

minio-values.yaml (updated config):

---
image:
  repository: 192.168.125.140:5000/minio
  tag: RELEASE.2023-09-30T07-02-29Z
  pullPolicy: IfNotPresent
imagePullSecrets: []
mcImage:
  repository: 192.168.125.140:5000/mc
  tag: RELEASE.2023-09-29T16-41-22Z
  pullPolicy: IfNotPresent
service:
  type: NodePort
  nodePort: 30443
resources:
  requests:
    memory: 2Gi
  limits:
    memory: 8Gi
persistence:
  enabled: true
  size: 50Gi
  storageClass: local-storage
mode: standalone

Update Opensearch configuration:

Run the below command to copy the upgraded Opensearch configuration from opensearch-values.yaml.latest to opensearch-values.yaml

cp /opt/rdaf/deployment-scripts/opensearch-values.yaml.latest /opt/rdaf/deployment-scripts/opensearch-values.yaml

Please update the opensearchJavaOpts and memory limit values (below highlighted parameters) in /opt/rdaf/deployment-scripts/opensearch-values.yaml by copying the current value from /opt/rdaf/deployment-scripts/opensearch-values.yaml.backup file.

Note: The below values are for reference only.

opensearch-values.yaml.backup (existing config):

singleNode: false
replicas: 3
roles:
  - master
  - ingest
  - data
opensearchJavaOpts: "-Xmx24G -Xms24G"
extraEnvs:
  - name: DISABLE_INSTALL_DEMO_CONFIG
    value: "true"
image:
  repository: 192.168.125.140:5000/rda-platform-opensearch
  tag: 1.0.2
  pullPolicy: Always
imagePullSecrets:
  - name: cfxregistry-cred
service:
  type: NodePort
labels:
  app: rda-fabric-services
  app_category: rdaf-infra
  app_component: rda-opensearch
resources:
  requests:
    memory: 4Gi
  limits:
    memory: 48Gi
secretMounts:
....

opensearch-values.yaml (updated config):

singleNode: false
replicas: 3
roles:
  - master
  - ingest
  - data
opensearchJavaOpts: "-Xmx24G -Xms24G"
extraEnvs:
  - name: DISABLE_INSTALL_DEMO_CONFIG
    value: "true"
image:
  repository: 192.168.125.140:5000/rda-platform-opensearch
  tag: 1.0.3
  pullPolicy: IfNotPresent
imagePullSecrets:
  - name: cfxregistry-cred
service:
  type: NodePort
labels:
  app: rda-fabric-services
  app_category: rdaf-infra
  app_component: rda-opensearch
resources:
  requests:
    memory: 4Gi
  limits:
    memory: 48Gi
livenessProbe:
  periodSeconds: 20
  timeoutSeconds: 5
  failureThreshold: 10

Update Redis configuration:

Run the below command to copy the upgraded Redis configuration from redis-values.yaml.latest to redis-values.yaml

cp /opt/rdaf/deployment-scripts/redis-values.yaml.latest /opt/rdaf/deployment-scripts/redis-values.yaml

Update MariaDB configuration:

Run the below command to copy the upgraded MariaDB configuration from mariadb-values.yaml.latest to mariadb-values.yaml

cp /opt/rdaf/deployment-scripts/mariadb-values.yaml.latest /opt/rdaf/deployment-scripts/mariadb-values.yaml

Please update the below parameters (highlighted parameters in the given config example) in /opt/rdaf/deployment-scripts/mariadb-values.yaml file.

  • memory: Update it by copying the current value from /opt/rdaf/deployment-scripts/mariadb-values.yaml.backup file

  • initialDelaySeconds: set the value to 1200 (Under livenessProbe section)

  • failureThreshold: set the value to 15 (Under livenessProbe section)

  • expire_logs_days set the value to 1

  • innodb_buffer_pool_size: Update it by copying the current value from /opt/rdaf/deployment-scripts/mariadb-values.yaml.backup file

  • Comment out the wsrep_replicate_myisam=ON line. Please ignore this step if it is already commented out.
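Commenting out the wsrep_replicate_myisam line can be done with sed, shown here against an inline sample (`comment_myisam` is an illustrative helper; on the real system, apply the same expression to /opt/rdaf/deployment-scripts/mariadb-values.yaml with `sed -i`):

```shell
# Prefix the wsrep_replicate_myisam=ON line with '#', leaving other lines alone.
comment_myisam() { sed 's/^\( *\)wsrep_replicate_myisam=ON/\1#wsrep_replicate_myisam=ON/'; }

printf 'wsrep_on=ON\nwsrep_replicate_myisam=ON\n' | comment_myisam
```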

Note: The below values are for reference only.

mariadb-values.yaml.backup (existing config):

image:
  registry: 192.168.125.140:5000
  repository: rda-platform-mariadb
  tag: 1.0.2
  pullPolicy: Always
  pullSecrets:
    - cfxregistry-cred
podLabels:
  app: rda-fabric-services
  app_category: rdaf-infra
  app_component: rda-mariadb
resources:
  requests: {}
  limits:
    memory: 28Gi
livenessProbe:
  enabled: true
  initialDelaySeconds: 1200
  periodSeconds: 10
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 15
readinessProbe:
  enabled: true
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 3
....
....
mariadbConfiguration: |-
[client]
port=3306
socket=/opt/bitnami/mariadb/tmp/mysql.sock
plugin_dir=/opt/bitnami/mariadb/plugin

[mysqld]
default_storage_engine=InnoDB
basedir=/opt/bitnami/mariadb
datadir=/bitnami/mariadb/data
....
....
## Binary Logging
##
log_bin=mysql-bin
expire_logs_days=1
# Disabling for performance per ....
sync_binlog=0
# Required for Galera
binlog_format=row
....
....
innodb_log_files_in_group=2
innodb_log_file_size=128M
innodb_flush_log_at_trx_commit=1
innodb_file_per_table=1
# 80% Memory is default reco.
# Need to re-evaluate when DB size grows
innodb_buffer_pool_size=18G
innodb_file_format=Barracuda
....
....
[galera]
wsrep_on=ON
wsrep_provider=/opt/bitnami/mariadb/lib/libgalera_smm.so
wsrep_sst_method=mariabackup
wsrep_slave_threads=4
wsrep_cluster_address=gcomm://
wsrep_cluster_name=galera
wsrep_sst_auth="root:"
# Enabled for performance per https://mariadb.com/....
innodb_flush_log_at_trx_commit=2
# MYISAM REPLICATION SUPPORT #
wsrep_replicate_myisam=ON

mariadb-values.yaml (updated config):

image:
  registry: 192.168.125.140:5000
  repository: rda-platform-mariadb
  tag: 1.0.3
  pullPolicy: IfNotPresent
  pullSecrets:
    - cfxregistry-cred
podLabels:
  app: rda-fabric-services
  app_category: rdaf-infra
  app_component: rda-mariadb
resources:
  requests: {}
  limits:
    memory: 28Gi
livenessProbe:
  enabled: true
  initialDelaySeconds: 1200
  periodSeconds: 10
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 15
readinessProbe:
  enabled: true
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 3
....
....
mariadbConfiguration: |-
[client]
port=3306
socket=/opt/bitnami/mariadb/tmp/mysql.sock
plugin_dir=/opt/bitnami/mariadb/plugin

[mysqld]
default_storage_engine=InnoDB
basedir=/opt/bitnami/mariadb
datadir=/bitnami/mariadb/data
plugin_dir=/opt/bitnami/mariadb/plugin
....
....
## Binary Logging
##
log_bin=mysql-bin
expire_logs_days=1
# Disabling for performance per ....
sync_binlog=0
# Required for Galera
binlog_format=row
....
....
innodb_log_files_in_group=2
innodb_log_file_size=128M
innodb_flush_log_at_trx_commit=1
innodb_file_per_table=1
# 80% Memory is default reco.
# Need to re-evaluate when DB size grows
innodb_buffer_pool_size=18G
innodb_file_format=Barracuda
....
....
[galera]
wsrep_on=ON
wsrep_provider=/opt/bitnami/mariadb/lib/libgalera_smm.so
wsrep_sst_method=mariabackup
wsrep_slave_threads=4
wsrep_cluster_address=gcomm://
wsrep_cluster_name=galera
wsrep_sst_auth="root:"
# Enabled for performance per https://mariadb.com/....
innodb_flush_log_at_trx_commit=2
# MYISAM REPLICATION SUPPORT #
#wsrep_replicate_myisam=ON

Update Kafka configuration:

Run the below command to copy the upgraded Kafka configuration from kafka-values.yaml.latest to kafka-values.yaml

cp /opt/rdaf/deployment-scripts/kafka-values.yaml.latest /opt/rdaf/deployment-scripts/kafka-values.yaml

Please update the below parameters (highlighted parameters in the given config example) in /opt/rdaf/deployment-scripts/kafka-values.yaml file.

  • memory: Update it by copying the current value from /opt/rdaf/deployment-scripts/kafka-values.yaml.backup file

  • nodePorts: Update it by copying the current values from the kafka-values.yaml.backup file; please make sure to keep the nodePorts in the same order as in the current configuration.

  • initialDelaySeconds: set the value to 1200 (Under livenessProbe section)

  • failureThreshold: set the value to 15 (Under livenessProbe section)
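The existing nodePorts can be listed, in order, with an awk filter like the following (a sketch; `node_ports` is an illustrative helper, and the inline sample stands in for kafka-values.yaml.backup):

```shell
# Print the nodePorts list entries, preserving their order.
node_ports() { awk '/nodePorts:/{f=1; next} f && /^ *- /{print $2; next} f{f=0}'; }

printf 'service:\n  type: NodePort\n  nodePorts:\n  - 32606\n  - 31877\n  - 30323\nserviceAccount:\n' \
  | node_ports
```

On the real system, feed /opt/rdaf/deployment-scripts/kafka-values.yaml.backup into `node_ports` and copy the values into the new file in the same order.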

Note: The below values are for reference only.

kafka-values.yaml.backup (existing config):

---
global:
  imagePullSecrets:
  - cfxregistry-cred
image:
  registry: 192.168.125.140:5000
  repository: rda-platform-kafka
  tag: 1.0.2
  pullPolicy: Always
....
....
externalAccess:
  enabled: true
  autoDiscovery:
    enabled: true
  service:
    type: NodePort
    nodePorts:
    - 32606
    - 31877
    - 30323
serviceAccount:
  create: true
rbac:
  create: true
authorizerClassName: kafka.security.authorizer.AclAuthorizer
allowEveryoneIfNoAclFound: true
....
....
....
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app_component
            operator: In
            values:
            - rda-kafka
        topologyKey: kubernetes.io/hostname
nodeSelector:
  rdaf_infra_services: allow
persistence:
  enabled: true
  size: 8Gi
resources:
  limits:
    memory: 12Gi
zookeeper:
  image:
    registry: 192.168.125.140:5000
    repository: rda-platform-zookeeper
    tag: 1.0.2
....
....

kafka-values.yaml (updated config):

---
global:
  imagePullSecrets:
  - cfxregistry-cred
image:
  registry: 192.168.125.140:5000
  repository: rda-platform-kafka
  tag: 1.0.3
  pullPolicy: IfNotPresent
heapOpts: -Xmx2048m -Xms2048m
....
....
  livenessProbe:
    enabled: true
    initialDelaySeconds: 1200
    timeoutSeconds: 5
    failureThreshold: 15
    periodSeconds: 10
    successThreshold: 1
  readinessProbe:
    enabled: true
    initialDelaySeconds: 5
    failureThreshold: 6
    timeoutSeconds: 5
    periodSeconds: 10
    successThreshold: 1
  nodeSelector:
    rdaf_infra_services: allow
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app_component
              operator: In
              values:
              - rda-kafka-controller
          topologyKey: kubernetes.io/hostname
  persistence:
    enabled: true
    size: 8Gi
  resources:
    limits:
      memory: 12Gi
service:
  type: ClusterIP
  ports:
    client: 9092
    controller: 9095
    interbroker: 9093
    external: 9094
....
....
  controller:
    service:
      type: NodePort
      ports:
        external: 9094
      nodePorts:
      - 32606
      - 31877
      - 30323
serviceAccount:
  create: true
rbac:
  create: true
kraft:
  enabled: true

Update rda_scheduler Service Configuration:

Please take a backup of the /opt/rdaf/deployment-scripts/values.yaml

cp /opt/rdaf/deployment-scripts/values.yaml /opt/rdaf/deployment-scripts/values.yaml.backup

Edit the /opt/rdaf/deployment-scripts/values.yaml file and update the rda_scheduler service configuration by adding the below environment variable.

  • NUM_SERVER_PROCESSES: Set the value to 4
....
....
rda_scheduler:
  replicas: 1
  privileged: true
  resources:
    requests:
      memory: 100Mi
    limits:
      memory: 2Gi
  env:
    NUM_SERVER_PROCESSES: '4'
    RDA_ENABLE_TRACES: 'no'
    DISABLE_REMOTE_LOGGING_CONTROL: 'no'
    RDA_SELF_HEALTH_RESTART_AFTER_FAILURES: 3
    RDA_GIT_ACCESS_TOKEN: ''
    RDA_GIT_URL: ''
    RDA_GITHUB_ORG: ''
    RDA_GITHUB_REPO: ''
    RDA_GITHUB_BRANCH_PREFIX: ''
    LABELS: tenant_name=rdaf-01
  • Download the python script (rdaf_upgrade_1110_121.py)
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/rdaf_upgrade_1110_121.py
  • Please run the downloaded python upgrade script.
python rdaf_upgrade_1110_121.py
  • Install the haproxy service using the below command
rdaf infra install --tag 1.0.3.1 --service haproxy

Run the below RDAF command to check the infra status

rdaf infra status
+----------------------+----------------+------------+--------------+------------------------------+
| Name                 | Host           | Status     | Container Id | Tag                          |
+----------------------+----------------+------------+--------------+------------------------------+
| haproxy              | 192.168.133.97 | Up 2 hours | 342fc1338ba1 | 1.0.3.1                      |
| haproxy              | 192.168.133.98 | Up 2 hours | ec0de9d45a66 | 1.0.3.1                      |
| keepalived           | 192.168.133.97 | active     | N/A          | N/A                          |
| keepalived           | 192.168.133.98 | active     | N/A          | N/A                          |
| nats                 | 192.168.133.97 | Up 4 hours | d2dc79419daa | 1.0.3                        |
| nats                 | 192.168.133.98 | Up 4 hours | ef7c632bdb58 | 1.0.3                        |
| minio                | 192.168.133.93 | Up 4 hours | 414d2a2351b9 | RELEASE.2023-09-30T07-02-29Z |
| minio                | 192.168.133.97 | Up 4 hours | aa0f20af7d70 | RELEASE.2023-09-30T07-02-29Z |
| minio                | 192.168.133.98 | Up 4 hours | 91e123f8ba43 | RELEASE.2023-09-30T07-02-29Z |
| minio                | 192.168.133.99 | Up 4 hours | 74e74cc328b5 | RELEASE.2023-09-30T07-02-29Z |
| mariadb              | 192.168.133.97 | Up 4 hours | c2d71adc09ce | 1.0.3                        |
| mariadb              | 192.168.133.98 | Up 4 hours | 54615146c0fc | 1.0.3                        |
| mariadb              | 192.168.133.99 | Up 4 hours | 68e2a6088477 | 1.0.3                        |
| opensearch           | 192.168.133.97 | Up 3 hours | 7e700c133672 | 1.0.3                        |
| opensearch           | 192.168.133.98 | Up 3 hours | a582e7b552d6 | 1.0.3                        |
| opensearch           | 192.168.133.99 | Up 3 hours | f752837167e2 | 1.0.3                        |
+----------------------+----------------+------------+--------------+------------------------------+

Run the below RDAF command to check infra healthcheck status

rdaf infra healthcheck
+----------------+-----------------+--------+------------------------------+----------------+--------------+
| Name           | Check           | Status | Reason                       | Host           | Container Id |
+----------------+-----------------+--------+------------------------------+----------------+--------------+
| haproxy        | Port Connection | OK     | N/A                          | 192.168.133.97 | 340d7ce361e0 |
| haproxy        | Service Status  | OK     | N/A                          | 192.168.133.97 | 340d7ce361e0 |
| haproxy        | Firewall Port   | OK     | N/A                          | 192.168.133.97 | 340d7ce361e0 |
| haproxy        | Port Connection | OK     | N/A                          | 192.168.133.98 | 4a6015c9362a |
| haproxy        | Service Status  | OK     | N/A                          | 192.168.133.98 | 4a6015c9362a |
| haproxy        | Firewall Port   | OK     | N/A                          | 192.168.133.98 | 4a6015c9362a |
| keepalived     | Service Status  | OK     | N/A                          | 192.168.133.97 | N/A          |
| keepalived     | Service Status  | OK     | N/A                          | 192.168.133.98 | N/A          |
| nats           | Port Connection | OK     | N/A                          | 192.168.133.97 | 991873bb3420 |
| nats           | Service Status  | OK     | N/A                          | 192.168.133.97 | 991873bb3420 |
| nats           | Firewall Port   | OK     | N/A                          | 192.168.133.97 | 991873bb3420 |
| nats           | Port Connection | OK     | N/A                          | 192.168.133.98 | 016438fe2d17 |
| nats           | Service Status  | OK     | N/A                          | 192.168.133.98 | 016438fe2d17 |
| nats           | Firewall Port   | OK     | N/A                          | 192.168.133.98 | 016438fe2d17 |
| minio          | Port Connection | OK     | N/A                          | 192.168.133.93 | 0c3c86e896c6 |
| minio          | Service Status  | OK     | N/A                          | 192.168.133.93 | 0c3c86e896c6 |
| minio          | Firewall Port   | OK     | N/A                          | 192.168.133.93 | 0c3c86e896c6 |
| minio          | Port Connection | OK     | N/A                          | 192.168.133.97 | 604fc5ce14a3 |
| minio          | Service Status  | OK     | N/A                          | 192.168.133.97 | 604fc5ce14a3 |
| minio          | Firewall Port   | OK     | N/A                          | 192.168.133.97 | 604fc5ce14a3 |
| minio          | Port Connection | OK     | N/A                          | 192.168.133.98 | 0c2ae986076e |
| minio          | Service Status  | OK     | N/A                          | 192.168.133.98 | 0c2ae986076e |
| minio          | Firewall Port   | OK     | N/A                          | 192.168.133.98 | 0c2ae986076e |
| minio          | Port Connection | OK     | N/A                          | 192.168.133.99 | 67a7681a40b4 |
| minio          | Service Status  | OK     | N/A                          | 192.168.133.99 | 67a7681a40b4 |
| minio          | Firewall Port   | OK     | N/A                          | 192.168.133.99 | 67a7681a40b4 |
| mariadb        | Port Connection | OK     | N/A                          | 192.168.133.97 | 40e9915a3cf4 |
| mariadb        | Service Status  | OK     | N/A                          | 192.168.133.97 | 40e9915a3cf4 |
| mariadb        | Firewall Port   | OK     | N/A                          | 192.168.133.97 | 40e9915a3cf4 |
+----------------+-----------------+--------+------------------------------+----------------+--------------+

1.3.1.2 Upgrade RDAF Infra Services

  • Upgrade haproxy service using below command
rdafk8s infra upgrade --tag 1.0.3.1 --service haproxy
  • Please use the below mentioned command to verify that haproxy is up and in Running state.
rdafk8s infra status

Warning

Please verify RDAF portal access to make sure it is accessible after the haproxy service is upgraded, before proceeding to the next step.

  • Upgrade nats service using below command
rdafk8s infra upgrade --tag 1.0.3 --service nats
  • Please use the below mentioned command and wait till all of the nats pods are in Running state and Ready status is 2/2
kubectl get pods -n rda-fabric -l app_category=rdaf-infra | grep -i nats

Tip

If the nats service upgrade fails with a PodDisruptionBudget policy version error message, please update the apiVersion in the below file to policy/v1beta1

vi /home/rdauser/.local/lib/python3.7/site-packages/rdaf/deployments/helm/rda-nats/files/pod-disruption-budget.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  {{- include "nats.metadataNamespace" $ | nindent 2 }}
  name: {{ .Values.podDisruptionBudget.name }}
  labels:
    {{- include "nats.labels" $ | nindent 4 }}
....
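If editing the file by hand is inconvenient, the same change can be applied non-interactively. The helper below is a sketch (its name is illustrative): it rewrites the apiVersion line with sed and prints the result for confirmation; adjust the python3.7 path segment if your deployment CLI uses a different Python version.

```shell
# patch_pdb_api_version: rewrite the apiVersion of a PodDisruptionBudget
# manifest to policy/v1beta1, then print the line to confirm the change.
patch_pdb_api_version() {
    local pdb_file=$1
    sed -i 's|^apiVersion: policy/.*$|apiVersion: policy/v1beta1|' "$pdb_file"
    grep '^apiVersion:' "$pdb_file"
}
# Example invocation against the file from the tip above:
# patch_pdb_api_version /home/rdauser/.local/lib/python3.7/site-packages/rdaf/deployments/helm/rda-nats/files/pod-disruption-budget.yaml
```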

Run the nats service upgrade command.

rdafk8s infra upgrade --tag 1.0.3 --service nats
  • Upgrade minio service using below command
rdafk8s infra upgrade --tag 1.0.3 --service minio
  • Please use the below mentioned command and wait till all of the minio pods are in Running state and Ready status is 1/1
kubectl get pods -n rda-fabric -l app_category=rdaf-infra | grep -i minio
  • Upgrade redis service using below command
rdafk8s infra upgrade --tag 1.0.3 --service redis
  • Please use the below mentioned command and wait till all of the redis pods are in Running state and Ready status is 1/1
kubectl get pods -n rda-fabric -l app_category=rdaf-infra | grep -i redis
  • Upgrade opensearch service using below command
rdafk8s infra upgrade --tag 1.0.3 --service opensearch
  • Please use the below mentioned command and wait till all of the opensearch pods are in Running state and Ready status is 1/1
kubectl get pods -n rda-fabric -l app_category=rdaf-infra | grep -i opensearch

Run the below command to get RDAF Infra services details

rdafk8s infra status

Danger

Upgrading both kafka and mariadb infra services require a downtime to the RDAF platform and application services.

Please proceed to the below steps only after scheduled downtime is approved.

Please download the MariaDB upgrade scripts:

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/AP_7.4.1_migration_ddl_version_from_20_to_22.ql
wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.2.1/AP_7.4.1_copy_history_data_version_from_20_to_22.ql

Stop RDAF Application Services:

  • Stop the rda-webhook-server application service and wait for 60 seconds. This step stops receiving the incoming webhook alerts and allows the rest of the application services to complete processing the in-transit alerts.
rdafk8s app down OIA --service rda-webhook-server --force
sleep 60
  • To stop all of the Application services.
rdafk8s app down OIA --force
  • Check the Application services status. When all of the application services are stopped, it will show an empty output.
rdafk8s app status
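Instead of re-running the status command by hand, the shutdown can be polled until complete. A minimal sketch, assuming the OIA application pods carry the app_name=oia label used later in this guide (the helper name is illustrative):

```shell
# wait_for_oia_down: poll until no OIA application pods remain in the namespace.
wait_for_oia_down() {
    while kubectl get pods -n rda-fabric -l app_name=oia --no-headers 2>/dev/null | grep -q .; do
        echo "OIA application pods still present, waiting..."
        sleep 10
    done
    echo "All OIA application pods are stopped."
}
# wait_for_oia_down
```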

Upgrade kafka Service:

  • Please run the below upgrade script rdaf_upgrade_1110_121.py. This script will clear all the data of Kafka and Zookeeper services under the mount points /kafka-logs and /zookeeper, and delete Kubernetes (k8s) pods, Helm charts, persistent volumes (pv), and persistent volume claims (pvc) configuration. After this step, it will uninstall the Kafka and Zookeeper services.
python rdaf_upgrade_1110_121.py upgrade-kafka
  • Please run the below command to check kafka and zookeeper services are uninstalled.
helm list -n rda-fabric
  • Install kafka service using below command.
rdafk8s infra install --tag 1.0.3 --service kafka
  • Please run the below command and wait till all of the kafka pods are in Running state and the Ready status is 1/1
kubectl get pods -n rda-fabric -l app_category=rdaf-infra | grep -i kafka
  • Please run the below command to create necessary Kafka Topics and corresponding configuration.
python rdaf_upgrade_1110_121.py configure-kafka-tenant
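To spot-check that the topics were created, they can be listed from inside a Kafka pod. A sketch, assuming the Kafka container image ships kafka-topics.sh on its PATH and the broker listens on localhost:9092 inside the pod (both are assumptions, not confirmed by this guide):

```shell
# list_kafka_topics: list topics from the first Kafka pod in the rda-fabric namespace.
list_kafka_topics() {
    local pod
    pod=$(kubectl get pods -n rda-fabric -l app_category=rdaf-infra -o name | grep kafka | head -1)
    kubectl exec -n rda-fabric "$pod" -- kafka-topics.sh --bootstrap-server localhost:9092 --list
}
# list_kafka_topics
```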

Upgrade mariadb Service:

  • To stop mariadb services, run the below command. Wait until all of the services are stopped.
rdafk8s infra down --service mariadb
  • Please run the below command to check mariadb pods are down
kubectl get pods -n rda-fabric -l app_category=rdaf-infra | grep -i mariadb
  • Upgrade mariadb service using the below command
rdafk8s infra upgrade --tag 1.0.3 --service mariadb 
  • Please run the below command and wait till all of the mariadb pods are in Running state and Ready status is 1/1
kubectl get pods -n rda-fabric -l app_category=rdaf-infra | grep -i mariadb

Warning

Please wait till all of the Kafka and MariaDB infra service pods are in Running state and Ready status is 1/1

  • Run the below commands to check the status of the mariadb cluster. Please verify that the cluster state is in Synced state.
MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`

MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`

MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`

mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -e "show status like 'wsrep_local_state_comment';"
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+

Run the below commands to check the cluster size of the mariadb cluster. Please verify that the cluster size is 3.

mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'";
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
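The two Galera checks above can be combined into one helper that takes the credentials already extracted from /opt/rdaf/rdaf.cfg. This is an illustrative convenience wrapper, not part of the release tooling:

```shell
# check_galera_health: print the Galera sync state and cluster size in one call.
# Expected on a healthy 3-node HA setup: Synced and 3.
check_galera_health() {
    local host=$1 user=$2 pass=$3
    mysql -u"$user" -p"$pass" -h "$host" -P3307 \
        -e "SHOW STATUS LIKE 'wsrep_local_state_comment'; SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"
}
# check_galera_health "$MARIADB_HOST" "$MARIADB_USER" "$MARIADB_PASSWORD"
```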
  • Please run the below commands to drop the indexes on two alert tables of AIOps application services.
MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`

MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`

MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`

TENANT_ID=`cat /opt/rdaf/rdaf.cfg | grep external_user | awk '{print $3}' | cut -f1 -d'.'`
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor -e "DROP INDEX IF EXISTS AlertAlternateKey on alert;"
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor -e "DROP INDEX IF EXISTS AlertHistoryAlternateKey on alerthistory;"

Warning

Please make sure above commands are executed successfully, before continuing to the below step.

  • Please run the below command to upgrade the DB schema configuration of the mariadb service after the 1.0.3 version upgrade.
python rdaf_upgrade_1110_121.py configure-mariadb   
  • Please run the below RDAF command to check infra services status
rdafk8s infra status
+--------------------------+----------------+-----------------+--------------+------------------------------+
| Name                     | Host           | Status          | Container Id | Tag                          |
+--------------------------+----------------+-----------------+--------------+------------------------------+
| haproxy                  | 192.168.131.41 | Up 16 hours     | e2b3b46f702d | 1.0.3.1                      |
| haproxy                  | 192.168.131.42 | Up 5 hours      | a89fdd2c5299 | 1.0.3.1                      |
| keepalived               | 192.168.131.41 | active          | N/A          | N/A                          |
| keepalived               | 192.168.131.42 | active          | N/A          | N/A                          |
| rda-nats                 | 192.168.131.41 | Up 16 Hours ago | 3682271b3b58 | 1.0.3                        |
| rda-nats                 | 192.168.131.42 | Up 4 Hours ago  | 1f3599cf7193 | 1.0.3                        |
| rda-minio                | 192.168.131.41 | Up 16 Hours ago | 80a865d27b2c | RELEASE.2023-09-30T07-02-29Z |
| rda-minio                | 192.168.131.42 | Up 4 Hours ago  | 22c7da5bc030 | RELEASE.2023-09-30T07-02-29Z |
| rda-minio                | 192.168.131.43 | Up 3 Weeks ago  | 1af5abda3061 | RELEASE.2023-09-30T07-02-29Z |
| rda-minio                | 192.168.131.48 | Up 3 Weeks ago  | 7eec14f4ce0e | RELEASE.2023-09-30T07-02-29Z |
| rda-mariadb              | 192.168.131.41 | Up 16 Hours ago | 2596eaddb435 | 1.0.3                        |
| rda-mariadb              | 192.168.131.42 | Up 4 Hours ago  | c004da615516 | 1.0.3                        |
| rda-mariadb              | 192.168.131.43 | Up 2 Weeks ago  | b49f33d491d6 | 1.0.3                        |
| rda-opensearch           | 192.168.131.41 | Up 16 Hours ago | 5595347d56d6 | 1.0.3                        |
...
...
+--------------------------+----------------+-----------------+--------------+------------------------------+
  • Please run the below commands to create a copy of alert and alerthistory tables of rda-alert-processor service DB as a backup and update the schema.
MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`

MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`

MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`

TENANT_ID=`cat /opt/rdaf/rdaf.cfg | grep external_user | awk '{print $3}' | cut -f1 -d'.'`
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor < AP_7.4.1_migration_ddl_version_from_20_to_22.ql
  • Please run the below commands to copy the data from alert_bak and alerthistory_bak backup tables of rda-alert-processor service DB back to primary alert and alerthistory tables.

Note

The copy process may take some time depending on the amount of historical data in the alerthistory table. Please continue with the rest of the steps while the data is being copied.

MARIADB_HOST=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep datadir | awk '{print $3}' | cut -f1 -d'/'`

MARIADB_USER=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep user | awk '{print $3}' | base64 -d`

MARIADB_PASSWORD=`cat /opt/rdaf/rdaf.cfg | grep -A3 mariadb | grep password | awk '{print $3}' | base64 -d`

TENANT_ID=`cat /opt/rdaf/rdaf.cfg | grep external_user | awk '{print $3}' | cut -f1 -d'.'`
mysql -u$MARIADB_USER -p$MARIADB_PASSWORD -h $MARIADB_HOST -P3307 -D ${TENANT_ID}_alert_processor < AP_7.4.1_copy_history_data_version_from_20_to_22.ql
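While the copy runs, its progress can be followed by comparing row counts between the backup and primary tables. A sketch reusing the MARIADB_* variables set above; the helper name is illustrative:

```shell
# alert_copy_progress: row counts of alerthistory vs. its _bak backup table.
# The primary table's count approaches the backup's count as the copy completes.
alert_copy_progress() {
    local db=$1
    mysql -u"$MARIADB_USER" -p"$MARIADB_PASSWORD" -h "$MARIADB_HOST" -P3307 -D "$db" -N -e \
        "SELECT 'alerthistory', COUNT(*) FROM alerthistory
         UNION ALL SELECT 'alerthistory_bak', COUNT(*) FROM alerthistory_bak;"
}
# alert_copy_progress "${TENANT_ID}_alert_processor"
```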

Installing GraphDB Service:

Tip

Please skip the below step if GraphDB service is NOT going to be installed.

Warning

For installing the GraphDB service, please add an additional disk to the RDA Fabric Infrastructure VM.

This is a prerequisite and needs to be completed before installing the GraphDB service.

rdafk8s infra install --tag 1.0.3 --service graphdb
  • Please use the below mentioned command and wait till all of the arangodb pods are in Running state.
kubectl get pods -n rda-fabric -l app_category=rdaf-infra | grep arango

1.3.2 Upgrade RDAF Platform Services to 3.4.1

Step-1: Run the below command to initiate upgrading RDAF Platform services.

rdafk8s platform upgrade --tag 3.4.1

As the upgrade procedure is a non-disruptive upgrade, it puts the currently running PODs into Terminating state and newer version PODs into Pending state.

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each Platform service is in Terminating state.

kubectl get pods -n rda-fabric -l app_category=rdaf-platform

Step-3: Run the below command to put all Terminating RDAF platform service PODs into maintenance mode. It will list all of the POD Ids of the platform services that need to be put in maintenance mode, along with the corresponding rdac maintenance command.

python maint_command.py

Note

If maint_command.py script doesn't exist on RDAF deployment CLI VM, it can be downloaded using the below command.

wget https://macaw-amer.s3.amazonaws.com/releases/rdaf-platform/1.1.6/maint_command.py

Step-4: Copy & Paste the rdac maintenance command as below.

rdac maintenance start --ids <comma-separated-list-of-platform-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the RDAF platform services.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating RDAF platform service PODs

for i in `kubectl get pods -n rda-fabric -l app_category=rdaf-platform | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
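The fixed 120-second wait can also be replaced by polling until no Terminating platform pods remain. A minimal sketch (the helper name is illustrative):

```shell
# wait_for_no_terminating: poll until no rdaf-platform pods are in Terminating state.
wait_for_no_terminating() {
    while kubectl get pods -n rda-fabric -l app_category=rdaf-platform 2>/dev/null | grep -q 'Terminating'; do
        echo "Terminating pods still present, waiting..."
        sleep 10
    done
    echo "No Terminating platform pods remain."
}
# wait_for_no_terminating
```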

Note

Wait for 120 seconds and repeat the above steps from Step-2 to Step-6 for the rest of the RDAF Platform service PODs.

Please wait till all of the new platform service PODs are in Running state, then run the below command to verify their status and make sure all of them are running with version 3.4.1.

rdafk8s platform status
+---------------------+----------------+-----------------+--------------+-------+
| Name                | Host           | Status          | Container Id | Tag   |
+---------------------+----------------+-----------------+--------------+-------+
| rda-api-server      | 192.168.131.46 | Up 21 Hours ago | 98ec9561d787 | 3.4.1 |
| rda-api-server      | 192.168.131.45 | Up 21 Hours ago | e7b7cdb7d3d2 | 3.4.1 |
| rda-registry        | 192.168.131.44 | Up 21 Hours ago | bc2fed4a15f3 | 3.4.1 |
| rda-registry        | 192.168.131.46 | Up 21 Hours ago | 1b6da7ff3ce2 | 3.4.1 |
| rda-identity        | 192.168.131.45 | Up 21 Hours ago | 30053cf6667e | 3.4.1 |
| rda-identity        | 192.168.131.46 | Up 21 Hours ago | 6ee2e6a861f7 | 3.4.1 |
| rda-fsm             | 192.168.131.44 | Up 21 Hours ago | c014e84bf197 | 3.4.1 |
| rda-fsm             | 192.168.131.46 | Up 21 Hours ago | 6a609f8ab579 | 3.4.1 |
+---------------------+----------------+-----------------+--------------+-------+

Run the below command to check that the rda-fsm service is up and running, and also verify that one of the rda-scheduler services is elected as the leader under the Site column.

rdac pods
+-------+------------+-----------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type   | Pod-Ready | Host           | ID       | Site        | Age     |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+------------+-----------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------|
| Infra | api-server | True      | rda-api-server | 5081891f |             | 0:29:54 |      8 |        31.33 |               |              |
| Infra | api-server | True      | rda-api-server | 9fc5db97 |             | 0:29:52 |      8 |        31.33 |               |              |
| Infra | collector  | True      | rda-collector- | f9b6a00d |             | 0:30:00 |      8 |        31.33 |               |              |
| Infra | collector  | True      | rda-collector- | 0a4eb8cd |             | 0:30:01 |      8 |        31.33 |               |              |
| Infra | registry   | True      | rda-registry-7 | 758fc2cb |             | 0:30:51 |      8 |        31.33 |               |              |
| Infra | registry   | True      | rda-registry-7 | 3d56a31f |             | 0:28:49 |      8 |        31.33 |               |              |
| Infra | scheduler  | True      | rda-scheduler- | 8b570be5 |             | 0:30:44 |      8 |        31.33 |               |              |
| Infra | scheduler  | True      | rda-scheduler- | 44930ac7 | *leader*    | 0:30:47 |      8 |        31.33 |               |              |
| Infra | worker     | True      | rda-worker-69d | 91615244 | rda-site-01 | 0:25:30 |      8 |        31.33 | 0             | 9            |
| Infra | worker     | True      | rda-worker-69d | af99d199 | rda-site-01 | 0:25:31 |      8 |        31.33 | 2             | 14           |
+-------+------------+-----------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+

Run the below command to check that all services have an ok status and do not throw any failure messages.

rdac healthcheck

Warning

For Non-Kubernetes deployment, upgrading RDAF Platform and AIOps application services is a disruptive operation. Please schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.

  • To stop application services, run the below command. Wait until all of the services are stopped.
rdaf app down OIA

rdaf app status
  • To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
rdaf worker down

rdaf worker status
  • To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
rdaf platform down

rdaf platform status

Run the below command to initiate upgrading RDAF Platform services.

rdaf platform upgrade --tag 3.4.1

Please wait till all of the new platform services are in Up state, then run the below command to verify their status and make sure all of them are running with version 3.4.1.

rdaf platform status
+--------------------------+----------------+------------+--------------+-------+
| Name                     | Host           | Status     | Container Id | Tag   |
+--------------------------+----------------+------------+--------------+-------+
| rda_api_server           | 192.168.133.92 | Up 2 hours | 6366c9717f07 | 3.4.1 |
| rda_api_server           | 192.168.133.93 | Up 2 hours | d5b8c2722f72 | 3.4.1 |
| rda_registry             | 192.168.133.92 | Up 2 hours | 47f722aab97b | 3.4.1 |
| rda_registry             | 192.168.133.93 | Up 2 hours | f5ce662af82f | 3.4.1 |
| rda_scheduler            | 192.168.133.92 | Up 2 hours | 28b597777069 | 3.4.1 |
| rda_scheduler            | 192.168.133.93 | Up 2 hours | 2d70a4ac184e | 3.4.1 |
| rda_collector            | 192.168.133.92 | Up 2 hours | 637a07f4df17 | 3.4.1 |
| rda_collector            | 192.168.133.93 | Up 2 hours | 478167b3952a | 3.4.1 |
| rda_asset_dependency     | 192.168.133.92 | Up 2 hours | c910651896fe | 3.4.1 |
| rda_asset_dependency     | 192.168.133.93 | Up 2 hours | c1ddfde81b13 | 3.4.1 |
| rda_identity             | 192.168.133.92 | Up 2 hours | f70beaa486a6 | 3.4.1 |
| rda_identity             | 192.168.133.93 | Up 2 hours | a726b0f154c8 | 3.4.1 |
| rda_fsm                  | 192.168.133.92 | Up 2 hours | 87b26529566a | 3.4.1 |
| rda_fsm                  | 192.168.133.93 | Up 2 hours | 13891be75c05 | 3.4.1 |
+--------------------------+----------------+------------+--------------+-------+

Run the below command to check that the rda-fsm service is up and running, and also verify that one of the rda-scheduler services is elected as the leader under the Site column.

rdac pods

Run the below command to check that all services have an ok status and do not throw any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | kafka-connectivity                                  | ok       | Cluster=ZTRlZGFmZjhkZDFiMTFlZQ, Broker=1, Brokers=[1, 2, 3] |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | kafka-connectivity                                  | ok       | Cluster=ZTRlZGFmZjhkZDFiMTFlZQ, Broker=2, Brokers=[1, 2, 3] |
| rda_app   | alert-processor                        | 2afde67935ac | 33170bc7 |             | service-status                                      | ok       |                                                             |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+

1.3.2.1 Upgrade RDAF Platform Services to 3.4.1.2

Step-1: Run the below command to initiate upgrading below RDAF Platform services.

  • rda-scheduler
  • rda-api-server
  • rda-portal
rdafk8s platform upgrade --tag 3.4.1.2 --service rda-scheduler --service rda-api-server --service rda-portal

As the upgrade procedure is a non-disruptive upgrade, it puts the currently running PODs into Terminating state and newer version PODs into Pending state.

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each Platform service is in Terminating state.

kubectl get pods -n rda-fabric -l app_category=rdaf-platform

Step-3: Run the below command to put all Terminating RDAF platform service PODs into maintenance mode. It will list all of the POD Ids of the platform services that need to be put in maintenance mode, along with the corresponding rdac maintenance command.

python maint_command.py

Step-4: Copy & Paste the rdac maintenance command as below.

rdac maintenance start --ids <comma-separated-list-of-platform-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the RDAF platform services.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating RDAF platform service PODs

for i in `kubectl get pods -n rda-fabric -l app_category=rdaf-platform | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done

Note

Wait for 120 seconds and repeat the above steps from Step-2 to Step-6 for the rest of the RDAF Platform service PODs.

Please wait till all of the new platform service PODs are in Running state, then run the below command to verify their status and make sure all of them are running with version 3.4.1.2.

rdafk8s platform status
+---------------------+----------------+-----------------+--------------+---------+
| Name                | Host           | Status          | Container Id | Tag     |
+---------------------+----------------+-----------------+--------------+---------+
| rda-api-server      | 192.168.131.46 | Up 21 Hours ago | 98ec9561d787 | 3.4.1.2 |
| rda-api-server      | 192.168.131.45 | Up 21 Hours ago | e7b7cdb7d3d2 | 3.4.1.2 |
| rda-scheduler       | 192.168.131.44 | Up 21 Hours ago | bc2fed4a15f3 | 3.4.1.2 |
| rda-scheduler       | 192.168.131.46 | Up 21 Hours ago | 1b6da7ff3ce2 | 3.4.1.2 |
| rda-portal-backend  | 192.168.131.45 | Up 21 Hours ago | 30053cf6667e | 3.4.1.2 |
| rda-portal-backend  | 192.168.131.46 | Up 21 Hours ago | 6ee2e6a861f7 | 3.4.1.2 |
| rda-portal-frontend | 192.168.131.44 | Up 21 Hours ago | c014e84bf197 | 3.4.1.2 |
| rda-portal-frontend | 192.168.131.46 | Up 21 Hours ago | 6a609f8ab579 | 3.4.1.2 |
+---------------------+----------------+-----------------+--------------+---------+

Run the below command to check and verify that one of the rda-scheduler services is elected as the leader under the Site column.

rdac pods
+-------+------------+-----------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type   | Pod-Ready | Host           | ID       | Site        | Age     |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+------------+-----------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------|
| Infra | api-server | True      | rda-api-server | 5081891f |             | 0:29:54 |      8 |        31.33 |               |              |
| Infra | api-server | True      | rda-api-server | 9fc5db97 |             | 0:29:52 |      8 |        31.33 |               |              |
| Infra | collector  | True      | rda-collector- | f9b6a00d |             | 0:30:00 |      8 |        31.33 |               |              |
| Infra | collector  | True      | rda-collector- | 0a4eb8cd |             | 0:30:01 |      8 |        31.33 |               |              |
| Infra | registry   | True      | rda-registry-7 | 758fc2cb |             | 0:30:51 |      8 |        31.33 |               |              |
| Infra | registry   | True      | rda-registry-7 | 3d56a31f |             | 0:28:49 |      8 |        31.33 |               |              |
| Infra | scheduler  | True      | rda-scheduler- | 8b570be5 |             | 0:30:44 |      8 |        31.33 |               |              |
| Infra | scheduler  | True      | rda-scheduler- | 44930ac7 | *leader*    | 0:30:47 |      8 |        31.33 |               |              |
| Infra | worker     | True      | rda-worker-69d | 91615244 | rda-site-01 | 0:25:30 |      8 |        31.33 | 0             | 9            |
| Infra | worker     | True      | rda-worker-69d | af99d199 | rda-site-01 | 0:25:31 |      8 |        31.33 | 2             | 14           |
+-------+------------+-----------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+

Run the below command to check that all services have an ok status and do not report any failure messages.

rdac healthcheck

1.3.3 Upgrade rdac CLI

For deployments managed with the rdafk8s CLI, run the below command to upgrade the rdac CLI:

rdafk8s rdac_cli upgrade --tag 3.4.1

For deployments managed with the rdaf CLI, run the below command to upgrade the rdac CLI:

rdaf rdac_cli upgrade --tag 3.4.1

1.3.4 Upgrade OIA Application Services to 7.4.1

Step-1: Run the below command to initiate the upgrade of the RDAF OIA Application services.

rdafk8s app upgrade OIA --tag 7.4.1

Step-2: Run the below command to check the status of the newly upgraded PODs.

kubectl get pods -n rda-fabric -l app_name=oia

As the upgrade procedure is non-disruptive, it puts the currently running PODs into Terminating state while the newer version PODs come up in Pending state.

Step-3: Run the below command to put all Terminating RDAF Application service PODs into maintenance mode. It lists the POD IDs of the application services that need to be put in maintenance mode, along with the rdac maintenance command to run.

python maint_command.py

Step-4: Copy and paste the rdac maintenance command, as shown below.

rdac maintenance start --ids <comma-separated-list-of-application-pod-ids>
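If `maint_command.py` emits one pod ID per line, the comma-separated argument can be assembled with `paste` instead of typing it by hand. This is an illustrative sketch only; the IDs below are placeholders, not real pod IDs:

```shell
# Placeholder IDs standing in for maint_command.py output (one per line).
ids=$(paste -sd, <<'EOF'
7861bd4f
4abc521f
9bf94e67
EOF
)
echo "$ids"   # prints: 7861bd4f,4abc521f,9bf94e67

# The joined value would then be passed as:
#   rdac maintenance start --ids "$ids"
```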

Step-5: Run the below command to verify the maintenance mode status of the RDAF application services.

rdac pods --show_maintenance | grep False
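A detail worth knowing about this check: `grep False` prints nothing and exits non-zero once every pod is in maintenance, so an empty result is the desired end state. A minimal sketch of using that exit status, with sample text standing in for the real `rdac pods --show_maintenance` output:

```shell
# Simulated maintenance column: True means the pod is already in maintenance.
sample='pod-a   True
pod-b   True'

if echo "$sample" | grep -q False; then
  echo "some pods are NOT yet in maintenance"
else
  echo "all pods are in maintenance"   # this branch runs for the sample above
fi
```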

Step-6: Run the below command to delete the Terminating RDAF application service PODs

for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done
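The delete loop above can also be expressed with `xargs`. Since the real command is destructive, the sketch below runs against sample `kubectl get pods` text and substitutes `echo` for the actual delete, so it only prints what would be removed (pod names are illustrative):

```shell
# Sample output standing in for: kubectl get pods -n rda-fabric -l app_name=oia
sample='NAME                     READY   STATUS        RESTARTS   AGE
rda-alert-ingester-abc   1/1     Running       0          4h
rda-alert-ingester-xyz   1/1     Terminating   0          4h'

# Pick the NAME column of Terminating pods; echo shows the command that would run.
echo "$sample" | awk '/Terminating/ {print $1}' \
  | xargs -r -I{} echo kubectl delete pod {} -n rda-fabric --force
# prints: kubectl delete pod rda-alert-ingester-xyz -n rda-fabric --force
```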

Note

Wait for 120 seconds, then repeat Step-2 through Step-6 for the remaining RDAF Application service PODs.

Please wait until all of the new OIA application service PODs are in Running state, then run the below command to verify their status and confirm they are running version 7.4.1.

rdafk8s app status

+-------------------------------+----------------+----------------+--------------+-------+
| Name                          | Host           | Status         | Container Id | Tag   |
+-------------------------------+----------------+----------------+--------------+-------+
| rda-alert-ingester            | 192.168.131.50 | Up 4 Hours ago | 013e6fb89274 | 7.4.1 |
| rda-alert-ingester            | 192.168.131.49 | Up 4 Hours ago | ce269889fe6c | 7.4.1 |
| rda-alert-processor-companion | 192.168.131.49 | Up 4 Hours ago | b4bca9347589 | 7.4.1 |
| rda-alert-processor-companion | 192.168.131.50 | Up 4 Hours ago | 1c530b32c563 | 7.4.1 |
| rda-alert-processor           | 192.168.131.47 | Up 4 Hours ago | b0e25a38c72d | 7.4.1 |
| rda-alert-processor           | 192.168.131.46 | Up 4 Hours ago | 2a5b0f764cfd | 7.4.1 |
| rda-app-controller            | 192.168.131.50 | Up 4 Hours ago | 0261820f6e01 | 7.4.1 |
| rda-app-controller            | 192.168.131.46 | Up 4 Hours ago | 134844ff7208 | 7.4.1 |
| rda-collaboration             | 192.168.131.50 | Up 4 Hours ago | e5e196b74462 | 7.4.1 |
| rda-collaboration             | 192.168.131.46 | Up 4 Hours ago | ed4ec37435b7 | 7.4.1 |
| rda-configuration-service     | 192.168.131.46 | Up 4 Hours ago | 74e22e5ddee1 | 7.4.1 |
| rda-configuration-service     | 192.168.131.50 | Up 4 Hours ago | b09637691cbd | 7.4.1 |
+-------------------------------+----------------+----------------+--------------+-------+
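Rather than scanning the Tag column by eye, it can be checked mechanically. The sketch below parses sample rows in the table layout shown above (field 6, splitting on `|`, is assumed to be the Tag column; this is an assumption about the layout, not a documented interface):

```shell
# Two sample data rows in the `rdafk8s app status` table format shown above.
sample='| rda-alert-ingester | 192.168.131.50 | Up 4 Hours ago | 013e6fb89274 | 7.4.1 |
| rda-alert-ingester | 192.168.131.49 | Up 4 Hours ago | ce269889fe6c | 7.4.1 |'

# Count data rows whose Tag field does not contain 7.4.1.
bad=$(echo "$sample" | awk -F'|' '$6 !~ /7\.4\.1/ {n++} END {print n+0}')
echo "rows not on 7.4.1: $bad"   # prints: rows not on 7.4.1: 0
```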

Step-7: Run the below command to verify all OIA application services are up and running. Please wait until the cfxdimensions-app-irm_service shows leader status under the Site column.

rdac pods
+-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host           | ID       | Site        | Age     |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------|
| App   | alert-ingester                         | True        | rda-alert-inge | 7861bd4f |             | 4:20:52 |      8 |        31.33 |               |              |
| App   | alert-ingester                         | True        | rda-alert-inge | 4abc521f |             | 4:20:52 |      8 |        31.33 |               |              |
| App   | alert-processor                        | True        | rda-alert-proc | 9bf94e67 |             | 4:20:50 |      8 |        31.33 |               |              |
| App   | alert-processor                        | True        | rda-alert-proc | 4e679139 |             | 4:20:48 |      8 |        31.33 |               |              |
| App   | alert-processor-companion              | True        | rda-alert-proc | 745dfbb9 |             | 4:20:39 |      8 |        31.33 |               |              |
| App   | alert-processor-companion              | True        | rda-alert-proc | 02f6bce0 |             | 4:20:41 |      8 |        31.33 |               |              |
| App   | asset-dependency                       | True        | rda-asset-depe | fc6c7a60 |             | 4:28:00 |      8 |        31.33 |               |              |
| App   | asset-dependency                       | True        | rda-asset-depe | d3ca4c11 |             | 4:27:07 |      8 |        31.33 |               |              |
| App   | authenticator                          | True        | rda-identity-6 | 4cd59d9c |             | 4:27:01 |      8 |        31.33 |               |              |
| App   | authenticator                          | True        | rda-identity-6 | 174298c3 |             | 4:25:53 |      8 |        31.33 |               |              |
| App   | cfx-app-controller                     | True        | rda-app-contro | 4d923832 |             | 4:20:42 |      8 |        31.33 |               |              |
| App   | cfx-app-controller                     | True        | rda-app-contro | b16deafa |             | 4:20:25 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-access-manager       | True        | rda-access-man | 09d1fada |             | 4:27:56 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-access-manager       | True        | rda-access-man | e0af2bcc |             | 4:27:54 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | rda-collaborat | 9e7f7bcb |             | 4:20:31 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | rda-collaborat | 38db5386 |             | 4:20:25 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | rda-file-brows | 589e18f8 |             | 4:20:20 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | rda-file-brows | 853545f8 |             | 4:19:59 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | rda-irm-servic | d17f8dcd |             | 4:20:06 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | rda-irm-servic | 44decaa7 | *leader*    | 4:19:41 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-notification-service | True        | rda-notificati | 74e58855 |             | 4:20:14 |      8 |        31.33 |               |              |
+-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+

Run the below command to check that all services have an ok status and do not report any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | rda-alert-in | 4abc521f |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 4abc521f |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 4abc521f |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | rda-alert-in | 4abc521f |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 4abc521f |             | kafka-connectivity                                  | ok       | Cluster=IrA5ccri7mBeUvhzvrimEg, Broker=0, Brokers=[0, 1, 2] |
| rda_app   | alert-ingester                         | rda-alert-in | 7861bd4f |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 7861bd4f |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 7861bd4f |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | rda-alert-in | 7861bd4f |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 7861bd4f |             | kafka-connectivity                                  | ok       | Cluster=IrA5ccri7mBeUvhzvrimEg, Broker=2, Brokers=[0, 1, 2] |
| rda_app   | alert-processor                        | rda-alert-pr | 4e679139 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-processor                        | rda-alert-pr | 4e679139 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-processor                        | rda-alert-pr | 4e679139 |             | service-dependency:cfx-app-controller               | ok       | 2 pod(s) found for cfx-app-controller                       |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
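To avoid reading the whole healthcheck table, rows whose Status is not `ok` can be filtered out. A sketch against sample rows in the layout shown above (splitting on `|`, Status is field 8; the failing row is invented purely to show the filter working):

```shell
# Sample rows standing in for `rdac healthcheck` output; the second row is a
# made-up failure, added only for illustration.
sample='| rda_app | alert-ingester | rda-alert-in | 4abc521f |  | service-status | ok |  |
| rda_app | alert-ingester | rda-alert-in | 7861bd4f |  | minio-connectivity | failed | cannot reach minio |'

# Print Pod-Type, Health Parameter and Status for any row not reporting ok.
echo "$sample" | awk -F'|' '$8 !~ /ok/ {print $3, $7, $8}'
```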

For deployments managed with the rdaf CLI, run the below command to initiate upgrading the RDA Fabric OIA Application services.

rdaf app upgrade OIA --tag 7.4.1

Please wait until all of the new OIA application service containers are in Up state, then run the below command to verify their status and confirm they are running version 7.4.1.

rdaf app status

+-----------------------------------+----------------+------------+--------------+-------+
| Name                              | Host           | Status     | Container Id | Tag   |
+-----------------------------------+----------------+------------+--------------+-------+
| cfx-rda-app-controller            | 192.168.133.96 | Up 2 hours | deab59a554f6 | 7.4.1 |
| cfx-rda-app-controller            | 192.168.133.92 | Up 2 hours | 7e3cbfc6d899 | 7.4.1 |
| cfx-rda-reports-registry          | 192.168.133.96 | Up 2 hours | 934ef236dde2 | 7.4.1 |
| cfx-rda-reports-registry          | 192.168.133.92 | Up 2 hours | 8749187dfb82 | 7.4.1 |
| cfx-rda-notification-service      | 192.168.133.96 | Up 2 hours | eaaa0116b25c | 7.4.1 |
| cfx-rda-notification-service      | 192.168.133.92 | Up 2 hours | 7f5b91f6b166 | 7.4.1 |
| cfx-rda-file-browser              | 192.168.133.96 | Up 2 hours | 62ba48307a89 | 7.4.1 |
| cfx-rda-file-browser              | 192.168.133.92 | Up 2 hours | ad83ab7f2611 | 7.4.1 |
| cfx-rda-configuration-service     | 192.168.133.96 | Up 2 hours | 6f24b3296c44 | 7.4.1 |
| cfx-rda-configuration-service     | 192.168.133.92 | Up 2 hours | ad93c6ddf2bc | 7.4.1 |
| cfx-rda-alert-ingester            | 192.168.133.96 | Up 2 hours | 9132494ea9ab | 7.4.1 |
| cfx-rda-alert-ingester            | 192.168.133.92 | Up 2 hours | f5312c1fc474 | 7.4.1 |
+-----------------------------------+----------------+------------+--------------+-------+

Run the below command to verify all OIA application services are up and running. Please wait until the cfxdimensions-app-irm_service shows leader status under the Site column.

rdac pods

+-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host         | ID       | Site        | Age     |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------|
| App   | alert-ingester                         | True        | 9132494ea9ab | ad43cf79 |             | 1:56:34 |      4 |        31.21 |               |              |
| App   | alert-ingester                         | True        | f5312c1fc474 | 2a129b31 |             | 1:56:21 |      4 |        31.21 |               |              |
| App   | alert-processor                        | True        | 2afde67935ac | 33170bc7 |             | 1:54:29 |      4 |        31.21 |               |              |
| App   | alert-processor                        | True        | f289e1088a16 | 831fe5c3 |             | 1:54:14 |      4 |        31.21 |               |              |
| App   | alert-processor-companion              | True        | 83ebf4300ac5 | c9dba0df |             | 1:47:44 |      4 |        31.21 |               |              |
| App   | alert-processor-companion              | True        | 9b1b55d78d1a | a66ecf29 |             | 1:47:29 |      4 |        31.21 |               |              |
| App   | asset-dependency                       | True        | c1ddfde81b13 | 985fc496 |             | 2:20:03 |      4 |        31.21 |               |              |
| App   | asset-dependency                       | True        | c910651896fe | 9c355c7d |             | 2:20:06 |      4 |        31.21 |               |              |
| App   | authenticator                          | True        | f70beaa486a6 | 955eb254 |             | 2:19:59 |      4 |        31.21 |               |              |
| App   | authenticator                          | True        | a726b0f154c8 | 898c36b4 |             | 2:19:57 |      4 |        31.21 |               |              |
| App   | cfx-app-controller                     | True        | 7e3cbfc6d899 | 2097a877 |             | 1:58:49 |      4 |        31.21 |               |              |
| App   | cfx-app-controller                     | True        | deab59a554f6 | 3bd4ce27 |             | 1:59:02 |      4 |        31.21 |               |              |
| App   | cfxdimensions-app-access-manager       | True        | f47c6cab13f1 | e0636eea |             | 2:19:32 |      4 |        31.21 |               |              |
| App   | cfxdimensions-app-access-manager       | True        | 02b526adf7f9 | 7a286ce7 |             | 2:19:23 |      4 |        31.21 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | b602c2cddd90 | 836e0134 |             | 1:53:02 |      4 |        31.21 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | 2f02987f249d | c4d4720d |             | 1:48:31 |      4 |        31.21 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | 62ba48307a89 | 48d1d0d2 |             | 1:57:34 |      4 |        31.21 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | ad83ab7f2611 | 93078496 |             | 1:57:14 |      4 |        31.21 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | 56dffc7d6501 | 672ff70a | *leader*    | 1:53:57 |      4 |        31.21 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | b40a96601c73 | 25fe51f5 |             | 1:53:42 |      4 |        31.21 |               |              |
+-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------+

Run the below command to check that all services have an ok status and do not report any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | 9132494ea9ab | ad43cf79 |             | kafka-connectivity                                  | ok       | Cluster=ZTRlZGFmZjhkZDFiMTFlZQ, Broker=1, Brokers=[1, 2, 3] |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | f5312c1fc474 | 2a129b31 |             | kafka-connectivity                                  | ok       | Cluster=ZTRlZGFmZjhkZDFiMTFlZQ, Broker=3, Brokers=[1, 2, 3] |
| rda_app   | alert-processor                        | 2afde67935ac | 33170bc7 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-processor                        | 2afde67935ac | 33170bc7 |             | minio-connectivity                                  | ok       |                                                             |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+

1.3.4.1 Upgrade OIA Application Services to 7.4.1.2/7.4.1.3

Step-1: Run the below commands to initiate the upgrade of the following RDAF OIA Application services:

  • rda-webhook-server

  • rda-event-consumer

  • rda-smtp-server

rdafk8s app upgrade OIA --tag 7.4.1.2 --service rda-webhook-server --service rda-smtp-server
rdafk8s app upgrade OIA --tag 7.4.1.3 --service rda-event-consumer

Step-2: Run the below command to check the status of the newly upgraded PODs.

kubectl get pods -n rda-fabric -l app_name=oia

As the upgrade procedure is non-disruptive, it puts the currently running PODs into Terminating state while the newer version PODs come up in Pending state.

Step-3: Run the below command to put the above-mentioned Terminating RDAF Application service PODs into maintenance mode. It lists the POD IDs of those application services, along with the rdac maintenance command to run.

python maint_command.py

Step-4: Copy and paste the rdac maintenance command, as shown below.

rdac maintenance start --ids <comma-separated-list-of-application-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the RDAF application services.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating RDAF application service PODs

for i in `kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done

Note

Wait for 120 seconds, then repeat Step-2 through Step-6 for the remaining RDAF Application service PODs.

Please wait until all of the new OIA application service PODs are in Running state, then run the below command to verify their status and confirm they are running the upgraded versions (7.4.1.2, or 7.4.1.3 where applicable).

rdafk8s app status

+-------------------------------+----------------+----------------+--------------+---------+
| Name                          | Host           | Status         | Container Id | Tag     |
+-------------------------------+----------------+----------------+--------------+---------+
| rda-event-consumer            | 192.168.131.50 | Up 4 Hours ago | 013e6fb89274 | 7.4.1.2 |
| rda-event-consumer            | 192.168.131.49 | Up 4 Hours ago | ce269889fe6c | 7.4.1.2 |
| rda-webhook-server            | 192.168.131.49 | Up 4 Hours ago | b4bca9347589 | 7.4.1.2 |
| rda-webhook-server            | 192.168.131.50 | Up 4 Hours ago | 1c530b32c563 | 7.4.1.2 |
| rda-smtp-server               | 192.168.131.47 | Up 4 Hours ago | b0e25a38c72d | 7.4.1.2 |
| rda-smtp-server               | 192.168.131.46 | Up 4 Hours ago | 2a5b0f764cfd | 7.4.1.2 |
+-------------------------------+----------------+----------------+--------------+---------+

Step-7: Run the below command to verify that the above upgraded OIA application services are up and running.

rdac pods
+-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host           | ID       | Site        | Age     |   CPUs |   Memory(GB) | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------|
| App   | alert-ingester                         | True        | rda-alert-inge | 7861bd4f |             | 4:20:52 |      8 |        31.33 |               |              |
| App   | alert-ingester                         | True        | rda-alert-inge | 4abc521f |             | 4:20:52 |      8 |        31.33 |               |              |
| App   | alert-processor                        | True        | rda-alert-proc | 9bf94e67 |             | 4:20:50 |      8 |        31.33 |               |              |
| App   | alert-processor                        | True        | rda-alert-proc | 4e679139 |             | 4:20:48 |      8 |        31.33 |               |              |
| App   | alert-processor-companion              | True        | rda-alert-proc | 745dfbb9 |             | 4:20:39 |      8 |        31.33 |               |              |
| App   | alert-processor-companion              | True        | rda-alert-proc | 02f6bce0 |             | 4:20:41 |      8 |        31.33 |               |              |
| App   | asset-dependency                       | True        | rda-asset-depe | fc6c7a60 |             | 4:28:00 |      8 |        31.33 |               |              |
| App   | asset-dependency                       | True        | rda-asset-depe | d3ca4c11 |             | 4:27:07 |      8 |        31.33 |               |              |
| App   | authenticator                          | True        | rda-identity-6 | 4cd59d9c |             | 4:27:01 |      8 |        31.33 |               |              |
| App   | authenticator                          | True        | rda-identity-6 | 174298c3 |             | 4:25:53 |      8 |        31.33 |               |              |
| App   | cfx-app-controller                     | True        | rda-app-contro | 4d923832 |             | 4:20:42 |      8 |        31.33 |               |              |
| App   | cfx-app-controller                     | True        | rda-app-contro | b16deafa |             | 4:20:25 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-access-manager       | True        | rda-access-man | 09d1fada |             | 4:27:56 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-access-manager       | True        | rda-access-man | e0af2bcc |             | 4:27:54 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | rda-collaborat | 9e7f7bcb |             | 4:20:31 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-collaboration        | True        | rda-collaborat | 38db5386 |             | 4:20:25 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | rda-file-brows | 589e18f8 |             | 4:20:20 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-file-browser         | True        | rda-file-brows | 853545f8 |             | 4:19:59 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | rda-irm-servic | d17f8dcd |             | 4:20:06 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-irm_service          | True        | rda-irm-servic | 44decaa7 | *leader*    | 4:19:41 |      8 |        31.33 |               |              |
| App   | cfxdimensions-app-notification-service | True        | rda-notificati | 74e58855 |             | 4:20:14 |      8 |        31.33 |               |              |
+-------+----------------------------------------+-------------+----------------+----------+-------------+---------+--------+--------------+---------------+--------------+

Run the below command to check that all services have an ok status and do not report any failure messages.

rdac healthcheck
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | rda-alert-in | 4abc521f |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 4abc521f |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 4abc521f |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | rda-alert-in | 4abc521f |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 4abc521f |             | kafka-connectivity                                  | ok       | Cluster=IrA5ccri7mBeUvhzvrimEg, Broker=0, Brokers=[0, 1, 2] |
| rda_app   | alert-ingester                         | rda-alert-in | 7861bd4f |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 7861bd4f |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 7861bd4f |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | rda-alert-in | 7861bd4f |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 7861bd4f |             | kafka-connectivity                                  | ok       | Cluster=IrA5ccri7mBeUvhzvrimEg, Broker=2, Brokers=[0, 1, 2] |
| rda_app   | alert-processor                        | rda-alert-pr | 4e679139 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-processor                        | rda-alert-pr | 4e679139 |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-processor                        | rda-alert-pr | 4e679139 |             | service-dependency:cfx-app-controller               | ok       | 2 pod(s) found for cfx-app-controller                       |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+

1.3.5 Upgrade RDA Worker Services

Step-1: Please run the below command to initiate upgrading the RDA Worker service PODs.

rdafk8s worker upgrade --tag 3.4.1

Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each RDA Worker service POD is in Terminating state.

kubectl get pods -n rda-fabric -l app_component=rda-worker
NAME                          READY   STATUS    RESTARTS   AGE
rda-worker-69d485f476-99tnv   1/1     Running   0          45h
rda-worker-69d485f476-gwq4f   1/1     Running   0          45h

Step-3: Run the below command to gather the IDs needed to put all Terminating RDAF worker service PODs into maintenance mode. It lists the POD IDs of the RDA worker services along with the rdac maintenance command required to put them into maintenance mode.

python maint_command.py

Step-4: Copy and paste the rdac maintenance command as shown below.

rdac maintenance start --ids <comma-separated-list-of-worker-pod-ids>

Step-5: Run the below command to verify the maintenance mode status of the RDAF worker services.

rdac pods --show_maintenance | grep False

Step-6: Run the below command to delete the Terminating RDAF worker service PODs.

for i in `kubectl get pods -n rda-fabric -l app_component=rda-worker | grep 'Terminating' | awk '{print $1}'`; do kubectl delete pod $i -n rda-fabric --force; done

Note

Wait for 120 seconds between each RDAF worker service upgrade, repeating the above steps (Step-2 to Step-6) for the rest of the RDAF worker service PODs.

Step-7: Please wait for 120 seconds to let the newer version of the RDA Worker service PODs join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service PODs.

rdac pods | grep rda-worker
rdafk8s worker status
+------------+----------------+-----------------+--------------+-------+
| Name       | Host           | Status          | Container Id | Tag   |
+------------+----------------+-----------------+--------------+-------+
| rda-worker | 192.168.131.45 | Up 19 Hours ago | 6360f61b4249 | 3.4.1 |
| rda-worker | 192.168.131.44 | Up 19 Hours ago | 806b7b334943 | 3.4.1 |
+------------+----------------+-----------------+--------------+-------+

Step-8: Run the below command to check that all RDA Worker services have an ok status and do not report any failure messages.

rdac healthcheck
  • Upgrade RDA Worker Services

Please run the below command to initiate upgrading the RDA Worker service PODs.

rdaf worker upgrade --tag 3.4.1

Note

If the worker is deployed in a proxy environment, please add the required proxy environment variables in /opt/rdaf/deployment-scripts/values.yaml under the section rda_worker -> env:, instead of making changes to worker.yaml (this is needed only if any new changes are required for the worker).

Please wait for 120 seconds to let the newer version of RDA Worker service containers join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service containers.

rdac pods | grep worker
rdaf worker status

+------------+----------------+------------+--------------+-------+
| Name       | Host           | Status     | Container Id | Tag   |
+------------+----------------+------------+--------------+-------+
| rda_worker | 192.168.133.96 | Up 2 hours | 03061dd8dfcc | 3.4.1 |
| rda_worker | 192.168.133.92 | Up 2 hours | cbb31b875cf6 | 3.4.1 |
+------------+----------------+------------+--------------+-------+
Run the below command to check that all RDA Worker services have an ok status and do not report any failure messages.

rdac healthcheck
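The healthcheck output is a fixed-width table like the one shown earlier in this section, so it can also be scanned programmatically. Below is a minimal, hypothetical Python sketch (not part of the rdac CLI); it assumes the column layout shown in the sample table, with Status as the seventh column.

```python
# Hypothetical helper: scan the tabular output of `rdac healthcheck`
# and collect every row whose Status column is not "ok". Column
# positions are assumed from the sample table shown earlier.
def failing_rows(table_text: str):
    failures = []
    for line in table_text.splitlines():
        if not line.startswith("|"):
            continue  # skip the +---+ borders and any surrounding text
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) < 8 or cells[6] in ("Status", ""):
            continue  # skip |---+---| separator rows and the header row
        if cells[6] != "ok":
            # (Pod-Type, Health Parameter, Status)
            failures.append((cells[1], cells[5], cells[6]))
    return failures

# Example usage: save the output first, e.g.
#   rdac healthcheck > /tmp/healthcheck.txt
# then: failing_rows(open("/tmp/healthcheck.txt").read())
```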

1.4 Post Upgrade Steps

1.4.1 OIA

1. Deploy latest Alerts and Incidents Dashboard configuration

Go to Main Menu --> Configuration --> RDA Administration --> Bundles --> Select oia_l1_l2_bundle and Click on Deploy action to deploy the latest Dashboards configuration for Alerts and Incidents.

Warning

It is mandatory to deploy the oia_l1_l2_bundle (Alerts and Incidents Dashboards configuration) as the existing dashboard configuration for the same has been deprecated.

After deploying the oia_l1_l2_bundle, the main incident landing page is oia-incidents-os-template. From this page, you can drill down on any incident, which takes you to incident-details-app. Within each Incident dashboard page, the below pages are enabled by default, irrespective of whether the corresponding features are configured or not.

  • Alerts
  • Topology
  • Metrics
  • Insights
  • Collaboration
  • Diagnostics
  • Remediation
  • Activities

Within each Incident page, the Alerts and Collaboration pages are mandatory, while the rest of the pages are optional until they are configured within the system.

If you need to remove these optional pages from the default Incident view dashboard, please follow the below steps.

Go to Main Menu --> Configuration --> RDA Administration --> Dashboards --> User Dashboards --> Edit the JSON config of the incident-details-app dashboard and delete the JSON configuration blocks (shown below) for the pages you want to remove.

....
....
"dashboard_pages": [
{
  "name": "incident-details-alerts",
  "label": "Alerts",
  "icon": "alert.svg"
},
{
  "name": "incident-details-topology",
  "label": "Topology",
  "icon": "topology.svg"
},
{
  "name": "incident-details-metrics",
  "label": "Metrics",
  "icon": "metrics.svg"
},
{
  "name": "incident-details-insights",
  "label": "Insights",
  "icon": "nextSteps.svg"
},
{
  "name": "incident-details-collaboration",
  "label": "Collaboration",
  "icon": "collaboration.svg"
},
{
  "name": "incident-details-diagnostics",
  "label": "Diagnostics",
  "icon": "diagnostic.svg"
},
{
  "name": "incident-details-remediation",
  "label": "Remediation",
  "icon": "remedial.svg"
},
{
  "name": "incident-details-activities",
  "label": "Activities",
  "icon": "activities.svg"
}
....
....

Note

Please note that the deleted configuration blocks for Topology, Metrics, Insights, Diagnostics, Remediation and Activities can be added back once the corresponding features are configured within the system.
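The manual JSON edit above can also be scripted. Below is a minimal sketch, assuming the incident-details-app dashboard JSON has been exported to a Python dict; the helper keeps only the mandatory Alerts and Collaboration pages and drops the optional ones.

```python
# Hypothetical helper: keep only the mandatory pages in the
# "dashboard_pages" list of an exported incident-details-app config.
# The mandatory page names follow the text above.
MANDATORY_PAGES = {"incident-details-alerts", "incident-details-collaboration"}

def prune_optional_pages(config: dict) -> dict:
    config["dashboard_pages"] = [
        page for page in config.get("dashboard_pages", [])
        if page.get("name") in MANDATORY_PAGES
    ]
    return config
```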


1.4.2 Migrating Custom Built Dashboards

1.4.2.1 For Alert Dashboards

Below are the changes made to the Alerts Dashboards:

1. Moved the Alerts tabular report to oia-alert-os.json and reused it across all the Alerts dashboards.

2. If an individual Alerts tabular report defined in a respective dashboard is used, update the Clear Alerts (BULK CLEAR) action context with the below context.

 "context": {
    "projectId": "{{PROJECT_ID}}",
    "sourceDashboardId": "{{SOURCE_DASHBOARD_ID}}"
}

a) Oia-Alert-OS

Add the below variable in template_variables. This is needed to fix the bulk clear alerts issue.

"SOURCE_DASHBOARD_ID": {
            "contextId": "sourceDashboardId",
            "default": "user-dashboard-oia-alerts-os"
}       
  • Update Clear Alerts Action Context With Below Context
 "context": {
    "projectId": "{{PROJECT_ID}}",
    "sourceDashboardId": "{{SOURCE_DASHBOARD_ID}}"
}
  • Noise Reduction

add/update extra_filter

"extra_filter": "a_correlation_status in ['ACTIVE', 'CORRELATING', 'CORRELATED', 'CLEARED']",
  • Incidents Without Policy

add/update extra_filter

"extra_filter": "a_rule_name is 'Not Available'",
  • Alerts without Policy

add/update extra_filter

"extra_filter": "a_rule_name is 'Not Available'",

b) Oia-Alert-Groups-Policy-OS

"SOURCE_DASHBOARD_ID": {
            "default": "user-dashboard-oia-alert-groups-policy-os",
            "contextId": "sourceDashboardId"
}

c) Oia-View-Alerts-Policy-OS

"SOURCE_DASHBOARD_ID": {
            "default": "user-dashboard-oia-view-alerts-policy-os",
            "contextId": "sourceDashboardId"
}

d) Alert-Trail

"SOURCE_DASHBOARD_ID": {
            "default": "user-dashboard-incident-details-alerts",
            "contextId": "sourceDashboardId"
        },

e) Incident-Details-Alerts

 "SOURCE_DASHBOARD_ID": {
            "default": "user-dashboard-incident-details-alerts",
            "contextId": "sourceDashboardId"
        }

f) Oia-Alert-Group-View-Alerts-OS

"SOURCE_DASHBOARD_ID": {
            "default": "user-dashboard-oia-alert-group-view-alerts-os",
            "contextId": "sourceDashboardId"
        }

g) OIA-Alert-Group-View-details-OS-V2

"SOURCE_DASHBOARD_ID": {
            "default": "user-dashboard-oia-alert-group-view-details-os-v2",
            "contextId": "sourceDashboardId"
        }

h) OIA-Alert-Group-View-Details-OS

"SOURCE_DASHBOARD_ID": {
    "default": "user-dashboard-oia-alert-group-view-details-os",
    "contextId": "sourceDashboardId"
}

Note

For dashboards (b) through (h): if an individual Alerts tabular report defined in the dashboard definition shows the BULK CLEAR action, please refer to point 2 above to update its context.

i) OIA-Alert-Groups-OS

 "SOURCE_DASHBOARD_ID": {
            "default": "user-dashboard-oia-alert-groups-os",
            "contextId": "sourceDashboardId"
        }

j) Update Clear Alerts Action Context With Below Context

"context": {
    "projectId": "{{PROJECT_ID}}",
    "sourceDashboardId": "{{SOURCE_DASHBOARD_ID}}"
}
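The per-dashboard edits in (a) through (j) all follow one pattern: add a SOURCE_DASHBOARD_ID template variable and point the Clear Alerts action context at it. Below is a minimal, hypothetical sketch of that pattern; the layout of template_variables, and especially of the "actions" list, is assumed from the snippets above rather than confirmed.

```python
# Hypothetical helper applying the pattern from items (a)-(j): register
# the SOURCE_DASHBOARD_ID template variable and rewrite the context of
# any "Clear Alerts" action. The "actions" list layout is an assumption.
def patch_alert_dashboard(config: dict, dashboard_id: str) -> dict:
    config.setdefault("template_variables", {})["SOURCE_DASHBOARD_ID"] = {
        "contextId": "sourceDashboardId",
        "default": dashboard_id,
    }
    for action in config.get("actions", []):
        if action.get("title") == "Clear Alerts":
            action["context"] = {
                "projectId": "{{PROJECT_ID}}",
                "sourceDashboardId": "{{SOURCE_DASHBOARD_ID}}",
            }
    return config
```

For example, `patch_alert_dashboard(config, "user-dashboard-oia-alerts-os")` produces the variable block shown in item (a).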

1.4.2.2 For Incident Dashboards

Below are the changes made to the Incidents Dashboards:

1. Change "appName" from "incident-details" to "user-dashboard/incident-details-app"

2. Locate the column name "i_cfx_state" and remove it from the list of columns, group filters, etc.

a) Incident-Topology

Add "auto_group": false after "stack_type": "OIA"

b) l1-service-health.json, l2-l3-service-health.json

From contextParamList -> contextParams remove { "paramKey": "project_id", "paramId": "id"}

c) Oia-Incidents-Os-Template

1. Remove the actions with titles "Collect Data" and "Share"

2. Add "auto_group": false after "stack_type": "OIA"
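The two global Incidents Dashboard edits above (the "appName" rename and the "i_cfx_state" removal) can be sketched as a recursive walk over an exported dashboard JSON. This is a hypothetical helper; the traversal is a best-effort assumption about where the column name may appear.

```python
# Hypothetical helper: rename the deprecated "appName" value and drop
# "i_cfx_state" wherever it appears, either as a bare string in a
# column list or as a {"name": "i_cfx_state"} entry (e.g. in group
# filters). Returns a new, patched structure.
def patch_incident_dashboard(node):
    if isinstance(node, dict):
        if node.get("appName") == "incident-details":
            node["appName"] = "user-dashboard/incident-details-app"
        return {k: patch_incident_dashboard(v) for k, v in node.items()}
    if isinstance(node, list):
        return [
            patch_incident_dashboard(v) for v in node
            if not (isinstance(v, str) and v == "i_cfx_state")
            and not (isinstance(v, dict) and v.get("name") == "i_cfx_state")
        ]
    return node
```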