Upgrade to 3.2.1.3 and 7.2.2.1
1. Upgrade from 7.2.1.x to 7.2.2.1
1.1. Pre-requisites
The following prerequisites must be in place before upgrading the OIA (AIOps) application services.
- RDAF Deployment CLI Version: 1.1.8
- RDAF Infrastructure Services Tag Version: 1.0.2, 1.0.2.1 (nats)
- RDAF Core Platform & Worker Services Tag Version: 3.2.1.3
- RDAF Client (RDAC) Tag Version: 3.2.1 / 3.2.1.x
- OIA Services Tag Version: 7.2.1.1 / 7.2.1.5 / 7.2.1.6
- CloudFabrix recommends taking VMware VM snapshots of the virtual machines on which the RDA Fabric platform/application services are deployed.
Warning
Make sure all of the above pre-requisites are met before proceeding with the upgrade process.
Warning
Upgrading RDAF Platform and AIOps application services is a disruptive operation. Schedule a maintenance window before upgrading RDAF Platform and AIOps services to newer version.
Important
Please make sure full backup of the RDAF platform system is completed before performing the upgrade.
Kubernetes: Please run the below backup command to take the backup of application data.
Non-Kubernetes: Please run the below backup command to take the backup of application data. Note: Please make sure the shared backup-dir is NFS mounted across all RDA Fabric Virtual Machines.
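The NFS prerequisite in the note above can be spot-checked on each VM; /backup-dir below is a placeholder for your actual shared backup directory:

```shell
# Show NFS mounts on this VM; the shared backup directory should appear here
mount | grep -i 'type nfs'

# Confirm the backup directory (placeholder path) lives on that NFS mount
df -h /backup-dir
```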
Run the below kubectl commands and make sure the Kubernetes pods are NOT in a restarting state (applicable only to Kubernetes environments).
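The pod check above can be approximated with standard kubectl commands; the rda-fabric namespace is taken from the kubectl commands used later in this guide:

```shell
# Show all RDA Fabric pods; the RESTARTS column (4th) should stay flat
kubectl get pods -n rda-fabric

# Print only pods whose restart count is non-zero; empty output is good
kubectl get pods -n rda-fabric --no-headers | awk '$4 > 0 {print $1, "restarts:", $4}'
```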
RDAF Deployment CLI Upgrade:
Please follow the below given steps.
Note
Upgrade RDAF Deployment CLI on both on-premise docker registry VM and RDAF Platform's management VM if provisioned separately.
- Run the below command to verify that the current version of the RDAF CLI is 1.1.8.
- Download the RDAF Deployment CLI's newer version 1.1.9.1 bundle.
- Upgrade the rdaf CLI to version 1.1.9.1.
- Verify the installed rdaf CLI version is upgraded to 1.1.9.1.
- Download the RDAF Deployment CLI's newer version 1.1.9.1 bundle and copy it to the RDAF management VM on which the rdaf deployment CLI was installed.
- Extract the rdaf CLI software bundle contents.
- Change the directory to the extracted directory.
- Upgrade the rdaf CLI to version 1.1.9.1.
- Verify the installed rdaf CLI version.
- Extract the rdaf CLI software bundle contents.
- Change the directory to the extracted directory.
- Upgrade the rdaf CLI to version 1.1.9.1.
- Verify the installed rdaf CLI version.
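The extract / change-directory / upgrade / verify steps above follow a conventional tarball flow. The sketch below is illustrative only: the real bundle filename and download URL come from CloudFabrix, and install.sh is a hypothetical installer name (use the command documented in the bundle):

```shell
# Hypothetical bundle filename -- substitute the actual 1.1.9.1 bundle you downloaded
BUNDLE=rdaf-deployment-cli-1.1.9.1.tar.gz

tar -xzf "$BUNDLE"        # extract the rdaf CLI software bundle contents
cd "${BUNDLE%.tar.gz}"    # change to the extracted directory
sudo ./install.sh         # hypothetical installer; run the bundle's documented install step
rdaf --version            # verify the installed version (exact flag may vary; check the CLI help)
```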
1.2. Download the new Docker Images
Download the new docker image tags for the RDAF Platform and OIA Application services and wait until all of the images are downloaded.
Run the below command to verify that the above-mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.
Please make sure the 3.2.2.1 image tag is downloaded for the below RDAF Platform services.
- rda-client-api-server
- rda-registry
- rda-rda-scheduler
- rda-collector
- rda-stack-mgr
- rda-identity
- rda-fsm
- rda-access-manager
- rda-resource-manager
- rda-user-preferences
- onprem-portal
- onprem-portal-nginx
- rda-worker-all
- onprem-portal-dbinit
- cfxdx-nb-nginx-all
- rda-event-gateway
- rdac
- rdac-full
Please make sure the 7.2.2.1 image tag is downloaded for the below RDAF OIA Application services.
- rda-app-controller
- rda-alert-processor
- rda-file-browser
- rda-smtp-server
- rda-ingestion-tracker
- rda-reports-registry
- rda-ml-config
- rda-event-consumer
- rda-webhook-server
- rda-irm-service
- rda-alert-ingester
- rda-collaboration
- rda-notification-service
- rda-configuration-service
Downloaded Docker images are stored under the below path.
/opt/rdaf/data/docker/registry/v2
Run the below command to check the filesystem's disk usage on which docker images are stored.
Optionally, if required, older image tags that are no longer used can be deleted to free up disk space using the below command.
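For background, the disk-usage check and what "deleting an image tag" means on a plain Docker v2 registry can be sketched as follows; the registry address, repository, and tag below are examples, and the rdaf-provided cleanup command from this guide should be preferred:

```shell
# Disk usage of the filesystem holding the on-prem registry data
df -h /opt/rdaf/data/docker/registry/v2

# Sketch: a Docker v2 registry deletes a tag by its manifest digest (example names)
REGISTRY=192.168.10.10:5000
REPO=rda-worker-all
TAG=3.2.1.3
DIGEST=$(curl -sI -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
    "http://$REGISTRY/v2/$REPO/manifests/$TAG" \
  | awk -F': ' 'tolower($1)=="docker-content-digest" {print $2}' | tr -d '\r')
curl -s -X DELETE "http://$REGISTRY/v2/$REPO/manifests/$DIGEST"
# The registry then needs a garbage-collection pass to actually reclaim disk space
```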
1.3. Upgrade Services
1.3.1 Upgrade RDAF Infra Services
Download the below upgrade script and copy it to RDAF management VM on which rdaf deployment CLI was installed.
Please run the downloaded upgrade script. It configures and applies the below changes.
- Creates a new Kafka user specifically for Kafka topics that need to be exposed to external systems to publish data such as events, alerts, or notifications.
- Updates the /opt/rdaf/config/network_config/config.json file with the newly created Kafka user's credentials.
- Creates and applies a lifecycle management policy for OpenSearch's default security audit logs index to purge older data. It is configured to purge data that is older than 15 days.
- Updates the /opt/rdaf/deployment-scripts/values.yaml file to add support for the new alert-processor-companion service. It also updates the rda-worker service configuration to attach a new persistent volume. The persistent volume is created from the local host's directory path /opt/rdaf/config/worker/rda_packages on which the rda-worker service is running.
Important
Please make sure the above upgrade script is executed before moving to the next step.
- Update the kafka-values.yaml file with the below parameters.
Tip
- The upgrade script generates a kafka-values.yaml.latest file in the /opt/rdaf/deployment-scripts/ directory which has the updated configuration.
- Please take a backup of the kafka-values.yaml file before making changes.
- Please skip the changes if the current kafka-values.yaml file already has the below mentioned parameters.
Edit the kafka-values.yaml file.
Find the below parameter and delete it if it exists.
Add the below highlighted parameters. Please skip if these are already configured.
global:
  imagePullSecrets:
    - cfxregistry-cred
image:
  registry: 192.168.10.10:5000
  repository: rda-platform-kafka
  tag: 1.0.2
  pullPolicy: Always
heapOpts: -Xmx2048m -Xms2048m
defaultReplicationFactor: 3
offsetsTopicReplicationFactor: 3
transactionStateLogReplicationFactor: 3
transactionStateLogMinIsr: 2
maxMessageBytes: '8399093'
numPartitions: 15
externalAccess:
  enabled: true
  autoDiscovery:
    enabled: true
  service:
    type: NodePort
    nodePorts:
      - 31252
      - 31533
      - 31964
serviceAccount:
  create: true
rbac:
  create: true
authorizerClassName: kafka.security.authorizer.AclAuthorizer
logRetentionHours: 24
allowEveryoneIfNoAclFound: true
Apply the above configuration changes to the Kafka infra service.
- Please wait till all of the Kafka service pods are in Running state.
- Please make sure all infra services are in Running state before moving to the next section.
- Additionally, please run the below command to make sure there are no errors with the RDA Fabric services.
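At the Kubernetes level, a quick supporting check (the RDA-level command referred to above remains the authoritative one) is to look for any pod that is not Running or Completed:

```shell
# Any pod not Running/Completed indicates a problem; empty output is good
kubectl get pods -n rda-fabric --no-headers | grep -Ev 'Running|Completed' || true
```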
1.3.2 Upgrade RDAF Platform Services
Step-1: Run the below command to initiate upgrading RDAF Platform services.
As the upgrade procedure is non-disruptive, it puts the currently running PODs into Terminating state and the newer version PODs into Pending state.
Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each Platform service is in Terminating state. (Note: If a POD is in ContainerCreating state, please wait until it has transitioned into Terminating state.)
Step-3: Run the below command to put all Terminating RDAF platform service PODs into maintenance mode. It lists the POD Ids of all platform services that need to be put into maintenance mode, along with the corresponding rdac maintenance command.
Note
If maint_command.py script doesn't exist on RDAF deployment CLI VM, it can be downloaded using the below command.
Step-4: Copy & Paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the RDAF platform services.
Warning
Wait for 120 seconds before executing Step-6.
Step-6: Run the below command to delete the Terminating RDAF platform service PODs
for pod in $(kubectl get pods -n rda-fabric -l app_category=rdaf-platform | grep 'Terminating' | awk '{print $1}'); do
  kubectl delete pod "$pod" -n rda-fabric --force
done
Note
Repeat the above steps from Step-2 to Step-6 for the rest of the RDAF Platform service PODs.
Please wait till all of the new platform service PODs are in Running state, then run the below command to verify their status and make sure all of them are running with the 3.2.2.1 version.
+----------------+----------------+-----------------+--------------+---------+
| Name           | Host           | Status          | Container Id | Tag     |
+----------------+----------------+-----------------+--------------+---------+
| rda-api-server | 192.168.131.45 | Up 19 Hours ago | 4d5adbbf954b | 3.2.2.1 |
| rda-api-server | 192.168.131.44 | Up 19 Hours ago | 2c58bccaf38d | 3.2.2.1 |
| rda-registry   | 192.168.131.44 | Up 20 Hours ago | 408a4ddcc685 | 3.2.2.1 |
| rda-registry   | 192.168.131.45 | Up 20 Hours ago | 4f01fc820585 | 3.2.2.1 |
| rda-identity   | 192.168.131.44 | Up 20 Hours ago | bdd1e91f86ec | 3.2.2.1 |
| rda-identity   | 192.168.131.45 | Up 20 Hours ago | e63af9c6e9d9 | 3.2.2.1 |
| rda-fsm        | 192.168.131.45 | Up 20 Hours ago | 3ec246cf7edd | 3.2.2.1 |
+----------------+----------------+-----------------+--------------+---------+
Run the below command and check that one of the rda-scheduler service instances is elected as leader (shown under the Site column).
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host           | ID       | Site        | Age      | CPUs   | Memory(GB)   | Active Jobs   | Total Jobs   |
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+
| Infra | api-server                             | True        | rda-api-server | 35a17877 |             | 20:15:37 | 8      | 31.33        |               |              |
| Infra | api-server                             | True        | rda-api-server | 8f678e25 |             | 20:14:39 | 8      | 31.33        |               |              |
| Infra | collector                              | True        | rda-collector- | 17ce190d |             | 20:47:41 | 8      | 31.33        |               |              |
| Infra | collector                              | True        | rda-collector- | 6b91bf23 |             | 20:47:22 | 8      | 31.33        |               |              |
| Infra | registry                               | True        | rda-registry-5 | 4ee8ef7d |             | 20:48:20 | 8      | 31.33        |               |              |
| Infra | registry                               | True        | rda-registry-5 | 895b7f5c |             | 20:47:39 | 8      | 31.33        |               |              |
| Infra | scheduler                              | True        | rda-scheduler- | ab79ba8d |             | 20:47:43 | 8      | 31.33        |               |              |
| Infra | scheduler                              | True        | rda-scheduler- | f2cefc92 | *leader*    | 20:47:23 | 8      | 31.33        |               |              |
| Infra | worker                                 | True        | rda-worker-df5 | e2174794 | rda-site-01 | 20:28:50 | 8      | 31.33        | 1             | 97           |
| Infra | worker                                 | True        | rda-worker-df5 | 6debca1d | rda-site-01 | 20:26:08 | 8      | 31.33        | 2             | 91           |
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+
Run the below command to check that all services have an ok status and do not throw any failure messages.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat       | Pod-Type                               | Host         | ID       | Site        | Health Parameter                                    | Status   | Message                                                     |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app   | alert-ingester                         | rda-alert-in | 1afb8c8d |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 1afb8c8d |             | minio-connectivity                                  | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 1afb8c8d |             | service-dependency:configuration-service            | ok       | 2 pod(s) found for configuration-service                    |
| rda_app   | alert-ingester                         | rda-alert-in | 1afb8c8d |             | service-initialization-status                       | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 1afb8c8d |             | kafka-connectivity                                  | ok       | Cluster=nzyeX9qkR-ChWXC0fRvSyQ, Broker=0, Brokers=[0, 2, 1] |
| rda_app   | alert-ingester                         | rda-alert-in | 5751f199 |             | service-status                                      | ok       |                                                             |
| rda_app   | alert-ingester                         | rda-alert-in | 5751f199 |             | minio-connectivity                                  | ok       |                                                             |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
1.3.3 Upgrade RDAC CLI
1.3.4 Upgrade RDA Worker Services
Step-1: Please run the below command to initiate upgrading the RDA Worker service PODs.
Tip
If the RDA worker is deployed in an HTTP proxy environment, add the required environment variables for the HTTP proxy settings in /opt/rdaf/deployment-scripts/values.yaml under the rda_worker section. Below is a sample HTTP proxy configuration for the worker service.
rda_worker:
  mem_limit: 8G
  memswap_limit: 8G
  privileged: false
  environment:
    RDA_ENABLE_TRACES: 'no'
    RDA_SELF_HEALTH_RESTART_AFTER_FAILURES: 3
  extraEnvs:
    - name: http_proxy
      value: "http://user:[email protected]:3128"
    - name: https_proxy
      value: "http://user:[email protected]:3128"
    - name: HTTP_PROXY
      value: "http://user:[email protected]:3128"
    - name: HTTPS_PROXY
      value: "http://user:[email protected]:3128"
Step-2: Run the below command to check the status of the existing and newer worker PODs and make sure at least one instance of each RDA Worker service POD is in Terminating state.
(Note: If a POD is in ContainerCreating state, please wait until it has transitioned into Terminating state.)
Step-3: Run the below command to put all Terminating RDAF worker service PODs into maintenance mode. It lists the POD Ids of all RDA worker services that need to be put into maintenance mode, along with the corresponding rdac maintenance command.
Step-4: Copy & Paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the RDAF worker services.
Warning
Wait for 120 seconds before executing Step-6.
Step-6: Run the below command to delete the Terminating RDAF worker service PODs
for pod in $(kubectl get pods -n rda-fabric -l app_component=rda-worker | grep 'Terminating' | awk '{print $1}'); do
  kubectl delete pod "$pod" -n rda-fabric --force
done
Note
Repeat the above steps from Step-2 to Step-6 for the rest of the RDAF Worker service PODs.
Please wait till all of the new worker service pods are in Running state.
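Using the same app_component=rda-worker label selector as the delete loop above, the wait can be checked non-interactively:

```shell
# List worker pods that are not yet Running; empty output means the rollout is complete
kubectl get pods -n rda-fabric -l app_component=rda-worker --no-headers | grep -v 'Running' || true
```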
Step-7: Please wait for 120 seconds to let the newer version of RDA Worker service PODs join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA Worker service PODs.
+------------+----------------+----------------+--------------+---------+
| Name       | Host           | Status         | Container Id | Tag     |
+------------+----------------+----------------+--------------+---------+
| rda-worker | 192.168.131.44 | Up 6 Hours ago | eb679ed8a6c6 | 3.2.2.1 |
| rda-worker | 192.168.131.45 | Up 6 Hours ago | a3356b168c50 | 3.2.2.1 |
+------------+----------------+----------------+--------------+---------+
Step-8: Run the below command to check that all RDA Worker services have an ok status and do not throw any failure messages.
1.3.5 Upgrade OIA Application Services
Step-1: Run the below commands to initiate upgrading RDAF OIA Application services.
Step-2: Run the below command to check the status of the existing and newer PODs and make sure at least one instance of each OIA application service is in Terminating state.
(Note: If a POD is in ContainerCreating state, please wait until it has transitioned into Terminating state.)
Step-3: Run the below command to put all Terminating OIA application service PODs into maintenance mode. It lists the POD Ids of all OIA application services that need to be put into maintenance mode, along with the corresponding rdac maintenance command.
Step-4: Copy & Paste the rdac maintenance command as below.
Step-5: Run the below command to verify the maintenance mode status of the OIA application services.
Warning
Wait for 120 seconds before executing Step-6.
Step-6: Run the below command to delete the Terminating OIA application service PODs
for pod in $(kubectl get pods -n rda-fabric -l app_name=oia | grep 'Terminating' | awk '{print $1}'); do
  kubectl delete pod "$pod" -n rda-fabric --force
done
Note
Repeat the above steps from Step-2 to Step-6 for the rest of the OIA application service PODs.
Please wait till all of the new OIA application service PODs are in Running state, then run the below command to verify their status and make sure they are running with the 7.2.2.1 version.
+-------------------------------+----------------+-----------------+--------------+---------+
| Name                          | Host           | Status          | Container Id | Tag     |
+-------------------------------+----------------+-----------------+--------------+---------+
| rda-alert-ingester            | 192.168.131.50 | Up 1 Days ago   | a400c11be238 | 7.2.2.1 |
| rda-alert-ingester            | 192.168.131.49 | Up 1 Days ago   | 5187d5a093a5 | 7.2.2.1 |
| rda-alert-processor           | 192.168.131.46 | Up 1 Days ago   | 34901aba5e7d | 7.2.2.1 |
| rda-alert-processor           | 192.168.131.47 | Up 1 Days ago   | e6fe0aa7ffe4 | 7.2.2.1 |
| rda-alert-processor-companion | 192.168.131.50 | Up 1 Days ago   | 8e3cc2f3b252 | 7.2.2.1 |
| rda-alert-processor-companion | 192.168.131.49 | Up 1 Days ago   | 4237fb52031c | 7.2.2.1 |
| rda-app-controller            | 192.168.131.47 | Up 1 Days ago   | fbe360d13fa3 | 7.2.2.1 |
| rda-app-controller            | 192.168.131.46 | Up 1 Days ago   | 8346f5c69e7b | 7.2.2.1 |
+-------------------------------+----------------+-----------------+--------------+---------+
Step-7: Run the below command to verify that all OIA application services are up and running. Please wait till the cfxdimensions-app-irm_service shows leader status under the Site column.
Run the below command to check that all OIA application services have an ok status and do not throw any failure messages.
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+
| Cat   | Pod-Type                               | Pod-Ready   | Host           | ID       | Site        | Age      | CPUs   | Memory(GB)   | Active Jobs   | Total Jobs   |
|-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------|
| App   | cfxdimensions-app-collaboration        | True        | rda-collaborat | ba007878 |             | 22:57:58 | 8      | 31.33        |               |              |
| App   | cfxdimensions-app-file-browser         | True        | rda-file-brows | bf349af7 |             | 23:00:54 | 8      | 31.33        |               |              |
| App   | cfxdimensions-app-file-browser         | True        | rda-file-brows | 46c7c2dc |             | 22:52:17 | 8      | 31.33        |               |              |
| App   | cfxdimensions-app-irm_service          | True        | rda-irm-servic | 34698062 |             | 23:00:23 | 8      | 31.33        |               |              |
| App   | cfxdimensions-app-irm_service          | True        | rda-irm-servic | b824b35b | *leader*    | 22:50:33 | 8      | 31.33        |               |              |
| App   | cfxdimensions-app-notification-service | True        | rda-notificati | 73d2c7f9 |             | 23:01:23 | 8      | 31.33        |               |              |
| App   | cfxdimensions-app-notification-service | True        | rda-notificati | bac009ba |             | 22:59:05 | 8      | 31.33        |               |              |
| App   | cfxdimensions-app-resource-manager     | True        | rda-resource-m | 3e164b71 |             | 23:25:24 | 8      | 31.33        |               |              |
| App   | cfxdimensions-app-resource-manager     | True        | rda-resource-m | dba599c6 |             | 23:25:00 | 8      | 31.33        |               |              |
| App   | configuration-service                  | True        | rda-configurat | dd7ec9d9 |             | 5:46:22  | 8      | 31.33        |               |              |
+-------+----------------------------------------+-------------+----------------+----------+-------------+----------+--------+--------------+---------------+--------------+
1.4 Post Installation Steps
- Deploy the latest L1 & L2 bundles. Go to Configuration --> RDA Administration --> Bundles --> select oia_l1_l2_bundle and click on the deploy action.
- Download the script from the below path to migrate the UI-Icon URL from private to public.
Tip
Running this script is an optional step to perform only if (1) white-labeling customization was done on the Login page with an uploaded image before the version upgrade, and (2) the custom image is no longer showing up on the Login page after the upgrade.
- Copy the above script to the rda_identity platform service container. Run the below command to get the container id for rda_identity and the host IP on which it is running.
+--------------------------+----------------+------------+--------------+---------+
| Name                     | Host           | Status     | Container Id | Tag     |
+--------------------------+----------------+------------+--------------+---------+
| rda_api_server           | 192.168.107.40 | Up 7 hours | 6540e670fdb9 | 3.2.2.1 |
| rda_registry             | 192.168.107.40 | Up 7 hours | 98bead8d0599 | 3.2.2.1 |
| ...                      | ...            | ...        | ...          | ...     |
| rda_identity             | 192.168.107.40 | Up 7 hours | 67940390d61f | 3.2.2.1 |
| rda_fsm                  | 192.168.107.40 | Up 7 hours | a96870b5f32f | 3.2.2.1 |
| cfx-rda-access-manager   | 192.168.107.40 | Up 7 hours | 33362030fe52 | 3.2.2.1 |
| cfx-rda-resource-manager | 192.168.107.40 | Up 7 hours | 54a082b754be | 3.2.2.1 |
| cfx-rda-user-preferences | 192.168.107.40 | Up 7 hours | 829f95d98741 | 3.2.2.1 |
+--------------------------+----------------+------------+--------------+---------+
- Log in, as rdauser via SSH, to the host on which the rda_identity service is running, and run the below command to copy the above downloaded script.
- Run the below command to switch into the rda_identity service's container shell.
- Execute the below command to migrate the customer branding (white labeling) changes.
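The copy and container-shell steps above can be sketched with standard Docker commands; the container id is illustrative (take yours from the listing above) and the script filename is a hypothetical placeholder:

```shell
CONTAINER=67940390d61f                             # rda_identity container id from the listing above
docker cp ui_icon_migrate.py "$CONTAINER":/tmp/    # hypothetical script filename
docker exec -it "$CONTAINER" /bin/bash             # switch into the rda_identity container shell
```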
- In this new version (7.2.2.1), the suppression policy adds support for reading data from a pstream to suppress alerts. As a prerequisite for this feature to work, the pstream that is going to be used in a suppression policy should be configured with attr_name and its value, using which it can filter the alerts to apply the suppression policy. Additionally, the attributes start_time_utc and end_time_utc should be in ISO datetime format.
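For illustration only (the exact pstream schema is defined by your suppression policy configuration, and attr_value here is a hypothetical field name), a record in such a pstream might combine a filter attribute with ISO-format time bounds:

```json
{
  "attr_name": "host_name",
  "attr_value": "nyc-web-01",
  "start_time_utc": "2024-05-01T00:00:00Z",
  "end_time_utc": "2024-05-01T04:00:00Z"
}
```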
- This new version also adds a feature to enrich incoming alerts using either a dataset or a pstream, or both, within each alert's source mapper configuration. Below is a sample configuration for reference on how to use the dataset_enrich and stream_enrich functions within the alert mapper.
Dataset based enrichment:
- name: Dataset name
- condition: A CFXQL-based condition, which can be defined with one or more conditions with AND and OR between each condition. Each condition is evaluated in the specified order, and the enrichment value(s) are picked from whichever condition matches.
- enriched_columns: Specify one or more attributes to be selected as enriched attributes on the above condition match. When no attribute is specified, all of the available attributes are picked.
{
  "func": {
    "dataset_enrich": {
      "name": "nagios-host-group-members",
      "condition": "host_name is '$assetName'",
      "enriched_columns": "group_id,hostgroup_name"
    }
  }
}
Pstream based enrichment:
- name: Pstream name
- condition: A CFXQL-based condition, which can be defined with one or more conditions with AND and OR between each condition. Each condition is evaluated in the specified order, and the enrichment value(s) are picked from whichever condition matches.
- enriched_columns: Specify one or more attributes to be selected as enriched attributes on the above condition match. When no attribute is specified, all of the available attributes are picked.
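By analogy with the dataset_enrich example above, a stream_enrich entry might look like the following sketch; the pstream name and enriched column names are placeholders:

```json
{
  "func": {
    "stream_enrich": {
      "name": "asset-inventory-pstream",
      "condition": "host_name is '$assetName'",
      "enriched_columns": "site,owner"
    }
  }
}
```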