Upgrade to 3.2.2 and 7.2.2
1. Upgrade from 7.2.1.x to 7.2.2
1.1. Pre-requisites
Below are the prerequisites that must be in place before upgrading the OIA (AIOps) application services.
- RDAF Deployment CLI Version: 1.1.8
- RDAF Infrastructure Services Tag Version: 1.0.2, 1.0.2.1 (nats)
- RDAF Core Platform & Worker Services Tag Version: 3.2.1 / 3.2.1.x
- RDAF Client (RDAC) Tag Version: 3.2.1 / 3.2.1.x
- OIA Services Tag Version: 7.2.1 / 7.2.1.x
- CloudFabrix recommends taking VMware snapshots of the VMs on which the AIOps solution is deployed.
Important
Applicable only if FSM is configured for ITSM ticketing:
Before proceeding with the upgrade, please make sure to disable the below Service Blueprints.
- Create Ticket
- Update Ticket
- Resolve Ticket
- Read Alert Stream
- Read Incident Stream
- Read ITSM ticketing Inbound Notifications
Warning
Make sure all of the above pre-requisites are met before proceeding with the upgrade process.
Warning
Upgrading the RDAF Platform and AIOps application services is a disruptive operation. Schedule a maintenance window before upgrading the RDAF Platform and AIOps services to a newer version.
- Download the RDAF Deployment CLI's newer version 1.1.9 bundle
- Upgrade the rdaf CLI to version 1.1.9
- Verify that the installed rdaf CLI version is upgraded to 1.1.9
- Download the RDAF Deployment CLI's newer version 1.1.9 bundle and copy it to the RDAF management VM on which the rdaf deployment CLI was installed.
- Extract the rdaf CLI software bundle contents
- Change the directory to the extracted directory
- Upgrade the rdaf CLI to version 1.1.9
- Verify the installed rdaf CLI version
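When checking the installed version, compare versions numerically rather than as plain strings, because a string comparison ranks "1.1.10" below "1.1.9". The minimal Python sketch below assumes you have copied the version string reported by the rdaf CLI by hand; parse_version and is_at_least are hypothetical helpers and are not part of the rdaf tooling.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '1.1.9' into (1, 1, 9)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_at_least(installed: str, required: str) -> bool:
    """Return True if `installed` is equal to or newer than `required`.

    Integer tuples avoid the string-comparison trap where '1.1.10' < '1.1.9'.
    """
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.1' and '1.1.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(is_at_least("1.1.9", "1.1.9"))   # True
print(is_at_least("1.1.8", "1.1.9"))   # False
print(is_at_least("1.1.10", "1.1.9"))  # True
```

This sketch only handles purely numeric parts such as 1.1.9 or 1.0.2.1. A suffix such as "rc1" would raise a ValueError.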
- To stop OIA (AIOps) application services, run the below command. Wait until all of the services are stopped.
- To stop RDAF worker services, run the below command. Wait until all of the services are stopped.
- To stop RDAF platform services, run the below command. Wait until all of the services are stopped.
- Upgrade Kafka using the below command.
Run the below RDAF command to check the infra status.
+----------------+--------------+-----------------+--------------+------------------------------+
| Name | Host | Status | Container Id | Tag |
+----------------+--------------+-----------------+--------------+------------------------------+
| haproxy | 192.168.107.40 | Up 2 weeks | 92875cebe689 | 1.0.2 |
| keepalived | 192.168.107.40 | Not Provisioned | N/A | N/A |
| nats | 192.168.107.41 | Up 2 weeks | e365e0b794c7 | 1.0.2.1 |
| minio | 192.168.107.41 | Up 2 weeks | 900c8b078059 | RELEASE.2022-11-11T03-44-20Z |
| mariadb | 192.168.107.41 | Up 2 weeks | c549e07c2688 | 1.0.2 |
| opensearch | 192.168.107.41 | Up 2 weeks | 783204d75ba9 | 1.0.2 |
| zookeeper | 192.168.107.41 | Up 2 weeks | f51138ff8a95 | 1.0.2 |
| kafka | 192.168.107.41 | Up 4 days | 255020d998c9 | 1.0.2 |
| redis | 192.168.107.41 | Up 2 weeks | 5d929327121d | 1.0.2 |
| redis-sentinel | 192.168.107.41 | Up 2 weeks | 4a5fdde49a21 | 1.0.2 |
+----------------+--------------+-----------------+--------------+------------------------------+
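On larger clusters, scanning this table by eye is error-prone. The Python sketch below parses the status table once you have pasted the command output into a string, and it lists any service whose Status does not begin with "Up". It treats "Not Provisioned" as acceptable. This is an assumption based on the sample output, where keepalived is not provisioned. Pass ignore=() if your deployment expects keepalived to be running. The function names are hypothetical.

```python
def parse_ascii_table(text: str) -> list:
    """Parse a table drawn with '+---+' borders and '|' separators into row dicts."""
    header, rows = None, []
    for raw in text.splitlines():
        line = raw.strip()
        # Border lines contain only '+', '-' and '|' characters; skip them.
        if not line.startswith("|") or set(line) <= set("+-|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first data-looking line is the header row
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def unhealthy_infra(table_text: str, ignore=("Not Provisioned",)) -> list:
    """Return 'name@host: status' for every service whose status is not 'Up ...'."""
    problems = []
    for row in parse_ascii_table(table_text):
        status = row.get("Status", "")
        if status in ignore or status.startswith("Up"):
            continue
        problems.append(f"{row.get('Name')}@{row.get('Host')}: {status}")
    return problems
```

For the sample output shown above, unhealthy_infra returns an empty list, because every provisioned service reports "Up".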
Run the below RDAF command to check the infra healthcheck status.
+----------------+-----------------+--------+----------------------+--------------+--------------+
| Name | Check | Status | Reason | Host | Container Id |
+----------------+-----------------+--------+----------------------+--------------+--------------+
| haproxy | Port Connection | OK | N/A | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy | Service Status | OK | N/A | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy | Firewall Port | OK | N/A | 192.168.107.63 | ed0e8a4f95d6 |
| haproxy | Port Connection | OK | N/A | 192.168.107.64 | 91c361ea0f58 |
| haproxy | Service Status | OK | N/A | 192.168.107.64 | 91c361ea0f58 |
| haproxy | Firewall Port | OK | N/A | 192.168.107.64 | 91c361ea0f58 |
| keepalived | Service Status | OK | N/A | 192.168.107.63 | N/A |
| keepalived | Service Status | OK | N/A | 192.168.107.64 | N/A |
| nats | Port Connection | OK | N/A | 192.168.107.63 | f57ed825681b |
| nats | Service Status | OK | N/A | 192.168.107.63 | f57ed825681b |
+----------------+-----------------+--------+----------------------+--------------+--------------+
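The healthcheck table can be filtered in the same way. The Python sketch below assumes that the pasted output has the columns shown in the sample (Name, Check, Status, Reason, Host, Container Id). It returns every check whose Status is not OK, so an empty list means all checks passed. failed_health_checks is a hypothetical helper.

```python
def failed_health_checks(table_text: str) -> list:
    """Return (name, check, host, reason) for every row whose Status is not OK."""
    header, failures = None, []
    for raw in table_text.splitlines():
        line = raw.strip()
        if not line.startswith("|") or set(line) <= set("+-|"):
            continue  # skip '+----+' border lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
            continue
        row = dict(zip(header, cells))
        if row.get("Status", "").upper() != "OK":
            failures.append(
                (row.get("Name"), row.get("Check"), row.get("Host"), row.get("Reason"))
            )
    return failures
```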
- Run the below Python upgrade script. It applies the following configuration and settings:
- Create Kafka topics and set the maximum topic message size to 8 MB
- Create the kafka-external user in config.json
- Add the settings for the new alert-processor companion service to values.yaml
- Configure and apply the security index purge policy for OpenSearch
Important
Take a backup of /opt/rdaf/deployment-scripts/values.yaml before running the below upgrade script.
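One way to take this backup is with a timestamped copy, which prevents an earlier backup from being overwritten. The Python sketch below uses only the standard library. The ".bak-<timestamp>" naming is a convention chosen for this sketch and is not something the upgrade script expects.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Path taken from this guide; adjust if your deployment scripts live elsewhere.
VALUES_FILE = Path("/opt/rdaf/deployment-scripts/values.yaml")


def backup_file(path: Path) -> Path:
    """Copy `path` to '<name>.bak-<timestamp>' in the same directory and return it.

    The '.bak-<timestamp>' suffix is only a naming convention for this sketch.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = path.with_name(f"{path.name}.bak-{stamp}")
    if backup.exists():
        raise FileExistsError(f"refusing to overwrite existing backup {backup}")
    shutil.copy2(path, backup)  # copy2 also preserves permissions and mtime
    return backup
```

On the management VM, call backup_file(VALUES_FILE) as a user who can read the file and write to its directory, and keep the returned path in case you need to restore it.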
Important
Make sure the above upgrade script has been executed before moving to the next step.
1.2. Download the new Docker Images
Download the new Docker image tags for the RDAF Platform and OIA Application services and wait until all of the images are downloaded.
Run the below command to verify that the above-mentioned tags are downloaded for all of the RDAF Platform and OIA Application services.
Make sure the 3.2.2 image tag is downloaded for the following RDAF Platform services.
- rda-client-api-server
- rda-registry
- rda-rda-scheduler
- rda-collector
- rda-stack-mgr
- rda-identity
- rda-fsm
- rda-access-manager
- rda-resource-manager
- rda-user-preferences
- onprem-portal
- onprem-portal-nginx
- rda-worker-all
- onprem-portal-dbinit
- cfxdx-nb-nginx-all
- rda-event-gateway
- rdac
- rdac-full
Make sure the 7.2.2 image tag is downloaded for the following RDAF OIA Application services.
- rda-app-controller
- rda-alert-processor
- rda-file-browser
- rda-smtp-server
- rda-ingestion-tracker
- rda-reports-registry
- rda-ml-config
- rda-event-consumer
- rda-webhook-server
- rda-irm-service
- rda-alert-ingester
- rda-collaboration
- rda-notification-service
- rda-configuration-service
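The two lists above can be checked programmatically. The Python sketch below copies the image names and tags from these lists. It assumes you have already built a mapping from image name to the set of locally downloaded tags, for example from the verification command's output. That step is not shown here, and the names must match exactly, without any registry prefix. missing_images is a hypothetical helper.

```python
# Image names and tags copied from the lists in this guide.
PLATFORM_IMAGES = [
    "rda-client-api-server", "rda-registry", "rda-rda-scheduler", "rda-collector",
    "rda-stack-mgr", "rda-identity", "rda-fsm", "rda-access-manager",
    "rda-resource-manager", "rda-user-preferences", "onprem-portal",
    "onprem-portal-nginx", "rda-worker-all", "onprem-portal-dbinit",
    "cfxdx-nb-nginx-all", "rda-event-gateway", "rdac", "rdac-full",
]
OIA_IMAGES = [
    "rda-app-controller", "rda-alert-processor", "rda-file-browser", "rda-smtp-server",
    "rda-ingestion-tracker", "rda-reports-registry", "rda-ml-config",
    "rda-event-consumer", "rda-webhook-server", "rda-irm-service",
    "rda-alert-ingester", "rda-collaboration", "rda-notification-service",
    "rda-configuration-service",
]
REQUIRED = {"3.2.2": PLATFORM_IMAGES, "7.2.2": OIA_IMAGES}


def missing_images(available: dict) -> dict:
    """Map each required tag to the image names that do not have it locally.

    `available` maps an image name to the set of tags already downloaded; how
    that mapping is collected is an assumption left to the reader.
    """
    return {
        tag: [name for name in names if tag not in available.get(name, set())]
        for tag, names in REQUIRED.items()
    }
```

An empty list for both tags means every image listed above has the required tag downloaded.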
Downloaded Docker images are stored under the below path.
/opt/rdaf/data/docker/registry/v2
Run the below command to check the disk usage of the filesystem on which the Docker images are stored.
Optionally, if required, older image tags that are no longer used can be deleted using the below command to free up disk space.
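If you want to check disk usage from a script, Python's standard library can report usage for the filesystem that holds the image registry path given above. The sketch below prints an approximate summary. It also includes a threshold check in which min_free_gib is a hypothetical value that you choose yourself, since this guide does not state a minimum.

```python
import shutil

# Location of downloaded images, as given in this guide.
REGISTRY_PATH = "/opt/rdaf/data/docker/registry/v2"
GIB = 1024 ** 3


def disk_usage_report(path: str) -> str:
    """Summarise usage of the filesystem that contains `path` (approximate)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    percent = usage.used / usage.total * 100
    return (f"{path}: {usage.used / GIB:.1f} GiB used of {usage.total / GIB:.1f} GiB "
            f"({percent:.1f}%), {usage.free / GIB:.1f} GiB free")


def has_free_space(path: str, min_free_gib: float) -> bool:
    """True if at least `min_free_gib` GiB is free on the filesystem holding `path`."""
    return shutil.disk_usage(path).free >= min_free_gib * GIB
```

On the VM, call disk_usage_report(REGISTRY_PATH). The path must exist, otherwise shutil.disk_usage raises FileNotFoundError.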
1.3. Upgrade Services
1.3.1 Upgrade RDAF Platform Services
Run the below command to initiate upgrading RDAF Platform services.
Wait until all of the new platform services are in Running state, then run the below command to verify their status and make sure all of them are running with the 3.2.2 version.
+---------------------+--------------+------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+---------------------+--------------+------------+--------------+-------+
| rda_api_server | 192.168.107.60 | Up 4 hours | 0da7ebeadceb | 3.2.2 |
| rda_registry | 192.168.107.60 | Up 4 hours | 841a4e03447d | 3.2.2 |
| rda_scheduler | 192.168.107.60 | Up 4 hours | 806af221a299 | 3.2.2 |
| rda_collector | 192.168.107.60 | Up 4 hours | 9ae8da4d2182 | 3.2.2 |
| rda_asset_dependency | 192.168.107.60 | Up 4 hours | e96cf642b2d6 | 3.2.2 |
| rda_identity | 192.168.107.60 | Up 4 hours | 2a57ce63a756 | 3.2.2 |
| rda_fsm | 192.168.107.60 | Up 4 hours | 2b645a75b5f0 | 3.2.2 |
+---------------------+--------------+------------+--------------+-------+
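To confirm the 3.2.2 tag across many platform services, a small parser helps. Terminal output can wrap a long name onto a second row, for example rda_asset_dependency split into "rda_asset_dependenc" and "y", so the Python sketch below rejoins such rows before checking the Tag column. Treating a row with only its first cell filled as a continuation is an assumption about how the wrapping looks. The helper names are hypothetical.

```python
def parse_status_rows(table_text: str) -> list:
    """Parse a '+---+' bordered status table, re-joining names the CLI wrapped.

    A line whose first cell is non-empty but whose other cells are all empty is
    treated as the continuation of the previous row's first cell.
    """
    header, rows = None, []
    for raw in table_text.splitlines():
        line = raw.strip()
        if not line.startswith("|") or set(line) <= set("+-|"):
            continue  # skip border lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        elif rows and cells[0] and not any(cells[1:]):
            rows[-1][header[0]] += cells[0]  # wrapped name fragment
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def wrong_tags(table_text: str, expected_tag: str) -> dict:
    """Map service name -> actual tag for every row not running `expected_tag`."""
    return {row["Name"]: row.get("Tag", "")
            for row in parse_status_rows(table_text)
            if row.get("Tag") != expected_tag}
```

After a successful platform upgrade, wrong_tags(output, "3.2.2") should return an empty dictionary.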
1.3.2 Upgrade RDAC CLI
Run the below command to upgrade the rdac CLI.
Run the below command to verify that one of the scheduler services is elected as a leader under the Site column.
+-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------|
| App | fsm | True | 8b5dfca4cce9 | c0a8bbd7 | | 7:33:16 | 8 | 31.21 | | |
| App | ingestion-tracker | True | d37e78507693 | e1bd1405 | | 7:21:16 | 8 | 31.21 | | |
| App | ml-config | True | 0c73604632bc | 65594689 | | 7:22:02 | 8 | 31.21 | | |
| App | reports-registry | True | be82a9e704a2 | 567f1275 | | 7:25:23 | 8 | 31.21 | | |
| App | smtp-server | True | 08a8dd347660 | 06242bab | | 7:23:35 | 8 | 31.21 | | |
| App | user-preferences | True | fc7a4a5a0591 | 53dce7ca | | 7:32:25 | 8 | 31.21 | | |
| App | webhook-server | True | 20a2afb33b6c | fdb1eb21 | | 7:23:53 | 8 | 31.21 | | |
| Infra | api-server | True | b1e7105b231e | 33f6ed2c | | 2:04:53 | 8 | 31.21 | | |
| Infra | collector | True | f5abb5cac9a5 | eb17ce02 | | 3:50:51 | 8 | 31.21 | | |
| Infra | registry | True | ce73263c7828 | 8cda9974 | | 7:34:05 | 8 | 31.21 | | |
| Infra | scheduler | True | d9d62c1f1bb7 | 96047389 | *leader* | 7:33:59 | 8 | 31.21 | | |
| Infra | worker | True | ba1198f05f6b | afd229a8 | rda-site-01 | 7:26:20 | 8 | 31.21 | 7 | 109 |
+-------+----------------------------------------+-------------+--------------+----------+-------------+---------+--------+--------------+---------------+--------------+
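Exactly one scheduler pod should show *leader* in the Site column. The Python sketch below reads the pasted pod table, including its "|----+----|" header separator, and returns the IDs of the leader pods for a given Pod-Type. The same function can be reused for the cfxdimensions-app-irm_service leader check in section 1.3.4. leader_ids is a hypothetical helper.

```python
def leader_ids(table_text: str, pod_type: str) -> list:
    """Return the IDs of `pod_type` pods whose Site column is '*leader*'."""
    header, leaders = None, []
    for raw in table_text.splitlines():
        line = raw.strip()
        # Skip '+---+' borders and the '|---+---|' separator under the header.
        if not line.startswith("|") or set(line) <= set("+-|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
            continue
        row = dict(zip(header, cells))
        if row.get("Pod-Type") == pod_type and row.get("Site") == "*leader*":
            leaders.append(row.get("ID"))
    return leaders
```

A healthy result is a single ID. For the sample above, leader_ids(output, "scheduler") gives ["96047389"].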
Run the below command to check that all services have ok status and do not report any failure messages.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------|
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | service-status | ok | |
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | minio-connectivity | ok | |
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | service-initialization-status | ok | |
| rda_app | alert-ingester | 9a0775246a0f | 8f538695 | | kafka-connectivity | ok | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=3, Brokers=[2, 3, 1] |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | service-status | ok | |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | minio-connectivity | ok | |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | service-dependency:configuration-service | ok | 2 pod(s) found for configuration-service |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | service-initialization-status | ok | |
| rda_app | alert-ingester | 79d6756db639 | 95921403 | | kafka-connectivity | ok | Cluster=F8PAtrvtRk6RbMZgp7deHQ, Broker=1, Brokers=[2, 3, 1] |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------------+
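Because the pod health table has several rows per pod, a summary is easier to review than the raw output. The Python sketch below counts the "ok" checks for each Pod-Type and collects a readable line for every check with any other status. It assumes the column names shown in the sample. health_summary is a hypothetical helper.

```python
from collections import Counter


def health_summary(table_text: str):
    """Count 'ok' checks per pod type and collect every check that is not 'ok'."""
    header, ok_counts, problems = None, Counter(), []
    for raw in table_text.splitlines():
        line = raw.strip()
        if not line.startswith("|") or set(line) <= set("+-|"):
            continue  # skip borders and the header separator
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
            continue
        row = dict(zip(header, cells))
        if row.get("Status") == "ok":
            ok_counts[row.get("Pod-Type")] += 1
        else:
            problems.append(f"{row.get('Pod-Type')} {row.get('ID')}: "
                            f"{row.get('Health Parameter')} = {row.get('Status')} "
                            f"({row.get('Message')})")
    return ok_counts, problems
```

An empty problems list matches the expected state described above.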
1.3.3 Upgrade RDA Worker Services
Run the below command to initiate upgrading the RDA worker service(s).
Tip
If the RDA worker is deployed in an HTTP proxy environment, add the required environment variables for the HTTP proxy settings in /opt/rdaf/deployment-scripts/values.yaml under the rda_worker section. Below is a sample HTTP proxy configuration for the worker service.
rda_worker:
  mem_limit: 8G
  memswap_limit: 8G
  privileged: false
  environment:
    RDA_ENABLE_TRACES: 'no'
    RDA_SELF_HEALTH_RESTART_AFTER_FAILURES: 3
    http_proxy: "http://user:<password>@<proxy-host>:3128"
    https_proxy: "http://user:<password>@<proxy-host>:3128"
    HTTP_PROXY: "http://user:<password>@<proxy-host>:3128"
    HTTPS_PROXY: "http://user:<password>@<proxy-host>:3128"
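The sample sets both lowercase and uppercase variables because different programs read different spellings. Typing the same URL four times makes it easy to drop a quote. The Python sketch below generates the four lines from a single URL. It uses json.dumps for quoting because a JSON string is also a valid YAML double-quoted scalar for typical URLs. The four-space indent assumes that environment is nested under rda_worker, as in the sample. proxy_env_yaml is a hypothetical helper.

```python
import json

PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def proxy_env_yaml(proxy_url: str, indent: int = 4) -> str:
    """Render the four proxy variables as YAML lines for rda_worker.environment.

    json.dumps adds both double quotes and escapes any '"' or '\\' in the URL.
    """
    pad = " " * indent
    return "\n".join(f"{pad}{name}: {json.dumps(proxy_url)}" for name in PROXY_VARS)


print(proxy_env_yaml("http://user:<password>@<proxy-host>:3128"))
```

Paste the printed lines under the environment key in values.yaml, replacing the placeholders with your proxy's credentials and address.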
Wait for 120 seconds to let the newer version of RDA worker services join the RDA Fabric appropriately. Run the below commands to verify the status of the newer RDA worker services.
+------------+--------------+------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+------------+--------------+------------+--------------+-------+
| rda_worker | 192.168.107.60 | Up 4 hours | d968c908d3e3 | 3.2.2 |
+------------+--------------+------------+--------------+-------+
1.3.4 Upgrade OIA/AIA Application Services
Run the below commands to initiate upgrading the RDAF OIA/AIA Application services.
Wait until all of the new OIA/AIA application services are in Running state, then run the below command to verify their status and make sure they are running with the 7.2.2 version. Check that the new service cfx-rda-alert-processor-companion is deployed, and make sure all OIA/AIA services are up with the new tag.
+-----------------------------------+--------------+------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+-----------------------------------+--------------+------------+--------------+-------+
| cfx-rda-app-controller | 192.168.107.60 | Up 3 hours | 017692a218b8 | 7.2.2 |
| cfx-rda-reports-registry | 192.168.107.60 | Up 3 hours | be82a9e704a2 | 7.2.2 |
| cfx-rda-notification-service | 192.168.107.60 | Up 3 hours | 42d3c8c4861c | 7.2.2 |
| cfx-rda-file-browser | 192.168.107.60 | Up 3 hours | 46b9dedab4b0 | 7.2.2 |
| cfx-rda-configuration-service | 192.168.107.60 | Up 3 hours | 6bef9741ff46 | 7.2.2 |
| cfx-rda-alert-ingester | 192.168.107.60 | Up 3 hours | 13975b9efe7d | 7.2.2 |
| cfx-rda-webhook-server | 192.168.107.60 | Up 3 hours | 20a2afb33b6c | 7.2.2 |
| cfx-rda-smtp-server | 192.168.107.60 | Up 3 hours | 08a8dd347660 | 7.2.2 |
| cfx-rda-event-consumer | 192.168.107.60 | Up 3 hours | b0b62c88064a | 7.2.2 |
| cfx-rda-alert-processor | 192.168.107.60 | Up 3 hours | ab24dcbd6e3a | 7.2.2 |
| cfx-rda-irm-service | 192.168.107.60 | Up 3 hours | 11c92a206eaa | 7.2.2 |
| cfx-rda-ml-config | 192.168.107.60 | Up 3 hours | 0c73604632bc | 7.2.2 |
| cfx-rda-collaboration | 192.168.107.60 | Up 3 hours | a5cfe5b681bb | 7.2.2 |
| cfx-rda-ingestion-tracker | 192.168.107.60 | Up 3 hours | d37e78507693 | 7.2.2 |
| cfx-rda-alert-processor-companion | 192.168.107.60 | Up 3 hours | b74d82710af9 | 7.2.2 |
+-----------------------------------+--------------+------------+--------------+-------+
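The Python sketch below automates the three checks in this step. It confirms that every expected service appears in the table (including the new cfx-rda-alert-processor-companion), that every row runs the 7.2.2 tag, and that every row shows "Up". The expected names are copied from the sample table above. Issues are reported per row, so services deployed on several hosts are each checked. check_app_services is a hypothetical helper.

```python
# Service names copied from the sample 7.2.2 status table in this guide.
EXPECTED_OIA_SERVICES = [
    "cfx-rda-app-controller", "cfx-rda-reports-registry",
    "cfx-rda-notification-service", "cfx-rda-file-browser",
    "cfx-rda-configuration-service", "cfx-rda-alert-ingester",
    "cfx-rda-webhook-server", "cfx-rda-smtp-server", "cfx-rda-event-consumer",
    "cfx-rda-alert-processor", "cfx-rda-irm-service", "cfx-rda-ml-config",
    "cfx-rda-collaboration", "cfx-rda-ingestion-tracker",
    "cfx-rda-alert-processor-companion",
]


def check_app_services(table_text: str, expected: list, tag: str) -> dict:
    """Report expected services that are missing, not on `tag`, or not 'Up'."""
    header, seen, wrong_tag, not_up = None, set(), [], []
    for raw in table_text.splitlines():
        line = raw.strip()
        if not line.startswith("|") or set(line) <= set("+-|"):
            continue  # skip border lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
            continue
        row = dict(zip(header, cells))
        name, host = row.get("Name"), row.get("Host")
        seen.add(name)
        if row.get("Tag") != tag:
            wrong_tag.append(f"{name}@{host}: {row.get('Tag')}")
        if not row.get("Status", "").startswith("Up"):
            not_up.append(f"{name}@{host}: {row.get('Status')}")
    missing = [name for name in expected if name not in seen]
    return {"missing": missing, "wrong_tag": wrong_tag, "not_up": not_up}
```

All three lists should be empty after the upgrade, for example check_app_services(output, EXPECTED_OIA_SERVICES, "7.2.2").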
Run the below command to verify that one of the cfxdimensions-app-irm_service pods is elected as a leader under the Site column.
+-------+----------------------------------------+-------------+--------------+----------+-------------+----------+--------+--------------+---------------+--------------+
| Cat | Pod-Type | Pod-Ready | Host | ID | Site | Age | CPUs | Memory(GB) | Active Jobs | Total Jobs |
|-------+----------------------------------------+-------------+--------------+----------+-------------+----------+--------+--------------+---------------+--------------|
| App | alert-ingester | True | 13975b9efe7d | dd32fdef | | 12:07:37 | 8 | 31.21 | | |
| App | alert-processor | True | ab24dcbd6e3a | a980d44e | | 12:06:10 | 8 | 31.21 | | |
| App | alert-processor-companion | True | b74d82710af9 | 8f37b360 | | 12:04:19 | 8 | 31.21 | | |
| App | asset-dependency | True | 83c5d941f3a6 | f17cc305 | | 12:16:59 | 8 | 31.21 | | |
| App | authenticator | True | fb82e1664219 | b6f19086 | | 12:16:47 | 8 | 31.21 | | |
| App | cfx-app-controller | True | 017692a218b8 | 55015d69 | | 12:09:04 | 8 | 31.21 | | |
| App | cfxdimensions-app-access-manager | True | 87871b87d45e | b0465aa5 | | 12:16:19 | 8 | 31.21 | | |
| App | cfxdimensions-app-collaboration | True | a5cfe5b681bb | c5b40c98 | | 12:05:05 | 8 | 31.21 | | |
| App | cfxdimensions-app-file-browser | True | 46b9dedab4b0 | 3bcc6bc5 | | 12:08:13 | 8 | 31.21 | | |
| App | cfxdimensions-app-irm_service | True | 11c92a206eaa | 851f07b7 | *leader* | 12:05:48 | 8 | 31.21 | | |
| App | cfxdimensions-app-notification-service | True | 42d3c8c4861c | 891ab559 | | 12:08:31 | 8 | 31.21 | | |
| App | cfxdimensions-app-resource-manager | True | a35dd8127434 | 29b57c51 | | 12:16:08 | 8 | 31.21 | | |
+-------+----------------------------------------+-------------+--------------+----------+-------------+----------+--------+--------------+---------------+--------------+
Run the below command to check that all services have ok status and do not report any failure messages.
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
| Cat | Pod-Type | Host | ID | Site | Health Parameter | Status | Message |
|-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------|
| rda_app | alert-ingester | 13975b9efe7d | dd32fdef | | service-status | ok | |
| rda_app | alert-ingester | 13975b9efe7d | dd32fdef | | minio-connectivity | ok | |
| rda_app | alert-ingester | 13975b9efe7d | dd32fdef | | service-dependency:configuration-service | ok | 1 pod(s) found for configuration-service |
| rda_app | alert-ingester | 13975b9efe7d | dd32fdef | | service-initialization-status | ok | |
| rda_app | alert-ingester | 13975b9efe7d | dd32fdef | | kafka-connectivity | ok | Cluster=oDO7X5AZTh-78HgTt0WbrA, Broker=1, Brokers=[1] |
| rda_app | alert-processor | ab24dcbd6e3a | a980d44e | | service-status | ok | |
| rda_app | alert-processor | ab24dcbd6e3a | a980d44e | | minio-connectivity | ok | |
| rda_app | alert-processor | ab24dcbd6e3a | a980d44e | | service-dependency:cfx-app-controller | ok | 1 pod(s) found for cfx-app-controller |
+-----------+----------------------------------------+--------------+----------+-------------+-----------------------------------------------------+----------+-------------------------------------------------------+
Note
Run the rdaf prune_images command to clean up old Docker images.
1.4. Post Upgrade Steps
1. Download the script from the below path to migrate the UI icon URL from private to public.
Tip
Running this script is optional. Perform this step only if (1) white labeling customization was done on the Login page with an uploaded image before the version upgrade, and (2) the custom image no longer shows up on the Login page after the upgrade.
- Copy the above script to the rda_identity platform service container. Run the below command to get the container ID for rda_identity and the host IP on which it is running.
+--------------------------+--------------+------------+--------------+-------+
| Name | Host | Status | Container Id | Tag |
+--------------------------+--------------+------------+--------------+-------+
| rda_api_server | 192.168.107.40 | Up 7 hours | 6540e670fdb9 | 3.2.2 |
| rda_registry | 192.168.107.40 | Up 7 hours | 98bead8d0599 | 3.2.2 |
....
| rda_identity | 192.168.107.40 | Up 7 hours | 67940390d61f | 3.2.2 |
| rda_fsm | 192.168.107.40 | Up 7 hours | a96870b5f32f | 3.2.2 |
| cfx-rda-access-manager | 192.168.107.40 | Up 7 hours | 33362030fe52 | 3.2.2 |
| cfx-rda-resource-manager | 192.168.107.40 | Up 7 hours | 54a082b754be | 3.2.2 |
| cfx-rda-user-preferences | 192.168.107.40 | Up 7 hours | 829f95d98741 | 3.2.2 |
+--------------------------+--------------+------------+--------------+-------+
- Log in to the host on which the rda_identity service is running as rdauser using SSH, and run the below command to copy the downloaded script into the container.
- Run the below command to switch into the rda_identity service's container shell.
- Execute the below command to migrate the customer branding (white labeling) changes.
2. Deploy the latest L1 & L2 bundles: go to Configuration --> RDA Administration --> Bundles, select oia_l1_l2_bundle, and click the deploy action on its row.
3. Enable ML experiments manually if any experiments are configured (Organization --> Configuration --> Machine Learning).
4. FSM Installation Steps (Applicable only for Remedy ITSM ticketing deployment)
a) Update the Team configuration that was created for ITSM ticketing (the Team with Source 'Others'). Include the following content in the JSON editor of the Team's configuration, adjusting or adding alert sources and execution delays as necessary.
[
  {
    "alert_source": "SNMP",
    "execution_delay": 900,
    "auto_share": {
      "create": true,
      "update": true,
      "close": true,
      "resolved": true,
      "cancel": true,
      "alert_count_changes": true
    }
  },
  {
    "alert_source": "Syslog",
    "execution_delay": 900,
    "auto_share": {
      "create": true,
      "update": true,
      "close": true,
      "resolved": true,
      "cancel": true,
      "alert_count_changes": true
    }
  }
]
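Before pasting an edited version of this JSON into the Team's JSON editor, you can check it locally. The Python sketch below reports syntax errors and compares each entry with the shape of the sample. The required keys and types are inferred from the sample above and are not taken from a published schema, so extend AUTO_SHARE_KEYS if your FSM model expects more. validate_team_config is a hypothetical helper.

```python
import json

# Keys inferred from the sample configuration in this guide, not from a
# published schema; extend the set if your FSM model expects more.
AUTO_SHARE_KEYS = {"create", "update", "close", "resolved", "cancel", "alert_count_changes"}


def validate_team_config(text: str) -> list:
    """Return a list of problems found in the Team JSON; empty means none found."""
    try:
        entries = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(entries, list):
        return ["top-level value must be a JSON array"]
    errors = []
    for i, entry in enumerate(entries):
        where = f"entry {i}"
        if not isinstance(entry, dict):
            errors.append(f"{where}: must be a JSON object")
            continue
        if not isinstance(entry.get("alert_source"), str):
            errors.append(f"{where}: 'alert_source' must be a string")
        delay = entry.get("execution_delay")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(delay, bool) or not isinstance(delay, int) or delay < 0:
            errors.append(f"{where}: 'execution_delay' must be a non-negative integer")
        auto_share = entry.get("auto_share")
        if not isinstance(auto_share, dict):
            errors.append(f"{where}: 'auto_share' must be a JSON object")
            continue
        missing = AUTO_SHARE_KEYS - set(auto_share)
        if missing:
            errors.append(f"{where}: 'auto_share' is missing {sorted(missing)}")
        for key, value in auto_share.items():
            if not isinstance(value, bool):
                errors.append(f"{where}: 'auto_share.{key}' must be true or false")
    return errors
```

A common mistake that this check catches is a trailing comma after the last entry, which JSON does not allow.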
b) Download and update the latest FSM model: Configuration -> RDA Administration -> FSM Models
Important
Take a backup of the existing model before updating it.
c) Add the following formatting templates: Configuration -> RDA Administration -> Formatting Templates
- snow-notes-template
{% for r in rows %}
<b>Message</b> : {{r.a_message}} <br>
<b>RaisedAt</b> : {{r.a_raised_ts}} <br>
<b>UpdatedAt</b> : {{r.a_updated_ts}} <br>
<b>Status</b> : {{r.a_status}} <br>
<b>AssetName</b> : {{r.a_asset_name}} <br>
<b>AssetType</b> : {{r.a_asset_type}} <br>
<b>RepeatCount</b> : {{r.a_repeat_count}} <br>
<b>Action</b> : {{r.action_name}} <br>
<br><br>
{%endfor%}
- snow-description-template
d) Deploy the following FSM bundles: fsm_events_kafka_publisher_bundles, oia_fsm_aots_ticketing_bundle, oia_fsm_common_ticketing_bundles
e) Create 'fsm-debug-outbound-ticketing' and 'aots_ticket_notifications' PStreams from the UI if they do not already exist
f) Enable the following Service Blueprints: Read Alert Stream, Read Incident Stream, Create Ticket, Update Ticket, Resolve Ticket, Read AOTS Inbound Notifications.