Hi there,
We recently upgraded our instance of Unimus to 2.6.3, and the night after we did this, a scheduled backup caused every Alcatel OS6450 switch we have to crash with the below error (date and time wrong as this is taken from a test system):
FRI DEC 10 22:49:46 1999 CSM-CHASSIS info == CSM == CS excep handler: exception 16 in task 0x62d7b80
FRI DEC 10 22:49:46 1999 CSM-CHASSIS info == CSM == Excep in task: sshd_ct1 PC : 0x0
FRI DEC 10 22:49:46 1999 CSM-CHASSIS alarm **************** STR ****************
FRI DEC 10 22:49:46 1999 CSM-CHASSIS alarm Level : ERROR DETECTED - APPLICATION FATAL
We narrowed the crashes and reboots down to the time of the backup. I've since set up two otherwise unconfigured 6450s in our lab to try to replicate the issue. I was able to cause the crash simply by adding them to Unimus, and again when running a manual discovery, so it seems to be the discovery specifically that causes this.
The affected switches had all been in Unimus for years without issue, and this happened immediately after the upgrade, so I can only conclude that something in the upgrade caused it. As a result, we've had to disable management of all of these switches, as we can't risk a repeat. I can also replicate the issue every time I run a manual discovery of either of my test switches, so the problem is consistent.
If it helps at all, the switches are mostly on firmware 6.7.2.122.R08 GA.
Any help you can offer here would be much appreciated, thank you!
Andy
Discovery process in 2.6.3 causing Alcatel OS6450 switches to crash and reboot
Hi,
This is most likely a bug in Alcatel firmware that Unimus is triggering during Discovery. We have (sadly) seen issues like this before, where a bug in device firmware was triggered by Unimus.
You should be able to easily replicate the crash with PuTTY (or any SSH client). Enable "Device Output Logging" (Zones > your_zone > Debug Mode) and run a Discovery on one of these switches. After the Discovery job fails (due to the device crashing), download the Device Output Log file. You will be able to see the actual CLI conversation with the switch in the log.
You can then send the last command you see in the Discovery log over PuTTY, and you should be able to replicate the firmware crash.
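To confirm that a manually replayed command triggers the same crash as the Discovery did, it may help to compare the exception lines in the switch log after each reboot. Below is a minimal Python sketch that pulls the exception number, task ID, task name and PC out of log text. It assumes the lines keep exactly the format quoted in the first post; the function name `summarize_crash` and both patterns are made up for illustration and are not part of Unimus or the Alcatel tooling.

```python
import re

# Patterns are based only on the two "excep" lines quoted in the first post.
# Other firmware versions may format these lines differently (assumption).
EXCEPTION_RE = re.compile(r"exception (\d+) in task (0x[0-9a-fA-F]+)")
TASK_RE = re.compile(r"Excep in task: (\S+) PC : (0x[0-9a-fA-F]+)")


def summarize_crash(log_text):
    """Collect exception number, task id, task name and PC from switch log text.

    Returns an empty dict if none of the expected lines are present.
    If several crashes are in the text, the last one wins.
    """
    summary = {}
    for line in log_text.splitlines():
        match = EXCEPTION_RE.search(line)
        if match:
            summary["exception"] = int(match.group(1))
            summary["task_id"] = match.group(2)
        match = TASK_RE.search(line)
        if match:
            summary["task_name"] = match.group(1)
            summary["pc"] = match.group(2)
    return summary
```

If the summary after a manual replay shows the same task name and exception number as the one after a Discovery, that suggests (but does not prove) that both hit the same firmware fault.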
Alcatel will have to fix this. In the meantime, we can implement mitigations in Unimus so that it doesn't trigger the crash condition. For that, please create a Support Ticket on our Portal, and we can set up a Zoom call with you to investigate how to mitigate this.