[Fixed in 1.10.2] Devices stuck in Discover or Backup

Posted: Mon May 13, 2019 9:38 pm
by SteveLamb
We have several instances of Unimus running. It seems that when one instance has more than 1000 devices, some devices fail to finish discovery or backup. Once a device is stuck in this state, no further action can be run on it. Restarting the Unimus instance, or deleting and re-adding the device, seems to resolve the issue.

Some of our backups may take a very long time, as we have some switches and routers with very large VLAN tables.

We are currently running version 1.10.1.
Below is from the error log when this appears to occur.

prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    | 2019-04-30 03:07:18.274  WARN 1 --- [  discovery-106] net.unimus.core.api.CoreImpl             : Error during discovery of 10.109.41.5
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    | 
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    | java.lang.IllegalStateException: Can't start StopWatch: it's already running
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at org.springframework.util.StopWatch.start(StopWatch.java:127)
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at org.springframework.util.StopWatch.start(StopWatch.java:116)
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at net.unimus.core.util.metrics.JobDurationMetrics.startMeasuring(JobDurationMetrics.java:27)
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at net.unimus.core.api.CoreImpl$DiscoveryExecutor.doRun(CoreImpl.java:348)
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at net.unimus.core.api.CoreImpl$ErrorHandlingExecutor.run(CoreImpl.java:302)
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at java.lang.Thread.run(Thread.java:745)
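
For readers unfamiliar with the exception in the trace: a start/stop timer of this kind refuses to start again while it is still running. The sketch below is only an illustration of one way that can happen, assuming a timer that is reused across jobs and a job that fails before the timer is stopped. `SimpleStopWatch`, `runUnguarded` and `runGuarded` are hypothetical names, not Spring's or Unimus's actual code, and this is not a claim about the real cause of the bug.

```java
public class StopWatchDemo {

    /** Hypothetical stand-in for a start/stop timer; not Spring's or Unimus's class. */
    static final class SimpleStopWatch {
        private boolean running;
        private long startNanos;
        private long lastNanos;

        void start() {
            if (running) {
                throw new IllegalStateException("Can't start StopWatch: it's already running");
            }
            running = true;
            startNanos = System.nanoTime();
        }

        void stop() {
            if (!running) {
                throw new IllegalStateException("Can't stop StopWatch: it's not running");
            }
            lastNanos = System.nanoTime() - startNanos;
            running = false;
        }

        long lastDurationNanos() {
            return lastNanos;
        }
    }

    /** Runs a job without guarding stop(): a failing job leaves the watch running. */
    static void runUnguarded(SimpleStopWatch watch, Runnable job) {
        watch.start();
        job.run();
        watch.stop();
    }

    /** Runs a job with stop() in a finally block, so the watch is always released. */
    static void runGuarded(SimpleStopWatch watch, Runnable job) {
        watch.start();
        try {
            job.run();
        } finally {
            watch.stop();
        }
    }

    /** Returns true if a second job on the same watch fails with IllegalStateException. */
    static boolean secondJobFails(boolean guarded) {
        SimpleStopWatch watch = new SimpleStopWatch();
        Runnable failing = () -> { throw new RuntimeException("simulated device timeout"); };
        Runnable ok = () -> { };
        try {
            if (guarded) runGuarded(watch, failing); else runUnguarded(watch, failing);
        } catch (RuntimeException expected) {
            // The first job failed; what matters is the state the watch was left in.
        }
        try {
            if (guarded) runGuarded(watch, ok); else runUnguarded(watch, ok);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded: second job fails = " + secondJobFails(false));
        System.out.println("guarded:   second job fails = " + secondJobFails(true));
    }
}
```

In the unguarded variant the first failure leaves the timer running, so every later job that reuses it fails on `start()`. Stopping the timer in a `finally` block avoids that.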

Re: Devices stuck in Discover or Backup

Posted: Fri May 17, 2019 3:38 am
by Tomas
Just an update on this - we are investigating this issue right now.

We have had 1 additional customer report this as well, so it is definitely something on our end.
It seems this issue is not widespread, however, as we have had no reports other than these 2.

I will post an update as soon as we have any news.

Re: [Fixed in 1.10.2] Devices stuck in Discover or Backup

Posted: Mon May 20, 2019 3:54 pm
by Tomas
Update:

This issue should now be solved, and the fix will be available in 1.10.2.

Can you please test with the latest 1.10.2 Beta release and let us know if this fixes it for you?
viewtopic.php?p=2172#p2172

Thanks!

Re: [Fixed in 1.10.2] Devices stuck in Discover or Backup

Posted: Thu May 23, 2019 3:52 pm
by SteveLamb
This is working better but is not totally fixed. After the first scheduled event we have 3 devices stuck in discovery. We are running version 1.10.2-Beta2. Is there information I can provide that will assist with this?

Thanks

Re: [Fixed in 1.10.2] Devices stuck in Discover or Backup

Posted: Thu May 23, 2019 4:13 pm
by Tomas
We are investigating - we have a report from another customer as well that this is not yet fully fixed.

Will provide an update ASAP.

Re: [Fixed in 1.10.2] Devices stuck in Discover or Backup

Posted: Tue Jun 04, 2019 11:36 am
by Tomas
We have found additional rare cases where jobs could get stuck.

Could you please try with the latest Beta build and let us know if you are still seeing issues?
viewtopic.php?p=2172#p2172

Thanks!

Re: [Fixed in 1.10.2] Devices stuck in Discover or Backup

Posted: Mon Jun 10, 2019 9:30 pm
by SteveLamb
Good news. On the first run through of 1200 devices I had 0 fail to discover. I will let you know if I see others fail later in the week.

This is on Beta5.

Re: [Fixed in 1.10.2] Devices stuck in Discover or Backup

Posted: Wed Jun 12, 2019 1:24 pm
by SteveLamb
I may have spoken too soon.

We don't have as much of an issue with discovery; after 2 days I am seeing 5 out of ~1200. But 75% of them are sitting stuck in backup.

Let me know if I can provide any information that would help with this.

1.10.2-Beta5

Re: [Fixed in 1.10.2] Devices stuck in Discover or Backup

Posted: Thu Jun 13, 2019 3:04 pm
by Tomas
Could we schedule a Webex session to investigate this in detail?
Sent a PM to work out the details.