[Fixed in 1.10.2] Devices stuck in Discover or Backup

Unimus support forum
SteveLamb
Posts: 9
Joined: Fri Dec 22, 2017 3:34 pm

Mon May 13, 2019 9:38 pm

We have several instances of Unimus running. It seems that when one instance has more than 1000 devices, some devices fail to finish discovery or backup. Once a device is stuck in this state, no further action can be attempted on it. Restarting the Unimus instance, or deleting and re-adding the device, seems to resolve the issue.

Some of our backups may take a very long time, as we have some switches and routers with very large VLAN tables.

We are currently running version 1.10.1.
Below is from the error log when this appears to occur.

prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    | 2019-04-30 03:07:18.274  WARN 1 --- [  discovery-106] net.unimus.core.api.CoreImpl             : Error during discovery of 10.109.41.5
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    | 
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    | java.lang.IllegalStateException: Can't start StopWatch: it's already running
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at org.springframework.util.StopWatch.start(StopWatch.java:127)
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at org.springframework.util.StopWatch.start(StopWatch.java:116)
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at net.unimus.core.util.metrics.JobDurationMetrics.startMeasuring(JobDurationMetrics.java:27)
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at net.unimus.core.api.CoreImpl$DiscoveryExecutor.doRun(CoreImpl.java:348)
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at net.unimus.core.api.CoreImpl$ErrorHandlingExecutor.run(CoreImpl.java:302)
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
prod_unimus_unimus.1.xvshfhsqctw3@test.example.com    |   at java.lang.Thread.run(Thread.java:745)
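
For readers unfamiliar with this exception: the trace shows Spring's StopWatch rejecting a start() call because the same watch was already running. The thread does not say how Unimus ended up in that state, so the following is only a minimal sketch of the general failure mode, using a hypothetical stand-in class (SimpleStopWatch) rather than Spring's or Unimus's actual code. It assumes one possible cause, a watch that is started again before a previous measurement was stopped, and shows a try/finally pattern that avoids leaving the watch running.

```java
// Hypothetical stand-in for a start/stop timer. This is NOT Spring's StopWatch
// and NOT Unimus code; it only mimics the "already running" check for illustration.
final class SimpleStopWatch {
    private boolean running;
    private long startNanos;

    void start() {
        if (running) {
            throw new IllegalStateException("Can't start StopWatch: it's already running");
        }
        running = true;
        startNanos = System.nanoTime();
    }

    long stop() {
        if (!running) {
            throw new IllegalStateException("Can't stop StopWatch: it's not running");
        }
        running = false;
        return System.nanoTime() - startNanos;
    }

    boolean isRunning() {
        return running;
    }
}

final class StopWatchDemo {
    // Starts the same watch twice without stopping it in between.
    // Returns the exception message, or null if no exception was thrown.
    static String startTwice(SimpleStopWatch watch) {
        watch.start();
        try {
            watch.start();
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    // Safer pattern (an assumption, not the vendor's fix): stop in finally,
    // so a job that throws cannot leave the watch in the running state.
    static void timeJob(SimpleStopWatch watch, Runnable job) {
        watch.start();
        try {
            job.run();
        } finally {
            watch.stop();
        }
    }

    public static void main(String[] args) {
        System.out.println(startTwice(new SimpleStopWatch()));
    }
}
```

With this sketch, a watch that is never stopped rejects every later start() call, which would be consistent with a device staying stuck until the instance is restarted. Whether that matches the actual root cause is not confirmed anywhere in this thread.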
Tomas
Posts: 695
Joined: Sat Jun 25, 2016 12:33 pm

Fri May 17, 2019 3:38 am

Just an update on this - we are investigating this issue right now.

We have had 1 additional customer report this as well, so it is definitely something on our end.
It seems this issue is not widespread, however, as we have had no reports other than these 2.

I will post an update as soon as we have any news.
Tomas
Posts: 695
Joined: Sat Jun 25, 2016 12:33 pm

Mon May 20, 2019 3:54 pm

Update:

This issue should now be solved, and the fix will be available in 1.10.2.

Can you please test with the latest 1.10.2 Beta release and let us know if this fixes it for you?
viewtopic.php?p=2172#p2172

Thanks!
SteveLamb
Posts: 9
Joined: Fri Dec 22, 2017 3:34 pm

Thu May 23, 2019 3:52 pm

This is working better but is not totally fixed. After the first scheduled event we have 3 devices stuck in discovery. We are running version 1.10.2-Beta2. Is there any information I can provide that will assist with this?

Thanks
Tomas
Posts: 695
Joined: Sat Jun 25, 2016 12:33 pm

Thu May 23, 2019 4:13 pm

We are investigating - we also have a report from another customer that this is not yet fully fixed.

Will provide an update ASAP.