Add volumes in 'Expunging' state to storage cleanup thread and during delete storage pool #12602
sureshanaparti wants to merge 2 commits into apache:4.20
… delete storage pool (dd23e40 to ca2552d)
@blueorangutan package
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##               4.20   #12602      +/-   ##
============================================
- Coverage     16.26%   16.25%   -0.02%
+ Complexity    13429    13420       -9
============================================
  Files          5661     5662       +1
  Lines        500010   500162     +152
  Branches      60715    60732      +17
============================================
- Hits          81331    81300      -31
- Misses       409606   409778     +172
- Partials       9073     9084      +11
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16722
@blueorangutan test
@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests
[SF] Trillian test result (tid-15415)
@blueorangutan package
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16868
kiranchavala left a comment
LGTM
Tested manually:
- Change the global settings "storage.cleanup.delay" and "storage.cleanup.interval" to 200.
- Create a volume in CloudStack.
- Destroy the volume in CloudStack and make sure it is in the Destroy state.
Example:
mysql> select id,name,state,removed from volumes where name="test2" ;
+----+-------+---------+---------+
| id | name  | state   | removed |
+----+-------+---------+---------+
|  6 | test2 | Destroy | NULL    |
+----+-------+---------+---------+
1 row in set (0.00 sec)
- Set the volume state to Expunging.
mysql> UPDATE volumes SET state = 'Expunging' WHERE id = 6;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
- Stop the CloudStack service.
- Start the CloudStack service.
- Check the logs.
- CloudStack cleaned up the volumes that were in the Expunging state.
mysql> select id,name,state,removed from volumes where id=10;
+----+--------+-----------+---------+
| id | name   | state     | removed |
+----+--------+-----------+---------+
| 10 | test   | Expunging | NULL    |
+----+--------+-----------+---------+
1 row in set (0.00 sec)
mysql> select id,name,state,removed from volumes where id=10;
+----+--------+----------+---------------------+
| id | name   | state    | removed             |
+----+--------+----------+---------------------+
| 10 | test   | Expunged | 2026-02-18 10:26:44 |
+----+--------+----------+---------------------+
Logs:
[root@ref-trl-11081-k-Mol8-kiran-chavala-mgmt1 ~]# cat /var/log/cloudstack/management/management-server.log | grep -i "logid:4c2827f6"
2026-02-18 10:26:43,885 DEBUG [c.c.s.StorageManagerImpl] (StorageManager-Scavenger-1:[ctx-212727e0]) (logid:4c2827f6) Storage pool garbage collector found [0] Templates to be cleaned up in storage pool [StoragePool {"id":1,"name":"ref-trl-11081-k-Mol8-kiran-chavala-kvm-pri1","poolType":"NetworkFilesystem","uuid":"48025699-3b4f-3165-ab47-5707e5511771"}].
2026-02-18 10:26:43,887 DEBUG [c.c.s.StorageManagerImpl] (StorageManager-Scavenger-1:[ctx-212727e0]) (logid:4c2827f6) Storage pool garbage collector found [0] Templates to be cleaned up in storage pool [StoragePool {"id":2,"name":"ref-trl-11081-k-Mol8-kiran-chavala-kvm-pri2","poolType":"NetworkFilesystem","uuid":"b730a16f-f034-3a44-b659-9a169aabd970"}].
2026-02-18 10:26:43,889 DEBUG [c.c.s.StatsCollector] (StorageManager-Scavenger-1:[ctx-212727e0]) (logid:4c2827f6) Verifying image storage [ImageStore {"id":1,"name":"NFS:\/\/10.0.32.4\/acs\/secondary\/ref-trl-11081-k-Mol8-kiran-chavala\/ref-trl-11081-k-Mol8-kiran-chavala-sec1","uuid":"7e5d0bb2-f8ea-497e-bfff-f9f9fd3da59f"}]. Capacity: total=[2.6357 TB], used=[1.5451 TB], threshold=[95.00%].
2026-02-18 10:26:43,890 DEBUG [c.c.s.StorageManagerImpl] (StorageManager-Scavenger-1:[ctx-212727e0]) (logid:4c2827f6) Secondary storage garbage collector found 0 Templates to cleanup on template_store_ref for store: ImageStore {"id":1,"name":"NFS:\/\/10.0.32.4\/acs\/secondary\/ref-trl-11081-k-Mol8-kiran-chavala\/ref-trl-11081-k-Mol8-kiran-chavala-sec1","uuid":"7e5d0bb2-f8ea-497e-bfff-f9f9fd3da59f"}
2026-02-18 10:26:43,890 DEBUG [c.c.s.StorageManagerImpl] (StorageManager-Scavenger-1:[ctx-212727e0]) (logid:4c2827f6) Secondary storage garbage collector found 0 snapshots to cleanup on snapshot_store_ref for store: ImageStore {"id":1,"name":"NFS:\/\/10.0.32.4\/acs\/secondary\/ref-trl-11081-k-Mol8-kiran-chavala\/ref-trl-11081-k-Mol8-kiran-chavala-sec1","uuid":"7e5d0bb2-f8ea-497e-bfff-f9f9fd3da59f"}
2026-02-18 10:26:43,892 DEBUG [c.c.s.StorageManagerImpl] (StorageManager-Scavenger-1:[ctx-212727e0]) (logid:4c2827f6) Secondary storage garbage collector found 0 volumes to cleanup on volume_store_ref for store: ImageStore {"id":1,"name":"NFS:\/\/10.0.32.4\/acs\/secondary\/ref-trl-11081-k-Mol8-kiran-chavala\/ref-trl-11081-k-Mol8-kiran-chavala-sec1","uuid":"7e5d0bb2-f8ea-497e-bfff-f9f9fd3da59f"}
2026-02-18 10:26:43,900 INFO [o.a.c.s.v.VolumeServiceImpl] (StorageManager-Scavenger-1:[ctx-212727e0]) (logid:4c2827f6) Volume VolumeObject {"dataStore":"StoragePool {\"id\":1,\"name\":\"ref-trl-11081-k-Mol8-kiran-chavala-kvm-pri1\",\"poolType\":\"NetworkFilesystem\",\"uuid\":\"48025699-3b4f-3165-ab47-5707e5511771\"}","volumeVO":"Volume {\"id\":10,\"instanceId\":null,\"name\":\"testgd\",\"uuid\":\"450d3d1c-1232-4f82-b5ad-2e7a1a2ec400\",\"volumeType\":\"DATADISK\"}"} is already in Expunging, retrying
2026-02-18 10:26:43,939 DEBUG [c.c.h.o.r.Ovm3HypervisorGuru] (StorageManager-Scavenger-1:[ctx-212727e0]) (logid:4c2827f6) getCommandHostDelegation: class org.apache.cloudstack.storage.command.DeleteCommand
2026-02-18 10:26:43,941 DEBUG [c.c.h.XenServerGuru] (StorageManager-Scavenger-1:[ctx-212727e0]) (logid:4c2827f6) We are returning the default host to execute commands because the command is not of Copy type.
2026-02-18 10:26:43,941 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (StorageManager-Scavenger-1:[ctx-212727e0]) (logid:4c2827f6) Wait time setting on org.apache.cloudstack.storage.command.DeleteCommand is 1800 seconds
2026-02-18 10:26:43,942 DEBUG [c.c.a.m.ClusteredAgentAttache] (StorageManager-Scavenger-1:[ctx-212727e0]) (logid:4c2827f6) Seq 2-1107885508333142034: Routed from 32987932525600
2026-02-18 10:26:43,943 DEBUG [c.c.a.t.Request] (StorageManager-Scavenger-1:[ctx-212727e0]) (logid:4c2827f6) Seq 1-1107885508333142034: Sending { Cmd , MgmtId: 32987932525600, via: 1(ref-trl-11081-k-Mol8-kiran-chavala-kvm1), Ver: v1, Flags: 100011, [{"org.apache.cloudstack.storage.command.DeleteCommand":{"data":{"org.apache.cloudstack.storage.to.VolumeObjectTO":{"uuid":"450d3d1c-1232-4f82-b5ad-2e7a1a2ec400","volumeType":"DATADISK","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"48025699-3b4f-3165-ab47-5707e5511771","name":"ref-trl-11081-k-Mol8-kiran-chavala-kvm-pri1","id":"1","poolType":"NetworkFilesystem","host":"10.0.32.4","path":"/acs/primary/ref-trl-11081-k-Mol8-kiran-chavala/ref-trl-11081-k-Mol8-kiran-chavala-kvm-pri1","port":"2049","url":"NetworkFilesystem://10.0.32.4/acs/primary/ref-trl-11081-k-Mol8-kiran-chavala/ref-trl-11081-k-Mol8-kiran-chavala-kvm-pri1/?ROLE=Primary&STOREUUID=48025699-3b4f-3165-ab47-5707e5511771","isManaged":"false"}},"name":"testgd","size":"(100.00 GB) 107374182400","path":"450d3d1c-1232-4f82-b5ad-2e7a1a2ec400","volumeId":"10","accountId":"2","format":"QCOW2","provisioningType":"THIN","poolId":"1","id":"10","hypervisorType":"KVM","directDownload":"false","deployAsIs":"false","followRedirects":"false"}},"wait":"0","bypassHostMaintenance":"false"}}] }
2026-02-18 10:26:44,075 DEBUG [c.c.a.t.Request] (StorageManager-Scavenger-1:[ctx-212727e0]) (logid:4c2827f6) Seq 1-1107885508333142034: Received: { Ans: , MgmtId: 32987932525600, via: 1(ref-trl-11081-k-Mol8-kiran-chavala-kvm1), Ver: v1, Flags: 10, { Answer } }
2026-02-18 10:26:44,083 INFO [o.a.c.s.v.VolumeServiceImpl] (StorageManager-Scavenger-1:[ctx-212727e0]) (logid:4c2827f6) Volume VolumeObject {"dataStore":"StoragePool {\"id\":1,\"name\":\"ref-trl-11081-k-Mol8-kiran-chavala-kvm-pri1\",\"poolType\":\"NetworkFilesystem\",\"uuid\":\"48025699-3b4f-3165-ab47-5707e5511771\"}","volumeVO":"Volume {\"id\":10,\"instanceId\":null,\"name\":\"testgd\",\"uuid\":\"450d3d1c-1232-4f82-b5ad-2e7a1a2ec400\",\"volumeType\":\"DATADISK\"}"} is not referred anywhere, remove it from volumes table
2026-02-18 10:26:44,083 DEBUG [c.c.s.d.VolumeDaoImpl] (StorageManager-Scavenger-1:[ctx-212727e0]) (logid:4c2827f6) Removing volume 10 from DB
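For longer logs, the grep by logid above can be reproduced with a small parser. This is a sketch in Python, assuming every line follows the layout of the excerpt (timestamp, level, logger, thread with context, logid, message). The names `LOG_LINE` and `entries_for_logid` are hypothetical and are not part of CloudStack.

```python
import re

# Layout assumed from the excerpt above:
# <timestamp> <LEVEL> [<logger>] (<thread>:[<ctx>]) (logid:<id>) <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^:]+):\[(?P<ctx>[^\]]+)\]\)\s+"
    r"\(logid:(?P<logid>[0-9a-f]+)\)\s+"
    r"(?P<message>.*)$"
)


def entries_for_logid(lines, logid):
    """Return the parsed fields of every line that carries the given logid.

    Lines that do not match the assumed layout are skipped silently.
    """
    entries = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("logid") == logid:
            entries.append(match.groupdict())
    return entries
```

Passing an open handle on /var/log/cloudstack/management/management-server.log together with "4c2827f6" would return one dictionary per matching line, which makes it easy to print only the timestamp, logger and message of the scavenger run.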
Description
This PR adds volumes in the 'Expunging' state to the storage cleanup thread and to the cleanup performed when a storage pool is deleted.
Volumes can get stuck in Expunging when the expunge flow fails and no failure callback is received (for example, after a Management Server crash or restart, or a network loss), so they never revert to the Destroy state. Expunge is never attempted on them again, and they are left behind on the storage.
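The behaviour described above can be illustrated with a minimal sketch. It is written in Python rather than CloudStack's Java and is not taken from the PR diff. The names `Volume`, `CLEANUP_STATES`, `volumes_to_clean` and `cleanup_pass` are hypothetical, and applying the cleanup delay as a cutoff on the last update time of 'Expunging' volumes is an assumption. The point is only that a scavenger pass which also matches 'Expunging' retries volumes that would otherwise stay stuck.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# States the scavenger treats as "still needs expunge" in this sketch.
# 'Expunging' is the state this PR adds; the constant itself is hypothetical.
CLEANUP_STATES = {"Destroy", "Expunging"}


@dataclass
class Volume:
    id: int
    name: str
    state: str
    updated: datetime
    removed: Optional[datetime] = None


def volumes_to_clean(volumes: List[Volume], now: datetime, delay_seconds: int) -> List[Volume]:
    """Select non-removed volumes in a cleanup state last updated before the cutoff."""
    cutoff = now - timedelta(seconds=delay_seconds)
    return [
        v for v in volumes
        if v.state in CLEANUP_STATES and v.removed is None and v.updated <= cutoff
    ]


def cleanup_pass(volumes: List[Volume], now: datetime, delay_seconds: int = 200) -> List[int]:
    """Run one simulated scavenger pass and return the ids that were expunged."""
    cleaned = volumes_to_clean(volumes, now, delay_seconds)
    for v in cleaned:
        # Stand-in for the real expunge: only the record is updated here.
        v.state = "Expunged"
        v.removed = now
    return [v.id for v in cleaned]
```

With only "Destroy" in `CLEANUP_STATES`, a volume left in Expunging would never be selected again, which is the stuck case the description refers to.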
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
How did you try to break this feature and the system with this change?