Safe teardown to prevent InfiniBand partition access leakage between VM tenants
Device: Mellanox ConnectX-6 (CX6), with the mlx5_core driver on the PF and vfio-pci on the VF for passthrough to VMs
Situation: I have a VM with a passed-through VF, and the VF GUID has been added to partition A. I want to tear this down, add the VF GUID to partition B, and start a new VM without leaking access to partition A to the second tenant.
Ideas:
- Read back the P_Key from the VF after adding its GUID to the new partition in the SM (UFM): This is tricky because I don't want to unbind the VF from vfio-pci each time a new VM spins up. Neither mlx5_core nor doca-ofed exposes the VF P_Key through the sriov/ subtree in the PF's sysfs.
- Just change the GUID wholesale: This should work, but I'm worried about a race condition between the new VM starting and the SM sweep. I haven't been able to reproduce this race in testing, but I'm not certain it's impossible.
Ideally, I'd be able to read back the VF and vport state on the host before starting a new VM. I think that would solve my problems, but I haven't figured out how to do it.
Top Answer/Comment:
For tenant isolation, I don't think it is sufficient to keep the same GUID across partitions and rely on the SM sweep completing before the next VM starts. The question is not how hard the race is to reproduce, but whether isolation depends on timing at all.
The best approach is to treat the GUID as part of the tenant identity itself and assign a new one whenever a VF changes tenants. The sequence is:
- Shut down the VM.
- Remove the old GUID from partition A.
- Assign a new GUID to the VF and add that GUID to partition B.
- Wait for the SM to discover the new GUID and program the new membership.
- Start the new VM.
A fresh GUID ensures that the next tenant cannot inherit any stale partition membership tied to the previously used GUID.
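As a rough illustration of "a new GUID per assignment", here is a minimal sketch of an allocator that never hands out the same GUID twice. The function names, the `used` set, and the `0x02` prefix byte are all assumptions made for the example; pick a prefix that fits your site's GUID scheme and persist `used` somewhere durable.

```python
import secrets


def format_guid(value: int) -> str:
    """Render a 64-bit integer as eight colon-separated hex bytes."""
    raw = value.to_bytes(8, "big")
    return ":".join(f"{b:02x}" for b in raw)


def allocate_guid(used: set, prefix: int = 0x02) -> str:
    """Return a GUID that is not in `used`, and record it there.

    `prefix` is a hypothetical site-chosen top byte; the remaining
    56 bits are random.
    """
    while True:
        value = (prefix << 56) | secrets.randbits(56)
        guid = format_guid(value)
        if guid not in used:
            used.add(guid)
            return guid
```

In practice `used` would be loaded from and saved to persistent storage, so that a host reboot cannot cause a retired GUID to be reissued.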
Even then, I wouldn't start the VM until the fabric has converged. Partition enforcement on InfiniBand follows the SM's own view of the fabric, so there is always some delay between a configuration change and the sweep that applies it.
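The teardown sequence with a fail-closed wait could be sketched roughly as follows. Every callable passed in (`stop_vm`, `remove_from_partition`, `set_vf_guid`, `add_to_partition`, `sm_state_verified`, `start_vm`) is a hypothetical hook you would implement against your own hypervisor and SM tooling; none of them is a real library API. The point is only the shape: the VM start is unreachable unless the check succeeds before the deadline.

```python
import time
from typing import Callable


class FabricNotConverged(RuntimeError):
    """Raised when the SM state could not be verified in time."""


def wait_until(check: Callable[[], bool], timeout_s: float,
               interval_s: float = 1.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> None:
    """Poll `check` until it returns True, or raise after `timeout_s`."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return
        if clock() >= deadline:
            raise FabricNotConverged("SM state not verified before timeout")
        sleep(interval_s)


def reassign_vf(*, stop_vm, remove_from_partition, set_vf_guid,
                add_to_partition, sm_state_verified, start_vm,
                old_guid: str, new_guid: str,
                timeout_s: float = 60.0, **wait_kwargs) -> None:
    """Move a VF to a new tenant; never start the VM on unverified state."""
    stop_vm()
    remove_from_partition(old_guid)   # old GUID leaves partition A
    set_vf_guid(new_guid)             # VF gets a fresh identity
    add_to_partition(new_guid)        # new GUID joins partition B
    # Fail closed: an exception here means start_vm() is never reached.
    wait_until(lambda: sm_state_verified(old_guid, new_guid),
               timeout_s, **wait_kwargs)
    start_vm()
```

A timeout surfaces as `FabricNotConverged`, which the caller should treat as "do not start this tenant" rather than retrying the VM launch blindly.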
As for verification, reading the VF's active P_Key information from the host while it is still attached to vfio-pci is hard, because while vfio-pci owns the device no host kernel driver is driving it. For this reason, your best bet is probably the SM. Verify that:
- The old GUID is no longer a member of partition A.
- The new GUID is a member of partition B.
- The SM has completed processing the change.
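The three checks above can be expressed as a single predicate over a snapshot of SM state. This is a sketch only: `SmSnapshot` and its fields are hypothetical, and how you populate them depends entirely on what your SM (UFM or otherwise) exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmSnapshot:
    """Hypothetical view of SM state at one point in time."""
    members: dict            # partition name -> set of GUID strings
    pending_changes: bool    # True while the SM is still applying updates


def safe_to_start(snap: SmSnapshot, old_guid: str, new_guid: str,
                  old_partition: str, new_partition: str) -> bool:
    """Return True only if all three conditions hold in this snapshot."""
    old_gone = old_guid not in snap.members.get(old_partition, set())
    new_present = new_guid in snap.members.get(new_partition, set())
    settled = not snap.pending_changes
    return old_gone and new_present and settled
```

A missing partition entry counts as "no members", so an incomplete snapshot makes the predicate return False for the new partition rather than True.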
From a security standpoint, the essential rule is that isolation must rest on verified SM state, not on assumptions about sweep timing or cached VF state. Using a unique GUID for each assignment and verifying convergence ensures that a new tenant cannot inherit partition membership from the previous one.