[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility chec
From: |
Duan, Zhenzhong |
Subject: |
RE: [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap |
Date: |
Mon, 3 Jun 2024 06:25:52 +0000 |
Hi Michael,
>-----Original Message-----
>From: Michael S. Tsirkin <[email protected]>
>Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>compatibility check with host IOMMU cap/ecap
>
>On Fri, Apr 26, 2024 at 03:10:14AM +0000, Duan, Zhenzhong wrote:
>>
>>
>> >-----Original Message-----
>> >From: Cédric Le Goater <[email protected]>
>> >Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>> >compatibility check with host IOMMU cap/ecap
>> >
>> >On 4/25/24 10:46, Duan, Zhenzhong wrote:
>> >> Hi Cédric,
>> >>
>> >>> -----Original Message-----
>> >>> From: Cédric Le Goater <[email protected]>
>> >>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>> >>> compatibility check with host IOMMU cap/ecap
>> >>>
>> >>> Hello Zhenzhong,
>> >>>
>> >>> On 4/18/24 10:42, Duan, Zhenzhong wrote:
>> >>>> Hi Cédric,
>> >>>>
>> >>>>> -----Original Message-----
>> >>>>> From: Cédric Le Goater <[email protected]>
>> >>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>> >>>>> compatibility check with host IOMMU cap/ecap
>> >>>>>
>> >>>>> Hello Zhenzhong
>> >>>>>
>> >>>>> On 4/17/24 11:24, Duan, Zhenzhong wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>>> -----Original Message-----
>> >>>>>>> From: Cédric Le Goater <[email protected]>
>> >>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to
>do
>> >>>>>>> compatibility check with host IOMMU cap/ecap
>> >>>>>>>
>> >>>>>>> On 4/17/24 06:21, Duan, Zhenzhong wrote:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>> -----Original Message-----
>> >>>>>>>>> From: Cédric Le Goater <[email protected]>
>> >>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to
>do
>> >>>>>>>>> compatibility check with host IOMMU cap/ecap
>> >>>>>>>>>
>> >>>>>>>>> Hello,
>> >>>>>>>>>
>> >>>>>>>>> On 4/16/24 09:09, Duan, Zhenzhong wrote:
>> >>>>>>>>>> Hi Cédric,
>> >>>>>>>>>>
>> >>>>>>>>>>> -----Original Message-----
>> >>>>>>>>>>> From: Cédric Le Goater <[email protected]>
>> >>>>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework
>to
>> >do
>> >>>>>>>>>>> compatibility check with host IOMMU cap/ecap
>> >>>>>>>>>>>
>> >>>>>>>>>>> On 4/8/24 10:44, Zhenzhong Duan wrote:
>> >>>>>>>>>>>> From: Yi Liu <[email protected]>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> If check fails, the host side device(either vfio or vdpa device)
>> >>> should
>> >>>>>>> not
>> >>>>>>>>>>>> be passed to guest.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Implementation details for different backends will be in
>> >>> following
>> >>>>>>>>> patches.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Signed-off-by: Yi Liu <[email protected]>
>> >>>>>>>>>>>> Signed-off-by: Yi Sun <[email protected]>
>> >>>>>>>>>>>> Signed-off-by: Zhenzhong Duan
><[email protected]>
>> >>>>>>>>>>>> ---
>> >>>>>>>>>>>> hw/i386/intel_iommu.c | 35
>> >>>>>>>>>>> +++++++++++++++++++++++++++++++++++
>> >>>>>>>>>>>> 1 file changed, 35 insertions(+)
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> diff --git a/hw/i386/intel_iommu.c
>b/hw/i386/intel_iommu.c
>> >>>>>>>>>>>> index 4f84e2e801..a49b587c73 100644
>> >>>>>>>>>>>> --- a/hw/i386/intel_iommu.c
>> >>>>>>>>>>>> +++ b/hw/i386/intel_iommu.c
>> >>>>>>>>>>>> @@ -35,6 +35,7 @@
>> >>>>>>>>>>>> #include "sysemu/kvm.h"
>> >>>>>>>>>>>> #include "sysemu/dma.h"
>> >>>>>>>>>>>> #include "sysemu/sysemu.h"
>> >>>>>>>>>>>> +#include "sysemu/iommufd.h"
>> >>>>>>>>>>>> #include "hw/i386/apic_internal.h"
>> >>>>>>>>>>>> #include "kvm/kvm_i386.h"
>> >>>>>>>>>>>> #include "migration/vmstate.h"
>> >>>>>>>>>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace
>> >>>>>>>>>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
>> >>>>>>>>>>>> return vtd_dev_as;
>> >>>>>>>>>>>> }
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
>> >>>>>>>>>>>> + HostIOMMUDevice *hiod,
>> >>>>>>>>>>>> + Error **errp)
>> >>>>>>>>>>>> +{
>> >>>>>>>>>>>> + return 0;
>> >>>>>>>>>>>> +}
>> >>>>>>>>>>>> +
>> >>>>>>>>>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>> >>>>>>>>>>>> + HostIOMMUDevice *hiod,
>> >>>>>>>>>>>> + Error **errp)
>> >>>>>>>>>>>> +{
>> >>>>>>>>>>>> + return 0;
>> >>>>>>>>>>>> +}
>> >>>>>>>>>>>> +
>> >>>>>>>>>>>> +static int vtd_check_hdev(IntelIOMMUState *s,
>> >>>>>>>>> VTDHostIOMMUDevice
>> >>>>>>>>>>> *vtd_hdev,
>> >>>>>>>>>>>> + Error **errp)
>> >>>>>>>>>>>> +{
>> >>>>>>>>>>>> + HostIOMMUDevice *hiod = vtd_hdev->dev;
>> >>>>>>>>>>>> +
>> >>>>>>>>>>>> + if (object_dynamic_cast(OBJECT(hiod),
>> >>> TYPE_HIOD_IOMMUFD))
>> >>>>> {
>> >>>>>>>>>>>> + return vtd_check_iommufd_hdev(s, hiod, errp);
>> >>>>>>>>>>>> + }
>> >>>>>>>>>>>> +
>> >>>>>>>>>>>> + return vtd_check_legacy_hdev(s, hiod, errp);
>> >>>>>>>>>>>> +}
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> I think we should be using the .get_host_iommu_info() class
>> >>> handler
>> >>>>>>>>>>> instead. Can we refactor the code slightly to avoid this check
>on
>> >>>>>>>>>>> the type ?
>> >>>>>>>>>>
>> >>>>>>>>>> There is some difficulty ini avoiding this check, the behavior
>of
>> >>>>>>>>> vtd_check_legacy_hdev
>> >>>>>>>>>> and vtd_check_iommufd_hdev are different especially after
>> >>> nesting
>> >>>>>>>>> support introduced.
>> >>>>>>>>>> vtd_check_iommufd_hdev() has much wider check over
>> >cap/ecap
>> >>> bits
>> >>>>>>>>> besides aw_bits.
>> >>>>>>>>>
>> >>>>>>>>> I think it is important to fully separate the vIOMMU model
>from
>> >the
>> >>>>>>>>> host IOMMU backing device.
>> >>>>>
>> >>>>> This comment is true for the structures also.
>> >>>>>
>> >>>>>>>>> Could we introduce a new HostIOMMUDeviceClass
>> >>>>>>>>> handler .check_hdev() handler, which would
>> >>>>> call .get_host_iommu_info() ?
>> >>>>>
>> >>>>> This means that HIOD_LEGACY_INFO and HIOD_IOMMUFD_INFO
>> >should
>> >>> be
>> >>>>> a common structure 'HostIOMMUDeviceInfo' holding all attributes
>> >>>>> for the different backends. Each .get_host_iommu_info()
>> >implementation
>> >>>>> would translate the specific host iommu device data presentation
>> >>>>> into the common 'HostIOMMUDeviceInfo', this is true for
>> >host_aw_bits.
>> >>>>
>> >>>> I see, it's just not easy to define the unified elements in
>> >>> HostIOMMUDeviceInfo
>> >>>> so that they maps to bits or fields in host return IOMMU info.
>> >>>
>> >>> The proposal is adding a vIOMMU <-> HostIOMMUDevice interface
>and a
>> >>> new
>> >>> API needs to be completely defined for it. The IOMMU backend
>> >>> implementation
>> >>> could be anything, legacy, iommufd, iommufd v2, some other
>framework
>> >>> and
>> >>> the vIOMMU shouldn't be aware of its implementation.
>> >>>
>> >>> Exposing the kernel structures as done below should be avoided
>because
>> >>> they are part of the QEMU <-> kernel IOMMUFD interface.
>> >>>
>> >>>
>> >>>> Different platform returned host IOMMU info is platform specific.
>> >>>> For vtd and siommu:
>> >>>>
>> >>>> struct iommu_hw_info_vtd {
>> >>>> __u32 flags;
>> >>>> __u32 __reserved;
>> >>>> __aligned_u64 cap_reg;
>> >>>> __aligned_u64 ecap_reg;
>> >>>> };
>> >>>>
>> >>>> struct iommu_hw_info_arm_smmuv3 {
>> >>>> __u32 flags;
>> >>>> __u32 __reserved;
>> >>>> __u32 idr[6];
>> >>>> __u32 iidr;
>> >>>> __u32 aidr;
>> >>>> };
>> >>>>
>> >>>> I can think of two kinds of declaration of HostIOMMUDeviceInfo:
>> >>>>
>> >>>> struct HostIOMMUDeviceInfo {
>> >>>> uint8_t aw_bits;
>> >>>> enum iommu_hw_info_type type;
>> >>>> union {
>> >>>> struct iommu_hw_info_vtd vtd;
>> >>>> struct iommu_hw_info_arm_smmuv3;
>> >>>> ......
>> >>>> } data;
>> >>>> }
>> >>>>
>> >>>> or
>> >>>>
>> >>>> struct HostIOMMUDeviceInfo {
>> >>>> uint8_t aw_bits;
>> >>>> enum iommu_hw_info_type type;
>> >>>> __u32 flags;
>> >>>> __aligned_u64 cap_reg;
>> >>>> __aligned_u64 ecap_reg;
>> >>>> __u32 idr[6];
>> >>>> __u32 iidr;
>> >>>> __u32 aidr;
>> >>>> ......
>> >>>> }
>> >>>>
>> >>>> Not clear if any is your expected format.
>> >>>>
>> >>>>> 'type' could be handled the same way, with a
>'HostIOMMUDeviceInfo'
>> >>>>> type attribute and host iommu device type definitions, or as you
>> >>>>> suggested with a QOM interface. This is more complex however. In
>> >>>>> this case, I would suggest to implement a .compatible() handler to
>> >>>>> compare the host iommu device type with the vIOMMU type.
>> >>>>>
>> >>>>> The resulting check_hdev routine would look something like :
>> >>>>>
>> >>>>> static int vtd_check_hdev(IntelIOMMUState *s,
>> >VTDHostIOMMUDevice
>> >>>>> *vtd_hdev,
>> >>>>> Error **errp)
>> >>>>> {
>> >>>>> HostIOMMUDevice *hiod = vtd_hdev->dev;
>> >>>>> HostIOMMUDeviceClass *hiodc =
>> >>>>> HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>> >>>>> HostIOMMUDevice info;
>> >>>>> int host_aw_bits, ret;
>> >>>>>
>> >>>>> ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info),
>errp);
>> >>>>> if (ret) {
>> >>>>> return ret;
>> >>>>> }
>> >>>>>
>> >>>>> ret = hiodc->is_compatible(hiod, VIOMMU_INTERFACE(s));
>> >>>>> if (ret) {
>> >>>>> return ret;
>> >>>>> }
>> >>>>>
>> >>>>> if (s->aw_bits > info.aw_bits) {
>> >>>>> error_setg(errp, "aw-bits %d > host aw-bits %d",
>> >>>>> s->aw_bits, info.aw_bits);
>> >>>>> return -EINVAL;
>> >>>>> }
>> >>>>> }
>> >>>>>
>> >>>>> and the HostIOMMUDeviceClass::is_compatible() handler would
>call a
>> >>>>> vIOMMUInterface::compatible() handler simply returning
>> >>>>> IOMMU_HW_INFO_TYPE_INTEL_VTD. How does that sound ?
>> >>>>
>> >>>> Not quite get what HostIOMMUDeviceClass::is_compatible() does.
>> >>>
>> >>> HostIOMMUDeviceClass::is_compatible() calls in the host IOMMU
>> >backend
>> >>> to determine which IOMMU types are exposed by the host, then calls
>the
>> >>> vIOMMUInterface::compatible() handler to do the compare. API is to
>be
>> >>> defined.
>> >>>
>> >>> As a refinement, we could introduce in the vIOMMU <->
>> >HostIOMMUDevice
>> >>> interface capabilities, or features, to check more precisely the level
>> >>> of compatibility between the vIOMMU and the host IOMMU device.
>This
>> >is
>> >>> similar to what is done between QEMU and KVM.
>> >>>
>> >>> If you think this is too complex, include type in HostIOMMUDeviceInfo.
>> >>>
>> >>>> Currently legacy and IOMMUFD host device has different check logic,
>> >how
>> >>> it can help
>> >>>> in merging vtd_check_legacy_hdev() and vtd_check_iommufd_hdev()
>> >into
>> >>> a single vtd_check_hdev()?
>> >>>
>> >>> IMHO, IOMMU shouldn't be aware of the IOMMU backend
>> >implementation,
>> >>> but
>> >>> if you think the Intel vIOMMU should access directly the iommufd
>> >backend
>> >>> when available, then we should drop this proposal and revisit the
>design
>> >>> to take a different approach.
>> >>
>> >> I implemented a draft following your suggestions so we could explore
>> >further.
>> >> See
>> >https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_
>pre
>> >q_v3_tmp
>> >>
>> >> In this draft, it uses .check_cap() to query
>HOST_IOMMU_DEVICE_CAP_xxx
>> >> just like KVM CAPs.
>> >> A common HostIOMMUDeviceCaps structure is introduced to be used
>by
>> >> both legacy and iommufd backend.
>> >>
>> >> It indeed is cleaner. Only problem is I failed to implement .compatible()
>> >> as all the check could go ahead by just calling check_cap().
>> >> Could you help a quick check to see if I misunderstood any of your
>> >suggestion?
>> >
>> >Thanks for the changes. It looks cleaner and simpler ! Some comments,
>> >
>> >* HostIOMMUDeviceIOMMUFDClass seems useless as it is empty. I don't
>> > remember if you told me already you had plans for future changes.
>> > Sorry about that if this is the case. I forgot.
>>
>> Never mind😊, reason is:
>>
>> In nesting series
>>
>https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_nesting
>_rfcv2/
>> This commit
>>
>https://github.com/yiliu1765/qemu/commit/581fc900aa296988eaa48abee
>6d68d3670faf8c9
>> implement [at|de]tach_hwpt handlers.
>>
>> So I add an extra layer of abstract HostIOMMUDeviceIOMMUFDClass to
>define
>> [at|de]tach_hwpt handlers.
>>
>> >
>> >* I would use the 'host_iommu_device_' prefix for external routines
>> > which are part of the HostIOMMUDevice API and use 'hiod_' for
>> > internal routines where it makes sense, to limit the name length for
>> > instance.
>>
>> Good idea, will do.
>>
>> >
>> >* I would rename HOST_IOMMU_DEVICE_CAP_IOMMUFD_V1 to
>> > HOST_IOMMU_DEVICE_CAP_IOMMUFD. I mentioned IOMMUFD v2 as
>a
>> > theoretical example of a different IOMMU interface. I don't think we
>> > need to anticipate yet :)
>>
>> Will do.
>>
>> >
>> >* HostIOMMUDeviceCaps is using 'enum iommu_hw_info_type' from
>> > 'linux/iommufd.h', that's not my preferred choice but I won't
>> > object. The result looks good.
>>
>> Ok, will keep it for now. We can change when you want in future.
>>
>> >
>> >* HostIOMMUDevice now has a realize() routine to query the host
>IOMMU
>> > capability for later usage. This is a good idea. However, you could
>> > change the return value to bool and avoid the ERRP_GUARD() prologue.
>>
>> Will do.
>>
>> >
>> >* Beware of :
>> >
>> > struct Range {
>> > /*
>> > * Do not access members directly, use the functions!
>> > * A non-empty range has @lob <= @upb.
>> > * An empty range has @lob == @upb + 1.
>> > */
>> > uint64_t lob; /* inclusive lower bound */
>> > uint64_t upb; /* inclusive upper bound */
>> > };
>>
>> I remember😊, will add the change in formal version.
>>
>> >
>> >
>> >* I think you could introduce a new VFIOIOMMUClass attribute. Let's
>> > call it VFIOIOMMUClass::hiod_typename. The creation of
>> >HostIOMMUDevice
>> > would become generic and you could move :
>> >
>> > hiod=
>> >HOST_IOMMU_DEVICE(object_new(TYPE_HOST_IOMMU_DEVICE_LEGAC
>Y_V
>> >FIO));
>> > HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev,
>errp);
>> > if (*errp) {
>> > object_unref(hiod);
>> > return -EINVAL;
>> > }
>> > vbasedev->hiod = hiod;
>> >
>> > at the end of vfio_attach_device().
>>
>> Good suggestion! Will do.
>>
>> Thanks
>> Zhenzhong
>
>
>So I'm expecting v3 of this.
Just posted v6,
https://lists.gnu.org/archive/html/qemu-devel/2024-06/msg00102.html
Comments appreciated!
Thanks
Zhenzhong