/*
 * ============================
 * MSI MESSAGE FORMATS (on x86)
 * ============================
 *
 * Message Signaled Interrupts are simply DMA transactions from the device.
 * It really is just "write <these> 32 bits <here> when you want attention."
 * The MSI (or MSI-X) message configured in the device is just the 64 bits of
 * the address to write to, and the 32 bits to write there.
 *
 * You can use this to do polled I/O by telling the device to write into a
 * data structure of your own choosing, then checking to see when it does so.
 *
 * Or you can tell the device to poke at MMIO on *another* device, for example
 * when it's finished receiving a packet and it's time for the next device to
 * process that packet.
 *
 * Of course, the way it's *supposed* to be used is to poke MMIO on another
 * device whose *sole* purpose is to raise an interrupt to the CPU.
 *
 * It's mostly been forgotten now, but on Intel chipsets used with the Pentium
 * and P6 family CPUs, the MMIO device used for this was the I/O APIC. There
 * was an "IRQ Pin Assertion Register" at 0xFEC00020, and a device could write
 * a pin number to that register to artificially assert an input pin. So
 * devices could be configured to use this, and as far as the rest of the
 * system was concerned it would be as if they actually had a line interrupt
 * wired to the corresponding pin on the I/O APIC. The I/O APIC would then
 * send the interrupt to the CPU via the APIC serial bus, just like for true
 * line interrupts.
 *
 * For Pentium 4 and Xeon onwards, Intel moved away from the APIC serial bus
 * and started to use the main system bus for interrupts. Devices can now
 * issue MMIO writes directly to the APIC at address 0x00000000FEExxxxx.
 *
 * When the APIC receives a write transaction across the system bus, it looks
 * at the low 20 bits of the address as well as the data being written. These
 * convey all the information about which interrupt vector to raise on which
 * CPU, and a few more details besides. Some of those details include special
 * cases like cluster delivery modes and ways to deliver NMI/INIT/etc. which
 * we won't go into here.
 *
 * This is MSI as we currently know it, and even the I/O APIC now effectively
 * turns line interrupts into MSIs by sending them on the system bus this way.
 *
 *
 * Compatibility Format
 * --------------------
 *
 * Originally, there was only one way of interpreting the bits in the MSI
 * message. This is what Intel documentation now calls "Compatibility Format"
 * (§5.1.2.1 of the VT-d spec). It is as follows:
 *
 * Address: 1111.1110.1110.dddd.dddd.0000.0000.rmxx
 *               0xFEE    . Dest ID .  Rsvd   .↑↑↑
 *                                             ||└-Don't Care
 *                                             |└-Destination Mode
 *                                             └-Redirection Hint
 *
 * Data:    0000.0000.0000.0000.TL00.0DDD.vvvv.vvvv
 *               Reserved      .↑↑     ↑ .  Vector
 *                              ||     └-Delivery Mode
 *                              |└-Trigger Mode Level
 *                              └-Trigger Mode
 *
 * Crucially, this format has only 8 bits for the Destination ID. Since 0xFF
 * is the broadcast address, this allows only up to 255 CPUs to be supported.
 *
 * For many years the Reserved bits 4-11 of the address were labelled in
 * some Intel documentation as "Extended Destination ID", but never used.
 *
 * The vector to be delivered to the destination CPU is in the low bits of the
 * data. For devices with multiple interrupts, modern PCI MSI-X allows the
 * full address+data bits for each one to be configured independently, so they
 * can target arbitrary vectors on arbitrary CPUs.
 *
 * However, the older PCI multi-MSI standard only allowed the base MSI to be
 * configured, and every additional interrupt supported by the device was
 * signalled just by adding to the value of the data field. This means that
 * multi-MSI devices could raise a set of consecutive vectors on the *same*
 * CPU for different interrupts, but not raise interrupts to different CPUs.
 *
 *
 * I/O APIC Redirection Table Entries
 * ----------------------------------
 *
 * As noted above, the I/O APIC is now just a device for turning line-level
 * interrupts into MSI messages. Each pin on the I/O APIC has a Redirection
 * Table Entry (RTE) which configures the MSI message to be sent.
 *
 * The 64 bits in the original definition of the I/O APIC RTE map to all the
 * fields of the resulting MSI, including the Extended Destination ID. It's
 * just that they appear to have been shuffled into a strange order, because
 * back in the mists of time they actually corresponded more closely to the
 * message format on the APIC serial bus.
 *
 * RTE[63-32]: dddd.dddd.eeee.eeee.xxxx.xxxx.xxxx.xxxx
 *              Dest ID .ExtDestId.    Reserved
 *
 * RTE[31-0]:  xxxx.xxxx.xxxx.xxxM.TRPs.mDDD.vvvv.vvvv
 *                               ↑ ↑↑↑↑ ↑ ↑ . Vector
 *                               | |||| | └-Delivery Mode
 *                               | |||| └-Destination Mode
 *                               | |||└-Delivery Status (RO)
 *                               | ||└-Pin polarity
 *                               | |└-Remote IRR (RO)
 *                               | └-Trigger Mode
 *                               └- Mask
 *
 * These days, the field definitions are largely fictional because the I/O
 * APIC doesn't actually interpret most of those bits, and just passes them on
 * in an MSI message (with an important caveat noted below). The definitions
 * still make sense when the MSI generated by the I/O APIC is received as a
 * Compatibility Format MSI by a standard APIC, but when it is received by an
 * IOMMU and interpreted as a different format (as described later), they make
 * a lot less sense. It's much better to think of the RTE just as a weird
 * arrangement of the bits of the MSI message which will be generated, with
 * some remaining fields which *are* still used by the I/O APIC itself (mask,
 * polarity, status etc.):
 *
 * RTE[63-32]: aaaa.aaaa.aaaa.aaaa.xxxx.xxxx.xxxx.xxxx
 *             MSI Address [19-4] .    Don't Care
 *
 * RTE[31-0]:  xxxx.xxxx.xxxx.xxxM.DRPs.Addd.dddd.dddd
 *                               ↑ ↑↑↑↑ ↑  MSI Data[10-0]
 *                               | |||| |
 *                               | |||| └- MSI Address[2]
 *                               | |||└-Delivery Status (RO)
 *                               | ||└-Pin polarity
 *                               | |└-Remote IRR (RO)
 *                               | └-MSI Data[15]
 *                               └- Mask
 *
 *
 * You can see this in VMMs like QEMU, where the I/O APIC emulation just takes
 * the RTE and swizzles the bits around to create address+data of an MSI
 * message, adding the standard 0xFEExxxxx to the generated address. QEMU then
 * literally forwards that MSI as a memory transaction in the physical address
 * space to which the I/O APIC is attached. The memory transaction is then
 * passed through the standard address decoding just as DMA writes from
 * devices would be. It is ultimately received and handled by either the APIC
 * or the IOMMU which handles the corresponding address space.
 *
 * Conversely, operating systems can configure the I/O APIC RTE by first
 * composing an MSI message in the format expected by the upstream APIC or
 * IOMMU which will receive it, and then just swizzling the bits into the
 * appropriate places.
 *
 * (Some operating systems, including old versions of Linux, instead have
 * complex special cases within the I/O APIC code, with special knowledge of
 * the upstream IOMMU formats, or hooks into the IOMMU drivers to generate I/O
 * APIC RTEs directly, instead of just composing an MSI message the generic
 * way and deriving the RTE from that.)
 *
 * There is a caveat to this simplicity though, and it has to do with the way
 * that the I/O APIC handles level-triggered interrupts. When the interrupt is
 * first asserted, the I/O APIC sends the MSI message upstream to be handled.
 * Upon completing the interrupt, the CPU sends an "End of Interrupt" (EOI) to
 * the I/O APIC. At that point, the I/O APIC needs to send a new interrupt if
 * the level on the input pin is still asserted.
 *
 * The EOI from the CPU tells the I/O APIC which *vector* the CPU has finished
 * processing. And thus the I/O APIC still looks at the low 8 bits of the RTE,
 * which correspond to the low 8 bits of the MSI data, to determine which
 * interrupt is being EOI'd. So even if the IOMMU receiving the MSI message
 * does not even care about the contents of those bits (e.g. the Intel IOMMU
 * as described below), the operating system still needs to put appropriate
 * values in those bits for level-triggered interrupts. Likewise, bit 15 of
 * the RTE, which corresponds to bit 15 of the MSI data, is the bit which
 * indicates that a given pin is level-triggered.
 *
 *
 * Intel "Remappable Format"
 * -------------------------
 *
 * When Intel started supporting more than 255 CPUs, the 8-bit limit in what
 * was not yet called "Compatibility Format" became a problem. To support
 * the full 32 bits of logical x2APIC IDs they had to come up with another
 * solution. Since MSIs are basically just a DMA write, the logical place for
 * this was the IOMMU, which already intercepts DMA writes from devices. So
 * they invented "Interrupt Remapping". The "Remappable Format" MSI does not
 * directly encode which vector to send to which CPU; instead it just
 * identifies an index into an IOMMU table (the Interrupt Remapping Table).
 *
 * The Interrupt Remapping Table Entry (IRTE) contains all the information
 * which was once present in the MSI address+data, but allows for a full 32
 * bits of destination ID. (It can also be used for posted interrupts,
 * delivering the interrupt *directly* to a vCPU in guest mode).
 *
 * To signal a Remappable Format MSI, Intel used bit 4 of the MSI address,
 * which is the lowest of the bits which were previously labelled "Extended
 * Destination ID". With an Intel IOMMU doing Interrupt Remapping, devices
 * can send both Remappable Format MSIs, *and* Compatibility Format, and the
 * IOMMU will only actually remap the former. (It can be told to block the
 * latter, for security reasons.)
 *
 * Intel calls the IRTE index the "handle". In the simple case, the full 16
 * bits of the handle are conveyed in the address of the MSI (bits 19-5 and
 * bit 2), and the data written to that address is completely ignored.
 *
 * However, this would not support the legacy multi-MSI devices which only
 * have one MSI address/data configuration register and simply add one to the
 * data for each consecutive interrupt source. So the Intel IOMMU also has an
 * optional "subhandle" in the low bits of the data. If bit 3 of the address
 * (Subhandle Valid) is set, the IOMMU adds this subhandle to the handle
 * extracted from the address, and uses the result as the index into its
 * Interrupt Remapping Table. This even allows legacy multi-MSI devices to
 * target different CPUs with their different interrupt sources, which they
 * could not before.
 *
 * Address: 1111.1110.1110.hhhh.hhhh.hhhh.hhh1.shxx
 *               0xFEE    .   Handle[14:0]    .↑↑↑
 *                                             ||└-Don't Care
 *                                             |└-Handle[15]
 *                                             └-Subhandle Valid (SHV)
 *
 * Data:    0000.0000.0000.0000.ssss.ssss.ssss.ssss
 *               Reserved      .  Subhandle (if SHV==1 in address)
 *
 * As described earlier, the I/O APIC has legacy reasons to care about the
 * bits which end up in bits 7-0 and bit 15 of the data, which were once the
 * vector and trigger mode respectively. Since the operating system has no
 * need to set SHV=1 for MSIs generated by the I/O APIC, the IOMMU can ignore
 * the data completely, and the operating system is free to place whatever
 * values it likes in there to keep the I/O APIC happy for level-triggered
 * interrupts.
 *
 *
 * AMD Remappable MSI
 * ------------------
 *
 * AMD's IOMMU is completely different to Intel's, and they didn't make
 * things anywhere near as complicated. When the IOMMU is enabled, a
 * device cannot send "Compatibility Format" MSIs any more, so there is
 * no need to tell one format from the other. AMD just used the low 11
 * bits of the data as the IRTE index, and nothing else matters.
 *
 * Address: 1111.1110.1110.xxxx.xxxx.xxxx.xxxx.xxxx
 *               0xFEE    .       Don't Care
 *
 * Data:    xxxx.xxxx.xxxx.xxxx.xxxx.xiii.iiii.iiii
 *               Don't Care            IRTE Index
 *
 * The reason for using only 11 bits of IRTE index is that, as described
 * above, the I/O APIC actually *does* care about bit 15 of the MSI data (or,
 * more accurately, it cares about the RTE bit which gets shuffled into bit 15
 * of the MSI data). That's the original "Trigger Mode" bit, which lets the
 * I/O APIC know that this is a level-triggered interrupt.
 *
 * Although the Intel IOMMU has a single Interrupt Remapping Table and a
 * single number space for IRTE indices across the whole system, the AMD
 * IOMMU has a table per device — so multiple devices may use IRTE index
 * number zero, for example. This, sadly, becomes important later.
 *
 *
 * The 15-bit MSI extension
 * ------------------------
 *
 * The problem with IOMMUs is that they were designed to support DMA
 * translation, and there is no architectural way to disable that and offer
 * guests an IOMMU which *only* supports Interrupt Remapping. We really don't
 * want guests doing their own DMA translation, as it has severe performance
 * and security implications.
 *
 * So KVM, Hyper-V and Xen all define a virt extension which uses 7 of the
 * original "Extended Destination ID" bits to give support for up to 32768
 * virtual CPUs. (This extension avoids the low bit which Intel used to
 * indicate Remappable Format). This format is exactly like the Compatibility
 * Format, except that bits 5-11 of the MSI address are used as bits 8-14
 * of the destination APIC ID:
 *
 * Address: 1111.1110.1110.dddd.dddd.DDDD.DDD0.rmxx
 *               0xFEE    . Dest ID .ExtDest  .↑↑↑
 *                                             ||└-Don't Care
 *                                             |└-Destination Mode
 *                                             └-Redirection Hint
 *
 * Data:    0000.0000.0000.0000.TL00.0DDD.vvvv.vvvv
 *               Reserved      .↑↑     ↑ .  Vector
 *                              ||     └-Delivery Mode
 *                              |└-Trigger Mode Level
 *                              └-Trigger Mode
 *
 * We have thus far mostly glossed over the distinction between logical and
 * physical destination IDs, indicated by the Destination Mode bit, because
 * these MSI formats are merely a transport for that information and have
 * little to do with its interpretation.
 *
 * However, we should note that in certain cases, the distinction between
 * logical and physical mode does matter. In x2APIC mode, each logical
 * "cluster" contains 16 CPUs. Logical mode addressing splits the 32-bit
 * destination ID into two parts; the top 16 bits contain the "cluster ID",
 * which is the physical APIC ID divided by 16. The low 16 bits are a bitmask
 * of which CPUs within that cluster should be eligible to receive the
 * interrupt. So, for example, an interrupt could be targeted at CPUs 21, 23,
 * 24, and 25 by using the logical destination ID 0x0001.03a0.
 *
 * Astute readers will have noticed that with only 15 bits of destination
 * ID, logical mode can only address the first cluster (CPUs 0-15), and in fact
 * can't even set the bit for CPU#15 either.
 *
 * So when using this 15-bit MSI format, it is expected that guests will set
 * the Destination Mode bit to zero to use physical addressing mode, where the
 * destination ID in the MSI message is simply the physical APIC ID of the
 * single CPU which is the target of the interrupt. Enlightened operating
 * systems ought to be capable of this for themselves, but hypervisors can
 * give them a helpful nudge by setting bit 19 ("Force APIC physical
 * destination mode") in the Fixed Feature Flags field of the Fixed ACPI
 * Description Table (FADT). A strict reading of the ACPI specification would
 * suggest that this flag is only for xAPIC mode, but both Windows and Linux
 * do honour it in x2APIC mode too.
 *
 *
 * Xen MSI → PIRQ mapping
 * ----------------------
 *
 * All of the above are implementable in real hardware. Actual external PCI
 * devices can perform memory transactions to addresses in the physical
 * address range 0x00000000FEExxxxx, which reach the APIC and cause
 * interrupts to be injected into the relevant CPU.
 *
 * But Xen guests know that they are running in a virtual machine. So they
 * know that the PCI config space is a complete fiction. For example, if they
 * set up a BAR of a given device with a certain address, that is a *guest*
 * physical address. The hypervisor probably doesn't even change anything on
 * the device itself; it just adjusts the EPT page tables to make the
 * corresponding BAR *appear* in the guest physical address space at the
 * desired location.
 *
 * MSI messages in a virtual environment are similarly fictional. If the guest
 * configures an MSI message in a PCI device with a certain vCPU APIC ID and
 * vector, the real hardware wouldn't know what to do with that. (Well, we
 * could design an IOMMU which *could* cope with that, let guests write
 * directly to the PCI devices' MSI tables, and use the resulting MSIs for
 * posted interrupts as a first-class citizen, but nobody's done that.)
 *
 * In practice, what happens is that the hypervisor registers its *own*
 * handler for the hardware interrupt in question (routing it to a given
 * vector on a given *host* CPU, typically handled by VFIO in the KVM case).
 * When that host interrupt handler is triggered, the hypervisor needs to
 * inject an interrupt to the guest vCPU accordingly. From that point, it's
 * just the same as raising an MSI from an *emulated* PCI device. Most
 * hypervisors, including Xen and KVM, do *not* have a mechanism to simply
 * write to guest memory *instead* of injecting an interrupt. So if the guest
 * configured the MSI to target an address outside the 0x00000000FEExxxxx
 * range, it just gets dropped. (Boo, no DPDK polled-mode implementations
 * abusing MSIs for memory writes, in virt guests!)
 *
 * This means that we can abuse the high 32 bits of the address even in a
 * guest-visible way, right? Nothing would ever go wrong...
 *
 * Xen was the first to do this. It needed a way to map MSI from PCI devices
 * to a 'PIRQ', which is a form of Xen paravirtualised interrupt which binds
 * to Xen Event Channels. By using vector#0, Xen guests indicate a special
 * MSI message which is to be routed to a PIRQ. The actual PIRQ# is then in
 * the original Destination ID field... and the high bits of the address.
 *
 * (We'll gloss over the way that Xen snoops on these even while masked, and
 * actually unmasks the MSI when the guest binds to the corresponding PIRQ,
 * because there's only so much pain I can inflict on the reader in one
 * sitting.)
 *
 * AddrHi:  DDDD.DDDD.DDDD.DDDD.DDDD.DDDD.0000.0000
 *                    PIRQ#[31-8]        .  Rsvd
 *
 * AddrLo:  1111.1110.1110.dddd.dddd.0000.0000.xxxx
 *               0xFEE    .PIRQ[7-0].  Rsvd   .Don't Care
 *
 * Data:    xxxx.xxxx.xxxx.xxxx.xxxx.xxxx.0000.0000
 *                  Don't Care           . Vector == 0
 *
 * When Xen attempts to raise such an MSI to the guest, it doesn't inject it
 * via the virtual APIC at all. It is routed to the PIRQ and thus to the Xen
 * event channel mechanism instead.
 *
 *
 * KVM X2APIC MSI API
 * ------------------
 *
 * KVM has an ioctl() for injecting MSI interrupts, and routing table entries
 * which cause MSIs to be injected to the guest when triggered. For
 * convenience, KVM originally just used the Compatibility Format MSI message
 * as its userspace ABI for configuring these. This got less convenient when
 * x2APIC came along and we needed an extra 24 bits for the Destination ID.
 *
 * KVM's solution was to abuse the high 32 bits of the address. If this was a
 * true memory transaction, such a write would miss the APIC completely and
 * scribble over guest memory at an address like 0x00000100FEExxxxx. But in
 * this case it's just an ABI between KVM and userspace, using bits which
 * would otherwise be completely redundant. KVM uses the high 24 bits of the
 * MSI address (bits 40-63) as the high 24 bits of the destination ID.
 *
 * AddrHi:  DDDD.DDDD.DDDD.DDDD.DDDD.DDDD.0000.0000
 *              Destination ID [31-8]    .  Rsvd
 *
 * AddrLo:  1111.1110.1110.dddd.dddd.0000.0000.rmxx
 *               0xFEE    .    ↑    .  Rsvd   .↑↑↑
 *                        DestID[7-0]          ||└-Don't Care
 *                                             |└-Destination Mode
 *                                             └-Redirection Hint
 *
 * Data:    0000.0000.0000.0000.TL00.0DDD.vvvv.vvvv
 *               Reserved      .↑↑     ↑ .  Vector
 *                              ||     └-Delivery Mode
 *                              |└-Trigger Mode Level
 *                              └-Trigger Mode
 *
 * This hack is not visible to a KVM guest. What a KVM guest programs into
 * the MSI descriptors of passthrough or emulated PCI devices is completely
 * different, and (at this point in our tale of woe, at least) never sets
 * the high 32 bits of the target address to anything but zero.
 *
 *
 * IOMMU interrupts
 * ----------------
 *
 * Since an IOMMU is responsible for remapping interrupts so they can reach
 * CPUs with higher APIC IDs, how do we actually configure the events from
 * the IOMMU itself?
 *
 * Intel uses the same format as the KVM x2APIC API (which may in fact have
 * been why KVM did it that way). Since it's never going to be an actual
 * memory transaction, it's safe to abuse the high bits of the address. Intel
 * offers { Data, Address, Upper Address } registers for each type of event
 * that the IOMMU can generate for itself, with the high 24 bits of the
 * destination ID in the high 24 bits of the address as shown above for KVM.
 *
 * AMD's IOMMU uses a completely different 64-bit register format (e.g. XT
 * IOMMU General Interrupt Control Register) which doesn't pretend very hard
 * to look like an MSI at all. But just happens to have the DestMode at bit
 * 2, like in the MSI address. And just happens to have the vector and
 * Delivery Mode (from the low 9 bits of the MSI data) in the low 9 bits of
 * its high word (bits 32-40 of the register). And then just throws the
 * actual destination ID in around them in some other bits:
 *
 * Low32:   dddd.dddd.dddd.dddd.dddd.dddd.xxxx.xmxx
 *             Destination ID [23-0]     . ↑  . ↑↑
 *                                       Don't  |└-Don't Care
 *                                        Care  └-Destination Mode
 *
 * High32:  DDDD.DDDD.xxxx.xxxx.xxxx.xxxD.vvvv.vvvv
 *        DestId[31-24]                 ↑.  Vector
 *                                      └-Delivery Mode
 *
 *
 * Windows, part 1: Intel IOMMU with no DMA translation
 * ----------------------------------------------------
 *
 * As noted above, the 15-bit extension was invented to avoid the need for
 * an IOMMU, because it is undesirable to offer a virtual IOMMU to guests
 * with support for them to do their own additional level of DMA translation.
 *
 * However, although Hyper-V exposes the 15-bit MSI feature, Windows as a
 * guest OS does not use it. In order to support Windows guests with more
 * than 255 vCPUs, a hack was found for the Intel IOMMU. Although there is no
 * official way to advertise that the IOMMU does not support DMA translation,
 * there *are* "Supported Adjusted Guest Address Width" bits which advertise
 * the ability to use 3-level, 4-level, and/or 5-level page tables. If
 * Windows encounters an IOMMU which sets *none* of these bits, Windows will
 * quietly refrain from attempting to use that IOMMU for DMA translation, but
 * will still use it for Interrupt Remapping.
 *
 * However, this only works correctly if Windows is running on an Intel CPU.
 * When Windows runs on an AMD CPU, it will happily configure and use the
 * Intel IOMMU, but misconfigures the MSI messages that it programs into the
 * devices. For I/O APIC interrupts, Windows programs the IRTE in the Intel
 * IOMMU correctly... but then configures the I/O APIC using the AMD format
 * (with the IRTE index where the vector would have been). A hack to the
 * virtual Intel IOMMU emulation can make it cope with this bug... but sadly
 * it *only* works for I/O APIC interrupts. For actual PCI MSI, Windows still
 * configures the device with an AMD-style remappable MSI but *doesn't*
 * actually configure the IRTE in the IOMMU at all. This is probably because
 * Intel's IRT is system-wide, while AMD has one per device; Windows does
 * seem to think it's using a separate IRTE space, so the first MSI vector
 * gets IRTE index 0 which conflicts with I/O APIC pin 0.
 *
 * So for PCI, the hypervisor has no idea where Windows intended a given MSI
 * to be routed, and cannot work around the Windows bugs to support >255 AMD
 * vCPUs this way.
 *
 *
 * Windows, part 2: No IOMMU
 * -------------------------
 *
 * If we do *not* offer an IOMMU to a Windows guest which has CPUs with high
 * APIC IDs, we encounter a *different* Windows bug, which is easier to work
 * around. Windows doesn't use the 15-bit extension described above, but it
 * *does* just throw the high bits of the destination ID into bits 32-55 of
 * the MSI address.
 *
 * (This obviously only works for devices which can generate 64-bit MSIs,
 * which does not include the I/O APIC or HPET. Persuading Windows to set
 * up the I/O APIC when there are CPUs with high APIC IDs is a different
 * issue, and not covered here.)
 *
 * Done without negotiation or discovery of any hypervisor feature, this abuse
 * of high address bits arguably ought to cause the device to write to an
 * address in guest *memory* and miss the APIC at 0x00000000FEExxxxx
 * altogether, but we already admitted almost no hypervisors actually *do*
 * that. (QEMU is the exception here, because for *emulated* PCI devices,
 * pci_msi_trigger() does actually generate true write cycles in the
 * corresponding DMA address space.)
 *
 * We can cope with this Windows bug and even use it to our advantage, by
 * spotting the high bits in the MSI address and using them. It does require
 * that we have an API which is specifically for *MSI*, not to be conflated
 * with actual DMA writes. So QEMU's pci_msi_trigger() would have to do
 * things differently. But let's pretend, for the sake of argument, that I'm
 * typing this C-comment essay into a VMM other than QEMU, which already
 * does think that way and has a cleaner separation of emulated-PCI vs. the
 * VFIO or true emulation which can back it, and *does* always handle MSIs
 * explicitly.
 *
 * In that case, all the translation function has to do, in addition to
 * invoking all the IOMMU and Xen and 15-bit translators as QEMU's
 * kvm_arch_fixup_msi_route() function already does, is add one more trivial
 * special case. This format is the same as the KVM x2APIC API format, with
 * the top 32 bits of the address shifted by 8 bits:
 *
 * AddrHi:  0000.0000.DDDD.DDDD.DDDD.DDDD.DDDD.DDDD
 *            Rsvd   .  Destination ID bits 8-31
 *
 * AddrLo:  1111.1110.1110.dddd.dddd.0000.0000.rmxx
 *               0xFEE    . Dest ID .  Rsvd   .↑↑↑
 *                                             ||└-Don't Care
 *                                             |└-Destination Mode
 *                                             └-Redirection Hint
 *
 * Data:    0000.0000.0000.0000.TL00.0DDD.vvvv.vvvv
 *               Reserved      .↑↑     ↑ .  Vector
 *                              ||     └-Delivery Mode
 *                              |└-Trigger Mode Level
 *                              └-Trigger Mode
 */
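
/*
 * A minimal sketch of the bit-shuffling described above, not a definitive
 * implementation. Every type and function name here (struct msi_msg,
 * enum msi_fmt, rte_to_msi(), msi_dest_id(), intel_irte_index(),
 * x2apic_logical_id()) is hypothetical and chosen for illustration; none is
 * taken from QEMU, KVM or Linux. It assumes the caller already knows which
 * format applies to a given message, and it ignores delivery modes, the AMD
 * and Xen formats, and the IOMMU's own event registers.
 */
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for an MSI message. */
struct msi_msg {
    uint64_t addr;   /* AddrHi:AddrLo */
    uint32_t data;
};

/* Hypothetical tags for the non-remapped formats in the essay. */
enum msi_fmt { MSI_COMPAT, MSI_EXT15, MSI_KVM_X2APIC, MSI_WIN_HIGH };

/* Swizzle an I/O APIC RTE into the address+data of the MSI it generates. */
static struct msi_msg rte_to_msi(uint64_t rte)
{
    struct msi_msg m;

    m.addr = 0xFEE00000u
           | ((uint32_t)(rte >> 48) << 4)           /* RTE[63-48] -> addr[19-4] */
           | ((uint32_t)((rte >> 11) & 1) << 2);    /* RTE[11]    -> addr[2]    */
    m.data = (uint32_t)(rte & 0x87ff);              /* RTE[15], RTE[10-0]       */
    return m;
}

/*
 * Recover the destination APIC ID from a non-remapped MSI. Returns false if
 * the message does not fit the requested format.
 */
static bool msi_dest_id(struct msi_msg m, enum msi_fmt fmt, uint32_t *dest)
{
    uint32_t lo = (uint32_t)m.addr, hi = (uint32_t)(m.addr >> 32);

    if ((lo >> 20) != 0xFEE)
        return false;
    *dest = (lo >> 12) & 0xff;              /* address bits 19-12 */

    switch (fmt) {
    case MSI_COMPAT:
        return hi == 0;
    case MSI_EXT15:                         /* addr[11-5] -> ID bits 14-8 */
        *dest |= ((lo >> 5) & 0x7f) << 8;
        return hi == 0 && !(lo & 0x10);     /* bit 4 set means Remappable */
    case MSI_KVM_X2APIC:                    /* AddrHi[31-8] -> ID bits 31-8 */
        *dest |= hi & 0xffffff00u;
        return !(hi & 0xff);
    case MSI_WIN_HIGH:                      /* AddrHi[23-0] -> ID bits 31-8 */
        *dest |= hi << 8;
        return !(hi >> 24);
    }
    return false;
}

/* Intel Remappable Format: the IRTE index, or -1 if address bit 4 is clear. */
static int32_t intel_irte_index(struct msi_msg m)
{
    uint32_t lo = (uint32_t)m.addr;
    uint32_t handle;

    if ((lo >> 20) != 0xFEE || !(lo & 0x10))
        return -1;
    /* addr[19-5] -> handle[14-0], addr[2] -> handle[15] */
    handle = ((lo >> 5) & 0x7fff) | ((lo & 0x4) << 13);
    if (lo & 0x8)                           /* Subhandle Valid */
        handle += m.data & 0xffff;
    return (int32_t)handle;
}

/* x2APIC logical ID of one CPU: cluster in the top half, one mask bit below. */
static uint32_t x2apic_logical_id(uint32_t apic_id)
{
    return ((apic_id >> 4) << 16) | (1u << (apic_id & 0xf));
}

/* The worked example from the essay: CPUs 21, 23, 24 and 25 share cluster 1. */
static void x2apic_logical_example(void)
{
    assert((x2apic_logical_id(21) | x2apic_logical_id(23) |
            x2apic_logical_id(24) | x2apic_logical_id(25)) == 0x000103a0u);
}
```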