
And why don't you explain what exactly you think the nonsense is rather than violating the HN guidelines with a contentless RTFM message?

Your one link literally says the same thing I have said (a way to multiplex access to the bus). This is ALL about giving VMs direct access to hardware. It makes no sense to even discuss features like this otherwise. What do you think this is for, if not real hardware access? Giving VM hosts an easier time emulating Intel PRO/1000 ethernet cards?






This is a better description of what it's doing:

SR-IOV: https://cdrdv2-public.intel.com/321211/pci-sig-sr-iov-primer...

S-IOV: https://cdrdv2-public.intel.com/671403/intel-scalable-io-vir...

What they are doing is "technically" giving direct bus access, but the bus access they are giving is restricted: the VM's accesses are all tagged, and if a VM touches anything outside the bounds it is permitted (as defined by access controls configured on the hardware), it gets a fault instead of successfully reaching anything.
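
To make that concrete, here is a purely illustrative toy in Python (not any real API, just the shape of the idea): the host programs a table of address windows per requester tag, and any tagged access outside its windows faults instead of reaching the device.

    # Toy model only, not any real API: the host programs per-requester
    # windows at configuration time; a tagged access outside its windows
    # faults instead of reaching the device.

    class AccessFault(Exception):
        pass

    class ToyBus:
        def __init__(self):
            self.allowed = {}  # requester tag -> list of (base, size) windows

        def grant(self, tag, base, size):
            # Done by the host/hypervisor, never by the guest itself.
            self.allowed.setdefault(tag, []).append((base, size))

        def access(self, tag, addr):
            for base, size in self.allowed.get(tag, []):
                if base <= addr < base + size:
                    return f"register access at {addr:#x} goes through"
            raise AccessFault(f"requester {tag} touched {addr:#x} outside its windows")

    bus = ToyBus()
    bus.grant("vm1", base=0x1000, size=0x100)
    print(bus.access("vm1", 0x1010))   # permitted
    # bus.access("vm1", 0x2000)        # would raise AccessFault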

This is similar to how VT-d and other CPU virt extensions allow direct access to RAM but with permissioning and access control through the IOMMU.

And then the other major component of SR-IOV and S-IOV is that they virtualise the interface on the PCI-E hardware itself (called virtual functions) along with all of the associated context: the registers, the BARs, etc. This is akin to how VT-x and similar instructions virtualise the CPU (and its registers, etc). And notably these virtual functions can be restricted via access controls, quotas, etc in hardware.
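
If it helps, this is roughly what that looks like from the host side on Linux (a sketch only; it assumes root, an SR-IOV-capable NIC, and a placeholder PCI address "0000:03:00.0" you would substitute): you ask the physical function's driver to spawn VFs via sysfs, and each VF then appears as its own PCI function that can be bound to vfio-pci and handed to a guest.

    # Rough host-side sketch for Linux; assumes root and an SR-IOV-capable
    # device. "0000:03:00.0" is a placeholder address, substitute your own.
    from pathlib import Path

    pf = Path("/sys/bus/pci/devices/0000:03:00.0")  # the physical function (PF)

    total = int((pf / "sriov_totalvfs").read_text())
    print(f"device advertises up to {total} virtual functions")

    # Ask the PF driver to create 4 VFs (sriov_numvfs must be 0 beforehand).
    # Each VF then shows up as its own PCI function (virtfn0, virtfn1, ...)
    # which can be bound to vfio-pci and handed to a guest, while the PF
    # itself stays with the host.
    (pf / "sriov_numvfs").write_text("4")

    for vf in sorted(pf.glob("virtfn*")):
        print(vf.name, "->", vf.resolve().name)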

So your existing VT-x extension virtualises the CPU, your existing VT-d extension virtualises the IOMMU and RAM, your existing VT-c virtualises network interfaces (but not PCI-E in general). Now SR-IOV and S-IOV virtualise the PCI-E bus with access control over the lanes, and they virtualise the PCI-E device hardware itself and its functions/interface on the bus (akin to VT-x and VT-d).

Notably, S-IOV should be seen as an "SR-IOV 2.0" rather than an accompanying feature. It essentially moves the virtual-function-to-physical-function translation from the CPU or chipset hardware directly into the PCI-E device itself.


I do not understand where this confusion comes from.

> What they are doing is "technically" giving direct bus access, but the bus access they are giving is restricted: the VM's accesses are all tagged

This is exactly what I know and what I said in my original post: a way to identify which VM is accessing what. For... giving that VM access to the hardware.

> and if a VM touches anything outside the bounds it is permitted (as defined by access controls configured on the hardware), it gets a fault instead of successfully reaching anything.

Again, this is exactly what I said: you are now at the mercy of the hardware manufacturer as to whether there is any partitioning whatsoever. To think otherwise is wishful thinking, and I do not know where it comes from.

This is entirely the definition of giving the VM direct access to the hardware. There is no software-controlled emulation whatsoever going on, so you explicitly lose containment and increase your attack surface.

For everything except the simplest of ethernet cards, your hardware is likely implementing this multiplexing in closed source firmware done by hardware engineers. Very likely the worst type of code ever written security-wise.

> This is similar to how VT-d and other CPU virt extensions allow direct access to RAM but with permissioning and access control through the IOMMU.

Not at all. Usually IOMMU is for constraining hardware that already has direct access to the RAM in the first place.

> And then the other major component of SR-IOV and S-IOV is that they virtualise the interface on the PCI-E hardware itself (called virtual functions)

Is this the source of the confusion? That because it is called "virtual" you think this is virtualized somehow? This is the reason I call it partitioning: it is much closer to what it actually is (from a hardware point of view).

> your existing VT-x extension virtualises the CPU, your existing VT-d extension virtualises the IOMMU and RAM, your existing VT-c virtualises network interfaces (but not PCI-E in general)

This is meaningless because it mixes and matches everything. What does it mean to "virtualize the RAM"? RAM is already virtualized by the normal MMU, no VT-d needed at all. Hardware is the thing that may need its RAM accesses virtualized too, so that its idea of memory matches that of the VM directly accessing the hardware (instead of going through a software emulation layer), and that is what benefits from an IOMMU (but does not generally require it, see GART and VT-c).

But the entire point of this is again to give the VM direct access to hardware! What is it exactly that you want to refute from this?


> This is exactly what I know and what I said in my original post: a way to identify which VM is accessing what. For... giving that VM access to the hardware.

Yes, but the whole point is that it's moving the isolation of the VM's access from software to hardware. Yes, you are giving direct access to a subset of the hardware, but that subset is configured from outside the VM to restrict what the VM can touch.

> Again, this is exactly what I said: you are now at the mercy of the hardware manufacturer as to whether there is any partitioning whatsoever. To think otherwise is wishful thinking, and I do not know where it comes from.

That's not actually true to my knowledge. S-IOV and SR-IOV require hardware support. Sure, the manufacturer can do a shit job of implementing it, but both S-IOV and SR-IOV require partitioning. And yes, if you are granting your VMs S-IOV or SR-IOV access to hardware, you are at minimum implicitly trusting that the hardware manufacturer implemented the spec correctly.

> There is no software-controlled emulation whatsoever going on, so you explicitly lose containment and increase your attack surface.

This is true, but the same is true of VT-x, VT-d, etc (i.e. the commonplace virtualisation extensions). It is no less true with S-IOV or SR-IOV, other than them being newer and less "battletested". If you use virtualisation extensions you are no longer doing pure software virtualisation anyway.

> For everything except the simplest of ethernet cards, your hardware is likely implementing this multiplexing in closed source firmware done by hardware engineers. Very likely the worst type of code ever written security-wise.

The exact same applies to the microcode and internal firmware on modern CPUs and the associated chipset.

> Not at all. Usually IOMMU is for constraining hardware that already has direct access to the RAM in the first place.

Yes. And VT-d extends this for VMs by introducing hardware level IO, interrupt, and DMA remapping so that the host doesn't need to do software level remapping instead.
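
On Linux you can actually see the isolation boundaries the IOMMU gives you in sysfs. Rough sketch below (assumes VT-d/AMD-Vi is enabled and the standard /sys/kernel/iommu_groups layout):

    # Rough sketch: list the IOMMU groups the kernel exposes. Devices in the
    # same group cannot be isolated from each other, so a whole group is the
    # smallest unit you can safely hand to a guest.
    from pathlib import Path

    groups = Path("/sys/kernel/iommu_groups")
    if not groups.exists():
        print("no IOMMU groups; is VT-d/AMD-Vi enabled in firmware and kernel?")
    else:
        for group in sorted(groups.iterdir(), key=lambda p: int(p.name)):
            devices = [d.name for d in (group / "devices").iterdir()]
            print(f"group {group.name}: {', '.join(devices)}")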

> Is this the source of the confusion? That because it is called "virtual" you think this is virtualized somehow? This is the reason I call it partitioning: it is much closer to what it actually is (from a hardware point of view).

I call it virtualisation because it is virtualisation. With SR-IOV it is architecturally similar to partitioning with access controls, but that is still virtualisation; it just prevents nesting. With S-IOV, however, it is full on-hardware virtualisation and it supports nesting virtual devices.

> What does it mean to "virtualize the RAM"? RAM is already virtualized by the normal MMU, no VT-d needed at all. Hardware is the thing that may need its RAM accesses virtualized too, so that its idea of memory matches that of the VM directly accessing the hardware (instead of going through a software emulation layer), and that is what benefits from an IOMMU (but does not generally require it, see GART and VT-c).

Yes, I was playing fast and loose with the terminology. Yes, RAM is already virtualised (to a certain degree), but VT-d extends that completely and allows arbitrary nesting. And yes, VT-d is not required for virtualisation, but it is important in accelerating virtualisation by moving it from software virt to hardware virt.

> But the entire point of this is again to give the VM direct access to hardware! What is it exactly that you want to refute from this?

I think the disconnect here is that I (and I assume others) are operating under the assumption that giving the VM access to an access controlled and permissioned subset of the hardware through hardware virtualisation extensions/frameworks wouldn't fall under "giving the VM direct access to the hardware" any more than CPU virtualisation extensions do (which are essentially always enabled).

----------

Edit: Oh, I should also add that another commenter was in our comment chain. I just realised they were the one arguing that SR-IOV/S-IOV wouldn't put you at the mercy of the HW manufacturer to implement the isolation and virtualisation functionality correctly. That may help clear up some misunderstanding, because I 100% get that you are reliant on the HW manufacturer implementing the feature correctly for it to be secure.


> Yes, but the whole point is that it's moving the isolation of the VM's access from software to hardware. Yes, you are giving direct access to a subset of the hardware, but that subset is configured from outside the VM to restrict what the VM can touch.

But who is actually gating access to this "subset" (which normally isn't a subset of functionality anyway)? Answer: the hardware.

Before, it was software that was emulating the hardware and implementing whatever checks you wanted. Now, the VM OS is directly accessing the hardware, banging its registers, and you literally depend on the hardware to enforce any kind of isolation between accesses from the VMs.

> This is true, but the same is true of VT-x, VT-d, etc (i.e. the commonplace virtualisation extensions). It is no less true with S-IOV or SR-IOV, other than them being newer and less "battletested". If you use virtualisation extensions you are no longer doing pure software virtualisation anyway.

No, this is not the correct analogy. Even without VT-x, CPUs since the 386 era have already been designed to execute untrusted code. Adding VT-x changes the picture a bit, but it is an almost irrelevant change in the overall architecture, since the CPU is in any case directly executing VM guest code (see early virtualizers, which did plenty well without VT-x).

Here, you are allowing untrusted code direct access to hardware that was never even designed with the idea of being accessed by untrusted software, or even by user-level code to begin with for most of it (with very few exceptions, such as GPUs).

The difference in the size of the security boundary is gigantic, even hard to visualize.

The correct analogy would be if you were switching from, say, a JavaScript VM generating native CPU code to directly executing native CPU code downloaded from the internet. On an 8086-level CPU with a haphazardly added MMU on top of it. Sure, it works in theory. In practice, it will make everyone shiver (and with reason). That is the proper analogy.

The discussion about SRIOV is a red herring because these technologies are about allowing this direct hardware access. It is not that SRIOV is a firewall between the hardware and the VM (or whatever it is that you envision). They are technologies entirely designed to facilitate this direct hardware access, not prevent or constrain it in any way.


> Before, it was software that was emulating the hardware and implementing whatever checks you wanted. Now, the VM OS is directly accessing the hardware, banging its registers, and you literally depend on the hardware to enforce any kind of isolation between accesses from the VMs.

This hasn't been true for decades. CPUs have been leaving virtualisation almost entirely to the hardware. For the most part, all the software was doing was configuring the hardware and injecting a bit of glue here and there, unless you were full-on emulating another architecture.

> Here, you are allowing untrusted code direct access to hardware that was never even designed with the idea of being accessed by untrusted software, or even by user-level code to begin with for most of it (with very few exceptions, such as GPUs).

If the device supports SR-IOV or S-IOV then they had to engineer the product to meet the spec. It's not like this is just a switch being enabled on old hardware. Every device on the stack has to support the standard and therefore is designed to at least attempt to respect the security boundaries those specs impose.
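
You can even check this yourself: SR-IOV is a discoverable PCIe extended capability (capability ID 0x0010), so a device either advertises it or it doesn't. Rough sketch (the device address is a placeholder, and reading past the first 256 bytes of config space through sysfs generally needs root):

    # Rough sketch: walk the PCIe extended capability list in the sysfs copy
    # of config space and look for the SR-IOV capability (ID 0x0010).
    # Placeholder device address; substitute your own.
    import struct
    from pathlib import Path

    cfg = Path("/sys/bus/pci/devices/0000:03:00.0/config").read_bytes()

    found = False
    offset = 0x100  # extended capabilities start at offset 0x100
    for _ in range(64):  # the list is short; bound the walk defensively
        if offset < 0x100 or offset + 4 > len(cfg):
            break
        (header,) = struct.unpack_from("<I", cfg, offset)
        if header in (0, 0xFFFFFFFF):
            break
        if header & 0xFFFF == 0x0010:  # SR-IOV extended capability ID
            print(f"SR-IOV capability advertised at offset {offset:#x}")
            found = True
            break
        offset = header >> 20  # bits [31:20] point to the next capability
    if not found:
        print("no SR-IOV capability visible (device lacks it, or re-run as root)")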

> The correct analogy would be if you were switching from, say, a JavaScript VM generating native CPU code to directly executing native CPU code downloaded from the internet.

This is exactly what every modern browser does. Chrome's V8 JS engine parses JS and generates V8 bytecode. Then at runtime V8 JIT-compiles that bytecode into native machine code and executes that native code on the hardware. That's not interpreting the JS; it's actually compiling the JS into native code running on the CPU (using prediction to make sure the compilation is done before the code paths are expected to execute).

> On an 8086-level CPU with a haphazardly added MMU on top of it. Sure, it works in theory. In practice, it will make everyone shiver (and with reason). That is the proper analogy.

This also isn't true. Peer-to-peer DMA support has been commonplace in consumer PCI-E devices (mainly NVMe, network HBAs, and GPUs) for years now, and has been available in datacenter hardware for a decade at least.

> On an 8086-level CPU with a haphazardly added MMU on top of it.

Also minor nit, but the 80286 (the 3rd gen of 8086 CPUs, released less than 4 years after the original 8086) had an integrated MMU with proper segmentation support. Additionally, MMUs long predate the 8086; it just didn't include an integrated one initially because it didn't need to for the market segment it was targeting.

> The discussion about SRIOV is a red herring because these technologies are about allowing this direct hardware access. It is not that SRIOV is a firewall between the hardware and the VM (or whatever it is that you envision). They are technologies entirely designed to facilitate this direct hardware access, not prevent or constrain it in any way.

Again this is just not true. They provide a framework for segmentation of hardware and enforcing isolation of those segments. That is absolutely intended for "preventing and constraining" access to hardware outside of what the host configures.

------

If you can provide some citations of how SR-IOV or S-IOV doesn't do what it claims to, I'm happy to continue this conversation.


> CPUs have been leaving virtualisation almost entirely to the hardware. For the most part, all the software was doing was configuring the hardware and injecting a bit of glue here and there, unless you were full-on emulating another architecture.

That's entirely wrong. VMs do contain a lot of emulation, and that is still _the primary way for guest OSes_ to access real hardware. "CPU-assisted virtualization" changes almost _nothing_ in the grand scheme. CPUs were executing guest code before VT-x and are executing guest code afterwards. Your pure software virtualizer contains an entire x86 PC emulator; your "hardware based virtualizer" contains an entire x86 PC emulator. And if anything a much more complex one than those of non-VT-x virtualizers, because of all the extra virtual hardware they offer guests these days. Did people already forget so much about Popek and Goldberg that they have come to believe some magical properties about "CPU virtualization"?

(before anyone nitpicks, x86 sans VT-x _is_ Popek virtualizable but only for usermode code; non-user mode is a bit more complicated to manage, but still falls short compared to what VMs do in terms of hardware emulation these days).

Even if you are assuming a state-of-the-art virtualizer with hyperdrivers and hyperbuses and whatever... it's still literally the same concept. The VM host is _emulating_ the hardware shown to the guest OS. It just emulates hardware that is much simpler and much more efficient to emulate, because it was designed with VMs in mind. And, guess what, you can also apply the same idea to a purely software-based virtualizer to simplify it in the same way, too! (What laypeople call paravirtualization.)

Obviously if you assume a virtualizer doing passthrough of any kind.... then the VM is directly accessing the hardware... but that is my point! It is now directly accessing hardware that it could not access before.

As a summary: "CPU virtualization" is not even remotely in the same order of magnitude of headache-inducing paranoia as allowing direct access to hardware is. The CPU running the VM guest's code is kind of an indisputable fact of doing virtualization at all, hardware-based or software-based. The VM guest's code directly accessing the host hardware... is simply not.

> If the device supports SR-IOV or S-IOV then they had to engineer the product to meet the spec.

Are you claiming here that A) the spec defines how hardware should internally multiplex itself? (not true) and B) that the spec says hardware must be secure, therefore all hardware is secure? (not true, and a very strange argument to make anyway).

In any case, happy to see you are now accepting my thesis that this is about giving VMs direct access to hardware, and that therefore it is now up to the hardware to really enforce this isolation. Or not.

What else is there left to discuss?

> This is exactly what every modern browser does

You quoted my sentence in full, so I know you read it, yet you totally miss the point again. I said: a JavaScript VM generating native CPU code _vs_ directly executing native CPU code downloaded from the internet.

Your argument amounts to: "This is what V8 does, which is to generate native CPU code". I know. That's why I put it as the baseline. You have made no counterargument whatsoever.

> This also isn't true. Peer-to-peer DMA support has been commonplace in consumer PCI-E devices (mainly NVMe, network HBAs, and GPUs) for years now, and has been available in datacenter hardware for a decade at least.

I did admit that there is some hardware that is already used to interfacing with more or less user-level code (like GPUs), but this is the _exception_ rather than the rule. And even if it is true, it still doesn't contradict my argument, which is: this is still about VMs having direct access to hardware that they didn't have before! No matter how you frame it, it increases the attack surface by an order of magnitude. Even for GPUs, your GPU now also requires protection from the guest driver, where before it was the same as the host's.

("peer to peer" DMA commonplace in consumer hardware??? I don't know what you're talking about. DirectStorage developers would like a word with you...)

> Also minor nit but the 80286 (the 3rd gen of 8086 CPUs, released less than 4 years after the original 8086) had an integrated MMU with proper segmentation support.

Which is exactly why I mentioned the 8086, because it has no MMU and no protected mode... so I really don't see what your argument is here.

> Again this is just not true. They provide a framework for segmentation of hardware and enforcing isolation of those segments. That is absolutely intended for "preventing and constraining" access to hardware outside of what the host configures.

This is absolutely ridiculous. Do you think that guest VMs can communicate directly with hardware and that SR-IOV is about "preventing and constraining" it?

What virtualization actually is: https://github.com/tpn/pdfs/blob/master/A%20Comparison%20of%....


> And why don't you explain what exactly you think the nonsense is rather than violating the HN guidelines with a contentless RTFM message?

Violate..? Relax, dude.

You only had to read the very first sentence, but let me paraphrase:

> In virtualization, single root input/output virtualization (SR-IOV) is a specification that allows the isolation of PCI Express resources for manageability and performance reasons.


This does not imply what you think it implies. SR-IOV is about multiplexing the _bus_ itself, and offering a way for devices to recognize which VM they are talking to. It cannot even define what devices do with this, nor how they do it, because _devices are myriad_.

Counterexample #1: an SRIOV ethernet card that still allows multiple domains (partitions, virtual functions, whatever) to access the same PHY (aka ethernet port). Who is doing the "bridging" here? The PCIe bus? How do you think that even remotely works? Explain it to me like I'm a 5-year-old, please.

Counterexample #2: a GPU with SRIOV. Each domain can still access a portion of the GPU's VRAM. How do you think that works, if it is not the GPU itself doing the multiplexing? What do you think a PCIe standard even _has to do_ with this? How could it?

The GPU is not necessarily even exposing its entire VRAM through PCIe at all. At most, it is exposing the registers that allow you to tell how much VRAM to give each partition through a PCIe BAR. And you can tag the one for each partition with a different VF in the same way you could tag them with a different base address or literally ANYTHING.

I do not understand why you (and the sibling guy) seem to think a standard for a _bus_ is even relevant to counter the argument that all of this is for VMs to get direct access to the hardware. You quote a communications standard for this hardware to be accessed by a host with multiple VMs running concurrently. This is, if anything, _even more evidence_ that this is for VMs directly accessing hardware.

Again: I claim that what you're doing here is directly connecting your VMs to the hardware, where before they were connected only through a software emulation layer. You claim that this is not true, and that I couldn't be more wrong, because there is this magical interface that makes the hardware appear as if it actually were several instances. You totally miss the point: if anything, this makes your VMs _easier_ to directly connect to hardware, not harder.

In fact, the very second sentence:

> SR-IOV is commonly used in conjunction with an SR-IOV enabled hypervisor to provide virtual machines direct hardware access to network resources hence increasing its performance.

And how is SR-IOV hardware going to magically appear as several interfaces? We leave that as an exercise for the reader, because you will not like the answer: closed-source firmware, likely an order of magnitude less reviewed than even the worst VM hardware emulator you can think of.

Much more efficient, though.

I am sorry, but you are not making the right argument.



