I wish that there was a useful “freeze” intrinsic exposed, even if only for primitive types and not for generic user types, where the values of the frozen region become unspecified instead of undefined. I believe llvm has one now?
Iirc the work on safe transmute also involves a sort of “any bit pattern” trait?
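For context, a minimal sketch of what such an "any bit pattern" marker trait looks like (the trait name, impls, and helper here are illustrative, in the spirit of the safe-transmute work and of crates like bytemuck, not an actual std API):

```rust
// The trait is `unsafe` to implement: the implementor promises that every
// possible bit pattern is a valid value of the type. Types with invalid
// patterns (bool, char, references, enums, padded structs) must never impl it.
unsafe trait AnyBitPattern: Copy {}

unsafe impl AnyBitPattern for u8 {}
unsafe impl AnyBitPattern for u32 {}
unsafe impl AnyBitPattern for [u8; 16] {}

// With that promise, reading a value out of an arbitrary *initialized* byte
// buffer is sound, no matter what the bytes are:
fn from_bytes<T: AnyBitPattern>(bytes: &[u8]) -> T {
    assert!(bytes.len() >= std::mem::size_of::<T>());
    // SAFETY: T admits any bit pattern and the source bytes are initialized.
    unsafe { std::ptr::read_unaligned(bytes.as_ptr() as *const T) }
}
```

Note this only removes the "invalid value" hazard for initialized-but-arbitrary bytes; it does not by itself make reading *uninitialized* memory defined, which is what `freeze` would address.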
I’ve also dealt with pain implementing similar interfaces in Rust, and it really feels like you end up jumping through a ton of hoops (and in some of my cases, hurting performance) all to satisfy the abstract machine, at no benefit to programmer or application. It’s really a case where the abstract machine cart is leading the horse
I agree, I'd go further and say I wonder why primitive types aren't "frozen" by default.
I totally understand not wanting to promise things get zeroed, but I don't really understand why full UB, instead of just "they have whatever value is initially in memory / the register / the compiler chose" is so much better.
Has anyone ever done a performance comparison between UB and freezing I wonder? I can't find one.
That assumes the compiler reserves one continuous place for the value, which isn’t always true (hardly ever true in the case of registers). If the compiler is required to make all code paths result in the same uninitialized value, that can limit code generation options, which might reduce performance (and performance is the whole reason to use uninitialized values!).
Also, an uninitialized value might be in a memory page that gets reclaimed and then mapped in again, in which case (because it hasn’t been written to) the OS doesn’t guarantee it will have the same value the second time. There was recently a bug discovered in one of the few algorithms that uses uninitialized values, because of this effect.
> same uninitialized value, that can limit code generation options
it pretty much requires the compiler to initialize all values when they first "appear"
except that this is impossible and outright hazardous if pointers are involved
But doable for a small subset like e.g.
- stack values (but would inhibit optimizations, potentially pretty badly)
- some allocations e.g. I/O buffers, (except C alloc has no idea that you are allocating an I/O buffer)
But, I wonder how much it would reduce performance, if we only have to pick a value the first time the memory is read?
I would imagine there aren't that many cases where we are reading uninitialised memory and counting on the compiler not having to commit to a value for that read. It would happen when reading in 8-byte blocks for alignment, but does it happen that much elsewhere?
if you pick a value you have to store it, and if you have to store it it might spill into memory when register allocation fails.
Moving from register-only to stack/heap usage easily slows down your program by an order of magnitude or two. If this is in a hot path, which I'd argue it is since using uninitialized values seems senseless otherwise, it might have a big impact.
The only way to really know is to test this. Compilers and their optimizations depend on a lot of things. Even the order and layout of instructions can matter due to the instruction cache. You can always go and make the guarantee later on, but undoing it would be impossible.
Uninitialized memory being UB isn’t an insane default imo (although it makes masked simd hard), nor is most UB. But the lack of escape hatches can be frustrating
> Anything being UB is insane to me...
only until you get deeper into how the hardware actually works (and the OS to some degree)
and realize sometimes the UB is even in the hardware registers
and that the same logical memory address might have 5 different values in hardware at the same time without you having a bug
and other fun like that
so the insanity is reality not the compiler
(though IMHO in C and especially C++ the insanity is how easily you can accidentally run into UB without doing any fancy trickery, just in dumb, non-hot, everyday code)
I do not find it so easy to accidentally run into UB in C if you follow some basic rules. The exceptions are null pointer dereferences and out-of-bounds accesses for arrays, both can be turned into run-time traps. The rules include no pointer arithmetic, no type casts, and having some ownership strategy. None of those is difficult to implement and where exceptions are made, one should treat it carefully similar to using "unsafe" in Rust.
Nah it makes some sense for portability between architectures. Or at least it did back when C was invented and there were some wild architectures out there.
And it definitely does allow some optimisation. But probably nothing significant on modern out-of-order machines.
> there were some wild architectures out there.
what is out there is still pretty wild
just slightly less
> probably nothing significant on modern out-of-order machines.
having no UB at all will kill a lot of optimizations still relevant today (and won't match anymore to hardware as some UB is on hardware level)
out of order machines aren't magically fixing that, just makes some less optimized code work better, but not all
and a lot of low energy/cheap hardware does have no or very very limited out of order capabilities so it's still very relevant and likely will stay very relevant for a very long time
> why primitive types aren't "frozen" by default.
it kills _a lot_ of optimizations, leading to problematic perf. degradation
TL;DR: always freezing I/O buffers => no issues (in general); freezing all primitives => perf problem
(at least in practice; in theory many optimizations might still be possible, but at a way higher analysis compute cost (like exponentially higher) and potentially needing more high-level information (so bad luck, C)).
still, for I/O buffers of primitive enough types `frozen` is basically always just fine (I also vaguely remember some discussion about people more involved in Rust core development probably wanting to add some functionality like that, so it might still happen).
To illustrate why frozen I/O buffers are just fine: some systems already always (zero- or rand-)initialize all their I/O buffers anyway. And a lot of systems reuse I/O buffers: they init them once on startup and then just continuously re-use them. And some OS setups do (zero- or rand-)initialize all OS memory allocations (though that applies to the OS granting more memory to your in-process memory allocator, not to every lang-specific alloc call, and it doesn't remove UB for stack or register values at all (nor for various situations related to heap values either)).
So doing much more "costly" things than just freezing them is pretty much normal for I/O buffers.
Though as mentioned, sometimes things are not frozen at a hardware level (things like every read potentially returning a different value). It's a bit of a niche issue you probably won't run into wrt. I/O buffers and I'm not sure how common it is on modern hardware, but still a thing.
But freezing primitives which majorly affect control flow both makes some optimizations impossible and makes others much harder to compute/check/find, potentially to a point where they're not viable anymore.
This can involve (as in, freezing can prevent) some forms of dead code elimination, some forms of inlining+unrolling+const propagation, etc. This is mostly (but not exclusively) about micro optimizations, but micro optimizations which sum up and accumulate, leading to (potentially, but not always) major performance regressions. Frozen also has some subtle interactions with floats and their different NaN values (can be a problem especially wrt. signaling NaNs).
Though I'm wondering if a different C/C++ where arrays of primitives are always treated as frozen (and no signaling NaNs) would have worked just fine without any noticeable perf. drawback. And if so, whether Rust should adopt this...
>I’ve also dealt with pain implementing similar interfaces in Rust, and it really feels like you end up jumping through a ton of hoops (and in some of my cases, hurting performance) all to satisfy the abstract machine, at no benefit to programmer or application.
I've implemented what TFA calls the "double cursor" design for buffers at $dayjob, ie an underlying (ref-counted) [MaybeUninit<u8>] with two indices to track the filled, initialized and unfilled regions, plus API to split the buffer into two non-overlapping handles, etc. It certainly required wrangling with UnsafeCell in non-trivial ways to make miri happy, but it doesn't have any less performance than the equivalent C code that just dealt with uint8_t* would've had.
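A minimal sketch of that "double cursor" shape (simplified: plain `Box` instead of ref-counting, `u8` only, and the names are illustrative, not the $dayjob code):

```rust
use std::mem::MaybeUninit;

// Regions tracked by the two cursors:
//   [0..filled)        bytes holding meaningful data
//   [filled..init)     bytes initialized at some point but not currently filled
//   [init..capacity)   never-written, uninitialized bytes
struct DoubleCursorBuf {
    storage: Box<[MaybeUninit<u8>]>,
    filled: usize,
    init: usize,
}

impl DoubleCursorBuf {
    fn new(capacity: usize) -> Self {
        Self {
            storage: vec![MaybeUninit::uninit(); capacity].into_boxed_slice(),
            filled: 0,
            init: 0,
        }
    }

    // Hand out the unfilled region for a reader to write into.
    fn unfilled(&mut self) -> &mut [MaybeUninit<u8>] {
        &mut self.storage[self.filled..]
    }

    // SAFETY contract: the caller wrote `n` bytes at the start of `unfilled()`.
    unsafe fn advance(&mut self, n: usize) {
        self.filled += n;
        self.init = self.init.max(self.filled);
    }

    fn filled(&self) -> &[u8] {
        // SAFETY: [0..filled) was written, per `advance`'s contract.
        unsafe { &*(&self.storage[..self.filled] as *const [MaybeUninit<u8>] as *const [u8]) }
    }
}
```

The real thing additionally needs the split-into-two-handles API and the ref-counted storage mentioned above, which is where the UnsafeCell wrangling comes in.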
It is as easy as it looks to add `freeze`, that is, value-based `freeze`. Reference-based `freeze`, while seemingly reasonable, is broken because of MADV_FREE.
Some people simply aren't comfortable with it.
Currently sound Rust code does not depend on the value of uninitialized memory whatsoever. Adding `freeze` means that it can.
A vulnerability similar to heartbleed to expose secrets from free'd memory is impossible in sound Rust code without `freeze`, but theoretically possible with `freeze`.
Whether you consider this a realistic issue or not likely determines your stance on `freeze`. I personally don't think it's a big deal and have several algorithms which are fundamentally being slowed down by the lack of `freeze`, so I'd love it if we added it.
also, to be realistic, in a lot of practical situations I/O buffers are reused, so at least for the I/O-buffer use case it can be very viable (perf-wise, in most use cases) to just zero- or rand-initialize the buffer once on alloc and then treat it as frozen in all repeated usages (though it does open the issue of a bug now potentially leaking previous content of the I/O buffer).
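A sketch of that init-once-then-reuse pattern (assuming the buffer is plain bytes; the type and method names are mine):

```rust
use std::io::Read;

// Pay the zeroing cost exactly once at construction, then reuse the buffer
// for every read; no uninitialized bytes are ever observable.
struct ReusableBuf {
    buf: Vec<u8>,
}

impl ReusableBuf {
    fn new(cap: usize) -> Self {
        Self { buf: vec![0u8; cap] } // zeroed once, up front
    }

    fn fill_from(&mut self, mut src: impl Read) -> std::io::Result<&[u8]> {
        let n = src.read(&mut self.buf)?; // safe: &mut [u8] is fully initialized
        // Note: buf[n..] may still hold bytes from a *previous* read, which is
        // exactly the "leaking previous content" caveat mentioned above.
        Ok(&self.buf[..n])
    }
}
```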
Writing into uninit'd buffers was one of the pain points of Rust for the creator of the new open source "edit" program for Windows[1]. I wonder what he thinks of this article.
> Another thing is the difficulty of using uninitialized data in Rust. I do understand that this involves an attribute in clang which can then perform quite drastic optimizations based on it, but this makes my life as a programmer kind of difficult at times. When it comes to `MaybeUninit`, or the previous `mem::uninit()`, I feel like the complexity of compiler engineering is leaking into the programming language itself and I'd like to be shielded from that if possible. At the end of the day, what I'd love to do is declare an array in Rust, assign it no value, `read()` into it, and magically reading from said array is safe. That's roughly how it works in C, and I know that it's also UB there if you do it wrong, but one thing is different: It doesn't really ever occupy my mind as a problem. In Rust it does.
[https://news.ycombinator.com/item?id=44036021]
Abstracting away the `assume_init` is a great idea! I think I could use something like that for the editor. The only concern I have is that the `read` function is templated on the parameter type. I'd ideally _really_ prefer it if I didn't need two copies of the same function to switch over `[u8]` and `[MaybeUninit<u8>]` due to different return types. [^1] I guess the approach could be tuned to avoid this?
Personally, I also like the simpler approach overall, compared to the `BorrowedBuf` trait, for the same reasons outlined in the article.
While this possibly solves parts of pain points that I had, what I meant to write is that in an ideal world I could write Rust while mostly not thinking about this issue much, if at all. Even with this approach, I'd still need to decide whether my API needs to take a `[u8]` or a `Buffer`, just in the mere off-chance that a caller may want to pass an uninitialized array further up in the call chain. This then requires making the call path generic for the buffer parameter which may end up duplicating any of the functions along the path, even though that's not really my intention by marking it as `Buffer`.
I think if there was a way to modify Rust so we can boldly state in writing "You may cast a `[MaybeUninit<T>]` into a `[T]` and pass it into a call _if_ you're absolutely certain that nothing reads from the slice", it would already go a long way. It may not make this more comfortable yet, but it would definitely take off a large part of my worries when writing such unsafe casts. That's basically what I meant with "occupy my mind": It's not that I wouldn't think about it at all, rather it just wouldn't be a larger concern for me anymore, for code where I know for sure that this requirement is fulfilled (i.e. similar to how I know it when writing equivalent C code).
[^1]: This is of course not a problem for a simple `read` syscall, but may be an issue for more complex functions, e.g. the UTF8 <> UTF16 converter API I suggested elsewhere in this thread, particularly if it's accelerated, the way simdutf is.
The basic problem with uninitialized buffers is that they effectively require write-only references to exist, and Rust's type system doesn't have (and doesn't easily support) write-only references, only read-only and read-write. MaybeUninit is a partial solution to the problem, but since it's a library solution and not a language solution, it suffers from a lack of integration with the language, e.g., getting MaybeUninit fields from a MaybeUninit struct is challenging.
And the most aggravating part of all of this is that the most common use case for uninitialized memory (the scenario being talked about both in the article here and the discussion you quote) is actually pretty easy to have a reasonable, safe abstraction for, so the fact that the current options require both use of unsafe code and also potentially faulty duplication of value calculations doesn't make for a fun experience. (Also, the I/O traits predate MaybeUninit, which means the most common place to want to work with uninitialized memory is one where you can't do it properly.)
> That's roughly how it works in C, and I know that it's also UB there if you do it wrong, but one thing is different: It doesn't really ever occupy my mind as a problem. In Rust it does.
UB doesn’t occupy the author’s mind when writing C, when it really should. This kind of lazy attitude to memory safety is precisely why so much C code is notoriously riddled with memory bugs and security vulnerabilities.
There is an important difference for this case though. In C it's fine to have pointers into uninitialized memory as long as you don't read through them until after initializing. You can write through those pointers the same way you always do. In Rust it's UB as soon as you "produce" an invalid value, which includes references to uninitialized memory. Everything uses references in Rust, but when dealing with uninitialized memory you have to scrupulously avoid them, and instead write through raw pointers. This means you can't reuse any code that writes through &mut. Also, the rules change over time. At one point I had unsafe code that had a Vec of uninitialized elements, which was ok because I never produced a reference to any element until after I had written them (through raw pointers). But they later changed the Vec docs to say that's UB, I guess because they want to reserve the right to use references even if you never call a method that returns a reference.
This stopped being much of a problem when MaybeUninit was stabilized. Now you can stick to using &MaybeUninit<T> / &mut MaybeUninit<T> instead of needing to juggle *const T / *mut T and carefully track converting that to &T / &mut T only when it's known to be initialized, and you can't accidentally use a MaybeUninit<T> where you meant to use a T because the types are different.
It's not as painless as it could be though, because many of the MaybeUninit<T> -> T conversion fns are unstable. E.g. the code in TFA needs `&mut [MaybeUninit<T>] -> &mut [T]` but `[T]::assume_init_mut()` is unstable. But reimplementing them is just a matter of copying the libstd impl, which in turn is usually just a straightforward reinterpret-cast one-liner.
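For example, the `&mut [MaybeUninit<T>] -> &mut [T]` conversion really is a reinterpret-cast one-liner when copied locally (this mirrors the unstable libstd helper; the safety contract is the whole point):

```rust
use std::mem::MaybeUninit;

// Local stand-in for the still-unstable slice assume-init helper.
// SAFETY contract: every element of `s` must actually have been initialized.
unsafe fn slice_assume_init_mut<T>(s: &mut [MaybeUninit<T>]) -> &mut [T] {
    // MaybeUninit<T> has the same layout as T, so this cast is just a
    // reinterpretation of the same slice.
    unsafe { &mut *(s as *mut [MaybeUninit<T>] as *mut [T]) }
}
```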
I don’t get the difference. In both C and Rust you can have pointers to uninitialized memory. In both languages, you can’t use them except in very specific circumstances (which are AFAIK identical).
There are two actual differences in this regard: C pointers are more ergonomic than Rust pointers. And Rust has an additional feature called references, which enable a lot more aggressive compiler optimizations, but which have the restriction that you can’t have a reference to uninitialized memory.
That's right. Line 3 is undefined behaviour because you are creating mutable references to the uninit spare capacity of the vec. copy_to_slice only works for writing to initialized slices. The proper way for your example to mess with the uninitialized memory of a Vec would be to use only raw pointers, or to call the newly added Vec::spare_capacity_mut function, which returns a slice of MaybeUninit.
Yes, this is the case that I ran into as well. You have to zero memory before reading and/or have some crazy combination of tracking what’s uninitialized capacity or initialized len, I think the rust stdlib write trait for &mut Vec got butchered over this concern.
It’s strictly more complicated and slower than the obvious thing to do and only exists to satisfy the abstract machine.
No. The correct way to write that code is to use .spare_capacity_mut() to get a &mut [MaybeUninit<T>], then write your Ts into that using .write_copy_of_slice(), then .set_len(). And that will not be any slower (though obviously more complicated) than the original incorrect code.
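A sketch of that sequence using only stable APIs (a per-element `MaybeUninit::write` loop standing in for the still-unstable `write_copy_of_slice`):

```rust
// Append `src` to `v` through the spare (uninitialized) capacity, without
// ever creating a `&mut [u8]` to uninitialized bytes.
fn extend_via_spare(v: &mut Vec<u8>, src: &[u8]) {
    v.reserve(src.len());
    let spare = v.spare_capacity_mut(); // &mut [MaybeUninit<u8>]
    for (dst, &b) in spare.iter_mut().zip(src) {
        dst.write(b); // writes without reading the uninit byte
    }
    // SAFETY: the first src.len() spare slots were just initialized above.
    unsafe { v.set_len(v.len() + src.len()) };
}
```

Codegen-wise the write loop compiles down to a memcpy in practice, which is why this isn't any slower than the incorrect slice-based version.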
As I wrote in https://news.ycombinator.com/item?id=44048391 , you have to get used to copying the libstd impl when working with MaybeUninit. For my code I put a "TODO(rustup)" comment on such copies, to remind myself to revisit them every time I update the Rust version in toolchain.toml
I suspect that the main reason it doesn't really occupy the author's mind is that even though it's possible to misuse read(), it's really not that hard to actually use it safely.
It sounds like the more difficult problem here has to do with explaining to the compiler that read() is not being used unsafely.
This function now works with both initialized and uninitialized data in practice. It is also transparent over whether the output buffer is a `u8` (a byte buffer to write out into a `File`) or `u16` (a buffer for then using the UTF16). I've never had to think about whether this doesn't work (in this particular context; let's ignore any alignment concerns for writes into `out` in this example) and I don't recall running into any issues writing such code in a long long time.
If I write the equivalent code in Rust I may write
The problem is now obvious to me, but at least my intention is clear: "Come here! Give me your uninitialized arrays! I don't care!". But this is not the end of the problem, because writing this code is theoretically unsafe. If you have a `[u8]` slice for `out` you have to convert it to `[MaybeUninit<u8>]`, but then the function could theoretically write uninitialized data and that's UB isn't it? So now I have to think about this problem and write this instead:
...and that will also be unsafe, because now I have to convert my actual `[MaybeUninit<u8>]` buffer (for file writes) to `[u8]` for calls to this API.
Long story short, this is a problem that occupies my mind when writing in Rust, but not in C. That doesn't mean that C's many unsafeties don't worry me, it just means that this _particular_ problem type described above doesn't come up as an issue in C code that I write.
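The snippets didn't survive in this thread, but the two variants being contrasted are presumably along these lines (a reconstruction; the names and bodies are illustrative, not the author's code):

```rust
use std::mem::MaybeUninit;

// Variant A: takes uninitialized output. A caller holding a plain `&mut [u8]`
// has to cast it to `&mut [MaybeUninit<u8>]` first, and then worry about the
// callee theoretically writing uninit bytes through it.
fn convert_a(input: &[u8], out: &mut [MaybeUninit<u8>]) -> usize {
    let n = input.len().min(out.len());
    for (dst, &b) in out[..n].iter_mut().zip(input) {
        dst.write(b);
    }
    n
}

// Variant B: takes initialized output. A caller holding genuinely
// uninitialized memory has to unsafely assume-init it before calling.
fn convert_b(input: &[u8], out: &mut [u8]) -> usize {
    let n = input.len().min(out.len());
    out[..n].copy_from_slice(&input[..n]);
    n
}
```

Neither signature serves both kinds of caller without an unsafe cast on one side, which is exactly the bind being described.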
The reason this particular UB doesn't need mindspace for C programmers is because it's not even meaningful to do anything with the parts of the buffer beyond the written length.
Most other UB relates to data that you think you can do something with.
I think this solves his problem. He said he wants a read function that turns the unsafe buffer into a safe buffer, and this API does that.
IIRC it's not that hard to convince the compiler to give you a safe buffer from a MaybeUninit. However, this type has really lengthy docs and makes you question everything you do with it. Thinking through all this is painful, but it's not like you don't have to do it with C.
> without doing anything hugely inefficient, such as initializing the full buffer
Is this so inefficient? If your code is very sensitive to IO throughput, then it seems preferable to re-use buffers and pay the initialization once at startup.
Some years ago, I needed a buffer like this and one didn't exist, so I wrote one: https://crates.io/crates/fixed-buffer . I like that it's a plain struct with no type parameters.
It can be. If you have large buffers (tuned for throughput) that end up fulfilling lots of small requests for whatever reason, for example. And there's always the occasional article where someone rediscovers that replacing malloc + memset with calloc can yield massive performance savings, thanks to zeroing by the OS only occurring on first page fault (if it ever occurs), instead of an O(N) operation on the whole buffer up front.
Which, if in the wrong loop, can quickly balloon from O(N) to O(scary).
If I'm reading that log-log plot right, that looks like a significantly worse than 100x slowdown on 1GB data sets. Avoiding init isn't the only solution, of course, but it was a solution.
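For what it's worth, in Rust the calloc-style path is what `vec![0u8; n]` already takes: the all-zero literal routes the allocation through `alloc_zeroed` (via the `IsZero` specialization in the current std), so a large buffer can be backed by pre-zeroed pages rather than an explicit memset:

```rust
// Allocates via alloc_zeroed (calloc-like) rather than alloc + write loop,
// so for large n the kernel's lazily-zeroed pages do the work.
fn zeroed_buf(n: usize) -> Vec<u8> {
    vec![0u8; n]
}
```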
> then it seems preferable to re-use buffers
Buffer reuse may be an option, but in code with complicated buffer ownership (e.g. transfering between threads, with the thread of origination not necessarily sticking around, etc.), one of the sanest methods of re-use may be to return said buffer to the allocator, or even OS.
> and pay the initialization once at startup.
Possibly a great option for long lived processes, possibly a terrible one for something you spawn via xargs.
Yeah but the C toolchain is a huge pain, and makes things like cross-compiling and compiling to WASM harder. It's really nice if you can keep your program pure Rust.
It does, though even for a small segment you end up needing a lot of boilerplate - a dependency on the cc crate, a build.rs that invokes the cc crate, extern declarations to tell the Rust caller about the C function, potentially some bindgen if your small segment is not so small ... and you still end up having to do some amount of thunking between & / * / MaybeUninit because of that anyway. So if there is a "pure Rust" way to do it with `unsafe`, writing that is often easier. The pure Rust impl also has the advantage that you can validate it with Miri, unlike the C impl case because Miri cannot emulate arbitrary C code.
Pardon my ignorance, but I thought the whole point of Rust was to be a 'safe' modern alternative to C, so all new buffers would be zero'd at a negligible-these-days cost. Why is rust half-assing this?
Rust is being used and is designed to be able to be used everywhere from top of the line PCs, to servers to microcontrollers to virtual machines in the browser.
Not all tradeoffs are acceptable to everyone all of the time
It’s not. That is the case. But in cases where “negligible-these-days” isn’t quite negligible enough, this still matters and unsafe + MaybeUninit is the escape hatch to accomplish it.
Other languages can easily achieve this kind of safety. What makes Rust different is that it tries to provide this level of safety without doing extra work at runtime (because otherwise people will put it in the same pile as Java/C#, and continue using C/C++ for speed).
I’d love to know the actual performance impact of zeroing out memory. I bet the performance cost of zeroing out memory ahead of time is negligible in almost all programs.
YMMV on different operating systems. Of course this is a program only an idiot would write, but things like caches are often significantly bigger than the median case, especially on Linux where you know there is overcommit.
A non-idiot would use calloc(DEFINITELY_BIG_ENOUGH), and that will likely erase the difference because the impl will be able to rely on mmap(ANONYMOUS) creating zero pages for such a large allocation. A more realistic test would be to have a large number of small allocations that get calloc'd and then free'd repeatedly, because a) free'ing a small allocation will not free the underlying page and thus reallocation will not be able to rely on it already being zero, and b) zeroing small allocations doesn't amortize the cost of zeroing as well as zeroing large allocations does.
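A sketch of that "more realistic test" shape in Rust (allocation churn only; the actual timing harness is left out since results are machine- and allocator-dependent):

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

// Repeatedly allocate and free small zeroed blocks. Freed blocks get recycled
// by the allocator without returning pages to the OS, so the zeroing cannot
// lean on fresh kernel-zeroed pages and must be paid on each allocation.
fn churn_small_zeroed(iters: usize, size: usize) -> u8 {
    let layout = Layout::from_size_align(size, 8).unwrap();
    let mut acc = 0;
    for _ in 0..iters {
        unsafe {
            let p = alloc_zeroed(layout);
            assert!(!p.is_null());
            acc |= *p; // always reads 0: alloc_zeroed guarantees zeroed memory
            dealloc(p, layout);
        }
    }
    acc
}
```

Wrapping this (and a `alloc` + manual-zero variant) in a benchmark harness would measure exactly the amortization difference described above.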
Surely the point of Rust is 'safety at the price of performance' and if extra performance is required, don't use Rust. Don't bodge the language to accommodate!
The point of rust is explicitly not 'safety at the price of performance', quite the opposite. The whole point was to create language where safety doesn't cost performance like it does in most other languages.
`unsafe` is just a barrier between what compiler proves and what programmer proves.
People should really read more on safety semantics in Rust before making comments like this; it's quite annoying to bump into surface-level misunderstandings every time Rust is mentioned somewhere.
Related to unspecified vs undefined. I recall some C code was trying to be tricky and read from just allocated memory. Something like:
    int* ptr = malloc(size);
    if (ptr[offset] == 0)
    {
        /* ... code that assumes ptr[offset] stays 0 ... */
    }
The code was assuming that the value in an allocated buffer did not change.
However, it was pointed out in review that it could change with these steps:
1) The malloc allocates from a new memory page. This page is often not mapped to a physical page until written to.
2) The reads just return the default (often 0 value) as the page is not mapped.
3) Another allocation is made that is written to the same page. This maps the page to physical memory which then changes the value of the original allocation.
A read from an unmapped page producing a different value than reading from that same page after it's mapped is an OS bug (*). If this was an already allocated page that had something written to it, reading from it would page it back in and then produce the actual content. If this was a new page and the OS contract was to provide zeroed pages, both the read before it was mapped and the read after it was mapped would produce zero.
What could happen is that the UB in that code could result in it being compiled in a way that makes the comparison non-deterministic.
(*): ... or alternatively, we're not talking about regular userspace program but a higher privilege layer that is doing direct unpaged access, but I assume that's not the case since you're talking about malloc.
The closest thing to "conditionally returned to the kernel" is if the page had been given to madvise(MADV_FREE), but that would still not have the behavior they're talking about. Reading and writing would still produce the same content, either the original page content because the kernel hasn't released the page yet, or zero because the kernel has already released the page. Even if the order of operations is read -> kernel frees -> write, then that still doesn't match their story, because the read will produce the original page content, not zero.
That said, the code they're talking about is different from yours in that their code is specifically doing an out-of-bounds read. (They said "If you happen to allocate a string that's 128 bytes, and malloc happens to return an address to you that's 128 bytes away from the end of the page, you'll write the 128 bytes and the null terminator will be the first byte on the next page." So they're very clearly talking about the \0 being outside the allocation.)
So it is absolutely possible to have this setup: the string's allocation happens to be followed by a different allocation that is currently 0 -> the `data[size()] != '\0'` check is performed and succeeds -> `data` is returned to the caller -> whoever owns that following allocation writes a non-zero value to the first byte -> whoever called `c_str()` will now run off the end of the 128B string. This doesn't have anything to do with pages; it can happen within the bounds of a single page. It is also such an obvious out-of-bounds bug that it boggles my mind that it passed any sort of code review and required some sort of graybeard to point out.
I wish that there was a useful “freeze” intrinsic exposed, even if only for primitive types and not for generic user types, where the values of the frozen region become unspecified instead of undefined. I believe llvm has one now?
Iirc the work on safe transmute also involves a sort of “any bit pattern” trait?
I’ve also dealt with pain implementing similar interfaces in Rust, and it really feels like you end up jumping through a ton of hoops (and in some of my cases, hurting performance) all to satisfy the abstract machine, at no benefit to programmer or application. It’s really a case where the abstract machine cart is leading the horse
I agree, I'd go further and say I wonder why primitive types aren't "frozen" by default.
I totally understand not wanting to promise things get zeroed, but I don't really understand why full UB, instead of just "they have whatever value is initially in memory / the register / the compiler chose" is so much better.
Has anyone ever done a performance comparison between UB and freezing I wonder? I can't find one.
That assumes the compiler reserves one continuous place for the value, which isn’t always true (hardly ever true in the case of registers). If the compiler is required to make all code paths result in the same uninitialized value, that can limit code generation options, which might reduce performance (and performance is the whole reason to use uninitialized values!).
Also, an uninitialized value might be in a memory page that gets reclaimed and then mapped in again, in which case (because it hasn’t been written to) the OS doesn’t guarantee it will have the same value the second time. There was recently a bug discovered in one of the few algorithms that uses uninitialized values, because of this effect.
> same uninitialized value, that can limit code generation options
it pretty much requires the compiler to initialize all values when they first "appear"
except that this is impossible and outright hazardous if pointers are involved
But doable for a small subset like e.g.
- stack values (but would inhibit optimizations, potentially pretty badly)
- some allocations e.g. I/O buffers, (except C alloc has no idea that you are allocating an I/O buffer)
But, I wonder how much it would reduce performance, if we only have to pick a value the first time the memory is read?
I would imagine there isn't that many cases where we are reading uninitalised memory and counting on that reading not saving a value. It would happen when reading in 8-byte blocks for alignment, but does it happen that much elsewhere?
if you pick a value you have to store it, and if you have to store it it might spill into memory when register allocation fails. Moving from register-only to stack/heap usage easily slows down your program by an order of magnitude or two. If this is in a hot path, which I'd argue it is since using uninitialized values seems senseless otherwise, it might have a big impact.
The only way to really know is to test this. Compilers and their optimizations depend on a lot of things. Even the order and layout of instructions can matter due to the instruction cache. You can always go and make the guarantee later on, but undoing it would be impossible.
Uninitialized memory being UB isn’t an insane default imo (although it makes masked simd hard), nor is most UB. But the lack of escape hatches can be frustrating
Anything being UB is insane to me...
only until you get deeper into how the hardware actually work (and OS to some degree)
and realize sometimes the UB is even in the hardware registers
and that the same logical memory address might have 5 different values in hardware at the same time without you having a bug
and other fun like that
so the insanity is reality not the compiler
(through IMHO in C and especially C++ the insanity is how easily you might accidentally run into UB without doing any fancy trickery but just dumb not hot every day code)
I do not find it so easy to accidentally run into UB in C if you follow some basic rules. The exceptions are null pointer dereferences and out-of-bounds accesses for arrays, both can be turned into run-time traps. The rules include no pointer arithmetic, no type casts, and having some ownership strategy. None of those is difficult to implement and where exceptions are made, one should treat it carefully similar to using "unsafe" in Rust.
Nah it makes some sense for portability between architectures. Or at least it did back when C was invented and there were some wild architectures out there.
And it definitely does allow some optimisation. But probably nothing significant on modern out-of-order machines.
> there were some wild architectures out there.
what is out there is still pretty wild
just slightly less
> probably nothing significant on modern out-of-order machines.
having no UB at all will kill a lot of optimizations still relevant today (and won't match anymore to hardware as some UB is on hardware level)
out-of-order machines aren't magically fixing that; they just make some less-optimized code work better, but not all of it
and a lot of low energy/cheap hardware does have no or very very limited out of order capabilities so it's still very relevant and likely will stay very relevant for a very long time
> why primitive types aren't "frozen" by default.
it kills _a lot_ of optimizations, leading to problematic perf. degradation
TL;DR: always freezing I/O buffers => no issues (in general); freezing all primitives => perf problem
(at least in practice; in theory many optimizations might still be possible, but with a way higher analysis compute cost (like exponentially higher) and potentially needing more high-level information (so bad luck, C)).
still, for I/O buffers of primitive-enough types, `frozen` is basically always just fine (I also vaguely remember some discussion where people more involved in Rust core development wanted to add functionality like that, so it might still happen).
To illustrate why frozen I/O buffers are just fine: some systems already (zero- or rand-) initialize all their I/O buffers anyway. A lot of systems reuse I/O buffers: they init them once on startup and then just continuously re-use them. And some OS setups (zero- or rand-) initialize all OS memory allocations (though that applies to the OS granting more memory to your in-process memory allocator, not to every language-specific alloc call, and it doesn't remove UB for stack or register values at all (nor for various situations related to heap values, either)).
So doing much more "costly" things than just freezing them is pretty much normal for I/O buffers.
Though as mentioned, sometimes things are not frozen-undefined at a hardware level (things like every read potentially returning a different value). It's a bit of a niche issue you probably won't run into wrt. I/O buffers, and I'm not sure how common it is on modern hardware, but it's still a thing.
But freezing primitives which majorly affect control flow both makes some optimizations impossible and makes others much harder to compute/check/find, potentially to the point where they're not viable anymore.
This can involve (as in, freezing can prevent) some forms of dead-code elimination, some forms of inlining + unrolling + const propagation, etc. These are mostly (but not exclusively) micro-optimizations, but micro-optimizations that sum up and accumulate, leading to (potentially, but not always) major performance regressions. Frozen also has some subtle interactions with floats and their different NaN values (which can be a problem especially wrt. signaling NaNs).
Though I'm wondering whether a different C/C++, where arrays of primitives are always treated as frozen (and there are no signaling NaNs), would have worked just fine without any noticeable perf. drawback. And if so, whether Rust should adopt this...
>I’ve also dealt with pain implementing similar interfaces in Rust, and it really feels like you end up jumping through a ton of hoops (and in some of my cases, hurting performance) all to satisfy the abstract machine, at no benefit to programmer or application.
I've implemented what TFA calls the "double cursor" design for buffers at $dayjob, i.e. an underlying (ref-counted) [MaybeUninit<u8>] with two indices to track the filled, initialized and unfilled regions, plus API to split the buffer into two non-overlapping handles, etc. It certainly required wrangling with UnsafeCell in non-trivial ways to make miri happy, but it performs no worse than the equivalent C code that just dealt with uint8_t* would have.
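A minimal sketch of such a double-cursor buffer (all names here are mine, not the $dayjob implementation; it omits the ref-counting and handle-splitting parts):

```rust
use std::mem::MaybeUninit;

// `0..filled` has been written and is readable; `filled..init` was initialized
// at some point; `init..` is uninitialized.
struct DoubleCursor {
    buf: Box<[MaybeUninit<u8>]>,
    filled: usize,
    init: usize,
}

impl DoubleCursor {
    fn with_capacity(cap: usize) -> Self {
        // Box::new_uninit_slice is stable since Rust 1.82.
        Self { buf: Box::new_uninit_slice(cap), filled: 0, init: 0 }
    }

    fn filled(&self) -> &[u8] {
        // SAFETY: bytes in 0..filled have been written.
        unsafe { &*(&self.buf[..self.filled] as *const [MaybeUninit<u8>] as *const [u8]) }
    }

    fn push_slice(&mut self, src: &[u8]) {
        let n = src.len().min(self.buf.len() - self.filled);
        for (dst, &b) in self.buf[self.filled..self.filled + n].iter_mut().zip(src) {
            dst.write(b); // MaybeUninit::write is safe: it never reads the old bytes
        }
        self.filled += n;
        self.init = self.init.max(self.filled);
    }
}
```

The key point is that no `&[u8]`/`&mut [u8]` ever covers uninitialized bytes, so no unsafe contract leaks to callers.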
This isn't just about the abstract machine. This is also about making it hard to end up using uninitialized memory, which is a security hole.
Abstractions like ReadBuf allow safe code to efficiently work with uninitialized buffers without risking exposure of random memory contents.
This is already discussed for Rust: https://github.com/rust-lang/rfcs/pull/3605. TL;DR: it's not as easy as it looks to just add "freeze."
It is as easy as it looks to add `freeze`. That is, value-based `freeze` is; reference-based `freeze`, while seemingly reasonable, is broken because of MADV_FREE.
Some people simply aren't comfortable with it.
Currently sound Rust code does not depend on the value of uninitialized memory whatsoever. Adding `freeze` means that it can. A vulnerability similar to heartbleed to expose secrets from free'd memory is impossible in sound Rust code without `freeze`, but theoretically possible with `freeze`.
Whether you consider this a realistic issue or not likely determines your stance on `freeze`. I personally don't think it's a big deal and have several algorithms which are fundamentally being slowed down by the lack of `freeze`, so I'd love it if we added it.
also, to be realistic: in a lot of practical situations I/O buffers are reused, so at least for the I/O-buffer use case it can be very viable (perf-wise, in most use cases) to just zero- or rand-initialize once on alloc and then treat the buffer as frozen in all repeated usages (though it does open the issue of a bug now potentially leaking previous contents of the I/O buffer).
but I guess this isn't just about I/O buffers ;)
Write into uninit'd buffers was one of the pain points of Rust for the creator of the new open source "edit" program for Windows[1]. I wonder what he thinks of this article.
> Another thing is the difficulty of using uninitialized data in Rust. I do understand that this involves an attribute in clang which can then perform quite drastic optimizations based on it, but this makes my life as a programmer kind of difficult at times. When it comes to `MaybeUninit`, or the previous `mem::uninit()`, I feel like the complexity of compiler engineering is leaking into the programming language itself and I'd like to be shielded from that if possible. At the end of the day, what I'd love to do is declare an array in Rust, assign it no value, `read()` into it, and magically reading from said array is safe. That's roughly how it works in C, and I know that it's also UB there if you do it wrong, but one thing is different: It doesn't really ever occupy my mind as a problem. In Rust it does. [https://news.ycombinator.com/item?id=44036021]
Abstracting away the `assume_init` is a great idea! I think I could use something like that for the editor. The only concern I have is that the `read` function is templated on the parameter type. I'd ideally _really_ prefer it if I didn't need two copies of the same function to switch over `[u8]` and `[MaybeUninit<u8>]` due to different return types. [^1] I guess the approach could be tuned to avoid this?
Personally, I also like the simpler approach overall, compared to the `BorrowedBuf` trait, for the same reasons outlined in the article.
While this possibly solves parts of pain points that I had, what I meant to write is that in an ideal world I could write Rust while mostly not thinking about this issue much, if at all. Even with this approach, I'd still need to decide whether my API needs to take a `[u8]` or a `Buffer`, just in the mere off-chance that a caller may want to pass an uninitialized array further up in the call chain. This then requires making the call path generic for the buffer parameter which may end up duplicating any of the functions along the path, even though that's not really my intention by marking it as `Buffer`.
I think if there was a way to modify Rust so we can boldly state in writing "You may cast a `[MaybeUninit<T>]` into a `[T]` and pass it into a call _if_ you're absolutely certain that nothing reads from the slice", it would already go a long way. It may not make this more comfortable yet, but it would definitely take off a large part of my worries when writing such unsafe casts. That's basically what I meant with "occupy my mind": It's not that I wouldn't think about it at all, rather it just wouldn't be a larger concern for me anymore, for code where I know for sure that this requirement is fulfilled (i.e. similar to how I know it when writing equivalent C code).
Edit: jcranmer's suggestion of write-only references would solve this, I think? https://news.ycombinator.com/item?id=44048450
[^1]: This is of course not a problem for a simple `read` syscall, but may be an issue for more complex functions, e.g. the UTF8 <> UTF16 converter API I suggested elsewhere in this thread, particularly if it's accelerated, the way simdutf is.
The basic problem with uninitialized buffers is that they effectively require write-only references to exist, and Rust's type system doesn't have (and doesn't easily support) write-only references, only read-only and read-write. MaybeUninit is a partial solution to the problem, but since it's a library solution and not a language solution, it suffers from a lack of integration with the language, e.g., getting MaybeUninit fields from a MaybeUninit struct is challenging.
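To illustrate the field-projection problem: with a hypothetical `Header` struct, there is no safe way to project `MaybeUninit<Header>` to a `MaybeUninit<u32>` for one field, and `&mut (*p).len` is exactly the kind of reference-to-uninitialized that current guidance says to avoid, so you end up with `addr_of_mut!` and raw-pointer writes:

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

struct Header { len: u32, flags: u32 } // hypothetical example type

fn init_header() -> Header {
    let mut h = MaybeUninit::<Header>::uninit();
    let p = h.as_mut_ptr();
    unsafe {
        // addr_of_mut! creates a raw pointer to the field without ever
        // materializing a reference into the uninitialized struct.
        addr_of_mut!((*p).len).write(4);
        addr_of_mut!((*p).flags).write(0);
        // SAFETY: every field has now been written.
        h.assume_init()
    }
}
```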
And the most aggravating part of all of this is that the most common use case for uninitialized memory (the scenario being talked about both in the article here and the discussion you quote) is actually pretty easy to have a reasonable, safe abstraction for, so the fact that the current options requires both use of unsafe code and also potentially faulty duplication of value calculations doesn't make for a fun experience. (Also, the I/O traits predate MaybeUninit, which means the most common place to want to work with uninitialized memory is one where you can't do it properly.)
> That's roughly how it works in C, and I know that it's also UB there if you do it wrong, but one thing is different: It doesn't really ever occupy my mind as a problem. In Rust it does.
UB doesn’t occupy the author’s mind when writing C, when it really should. This kind of lazy attitude to memory safety is precisely why so much C code is notoriously riddled with memory bugs and security vulnerabilities.
There is an important difference for this case though. In C it's fine to have pointers into uninitialized memory as long as you don't read through them until after initializing. You can write through those pointers the same way you always do. In Rust it's UB as soon as you "produce" an invalid value, which includes references to uninitialized memory. Everything uses references in Rust, but when dealing with uninitialized memory you have to scrupulously avoid them and instead write through raw pointers. This means you can't reuse any code that writes through &mut. Also, the rules change over time. At one point I had unsafe code that had a Vec of uninitialized elements, which was OK because I never produced a reference to any element until after I had written it (through raw pointers). But they later changed the Vec docs to say that's UB, I guess because they want to reserve the right to use references internally even if you never call a method that returns a reference.
This stopped being much of a problem when MaybeUninit was stabilized. Now you can stick to using &MaybeUninit<T> / &mut MaybeUninit<T> instead of needing to juggle *T / *mut T and carefully track converting that to &T / &mut T only when it's known to be initialized, and you can't accidentally use a MaybeUninit<T> where you meant to use a T because the types are different.
It's not as painless as it could be though, because many of the MaybeUninit<T> -> T conversion fns are unstable. Eg the code in TFA needs `&mut [MaybeUninit<T>] -> &mut [T]` but `[T]::assume_init_mut()` is unstable. But reimplementing them is just a matter of copying the libstd impl, that in turn is usually just a straightforward reinterpret-cast one-liner.
I don’t get the difference. In both C and Rust you can have pointers to uninitialized memory. In both languages, you can’t use them except in very specific circumstances (which are AFAIK identical).
There are two actual differences in this regard: C pointers are more ergonomic than Rust pointers. And Rust has an additional feature called references, which enable a lot more aggressive compiler optimizations, but which have the restriction that you can’t have a reference to uninitialized memory.
Bizarre. I think I've been writing broken Rust code for a couple years. If I understand you correctly something like:
is UB?

That's right. Line 3 is undefined behaviour because you are creating mutable references to the uninit spare capacity of the vec. copy_from_slice only works for writing to initialized slices. The proper way for your example to mess with the uninitialized memory in a Vec would be to use only raw pointers, or to call the newly added Vec::spare_capacity_mut function, which returns a slice of MaybeUninit.
Yes, this is the case that I ran into as well. You have to zero memory before reading and/or have some crazy combination of tracking what's uninitialized capacity vs. initialized len. I think the Rust stdlib Write impl for &mut Vec got butchered over this concern.
It’s strictly more complicated and slower than the obvious thing to do and only exists to satisfy the abstract machine.
No. The correct way to write that code is to use .spare_capacity_mut() to get a &mut [MaybeUninit<T>], then write your Ts into that using .write_copy_of_slice(), then .set_len(). And that will not be any slower (though obviously more complicated) than the original incorrect code.
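On stable Rust (where `write_copy_of_slice` isn't available) the same shape can be sketched with per-element `MaybeUninit::write`:

```rust
// Sketch of the pattern on stable Rust: reserve, write into the spare
// capacity, then set_len over the bytes that are now initialized.
fn extend_from_slice_via_spare(v: &mut Vec<u8>, src: &[u8]) {
    v.reserve(src.len());
    let spare = v.spare_capacity_mut(); // &mut [MaybeUninit<u8>]
    for (dst, &b) in spare.iter_mut().zip(src) {
        dst.write(b); // safe: writing to a MaybeUninit never reads old bytes
    }
    // SAFETY: the first src.len() spare bytes were just initialized.
    unsafe { v.set_len(v.len() + src.len()) };
}
```

The per-byte loop typically optimizes to a memcpy, but if in doubt, check the generated code.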
Oh this is very nice, I think it was stabilized since I wrote said code.
write_copy_of_slice doesn't look to be stable. I'll mess around with godbolt, but my hope is that whatever incantation is used compiles down to a memcpy.
As I wrote in https://news.ycombinator.com/item?id=44048391 , you have to get used to copying the libstd impl when working with MaybeUninit. For my code I put a "TODO(rustup)" comment on such copies, to remind myself to revisit them every time I update the Rust version in toolchain.toml
In other words the """safe""" stable code looks like this:
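(The snippet was elided; presumably it's the usual reinterpret-cast one-liner copied from the libstd implementation, something like:)

```rust
use std::mem::MaybeUninit;

// Stable-Rust copy of the unstable `assume_init_mut` for slices, lifted from
// the libstd implementation.
//
// SAFETY: the caller must guarantee every element of `s` is initialized.
unsafe fn slice_assume_init_mut<T>(s: &mut [MaybeUninit<T>]) -> &mut [T] {
    unsafe { &mut *(s as *mut [MaybeUninit<T>] as *mut [T]) }
}
```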
That's correct.
Valgrind it :)
It is also not UB to read uninitialized values through a pointer in C for types that do not have non-value representations.
I suspect that the main reason it doesn't really occupy the author's mind is that even though it's possible to misuse read(), it's really not that hard to actually use it safely.
It sounds like the more difficult problem here has to do with explaining to the compiler that read() is not being used unsafely.
What I meant is that if I write a UTF8 --> UTF16 conversion function for my editor in C I can write
This function now works with both initialized and uninitialized data in practice. It is also transparent over whether the output buffer is a `u8` (a byte buffer to write out into a `File`) or `u16` (a buffer for then using the UTF16). I've never had to think about whether this doesn't work (in this particular context; let's ignore any alignment concerns for writes into `out` in this example) and I don't recall running into any issues writing such code in a long, long time.

If I write the equivalent code in Rust I may write
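(The signatures themselves were elided from the comment; the Rust one is presumably something like the following hypothetical sketch, where the caller hands in possibly-uninitialized output space and the function reports how much of it was initialized:)

```rust
use std::mem::MaybeUninit;

// Hypothetical sketch of the signature being described. The conversion logic
// is elided; for illustration it only transcodes ASCII bytes.
fn utf8_to_utf16(input: &[u8], out: &mut [MaybeUninit<u16>]) -> usize {
    let mut n = 0;
    for &b in input.iter().take(out.len()) {
        if b < 0x80 {
            out[n].write(b as u16);
            n += 1;
        }
    }
    n // `out[..n]` is now initialized
}
```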
The problem is now obvious to me, but at least my intention is clear: "Come here! Give me your uninitialized arrays! I don't care!". But this is not the end of the problem, because writing this code is theoretically unsafe. If you have a `[u8]` slice for `out`, you have to convert it to `[MaybeUninit<u8>]`, but then the function could theoretically write uninitialized data, and that's UB, isn't it? So now I have to think about this problem and write this instead: ...and that will also be unsafe, because now I have to convert my actual `[MaybeUninit<u8>]` buffer (for file writes) to `[u8]` for calls to this API.

Long story short, this is a problem that occupies my mind when writing in Rust, but not in C. That doesn't mean that C's many unsafeties don't worry me; it just means that this _particular_ problem type described above doesn't come up as an issue in C code that I write.
The reason this particular UB doesn't need mindspace for C programmers is because it's not even meaningful to do anything with the parts of the buffer beyond the written length.
Most other UBs relate to data that you think you can do something with.
I think this solves his problem. He said he wants a read function that turns the unsafe buffer into a safe buffer, and this API does that.
IIRC it's not that hard to convince the compiler to give you a safe buffer from a MaybeUninit. However, this type has really lengthy docs and makes you question everything you do with it. Thinking through all this is painful, but it's not like you don't have to do it in C.
> without doing anything hugely inefficient, such as initializing the full buffer
Is this so inefficient? If your code is very sensitive to IO throughput, then it seems preferable to re-use buffers and pay the initialization once at startup.
Some years ago, I needed a buffer like this and one didn't exist, so I wrote one: https://crates.io/crates/fixed-buffer . I like that it's a plain struct with no type parameters.
> Is this so inefficient?
It can be. If you have large buffers (tuned for throughput) that end up fulfilling lots of small requests for whatever reason, for example. And there's always the occasional article where someone rediscovers that replacing malloc + memset with calloc can have massive performance savings, thanks to zeroing by the OS only occurring on first page fault (if it ever occurs), instead of as an O(N) operation on the whole buffer up front.
Which, if in the wrong loop, can quickly balloon from O(N) to O(scary).
https://github.com/PSeitz/lz4_flex/issues/147
https://github.com/rust-lang/rust/issues/117545
If I'm reading that log-log plot right, that looks like a significantly worse than 100x slowdown on 1GB data sets. Avoiding init isn't the only solution, of course, but it was a solution.
> then it seems preferable to re-use buffers
Buffer reuse may be an option, but in code with complicated buffer ownership (e.g. transfering between threads, with the thread of origination not necessarily sticking around, etc.), one of the sanest methods of re-use may be to return said buffer to the allocator, or even OS.
> and pay the initialization once at startup.
Possibly a great option for long lived processes, possibly a terrible one for something you spawn via xargs.
Just dropping to C for a smallish segment of a rust program kind of makes sense if you want to eke out performance here, no?
Yeah but the C toolchain is a huge pain, and makes things like cross-compiling and compiling to WASM harder. It's really nice if you can keep your program pure Rust.
Go has similar characteristics.
It does, though even for a small segment you end up needing a lot of boilerplate - a dependency on the cc crate, a build.rs that invokes the cc crate, extern declarations to tell the Rust caller about the C function, potentially some bindgen if your small segment is not so small ... and you still end up having to do some amount of thunking between & / * / MaybeUninit because of that anyway. So if there is a "pure Rust" way to do it with `unsafe`, writing that is often easier. The pure Rust impl also has the advantage that you can validate it with Miri, unlike the C impl case because Miri cannot emulate arbitrary C code.
You can always just use unsafe. This is about how to allow code to do this without unsafe blocks.
Pardon my ignorance, but I thought the whole point of Rust was to be a 'safe' modern alternative to C, so all new buffers would be zero'd at a neglible-these-days cost. Why is rust half-assing this?
The cost might not be negligible for everyone?
Rust is being used and is designed to be able to be used everywhere from top of the line PCs, to servers to microcontrollers to virtual machines in the browser.
Not all tradeoffs are acceptable to everyone all of the time
It’s not. That is the case. But in cases where “negligible-these-days” isn’t quite negligible enough, this still matters and unsafe + MaybeUninit is the escape hatch to accomplish it.
Also not every type has a valid "all zeroes" value in the first place.
Yuck. In my mind, 'using C and not using Rust in the first place' is the escape hatch and Rust shouldn't even go there. Jeez, what a mess.
This is why we're spending substantial energy building better abstractions that don't require you to write any unsafe code.
Other languages can easily achieve this kind of safety. What makes Rust different is that it tries to provide this level of safety without doing extra work at runtime (because otherwise people will put it in the same pile as Java/C#, and continue using C/C++ for speed).
what if your buffer is 64 GiB? (ok, in practice it will be zero'd on demand by the OS but still)
I’d love to know the actual performance impact of zeroing out memory. I bet the performance cost of zeroing out memory ahead of time is negligible in almost all programs.
Here is a really dumb example:
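(The example itself was elided; the comment is about C's malloc+memset vs calloc. A rough Rust analogue of the same "dumb example": `vec![0u8; N]` lowers to `alloc_zeroed`, so the OS can hand out lazily-mapped zero pages, while a nonzero fill must touch every page up front:)

```rust
use std::time::Instant;

fn main() {
    const N: usize = 1 << 28; // 256 MiB; the effect grows with size

    let t = Instant::now();
    let eager = vec![1u8; N]; // nonzero fill: every page written up front
    let eager_time = t.elapsed();

    let t = Instant::now();
    let lazy = vec![0u8; N]; // alloc_zeroed: zeroing deferred to first touch
    let lazy_time = t.elapsed();

    println!("eager fill: {eager_time:?}, zeroed alloc: {lazy_time:?}");
    assert!(eager[0] == 1 && lazy[0] == 0);
}
```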
YMMV on different operating systems. Of course this is a program only an idiot would write, but things like caches are often significantly bigger than the median case, especially on Linux where you know there is overcommit.

A non-idiot would use calloc(DEFINITELY_BIG_ENOUGH), and that will likely erase the difference, because the impl will be able to rely on mmap(ANONYMOUS) creating zero pages for such a large allocation. A more realistic test would be to have a large number of small allocations that get calloc'd and then free'd repeatedly, because a) free'ing a small allocation will not free the underlying page, and thus reallocation will not be able to rely on it already being zero, and b) zeroing small allocations doesn't amortize the cost of zeroing as well as zeroing large allocations does.
c) SIMD won't be as helpful when you are zeroing small buffers of non-round sizes
That's what (b) is about.
Surely the point of Rust is 'safety at the price of performance' and if extra performance is required, don't use Rust. Don't bodge the language to accommodate!
The point of rust is explicitly not 'safety at the price of performance', quite the opposite. The whole point was to create language where safety doesn't cost performance like it does in most other languages.
It is not. If it were, it wouldn't even have raw pointers, for example.
and yet an article about a 'safe' language has code examples full of 'unsafe'... wut.
`unsafe` is just a barrier between what compiler proves and what programmer proves.
People should really read more on safety semantics in Rust before making comments like this, it's quite annoying to bump into surface level misunderstandings everytime Rust is mentioned somewhere.
Related to unspecified vs undefined. I recall some C code was trying to be tricky and read from just allocated memory. Something like:
int* ptr = malloc(size);
if (ptr[offset] == 0) { }
The code was assuming that the value in an allocated buffer did not change.
However, it was pointed out in review that it could change with these steps:
1) The malloc allocates from a new memory page. This page is often not mapped to a physical page until written to.
2) The reads just return the default (often 0) value, as the page is not mapped.
3) Another allocation is made that is written to the same page. This maps the page to physical memory which then changes the value of the original allocation.
A read from an unmapped page producing a different value than reading from that same page after it's mapped is an OS bug (*). If this was an already allocated page that had something written to it, reading from it would page it back in and then produce the actual content. If this was a new page and the OS contract was to provide zeroed pages, both the read before it was mapped and the read after it was mapped would produce zero.
What could happen is that the UB in that code could result in it being compiled in a way that makes the comparison non-deterministic.
(*): ... or alternatively, we're not talking about regular userspace program but a higher privilege layer that is doing direct unpaged access, but I assume that's not the case since you're talking about malloc.
It was from CppCon 2016 — Facebook's take on small strings: https://www.youtube.com/watch?v=kPR8h4-qZdk&t=1343s I believe it is about a page that was conditionally returned to the kernel.
The speaker was mistaken / misspoke.
The closest thing to "conditionally returned to the kernel" is if the page had been given to madvise(MADV_FREE), but that would still not have the behavior they're talking about. Reading and writing would still produce the same content, either the original page content because the kernel hasn't released the page yet, or zero because the kernel has already released the page. Even if the order of operations is read -> kernel frees -> write, then that still doesn't match their story, because the read will produce the original page content, not zero.
That said, the code they're talking about is different from yours in that their code is specifically doing an out-of-bounds read. (They said "If you happen to allocate a string that's 128 bytes, and malloc happens to return an address to you that's 128 bytes away from the end of the page, you'll write the 128 bytes and the null terminator will be the first byte on the next page." So they're very clearly talking about the \0 being outside the allocation.)
So it is absolutely possible to have this setup: the string's allocation happens to be followed by a different allocation that is currently 0 -> the `data[size()] != '\0'` check is performed and succeeds -> `data` is returned to the caller -> whoever owns that following allocation writes a non-zero value to the first byte -> whoever called `c_str()` will now run off the end of the 128B string. This doesn't have anything to do with pages; it can happen within the bounds of a single page. It is also such an obvious out-of-bounds bug that it boggles my mind that it passed any sort of code review and required some sort of graybeard to point out.
I don't believe they are allocating 128 bytes, or accessing out of bounds memory.
He explicitly states that a 128-byte filename allocates 129 bytes. https://www.youtube.com/watch?v=kPR8h4-qZdk&t=1417s
In that case the bug he described simply does not exist.
This is well outside my expertise, but some discussion happened at the time https://www.reddit.com/r/programming/comments/56xxmb/the_str...
Some people suggest that maybe Facebook runs with MAP_UNINITIALIZED