WalterBright 1 days ago [-]
NaNs are a very underappreciated feature of IEEE-754 floating point. In the D programming language, floats get default initialized to NaN, not to 0.0.
double y = 0.0; // initialized to 0.0
double x; // initialized to NaN
The discussion routinely comes up as "why not default initialize to 0.0?" The reason is that a routine mistake in programming is forgetting to initialize a variable. With a floating point 0.0, one may never realize that the floating point calculation results are wrong. But with NaN, the result of a floating point computation will be NaN, which is unlikely to go unnoticed.
I don't know of any other programming language with this safety feature.
Also, the D `char` type is initialized to 0xFF, not 0, because Unicode says that 0xFF is an invalid character.
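A minimal C sketch of the argument above. C has no NaN default initialization, so the "forgotten" variable is simulated by passing 0.0 or NAN explicitly; `total_with_offset` and its parameters are hypothetical names used only for illustration.

```c
#include <math.h>
#include <stddef.h>

/* Adds an offset to the sum of the samples. The offset stands in for a
   variable the programmer forgot to set to its real value. */
double total_with_offset(const double *samples, size_t n, double offset)
{
    double total = offset;
    for (size_t i = 0; i < n; i++)
        total += samples[i];   /* a NaN offset propagates through every add */
    return total;
}
```

With a forgotten offset of 0.0 the result is a plausible-looking number; with NAN the result is NaN, which is much harder to overlook.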
p1necone 1 days ago [-]
Just requiring explicit assignment before first use feels like the superior approach to automatic initialization, regardless of whether the automatic initialization is with 0 or with NaN.
WalterBright 1 days ago [-]
That suggestion is often made.
The trouble with it is a bug I've seen often. People will get an error message about an "uninitialized variable". Then they go into "just get the compiler to shut up" mode, and pick "0" as the initializer. Then, the program compiles and runs, and silently produces the wrong answer. Code reviews will simply pass over the "0" initializer, as it looks right.
With default NaN initialization, the programmer is more likely to stop and think about it, not just insert 0.
Another issue with it is:
For the purposes of code clarity I don't want to see a variable initialized to a value that is never used, just to shut the compiler up.
ncurses1010 1 days ago [-]
With the default initialization to nan, do you ever run into situations where people are searching for common sources for nan (nan literals, div by zero) and they can't find it? Or cases where only some branches but not others initialize the float?
WalterBright 24 hours ago [-]
To leave a variable uninitialized, use the construction:
int x = void;
Note that nobody is going to write this by accident. And it's easy to grep for.
To find the source of a NaN, it helps to know that every operation that has a NaN as an operand produces a NaN as a result. So if you see a NaN in the output, you can work backwards to where it originated.
jcranmer 23 hours ago [-]
> every operation that has a NaN as an operand produces a NaN as a result.
That's not true. The minimum/maximum functions (fmin and fminimum_num variants, but not the fminimum one) treat NaN inputs as not-present, so return the non-NaN value if there is one. Similarly, hypot also treats NaN inputs as not-present. pow and compoundn will ignore NaN exponents if the base is 1.
WalterBright 22 hours ago [-]
Yes, there are some functions where if one operand has no effect on the result, a NaN value will also have no effect.
ncurses1010 19 hours ago [-]
What do you think of Kotlin's approach, where it has a 'todo' function that can always coerce to a type, but instead of populating the variable with a default value that's valid, it just throws?
WalterBright 9 hours ago [-]
Sorry, I know nothing about Kotlin.
billforsternz 22 hours ago [-]
How long did you think about this before making this declaration? How long did Walter Bright think about this before making his decision when designing his language? Not saying you're wrong, just something to think about perhaps.
electroly 22 hours ago [-]
C# requires explicit assignment. If an appeal to authority sways you (it shouldn't), you can substitute Anders Hejlsberg instead of this random OP. How long do you suppose Anders Hejlsberg thought about this?
But I contend it's more useful (and interesting) to think about the idea with your own mind instead of tallying up the perceived authority of its supporters and relying on trust. It was also somewhat rude to suggest that the OP had not given their idea much thought. This is a forum for discussion, isn't it?
WalterBright 20 hours ago [-]
Unfortunately, what happens with explicit assignment is programmers often enough will:
1. just insert '= 0;' to get it to compile
2. insert '= 0;' and then be puzzled by an initialization further along in the code
3. see the '= 0;' and wonder why the programmer did that as 0 was not a valid value for it
A goal of D is to be able to make code more understandable. Forcing a vacuous initialization on the programmer is not conducive to that.
electroly 11 hours ago [-]
I think you are misunderstanding what C# does here, and what the original poster was suggesting. Maybe we did a poor job of describing it; I assumed people knew C#. The key words in OP's post were "before first use." It sounds like you interpreted this to mean "every variable declaration must immediately assign a value" but that's not how it works. I'll explain C#'s semantics.
An assignment is required at some point before the first read, not in the declaration. It tracks assignments and usages, and it flags a compiler error if you read a variable before assigning to it for the first time. A variable that hasn't been assigned cannot be read.
It means you can do "int a;" and then later in the function do "a = 5;" and the compiler guarantees that you never read the variable before the assignment in any path through the function. You cannot do "int a;" and then read from it; that's a compile-time error.
It does not mean you have to assign something in the declaration. We never need "vacuous" initializations, and this solution works on all types. Indeed, we avoid vacuous initializations so that the compiler will catch use-before-assign bugs at compile time. The situation you described doesn't happen in C#. Our C# variables become readable on their first assignment, not their declaration; the declaration merely sets the scope. There's no need for a state where it's initialized to an invalid value before receiving the first intended assignment, because in C# the variable is completely inaccessible during that time.
WalterBright 9 hours ago [-]
Ok, I understand that. Thank you for the explanation.
An analogous thing happens in D:
int x;
x = 5;
The compiler front end generates two assignments. But then, when it goes through the backend, the first assignment is deleted by what is known as the "dead assignment optimization".
WalterBright 22 hours ago [-]
Thank you. I've made many counter-intuitive decisions based on long experience. Sometimes I just have to say "trust me".
Like not allowing macros in D, or version algebra.
lmm 1 days ago [-]
Yep. This is NaN as a billion dollar mistake all over again.
WalterBright 20 hours ago [-]
Unrecognized subtle errors in floating point calculations are worse problems.
lmm 17 hours ago [-]
Sure, but that only matters if default-initialising to NaN significantly reduces them compared to the alternatives. IME it takes a very finely calibrated level of thoughtfulness for your argument in https://news.ycombinator.com/item?id=47928539 to work, to have a programmer who is simultaneously thoughtless enough to initialise to 0 without thinking if the compiler requires initialisation, but thoughtful enough to stop and think about it when the compiler initialises to NaN.
WalterBright 9 hours ago [-]
The idea is to make errors obvious rather than subtle. A NaN output is far more obvious than a number that is 2% off.
lmm 7 hours ago [-]
A compilation failure is even more obvious.
WalterBright 1 days ago [-]
Another crucial use of NaNs is if you have a sensor. If the sensor has failed, the sensed value should be transmitted as NaN, not 0, so the receiver knows the data is bad.
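A minimal C sketch of the idea, assuming a sensor that can detect its own failure. The `sensor_t` type and both functions are hypothetical, not any real driver API.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sensor state: a failure flag plus the last measured voltage. */
typedef struct {
    bool   failed;
    double volts;
} sensor_t;

/* A failed sensor reports NaN instead of a plausible-looking 0. */
double sensor_read(const sensor_t *s)
{
    return s->failed ? NAN : s->volts;
}

/* The receiver's average becomes NaN if any single reading was bad. */
double mean_reading(const sensor_t *sensors, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += sensor_read(&sensors[i]);
    return sum / (double)n;
}
```

Had the failed sensor reported 0, the mean would simply have been too low, with nothing to flag it.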
AlotOfReading 1 days ago [-]
My experience is that if you write an interface that (rarely) returns NaNs, someone will use it assuming it's never NaN no matter how good the docs are. Then their code does bad things and you have to patiently explain why they're wrong and yes, they are holding isnan() wrong (in C/C++).
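One common way to get the check wrong in C, as a short sketch. The function names are hypothetical; the point is that NaN compares unequal to everything, including itself, so an equality test against NAN can never succeed.

```c
#include <math.h>
#include <stdbool.h>

/* Wrong: comparing against NAN is always false, even when x is NaN. */
bool is_nan_wrong(double x)
{
    return x == NAN;
}

/* Right: use the isnan macro from <math.h>. */
bool is_nan_right(double x)
{
    return isnan(x);
}
```

A further caveat: with GCC or Clang options such as -ffast-math, the compiler may assume NaNs never occur, and isnan checks can be optimized away.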
adrian_b 19 hours ago [-]
When such users are expected, there exists only one solution.
Do not mask the invalid operation exception. That was actually the original recommendation of the IEEE standard: the default behavior should be to mask all exceptions except the invalid operation exception.
When the invalid operation exception is not masked, NaNs are never generated and any NaN present in the input data will generate an exception, which will abort the program, unless the exception is handled.
This behavior avoids the bugs caused by careless programmers. Unfortunately, the original suggestion was not adopted by most programming language implementers, so nowadays the typical default setting is to have all exceptions masked. When the programmers also omit to handle the special values, bugs may remain unnoticed.
Special values need not be handled everywhere, because infinities and NaNs will propagate through many operations, so they will remain in the final results. But wherever a value is not persistent, but it is used in some decision and it is discarded after that, special values like NaNs must be handled correctly.
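A sketch of such a decision point in C. The names and the temperature scenario are hypothetical; the point is that every ordered comparison with NaN is false, so a naive threshold test silently takes the "everything is fine" branch.

```c
#include <math.h>
#include <stdbool.h>

typedef enum { TEMP_OK, TEMP_TOO_HOT, TEMP_INVALID } temp_status;

/* Naive check: a NaN reading compares false and looks like "not too hot". */
bool too_hot_naive(double celsius, double limit)
{
    return celsius > limit;
}

/* NaN-aware check: the special value is handled before the decision,
   because the reading is discarded afterwards and cannot propagate. */
temp_status classify_temp(double celsius, double limit)
{
    if (isnan(celsius) || isnan(limit))
        return TEMP_INVALID;
    return celsius > limit ? TEMP_TOO_HOT : TEMP_OK;
}
```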
WalterBright 1 days ago [-]
NaN for a failed sensor is objectively better than any other value. But at some point you just cannot help some people.
bumby 14 hours ago [-]
Doesn’t this completely depend on the sensor failure mode? Eg if a voltage sensor internally shorts to ground, the failure will read 0V, not NaN. Or are you using “failed sensor” to only mean “not reporting” here?
I think your initialization is smart in many use cases, but the sensor application probably isn’t one of them except for that single failure mode. It can still lead to masked failures and false assumptions (“the sensor is getting a value so it must be working”). That’s the same issue as what you’re supposedly fixing by that design choice. It still requires engineering knowledge to assess correctly.
WalterBright 9 hours ago [-]
Yes, I assume the sensor is designed to detect its own failures. If a sensor is capable of emitting floating point values, surely its software can emit a NaN.
The point of a NaN value is it does not require sophisticated engineering knowledge to realize that a NaN output is not what you're expecting.
bumby 4 hours ago [-]
>I assume the sensor is designed to detect its own failures.
Bold assumption. I would be willing to bet this is more the exception than the rule on most sensors/systems.
>The point of a NaN value is it does not require sophisticated engineering knowledge to realize that a NaN output is not what you're expecting.
What I was pointing out is this only captures a relatively narrow set of failure modes and may lead to bad assumptions due to automation bias. E.g., "I only need to think about failures if the sensor gives an NaN because it's based on the assumption that a failure produces an NaN" whereas having an actual principled knowledge of operation can catch the other errors.
anitil 1 days ago [-]
That's a very thoughtful decision, I always enjoy your updates on D
addaon 10 hours ago [-]
What's the cost of this in terms of not being able to bzero() simple data structures, or use OS-cleared pages directly without dirtying them? This seems like it would turn some sparse memory usage patterns dense…
WalterBright 9 hours ago [-]
You can always statically initialize them with 0:
static float[10] array = 0.0;
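The concern behind the question can be sketched in C: in IEEE-754 the all-zero bit pattern is +0.0, so zero-filled memory holds zeros, while any NaN has a nonzero bit pattern and must be written explicitly. The helper names are hypothetical.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero-fill an array the way bzero() or a freshly mapped page would. */
void zero_fill(double *a, size_t n)
{
    memset(a, 0, n * sizeof *a);
}

/* Return the raw 64-bit pattern of a double. */
uint64_t bits_of(double x)
{
    uint64_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}
```

After `zero_fill`, every element reads as 0.0, never NaN; a NaN-filled array needs a store to every element, which is what would dirty otherwise untouched pages.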
wpollock 1 days ago [-]
> ... Unicode says that 0xFF is an invalid character.
Not so. You may be thinking of UTF-8 encoding. 0xff is DEL in Unicode.
LittleLily 1 days ago [-]
DEL is unicode codepoint U+007F, which is the byte 0x7F in UTF-8, not 0xFF.
Perhaps you were thinking of ÿ which is codepoint U+00FF, which encodes to the bytes 0xC3 0xBF in UTF-8.
WalterBright 1 days ago [-]
The "char" type in D represents a UTF-8 code unit. The byte 0xFF is not a valid UTF-8 code unit and is strictly forbidden.
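The distinction between the code point U+00FF and the byte 0xFF can be sketched in C. Both function names are hypothetical, and the encoder only handles code points below U+0800 to stay short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes that never appear anywhere in well-formed UTF-8:
   0xC0, 0xC1 and 0xF5 through 0xFF. */
bool utf8_byte_never_valid(uint8_t b)
{
    return b == 0xC0 || b == 0xC1 || b >= 0xF5;
}

/* Encode a code point below U+0800 as UTF-8.
   Returns the number of bytes written (1 or 2), or 0 if out of range. */
size_t utf8_encode_small(uint32_t cp, uint8_t out[2])
{
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));    /* leading byte 110xxxxx */
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));  /* continuation 10xxxxxx */
        return 2;
    }
    return 0;
}
```

Encoding U+00FF (ÿ) yields the bytes 0xC3 0xBF and U+007F (DEL) yields the single byte 0x7F, while the byte 0xFF itself is rejected by `utf8_byte_never_valid`.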
GMoromisato 1 days ago [-]
I use nan boxing in GridWhale. It feels like the Infinite Hotel[1]: you can always add another type. Note that these techniques also rely on the fact that we don't use all 64-bits for memory addressing. If we ever do, lots of VMs will break.
For me, the major advantage of nan boxing is that you don't have to allocate a whole class of types (like floats). That saves so much at garbage collection time.
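A minimal C sketch of NaN boxing, not GridWhale's actual layout: the tag value, mask and function names are all hypothetical. It stores a 32-bit integer in the payload bits of a quiet NaN, so integers and real doubles share one 64-bit slot with no heap allocation.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TAG_MASK 0xFFFF000000000000ULL  /* top 16 bits select the type  */
#define TAG_INT  0x7FFC000000000000ULL  /* hypothetical "boxed int" tag */

static uint64_t to_bits(double d)
{
    uint64_t b;
    memcpy(&b, &d, sizeof b);
    return b;
}

static double from_bits(uint64_t b)
{
    double d;
    memcpy(&d, &b, sizeof d);
    return d;
}

/* Pack a 32-bit integer into the low bits of a tagged quiet NaN. */
double box_int(int32_t v)
{
    return from_bits(TAG_INT | (uint64_t)(uint32_t)v);
}

/* True only for values produced by box_int, not for ordinary doubles. */
bool is_boxed_int(double d)
{
    return (to_bits(d) & TAG_MASK) == TAG_INT;
}

/* Recover the integer from a boxed value. */
int32_t unbox_int(double d)
{
    uint32_t low = (uint32_t)(to_bits(d) & 0xFFFFFFFFu);
    int32_t v;
    memcpy(&v, &low, sizeof v);
    return v;
}
```

The same trick extends to pointers only while addresses fit in the unused payload bits, which is the dependence on address width mentioned above.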
This is super useful, thanks. So if I were implementing a programming language, and wanted to have symbols to specify NaN in source code, I'd really only need quiet NaN, right? Because signaling NaN is supposed to always to raise an exception anyway?
WalterBright 1 days ago [-]
I originally implemented Signalling and Quiet NaNs in the compiler. It was an abject failure. With all the transformations a compiler does, the point where a signalling NaN turns into a quiet one gets lost. So just quiet NaNs are used.
------------
[1] https://en.wikipedia.org/wiki/Hilbert%27s_paradox_of_the_Gra...