Mac Intel Code Generator Bug


So I found a rather interesting code generator bug in XCode that happens when converting a 64-bit floating point value to an unsigned 32-bit integer. Here's a good example of what I'm talking about in REALbasic:


dim d as Double = -1.0
dim u as UInt32 = d

MsgBox Str( u )

If you run that code, it will display 0, and not 0xFFFFFFFF like you'd expect! In fact, any negative number will still display 0. But this bug isn't limited to just the REALbasic compiler. You can see it in a straight-up C application too:

double d = -1.0;
unsigned long u = (unsigned long)d;
printf( "%d\n", u );

When you run this on the command line, you'll also see the value zero printed out to the terminal.

When you disassemble the compiled code and trace your way through it, the problem becomes clear -- the code generator is clamping the result to the range 0..MAX_UINT. What's bizarre is that the clamp isn't needed, and it's the cause of the issue. Any negative value ends up mapped to zero, which is silly because the conversion is to UInt.

REALbasic demonstrates the same issue because we use an intrinsic instead of doing the conversion ourselves. We use intrinsics for conversions to save space (it takes less space to call a function than to do the conversion inline), and so our conversions suffer from the same problem as the XCode application.

So watch out for this one, as it can be particularly painful. Thankfully, it only affects conversions from double to uint32 -- other conversions all appear to be fine in quick tests. We've filed a bug report with Apple, but since Radar appears to be a closed system, no one gets to see how the report progresses. :-P

28 Comments

And the difference between Radar and FrogBugs is....? The new-style RB doesn't show anything either... :-(

This might be a good time to replace the intrinsic call with an inline operation...

@Mars -- yeah, we've fixed the intrinsic call up with a workaround for Mac, but the thought has certainly crossed my mind. Apple thinks this is a bug and is taking it seriously, but interestingly enough, the C99 spec does seem to indicate that this is undefined behavior. Bleh. So inlining it may actually be the safest answer regardless.

You say Xcode, but I assume you mean GCC? I don't believe Xcode itself has the ability to generate code.

Yep. I just tried this in pure GCC (without Xcode) and get the same result (0). Which makes sense since Xcode is just front-ending GCC. If you think this is a bug, you might want to file a bug report with the GCC guys.

@Will -- you're absolutely correct, I do mean gcc. I just wanted to make sure there was no confusion about "which" gcc though, as this problem does *not* manifest itself in gcc on Linux, only gcc on Mac. XCode was the easiest way for me to distinguish between the two. :-P

Interesting. Apple has made changes to gcc, so maybe this is one of those changes. I did want to try to scare up a Linux box to try it out on, but I couldn't come up with one.

Just out of curiosity, which version of gcc did you try this on in Linux? On Mac OS X you'd be using 4, but on Linux I guess it might still be 3?

Hmm. I just tried this with gcc 3.3 on Suse and got "-1" printed to the console. That doesn't seem right either :)

Ok. Obviously I didn't have enough coffee before posting that last comment (translation: I am an idiot). Anyway, clearly the signed representation of 0xFFFFFFFF is -1, which is what one would expect to get printed for a "%d" format in printf.

Sorry about (continuing) noise.

@Will -- We're using gcc 3 on both systems, actually -- but my understanding was that it's a reproducible issue with gcc 4 as well.

Ahh... yes... I was right -- Apple decided to pull out the "this is because it's up to us what that means" card instead of doing something sensible. Here's their response to the bug:

Engineering has determined that this issue behaves as intended based on the following information:

It appears you are asking that an undefined operation (convert negative FP -> unsigned int) behave the same on x86 as on PPC. Since the operation is formally Not Defined by the IEEE and/or C standards, the compilers are free to do as they wish. The fact that the PPC and x86 compilers have chosen differently is unfortunate, but the compilers seem to be correct in this case.

We consider this issue closed. Thank you for taking the time to notify us of this issue.

It's unfortunate that every compiler outside of the gcc that comes from Apple manages to behave the same in this very, very simple situation. But such is life! I kind of figured this would be their response, as unfortunate as it may be.

I didn't think you could build an app for a version of Mac OS X later than 10.3.9 without using gcc 4. I'm guessing you are using gcc 4 to build your Intel version and gcc 3 to build your PPC version. That's usually the way universal binaries are built.

In which case, you may, in fact, be testing this with gcc 4, not gcc 3. And this may turn out to be something new in gcc 4 itself (not introduced by Apple), which I suspect is really the case. But I don't have definitive proof of that yet :)

In the interest of science, I downloaded and built gcc 4.0.1 (the same version number I've got on my MacBook Pro) for Suse Linux and tried your code there. Unfortunately (for me), it printed "-1". So, I guess you are right. This really is an Apple change.

@Will -- thanks for the second set of eyes on this! It's always nice to have independent confirmation. :-) Yeah, it appears to be an Apple change, and unfortunately, it appears they're not interested in modifying their code generator to bring the behavior more in line with... every other compiler I've been able to find. :-P

Here is an interesting data point. This prints "0":

double d = -1.0;
unsigned long u = (unsigned long)d;
printf("%d\n", u);

But this prints "-1":

double d = -1.0;
unsigned long long u = (unsigned long long)d;
printf("%d\n", u);


Maybe what we're seeing isn't a change to the code by Apple, but a difference in compiling for a 32-bit architecture (PPC Macs and Linux) versus a 64-bit architecture (Intel Macs).

@Will -- yes, this is a 32-bit conversion bug only. 64-bit conversions don't demonstrate the same behavior. And signed conversions don't either (regardless of size).

Maybe I'm missing something, but the original behavior seems correct to me.

-1 doesn't have a representation as an Uint

0 is the closest Uint value, and 0 is sometimes meant to be the null value.

0 seems like the most desired outcome.

I can't see why you'd expect a -1 double to yield a 0xFFFFFFFF Uint, because that would be a very BIG number and thus completely wrong.

What am I missing???

@Joe -- the reason this conversion is so disconcerting is that the behavior is entirely unexpected given another common conversion: int32 -> uint32. In that case, the signed number's bit pattern is just interpreted as unsigned, so -1 is interpreted as 0xFFFFFFFF. In the floating-point case, the usual way the conversion happens is like this: double->int32->uint32 (at least, you can think of it happening that way logically -- it's a bit different under the hood). So -1 (double) -> -1 (int32) -> 0xFFFFFFFF (uint32) is the generally expected result.

I should also note that technically, this is not a bug but an unfortunately common problem in C/C++ -- the specs define certain operations as being left up to the compiler writer. However, in the most common cases, common conventions win out and most compilers tend towards the same behaviors. In this case, gcc on Mac is the *only* compiler I could find (out of 5 tested) that behaves differently from the rest.

You are correct that -1 to anything positive would seem a bit odd at first blush, but this is also why sign conversions are flagged as warnings. Sometimes they're correct, and sometimes they're wrong. The main complaint here is that they come out *different* depending on compiler and architecture. gcc on Mac compiling for PPC yields 0xFFFFFFFF, as does gcc on Linux for x86, VS 2005, VS 6 and CodeWarrior 8 (for PPC). gcc on Mac for x86 yields 0. That's a pretty drastic change in behavior for a common typecast operation.

It's unfortunate and annoying that Apple pulled out the "by spec, it's undefined" card, but it's also their right to do so.

In my book that kind of operation should fire an exception. There isn't any reasonable value to use for -1 in an UInt.

A compiler shouldn't just silently throw away the sign! If you're not going to flag an exception, I'd still prefer to get 0 instead of a huge positive number.

It all depends on whether you consider this a typecast (give me the bits) or a conversion (give me the value)! Since RB has an explicit typecast mechanism (that isn't used in this case), I'd consider this assignment to be a conversion.

UInt = -1.0 should Convert to 0, or better yet fire an exception
Uint = UInt(-1.0) could reasonably give 0xFFFFFFFF, but even that is a mistake since that is only a portion of the 64 bit float.

C and C++ tend to have a lower level, literal perspective. Do what I say not what I mean. ie move the bits over here regardless of what they mean.

RB tries to take a higher level perspective and help people do what they mean. ie copy the value of those bits from one data type to another.

Hence I think it should be considered a Conversion in RB, and it should either fire an exception, since the value can't be converted accurately, or at least convert to the closest value, which is 0.

@Joe -- except that it *is* perfectly reasonable to assign -1 to a uint, because -1 is 0xFFFFFFFF and there's no way to tell the difference in hardware. ;-) It is a very normal operation to mix signs, and one that has very well-defined behavior when going from int->uint or vice versa. The complaint here is that the behavior isn't well defined when going from float->uint.

As for firing an exception, that's simply never going to be a viable option. It would slow down your application incredibly because each and every assignment (including to temps) would be required to have dozens of assembly instructions to check ranges -- and even then, you'd still be stuck with all sorts of legitimate assignments that throw exceptions. Sorry, but we'll have to agree to disagree on this one. ;-)

There is no need to check ranges for normal assignments, the types would already match.

The issue is that this case isn't simply an assignment, it's a typecast or a conversion.

You want to consider it a typecast, where you simply copy bits (or a reasonable subset of bits) and don't do any range checking.

Apple and I want to consider it as a conversion, where the value should be preserved and IMHO an exception should be fired if the value can't be preserved and data would be lost. Yes that costs cycles, but that's expected for a conversion.

And conversions should be rare in well written code. In fact I'd prefer that RB didn't even allow implicit conversions to help us keep our code clean and free of unintentional truncation errors and conversion anomalies like this one.

In such a system there would be no penalty for range checking and appropriately firing exceptions unless a conversion was explicitly requested. Fewer errors, and more accurate data that way.

@Aaron -- "except that it *is* perfectly reasonable to assign -1 to a uint, because -1 is 0xFFFFFFFF and there's no way to tell the difference in hardware. ;-)"

Aaron, yes we can have different opinions, but this statement is just factually incorrect.

-1 is a concept that simply can't be represented in a Uint, by definition.

-1 is 0xFFFFFFFF ONLY for SOME binary representations, certainly not for an Unsigned representation as we have here (and also not for a Signed int64 and also not for a float)

Some parts of the hardware don't care what the bits represent (eg Load and Store) but other parts of the hardware care a LOT.

Now of course you already know these basic things, which makes it odd that you would have ever made the statement that "it *is* perfectly reasonable to assign -1 to a uint" when you know it to be impossible by definition.

@Joe -- as I said before, we're going to have to agree to disagree on this topic. I see where you're coming from with your statements, I just happen to hold a different opinion on the topic.

That works for me. :-)

Coming from doing some assembly stuff before, IMO, -1 *is* 0xFFFFFFFF for integer representations. 0xFFFFFFFF is also a really big unsigned number, but -1 makes perfect sense. Floats get messy in binary representation, so Float->uint is kinda weird. I, personally, would expect something close to what I'd get from Float->signed int-> uint, so that -1.0 would become -1, which would be 0xFFFFFFFF. However, my perspective isn't coming from compiler standards, more of how I would like my numbers to change around. If I wanted negative numbers to become 0, I would make code to test for that. IMO, anything else destroys data that I might want to make use of.

I can't imagine that float->uint should fire an exception. It is a valid conversion, but should be warned about.

It is interesting that Apple made that change to gcc, for some unknown reasons, as they didn't have to.

I remain unconvinced this is an Apple change. I looked at the source to gcc and I believe the code that causes the "bug" (if it is one) is in real.c. There is a function that handles converting reals to integers and there can be cases where the floating-point representation overflows the amount of storage a "natural" integer can hold. I think that's what is going on here, with the critical difference between PPC and Linux being 64-bit (PPC is 32-bit, as are most legacy Linux systems, I would think). There is definitely code in the vanilla real.c you get from gnu that will return 0 for negative floating-point numbers.

It would be interesting to see what happens with a version of gcc that was built to run on Linux to target a 64-bit machine. That would be something like the situation we are seeing here.

Except in this case, there's no overflow. If the conversion were from double->int64->uint32 (or some variation thereof), the behavior would still be the same, just more of it. -1 as a uint64 is just twice as many F's as -1 as a uint32, after all.

It may not be an Apple change though, because you are right -- I'm using a 32-bit version of Linux, and building 32-bit apps on Linux. So it's entirely possible this is just a 64-bit problem. But I still assert that the behavior is less than useful, even if it is entirely correct.

Also, I should point out that my intentions aren't to build a 64-bit application, and I'm on a 32-bit chipset. So maybe I'm missing something, but I'd expect it to generate code for a 32-bit architecture.

Much ado about nothing... Sounds a lot like sour grapes and Aaron's bitterness towards Apple.


Disclaimer

I'm currently an employee of REAL Software. My blog is mine. The opinions represented in this blog are mine as well and may not represent my employer's opinions. All original material is copyrighted and property of the author.

REALbasic® is a registered trademark of REAL Software, Inc. REAL SQL Server™ and Lingua™ are pending trademarks of REAL Software, Inc. All rights reserved.