Beware: mixing signs breeds dragons


This is a blog posting about the current behavior of the REALbasic compiler. This information isn't documented anywhere else because it's liable to change at a whim. As such, you should take this information with a grain of salt. Don't assume this will always be the case.

I've seen a few bug reports about this particular issue, and so I feel it'd be a good idea to warn people: do not mix signed and unsigned integer operations without careful thought. REALbasic doesn't have a warning system (for good or for bad, that's the way it is currently) and so the compiler will just try to be as helpful as it can. However, the compiler isn't a mind reader, and sometimes it is wrong (though it is always consistent). Whenever the compiler has to pick a common datatype between two choices, it prefers signed over unsigned datatypes. This means that if you compare an Int32 and a UInt32, it does a signed integer comparison.


if someBigUInt32 > 0 then
MsgBox "Uh oh!"
end if

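If someBigUInt32's value is 2^31 or greater, then that comparison will fail even though it seems like it shouldn't. The reason is that the constant 0 is considered to be a signed 32-bit integer, and so the common type is Int32. Read as an Int32, any value of 2^31 or more has its top bit set and so looks like a negative number, which is not greater than 0.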
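To make the failure concrete, here is a minimal sketch in C++, which has the same fixed-width integer types. It is a model of the behavior described above, not REALbasic compiler code: the function names are hypothetical, and the bit reinterpretation stands in for the compiler choosing Int32 as the common type.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: models "the common type is Int32" by reading the
// UInt32 operand's bits as a signed 32-bit value before comparing with 0.
bool signedGreaterThanZero(std::uint32_t value) {
    std::int32_t asSigned;
    std::memcpy(&asSigned, &value, sizeof asSigned);  // same bits, signed view
    return asSigned > 0;
}

// Expected results under this model, written as assertions.
void checkSignedModel() {
    assert(signedGreaterThanZero(1u));
    assert(signedGreaterThanZero(0x7FFFFFFFu));   // 2^31 - 1 still works
    assert(!signedGreaterThanZero(0x80000000u));  // 2^31 reads as -2147483648
    assert(!signedGreaterThanZero(0xFFFFFFFFu));  // 2^32 - 1 reads as -1
}
```

Under this model, everything up to 2^31 - 1 compares as expected, and every value from 2^31 upward is reported as not greater than 0.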

The solution to this issue isn't to prefer unsigned integers, as it might seem at first blush. That's just robbing Peter to pay Paul -- there will still be information loss. For instance, if that were the case, then the following code would be just as problematic:


dim i as Integer = -1
dim j as UInt32
if i < j then
MsgBox "Uh oh"
end if

It's a bit harder to get into the situation because you'd have to be explicit about j being a UInt32, but the problem still exists.
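The hypothetical "prefer unsigned" rule can be sketched the same way. This is a C++ model under that assumption, not a description of anything REALbasic does today; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: models a compiler that picks UInt32 as the common type.
// Converting a negative Int32 to UInt32 wraps modulo 2^32.
bool unsignedLessThan(std::int32_t i, std::uint32_t j) {
    return static_cast<std::uint32_t>(i) < j;
}

// Expected results under this model, written as assertions.
void checkUnsignedModel() {
    assert(static_cast<std::uint32_t>(-1) == 4294967295u);  // -1 wraps to 2^32 - 1
    assert(!unsignedLessThan(-1, 0u));  // -1 < 0 is true mathematically, yet reported false
    assert(unsignedLessThan(1, 2u));    // non-negative values still compare correctly
}
```

The information loss has simply moved: instead of large unsigned values going wrong, negative signed values do.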

So, when you're working with signed and unsigned integers in the same code, you need to be aware of the fact that mixing and matching is a dangerous operation. Pay close attention to what you're doing, and try to be explicit about datatypes whenever possible.

18 Comments

It certainly seems like the compiler ought to be smarter about type inference in such situations.

Ought to be smarter is an easy statement to make. It ought to do everything the programmer is thinking. ;-)

The real question is: how? Whenever you have signed vs unsigned, one "side" is going to lose information since you have to pick one type of operation or another.

In this case, I'd suggest that "smarter" implies that in any comparison, assignment or expression involving only one type of integer variable and a literal integer constant, the constant should be cast to the same type as the variable.

In general, you can always wish the compiler were smarter. But for unsigned vs literal comparisons, I'd think it's a no-brainer.

if( myUint > -1 ) then
...

This code is a tautology (well, it at least SHOULD be), so the compiler should be able to optimize the whole if/elseif/else out of there and only compile what code exists within this particular branch.

Why is it that the compiler can't perform the extra step of checking for this scenario? Certainly there will still be issues (if myGinormousUint > someNegativeInt), but at least the non-obvious issue you described can be alleviated. Or maybe the only reason this can't be done is because no bug report exists? ;)

@Steve -- but that doesn't really solve the issue (and plus, doing that is *damned* hard). Then you still run into the problem when you do:

dim i as Integer = 0
if someBigUInt32 > i then
MsgBox "Uh oh!"
end if

So instead of being consistent and saying "the compiler always does this", you now have to say "the compiler sometimes does this, except when that" and it still leaves the original issue intact.

So for this particular case, why not have the compiler cast both to Int64s? Of course, that doesn't solve the problem for UInt64 vs. Int64 comparisons, but at least you can say that a 32-bit comparison is always guaranteed to be safe and correct.
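The widening idea in this comment can be sketched in C++ as follows. It is only an illustration of the suggestion, with hypothetical function names; nothing here is taken from the REALbasic compiler.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: widen both 32-bit operands to a signed 64-bit type,
// which can represent every Int32 and every UInt32 value exactly.
bool widenedGreaterThan(std::uint32_t lhs, std::int32_t rhs) {
    return static_cast<std::int64_t>(lhs) > static_cast<std::int64_t>(rhs);
}

// Expected results, written as assertions.
void checkWidenedModel() {
    assert(widenedGreaterThan(0xFFFFFFFFu, 0));  // large unsigned value vs. 0
    assert(widenedGreaterThan(0u, -1));          // 0 vs. a negative value
    assert(!widenedGreaterThan(5u, 7));
}
```

Both of the problem cases from the post come out right here, at the cost of doing the comparison in 64 bits.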

Interesting.
A quick test suggests this is consistent with C in this regard :)

long i = -1;
unsigned long j = 0;

if (i < j) std::cout << "i < j";
else std::cout << "j < i";

The results might surprise some, as in Xcode this says j < i.

Let me be clear -- there are *some* cases which the compiler can detect, to be sure. The trouble is that the compiler cannot catch all cases because this is generally a *runtime* issue. So the best the compiler can do is warn you that you're doing something wrong. It can only rarely catch actually erroneous cases.

@Aaron, I appreciate the warning in your post and I agree that in the case of variables, once it's been pointed out, it only requires the coder to be careful.
In the case of constants, RB seems to unaccountably fail to allow proper typing or casting of numeric constants.
And, to develop the point even further off-topic, IMHO RB tries to be too helpful generally in implicitly converting between numeric data types, and *way* too often some internal process seems to use doubles as the line of least resistance.
Sadly, I'm sure most of those ships have long-since sailed.

@Adam Ernst -- because then you're trading off for efficiency. 64-bit comparisons are significantly more costly than 32-bit comparisons (on a 32-bit machine). So yes, your code would be less prone to errors, but it'd also make code slower. What's worse, it hides a mistake with your code that's easily repeatable with 64-bit integers.
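For the 64-bit case, where there is no wider type to fall back on, one way to be explicit in your own code is to check the sign first. The following C++ sketch assumes that approach; `safeLessThan` is a hypothetical name, and this is not something the REALbasic compiler does for you.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: compares a signed and an unsigned 64-bit value
// without widening, by handling the negative case before converting.
bool safeLessThan(std::int64_t s, std::uint64_t u) {
    if (s < 0) {
        return true;  // every negative value is below every unsigned value
    }
    return static_cast<std::uint64_t>(s) < u;  // s is non-negative, so this is exact
}

// Expected results, written as assertions.
void checkSafeLessThan() {
    assert(safeLessThan(-1, 0u));
    assert(safeLessThan(INT64_MAX, UINT64_MAX));
    assert(!safeLessThan(5, 5u));
}
```

The extra branch is the price of correctness here, which is the same efficiency trade-off in a different form.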

Another thing to keep in mind is that this is just one example of an entire class of programmer mistakes. Here's another *very* common one:

if someFloat = someOtherFloat then ...

Any time you do comparisons, you have to recognize the problems that can await you. The compiler can only help you so much.

The real bummer here is that REALbasic's compiler doesn't *warn* you about these issues. Trying to make the compiler smarter is a good goal, but it's simply never going to catch all the typing issues. Some form of code analysis would be a much better gain than trying to make the compiler smarter for a handful of contrived cases (I think).

@Steve -- yeah, the fact that you cannot explicitly type class- and module-level constants is a problem. It stems from the fact that REALbasic used to have only a very few, distinct types. So there was no need to have the extra hassle of making the user pick a specific type. But now, I think it's a *very* reasonable feature request to allow the user to manually pick the constant datatype in a more formal manner. REALbasic can certainly make some guesses for you like it already does, but there's no technical reason you can't make an Int8 constant, or Currency.

To be fair, it's not as bad as I thought. I seem to have wrongly remembered a compile error when casting to integer types. I now see this is legal:
Dim someBigUInt32 As UInt32 = &hffffffff
if someBigUInt32 > UInt32( 0 ) then

Sorry, that anonymous was me - don't know what happened

Any clue how far back this goes? Just so I know about sticky situations. Just back to the introduction of the UInt, probably.

I'm not sure, but is it possible to have a compiler warning about that so that you are aware? Maybe have some more info in the warning explaining this stuff, and change the warning when/if the behavior changes?

@jdiwnab -- this explicit problem goes as far back as the introduction of unsigned integers. However, the general problem goes back to RB 1.0, which is that mixing types can be dangerous. It's just as dangerous to compare an integer and a floating-point number because there is data loss there as well.

And yes, if REALbasic had a warning system, then it would certainly be possible to add warnings about this sort of thing. However, REALbasic currently only reports errors, and not warnings, which is why you have to be so careful. This sort of case can silently cause problems for you.

"Numerical Recipes in C" should be required reading so folks get why

if someFloat = someOtherFloat then ...

is simply asking for trouble. Especially after doing calculations and then doing an exact comparison like this.
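A small C++ sketch of why exact floating-point comparison after a calculation goes wrong, and one common alternative. The helper name `nearlyEqual` and the default tolerance of 1e-9 are arbitrary choices made for this illustration; the right tolerance depends on the calculation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: treats two doubles as equal when they differ by at
// most a relative tolerance. The default tolerance is an assumption.
bool nearlyEqual(double a, double b, double relTol = 1e-9) {
    double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= relTol * scale;
}

// Expected results, written as assertions.
void checkNearlyEqual() {
    double sum = 0.1 + 0.2;
    assert(sum != 0.3);             // exact comparison fails after the addition
    assert(nearlyEqual(sum, 0.3));  // tolerance-based comparison succeeds
    assert(!nearlyEqual(1.0, 1.001));
}
```

A purely relative tolerance like this one behaves poorly when both values are very close to zero, so real code often combines it with an absolute tolerance.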

@Aaron

Hey Aaron, you're a fucking prick.

@Dave -- thanks for your feedback; always nice to receive constructive criticism.


Disclaimer

I'm currently an employee of REAL Software. My blog is mine. The opinions represented in this blog are mine as well and may not represent my employer's opinions. All original material is copyrighted and property of the author.

REALbasic® is a registered trademark of REAL Software, Inc. REAL SQL Server™ and Lingua™ are pending trademarks of REAL Software, Inc. All rights reserved.