Explicit type specifiers are not the answer


This topic keeps coming up in various places under various guises, so I figured it'd be worth talking about. Let me make my stance very clear: REALbasic does not need, and should not have, some form of explicit type specification that is inline with a numeric literal. It's a bad idea for the language, and I hope it never happens. Hopefully there's no misunderstanding my stance now. ;-)

For starters, REALbasic made a mistake when adding all the fancy new numeric datatypes. At the time, it didn't feel like a mistake -- it felt like the right direction to go. Unfortunately, hindsight is 20/20 and I now firmly believe the numeric types are a mistake. The fact is, most users don't use them as they were intended to be used. All too frequently, I see naive mistakes from people with great intentions but an unsound understanding of the types. Really, the only reason for needing these types at all is for communicating with the outside world. Declares, protocols, file formats, etc. all have to worry about silly things like type sizes and how to interpret signs. So I think that things like structures, memory blocks, binary streams and declares should have access to the myriad of datatypes. However, for everyday uses (like locals, parameters, and properties), they're a bad idea. Just use Integer or Double and make your life much, much easier.

But that's just the start of the mess. Obviously, without a ton of varying numeric datatypes, people wouldn't be clamoring for explicit literals. In fact, if it were just Integer and Double, you'd already have the explicitness built in to the very nature of the literals themselves!

The mess catapulted into the minds of the masses because of another well-intentioned idea: the Analyze Project command. If it weren't for Analyze Project hounding you about numeric type mismatches all the time, you would never even think about the implicit conversions because they'd just be a part of business as usual. However, the analysis is simply trying to make a bad situation better -- but people are misunderstanding the feature due to preconceived notions I was fighting against. Analyze Project was never, ever intended to be some list of things for you to fix. In fact, it was quite the opposite. It was a list of things for you to look at, and then determine whether something needs fixing or not. There are always going to be false positives, and it was only going to become "worse" in that the list was meant to grow and grow with each release. Unfortunately, too many people seemed to think of it as a warning system where you need to get the warnings down to zero. In retrospect, this is a perfectly natural thing to want to do, and the feature's design lent itself to thinking about it in the wrong way. That's not to say the feature doesn't provide you with useful information -- it most certainly does! It's that the way in which it presents the information isn't ideal. It really plagues the people who want lists of warnings to get to zero. And so we see all sorts of problems crop up, such as people littering their code with CType in hopes of getting the list to zero.

Explicit types do not solve this problem -- they only put a band-aid over the previous band-aid to the previous "problem" (which wasn't even a problem in the first place!). Here's the lineage: RB introduces new datatypes to combat a perceived issue with communicating to the outside world. People misuse them in ways that aren't safe. So we add a way to remind users "hey, this may or may not be what you intended." Then people have issues getting rid of those messages when it is what they intended. So then we add CType as a way to get rid of the message. You can see where this is going, right? Explicit type specifiers are just a way to get rid of having to use CType for many operations -- it's yet another band-aid in a litany of good intentions.

So what is the solution? Obviously, we can't simply get rid of all the different numeric datatypes. That's going to break code, which isn't an option. My suggestion is to simply get rid of the Analyze Project message for numeric conversions. Let's face it -- no one currently at RS is going to push the Analyze Project system forward the way it was intended to go. So it's highly unlikely the intended user experience will ever be achieved. It's also highly unlikely that the full utility will ever be reached, unfortunately. Because there's probably not going to be a way to say "ignore this one issue, because I know the code is fine" any time soon, it's also obvious that the numeric conversion issue is going to continue to be a major annoyance. So get rid of the issue. Yes, it does mean you will lose some useful information. However, I also think it's better than annoying the majority of customers for the foreseeable future. I also think it's much better than adding yet another patch to numeric datatypes -- REALbasic is not C, and does not need this concept.

14 Comments

I agree that the introduction of the various new int types a while ago was badly done.

In fact, I am sure I had warned of the consequences early on because I had worked on improving a Modula-2 compiler in the mid-80s, facing the very same issues. I also took part in the ISO standards committee for M-2, discussing better solutions with a lot of bright people.

But I disagree with the option you offer, Aaron. It's not the warnings that are wrong, but a part of the compiler's handling of literals. We've already had a chat on this and I noticed that you were not comfortable with my suggestion. I've since talked with a few other experienced programmers on this topic and, at least the way I described it (which is, of course, not entirely opinion-free), all of them agreed with me that the solution to this is what even C does: convert each literal to the type of its neighbor in the expression, just as if the user had done that manually to get rid of the current warnings.
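For readers who want to see the C behaviour being referred to, here is a minimal sketch. The function name is made up for illustration. Strictly speaking, C's rule (the "usual arithmetic conversions") is not specific to literals: whichever operand has the narrower type is converted to the wider one before the operation. For a literal next to a 64-bit variable, the effect is what the comment describes.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   The literal 100000000 has type int, but because the other operand is
   int64_t, C converts the literal to int64_t before adding, so the sum
   is computed in 64 bits and no explicit cast is needed. */
int64_t add_literal_to_int64(int64_t a)
{
    return a + 100000000;
}
```

Calling `add_literal_to_int64(2100000000)` yields 2200000000, a value that does not fit in a signed 32-bit integer.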

To entertain other readers here, let me state again that suggestion, together with others, making it a fairly complete solution:

1. The first step is that the compiler, if it encounters a sub-expression such as "a + b", makes sure that both operands have the same type. If they have different types, and if one of them is a literal, its type is converted to the other's type. If they have different types and neither of them is a literal, see 2. So far, this is what even C++ does, and thus it's a fairly common solution.

2. Now to the part that C doesn't do but what would help many casual RB programmers, and which was also the solution discussed in the Modula-2 committee. There are actually two parts to it:

a) Where possible, do all calculations on the widest base available. That means for RB: do all int math on 64 bit. With the modern CPUs this shouldn't be a problem any more (at least not on PPC - I don't know the use of Intel's register model by RB enough to know if this can be done without much disadvantage there as well - Aaron, can you elaborate?)

b) Add overflow and bounds checking, causing exceptions where values get out of the valid range. This option should be enabled by default because most casual (RB) programmers don't even think about bit representations and their ranges (as Aaron mentions above himself), and just want a number to be computed the way their everyday algebraic understanding lets them expect.
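Parts a) and b) together can be sketched in a few lines of C. This is only an illustration of the idea, not how any RB compiler works: the function name is hypothetical, and since C has no exceptions, the sketch reports a range error through its return value where the proposal would raise one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Part a): do the math on the widest base (64 bits).
   Part b): range-check before narrowing back to the 32-bit destination.
   Returns false where the proposal would raise an exception. */
bool add_int32_checked(int32_t a, int32_t b, int32_t *result)
{
    /* Two 32-bit values can never overflow a 64-bit sum. */
    int64_t wide = (int64_t)a + (int64_t)b;

    if (wide < INT32_MIN || wide > INT32_MAX) {
        return false;  /* out of range for the destination type */
    }
    *result = (int32_t)wide;
    return true;
}
```

With this helper, adding 2100000000 and 100000000 is reported as a failure instead of silently producing a wrapped value, while sums that fit are stored in `*result` unchanged.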

The special, uncommon case is where someone actually wants to deal with binary results, anticipating overflows that turn positive values into negative ones or cause wrap-arounds. Those could be handled by special pragmas (which is the ugly C solution) or, by what Modula-2 introduced, special functions that spell out "this is tricky binary stuff". In M-2, when doing this, one actually had to use one common keyword (by using the "SYSTEM" module), which would quickly indicate to anyone where code was system-dependent. This means: instead of, like now, having to handle any normally anticipated conversion with a "CType" invocation, the opposite should be the case: one has to use CType or something similar only where one wants to override the "smart" things the compiler does for the casual programmer.
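As a rough illustration of the "spell it out" idea, here is a C sketch in which wrap-around is something the caller asks for by name. The function name is invented for this example; it is not a Modula-2 or RB facility.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   The name tells the reader that wrap-around is intended here.
   Unsigned arithmetic in C is defined to wrap, and the conversion of the
   result to uint32_t reduces it modulo 2^32. */
uint32_t wrapping_add_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b);
}
```

Anyone scanning the code for `wrapping_` calls can find the places that depend on binary representation, much like searching for uses of the "SYSTEM" module.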

I know that this works, not because I read it in a magazine or heard it from someone, but because I spent years not only implementing but using it! The Modula-2 compiler was, at that time, a very popular development system, with IDE, debugger and all, for the Atari ST, and I earned quite some money with it back then. We were quite a small team, similar to what RB was in its early years, and when I saw RB mature, I OFTEN saw all the mistakes we had already dealt with 15 years earlier and knew how to handle better - yet RB would make the same mistakes again, despite my attempts to prevent that.

This became a bit huge for a comment. I'll see that I get this text onto my website (although there is no way to leave comments on mine, unfortunately).

One last question to you, Aaron: I and others have frequently asked for overflow & range checks with exception raising, but there was never any comment from RS on why this was never done. Can you explain why the compiler never got this?

"simply get rid of the analyze project message for numeric conversions"

Yup. Amen, brother.

@Thomas -- you're still attacking this from the viewpoint that multiple numeric datatypes are needed for non-external purposes, which is why I disagree with you. Yes, if we were assuming that RB should be like C, or god forbid, Modula-2, then it would make sense to have communicable types. However, that's not what RB should have been designed to do -- it shouldn't have been in this mess in the first place.

Your idea didn't scare me from the standpoint of "is it reasonable or not." Yes, it's a reasonable approach to consider. Your idea scared me from the standpoint of "is it possible to do with the resources at hand" because it's an architecturally hard problem to solve with the current code base. Also, it scared me because I hadn't taken the time to really think through all of the ramifications of such behavior (not only technical, but human ramifications -- will the target audience understand the concept, or will they trip over it).

As for the overflow and range checking question, I did address that in another blog posting. But the long and short of it is two-fold: 1) It'll break existing code, 2) it's prohibitively expensive to do at runtime.

1) It makes no sense to implement it as a pragma that you turn on because you've already solved the problem by just thinking "hey, I'll turn this pragma on." The issue isn't the overflow or division, the issue is that the programmer hasn't thought about them being a problem in the first place. If you make it the default behavior to throw exceptions, then you'd have the potential to break every existing project out there.

2) It would cause almost all mathematical operations to explode in size and suffer in efficiency (we're talking anywhere from 3-10x the size, and probably close to a 500% decrease in speed), assuming it was not an operation that could be caught by the compiler (such as via constant folding). What's more, since it would have to be applied in a shotgun fashion to a significant number of operations, you are penalizing people who actually validate input on behalf of everyone who decides not to.

One of the reasons I really liked Modula-2 is that you could know "why?". The documentation and notes from Wirth and the committees involved explicitly stated what the compiler was expected to do and why the decision was made to do it that way.

Other languages had (have) so many loose ends -- things that are left to the implementation. Modula spelled it all out for you.

I used it daily from the time I discovered it in about 1983 until finding RB at version 3.0.

Sadly, having read those notes and shared the views of the authors, I often look at alternate implementations as "wrong" rather than just "different".

That said, Real does a poor job explaining that Double and Integer are first class data types, and that Single and the other Integers are second class citizens. I program this way from years of habit. But new users will see UInt8 and want to use it (because they did in C).

I do find myself increasingly needing Int64 to be a first class data type. (It may be better than the other Ints; I've just approached it cautiously by habit.)

@Kirkgray -- I'm not certain where you got the idea that Single and non-32 bit integers aren't first class datatypes. They certainly are first class from the compiler's perspective! That being said, there's really no need for anything aside from Integer (because it's the native word size), Double (because of its precision) and 64-bit integers (because of their range). The remainder have uses, to be sure... but they're generally only useful in a limited capacity.

Second class citizens in the sense that you can't count on working strictly within that data type.

Specifically, things like Ceil, Floor, Round, Val, CDbl, etc. are defined only on Double. Try using Single or Currency and your data is converted to Double before the operation and your results converted back, resulting in a performance hit and possible unexpected data loss. You can't even do simple addition on a Single without a conversion to and then from Double.

The same is true with the Ints. Data type conversions are going to happen when you'd not expect -- as examples on the lists have shown.

Integer and Double are the only two numericals you can be (relatively) certain won't get converted on you -- first class citizens. The others will get converted at some point -- second class citizens.

That's why I only work in Doubles and Integers (and Bob Delaney's Decimal).

When I do have to use Int64s, I make sure to never use constants or literals. I avoid constants and literals because it's too easy to make a mistake:

dim bigInt as Int64

bigInt = 2100000000 + 100000000

if bigInt > 10 then
  beep
end

Most people are busy putting CType into their code for stuff like this.
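To make the pitfall concrete, here is a C sketch of the arithmetic involved. It assumes the two literals are added as 32-bit signed Integers with wrap-around before the result is widened to Int64; the comment above implies this but does not state it, and the function names are made up for the illustration.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Adds two values as if they were 32-bit signed integers with
   two's-complement wrap-around, then widens the result to 64 bits. */
int64_t sum_as_wrapped_int32(int32_t a, int32_t b)
{
    /* Unsigned addition wraps modulo 2^32, which is well defined in C. */
    uint32_t bits = (uint32_t)a + (uint32_t)b;

    /* Reinterpret the 32-bit pattern as a signed value by hand, so the
       sketch does not rely on implementation-defined conversions. */
    return bits >= 0x80000000u ? (int64_t)bits - 4294967296LL
                               : (int64_t)bits;
}

/* The same sum carried out in 64 bits from the start. */
int64_t sum_as_int64(int64_t a, int64_t b)
{
    return a + b;
}
```

Under that assumption, `sum_as_wrapped_int32(2100000000, 100000000)` gives -2094967296, so a test like `bigInt > 10` would fail, whereas `sum_as_int64` gives the expected 2200000000.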

You can't do something like this in RB:

const a as Int64 = 2100000000

So my code for the above would be:

dim bigInt, a, b, ten as Int64

a = 2100000000
b = 100000000
ten = 10

bigInt = a + b

if bigInt > ten then
  beep
end

It's that or litter my code with CType. So I would have to echo the call for a way to define types of constants and literals in code.

And while we're on it.

a = Int64(2100000000)

Would be infinitely more readable than

a = CType(2100000000, Int64)

@Kirkgray -- You can't have explicit types for constants in r4, but I did add that to r5, and I doubt anyone's going to go remove it. :-P As for your alternate CType syntax, I humbly point you to a previous blog posting which discusses the difference between type conversion and type casting.

http://ramblings.aaronballman.com/2008/08/type_conversion_vs_casting.html

Good to know about the explicit types for constants in r5. What's the syntax? I've tried a few things in r5b2 and keep getting a syntax error. And I couldn't find it in a quick search of the release notes (maybe nobody knows it's in there :o). Still, something for literals would be useful. ;-)

Thanks for the conversion vs. casting link. Somehow I missed that post. It is helpful (as always). Thanks.

Also, thanks for all you've done for RealBasic and the community both at Real and on your blog. I know I usually only speak up when I see things differently than the powers that be, so it may appear that I'm less than pleased with RB. Actually, I wouldn't be using it if it weren't the best solution out there for what I'm doing. So Thanks. And best of luck in whatever is next for you.

@Kirkgray -- Oops, I retract my statement! I worked on it, implemented it, made test cases for it, but didn't actually turn it on in a public build. So it's something that someone would have to go turn on, if RS wanted it. I think it's a good idea because it complements the Dim syntax nicely. Of course, as I had mentioned in another blog posting about the feature (when I was still musing about it), there's a whitelist of types. You can't make a constant Window, for instance.

As for your kind words, thank you! And you're welcome. :-) It's been my pleasure to be of assistance, and if you ever need anything, you can always shoot me off an email. :-)

Just in case someone else has followed my thoughts above -- After Aaron's comments I've updated my proposal here:

http://www.tempel.org/RBCompilerImprovements

It doesn't address all of Aaron's responses, though. E.g., I am confident that the extra overflow and range checking code would not make the code much slower. Also, I see the checks as a debugging aid, which may be disabled in the released app if a user really thinks that's better. Finally, I don't think it'll break much code, and I've explained a little of that on the above page. Both the speed impact and the amount of broken code will only be seen once this is tested with actual RB users. I, for instance, am sure that hardly any of my code will be affected because I learned never to rely on side effects from overflows and such.

The problem with the numerous warnings about possible conversion issues is that a developer so easily gets into the habit of simply ignoring them all, and among the masses of warnings they are ignoring they may miss something quite important that they would like to know about.

The simple solution is two types of Analyze Project; full or ignoring conversion issues. Let the developer decide what they want to see and what they want to ignore.

@XVB -- you can already do that today by simply filtering out the issues report list. Once the list comes up (for any given project), there's a "filter" toolbar button which allows you to set which issues to display. You can go in there and turn off whatever issues you're not interested in seeing.
