It's recently come up on the NUG that there is some confusion over division by zero, and what it should do. So I figured I'd take a stab at explaining the technical side of things.
The question is what to do when the user writes code that divides by zero (whether by floating-point division, integer division, or modulo operations). Let me start by describing what REALbasic already does.
If you are dividing two values and expect a floating-point result, the result is a sentinel value that represents positive or negative infinity, with the sign determined by the signs of the operands (0/0 produces a different sentinel, NaN, "not a number"). These sentinel values propagate the infinite result through subsequent calculations (so infinity plus one is still infinity). If you are instead expecting an integer result, the return value is undefined, because integers have no sentinel value to use and so cannot propagate the result.
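A minimal C sketch of the floating-point side, assuming IEEE 754 semantics (which both x86 and PowerPC hardware provide). The helper names fdiv and classify are hypothetical; the point is that the sentinels come for free and an ordinary if-check can detect them afterward. The integer case is described in a comment only, because there is nothing to show: an int has no spare bit pattern for Inf or NaN.

```c
#include <assert.h>
#include <math.h>

/* Floating-point division: a nonzero value divided by zero yields a signed
   infinity, and 0/0 yields NaN. No check or trap is involved. */
double fdiv(double dividend, double divisor) {
    return dividend / divisor;
}

/* Inf and NaN are ordinary values that can be tested after the fact. */
const char *classify(double x) {
    if (isnan(x)) return "NaN";
    if (isinf(x)) return x > 0 ? "+Inf" : "-Inf";
    return "finite";
}

/* There is no integer counterpart. In C an int division by zero is undefined
   behavior; at the hardware level x86's idiv raises a divide-error fault,
   while PowerPC's divw leaves an undefined value in the destination. */
```

For example, classify(fdiv(1.0, 0.0) + 1.0) reports "+Inf": the infinity survives the addition, which is the propagation described above.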
This behavior is not buggy, nor is it inconsistent, as some have stated. The behavior is very much by design (which we will get to). As for it being inconsistent, I have a hard time with that phrase. You are dealing with two entirely different datatypes. The fact that they both happen to represent some numeric values is immaterial. No one would claim that the behavioral differences between numbers stored in strings and numbers stored in Int32 are bugs. Floating-point value storage is *different* from integer value storage (heck, they use entirely different processors!), so claiming inconsistencies between them in this case seems like a stretch to me.
The design of this behavior was decided based on many factors, most of which I can only guess at after the fact, since it was decided close to 15 years ago.
The first question was, "how is this different on PPC compared to x86?", and the answer may be surprising. The difference between the two platforms is a crash -- for integer division by zero, PPC does the division and happily stores the result, while x86 throws a hardware exception which, when unhandled, crashes your application. So, obviously we need to do something to bring those behaviors more in line with one another.
The next question was, "what's the default behavior of the processor?" The default behavior of the FPU is to store +/-Inf and happily move on. The default behavior of the CPU is to crash or store undefined values, depending on the architecture. Given the choice between the two, we picked "undefined values", of course.
The third question was: "what's the cost vs. benefit of altering the default behavior?" This is the one where everyone assumes it's better to raise an exception. The benefit is that you now have a consistent response to division by zero, regardless of the operand's storage. The cost is that the code generated for every division operator becomes larger (because it has to be able to throw the exception, or jump over the code that would throw it). The cost is also that all division operations must now be wrapped in try/catch blocks in order to handle errors properly, which adds further to code size, since an exception handler takes up more space than an if/then block. The main difference between the two "correct behavior" branches essentially boils down to this:
// Way one
if divisor <> 0 then
  quotient = dividend / divisor
else
  // Handle error
end if
// Way two
try
  quotient = dividend / divisor
catch err as DivideByZeroException
  // Handle error
end try
The two pieces of code are basically the same in terms of readability. The difference is that the first way is significantly smaller in size compared to the second way because of the exception system. So it was decided there would not be exceptions.
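C has no try/catch, so it can only express "Way one". A minimal sketch, assuming 32-bit int, of an integer division helper that validates the divisor up front and reports failure through its return value; the name checked_idiv is hypothetical. In C the guard must also reject INT_MIN / -1, whose true quotient does not fit in an int and which traps on x86 just like division by zero.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* "Way one": validate before dividing. Returns false, leaving *quotient
   untouched, when the division cannot produce a valid int. */
bool checked_idiv(int dividend, int divisor, int *quotient) {
    if (divisor == 0) {
        return false;              /* division by zero: caller handles it */
    }
    if (dividend == INT_MIN && divisor == -1) {
        return false;              /* true quotient is INT_MAX + 1: overflow */
    }
    *quotient = dividend / divisor; /* C truncates toward zero */
    return true;
}
```

A caller writes if (checked_idiv(a, b, &q)) { ... } else { /* handle error */ }, so the whole cost of the validation is a couple of compares and branches.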
Ok, fine, "so it was decided." So what? Why not change it? There's a very sound technical reason which harks back to my previous blog posting about personal philosophies: we can't. If we started throwing exceptions during division operations, that would break every application using division if the user upgraded. What's worse, it would silently break code because no one ever had to code for that exception before. And, it would reduce execution speed. There are a lot of very good reasons for not changing the behavior, which is why the behavior won't be changed. Btw, you can apply this same reasoning to the other discussions that have cropped up similarly: overflow/underflow, NaN, mismatched comparisons, etc.
The discussion of which is better between the current behavior and throwing exceptions is an interesting academic one. My personal stance is that exceptions are for exceptional situations and that you should just do your own input validation. But that's neither here nor there in this case -- the fact remains that the behavior cannot change without risking way more than it's worth.
Division by zero should always be checked for in code... But like many other things (such as checking for NIL objects), we sometimes forget to do it...
For NIL objects the compiler helps us out in finding those coding errors. But in calculations you can chug merrily along and, if you convert to integer along the way, get wrong results that can easily be missed in testing.
For some types of apps (those that do long complicated calculations) that can be a HUGE problem... That's why several of us were asking for a compiler switch to toggle divide by 0 exceptions on and off. Thinking about that ideally we would have the same for overflows and underflows.
BTW I learned how to program on mainframes and in languages where divide by 0, overflows and underflows caused programs to crash outright... So those bugs mostly got caught during testing.
I find it hard to believe that such switches could not, in principle, be added safely with little performance penalty (at least when turned off, and when turned off they would not break any existing code)... but I suspect what you are really saying is that the amount of work to do that would be prohibitive for what you see as the gain.
- Karen
@Karen -- no, what I'm really saying is what I actually said. ;-) Short of coming up with different operators with the new behavior, there is no way to modify the behavior of division operators without running the risk of breaking a lot of code. You can't just make a new pragma and ignore the fact that you can still break code. You might turn the pragma on in a method, and fail to properly handle the exception, thereby propagating the failure to other libraries which wouldn't expect it (but would otherwise be able to easily handle the floating-point Inf propagation, since that's automatic).
As for the performance penalty, it would only take a very quick look at disassembled source code to see how much overhead exceptions take. I'll do some napkin math for you: a floating point division is about 4 bytes, and that's the baseline we're going to compare against. Also, we're going to ignore the instructions to load and store the results, since those are going to be common to either implementation. To add the check, intrinsic call, and jmp, we would add 13 bytes for the checks and jumps, and another 8 bytes for the intrinsic call (in the best case, and with optimizing the jumps), for a total of 21 bytes.
Now, if we also assume that the user is writing "correct" code which handles errors, then in the current case, the user writes an if statement, which takes up about 12 bytes in the optimized case (about 20 in the unoptimized case). In the exception case, the try/catch block takes up 38 bytes in the optimized case.
Tallying things up on the back of our napkin, the current case takes up 16 bytes, and the exception case takes up 63 bytes. So that's roughly four times larger code to accomplish exactly the same thing. This, of course, doesn't take into account the fact that calling an intrinsic is a heck of a lot more expensive than a simple branch just in terms of execution time (so it's not just a disk space and memory issue, it's also a performance issue).
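Summing every component from the two preceding paragraphs term by term (the division itself, the added check/jump and intrinsic call, then the error-handling construct; all sizes are the approximate optimized figures quoted there):

$$
\begin{aligned}
\text{if-check:}\quad 4 + 12 &= 16 \text{ bytes}\\
\text{exception:}\quad 4 + (13 + 8) + 38 &= 63 \text{ bytes}\\
63 / 16 &\approx 3.9
\end{aligned}
$$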
I've been pondering the pragma thought a bit more, and wanted to get it out on paper, so to speak. I think I was a bit harsh on that idea without actually explaining myself (sorry!).
Pragmas are used to change the way the compiler generates code, so you can modify the compiler's behavior in some special cases. The idea of having a pragma for asking the compiler to generate divide by zero errors is a reasonable one. However, I'm against the idea for a few reasons. The first reason is a bit silly: we don't have any pragmas that turn ON extra checks, only turn OFF existing ones. This would be the first. That's not a terribly good reason to be against the idea, but it segues into my second problem nicely. Anyone who is consciously adding a pragma to turn *on* divide by zero exceptions is already actively thinking about the problem of division by zero. That pretty much defeats the purpose of the exception then -- they're expecting problems. It's just as easy, and more efficient (in terms of machine code), to simply validate your input at that point. Heck, you can even raise your own exception in that case, just as easily. ;-) The person who needs the exception to be on by default is the person who's *not* actively thinking about division by zero, and they want it as a not-so-gentle nudge. However, as I've already pointed out, that's simply impossible to do without turning back the hands of time (or coming up with other operators for division). So I don't feel the pragma is really an appropriate solution to the problem.
Actually, in this case it would make sense (and I would actually prefer it) to have a preference where you can turn checking on globally for the app for divide by 0 and overflow/underflow, and turn it off locally for speed with the PRAGMA. If the global switch is off, then that PRAGMA is a no-op.
The switch would be off by default for backwards compatibility of course.
- Karen
Dividing by zero is not an exceptional condition. The operation is well understood and the result is unambiguous. A positive number divided by zero equals INF, and a negative number divided by zero equals -INF. An exception would only be appropriate if there were no unambiguous result, as is the case with a nil object reference or an array-bounds exception.
The result of a conversion from float to integer when the value is INF or -INF is less clear. The behavior I intended was to convert INF to the largest positive value and -INF to the largest negative value; if the compiled code does something different, that is either a bug in the compiler or a place where RS has changed the definition of the language since I left. This is, however, a distinctly arguable choice, and I can see a good case for the idea that converting INF or NaN to integer ought not to be allowed.
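A minimal C sketch of the conversion Mars describes as intended, assuming a 32-bit target type: +Inf (and anything above the range) clamps to the largest value, -Inf (and anything below) clamps to the smallest. The helper name saturating_to_int32 is hypothetical, and mapping NaN to 0 is an assumption added here, since the comment does not say what NaN should become. The range checks must come before the cast, because in C converting an out-of-range double to an integer type is itself undefined behavior.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Saturating double -> int32_t conversion (hypothetical helper). */
int32_t saturating_to_int32(double x) {
    if (isnan(x)) {
        return 0;                  /* assumption: NaN has no sensible integer value */
    }
    if (x >= 2147483647.0) {
        return INT32_MAX;          /* +Inf and every too-large value */
    }
    if (x <= -2147483648.0) {
        return INT32_MIN;          /* -Inf and every too-small value */
    }
    return (int32_t)x;             /* in range: truncates toward zero */
}
```

Under this sketch, saturating_to_int32(1.0 / 0.0) returns 2147483647 instead of whatever bit pattern an unchecked conversion happens to leave behind.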
This would be something unprecedented in REALbasic, however, which has always allowed any numeric value to be assigned to any numeric type without complaint; it simply does the best job it can at representing the value in the target type.
@Mars -- Actually, the behavior you intended wasn't as you stated, judging by your comments in the code. ;-) It's neither a bug in the compiler, nor is it a behavior change. The integer division by zero operation has always yielded undefined results on both x86 and PPC, and the float to integer conversion where the float holds NaN or Inf is likewise undefined. Both of these are by design (both from my understanding, and from the comments in the code).
But you are right that division by zero is not an exceptional situation. :-)
Hi Mars,
Unprecedented is not necessarily bad (with a switch to turn checking on and off globally and a pragma to turn it off locally).
Division by zero is rarely, if ever, what a programmer intends with reals or integers. Doing math on NaN or INF certainly is not. By that reasoning those ARE exceptional conditions... Quite frankly, I would rather the app crash than just chug along merrily producing garbage results.
The current behavior allows subtle and difficult-to-find bugs in complex numerical calculations because of an operation that, 99.9999% of the time, the programmer really did not want to do.
If RB is supposed to help us find our mistakes as best it can, this is one area where it could do more.
- Karen
@Karen -- unprecedented without cause is bad. I think the issue here is that you're assuming Inf and NaN aren't valid results, when they actually are. You view them as bugs, and we view them as perfectly valid numeric results. By turning them into exceptions to make your life easier, you're forcing overhead and extra work onto people who want to use the datatypes as they were intended to be used, and making their lives harder. That may seem fine and dandy to you, but it doesn't to me. If you follow your feature through to its logical conclusion, you'll see some pretty heinous problems with it (aside from those I've already pointed out to you): let's say you *want* Inf stored in your result -- how would that work? The naive approach is entirely wrong:
dim foo as Double
try
  foo = bar / baz
catch err as DivideByZeroException
  // We wanted Inf anyway, so ignore the exception
end try
Can you spot the bug? The problem is that bar / baz is what throws the exception, so the assignment never happens. Now what? Now foo doesn't hold Inf, even though that's the result you wanted. So what do you put into the catch block? Well, unless we come up with a constant that holds Inf (Nan, et al), there's nothing you can put there since it would throw the exception again. So now we're adding constants for all of the sentinel values. All this just so that you can do your data validation with a try/catch block (or unhandled exception) instead of with an if block (or incorrect results), slow down your application performance, and increase your application size. And, since RB hasn't worked this way in the past 15 years, possibly break functioning applications.
For my two cents, I'd have to agree with the exception-as-a-default-option crowd on integer calculations. On FPU, I have no opinion. I consider myself an amateur and not academically inclined when it comes to programming, even though I've been doing recreational programming for a quarter century.
Like people have said, on other platforms the undefined INTEGER division crashes. I know this and everyone else who programs seriously knows it. You can't divide by zero. Period.
The problem is that computers regularly generate unanticipated mathematical chains, in which bad data may cause a zero to slip in somewhere and propagate along. Like others, I would rather that be an "exceptional" case, since that's not what I wanted, nor what I'd normally expect. I'd rather the program hard crash (or have the ability to tell me I oopsed) than rely on a bogus value midway through my calculations. I take my exceptions seriously, and one hit on that code would cause me to do a new round of debugging, finding and eliminating the error. Finding a tweaked value may be more difficult.
@Aaron - While I think I understand the technical problems that you describe, your defensive programming solution, to me (the amateur), flies in the face of all of the reasons I've chosen RB. I like RB because it's RAD. I don't always want to do bounds checking or nil checking or divzero checking when slapping something together. With the philosophy that if I don't check it's MY fault, I'd respond that there's no need for garbage collection either, since we should all release memory when we're done. Many of these types of tools were provided by RS as a protective backstop and to ease the programmer's work.
To me, to say that programs would break if you institute checking is wrong to begin with because if a person is using undefined values as true results, the program is broken already. Also, if the program does break, the preference / pragma could always be utilized for that application, effectively undoing the feature.
To say that current practice is the "Right" answer is a philosophical conclusion however, and not necessarily a pragmatic or IMHO reasonable one. How can one say that bad integer results are correct, especially after doing another operation to it, changing it to a valid, but incorrect, value? I have tried for the past week (during the NUG wars) to wrap my brain around ONE valid example of when a programmer would want integer division to return a nonsense value and allow the variable's re-use and change. I haven't found one, and so far, nobody in the other camp has presented one. I'm all ears if someone else has an idea though.
One thing to keep in mind is that IF you worry about division by zero at all then you are already aware that a problem MIGHT exist and probably code defensively to deal with it.
So a pragma doesn't necessarily help in that case as you likely already have added code to check for 0 as a divisor.
Where it would help to throw an exception is in the case where you never considered the problem.
And ALL of those cases would suddenly either break OR require every use to be handled in a try/catch block. Considering how much code this might affect, that is a big deal.
Now maybe an exception is not the right way to deal with this, and there may be some other means, like some kind of "System.MathFlags" or something. Again, the trick with this is that IF you care about the div by zero condition enough to check these flags, why not just check for 0 as a divisor?
Where something like this might help is in the underflow / overflow cases, where an operation can produce a perfectly valid result that has exceeded the size of the type you're using, and you have no way to code for or test for that.
i.e., try:
dim i as UInt8
i = 255
i = i * 4
The result of i * 4 overflows the value that a UInt8 can hold, but there's no way to tell.
So, I guess that makes me opposed to an exception for div by zero, but in favor of SOME means to detect special conditions like overflow & underflow.
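Norm's "System.MathFlags" idea can be sketched in C with a sticky flag that checked operations set and the caller inspects later, instead of an exception. Everything here (math_flags, MATH_OVERFLOW, checked_mul_u8) is hypothetical and only illustrates the idea. The overflow test is a pre-check against UINT8_MAX, so it does not depend on inspecting the already-wrapped result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sticky status bit, in the spirit of "System.MathFlags". */
enum { MATH_OVERFLOW = 1 };
static unsigned math_flags = 0;

/* Multiply two UInt8-sized values. The result wraps modulo 256 exactly as an
   unchecked UInt8 would, but an overflow also sets the sticky bit, which stays
   set until the caller clears it. */
uint8_t checked_mul_u8(uint8_t a, uint8_t b) {
    if (b != 0 && a > UINT8_MAX / b) {
        math_flags |= MATH_OVERFLOW;
    }
    return (uint8_t)(a * b);       /* a * b is computed in int, then wrapped */
}
```

With Norm's numbers, checked_mul_u8(255, 4) returns 252 (1020 mod 256) and leaves MATH_OVERFLOW set, so the wraparound is at least detectable after a whole chain of calculations.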
Norm,
If you remember to check the flag then you would remember to check for division by zero... which you can code right now as Aaron pointed out.
This is for when you DON'T remember to do it (or are using someone else's code and THEY did not think to do it). This would catch logic errors and/or cases not considered in the logic.
Well I guess I'll have to not use RB code to run automated trains or air traffic control systems and such! ;)
Seriously, correct numerical results can be very important for a lot of things and anything that can help prevent bogus numerical results is a good thing. Subtle hard to find situations that can cause such errors are bad things.
- Karen
Overflow and underflow you have NO way to detect today
Div by zero you do. Check for a 0 divisor before you do the division.
Hence why I said we might need some way to detect overflow & underflow conditions (not necessarily an exception) but I'd not be in favor of an exception for div by zero since if you already worry about it you already check for a 0 before doing a division.
There are a variety of languages that treat div by 0 differently
Check how C / C++ treat it vs how Java and Ada do
Even VB and VB.Net do it differently from what I gather
VB even handles it differently if you use / or \
I understand that division by zero gives a valid mathematical result.
I also understand that Float does have a way to represent this (positive or negative INF). What seems to be a problem is that when you divide by zero and your target is an integer (or you convert the INF float to an integer), you end up with undefined, but undefined (currently) means you do get a numeric result in your integer. Since this is a valid number, you have no way of checking if this was the result of undefined or not, and you can end up doing calculations with invalid results.
It is of course correct that it is best to test for the divisor being 0 before doing the calculation, as compared to catching the exception, but I think it leads to problems for inexperienced programmers (the ones that might choose to use a Basic language to begin with).
The resulting problem is as follows:
Way of writing the calculation as I think a normal user would:
result=a/((b*c)-d+e)
Now what is suggested is that they never write this, because it is unreliable and that they write the following instead:
divisor=(b*c)-d+e
if divisor<>0 then
  result=a/divisor
else
  //handle problem
end if
I think that it is reasonable for an inexperienced programmer to expect to be able to write a calculation the way I wrote it here, and get warned about it if it goes wrong. Being able to write code like this seems to me to be what Basic is all about (maybe that's just me though).
Of course it also means that if the divide by zero causes an exception, which it didn't in the past, and that exception isn't caught, it will lead to a crash in all software written in the past that has a divide by zero situation. This exception would never get caught, since it didn't exist in the past. Also, as was rightly pointed out, if one is going to test for an exception, one might as well test the divisor, since that is the better solution.
I guess the question that remains, is if we should be able to convert INF to an integer, if the integer target has no defined way to handle INF, and hence results in a valid number that isn't INF.
Of course if we can divide by zero for floats, but not for integers, we would have an inconsistency in the language, which would be confusing for inexperienced programmers as well.
Maybe changing the way the language works is indeed not the way to go considering the above.
Wouldn't it be possible to have a language addition as follows?
iResult=ParseFormula("a/((b*c)-d+e)",a)
This would return the following possible results:
0: result of the calculation has been put in a, no problems
1: result of the calculation has been put in a, undefined
2: result of the calculation has been put in a, overflow
...
Since this would be an addition to the language, it wouldn't break code. It would also provide Realbasic with an Eval type function that other languages have.
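A minimal C sketch of the status-code idea, hard-coding Dirk's formula a/((b*c)-d+e) rather than parsing a string, and assuming 32-bit integer operands. The names eval_formula and FORMULA_* are hypothetical; the numbering follows Dirk's list, except that this sketch only writes the result on success. Intermediates are computed in 64 bits and range-checked, so overflow is reported instead of silently wrapping.

```c
#include <assert.h>
#include <stdint.h>

/* Status codes mirroring the list above (hypothetical names). */
typedef enum {
    FORMULA_OK = 0,         /* result stored */
    FORMULA_UNDEFINED = 1,  /* divisor was zero */
    FORMULA_OVERFLOW = 2    /* an intermediate did not fit in 32 bits */
} formula_status;

static int fits_int32(int64_t v) {
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* Evaluates a / ((b * c) - d + e); writes *result only on FORMULA_OK. */
formula_status eval_formula(int32_t a, int32_t b, int32_t c, int32_t d,
                            int32_t e, int32_t *result) {
    int64_t t = (int64_t)b * c;        /* cannot overflow 64 bits */
    if (!fits_int32(t)) return FORMULA_OVERFLOW;
    t -= d;
    if (!fits_int32(t)) return FORMULA_OVERFLOW;
    t += e;
    if (!fits_int32(t)) return FORMULA_OVERFLOW;
    if (t == 0) return FORMULA_UNDEFINED;
    int64_t q = a / t;                 /* a is converted to int64_t */
    if (!fits_int32(q)) return FORMULA_OVERFLOW;  /* INT32_MIN / -1 */
    *result = (int32_t)q;
    return FORMULA_OK;
}
```

For instance, eval_formula(10, 2, 3, 6, 0, &r) returns FORMULA_UNDEFINED because 2*3-6+0 is zero, and the caller learns that without any exception machinery.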
Sorry for thinking out loud ;)
As a math major, I must correct a mistake in this thread.
When a non-zero number is divided by zero, the result is + or - infinity unless the numerator is also zero. That's true for integers as well as for other real numbers. [Integers were defined long before computers were invented.] So it's just plain wrong to call the result of an integer division by zero "undefined" unless the numerator is also zero.
When zero is divided by zero, the result is "undefined". However, there are cases involving functions f(x) and g(x), which may both be equal to zero for some value of x, yet f(x)/g(x) may be equal to some ordinary real number, not "undefined".
So what should programming languages do when they encounter a division by zero? Since the floating point format includes a representation of 'infinity', they should deliver that result. But integer data types don't include a representation of 'infinity'. Some folks think it's OK to set the result at the largest positive (or negative) number that can be represented by the particular integer data type. Some even endorse setting the result to some other arbitrary value! Sorry, but mathematically, that's just not correct. The only correct result is some sort of error.
In this thread the misnamed 'computer' programming experts seem to have reached a consensus that it's better to deliver incorrect results for the division by zero operation than to deal with the challenge of figuring out how to efficiently generate only correct results for division calculations. Maybe we should quit calling these machines 'computers' since the experts refuse to try to get them to generate correct results for simple calculations.
Norm.
In a sense I agree...
IMO INF is an overflow and -INF is an underflow, as these are not valid real numbers - they are concepts that don't translate to a concrete number that should be used in calculations. Anything that causes them should be (optionally) caught... Division by 0 is just the most common case.
Getting valid concrete numbers is what computing numerical values is all about.
- Karen
@Ron -- you are right in that I'm being lenient with the term "undefined." I wasn't speaking with my math hat on, but with my compiler architect hat. You are correct that mathematically, the results are only undefined in the 0/0 case (which is NaN), but otherwise have a valid definition (+/- Inf).
I was speaking about what results the user can expect from the compiler when I said "undefined." This simply means "you can expect the results to be any valid integer value -- no one value is preferred over another."
@Ron - Exactly right.
@Norm - Yes it is equally important to detect and report overflows, underflows, and other runtime errors too. And of course you can precheck for overflows too using RB code just like you can for DivByZero, although the overflow check is more involved.
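To make "more involved" concrete: a minimal C sketch of pre-checking a 32-bit signed addition before performing it, using only comparisons that cannot themselves overflow. The name add_would_overflow is hypothetical; multiplication needs a longer set of sign cases than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a + b would fall outside the int32_t range. Each comparison is
   arranged so the subtraction on the right-hand side cannot overflow. */
bool add_would_overflow(int32_t a, int32_t b) {
    if (b > 0) {
        return a > INT32_MAX - b;  /* sum would exceed the maximum */
    }
    return a < INT32_MIN - b;      /* b <= 0: sum would go below the minimum */
}
```

So add_would_overflow(INT32_MAX, 1) is true, while add_would_overflow(INT32_MIN, INT32_MAX) is false because the sum, -1, is in range. Compare that with the single divisor <> 0 test that division needs.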
And it's even easier for a user to check for a nil reference before using it. Why does RB provide a check for that? To help protect us from ourselves. :-)
@Dirk -- After giving it not nearly enough thought, you might be able to write your own Eval function like that using RBScript. Though it would probably be a decent amount of work... hmm...
@Joe Huber -- under/overflow checking would be an even larger mess than div by zero because of the frequency. At least with div by zero, you only have to watch out for the /, \ and mod operators. Over/underflow affects much, much more: all math operators, parameter passing, and assignment are affected if you want reasonable coverage.
Let's look at size/speed issues. Check out the following simple math expression: x = 12 * 5 + 4. Without over/underflow checking, that would equate to roughly this in x86:
mov reg1, 12
mov reg2, 5
imul reg1, reg2
mov reg2, 4
add reg1, reg2
mov x, reg1
So about six instructions, totaling about 20 bytes. Now with under/overflow checking which just sets a flag (exceptions would make this code even larger), in the best case:
mov reg1, 12
or someFlag, EFLAGS // If we've overflowed, this sets our sticky bit
mov reg2, 5
or someFlag, EFLAGS
imul reg1, reg2
or someFlag, EFLAGS
mov reg2, 4
or someFlag, EFLAGS
add reg1, reg2
or someFlag, EFLAGS
mov x, reg1
or someFlag, EFLAGS
You'll quickly notice that this code is more complex. Instead of taking 6 instructions, it now takes 12. Also, instead of being about 20 bytes, it's now about 32 bytes. So it's going to be roughly twice as slow, and about one and a half times larger. If exceptions are your preference, you can replace each or someFlag, EFLAGS instruction with this instead:
cmp EFLAGS, 0 // Roughly -- see if over/underflow happened
jnz +X // Skip over the exception raising
call RaiseOverflowException // Raise the exception
In the usual case where no overflow happens, the code is still twice as slow, but now it's 92 bytes instead of 20 bytes -- so roughly 4.5 times larger.
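Working backward from those totals gives the per-check cost: each sticky-flag update adds about 2 bytes and each cmp/jnz/call sequence about 12 bytes, across the six checks in the example:

$$
\begin{aligned}
\text{sticky flag:}\quad 20 + 6 \times 2 &= 32 \text{ bytes}, & 32 / 20 &= 1.6\\
\text{exceptions:}\quad 20 + 6 \times 12 &= 92 \text{ bytes}, & 92 / 20 &= 4.6
\end{aligned}
$$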
Extrapolate this a bit -- think about just how many times you pass parameters to functions, or make assignment statements, or do addition, or use for loops, etc. Each one of those things now requires *twice* as many instructions *in the best case*, just to do under/overflow checking with a sticky bit (god forbid should it do exceptions!).
Perhaps a pragma would make sense in this case though, simply because the checks are more difficult to do yourself. But there's no way we could make the default be on due to size/speed concerns, as well as the major point that it'd break projects (assuming we were talking about an exception instead of a flag to check). More thought is needed about the API and the ramifications.
@All -- I know it's been said that I come across as being pig-headed on some topics, but do keep in mind that it's my job to think about what's the best balance to strike between all of the different pressures. It's my job to determine whether the benefits outweigh the costs, and for how many people. It's my job to determine what is most in line with the REALbasic design philosophy. And it's my job to worry about breaking people's projects. It's very easy to be an armchair quarterback and say "this is the proper way to do it", but it's much harder when you're actually the quarterback. ;-) I'd much rather be called pig-headed and move slowly than attempt to please everyone by doing what various groups feel is "right." The moment I give in instead of being obstinate is the moment REALbasic becomes a bloated, difficult-to-use product, like Visual Studio or Xcode. So please don't mistake my intentions as being a sign of "not invented here" syndrome; my intentions are "make the best product possible."
Hi Aaron
I appreciate the frank comments. My goal is not to bloat RB either. But I do think it's worthwhile to consider a pragma that OPTIONALLY puts correctness and lack of ambiguity before raw performance.
The performance bottleneck is often not where I thought it would be. So I code for correctness and robust error reporting first, and then optimize for performance as needed. Much of what I do entails calling system functions, and so I'm less concerned about optimizing my RB code speed. I realize that others have significantly different needs.
I also would like to see an exception for text operations on strings with an undefined encoding, since by definition they are impossible. Currently RB guesses what the encoding might be and blithely performs the text operation. Joe Strout thought there was a central routine that guessed these encodings and might be reasonable place to raise an OPTIONAL exception.
I track exceptions very carefully in my apps and report them to my central server along with the corresponding stack trace and log file. And then my apps keep on running from the main event loop.
I would love to have broader runtime exception coverage for anything that's reasonably detectable by the RB runtime.
Aaron,
I have always understood your motives, so while we rarely have agreed on possible RB features, I never saw it as "pigheadedness" for "pigheadedness" sake! ;)
BTW the last (and only time thank goodness) that I did assembler was for a course back in the mid 70's in college ... so I'm not going to try and think about the details of your last post! ;)
- Karen
To paraphrase a famous commercial...
Cost of adding overflow checks: 12 bytes
Cost of adding exceptions: 72 bytes
Cost of correctness: Priceless
@Joe Huber -- The pragma is worth considering for over and underflow, but I disagree about it for div by zero. The reason is: over/underflow is hard to test for properly, but div by zero is a line of code. So if you're thinking about turning a pragma on for div by zero, you've already "won the battle" and are thinking about the problem. So just check the divisor. It's true that it could be added to the compiler, but that takes time and effort that could be better spent elsewhere for so easy a check. Now, with over/underflow, you still need to remember to turn the pragma on -- but the checks are much harder for you to do. So in that case, it might make more sense for me to implement it as a pragma. Maybe.
As for the strings without encodings, I disagree. ;-) There's an unfortunate nomenclature and fuzzy APIs in REALbasic with strings. Strings aren't always text -- quite often they're a bucket of bytes, and an encoding doesn't make sense. Of course, if the string is holding text, then an encoding makes more sense. If I had my druthers, I'd like to split string in half -- make a base type called Data or BucketOfBytes, and make a sub type called Text. The difference between the two is that Text has a required encoding, so we can throw exceptions if you try to operate on encodingless Text. But alas, that's a difficult problem to solve without breaking code. :-/
Hi Aaron
Not surprised that you have an alternate view. :-) Two points...
1. Nil Objects would require a one line check, and yet they already warrant an exception. The point of the DivByZero discussion is what should happen if you miss adding that one line check or the denominator becomes zero in a highly unlikely place due to some side effect of another calculation. Some of us would like a notification like the one we get for NOEs.
2. Yes, the RB string is multipurpose, and I'd be happy if you were to separate them. However, the current architecture has many clearly defined places where RB treats a string as text, such as any of the non-B versions of the string functions. I'd request an exception for TEXT operations on encodingless strings. I've discussed this with Joe Strout for years, and he finally agrees that such an exception is warranted.
"I agree, if you treat a string with undefined encoding as text, it's an error and should be signalled as such. If I recall correctly, there already is a routine that gets called to guess the encoding of a string that doesn't have one, when you need to treat it as text (and it just returns Encodings.SystemDefault). So I think it wouldn't be too hard to make this change."
Non-B versions do not necessarily mean TEXT, though. You can use them on a bucket of bytes just as well as the B versions. The only time you see any difference is when a multibyte encoding like UTF-16 is in use, or when you use non-ASCII characters that are not encoded in a single byte.
If all you use is ASCII, Len and LenB return identical results, as do all the other B and non-B pairs.
The B versions DO explicitly mean bytes, though, so with them you probably intend to treat the string as "not text".
The trick is knowing in the existing routines when you mean "text" and when you mean "bucket of bytes"
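Tim's point can be illustrated outside REALbasic with a small C sketch, assuming UTF-8 as the multibyte encoding. `strlen` counts bytes, like LenB, and the hypothetical `utf8_length` counts characters, as Len would on a UTF-8 string. For pure ASCII the two agree. A single accented character stored as two bytes makes them diverge.

```c
#include <stddef.h>
#include <string.h>   /* strlen: the byte count, i.e. the LenB analogue */

/* Counts characters in a UTF-8 string. Every byte that is not a
   continuation byte (bit pattern 10xxxxxx) starts a new character.
   Assumes the input is valid, NUL-terminated UTF-8. */
size_t utf8_length(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        if ((*p & 0xC0) != 0x80)
            count++;
    return count;
}
```

With `"abc"` both counts are 3. With `"caf\xC3\xA9"` ("café" in UTF-8), `strlen` returns 5 while `utf8_length` returns 4.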
Nilobjects are in a whole different league than div by zero and overflow. Dereferencing a zero pointer leads to severe problems, whereas overflow merely wraps a value around. You have an incorrect result, but at least you haven't overwritten memory.
Tim
@Joe Huber -- NilObject checks shore up security holes (there are several papers out there discussing this very flaw). DivByZero can never be a security hole. They're a different class of issues. What's more, dereferencing a nil pointer is *never* valid. Dividing by zero can be valid, even if the results aren't what you want them to be.
Also, as others have pointed out, you cannot simply assume that the non-B functions mean "encoding isn't optional." By adding an exception, we'd break existing code that does work. I can think of several pieces of code I've written that would break off the top of my head! Now, if we were starting over from scratch, it'd be a different discussion altogether (especially since there'd be a more linguistic distinction between the two concepts)...
IMHO Non-B versions of the string functions MUST know the encoding to function. Without an encoding it is meaningless to consider the bytes as text characters.
Anyone have an example of a valid use of a text operation on an encodingless string? I certainly can't think of one. The closest I could come would be to ASSUME that the encoding was ASCII or some other single byte encoding, and to me that's just not a valid assumption.
Anyone with an example?
@Joe Huber -- The assumption that you're dealing with a single-byte encoding is still a valid one if you control both sides of a file or network connection. While it might not be the best practice, it's still valid.
Also, consider older code, written before encoding support. That would be another example of code that is functional and would break if exceptions were thrown now but not before.
And in fact you'd be dead wrong.
Non-B versions do NOT require an encoding to do their bit.
they all work fine on strings with no encoding
try this
s has no encoding and all this code works just fine
dim mb as new memoryblock ( 10 )
mb.byte (0) = asc("a")
mb.byte (1) = asc("b")
mb.byte (2) = asc("c")
mb.byte (3) = asc("d")
mb.byte (4) = asc("e")
mb.byte (5) = asc("f")
mb.byte (6) = asc("g")
mb.byte (7) = asc("h")
mb.byte (8) = asc("i")
mb.byte (9) = asc("j")
dim s as string = mb.StringValue(0,10)
msgbox "len of s is " + str(len(s) )
s = s + "123"
msgbox s
It happens that these are ASCII characters, but even if you shove in bytes > 127 it still works (well, maybe the msgbox doesn't).
Bytes are bytes as far as these functions are concerned
The nice thing is they ARE encoding-aware and deal with encodings properly IF defined.
But they do not require them
Changing this behavior would break a lot of code
Apparently many people here misunderstand how RB's text encoding works.
A string MUST have an encoding for the bytes to be considered as text characters. If a string doesn't have an encoding then RB uses the SystemDefault encoding from the locally running OS. This has significant implications because your string may be treated as ASCII on a US system, but the same string will be treated as multibyte characters on a Japanese system.
Here's a simple example of exactly this problem that bit me recently. As part of my licensing scheme, I mistakenly used Len on a string that didn't have an encoding set. Everything worked great for many years, and I considered that code hardened. Then one day a Japanese customer said that the license key I had sent him didn't work. I tried it on my system and it worked fine. After many rounds of debugging, it turned out that Len on his system gave a different result than Len on my system for a string of exactly the same bytes.
The issue was that the string had no encoding and was being treated as Japanese text on his system and English text on my system.
So without a proper encoding set, you cannot even count on LEN working correctly.
I am sure that there are MANY errors like this waiting to happen in code that people think works properly. And that's exactly why an exception on text operations on encodingless strings is so important.
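The failure Joe describes can be reproduced in a few lines of C. This is a simplified sketch, not REALbasic's actual decoder. It assumes a single-byte system encoding on one machine, where the character count equals the byte count, and Shift_JIS on the other, where bytes 0x81-0x9F and 0xE0-0xFC start a two-byte character. The same four bytes then yield two different lengths. `count_shift_jis` is a hypothetical name.

```c
#include <stddef.h>

/* Character count under a simplified Shift_JIS rule: a lead byte in
   0x81-0x9F or 0xE0-0xFC swallows the following byte. Trail-byte
   validation is omitted. Under a single-byte encoding the count would
   simply be n. */
size_t count_shift_jis(const unsigned char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char b = s[i];
        int is_lead = (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
        if (is_lead && i + 1 < n)
            i++;              /* skip the trail byte */
        count++;
    }
    return count;
}
```

For the bytes `{0x41, 0x93, 0x42, 0x43}`, a single-byte interpretation gives 4 characters, but `count_shift_jis` gives 3 because 0x93 pairs with the following 0x42.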
Norman
You gave the perfect example of why we DO need a runtime check for text operations on encodingless strings. Without an encoding specified, LEN will use the SystemDefault encoding which will vary based on the runtime user's OS.
Your test will appear to work OK on an English system. But it is very likely to fail on any systems that use a multibyte OS.
Same code, different answer. Not good.
@Anonymous -- By definition, an encoding is required to treat bytes as text characters. If you don't specify one, RB will guess for you every time it needs to treat bytes as text.
It is simply not safe to assume what RB will do with an encodingless string unless you also control how it guesses the encoding. And currently its guess varies based on the OS your app is running on.
Just some examples to show the problems:
Formula                          Should show                   Shows
------------------------------   ---------------------------   --------------------
dim i8 As Int8 = 2^8-1           overflow                      -1
dim i8 As Int8 = -2^8-1          underflow                     -1
dim ui8 As UInt8 = 2^8-1         255                           255
dim ui8 As UInt8 = -2^8-1        invalid negative assignment   255
dim i16 As Int16 = 2^16-1        overflow                      -1
dim i16 As Int16 = -2^16-1       underflow                     -1
dim ui16 As UInt16 = 2^16-1      65535                         65535
dim ui16 As UInt16 = -2^16-1     invalid negative assignment   65535
dim i32 As Int32 = 2^32-1        overflow                      -2147483648
dim ui32 As UInt32 = 2^32-1      4294967295                    2147483648
dim i64 As Int64 = 2^64-1        overflow                      -9223372036854775808
dim ui64 As UInt64 = 2^64-1      18446744073709551615          9223372036854775808
dim s As Single = 1.0/1.0e-39    underflow                     1.#INF00e+
dim s As Single = 1.0*1.0e+39    overflow                      1.#INF00e+
dim s As Single = 1.0/0.0        divbyzero                     1.#INF00e+
dim d As Double = 1.0/1.0e-309   underflow                     1.#INF00e+
dim d As Double = 1.0*1.0e-309   overflow                      1.#INF00e+
dim d As Double = 1.0/0.0        divbyzero                     1.#INF00e+
Strange recurring bug: UInt32 and UInt64 can't show their maximum values, and signs flip on overflow or underflow.
The Single and Double datatypes work OK, although I would prefer distinct values for overflow/underflow and for divide by zero.
Regards, Andre
Overflow and underflow are harder to detect
But div by zero is pretty easy for integer and floating point types
Define a new subclass of RuntimeException named DivByZeroException, then:
dim d1 as Double
d1 = 1/0
if str(d1) = "INF" or str(d1) = "-INF" or left(str(d1),3) = "NAN" or left(str(d1),4) = "-NAN" then raise new DivByZeroException
d1 = -1/0
if str(d1) = "INF" or str(d1) = "-INF" or left(str(d1),3) = "NAN" or left(str(d1),4) = "-NAN" then raise new DivByZeroException
d1 = 1/-0
if str(d1) = "INF" or str(d1) = "-INF" or left(str(d1),3) = "NAN" or left(str(d1),4) = "-NAN" then raise new DivByZeroException
d1 = -1/-0
if str(d1) = "INF" or str(d1) = "-INF" or left(str(d1),3) = "NAN" or left(str(d1),4) = "-NAN" then raise new DivByZeroException
// for integers it's simpler
dim i1 as integer
dim i2 as integer
dim i3 as integer
if i2 = 0 then
  raise new DivByZeroException
else
  i3 = i1 \ i2 // \ is integer division; / would produce a Double
end if
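A C counterpart of the integer check above, offered as a sketch. C has no exceptions, so the hypothetical `safe_int_div` returns false where the REALbasic code would raise. It also rejects INT_MIN / -1, the one other integer division whose result doesn't fit in an int and which traps on x86 just like a zero divisor.

```c
#include <limits.h>
#include <stdbool.h>

/* Divides numerator by divisor into *quotient. Returns false instead
   of dividing when the operation would fault or overflow. */
bool safe_int_div(int numerator, int divisor, int *quotient)
{
    if (divisor == 0)
        return false;                         /* divide by zero */
    if (numerator == INT_MIN && divisor == -1)
        return false;                         /* quotient exceeds INT_MAX */
    *quotient = numerator / divisor;          /* C truncates toward zero */
    return true;
}
```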
I have not read all comments as I'm currently on the playa with not much time to hang out at a computer...
My thoughts:
1. There should be some option for programmers to tell the compiler to raise exceptions both for div-by-zero and for overflows in value conversions. The latter would solve the problem where an INF float value is assigned to an integer: it's out of range and could be flagged by an exception.
2. The harder question is how to provide that option. I don't like pragmas for this; they're awkward. Plus, my personal preference would be to have my app be _safe_ first, then tell the compiler where it can leave out checks once I've verified the code to be safe against such exceptions (this is how it's done already with bounds checking, for instance).
The remaining problem is how to deal with _old_ code. My suggestion: couldn't the IDE detect older code when it is opened and tag it accordingly (maybe via the new dialog that tells us about the changes to constructor naming), marking such code as _not_ using the new checks by default, while all new code gets them?
I hope someone else has already suggested the same :)
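The conversion check from point 1 might look like this in C. It is a sketch with a hypothetical name; REALbasic offers no such function. NaN fails every comparison and +/-INF fails the range test, so an INF produced by a division by zero is flagged. Without the check it would be turned into whatever bit pattern the hardware leaves behind, such as the -2147483648 in Andre's table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Converts v to int32_t, truncating toward zero, only if the result is
   representable. Returns false for NaN, +/-INF and out-of-range values. */
bool double_to_int32_checked(double v, int32_t *out)
{
    /* Both bounds are exactly representable as doubles. Any v strictly
       between them truncates to a value in [INT32_MIN, INT32_MAX]. */
    if (!(v > (double)INT32_MIN - 1.0 && v < (double)INT32_MAX + 1.0))
        return false;
    *out = (int32_t)v;
    return true;
}
```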
No one mentioned:
Dim i As integer=1/0
If i <> i then MsgBox "nan!"
That has to be faster than the Str(i) approach.
I even extended the Vector3D class to have a nan check
Function NAN(Extends V As Vector3D) As boolean
Return V.X <> V.X or V.Y <> V.Y or V.Z <> V.Z
End Function
Simple, fast, and short. Really saved me when the camera position was going to NaN!
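Here is Craig's self-comparison trick sketched in C, with a hypothetical `Vec3` struct standing in for Vector3D. It relies on IEEE 754 semantics, where NaN compares unequal to everything, itself included. Note that it does not catch +/-INF, and that aggressive options such as GCC's `-ffast-math` may optimize the test away.

```c
#include <stdbool.h>

/* Hypothetical stand-in for REALbasic's Vector3D. */
typedef struct { double x, y, z; } Vec3;

/* True if any component is NaN: x != x holds only for NaN. */
bool vec3_has_nan(Vec3 v)
{
    return v.x != v.x || v.y != v.y || v.z != v.z;
}
```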
It's often easier to simply do all the math, then if it's INF or NaN, set it to what you want in that case.
//math here
if i <> i then i = -1 // i was NaN, so let's set it to -1
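Here is the compute-then-sanitize pattern as a C sketch with a hypothetical `finite_or` helper. It tests with C99's `isfinite` rather than self-comparison, so INF results are replaced too, not only NaN.

```c
#include <math.h>

/* Returns value if it is finite, otherwise the caller's fallback.
   isfinite() is false for NaN and for +/-INF alike. */
double finite_or(double value, double fallback)
{
    return isfinite(value) ? value : fallback;
}
```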
Really,
d1 = 1/0
if str(d1) = "INF" or str(d1) = "-INF" or left(str(d1),3) = "NAN" or left(str(d1),4) = "-NAN" then raise new DivByZeroException
should be:
d1 = 1/0
if d1 <> d1 then raise new DivByZeroException //Shorter, faster, better?
Is using i <> i documented somewhere? I don't even know where to look. I use it all the time.
Looks like the formatting here stripped all the not equal bracket sets down to just >. I can't even figure out how to put them in now, so in my last post replace each > with "Less than bracket, greater than bracket", meaning not equal to as in "If i not equal to i" or "If not i=i then"
@Craig -- yeah, the blogging software seems to think everyone loves to use HTML entities instead of actual ASCII values. Sorry!
As for using i <> i being documented, I am not certain that it is. However, it is something you can rely on for floating-point values, because an equality comparison involving NaN always returns false, so NaN <> NaN is true. But, to correct something: 1/0 isn't NaN, it's infinity. I think you meant 0/0. Also, standard C (since C99) provides isnan and isfinite to test whether a number is NaN, or finite (neither NaN nor Inf).
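To check Aaron's correction, here is a small C sketch using the C99 classification macros. `classify` is a hypothetical helper. Dividing a nonzero value by zero yields a signed infinity, while 0/0 is the case that yields NaN.

```c
#include <math.h>

/* Labels a floating-point value as NaN, a signed infinity, or finite. */
const char *classify(double x)
{
    if (isnan(x))
        return "NaN";
    if (isinf(x))
        return x > 0 ? "+INF" : "-INF";
    return "finite";
}
```

Given `double zero = 0.0;`, `classify(1.0 / zero)` returns "+INF", `classify(-1.0 / zero)` returns "-INF", and `classify(zero / zero)` returns "NaN".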