So I was fixing some bugs today (you know, almost all my really good stories start out this way)...
I was looking into the reports I verified over the weekend that dealt with threads. I started with the one about the CPU getting pegged on OS X, since that seemed really mysterious. It turns out to be a rather mundane little detail of how events work and how we interact with the event pump.
Normally, if you have more than one thread running, we tell the event pump that it can't sleep. This way we can give more time to the thread scheduler so that it can figure out who's going to be running and when. This usually isn't an issue though since the scheduler will also give some time back to the OS to ensure that we play nicely.
When you put a thread to sleep, we take note of the min sleep time of all the threads (the min of the min, so to speak) and let the event pump sleep for that long (could be less though). Why bother having the scheduler do all that work when you already know up front that we can do some yielding if we need to.
However, it turns out that when you put all the threads but one in a suspended state (such as the case with a thread locked on a semaphore while the main thread does its thing), then the min sleep time is 0, and so no sleep happens. But the scheduler doesn't yield time back because there are active threads doing their thing still. So the CPU pretty much melts.
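To make the arithmetic concrete, here is a rough Python sketch of the "min of the min" idea and how a suspended thread drags it to zero. This is only a model of the behavior described above, not the framework's code: `ThreadInfo`, the state names, and both functions are made up for illustration, and the second function is a guess at one possible remedy, not the actual fix.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThreadInfo:
    state: str          # "running", "sleeping", or "suspended" (hypothetical names)
    sleep_ms: int = 0   # remaining sleep time; used only when state == "sleeping"


def pump_sleep_as_described(others: List[ThreadInfo]) -> Optional[int]:
    """Sleep budget for the event pump, in milliseconds.

    `others` is every thread except the main thread. None means the pump
    may block until the next event arrives.
    """
    if not others:
        return None
    # A thread that is not sleeping contributes 0, so one suspended thread
    # forces the minimum to 0 and the pump never sleeps.
    return min(t.sleep_ms if t.state == "sleeping" else 0 for t in others)


def pump_sleep_ignoring_suspended(others: List[ThreadInfo]) -> Optional[int]:
    """Hypothetical remedy: suspended threads cannot run, so leave them out."""
    candidates = [t for t in others if t.state != "suspended"]
    if not candidates:
        return None  # only the main thread can run, so the pump may sleep
    return min(t.sleep_ms if t.state == "sleeping" else 0 for t in candidates)
```

With one worker thread suspended on a semaphore, the first function returns 0 (the pump spins), while the second returns None (the pump is free to sleep). When every worker is merely sleeping, both return the shortest remaining sleep time.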
Very easy to fix, and wasn't too hard to track down. However! It got me to thinking of other cases where interactions could get strange, and it led me to a dire warning for everyone.
Be careful when you use locking mechanisms.
You see, there's no way for you to directly manipulate the main thread. You can't set its priority, you can't call Sleep on it, you can't call Suspend on it, etc. This has the nice effect of saving you from accidentally blowing your application up in many colorful ways. However, it also means that you tend to think these things can never happen. But they can.
Using a locking mechanism (like a semaphore or critical section), you can manage to get into a state where all the threads in your application are suspended. Basically, calling Signal (or Enter) improperly will cause very bad things to happen. In the current version, that "bad thing" is a crash. We overflow the stack in the thread scheduler while trying to figure out what thread to run (because there ARE no threads to run!).
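For the curious, here is a small Python sketch of how a cooperative scheduler could notice the "nobody can run" state instead of searching forever. It is purely illustrative and assumes a simple round-robin pick; `pick_next`, `AllThreadsBlocked`, and the state strings are invented names, and the real scheduler may work quite differently.

```python
from typing import List


class AllThreadsBlocked(Exception):
    """Raised when no thread, including the main thread, is able to run."""


def pick_next(states: List[str], current: int) -> int:
    """Round-robin pick of the next runnable thread.

    `states` holds one entry per thread ("runnable" or "suspended"), with
    index 0 standing in for the main thread. The scan visits each thread
    exactly once, so it always ends, even when nothing is runnable.
    """
    n = len(states)
    for step in range(1, n + 1):
        candidate = (current + step) % n
        if states[candidate] == "runnable":
            return candidate
    raise AllThreadsBlocked("every thread is suspended or waiting on a lock")
```

The bounded loop is the point: a search that keeps calling itself until it finds a runnable thread has nothing to stop it once every thread is locked, whereas a single pass can report the condition and let the caller decide what to do about it.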
Which brings up a very interesting quandary. How do you alert the user that they've gotten into the state where their application isn't really running anymore?
Turns out that the only proper answer is "with a modal dialog." Because modal dialogs have their own event pump (essentially, using DoEvents internally), we can display a modal dialog that you can interact with, even though the rest of your app is totally locked and hosed. So a future version will nicely display an assertion dialog with a friendly message before imploding the application.
While doing all these random bug fixes for the Mac, I think I may have accidentally fixed the double Kill crashing thing on Windows. It got too late to really test it hard though, so I may just be talking out of my rear... but preliminary tests seem to show that the bug just disappeared. So that seems rather strange to me as well. Take a wild guess what I'll spend part of tomorrow looking further into. ;-)
Aren't threads fun?
I disagree. Most of the really good stories I've heard lately start off, "So, I was driving up my mountain yesterday......"
TL and I used that ALL WEEKEND LONG. Too funny.
> When you put a thread to sleep, we take note of the min sleep time of all the threads (the min of the min, so to speak) and let the event pump sleep for that long (could be less though). Why bother having the scheduler do all that work when you already know up front that we can do some yielding if we need to.
Sorry for copying the whole paragraph, but I lost perspective ("You" or "we"..."do all that work when you?" Who?)
> So a future version will nicely display an assertion dialog with a friendly message before imploding the application.
What does it do for ConsoleApplications?
Looking forward to more information on: Improper Semaphore and Critical Sections - calling Signal wrong, and the like.
This is very near and dear to my world. I have an RB2005v2 app that runs fine on Windows, and on 2005v4 it pegs the CPU after a certain Suspend call. The world is stumped. I haven't tried 2006 yet. ...and another case on OSX 10.3 (only), a ServiceApp where the whole app is suspended, but the world still expects this is an obscure app and OSX thing.
> I think I may have accidentally fixed the double Kill crashing thing on Windows
Ack! Got an RB report #?
1) Let me retry that paragraph. "When a thread is sleeping, the event pump takes note of the min sleep time of all the threads (the min of the min, so to speak) and lets the OS event pump sleep for that long (could be less though). Why bother having the scheduler do all that work when it's already known up front that the yielding can be done if needed?"
2) Interesting question! It prints out to the console. This works because it'll print and flush the buffers. Remember, *something* is running (the framework), otherwise you wouldn't even get the assertion dialog in a GUI app. The main trouble is figuring out what's the proper way to tell the programmer that none of their code can execute now. Since none of their code can execute....
3) This bug wouldn't happen on Windows at all, and I was sure to test all the code changes on Windows as well. The original code never pegged the CPU, and the new code doesn't either.
4) As a matter of fact, I do.
So I was creating some bugs today... :)
There have been two aspects of REALbasic programming that I've tended to avoid, mostly out of hesitation to tread uncharted waters -- threading and networking. Networking is less necessary for the type of programming I do, but threading could be helpful during long calculations. By the way, it's not RB per se -- I avoid them in Java and Cocoa too. I've read about both for a long time, including a few books on threading in Java, but when I actually sit down at the computer, I avoid them again. I suppose I need a good "kick in the pants" reason to add threading to one of my programs and force myself to just do it.
Those two topics are kind of scary for people to approach, but to be honest, they're also very easy concepts in REALbasic. RB protects you from being able to shoot yourself in the foot with both those concepts. Because our threads are cooperative, you don't have to worry very much about data protection (only in certain circumstances do you really need a semaphore or critical section). And because RB does all the data buffering, you don't have to worry about buffer overflow security issues with networking code.
I'd highly recommend you check out Networking for N00bs and the two threading articles at RBLibrary.com. They both go a long way towards explaining the concepts in an easy to grok fashion.
Thanks!
You're quite welcome. I tend to personify my explanations a little too much some days. ;-)
Just so I am clear, is this a problem for all platforms or just OSX?
Thanks, John
The suspended threads chewing up CPU is Mac-only and doesn't produce "incorrect" behavior in that the data is still protected properly. It just makes people's CPUs run hot.
The bug with killing threads multiple times is Win32-only, and it doesn't behave correctly since it crashes the app. But only the second time you go to kill a thread, for some reason. And there's a workaround (don't use Kill -- just terminate the thread the old-fashioned way, like your code would do pre-2006r1).