(Note, all of the information presented here is just what I've happened to glean from other sources, so YMMV.)
When looking into the recent spate of stack-related questions about REALbasic, I realized that my understanding of how Windows deals with stack space was rather weak. Of course, at the high level, all call stacks are basically handled in the same fashion. The stack starts at an address, grows downward, and when you run out of it, bad things happen. The part I was confused by was: why were we getting stack exceptions from the OS that our own REALbasic stack overflow checker wasn't able to catch?
On Windows, the stack size of your application's main thread is determined by two fields in the PE32 file header: SizeOfStackReserve and SizeOfStackCommit. The stack "reserve" is the maximum amount of stack space that Windows will guarantee your application can use. It is possible that you can use more stack space than the reserve, but the OS is making no promises. The stack "commit" is the initial amount of stack space the OS will commit for you when your application launches. Once you've reached your commit limit, the OS will commit another page of memory (assuming you're still under the reserve). With a scheme like this, Windows can keep your actual memory requirements low, but still have the ability to meet your application's needs.
These two PE32 fields are controlled by the linker, and they default to 1MB of reserved space and 4k of commit size for non-embedded system applications. However, you can set your own values by using the /STACK command line switch and specifying the number of bytes. So, for instance, if you want 5 MB of reserved space, and an initial commit of 1MB, you could specify:
/STACK:5242880,1048576
The reserve size comes first, and optionally, the commit size comes second.
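The byte counts in that example are just megabytes multiplied out. Here's a small sketch of the arithmetic. The helper name stack_switch is made up for illustration; only the /STACK syntax itself comes from the linker.

```python
MB = 1024 * 1024  # bytes per megabyte


def stack_switch(reserve_bytes, commit_bytes=None):
    """Build a /STACK linker switch (stack_switch is a hypothetical helper)."""
    if reserve_bytes <= 0:
        raise ValueError("reserve must be positive")
    if commit_bytes is None:
        # The commit size is optional and can simply be left off.
        return f"/STACK:{reserve_bytes}"
    if commit_bytes <= 0:
        raise ValueError("commit must be positive")
    # Reserve comes first, commit second.
    return f"/STACK:{reserve_bytes},{commit_bytes}"


print(stack_switch(5 * MB, 1 * MB))  # /STACK:5242880,1048576
```

Leaving off the second argument gives the reserve-only form of the switch.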
I've alluded to the way the stack works, but it may help to use a contrived example of how it actually functions. Sticking with the default settings, when your application launches, your current stack will look something like this:
(1)
BaseAddress: 00030000
RegionSize: 000fd000
State: 00002000 MEM_RESERVE
Type: 00020000 MEM_PRIVATE
(2)
BaseAddress: 0012d000
RegionSize: 00001000
State: 00001000 MEM_COMMIT
Protect: 00000104 PAGE_READWRITE + PAGE_GUARD
Type: 00020000 MEM_PRIVATE
(3)
BaseAddress: 0012e000
RegionSize: 00002000
State: 00001000 MEM_COMMIT
Protect: 00000004 PAGE_READWRITE
Type: 00020000 MEM_PRIVATE
I numbered the different pieces of information so that we could talk about them a bit more easily (and if you're wondering where I got the output from, it's a !vadump from WinDbg). Since stacks grow downward, we're going to start at (3) instead of (1). This is the base of your stack, where the frames will start being allocated from. You can see that it's committed memory, and it's available for reading and writing. It's also 8k in size, which is larger than what we'd expect the default to be (since the default commit size is supposed to be 4k, not 8k). I've not been able to find any explanation of why there's a size discrepancy, but I can only assume that the default has changed.
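As a sanity check on the dump, the three regions should sit back to back and add up to the 1MB default reserve. This is just a quick sketch using the numbers copied from the output above; the variable names are mine.

```python
# (base address, region size, description), copied from the dump above
regions = [
    (0x00030000, 0x000FD000, "reserved"),
    (0x0012D000, 0x00001000, "guard page"),
    (0x0012E000, 0x00002000, "committed"),
]

# Each region should end exactly where the next one begins.
for (base, size, _), (next_base, _, _) in zip(regions, regions[1:]):
    assert base + size == next_base

stack_low = regions[0][0]                     # lowest reserved address
stack_high = regions[-1][0] + regions[-1][1]  # base of the stack
total = stack_high - stack_low

print(hex(stack_high))       # 0x130000
print(total // 1024)         # 1024 (KB), i.e. the 1MB reserve
print(regions[2][1] // 1024) # 8 (KB) committed at launch
```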
The way the stack works is that your application will allocate stack frames, and eventually you run out of the 8k of committed memory. At this point, you hit (2), which you will notice is marked as read/write as well as guarded. That means that your attempt to read or write to that memory will cause a guard page fault that the OS traps. When this happens, the OS will remove the guard flag from (2), merging it into the committed section (3), and allow the read/write operation to continue. Then it will set up another guard page, carved from the top of the reserved section (1), so the process can repeat. So here's what the stack would look like after we hit the guard page:
(1)
BaseAddress: 00030000
RegionSize: 000fc000 <--- fd changed to fc
State: 00002000 MEM_RESERVE
Type: 00020000 MEM_PRIVATE
(2)
BaseAddress: 0012c000 <---- 12d changed to 12c
RegionSize: 00001000
State: 00001000 MEM_COMMIT
Protect: 00000104 PAGE_READWRITE + PAGE_GUARD
Type: 00020000 MEM_PRIVATE
(3)
BaseAddress: 0012d000 <--- 12e changed to 12d
RegionSize: 00003000 <--- 2000 changed to 3000
State: 00001000 MEM_COMMIT
Protect: 00000004 PAGE_READWRITE
Type: 00020000 MEM_PRIVATE
I've marked the areas that have changed, but basically the difference is that we've now got a larger committed stack area. We can repeat this process until we reach the memory referenced by (1), which is the point at which you've overflowed your stack (with the assumption that the OS cannot grab more space than what you've reserved).
So to recap what I've covered so far: your stack starts at a base address, and it grows by hitting guard pages. Once a guard page faults, the OS will commit that page, and set a new guard up. In this fashion, your stack is able to grow until it reaches the reserve limit. At that point, the OS can no longer guarantee that it will be able to grow the stack (it's possible that it can, but that's situational and shouldn't really be relied upon), and you'll get an OS stack overflow exception.
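To make the recap concrete, here's a toy model of the grow-by-guard-page scheme. It's a sketch only: the class and its names are invented for illustration, it assumes 4k pages, and it says nothing about how Windows actually implements any of this.

```python
PAGE = 0x1000  # 4k pages


class ToyStack:
    """Toy model of one thread's stack; not how Windows implements it."""

    def __init__(self, low, high, committed):
        self.low = low                           # bottom of the reserved range
        self.high = high                         # base of the stack (highest address)
        self.commit_low = high - committed       # lowest committed address
        self.guard_low = self.commit_low - PAGE  # guard page sits just below

    def touch(self, addr):
        """Simulate a read or write at addr and report what would happen."""
        if not self.low <= addr < self.high:
            return "outside stack"
        if addr >= self.commit_low:
            return "ok"
        if addr >= self.guard_low:
            if self.guard_low - PAGE < self.low:
                return "stack overflow"  # no room left for another guard page
            # Guard page fault: commit this page, set up a new guard below it.
            self.commit_low = self.guard_low
            self.guard_low -= PAGE
            return "grew"
        return "access violation"  # reserved memory beyond the guard page


stack = ToyStack(low=0x30000, high=0x130000, committed=0x2000)
print(stack.touch(0x12E000))  # ok
print(stack.touch(0x12DFFC))  # grew
print(hex(stack.guard_low))   # 0x12c000
```

The starting values mirror the first dump, and after one guard page hit the guard has moved down to 0012c000, as in the second dump.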
Now I'd like to switch gears a little bit and look at how the Windows stack model runs into problems with REALbasic applications. This isn't a problem specific to REALbasic as it can happen to any application running on Windows. But since I'm the compiler guy, I'll stick with my product. ;-)
Imagine what happens in this scenario: the current location on the stack is somewhere relatively near the guard page boundary, and the next instruction is a method call which has large stack requirements. If the stack requirements are sufficiently large, you can actually skip over the guard page (2) and into the reserved, uncommitted region (1). Let's say you're 100 bytes away from the guard page boundary before the call. That means if your new method's stack requirements are 4k + some change, then you'll be allocating space beyond the guard page. If you end up touching the stack in the guard page first, then everything behaves properly and the new page is set up. However, if you touch space beyond the guard page first, then kaboom!
What this effectively means is that you have a 4k limit of how much stack space your frame can take up if you want to always be assured that the stack will grow (up to the reserve limit). Anything more than 4k and you're at the mercy of the code generator. This is why your REALbasic applications are able to "run out of stack space" without actually firing a StackOverflowException. You're not out of stack space -- there's plenty of room to grow into. You just are using such a large frame that it ends up jumping over the OS' mechanism for growing the stack, and that's why you crash.
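The "4k + some change" arithmetic can be written down directly. This is a minimal sketch assuming a single 4k guard page; the function and parameter names are hypothetical.

```python
PAGE = 0x1000  # size of the single guard page


def frame_can_skip_guard(bytes_above_guard, frame_size, page=PAGE):
    """True if a frame of frame_size bytes reaches past the guard page.

    bytes_above_guard is how much committed stack lies between the current
    stack pointer and the top of the guard page (an illustrative name).
    """
    return frame_size > bytes_above_guard + page


print(frame_can_skip_guard(100, 4 * 1024))        # False
print(frame_can_skip_guard(100, 4 * 1024 + 101))  # True
print(frame_can_skip_guard(100, 28 * 1024))       # True
```

A True result only means part of the frame lies beyond the guard page. Whether the application actually crashes still depends on which part of the frame gets touched first.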
Now the question becomes: how can this be dealt with? There are a few solutions available, each with their own pros and cons. In no particular order:
We could just make it a backend error for the user to have more than 4k of space for a given stack frame. This is very heavy-handed, but it'd work. The project I was looking into which demonstrated the issue used 28k for a single method, so 4k seems quite paltry. But by adding an optimizer harness, we could make better use of temps, which should reduce the stack requirements by a large amount, so it could be reasonable.
We could make the commit size of the stack be the same as the reserve size of the stack. Now there's no longer a guard page in there: we use the entire stack. This is pretty darned easy to do, and solves all of the problems I've detailed. The RB stack checker will continue to work, you won't ever crash without hitting it first. However, this also increases the memory footprint of your application. Instead of taking up 8k of stack space on launch, it now takes up 1MB of stack space.
We could try to make the code generator smarter in terms of how it accesses data on the stack. This sounds like it's a great idea at first blush (best of all worlds), but it turns out to be a very fragile thing to do. When working on the codegen, no one thinks "I wonder what order this will access things on the stack because it could be a problem." It's just not one of those things anyone thinks about up front. What's more, there's really no easy way to enforce this via code -- so it's pretty much up to the coder to read comments. So the solution is actually very fragile.
On Vista and up, there's a Win32 API (SetThreadStackGuarantee) which allows you to set the guard page size to an arbitrary value instead of forcing it to be 4k. This allows us to set it to an arbitrary size, which we can then combine with the backend error I've already discussed. For instance, we could go with the PPC "method too large" amount of 32k, which is more than enough stack space for a sane method. So if your method requires > 32k, it's an error. But if it's <= 32k, then we can assure you it will still never crash because it'll always fall into the guard page. The downside is that this API is only available on newer OSes, so what happens for older OSes? Since you don't compile an app for a particular OS, there's no compile error we can make. It's possible we could turn it into a warning to have >4k of stack space, but that's pretty lame.
So there are a few ways to tackle the problem, but all of them have downsides. Personally, I'm leaning towards the last solution. It's not ideal, but it also doesn't penalize people who have "reasonable" stack frame requirements in their application (which is the majority of our users). And, since it's a solution going forward, it means that the problems will eventually go away entirely because at some point, all OSes will support the API. But I've certainly not made any sort of finalized decisions on the topic.
Hopefully this gives you a somewhat better idea of how stacks work on Windows, as well as how this might affect your applications. For the vast majority of people, this problem will probably never crop up. But if you do run into stack-related crashes in your application, then you might want to check out my previous post on how to reduce your risks.
After talking about it some with a very smart compiler guy, we came up with the crazy idea of touching the pages we expect our stack frame to use before actually allocating the stack space. If we do this in 4k chunks, then we can be assured that we'll hit the OS guard page, which will grow the stack as needed. So, instead of doing:
sub esp, someHugeValue
We'd do:
mov eax,dword ptr [esp-1000h]
mov eax,dword ptr [esp-2000h]
...
mov eax,dword ptr [esp-X000h]
sub esp, someHugeValue
The mov instructions are just harmless ways to read the memory at those pages, thereby ensuring that they've been committed. The sub actually takes care of allocating the space (in an interrupt-safe manner).
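To show which offsets would get touched for a given frame size, here's a small sketch. It assumes one probe per whole 4k page of the frame; the function name is made up, and this is not the actual REALbasic code generator.

```python
PAGE = 0x1000  # probe stride: one 4k page


def probe_offsets(frame_size, page=PAGE):
    """Offsets below esp to touch, one per whole page of the frame."""
    return [page * k for k in range(1, frame_size // page + 1)]


# The 28k frame mentioned earlier needs seven probes, 1000h through 7000h.
print([hex(off) for off in probe_offsets(28 * 1024)])
```

Any leftover of less than one page after the last probe should land either in a page that's already committed or, at worst, in the fresh guard page, so the OS still gets its chance to grow the stack.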
This increases application size slightly, but not by enough to worry me. And it incurs no penalty for anyone while working on all versions of Windows. Whee!
I think SetThreadStackGuarantee works for 64 bit XP as well but I'm not sure (that was the box I was trying it on I think). I'm also pretty sure it doesn't work for 32 bit XP so it doesn't help for that vast install base like you said. I hadn't had 100% success with it anyway when I tried. Turns out what I thought was stack related in that app was a completely different bug-a-boo, but anyway....
Hmmm .... your Random Smart Compiler Guy is indeed smart. I'd listen to That Guy. :)
Hi Aaron, I'm lost with the stack background provided, sorry, but if the "crazy idea" would fix the IDE crashes on Windows* some users encounter (me too), I think we will all applaud! ;-)
Best regards
Thomas
* even with an empty project debug-run cycles are getting slower and slower and the IDE crashes silently after 2-5 "cycles".
@Travis -- SetThreadStackGuarantee is said to work in XP 64/Vista and up on the client, and Server 2008/2003 SP1 on the server. So you're right, it doesn't work on 32-bit XP, or Win2k. And yes, it's nice having other compiler people I can turn to when I need to bounce ideas off someone who'll understand them. :-)
Good article. I was going to suggest that you hit each page as you go; glad to see that you already deduced that this is a good solution.
I would make some additions to your (excellent) summary of how win32 stack works. You are a bit vague about what happens when the guard page is hit when you are on the verge of running out of stack, and I think that is probably relevant to your situation.
If my memory from The Before Time (that is, when I wrote the stack handling code for VBScript) is still good, I think it works like this: there are three relevant guarded pages.
There's the one that is just above the top of your stack. When that one is hit under "normal" circumstances, it becomes the new top of the stack, and the page beyond it becomes the new "top of stack guard".
There's the one that is one page away from the absolute limit of the stack. If the "regular" top of stack page and that page are in fact the same page, then I believe that page gets faulted in AND you get an "out of stack" exception.
Then there is the page that is really at the very top limit of the stack. If THAT page is ever hit, the world ends. The OS immediately terminates the process on the assumption that some code has blown right past the guard page, eaten all the exceptions, and is hell bent on writing beyond the stack. The safest thing to do is to terminate everything and not give the hostile/buggy code a chance to continue to read or write past the stack into who knows what region of memory.
It's therefore dangerous to assume that an "out of stack" exception will always be thrown when you're getting close to the danger region. If you've hit the second-to-last guard page before and it has thrown, and you've caught, and failed to reset the second-to-last guard page back into its pristine state, then there is nothing stopping anyone from continuing to allocate stack on into the instantaneous death zone.
What I did for VBScript is we stash away the location of the top of the stack, and when we do a method call, we check to see whether there is a "reasonable" number of pages left BEFORE the second-to-last-guard page. If there is not, then we prematurely trigger a VBScript "out of stack" error rather than waiting for the OS to throw a real exception that we would then have to clean up. This is a bit more expensive, of course, but it is a lot safer.
If this happens, we also have heuristics where we do our best to unwind the stack back a few frames before calling into the host (that is, Internet Explorer or ASP typically) to tell the host "hey, the script engine just ate up all the stack on this thread". We've found that when we just called the host immediately, the host would then chew up all the stack in order to run its error display code, and BOOM, either an exception (best case) or the guard pages are sufficiently screwed up that the instant death page is hit. (This is why so often you saw IE 3 and 4 just suddenly disappear when it was running a script that happened to have an unbounded recursion.)
@Eric -- thanks for the thoughtful response! You're right about the way the frames are set up. There usually are a few different guard pages for different purposes (I don't think I really made that clear).
We handle the REALbasic out-of-stack-space problem by percentage. It's a bit skeezy, I'll admit, but it will always work. We know the default stack size is 1MB, and so if you only have 102.4k left of stack space, we'll throw the StackOverflowException. If 100k of stack space isn't enough for you to handle that exception (which generally just involves tossing up an unhandled exception dialog), then you'll explode. But we've not found anyone to have a problem with that, since 100k of stack is pretty frickin huge.
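For the curious, the 102.4k figure works out to 10% of the 1MB (1024k) default. A rough sketch of that kind of check follows. The 10% fraction is inferred from those two numbers, and the names are invented; this is not the actual REALbasic runtime code.

```python
STACK_SIZE = 1024 * 1024  # the default 1MB stack
RESERVE_FRACTION = 0.10   # assumption: 102.4k is 10% of 1024k


def should_raise_stack_overflow(bytes_left, stack_size=STACK_SIZE):
    """True once the remaining stack is at or below the reserved fraction."""
    return bytes_left <= stack_size * RESERVE_FRACTION


print(STACK_SIZE * RESERVE_FRACTION / 1024)      # 102.4
print(should_raise_stack_overflow(100 * 1024))   # True
print(should_raise_stack_overflow(200 * 1024))   # False
```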
I'm curious how other Windows apps typically handle this? It's an issue that faces every Windows app... What's considered standard best practice for dealing with stack frame allocation? This must be something that has a well specified policy for how windows apps are supposed to behave, after all virtually every app needs to grow its stack space in a safe manner, right?
@Joe -- most applications don't grow their stack frames by obscene amounts. The usual behavior is to crash, but it doesn't come up terribly frequently because you have to work harder at it.
Aaron, but doesn't Windows have a correct way to safely grow an app's stack? I can't believe they just leave it up to chance and then just crash if a particular stack frame is larger than their guard frame. Or is that really the standard expectation of Windows apps?
I'm not a big Windows fan, but I give them credit for being better than leaving something like this undefined... If Windows provides those guard pages there must be a prescribed way that apps are supposed to interact with them...
Hi Aaron
I'm for any of the alternatives that you mentioned that is deterministic and avoids a silent crash. Even if it means a warning when a stack frame exceeds 4kB. Compiling without showing any warnings and then potentially crashing silently just isn't a good situation to allow.
@Joe -- the whole blog posting describes how Windows handles stack growing. ;-) The basic idea is that it hits the guard pages which allows the OS to grow the stack. The unusual bit is when your application grows the stack by > 4k in a single frame. That's a fairly hefty requirement for a single frame!
The application never has to do anything special to grow the stack under normal usage. If anything, the application merely needs to tell the linker to increase the stack reserve.
If Windows only gracefully handles stack frames of 4K or less then that's the constraint that RB needs to enforce on our Windows apps.
The key point I'm trying to make is that there must be some sort of deterministic stack behavior on windows and we should be following that. It still baffles me why there would be any discussion of these strategies for growing the stack. Let's just do what Windows officially supports, and there's gotta be some clear best practices for such a critical aspect of OS support.
This can't be new ground, so why are we working on coming up with new solutions?
@Joe -- I think you might be misunderstanding. There is a deterministic stack behavior on Windows. The purpose of this blog posting was to explain what that behavior is, and why REALbasic applications can crash due to it. Sorry if I've not been clear enough before, but really, there's nothing to "discuss" on the topic. I presented factual information about how the stack actually works and mentioned some ideas of how to play nice with the stack.
Seems to me that Eric provided a very good approach to growing the stack by more than 4 KB. So I was wondering why a solution to such a fundamental question would be discovered by a blog posting vs in standard Windows documentation. And Eric also pointed out some risks to that approach.
So I'm still amazed that reliably growing the stack on Windows by more than 4KB is considered an unsolved problem. I guess that explains some of the stability problems we see from Windows apps, even beyond those built by RB.
@Joe -- yes, Eric's suggestion was a reasonable one. However, you're still missing several points, judging by your last comment.
1) If you need more than 4k of stack, you're doing it wrong. Either you're misusing _alloca, or you've got 8000 local variables. Yes, I realize this means REALbasic is doing it wrong. I agree. We require better temporary variable management in the compiler which will greatly alleviate the issue for REALbasic users.
2) If you need more than 4k of stack, you can use a Win32 API to do it. The only thing preventing REALbasic from using it is that the API exists only on newer OSes. That API is SetThreadStackGuarantee.
3) If you don't want to use the newer API, you can change the fields in the PE32 header for reserve and commit size. Set the reserve to however much you need, and set the commit to the same value. Voila, you no longer run into the issue. Heck, you don't even need access to linker flags, you can just use editbin.exe to do it.
4) Your comment about stacks being a stability issue for Windows apps makes me wonder if you're trolling. Do you think you can't do something similar on OS X or Linux? Every OS has its own way to do stack management, and every one of them has a way to crash because of it. I was simply explaining how Windows works.