Does the OS reserve a fixed amount of valid virtual address space for the stack, or something else? Can I produce a stack overflow just by using big local variables?
I've written a small C program to test my assumption. It's running on x86-64 CentOS 6.5.
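#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    int n = 10240 * 1024;
    char a[n];              /* variable-length array on the stack */

    memset(a, 'x', n);      /* touch every byte of the array */
    /* print the low 32 bits of each address */
    printf("%x\n%x\n", (unsigned)(uintptr_t)&a[0], (unsigned)(uintptr_t)&a[n-1]);
    getchar();              /* pause so that /proc/<pid>/maps can be read */
    return 0;
}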
Running the program gives &a[0] = f0ceabe0 and &a[n-1] = f16eabdf.
/proc/<pid>/maps shows the stack: 7ffff0cea000-7ffff16ec000 (10248 KiB).
Then I increased n to 11240 * 1024.
Running the program gives &a[0] = b6b36690 and &a[n-1] = b763068f.
/proc/<pid>/maps shows the stack: 7fffb6b35000-7fffb7633000 (11256 KiB).
ulimit -s prints 10240 on my PC.
As you can see, in both cases the stack is bigger than what ulimit -s reports, and it grows with a bigger local variable. The low end of the stack mapping is also somehow 3-5 kB below &a[0] (AFAIK the red zone is only 128 B).
So how does this stack map get allocated?
3 Answers
It appears that the memory up to the stack limit is not allocated (in any case, it couldn't be with an unlimited stack). https://www.kernel.org/doc/Documentation/vm/overcommit-accounting says:
The C language stack growth does an implicit mremap. If you want absolute guarantees and run close to the edge you MUST mmap your stack for the largest size you think you will need. For typical stack usage this does not matter much but it's a corner case if you really really care
However, mmapping the stack would be the job of the compiler (if it has an option for that).
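As an illustration of what "mmap your stack" can look like from application code, here is a minimal sketch that maps a region of the worst-case size up front and runs the stack-hungry work in a thread placed on it. The names `worker` and `run_on_mapped_stack` and the two sizes are hypothetical, chosen only for this example; it assumes Linux with POSIX threads and does not change the main thread's stack.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical sizes for illustration: map 16 MiB, use an 8 MiB local. */
#define STACK_BYTES (16UL * 1024 * 1024)
#define FRAME_BYTES (8UL * 1024 * 1024)

/* Needs a big local buffer; it runs entirely on the mmap'd region. */
static void *worker(void *arg)
{
    char buf[FRAME_BYTES];
    volatile char *p = buf;   /* volatile so the writes are not elided */
    size_t i;

    for (i = 0; i < sizeof buf; i++)
        p[i] = 'x';
    *(int *)arg = (p[0] == 'x' && p[sizeof buf - 1] == 'x');
    return NULL;
}

/* Returns 1 if the worker ran on the pre-mapped stack, 0 on any error. */
int run_on_mapped_stack(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    int ok = 0;
    void *stack;

    assert(FRAME_BYTES < STACK_BYTES);

    /* Reserve the whole range now, so no later growth is needed. */
    stack = mmap(NULL, STACK_BYTES, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return 0;

    if (pthread_attr_init(&attr) == 0) {
        if (pthread_attr_setstack(&attr, stack, STACK_BYTES) == 0 &&
            pthread_create(&tid, &attr, worker, &ok) == 0)
            pthread_join(tid, NULL);
        pthread_attr_destroy(&attr);
    }
    munmap(stack, STACK_BYTES);
    return ok;
}
```

Calling `run_on_mapped_stack()` from `main` is enough to try it; on older glibc versions the program has to be built with `-pthread`. With a caller-supplied stack there is no guard page unless you add one yourself, so an overflow of this region would go undetected.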
EDIT: After some tests on an x86_64 Debian machine, I've found that the stack grows without any system call (according to strace). So this means that the kernel grows it automatically (this is what "implicit" means above), i.e. without an explicit mmap/mremap from the process.
It was quite hard to find detailed information confirming this. I recommend Understanding The Linux Virtual Memory Manager by Mel Gorman. I suppose that the answer is in Section 4.6.1 Handling a Page Fault, with the exception "Region not valid but is beside an expandable region like the stack" and the corresponding action "Expand the region and allocate a page". See also D.5.2 Expanding the Stack.
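To watch this growth from user space, a small sketch along the following lines compares the size of the [stack] line in /proc/self/maps before and after touching a large local buffer. The helper names are hypothetical; the sketch assumes Linux with /proc mounted and that it runs on the main thread, and the 4096-byte stride is simply a value no larger than the page size on common setups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Size in KiB of the [stack] mapping in /proc/self/maps, or -1 on failure. */
long stack_mapping_kib(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];
    long kib = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        unsigned long lo, hi;
        if (strstr(line, "[stack]") &&
            sscanf(line, "%lx-%lx", &lo, &hi) == 2) {
            kib = (long)((hi - lo) / 1024);
            break;
        }
    }
    fclose(f);
    return kib;
}

/* Write into a `bytes`-sized local buffer, one byte every 4096 bytes. */
void touch_stack(size_t bytes)
{
    char buf[bytes];          /* variable-length array on the stack */
    volatile char *p = buf;   /* volatile so the writes are not elided */
    size_t i;

    for (i = 0; i < bytes; i += 4096)
        p[i] = 'x';
    p[bytes - 1] = 'x';
}

/* Growth of the [stack] mapping in KiB caused by touching `bytes` bytes,
   or -1 if the mapping could not be read. */
long stack_growth_kib(size_t bytes)
{
    long before, after;

    assert(bytes > 0);
    before = stack_mapping_kib();
    touch_stack(bytes);
    after = stack_mapping_kib();
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

Printing `stack_growth_kib(1024 * 1024)` from `main` and running the program under strace should show the mapping getting larger with no mmap or mremap call in between, which is the behaviour described above.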
Other references about Linux memory management (but with almost nothing about the stack):
- Memory FAQ
- What every programmer should know about memory by Ulrich Drepper
EDIT 2: This implementation has a drawback: in corner cases, a stack-heap collision may not be detected, even when the stack would be larger than the limit! The reason is that a write to a variable on the stack may end up in allocated heap memory, in which case there is no page fault and the kernel cannot know that the stack needed to be extended. See my example in the discussion "Silent stack-heap collision under GNU/Linux" I started on the gcc-help list. To avoid that, the compiler needs to add some code at each function call; this can be done with -fstack-check for GCC (see Ian Lance Taylor's reply and the GCC man page for details).
- That seems the correct answer to my question. But it confuses me more. When will the mremap call get triggered? Will it be a syscall built into the program? – Amos, Jul 20, 2014 at 11:19
- @amos I assume that the mremap call will be triggered if need be at a function call or when alloca() is called. – vinc17, Jul 20, 2014 at 11:22
- It would probably be a good idea to mention what mmap is, for people who don't know. – Faheem Mitha, Jul 20, 2014 at 13:35
- @FaheemMitha I've added some information. For those who don't know what mmap is, see the memory FAQ mentioned above. Here, for the stack, it would have been "anonymous mapping" so that unused space wouldn't take any physical memory, but as explained by Mel Gorman, the kernel does the mapping (virtual memory) and the physical allocation at the same time. – vinc17, Jul 20, 2014 at 14:42
- @max I've tried the OP's program with ulimit -s giving 10240, like under the OP's conditions, and I get a SIGSEGV as expected (this is what is required by POSIX: "If this limit is exceeded, SIGSEGV shall be generated for the thread."). I suspect a bug in the OP's kernel. – vinc17, Feb 21, 2017 at 16:35
Linux kernel 4.2
- mm/mmap.c#acct_stack_growth decides if it will segfault or not. It uses rlim[RLIMIT_STACK], which corresponds to the POSIX getrlimit(RLIMIT_STACK)
- arch/x86/mm/fault.c#do_page_fault is the interrupt handler that starts a chain which ends up calling acct_stack_growth
- arch/x86/entry/entry_64.S sets up the page fault handler. You need to know a bit about paging to understand that part: How does x86 paging work? | Stack Overflow
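The value that acct_stack_growth compares against can be read from user space with getrlimit(RLIMIT_STACK). The following is a minimal sketch; `stack_limits` and `print_limit` are hypothetical helper names, and as far as I can tell it is the soft limit (rlim_cur) that the kernel checks when growing the stack.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Fetch the soft and hard stack limits in bytes; returns 0 on success. */
int stack_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    assert(soft != NULL && hard != NULL);
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;   /* what `ulimit -s` (or -S -s) reports */
    *hard = rl.rlim_max;   /* what `ulimit -H -s` reports */
    return 0;
}

/* Print a limit in KiB, or "unlimited" for RLIM_INFINITY. */
void print_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s: unlimited\n", label);
    else
        printf("%s: %llu KiB\n", label, (unsigned long long)(value / 1024));
}
```

A `main` that calls `stack_limits(&soft, &hard)` and then `print_limit` on each value gives the same numbers as the shell's ulimit, except that getrlimit works in bytes while ulimit -s works in KiB.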
Minimal test program
We can then test it with a minimal NASM 64-bit program:
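global _start
_start:
    sub rsp, 0x7FF000   ; move rsp down by 8 MiB minus one 4 KiB page
    mov [rsp], rax      ; touch that address so the stack has to grow
    mov rax, 60         ; exit system call number
    mov rdi, 0          ; exit status 0
    syscall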
Make sure that you turn off ASLR and remove environment variables as those will go on the stack and take up space:
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
env -i ./main.out
The limit is somewhere slightly below my ulimit -s (8 MiB for me). Looks like this is because of extra System V specified data initially put on the stack in addition to the environment: Linux 64 command line parameters in Assembly | Stack Overflow
If you are serious about this, TODO make a minimal initrd image that starts writing from the stack top and goes down, and then run it with QEMU + GDB. Put a dprintf on the loop printing the stack address, and a breakpoint at acct_stack_growth. It will be glorious.
Related:
- https://softwareengineering.stackexchange.com/questions/207386/how-are-the-size-of-the-stack-and-heap-limited-by-the-os
- Where is the stack memory allocated from for a Linux process? | Stack Overflow
- What is the Linux Stack? | Stack Overflow
- What is the maximum recursion depth in Python, and how to increase it? | Stack Overflow
By default, the maximal stack size is configured to be 8 MB per process, but it can be changed using ulimit:
Showing the default in kB:
$ ulimit -s
8192
Set to unlimited:
ulimit -s unlimited
This affects the current shell, its subshells, and their child processes (ulimit is a shell builtin command).
On Linux, you can show the actual stack address range in use with:
cat /proc/$PID/maps | grep -F '[stack]'
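The same limit can be changed from inside a program with setrlimit, and child processes inherit it just as they do when the shell sets it. The sketch below is only an illustration of that inheritance: `child_inherits_stack_limit` is a hypothetical name, the limit is given in bytes (ulimit -s uses KiB), and the caller's own soft limit is restored afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set the soft stack limit to new_soft bytes, fork, and check that the
   child sees the same soft limit. Returns 1 if it does, 0 if it does not,
   -1 on error. The caller's previous limit is restored before returning. */
int child_inherits_stack_limit(rlim_t new_soft)
{
    struct rlimit old, rl;
    int status, result = -1;
    pid_t pid;

    if (getrlimit(RLIMIT_STACK, &old) != 0)
        return -1;
    assert(new_soft <= old.rlim_max);   /* soft cannot exceed hard */

    rl = old;
    rl.rlim_cur = new_soft;
    if (setrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;

    pid = fork();
    if (pid == 0) {
        /* Child: report through the exit status what it observes. */
        struct rlimit seen;
        int same = getrlimit(RLIMIT_STACK, &seen) == 0 &&
                   seen.rlim_cur == new_soft;
        _exit(same ? 0 : 1);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = (WEXITSTATUS(status) == 0);

    setrlimit(RLIMIT_STACK, &old);      /* put the old soft limit back */
    return result;
}
```

For example, `child_inherits_stack_limit(4UL * 1024 * 1024)` temporarily sets a 4 MiB soft limit, which corresponds to running ulimit -s 4096 in a shell before starting the child.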
- So when a program is loaded by the current shell, the OS will make a memory segment of ulimit -s KB valid for the program. In my case it's 10240 KB. But when I declare a local array char a[10240*1024] and set a[0]=1, the program exits correctly. Why? – Amos, Jul 20, 2014 at 10:27
- Try to set the last element too. And make sure that they are not optimized away. – vinc17, Jul 20, 2014 at 10:34
- I.e. the limit on stack space is not enforced. It's just a caveat: go beyond this and risk trouble. – goldilocks, Jul 20, 2014 at 11:10
- @amos So, as you can see, a[] has not been allocated in your 10 MB stack. The compiler might have seen that there couldn't be a recursive call and has done special allocation, or something else like a discontinuous stack or some indirection. – vinc17, Jul 20, 2014 at 12:20
- -s only gives the soft stack limit; you need -Hs to get the hard limit. (-s -H is also ok, but not -sH, as -s can be used to set a limit and H is not a valid value.) On my Debian Bullseye, -S -s gives 8192 KiB but -H -s gives "unlimited". From the bash manual page: "A hard limit cannot be increased by a non-root user once it is set; a soft limit may be increased up to the value of the hard limit. ... If limit is omitted, the current value of the soft limit of the resource is printed, unless the -H option is given." – Max Power, Mar 16, 2022 at 18:39