AHdark

AHdark Blog

Senior high school student with a deep passion for coding. Driven by a love for problem-solving, I’m diving into algorithms while honing my skills in TypeScript, Rust, and Golang.

Golang PIE Compilation Test

Go 1.6 [1] introduced the PIE (Position Independent Executable) build mode [2]. In Go 1.15, it became the default build mode on Windows [3]. This mode produces a binary that can be loaded at any address in memory at run time, rather than at a fixed address.

About PIE Compilation

In the past decade, major operating systems and distributions (such as macOS, iOS, Android, FreeBSD, Ubuntu, Debian, Fedora) have either enabled or begun to support PIE by default. The focus on memory safety issues has led to the gradual phasing out of non-PIE binary files.

Readers well-versed in this field may question the relevance of PIE. Doesn't Go's managed memory model already prevent the class of bugs that ASLR makes harder to exploit? To some extent, this is true: if you build without CGO, it is not easy to write code with such bugs. However, once you use CGO, you are linking C code. C code does not have the same protection mechanisms as Go, and its memory bugs have always been a major pain point: even the strongest developers cannot guarantee they will never make mistakes. Memory safety therefore remains a concern, and the PIE build mode helps mitigate it.

Loaded into Any Memory Location

In traditional executable files, the program's code and data typically have a fixed address space layout. This means that every time the program runs, its code and data are loaded into the same memory addresses. This method is simple and effective, but it is also vulnerable to certain types of security attacks, such as buffer overflow attacks, because attackers can predict the memory addresses of the code and data. In contrast, position-independent executables (such as files generated with -buildmode=pie) are designed to dynamically select memory addresses at load time. The operating system can choose different memory locations to load the program's code and data each time the program is run. This method enhances security because it increases the difficulty for attackers to predict the program's memory layout. However, it may also incur additional performance overhead, as the program needs to reference memory addresses indirectly.

Traditional Memory Loading Method (Non-PIE)

Suppose we have a simple program that contains some functions and data. In traditional non-PIE mode, this program might be designed to always load at the same memory address. For example:

  • The program's starting point is always at memory address 0x1000.
  • An important data structure is always at 0x2000.

When this program runs, it will always be loaded at these specific memory addresses, regardless of when or where it runs. This makes the program simple and efficient, but also vulnerable to security attacks, as attackers can exploit this determinism to perform malicious actions.

Memory Loading Method in PIE Mode

Now, suppose the same program is compiled as PIE. In this case, the program's loading address is no longer fixed:

  • The program may be loaded at memory address 0x5000 the first time it runs, and at 0x9000 the next time.
  • Similarly, that important data structure will also change with the program's loading address, for example, being at 0x6000 the first time it runs and at 0xA000 the next time.

Each time the program starts, the operating system selects a new random address to load the program and its data. This means that attackers cannot know in advance at which memory address the program will run, greatly increasing the difficulty of exploiting memory addresses in security attacks.

Because a PIE binary can be loaded anywhere in memory, attackers can no longer rely on known code addresses. This does not remove memory-corruption bugs such as buffer overflows, but it makes them considerably harder to exploit.
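The effect described above can be observed directly. The following is a minimal sketch, not part of the test repository: the names importantData, codeAddr, and dataAddr are made up for illustration. It prints the address of one function and one global variable, plus the distance between them.

```go
package main

import (
	"fmt"
	"reflect"
)

// importantData stands in for the "important data structure" from the
// example above (hypothetical name, for illustration only).
var importantData = [4]int{1, 2, 3, 4}

// codeAddr returns the address of main.main's machine code.
func codeAddr() uintptr { return reflect.ValueOf(main).Pointer() }

// dataAddr returns the address of the global importantData.
func dataAddr() uintptr { return reflect.ValueOf(&importantData).Pointer() }

func main() {
	fmt.Printf("code (main.main):     %#x\n", codeAddr())
	fmt.Printf("data (importantData): %#x\n", dataAddr())
	// The whole image is relocated as a unit, so this distance is expected
	// to stay constant even when the absolute addresses change.
	fmt.Printf("data - code:          %d\n", int64(dataAddr())-int64(codeAddr()))
}
```

Built with go build -buildmode=exe, the two addresses should be identical on every run. Built with go build -buildmode=pie on a system with ASLR enabled, the absolute addresses should change between runs while the distance between them stays the same.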

Performance Overhead of PIE Compilation Mode

Position-independent code can be copied to any location in memory without modification. This is different from relocated code, which requires special handling by the linker or loader to determine the appropriate runtime memory address. Position-independent code must follow a specific set of semantics at the source code level and requires compiler support. Instructions that reference absolute memory addresses (such as absolute jump instructions) must be replaced with PC-relative addressing instructions. These indirect handling processes may lead to a decrease in the runtime efficiency of PIC, but most processors currently have good support for PIC, making this slight decrease in efficiency negligible.

—— Translated from Wikipedia

But how much performance and memory overhead does PIE actually incur? This article will test the PIE compilation mode to verify this claim.

Testing PIE

Since Go's PIE build mode is not enabled by default on linux/amd64 and linux/arm64, this article will conduct tests on the linux/amd64 architecture. All test code is located in the repository: https://github.com/AH-dark/go-pie-comparation.

Testing Environment

Since I use a Mac as my development device, the tests run in a GitHub Codespaces environment instead.

  • CPU: AMD EPYC 7763 64-Core Processor × 2
  • Memory: 8GB
  • OS: Ubuntu 20.04.2 LTS
  • Go: 1.21.5 linux/amd64
  • GCC: 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.2)

Performance Testing

We first compile the application in both PIE and non-PIE modes, then run each binary under the time command to measure how long it takes to complete all tasks.

package main

import (
	"fmt"
	"math"
	"sync"
)

var wg = sync.WaitGroup{}

func compute(start, end int) {
	defer wg.Done()
	var result float64
	for i := start; i < end; i++ {
		num := math.Sqrt(float64(i)) * math.Sin(float64(i)) * math.Cos(float64(i))
		result += num
	}
}

func doCompute() {
	const numWorkers = 4
	const numElements = 25000000

	// Concurrent mathematical computation
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go compute(i*numElements/numWorkers, (i+1)*numElements/numWorkers)
	}
}

func writeMemory() {
	// Memory operations
	data := make(map[int]int)
	for i := 0; i < 10000000; i++ {
		data[i] = i ^ 0xff00
	}
}

func main() {
	doCompute()
	writeMemory()

	wg.Wait()

	fmt.Println("Test completed")
}

We use a pre-configured makefile for compilation and testing.

make build

# Time test
make bench_time

The results of the two sets of tests are as follows:

Benchmarking time for non-PIE executable
Test completed
2.62user 0.48system 0:02.78elapsed 111%CPU (0avgtext+0avgdata 638976maxresident)k
0inputs+0outputs (0major+159478minor)pagefaults 0swaps

Benchmarking time for PIE executable
Test completed
2.70user 0.45system 0:02.68elapsed 117%CPU (0avgtext+0avgdata 638836maxresident)k
0inputs+0outputs (0major+158447minor)pagefaults 0swaps

Benchmarking time for non-PIE executable
Test completed
2.61user 0.50system 0:02.71elapsed 115%CPU (0avgtext+0avgdata 639012maxresident)k
0inputs+0outputs (0major+159475minor)pagefaults 0swaps

Benchmarking time for PIE executable
Test completed
2.54user 0.48system 0:02.59elapsed 116%CPU (0avgtext+0avgdata 638356maxresident)k
0inputs+0outputs (0major+159312minor)pagefaults 0swaps

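The differences between the two builds are small and not consistently in one direction. In wall-clock time, the PIE binary finished slightly faster in both runs (2.68 s vs 2.78 s, and 2.59 s vs 2.71 s), while its user CPU time was higher in the first run (2.70 s vs 2.62 s) and lower in the second (2.54 s vs 2.61 s). In other words, the overhead caused by PIE is very low here. For large applications it may be more significant, but for most applications it is negligible.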

Memory Usage Testing

We add the net/http package to the program to facilitate memory usage testing using net/http/pprof.

package main

import (
    "fmt"
    "math"
    "net/http"
    _ "net/http/pprof"
    "sync"
)

var wg = sync.WaitGroup{}

func compute(start, end int) {
    defer wg.Done()
    var result float64
    for i := start; i < end; i++ {
        num := math.Sqrt(float64(i)) * math.Sin(float64(i)) * math.Cos(float64(i))
        result += num
    }
}

func doCompute() {
    const numWorkers = 4
    const numElements = 25000000

    // Concurrent mathematical computation
    for i := 0; i < numWorkers; i++ {
        wg.Add(1)
        go compute(i*numElements/numWorkers, (i+1)*numElements/numWorkers)
    }
}

func writeMemory() {
    // Memory operations
    data := make(map[int]int)
    for i := 0; i < 10000000; i++ {
        data[i] = i ^ 0xff00
    }
}

func main() {
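    go func() {
        // Serve the pprof endpoints; report the error if the port is unavailable.
        if err := http.ListenAndServe("localhost:6060", nil); err != nil {
            fmt.Println("pprof server stopped:", err)
        }
    }()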

    doCompute()
    writeMemory()

    wg.Wait()

    fmt.Println("Test completed")

    select {}
}

Because of the final select {}, the program does not exit after completing its tasks, which lets us inspect its memory usage through net/http/pprof.

make build

# Memory usage test (both binaries listen on localhost:6060, so run them one at a time)
./bin/math_test_non_pie
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap

./bin/math_test_pie
go tool pprof -http=:8081 http://localhost:6060/debug/pprof/heap
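If running an HTTP server is inconvenient, a heap profile can also be written straight to a file with the standard runtime/pprof package. The following is a sketch of that alternative, not what the repository does: the function fillMap and the output path heap.pprof are made up for illustration, and the map is returned so that it is still reachable when the profile is taken.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// fillMap mirrors writeMemory above but returns the map so it stays reachable.
func fillMap(n int) map[int]int {
	data := make(map[int]int)
	for i := 0; i < n; i++ {
		data[i] = i ^ 0xff00
	}
	return data
}

func main() {
	data := fillMap(10000000)

	f, err := os.Create("heap.pprof") // hypothetical output path
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	runtime.GC() // bring the allocation statistics up to date before profiling
	if err := pprof.WriteHeapProfile(f); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	runtime.KeepAlive(data) // keep the map live until the profile is written
	fmt.Println("wrote heap.pprof with", len(data), "map entries")
}
```

The resulting file can then be opened with go tool pprof heap.pprof. Building this program once in each mode gives two profiles that can be compared without the port conflict.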

The test results are as follows:

File: math_test_non_pie
Type: inuse_space
Time: Dec 22, 2023 at 4:07pm (UTC)
Showing nodes accounting for 239.41MB, 100% of 239.41MB total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                          238.91MB   100% |   main.main /workspaces/go-pie-comparation/calculation-intensive/main.go:49 (inline)
  238.91MB 99.79% 99.79%   238.91MB 99.79%                | main.writeMemory /workspaces/go-pie-comparation/calculation-intensive/main.go:37
----------------------------------------------------------+-------------
                                            0.50MB   100% |   net/http.init /usr/local/go/src/net/http/h2_bundle.go:1189
    0.50MB  0.21%   100%     0.50MB  0.21%                | net/http.map.init.0 /usr/local/go/src/net/http/h2_bundle.go:1189
----------------------------------------------------------+-------------
                                          238.91MB   100% |   runtime.main /usr/local/go/src/runtime/proc.go:267
         0     0%   100%   238.91MB 99.79%                | main.main /workspaces/go-pie-comparation/calculation-intensive/main.go:49
                                          238.91MB   100% |   main.writeMemory /workspaces/go-pie-comparation/calculation-intensive/main.go:37 (inline)
----------------------------------------------------------+-------------
                                            0.50MB   100% |   runtime.doInit1 /usr/local/go/src/runtime/proc.go:6740
         0     0%   100%     0.50MB  0.21%                | net/http.init /usr/local/go/src/net/http/h2_bundle.go:1189
                                            0.50MB   100% |   net/http.map.init.0 /usr/local/go/src/net/http/h2_bundle.go:1189
----------------------------------------------------------+-------------
                                            0.50MB   100% |   runtime.main /usr/local/go/src/runtime/proc.go:249 (inline)
         0     0%   100%     0.50MB  0.21%                | runtime.doInit /usr/local/go/src/runtime/proc.go:6707
                                            0.50MB   100% |   runtime.doInit1 /usr/local/go/src/runtime/proc.go:6740
----------------------------------------------------------+-------------
                                            0.50MB   100% |   runtime.doInit /usr/local/go/src/runtime/proc.go:6707
         0     0%   100%     0.50MB  0.21%                | runtime.doInit1 /usr/local/go/src/runtime/proc.go:6740
                                            0.50MB   100% |   net/http.init /usr/local/go/src/net/http/h2_bundle.go:1189
----------------------------------------------------------+-------------
         0     0%   100%     0.50MB  0.21%                | runtime.main /usr/local/go/src/runtime/proc.go:249
                                            0.50MB   100% |   runtime.doInit /usr/local/go/src/runtime/proc.go:6707 (inline)
----------------------------------------------------------+-------------
         0     0%   100%   238.91MB 99.79%                | runtime.main /usr/local/go/src/runtime/proc.go:267
                                          238.91MB   100% |   main.main /workspaces/go-pie-comparation/calculation-intensive/main.go:49
----------------------------------------------------------+-------------

File: math_test_pie
Type: inuse_space
Time: Dec 22, 2023 at 4:08pm (UTC)
Showing nodes accounting for 233.91MB, 100% of 233.91MB total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                          233.91MB   100% |   main.main /workspaces/go-pie-comparation/calculation-intensive/main.go:49 (inline)
  233.91MB   100%   100%   233.91MB   100%                | main.writeMemory /workspaces/go-pie-comparation/calculation-intensive/main.go:37
----------------------------------------------------------+-------------
                                          233.91MB   100% |   runtime.main /usr/local/go/src/runtime/proc.go:267
         0     0%   100%   233.91MB   100%                | main.main /workspaces/go-pie-comparation/calculation-intensive/main.go:49
                                          233.91MB   100% |   main.writeMemory /workspaces/go-pie-comparation/calculation-intensive/main.go:37 (inline)
----------------------------------------------------------+-------------
         0     0%   100%   233.91MB   100%                | runtime.main /usr/local/go/src/runtime/proc.go:267
                                          233.91MB   100% |   main.main /workspaces/go-pie-comparation/calculation-intensive/main.go:49
----------------------------------------------------------+-------------

For another set of io-copy tests in the repository, similar results were observed.

File: math_test_non_pie
Type: inuse_space
Time: Dec 22, 2023 at 4:04pm (UTC)
Showing nodes accounting for 1GB, 100% of 1GB total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                               1GB   100% |   runtime.main /usr/local/go/src/runtime/proc.go:267
       1GB   100%   100%        1GB   100%                | main.main /workspaces/go-pie-comparation/io-copy/main.go:54
----------------------------------------------------------+-------------
         0     0%   100%        1GB   100%                | runtime.main /usr/local/go/src/runtime/proc.go:267
                                               1GB   100% |   main.main /workspaces/go-pie-comparation/io-copy/main.go:54
----------------------------------------------------------+-------------

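The heap profiles are nearly identical: 239.41MB in use for the non-PIE build versus 233.91MB for the PIE build, almost all of it allocated by the map in writeMemory. The maximum resident set sizes reported by time in the previous section point the same way, ranging only from 638356 to 639012 KB across all four runs. The PIE compilation mode therefore appears to incur almost no additional memory usage.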

Conclusion

PIE has been around for a long time and is now the default build mode on many platforms. Even so, its performance overhead is still widely debated. This article tested the PIE build mode, and the results show almost no additional performance overhead or memory usage for these workloads.

This suggests that we should approach emerging technologies with a pragmatic attitude rather than blindly believing others' claims. We should put them into practice to verify whether they truly fit our application scenarios.

P.S.: The test repository includes a diff of the assembly code. If you build with the same commands on darwin/arm64, you will find that the assembly for PIE and non-PIE modes is almost identical, whereas on linux/amd64 the two differ by several MB. The reason is that PIE is already the default build mode on darwin/arm64 but not on linux/amd64.

Acknowledgments

  • Thanks to TheOrdinaryWow for raising questions about the PIE build mode.

Footnotes

  1. Go 1.6 Release Notes

  2. Wikipedia: Position-independent code

  3. Go 1.15 Release Notes
