Be Cautious When Using time.Time as Keys in Go Maps
Publish date: 2024-08-13
Intro
Recently, I encountered a strange bug in ppacer. The scheduler had a bug that only appeared on Linux, which was particularly concerning since it directly affected ppacer's core functionality: scheduling.
Before we dive into the details, I want to mention that throughout the development of ppacer, I rarely encountered issues that took more than a few minutes to debug. However, this particular bug took me 4-5 hours to track down. That’s why I thought that this behavior might not be obvious to others as well. So, I’d like to share what I’ve learned.
I'll try to focus on the bug and solution in abstraction, without getting into details of ppacer internal types and logic.
Symptoms
Using ppacer, I wanted to schedule and execute a process which, after the first task, had another two concurrent tasks to be scheduled. The first task was scheduled and executed successfully, but then its children never got scheduled. No errors, nothing. They simply wouldn't schedule.
This issue occurred on a relatively modest EC2 instance (2GB RAM, 2 vCPU) running Linux. Strangely, the exact same program ran perfectly on Windows and macOS (MacBook Air M1). That was puzzling. For context, ppacer is built with Go 1.22, and its only dependency is a driver for SQLite.
Debugging
Initially, I thought the issue might be due to the limited resources on the EC2 instance. However, ppacer generally doesn’t consume much in terms of resources, and the particular example in question was relatively simple, so that didn’t seem to be the case.
After enabling `DEBUG` level logging, I discovered that child tasks weren’t getting scheduled
because the parent task’s status in the cache was SCHEDULED rather than SUCCESS.
I couldn’t figure out how this was possible. The logs clearly showed that the correct SUCCESS
status was stored in the cache, but when another goroutine retrieved the value from the cache for
the same key, it returned SCHEDULED instead. That made no sense!
Looking deeper into the cache itself, I noticed two identical-looking keys with different values.
This pointed me towards an issue with using time.Time as map keys.
time.Time in Go map keys
The root cause of the issue is that time.Time contains both a "wall clock" time and
a monotonic clock value. While two time.Time values might look identical when printed,
they can have different monotonic clock values, causing them to be treated as distinct keys in a map.
Additionally, monotonic clock values are system-dependent, and they are not preserved when
time.Time is serialized. This explains why the bug only appeared when running on Linux.
Reproducible example
Consider the following simple Go program:
package main
import (
"fmt"
"time"
)
type Key struct {
Name string
Timestamp time.Time
}
func main() {
cache := make(map[Key]int)
now := time.Date(2024, 8, 12, 16, 5, 0, 0, time.Local)
k1 := Key{Name: "Damian", Timestamp: now}
cache[k1] = 42
nowStr := now.Format("2006-01-02T15:04:05.999999MST-07:00")
now2, _ := time.Parse("2006-01-02T15:04:05.999999MST-07:00", nowStr)
k2 := Key{Name: "Damian", Timestamp: now2}
cache[k2] = -123
for k, v := range cache {
fmt.Printf(" -%v: %d
", k, v)
}
} Running this on macOS outputs:
-{Damian 2024-08-12 16:05:00 +0200 CEST}: -123 But on Debian, it outputs:
-{Damian 2024-08-12 16:05:00 +0000 UTC}: 42
-{Damian 2024-08-12 16:05:00 +0000 UTC}: -123 The two entries look the same but are different due to the hidden monotonic clock values.
Solution
The fix was straightforward: use a serialized timestamp string instead of time.Time in cache keys.
type DRTBase struct {
DagId string
AtTime string // Changed from time.Time
TaskId string
} Summary
Be cautious when using time.Time as keys in Go maps. While it can be useful, always
be aware of the monotonic clock values and how they might impact key uniqueness.
For more details, check out the official Go documentation on monotonic clocks.