Google Cloud Platform Logs
If you write logs in the correct format, Google Cloud’s Stackdriver Logging will understand the timestamps, severity levels, collect structured logs, and report stack traces in the error report. This package contains some code to make this easier, as well as some experiments I used to reverse engineer it. The code is in Go, but the formatting things are mostly language independent.
This also contains
gcpzap, a wrapper for the zap logging library which configures it so Stackdriver understands the logs.
- Write JSON logs, one payload per line, to stdout or stderr.
messagefor the message,
severityfor the severity level, and one of the time formats. See the Stackdriver documentation about message formats for details.
- Export logs to BigQuery to be able to search.
The Stackdriver documentation about time fields is incorrect. It only seems to support the following formats:
timestampas a struct:
timeas a RFC3339/ISO8601 string:
Notably, this means time as Unix seconds dot nanoseconds (“SSSS.NNNNNNNNN”) does not work, either in a JSON string or a JSON float (public issue for this).
The Google Cloud HTTP load balancer attaches
X-Cloud-Trace-Context headers to incoming requests. The format is
X-Cloud-Trace-Context: TRACE_ID/SPAN_ID;o=TRACE_TRUE. If you include the trace ID in the right format, Stackdriver will parse it. For now, this seems to only useful for querying logs, and for collecting logs together in App Engine (see below).
If you write out a panic, it will get reported in the Stackdriver error reporter. It must either look like a “default” panic, or the panic caught by the HTTP server. See examples below. You can make some small edits. Google publishes a fluentd output plugin that scans for exception patterns. The ones used by Stackdriver in production are different, but the concept is very similar.
Things you can remove from the stack trace:
- Function addresses (
- Exit status line
- Function arguments between
(..)is also okay
Things you cannot remove:
- Blank line after panic
- Change the goroutine number to a string
- The parenthesis after the function name
Traces are not as useful as you might hope, but it does let you query across the HTTP load balancer logs and the container logs. The error reporter does not capture panics from the HTTP server, but does capture the default formatted panics.
Example query to combine HTTP and container logs
SELECT log.* EXCEPT (trace), http.* EXCEPT (timestamp, trace) FROM ( SELECT trace, timestamp, severity, COALESCE(textPayload, jsonPayload.message) AS text FROM `bigquery-tools.logs.stderr_20190224` ) AS log LEFT OUTER JOIN ( SELECT trace, timestamp, httpRequest.status, httpRequest.requestMethod, httpRequest.requestUrl, jsonpayload_type_loadbalancerlogentry.statusdetails FROM `bigquery-tools.logs.requests_20190224` ) AS http ON log.trace = http.trace WHERE log.trace = 'projects/bigquery-tools/traces/9572036a5aff8aae3cd8f1f053d348b1' ORDER BY COALESCE(log.timestamp, http.timestamp)
App Engine (New Version)
This is about the “new” App Engine Standard (Java8, Python3, Go111), not the deprecated “legacy” App Engine or App Engine Flexible. The old legacy App Engine used to collect all logs from a single request together. This was incredibly useful. If you include the trace ID in the correct log format, the log viewer will do this, when you expand the request entry in the combined log:
Good news: You don’t need to do anything to get sensible logs with Cloud Functions!
Bad news: You can’t customize how it handles logs at all.
Since currently Cloud Functions only allow a single request to execute a time, it automatically tags all log lines with both
labels.execution_id, so you can easily correlate log lines for a single request. All you need to do is write out your logs. Unfortunately, it does not appear that Cloud Functions will parse JSON, so you can’t use structured logs.
Similarly, Cloud Functions will not automatically report errors for things that “look like” panics. Instead, it installs some sort of panic handler itself, which reports it to Stackdriver. It then writes the panic to the HTTP response with something that looks like the following:
Function panic: real panic goroutine 6 [running]: runtime/debug.Stack(0xc000153688, 0x6d5b20, 0x79c830) /go/src/runtime/debug/stack.go:24 +0xa7 main.executeFunction.func1(0x7a0180, 0xc0001380e0) /tmp/sgb/gopath/src/serverlessapp/worker.go:116 +0x6e panic(0x6d5b20, 0x79c830) /go/src/runtime/panic.go:513 +0x1b9 serverlessapp/vendor/gcf.LogExample(0x7a0180, 0xc0001380e0, 0xc000130200) /tmp/sgb/gopath/src/serverlessapp/vendor/gcf/gcf.go:16 +0x5a8 main.executeFunction(0x7a0180, 0xc0001380e0, 0xc000130200) /tmp/sgb/gopath/src/serverlessapp/worker.go:123 +0x148c net/http.HandlerFunc.ServeHTTP(0x75e970, 0x7a0180, 0xc0001380e0, 0xc000130200) /go/src/net/http/server.go:1964 +0x44 net/http.(*ServeMux).ServeHTTP(0x9b8540, 0x7a0180, 0xc0001380e0, 0xc000130200) /go/src/net/http/server.go:2361 +0x127 net/http.serverHandler.ServeHTTP(0xc000087040, 0x7a0180, 0xc0001380e0, 0xc000130200) /go/src/net/http/server.go:2741 +0xab net/http.(*conn).serve(0xc00012e0a0, 0x7a0400, 0xc00001c200) /go/src/net/http/server.go:1847 +0x646 created by net/http.(*Server).Serve /go/src/net/http/server.go:2851 +0x2f5
Example panic messages
Some example panic messages created by Go programs, which are useful for testing how Stackdriver catches and reports errors. These are logged to stderr, and contain tab characters for indenting.
The “standard” panic report printed before the process crashes:
panic: hello panic goroutine 1 [running]: main.panicNormally(...) /gopath/src/github.com/evanj/gcplogs/appengine/logdemo.go:28 main.funcWithArgs(0x1) /gopath/src/github.com/evanj/gcplogs/appengine/logdemo.go:32 +0x39 main.main() /gopath/src/github.com/evanj/gcplogs/appengine/logdemo.go:45 +0xb1 exit status 2
Logged with log.Printf in the bowels of the http server:
2019/02/23 07:25:58 http: panic serving [::1]:62811: hello this is a panic goroutine 37 [running]: net/http.(*conn).serve.func1(0xc00013a1e0) /go/src/net/http/server.go:1746 +0xd0 panic(0x1246000, 0x12ebcd0) /go/src/runtime/panic.go:513 +0x1b9 main.realPanic(0x12efd60, 0xc00014c1c0, 0xc000176100) /gopath/src/github.com/evanj/gcplogs/appengine/logdemo.go:23 +0x39 net/http.HandlerFunc.ServeHTTP(0x12bd070, 0x12efd60, 0xc00014c1c0, 0xc000176100) /go/src/net/http/server.go:1964 +0x44 net/http.(*ServeMux).ServeHTTP(0x14a07a0, 0x12efd60, 0xc00014c1c0, 0xc000176100) /go/src/net/http/server.go:2361 +0x127 net/http.serverHandler.ServeHTTP(0xc000080f70, 0x12efd60, 0xc00014c1c0, 0xc000176100) /go/src/net/http/server.go:2741 +0xab net/http.(*conn).serve(0xc00013a1e0, 0x12eff60, 0xc0000f8200) /go/src/net/http/server.go:1847 +0x646 created by net/http.(*Server).Serve /go/src/net/http/server.go:2851 +0x2f5
Random bonus: formatting time benchmarks
I compared the following time formats:
- Unix seconds dot nanoseconds (e.g.
- RFC3389 with nanoseconds (e.g.
time.Format, and an unsafe cached implementation.
The results in order of fastest to slowest:
|Unix strconv.Append to byte||71.0|