Go is famous for its speed and efficiency. As a compiled language with a minimalist design, it delivers fast execution right out of the box. Thanks to goroutines, which are Go’s lightweight and built-in way to handle concurrency, you can manage thousands of tasks at once without breaking a sweat. This makes Go a top pick for building web servers and APIs that need to juggle lots of requests in parallel.
But even with all that power, there are situations where raw speed and concurrency aren’t enough. If your endpoints keep running the same heavy logic for every identical request, you’re still wasting resources and slowing things down for your users. No matter how fast your code is or how many goroutines you spin up, sometimes the smartest move is to avoid repeating the work in the first place.
In this article, we’ll build a caching layer that stores HTTP responses and serves them for repeated requests with identical parameters. To demonstrate the performance benefits, we’ll:
- Implement a basic HTTP server in Go
- Add a simulated slow endpoint with intentional latency
- Integrate a caching middleware to instantly serve repeated responses
Initial Setup
Let’s kick things off with a fresh Go project. Open up your terminal and run go mod init.
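For example, with a placeholder module path (example.com/httpcache below is just an illustration; pick one that matches your own repository):
$ go mod init example.com/httpcache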
Now, nobody likes a messy project, so we’ll follow the popular golang-standards/project-layout. That means your main.go should be placed in the cmd folder, and your main server code in internal/server/server.go.
├── cmd
│ └── main.go
├── go.mod
├── go.sum
└── internal
└── server
└── server.go
Now for the fun part: building our server. Instead of interacting with Go’s *http.Server directly, we’ll wrap it in our own Server struct. This gives us a nice place to store our logger (which uses Go’s slog package) and keep things tidy.
// server.go

// Server contains the dependencies needed to
// run an HTTP server.
type Server struct {
    log  *slog.Logger
    serv *http.Server
}

// NewServer creates a new Server instance with
// the specified address.
func NewServer(addr string) *Server {
    s := &Server{
        log: slog.Default().With("component", "server"),
    }
    s.serv = &http.Server{
        Addr: addr,
    }
    return s
}
At the moment, our server isn’t doing much; it’s just sitting there, waiting for instructions. Let’s give it some life. To achieve that, we’ll add Start and Stop methods so that we can launch it from the main.go file and shut it down gracefully once the process is terminated.
// server.go

// Start starts the server. It blocks until server.Stop is called.
func (s *Server) Start() error {
    s.log.With("addr", s.serv.Addr).Info("started web server")
    return s.serv.ListenAndServe()
}

// Stop shuts down the server.
func (s *Server) Stop() error {
    return s.serv.Shutdown(context.Background())
}
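One thing to note: Stop hands Shutdown a context.Background(), so it waits indefinitely for in-flight requests to drain. If you’d rather cap that wait, a bounded variant might look like the sketch below (StopWithTimeout is a hypothetical addition, not something we’ll use later):

// StopWithTimeout is a sketch of a bounded shutdown, assuming the
// given duration is enough for in-flight requests to finish.
func (s *Server) StopWithTimeout(d time.Duration) error {
    ctx, cancel := context.WithTimeout(context.Background(), d)
    defer cancel()
    // Shutdown returns ctx.Err() if the deadline passes before
    // all active connections have drained.
    return s.serv.Shutdown(ctx)
}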
Let’s add a router method, which will be responsible for registering all of our HTTP routes. The http.ServeMux improvements that landed in Go 1.22, such as method matching and path wildcards, are all we need for most routing, so there’s no need to bring in extra libraries.
// server.go

// router sets up the HTTP routes for the server.
func (s Server) router() http.Handler {
    mux := http.NewServeMux()
    return mux
}
Make sure your server actually uses this router when you create it:
// server.go

// NewServer creates a new Server instance with
// the specified address.
func NewServer(addr string) *Server {
    s := &Server{
        log: slog.Default().With("component", "server"),
    }
    s.serv = &http.Server{
        Addr:    addr,
        Handler: s.router(), // Call the router method here.
    }
    return s
}
Now, let’s make this thing run. In main.go, we’ll start up the server and set things up so it shuts down gracefully when you send it a termination signal or simply hit CTRL+C.
// main.go

func main() {
    shutdown := runServer()

    ctx, cancel := signal.NotifyContext(
        context.Background(),
        syscall.SIGINT,
        syscall.SIGTERM,
    )
    defer cancel()

    <-ctx.Done()
    shutdown()
}

// runServer starts the HTTP server and returns a shutdown function.
func runServer() func() {
    srv := server.NewServer(":8080")

    stopCh := make(chan struct{})
    go func() {
        defer close(stopCh)

        // A graceful Shutdown makes ListenAndServe return
        // http.ErrServerClosed, so we treat that as a normal
        // exit rather than an error.
        if err := srv.Start(); err != nil && !errors.Is(err, http.ErrServerClosed) {
            slog.Default().With("error", err).Error("unexpected server closure")
        }
    }()

    return func() {
        if err := srv.Stop(); err != nil {
            slog.Default().With("error", err).Error("stopping server")
        }
        <-stopCh
    }
}
Here’s what’s happening: we use signal.NotifyContext to listen for process termination signals such as SIGINT (CTRL+C) and SIGTERM. When you tell the app to quit, it doesn’t just vanish; it wraps things up neatly. The server runs in its own goroutine so the main function can stay alert for shutdown signals. When it’s time to terminate the process, the shutdown function stops the server and waits for everything to finish.
And that’s it: your server is up, running, and ready for action. Next, we’ll add some real endpoints and start optimizing them with HTTP caching.
Long-Running Task Endpoint
Let’s jump back into the server.go file inside the server package and set up a route that simulates a long-running task. To keep things simple, we’ll use a timer to delay our response by five seconds. Think of it as a stand-in for some heavy report processing.
// server.go

// fetchReport is the handler that fetches report information based on
// the report name provided in the URL path.
func (s Server) fetchReport(w http.ResponseWriter, r *http.Request) {
    name := r.PathValue("name")

    // To keep things simple, the timer below acts
    // as a placeholder for the actual report fetching logic.
    select {
    case <-time.After(5 * time.Second):
        // OK.
    case <-r.Context().Done():
        w.WriteHeader(http.StatusBadRequest)
        return
    }

    w.WriteHeader(http.StatusOK)
    w.Write([]byte(fmt.Sprintf("Report %q fetched successfully!", name)))
}
The logic of this handler is pretty straightforward. It grabs the name
parameter from the URL and, after a short wait, sends back a message with
that name included.
To properly integrate this into the router, head over to the router method and add a new route that points to your new handler.
// server.go

// router sets up the HTTP routes for the server.
func (s Server) router() http.Handler {
    mux := http.NewServeMux()
    mux.HandleFunc("GET /reports/{name}", s.fetchReport)
    return mux
}
It’s time to see it in action. Start up your server with:
$ go run cmd/main.go
2025/06/07 15:12:33 INFO started web server component=server addr=:8080
Then, give your endpoint a try using curl. You’ll notice the response takes five seconds, just as planned. Every request will wait the same amount of time.
$ time curl 127.0.0.1:8080/reports/hello
Report "hello" fetched successfully!
curl 127.0.0.1:8080/reports/hello 0.00s user 0.00s system 0% cpu 5.007 total
Imagine thousands of users hitting this endpoint, each one triggering a heavy database query to generate the same report. That’s not ideal. Instead, wouldn’t it be better if we could cache each unique response for a while, so we don’t have to do all that work every single time?
That’s exactly what we’ll tackle next. In the following section, we’ll build a simple caching layer to make sure each report gets generated only once per minute, no matter how many requests come in.
Making Endpoints Faster with Caching
To make our server a bit faster, let’s add some response caching. This way, we can avoid doing the same heavy work for every single request. We’ll keep the cache logic close to our server code by creating a new package called respcache inside the server folder. Add a cache.go file there so that your project structure looks like this:
├── cmd
│ └── main.go
├── go.mod
├── go.sum
└── internal
└── server
├── respcache
│ └── cache.go
└── server.go
Similar to what we did with our Server wrapper, we’ll create a Cache struct, a constructor, and a method to stop it when we shut things down. For this example, we’ll use an automatic cleaner to clear out expired cache items. We’ll use the github.com/jellydator/ttlcache package (you can install it with go get github.com/jellydator/ttlcache/v3), with a simple string as the key and a custom struct as the value. This struct will hold everything we need to send back as a response.
// cache.go

import (
    "log/slog"
    "time"

    "github.com/jellydator/ttlcache/v3" // import the ttlcache package
)

// Cache contains the dependencies needed to
// cache HTTP requests.
type Cache struct {
    log   *slog.Logger
    cache *ttlcache.Cache[string, cacheItem]
}

// NewCache creates a new Cache instance with the specified
// TTL (time to live, used to auto delete cache items).
func NewCache(ttl time.Duration) *Cache {
    c := &Cache{
        log: slog.Default().With("component", "cache"),
        cache: ttlcache.New(
            ttlcache.WithTTL[string, cacheItem](ttl),
        ),
    }
    go c.cache.Start()

    return c
}

// Stop stops the automatic cleanup process.
// It blocks until the cleanup process exits.
func (c Cache) Stop() {
    c.cache.Stop()
}

// cacheItem represents a cached item in the cache.
type cacheItem struct {
    body       []byte
    statusCode int
}
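If ttlcache is new to you, here’s a minimal standalone sketch of its semantics, separate from our server code: items stored with ttlcache.DefaultTTL inherit the cache-wide TTL, and Get treats expired items as missing even before the automatic cleaner removes them.

package main

import (
    "fmt"
    "time"

    "github.com/jellydator/ttlcache/v3"
)

func main() {
    c := ttlcache.New(
        ttlcache.WithTTL[string, string](100 * time.Millisecond),
    )

    // DefaultTTL tells the cache to use the TTL configured above.
    c.Set("greeting", "hello", ttlcache.DefaultTTL)
    if item := c.Get("greeting"); item != nil {
        fmt.Println(item.Value()) // hello
    }

    time.Sleep(150 * time.Millisecond)

    // The item has expired, so Get reports a miss.
    fmt.Println(c.Get("greeting") == nil) // true
}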
We need a way to check if we already have a cached response for a given
request. If we do, we’ll send it right back. If not, we’ll call the original
handler, capture the response, and store it in the cache for next time. To do
this, we’ll write a middleware that wraps any handler function. We’ll also
need a custom http.ResponseWriter
to catch the response data as it’s written.
// cache.go

// Handle is a middleware that caches the response
// based on the request path and query parameters.
func (c Cache) Handle(next http.HandlerFunc) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Build the key from the path and query string. Note that
        // r.URL.RawPath is empty for most URLs, so we use Path here.
        key := r.URL.Path + "?" + r.URL.RawQuery

        // We check if the response is already cached
        // by looking up the key in the cache.
        item := c.cache.Get(key)
        if item != nil {
            ci := item.Value()
            w.WriteHeader(ci.statusCode)
            w.Write(ci.body)
            return
        }

        // We create a custom response writer to capture the
        // response body so that we can cache it.
        rw := &responseWriter{
            w:    w,
            body: &bytes.Buffer{},
            // We set a default status code here,
            // as using Write without WriteHeader automatically
            // sets the status code to http.StatusOK.
            statusCode: http.StatusOK,
        }
        next.ServeHTTP(rw, r)

        // After the response is written, we cache it
        // using the key we built earlier.
        c.cache.Set(
            key,
            cacheItem{
                body:       rw.body.Bytes(),
                statusCode: rw.statusCode,
            },
            ttlcache.DefaultTTL,
        )
    })
}

// responseWriter is a helper struct that is used to intercept
// http.ResponseWriter's Write method.
type responseWriter struct {
    w          http.ResponseWriter
    body       *bytes.Buffer
    statusCode int
}

// Write writes the data to the connection as part of an HTTP reply.
func (wr *responseWriter) Write(buf []byte) (int, error) {
    n, err := wr.body.Write(buf)
    if err != nil {
        // Unlikely to happen: bytes.Buffer writes only fail on out-of-memory.
        return n, err
    }
    return wr.w.Write(buf)
}

// Header returns the header map that is sent by WriteHeader.
func (wr *responseWriter) Header() http.Header {
    return wr.w.Header()
}

// WriteHeader sends an HTTP response header with the provided status code.
func (wr *responseWriter) WriteHeader(statusCode int) {
    wr.statusCode = statusCode
    wr.w.WriteHeader(statusCode)
}
Here’s how the logic above works: we build a key from the request’s path and
query string. Since this is used only for GET
requests, we don’t worry about
the request body. If a cached response exists for that key, we send it back
and skip the slow handler. If not, we wrap the response writer so we can
capture what the handler writes, then store that in the cache after the
handler finishes.
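To convince ourselves the middleware behaves as described, here’s a small test sketch (a cache_test.go file in the respcache package is my addition, not part of the article’s code) that checks the wrapped handler runs only once for identical requests:

package respcache

import (
    "net/http"
    "net/http/httptest"
    "testing"
    "time"
)

func TestHandleServesRepeatedRequestsFromCache(t *testing.T) {
    // Count how many times the wrapped handler actually runs.
    calls := 0
    handler := func(w http.ResponseWriter, r *http.Request) {
        calls++
        w.Write([]byte("expensive result"))
    }

    c := NewCache(time.Minute)
    defer c.Stop()
    h := c.Handle(handler)

    // Two identical requests: the second should be a cache hit.
    for i := 0; i < 2; i++ {
        rec := httptest.NewRecorder()
        h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/reports/hello", nil))
        if rec.Body.String() != "expensive result" {
            t.Fatalf("unexpected body: %q", rec.Body.String())
        }
    }

    if calls != 1 {
        t.Fatalf("handler ran %d times, want 1", calls)
    }
}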
Next, let’s update our Server
struct to use this cache.
// server.go

// Server contains the dependencies needed to
// run an HTTP server.
type Server struct {
    log   *slog.Logger
    cache *respcache.Cache // Our newly created response cache.
    serv  *http.Server
}

// NewServer creates a new Server instance with
// the specified address.
func NewServer(addr string) *Server {
    s := &Server{
        log: slog.Default().With("component", "server"),
        // Create a response cache alongside the server.
        cache: respcache.NewCache(time.Minute),
    }
    s.serv = &http.Server{
        Addr:    addr,
        Handler: s.router(),
    }
    return s
}
Remember to stop the cache when shutting down the server:
// server.go

// Stop shuts down the server.
func (s *Server) Stop() error {
    s.cache.Stop()
    return s.serv.Shutdown(context.Background())
}
Finally, update your router to use the cache middleware for the report endpoint:
// server.go

// router sets up the HTTP routes for the server.
func (s Server) router() http.Handler {
    mux := http.NewServeMux()
    mux.Handle("GET /reports/{name}", s.cache.Handle(s.fetchReport))
    return mux
}
And that’s it! From this point onwards, when you start your app and hit the same endpoint multiple times, only the first request will take the full five seconds. After that, responses come back almost instantly:
$ time curl 127.0.0.1:8080/reports/hello
Report "hello" fetched successfully!
curl 127.0.0.1:8080/reports/hello 0.00s user 0.00s system 0% cpu 5.007 total
$ time curl 127.0.0.1:8080/reports/hello
Report "hello" fetched successfully!
curl 127.0.0.1:8080/reports/hello 0.00s user 0.00s system 92% cpu 0.004 total
Conclusion
By this point, you’ve put together a simple but powerful caching layer for your Go HTTP server. What started as a slow, five-second response for every report request has become a much smoother experience for your users. Instead of making your server do the same heavy lifting over and over, you’re serving up cached results almost instantly.
This kind of optimization isn’t just about speed; it’s about making your server smarter. With response caching in place, you’re saving resources, reducing database load, and keeping things running smoothly even as traffic grows. Plus, you’ve done it all with clear, maintainable code that’s easy to extend as your app evolves.
Of course, this is just the beginning. You can take this pattern and run with it: maybe you’ll tweak the cache duration, add cache invalidation for updates, or expand the caching logic to other endpoints. The beauty of this approach is that it puts you in control, letting you fine-tune performance based on your app’s real-world needs.
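For instance, since ttlcache already exposes a Delete method, single-entry invalidation could be as small as the hypothetical helper below, added to the Cache type from earlier:

// Invalidate evicts the cached response for one request, forcing the
// next request to hit the real handler. The key must match what
// Handle builds: the URL path, a "?", and the raw query string.
func (c Cache) Invalidate(path, rawQuery string) {
    c.cache.Delete(path + "?" + rawQuery)
}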
If you want to see the complete code or experiment further, check out the full implementation right here: https://github.com/jellydator/ttlcache/tree/v3/examples/httpcache
Thanks for reading, and good luck with your next Go project!