Go is famous for its speed and efficiency. As a compiled language with a minimalist design, it delivers fast execution right out of the box. Thanks to goroutines, which are Go’s lightweight and built-in way to handle concurrency, you can manage thousands of tasks at once without breaking a sweat. This makes Go a top pick for building web servers and APIs that need to juggle lots of requests in parallel.
But even with all that power, there are situations where raw speed and concurrency aren’t enough. If your endpoints keep running the same heavy logic for every identical request, you’re still wasting resources and slowing things down for your users. No matter how fast your code is or how many goroutines you spin up, sometimes the smartest move is to avoid repeating the work in the first place.
In this article, we’ll build a caching layer that stores HTTP responses and serves them for repeated requests with identical parameters. To demonstrate the performance benefits, we’ll:
- Implement a basic HTTP server in Go
- Add a simulated slow endpoint with intentional latency
- Integrate a caching middleware to instantly serve repeated responses
Initial Setup
Let’s kick things off with a fresh Go project. Open up your terminal and run go mod init.
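For example, with a placeholder module path (example.com/httpcache below is just an illustration; pick one that matches your own repository):
$ go mod init example.com/httpcache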
Now, nobody likes a messy project, so we’ll follow the popular golang-standards/project-layout. That means your main.go should be placed in the cmd folder, and your main server code in internal/server/server.go.
├── cmd
│ └── main.go
├── go.mod
├── go.sum
└── internal
└── server
└── server.go
Now for the fun part: building our server. Instead of interacting with Go’s *http.Server directly, we’ll wrap it in our own Server struct. This gives us a nice place to store our logger (which uses Go’s slog package) and keep things tidy.
// server.go

// Server contains the dependencies needed to
// run an HTTP server.
type Server struct {
    log  *slog.Logger
    serv *http.Server
}

// NewServer creates a new Server instance with
// the specified address.
func NewServer(addr string) *Server {
    s := &Server{
        log: slog.Default().With("component", "server"),
    }
    s.serv = &http.Server{
        Addr: addr,
    }
    return s
}
At the moment, our server isn’t doing much; it’s just sitting there, waiting for instructions. Let’s give it some life. To achieve that, we’ll add Start and Stop methods so that we can launch it from the main.go file and shut it down gracefully once the process is terminated.
// server.go

// Start starts the server. It blocks until server.Stop is called.
func (s *Server) Start() error {
    s.log.With("addr", s.serv.Addr).Info("started web server")
    return s.serv.ListenAndServe()
}

// Stop shuts down the server.
func (s *Server) Stop() error {
    return s.serv.Shutdown(context.Background())
}
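One thing to note: Stop hands Shutdown a context.Background(), so it waits indefinitely for in-flight requests to drain. If you’d rather cap that wait, a bounded variant might look like the sketch below (StopWithTimeout is a hypothetical addition, not something we’ll use later):

// StopWithTimeout is a sketch of a bounded shutdown, assuming the
// given duration is enough for in-flight requests to finish.
func (s *Server) StopWithTimeout(d time.Duration) error {
    ctx, cancel := context.WithTimeout(context.Background(), d)
    defer cancel()
    // Shutdown returns ctx.Err() if the deadline passes before
    // all active connections have drained.
    return s.serv.Shutdown(ctx)
}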
Let’s add a router method, which will be responsible for registering all of our HTTP routes. The http.ServeMux improvements that landed in Go 1.22, such as method matching and path wildcards, are all we need for most routing, so there’s no need to bring in extra libraries.
// server.go

// router sets up the HTTP routes for the server.
func (s Server) router() http.Handler {
    mux := http.NewServeMux()
    return mux
}
Make sure your server actually uses this router when you create it:
// server.go

// NewServer creates a new Server instance with
// the specified address.
func NewServer(addr string) *Server {
    s := &Server{
        log: slog.Default().With("component", "server"),
    }
    s.serv = &http.Server{
        Addr:    addr,
        Handler: s.router(), // Call the router method here.
    }
    return s
}
Now, let’s make this thing run. In main.go, we’ll start up the server and set things up so it shuts down gracefully when you send it a termination signal or simply hit CTRL+C.
// main.go

func main() {
    shutdown := runServer()

    ctx, cancel := signal.NotifyContext(
        context.Background(),
        syscall.SIGINT,
        syscall.SIGTERM,
    )
    defer cancel()

    <-ctx.Done()
    shutdown()
}

// runServer starts the HTTP server and returns a shutdown function.
func runServer() func() {
    srv := server.NewServer(":8080")

    stopCh := make(chan struct{})
    go func() {
        defer close(stopCh)

        // A graceful Shutdown makes ListenAndServe return
        // http.ErrServerClosed, so we treat that as a normal
        // exit rather than an error.
        if err := srv.Start(); err != nil && !errors.Is(err, http.ErrServerClosed) {
            slog.Default().With("error", err).Error("unexpected server closure")
        }
    }()

    return func() {
        if err := srv.Stop(); err != nil {
            slog.Default().With("error", err).Error("stopping server")
        }
        <-stopCh
    }
}
Here’s what’s happening: we use signal.NotifyContext to listen for process termination signals such as SIGINT (CTRL+C) and SIGTERM. When you tell the app to quit, it doesn’t just vanish; it wraps things up neatly. The server runs in its own goroutine so the main function can stay alert for shutdown signals. When it’s time to terminate the process, the shutdown function stops the server and waits for everything to finish.
And that’s it: your server is up, running, and ready for action. Next, we’ll add some real endpoints and start optimizing them with HTTP caching.
Long-Running Task Endpoint
Let’s jump back into the server.go file inside the server package and set up a route that simulates a long-running task. To keep things simple, we’ll use a timer to delay our response by five seconds. Think of it as a stand-in for some heavy report processing.
// server.go

// fetchReport is the handler that fetches report information based on
// the report name provided in the URL path.
func (s Server) fetchReport(w http.ResponseWriter, r *http.Request) {
    name := r.PathValue("name")

    // To keep things simple, the timer below acts
    // as a placeholder for the actual report fetching logic.
    select {
    case <-time.After(5 * time.Second):
        // OK.
    case <-r.Context().Done():
        w.WriteHeader(http.StatusBadRequest)
        return
    }

    w.WriteHeader(http.StatusOK)
    w.Write([]byte(fmt.Sprintf("Report %q fetched successfully!", name)))
}
The logic of this handler is pretty straightforward. It grabs the name
parameter from the URL and, after a short wait, sends back a message with
that name included.
To properly integrate this into the router, head over to the router method and add a new route that points to your new handler.
// server.go

// router sets up the HTTP routes for the server.
func (s Server) router() http.Handler {
    mux := http.NewServeMux()
    mux.HandleFunc("GET /reports/{name}", s.fetchReport)
    return mux
}
It’s time to see it in action. Start up your server with:
$ go run cmd/main.go
2025/06/07 15:12:33 INFO started web server component=server addr=:8080
Then, give your endpoint a try using curl. You’ll notice the response takes five seconds, just as planned. Every request will wait the same amount of time.
$ time curl 127.0.0.1:8080/reports/hello
Report "hello" fetched successfully!
curl 127.0.0.1:8080/reports/hello 0.00s user 0.00s system 0% cpu 5.007 total
Imagine thousands of users hitting this endpoint, each one triggering a heavy database query to generate the same report. That’s not ideal. Instead, wouldn’t it be better if we could cache each unique response for a while, so we don’t have to do all that work every single time?
That’s exactly what we’ll tackle next. In the following section, we’ll build a simple caching layer to make sure each report gets generated only once per minute, no matter how many requests come in.
Making Endpoints Faster with Caching
To make our server a bit faster, let’s add some response caching. This way, we can avoid doing the same heavy work for every single request. We’ll keep the cache logic close to our server code by creating a new package called respcache inside the server folder. Add a cache.go file there so that your project structure looks like this:
├── cmd
│ └── main.go
├── go.mod
├── go.sum
└── internal
└── server
├── respcache
│ └── cache.go
└── server.go
Similar to what we did with our Server wrapper, we’ll create a Cache struct, a constructor, and a method to stop it when we shut things down. For this example, we’ll use an automatic cleaner to clear out expired cache items. We’ll use the github.com/jellydator/ttlcache package (you can install it with go get github.com/jellydator/ttlcache/v3), with a simple string as the key and a custom struct as the value. This struct will hold everything we need to send back as a response.
// cache.go

import (
    "log/slog"
    "time"

    "github.com/jellydator/ttlcache/v3" // import the ttlcache package
)

// Cache contains the dependencies needed to
// cache HTTP requests.
type Cache struct {
    log   *slog.Logger
    cache *ttlcache.Cache[string, cacheItem]
}

// NewCache creates a new Cache instance with the specified
// TTL (time to live, used to auto delete cache items).
func NewCache(ttl time.Duration) *Cache {
    c := &Cache{
        log: slog.Default().With("component", "cache"),
        cache: ttlcache.New(
            ttlcache.WithTTL[string, cacheItem](ttl),
        ),
    }
    go c.cache.Start()

    return c
}

// Stop stops the automatic cleanup process.
// It blocks until the cleanup process exits.
func (c Cache) Stop() {
    c.cache.Stop()
}

// cacheItem represents a cached item in the cache.
type cacheItem struct {
    body       []byte
    statusCode int
}
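If ttlcache is new to you, here’s a minimal standalone sketch of its semantics, separate from our server code: items stored with ttlcache.DefaultTTL inherit the cache-wide TTL, and Get treats expired items as missing even before the automatic cleaner removes them.

package main

import (
    "fmt"
    "time"

    "github.com/jellydator/ttlcache/v3"
)

func main() {
    c := ttlcache.New(
        ttlcache.WithTTL[string, string](100 * time.Millisecond),
    )

    // DefaultTTL tells the cache to use the TTL configured above.
    c.Set("greeting", "hello", ttlcache.DefaultTTL)
    if item := c.Get("greeting"); item != nil {
        fmt.Println(item.Value()) // hello
    }

    time.Sleep(150 * time.Millisecond)

    // The item has expired, so Get reports a miss.
    fmt.Println(c.Get("greeting") == nil) // true
}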
We need a way to check if we already have a cached response for a given
request. If we do, we’ll send it right back. If not, we’ll call the original
handler, capture the response, and store it in the cache for next time. To do
this, we’ll write a middleware that wraps any handler function. We’ll also
need a custom http.ResponseWriter
to catch the response data as it’s written.
// cache.go

// Handle is a middleware that caches the response
// based on the request path and query parameters.
func (c Cache) Handle(next http.HandlerFunc) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Build the key from the path and query string. Note that
        // r.URL.RawPath is empty for most URLs, so we use Path here.
        key := r.URL.Path + "?" + r.URL.RawQuery

        // We check if the response is already cached
        // by looking up the key in the cache.
        item := c.cache.Get(key)
        if item != nil {
            ci := item.Value()
            w.WriteHeader(ci.statusCode)
            w.Write(ci.body)
            return
        }

        // We create a custom response writer to capture the
        // response body so that we can cache it.
        rw := &responseWriter{
            w:    w,
            body: &bytes.Buffer{},
            // We set a default status code here,
            // as using Write without WriteHeader automatically
            // sets the status code to http.StatusOK.
            statusCode: http.StatusOK,
        }
        next.ServeHTTP(rw, r)

        // After the response is written, we cache it
        // using the key we built earlier.
        c.cache.Set(
            key,
            cacheItem{
                body:       rw.body.Bytes(),
                statusCode: rw.statusCode,
            },
            ttlcache.DefaultTTL,
        )
    })
}

// responseWriter is a helper struct that is used to intercept
// http.ResponseWriter's Write method.
type responseWriter struct {
    w          http.ResponseWriter
    body       *bytes.Buffer
    statusCode int
}

// Write writes the data to the connection as part of an HTTP reply.
func (wr *responseWriter) Write(buf []byte) (int, error) {
    n, err := wr.body.Write(buf)
    if err != nil {
        // Unlikely to happen: bytes.Buffer writes only fail on out-of-memory.
        return n, err
    }
    return wr.w.Write(buf)
}

// Header returns the header map that is sent by WriteHeader.
func (wr *responseWriter) Header() http.Header {
    return wr.w.Header()
}

// WriteHeader sends an HTTP response header with the provided status code.
func (wr *responseWriter) WriteHeader(statusCode int) {
    wr.statusCode = statusCode
    wr.w.WriteHeader(statusCode)
}
Here’s how the logic above works: we build a key from the request’s path and
query string. Since this is used only for GET
requests, we don’t worry about
the request body. If a cached response exists for that key, we send it back
and skip the slow handler. If not, we wrap the response writer so we can
capture what the handler writes, then store that in the cache after the
handler finishes.
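To convince ourselves the middleware behaves as described, here’s a small test sketch (a cache_test.go file in the respcache package is my addition, not part of the article’s code) that checks the wrapped handler runs only once for identical requests:

package respcache

import (
    "net/http"
    "net/http/httptest"
    "testing"
    "time"
)

func TestHandleServesRepeatedRequestsFromCache(t *testing.T) {
    // Count how many times the wrapped handler actually runs.
    calls := 0
    handler := func(w http.ResponseWriter, r *http.Request) {
        calls++
        w.Write([]byte("expensive result"))
    }

    c := NewCache(time.Minute)
    defer c.Stop()
    h := c.Handle(handler)

    // Two identical requests: the second should be a cache hit.
    for i := 0; i < 2; i++ {
        rec := httptest.NewRecorder()
        h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/reports/hello", nil))
        if rec.Body.String() != "expensive result" {
            t.Fatalf("unexpected body: %q", rec.Body.String())
        }
    }

    if calls != 1 {
        t.Fatalf("handler ran %d times, want 1", calls)
    }
}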
Next, let’s update our Server
struct to use this cache.
// server.go

// Server contains the dependencies needed to
// run an HTTP server.
type Server struct {
    log   *slog.Logger
    cache *respcache.Cache // Our newly created response cache.
    serv  *http.Server
}

// NewServer creates a new Server instance with
// the specified address.
func NewServer(addr string) *Server {
    s := &Server{
        log: slog.Default().With("component", "server"),
        // Create a response cache alongside the server.
        cache: respcache.NewCache(time.Minute),
    }
    s.serv = &http.Server{
        Addr:    addr,
        Handler: s.router(),
    }
    return s
}
Remember to stop the cache when shutting down the server:
// server.go

// Stop shuts down the server.
func (s *Server) Stop() error {
    s.cache.Stop()
    return s.serv.Shutdown(context.Background())
}
Finally, update your router to use the cache middleware for the report endpoint:
// server.go

// router sets up the HTTP routes for the server.
func (s Server) router() http.Handler {
    mux := http.NewServeMux()
    mux.Handle("GET /reports/{name}", s.cache.Handle(s.fetchReport))
    return mux
}
And that’s it! From this point onwards, when you start your app and hit the same endpoint multiple times, only the first request will take the full five seconds. After that, responses come back almost instantly:
$ time curl 127.0.0.1:8080/reports/hello
Report "hello" fetched successfully!
curl 127.0.0.1:8080/reports/hello 0.00s user 0.00s system 0% cpu 5.007 total
$ time curl 127.0.0.1:8080/reports/hello
Report "hello" fetched successfully!
curl 127.0.0.1:8080/reports/hello 0.00s user 0.00s system 92% cpu 0.004 total
Conclusion
By this point, you’ve put together a simple but powerful caching layer for your Go HTTP server. What started as a slow, five-second response for every report request has become a much smoother experience for your users. Instead of making your server do the same heavy lifting over and over, you’re serving up cached results almost instantly.
This kind of optimization isn’t just about speed; it’s about making your server smarter. With response caching in place, you’re saving resources, reducing database load, and keeping things running smoothly even as traffic grows. Plus, you’ve done it all with clear, maintainable code that’s easy to extend as your app evolves.
Of course, this is just the beginning. You can take this pattern and run with it: maybe you’ll tweak the cache duration, add cache invalidation for updates, or expand the caching logic to other endpoints. The beauty of this approach is that it puts you in control, letting you fine-tune performance based on your app’s real-world needs.
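For instance, since ttlcache already exposes a Delete method, single-entry invalidation could be as small as the hypothetical helper below, added to the Cache type from earlier:

// Invalidate evicts the cached response for one request, forcing the
// next request to hit the real handler. The key must match what
// Handle builds: the URL path, a "?", and the raw query string.
func (c Cache) Invalidate(path, rawQuery string) {
    c.cache.Delete(path + "?" + rawQuery)
}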
If you want to see the complete code or experiment further, check out the full implementation right here: https://github.com/jellydator/ttlcache/tree/v3/examples/httpcache
Thanks for reading, and good luck with your next Go project!