Jul 9, 2023

Architecture of an early stage SAAS

Design principles, tradeoffs and tricks used to build, deploy and run Feelback, an API-centric SAAS

Introduction

In this article I describe a simple architecture for an early-stage SAAS. As a solo founder, I report some of the choices I made to launch Feelback, a small-scale SAAS for collecting user signals about any content.

This article will cover the technical side of designing and running a simple SAAS. It will also include some details about coding and evolving the initial feature set ready at launch. However, research, marketing, administrative, financial and many other aspects required to run a business are not examined here.

Some questions you will find answers to:

How to design a low-maintenance architecture?
Which hosting and providers to choose and what configurations to use?
How to deploy to production with ease?
How to manage a monorepo with all service systems and components?

In case you landed here for the first time, here’s a brief introduction to Feelback and the services it provides, to help you better understand the architecture and the choices made.

What is Feelback?

Feelback allows you to collect any kind of feedback on your websites or apps. You can receive several types of signals, for example:

Votes and Ratings
Feedback Forms
Likes and Reactions

Feelback integrates nicely with any website or tech, and provides many pre-configured components ready to be included in your pages with no effort.

You can read the following articles for use cases:

Case study: Stripe documentation feedback system

Architecture

Design principles

For the first stage of Feelback, I designed the architecture following three basic principles:

Simplicity
The least amount of moving parts. No fancy stuff, just old boring things that work. No complex setup to anticipate next-level scale.
Low maintenance
Smooth to deploy, painless to maintain, easy to monitor. Keep overhead and technical troubles to a minimum.
Cost effective and (somewhat) flexible
Adopt cloud resources and features only when strictly needed, while keeping some degree of flexibility. The architecture should absorb an occasional traffic surge without falling to pieces.

Infrastructure

Feelback architecture is API-centric. It offers all service features through an API. In addition, the management and configuration functionalities are exposed via the same API. As a result, the core API server contains almost all business logic, while the surrounding elements present or refine information to the user.

If we zoom out, the overall Feelback architecture can be sliced into two parts: frontend and backend.

Here, frontend and backend aren’t used in their strict meaning. This article bends both terms in favor of a simpler mental model, based on a separation by function and responsibility:

Backend
Where inputs are received, where answers are given, where information and data are processed and stored. Here’s where all the logic runs. Where the dynamic part of the business dwells. It’s the complex side. It requires most of the design effort, where a bad decision or an overlooked issue can have long ramifications and will be costly to overcome.
Frontend
Where information is presented, where inputs are generated, where questions are asked, where requests are sent from. Frontends don’t run any business logic. They are mostly static, generated or built in advance. Let’s call it the dumb side, where less design thinking is needed, and where bad decisions or breaking errors can be easily recovered and patched up.

In conjunction, Feelback uses some external services, which can be placed in the same two-side layout, as request senders or request receivers.

On both sides, for each component, this article will explore three main topics:

Infrastructure
Which services and providers Feelback uses and how they are connected
Configurations
What setup and configurations are in place while running
Deploy
How the code is built, packaged and sent to production environments

Backend

The Feelback backend is composed of three logical systems:

API - exposes the Feelback service and features over HTTP
Worker - executes scheduled jobs
DB - Postgres cluster for data persistence

The entire backend is hosted on Fly. Lightweight virtual machines, called Fly Machines, lay the ground of the Fly infrastructure as a service. On top, the Fly App layer offers a simplified platform to coordinate, configure and deploy apps with little effort and almost no operations needed.

In the Feelback architecture, each system is mapped to a Fly App, abstracting over several underlying Machines with different scaling configurations.

Fly accepts container-based deploys. You can build standard container images with docker and push them to Fly. The deploy orchestrator will gracefully replace running machine instances with the new version.

In addition, Fly provides several features tailored for HTTP servers. The Fly App automatically manages public endpoint ports, custom domains, SSL certificate handling and renewal, load balancing and many other technical things needed to get and keep the underlying app running.

These features drastically reduce the operations Feelback has to handle in house, as infrastructure-wise everything is already taken care of. After the initial setup, all that is needed is to push a container with the new software version.

Although Fly offers multi-region deployments, meaning you can spread machine instances over multiple physical locations, at launch the Feelback architecture is deployed to a single region.

API

The Feelback API is the core system of the Feelback service. It fulfills three main purposes:

Feedback ingestion
Collects user feedback and signals from websites and apps where the Feelback service is integrated. In addition, clients can edit or remove a feedback if the configured time window allows it.
Stats querying
Clients can get aggregate statistics for a content item. For example, the like count to show a counter in the UI, or the overall sentiment to display the percentage split between up-votes and down-votes.
Project management
This is the private part of the API, used to manage Feelback projects. The primary client is the Feelback Panel, the main dashboard where Feelback customers can create projects and organize contents and configurations.
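
To make the time-window rule for editing or removing a feedback a bit more concrete, here is a minimal sketch. The function name and the `editWindowMs` parameter are hypothetical and only illustrate the idea; they are not taken from the Feelback API.

```typescript
// Hypothetical helper: returns true while a feedback can still be edited or removed.
// `editWindowMs` stands for the time window configured for the project.
function canModifyFeedback(createdAt: Date, now: Date, editWindowMs: number): boolean {
  const elapsed = now.getTime() - createdAt.getTime();
  return elapsed >= 0 && elapsed <= editWindowMs;
}
```

With a five-minute window, an edit four minutes after creation would be accepted, while one after six minutes would be rejected.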

The Feelback API is a NodeJS app. It’s built with httpc, which lets you create APIs as function calls in an RPC-like fashion. Httpc generates a custom typed client to ensure end-to-end type safety with TypeScript-powered clients.

Together with the whole Feelback backend, the Feelback API is hosted on Fly as a Fly App. A Fly App is an abstraction over several Fly Machines, that is, the actual VMs providing the CPU and RAM the app runs on.

Configuration

The Feelback API uses a fly.toml file to set configurations and environment variables.

Fly Machines provide an easy way to configure an auto-scaling behavior for an app. The mechanism is based on usage thresholds. The Feelback API uses the following values:

[http_service]
  [http_service.concurrency]
    type = "requests"
    soft_limit = 150
    hard_limit = 200

The Fly App proxy monitors the connections and, according to the thresholds set, can add new machines when more traffic is incoming. The reverse also happens: if traffic stays under the threshold for a period of time, excess machines get shut down.

This auto-scaling behavior is enabled by two flags:

[http_service]
  auto_start_machines = true
  auto_stop_machines = true

With the previous configuration, the Fly proxy can automatically add and remove app instances and meet traffic demands.

Each app instance is a Fly Machine. We preallocated 4 Fly Machines for the API which, for the current business size, is more than enough to also cover unexpected usage.

To prevent the auto-scaling process from completely shutting down the API, a minimum number of running instances is set.

[http_service]
  min_machines_running  = 2

This ensures some sort of in-region redundancy. If an instance crashes or gets stuck, the service stays up nonetheless, with the other instance fulfilling requests. In the meantime, the auto-scaler will kill the bad instance and start a new one.
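
To reason about how the soft limit, the preallocated machines and the minimum running instances play together, a toy model can help. The sketch below is only a simplified mental model written for this article. It is not Fly's actual algorithm, and the function and parameter names are made up.

```typescript
type ScaleAction = "start" | "stop" | "none";

// Toy model only: decides whether one more machine should start or one should stop.
// `loads` holds the current concurrent requests of each running machine.
function decideScaleAction(
  loads: number[],
  softLimit: number,
  minRunning: number,
  maxMachines: number,
): ScaleAction {
  const running = loads.length;
  const allBusy = running > 0 && loads.every((load) => load >= softLimit);
  if ((running === 0 || allBusy) && running < maxMachines) return "start";

  // Would the remaining machines stay under the soft limit with one fewer machine?
  const total = loads.reduce((sum, load) => sum + load, 0);
  const canShrink = running > minRunning && total < softLimit * (running - 1);
  return canShrink ? "stop" : "none";
}
```

With the values used above (soft limit 150, minimum 2, 4 preallocated machines), two machines at 150 and 160 concurrent requests would lead to a start, while three nearly idle machines would lead to a stop, but never below two.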

Deploy

Fly allows you to deploy apps by simply uploading a container image. A GitHub Actions workflow builds the Feelback API image with Docker on each push to master. Then, an action sends the image to Fly.

The workflow uses the superfly/flyctl-actions action to deploy the image.

jobs:
  deploy:
      # ... build steps

      - name: Deploy to Fly
        uses: superfly/[email protected]
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_AUTH_TOKEN }}
        with:
          args: "deploy ./packages/api/ --local-only --image feelback_api:latest"

The Feelback API is a simple NodeJS app with no special prerequisites. The container image is based on the Alpine variant of the official NodeJS image. Alpine is a lightweight Linux distribution.

The container image is created by Docker with a two-stage build. The first stage installs all dev dependencies and builds the sources. The second stage installs production-only dependencies and copies the first-stage outputs to the final image.

For reference, here’s the complete Dockerfile:

#
# Stage 1: Build from dev
#

FROM node:18-alpine
RUN npm install -g [email protected]

WORKDIR /app

COPY package.json .
COPY pnpm-*.yaml .
COPY patches/ patches
RUN pnpm fetch --ignore-scripts

COPY packages/api/package.json packages/api/
RUN pnpm install --offline --frozen-lockfile

COPY tsconfig* .
COPY packages/api/tsconfig* packages/api/
COPY packages/api/src/ packages/api/src

RUN pnpm run --filter @feelback/api generate:prisma
RUN pnpm run --filter @feelback/api build


#
# Stage 2: final build no dev
#

FROM node:18-alpine
ENV NODE_ENV=production
RUN npm install -g [email protected]

WORKDIR /app

COPY --from=0 /app/package.json .
COPY --from=0 /app/pnpm-*.yaml .
COPY --from=0 /app/patches/ patches
COPY --from=0 /app/packages/api/package.json packages/api/

RUN pnpm install --frozen-lockfile --prod

COPY packages/api/src/data/ packages/api/src/data/
COPY --from=0 /app/packages/api/dist/ packages/api/dist

WORKDIR /app/packages/api

EXPOSE 3000
ENTRYPOINT [ "pnpm", "start" ]

Scheduled jobs worker

The worker executes periodic jobs. There are three main jobs:

Feedback stats aggregation
For each content and content-set, Feelback provides weekly, monthly, yearly and various rolling-window aggregates. Aggregate values depend on the feelback-type associated with the content. For example, the like count is the total number of likes received for a specified content. Other aggregates involve sentiment, average and compound calculations.
Aggregates are calculated daily and stored in the DB. A single fast query can then pick up the preprocessed data without wasting CPU cycles on recalculation.
Weekly/Monthly recaps
Customers can subscribe to a periodic report delivered via mail. Overall performance is summarized weekly and monthly. For each project, the worker calculates content performance for the period, composes a mail with nice graphs and stat counters, and sends it.
Monthly recaps are scheduled on the first of each month, covering the previous month’s data. Weekly recaps are scheduled every Monday, covering the previous week.
Data cleanup
A job scheduled weekly to run some data cleanup. Expired data, such as temporary access tokens and partial aggregates, is deleted from the DB. Other transient artifacts are removed.
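
As an illustration of the kind of calculation the aggregation job performs, here is a minimal sketch that computes the up/down split per content. The types and names are hypothetical. The real job works on DB data and covers many more aggregate kinds and time windows.

```typescript
type Vote = { contentId: string; value: "up" | "down" };

type SentimentAggregate = { up: number; down: number; upPercent: number };

// Groups votes by content and computes the up/down split for each one.
function aggregateSentiment(votes: Vote[]): Map<string, SentimentAggregate> {
  const result = new Map<string, SentimentAggregate>();
  for (const vote of votes) {
    const entry = result.get(vote.contentId) ?? { up: 0, down: 0, upPercent: 0 };
    if (vote.value === "up") entry.up += 1;
    else entry.down += 1;
    // Percentage of up-votes over all votes received so far, rounded to an integer
    entry.upPercent = Math.round((entry.up / (entry.up + entry.down)) * 100);
    result.set(vote.contentId, entry);
  }
  return result;
}
```

Storing a result like this once a day is what lets the API answer a stats query with a single cheap read.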

Configuration

In the Feelback architecture there’s neither a queue nor a bus. The Worker just executes scheduled jobs. Therefore the Worker doesn’t run continuously. Instead, it gets started whenever a scheduled time approaches.

A Fly Machine is pre-created for the Worker. The machine stays stopped until it is explicitly started.

At the moment, Fly doesn’t have a global scheduler or a way to run machines on demand via configuration. So a simple manual call to the Fly Machine API is made to start the Worker.

When the API app starts, it runs a little in-process scheduler which performs the wake up call. Each day the API app calls the Fly Machine API to start the Worker with a simple function.

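async function startWorker() {
  // Both values are provided to the API app as environment variables
  const machineId = process.env.FLY_WORKER_MACHINE_ID;
  const authToken = process.env.FLY_API_TOKEN;
  // Internal Fly Machines API endpoint, served over plain HTTP inside the private network
  const response = await fetch(`http://_api.internal:4280/v1/apps/user-functions/machines/${machineId}/start`, {
    method: "POST",
    headers: {
      authorization: `Bearer ${authToken}`,
      "content-type": "application/json"
    }
  });

  // Surface failures instead of silently parsing an error body
  if (!response.ok) {
    throw new Error(`Failed to start the Worker machine: HTTP ${response.status}`);
  }

  return await response.json();
}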

After the processing is done, Fly automatically shuts down and stops the worker machine as the main process naturally exits.
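
The in-process scheduler mentioned above can be as small as a self-rearming timer. The following is a minimal sketch under my own assumptions: the names `msUntilNextRun` and `scheduleDaily` are hypothetical, and the daily run is pinned to a fixed UTC hour, which the real scheduler may or may not do.

```typescript
// Milliseconds from `now` until the next occurrence of `hourUtc`:00 UTC.
function msUntilNextRun(now: Date, hourUtc: number): number {
  const next = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), hourUtc),
  );
  if (next.getTime() <= now.getTime()) {
    // Today's slot has already passed: move to the same hour tomorrow
    next.setUTCDate(next.getUTCDate() + 1);
  }
  return next.getTime() - now.getTime();
}

// Calls `task` once a day at the given UTC hour, re-arming the timer after each run.
function scheduleDaily(hourUtc: number, task: () => Promise<unknown>): void {
  setTimeout(async () => {
    try {
      await task();
    } catch (error) {
      console.error("scheduled task failed", error);
    } finally {
      scheduleDaily(hourUtc, task);
    }
  }, msUntilNextRun(new Date(), hourUtc));
}
```

A call like `scheduleDaily(3, startWorker)` at API startup would be enough to wire it up. Since every API instance runs its own timer, a guard against duplicate starts is still needed.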

To avoid multiple starts, a quick check against the DB is made. A single record tracks the last worker run with a startedAt column. Before the call, the record is locked for update and checked against today’s date. After the call, the record is updated with today’s date and released.
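
The check-then-start sequence can be sketched as follows. `WorkerRunStore` is a hypothetical abstraction standing in for the DB transaction that locks the tracking record (for example with a `SELECT ... FOR UPDATE`). It is not the actual Feelback code, and comparing days in UTC is my own assumption.

```typescript
// Hypothetical abstraction: runs `fn` while the tracking record is locked for update.
interface WorkerRunStore {
  withLockedRecord<T>(
    fn: (record: { startedAt: Date | null; save(startedAt: Date): Promise<void> }) => Promise<T>,
  ): Promise<T>;
}

function isSameUtcDay(a: Date, b: Date): boolean {
  return a.toISOString().slice(0, 10) === b.toISOString().slice(0, 10);
}

// Starts the worker at most once per UTC day; resolves to true if it was started.
async function startWorkerOncePerDay(
  store: WorkerRunStore,
  startWorker: () => Promise<unknown>,
  now: Date = new Date(),
): Promise<boolean> {
  return store.withLockedRecord(async (record) => {
    if (record.startedAt && isSameUtcDay(record.startedAt, now)) return false;
    await startWorker();
    await record.save(now);
    return true;
  });
}
```

Because the record stays locked while the start call is in flight, a second API instance waking up at the same time waits on the lock and then finds today’s date already stored.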

Deploy

The Worker is just a small util around business logic already provided by the API. Hence, it shares 99% of code with the API.

To keep things simple, the Worker is packaged within the API. The API container image also contains the Worker bootstrap code. This allows us to deploy a single container image for both the API and the Worker.

On each push to master, a GitHub Actions workflow builds the image and sends it to Fly. An environment variable is set for the Worker Machine. So, on launch, the code understands it’s in Worker mode and runs the scheduler, instead of starting the API server.
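
A single entrypoint that switches on an environment variable could look like the sketch below. The variable name `FEELBACK_RUN_MODE` and the two callbacks are hypothetical placeholders, since the actual names are not part of this article.

```typescript
type RunMode = "api" | "worker";

// FEELBACK_RUN_MODE is a hypothetical variable name used only for illustration.
function resolveRunMode(env: Record<string, string | undefined>): RunMode {
  return env["FEELBACK_RUN_MODE"] === "worker" ? "worker" : "api";
}

// Single entrypoint shared by both modes; the callbacks stand in for the real bootstrap code.
async function main(
  env: Record<string, string | undefined>,
  startApiServer: () => Promise<void>,
  runWorker: () => Promise<void>,
): Promise<RunMode> {
  const mode = resolveRunMode(env);
  if (mode === "worker") await runWorker();
  else await startApiServer();
  return mode;
}
```

Defaulting to the API mode means a machine with no variable set behaves like a regular API instance, so only the Worker Machine needs special configuration.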

DB cluster

You should always start with Postgres
Secret guide to speedrun-launch a SAAS, 2nd chapter

Feelback uses a Postgres DB as its main persistence system. The DB is hosted on Fly, in the same region as the API app. Fly offers a first-party solution for a Postgres cluster with flexible scaling features.

Setting up and monitoring a DB is hard, and Fly Postgres gives us a ready-made solution without much trouble or thinking. In addition, we can easily scale both CPU power and data storage as the business grows.

The Feelback DB is configured as a 2-node cluster, using stolon to manage the nodes for leader election and replication. On top, stolon provides a proxy the API can talk to, which routes connections to the right Postgres node. All this configuration is preset by Fly, so everything came out of the box when we created the cluster.

The Postgres DB is exposed as a single private endpoint inside the internal network. Fly automatically provides a DATABASE_URL environment variable, which the API app uses to establish the right connection.

Backup

Fly performs daily snapshots of Postgres volumes. Snapshots can be restored at any time, thus a first-line backup solution is already in place out of the box. At the time of writing, snapshots persist for 5 days.

An additional backup is executed outside Fly, via an AWS Lambda. A scheduled function connects to the Feelback DB and dumps the entire DB to an S3 bucket.

For now, we don’t delete DB backups: we keep them on S3 indefinitely. In the future, as the business grows, we’ll put a consolidation policy in place.

Frontend(s)

The Feelback frontends include:

User Panel - main access point where a user manages Feelback projects
Home & Docs website - the public presence of the Feelback service
SDKs - client libraries used to integrate Feelback with any website

Both websites are static websites and are hosted on Cloudflare.

SDKs are client libraries Feelback users can adopt to easily integrate Feelback into their websites and start collecting feedback. SDKs target different frameworks like React, Astro, vanilla JavaScript, Vue, etc.

User Panel

The User Panel is the preferred way to access the Feelback service. A user can create and manage Feelback projects, see aggregated feedback stats and check out content performance. They can also analyze every single feedback and manage data exports.

The User Panel is a plain React SPA. No meta-framework is used. The app is bundled with vite into static assets only (vite relies on esbuild for transpilation and minification). The JS bundle is split into several parts which are dynamically loaded when needed.

The User Panel is totally static and distributed by Cloudflare Pages. Through the Cloudflare CDN the User Panel app is served with fast performance, as static assets are delivered by servers close to the user.

The User Panel performs no business logic. All Feelback functionalities are achieved by calling the Feelback API. The User Panel uses the API client generated by httpc to stay up-to-date with the Feelback API type definitions and, thus, achieve end-to-end type safety.

Deploy

The User Panel deployment consists only of building the static assets and pushing them to Cloudflare. A GitHub Actions workflow runs on every push to master. The workflow builds the website and sends the output to Cloudflare via wrangler, a CLI tool developed by Cloudflare to easily interact with its services.

For reference, the complete workflow file:

name: deploy_panel

on:
  push:
    branches: [master]
    paths: ["packages/panel/**"]

jobs:
  deploy:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Setup node & pnpm
        uses: ./.github/actions/setup-env

      - name: Test
        run: pnpm run --filter @feelback/panel test

      - name: Build
        run: pnpm run --filter @feelback/panel build
        env:
          VITE_API_ENDPOINT: https://api.feelback.dev/v0

      - name: Deploy
        run: pnpm run --filter @feelback/panel deploy
        env:
          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_AUTH_TOKEN }}

While the package scripts are:

packages/panel/package.json

{
  "scripts": {
    "build": "tsc && vite build",
    "deploy": "wrangler pages publish dist --project-name feelback-panel --branch main",
    "test": "jest"
  }
}

Home & Docs website

The main website is www.feelback.dev. It includes the marketing and landing pages, the documentation and the blog.

The website is built with Astro and is completely generated at build time, making it a 100% static website with no server-side components. Some pages with interactivity load preact on demand to provide client interactions. Astro integrates natively with frameworks like preact, reducing the effort to just a line of configuration.

Similar to the User Panel, the Home website is completely static and hosted on Cloudflare Pages.

Deploy

The deployment process of the Home website is the same as for the User Panel. On each push to master, a GitHub Actions workflow builds the website and sends it to Cloudflare.

The complete workflow file:

name: deploy_www

on:
  push:
    branches: [master]
    paths: ["packages/www/**"]

jobs:
  deploy:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Setup node & pnpm
        uses: ./.github/actions/setup-env

      - name: Build
        run: pnpm run --filter @feelback/www build

      - name: Deploy
        run: pnpm run --filter @feelback/www deploy
        env:
          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_AUTH_TOKEN }}

And the package scripts are:

packages/www/package.json

{
  "scripts": {
    "build": "astro build",
    "deploy": "wrangler pages publish dist --project-name feelback-www --branch main"
  }
}

SDKs

Feelback offers first-party client packages to quickly integrate Feelback with any website. Feelback supports any technology: from static site generators to SPA frameworks like React or Vue, from site builders like WordPress to plain HTML/JavaScript.

Different SDKs are developed to target the major frameworks and libraries.

Feelback SDKs are open-source and developed on a public GitHub repository. Each SDK package is published to npm under the @feelback scope.

Websites using any Feelback SDK interact with the Feelback service via the API, which makes them clients of the service.

External services

As described in the Architecture chapter, the Feelback architecture relies on two essential service providers:

Fly - for all backend production infrastructure
Cloudflare - for all frontend hosting

In addition, Feelback depends on several auxiliary providers:

Stripe
Billing, invoicing and subscription management for paid plans. The Feelback API and the Job processor interact with the Stripe API both for realtime operations, like a plan change, and for background processes, like updating invoices and customer data. In addition, the Feelback API receives and processes Stripe webhook requests, mostly notifications about subscription events.
MailerSend
Feelback uses MailerSend as email provider for both notification emails such as login requests and password management, and for reporting emails like weekly recaps.
Logflare
Backend info and error logs are streamed to Logflare which ingests log entries as JSON. We use a custom HTTP Transport for winston, our logging library of choice. Logs are batched and sent to the Logflare API, enriched with basic metadata like request identifiers and relevant environment variables.
Sentry
Client-side (aka frontend) error monitoring.
Metabase
A self-hosted instance of Metabase is used for internal dashboards and business goal tracking.
AWS for Backup
AWS is used to make backups of the database. A scheduled Lambda runs at specified intervals and backs up the whole DB to an S3 bucket.
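
To give an idea of the batching behind the Logflare transport described above, here is a minimal sketch of a log batcher. It is a simplified stand-in, not the actual winston transport: the class name, the `send` callback and the batch size are all assumptions, and the real transport would post each batch to the Logflare API.

```typescript
type LogEntry = { level: string; message: string; [key: string]: unknown };

// Buffers entries and hands them to `send` in batches of `maxBatchSize`.
class LogBatcher {
  private buffer: LogEntry[] = [];
  private readonly send: (batch: LogEntry[]) => Promise<void>;
  private readonly maxBatchSize: number;

  constructor(send: (batch: LogEntry[]) => Promise<void>, maxBatchSize: number) {
    this.send = send;
    this.maxBatchSize = maxBatchSize;
  }

  add(entry: LogEntry): Promise<void> {
    this.buffer.push(entry);
    return this.buffer.length >= this.maxBatchSize ? this.flush() : Promise.resolve();
  }

  // Sends whatever is buffered; meant to be called on a timer and before shutdown too.
  flush(): Promise<void> {
    if (this.buffer.length === 0) return Promise.resolve();
    const batch = this.buffer;
    this.buffer = [];
    return this.send(batch);
  }
}
```

Batching trades a small delay in log delivery for far fewer HTTP requests, which matters when every request to the API produces several log entries.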

The overall architecture of the Feelback SAAS looks something like this:

[Figure: overall architecture diagram of the Feelback SAAS]

Conclusions

Building a SAAS, aka the product, is quite easy compared to building the business to sustain it. Unless you’re on a cutting-edge crazy research project, developing the service is the easiest part. Technical skills aside, bootstrapping a SAAS from zero requires mental effort above all: the mental energy to keep pushing in a somewhat adverse environment.

To reduce the mental weight, every solution, choice or trick discussed here was made pursuing simplicity and avoiding complication. Keep things extremely simple. Don’t tax the mind with unproductive technical matters, and don’t design things to handle a supposed next-stage scale.

I hope the info I wrote here can be useful to those who are in the same boat as me, rowing and pushing through these restless waters. Or, maybe, it can inspire someone to begin a new journey.

Additional resources

If you’re curious about Feelback and what it offers:

See the getting started guide with an overview of the Feelback service and all the features available
Check out the tutorials to see how you can integrate Feelback with any website and start collecting feedback