Extract links from a webpage using HTMLRewriter with Bun

Extract links from a webpage

Bun's HTMLRewriter API can efficiently extract links from HTML content. It works by chaining .on() calls, each pairing a CSS selector with handlers for the elements, text, and attributes you want to process. The example below collects the href of every link on a page. You can pass .transform a Response, Blob, or string.

async function extractLinks(url: string) {
  const links = new Set<string>();
  const response = await fetch(url);

  const rewriter = new HTMLRewriter().on("a[href]", {
    element(el) {
      const href = el.getAttribute("href");
      if (href) {
        links.add(href);
      }
    },
  });

  // Consume the transformed body so the handlers run over the whole document
  await rewriter.transform(response).blob();
  console.log([...links]); // ["https://bun.sh", "/docs", ...]
}

// Extract all links from the Bun website
await extractLinks("https://bun.sh");

Convert relative URLs to absolute

When scraping websites, you often want to convert relative URLs (like /docs) to absolute URLs. The example below resolves each href against the page URL with the URL constructor and keeps the original value if it cannot be parsed:

async function extractLinksFromURL(url: string) {
  const response = await fetch(url);
  const links = new Set<string>();

  const rewriter = new HTMLRewriter().on("a[href]", {
    element(el) {
      const href = el.getAttribute("href");
      if (href) {
        // Convert relative URLs to absolute
        try {
          const absoluteURL = new URL(href, url).href;
          links.add(absoluteURL);
        } catch {
          links.add(href);
        }
      }
    },
  });

  // Consume the transformed body so the handlers run over the whole document
  await rewriter.transform(response).blob();
  return [...links];
}

const websiteLinks = await extractLinksFromURL("https://example.com");
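
To make the resolution step concrete, here is a minimal sketch of how the standard URL constructor treats different kinds of href values. The helper name resolveHref and the page URL are assumptions made for illustration; they are not part of Bun or HTMLRewriter.

```typescript
// Hypothetical helper (not part of Bun or HTMLRewriter): resolve an href
// against the URL of the page it was found on. Returns null when the
// combination cannot be parsed as a URL.
function resolveHref(href: string, pageURL: string): string | null {
  try {
    return new URL(href, pageURL).href;
  } catch {
    return null;
  }
}

// Assumed page URL, used only for this illustration.
const page = "https://example.com/blog/post";

console.log(resolveHref("/docs", page)); // https://example.com/docs
console.log(resolveHref("about", page)); // https://example.com/blog/about
console.log(resolveHref("../pricing", page)); // https://example.com/pricing
console.log(resolveHref("#top", page)); // https://example.com/blog/post#top
console.log(resolveHref("//cdn.example.com/a.js", page)); // https://cdn.example.com/a.js
console.log(resolveHref("https://bun.sh", page)); // https://bun.sh/
console.log(resolveHref("mailto:hi@example.com", page)); // mailto:hi@example.com
console.log(resolveHref("http://[", page)); // null
```

Two details are worth noting. Most strings resolve successfully against a valid base, so the fallback branch is reached only for malformed values such as the last one. Non-web schemes like mailto: pass through unchanged, so if you only want web pages you may want to check the protocol of the resolved URL before adding it to the set.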

See Docs > API > HTMLRewriter for complete documentation on HTML transformation with Bun.
