
Typesense DocSearch Scraper

The typesense-docsearch-scraper scans your documentation site and indexes its content into a Typesense collection, enabling fast, full-text search for your docs. You’ll typically run it whenever you update your documentation content. To set it up, follow the instructions in the official Typesense Docs.

Here is a template for the scraper config file, `docsearch.config.json`, that you can copy into your Starlight project. Set `index_name` to the same value as `typesenseCollectionName` in `astro.config.ts`. The plugin queries that collection, so it must be the one the scraper indexes. The selectors restrict indexing to Starlight's `.main-pane` content area and map `h1` through `h5` to the hierarchy levels `lvl0` through `lvl4`.

docsearch.config.json
```json
{
  "index_name": "YOUR_TYPESENSE_COLLECTION_NAME",
  "start_urls": [
    {
      "url": "YOUR_DOCUMENTATION_SITE_URL"
    }
  ],
  "selectors": {
    "default": {
      "lvl0": ".main-pane h1",
      "lvl1": ".main-pane h2",
      "lvl2": ".main-pane h3",
      "lvl3": ".main-pane h4",
      "lvl4": ".main-pane h5",
      "text": ".main-pane p, .main-pane ul li, .main-pane table tbody tr"
    }
  },
  "strip_chars": " .,;:#",
  "scrape_start_urls": false
}
```
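
The config file must be strict JSON, because `jq` rejects `//` comments. The following is a minimal sketch of a pre-flight check, assuming `jq` is installed; the function name `check_docsearch_config` is hypothetical. It fails if the file does not parse and otherwise prints `index_name`, so you can compare it against `typesenseCollectionName` in `astro.config.ts`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for docsearch.config.json.
# Assumes jq is installed and on PATH.
check_docsearch_config() {
  local file="${1:-docsearch.config.json}"
  local name

  # `jq -e .` exits non-zero when the file is not valid JSON, for example
  # when it still contains // comments copied from an annotated template.
  if ! jq -e . "$file" > /dev/null 2>&1; then
    echo "error: $file is not valid JSON" >&2
    return 1
  fi

  # `// empty` produces no output when index_name is missing or null.
  name=$(jq -r '.index_name // empty' "$file")
  if [ -z "$name" ]; then
    echo "error: index_name is missing from $file" >&2
    return 1
  fi

  # This value should equal typesenseCollectionName in astro.config.ts.
  printf '%s\n' "$name"
}
```

Call it as `check_docsearch_config docsearch.config.json` before running the scraper.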

Here is how to run the Typesense DocSearch Scraper in your Starlight project.

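After completing the steps in the official Typesense Docs, add this script to your Starlight site’s package.json. The script requires Docker and `jq`, reads your Typesense connection details from `./.env`, and passes `docsearch.config.json` to the container through the `CONFIG` environment variable. Because it relies on `$(...)` command substitution, it needs a POSIX shell: it works as-is on macOS and Linux, and on Windows you can run it inside WSL.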

package.json
```json
{
  "scripts": {
    "scrape": "docker run -it --env-file=./.env -e \"CONFIG=$(cat docsearch.config.json | jq -r tostring)\" typesense/docsearch-scraper:0.11.0"
  }
}
```
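
The `--env-file=./.env` flag makes Docker load the Typesense connection details from a `.env` file in your project root. Below is a minimal sketch of creating that file, assuming the variable names used in the Typesense DocSearch guide (`TYPESENSE_API_KEY`, `TYPESENSE_HOST`, `TYPESENSE_PORT`, `TYPESENSE_PROTOCOL`). The values are placeholders, and the API key needs write access (for example, an admin key) because the scraper creates and updates collections. The sketch also adds `.env` to `.gitignore` so the key is not committed.

```shell
#!/usr/bin/env bash
# Placeholder values: replace them with your own cluster details.
# Variable names follow the Typesense DocSearch guide (assumption).
# Note: this overwrites any existing .env file.
cat > .env <<'EOF'
TYPESENSE_API_KEY=your-admin-api-key
TYPESENSE_HOST=xxx.a1.typesense.net
TYPESENSE_PORT=443
TYPESENSE_PROTOCOL=https
EOF

# Keep the API key out of version control.
touch .gitignore
if [ -s .gitignore ] && [ -n "$(tail -c1 .gitignore)" ]; then
  echo >> .gitignore   # ensure the last existing entry ends with a newline
fi
grep -qxF '.env' .gitignore || echo '.env' >> .gitignore
```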

You can then run the scraper with:

```shell
npm run scrape # or: yarn scrape, pnpm scrape, bun run scrape
```
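
To see exactly what the container receives, you can run the `jq` step from the scrape script on its own. `tostring` re-serializes the parsed config as a compact, single-line JSON string, which is how a multi-line file fits into one `-e CONFIG=...` variable. This sketch uses a made-up two-key config in a temporary file; for your real config, run `jq -r tostring docsearch.config.json`.

```shell
#!/usr/bin/env bash
# A made-up, pretty-printed config used only for this demonstration.
demo=$(mktemp)
cat > "$demo" <<'EOF'
{
  "index_name": "my_docs",
  "start_urls": [{ "url": "https://example.com/" }]
}
EOF

# Same transformation as the scrape script: parse, then re-serialize compactly.
CONFIG=$(jq -r tostring "$demo")
echo "$CONFIG"
# prints: {"index_name":"my_docs","start_urls":[{"url":"https://example.com/"}]}

rm -f "$demo"
```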

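Here is an example GitHub Actions workflow that runs the Typesense DocSearch Scraper whenever documentation content changes in the `docs/src/content` folder on the `master` branch, or when it is triggered manually. It uses the `celsiusnarhwal/typesense-scraper` action and reads your Typesense connection details from repository secrets: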

```yaml
name: typesense-docsearch-scraper

on:
  push:
    branches:
      - master
    paths:
      - 'docs/src/content/**'
  workflow_dispatch:

concurrency:
  cancel-in-progress: true
  group: ${{ github.workflow }}

jobs:
  scrape:
    name: Scrape documentation and update Typesense
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4

      - name: Run DocSearch Scraper
        uses: celsiusnarhwal/typesense-scraper@v2
        with:
          api-key: ${{ secrets.TYPESENSE_API_KEY }}
          host: ${{ secrets.TYPESENSE_HOST }}
          port: ${{ secrets.TYPESENSE_PORT }}
          protocol: ${{ secrets.TYPESENSE_PROTOCOL }}
          config: docs/docsearch.config.json
```
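
The workflow expects four repository secrets: `TYPESENSE_API_KEY`, `TYPESENSE_HOST`, `TYPESENSE_PORT`, and `TYPESENSE_PROTOCOL`. If you already keep these values in a local `.env` file, one way to copy them over is the GitHub CLI command `gh secret set NAME --body VALUE`. The helper below is a hypothetical sketch (the name `push_typesense_secrets` is made up) that assumes `gh` is installed and authenticated for the repository, and that `.env` uses plain `NAME=value` lines without quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the Typesense connection details from a local
# .env file into GitHub Actions repository secrets.
# Assumes gh is installed and authenticated, and .env has unquoted NAME=value lines.
push_typesense_secrets() {
  local env_file="${1:-.env}"
  local name value

  for name in TYPESENSE_API_KEY TYPESENSE_HOST TYPESENSE_PORT TYPESENSE_PROTOCOL; do
    # Everything after "NAME=" on the first matching line.
    value=$(sed -n "s/^${name}=//p" "$env_file" | head -n 1)
    if [ -z "$value" ]; then
      echo "error: $name is missing from $env_file" >&2
      return 1
    fi
    gh secret set "$name" --body "$value" || return 1
  done
}
```

Run it from the repository root as `push_typesense_secrets .env`. Secrets set before a missing variable is detected are kept, so fix the `.env` file and rerun if it reports an error.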

Once the scraper finishes, your Typesense collection will be populated and ready to use with the Starlight Typesense plugin.