Typesense DocSearch Scraper
The typesense-docsearch-scraper scans your documentation site and indexes its content into a Typesense collection, enabling fast, full-text search for your docs.
You’ll typically run it whenever you update your documentation content. To set it up, follow the instructions in the official Typesense Docs.
DocSearch Scraper Config File
Here is the template for the Scraper Config File that you can quickly copy into your Starlight project:
{
  "index_name": "YOUR_TYPESENSE_COLLECTION_NAME",
  "start_urls": [
    {
      "url": "YOUR_DOCUMENTATION_SITE_URL"
    }
  ],
  "selectors": {
    "default": {
      "lvl0": ".main-pane h1",
      "lvl1": ".main-pane h2",
      "lvl2": ".main-pane h3",
      "lvl3": ".main-pane h4",
      "lvl4": ".main-pane h5",
      "text": ".main-pane p, .main-pane ul li, .main-pane table tbody tr"
    }
  },
  "strip_chars": " .,;:#",
  "scrape_start_urls": false
}

Make sure `index_name` matches your `typesenseCollectionName` in `astro.config.ts`: the plugin uses it to query the same Typesense collection that the scraper indexes.

Running the Scraper
Here is how to run the Typesense DocSearch Scraper in your Starlight project.
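Before running the scraper, it can help to confirm that the config file parses as strict JSON and that its `index_name` matches the collection name the plugin queries. The following is a minimal sketch of such a check, assuming the config must be strict JSON (no comments or trailing commas); `checkScraperConfig`, `ScraperConfig`, and the parameter names are hypothetical and not part of the scraper or the plugin.

```typescript
// Fields of the scraper config that this check looks at (illustrative subset).
interface ScraperConfig {
  index_name: string;
  start_urls: { url: string }[];
}

// Returns a list of human-readable problems; an empty list means the check passed.
// `expectedCollection` is the value you use for `typesenseCollectionName`.
function checkScraperConfig(rawJson: string, expectedCollection: string): string[] {
  let parsed: unknown;
  try {
    // JSON.parse is strict: comments and trailing commas are rejected.
    parsed = JSON.parse(rawJson);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return ["config is not valid JSON: " + reason];
  }

  if (typeof parsed !== "object" || parsed === null) {
    return ["config must be a JSON object"];
  }

  const config = parsed as Partial<ScraperConfig>;
  const problems: string[] = [];

  if (config.index_name !== expectedCollection) {
    problems.push(
      'index_name "' + String(config.index_name) + '" does not match "' + expectedCollection + '"'
    );
  }
  if (!Array.isArray(config.start_urls) || config.start_urls.length === 0) {
    problems.push("start_urls must contain at least one entry");
  }
  return problems;
}
```

You could call this with the contents of `docsearch.config.json` and your collection name, and stop before scraping if the returned list is not empty.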
Run the scraper locally
After completing the steps in the official Typesense Docs, add this script to your Starlight site’s package.json:
{
  "scripts": {
    "scrape": "docker run -it --env-file=./.env -e \"CONFIG=$(cat docsearch.config.json | jq -r tostring)\" typesense/docsearch-scraper:0.11.0"
  }
}

You can then run the scraper with:
npm run scrape # yarn, pnpm, bun equivalently

Run the scraper in GitHub Actions
Here is an example GitHub Actions workflow that runs the Typesense DocSearch Scraper whenever documentation content is updated in the docs/src/content folder, or when manually triggered:
name: typesense-docsearch-scraper

on:
  push:
    branches:
      - master
    paths:
      - 'docs/src/content/**'
  workflow_dispatch:

concurrency:
  cancel-in-progress: true
  group: ${{ github.workflow }}

jobs:
  scrape:
    name: Scrape documentation and update Typesense
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4

      - name: Run DocSearch Scraper
        uses: celsiusnarhwal/typesense-scraper@v2
        with:
          api-key: ${{ secrets.TYPESENSE_API_KEY }}
          host: ${{ secrets.TYPESENSE_HOST }}
          port: ${{ secrets.TYPESENSE_PORT }}
          protocol: ${{ secrets.TYPESENSE_PROTOCOL }}
          config: docs/docsearch.config.json

Once the scraper finishes, your Typesense collection will be populated and ready to use with the Starlight Typesense plugin.
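To confirm that the collection was actually populated, one option is to ask Typesense how many documents it holds. The sketch below assumes the collection (or an alias with the same name as `index_name`) can be retrieved from the Typesense `/collections/{name}` endpoint, that a global `fetch` is available (for example Node 18+), and that the API key is allowed to read collection metadata. `TypesenseServer`, `collectionUrl`, and `countIndexedDocuments` are hypothetical names used only for illustration.

```typescript
// Connection details for the Typesense server (field names are illustrative).
interface TypesenseServer {
  protocol: string; // "http" or "https"
  host: string;
  port: number;
}

// Builds the URL used to retrieve a single collection's metadata.
function collectionUrl(server: TypesenseServer, collection: string): string {
  return (
    server.protocol + "://" + server.host + ":" + server.port +
    "/collections/" + encodeURIComponent(collection)
  );
}

// Retrieves the collection and returns the number of documents it reports.
async function countIndexedDocuments(
  server: TypesenseServer,
  apiKey: string,
  collection: string
): Promise<number> {
  const response = await fetch(collectionUrl(server, collection), {
    headers: { "X-TYPESENSE-API-KEY": apiKey },
  });
  if (!response.ok) {
    throw new Error("Typesense responded with HTTP " + response.status);
  }
  const body = (await response.json()) as { num_documents: number };
  return body.num_documents;
}
```

A count of zero after a scrape usually suggests that the selectors in the config did not match anything on the site, so that is a reasonable first place to look.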