Ambient Novel

kitschpatrol

An interface for nonlinear, interwoven, and interactive exploration of a novel.

#ambient #book #fiction #interactive #novel #svelte #whisper

Overview

The Ambient Novel is an experimental interface for nonlinear long-form narrated text. It was developed for Scott Wayne Indiana's book The Valentine Mob. The interface allows for simultaneous playback, scrubbing, and interleaving of multiple narrated audio tracks.

The project is live at 39forks.com/thevalentinemob.

[!NOTE] This source code is released as a curiosity, and its quality reflects that of a quick personal side project subject to a number of hasty iterations.

The most interesting parts from a technical standpoint are probably in the content pipeline, which generates the metadata responsible for aligning the audio book with the presentation of the text. The pipeline runs inference on a recorded reading of the book, with some extra logic to reconcile any differences between the exact (known) text of the book and the narrator's (sometimes divergent) utterances. The result is the original book text with word-level audio alignment timing data.

Basic architecture

Everything starts with a JSON file representing the book's contents, paragraph by paragraph.

During a one-time content generation step, this JSON file goes through a number of transforms, where it's combined with audio files and ambient music tracks to yield a revised JSON file. In the revised file, each word of the book is wrapped in a <span> element whose data attributes carry that word's timing data. (The span elements are ugly, but this is a fast way to provide timing data to the front-end without maintaining a cleaner but higher-overhead JSON abstraction. In hindsight, though, the transformation from timing data to markup should happen in a Svelte server function instead of the content generation script.)
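As a rough illustration of the timing-to-markup transform described above, the sketch below wraps each word in a span with its timing in data attributes. The `TimedWord` shape, the `data-start` / `data-end` attribute names, and both function names are assumptions made for this example, not the names used in the repository.

```typescript
// Hypothetical shape for one aligned word; field names are assumptions.
interface TimedWord {
  text: string;
  start: number; // seconds from the start of the chapter audio
  end: number; // seconds from the start of the chapter audio
}

// Escape characters that would otherwise be parsed as markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap each word in a <span> that carries its timing in data attributes.
function wordsToMarkup(words: TimedWord[]): string {
  return words
    .map(
      (word) =>
        `<span data-start="${word.start}" data-end="${word.end}">${escapeHtml(word.text)}</span>`,
    )
    .join(" ");
}
```

With markup like this, the front-end can read timings straight from the DOM (for example via `element.dataset`) instead of keeping a parallel data structure in sync with the rendered text.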

Behind the scenes, the content generation step uses whisperx, with some additional logic employing Levenshtein distance measurements to ensure parity between the text transcribed from the audio and the text of the book itself.
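For readers unfamiliar with the technique, here is a minimal sketch of a Levenshtein distance function and one way it could be used to match a transcribed word to the nearest word in the known book text. This is a generic illustration, not the repository's alignment code; `closestWord` and its case-insensitive comparison are assumptions.

```typescript
// Classic dynamic-programming edit distance, keeping only two rows in memory.
function levenshtein(a: string, b: string): number {
  // previous[j] is the distance between the first i-1 chars of a and the first j chars of b.
  let previous: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1, // deletion
        current[j - 1] + 1, // insertion
        previous[j - 1] + cost, // substitution
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Hypothetical helper: pick the candidate book word closest to a transcribed word.
function closestWord(transcribed: string, candidates: string[]): string | undefined {
  let best: string | undefined;
  let bestDistance = Infinity;
  for (const candidate of candidates) {
    const distance = levenshtein(transcribed.toLowerCase(), candidate.toLowerCase());
    if (distance < bestDistance) {
      bestDistance = distance;
      best = candidate;
    }
  }
  return best;
}
```

A small distance suggests the transcriber misheard or respelled a word that is present in the book, in which case the book's spelling can be kept while the transcript's timing is retained.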

During early development of the project, before recordings of the audio book narration were available, a text-to-speech engine was used to generate spoken audio. That plumbing remains in the repository.

At runtime, the book text, timing data, and the associated audio files are loaded by the front-end, which is a SvelteKit app implementing a number of custom Svelte components. The heart of this is the track component which keeps the scrolling text of each chapter in sync with the audio, and allows live scrubbing through the audio and text simultaneously.
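One way to keep text in sync with audio is to look up the word that corresponds to the current playback time. The sketch below does this with a binary search over word start times. It assumes the words are sorted by start time, and the `WordTiming` shape and `activeWordIndex` name are invented for this example. The actual track component may work differently.

```typescript
// Hypothetical per-word timing record, in seconds.
interface WordTiming {
  start: number;
  end: number;
}

// Return the index of the last word whose start time is <= time,
// or -1 if playback has not yet reached the first word.
// Assumes `words` is sorted by ascending start time.
function activeWordIndex(words: WordTiming[], time: number): number {
  let low = 0;
  let high = words.length - 1;
  let result = -1;
  while (low <= high) {
    const mid = (low + high) >> 1;
    if (words[mid].start <= time) {
      result = mid; // candidate; keep looking for a later one
      low = mid + 1;
    } else {
      high = mid - 1;
    }
  }
  return result;
}
```

The same lookup runs in the other direction when scrubbing: given the word nearest the scroll position, its `start` value can be assigned to the audio element's `currentTime`.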

The site is statically generated, and does not depend on any server-side logic at runtime.

To improve performance, and to enable offline listening, audio files are preloaded and cached by a service worker abstracted by workbox. (Browsers, especially mobile Safari, purge even modestly-sized audio files from the browser's cache very aggressively, so managing caching manually was the only way to achieve remotely acceptable performance.)

The site's visual design is derived from the cover art of The Valentine Mob book. Aesthetic specificity notwithstanding, the ambient novel "system" could theoretically be used to make the text of any book interactive, so long as it's provided in the appropriate format.

Development notes

Updating the content

[!IMPORTANT] Some audio content assets required for complete content generation are external to this repository.

Certain data and assets are generated from the source data in /data and output to /static, /data-generated and /src/lib/data.

It takes about half an hour to generate everything from scratch.

The content generator does a number of things depending on the config object in /data/book.json.

Generates voice audio files to narrate each chapter if no recordings are provided.
Compresses the voice audio to a number of formats.
Transcribes the voice files using a speech-to-text model to produce a transcript with line-level timings.
Aligns the transcription output to the text of the book, preserving the line-level timings.
Runs a word-level timing inference on the modified transcript against the original voice-over file.
Compresses the ambient music to a number of formats.
Information gathered in the above steps is merged with data from /data/book.json to yield the final /src/lib/data

To install dependencies for the content generation process, run:

To update the generated data, run:

pnpm run generate-data

Note that this will overwrite existing data.

Note: the following cards contain embedded HTML (1-indexed):

1: Has a color span
19: Has a list
54: Has a color span
83: Has a color span

Transcript alignment and text to speech

This is sketchy, and not well automated on account of its rare invocation.

Runs on an M1.

To set up the environment:

# install brew if you haven't already
# install ffmpeg with fdk-aac
# whisperx doesn't care about specific encoder implementations, but safari does
# and we also use ffmpeg in the data generation step
# ffprobe is installed along with ffmpeg
brew tap homebrew-ffmpeg/ffmpeg
brew install homebrew-ffmpeg/ffmpeg/ffmpeg --with-fdk-aac
brew install miniconda
conda init zsh

# restart terminal

# install whisperx
conda create --name whisperx python=3.10
conda activate whisperx
pip install argparse torch torchaudio torchvision
pip install git+https://github.com/m-bain/whisperx.git

# overwrite previously installed torch with nightly build for mps / m1 support
pip install --pre --force-reinstall torch torchaudio torchvision --index-url https://download.pytorch.org/whl/nightly/cpu

# pip will complain about mismatched dependencies, but ignore this
conda deactivate

# install tts
# https://github.com/coqui-ai/TTS/discussions/2177
conda create --name coqui python=3.9
conda activate coqui
git clone https://github.com/coqui-ai/TTS.git
# the editable install and make target below must run from inside the cloned repo
cd TTS
brew install mecab
brew install espeak
conda install numpy scipy scikit-learn Cython
pip install -e .
make install
conda deactivate

Size optimization

There is no bundle size advantage to moving the content preprocessing dependencies into their own package.json.

To use the build-report npm script, install dust via Homebrew if needed:

brew install dust

Reference links

Scrolling

Tailwind

https://stackoverflow.com/a/76984634/2437832

Svelte PWA

https://stackoverflow.com/questions/76007716/how-do-i-use-workbox-range-requests-plugin-with-vite-pwa
https://github.com/userquin/sveltesociety.dev/tree/pwa
https://www.sarcevic.dev/offline-first-installable-pwa-sveltekit-workbox-precaching
https://github.com/daffinm/audio-cache-test
Tried @vite-pwa/sveltekit, but too many issues getting correct behavior around range requests.

Deployment

The app is deployed to Scott's DreamHost server via a GitHub action, which runs automatically when the main or develop branch is updated. Each branch deploys to a different subdirectory on the server. Configuration and secrets are stored in the GitHub repo settings.

Required GitHub secrets:

SERVER_HOST
DreamHost server host name
SERVER_USERNAME
DreamHost server SSH user
SERVER_PASSWORD
DreamHost server SSH password

Required GitHub variables:

BASE_PATH_PRODUCTION
Name of subfolder to copy the site to. During the build process, this variable is also used in svelte.config.js. Must start with / and end without /.
Example: /thevalentinemob
BASE_PATH_STAGING
As above, but for the develop branch.
Example: /thevalentinemob-staging
SERVER_PATH
DreamHost server path, this is prepended to the base path when files are copied. Must start with / and end without /.
Example: /home/some-user/some-folder
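The path rules above (must start with / and must not end with /) are easy to get wrong when editing repository variables. A small check like the following sketch could catch mistakes before a deploy. Both function names are hypothetical and are not part of the repository's build or deployment scripts.

```typescript
// True if the path starts with "/" and does not end with "/".
function isValidDeployPath(path: string): boolean {
  return path.startsWith("/") && !path.endsWith("/");
}

// Prepend the server path to the base path, mirroring how the two
// variables are described as being combined when files are copied.
function deployTarget(serverPath: string, basePath: string): string {
  if (!isValidDeployPath(serverPath) || !isValidDeployPath(basePath)) {
    throw new Error("Paths must start with / and end without /");
  }
  return serverPath + basePath;
}
```

For example, `deployTarget("/home/some-user/some-folder", "/thevalentinemob")` yields `/home/some-user/some-folder/thevalentinemob`.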

Note: The deployment server must support HTTP 206 range requests to successfully set currentTime on audio elements in Chrome.
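For context, a range request asks for a slice of a file with a header such as `Range: bytes=500-`, and a supporting server answers with status 206 and a `Content-Range` header. The sketch below shows the general idea of resolving a single-range header against a file size. It illustrates the HTTP mechanism only and is not code from this project or from the server. The function names are invented for the example.

```typescript
// Inclusive byte range within a resource.
interface ByteRange {
  start: number;
  end: number;
}

// Resolve a single-range header ("bytes=0-99", "bytes=500-", "bytes=-100")
// against a resource of `size` bytes. Returns null if malformed or unsatisfiable.
function parseRange(header: string, size: number): ByteRange | null {
  const match = /^bytes=(\d*)-(\d*)$/.exec(header.trim());
  if (!match || size <= 0) return null;
  const [, first, last] = match;
  if (first === "" && last === "") return null;

  let start: number;
  let end: number;
  if (first === "") {
    // Suffix form: the last N bytes of the resource.
    const suffix = Number(last);
    if (suffix === 0) return null;
    start = Math.max(size - suffix, 0);
    end = size - 1;
  } else {
    start = Number(first);
    // An open-ended or overlong range is clamped to the final byte.
    end = last === "" ? size - 1 : Math.min(Number(last), size - 1);
  }
  if (start > end || start >= size) return null;
  return { start, end };
}

// Value for the Content-Range header of a 206 response.
function contentRange(range: ByteRange, size: number): string {
  return `bytes ${range.start}-${range.end}/${size}`;
}
```

A server without this support typically returns the whole file with status 200, which matches the seeking failure described in the note above.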

Known issues

There are some issues with flick scrolling of chapter text on mobile Safari, where there are no touch up events during inertial scroll animations.

Acknowledgments

Scott Wayne Indiana
author, narrator, interaction designer
Alex McCarl
original ambient tracks
Mike Budd
audio recording and mixing

License

The website's code and the book's text are shared under different licenses:

Website

The Ambient Novel website project is licensed under the MIT License. See license.txt.

Book
