The Ambient Novel is an experimental interface for nonlinear long-form narrated text. It was developed for Scott Wayne Indiana's book The Valentine Mob. The interface allows for simultaneous playback, scrubbing, and interleaving of multiple narrated audio tracks.
[!NOTE] This source code is released as a curiosity, and its quality reflects that of a quick personal side project subject to a number of hasty iterations.
The most interesting parts from a technical standpoint are probably in the content pipeline, which generates the metadata responsible for aligning the audio book with the presentation of the text. The pipeline transcribes a recorded reading of the book, then reconciles any differences between the exact (known) text of the book and the narrator's (sometimes divergent) utterances, yielding the original book text with word-level audio alignment timing data.
Everything starts with a JSON file representing the book's contents, paragraph by paragraph.
During a one-time content generation step, this JSON file goes through a number of transforms, where it's combined with audio files and ambient music tracks to yield a revised JSON file with per-word timing data embedded in a <span> element's data attributes wrapping each word of the book. (The span elements are ugly, but this is a fast way to provide timing data to the front-end instead of maintaining a cleaner but higher-overhead JSON abstraction. In hindsight, the transformation from timing data to markup should happen in a Svelte server function instead of the content generation script.)
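As a rough illustration of the markup described above, the sketch below wraps each aligned word in a span carrying its timing. The `TimedWord` shape and the `data-start` / `data-end` attribute names are assumptions made for this example, not necessarily what the content generation script emits.

```typescript
// Hypothetical shape for one aligned word; the project's real field names may differ.
interface TimedWord {
  text: string;
  start: number; // seconds from the start of the chapter audio
  end: number; // seconds from the start of the chapter audio
}

// Escape characters that are significant in HTML text and attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap each word in a <span> whose data attributes carry its timing.
// The attribute names data-start and data-end are assumptions.
function wordsToMarkup(words: TimedWord[]): string {
  return words
    .map(
      (w) =>
        `<span data-start="${w.start.toFixed(2)}" data-end="${w.end.toFixed(2)}">` +
        `${escapeHtml(w.text)}</span>`
    )
    .join(" ");
}
```

The front-end can then read the timing back from each element's `dataset` without a separate lookup table, which is the trade-off the paragraph above describes.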
Behind the scenes, the content generation step uses whisperx, with some additional logic employing Levenshtein distance measurements to ensure parity between the text transcribed from the audio and the text of the book itself.
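To make the Levenshtein step concrete, here is a minimal sketch of how an edit distance can decide whether a transcribed word plausibly corresponds to a word in the book. The helper names, the normalization rule, and the 0.6 similarity threshold are illustrative assumptions, not the pipeline's actual logic.

```typescript
// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a: string, b: string): number {
  // prev[j] is the distance between the first i-1 characters of a
  // and the first j characters of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Lowercase and drop everything except ASCII letters and digits, so that
// punctuation and casing do not count as differences (an assumed rule).
function normalize(word: string): string {
  return word.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Treat two words as the same utterance when their normalized similarity
// meets the threshold. The default of 0.6 is an arbitrary example value.
function isLikelySameWord(bookWord: string, heardWord: string, threshold = 0.6): boolean {
  const a = normalize(bookWord);
  const b = normalize(heardWord);
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return true;
  return 1 - levenshtein(a, b) / longest >= threshold;
}
```

A matcher like this lets the book's spelling win while the transcript only contributes timing: a word that was heard slightly differently still inherits the timestamps of its closest counterpart.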
During early development of the project, before recordings of the audio book narration were available, a text-to-speech engine was used to generate spoken audio. That plumbing remains in the repository.
At runtime, the book text, timing data, and the associated audio files are loaded by the front-end, which is a SvelteKit app implementing a number of custom Svelte components. The heart of this is the track component, which keeps the scrolling text of each chapter in sync with the audio and allows live scrubbing through the audio and text simultaneously.
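One way to keep text in sync with playback is to look up the word that is active at the current playback time. The sketch below does this with a binary search over sorted word start times. It is an illustration under assumed names, not the track component's actual implementation.

```typescript
// Given word start times sorted in ascending order (seconds), return the index
// of the last word that has started at or before `time`, or -1 if playback
// has not yet reached the first word.
function activeWordIndex(starts: number[], time: number): number {
  let lo = 0;
  let hi = starts.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (starts[mid] <= time) {
      found = mid; // candidate; keep searching to the right for a later one
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}
```

In a browser this could be called with an audio element's `currentTime` on each `timeupdate` event or animation frame, and the same lookup works in reverse for scrubbing: pick a word, then seek the audio to its start time.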
The site is statically generated, and does not depend on any server-side logic at runtime.
To improve performance, and to enable offline listening, audio files are preloaded and cached by a service worker abstracted by workbox. (Browsers, especially mobile Safari, purge even modestly-sized audio files from the browser's cache very aggressively, so managing caching manually was the only way to achieve remotely acceptable performance.)
The site's visual design is derived from the cover art of The Valentine Mob book. Aesthetic specificity notwithstanding, the ambient novel "system" could theoretically be used to make the text of any book interactive, so long as it's provided in the appropriate format.
[!IMPORTANT] Some required audio content assets are external to this repository.
Certain data and assets are generated from the source data in /data and output to /static, /data-generated and /src/lib/data.
It takes about half an hour to generate everything from scratch.
The content generator does a number of things depending on the config object in /data/book.json.
/data/book.json to yield the final /src/lib/data
To install dependencies for the content generation process, run:
To update the generated data, run:
pnpm run generate-data
Note that this will overwrite existing data.
Note special cards with embedded html (1-indexed):
This is sketchy, and not well automated on account of its rare invocation.
Runs on an M1.
To set up the environment:
# install brew if you haven't already
# install ffmpeg with fdk-aac
# whisperx doesn't care about specific encoder implementations, but safari does
# and we also use ffmpeg in the data generation step
# ffprobe is installed along with ffmpeg
brew tap homebrew-ffmpeg/ffmpeg
brew install homebrew-ffmpeg/ffmpeg/ffmpeg --with-fdk-aac
brew install miniconda
conda init zsh
# restart terminal
# install whisperx
conda create --name whisperx python=3.10
conda activate whisperx
pip install argparse torch torchaudio torchvision
pip install git+https://github.com/m-bain/whisperx.git
# overwrite previously installed torch with nightly build for mps / m1 support
pip install --pre --force-reinstall torch torchaudio torchvision --index-url https://download.pytorch.org/whl/nightly/cpu
# pip will complain about mismatched dependencies, but ignore this
conda deactivate
# install tts
# https://github.com/coqui-ai/TTS/discussions/2177
conda create --name coqui python=3.9
conda activate coqui
git clone https://github.com/coqui-ai/TTS.git
# the install commands below must run from inside the cloned repository
cd TTS
brew install mecab
brew install espeak
conda install numpy scipy scikit-learn Cython
pip install -e .
make install
conda deactivate
There is no bundle size advantage to moving the content preprocessing dependencies into their own package.json.
To use the build-report npm script, install dust via homebrew if needed.
brew install dust
@vite-pwa/sveltekit, but too many issues getting correct behavior around range requests.

The app is deployed to Scott's DreamHost server by a GitHub action, which runs automatically on pushes to the main or develop branches. Each branch deploys to a different subdirectory on the server. Configuration and secrets are stored in the GitHub repo settings.
Required GitHub secrets:
SERVER_HOST: DreamHost server host name
SERVER_USERNAME: DreamHost server SSH user
SERVER_PASSWORD: DreamHost server SSH password
Required GitHub variables:
BASE_PATH_PRODUCTION: Name of the subfolder the site is copied to. During the build process, this variable is also used in svelte.config.js. Must start with / and end without /. Example: /thevalentinemob
BASE_PATH_STAGING: As above, but for the develop branch. Example: /thevalentinemob-staging
SERVER_PATH: DreamHost server path, which is prepended to the base path when files are copied. Must start with / and end without /. Example: /home/some-user/some-folder
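The path rules above (start with /, do not end with /) are easy to get wrong when editing repo settings, so a small check like the following can help. It is a sketch with hypothetical function names and is not part of the deployment action.

```typescript
// A path is valid when it starts with "/" and does not end with "/".
// Note that a bare "/" is rejected because it ends with "/".
function isValidPath(path: string): boolean {
  return path.startsWith("/") && !path.endsWith("/");
}

// Compose the directory files are copied to: the server path with the base
// path appended. Throws if either value breaks the rule.
function deployTarget(serverPath: string, basePath: string): string {
  for (const path of [serverPath, basePath]) {
    if (!isValidPath(path)) {
      throw new Error(`Path must start with "/" and must not end with "/": ${path}`);
    }
  }
  return serverPath + basePath;
}
```

With the example values given above, `deployTarget("/home/some-user/some-folder", "/thevalentinemob")` yields `/home/some-user/some-folder/thevalentinemob`.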
Note: The deployment server must support HTTP 206 range requests to successfully set currentTime on audio elements in Chrome.
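For context on what range support involves, the sketch below resolves a single-range `Range: bytes=...` request header against a resource of known size, which is the calculation a server (or a service worker answering from cache) performs before replying with status 206 and a `Content-Range` header. The function names are hypothetical, and multi-range requests are deliberately not handled.

```typescript
// Inclusive byte span within a resource.
interface ByteRange {
  start: number;
  end: number;
}

// Resolve a single-range header such as "bytes=0-99", "bytes=500-" or
// "bytes=-100" against a resource of `size` bytes. Returns null when the
// header is malformed or the range cannot be satisfied.
function resolveRange(header: string, size: number): ByteRange | null {
  const match = /^bytes=(\d*)-(\d*)$/.exec(header.trim());
  if (!match || size <= 0) return null;
  const first = match[1];
  const last = match[2];
  if (first === "" && last === "") return null;

  let start: number;
  let end: number;
  if (first === "") {
    // Suffix form: the final N bytes of the resource.
    const suffixLength = Number(last);
    if (suffixLength === 0) return null;
    start = Math.max(size - suffixLength, 0);
    end = size - 1;
  } else {
    start = Number(first);
    // An open-ended or overlong range is clamped to the last byte.
    end = last === "" ? size - 1 : Math.min(Number(last), size - 1);
  }
  if (start >= size || start > end) return null;
  return { start, end };
}

// Format the Content-Range value that accompanies a 206 response.
function contentRange(range: ByteRange, size: number): string {
  return `bytes ${range.start}-${range.end}/${size}`;
}
```

A server that ignores the `Range` header and always returns the full body with status 200 is the failure case the note above warns about.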
The website's code and the book's text are shared under different licenses:
The Ambient Novel website project is licensed under the MIT License. See license.txt.
The text of The Valentine Mob book (e.g. /data/book.json and its derivatives throughout the project) is ©39forks Publishing USA 2023 All Rights Reserved. See /data/license.txt.