Lang50 · CS50x Final Project
Video Demo:
Description
Lang50 is a game in which languages (modern, ancient, or constructed) must
be identified from audio recordings.
Context
The Lang50 project is a joint venture of Ulysse Berra and Aurélien Berra. It
serves both as our Harvard CS50x final project and as our way of changing the
world, one nerdy game at a time.
Definition and Aim
Develop a simple web-based language quiz app. It requires the user to listen to
audio snippets of natural or constructed languages and to identify matching
transcriptions in the original scripts.
The texts are translations of the first article of the Universal Declaration
of Human Rights (UDHR): “All human beings are born free and equal in dignity and
rights. They are endowed with reason and conscience and should act towards one
another in a spirit of brotherhood.”
Inspiration
Our main inspiration is the geeky online game Language Squad. It was itself
derived from a defunct project, The Great Language Game, whose legacy
documentation and data are still available.
The open (CC BY) data consist of "a confusion dataset based on usage", which is
"meant to help researchers and hobbyists examine what languages people commonly
confuse for one another": "Usage data from the Great Language Game, containing
the guesses users made in identifying unknown foreign language audio samples.
The 2014-03-02 version of this dataset contains some 16 million records of
guesses, one JSON record per line." These data could provide a criterion for
setting difficulty levels.
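To make that idea concrete, here is a minimal sketch of how per-language accuracy could be computed from such a JSON-lines file and then mapped to difficulty levels. The field names `target` and `guess`, the function names and the accuracy thresholds are assumptions for illustration; they are not taken from the dataset documentation or from our code.

```python
import json
from collections import Counter
from typing import Iterable


def accuracy_by_language(lines: Iterable[str]) -> dict[str, float]:
    """Share of correct guesses per target language, from JSON-lines records.

    Assumption: each record has a "target" field (the language actually played)
    and a "guess" field (the language the user picked).
    """
    attempts: Counter[str] = Counter()
    correct: Counter[str] = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        target = record["target"]
        attempts[target] += 1
        if record["guess"] == target:
            correct[target] += 1
    return {lang: correct[lang] / attempts[lang] for lang in attempts}


def difficulty(accuracy: float) -> str:
    """Hypothetical thresholds: languages identified less often count as harder."""
    if accuracy >= 0.8:
        return "easy"
    if accuracy >= 0.5:
        return "medium"
    return "hard"
```

Because the function accepts any iterable of lines, an open file object can be passed directly, so the 16 million records never have to be held in memory at once.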
Other similar projects are mentioned by the creator of the original game:
- Ling Your Language uses only audio snippets,
but implements most of the features we initially had in mind, and many more:
  - samples: "Over 2,500 samples in nearly 100 languages and 200 dialects from
    around the world"
  - learning resources about language families: "Information on every language
    available both in-game and on a dedicated Learn page", with classifications
    and links to scholarly articles
  - user profiles and leaderboard: "User profiles: track your high scores and
    compare them with other players, and level up your account from mere “Language
    enthusiast” to “Omniglot” – master of all languages!"
  - competition: "Two different multiplayer modes, each for up to four players
    compete with friends and family to see who’s got the best ear for languages!"
- Language Squad's originality is that it features two modes: audio, to guess
from spoken samples, or alphabet, to guess written languages in their original
scripts. After selecting a difficulty level, the user has to choose between an
increasing number of language names, scoring points for correct answers or
adding a bomb for each error; after 3 bombs, the game is over. There are no
user accounts.
  - audio difficulty levels
    - beginner: 11 languages
    - easy: 24 languages
    - medium: 58 languages
    - hard: 92 languages
  - alphabet difficulty levels
    - easy: 20 alphabets
    - hard: 46 alphabets
  - interesting UI features
    - zoom on written samples (either Unicode text or images)
    - autoplay option for spoken samples
- Name That Language was a simpler version, with different aesthetic choices.
It no longer works. Apparently it had no difficulty levels, only rounds to play
through with 3 lives.
Specifications
Defining features
- include constructed languages, on top of natural languages
- include language variants: some dialects and historical forms (no accents
in this version)
- make the game a learning experience: after answering, the user is shown the
correct script along with the language's name, and can learn more about it on
Ethnologue (URLs built from the three-letter ISO 639-3 codes stored in our
database)
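A minimal sketch of how such a link could be built from the three-letter code stored for each language. The URL pattern `https://www.ethnologue.com/language/<code>/` and the function name `ethnologue_url` are assumptions for illustration, not taken from our code.

```python
import re

# Assumed URL pattern for Ethnologue language pages; verify before relying on it.
ETHNOLOGUE_BASE = "https://www.ethnologue.com/language/"
ISO_639_3 = re.compile(r"[a-z]{3}")


def ethnologue_url(iso_code: str) -> str:
    """Build an Ethnologue page URL from a three-letter ISO 639-3 code."""
    code = iso_code.strip().lower()
    if not ISO_639_3.fullmatch(code):
        raise ValueError(f"not a three-letter ISO 639-3 code: {iso_code!r}")
    return f"{ETHNOLOGUE_BASE}{code}/"
```

For example, `ethnologue_url("epo")` returns `https://www.ethnologue.com/language/epo/`; validating the code first means a malformed database entry fails loudly instead of producing a broken link.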
Features not implemented here, although they would have been nice:
- provide hints to the user: specific language features to look for
- add a life refill mechanism: after 5 consecutive correct answers, 1 life is
restored, up to the initial 3
- provide the option to exclude specific languages, e.g. the user's native
language
- account management
  - registration: the user enters credentials into the registration form; the
    browser sends them to /register
  - login: the user enters credentials into the login form; the browser sends
    them to /login and receives an access token and a refresh token
  - token refresh: once the access token has expired (after 15 min), the
    browser sends the refresh token to /refresh to get a new access token
  - logout: the browser sends a POST request to /logout; the server revokes the
    refresh tokens
- hints
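Since account management was never implemented, the following is only a hypothetical sketch of the planned token flow: in-memory dictionaries stand in for the database, and plain methods stand in for the /register, /login, /refresh and /logout endpoints. The class and method names, the opaque random tokens (rather than, say, signed JWTs) and the PBKDF2 password hashing are our assumptions.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

ACCESS_TTL = 15 * 60  # seconds: access tokens expire after 15 minutes


class AuthStore:
    """Hypothetical in-memory stand-in for /register, /login, /refresh and /logout."""

    def __init__(self) -> None:
        self.users: dict[str, tuple[bytes, bytes]] = {}  # username -> (salt, hash)
        self.access: dict[str, tuple[str, float]] = {}   # access token -> (username, expiry)
        self.refresh_tokens: dict[str, str] = {}         # refresh token -> username

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def _issue_access(self, username: str, now: float) -> str:
        token = secrets.token_urlsafe(32)
        self.access[token] = (username, now + ACCESS_TTL)
        return token

    def register(self, username: str, password: str) -> None:
        """/register: store a salted hash, never the password itself."""
        if username in self.users:
            raise ValueError("username already taken")
        salt = secrets.token_bytes(16)
        self.users[username] = (salt, self._hash(password, salt))

    def login(self, username: str, password: str,
              now: Optional[float] = None) -> tuple[str, str]:
        """/login: return (access token, refresh token)."""
        now = time.time() if now is None else now
        salt, stored = self.users.get(username, (b"", b""))
        if not stored or not hmac.compare_digest(stored, self._hash(password, salt)):
            raise PermissionError("invalid credentials")
        refresh_token = secrets.token_urlsafe(32)
        self.refresh_tokens[refresh_token] = username
        return self._issue_access(username, now), refresh_token

    def check(self, access_token: str, now: Optional[float] = None) -> str:
        """Return the user behind a valid access token, or raise if it expired."""
        now = time.time() if now is None else now
        username, expiry = self.access.get(access_token, ("", 0.0))
        if not username or now >= expiry:
            raise PermissionError("access token missing or expired")
        return username

    def refresh(self, refresh_token: str, now: Optional[float] = None) -> str:
        """/refresh: trade a valid refresh token for a new access token."""
        now = time.time() if now is None else now
        username = self.refresh_tokens.get(refresh_token)
        if username is None:
            raise PermissionError("refresh token unknown or revoked")
        return self._issue_access(username, now)

    def logout(self, username: str) -> None:
        """/logout: revoke every refresh token held by this user."""
        self.refresh_tokens = {
            token: user for token, user in self.refresh_tokens.items() if user != username
        }
```

Passing `now` explicitly keeps the 15-minute expiry testable without waiting; a production server would also need refresh tokens to expire or rotate.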
Gameplay
- successive rounds
- lives: three of them at the start of the game
- final score
- under the hood, the game logic is as follows: we call our API to fetch the
languages, randomly pick three of them, designate one as the correct answer,
then give feedback once the player has made a guess
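The round logic described above can be sketched as follows, assuming the API returns a list of language records. The `Language` and `Game` classes and their fields are hypothetical illustrations, not our actual front-end code.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Language:
    iso: str    # three-letter ISO 639-3 code, e.g. "epo"
    name: str
    text: str   # UDHR Article 1 in the original script
    audio: str  # path or URL of the recording (hypothetical field)


@dataclass
class Game:
    languages: list[Language]
    lives: int = 3  # three lives at the start of the game
    score: int = 0
    rng: random.Random = field(default_factory=random.Random)

    def new_round(self) -> tuple[Language, list[Language]]:
        """Pick three distinct languages and designate one as the answer."""
        choices = self.rng.sample(self.languages, 3)
        answer = self.rng.choice(choices)
        return answer, choices

    def guess(self, answer: Language, picked: Language) -> bool:
        """Update score or lives, and report whether the guess was right."""
        correct = picked.iso == answer.iso
        if correct:
            self.score += 1
        else:
            self.lives -= 1
        return correct

    @property
    def over(self) -> bool:
        return self.lives <= 0
```

A round then consists of playing `answer.audio`, showing the `text` of each of the three choices, and calling `game.guess(answer, picked)` with the script the player clicked; the loop ends when `game.over` is true, and `game.score` is the final score. Injecting the random generator makes rounds reproducible in tests.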
Database
- 156 languages in our SQL database (see server)
Initial list of languages (50 in total, as in the name Lang50, though many more
were added later)
- natural languages (44, including 3 ancient)
  - Indo-European
    - Germanic
    - Latin
    - Indo-Iranian
    - Hellenic
    - Slavic
  - Dravidian
  - Sino-Tibetan
  - Austronesian
  - Afro-Asiatic
  - Languages unclassified for the purpose of the project
  - Language groups not represented
    - Niger-Congo
    - Uralic
    - Trans-New Guinea
- Constructed languages (6)
  - Esperanto (epo)
  - High Valyrian (Game of Thrones)
  - Klingon (Star Trek)
  - Na'vi (Avatar)
  - Quenya (Tolkien)
  - Toki Pona
- constructed languages for which data are insufficient
Technologies and Implementation
We use the following technologies:
- Wireframing
- Back-end (lang50/server)
- Front-end (lang50/lang50)
- Language Resources (documentation in lang50/lang50/static, audio files in
lang50/server/src, metadata and text snippets compiled from lang50/server/src
to lang50/database.db)
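As an illustration of the compilation step, here is a minimal sketch that loads text snippets into an SQLite database. The file layout (one `<iso>.txt` file per language, with the language name on the first line and an optional `<iso>.mp3` recording alongside), the table schema and the function name `compile_database` are assumptions, not the project's actual format.

```python
import sqlite3
from pathlib import Path

SCHEMA = """
CREATE TABLE IF NOT EXISTS languages (
    iso   TEXT PRIMARY KEY,  -- ISO 639-3 code, also the file stem
    name  TEXT NOT NULL,
    text  TEXT NOT NULL,     -- UDHR Article 1 in the original script
    audio TEXT               -- file name of the recording, if any
)
"""


def compile_database(src: Path, db_path: Path) -> int:
    """Load every <iso>.txt file in src into db_path and return the row count.

    Hypothetical layout: first line = language name, remaining lines = text,
    with an optional <iso>.mp3 recording in the same directory.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(SCHEMA)
            for txt in sorted(src.glob("*.txt")):
                name, _, text = txt.read_text(encoding="utf-8").partition("\n")
                audio = txt.with_suffix(".mp3")
                conn.execute(
                    "INSERT OR REPLACE INTO languages VALUES (?, ?, ?, ?)",
                    (txt.stem, name.strip(), text.strip(),
                     audio.name if audio.exists() else None),
                )
        return conn.execute("SELECT COUNT(*) FROM languages").fetchone()[0]
    finally:
        conn.close()
```

Using `INSERT OR REPLACE` keyed on the ISO code makes the script safe to rerun whenever a snippet is corrected or a recording is added.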