ArabicStemmer

Wollaston

A small web app that uses NLTK's Arabic stemming algorithms to identify the roots of Arabic words

A simple web app that allows the user to enter an Arabic word and retrieve stem predictions from three of NLTK's Arabic stemming algorithms.

Example usage of ArabicStemmer

This was created as a learning project to explore Python stemming algorithms for the Arabic language, to experiment with SvelteKit (especially its API functionality), and to try out Node child processes.

How To Use

Enter an Arabic word in the prompt. Submitting the request runs the word through three of NLTK's Arabic stemming algorithms and returns the results to the user in a table.

The user can enter words in two ways:

Type the word using an Arabic keyboard
Type the word in Latin script and use the built-in Yamli tool to select the transliterated Arabic

How It Works

The application takes the form entry, calls the Python script, and returns the result to the user.

To do this, the web app was scaffolded using SvelteKit.

In this example, Node spawns a child process, via an API endpoint, to run the Python script with NLTK. The script takes the form entry as input and returns the predicted stems as JSON. The script is coupled with the app here for convenience, but it can easily be decoupled and hosted elsewhere behind a standard API.
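
The Python side of this flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the app's actual script: the file name `stem.py`, the function names, the JSON shape, and the choice of ISRI, ARLSTem, and Snowball as the three stemmers are all assumptions. The word is assumed to arrive as the first command-line argument from the spawned child process.

```python
import json
import sys
from typing import Callable, Dict, List


def predict_stems(word: str, stemmers: Dict[str, Callable[[str], str]]) -> str:
    """Run `word` through each stemmer and return the predictions as a JSON string."""
    predictions = {name: stem(word) for name, stem in stemmers.items()}
    # ensure_ascii=False keeps Arabic characters readable instead of \uXXXX escapes.
    return json.dumps({"word": word, "stems": predictions}, ensure_ascii=False)


def load_nltk_stemmers() -> Dict[str, Callable[[str], str]]:
    """Build the stemmer table. Which three NLTK stemmers the app uses is an assumption."""
    # Imported lazily so predict_stems stays usable without NLTK installed.
    from nltk.stem.arlstem import ARLSTem
    from nltk.stem.isri import ISRIStemmer
    from nltk.stem.snowball import SnowballStemmer

    return {
        "isri": ISRIStemmer().stem,
        "arlstem": ARLSTem().stem,
        "snowball": SnowballStemmer("arabic").stem,
    }


def main(argv: List[str]) -> int:
    """Entry point: expects the word as the single command-line argument."""
    if len(argv) != 2:
        # "stem.py" is a hypothetical file name used only for this message.
        print("usage: stem.py <arabic-word>", file=sys.stderr)
        return 2
    print(predict_stems(argv[1], load_nltk_stemmers()))
    return 0
```

Calling `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard turns this into a command-line script whose standard output the parent process can parse as JSON. Passing the stemmers in as a plain mapping keeps `predict_stems` independent of NLTK, which would also make it easier to move the script behind a separate API later.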

Use the App

In its current basic form, there are a few steps required to get the app up and running.

Clone the repo from GitHub

git clone git@github.com:Wollaston/ArabicStemmer.git # using SSH

Create a virtual environment for working with the Python component of the app
```
python3 -m venv venv
```
Activate the virtual environment and install NLTK in it
```
source venv/bin/activate  # on Windows: venv\Scripts\activate
pip install nltk
```
Install the SvelteKit and Node dependencies
```
npm install
```

Launch the app on localhost

npm run dev # prints a link with the local port

Why Three Predictions?

During experimentation, it became clear that NLTK's existing Arabic stemming algorithms are imperfect: they are generally accurate with standard vocabulary, but less reliable when asked to identify word roots.

The app therefore provides three predictions, giving the user some choice when assessing the accuracy of the responses.

These algorithms are:

Next Steps

Explore additional Arabic stemming algorithms and incorporate them where useful
Decouple the Python script from the app for more efficient hosting options
Create a local desktop app for a standalone client
Link stemmed responses to a root-based Arabic dictionary and/or provide examples of words based on that stem and root
Add further tooling and guidance to the app, for example the Buckwalter Arabic Morphological Analyzer
Implement proper error checking
Incorporate stem and root verifiers, and warn the user accordingly if the predicted stem does not match an established Arabic stem or root
- This may be useful when working with roots that are not three letters, or with hamzated/geminated/assimilated words
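
As a rough idea of what such a verifier could look like, the sketch below flags predictions that do not appear in a list of established roots. It is only an illustration: the function name, the stemmer labels, and the tiny root list are hypothetical, and a real verifier would need a proper root lexicon and handling for the irregular cases mentioned above.

```python
from typing import Dict, Iterable, List


def flag_unverified(predictions: Dict[str, str], known_roots: Iterable[str]) -> List[str]:
    """Return the names of stemmers whose prediction is not an established root."""
    roots = set(known_roots)
    return [name for name, stem in predictions.items() if stem not in roots]


# Illustrative data only: two real triliteral roots and made-up stemmer labels.
known_roots = {"كتب", "درس"}
predictions = {"stemmer_a": "كتب", "stemmer_b": "كتاب"}
unverified = flag_unverified(predictions, known_roots)
```

The app could then show a warning next to each prediction whose stemmer name appears in `unverified`.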
