A full-stack application to analyse plenary sessions in the German Bundestag using Svelte(Kit), FastAPI and Elasticsearch
The second n (PlenNarilytics) stands for natural language processing ;)
The motivation behind creating Plennarilytics was to provide people with a tool to easily analyze the parliamentary protocols of the German Bundestag. As a democracy, it's crucial for citizens to have access to information about what their elected representatives are doing on their behalf. However, understanding and analyzing parliamentary protocols can be a daunting task for many people. Plennarilytics aims to simplify this process by providing an easy-to-use platform where users can access and analyze this information in a clear and concise way. By providing this service, we hope to promote transparency and accountability in government, and ultimately empower citizens to engage more actively in the democratic process.
Run
docker compose build
and
docker compose up
to start the set of containers.
If Elasticsearch (ES) is not ready in time, adjust the delay in the entrypoint:
entrypoint: /bin/bash -c "sleep 50 && python /code/ES_Init.py"
(increase the 50 to however many seconds ES needs to initialise)
After you have run Docker Compose, the script will start pulling and loading plenary transcripts immediately. If you receive an error saying that an index is missing in Elasticsearch, please run the following commands via cmd to create the missing indexes:
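The fixed sleep in the entrypoint has to be tuned by hand. As an alternative, here is a minimal sketch of a readiness check that polls until a probe succeeds. It is not part of the project; the function names and the default timings are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, HTTP error status, ...
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    max_wait: float = 120.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `probe` repeatedly until it returns True or `max_wait` seconds pass."""
    deadline = clock() + max_wait
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

An initialisation script could call `wait_until_ready(lambda: http_ok("http://elasticsearch:9200"))` before loading data. The host name `elasticsearch` is a placeholder for whatever the service is called in the compose file; 9200 is the default Elasticsearch HTTP port.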
For more information about the setup of our application, please refer to the report.
Our data source is the plenary protocols of the German Bundestag. We access the protocols with an API key that was issued to us for the Bundestag API. In the current state of the project we do not use all plenary protocols, only a small number to test our code. These are protocols published since 26.09.2021.
The following steps were taken to preprocess the data:
Preprocessing was actually a smaller part of the data extraction process than familiarising ourselves with the differences in text structure. Our approach so far relies heavily on splitting the long strings that we get from the Bundestag API into smaller, more manageable parts. These include the individual speeches from MPs as well as responses from the plenum and other strings. Finding the right markers to split on, and compensating for differences between the XML files, was a very large part of the project.
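To illustrate the splitting approach described above, here is a minimal sketch. It is not the project's actual extraction code. It assumes a simplified text layout in which each speech starts with a line like "Name (Party):" and each remark from the plenum stands alone on a line in parentheses; the patterns and the `Speech` class are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Assumed layout: a speech starts with a line such as "Erika Mustermann (SPD):"
SPEAKER_LINE = re.compile(
    r"^(?P<name>[^()\n:]+?) \((?P<party>[^()\n]+)\):\s*$", re.MULTILINE
)
# Assumed layout: a remark from the plenum is a whole line in parentheses
REMARK_LINE = re.compile(r"^\((?P<remark>[^()\n]+)\)\s*$", re.MULTILINE)


@dataclass
class Speech:
    speaker: str
    party: str
    text: str
    remarks: list[str] = field(default_factory=list)


def split_speeches(protocol: str) -> list[Speech]:
    """Split one protocol string into speeches, separating out the remarks."""
    matches = list(SPEAKER_LINE.finditer(protocol))
    speeches = []
    for i, match in enumerate(matches):
        # A speech runs until the next speaker line or the end of the protocol.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(protocol)
        body = protocol[match.end():end]
        remarks = [r.group("remark") for r in REMARK_LINE.finditer(body)]
        # Drop the remark lines and collapse the remaining whitespace.
        text = " ".join(REMARK_LINE.sub("", body).split())
        speeches.append(
            Speech(match.group("name").strip(), match.group("party"), text, remarks)
        )
    return speeches
```

Real protocols vary far more than this layout suggests, which is exactly the difficulty the paragraph above describes; each deviation would need its own pattern or special case.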
So far, we load 50 files from the current legislative period into Elasticsearch. For the current legislative period, we have split the documents into the individual speeches of the members of parliament. Each speech is assigned to the politician who gave it, and these speeches are the documents we actually store in Elasticsearch. In addition, we currently store the missing MPs on a per-party basis. The following graphic shows, for example, the top 20 missing MPs by count for the AfD in the current legislative period. The statistic was extracted from the Elasticsearch dashboard.
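To make the idea of one stored document per speech concrete, here is a sketch of how such documents might be prepared for indexing. The field names, the index name `speeches` and the id scheme are assumptions for illustration, not the project's actual mapping.

```python
from dataclasses import asdict, dataclass
from typing import Iterable, Iterator


@dataclass
class SpeechDoc:
    speaker: str
    party: str
    session: str  # hypothetical identifier of the plenary session
    text: str


def to_bulk_actions(
    speeches: Iterable[SpeechDoc], index: str = "speeches"
) -> Iterator[dict]:
    """Yield one bulk action per speech, with a deterministic document id."""
    for position, speech in enumerate(speeches):
        yield {
            "_index": index,
            "_id": f"{speech.session}-{position}",
            "_source": asdict(speech),
        }
```

Dictionaries with `_index`, `_id` and `_source` keys are the action format accepted by `elasticsearch.helpers.bulk` in the official Python client. A deterministic id means that loading the same file twice overwrites documents instead of duplicating them.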
Another interesting statistic is the number and type of remarks that different parties make. Here the inner circle is the party that makes a remark on a speech and the outer circle is the type of remark they make:
A further one shows who has something to say whenever someone from another party gives a speech. Here the inner circle is the party of the speaker and the outer circle is the party that makes remarks during the speech:
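Two-ring charts like the ones described above correspond to a terms aggregation nested inside another terms aggregation. The sketch below builds such a request body and flattens the response into rows. The aggregation names `inner` and `outer` are arbitrary, and the field names passed in are hypothetical; they would have to be keyword fields in the real index.

```python
def ring_aggregation(inner_field: str, outer_field: str, size: int = 10) -> dict:
    """Build a search body that groups by `inner_field`, then by `outer_field`."""
    return {
        "size": 0,  # only the aggregation is needed, not the matching documents
        "aggs": {
            "inner": {
                "terms": {"field": inner_field, "size": size},
                "aggs": {
                    "outer": {"terms": {"field": outer_field, "size": size}}
                },
            }
        },
    }


def flatten_rings(response: dict) -> list[tuple[str, str, int]]:
    """Turn the nested buckets into (inner key, outer key, count) rows."""
    rows = []
    for inner in response["aggregations"]["inner"]["buckets"]:
        for outer in inner["outer"]["buckets"]:
            rows.append((inner["key"], outer["key"], outer["doc_count"]))
    return rows
```

For the first chart one might call `ring_aggregation("remark_party", "remark_type")`, and for the second `ring_aggregation("speaker_party", "remark_party")`, again with assumed field names.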
You can use Plennarilytics to explore the speeches made in the German Bundestag by the following criteria:
Our goal was to provide a way to analyze the open data provided by the Bundestag. While this goal has been reached for the most part, we faced some technical difficulties while implementing this functionality in our frontend. Users can flexibly search the speeches made by MPs in the Bundestag and analyze them according to their own interests. The data about missing MPs and remarks is available from the Swagger UI (localhost:8080/docs) and directly from Elasticsearch, but we were not able to integrate these data streams into our Svelte frontend with reasonable effort, so we did not reach our goal entirely. The Swagger UI also exposes some additional endpoints that can be tested and used for further insights into our data. Feel free to try them and explore the inner workings of the Bundestag!
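As a small aid for exploring those endpoints without the browser, here is a sketch that lists them from the OpenAPI schema. It assumes the backend serves the schema at `/openapi.json`, which is the FastAPI default unless it has been changed; the helper names are hypothetical.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_endpoints(openapi: dict) -> list[str]:
    """Return one 'METHOD /path  summary' line per operation in the schema."""
    lines = []
    for path, item in sorted(openapi.get("paths", {}).items()):
        for method, operation in sorted(item.items()):
            # Path items may also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                summary = operation.get("summary", "")
                lines.append(f"{method.upper()} {path}  {summary}".rstrip())
    return lines


def fetch_openapi(base_url: str = "http://localhost:8080") -> dict:
    """Download the schema from a running backend (assumes the default route)."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=5) as resp:
        return json.load(resp)
```

With the containers running, `print("\n".join(list_endpoints(fetch_openapi())))` would print the available routes.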