Foreign Language Text Analyzer

Personal Project | 2024


Python, spaCy, Matplotlib, Pandas, Plotly

I built this tool hoping to speed up one of the most time-consuming parts of language learning: finding new vocabulary and learning it with good example sentences.


The program takes a foreign-language text (input.txt), processes its words, and builds a frequency list in which each word is paired with an example sentence from the text where it appears. It also produces several visualizations that help the user review the most frequent words and compare the relative difficulty of books by their share of new versus known words. The frequency list is saved as a CSV file that can be easily imported into Anki for flashcard review.
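A minimal sketch of the lemmatize-and-count step is below. It assumes a Spanish text and the `es_core_news_sm` spaCy model (the page names neither a language nor a model), and it skips stop words and non-alphabetic tokens, which is also an assumption rather than the project's documented behavior. The function names are hypothetical. spaCy is imported inside `extract_lemmas` so that the counting helper can be used on its own.

```python
from collections import Counter


def extract_lemmas(text, model="es_core_news_sm"):
    """Yield (lemma, sentence) pairs from raw text using spaCy.

    Assumes a trained spaCy pipeline is installed (the model name is an
    assumption); it supplies both the lemmatizer and sentence boundaries.
    """
    import spacy  # lazy import: build_frequency_table works without spaCy

    nlp = spacy.load(model)
    doc = nlp(text)
    for sent in doc.sents:
        sentence = sent.text.strip()
        for token in sent:
            # Assumed filter: drop punctuation, numbers and stop words.
            if token.is_alpha and not token.is_stop:
                yield token.lemma_.lower(), sentence


def build_frequency_table(pairs):
    """Count each lemma and keep the first sentence it appears in."""
    counts = Counter()
    examples = {}
    for lemma, sentence in pairs:
        counts[lemma] += 1
        examples.setdefault(lemma, sentence)  # only the first sentence is kept
    # Rows of (lemma, count, example sentence), most frequent first.
    return [(lemma, n, examples[lemma]) for lemma, n in counts.most_common()]
```

With the model installed, the two functions could be chained as `build_frequency_table(extract_lemmas(open("input.txt", encoding="utf-8").read()))`. Keeping the first sentence in which a lemma appears is the simplest choice; another selection rule (for example, the shortest sentence) would slot into the same loop.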


A flow chart of how the program works


The first output is a word-frequency CSV of unique, lemmatized words with example sentences, as shown below:

An example of the word-frequency CSV
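The export step could look like the sketch below, which uses Python's standard `csv` module (the project lists Pandas, where `DataFrame.to_csv` would serve the same purpose). It assumes rows of `(word, count, example sentence)`, and the function names are hypothetical. A real CSV writer matters here because example sentences often contain commas, which must be quoted so that Anki splits the columns correctly.

```python
import csv
import io


def table_to_csv(table):
    """Serialize (word, count, example) rows as CSV text.

    Fields containing commas or quotes are quoted automatically
    (csv.QUOTE_MINIMAL is the default).
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerows(table)
    return buf.getvalue()


def write_anki_csv(table, path):
    """Write the rows to a UTF-8 file ready for Anki's text importer."""
    # newline="" stops Python from translating the writer's \r\n endings.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(table_to_csv(table))
```

No header row is written in this sketch, so the first line is not imported as a note; Anki's import dialog lets you map the three columns to note fields.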


Once the CSV is created, the user can explore several visualizations, including the interactive bar chart below for reviewing the most frequent vocabulary.

Bar chart of vocabulary frequency
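A sketch of how such a chart might be produced with Plotly Express is below. It assumes rows of `(word, count, example sentence)` and a cut-off of the top 20 words; both the function names and the cut-off are hypothetical. Plotly is imported inside `plot_top_words` so that the ranking helper runs without it.

```python
def top_words(table, n=20):
    """Return the n most frequent (word, count) pairs from the rows."""
    ranked = sorted(table, key=lambda row: row[1], reverse=True)
    return [(word, count) for word, count, *_ in ranked[:n]]


def plot_top_words(table, n=20):
    """Build an interactive Plotly bar chart of the top-n words."""
    import plotly.express as px  # lazy import: top_words works without Plotly

    rows = top_words(table, n)
    fig = px.bar(
        x=[word for word, _ in rows],
        y=[count for _, count in rows],
        labels={"x": "Word", "y": "Occurrences"},
        title=f"Top {n} words",
    )
    return fig
```

Calling `plot_top_words(table).show()` opens the figure; Plotly figures support hovering, zooming and panning by default, which is what makes the bar chart interactive.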


Waffle chart based on new/known word frequency


As the waffle chart above shows, although I technically know more words in El Pasaje, Percy Jackson and Harry Potter should be considered much easier because they use a far smaller vocabulary.
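The difficulty comparison can be made concrete with two numbers per book: how many of its unique words are known, and what share of its unique vocabulary that represents. The sketch below computes both, plus the share of running text (tokens) covered by known words, and converts the known/new split into squares for a 100-cell waffle chart. It assumes a mapping of word to count per book and a set of known words; all names are hypothetical, and this is not necessarily how the original charts were computed.

```python
def vocabulary_profile(counts, known):
    """Summarize how much of a book's vocabulary is already known.

    counts: dict mapping word -> occurrences in the book (assumed input).
    known:  set of words the reader already knows (assumed input).
    """
    unique = len(counts)
    known_types = sum(1 for word in counts if word in known)
    total_tokens = sum(counts.values())
    known_tokens = sum(n for word, n in counts.items() if word in known)
    return {
        "unique_words": unique,
        "known_words": known_types,
        "new_words": unique - known_types,
        # Share of distinct words known: the measure of relative difficulty.
        "share_known": known_types / unique if unique else 0.0,
        # Share of all word occurrences in the text that are known.
        "token_coverage": known_tokens / total_tokens if total_tokens else 0.0,
    }


def waffle_cells(known_words, new_words, cells=100):
    """Split a waffle chart of `cells` squares between known and new words."""
    total = known_words + new_words
    if total == 0:
        return 0, 0
    known_cells = round(cells * known_words / total)
    return known_cells, cells - known_cells  # always sums to `cells`
```

A book can have more known words in absolute terms yet a lower `share_known` than a smaller book, which matches the situation described for El Pasaje.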