This interactive table of contents maps each article in a 2D space using document embeddings. Thematically similar documents are near each other. Color is based on the publication year, with the lightest points showing the earliest articles.
Hover over points to see article titles, and click on a point to read the article.
Article texts are embedded using the Sentence Transformers all-MiniLM-L6-v2 model, which represents each text as a 384-dimensional vector. The model is trained so that texts with similar meaning receive nearby vectors, which makes it possible to compare articles by content rather than by shared keywords.
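A minimal Python sketch of this step, assuming the `sentence-transformers` package is installed; it is an illustration, not necessarily how this page was built. `embed_articles` and `most_similar` are hypothetical helper names. Embeddings are normalized to unit length so that a dot product equals cosine similarity, a common way to compare such vectors.

```python
import numpy as np

MODEL_NAME = "all-MiniLM-L6-v2"  # produces 384-dimensional vectors


def embed_articles(texts: list[str]) -> np.ndarray:
    """Return an (n_articles, 384) array of unit-length embeddings (hypothetical helper)."""
    # Imported lazily so the rest of this sketch works without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_NAME)
    # normalize_embeddings=True rescales each row to length 1.
    return model.encode(texts, normalize_embeddings=True)


def most_similar(embeddings: np.ndarray, index: int, k: int = 3) -> list[int]:
    """Indices of the k articles closest (by cosine similarity) to article `index`."""
    emb = np.asarray(embeddings, dtype=np.float64)
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    scores = unit @ unit[index]   # cosine similarity to every article
    scores[index] = -np.inf       # exclude the article itself
    return [int(i) for i in np.argsort(-scores)[:k]]
```

Usage would look like `most_similar(embed_articles(texts), 0)`, which returns the indices of the articles nearest to the first one in embedding space.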
To make the data plottable, the 384-dimensional vectors are reduced to two dimensions using UMAP (Uniform Manifold Approximation and Projection). UMAP is a dimensionality reduction technique that aims to preserve both the local and the global structure of high-dimensional data, which makes it well suited for visualizing relationships between articles. For this collection, the UMAP layout also looked better than the t-SNE one.
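A sketch of the reduction step, assuming the `umap-learn` package; the parameter values (`n_neighbors=15`, `min_dist=0.1`, cosine metric, fixed seed) are illustrative assumptions, not the settings used for this page. `reduce_to_2d` is a hypothetical helper name.

```python
import numpy as np


def reduce_to_2d(embeddings: np.ndarray, n_neighbors: int = 15,
                 min_dist: float = 0.1, random_state: int = 42) -> np.ndarray:
    """Project (n_articles, n_dims) embeddings to (n_articles, 2) with UMAP."""
    embeddings = np.asarray(embeddings, dtype=np.float32)
    if embeddings.ndim != 2 or len(embeddings) < 3:
        raise ValueError("expected a 2D array with at least 3 rows (articles)")

    # Imported lazily; requires `pip install umap-learn`.
    import umap

    reducer = umap.UMAP(
        n_components=2,  # two output dimensions for the scatter plot
        # Larger values favor global structure, smaller ones local detail;
        # it cannot exceed the number of other articles.
        n_neighbors=min(n_neighbors, len(embeddings) - 1),
        min_dist=min_dist,            # how tightly points may pack together
        metric="cosine",              # compare embedding directions
        random_state=random_state,    # reproducible layout between runs
    )
    return reducer.fit_transform(embeddings)
```

Fixing `random_state` keeps the map stable across rebuilds, at the cost of some parallelism inside UMAP.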
Data are plotted using the Plotly JavaScript library. While I normally use the Python version, only the JS library allows points to be used as links to URLs.
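One way to hand the data to Plotly.js is to prepare the figure as plain JSON. The Python sketch below is an assumption about how that could be done, not the page's actual code; `build_figure` and the light-to-dark color values are hypothetical. Titles go in `text` for hover labels, URLs go in `customdata`, and publication years drive the marker color so the earliest articles are lightest.

```python
import json


def build_figure(xs, ys, titles, urls, years) -> dict:
    """Build a Plotly.js figure spec ({"data": [...], "layout": {...}}) as a dict."""
    n = len(xs)
    if not all(len(seq) == n for seq in (ys, titles, urls, years)):
        raise ValueError("all inputs must have the same length")

    trace = {
        "type": "scatter",
        "mode": "markers",
        "x": [float(v) for v in xs],   # float() also converts NumPy scalars for JSON
        "y": [float(v) for v in ys],
        "text": list(titles),          # shown on hover
        "hoverinfo": "text",
        "customdata": list(urls),      # read back by a click handler in the page
        "marker": {
            "color": [int(y) for y in years],
            # Explicit light-to-dark scale: earliest years get the lightest color.
            "colorscale": [[0, "#deebf7"], [1, "#08306b"]],
            "showscale": True,
            "colorbar": {"title": {"text": "Year"}},
        },
    }
    layout = {
        "hovermode": "closest",
        # UMAP coordinates have no intrinsic units, so the axes are hidden.
        "xaxis": {"visible": False},
        "yaxis": {"visible": False},
    }
    return {"data": [trace], "layout": layout}


def to_json(figure: dict) -> str:
    """Serialize the figure for embedding in the HTML page."""
    return json.dumps(figure)
```

In the page, the parsed `data` and `layout` could be passed to `Plotly.newPlot`, and a `plotly_click` event listener could open the URL stored in the clicked point's `customdata`.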