Behind the Scenes

Methodology

The project used historical newspaper data published between 1925 and 1929. The articles, drawn from reputable publications, were processed with ProQuest TDM and stored in a SQLite database for efficient retrieval. Documents were filtered to keep only news articles, excluding obituaries and advertisements, and were then parsed to extract metadata (title, date, and publisher) along with the article text.
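The filtering and storage steps above can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch, not the project's code: it assumes each document has already been parsed from the ProQuest TDM export into a dict with hypothetical keys (`doc_type`, `title`, `date`, `publisher`, `text`), and the table name, schema, and `"news"` type label are illustrative choices.

```python
import sqlite3

# Hypothetical label for the document type to keep; the real labels in a
# ProQuest TDM export may differ.
KEEP_TYPES = {"news"}


def build_article_db(records, path=":memory:"):
    """Store parsed news articles from 1925-1929 in SQLite, skipping other types.

    `records` is assumed to be an iterable of dicts with the hypothetical keys
    doc_type, title, date (ISO 8601 string), publisher, and text.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS articles (
               id INTEGER PRIMARY KEY,
               title TEXT,
               pub_date TEXT,
               publisher TEXT,
               body TEXT
           )"""
    )
    # An index on the publication date keeps period-restricted lookups fast.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_articles_date ON articles(pub_date)")
    rows = [
        (r["title"], r["date"], r["publisher"], r["text"])
        for r in records
        # Keep news articles in the study period; drop obituaries, ads, etc.
        if r["doc_type"] in KEEP_TYPES and "1925-01-01" <= r["date"] <= "1929-12-31"
    ]
    conn.executemany(
        "INSERT INTO articles (title, pub_date, publisher, body) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


# Illustrative records only; not drawn from the project's corpus.
sample = [
    {"doc_type": "news", "title": "Flight Reaches Paris", "date": "1927-05-21",
     "publisher": "Example Gazette", "text": "..."},
    {"doc_type": "obituary", "title": "In Memoriam", "date": "1927-05-22",
     "publisher": "Example Gazette", "text": "..."},
    {"doc_type": "advertisement", "title": "Motor Cars", "date": "1928-03-01",
     "publisher": "Example Gazette", "text": "..."},
    {"doc_type": "news", "title": "Later Story", "date": "1931-02-10",
     "publisher": "Example Gazette", "text": "..."},
]
conn = build_article_db(sample)
```

Storing dates as ISO 8601 strings lets SQLite compare them lexicographically, so a range filter such as `WHERE pub_date BETWEEN '1927-01-01' AND '1927-12-31'` works without any date parsing.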

Search and Retrieval Process

The initial methodology encoded documents with a BERT model and built a Facebook AI Similarity Search (FAISS) index over those encodings for efficient retrieval. At query time, the user's question was encoded, its nearest-neighbor documents were found in the index, and the most relevant text chunks from those documents were passed to the LLM as context for generating a response. After this setup ran into scalability issues, the project switched to SentenceTransformer models for text encoding and a Chroma vector database, which improved performance.
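The retrieval loop described above (chunk and encode the documents, encode the query, find its nearest neighbors, hand the best chunks to the LLM) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: a toy hashed bag-of-words `embed` stands in for the BERT or SentenceTransformer encoder, and an exact brute-force cosine search stands in for the FAISS or Chroma index. The chunk size, overlap, `k`, `DIM`, and prompt wording are hypothetical, not the project's settings.

```python
import math
import re
import zlib
from collections import Counter

DIM = 1024  # size of the toy embedding space (hypothetical)


def chunk_text(text, size=50, overlap=10):
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    """Toy stand-in for a sentence encoder: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for token, count in Counter(re.findall(r"[a-z]+", text.lower())).items():
        vec[zlib.crc32(token.encode()) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(texts):
    """Chunk every document once and store (embedding, chunk) pairs."""
    return [(embed(chunk), chunk) for text in texts for chunk in chunk_text(text)]


def search(index, query, k=3):
    """Exact nearest neighbors: cosine similarity is the dot product of unit vectors."""
    q = embed(query)
    ranked = sorted(index,
                    key=lambda item: sum(a * b for a, b in zip(q, item[0])),
                    reverse=True)
    return [chunk for _, chunk in ranked[:k]]


def build_prompt(query, chunks):
    """Assemble the retrieved chunks into a context block for the LLM."""
    context = "\n\n".join(chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


# Illustrative documents only.
docs = [
    "Charles Lindbergh completed a solo transatlantic flight from New York to Paris.",
    "Stock prices fell sharply on Wall Street as panic selling spread.",
    "The city council approved funding for a new public library.",
]
index = build_index(docs)
top = search(index, "Lindbergh transatlantic flight", k=1)
prompt = build_prompt("Lindbergh transatlantic flight", top)
```

Replacing `embed` with a real encoder and `search` with a vector store changes the quality and speed of retrieval but not the shape of the pipeline. Exact search like this scores every chunk for every query, so its cost grows linearly with the corpus, which is the general reason approximate nearest-neighbor indexes are used at scale.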

Evaluation of Search Methodologies

The initial search approach, which combined BERT encodings, a FAISS index, and SQLite storage, showed limitations in both scalability and the relevance of its results. Switching to SentenceTransformer encodings and a Chroma database substantially improved the efficiency and accuracy of retrieval.

User Satisfaction Testing

User satisfaction was assessed over multiple rounds of testing in which users rated the tool's responses. Initial rounds showed moderate satisfaction, which rose markedly after the retrieval methodology was refined and the LLM was tuned: satisfaction scores increased, and users commented favorably on the greater depth and relevance of the responses.

Discussion and Conclusion

This project explored Retrieval-Augmented Generation (RAG) as a way to deepen understanding of the period from 1925 to 1929, focusing on user satisfaction and the quality of the tool's outputs. Its findings align with current literature in emphasizing the importance of effective retrieval algorithms in RAG applications, and they point to several areas for future research. The tool also has limitations, including potential bias in the source materials and its assumption that those sources are credible. Future work includes improving the tool's efficiency and extending its historical scope, which would broaden its usefulness for education and research. More broadly, iterative development of this kind, together with the adoption of RAG across many domains, points to continued progress in information retrieval technologies.

Impact and Applications

This project could benefit educational curricula, academic research, and historical analysis by offering an accurate and accessible way to explore past events. Its potential applications range from classroom use to professional research, where it could deepen understanding of, and engagement with, the history of the period.