Chris Shinnimin

Final Fantasy Bot Project Journal

A fun personal project to learn LLMs and React, and rekindle my love of a favourite childhood game.

October 22, 2025


Continuing the Project: Preparing for Langchain

I am now three weeks into a continuing education course on LLMs and Prompt Engineering at McGill University, and I want to start applying some of the things I am learning there to this project. I am planning a full post in the main blog section (outside of this project journal) to summarize everything I learned from the course once it is complete. But I also wanted to begin trying to apply what I have learned to solve some of the problems with the FFBot application. The key problem I want to solve is lowering the number of prompt (input) tokens required for every message the bot app agent (the React app) sends to the LLM. In the previous project journal post, before the project pause, I discussed my dissatisfaction with the fact that every message the React app sends to the LLM requires a set of training instructions approaching 5000 tokens in size. Couple that with the fact that the React app may need to send multiple messages back and forth for every human message entered into the UI, and token usage quickly scales, especially if we want to experiment with more expensive models. This is meant to be a fun personal project and shouldn't end up costing me, or anyone interested enough to try this application out for themselves, anything worth mentioning. Plus, tackling this scalability challenge is also good for my learning.

Can Langchain Solve My Token Usage Problem?

I don't know the answer to this question, but there is only one way to find out. I am learning that Langchain is a tool I can use to:

  1. Load the data we want to train our LLM on.
  2. Split the data into chunks of variable sizes.
  3. Embed the data, a process also known as feature extraction.
  4. Search the data efficiently using the LLM when asked a question.

The loading/splitting/embedding process, as I am coming to understand it, results in the data being stored in a search-tree-like structure called a vector store. The embedding process creates metadata that allows the LLM to search this structure efficiently and retrieve only the chunks relevant to the question it is being asked. My hope is that only the tokens in the retrieved chunks will be counted as prompt (input) tokens by the LLM provider. If that is the case, maybe I can split up my large instruction set so that the LLM retrieves only the parts of the instructions it needs with every request. I'm still at the start of a steep learning curve with some of these concepts, but I am very excited to start experimenting.
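
To get a feel for the pieces involved, here is a minimal sketch of the load/split/embed/retrieve flow in Python with Langchain. The file name, chunk sizes, example question, and the choice of OpenAI embeddings with a FAISS vector store are placeholder assumptions for illustration, not the actual FFBot setup, and the Langchain package/module names have a habit of shifting between versions.

```python
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# 1. Load: read the (hypothetical) instruction file into Document objects.
docs = TextLoader("ffbot_instructions.txt").load()

# 2. Split: break the instructions into overlapping chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# 3. Embed + store: turn each chunk into a vector and index it in a vector store.
vector_store = FAISS.from_documents(chunks, OpenAIEmbeddings())

# 4. Retrieve: pull back only the chunks relevant to a given question, so only
#    those tokens need to be sent to the LLM as context.
relevant = vector_store.similarity_search("How should the bot respond to a battle command?", k=3)
for doc in relevant:
    print(doc.page_content)
```

Only the text of those few retrieved chunks (plus the question itself) would then go into the prompt, which is the behaviour I'm hoping will shrink the per-request token count.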

Moving the LLM API to Backend Python

Moving the code that interacts with the remote or local LLM provider from React to Python has several advantages. In fact, it always should have been a back-end service to begin with, for CORS reasons among others. Moving the API code to the Python back end also paves the way for experimentation with Langchain, since I am learning how to use it via Python.
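
As a rough illustration of the shape this takes, here is a minimal sketch of a back-end endpoint that proxies chat requests to the provider, assuming FastAPI and the OpenAI Python client. The endpoint path, model name, allowed origin, and request shape are placeholders rather than the actual FFBot code.

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()

# Allow the React dev server to call this back end (origin is a placeholder).
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_methods=["*"],
    allow_headers=["*"],
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment


class ChatRequest(BaseModel):
    messages: list[dict]  # [{"role": "user", "content": "..."}, ...]


@app.post("/api/chat")
def chat(req: ChatRequest):
    # The back end owns the API key and the provider call; React only talks to this endpoint.
    response = client.chat.completions.create(model="gpt-4o-mini", messages=req.messages)
    return {
        "reply": response.choices[0].message.content,
        "usage": response.usage.model_dump(),
    }
```

The React app then posts its messages to this endpoint instead of calling the provider directly, which keeps the API key off the client and sidesteps the CORS problem.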

Adding Token Count Logging

To make sure I can measure the effect of Langchain with data, I've added token count logging to the terminal console. Every time the LLM API code calls out to the LLM provider and receives a response, I parse the response for the usage information the provider returns: the number of prompt (input) tokens used, how many of those were cached tokens (cached input tokens are also of interest, since they can sometimes cost less than uncached ones, depending on the provider and model chosen), and the number of completion (output) tokens, which also usually carry a fee. Soon I will gather a dataset of "before Langchain" token usage in a few repeatable scenarios so that I can compare it with "after Langchain" data.
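
For reference, this is roughly what that logging looks like against an OpenAI-style response. The field names below are those of the OpenAI Chat Completions usage object; other providers report them under different names, so treat this as a sketch rather than the exact FFBot code.

```python
def log_token_usage(response) -> None:
    """Print prompt, cached, and completion token counts from a chat completion response."""
    usage = response.usage
    prompt_tokens = usage.prompt_tokens
    completion_tokens = usage.completion_tokens

    # Cached prompt tokens are only reported by some models/providers, so guard the lookup.
    details = getattr(usage, "prompt_tokens_details", None)
    cached_tokens = getattr(details, "cached_tokens", 0) if details else 0

    print(
        f"[token usage] prompt={prompt_tokens} "
        f"(cached={cached_tokens}) completion={completion_tokens}"
    )
```

Calling something like log_token_usage(response) right after each provider call is what will give me the "before Langchain" baseline numbers to compare against later.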

Other Improvements

A full list of improvements I've made since resuming the project this week can be found on the v0.1.1 release page on GitHub.
