How to Build LLM-Powered Applications Using Go

As the capabilities of LLMs (large language models) and adjacent tools like embedding models have grown significantly over the past year, more and more developers are considering integrating LLMs into their applications.
Since LLMs often require dedicated hardware and significant compute resources, they are most commonly packaged as network services that provide APIs for access. This is how the APIs for leading LLMs like OpenAI or Google Gemini work; even run-your-own-LLM tools like Ollama wrap the LLM in a REST API for local consumption. Moreover, developers who take advantage of LLMs in their applications often require supporting tools like vector databases, which are also most commonly deployed as network services.
In other words, LLM-powered applications are a lot like other modern cloud-native applications: they require excellent support for REST and RPC protocols, concurrency, and performance. These happen to be areas where Go excels, making it a fantastic language for writing LLM-powered applications.
This blog post works through an example of using Go for a simple LLM-powered application. It starts by describing the problem the demo application solves, and proceeds by presenting several variants of the application that all accomplish the same task, but use different packages to implement it. All the code for the demos in this post is available online.
A RAG server for Q&A
One of the most prevalent LLM-powered applications today is Retrieval Augmented Generation (RAG). RAG is one of the most scalable ways of customizing an LLM's knowledge base for domain-specific interactions.
We are going to build a RAG server in Go. This is an HTTP server that provides two operations to users:
- Add a document to the knowledge base
- Ask an LLM a question about this knowledge base
In a typical real-world scenario, users would add a corpus of documents to the server, and proceed to ask it questions. For example, a company can fill up the RAG server's knowledge base with internal documentation and use it to provide LLM-powered Q&A capabilities to internal users.
Here is a diagram showing the interactions of our server with the external world:
In addition to the user sending HTTP requests (the two operations described above), the server interacts with:
- An embedding model to calculate vector embeddings for submitted documents and for user questions.
- A vector database to efficiently store and retrieve embeddings.
- An LLM to ask questions using context collected from the knowledge base.
Concretely, the server exposes two HTTP endpoints to users:
/add/: POST {"documents": [{"text": "..."}, {"text": "..."}, ...]}
: Submits a sequence of text documents to the server, to be added to its knowledge base. For this request, the server:
- Calculates a vector embedding for each document using the embedding model.
- Stores the documents along with their vector embeddings in the vector database.
/query/: POST {"content": "..."}
: Submits a question to the server. For this request, the server:
- Calculates the question's vector embedding using the embedding model.
- Uses the vector DB's similarity search to find the documents most relevant to the question in the knowledge base.
- Uses simple prompt engineering to reformulate the question with the most relevant documents found in step (2) as context, sends it to the LLM, and returns its answer to the user.
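To make this concrete, here is a minimal sketch (not the demo's actual code) of how the JSON bodies of these two endpoints could be modeled in Go, and how the prompt in step (3) might be assembled; the package, type, and function names are illustrative assumptions.

// Illustrative sketch: request types for the two endpoints and a simple
// prompt template for step (3) of /query/.
package ragserver

import (
	"fmt"
	"strings"
)

// addRequest models the body of POST /add/.
type addRequest struct {
	Documents []struct {
		Text string `json:"text"`
	} `json:"documents"`
}

// queryRequest models the body of POST /query/.
type queryRequest struct {
	Content string `json:"content"`
}

// ragTemplate asks the LLM to answer a question using only the retrieved
// documents as context.
const ragTemplate = `Use the following information to answer the question at the end.

Information:
%s

Question: %s`

// buildRAGPrompt interpolates the retrieved documents and the user's
// question into the template.
func buildRAGPrompt(question string, contextDocs []string) string {
	return fmt.Sprintf(ragTemplate, strings.Join(contextDocs, "\n\n"), question)
}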
The services used by our demo are:
- The Gemini API, for the LLM and for the embedding model.
- Weaviate, for the vector database.
It should be fairly simple to replace these with other, equivalent services. In fact, this is what the second and third variants of the server are all about! We'll start with the first variant, which uses these tools directly.
Using the Gemini API and Weaviate directly
Both Gemini and Weaviate have convenient Go SDKs (client libraries), and our first server variant uses them directly. The complete code of this variant is in this directory.
We won't reproduce the entire code in this blog post, but here are some notes to keep in mind while reading it:
Structure: the code structure will be familiar to anyone who has written an HTTP server in Go. Client libraries for Gemini and Weaviate are initialized, and the clients are stored in a state value that is passed to the HTTP handlers.
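As a rough illustration of this structure (type and field names are assumptions for this sketch, and the exact package paths depend on the SDK versions used), the shared state could look something like this:

// Sketch of the shared server state: clients for the external services are
// created once at startup and reused by all HTTP handlers.
package ragserver

import (
	"context"

	"github.com/google/generative-ai-go/genai"
	"github.com/weaviate/weaviate-go-client/v4/weaviate"
)

type ragServer struct {
	ctx      context.Context
	wvClient *weaviate.Client       // Weaviate vector DB client
	genModel *genai.GenerativeModel // Gemini model used for answering questions
	embModel *genai.EmbeddingModel  // Gemini model used for computing embeddings
}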
Route registration: the HTTP routes for our server are trivial to set up using the routing enhancements introduced in Go 1.22:
mux := http.NewServeMux()
mux.HandleFunc("POST /add/", server.addDocumentsHandler)
mux.HandleFunc("POST /query/", server.queryHandler)
Concurrency: our server's HTTP handlers reach out to other services over the network and wait for a response. This isn't a problem for Go, since each HTTP handler runs concurrently in its own goroutine. This RAG server can handle a large number of concurrent requests, while the code of each handler remains linear and synchronous.
Batch APIs: since an /add/ request may provide a large number of documents to add to the knowledge base, the server leverages batch APIs for both embeddings (embModel.BatchEmbedContents) and the Weaviate DB (rs.wvClient.Batch) for efficiency.
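Here's a rough sketch of what that part of the /add/ handler might look like. This is illustrative, not the demo's exact code: the method names follow the Gemini and Weaviate Go SDKs mentioned above, but signatures and options may differ between SDK versions, the class and property names are assumptions, and req, rs, and w come from the surrounding handler.

// Embed all submitted documents in a single batch request to the
// embedding model.
batch := rs.embModel.NewBatch()
for _, doc := range req.Documents {
	batch.AddContent(genai.Text(doc.Text))
}
embRes, err := rs.embModel.BatchEmbedContents(rs.ctx, batch)
if err != nil {
	http.Error(w, err.Error(), http.StatusInternalServerError)
	return
}

// Build one Weaviate object per document, attaching its embedding vector.
// The class and property names here are illustrative.
objects := make([]*models.Object, len(req.Documents))
for i, doc := range req.Documents {
	objects[i] = &models.Object{
		Class:      "Document",
		Properties: map[string]any{"text": doc.Text},
		Vector:     embRes.Embeddings[i].Values,
	}
}

// Store all objects in the vector DB with a single batch call.
_, err = rs.wvClient.Batch().ObjectsBatcher().WithObjects(objects...).Do(rs.ctx)
if err != nil {
	http.Error(w, err.Error(), http.StatusInternalServerError)
	return
}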
Using LangChain for Go
Our second RAG server variant uses LangChainGo to accomplish the same task.
LangChain is a popular Python framework for building LLM-powered applications. LangChainGo is its Go equivalent. The framework has some tools for building applications out of modular components, and supports many LLM providers and vector databases in a common API. This allows developers to write code that may work with any provider and switch providers very easily.
The complete code for this variant is in this directory. You'll notice two things when reading the code:
First, it's somewhat shorter than the previous variant. LangChainGo takes care of wrapping the full APIs of vector databases in common interfaces, so less code is needed to initialize and deal with Weaviate.
Second, the LangChainGo API makes it fairly easy to switch providers. Say we want to replace Weaviate with another vector DB; in our previous variant, we'd have to rewrite all the code interfacing with the vector DB to use a new API. With a framework like LangChainGo, we no longer need to do so. As long as LangChainGo supports the new vector DB we're interested in, we should be able to replace just a few lines of code in our server, since all the DBs implement a common interface:
type VectorStore interface {
AddDocuments(ctx context.Context, docs []schema.Document, options ...Option) ([]string, error)
SimilaritySearch(ctx context.Context, query string, numDocuments int, options ...Option) ([]schema.Document, error)
}
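For a rough idea of how this common interface is used in practice, here is an illustrative sketch; the import paths follow the langchaingo module, but the package name, function name, and document contents are assumptions.

// Sketch: adding documents and searching through the common VectorStore
// interface, independent of which vector DB backs the store.
package ragserver

import (
	"context"

	"github.com/tmc/langchaingo/schema"
	"github.com/tmc/langchaingo/vectorstores"
)

func addAndSearch(ctx context.Context, store vectorstores.VectorStore) ([]schema.Document, error) {
	// Add documents; the store computes and persists their embeddings.
	_, err := store.AddDocuments(ctx, []schema.Document{
		{PageContent: "A document about our internal tooling."},
		{PageContent: "A document about the deployment process."},
	})
	if err != nil {
		return nil, err
	}

	// Retrieve the documents most relevant to a question, to be used as
	// context in the LLM prompt.
	return store.SimilaritySearch(ctx, "How do we deploy a new service?", 3)
}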
Using Genkit for Go
Earlier this year, Google announced Genkit for Go – a new open-source framework for building LLM-powered applications. Genkit shares some characteristics with LangChain, but diverges from it in other aspects.
Like LangChain, it provides common interfaces that may be implemented by different providers (as plugins), and thus makes switching from one to the other simpler. However, it doesn't try to prescribe how different LLM components interact; instead, it focuses on production features like prompt management and engineering, and deployment with integrated developer tooling.
Our third RAG server variant uses Genkit for Go to accomplish the same task. Its complete code is in this directory.
This variant is fairly similar to the LangChainGo one – common interfaces for LLMs, embedders, and vector DBs are used instead of direct provider APIs, making it easier to switch from one to another. In addition, deploying an LLM-powered application to production is much easier with Genkit; we don't implement this in our variant, but feel free to read the documentation if you're interested.
Summary – Go for LLM-powered applications
The samples in this post provide just a taste of what's possible when building LLM-powered applications in Go. They demonstrate how simple it is to build a powerful RAG server with relatively little code; most importantly, the samples pack a significant degree of production readiness because of some fundamental Go features.
Working with LLM services often means sending REST or RPC requests to a network service, waiting for the response, sending new requests to other services based on that, and so on. Go excels at all of these, providing great tools for managing concurrency and the complexity of juggling network services.
In addition, Go's great performance and reliability as a cloud-native language make it a natural choice for implementing the more fundamental building blocks of the LLM ecosystem. For some examples, check out projects like Ollama, LocalAI, Weaviate, or Milvus.
Credits: Eli Bendersky
Photo by Verne Ho on Unsplash