Optional Document Storage? #1023
-
I have a similar use case: I want to recognize a user's intent. For every intent I have a set of utterances. Basically, I want to replace LUIS with Kernel Memory.
Planned setup:
- Reuse existing C# API (Microsoft BotFramework), using MemoryServerless
- A WebJob will perform the Ingestion using KernelMemory (Azure Storage BlobTrigger), MemoryServerless
- Use Azure AI Search as Vector store
For my use case, I only need to find the name of the intent that matches a user prompt. The utterances, intent names, and vector data are all stored in Azure AI Search after ingestion.
So I asked myself: how does a document store help me here? Everything I need can be pulled from the vector store.
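Concretely, the lookup I have in mind is just a vector search, something like this (a rough sketch; it assumes one document per intent, with the intent name doubling as the document ID, as in my setup):

```csharp
// Sketch: find the best-matching intent for a user prompt.
// Assumes intents were ingested one document each, documentId = intent name.
var result = await memory.SearchAsync("where do I get a new phone?", limit: 1);

// The matched intent name is the document ID of the top result (null if no match).
var intentName = result.Results.FirstOrDefault()?.DocumentId;
```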
I started with SimpleDocumentStorage, but had to enable AllowMixingVolatileAndPersistentData.
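For reference, the build call that required this looked roughly like the following (the flag lives on KernelMemoryBuilderBuildOptions in the version I'm using; adjust if yours differs):

```csharp
// Sketch: allow volatile SimpleDocumentStorage next to a persistent vector DB.
// Property name per KernelMemoryBuilderBuildOptions; verify against your KM version.
var memory = memoryBuilder.Build<MemoryServerless>(new KernelMemoryBuilderBuildOptions
{
    AllowMixingVolatileAndPersistentData = true
});
```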
Then I configured KM with WithAzureBlobsDocumentStorage, but I noticed that this makes ingestion very slow, even when I import the same set of data again.
I expected KM to at least skip updating intents that have not changed, but that didn't happen.
I guess I'm getting something wrong.
Can someone explain to me why I need document storage at all? Can I opt out of it, or is there an option on DocumentStorage that avoids re-importing unchanged data?
-
The main reason for document storage is to allow updates. When a document is uploaded, the ID is persisted in document storage, using a folder name. The folder contains information used during the update process. This information would be hard to store in other places.
If documents are never updated and KM runs on a single node, you can use SimpleDocumentStorage and keep data in memory (with the risk of loss in case of reboots, though). With multiple nodes, or for reliability, the data needs to be centralized somewhere, and Azure Blobs is one of the options. It should not be that slow, though.
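For example, a rough sketch of keeping SimpleDocumentStorage on local disk rather than in volatile memory, which survives reboots on a single node (config property names may vary slightly by version):

```csharp
// Sketch: persist SimpleDocumentStorage to local disk instead of volatile memory.
// Single-node only; check SimpleFileStorageConfig in your KM version.
memoryBuilder.WithSimpleFileStorage(new SimpleFileStorageConfig
{
    StorageType = FileSystemTypes.Disk,
    Directory = "km-document-storage"
});
```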
-
For testing purposes, I created a simple WebApi project and configured it like this:
```csharp
memoryBuilder.WithAzureAISearchMemoryDb(new AzureAISearchConfig
{
    Endpoint = "<my-endpoint>",
    Auth = AzureAISearchConfig.AuthTypes.APIKey,
    APIKey = "<my-api-key>"
});

// Test with persistent storage
memoryBuilder.WithAzureBlobsDocumentStorage(new AzureBlobsConfig()
{
    Auth = AzureBlobsConfig.AuthTypes.AzureIdentity,
    Account = "<my-storage-account-name>",
    Container = "kernelmemory"
});

var memory = memoryBuilder.Build<MemoryServerless>(new KernelMemoryBuilderBuildOptions());
builder.Services.AddSingleton<MemoryServerless>(memory);
```
In a controller method, I'm adding entries to the memory:
```csharp
[HttpPost("memoryadd")]
public async Task TestMemory()
{
    await _memory.ImportTextAsync(text: "Where can i buy a smartphone?", documentId: "i_buy_electronics");
    await _memory.ImportTextAsync(text: "Who created bitcoin?", documentId: "i_satoshi");
}
```
In the output window, I can observe that embeddings are created for each entry. So far so good. The storage account is also getting populated.
But when I call the "memoryadd" endpoint again (without changing the data), it creates the embeddings again and updates the entries in the search DB.
Shouldn't the blobs document storage help avoid unnecessary model calls and updates? When I add the same intent/text with the same document ID again, it still creates the embeddings and updates the entry.
By "allowing updates", do you mean the benefit of avoiding downtime from wiping and re-creating the index?
I plan to sporadically process a JSON file that contains the latest version of the intents and utterances, and I thought I could keep a few versions (indexes) and maybe work with an alias to select the active one.
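On the alias idea: Azure AI Search supports index aliases, so the bot could query a stable alias name while ingestion writes to a fresh index and then swaps the alias. A rough sketch using the Azure.Search.Documents SDK (alias support may still be in preview depending on the SDK version; the index name "intents-v2" is just a placeholder for my versioning scheme):

```csharp
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

var indexClient = new SearchIndexClient(
    new Uri("<my-endpoint>"), new AzureKeyCredential("<my-api-key>"));

// Point the stable alias "intents-active" at the freshly ingested index.
// The bot always searches "intents-active"; only the alias target changes.
await indexClient.CreateOrUpdateAliasAsync(
    "intents-active",
    new SearchAlias("intents-active", "intents-v2"));
```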
Do you think KernelMemory is not the right choice in my scenario?
I thought about creating a custom "NullDocumentStorage" as a last resort, if opting out of document storage is not possible.
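What I have in mind is a no-op implementation of Kernel Memory's IDocumentStorage interface. The member names and signatures below are approximated from the interface (they have changed between releases), so this is only a sketch to adapt against the IDocumentStorage definition in the installed KM version:

```csharp
using Microsoft.KernelMemory.DocumentStorage;

// Hypothetical no-op document storage; not an official KM component.
// Verify the exact IDocumentStorage members in your Kernel Memory version.
public class NullDocumentStorage : IDocumentStorage
{
    public Task CreateIndexDirectoryAsync(string index, CancellationToken ct = default)
        => Task.CompletedTask;

    public Task DeleteIndexDirectoryAsync(string index, CancellationToken ct = default)
        => Task.CompletedTask;

    public Task CreateDocumentDirectoryAsync(string index, string documentId, CancellationToken ct = default)
        => Task.CompletedTask;

    public Task EmptyDocumentDirectoryAsync(string index, string documentId, CancellationToken ct = default)
        => Task.CompletedTask;

    public Task DeleteDocumentDirectoryAsync(string index, string documentId, CancellationToken ct = default)
        => Task.CompletedTask;

    public Task WriteFileAsync(string index, string documentId, string fileName,
        Stream streamContent, CancellationToken ct = default)
        => Task.CompletedTask;

    // Reading back cannot be a silent no-op: the ingestion pipeline writes
    // intermediate files (extracted text, partitions) and later handlers read
    // them back, so a pure null store would break multi-step pipelines.
    // Throwing here makes that limitation explicit.
    public Task<StreamableFileContent> ReadFileAsync(string index, string documentId,
        string fileName, bool logErrIfNotFound = true, CancellationToken ct = default)
        => throw new NotSupportedException("NullDocumentStorage cannot read files back");
}
```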