Optional Document Storage? #1023

Unanswered
martinossx asked this question in Q&A

I have a similar use case where I want to recognize a user's intent. For every intent I have a set of utterances. Basically, I want to replace LUIS with KernelMemory.

Planned setup:

  • Reuse existing C# API (Microsoft BotFramework), using MemoryServerless
  • A WebJob will perform the Ingestion using KernelMemory (Azure Storage BlobTrigger), MemoryServerless
  • Use Azure AI Search as Vector store

For my use case, I only want to find the name of the intent that matches a user prompt. Utterances, intent names and vector data are all stored in Azure AI Search after ingestion.

So I asked myself: how does a document store help me here? Everything I need can be pulled from the vector store.

I started with SimpleDocumentStorage, but had to enable AllowMixingVolatileAndPersistentData.
Then I configured KM with WithAzureBlobsDocumentStorage, but I noticed that this makes the ingestion super slow, even when I import the same set of data again.
I expected KM to at least skip updating intents that have not changed, but that didn't happen.

I guess I'm getting something wrong.
Can someone explain why I need a DocumentStorage at all? Can I opt out of it, or is there an option on DocumentStorage that avoids re-importing unchanged data?


Replies: 1 comment 1 reply


The main reason for document storage is to allow updates. When a document is uploaded, its ID is persisted in document storage as a folder name. The folder contains information used during the update process; that information would be hard to store anywhere else.

If documents are never updated and KM runs on a single node, you can use SimpleDocumentStorage and keep data in memory (with the risk of data loss on reboot, though). With multiple nodes and reliability in mind, data needs to be centralized somewhere, and Azure Blobs is one of the options. It should not be that slow, though.
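A rough sketch of the in-memory option (the method and type names below — WithSimpleFileStorage, FileSystemTypes.Volatile, AllowMixingVolatileAndPersistentData — are assumed from recent KM builder APIs and should be verified against the Kernel Memory version in use):

```csharp
// Sketch: volatile (in-memory) document storage next to a persistent vector DB.
// Names assumed from Kernel Memory's dev-tools package; verify in your version.
var memory = new KernelMemoryBuilder()
    .WithSimpleFileStorage(new SimpleFileStorageConfig
    {
        StorageType = FileSystemTypes.Volatile // document data is lost on restart
    })
    .WithAzureAISearchMemoryDb(new AzureAISearchConfig { /* endpoint, auth, ... */ })
    .Build<MemoryServerless>(new KernelMemoryBuilderBuildOptions
    {
        // Needed because one side is volatile and the other persistent
        AllowMixingVolatileAndPersistentData = true
    });
```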


For testing purposes, I created a simple WebApi project and configured it like this:

memoryBuilder.WithAzureAISearchMemoryDb(new AzureAISearchConfig
{
    Endpoint = "<my-endpoint>",
    Auth = AzureAISearchConfig.AuthTypes.APIKey,
    APIKey = "<my-api-key>"
});

// Test with persistent storage
memoryBuilder.WithAzureBlobsDocumentStorage(new AzureBlobsConfig
{
    Auth = AzureBlobsConfig.AuthTypes.AzureIdentity,
    Account = "<my-storage-account-name>",
    Container = "kernelmemory"
});

var memory = memoryBuilder.Build<MemoryServerless>(new KernelMemoryBuilderBuildOptions());
builder.Services.AddSingleton<MemoryServerless>(memory);

In a controller method, I'm adding entries to the memory:

[HttpPost("memoryadd")]
public async Task TestMemory()
{
    await _memory.ImportTextAsync(text: "Where can i buy a smartphone?", documentId: "i_buy_electronics");
    await _memory.ImportTextAsync(text: "Who created bitcoin?", documentId: "i_satoshi");
}

In the output window, I can observe that embeddings are created for each entry. So far so good. The storage account is also getting populated.

But when I call the "memoryadd" endpoint again (without changing the data), it creates embeddings again and updates the entries in the search DB.

Shouldn't the blobs document storage help avoid unnecessary model calls and updates? When I add the same intent/text with the same document ID again, it still creates the embeddings and updates the entry.
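As a sketch of the workaround I had in mind (assuming IsDocumentReadyAsync on IKernelMemory only reports whether a document ID has completed ingestion, not whether its content changed):

```csharp
// Sketch: skip the import when a document with this ID already completed ingestion.
// Assumption: IsDocumentReadyAsync checks existence/status only; detecting changed
// content would still need my own content hash or version tag.
if (!await _memory.IsDocumentReadyAsync(documentId: "i_buy_electronics"))
{
    await _memory.ImportTextAsync(
        text: "Where can i buy a smartphone?",
        documentId: "i_buy_electronics");
}
```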

By "allowing updates", do you mean the benefit of avoiding downtime by wiping and re-creating the index?

I plan to sporadically process a JSON file that contains the latest version of intents and utterances, and I thought I could keep a few versions (indexes) and maybe work with an alias to select the active one.
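For example (a sketch; the versioned index name is a hypothetical naming scheme, and an Azure AI Search alias would be managed outside KM), I could write each batch into a versioned index via the index parameter and switch consumers over once ingestion completes:

```csharp
// Sketch: one KM index per intents-file version; the "active" version is
// selected elsewhere (e.g. via an Azure AI Search alias, outside of KM).
string versionedIndex = "intents-v2"; // hypothetical naming scheme

await _memory.ImportTextAsync(
    text: "Where can i buy a smartphone?",
    documentId: "i_buy_electronics",
    index: versionedIndex);

// Queries then target the same versioned index:
var results = await _memory.SearchAsync("smartphone purchase", index: versionedIndex);
```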

Do you think KernelMemory is not the right choice in my scenario?

I thought about creating a custom "NullDocumentStorage" as a last resort, if opting out of document storage is not possible.
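A rough sketch of what I mean (the IDocumentStorage method set below is assumed from memory of the KM source and very likely needs adjusting to the exact interface shipped in the installed version):

```csharp
// Sketch: a document storage that accepts writes and stores nothing.
// WARNING: method names/signatures are assumptions; align them with the
// IDocumentStorage interface of your Kernel Memory version. Reads will fail,
// so any feature that re-reads stored files (updates, file citations) breaks.
public class NullDocumentStorage : IDocumentStorage
{
    public Task CreateIndexDirectoryAsync(string index, CancellationToken ct = default)
        => Task.CompletedTask;

    public Task DeleteIndexDirectoryAsync(string index, CancellationToken ct = default)
        => Task.CompletedTask;

    public Task CreateDocumentDirectoryAsync(string index, string documentId, CancellationToken ct = default)
        => Task.CompletedTask;

    public Task EmptyDocumentDirectoryAsync(string index, string documentId, CancellationToken ct = default)
        => Task.CompletedTask;

    public Task DeleteDocumentDirectoryAsync(string index, string documentId, CancellationToken ct = default)
        => Task.CompletedTask;

    public Task WriteFileAsync(string index, string documentId, string fileName,
        Stream streamContent, CancellationToken ct = default)
        => Task.CompletedTask; // drop the payload

    public Task<StreamableFileContent> ReadFileAsync(string index, string documentId,
        string fileName, bool logErrIfNotFound = true, CancellationToken ct = default)
        => throw new FileNotFoundException(
            $"{fileName} not available: NullDocumentStorage stores nothing");
}
```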
