[P] We added semantic caching to Bifrost and it’s cutting API costs by 60-70%
We're building Bifrost, and one feature that's been really effective is semantic caching. Instead of relying only on exact string matching, we use embeddings to catch cases where users ask the same thing in different ways.

How it works: when a request comes in, we generate an embedding for it and check whether anything semantically similar already exists in the cache. You can tune the similarity threshold: we default to 0.8, but you can go stricter (0.9+) or looser (0.7) depending on your use […]
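
To make the lookup flow above concrete, here's a rough sketch in Python. This is not Bifrost's actual code: `SemanticCache`, `toy_embed`, and `cosine` are hypothetical names, and `toy_embed` is a bag-of-words stand-in so the example runs without a model. In practice you'd plug in a real embedding model, and the similarity scores would look different from the ones this toy produces.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in embedding: a bag-of-words count vector (NOT a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical cache: returns a stored response when a new prompt
    is similar enough to one seen before."""

    def __init__(self, embed=toy_embed, threshold=0.8):
        self.embed = embed          # assumption: any callable text -> vector
        self.threshold = threshold  # 0.8 mirrors the default mentioned above
        self.entries = []           # list of (embedding, response) pairs

    def get(self, prompt):
        """Return the best match at or above the threshold, else None."""
        query = self.embed(prompt)
        best_score, best_response = 0.0, None
        for emb, response in self.entries:
            score = cosine(query, emb)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, prompt, response):
        """Store a response under the prompt's embedding."""
        self.entries.append((self.embed(prompt), response))


cache = SemanticCache(threshold=0.8)
cache.put("what is the capital of france", "Paris")

# A rephrased prompt scores about 0.93 with the toy embedding: cache hit.
hit = cache.get("what is the capital city of france")

# An unrelated prompt shares no words: cache miss, so you'd call the API.
miss = cache.get("how do i reset my password")
```

The linear scan is only for illustration. With more than a handful of entries you'd probably want a vector index instead, and a stricter threshold (say 0.95) would turn the rephrased prompt above into a miss.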