Security News
Bun 1.2 Released with 90% Node.js Compatibility and Built-in S3 Object Support
Bun 1.2 enhances its JavaScript runtime with 90% Node.js compatibility, built-in S3 and Postgres support, HTML Imports, and faster, cloud-first performance.
github.com/by12380/Autocomplete
Autocomplete is a highly performant, scalable, and available service in the kubernetes environment to provide an application with type-ahead search word suggestions similar to Google search.
This repo contains additional components that will showcase a full system using Autocomplete.
However, the ultimate goal of this project is to create an autocomplete service that will be portable and reusable with other kubernetes projects.
The data used for Autocomplete search words is based upon the frequency of past search queries collected from the web component.
Go, Kubernetes, Helm (K8 package manager), MongoDB (Log DB), Minio (S3-compatible storage), Argo (Workflow manager), Gin (Web framework)
Install minikube and kubectl.
Clone this repository and navgiate to the project's root directory.
Run go get
to install go module dependencies.
Run minikube start
to start your kubernetes environment.
Run helm install default ./chart
to deploy project to kubernetes.
(Optional) Run minikube dashboard
to open Kubernetes dashboard and check for deployment status.
Trie was used as the data structure behind the Autocomplete service.
Since searching for all words matching a prefix in a trie has a time complexity of O(n), n being the number of nodes in the trie, the performace will suffer as the size of the trie grows.
To ensure the efficiency of search, we modified the trie to store top K results at each node for its corresponding prefix.
This will increase the space complexity to O(nk), where k is the number of top results we store.
This will reduce the time complexity for searching words for a given prefix to O(1), and total time complexity for search operation would be reduced to O(l), where l is the length of the prefix (input keyword).
A sacrifice of increased space for better time complexiity is a worth it tradeoff.
Since the Autocomplete service is read only, we can easily create replicas of the service to handle more request load.
We can utilize the autoscaling feature that is supported by kubernetes natively.
As the size of the trie grows (growing list of suggested search words in our bank), it will eventually hit the memory limit for each pod instances within the service.
To avoid holding all suggested search words in one app instance, we can split the search word data by the first letter alphabetically into groups, ex ([A-I], [J-R], [S-Z]).
Thankfully, with the help of Helm templates, we can easily and dynamically create kubernetes resource by updating the configuration files used by Helm.
We placed a reverse proxy behind the autocomplete endpoints in our api gateway service. The reverse proxy will determine which shard of the autocomplete service to relay the request to by looking at which alphabet range the the first letter of the user's request query falls into. Since the api gateway is auto-scalable, it will not be a bottleneck to our autocomplete service.
When users make search requests (not request for autocomplete suggestions) to the web component, the requests are continously loggeed to the log DB. A cron job will then trigger our "update autocompelete service" workflow. This workflow is defined and made possible to run in a kubernetes environment by Argo Workfkow, and we defined the workflow as such:
We used frequency of past search quries as both the dictionary of words used to populate our trie as well as the ranking for our top-K-result feature. However, the aggregator logic can be define in anyway we needed.
How do you ensure high availability for your service?
How are the search words in the service updated in real time?
How did you cooordinate sequence of actions to happen in Kubernetes?
How do you prevent downtime when the service is being updated?
Why did you use Helm? What are the benefits over using kubernetes alone?
How do you re-shard the service?
Why did you use mongoDB as a log database?
How did you ensure logging does not slow down the speed of your service?
What is Minio?
Why did you not use exisitng cloud services such as AWS S3 and Mongo Atlas?
FAQs
Unknown package
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Bun 1.2 enhances its JavaScript runtime with 90% Node.js compatibility, built-in S3 and Postgres support, HTML Imports, and faster, cloud-first performance.
Security News
Biden's executive order pushes for AI-driven cybersecurity, software supply chain transparency, and stronger protections for federal and open source systems.
Security News
Fluent Assertions is facing backlash after dropping the Apache license for a commercial model, leaving users blindsided and questioning contributor rights.