Thursday, January 21, 2021

Design spam URL testing for a browser

Need a web crawler that crawls all URLs; there could be billions of them.
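
A minimal sketch of the crawl loop, assuming Python's urllib and a regex link extractor; a production crawler would need a real HTML parser, robots.txt handling, politeness limits, and a distributed frontier:

    import re
    import urllib.request
    from collections import deque

    def crawl(seed_urls, max_pages=1000):
        # Breadth-first crawl over a URL frontier, deduplicating as we go.
        frontier = deque(seed_urls)
        seen = set(seed_urls)
        while frontier and len(seen) <= max_pages:
            url = frontier.popleft()
            try:
                html = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "ignore")
            except Exception:
                continue  # unreachable or unreadable page; skip it
            yield url, html  # hand the page to the spam classifier
            for link in re.findall(r'href="(https?://[^"]+)"', html):
                if link not in seen:
                    seen.add(link)
                    frontier.append(link)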

CAP theorem: consistency can lag (slightly stale verdicts are fine); availability and partition tolerance matter more. Use Cassandra/DynamoDB or an equivalent AP store. The web crawler tests each URL for spam and saves its verdict, good or not.
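
A sketch of the crawler writing verdicts, assuming boto3 (the AWS SDK) and a hypothetical url_verdicts table keyed by the URL string; the table and attribute names are assumptions:

    import time
    import boto3

    # Hypothetical table: partition key "url" (string), attribute "verdict" ("good" / "spam").
    table = boto3.resource("dynamodb").Table("url_verdicts")

    def save_verdict(url: str, is_spam: bool) -> None:
        # Last write wins, which is acceptable since availability is favored
        # over strict consistency here.
        table.put_item(Item={
            "url": url,
            "verdict": "spam" if is_spam else "good",
            "checked_at": int(time.time()),
        })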

The browser queries this service for each URL and decides whether to warn, block, or allow.
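
A sketch of the lookup behind that decision, reusing the hypothetical table above; warning on unknown URLs is an assumption, not part of the design here:

    import boto3

    table = boto3.resource("dynamodb").Table("url_verdicts")  # same hypothetical table as above

    def check_url(url: str) -> str:
        # Returns the action for the browser: "allow", "warn", or "block".
        item = table.get_item(Key={"url": url}).get("Item")
        if item is None:
            return "warn"  # unknown URL: a policy choice, assumed here
        return "block" if item["verdict"] == "spam" else "allow"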

The client also needs a cache: an LRU cache (a hash map with recency ordering) that keeps the most recently used entries and purges the least recently used ones when a memory limit is hit. Entries expire after a day. DynamoDB Accelerator (DAX) implements a client-side cache like this.
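
A minimal sketch of such a cache, assuming an entry count stands in for the memory limit (a real client would budget by bytes):

    import time
    from collections import OrderedDict

    class LRUCache:
        # Hash map with recency ordering; evicts least recently used entries
        # beyond max_entries and drops entries older than ttl_seconds (one day).
        def __init__(self, max_entries=10_000, ttl_seconds=24 * 3600):
            self.max_entries = max_entries
            self.ttl = ttl_seconds
            self.entries = OrderedDict()   # url -> (verdict, stored_at)

        def get(self, url):
            entry = self.entries.get(url)
            if entry is None:
                return None
            verdict, stored_at = entry
            if time.time() - stored_at > self.ttl:
                del self.entries[url]      # expired after a day
                return None
            self.entries.move_to_end(url)  # mark as most recently used
            return verdict

        def put(self, url, verdict):
            self.entries[url] = (verdict, time.time())
            self.entries.move_to_end(url)
            if len(self.entries) > self.max_entries:
                self.entries.popitem(last=False)  # purge least recently used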

The web farm that serves these queries can also keep such a cache on each machine.
