Thursday, January 21, 2021

Design spam URL testing for a browser

Need a web crawler that crawls all URLs; there could be billions of them.
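
A minimal sketch of the crawl loop, assuming Python's urllib and a regex link extractor; a production crawler would need a real HTML parser, robots.txt handling, politeness limits, and a distributed frontier:

    import re
    import urllib.request
    from collections import deque

    def crawl(seed_urls, max_pages=1000):
        # Breadth-first crawl over a URL frontier, deduplicating as we go.
        frontier = deque(seed_urls)
        seen = set(seed_urls)
        while frontier and len(seen) <= max_pages:
            url = frontier.popleft()
            try:
                html = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "ignore")
            except Exception:
                continue  # unreachable or unreadable page; skip it
            yield url, html  # hand the page to the spam classifier
            for link in re.findall(r'href="(https?://[^"]+)"', html):
                if link not in seen:
                    seen.add(link)
                    frontier.append(link)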

CAP theorem: consistency can lag (slightly stale verdicts are fine); availability and partition tolerance matter more. Use Cassandra/DynamoDB or an equivalent AP store. The web crawler tests each URL for spam and saves its verdict, good or not.
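
A sketch of the crawler writing verdicts, assuming boto3 (the AWS SDK) and a hypothetical url_verdicts table keyed by the URL string; the table and attribute names are assumptions:

    import time
    import boto3

    # Hypothetical table: partition key "url" (string), attribute "verdict" ("good" / "spam").
    table = boto3.resource("dynamodb").Table("url_verdicts")

    def save_verdict(url: str, is_spam: bool) -> None:
        # Last write wins, which is acceptable since availability is favored
        # over strict consistency here.
        table.put_item(Item={
            "url": url,
            "verdict": "spam" if is_spam else "good",
            "checked_at": int(time.time()),
        })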

The browser queries this service for each URL and decides whether to warn, block, or allow.
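
A sketch of the lookup behind that decision, reusing the hypothetical table above; warning on unknown URLs is an assumption, not part of the design here:

    import boto3

    table = boto3.resource("dynamodb").Table("url_verdicts")  # same hypothetical table as above

    def check_url(url: str) -> str:
        # Returns the action for the browser: "allow", "warn", or "block".
        item = table.get_item(Key={"url": url}).get("Item")
        if item is None:
            return "warn"  # unknown URL: a policy choice, assumed here
        return "block" if item["verdict"] == "spam" else "allow"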

The client also needs a cache: an LRU cache (a hash map with recency ordering) that keeps the most recently used entries and purges the least recently used ones when a memory limit is hit. Entries expire after a day. DynamoDB Accelerator (DAX) implements a client-side cache like this.
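
A minimal sketch of such a cache, assuming an entry count stands in for the memory limit (a real client would budget by bytes):

    import time
    from collections import OrderedDict

    class LRUCache:
        # Hash map with recency ordering; evicts least recently used entries
        # beyond max_entries and drops entries older than ttl_seconds (one day).
        def __init__(self, max_entries=10_000, ttl_seconds=24 * 3600):
            self.max_entries = max_entries
            self.ttl = ttl_seconds
            self.entries = OrderedDict()   # url -> (verdict, stored_at)

        def get(self, url):
            entry = self.entries.get(url)
            if entry is None:
                return None
            verdict, stored_at = entry
            if time.time() - stored_at > self.ttl:
                del self.entries[url]      # expired after a day
                return None
            self.entries.move_to_end(url)  # mark as most recently used
            return verdict

        def put(self, url, verdict):
            self.entries[url] = (verdict, time.time())
            self.entries.move_to_end(url)
            if len(self.entries) > self.max_entries:
                self.entries.popitem(last=False)  # purge least recently used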

The web farm that serves these queries can also keep such a cache on each machine.
