We need a web crawler that visits every URL on the web; there could be billions of them.
By the CAP theorem we have to choose: strong consistency would slow us down, and availability and partition tolerance matter more here, so an eventually consistent store such as Cassandra or DynamoDB (or an equivalent) is a good fit. The crawler tests each page for spam and saves a verdict for the URL: good or bad.
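As a rough sketch of the storage model (the table name, attribute names, and boto3 usage are assumptions for illustration, not part of the notes), the crawler's verdict for each URL could be written to a DynamoDB-style table keyed by URL:

```python
import time

import boto3  # assumed: AWS SDK; any eventually consistent key-value store would do

# Hypothetical table "url_verdicts" with the URL as its partition key.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("url_verdicts")

def save_verdict(url: str, is_good: bool) -> None:
    """Crawler writes its spam verdict for a URL."""
    table.put_item(Item={
        "url": url,
        "verdict": "good" if is_good else "bad",
        "checked_at": int(time.time()),
    })

def lookup_verdict(url: str) -> str:
    """Query side reads the verdict; URLs we have not crawled yet come back as 'unknown'."""
    resp = table.get_item(Key={"url": url})
    return resp.get("Item", {}).get("verdict", "unknown")
```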
A browser queries this service for a URL's verdict and decides whether to warn, block, or allow.
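A sketch of the browser-side decision, assuming a hypothetical /check endpoint in front of the store above, and assuming (not stated in the notes) that unknown, not-yet-crawled URLs get a warning rather than a block:

```python
from enum import Enum

import requests  # assumed HTTP client; the endpoint URL below is made up

class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"

def decide(url: str) -> Action:
    """Map the stored verdict for a URL onto the browser's action."""
    resp = requests.get(
        "https://urlcheck.example.com/check",
        params={"url": url},
        timeout=1,
    )
    verdict = resp.json().get("verdict", "unknown")
    if verdict == "bad":
        return Action.BLOCK   # known-bad URL: block outright
    if verdict == "unknown":
        return Action.WARN    # not crawled yet: warn the user
    return Action.ALLOW       # known-good URL: allow
```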
The client also needs a cache: an LRU cache, i.e. a hash map that purges the least recently used entries when a memory limit is hit, with entries expiring after a day. DynamoDB Accelerator (DAX) is an existing, ready-made caching layer for DynamoDB.
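A minimal sketch of such a cache; the capacity bound and one-day TTL come from the notes above, while the class and method names are made up:

```python
import time
from collections import OrderedDict

class LRUCache:
    """Hash map that evicts least recently used entries and expires entries after a TTL."""

    def __init__(self, max_entries=10_000, ttl_seconds=86_400):
        self._data = OrderedDict()   # key -> (stored_at, value), kept in recency order
        self._max = max_entries
        self._ttl = ttl_seconds

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        stored_at, value = item
        if time.time() - stored_at > self._ttl:  # older than a day: treat as a miss
            del self._data[key]
            return None
        self._data.move_to_end(key)              # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (time.time(), value)
        self._data.move_to_end(key)
        if len(self._data) > self._max:          # over the memory budget: drop the LRU entry
            self._data.popitem(last=False)
```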
The web farm that serves these queries can also keep such a cache on each machine.
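On each machine in that farm, a lookup could go through the local cache before falling back to the store, reusing the hypothetical LRUCache and lookup_verdict sketched above:

```python
cache = LRUCache(max_entries=1_000_000)

def cached_lookup(url: str) -> str:
    verdict = cache.get(url)
    if verdict is None:                  # cache miss or expired entry
        verdict = lookup_verdict(url)    # fall back to the URL-verdict store
        cache.put(url, verdict)
    return verdict
```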