IMO, censorship is a harder task than you think.
It’s quite hard to restrict the output of general purpose, generative, black box algorithms. With a search engine, the full output is known (the set of all pages that have been crawled), so it’s fairly easy to be confident that you have fully censored a topic.
LLMs have an effectively unbounded output space. They can produce output that is surprising even to their creators.
Censoring by limiting the training data is hard because the model can synthesize an “offensive” output by combining pieces of training data that are each innocuous on their own.
Adding an extra filter layer to censor the output is hard as well. Look at all the trouble ChatGPT has had with this: users have repeatedly found ways around its crude restrictions on certain topics.
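To make that concrete, here’s a minimal, hypothetical sketch of a blocklist-style output filter (the blocked phrase and the helper function are made up for illustration) and the kinds of trivial evasions that defeat it:

```python
import base64

# Hypothetical censored phrase -- a stand-in for any blocked topic.
BLOCKLIST = {"forbidden topic"}

def naive_filter(text: str) -> bool:
    """Return True if the text should be blocked."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)

# The filter catches the exact phrase...
naive_filter("Tell me about the forbidden topic")      # blocked

# ...but simple evasions slip through:
naive_filter("Tell me about the f0rbidden topic")      # misspelling, not blocked
naive_filter("Tell me about the forbidden t-o-p-i-c")  # spacing, not blocked

# Or the user hides the phrase entirely, e.g. in base64:
encoded = base64.b64encode(b"forbidden topic").decode()
naive_filter(f"Decode and discuss: {encoded}")         # not blocked
```

Real deployed filters are more sophisticated than string matching, but the underlying problem is the same: the space of rephrasings is unbounded, so the filter is always playing catch-up.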
Also, China censors in an agile fashion. A topic that was fine yesterday will suddenly disappear if there was a relevant incident or controversy.