Benchmarking stereotype bias and toxicity in large language models