Job Description
Job Title: AI Policy Reviewer
Location: Remote (Worldwide)
Job Summary: The AI Policy Reviewer is responsible for evaluating AI-generated and user-generated content to ensure compliance with internal governance standards, regulatory requirements, and responsible AI principles. This role plays a key part in safeguarding model integrity by reviewing outputs for safety risks, bias, misinformation, harmful content, and policy violations, while ensuring consistent enforcement of AI usage guidelines.
Responsibilities:
• Review and score AI-generated responses against detailed policy rubrics. Assess outputs for safety, truthfulness, fairness, and alignment with community guidelines.
• Act as a quality assurance checkpoint for automated systems. Identify instances where the AI misinterprets policy (e.g., being over-sensitive and censoring benign content, or under-sensitive and allowing harmful content).
• Handle complex edge cases where policy application is ambiguous. Make nuanced judgement calls regarding context, satire, or emerging risks that the AI model struggles to process.
• Analyze and review data to identify systematic flaws in the AI’s reasoning. Report patterns of bias, hallucination, or policy gaps to the Product and Engineering teams.
• Collaborate with Policy teams to test and refine evaluation rubrics. Provide feedback on whether current policies are teachable to AI models or if they require human-only judgement.
• Participate in adversarial testing (red teaming) by attempting to jailbreak the model or provoke unsafe responses to identify vulnerabilities before launch.
• Work closely with Machine Learning Engineers to explain the "why" behind your ratings, helping them adjust model behavior.
• Write high-quality examples (prompts and ideal responses) that serve as golden sets for training the AI on how to handle difficult policy scenarios.
Requirements:
• Minimum of 5 years of professional experience in Trust & Safety Operations, Content Policy, Risk Analysis, or Legal/Compliance review.
• Deep understanding of content moderation principles, including hate speech, harassment, misinformation, and graphic violence policies.
• Strong ability to deconstruct complex AI responses and identify logical flaws, hallucinations, or subtle biases.
• Clear and concise written communication skills. You must be able to explain why an AI response was wrong in a way that engineers and policy experts can understand.
• This role involves exposure to disturbing AI-generated text and images designed to test safety limits. Proven emotional resilience and self-care strategies are required.
• Comfortable working with dashboards, spreadsheets, and specialized review tools; familiarity with LLMs (e.g., ChatGPT, Gemini).
• Proven ability to follow complex, detailed instructions and scoring rubrics with high consistency and accuracy.
• Understanding of global cultural and political nuances to assess whether AI responses are appropriate for diverse international audiences.
Method of Application:
Qualified candidates should send a copy of their CV and portfolio to [email address], with the job title as the subject of the email.
Salary:
$1,000