A comparison of four AI models on a hard theorem-proving task reveals significant performance differences, with Grok Expert leading. This insight into model capabilities could inform future development and benchmarking efforts.
Gave 4 AI models a hard new theorem to prove. Rankings:
1. Grok Expert - quick and elegant proof.
2. Gemini Pro - close runner-up.
3. ChatGPT Pro - claimed the theorem was incorrect and offered no proof.
4. Claude Opus - gave up after some time with no output (is it really nerfed?)
AI models · theorem proving · benchmarking · Grok Expert · ChatGPT
This tweet describes a comparative study of four types of human experience data in a generative AI workflow, offering insight into user interaction and experience design. Senior engineers may find the methodology and findings relevant for improving AI system design.
We compare 4 types of human experience data in a GAI workflow:
C1: demographics
C2: gaze (eye-tracking)
C3: questionnaire-based experience
C4: AI-predicted experience
12 designers + 30 evaluators
(4/)