LLMs are getting better at character-level text manipulation

(blog.burkert.me)

33 points | by curioussquirrel 5 hours ago

3 comments

malshe 1 hour ago
I play Quartiles in the Apple News app daily (https://support.apple.com/guide/iphone/solve-quartiles-puzzl...). Occasionally, when I get stuck, I use ChatGPT to find a word that uses four word fragments, or tiles. It never worked before GPT-5, and with GPT-5 it works only with reasoning enabled. Even then, there is no guarantee it will find the correct word, and it may end up hallucinating badly.
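For comparison, the four-tile search is easy to do deterministically. A minimal sketch, assuming you have the tiles and some word list on hand; the tiles, the word list, and the function name `four_tile_words` below are all made up for illustration and are not taken from the actual game:

```python
from itertools import permutations


def four_tile_words(tiles, dictionary):
    """Return every dictionary word made by joining exactly four tiles in some order."""
    words = set(dictionary)
    found = set()
    # Try every ordered selection of four tiles (by position, so no tile is reused).
    for combo in permutations(tiles, 4):
        candidate = "".join(combo)
        if candidate in words:
            found.add(candidate)
    return sorted(found)


# Hypothetical tiles and word list, for illustration only.
tiles = ["qu", "ar", "ti", "les", "ing", "st"]
dictionary = {"quartiles", "string", "listing"}
print(four_tile_words(tiles, dictionary))
```

With a full board of tiles this is still only a few hundred thousand candidates, so brute force is fine; the hard part for a model is the character-level concatenation, not the search.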
simonw 1 hour ago
If you take a look at the system prompt for Claude 3.7 Sonnet on this page you'll see: https://docs.claude.com/en/release-notes/system-prompts#clau...
> If Claude is asked to count words, letters, and characters, it thinks step by step before answering the person. It explicitly counts the words, letters, or characters by assigning a number to each. It only answers the person once it has performed this explicit counting step.
But... if you look at the system prompts on the same page for later models - Claude 4 and upwards - that text is gone.
Which suggests to me that Claude 4 was the first Anthropic model where they didn't feel the need to include that tip in the system prompt.
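The explicit counting step described in that prompt amounts to numbering each character before answering. A rough sketch of the same idea in code, where the function name `count_letter_explicitly` and the sample word are my own assumptions rather than anything from Anthropic's prompt:

```python
def count_letter_explicitly(text, letter):
    """Count occurrences of `letter` in `text`, recording a numbered step per character."""
    steps = []
    count = 0
    for position, char in enumerate(text, start=1):
        # Case-insensitive comparison; each character gets its own numbered step.
        if char.lower() == letter.lower():
            count += 1
        steps.append((position, char, count))
    return count, steps


count, steps = count_letter_explicitly("strawberry", "r")
for position, char, running_total in steps:
    print(position, char, running_total)
print("total:", count)
```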
- kristianp 42 minutes ago
  Does that mean they've managed to post-train the thinking steps required to get these types of questions correct?
- ivape 1 hour ago
  Or they’d rather use that context window space for more useful instructions for a variety of other topics.
  - astrange 1 hour ago
    Claude's system prompt is still incredibly long and probably hurting its performance.
    https://github.com/asgeirtj/system_prompts_leaks/blob/main/A...
hansonkd 16 minutes ago
ChatGPT 5 is still pathetically bad at Roman numerals. I asked it to find the longest Roman numeral in a range. Its first guess was the highest number in the range, even though that numeral is short. Its second guess, after some help, was a longer numeral but outside the range. Its last guess was the correct longest numeral, but it miscounted how many characters it contained.
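For reference, the task itself is a few lines of code. A minimal sketch, assuming standard subtractive notation for 1 through 3999 and using the range 1 to 100 as a stand-in (not the range I actually asked about); the helper names are my own:

```python
def to_roman(n):
    """Convert an integer in 1..3999 to a standard (subtractive) Roman numeral."""
    pairs = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    out = []
    for value, symbol in pairs:
        # Greedily take the largest symbol value that still fits.
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def longest_roman_in_range(lo, hi):
    """Return (number, numeral, length) for the longest numeral in lo..hi inclusive.

    Ties go to the smallest number, because max() keeps the first maximum it sees.
    """
    best = max(range(lo, hi + 1), key=lambda n: len(to_roman(n)))
    numeral = to_roman(best)
    return best, numeral, len(numeral)


print(longest_roman_in_range(1, 100))  # (88, 'LXXXVIII', 8)
print(to_roman(100))                   # C
```

Note how the top of the range, 100, is a single character, which is exactly the trap the first guess fell into.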