DeepSeek Introduces Vision
114 points by RIshabh235 3 hours ago | 48 comments

rcMgD2BwE72F 3 hours ago
Points to https://chat.deepseek.com/sign_in for me, that's just a login screen. Anything page with some info?
reply
RIshabh235 2 hours ago
Not in official news yet, but works for me https://files.catbox.moe/hnnnlx.png
reply
dude250711 28 minutes ago
OP made a mistake, confused HN and https://www.reddit.com/r/DeepSeek .
reply
jiehong 2 hours ago
For those not trying, this allows Deepseek to understand a picture (instead of just extracting text from it), and it can describe what's in the picture, but this is not an image generation system, so you can't ask it to modify an image.

Personally, I'm a bit surprised the DS chat app still doesn't offer its own text to speech and speech to text features (I know DS doesn't have any ASR model for example, but there are quite a few in the open).

reply
paulluuk 52 minutes ago
Can you explain what the benefits are of actually "talking" with the bot instead of typing and reading?

As someone who would rather send a slack message to a coworker rather than actually walking over and talk to them, the idea of having to talk with my laptop is not appealing at all, haha.

reply
throawayonthe 9 minutes ago
it's very confusing. maaaybe if the stt is good and fast enough, speaking may be faster? english speakers can probably hit 150-180 wpm but seems like a hassle
reply
stranded22 13 minutes ago
Accessibility.
reply
bjoli 3 hours ago
What has been going on with deepseek recently? I have gotten lots of replies in Chinese and even more frequently, reasoning in Chinese as well.

Is it a new silent update?

reply
Shank 3 hours ago
Well, it is a Chinese model, maybe it thinks better in Chinese?
reply
bogdan 54 minutes ago
Hànzì can use 30%-40% fewer tokens than English. So, yes, it probably thinks better in Chinese.
reply
Razengan 30 minutes ago
If so, would other models like ChatGPT benefit from translating the user's prompt to Chinese/Japanese and thinking in Hanzi/Kanji and then converting the response back to the user's language before displaying it?
reply
cocoflunchy 22 minutes ago
I believe that most reasoning models actually think in their own "language" which is not really understandable by humans. The thinking traces that are shown in the UI are actually summaries generated by a smaller model in plain english (or user language). Sometimes this leaks through and you see some chinese/japanese characters in e.g. Claude's reasoning.
reply
seydor 7 minutes ago
> summaries generated

Or hallucinated

reply
bogdan 21 minutes ago
There are other even more efficient ways of doing this, i.e. using images instead of raw text https://xcancel.com/karpathy/status/1980397031542989305?lang...
reply
grogg 20 minutes ago
Yeah, it’s why the Caveman skill includes a Wenyan mode.

https://github.com/JuliusBrussee/caveman

reply
serf 2 hours ago
This happens to me a lot when I ask a qwen3.6 model to respond to a question in JSON. No clue why.
reply
surgical_fire 2 hours ago
I use DeepSeek daily, never happened to me.

I use the API however, not the chat interface.

reply
abyssin 3 hours ago
It doesn’t seem that recent to me, at least been like that for six months.
reply
RIshabh235 3 hours ago
yes, kind of silent update plus they might have better chinese datasets and user data for their training, that might be leading to chinese preference.
reply
alfiedotwtf 2 hours ago
Are you running out of context? I’ve found that tooling and giberish most of the time happens when I’m butting up against the high watermark of my context window. One other thing it could be, I’ve read that lower quanta like Q1 and Q2 for smaller models can leak Chinese
reply
epolanski 2 hours ago
It never happened to me with Deepseek, but it happened multiple times with Kimi 2.6.

It also happened a handful of times with Anthropic models.

reply
throwaw12 2 hours ago
I wish they published a post where we read about capabilities, quality, accuracy and other parameters
reply
tornikeo 2 hours ago
I really need this as an API.

Turns out, to use Claude Agents SDK, you need to have a vision enabled API. If Deepseek API could see, it can fully drive Claude Code and Claude Agents SDK. A project I'm working on relies on a Claude-in-CloudflareWorker setup and I've been relying on Qwen and gemini flash lite, both more expensive than Deepseek.

Can't wait to have it available on deepseek.

reply
petesergeant 2 hours ago
Have you looked at MiniMax or MiMo? Available today via OpenRouter, and it’ll make the path to porting to DeepSeek a line change https://openrouter.ai/collections/vision-models
reply
alexwwang 21 minutes ago
Does the api support vision yet?
reply
RIshabh235 12 minutes ago
No announcements about it yet.
reply
alexwwang 9 minutes ago
That makes sense. I haven’t found it work in api yet.
reply
arjie 2 hours ago
If they'd do one of those little extraneous additions like Qwen does, so that I can have DS4 Flash with Vision that would be great. I've got to run a separate model entirely so that I can get vision and I'd prefer to just put it all in one space.
reply
RIshabh235 46 minutes ago
Maybe they will do now as they got huge funding.
reply
earth2mars 3 hours ago
And it's really good and fast. Have tested with bunch of odd photos on what is happening. Overall the training set seems large enough to know what's what and where
reply
RIshabh235 3 hours ago
yes and I hope their rate of shipping increases after recent funding.
reply
crvdgc 3 hours ago
Vision has been in A/B testing for a while now (at least in China). Is there an official announcement that this will be available for everyone?
reply
RIshabh235 3 hours ago
I haven't seen any official announcement yet, works for me though.
reply
innis226 3 hours ago
Nice, is this available in the API now as well?
reply
naseemali925 2 hours ago
I am also waiting on the vision support in API. Its the only thing blocking me from buying their subscription.
reply
dakolli 2 hours ago
What subscription?
reply
naseemali925 15 minutes ago
I mean't topup. They don't have subsciptions.
reply
RIshabh235 3 hours ago
Not in the api yet.
reply
tw1984 33 minutes ago
what is more interesting to me is why it takes so long for them to support vision.

does it implies that Liang believes vision/voice is less important on its way to AGI?

reply
hklohani 2 hours ago
[flagged]
reply
ValveFan6666 3 hours ago
[dead]
reply
andrewstuart 2 hours ago
OpenAI and Anthropic need to get this free foreign competition banned.
reply
0xpgm 8 minutes ago
Is that before or after the OpenAI and Anthropic pay off all the people and companies who's copyrights were violated when they used their works for free to train their models?

At least DeepSeek freely gives back the benefits.

reply
epolanski 2 hours ago
Care to expand on why? Or did you forgot the /s at the end?
reply
dudisubekti 2 hours ago
I feel like '/s' has ruined irony on the internet. Irony is at its best if left ambiguous, lol.
reply
cromka 2 hours ago
Nah, they're serious actually!
reply
Weryj 2 hours ago
Wait, did that need a /s?
reply
ReptileMan 2 hours ago
If everything goes to plan everyone involved with big US models will be trillionaire and everyone else will poor and unemployed. If there are open and cheap to run Chinese models (and please god silicon) the financial house of cards that we have build will fall, people involved with big US models will be poor and unemployed, and everyone else will be slightly less poor and unemployed than in the first scenario.

What is good for Dario is good for America.

reply
andrewstuart 2 hours ago
Why do you think it’s free?

Any ideas, theories where they get their payoff?

reply
cromka 2 hours ago
Yes, subscription options they sell on deepseek.com
reply