Dots.OCR 1.5: Recognize Any Human Scripts and Symbols

The next version of dots.ocr is here. This time they have released an improved 3-billion-parameter multimodal model which is quite good when it comes to complex documents, some handwritten documents, and also nested tables. In this video, we are going to review it and test it out. When they released the previous version, we covered it here: I got it installed locally and showed you how to use it on various test cases. It was quite promising at the time. In this video I'm not going to install it locally. Rather, I'm going to show you the hosted version, and then we will test it out.

At this point, allow me one minute to describe what exactly is meant by "locally", at least for me, and what exactly is meant by "hosted" or non-local. The distinction between local and hosted really comes down to who controls the model and where it is running, not the physical location of the hardware. If you are renting a GPU server, spinning up the model yourself, and managing the weights and the inference, that's local in every meaningful sense. The server just happens to be in a data center instead of your garage or bedroom.

What makes something not local is when you are hitting someone else's endpoint: you are using a hosted chatbot like Gemini, Anthropic, or ChatGPT, or you are going through an API where you have no control over the model, you cannot see the weights, and you cannot see what exactly is happening behind the scenes. You are essentially just a client. The moment you are self-hosting, whether that's on your laptop, a home server, or, as I said, any remote rented GPU neocloud, that is what I would call a local version. Now, you can disagree. That is not a problem at all.
But again, I think this is a distinction we really have to make, and I wanted to take a moment to explain it because I'm getting a lot of comments and I want to make sure that we are all on the same page.

Now that that's out of the way, let's talk about this model. Maybe I will just quickly take you to their hosted version or the local version and we will try it out. I'm just going to upload one image from my local system and then we'll go from there. As it is a multilingual model, I have selected this Arabic newspaper, which is a bit crumpled, and there are some shadows too.
