![]() |
Gemma 4 is a large model language (LLM) developed by Google DeepMind. It's an open-source model family that supports on-premises processing without an internet connection. Users can download, customize, and deploy it on their computers or mobile devices. |
![]() |
The Gemma 4 series is distributed in four versions: E2B, E4B, 31B, and 26B A4B. The E2B and E4B versions require a minimum of 4-6 GB (4-bit) or 10-16 GB (16-bit) of RAM, suitable for running on mobile devices and moderately configured computers. Meanwhile, the 26B A4B version requires a minimum of 18 GB of RAM, and the 31B requires at least 20 GB. |
![]() |
According to MindStudio , one of the advantages of running AI models locally is security and no additional costs. However, the performance of these models depends on the device hardware. Mobile users can install the Google AI Edge Gallery app (pictured), while computers require tools like LM Studio or Ollama. Photo: Google . |
![]() |
LM Studio on PC allows you to select and load Gemma 4 on the first run. The E4B version is approximately 6.3 GB in size and supports image inference and analysis. Gemma 4 E4B on mobile has a size of 3.6 GB when downloaded using Google AI Edge Gallery. |
![]() |
After the download is complete, the user is redirected to a chatbot-style interface. In the model selection section below, click on Gemma 4 E4B . In the next window, select Load Model and wait about a minute for the model to start. |
![]() |
Similar to other popular models, Gemma 4 E4B supports Vietnamese language interaction. Testing on a Mac mini M4 (16 GB RAM) with the command "Hello," the model took approximately 8 seconds to deduce and respond. |
![]() |
When asked "What can you do?", Gemma 4 E4B took approximately 13 seconds to understand and immediately translate the command into English, then gradually write down the answer. |
![]() |
Because it runs directly on the device, the model's response time may vary depending on the hardware. With the same question, "What can you do?", the model took approximately 45 seconds to provide a full response on an iPhone 15 Pro. |
![]() |
Another reasoning question that was answered quickly and accurately was, for example, "A train departs at 8:15 AM and arrives at 11:47 AM. How long did the journey take?". In general, simple reasoning statements like these are not too complicated for the new generation of LLMs. |
![]() |
Tested with a logic-based question like "How many 'r's are there in the word 'strawberry'?". This question had stumped many previous LLMs, but Gemma 4 E4B only took about 3 seconds to answer correctly. |
![]() ![]() |
With a more complex question, after a series of meticulous reasoning, Gemma 4 answered correctly. The total thinking time was 1 minute and 6 seconds, not too long for an offline model. For comparison, Gemini 3 Thinking took about 15 seconds, and GPT-5.5 took a similar amount of time. |
![]() |
The highlight of Gemma 4 E4B comes from its multimodal capabilities, supporting image input. For example, LLM can analyze images and answer questions about landmarks, prominent details, and weather and climate conditions in the image. |
![]() |
When asked to extract all the text from a magazine page image, Gemma 4 took just over 30 seconds to return the result. This timeframe is not significantly different from that of other online search engines that users are familiar with. |
![]() |
On the smartphone app, users need to select a feature from the main interface (AI Chat, Ask Image, etc.), then choose a model to use. Because it operates based on the GPU, the device may heat up during the AI inference process. |
![]() |
Users can also upload document files, in DOCX or PDF format, and then request text analysis or summarization. According to Google representatives, the new generation of models effectively controls character string generation. The model limits unnecessary thought processes, reducing the computational strain on graphics cards and computer memory. |
![]() |
Gemma 4 is also programmable. In one experiment, the model was tasked with using HTML, CSS, and JavaScript to build an operating system that runs directly in the browser. Users needed to increase the Context Length before startup to ensure the model produced a complete answer. Even so, the AI could still make mistakes if the HTML file was incomplete, and some application components might not work. |
![]() |
In general, commands requiring multiple steps or complex data can be challenging for Gemma 4. Some commands may consume a large number of processing tokens. Setting excessively large token limits can consume a lot of RAM or VRAM. |
Source: https://znews.vn/ai-khong-can-internet-cua-google-lam-duoc-gi-post1652142.html


























Comment (0)