What can AI do without Google's network?

Gemma 4's core model can infer, analyze images, and write code, supporting both PCs and smartphones.

ZNews•28/05/2026

Tai Google Gemma 4, Gemma 4 la gi, Gemma 4 vs Gemini, tai AI mien phi anh 1

Gemma 4 is a large model language (LLM) developed by Google DeepMind. It's an open-source model family that supports on-premises processing without an internet connection. Users can download, customize, and deploy it on their computers or mobile devices.

Tai Google Gemma 4, Gemma 4 la gi, Gemma 4 vs Gemini, tai AI mien phi anh 2

The Gemma 4 series is distributed in four versions: E2B, E4B, 31B, and 26B A4B. The E2B and E4B versions require a minimum of 4-6 GB (4-bit) or 10-16 GB (16-bit) of RAM, suitable for running on mobile devices and moderately configured computers. Meanwhile, the 26B A4B version requires a minimum of 18 GB of RAM, and the 31B requires at least 20 GB.

Tai Google Gemma 4, Gemma 4 la gi, Gemma 4 vs Gemini, tai AI mien phi anh 3

According to MindStudio , one of the advantages of running AI models locally is security and no additional costs. However, the performance of these models depends on the device hardware. Mobile users can install the Google AI Edge Gallery app (pictured), while computers require tools like LM Studio or Ollama. Photo: Google .

Tai Google Gemma 4, Gemma 4 la gi, Gemma 4 vs Gemini, tai AI mien phi anh 4

LM Studio on PC allows you to select and load Gemma 4 on the first run. The E4B version is approximately 6.3 GB in size and supports image inference and analysis. Gemma 4 E4B on mobile has a size of 3.6 GB when downloaded using Google AI Edge Gallery.

Tai Google Gemma 4, Gemma 4 la gi, Gemma 4 vs Gemini, tai AI mien phi anh 5

After the download is complete, the user is redirected to a chatbot-style interface. In the model selection section below, click on Gemma 4 E4B . In the next window, select Load Model and wait about a minute for the model to start.

Tai Google Gemma 4, Gemma 4 la gi, Gemma 4 vs Gemini, tai AI mien phi anh 6

Similar to other popular models, Gemma 4 E4B supports Vietnamese language interaction. Testing on a Mac mini M4 (16 GB RAM) with the command "Hello," the model took approximately 8 seconds to deduce and respond.

Tai Google Gemma 4, Gemma 4 la gi, Gemma 4 vs Gemini, tai AI mien phi anh 7

When asked "What can you do?", Gemma 4 E4B took approximately 13 seconds to understand and immediately translate the command into English, then gradually write down the answer.

Tai Google Gemma 4, Gemma 4 la gi, Gemma 4 vs Gemini, tai AI mien phi anh 8

Because it runs directly on the device, the model's response time may vary depending on the hardware. With the same question, "What can you do?", the model took approximately 45 seconds to provide a full response on an iPhone 15 Pro.

Tai Google Gemma 4, Gemma 4 la gi, Gemma 4 vs Gemini, tai AI mien phi anh 9

Another reasoning question that was answered quickly and accurately was, for example, "A train departs at 8:15 AM and arrives at 11:47 AM. How long did the journey take?". In general, simple reasoning statements like these are not too complicated for the new generation of LLMs.

Tai Google Gemma 4, Gemma 4 la gi, Gemma 4 vs Gemini, tai AI mien phi anh 10

Tested with a logic-based question like "How many 'r's are there in the word 'strawberry'?". This question had stumped many previous LLMs, but Gemma 4 E4B only took about 3 seconds to answer correctly.

Tai Google Gemma 4, Gemma 4 la gi, Gemma 4 vs Gemini, tai AI mien phi anh 11

Tai Google Gemma 4, Gemma 4 la gi, Gemma 4 vs Gemini, tai AI mien phi anh 12

With a more complex question, after a series of meticulous reasoning, Gemma 4 answered correctly. The total thinking time was 1 minute and 6 seconds, not too long for an offline model. For comparison, Gemini 3 Thinking took about 15 seconds, and GPT-5.5 took a similar amount of time.

Tai Google Gemma 4, Gemma 4 la gi, Gemma 4 vs Gemini, tai AI mien phi anh 13

The highlight of Gemma 4 E4B comes from its multimodal capabilities, supporting image input. For example, LLM can analyze images and answer questions about landmarks, prominent details, and weather and climate conditions in the image.

Tai Google Gemma 4, Gemma 4 la gi, Gemma 4 vs Gemini, tai AI mien phi anh 14

When asked to extract all the text from a magazine page image, Gemma 4 took just over 30 seconds to return the result. This timeframe is not significantly different from that of other online search engines that users are familiar with.

Tai Google Gemma 4, Gemma 4 la gi, Gemma 4 vs Gemini, tai AI mien phi anh 15

On the smartphone app, users need to select a feature from the main interface (AI Chat, Ask Image, etc.), then choose a model to use. Because it operates based on the GPU, the device may heat up during the AI inference process.

Tai Google Gemma 4, Gemma 4 la gi, Gemma 4 vs Gemini, tai AI mien phi anh 16

Users can also upload document files, in DOCX or PDF format, and then request text analysis or summarization. According to Google representatives, the new generation of models effectively controls character string generation. The model limits unnecessary thought processes, reducing the computational strain on graphics cards and computer memory.

Tai Google Gemma 4, Gemma 4 la gi, Gemma 4 vs Gemini, tai AI mien phi anh 17

Gemma 4 is also programmable. In one experiment, the model was tasked with using HTML, CSS, and JavaScript to build an operating system that runs directly in the browser. Users needed to increase the Context Length before startup to ensure the model produced a complete answer. Even so, the AI could still make mistakes if the HTML file was incomplete, and some application components might not work.

Tai Google Gemma 4, Gemma 4 la gi, Gemma 4 vs Gemini, tai AI mien phi anh 18

In general, commands requiring multiple steps or complex data can be challenging for Gemma 4. Some commands may consume a large number of processing tokens. Setting excessively large token limits can consume a lot of RAM or VRAM.

Source: https://znews.vn/ai-khong-can-internet-cua-google-lam-duoc-gi-post1652142.html