As artificial intelligence technology continues to evolve, competition among large language models (LLMs) has become a focus of industry attention. It is widely assumed that more parameters and more computing power mean a more capable model. On June 19, 2025, however, at the Amazon Web Services Global League for Large Language Models, the Smart Vision team under Digital China won by an overwhelming margin, offering the industry a fresh perspective: parameter scale does not set the upper limit of capability. What truly determines a model's practical performance is the combination of data value density and process innovation capability.
Event Background: Amazon Web Services' top global competition for large language models
The Amazon Web Services Large Language Model National Competition grew out of an "Artificial Intelligence Car Race" launched in 2018, which has since attracted over 560,000 developers worldwide across thousands of events and competitions. In 2024, Amazon Web Services launched the Large Language Model National Competition at re:Invent 2024. Participants were tasked with customizing a domain-specific Meta Llama 3.5B base model using the tools and techniques they had learned. Each fine-tuned submission was compared against a larger 70B reference model, with answer quality evaluated by a method called "LLM-as-a-Judge": whenever the fine-tuned model's answer was judged more accurate and comprehensive than the larger model's, the team earned victory points for that question. Digital China, one of the first partners in China to obtain GenAI capability certification from Amazon Web Services, was invited to compete.
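The "LLM-as-a-Judge" scoring described above can be sketched as pairwise comparison; the judge call below is a mock stand-in (a trivial length heuristic), not the competition's actual judging model, and all names are illustrative assumptions:

```python
def judge(question, answer_a, answer_b):
    # Placeholder for a judge-LLM call that returns "A", "B", or "tie".
    # Mocked here with a trivial heuristic: prefer the longer answer.
    if len(answer_a) > len(answer_b):
        return "A"
    if len(answer_b) > len(answer_a):
        return "B"
    return "tie"

def score_round(questions, fine_tuned_answers, reference_answers):
    """Award a victory point whenever the fine-tuned model's answer wins."""
    points = 0
    for q, ft, ref in zip(questions, fine_tuned_answers, reference_answers):
        if judge(q, ft, ref) == "A":  # "A" = the fine-tuned 3.5B model
            points += 1
    return points

print(score_round(
    ["What is LoRA?"],
    ["LoRA adds low-rank adapter matrices to frozen weights."],
    ["A fine-tuning method."],
))  # prints 1: the longer fine-tuned answer wins this mock comparison
```

In the real event a stronger LLM would make the comparison; the point of the sketch is only the pairwise win-counting structure.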
Breaking through a desperate situation:
In specific scenarios, the small-parameter model outperformed the large one.
The rules of this competition were inherently challenging: a small model with only 3.5B parameters had to compete head-to-head with an ultra-large 70B-class model. The 3.5B base model also came with the following handicaps:
• Language disadvantage: all evaluation was conducted in Chinese, where the 3.5B model was markedly weak;
• Knowledge gap: the questions focused on professional knowledge of the large-model industry, precisely a core strength of the 70B model's training;
• Resource scarcity: only 20 original data samples were available, and fine-tuning time was limited to 3 hours.
In response, the Smart Vision team quickly formulated a systematic and detailed technical plan, and ultimately won the first round with a 53% win rate.
The solution:
Three powerful techniques for model fine-tuning.
To address the 3.5B model's four major weaknesses in Chinese support, logical reasoning, multi-hop tasks, and knowledge breadth, Smart Vision adopted three key strategies:
• “The Precise Surgical Knife” of Knowledge Distillation
The Smart Vision team designed a "question-answer, logical chain, evidence fragment" triad format for distilled knowledge, and ensured the quality of the knowledge injected into the 3.5B model through multiple rounds of manual and automated cross-checks. This was not simple replication of knowledge but precise extraction and implantation of key information, like a surgical operation. The team also constructed a "knowledge topology network" and supplemented it with relevant documentation, effectively expanding the small model's knowledge coverage.
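A distilled sample in the triad format might look like the sketch below; the field names, example content, and the machine-side cross-check rule are illustrative assumptions, not the team's actual schema:

```python
# One distilled training sample in an assumed
# "question - answer - reasoning chain - evidence" layout.
distilled_sample = {
    "question": "Why does LoRA reduce fine-tuning memory?",
    "answer": "It trains only small low-rank matrices instead of all weights.",
    "reasoning_chain": [
        "Full fine-tuning updates every weight matrix.",
        "LoRA freezes the base weights.",
        "Only the low-rank adapter matrices receive gradients.",
    ],
    "evidence": "Excerpt from the source document backing the answer.",
}

REQUIRED_FIELDS = ("question", "answer", "reasoning_chain", "evidence")

def passes_cross_check(sample):
    """An automated check run before manual review: every field present
    and non-empty, and the reasoning chain has at least two steps."""
    return (all(sample.get(f) for f in REQUIRED_FIELDS)
            and len(sample["reasoning_chain"]) >= 2)

print(passes_cross_check(distilled_sample))  # prints True
```

Samples failing the automated gate would be sent back for repair rather than injected into the 3.5B model.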
• The “quantum entanglement” transformation of the thinking chain
The fine-tuning window lasted only three hours. Whether chains of thought could be injected into the 3.5B model in that time tested the team's overall strategy, technical plan, and execution. Facing the model's inherent shortcomings, the Smart Vision team adopted a light chain-of-thought approach on selected samples: they embedded the steps of decomposing the problem, looking up concepts, verifying the logic, and generating a conclusion into those samples, giving the 3.5B model reasoning capabilities far beyond its parameter size within three hours.
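The decompose / look-up / verify / conclude augmentation could be applied to a sample roughly as follows; the template, step labels, and prompt/completion field names are assumptions for illustration only:

```python
# Rewrite a selected sample's target answer to walk through explicit
# reasoning steps, so the small model learns the pattern, not just answers.
def add_light_cot(question, sub_questions, facts, conclusion):
    steps = ["Decompose: " + "; ".join(sub_questions)]
    steps += ["Look up: " + f for f in facts]
    steps.append("Verify: the facts above jointly answer each sub-question.")
    steps.append("Conclude: " + conclusion)
    return {"prompt": question, "completion": "\n".join(steps)}

sample = add_light_cot(
    "Is a 3B model cheaper to serve than a 70B model?",
    ["How does parameter count affect memory?", "And latency?"],
    ["Memory scales roughly linearly with parameter count.",
     "Fewer parameters mean fewer FLOPs per token."],
    "Yes, a 3B model needs far less memory and responds faster.",
)
print(len(sample["completion"].splitlines()))  # prints 5: decompose, 2 facts, verify, conclude
```

Because only a subset of samples is rewritten this way, the augmentation stays cheap enough to fit a three-hour window.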
• The “Battle Commander” system of dynamic prompt words
During the on-site evaluation that afternoon, the judges and audience witnessed a high-level real-time contest. For each of 6 questions, every team had 60 seconds to understand the question and design a prompt. Drawing on its deep experience deploying large models, the Smart Vision team crafted targeted prompts for all 6 questions. Even under the strict constraint of a 200-word prompt window for the 3.5B model, it gave excellent answers to every question, earning high scores from both the on-site judges and the AI judge. Audience members and judges interviewed at random by the host praised the Digital China team's solutions. In the end, the team took an absolute victory with a score of 179.
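Packing a targeted prompt into a ~200-word window can be sketched as a simple budgeted builder; the role text, hint-trimming rule, and budget handling below are assumptions, not the team's actual prompts:

```python
# Build a prompt from a role line, prioritized domain hints, and the
# question, trimming the least-important hints until the budget holds.
def build_prompt(question, hints, word_budget=200):
    role = "You are a concise expert on large-model engineering."
    tail = ["Question: " + question, "Answer briefly."]
    hints = list(hints)  # ordered most- to least-important

    def word_count(parts):
        return sum(len(p.split()) for p in parts)

    while hints and word_count([role] + hints + tail) > word_budget:
        hints.pop()  # drop the least-important hint first
    return "\n".join([role] + hints + tail)

p = build_prompt("What is RAG?", ["Hint: retrieval augments generation."])
print(len(p.split()) <= 200)  # prints True
```

With 60 seconds per question, a deterministic packer like this leaves the human effort where it matters: choosing and ordering the hints.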
Technical upgrading:
Insights carried from the competition arena into industry
In the past, we were accustomed to the mindset that "the larger the parameters, the better the model". In real business scenarios, however, while a 70-billion-parameter LLM is powerful, it often performs a great deal of redundant computation. By contrast, a small model that has undergone knowledge purification, architectural refinement, and continuous evolution holds the advantage in deployment cost, response speed, and controllability.
This victory also demonstrated Smart Vision's outstanding model optimization capabilities. While the industry obsesses over trillion-parameter contests, Digital China's industry insight points straight at the essence: the core contradiction in enterprise-level AI adoption lies in precisely matching technical capability to scenario pain points, not in a contest of computing power. Its innovative architecture builds a dual cognitive engine: the general large model expands cognitive breadth, the fine-tuned small model penetrates scenario depth, and dynamic routing achieves intelligent collaboration across compute.
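The dynamic-routing idea, broad large model plus deep small model, can be sketched with a minimal router; the keyword heuristic and model names are assumptions (a production router might use a trained classifier instead):

```python
# Route in-domain queries to the cheap fine-tuned small model and
# everything else to the general large model.
DOMAIN_KEYWORDS = ("fine-tuning", "lora", "distillation", "prompt engineering")

def route(query):
    q = query.lower()
    if any(k in q for k in DOMAIN_KEYWORDS):
        return "small-3b"   # fine-tuned, in-domain: cheap and deep
    return "large-70b"      # general model: broad coverage

print(route("How do I set up LoRA fine-tuning?"))  # prints small-3b
print(route("Summarize the history of Rome."))     # prints large-70b
```

The economic payoff comes from the routing split itself: most in-domain traffic never touches the expensive 70B path.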
This three-in-one paradigm of "broad foundation, deep breakthrough, and intelligent resource allocation" helps AI truly land in the enterprise.
Process Wisdom,
Driving the New Era of AI
In this contest where the small defeated the big, Smart Vision demonstrated not only technical prowess but also a shift in mindset. The future of AI will lie not in who possesses the most computing power, but in who can create the greatest value with the least resources.
This was not an accidental victory, but a profound reflection on the development path of AI. As AI enters the 2.0 era, the real competition will no longer be limited to model size, but will shift to how efficiently and accurately practical problems are solved.