Mistral 2 and Mistral NeMo: A Comprehensive Guide to the Latest LLM Coming From Paris

Founded by alumni of Google DeepMind and Meta, Paris-based startup Mistral AI has consistently made waves in the AI community since 2023.

Mistral AI first caught the world’s attention with its debut model, Mistral 7B, released in 2023. This 7-billion parameter model quickly gained traction for its impressive performance, surpassing larger models like Llama 2 13B in various benchmarks and even rivaling Llama 1 34B in many metrics. What set Mistral 7B apart was not just its performance, but also its accessibility – the model could be easily downloaded from GitHub or even via a 13.4-gigabyte torrent, making it readily available for researchers and developers worldwide.

The company’s unconventional approach to releases, often foregoing traditional papers, blogs, or press releases, has proven remarkably effective in capturing the AI community’s attention. This strategy, coupled with their commitment to open-source principles, has positioned Mistral AI as a formidable player in the AI landscape.

Mistral AI’s rapid ascent in the industry is further evidenced by their recent funding success. The company achieved a staggering $2 billion valuation following a funding round led by Andreessen Horowitz. This came on the heels of a historic $118 million seed round – the largest in European history – showcasing the immense faith investors have in Mistral AI’s vision and capabilities.

Beyond their technological advancements, Mistral AI has also been actively involved in shaping AI policy, particularly in discussions around the EU AI Act, where they’ve advocated for reduced regulation in open-source AI.

Now, in 2024, Mistral AI has once again raised the bar with two groundbreaking models: Mistral Large 2 (also known as Mistral-Large-Instruct-2407) and Mistral NeMo. In this comprehensive guide, we’ll dive deep into the features, performance, and potential applications of these impressive AI models.

Mistral Large 2: The New Flagship Model

Key specifications of Mistral Large 2 include:

  • 123 billion parameters
  • 128k context window
  • Support for dozens of languages
  • Proficiency in 80+ coding languages
  • Advanced function calling capabilities

The model is designed to push the boundaries of cost efficiency, speed, and performance, making it an attractive option for both researchers and enterprises looking to leverage cutting-edge AI.
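
For a quick feel for the model, here is a minimal sketch of calling the hosted version through Mistral's Python client (mistralai); it assumes an API key in the MISTRAL_API_KEY environment variable and the v1 client's chat.complete interface:

import os
from mistralai import Mistral

# Assumes MISTRAL_API_KEY is set in your environment
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# "mistral-large-2407" is the API name Mistral lists for Mistral Large 2
response = client.chat.complete(
    model="mistral-large-2407",
    messages=[
        {"role": "user", "content": "Summarize the EU AI Act in three sentences."},
    ],
)
print(response.choices[0].message.content)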

Mistral NeMo: The New Smaller Model

While Mistral Large 2 represents the best of Mistral AI’s large-scale models, Mistral NeMo, released in July 2024, takes a different approach. Developed in collaboration with NVIDIA, Mistral NeMo is a more compact 12-billion-parameter model that still offers impressive capabilities:

  • 12 billion parameters
  • 128k context window
  • State-of-the-art performance in its size category
  • Apache 2.0 license for open use
  • Quantization-aware training for efficient inference

Mistral NeMo is positioned as a drop-in replacement for systems currently using Mistral 7B, offering enhanced performance while maintaining ease of use and compatibility.
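
To make the “drop-in replacement” claim concrete, here is a minimal sketch of the swap in a Hugging Face transformers pipeline; the only change from a Mistral 7B setup is the model ID (the checkpoint names below are the ones published on Hugging Face):

from transformers import pipeline

# Previously: model="mistralai/Mistral-7B-Instruct-v0.3"
chat = pipeline(
    "text-generation",
    model="mistralai/Mistral-Nemo-Instruct-2407",
    device_map="auto",
)

messages = [{"role": "user", "content": "Give me one fun fact about Paris."}]
result = chat(messages, max_new_tokens=100)

# With chat-style input, recent transformers versions return the full
# conversation; the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])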

Key Features and Capabilities

Both Mistral Large 2 and Mistral NeMo share several key features that set them apart in the AI landscape:

  1. Large Context Windows: With 128k token context lengths, both models can process and understand much longer pieces of text, enabling more coherent and contextually relevant outputs.
  2. Multilingual Support: The models excel in a wide range of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Arabic, and Hindi.
  3. Advanced Coding Capabilities: Both models demonstrate exceptional proficiency in code generation across numerous programming languages.
  4. Instruction Following: Significant improvements have been made in the models’ ability to follow precise instructions and handle multi-turn conversations.
  5. Function Calling: Native support for function calling allows these models to interact dynamically with external tools and services.
  6. Reasoning and Problem-Solving: Enhanced capabilities in mathematical reasoning and complex problem-solving tasks.

Let’s delve deeper into some of these features and examine how they perform in practice.

Performance Benchmarks

To understand the true capabilities of Mistral Large 2 and Mistral NeMo, it’s essential to look at their performance across various benchmarks. Let’s examine some key metrics:

Mistral Large 2 Benchmarks

Mistral AI’s published benchmarks compare the proficiency of various LLMs in different programming languages. Models like Mistral Large 2 (24.07), Llama 3.1 405B, and GPT-4o are evaluated on their ability to understand and generate code in languages such as Python, C++, Bash, Java, TypeScript, PHP, and C#.

Llama 3.1 405B shows strong performance across multiple languages, with particularly high scores in Python and Java. This proficiency is critical for applications involving code generation, debugging, and software development, making these models valuable tools for developers.

Source: https://mistral.ai/news/mistral-large-2407/

Mistral NeMo Benchmarks

Mistral AI has published fewer head-to-head numbers for Mistral NeMo, but it is described as offering state-of-the-art performance in its size category, outperforming other recent open-source pre-trained models such as Gemma 2 9B and Llama 3 8B across a range of tasks.

Multilingual and Coding Prowess

One of the standout features of both Mistral Large 2 and Mistral NeMo is their exceptional multilingual capabilities. This is particularly important in our increasingly globalized world, where AI models need to understand and generate content in multiple languages.

Mistral Large 2, for instance, supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean. This multilingual proficiency extends beyond mere translation, encompassing nuanced understanding and generation in these languages.

To illustrate this, let’s look at an example of Mistral Large 2 generating text in multiple languages:

[INST] Generate a short greeting in English, French, German, Spanish, and Japanese. [/INST]

Here are short greetings in the requested languages:

English: Hello! How are you today?
French: Bonjour ! Comment allez-vous aujourd’hui ?
German: Hallo! Wie geht es Ihnen heute?
Spanish: ¡Hola! ¿Cómo estás hoy?
Japanese: こんにちは!今日はお元気ですか?

[INST] Now, write a simple “Hello, World!” program in Python, Java, and JavaScript. [/INST]
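
A representative completion (shown for illustration; actual output will vary) looks like this:

Here are simple “Hello, World!” programs in the requested languages:

Python:
print("Hello, World!")

Java:
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World!");
    }
}

JavaScript:
console.log("Hello, World!");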

These examples showcase both the multilingual text generation and coding capabilities of the model.

The Agentic Paradigm and Function Calling

Both Mistral Large 2 and Mistral NeMo embrace an agentic-centric design, which represents a paradigm shift in how we interact with AI models. This approach focuses on building models capable of interacting with their environment, making decisions, and taking actions to achieve specific goals.

A key feature enabling this paradigm is the native support for function calling. This allows the models to dynamically interact with external tools and services, effectively expanding their capabilities beyond simple text generation.

Let’s look at an example of how function calling might work with Mistral Large 2:

 
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
# Initialize tokenizer and model
mistral_models_path = "path/to/mistral/models"  # Ensure this path is correct
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
model = Transformer.from_folder(mistral_models_path)
# Define a function for getting weather information
weather_function = Function(
    name="get_current_weather",
    description="Get the current weather",
    parameters={
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA",
            },
            "format": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "The temperature unit to use. Infer this from the user's location.",
            },
        },
        "required": ["location", "format"],
    },
)
# Create a chat completion request with the function
completion_request = ChatCompletionRequest(
    tools=[Tool(function=weather_function)],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
    ],
)
# Encode the request
tokens = tokenizer.encode_chat_completion(completion_request).tokens
# Generate a response
out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.7, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.decode(out_tokens[0])
print(result)

In this example, we define a function for getting weather information and include it in our chat completion request. Rather than answering directly, the model can respond with a structured call to this function; our application then executes the call and feeds the real-time weather data back to the model, demonstrating how it can interact with external systems to provide more accurate and up-to-date information.
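
Here is a hedged sketch of that execution step; the [TOOL_CALLS] marker-plus-JSON parsing and the get_current_weather stub below are illustrative assumptions rather than a fixed API, and the exact output format can vary between model versions:

import json

def get_current_weather(location: str, format: str) -> str:
    # Illustrative stub: a real application would query a weather API here.
    return f"It is 22 degrees {format} and sunny in {location}."

# `result` is the decoded model output from the previous snippet. Instruct
# models with tool support typically emit a [TOOL_CALLS] marker followed by
# a JSON list of calls (format assumed here for illustration).
if "[TOOL_CALLS]" in result:
    calls = json.loads(result.split("[TOOL_CALLS]")[-1].strip())
    for call in calls:
        if call["name"] == "get_current_weather":
            args = call["arguments"]
            if isinstance(args, str):  # arguments may arrive as a JSON string
                args = json.loads(args)
            print(get_current_weather(**args))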

Tekken: A More Efficient Tokenizer

Mistral NeMo introduces a new tokenizer called Tekken, which is based on Tiktoken and trained on over 100 languages. This new tokenizer offers significant improvements in text compression efficiency compared to previous tokenizers like SentencePiece.

Key features of Tekken include:

  • 30% more efficient compression for source code, Chinese, Italian, French, German, Spanish, and Russian
  • 2x more efficient compression for Korean
  • 3x more efficient compression for Arabic
  • Outperforms the Llama 3 tokenizer in compressing text for approximately 85% of all languages

This improved tokenization efficiency translates to better model performance, especially when dealing with multilingual text and source code. It allows the model to process more information within the same context window, leading to more coherent and contextually relevant outputs.
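
One way to see this in practice is to compare token counts between a Tekken-based checkpoint and an older SentencePiece-based one. The sketch below uses the Hugging Face tokenizers for the two published checkpoints; the exact savings will vary by text:

from transformers import AutoTokenizer

# Tekken-based tokenizer (Mistral NeMo) vs. the SentencePiece-based
# tokenizer used by Mistral 7B.
tekken = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
spm = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

samples = {
    "English": "The quick brown fox jumps over the lazy dog.",
    "Korean": "빠른 갈색 여우가 게으른 개를 뛰어넘습니다.",
    "Python": "def greet(name):\n    return f'Hello, {name}!'",
}

for label, text in samples.items():
    print(f"{label}: Tekken={len(tekken(text)['input_ids'])} tokens, "
          f"SentencePiece={len(spm(text)['input_ids'])} tokens")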

Licensing and Availability

Mistral Large 2 and Mistral NeMo have different licensing models, reflecting their intended use cases:

Mistral Large 2

  • Released under the Mistral Research License
  • Allows usage and modification for research and non-commercial purposes
  • Commercial usage requires a Mistral Commercial License

Mistral NeMo

  • Released under the Apache 2.0 license
  • Allows for open use, including commercial applications

Both models are available through various platforms:

  • Hugging Face: Weights for both the base and instruct versions of each model are hosted on the Hugging Face Hub
  • Mistral AI: Available as mistral-large-2407 (Mistral Large 2) and open-mistral-nemo-2407 (Mistral NeMo)
  • Cloud Service Providers: Available on Google Cloud Platform’s Vertex AI, Azure AI Studio, Amazon Bedrock, and IBM watsonx.ai

For developers looking to use these models, here’s a quick example of how to load and use Mistral Large 2 with Hugging Face transformers:

 
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-Large-Instruct-2407"

# Load the tokenizer and model. At 123 billion parameters the model will not
# fit on a single consumer GPU; device_map="auto" (via accelerate) shards it
# across the available devices, and bfloat16 halves the memory footprint.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Prepare input in chat format
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Explain the concept of neural networks in simple terms."},
]

# Apply the model's chat template and move the tokens to the model's device
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

# Generate a response
output_ids = model.generate(input_ids, max_new_tokens=500, do_sample=True)

# Decode and print the response
response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(response)

This code demonstrates how to load the model, prepare input in a chat format, generate a response, and decode the output.

Limitations and Ethical Considerations

While Mistral Large 2 and Mistral NeMo represent significant advancements in AI technology, it’s crucial to acknowledge their limitations and the ethical considerations surrounding their use:

  1. Potential for Biases: Like all AI models trained on large datasets, these models may inherit and amplify biases present in their training data. Users should be aware of this and implement appropriate safeguards.
  2. Lack of True Understanding: Despite their impressive capabilities, these models do not possess true understanding or consciousness. They generate responses based on patterns in their training data, which can sometimes lead to plausible-sounding but incorrect information.
  3. Privacy Concerns: When using these models, especially in applications handling sensitive information, it’s crucial to consider data privacy and security implications.

Conclusion

Working with advanced models like Mistral Large 2 and Mistral NeMo presents a powerful opportunity to leverage cutting-edge AI for a variety of applications, from dynamic function calling to efficient multilingual processing. Here are some practical tips and key insights to keep in mind:

  1. Understand Your Use Case: Clearly define the specific tasks and goals you want your model to achieve. This understanding will guide your choice of model and fine-tuning approach, whether it’s Mistral’s robust function-calling capabilities or its efficient multilingual text processing.
  2. Optimize for Efficiency: Utilize the Tekken tokenizer to significantly improve text compression efficiency, especially if your application involves handling large volumes of text or multiple languages. This will enhance model performance and reduce computational costs.
  3. Leverage Function Calling: Embrace the agentic paradigm by incorporating function calls in your model interactions. This allows your AI to dynamically interact with external tools and services, providing more accurate and actionable outputs. For instance, integrating weather APIs or other external data sources can significantly enhance the relevance and utility of your model’s responses.
  4. Choose the Right Platform: Ensure you deploy your models on platforms that support their capabilities, such as Google Cloud Platform’s Vertex AI, Azure AI Studio, Amazon Bedrock, and IBM watsonx.ai. These platforms provide the necessary infrastructure and tools to maximize the performance and scalability of your AI models.

By following these tips and utilizing the provided code examples, you can effectively harness the power of Mistral Large 2 and Mistral NeMo for your specific needs.
