IBM Granite 4.0 Nano

You can use it on your everyday laptop 🤩

by Alain Airom (Ayrom), Oct 29, 2025

IBM has introduced the Granite 4.0 Nano model family, a strong commitment to building powerful, useful large language models (LLMs) optimized specifically for edge and on-device applications. The models range from approximately 350 million to 1.5 billion parameters and deliver significantly increased capabilities compared to similarly sized models from competitors, as validated on standard benchmarks covering General Knowledge, Math, Code, and Safety. The release comprises four main variants: models based on a new, efficient hybrid-SSM architecture (such as Granite 4.0 H 1B and H 350M) and traditional transformer versions that ensure compatibility with diverse runtimes (such as llama.cpp). Crucially, the Granite 4.0 Nano models are built on the same training pipelines and more than 15 trillion tokens of data used for the larger Granite 4.0 family. All Nano models are released under the Apache 2.0 license and come with IBM's ISO 42001 certification for responsible model development and governance, so users can deploy them with confidence in global standards compliance.

❓ How to try Granite 4.0 Nano?

The models are accessible through two primary channels: the streamlined deployment platform Ollama and the model repository on Hugging Face. Direct access links for both resources are consolidated in the dedicated "Links" section. To illustrate the simplicity and speed of integration, I have adapted and enhanced the foundational code samples, presented in the following section.

Prepare your environment:

```bash
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
```

Install the required packages 📦:

```
# requirements.txt
HuggingFace
torch
transformers
accelerate
torchvision
torchaudio
```

```bash
pip install -r requirements.txt
```

Once the environment is ready, just copy and run these two simple apps! The first script builds a chat prompt with the tokenizer's chat template to enable tool-calling behavior for a user question about the weather in Boston; after tokenizing the prepared input and moving it to the selected device, the model generates an output sequence. The second script runs a complete inference pipeline with the pre-trained IBM Granite 4.0 350M model: it loads the model and tokenizer, prepares a user prompt asking for the location of an IBM Research lab, and generates a response.
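Before running the two full scripts, it can help to confirm that the environment and the model download work. The snippet below is not part of the original samples; it is a minimal smoke test using the Transformers pipeline API with the same model ID used in the scripts, and the prompt is an arbitrary choice of mine.

```python
from transformers import pipeline

# Minimal smoke test: download the 350M model and generate a short completion.
# Runs on CPU by default; device_map="auto" (with accelerate installed) can be
# passed to pipeline() to place the model on an available GPU instead.
generator = pipeline("text-generation", model="ibm-granite/granite-4.0-350M")

result = generator("IBM Granite 4.0 Nano is", max_new_tokens=30)
print(result[0]["generated_text"])
```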
First script (tool calling):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import json
import os  # Import os for file system operations

# --- Device Detection and Selection ---
# Automatically determine the best device available (CUDA > MPS > CPU)
if torch.cuda.is_available():
    device = "cuda"
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    # MPS (Metal Performance Shaders) is the accelerator for Apple Silicon (M1/M2/M3)
    device = "mps"
else:
    device = "cpu"

print(f"Selected device: {device}")
# --- End Device Detection ---

model_path = "ibm-granite/granite-4.0-350M"
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Pass the automatically determined device to device_map
# The model will load onto the CPU (or MPS/CUDA if available)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a specified city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "Name of the city"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# change input text as desired
chat = [
    {"role": "user", "content": "What's the weather like in Boston right now?"},
]
chat = tokenizer.apply_chat_template(
    chat,
    tokenize=False,
    tools=tools,
    add_generation_prompt=True,
)

# tokenize the text and move to the selected device
input_tokens = tokenizer(chat, return_tensors="pt").to(device)

# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)

# decode output tokens into text
output = tokenizer.batch_decode(output)

# --- Save output to file in Markdown format ---
output_dir = "./output"
output_file = os.path.join(output_dir, "output.md")

try:
    # Create the output directory if it doesn't exist
    # (exist_ok=True prevents an error if it already exists).
    os.makedirs(output_dir, exist_ok=True)

    # Write the output to the Markdown file.
    # batch_decode returns a list; we take the first item (the generated text).
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(output[0])

    print(f"\nModel output saved successfully to {output_file}")

    # Optionally print the content to the console for immediate review
    print("\n--- Generated Content ---\n")
    print(output[0])
    print("\n-------------------------")

except Exception as e:
    print(f"\nAn error occurred while trying to save the output file: {e}")
# --- End File Saving Logic ---
```
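The first script only prints and saves the raw generation, which wraps the tool call in <tool_call> tags (the exact format is visible in the sample output further down). To act on it, you would parse the JSON and dispatch to a local function. The sketch below is not from the original article: the get_current_weather stub and the tag-matching regex are my own assumptions based on the output format shown in this post.

```python
import json
import re

# Hypothetical stub standing in for a real weather API call.
def get_current_weather(city: str) -> dict:
    return {"city": city, "condition": "sunny", "temperature_c": 21}

AVAILABLE_TOOLS = {"get_current_weather": get_current_weather}

def dispatch_tool_calls(generated_text: str) -> list:
    """Extract <tool_call>...</tool_call> blocks and run the named functions."""
    results = []
    for match in re.findall(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", generated_text, re.DOTALL):
        call = json.loads(match)
        fn = AVAILABLE_TOOLS.get(call["name"])
        if fn is not None:
            results.append(fn(**call["arguments"]))
    return results

# Example using the assistant turn shown later in this post:
sample = '<tool_call>\n{"name": "get_current_weather", "arguments": {"city": "Boston"}}\n</tool_call>'
print(dispatch_tool_calls(sample))  # [{'city': 'Boston', 'condition': 'sunny', 'temperature_c': 21}]
```

In a real application the stub would call an actual weather service, and its result would be appended to the chat as a tool-response turn before generating again.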
Second script (a simple location query):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import os  # Import os for file system operations

# --- Device Detection and Selection ---
# Automatically determine the best device available (CUDA > MPS > CPU)
if torch.cuda.is_available():
    device = "cuda"
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    # MPS (Metal Performance Shaders) is the accelerator for Apple Silicon (M1/M2/M3)
    device = "mps"
else:
    device = "cpu"

print(f"Selected device: {device}")
# --- End Device Detection ---

model_path = "ibm-granite/granite-4.0-350M"
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Pass the automatically determined device to device_map
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

# change input text as desired
chat = [
    {
        "role": "user",
        "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location."
    },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# tokenize the text and move to the selected device
input_tokens = tokenizer(chat, return_tensors="pt").to(device)

# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)

# decode output tokens into text
output = tokenizer.batch_decode(output)

# --- Save output to file in Markdown format ---
output_dir = "./output"
output_file = os.path.join(output_dir, "output.md")

try:
    # Create the output directory if it doesn't exist.
    os.makedirs(output_dir, exist_ok=True)

    # Write the output to the Markdown file.
    # batch_decode returns a list; we take the first item (the generated text).
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(output[0])

    print(f"\nModel output saved successfully to {output_file}")

    # Optionally print the content to the console for immediate review
    print("\n--- Generated Content ---\n")
    print(output[0])
    print("\n-------------------------")

except Exception as e:
    print(f"\nAn error occurred while trying to save the output file: {e}")
# --- End File Saving Logic ---
```

You'll get these outputs 📄

```
<|start_of_role|>system<|end_of_role|>You are a helpful assistant with access to the following tools. You may call one or more tools to assist with the user query. You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "get_current_weather", "description": "Get the current weather for a specified city.", "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "Name of the city"}}, "required": ["city"]}}}
</tools>
For each tool call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>What's the weather like in Boston right now?<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|><tool_call>
{"name": "get_current_weather", "arguments": {"city": "Boston"}}
</tool_call><|end_of_text|>
```

and, for the second script:

```
<|start_of_role|>system<|end_of_role|>You are a helpful assistant. Please ensure responses are professional, accurate, and safe.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory located in the United States. You should only output its name and location.<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>IBM Research Laboratory: Cambridge Research Laboratory<|end_of_text|>
```
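Note that tokenizer.batch_decode returns the full sequence, including the prompt and the <|start_of_role|>/<|end_of_text|> special tokens visible above. If you only want the assistant's answer, you can decode just the newly generated tokens. This is a minimal continuation sketch of my own, reusing the model, tokenizer, and input_tokens variables from either script above; raw_output is a name I introduce here for the raw tensor returned by generate().

```python
# Continuation of either script above: keep the raw tensor returned by generate().
raw_output = model.generate(**input_tokens, max_new_tokens=100)

# Slice off the prompt tokens and skip special tokens when decoding,
# so only the assistant's newly generated text remains.
prompt_length = input_tokens["input_ids"].shape[1]
answer = tokenizer.decode(raw_output[0][prompt_length:], skip_special_tokens=True)
print(answer)
```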
Et voilà 🥇

Conclusion

The Granite 4.0 model family represents a pivotal shift toward highly efficient, enterprise-grade AI, redefining performance by focusing on accessibility rather than sheer scale. A key strength lies in its hybrid Mamba/Transformer architecture, which significantly reduces memory requirements (often by over 70%), enabling robust inference on modest, affordable hardware, including consumer-grade GPUs and edge devices. Crucially, as an open-source offering under the Apache 2.0 license, Granite 4.0 models give developers complete operational sovereignty, allowing deep customization, on-premise deployment for enhanced data privacy, and full transparency. This combination of efficiency and open governance lowers the barrier to entry, democratizing advanced AI for complex workflows like RAG and function calling, while ensuring the control and trust necessary for real-world business adoption.