Conversation

@sarathc-cerebras sarathc-cerebras commented Dec 7, 2025

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Rocketknight1 (Member)

Hi @sarathc-cerebras, thank you for the PR! The main thing missing is a conversion to modular format. You can look at the modular files for other models to see how it works, but it reduces the size of the PR a lot by importing duplicated code from other models.
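
A minimal sketch of what the modular file can look like, assuming Jais2 can reuse Llama's components (the parent classes and import paths here are illustrative, not taken from the actual PR):

# modular_jais2.py -- sketch only; the modular converter script expands this
# into the full modeling_jais2.py. Only classes that actually differ need a body.
from transformers.models.llama.modeling_llama import (
    LlamaDecoderLayer,
    LlamaForCausalLM,
    LlamaModel,
)


class Jais2DecoderLayer(LlamaDecoderLayer):
    pass


class Jais2Model(LlamaModel):
    pass


class Jais2ForCausalLM(LlamaForCausalLM):
    pass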

@sarathc-cerebras (Author)

@Rocketknight1 thanks for bringing this up; I have updated it to use the modular format.

@sarathc-cerebras force-pushed the add-jais2-model branch 4 times, most recently from 2ae7204 to 672e38a on December 9, 2025 at 14:13
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Rocketknight1 (Member) left a comment

Yes, this looks good! I made a few comments but they're small.

Comment on lines 168 to 179
class Jais2MLP(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.hidden_size = config.hidden_size
        self.intermediate_size = config.intermediate_size
        self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=config.mlp_bias)
        self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=config.mlp_bias)
        self.act_fn = ACT2FN[config.hidden_act]

    def forward(self, x):
        return self.down_proj(self.act_fn(self.up_proj(x)))

Member

I think you can import this class too! We have a few other models that don't use gated linear units in the MLP. Maybe nemotron?

Author

Thanks for suggesting; it's imported from Nemotron now 👍
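
For reference, a sketch of what that reuse can look like in the modular file, assuming NemotronMLP has the same non-gated up_proj → act_fn → down_proj structure:

# Sketch: reuse Nemotron's non-gated MLP instead of redefining it.
from transformers.models.nemotron.modeling_nemotron import NemotronMLP


class Jais2MLP(NemotronMLP):
    pass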


## Overview

Jais2 is a large language model developed by MBZUAI, Inception and Cerebras Systems. It is based on the transformer architecture with several modifications including:

Member

We should probably mention that it's Arabic-focused here, right? That's one of the main selling points for jais / jais2!

Author

I have updated it as requested.

model = Jais2ForCausalLM.from_pretrained(
    self.checkpoint,
    device_map="auto",
    torch_dtype=torch.float16,
)

Member

Some tests are float16 and some are bfloat16 - is this intended? If it's copied from another model then it's probably fine 😅

Author

Changed all to float16.

@vasqu (Contributor) left a comment

Left some comments; I think we can still simplify a bit and update a few things to be in line with our current standards. Overall, it's looking really good already, though!

Contributor

We need something in tokenization_auto as well.
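
For reference, a sketch of the kind of entry that goes into the TOKENIZER_MAPPING_NAMES table in models/auto/tokenization_auto.py, assuming Jais2 reuses the Llama tokenizer classes (the actual classes depend on what the Jais2 checkpoints ship with):

# Sketch of a TOKENIZER_MAPPING_NAMES entry; tokenizer classes are assumptions.
(
    "jais2",
    (
        "LlamaTokenizer" if is_sentencepiece_available() else None,
        "LlamaTokenizerFast" if is_tokenizers_available() else None,
    ),
),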

Contributor

This is the old init structure we had; can you take a look at Llama, for example?

# Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import _LazyModule
from ...utils.import_utils import define_import_structure

if TYPE_CHECKING:
    from .configuration_llama import *
    from .modeling_llama import *
    from .tokenization_llama import *
else:
    import sys

    _file = globals()["__file__"]
    sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)

Much simpler

Comment on lines +218 to +227
class Jais2ForSequenceClassification(LlamaForSequenceClassification):
    pass


class Jais2ForQuestionAnswering(LlamaForQuestionAnswering):
    pass


class Jais2ForTokenClassification(LlamaForTokenClassification):
    pass

Contributor

I'd like to avoid additional classes unless we have a reason to include them

)


JAIS2_8B_CHECKPOINT = "inceptionai/Jais-2-8B-Chat"

Contributor

I would avoid having a constant here; let's just inline the string directly.
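
i.e., a sketch of the inlined form:

# Sketch: use the checkpoint string inline instead of a module-level constant.
model = Jais2ForCausalLM.from_pretrained(
    "inceptionai/Jais-2-8B-Chat",
    device_map="auto",
    torch_dtype=torch.float16,
)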

Comment on lines +83 to +92
def setUp(self):
    self.tokenizer = AutoTokenizer.from_pretrained(self.checkpoint)
    if self.tokenizer.chat_template is None:
        self.tokenizer.chat_template = (
            "{% for message in messages %}{{ message['role'] + ': ' + message['content'] + '\n' }}{% endfor %}"
        )

def tearDown(self):
    backend_empty_cache(torch_device)
    gc.collect()

Contributor

Should be enough instead:

def setUp(self):
    cleanup(torch_device, gc_collect=True)

def tearDown(self):
    # TODO (joao): automatic compilation, i.e. compilation when `cache_implementation="static"` is used, leaves
    # some memory allocated in the cache, which means some object is not being released properly. This causes some
    # unoptimal memory usage, e.g. after certain tests a 7B model in FP16 no longer fits in a 24GB GPU.
    # Investigate the root cause.
    cleanup(torch_device, gc_collect=True)

We can load the tokenizer there as well.
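
Combined, a sketch of what setUp could look like with the tokenizer loading folded in (keeping the chat-template fallback from the original):

def setUp(self):
    # Clear accelerator caches before each test, then load the tokenizer.
    cleanup(torch_device, gc_collect=True)
    self.tokenizer = AutoTokenizer.from_pretrained(self.checkpoint)
    if self.tokenizer.chat_template is None:
        self.tokenizer.chat_template = (
            "{% for message in messages %}{{ message['role'] + ': ' + message['content'] + '\n' }}{% endfor %}"
        )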


@slow
@require_torch_accelerator
def test_model_logits(self):

Contributor

Let's reduce the number of tests to 2-3, e.g. one fp16 logits test and a generation test. No need to go overboard here.
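
A sketch of what a trimmed-down fp16 generation test could look like (the expected string is a placeholder, to be filled in from a run against the reference checkpoint):

@slow
@require_torch_accelerator
def test_model_generation(self):
    # Sketch only: EXPECTED_TEXT is a placeholder, not a real reference output.
    model = Jais2ForCausalLM.from_pretrained(
        self.checkpoint, device_map="auto", torch_dtype=torch.float16
    )
    inputs = self.tokenizer("مرحبا", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    text = self.tokenizer.decode(output[0], skip_special_tokens=True)
    self.assertEqual(text, EXPECTED_TEXT)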

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, jais2

@github-actions (Contributor)

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=42684&sha=5090c1
