BLDX-434 | Migrate to msgspec.Struct models#811
Conversation
|
@greptile review |
|
Too many files changed for review. ( |
|
@greptile re-review |
|
@claude /review |
- Remove dead code: admin/, checkpoint.py, exceptions.py, py.typed - Delete old single-file client.py (replaced by client/ package later) - Rename models/ → model/assets/ for consistency with legacy pyatlan/model/assets/ - Move infrastructure files (conversion_utils.py, serde.py, transform.py) to model/ - Add model/__init__.py for package-level re-exports - Update all import paths (pyatlan_v9.models → pyatlan_v9.model.assets) - Update all model test imports to match new paths
Migrate all legacy pyatlan model files (AtlanObject/Pydantic BaseModel) to pyatlan_v9 msgspec.Struct equivalents: Infrastructure: - core.py: AtlanObject base, AtlanTag, AtlanField helpers - structs.py: SourceTagAttachment, BadgeCondition, etc. - translators.py/retranslators.py: Tag name translation pipeline Models (28 new files): - search.py: DSL, IndexSearchRequest, Query types - typedef.py: EnumDef, StructDef, AtlanTagDef, CustomMetadataDef, etc. - lineage.py: LineageListRequest, FluentLineage, LineageResponse, etc. - audit.py, search_log.py: AuditSearchRequest, SearchLogRequest - response.py: AssetMutationResponse, AssetResponse - group.py, user.py, role.py: GroupRequest, AtlanUser, AtlanRole - credential.py, oauth_client.py, sso.py, api_tokens.py - events.py, keycloak_events.py: AtlanEvent, KeycloakEvent - query.py, task.py, workflow.py, suggestions.py - aggregation.py, atlan_image.py, contract.py, custom_metadata.py - data_mesh.py, dq_rule_conditions.py, file.py, internal.py, lineage_ref.py Assets: - purpose.py: Purpose model with tag translation support - snowflake_dynamic_table.py: SnowflakeDynamicTable model All models use msgspec conventions: kw_only=True, UNSET/UnsetType, rename='camel' where needed, and proper serialization methods.
Convert AtlanClient from a plain Python class to msgspec.Struct: - AtlanClient(msgspec.Struct, kw_only=True): base_url, api_key, proxy, verify, retry config, and httpx session management - PyatlanSyncTransport/PyatlanAsyncTransport: custom httpx transports with configurable retry logic - Delegates to legacy pyatlan sub-clients (AssetClient, GroupClient, etc.) while maintaining the same interface - Uses __post_init__ for session initialization and header configuration - Supports proxy and SSL verification configuration via constructor args or environment variables
Port non-model tests from legacy tests/unit/ to tests_v9/unit/: Test files ported: - test_client.py: 200 tests (full parity with legacy 89 — deprecated excluded) Covers: terms operations, find operations, error handling, batch, bulk request, proxy/SSL config, pagination, validation, DQ rules - test_typedef_model.py: 47 tests (EnumDef, StructDef, AtlanTagDef, etc.) - test_search_model.py: 231 tests (DSL, queries, sort, pagination) - test_atlan_tag_name.py: 6 tests (tag name resolution) - test_core.py: 12 tests (AtlanObject, AtlanTag, Announcement) - test_structs.py: 1 test (SourceTagAttachment) - test_utils.py: 17 tests (utility functions) Infrastructure: - conftest.py: Pydantic v9-compat layer that allows legacy client methods (using @validate_arguments) to accept msgspec.Struct instances by converting them to legacy Pydantic models on the fly. Also patches Pydantic JSON encoder for msgspec.Struct serialization. - constants.py: Shared test constants Key patterns: - Tests use v9 models (pyatlan_v9.model.assets) where possible - Legacy Pydantic models used for BulkRequest/Batch tests (Pydantic internals) - Client-returned objects checked by type name (legacy deserialisation) Total v9 test suite: 1540 passed, 2 skipped
Implement a framework-agnostic validate_arguments decorator in pyatlan/client/common/validate.py that replaces pydantic.v1's @validate_arguments. This decorator: - Validates function arguments against type annotations - Supports both Pydantic BaseModel and msgspec.Struct models - Handles basic types (str, int, bool, float, Enum) - Handles container types (List, Set, Dict, Tuple) - Handles Optional[X], Union[X, Y], Type[X], Callable - Handles Pydantic constrained types (constr, StrictStr, StrictBool, StrictInt) - Handles TypeVar resolution for bound types - Supports enum coercion from string values - Matches Pydantic v1 error message format and ValueError exceptions Replace all pydantic.v1 validate_arguments imports across: - 19 sync client files (pyatlan/client/*.py) - 19 async client files (pyatlan/client/aio/*.py) - 3 model files (lineage.py, search.py, ui.py)
Update all test files to work with the custom validate_arguments decorator's error format: - Replace pytest.raises(ValidationError) with pytest.raises(ValueError) since the custom decorator raises ValueError directly - Remove unused pydantic.v1 ValidationError imports - Update expected error messages in tests/unit/constants.py to match the custom decorator's output format: - Remove Pydantic-specific (type=...) suffixes - Update error counts for Union/list validation (1 vs N) - Match 'instance of X expected' for non-builtin types - Match 'str type expected', 'none is not an allowed value' etc. - Update credential, SSO, query, task, workflow, file, UI test files with corrected error message formats - Remove trailing spaces from error message constants
… v9 test failures - Moved pyatlan/client/common/validate.py → pyatlan/validate.py to break circular import chain: search.py → client.common.__init__ → asset → model.assets → atlan_fields - Updated all 41 import paths from pyatlan.client.common.validate to pyatlan.validate - Added Mock spec-class support to _is_model_instance for v9 Batch tests - Changed v9 test_client.py to catch ValueError instead of ValidationError - All tests pass: 5798 legacy + 1540 v9
…ec models - Replace isinstance checks in client/common/asset.py with _is_model_instance for dual-model compatibility (9 call sites) - Replace isinstance in client/asset.py and aio/batch.py for AtlasGlossaryTerm - Make BulkRequest.process_attributes skip msgspec models (they handle relationship categorization in their own serialization pipeline) - Use _is_model_instance in BulkRequest.process_relationship_attributes - Register msgspec JSON encoder in pyatlan/model/core.py using model's own to_json(nested=True) for proper nested API format serialization - Make Asset._convert_to_real_type_ accept v9 msgspec models via _is_model_instance - Remove all monkey-patches from tests_v9/unit/conftest.py (Patch 1-4 no longer needed — dual-model support is now in production code) All tests pass: 5798 legacy + 1540 v9
- search.py: delete 25 duplicated dataclass/ABC/Enum classes and ABC registration block; re-export from pyatlan.model.search. Keep only msgspec DSL/IndexSearchRequest/IndexSearchRequestMetadata + v9 helpers. - lineage.py: delete duplicated DirectedPair/LineageGraph; re-export from pyatlan.model.lineage. - audit.py: delete duplicated AuditActionType; re-export from pyatlan.model.audit. - pyatlan/model/search.py: TermAttributes/TextAttributes use plain str/bool/float instead of Pydantic StrictStr/StrictBool/StrictFloat. - pyatlan/validate.py: add _is_model_instance helper for cross-boundary Pydantic/msgspec isinstance checks. - pyatlan/client/common/asset.py: register msgspec.Struct in Pydantic ENCODERS_BY_TYPE for JSON serialization. - core.py: add to_dict() on BulkRequest for nested serialization. - entity.py: add semantic field to Entity base class. - asset.py: ref_by_guid/ref_by_qualified_name accept semantic param. - tests: update VALUES_BY_TYPE to plain types, use _is_model_instance for cross-boundary assertions, clean up test_client.py imports.
…, test parity - Add v9 TaskSearchRequest (msgspec.Struct) with json() method - Add v9 FluentTasks that produces v9 DSL and TaskSearchRequest - Add from_yaml()/to_yaml() to v9 DataContractSpec - Update test_task_client.py to use v9 models - Update open_lineage_test.py to use v9 FluentTasks - Update data_contract_test.py to use v9 DataContractSpec - Fix imports in atlan_fields_test, data_quality_rule_test, workflow_client - Document test_packages.py legacy Asset import (ClassVar fields) - Port v9 test files: credential, custom relationships, workflow, etc. - Client layer: validate_arguments migration, msgspec Struct support - Formatting changes All 7650 tests pass (1852 v9 + 5798 legacy)
…Q rules, lineage - pyatlan_v9/model/events.py: Full v9 migration with AtlanEvent.from_dict() for polymorphic Asset dispatch via type registry and payload discrimination - pyatlan_v9/model/packages/: Migrate AbstractPackage, crawlers, miners to msgspec - pyatlan_v9/model/open_lineage/: All OpenLineage models as msgspec.Struct - pyatlan_v9/model/assets/data_quality_rule.py: Creator/updater methods, static helpers - pyatlan_v9/model/assets/asset.py: remove_description/user_description/owners, ClassVar descriptors - pyatlan_v9/model/workflow.py: rename=camel, to_json() - pyatlan_v9/model/credential.py: rename=camel - pyatlan_v9/model/lineage.py: rename=camel, validate_arguments - pyatlan_v9/client/atlan.py: msgspec.Struct handling in _create_params, parse_query, upload_image - tests_v9/unit/: Update all v9 tests to use v9 models and client exclusively - tests_v9/unit/test_events.py: Uses v9 AtlanEvent.from_dict(), no legacy imports - tests_v9/unit/test_lineage.py, test_model.py: New, ported from legacy
|
@claude /review |
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
|
@claude /review |
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
…l separation - Add dedicated v9 sync and async client sub-clients (user, group, typedef, token, workflow, asset, admin, audit, credential, file, oauth_client, open_lineage, query, role, search_log, sso, task, contract, impersonate) - Remove Pydantic remnants from v9 client layer: eliminate AtlanObject imports, exclude_unset parameter, and .json(by_alias=True) serialization - Use creator/updater naming convention for v9 client CRUD methods; keep create_connection/create_sso_group compound names unchanged - Remove unnecessary create=creator backward-compat aliases from v9 models (query, user, open_lineage event/job/input_dataset/output_dataset) - Separate async models into pyatlan_v9/model/aio/ matching legacy structure: AsyncUserResponse, AsyncGroupResponse, AsyncOAuthClientListResponse, AsyncCustomMetadataDict/Proxy/Request - Remove mixed async code (_get_next_page_async, __aiter__) from sync UserResponse and GroupResponse - Fix entity.py to import AsyncCustomMetadataProxy from v9 instead of legacy - Update all v9 unit tests for client naming, model changes, and async separation Made-with: Cursor
Code ReviewThis PR migrates the pyatlan SDK's internal model layer from Pydantic BaseModel to Confidence Score: 2/5
Important Files Changed
Change FlowsequenceDiagram
participant User as User Code
participant Client as AtlanClient (v9)
participant SubClient as Sub-client (e.g. AssetClient)
participant Validate as validate_arguments
participant Bridge as _is_model_instance
participant Msgspec as msgspec.Struct model
participant HTTP as httpx + PyatlanTransport
participant Legacy as Legacy Pydantic layer
User->>Client: AtlanClient(base_url, api_key)
Client->>Client: __post_init__ (env fallbacks, session init)
User->>SubClient: client.assets.get_by_guid(guid)
SubClient->>Validate: @validate_arguments checks type
Validate->>Bridge: _is_model_instance(value, expected_type)
Bridge-->>Validate: match by MRO name
SubClient->>HTTP: _call_api(endpoint, request_obj)
HTTP->>Legacy: deserialize response (Pydantic)
Legacy-->>SubClient: legacy Asset object
SubClient->>Msgspec: (v9 path) msgspec.convert(raw, TargetStruct)
Msgspec-->>User: typed v9 model instance
Findings
|
| """ | ||
| dsl = DSL( | ||
| query=Bool(filter=[Term(field="entityId", value=guid)]), | ||
| sort=sort if LATEST_FIRST else [], |
There was a problem hiding this comment.
Warning — logic bug: sort fallback is dead code
The condition sort if LATEST_FIRST else [] tests LATEST_FIRST, not sort. LATEST_FIRST is defined on line 33 as a non-empty list, so it is always truthy. The empty-list branch is unreachable and the expression is equivalent to sort unconditionally.
The same pattern appears on lines 117 and 149.
If the intent is to fall back to LATEST_FIRST when the caller passes an empty list:
| sort=sort if LATEST_FIRST else [], | |
| sort=sort if sort else LATEST_FIRST, |
If the intent is always to use whatever was passed (with LATEST_FIRST as the default parameter), the expression can simply be:
| sort=sort if LATEST_FIRST else [], | |
| sort=sort, |
| # (handles dataclasses, plain classes, etc. with identical names | ||
| # across legacy and v9 modules) | ||
| value_mro_names = {cls.__name__ for cls in type(value).__mro__} | ||
| if expected_type.__name__ in value_mro_names: |
There was a problem hiding this comment.
Warning — class name collision: type check matches by name only
The generic fallback on this line accepts any object whose MRO contains a class whose __name__ equals expected_type.__name__. This is not a true isinstance check — it will return True for a user-defined class named Asset or Query when the expected type is pyatlan.model.assets.core.Asset.
The earlier msgspec-specific branch (line 62) and Pydantic branch (line 68) have the same property, but the generic fallback extends this to all classes, making it the riskiest path.
For the SDK's internal usage this is a pragmatic bridge pattern, but it breaks the behavioral contract of isinstance for any caller using this function as a validation gate.
| - name: QA checks (ruff-format, ruff-lint, mypy) | ||
| run: uv run ./qa-checks | ||
| # - name: QA checks (ruff-format, ruff-lint, mypy) | ||
| # run: uv run ./qa-checks |
There was a problem hiding this comment.
Warning — QA checks disabled
Ruff formatting, ruff lint, and mypy are commented out. The [temp/ci] Skip QA checks commit confirms this is intentional during development. These must be re-enabled before the PR leaves draft status — without them, type errors and style violations across the 990-file migration accumulate silently.
| # - name: QA checks (ruff-format, ruff-lint, mypy) | ||
| # run: uv run ./qa-checks | ||
|
|
||
| - name: Run unit tests |
There was a problem hiding this comment.
Critical — v9 test suite never executed in CI
This job runs only pytest tests/unit (the legacy Pydantic test suite). The entire tests_v9/unit suite — 1,852 tests covering the new msgspec client, models, and transport — is not invoked anywhere in this workflow. Any regression in the new v9 code will go undetected on PRs. A separate step (or an additional argument to this run) for tests_v9/unit is required before this PR can leave draft status.
| # ============================================================================= | ||
|
|
||
|
|
||
| _OPTIONS_PARENT_MAP: dict[int, Any] = {} |
There was a problem hiding this comment.
Warning — module-level mutable state: potential memory leak
_OPTIONS_PARENT_MAP is a module-level dict keyed by id(self). Python object IDs are reused after garbage collection: an AttributeDef.Options instance can be collected, a new unrelated object can receive the same ID, and _OPTIONS_PARENT_MAP will then point that new object to a stale parent. There is no eviction or cleanup path for this dict.
In any long-running process (server, integration test loop, batch job) this dict grows unbounded and holds references that prevent GC of the parent AttributeDef objects. It is also not thread-safe: two threads constructing AttributeDef objects concurrently can race on insertion.
Consider using weakref.WeakKeyDictionary with the Options instance as the key instead.
| # Auto-generated by PythonMsgspecRenderer.pkl | ||
| # NOTE: Additional exports (CheckpointStore, compute_content_hash) added manually | ||
| """ | ||
| PyAtlan V9 - Python SDK for Atlan's Atlas API using msgspec. |
There was a problem hiding this comment.
Critical — msgspec is not declared as a runtime dependency
pyatlan_v9 is built entirely on msgspec, but msgspec does not appear in the dependencies array in pyproject.toml. Any user who installs the package with pip install pyatlan will get ModuleNotFoundError: No module named 'msgspec' the first time they import anything from pyatlan_v9.
msgspec must be added to pyproject.toml [project] dependencies pinned to the minor version in use.
✨ Description
https://linear.app/atlan-epd/issue/BLDX-434/plan-productionization-of-msgspecstruct-models
🧩 Type of change
Select all that apply:
📋 Checklist