@ediaz-caio ediaz-caio commented Feb 11, 2026

please delete

ediaz-caio and others added 26 commits November 24, 2025 18:55
…and folder organization

## Summary
Customized Microsoft Content Processing Solution Accelerator for TRS document processing with five major enhancement areas: branding, Excel exports, schema management improvements, enhanced schema score visibility, and folder-based file organization.

## Features Added

### 1. TRS Branding
- Updated header to display 'TRS Document Processing'
- Removed subtitle for cleaner appearance
- Organization-specific branding throughout UI

### 2. Excel Export Functionality
- Grid export: Export processed documents metadata to Excel
- Detailed results export: Export extraction results for single document
- Bulk export: Export results for all completed documents in schema
- Progress tracking with toast notifications
- Schema-aware file naming
- Professional formatting with optimized column widths

### 3. Schema Management & Filtering
- Fixed schema dropdown to properly maintain full object reference
- Added automatic grid filtering by selected schema
- Implemented auto-refresh when schema selection changes
- Schema-specific document views throughout application

### 4. Enhanced Schema Score Display
- Display scores with field ratios: '85% (17/20)'
- Added tooltips showing null/zero confidence field names
- Improved transparency for incomplete extractions
- Backward compatible with existing data
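
A minimal sketch of how a label such as '85% (17/20)' could be built. The function name, the 0-1 score scale, and the choice to show only the percentage when counts are missing (for older records) are assumptions for illustration, not the code from this PR.

```python
from typing import Optional


def format_schema_score(
    score: float,
    filled: Optional[int] = None,
    total: Optional[int] = None,
) -> str:
    """Render a 0-1 score as a percentage, with a field ratio when counts exist."""
    percent = round(score * 100)
    # Legacy records may carry no field counts; show the percentage only.
    if filled is None or total is None:
        return f"{percent}%"
    return f"{percent}% ({filled}/{total})"


print(format_schema_score(0.85, 17, 20))
```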

### 5. Folder Organization System
- Optional folder field for organizing files within schemas
- Folder selector in upload modal with autocomplete
- Freeform input to create new folders on-the-fly
- Folder column in grid with sorting capability
- API endpoints for folder management (GET, PUT)
- Support for folder filtering in queries

## Technical Changes

### Backend (Python)
- Added folder field to ContentProcess model (optional)
- Added confidence to API projection for grid queries
- Created GET /folders endpoint for listing unique folders
- Created PUT /folder endpoint for updating file folders
- Added get_distinct_values() helper method to Cosmos DB helper
- Updated Paging and ContentProcessorRequest models for folder support
- Made shell scripts executable (post_deployment, register_schema, upload_files)

### Frontend (TypeScript/React)
- Added xlsx dependency (^0.18.5) for Excel exports
- Created comprehensive excelExport utility with 3 export functions
- Enhanced ProcessQueueGrid with folder column and confidence display
- Updated UploadFilesModal with folder selector and autocomplete
- Improved CustomCellRender with tooltips and field ratio display
- Fixed SchemaDropdown to maintain full schema object
- Added export buttons to PanelLeft (grid, bulk exports)
- Added export button to PanelCenter (detailed results)
- Updated Redux uploadFile action to include folder parameter

### Database
- Added optional 'folder' field to ContentProcess collection
- confidence dict now included in API responses
- Fully backward compatible - no migration required
- Recommended indexes: folder, target_schema.Id + folder

## New Files
- src/ContentProcessorWeb/src/utils/excelExport.ts
- src/ContentProcessorAPI/samples/schemas/membercard.py
- src/ContentProcessorAPI/samples/schemas/membercard_schema.json
- src/ContentProcessorAPI/samples/schemas/indexcard.py
- src/ContentProcessorAPI/samples/schemas/indexcard_schema.json
- src/ContentProcessorAPI/samples/schemas/pension_verification.py
- updates.md

## Modified Files (13 total)
- Backend: 4 Python files, 3 shell scripts
- Frontend: 10 TypeScript/React files, package.json

## Testing Notes
- All changes are backward compatible
- Existing records display gracefully with fallback values
- New features are optional and don't break existing workflows
- See updates.md for comprehensive testing checklist

## Documentation
Complete implementation details available in updates.md
- Add bulk delete API endpoint (DELETE /contentprocessor/processed/bulk)
- Implement Redux thunk for bulk delete operation
- Update ProcessQueueGrid with checkbox multi-select
- Comment out Process Time column to make space for checkboxes
- Add bulk delete button and confirmation dialog
- Update httpUtility to support DELETE requests with body
- Make processTime optional in ProcessQueueGridTypes
- Change total_evaluated_fields_count to totalFields
- Change zero_confidence_fields_count to zeroConfidenceCount
- Change zero_confidence_fields to zeroConfidenceFields
- This resolves the 0/0 display issue in schema scores
- Updated frontend confidence field mapping to match backend camelCase naming
- Increased actions column width from 35px to 60px for better visibility
- Schema scores will now display actual field counts instead of 0/0
…result to result

- Changed projection field from 'extracted_result' to 'result' to match Pydantic model
- Updated item.get() to use 'result' instead of 'extracted_result'
- Fixed cleanup to remove 'result' field from response
- This resolves the (0/0) schema score display issue
…pipeline

This fixes the critical bug where schema scores displayed "(0/0)" instead of actual field counts.

Root cause: The API was recalculating confidence by counting NULL values in the result field,
completely ignoring the properly calculated confidence data stored during document processing.

Changes:
- Added "confidence" field to database projection to retrieve stored data
- Modified confidence processing to prioritize stored confidence values
- Maps stored field names (snake_case) to frontend format (camelCase)
- Maintains backward compatibility with legacy data via fallback calculation
- Preserves existing result field cleanup behavior

The stored confidence data includes:
- total_evaluated_fields_count: Actual count from AI evaluation
- zero_confidence_fields_count: Number of fields with zero confidence
- zero_confidence_fields: List of problematic field names

This ensures the frontend displays accurate confidence metrics calculated
during the evaluation pipeline rather than simplified NULL counting.
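
The mapping described in this commit can be sketched roughly as follows. The three stored keys and their camelCase counterparts come from the commit messages above; the function name, the shape of `item`, and the exact fallback (counting top-level `None` values in `result`) are assumptions.

```python
from typing import Any


def build_confidence_summary(item: dict[str, Any]) -> dict[str, Any]:
    """Prefer stored confidence data; fall back to NULL counting for legacy items."""
    stored = item.get("confidence")
    if stored:
        # Map stored snake_case keys to the camelCase names the frontend reads.
        return {
            "totalFields": stored.get("total_evaluated_fields_count", 0),
            "zeroConfidenceCount": stored.get("zero_confidence_fields_count", 0),
            "zeroConfidenceFields": list(stored.get("zero_confidence_fields", [])),
        }
    # Legacy fallback (assumed): count top-level fields whose value is None.
    result = item.get("result") or {}
    null_fields = [name for name, value in result.items() if value is None]
    return {
        "totalFields": len(result),
        "zeroConfidenceCount": len(null_fields),
        "zeroConfidenceFields": null_fields,
    }
```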
…ted field count

Root cause: Used total_evaluated_fields_count (fields with confidence scores)
instead of total schema field count from comparison_data.items, causing
negative numbers like (-29/12) instead of correct counts like (24/53).

Changes:
- Added extracted_comparison_data to projection to access schema items
- Changed totalFields to use len(comparison_items) for accurate count
- Updated fallback logic to also use schema field count when available
- Clean up extracted_comparison_data from response after processing

This ensures the denominator matches the actual schema field count.
…add folder filtering

- Fix React key bug: Changed from non-unique fileName to unique processId to prevent auto-selection of multiple files
- Added event handling to prevent checkbox conflicts when clicking table rows
- Fixed Schema Score header width to prevent text wrapping (already at 100-120px)
- Added missing file_mime_type field to grid item mapping
- Improved delete error handling with proper async/await and TypeScript interfaces
- Added ProcessedFileResponse interface for type safety, removed any types
- Implemented folder filtering feature:
  * Created FolderFilter component with multi-select Combobox
  * Added folder filter Redux state management (selectedFolders, availableFolders, isLoading)
  * Created fetchFolders async thunk to load available folders
  * Added client-side filtering logic to ProcessQueueGrid
  * Integrated FolderFilter into PanelLeft below SchemaDropdown
  * Support for "(Unassigned)" folders (null/empty values)
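
The client-side filtering itself lives in the TypeScript grid component; the logic is sketched here in Python for brevity. The "(Unassigned)" label is from the commit message, while the function names and the rule that an empty selection shows every row are assumptions.

```python
from typing import Any, Iterable

UNASSIGNED = "(Unassigned)"


def folder_label(item: dict[str, Any]) -> str:
    """Return the item's folder, or the placeholder for null/empty values."""
    folder = item.get("folder")
    return folder if folder else UNASSIGNED


def filter_by_folders(
    items: Iterable[dict[str, Any]], selected: Iterable[str]
) -> list[dict[str, Any]]:
    """Keep only the items whose folder label is among the selected folders."""
    wanted = set(selected)
    if not wanted:
        # Assumed behavior: no selection means no filtering.
        return list(items)
    return [item for item in items if folder_label(item) in wanted]
```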
Add the missing import for the CosmosMongDBHelper class, which the /folders endpoint uses to retrieve distinct folder values from MongoDB. This fixes the 'Failed to load folders' error in the UI.
Separate row click behavior from checkbox selection:
- Row click now only sets selected row for review/viewing (single selection)
- Checkbox click handles multi-select for deletion purposes
- Prevents automatic multi-selection when clicking a row to review it

This fixes the issue where clicking a file automatically selected multiple files, preventing proper file review.
Remove unused onClick and onKeyDown parameters from RenderRow destructuring since row clicks no longer trigger selection behavior.
Complete the cleanup by removing the onKeyDown prop that was referencing the removed variable.
Enable checkbox selection by detecting clicks on checkbox elements and calling toggleRow only for those clicks. Row clicks on other areas will set the row for viewing without triggering multi-select.
Registry Configuration:
- Change publicContainerImageEndpoint from cpscontainerreg to crstg6fsvw (line 61)
- Add registry authentication with managed identity (lines 712-717, 766-771, 883-888)
- Configure all container apps to authenticate to crstg6fsvw.azurecr.io

Checkbox Selection Fix:
- Consolidate click handlers in ProcessQueueGrid to prevent interference
- Add role="checkbox" attribute to TableSelectionCell for proper detection
- Individual checkboxes now work independently from row clicks
- Maintains bulk select functionality

Deployed and verified working:
- Web: ca-stg6fsvw-web--0000026 (crstg6fsvw.azurecr.io/contentprocessorweb:1764132441)
- API: ca-stg6fsvw-api--0000025 (crstg6fsvw.azurecr.io/contentprocessorapi:latest)
- Add missing logging import and logger initialization
- Wrap get_folders() endpoint in try-except block
- Add detailed error logging and 500 response on failures
- Prevents unhandled exceptions from crashing the endpoint
- Add Makefile for consistent API deployment
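
The error handling added to get_folders() follows a common pattern, sketched below without any web framework. The `fetch_distinct` callable stands in for the helper's get_distinct_values() method; its signature, the `(status, body)` return shape, and the error text are assumptions, not the PR's actual code.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def get_folders(
    fetch_distinct: Callable[[str], list[Any]],
) -> tuple[int, dict[str, Any]]:
    """Return (status_code, body) and never let a database error escape."""
    try:
        # Drop null/empty folder values and return the rest sorted.
        folders = sorted(f for f in fetch_distinct("folder") if f)
        return 200, {"folders": folders}
    except Exception:
        # Log the full traceback, then answer with a 500 instead of crashing.
        logger.exception("Failed to load folders")
        return 500, {"detail": "Failed to load folders"}
```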

The get_distinct_values() method already exists in CosmosMongDBHelper
(added in commit 3e5d744), so no helper.py changes needed.
Safely deployed and verified:
- API revision: ca-stg6fsvw-api--0000026
- Web revision: ca-stg6fsvw-web--0000026 (unchanged)
- Folders endpoint: Working with proper error handling
- Checkbox selection: Still working correctly
Documentation includes:
- RESTORATION-DOCUMENTATION.md: Complete history of fixes
- DEPLOYMENT-STRATEGY.md: Safe deployment methodology
- DEPLOYMENT-QUICKSTART.md: Step-by-step execution guide
- Makefiles for Web and API for consistent deployments
- README-DEPLOY.md for web deployment instructions
Wait for token before fetching folders to prevent 401 errors:
- Import useAuth hook in FolderFilter component
- Check for token before dispatching fetchFolders
- Add token to useEffect dependency array for proper reactivity

This ensures the folders endpoint is only called after
authentication completes, preventing race conditions.

Deployed to: ca-stg6fsvw-web--0000027
Image: crstg6fsvw.azurecr.io/contentprocessorweb:1764140177
Enhanced env.sh to replace both APP_* and REACT_APP_* placeholders.
This ensures REACT_APP_API_SCOPE and REACT_APP_WEB_SCOPE are
properly set at runtime, fixing 401 authentication errors.

- Replace both naming conventions (APP_* and REACT_APP_*)
- Improved logging for debugging
- Only process .js and .html files for performance

Deployed to: ca-stg6fsvw-web--0000028
Image: crstg6fsvw.azurecr.io/contentprocessorweb:1764173760
This commit fixes two critical issues with folder functionality:

1. Fixed fetchFolders Redux thunk response transformation bug
   - The httpUtility.get() returns {data, status} structure
   - Previous code tried to access response.folders directly (undefined)
   - Now properly extracts data from handleApiThunk before accessing folders array
   - File: src/ContentProcessorWeb/src/store/slices/leftPanelSlice.ts

2. Fixed upload modal folder selection to use authenticated API calls
   - Previous code used unauthenticated fetch() causing 401 errors
   - Now uses authenticated fetchFolders Redux thunk
   - Connects to Redux state for availableFolders
   - Users can now see existing folders in dropdown during upload
   - File: src/ContentProcessorWeb/src/Components/UploadContent/UploadFilesModal.tsx

Both changes ensure proper authentication and correct data handling
for folder operations throughout the application.
Implements a user-controlled toggle to hide/show null values in extracted results.

Features:
- Checkbox control in Extracted Results panel
- Recursively filters null, undefined, and empty string values
- Works with nested objects and arrays
- Preference saved to localStorage (persists between sessions)
- Clean UI integration next to search box
- No impact on data editing or saving

Technical changes:
- Added Checkbox component from Fluent UI
- Implemented filterNullValues recursive function
- Added hideNulls state with localStorage persistence
- Updated data transformation pipeline
- Applied filtering before passing to JsonEditor component
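
The actual filterNullValues function is TypeScript; the same recursion is sketched here in Python. `None` stands in for both null and undefined. Keeping containers that become empty after filtering is an assumption, since the commit message does not say how those are treated.

```python
from typing import Any


def _is_empty(value: Any) -> bool:
    """True for the values the toggle hides: None and the empty string."""
    return value is None or value == ""


def filter_null_values(value: Any) -> Any:
    """Recursively drop None and empty-string entries from dicts and lists."""
    if isinstance(value, dict):
        return {
            key: filter_null_values(child)
            for key, child in value.items()
            if not _is_empty(child)
        }
    if isinstance(value, list):
        return [filter_null_values(child) for child in value if not _is_empty(child)]
    # Scalars such as 0 and False are real values and are kept as-is.
    return value
```

The function returns a new structure rather than mutating its input, which matches the stated goal of leaving editing and saving unaffected.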

File: src/ContentProcessorWeb/src/Components/JSONEditor/JSONEditor.tsx

User benefit: Cleaner view of extraction results by hiding null/empty fields
while maintaining ability to show all fields when needed for debugging.
Fixes issue where Upload/Close buttons were pushed off-screen when
uploading more than 10 files.

Problem:
- File list used viewport-dependent height (calc(100vh - 358px))
- Long file lists pushed buttons below visible area
- Users had to zoom out to access buttons

Solution:
- Changed to fixed max-height of 400px
- File list now scrolls vertically after ~10-12 files
- Buttons always remain visible at bottom of modal
- Added padding-right for clean scrollbar appearance

Technical changes:
- Changed max-height from calc(100vh - 358px) to 400px
- Changed overflow: auto to overflow-y: auto, overflow-x: hidden
- Added padding-right: 8px for scrollbar spacing

File: src/ContentProcessorWeb/src/Components/UploadContent/UploadFilesModal.styles.scss

User benefit: Can upload up to 100 files with buttons always accessible,
no need to zoom out or resize window.
Created planning document for multi-type external verification system
that validates extracted fields against external APIs and databases.

Verification Types (4):
1. Notary - notary registries, license validation
2. Doctor/Medical Provider - NPI registry, state medical boards
3. Identity (DL/Passport) - DMV databases, passport services
4. Death Certificate - vital records, SSDI

Key Features:
- Schema-specific configuration (different verifications per document type)
- Async verification with parallel API calls
- Multiple verification methods per type
- Confidence-based verification (only verify high-confidence extractions)
- Comprehensive error handling and retry logic
- Detailed verification results with audit trail

Architecture:
- New VerifyHandler in pipeline between Evaluate and Save
- Verification router dispatches to appropriate verifier classes
- Results stored in Cosmos DB with verification metadata
- Queue-based async processing

Configuration:
- Environment variables for all API endpoints
- Schema-level verification rules in Cosmos DB or blob storage
- Field mappings and requirements per schema
- Confidence thresholds configurable

Use Cases:
- Legal documents (notary verification)
- Medical records (doctor/provider verification)
- KYC/identity verification (license/passport)
- Estate/probate documents (death certificate verification)

Status: Planning phase - ready for implementation when needed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Add schema-specific external verification for doctor credentials using NPI Registry API. This enables verification of physician information on TRS Retirement Allowance Verification Forms.

Core Components:
- VerifyHandler: New pipeline handler for external verification
- DoctorCredentialVerifier: NPI Registry API client with caching
- Verification data models: VerificationType, VerificationStatus, VerificationResult, VerificationMetadata

Data Model Changes:
- Extended ExtractionComparisonItem with verification fields (VerificationStatus, VerificationDetails, VerifiedAt, VerificationResponseTime)
- Updated both ContentProcessor and API layer models for consistency
- Added verification_metadata to ContentProcess model

Configuration:
- Added 6 new optional config fields to AppConfiguration
- Verification disabled by default (app_verify_enabled: false)
- Confidence threshold: 0.70 (70%)
- NPI Registry endpoint: https://npiregistry.cms.hhs.gov/api/
- Timeout: 30 seconds

Features:
- Schema-driven verification routing
- NPI Registry lookup (free public API, no auth required)
- In-memory caching for performance optimization
- Confidence-based verification (skips low-confidence fields)
- Graceful degradation on API failures
- Passthrough mode when verification disabled
- Detailed verification metadata and statistics
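
As a rough illustration of how confidence gating, in-memory caching, and graceful degradation could fit together: the class name, the "SKIPPED" and "ERROR" status strings, and the injected `lookup` callable are hypothetical; only the 0.70 threshold comes from the configuration listed above.

```python
from typing import Callable


class CachedVerifier:
    """Gate lookups on confidence and remember results per key."""

    def __init__(self, lookup: Callable[[str], str], threshold: float = 0.70) -> None:
        self._lookup = lookup
        self._cache: dict[str, str] = {}
        self.threshold = threshold

    def verify(self, key: str, confidence: float) -> str:
        # Low-confidence extractions are not worth an external call.
        if confidence < self.threshold:
            return "SKIPPED"
        if key not in self._cache:
            try:
                self._cache[key] = self._lookup(key)
            except Exception:
                # Degrade gracefully; failures are not cached, so a retry can succeed.
                return "ERROR"
        return self._cache[key]
```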

Backward Compatibility:
- All verification fields are Optional with None defaults
- Verification disabled by default
- No changes to existing pipeline behavior when disabled
- Existing documents/schemas work unchanged

Status: Core implementation complete, ready for configuration and testing
Create schema definition for Teachers' Retirement System (TRS) Retirement Allowance Verification Form. This schema defines all fields on the form, with physician credentials configured for NPI verification.

Schema Fields:
- Member information: member_name, member_id
- Physician information: physician_name, physician_npi, physician_license_number, state_issuing_license
- Verification details: physician_signature, verification_date, disability_status
- Contact information: physician_address, physician_phone

Verification Configuration:
- Schema-specific verification enabled for physician fields
- Field patterns match: physician*, doctor*, npi, license
- Non-physician fields (member info, dates) are extracted but not verified
- Verification handler uses field patterns to selectively verify credentials

Files:
- retirement_allowance_verification.py: Pydantic model schema
- schema_info_sh.json: Updated schema registry (Linux/Mac)
- schema_info_ps1.json: Updated schema registry (Windows)
Create detailed testing guide for NPI verification feature covering all deployment and testing phases for NYC TRS implementation.

Testing Phases:
1. Backward compatibility testing (verification disabled)
2. TRS schema upload and registration
3. Configuration and environment setup
4. Full verification testing with sample NPIs
5. Production monitoring and validation

Includes:
- Step-by-step deployment instructions
- Azure CLI commands for configuration
- Test cases for valid/invalid/timeout scenarios
- Troubleshooting guide for common issues
- Rollback procedures
- Success criteria checklist
- Sample output structure
Create comprehensive plan for implementing notary credential verification using NY Department of State Commissioned Notaries Public database via Socrata Open Data API.

Verification Features:
- Commission number verification (primary method)
- Name + county verification (fallback)
- Expiration date checking
- Name mismatch detection
- Multi-match handling

API Details:
- Endpoint: data.ny.gov/resource/rwbv-mz6z.json
- Source: NY State Department of State, Division of Licensing
- Update frequency: Daily
- Rate limits: 1000/day (100k with app token)
- Authentication: Optional app token for higher limits

Implementation Components:
- NotaryCredentialVerifier service class
- Integration with existing VerifyHandler
- Configuration additions for notary API
- Schema-specific verification rules
- Caching strategy for performance

Document Types:
Power of Attorney, Affidavits, Deeds, Wills, Loan Documents, Sworn Statements, Retirement applications

Estimated effort: 1 day of development
Change description from "TRS Retirement Allowance Verification Form" to "TRS Medical Verification" for better UI display.
Completely rebuilt retirement_allowance_verification.py to follow pension_verification.py pattern. Schema now extracts ALL fields from the form, with verification handler selectively verifying only physician credentials.

Schema Structure:
- Member Information: name, TRS ID, address, contact (phone, email)
- Physician Certification: name, license, NPI, signature, specialty, contact
- Disability Details: status, diagnosis, start date, certification statement
- Notary Section: state, county, signature, commission (if form supports notary option)
- Form Metadata: form date, form number

Key Changes:
- Flat structure (no nested objects) matching pension_verification pattern
- Comprehensive field coverage for entire form
- Added from_json() method for consistency
- Realistic example() data
- Added notary fields for alternative certification path
- All physician_* fields will be selectively verified by verify_handler

Verification Behavior:
- Schema extracts ALL fields
- Verification handler uses field patterns ["physician", "doctor", "npi", "license"]
- Only physician credentials verified against CMS NPI Registry
- Member information extracted but NOT verified
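
The selective behavior above could be sketched like this. The four patterns are from the commit message; treating them as case-insensitive substrings and the function name `should_verify` are assumptions about how verify_handler matches fields.

```python
VERIFY_PATTERNS = ["physician", "doctor", "npi", "license"]


def should_verify(field_name: str, patterns: list[str] = VERIFY_PATTERNS) -> bool:
    """True when the field name contains any verification pattern."""
    lowered = field_name.lower()
    return any(pattern in lowered for pattern in patterns)


fields = ["member_name", "physician_name", "state_issuing_license", "form_date"]
to_verify = [name for name in fields if should_verify(name)]
print(to_verify)
```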
Add support for verifying physicians by name+state when NPI is not available. This matches the actual TRS Retirement Allowance Verification Form, which has only physician name, license number, and state, with no NPI field.

New Verification Method:
- _verify_by_name_and_state(): Searches CMS NPI Registry by first_name + last_name + state
- Handles exact name matching with preference for active providers
- Returns single match as VERIFIED, multiple matches as INVALID, no matches as NOT_FOUND
- Parses "Dr. Natalia Polyakova" into first="Natalia", last="Polyakova"
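
A minimal sketch of the name parsing and match classification described in this list. The status names are from the commit message; the helper names, the title-stripping regex, and taking the first and last tokens as first and last name are assumptions rather than the logic of _verify_by_name_and_state() itself.

```python
import re

# Leading "Dr." / "Dr" / "Doctor" followed by whitespace.
_TITLE = re.compile(r"^(dr|doctor)\.?\s+", re.IGNORECASE)


def split_physician_name(full_name: str) -> tuple[str, str]:
    """Strip a leading title and return (first_name, last_name)."""
    cleaned = _TITLE.sub("", full_name.strip())
    parts = cleaned.split()
    if len(parts) < 2:
        raise ValueError(f"cannot split name: {full_name!r}")
    # Middle names or initials, if any, are ignored in this sketch.
    return parts[0], parts[-1]


def classify_matches(match_count: int) -> str:
    """Map the number of registry matches to a verification status."""
    if match_count == 1:
        return "VERIFIED"
    if match_count > 1:
        return "INVALID"
    return "NOT_FOUND"
```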

Verification Priority (updated):
1. NPI number (if available) - exact lookup
2. State license + state (if API configured) - direct license verification
3. Name + state (NEW - fallback) - name-based search
4. No identifiers - mark as not found

Benefits:
- Works with actual TRS form fields (no NPI required)
- Successfully verified Dr. Natalia Polyakova using name="Natalia Polyakova" + state="NY"
- Returns NPI from search results for record keeping
- Handles ambiguous cases (multiple matches) appropriately

CMS API Parameters Used:
- first_name, last_name, state, version=2.1, limit=10
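
For reference, a query URL with these parameters can be assembled using only the standard library. The endpoint is the one named in the configuration above; the function name is hypothetical and the request itself is left out of this sketch.

```python
from urllib.parse import urlencode

NPI_REGISTRY_URL = "https://npiregistry.cms.hhs.gov/api/"


def build_npi_search_url(
    first_name: str, last_name: str, state: str, limit: int = 10
) -> str:
    """Build a name+state search URL for the CMS NPI Registry."""
    params = {
        "version": "2.1",
        "first_name": first_name,
        "last_name": last_name,
        "state": state,
        "limit": limit,
    }
    # urlencode escapes each value and keeps the dict's insertion order.
    return f"{NPI_REGISTRY_URL}?{urlencode(params)}"


print(build_npi_search_url("Natalia", "Polyakova", "NY"))
```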
Remove retirement_allowance_verification.py as it's redundant with pension_verification.py. Both represent the same form (RP68 Retirement Allowance Verification Form).

Why pension_verification.py is better:
- Has complete Part C Attestation section (retirement_allowance was missing this)
- Matches actual RP68 form exactly (30 fields)
- Already registered and working in production
- No extra fields that don't exist on the form

Why retirement_allowance_verification.py was problematic:
- Missing entire Part C Attestation section
- Had 13 fields NOT on the actual RP68 form (npi, specialty, diagnosis, etc.)
- Bloated with 43 fields vs 30 needed
- Never successfully registered (500 errors)

Changes:
- Deleted retirement_allowance_verification.py
- Updated schema_info_sh.json to reference pension_verification.py
- Updated schema_info_ps1.json to reference pension_verification.py
- Updated description to clarify this is RP68 form

Verification will work with pension_verification.py using name+state lookup (no NPI required).
- Fix verify_handler.py: Add Pydantic fields, handler_name, null checks
- Update main.bicep: Add 'verify' to APP_PROCESS_STEPS
- Update azure.yaml: Add predeploy hook for image building
- Add verify step to pipeline configuration (API reads APP_PROCESS_STEPS)
- Add Steps.Verify enum for pipeline step mapping
- Add notary verification using NYS Department of State database
- Add API endpoints for schema verification config (GET/PUT/DELETE)
- Update verify_handler to support both doctor and notary verification
SetVerificationConfig was passing the full document including MongoDB's
immutable _id field, causing the update to silently fail. Now only
updates the verification_config field.
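
The shape of the fix, sketched without a database connection: build a targeted `$set` update instead of resending the whole document. The `Id` filter key and the function name are assumptions; the returned pair is what pymongo's `update_one(filter, update)` would take.

```python
from typing import Any


def build_verification_config_update(
    schema_id: str, config: dict[str, Any]
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (filter, update) touching only the verification_config field."""
    filter_doc = {"Id": schema_id}
    # $set leaves every other field alone, including the immutable _id.
    update_doc = {"$set": {"verification_config": config}}
    return filter_doc, update_doc
```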
- Set app_verify_enabled default to True in application_configuration.py
- Add APP_VERIFY_ENABLED=true to Azure App Configuration in main.bicep
@ediaz-caio ediaz-caio force-pushed the feature/npi-verification branch from 4c1f4c1 to f63c3e9 Compare February 12, 2026 20:31
@ediaz-caio ediaz-caio closed this Feb 12, 2026
@ediaz-caio ediaz-caio deleted the feature/npi-verification branch February 12, 2026 20:32