Task ID: TASK-001 Date: 2026-01-14 Status: Completed Time Spent: 60 mins


Summary

Reviewed the CRS Data Loader notebooks from the perspective of a first-time user with no DIGIT platform context.

Overall Score: 4/10

Issues Found: 19 total (5 P0, 8 P1, 6 P2)


Files Reviewed

File Purpose Code Lines Print Statements
DataLoader.ipynb Unified 4-phase workflow 2,279 329 (14%)
1_TenantAndCommonMaster.ipynb Phase 1 + 3 1,356 222 (16%)
2_BoundarySetup.ipynb Phase 2: Boundaries 1,099 162 (14%)
3_EmployeeOnboarding.ipynb Phase 4: Employees 912 129 (14%)
Tenant And Branding Master.xlsx Tenant config 3 sheets -
Common and Complaint Master.xlsx Dept/Complaints 3 sheets -
Employee_Master_Dynamic_statea.xlsx Employee data 7 sheets -

Total: 5,646 lines of code, ~850 print statements


BUCKET 1: Initial UX/Workflow Issues

First-pass review of notebook usability

1. No Clear Starting Point [P0]

Problem: 4 notebooks exist - unclear which to use:

  • DataLoader.ipynb - all 4 phases
  • Notebooks/1_TenantAndCommonMaster.ipynb - Phase 1 + 3
  • Notebooks/2_BoundarySetup.ipynb - Phase 2
  • Notebooks/3_EmployeeOnboarding.ipynb - Phase 4

User asks: “Do I run DataLoader.ipynb OR run 1→2→3 in sequence?”

Fix: Add START_HERE.md with clear workflow guidance.


2. Phase Numbering Confusion [P0]

Problem: Notebook 1 says “Phase 0, 1, 3” (skips 2). Step 2.1/2.2 are actually Phase 3 content.

User sees:

Phase 0: Authentication
Phase 1: Tenant Setup
Phase 3: Common Masters  ← Where's Phase 2?
Step 2.1: Upload Common Master  ← Is this Phase 2 or Step 2?

Fix: Consistent numbering throughout.


3. Authentication Not Shared Between Notebooks [P1]

Problem: Each notebook has its own authentication cell. Opening Notebook 2 after Notebook 1 requires re-authentication.

Fix: Persist auth token to file or use shared module.


4. No Dry-Run / Validation Mode [P1]

Problem: Can’t validate data without actually uploading. No preview of what will happen.

Fix: Add “Validate Only” button that checks without uploading.


5. No DIGIT Concepts Explanation [P1]

Problem: Assumes user knows:

  • What MDMS is
  • What a “tenant” means in DIGIT
  • Difference between pg (root) and pg.citya (child)
  • What boundary hierarchy means
  • What GRO, CSR, LOC_ADMIN roles do

Fix: Add glossary section or link to documentation.


6. Poor Error Messages [P2]

Problem: Generic errors with no actionable guidance:

❌ Authentication failed. Please check your credentials.

Fix: Specific errors:

❌ Authentication failed: 401 Unauthorized
   → Check your username/password
   → Ensure you have MDMS_ADMIN role
   → Try logging into web portal first

7. No Rollback / Undo Instructions [P2]

Problem: Notebooks only show CREATE operations. No guidance on:

  • How to delete a tenant
  • How to fix wrong data
  • How to undo a mistake

Fix: Add troubleshooting section with rollback procedures.


BUCKET 2: Architectural Issues

Deeper analysis of design problems

A1. Leaky Abstraction - API Details Exposed [P0]

Problem: Notebooks expose internal API implementation details users don’t need.

User sees:

token_url = f"{self.auth_url}/oauth/token"
response = requests.post(url, json=payload, headers=headers)
self.mdms_url = f"{self.base_url}{mdms_v2_service}"

User should see:

loader.login()  # That's it

Evidence: 27 API references visible across notebooks.

Impact: User thinks they need to understand DIGIT’s API architecture. They don’t.

Fix: Hide ALL API details in library. Notebook shows only inputs → results.


A2. Assumes User Needs Internal Knowledge [P0]

Problem: Notebook designed as if user needs to understand DIGIT internals.

Reality: User loading tenant data doesn’t need to know:

  • OAuth2 token flow
  • Service endpoint paths (/egov-mdms-service/v2/_search)
  • Payload schemas
  • How MDMS stores data

They just need:

  1. Fill Excel template
  2. Enter credentials
  3. Click “Upload”

Fix: Black box approach - Excel in, success/failure out.


Problem: Zero links to official DIGIT documentation.

Search: grep -r "digit.org/docs\|docs.digit" → 0 matches

Impact:

  • No way to learn more
  • No context for concepts
  • No troubleshooting resources

Fix: Add links:

## Learn More
- [DIGIT Platform Overview](https://core.digit.org/platform/overview)
- [MDMS Documentation](https://core.digit.org/platform/core-services/mdms)
- [Tenant Setup Guide](https://core.digit.org/guides/tenant-setup)

A4. Poor Signal-to-Noise Ratio [P1]

Problem: 14-16% of code is print statements. User reads noise, not config.

Breakdown of DataLoader.ipynb (2,279 lines):

Type Lines %
Print statements 329 14%
Widget setup 165 7%
Comments 130 6%
Imports 34 1.5%
Actual config logic ~200 9%
Boilerplate/helpers ~1,400 62%

User spends 90% of time reading code that doesn’t help them.

Example noise:

print("="*70)
print("        🔐 GATEWAY AUTHENTICATION")
print("="*70)
print()
print("🔄 Authenticating with gateway...")
print(f"   Gateway: {base_url_input.value}")

What user needs: ✅ Authenticated as DEV_ADMIN

Fix: Move helpers to library. Notebook cells 5-10 lines max.


A5. Code That Shouldn’t Be in Notebook [P1]

Problem: Implementation code belongs in library, not notebook.

Shouldn’t be in notebook:

  • Dependency installation (80+ lines)
  • OAuth2 flow
  • API request/response handling
  • Error parsing
  • Data transformation

Should be in notebook:

  • Widget for input
  • Library function call
  • Result display

Current: 80 lines of install_dependencies() in Cell 1.

Ideal: !pip install crs-dataloader or pre-installed.


BUCKET 3: Excel Template Issues

Analysis of where users spend most time

E1. Documentation Buried in Excel Files [P1]

Problem: README/Instructions exist but:

  • Inside Excel (must download and open)
  • Plain text in cells (no formatting)
  • Not linked from notebook

User experience:

  1. Downloads template
  2. Opens Excel, sees data sheet
  3. May never notice “Read me” tab
  4. Fills data incorrectly

Fix:

  • Add template docs to notebook markdown
  • Create TEMPLATES.md guide
  • Warning: “Read Instructions sheet first!”

E2. Data Validation Uses Google Sheets Syntax [P0]

Problem: Validation uses REGEXMATCH() - Google Sheets only, broken in Excel!

REGEXMATCH(B2,"^[A-Za-z0-9.]+$")  ← Google Sheets function

Impact: Users in Excel get zero validation - can enter anything.

Fix: Excel-compatible validation, or state “Google Sheets only”.


E3. Column Names Have Line Breaks [P2]

Problem:

"Tenant Code*
(To be filled by ADMIN)"   ← Line break in header!

"Search Words (comma separated)"  ← No asterisk - required or not?

Impact: Hard to document, inconsistent, breaks programmatic processing.

Fix: Clean names: Tenant Code* (Admin fills)


E4. Typos in Sample Data [P2]

Problem: Employee template has "SANATION DEPARTMENT" (should be SANITATION).

Impact: Users copy typo, uploads fail due to mismatch.

Fix: Review and fix all sample data.


E5. No Visual Formatting [P2]

Problem: Plain white cells. No:

  • Color coding (required vs optional)
  • Header formatting
  • Example row highlighting
  • Conditional formatting for errors

What user should see:

  • Yellow headers = required
  • Green example row with “DELETE THIS” note
  • Red when validation fails

E6. Reference Sheets Not Explained [P1]

Problem: Employee template has hidden sheets:

  • Ref_Departments (15 rows)
  • Ref_Designations (33 rows)
  • Ref_Roles (21 rows)
  • Ref_Boundaries (2 rows)

User doesn’t know these exist. No explanation of what each role means.

Fix: Document in Instructions sheet with full option lists.


E7. No Schema Documentation [P1]

Problem: Vague field descriptions.

Current: "Use only letters and spaces"

Should be:

Field: Tenant Code
Format: STATE.CITY (e.g., PG.CITYA)
Max length: 50 characters
Allowed: A-Z, a-z, 0-9, dots only
Valid: PG.CITYA, KA.BLR001
Invalid: pg citya (spaces), PG@CITY (special chars)

Summary Tables

All Issues by Category

Bucket Count P0 P1 P2
1. UX/Workflow 7 2 3 2
2. Architecture 5 3 2 0
3. Excel Templates 7 1 3 3
Total 19 6 8 5

What Works Well

  1. Widget-based UI for file uploads
  2. 4-phase workflow concept is logical
  3. Dynamic template generation from MDMS
  4. Localization auto-generation
  5. Excel templates have README sheets (good intent)
  6. Data validation exists (wrong syntax though)

Recommendations by Priority

Priority Issue Effort Impact Category
P0 Hide API internals (A1) High Very High Architecture
P0 Fix Excel validation - Google Sheets→Excel (E2) Medium Very High Templates
P0 Add DIGIT documentation links (A3) Low High Documentation
P0 Add START_HERE.md (1) Low High UX
P0 Fix phase numbering (2) Low High UX
P1 Create TEMPLATES.md guide (E1) Medium High Documentation
P1 Reduce noise - 14% print statements (A4) Medium High Architecture
P1 Move implementation to library (A5) High High Architecture
P1 Add dry-run mode (4) Medium High Feature
P1 Add field schema documentation (E7) Medium High Templates
P1 Document reference sheets (E6) Low Medium Templates
P1 Share auth between notebooks (3) Medium Medium UX
P2 Clean Excel column names (E3) Low Medium Templates
P2 Add visual formatting to templates (E5) Medium Medium Templates
P2 Improve error messages (6) Medium Medium UX
P2 Add rollback docs (7) Low Medium Documentation
P2 Fix typos in sample data (E4) Low Low Templates

Ideal State

Notebook (Current vs Ideal)

Metric Current Ideal
Cells 32 5-8
Code lines 2,279 <100
Print statements 329 <20
API references visible 27 0
Doc links 0 5+

What it should look like:

# Cell 1: Setup
from crs_dataloader import CRSLoader

# Cell 2: Login
loader = CRSLoader("https://unified-dev.digit.org")
loader.login()

# Cell 3: Load data
loader.load_tenant("Tenant Master.xlsx")     # ✅ Created tenant pg.citya
loader.load_boundaries("Boundaries.xlsx")    # ✅ Created 45 boundaries
loader.load_employees("Employees.xlsx")      # ✅ Created 12 employees

5 cells, ~15 lines vs current 32 cells, 2,279 lines


Next Steps for TASK-002

  1. Architecture: Refactor to hide internals, keep notebook declarative
  2. Documentation: Add DIGIT links, create TEMPLATES.md, START_HERE.md
  3. Templates: Fix Google Sheets validation, add schema docs
  4. UX: Reduce noise, fix numbering, add dry-run mode