Task ID: TASK-001 Date: 2026-01-14 Status: Completed Time Spent: 60 mins
Summary
Reviewed the CRS Data Loader notebooks from the perspective of a first-time user with no DIGIT platform context.
Overall Score: 4/10
Issues Found: 19 total (5 P0, 8 P1, 6 P2)
Files Reviewed
| File | Purpose | Code Lines | Print Statements |
|---|---|---|---|
DataLoader.ipynb |
Unified 4-phase workflow | 2,279 | 329 (14%) |
1_TenantAndCommonMaster.ipynb |
Phase 1 + 3 | 1,356 | 222 (16%) |
2_BoundarySetup.ipynb |
Phase 2: Boundaries | 1,099 | 162 (14%) |
3_EmployeeOnboarding.ipynb |
Phase 4: Employees | 912 | 129 (14%) |
Tenant And Branding Master.xlsx |
Tenant config | 3 sheets | - |
Common and Complaint Master.xlsx |
Dept/Complaints | 3 sheets | - |
Employee_Master_Dynamic_statea.xlsx |
Employee data | 7 sheets | - |
Total: 5,646 lines of code, ~850 print statements
BUCKET 1: Initial UX/Workflow Issues
First-pass review of notebook usability
1. No Clear Starting Point [P0]
Problem: 4 notebooks exist - unclear which to use:
DataLoader.ipynb- all 4 phasesNotebooks/1_TenantAndCommonMaster.ipynb- Phase 1 + 3Notebooks/2_BoundarySetup.ipynb- Phase 2Notebooks/3_EmployeeOnboarding.ipynb- Phase 4
User asks: “Do I run DataLoader.ipynb OR run 1→2→3 in sequence?”
Fix: Add START_HERE.md with clear workflow guidance.
2. Phase Numbering Confusion [P0]
Problem: Notebook 1 says “Phase 0, 1, 3” (skips 2). Step 2.1/2.2 are actually Phase 3 content.
User sees:
Phase 0: Authentication
Phase 1: Tenant Setup
Phase 3: Common Masters ← Where's Phase 2?
Step 2.1: Upload Common Master ← Is this Phase 2 or Step 2?
Fix: Consistent numbering throughout.
3. Authentication Not Shared Between Notebooks [P1]
Problem: Each notebook has its own authentication cell. Opening Notebook 2 after Notebook 1 requires re-authentication.
Fix: Persist auth token to file or use shared module.
4. No Dry-Run / Validation Mode [P1]
Problem: Can’t validate data without actually uploading. No preview of what will happen.
Fix: Add “Validate Only” button that checks without uploading.
5. No DIGIT Concepts Explanation [P1]
Problem: Assumes user knows:
- What MDMS is
- What a “tenant” means in DIGIT
- Difference between
pg(root) andpg.citya(child) - What boundary hierarchy means
- What GRO, CSR, LOC_ADMIN roles do
Fix: Add glossary section or link to documentation.
6. Poor Error Messages [P2]
Problem: Generic errors with no actionable guidance:
❌ Authentication failed. Please check your credentials.
Fix: Specific errors:
❌ Authentication failed: 401 Unauthorized
→ Check your username/password
→ Ensure you have MDMS_ADMIN role
→ Try logging into web portal first
7. No Rollback / Undo Instructions [P2]
Problem: Notebooks only show CREATE operations. No guidance on:
- How to delete a tenant
- How to fix wrong data
- How to undo a mistake
Fix: Add troubleshooting section with rollback procedures.
BUCKET 2: Architectural Issues
Deeper analysis of design problems
A1. Leaky Abstraction - API Details Exposed [P0]
Problem: Notebooks expose internal API implementation details users don’t need.
User sees:
token_url = f"{self.auth_url}/oauth/token"
response = requests.post(url, json=payload, headers=headers)
self.mdms_url = f"{self.base_url}{mdms_v2_service}"
User should see:
loader.login() # That's it
Evidence: 27 API references visible across notebooks.
Impact: User thinks they need to understand DIGIT’s API architecture. They don’t.
Fix: Hide ALL API details in library. Notebook shows only inputs → results.
A2. Assumes User Needs Internal Knowledge [P0]
Problem: Notebook designed as if user needs to understand DIGIT internals.
Reality: User loading tenant data doesn’t need to know:
- OAuth2 token flow
- Service endpoint paths (
/egov-mdms-service/v2/_search) - Payload schemas
- How MDMS stores data
They just need:
- Fill Excel template
- Enter credentials
- Click “Upload”
Fix: Black box approach - Excel in, success/failure out.
A3. No Links to DIGIT Documentation [P0]
Problem: Zero links to official DIGIT documentation.
Search: grep -r "digit.org/docs\|docs.digit" → 0 matches
Impact:
- No way to learn more
- No context for concepts
- No troubleshooting resources
Fix: Add links:
## Learn More
- [DIGIT Platform Overview](https://core.digit.org/platform/overview)
- [MDMS Documentation](https://core.digit.org/platform/core-services/mdms)
- [Tenant Setup Guide](https://core.digit.org/guides/tenant-setup)
A4. Poor Signal-to-Noise Ratio [P1]
Problem: 14-16% of code is print statements. User reads noise, not config.
Breakdown of DataLoader.ipynb (2,279 lines):
| Type | Lines | % |
|---|---|---|
| Print statements | 329 | 14% |
| Widget setup | 165 | 7% |
| Comments | 130 | 6% |
| Imports | 34 | 1.5% |
| Actual config logic | ~200 | 9% |
| Boilerplate/helpers | ~1,400 | 62% |
User spends 90% of time reading code that doesn’t help them.
Example noise:
print("="*70)
print(" 🔐 GATEWAY AUTHENTICATION")
print("="*70)
print()
print("🔄 Authenticating with gateway...")
print(f" Gateway: {base_url_input.value}")
What user needs: ✅ Authenticated as DEV_ADMIN
Fix: Move helpers to library. Notebook cells 5-10 lines max.
A5. Code That Shouldn’t Be in Notebook [P1]
Problem: Implementation code belongs in library, not notebook.
Shouldn’t be in notebook:
- Dependency installation (80+ lines)
- OAuth2 flow
- API request/response handling
- Error parsing
- Data transformation
Should be in notebook:
- Widget for input
- Library function call
- Result display
Current: 80 lines of install_dependencies() in Cell 1.
Ideal: !pip install crs-dataloader or pre-installed.
BUCKET 3: Excel Template Issues
Analysis of where users spend most time
E1. Documentation Buried in Excel Files [P1]
Problem: README/Instructions exist but:
- Inside Excel (must download and open)
- Plain text in cells (no formatting)
- Not linked from notebook
User experience:
- Downloads template
- Opens Excel, sees data sheet
- May never notice “Read me” tab
- Fills data incorrectly
Fix:
- Add template docs to notebook markdown
- Create
TEMPLATES.mdguide - Warning: “Read Instructions sheet first!”
E2. Data Validation Uses Google Sheets Syntax [P0]
Problem: Validation uses REGEXMATCH() - Google Sheets only, broken in Excel!
REGEXMATCH(B2,"^[A-Za-z0-9.]+$") ← Google Sheets function
Impact: Users in Excel get zero validation - can enter anything.
Fix: Excel-compatible validation, or state “Google Sheets only”.
E3. Column Names Have Line Breaks [P2]
Problem:
"Tenant Code*
(To be filled by ADMIN)" ← Line break in header!
"Search Words (comma separated)" ← No asterisk - required or not?
Impact: Hard to document, inconsistent, breaks programmatic processing.
Fix: Clean names: Tenant Code* (Admin fills)
E4. Typos in Sample Data [P2]
Problem: Employee template has "SANATION DEPARTMENT" (should be SANITATION).
Impact: Users copy typo, uploads fail due to mismatch.
Fix: Review and fix all sample data.
E5. No Visual Formatting [P2]
Problem: Plain white cells. No:
- Color coding (required vs optional)
- Header formatting
- Example row highlighting
- Conditional formatting for errors
What user should see:
- Yellow headers = required
- Green example row with “DELETE THIS” note
- Red when validation fails
E6. Reference Sheets Not Explained [P1]
Problem: Employee template has hidden sheets:
Ref_Departments(15 rows)Ref_Designations(33 rows)Ref_Roles(21 rows)Ref_Boundaries(2 rows)
User doesn’t know these exist. No explanation of what each role means.
Fix: Document in Instructions sheet with full option lists.
E7. No Schema Documentation [P1]
Problem: Vague field descriptions.
Current: "Use only letters and spaces"
Should be:
Field: Tenant Code
Format: STATE.CITY (e.g., PG.CITYA)
Max length: 50 characters
Allowed: A-Z, a-z, 0-9, dots only
Valid: PG.CITYA, KA.BLR001
Invalid: pg citya (spaces), PG@CITY (special chars)
Summary Tables
All Issues by Category
| Bucket | Count | P0 | P1 | P2 |
|---|---|---|---|---|
| 1. UX/Workflow | 7 | 2 | 3 | 2 |
| 2. Architecture | 5 | 3 | 2 | 0 |
| 3. Excel Templates | 7 | 1 | 3 | 3 |
| Total | 19 | 6 | 8 | 5 |
What Works Well
- Widget-based UI for file uploads
- 4-phase workflow concept is logical
- Dynamic template generation from MDMS
- Localization auto-generation
- Excel templates have README sheets (good intent)
- Data validation exists (wrong syntax though)
Recommendations by Priority
| Priority | Issue | Effort | Impact | Category |
|---|---|---|---|---|
| P0 | Hide API internals (A1) | High | Very High | Architecture |
| P0 | Fix Excel validation - Google Sheets→Excel (E2) | Medium | Very High | Templates |
| P0 | Add DIGIT documentation links (A3) | Low | High | Documentation |
| P0 | Add START_HERE.md (1) | Low | High | UX |
| P0 | Fix phase numbering (2) | Low | High | UX |
| P1 | Create TEMPLATES.md guide (E1) | Medium | High | Documentation |
| P1 | Reduce noise - 14% print statements (A4) | Medium | High | Architecture |
| P1 | Move implementation to library (A5) | High | High | Architecture |
| P1 | Add dry-run mode (4) | Medium | High | Feature |
| P1 | Add field schema documentation (E7) | Medium | High | Templates |
| P1 | Document reference sheets (E6) | Low | Medium | Templates |
| P1 | Share auth between notebooks (3) | Medium | Medium | UX |
| P2 | Clean Excel column names (E3) | Low | Medium | Templates |
| P2 | Add visual formatting to templates (E5) | Medium | Medium | Templates |
| P2 | Improve error messages (6) | Medium | Medium | UX |
| P2 | Add rollback docs (7) | Low | Medium | Documentation |
| P2 | Fix typos in sample data (E4) | Low | Low | Templates |
Ideal State
Notebook (Current vs Ideal)
| Metric | Current | Ideal |
|---|---|---|
| Cells | 32 | 5-8 |
| Code lines | 2,279 | <100 |
| Print statements | 329 | <20 |
| API references visible | 27 | 0 |
| Doc links | 0 | 5+ |
What it should look like:
# Cell 1: Setup
from crs_dataloader import CRSLoader
# Cell 2: Login
loader = CRSLoader("https://unified-dev.digit.org")
loader.login()
# Cell 3: Load data
loader.load_tenant("Tenant Master.xlsx") # ✅ Created tenant pg.citya
loader.load_boundaries("Boundaries.xlsx") # ✅ Created 45 boundaries
loader.load_employees("Employees.xlsx") # ✅ Created 12 employees
5 cells, ~15 lines vs current 32 cells, 2,279 lines
Next Steps for TASK-002
- Architecture: Refactor to hide internals, keep notebook declarative
- Documentation: Add DIGIT links, create TEMPLATES.md, START_HERE.md
- Templates: Fix Google Sheets validation, add schema docs
- UX: Reduce noise, fix numbering, add dry-run mode