Do you want to build a flexible data governance strategy and realize its benefits iteratively as it evolves?
Do you have big-data programs that generate vast amounts of data?
Do you want a governance strategy for individual programs, or one rolled up into some logical grouping?
Do you want to improve on your existing governance strategy?
Do you have regulatory or compliance needs?
Are the overall IT objectives in line with the business goals?
Data Governance in a Nutshell
When it comes to data governance, I am reminded of the phrase popularized by Mark Twain: “Lies, damned lies, and statistics”. On one side, several business leaders are still exploring the “what” and the “how” of data governance: which data to consider and which tools are available. On the other side, complex regulatory requirements such as HIPAA, SOX and Basel II hang overhead like a sword.
DG, especially in Big Data, is occasionally perceived as a lie by some and as a damned lie by others. But when done properly, it not only solves the governance problem but also improves data quality. The simple goal of DG is to govern how data can be accessed and used via business initiatives, as well as how it is defined and managed via the data management infrastructure.
So what have we built?
We have built a multi-tenant Healthcare Analytics Platform on the Hortonworks Big Data Stack. The platform receives messages from multiple devices belonging to multiple tenants (in this case, hospitals). Typical messages come from devices attached to patients in high- or low-acuity areas, from ventilators and from laboratories, along with ADT (admit, discharge, transfer) messages. Our flagship product ‘LogiCrunch’ processes these messages, predicts the patient’s condition, and publishes it in real time to the respective tenant’s clinicians.
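To make the multi-tenant intake concrete, here is a minimal sketch of grouping incoming messages by tenant and feed type. It is not LogiCrunch code: the `Message` class, the `FEEDS` set, the `route` function and the tenant names are all hypothetical, chosen only to mirror the message sources described above.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical feed names mirroring the sources described in the text.
FEEDS = {"vitals", "ventilator", "lab", "adt"}


@dataclass(frozen=True)
class Message:
    tenant_id: str   # hypothetical tenant (hospital) identifier
    feed: str        # which kind of source produced the message
    payload: dict    # message body, left opaque in this sketch


def route(messages):
    """Group messages by (tenant, feed); set aside anything unrecognized."""
    routed = defaultdict(list)
    rejected = []
    for message in messages:
        if message.tenant_id and message.feed in FEEDS:
            routed[(message.tenant_id, message.feed)].append(message)
        else:
            # Unknown feed or missing tenant: keep it out of tenant data.
            rejected.append(message)
    return dict(routed), rejected


if __name__ == "__main__":
    batch = [
        Message("hospital_a", "adt", {"event": "admit"}),
        Message("hospital_b", "lab", {"test": "CBC"}),
        Message("hospital_a", "fax", {}),
    ]
    routed, rejected = route(batch)
    for key, group in routed.items():
        print(key, len(group))
    print("rejected:", len(rejected))
```

Keying every message by tenant from the first step is what lets later stages apply per-tenant policies without re-deriving ownership.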
Listed below are the key governance-based activities, organized by phase:
Standardized stream-based Lake
Standardized feature-specific Ponds
Tenant-specific data islands
Who can access what
Data maintenance procedures
Periodic automated model evaluation procedures
Standardized Tenant Authentication and Authorization Strategy
Reference Data Management
Technical and Business Metadata Management
Enterprise Master Patient Index
Raw Data (Lake) Persistence
Dynamically build metadata in real time
Tag-based policy enforcement
Standardized Tenant-Feed-Specific Data Carpentry
Product-specific Rules
Tenant-based Policies
Metadata-based data cataloging and lineage discovery
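Several of the activities above (who can access what, tag-based policy enforcement, tenant-based policies) can be illustrated together. The following is a minimal sketch in plain Python, not the Hortonworks stack's API: the `Policy` and `Column` classes, the `PHI` tag, the column names, and the role names are all assumptions made for illustration. It tags columns from a simple naming rule, then allows access only within the owning tenant and only when a policy grants the user's role for every tag on the column.

```python
from dataclasses import dataclass, field

# Hypothetical column names treated as protected health information.
PHI_COLUMNS = {"patient_name", "mrn", "date_of_birth"}


@dataclass(frozen=True)
class Policy:
    tag: str                  # metadata tag the policy applies to
    tenant: str               # tenant whose data the policy covers
    allowed_roles: frozenset  # roles granted access to tagged data


@dataclass
class Column:
    name: str
    tenant: str
    tags: set = field(default_factory=set)


def tag_column(column):
    """Attach tags using a simple, assumed naming rule."""
    if column.name in PHI_COLUMNS:
        column.tags.add("PHI")
    return column


def is_allowed(column, user_tenant, user_role, policies):
    """Return True only if tenant isolation and every tag policy pass."""
    # Tenant isolation: data never crosses tenant boundaries.
    if column.tenant != user_tenant:
        return False
    for tag in column.tags:
        matching = [p for p in policies
                    if p.tag == tag and p.tenant == column.tenant]
        # Default deny: a tag with no granting policy blocks access.
        if not any(user_role in p.allowed_roles for p in matching):
            return False
    return True


if __name__ == "__main__":
    policies = [Policy("PHI", "hospital_a", frozenset({"clinician"}))]
    mrn = tag_column(Column("mrn", "hospital_a"))
    print(is_allowed(mrn, "hospital_a", "clinician", policies))
    print(is_allowed(mrn, "hospital_a", "analyst", policies))
```

Because the decision depends on tags rather than on column names, newly arriving data is covered as soon as it is tagged, without writing a new policy for each dataset.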
Measure and Monitor:
Ensure Regulatory Compliance
Conformance with Policies, Standards and Data Principles
Data Governance KPI
Tools Used to Power the Platform:
Hortonworks Big Data Governance Stack
OpenLDAP (trust established between tenants and the platform) for authentication, integrated with Knox