Success story

Data Architecture Modernization

Leading EU Bank

Project Overview

Leading EU Bank undertook a transformative project to address the limitations of its existing data architecture and to unlock advanced analytics capabilities. The project aimed to resolve long-standing architectural pain points while enabling sophisticated features such as a speed layer, unstructured data processing, and the development, training, and execution of Machine Learning (ML) algorithms. This ambitious initiative positioned the Bank as a leader in data-driven banking solutions.

Key Objectives

  • Overcome challenges in the current data architecture
  • Enable real-time data processing through a speed layer
  • Facilitate storage and processing of unstructured data
  • Support ML algorithm development, training, and deployment

Key Roles and Responsibilities

The project’s success was driven by a multi-disciplinary team with diverse roles, including:

  • Product Ownership:
    • Product Owner of Data Transformation
    • Product Owner of DWH Modernization (Spin-off)
    • Managing the Data Product Backlog
    • Anticipating and addressing the bank’s evolving data needs
    • Defining the vision for data architecture as a strategic product
    • Budget management, including forecasting, tracking actuals, and pitching for resources
  • Data Architecture:
    • Creating a decentralized data architecture through Data Mesh implementation
    • Designing patterns and principles for the new data architecture
    • Architecting ETL processes and techniques
    • Developing standards and best practices for data management
  • Data Engineering:
    • Designing and implementing metadata-driven generators for Serving Layer and Data Lake (Layer L0) data pipelines (a minimal generator sketch follows this list)
    • Developing orchestration patterns for streamlined data workflows
    • Utilizing Metadata-Driven ETL Engines to enhance data processing efficiency
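
To make the generator idea concrete, here is a minimal sketch of a metadata-driven pipeline generator. It is illustrative only, not the Bank's actual engine: the mapping records, table names, and the render_load_sql helper are all hypothetical, and a production engine would read its metadata from a repository database rather than an in-line list.

```python
# Minimal sketch of a metadata-driven pipeline generator (illustrative only).
# The metadata records, table names, and helper below are hypothetical.

MAPPINGS = [
    # One record per Data Lake (L0) feed: the metadata, not hand-written
    # code, defines what is loaded and how.
    {"source": "crm.customers", "target": "l0.customers", "load": "full"},
    {"source": "core.transactions", "target": "l0.transactions",
     "load": "incremental", "delta_column": "booking_date"},
]


def render_load_sql(m: dict) -> str:
    """Render the load statement for a single metadata record."""
    if m["load"] == "full":
        return (f"TRUNCATE TABLE {m['target']};\n"
                f"INSERT INTO {m['target']} SELECT * FROM {m['source']};")
    # Incremental loads only pull rows newer than the last loaded delta value.
    return (f"INSERT INTO {m['target']}\n"
            f"SELECT * FROM {m['source']}\n"
            f"WHERE {m['delta_column']} > "
            f"(SELECT MAX({m['delta_column']}) FROM {m['target']});")


if __name__ == "__main__":
    for mapping in MAPPINGS:
        print(render_load_sql(mapping), end="\n\n")
```

Because each pipeline is derived from a metadata record rather than hand-coded, adding a new feed becomes a data change rather than a code change, which is what eliminates vendor lock-in and shortens time-to-market.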

Strategic Initiatives and Achievements

  • Vendor and Concept Selection: Leading vendor interviews and selecting the most suitable vendors and implementation concepts for the new data architecture.
  • Architecture Analysis: Conducting a thorough analysis of the current data architecture to identify areas for improvement.
  • Temporary Architecture Setup: Establishing interim architecture to ensure seamless synchronization and cooperation between the existing and new systems.
  • New Data Architecture Design: Designing a modern Data Architecture inspired by Data Mesh principles. A key innovation was the introduction of a Serving Layer, a novel addition at the time that has since become a widely accepted industry standard.
  • Decoupling Legacy Systems: Systematically migrating functionalities from the old architecture to the new, modern framework.
  • Migration Framework Development: Creating a robust framework for migrating IBM DataStage jobs to Oracle Data Integrator 12c, including reverse engineering, migration, and testing.
  • Big Data Cluster Design: Developing concepts for the physical architecture of Big Data clusters, optimizing for scalability and performance.
  • Migration of DWH Core ETL Pipelines: Core ETL pipelines were successfully migrated to the new Data Architecture, aligned with established standards, principles, and best practices.
  • Migration from Oracle to PostgreSQL: Core ETL pipelines were successfully migrated from Oracle to PostgreSQL, working around PostgreSQL's lack of several advanced Oracle features through targeted improvements and performance optimizations. Oracle Data Integrator Knowledge Modules (KMs) were enhanced to fully support PostgreSQL and to integrate seamlessly with the Bank's internal tool, IGI (see the dialect-translation sketch after this list).
  • Orchestration Module: A new orchestration module was developed on top of the open-source tool Airflow, enabling orchestration-as-code and supporting a wide range of data sources and targets (see the DAG sketch after this list).
  • Near Real-Time Analytics: The new Data Architecture, powered by Apache NiFi and HBase, enabled the implementation of the first near real-time data analytics use cases, starting with HR data.
  • Metadata-Driven Development: Transitioned the organization to a metadata-driven development strategy, eliminating vendor lock-in and dramatically accelerating time-to-market. This change empowered the Bank's data engineers to adopt new mindsets, rethink traditional workflows, and upskill to fully leverage metadata as the foundation of its pipelines.
  • Data Lake Initiative: Introduced a unified Data Lake to harness the full potential of raw datasets and accelerate report delivery from heterogeneous sources, which proved invaluable during the volatile COVID-19 period.
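
One flavor of the targeted improvements behind the Oracle-to-PostgreSQL migration can be illustrated with a small dialect-translation sketch. This is a hedged example, not the enhanced ODI Knowledge Modules themselves: the mapping table and the translate helper are hypothetical, and real KM changes live in ODI templates rather than standalone Python.

```python
# Illustrative Oracle-to-PostgreSQL expression rewriter. This is a
# hypothetical helper, not the actual ODI Knowledge Module code.
import re

# Oracle built-ins with direct PostgreSQL equivalents.
ORACLE_TO_POSTGRES = {
    r"\bNVL\(": "COALESCE(",
    r"\bSYSDATE\b": "CURRENT_TIMESTAMP",
    r"\s+FROM\s+DUAL\b": "",  # PostgreSQL allows SELECT without FROM
}


def translate(sql: str) -> str:
    """Apply simple textual substitutions for common Oracle idioms."""
    for pattern, replacement in ORACLE_TO_POSTGRES.items():
        sql = re.sub(pattern, replacement, sql, flags=re.IGNORECASE)
    return sql.strip()


if __name__ == "__main__":
    print(translate("SELECT NVL(balance, 0), SYSDATE FROM DUAL"))
    # -> SELECT COALESCE(balance, 0), CURRENT_TIMESTAMP
```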
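
The orchestration-as-code approach can likewise be pictured with a minimal Apache Airflow DAG, assuming Airflow 2.x; the DAG id, task ids, and run_layer callable are hypothetical stand-ins for the module's generated workflows.

```python
# Minimal orchestration-as-code sketch using Apache Airflow 2.x.
# The DAG id, task ids, and run_layer callable are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_layer(layer: str) -> None:
    """Placeholder for triggering the generated pipelines of one layer."""
    print(f"running pipelines for layer: {layer}")


with DAG(
    dag_id="dwh_core_load",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow >= 2.4; older releases use schedule_interval
    catchup=False,
) as dag:
    load_l0 = PythonOperator(
        task_id="load_l0",
        python_callable=run_layer,
        op_kwargs={"layer": "L0"},
    )
    load_serving = PythonOperator(
        task_id="load_serving",
        python_callable=run_layer,
        op_kwargs={"layer": "serving"},
    )
    # The Serving Layer refresh runs only after the Data Lake (L0) load.
    load_l0 >> load_serving
```

Because such a DAG is plain Python, it can be generated from the same metadata that drives the ETL pipelines and versioned alongside them.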

Tools and Technologies

The project leveraged a diverse technology stack to achieve its goals:

  • ETL and Data Integration: Oracle Data Integrator 12c, IBM DataStage, IGI Metadata-Driven ETL Engine
  • Databases: Oracle Database 12c, DB2, PostgreSQL, ZIM Database
  • Big Data and Processing: NiFi, Hive, HDFS, Parquet, Avro, Spark, Kafka
  • Orchestration and Automation: Airflow
  • Security and Governance: Ambari, Ranger
  • Programming and Scripting: Groovy, PL/SQL, Python
  • Data Management: Schema Registry

Impact and Success

The data architecture modernization project has significantly enhanced the Bank’s data capabilities. The implementation of a decentralized data architecture, coupled with advanced ML support and real-time processing features, has transformed how the Bank manages and utilizes data. This project not only solved existing architectural challenges but also laid a robust foundation for future data-driven innovations, positioning the Bank as a front-runner in the digital banking landscape.
