Modernising data convergence in AWS through AWS DMS and Data Lake architecture

Data Convergence refers to the process of consolidating disparate data sources (OLTP databases, ERPs, CRMs, logs) into a single, cohesive repository to enable unified analytics, machine learning, and reporting.

Building a Data Convergence pipeline using AWS Database Migration Service (DMS) and an S3-based Data Lake architecture is an industry-standard pattern. It allows you to move data from legacy transactional databases into a highly scalable storage layer with minimal impact on your production workloads.

In practice, that means bringing data from multiple operational systems into a single, consistent platform where it can be:

Centralised in one place (your data lake on Amazon S3)
Kept in sync with source changes (using CDC from AWS DMS)
Standardised and governed (schemas, quality, security)
Served to many consumers (BI, ML, real‑time analytics)

In AWS, the usual pattern is:

Operational databases → AWS DMS → S3 data lake → Processing (Glue/EMR) → Lakehouse/warehouse (Athena, Redshift, Iceberg/Delta)

Components

1. Source Systems

Multiple heterogeneous databases such as:

Oracle
SQL Server
MySQL
PostgreSQL
SAP databases
MongoDB (supported versions)

These contain transactional business data.

2. AWS DMS (Database Migration Service)

AWS DMS performs:

Initial full load
Continuous Change Data Capture (CDC)
Schema conversion (with AWS SCT if required)

Output targets include:

Amazon S3
Amazon Redshift
Amazon RDS
Amazon Aurora
Apache Kafka
Amazon Kinesis

For a data lake, Amazon S3 is the most common destination.

3. Raw Data Lake (Landing Zone)

Data is stored exactly as received.

Example structure:

s3://company-data-lake/raw/
    oracle/
        customer/
        orders/
    sqlserver/
        employee/
    mysql/
        sales/

Typical formats (AWS DMS itself writes CSV or Parquet to S3; JSON usually arrives through other ingestion paths):

CSV
JSON
Parquet

Partitioning example:

year=2026/month=07/day=01/
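The folder layout and partitioning scheme above can be combined into a single key-naming helper. The following is a minimal sketch only: the function name raw_prefix is hypothetical, and the bucket, source, and table names are the illustrative ones used in this article.

```python
from datetime import date


def raw_prefix(bucket: str, source: str, table: str, day: date) -> str:
    """Build a date-partitioned prefix for one source table in the raw zone."""
    return (
        f"s3://{bucket}/raw/{source}/{table}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


print(raw_prefix("company-data-lake", "oracle", "customer", date(2026, 7, 1)))
# prints: s3://company-data-lake/raw/oracle/customer/year=2026/month=07/day=01/
```

Zero-padding the month and day keeps the prefixes sorted chronologically when listed as plain strings.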

4. AWS Glue Catalog

AWS Glue Crawlers automatically discover:

Tables
Schemas
Partitions

The Glue Data Catalog acts as the metadata layer.
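As an illustration, a crawler over the raw zone could be defined with boto3 roughly as follows. This is a sketch under assumptions: the crawler name, IAM role ARN, and database name are hypothetical placeholders, and the actual API call is left as a comment so the snippet runs without AWS credentials.

```python
def crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Keyword arguments for boto3's glue.create_crawler()."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Update catalog tables when the schema changes; only log objects
        # that disappear instead of deleting their catalog entries.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


request = crawler_request(
    name="raw-oracle-crawler",                                  # hypothetical
    role_arn="arn:aws:iam::123456789012:role/glue-crawler",     # hypothetical
    database="raw_zone",                                        # hypothetical
    s3_path="s3://company-data-lake/raw/oracle/",
)

# With AWS credentials configured, the request could be submitted as:
#   import boto3
#   boto3.client("glue").create_crawler(**request)
print(request["Targets"])
```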

5. Data Transformation

AWS Glue ETL jobs perform:

Data cleansing
Standardization
Data quality checks
Deduplication
Joining datasets
Data enrichment

Example:

Customer (Oracle) + Sales (SQL Server) + CRM (PostgreSQL)
↓
Unified Customer Table
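The unification step can be sketched in plain Python. A real Glue ETL job would normally do this in PySpark; the version below is only a stand-in to show the logic, and the field names (customer_id, name, amount, email) are assumptions made for the example.

```python
def unify_customers(oracle_customers, sqlserver_sales, crm_contacts):
    """Join three source extracts on customer_id into one record per customer."""
    unified = {}
    for row in oracle_customers:
        # Cleansing and deduplication: trim and normalise the name;
        # a later row with the same customer_id replaces an earlier one.
        unified[row["customer_id"]] = {
            "customer_id": row["customer_id"],
            "name": row["name"].strip().title(),
            "total_sales": 0.0,
            "email": None,
        }
    for sale in sqlserver_sales:
        customer = unified.get(sale["customer_id"])
        if customer is not None:  # ignore sales with no matching customer
            customer["total_sales"] += sale["amount"]
    for contact in crm_contacts:
        customer = unified.get(contact["customer_id"])
        if customer is not None:
            # Enrichment: attach a normalised e-mail address from the CRM
            customer["email"] = contact["email"].strip().lower()
    return list(unified.values())


customers = unify_customers(
    oracle_customers=[
        {"customer_id": 1, "name": " ada lovelace "},
        {"customer_id": 1, "name": "Ada Lovelace"},  # duplicate row
    ],
    sqlserver_sales=[
        {"customer_id": 1, "amount": 10.0},
        {"customer_id": 1, "amount": 5.5},
        {"customer_id": 99, "amount": 1.0},  # orphan sale, skipped
    ],
    crm_contacts=[{"customer_id": 1, "email": " Ada@Example.com "}],
)
print(customers)
```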

6. Curated Data Lake

The curated zone stores optimised, query-ready datasets.

Preferred file and table formats:

Apache Parquet
Apache Iceberg
Delta Lake (if using compatible tools)

Benefits:

Columnar storage
Compression
Faster queries
Lower storage cost

Example:

s3://company-data-lake/curated/
    customers/
    sales/
    finance/
    inventory/

7. Analytics Layer

Data can be queried using:

Amazon Athena
Amazon Redshift Spectrum
Amazon EMR
Apache Spark
Amazon SageMaker

Visualization:

Amazon QuickSight
Tableau
Power BI

Setting Up AWS DMS for Convergence

AWS DMS accomplishes data convergence in two distinct phases: Full Load, which migrates the existing snapshot, and Change Data Capture (CDC), which replicates ongoing data modifications in near real time.

Step 1: Network & Prerequisites

Ensure the DMS Replication Instance resides in a Virtual Private Cloud (VPC) with security groups configured to access your source databases.

Enable logical replication or binary logging on your source system (e.g., wal_level = logical for PostgreSQL, or enabling the binlog for MySQL) so DMS can read changes from the transaction log instead of repeatedly querying the tables, which keeps the load on the source low.
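A pre-flight check for these settings might look like the sketch below. The function name and the shape of the settings dictionary are hypothetical; the values are assumed to have been read from the source beforehand (for example via SHOW wal_level or SHOW VARIABLES). The binlog_format = ROW requirement is not mentioned above and is added here as an assumption based on the usual DMS prerequisites for MySQL.

```python
# Settings each engine is expected to expose before CDC can start.
REQUIRED_SETTINGS = {
    "postgresql": {"wal_level": "logical"},
    "mysql": {"log_bin": "ON", "binlog_format": "ROW"},
}


def cdc_prerequisite_issues(engine: str, settings: dict) -> list:
    """Return a human-readable list of settings that would block CDC."""
    issues = []
    for key, expected in REQUIRED_SETTINGS[engine].items():
        actual = settings.get(key)
        # Compare case-insensitively, since engines report values in mixed case
        if str(actual).upper() != expected.upper():
            issues.append(f"{key} is {actual!r}, expected {expected!r}")
    return issues


print(cdc_prerequisite_issues("postgresql", {"wal_level": "replica"}))
print(cdc_prerequisite_issues("mysql", {"log_bin": "ON", "binlog_format": "ROW"}))
```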

Step 2: Configure DMS Endpoints

Source Endpoint: Connects to your production database via credentials stored securely in AWS Secrets Manager.

Target Endpoint: Points to the Amazon S3 bucket that holds the raw landing zone (often called the Bronze layer).

Best-practice endpoint setting: set the target format to Apache Parquet. Parquet is a columnar storage format that compresses data heavily (the ~70% storage saving often quoted depends on the data) and makes analytical queries significantly faster than CSV.
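A Parquet-writing S3 target endpoint could be described for boto3 roughly like this. It is a sketch, not a complete configuration: the endpoint identifier, bucket folder, and IAM role ARN are hypothetical, and the API call itself is shown only as a comment.

```python
def s3_target_endpoint(identifier: str, bucket: str, folder: str, role_arn: str) -> dict:
    """Keyword arguments for boto3's dms.create_endpoint() with an S3 target."""
    return {
        "EndpointIdentifier": identifier,
        "EndpointType": "target",
        "EngineName": "s3",
        "S3Settings": {
            "ServiceAccessRoleArn": role_arn,  # role DMS assumes to write to S3
            "BucketName": bucket,
            "BucketFolder": folder,
            "DataFormat": "parquet",           # instead of the default CSV
            "ParquetVersion": "parquet-2-0",
        },
    }


endpoint = s3_target_endpoint(
    identifier="lake-raw-target",                              # hypothetical
    bucket="company-data-lake",
    folder="raw/oracle",                                       # hypothetical
    role_arn="arn:aws:iam::123456789012:role/dms-s3-access",   # hypothetical
)

# With AWS credentials configured:
#   import boto3
#   boto3.client("dms").create_endpoint(**endpoint)
print(endpoint["S3Settings"]["DataFormat"])
```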

Step 3: Run Full Load + CDC Tasks

Configure a DMS Migration Task with the option "Migrate existing data and replicate ongoing changes".
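In the API, that console option corresponds to the migration type full-load-and-cdc. The sketch below builds the arguments for boto3's create_replication_task; the task identifier and all ARNs are hypothetical placeholders, and the selection rule simply includes every table in every schema.

```python
import json


def replication_task_request(task_id, source_arn, target_arn, instance_arn,
                             schema="%", table="%"):
    """Keyword arguments for boto3's dms.create_replication_task()."""
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-tables",
                # "%" is the wildcard: all schemas / all tables
                "object-locator": {"schema-name": schema, "table-name": table},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": "full-load-and-cdc",  # full load, then ongoing CDC
        "TableMappings": json.dumps(table_mappings),  # the API expects a JSON string
    }


task = replication_task_request(
    task_id="oracle-to-lake",                                   # hypothetical
    source_arn="arn:aws:dms:eu-west-1:123456789012:endpoint:SRC",   # hypothetical
    target_arn="arn:aws:dms:eu-west-1:123456789012:endpoint:TGT",   # hypothetical
    instance_arn="arn:aws:dms:eu-west-1:123456789012:rep:INST",     # hypothetical
)

# With AWS credentials configured:
#   import boto3
#   boto3.client("dms").create_replication_task(**task)
print(task["MigrationType"])
```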

Full Load: DMS writes the existing database records to S3, organised into one folder per source table.

CDC: DMS continuously tails the transaction logs. Whenever an INSERT, UPDATE, or DELETE happens at the source, DMS writes a new file to S3 containing the changed rows together with an operation column (Op) whose value is 'I', 'U', or 'D'.
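Downstream, the Op flag is what lets you rebuild the current state of a table from the full-load snapshot plus the change files. The following is a minimal in-memory sketch of that replay; the function name, the row shape, and the id primary-key column are assumptions for illustration, and a real job would read the files from S3 and apply them in commit order.

```python
def apply_changes(snapshot: dict, changes: list, key: str = "id") -> dict:
    """Replay CDC records (in order) on top of a full-load snapshot.

    snapshot maps primary key -> row; each change is a row dict plus an "Op" flag.
    """
    table = dict(snapshot)  # leave the caller's snapshot untouched
    for change in changes:
        row = {k: v for k, v in change.items() if k != "Op"}
        op = change["Op"]
        if op in ("I", "U"):
            table[row[key]] = row       # insert, or overwrite with the new image
        elif op == "D":
            table.pop(row[key], None)   # delete; tolerate an already-missing row
        else:
            raise ValueError(f"unknown Op flag: {op!r}")
    return table


snapshot = {1: {"id": 1, "status": "new"}, 2: {"id": 2, "status": "new"}}
changes = [
    {"Op": "U", "id": 1, "status": "shipped"},
    {"Op": "D", "id": 2, "status": "new"},
    {"Op": "I", "id": 3, "status": "new"},
]
current = apply_changes(snapshot, changes)
print(current)
```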

Production Best Practices

DMS Serverless: Consider using DMS Serverless for replication tasks; it scales DMS capacity units (DCUs) automatically based on transaction volume, reducing idle infrastructure costs.

File Size Optimisation: Tune CdcMaxBatchInterval (in seconds) and CdcMinFileSize (in kilobytes) in your DMS S3 target settings. This prevents the "small file problem", where thousands of tiny S3 files cripple query performance. Aim for files between 128 MB and 512 MB.
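Because CdcMinFileSize is expressed in kilobytes, a small helper avoids unit mistakes when targeting a size in megabytes. This is a sketch: the helper name and the 15-minute interval are assumptions, and the target size is only reached if enough changes accumulate before the interval expires.

```python
def cdc_batch_settings(min_file_mb: int, max_interval_seconds: int) -> dict:
    """S3Settings fragment controlling how CDC changes are batched into files."""
    if min_file_mb <= 0 or max_interval_seconds <= 0:
        raise ValueError("both values must be positive")
    return {
        "CdcMinFileSize": min_file_mb * 1024,          # kilobytes
        "CdcMaxBatchInterval": max_interval_seconds,   # seconds
    }


# Target the lower end of the 128 MB to 512 MB range; flush at least every 15 minutes.
settings = cdc_batch_settings(min_file_mb=128, max_interval_seconds=900)
print(settings)
```

A longer interval produces larger files on quiet tables but also increases how stale the data lake can be, so the two values are a latency versus file-size trade-off.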

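Partitioning: Make sure the data is partitioned by date (e.g., year=YYYY/month=MM/day=DD/). On the DMS side, the S3 target setting DatePartitionEnabled writes CDC files into date-based folders. Date partitions allow Athena to skip files outside the query range, improving performance and reducing query costs.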
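The effect of partition pruning can be illustrated without a query engine: given date-partitioned keys, only those inside the requested range need to be read. This is a simplified sketch of the idea, not how Athena is implemented, and the object keys are made up for the example.

```python
import re
from datetime import date

# Matches the Hive-style date partition embedded in an S3 key
PARTITION = re.compile(r"year=(\d{4})/month=(\d{2})/day=(\d{2})/")


def prune(keys, start: date, end: date):
    """Keep only the keys whose partition date falls within [start, end]."""
    kept = []
    for key in keys:
        match = PARTITION.search(key)
        if match is None:
            continue  # unpartitioned object: cannot be matched to a date
        year, month, day = (int(part) for part in match.groups())
        if start <= date(year, month, day) <= end:
            kept.append(key)
    return kept


keys = [
    "raw/oracle/orders/year=2026/month=06/day=30/part-0.parquet",
    "raw/oracle/orders/year=2026/month=07/day=01/part-0.parquet",
    "raw/oracle/orders/year=2026/month=07/day=02/part-0.parquet",
]
print(prune(keys, date(2026, 7, 1), date(2026, 7, 1)))
```

Here a one-day query touches one of the three files; on a table with years of history the skipped share is correspondingly larger.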


Let's chat, collaborate and innovate.

Bring your vision to life in the digital realm.

CONTACT

hello@alliantdigital.com
