Building a modern lakehouse architecture: Yggdrasil Gaming’s journey from BigQuery to AWS

This is a guest post by Edijs Drezovs, CEO and Founder of GOStack, Viesturs Kols, Data Architect at GOStack, and Krisjanis Beitans, Senior Data Engineer at GOStack, in partnership with AWS.

Yggdrasil Gaming develops and publishes online casino games globally, processing large amounts of real-time gaming data for game performance analytics, player behavior insights, and industry intelligence. As Yggdrasil’s platform grew, managing dual-cloud environments created operational overhead and limited their ability to implement advanced analytics initiatives. This challenge became critical ahead of the launch of the Game in a Box solution on AWS Marketplace, which increases data volume and complexity.

Yggdrasil Gaming reduced multi-cloud complexity and built a scalable analytics foundation by migrating from Google BigQuery to AWS analytics services. In this post, you’ll discover how Yggdrasil Gaming transformed their data architecture to meet growing business demands. You’ll learn practical strategies for migrating from proprietary systems to open table formats such as Apache Iceberg while maintaining business continuity.

Yggdrasil worked with GOStack, an AWS Partner, to migrate to an Apache Iceberg-based lakehouse architecture. The migration helped reduce operational complexity and enabled real-time gaming analytics and machine learning.

Challenges

Yggdrasil faced several critical challenges that prompted their migration to AWS:

  • Multi-cloud operational complexity: Managing infrastructure across AWS and Google Cloud created significant operational overhead, reducing agility and increasing maintenance costs. The data team had to maintain expertise in both environments and coordinate data movement between clouds.
  • Architecture limitations: The existing setup couldn’t effectively support advanced analytics and AI initiatives. More critically, the launch of Yggdrasil’s Game in a Box solution required a modernized, scalable data environment capable of handling increased data volumes and enabling advanced analytics.
  • Scalability constraints: The architecture lacked the unified data foundation with open standards and automation required to scale efficiently. As data volumes grew, costs increased proportionally, and the team needed an environment designed for modern analytics at scale.

Solution overview

Yggdrasil worked with GOStack, an AWS Partner, to design their new lakehouse architecture. The following diagram shows a high-level overview of this architecture.

Figure 1: High-level architecture diagram of Yggdrasil's modern lakehouse on AWS

Yggdrasil successfully migrated from Google BigQuery to a data lakehouse architecture using Amazon Athena, Amazon EMR, Amazon Simple Storage Service (Amazon S3), AWS Glue Data Catalog, AWS Lake Formation, Amazon Elastic Kubernetes Service (Amazon EKS), and AWS Lambda. Their strategic approach aims to reduce multi-cloud complexity while building a scalable foundation for their Game in a Box solution and specific AI/ML initiatives like personalized game recommendations and fraud detection.

The combination of Amazon S3, Apache Iceberg, and Amazon Athena allowed Yggdrasil to move away from provisioned, always-on compute models. Amazon Athena pay-per-query pricing charges only for data scanned, removing idle compute costs during off-peak periods. Internal cost modeling performed during the evaluation phase indicated that this architecture could reduce analytics platform costs by 30–50% compared to compute-based warehouse pricing models of other solutions, particularly for bursty workloads driven by game launches, tournaments, and seasonal traffic. By adopting AWS-native analytics services, Yggdrasil reduced operational complexity through native integration with AWS Identity and Access Management (IAM), Amazon EKS, and AWS Lambda, helping simplify security, governance, and automation across the analytics platform.
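To make the pricing argument concrete, the following minimal Python sketch compares a scan-based bill with an always-on bill for a bursty month. Every number in it is a made-up placeholder (daily scan volumes, the per-TB price, the hourly cluster rate), not a figure from Yggdrasil's internal cost model, so treat it as a template for your own estimate and check current pricing before relying on it.

```python
def pay_per_query_cost(tb_scanned_per_day, price_per_tb):
    """Cost when billing depends only on the amount of data scanned."""
    return sum(tb_scanned_per_day) * price_per_tb


def always_on_cost(days, hourly_rate, hours_per_day=24):
    """Cost of a provisioned cluster that is billed whether or not it is used."""
    return days * hours_per_day * hourly_rate


# Hypothetical 30-day month: 26 quiet days and 4 launch days with heavy scanning.
scanned = [0.5] * 26 + [6.0] * 4  # TB scanned per day (placeholder values)

# Both prices below are assumptions chosen only for illustration.
scan_bill = pay_per_query_cost(scanned, price_per_tb=5.0)
cluster_bill = always_on_cost(days=30, hourly_rate=0.40)

saving = 1 - scan_bill / cluster_bill
print(f"pay-per-query: {scan_bill:.2f}  always-on: {cluster_bill:.2f}  saving: {saving:.0%}")
```

The shape of the comparison matters more than the output: the scan-based bill follows the workload, while the always-on bill is flat, so the gap widens as quiet periods get longer.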

The solution centers on a modern lakehouse architecture built on Amazon S3, which provides durable and cost-efficient storage for Iceberg tables in Apache Parquet format. The Apache Iceberg table format provides ACID transactions, schema evolution, and time travel capabilities while maintaining an open standard. AWS Glue Data Catalog serves as the central technical metadata repository, while Amazon Athena acts as the serverless query engine used by dbt-athena and for ad hoc data exploration. Amazon EMR runs Yggdrasil’s legacy Apache Spark application in a fully managed environment, and AWS Lake Formation provides centralized security and governance for data lakes, allowing fine-grained access control at database, table, column, and row levels.

The migration followed a phased approach:

  1. Establish lakehouse foundation – Set up an Apache Iceberg-based architecture on Amazon S3 with AWS Glue Data Catalog
  2. Implement real-time data ingestion – Deploy Debezium connectors for real-time change data capture from EKS and Google Kubernetes Engine (GKE) clusters
  3. Migrate processing pipelines – Re-platform ETL pipelines using AWS Lambda, and re-platform legacy data applications on Amazon EMR
  4. Modernizing the transformation layer – Implement dbt with Amazon Athena for modular, reusable models
  5. Enable governance – Configure AWS Lake Formation for comprehensive data governance

Establish lakehouse foundation

The first phase of the migration focused on building a solid foundation for the new data lakehouse architecture on AWS. The goal was to create a scalable, secure, and cost-efficient environment that could support analytical workloads with open data formats and serverless query capabilities.

GOStack provisioned an Amazon S3-based data lake as the central storage layer, providing virtually unlimited scalability and fine-grained cost control. This storage-compute separation enables teams to decouple ingestion, transformation, and analytics processes, with each component scaling independently using the most appropriate compute engine.

To establish dataset interoperability and discoverability, the team adopted AWS Glue Data Catalog as the unified metadata repository. The catalog stores Iceberg table definitions and makes schemas accessible across services such as Amazon Athena and Apache Spark workloads on Amazon EMR. Most datasets, both batch and streaming, are registered here, enabling consistent metadata visibility across the lakehouse.

The data is stored in Apache Iceberg tables on Amazon S3, chosen for its open table format, ACID transaction support, and powerful schema evolution features. Yggdrasil required ACID transactions for consistent financial reporting and fraud detection, schema evolution to accommodate rapidly changing gaming data models, and time travel queries to align with regulatory audit requirements.

GOStack built a custom schema conversion and table registration service. This internal tool converts source-system Avro schemas into Iceberg table definitions and manages the creation and evolution of raw-layer tables. By controlling schema translation and table registration directly, the team makes sure that metadata remains consistent with the source systems and provides predictable, versioned schema evolution aligned with ingestion needs.
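The post does not show the conversion service itself, so the following is only a minimal Python sketch of the idea: walking an Avro record schema and emitting Iceberg-style column definitions. The function names, the output shape, and the `GameRound` example schema are hypothetical. The type mapping is deliberately simplified: it covers flat records with primitive and a few logical types, and it does not distinguish `timestamp` from `timestamptz`.

```python
import json

# Avro primitive type -> Iceberg primitive type (simplified mapping).
AVRO_TO_ICEBERG = {
    "boolean": "boolean",
    "int": "int",
    "long": "long",
    "float": "float",
    "double": "double",
    "bytes": "binary",
    "string": "string",
}

# Avro logical type -> Iceberg type (simplified; time zones are ignored).
LOGICAL_TO_ICEBERG = {
    "date": "date",
    "timestamp-millis": "timestamp",
    "timestamp-micros": "timestamp",
}


def convert_field(avro_type):
    """Return (iceberg_type, required) for one Avro field type."""
    # A union that contains "null" marks the field as optional.
    if isinstance(avro_type, list):
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) != 1:
            raise ValueError(f"unsupported union: {avro_type}")
        iceberg_type, _ = convert_field(non_null[0])
        return iceberg_type, "null" not in avro_type
    # An annotated type such as {"type": "long", "logicalType": "timestamp-micros"}.
    if isinstance(avro_type, dict):
        logical = avro_type.get("logicalType")
        if logical in LOGICAL_TO_ICEBERG:
            return LOGICAL_TO_ICEBERG[logical], True
        return convert_field(avro_type["type"])
    # Nested records, arrays, and maps are out of scope and raise KeyError here.
    return AVRO_TO_ICEBERG[avro_type], True


def avro_to_iceberg_columns(avro_schema_json):
    """Convert a flat Avro record schema (JSON text) into column definitions."""
    schema = json.loads(avro_schema_json)
    columns = []
    for field in schema["fields"]:
        iceberg_type, required = convert_field(field["type"])
        columns.append({"name": field["name"], "type": iceberg_type, "required": required})
    return columns


# Hypothetical source schema used only for illustration.
example = json.dumps({
    "type": "record",
    "name": "GameRound",
    "fields": [
        {"name": "round_id", "type": "string"},
        {"name": "bet_amount", "type": ["null", "double"]},
        {"name": "played_at", "type": {"type": "long", "logicalType": "timestamp-micros"}},
    ],
})
columns = avro_to_iceberg_columns(example)
```

Keeping the mapping in explicit tables makes each translation decision reviewable and versionable, which fits the predictable schema evolution described above.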

The initial setup included the following components:

  • Amazon S3 bucket structure design: Implemented a multi-layer layout (raw, curated, and analytics zones) aligned with data lifecycle best practices.
  • AWS Glue Data Catalog integration: Defined database and table schemas with partitioning strategies optimized for Athena performance.
  • Iceberg configuration: Enabled versioning and metadata retention policies to balance storage efficiency and query flexibility.
  • Security and compliance: Configured encryption at rest using AWS Key Management Service (AWS KMS), helped enforce access controls through AWS IAM and Lake Formation, and implemented Amazon S3 bucket policies following the principle of least privilege.

The redesign of the previous GCP setup helped deliver price-performance improvements. Yggdrasil reduced ingestion and processing costs by approximately 60% while also lowering operational overhead through a more direct, event-driven pipeline.

Implement real-time data ingestion

After establishing the lakehouse architecture, the next step focused on enabling real-time data ingestion from Yggdrasil’s operational databases into the raw data layer of the lakehouse. The objective was to capture and deliver transactional changes as they occur, making sure that downstream analytics and reporting reflect the most up-to-date information.

To achieve this, GOStack deployed Debezium Server Iceberg, an open-source project that integrates change data capture (CDC) directly with Apache Iceberg tables. It was deployed as Argo CD applications on Amazon EKS and used Argo’s GitOps-based model for reproducibility, scalability, and seamless rollouts.

This architecture provides an efficient ingestion pathway: data changes stream directly from the source system’s outbox tables into the Apache Iceberg tables registered in the AWS Glue Data Catalog and physically stored on Amazon S3, bypassing the need for intermediate brokers or staging services. By writing data in the Iceberg table format, the ingestion layer maintained transactional guarantees and immediate query availability through Amazon Athena.

Figure 2: Streaming ingestion pipeline using Debezium in Amazon EKS

Because Yggdrasil’s source systems emitted outbox events containing Avro records, the team implemented a custom outbox-to-Avro transformation within Debezium. The outbox table stored two key components:

  • The Avro schema definition
  • The JSON-encoded payload of each record

The custom transformation module combined these elements into valid Avro records before persisting them into the target Iceberg tables. This approach preserved schema fidelity and ensured compatibility with downstream processing tools.
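The actual transformation runs inside Debezium and is not shown in the post. As an illustration of the combining step only, the following Python sketch takes the two outbox components (schema text and JSON payload) and produces a record that has been checked against the schema. The function name and the `BetPlaced` example are hypothetical, only flat records with primitive types are handled, and no Avro binary encoding is performed.

```python
import json

# Python types accepted for each Avro primitive type.
PYTHON_TYPES = {
    "string": str,
    "boolean": bool,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
}


def build_record(schema_json, payload_json):
    """Combine an outbox row's schema and JSON payload into a checked record."""
    schema = json.loads(schema_json)
    payload = json.loads(payload_json)
    record = {}
    for field in schema["fields"]:
        name, avro_type = field["name"], field["type"]
        # A union with "null" means the field may be absent or null.
        nullable = isinstance(avro_type, list) and "null" in avro_type
        if isinstance(avro_type, list):
            avro_type = next(t for t in avro_type if t != "null")
        value = payload.get(name, field.get("default"))
        if value is None:
            if not nullable:
                raise ValueError(f"missing required field: {name}")
            record[name] = None
            continue
        expected = PYTHON_TYPES[avro_type]
        # JSON has a single number type, so accept ints where a float is declared.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        # bool is a subclass of int in Python, so reject it for integer fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise TypeError(f"{name}: expected {avro_type}, got {type(value).__name__}")
        record[name] = value
    return record


# Hypothetical outbox row used only for illustration.
outbox_schema = json.dumps({
    "type": "record",
    "name": "BetPlaced",
    "fields": [
        {"name": "player_id", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "bonus_code", "type": ["null", "string"], "default": None},
    ],
})
outbox_payload = json.dumps({"player_id": "p-1", "amount": 5})
record = build_record(outbox_schema, outbox_payload)
```

Rejecting a payload that does not match its schema at this point keeps malformed events out of the raw Iceberg tables rather than surfacing them later in downstream models.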

To dynamically route incoming change events, the team leveraged Debezium’s event router configuration. Each record was routed to the appropriate Apache Iceberg table (backed by Amazon S3) based on topic and metadata rules, while table schemas and partitioning were governed on the AWS Glue side to maintain stability and alignment with the lakehouse’s data organization standards.

This setup helped deliver low-latency ingestion with end-to-end streaming from database outbox to S3-based Iceberg tables in near real time. The team managed operations end to end on Amazon EKS using Helm charts deployed through Argo CD in a GitOps model for fully declarative, version-controlled operations. ACID-compliant Iceberg writes ensured that partially written data couldn’t corrupt downstream analytics. The modular transformation logic allowed future expansion to new source systems or event formats without rearchitecting the ingestion pipeline.

This Debezium Server solution provides fast, real-time data ingestion, but GOStack considers it an interim architecture. In the long term, the ingestion pipeline will evolve to use Amazon Managed Streaming for Apache Kafka (Amazon MSK) as the central event backbone. Debezium connectors will act as producers, publishing change events to Apache Kafka topics, while Apache Flink applications will consume, process, and write data into Iceberg tables.

This planned evolution toward a Kafka-based streaming architecture helps keep Yggdrasil’s lakehouse not only scalable and cost-efficient today, but also future-ready, capable of supporting richer streaming analytics and broader data integration scenarios as the organization grows.

Migrate processing pipelines

Once real-time data ingestion was established, GOStack turned its focus to modernizing the data transformation layer. The goal was to simplify the transformation logic, reduce operational overhead, and unify the orchestration of analytical workloads within the new AWS-based lakehouse.

GOStack adopted a lift-and-shift approach for some of Yggdrasil’s data pipelines to support a fast and low-risk transition away from GCP. The lightweight Cloud Run functions that previously handled extraction tasks (pulling data from file shares, SharePoint, Google Sheets, and various third-party APIs) were re-implemented using AWS Lambda. These Lambda functions now integrate with the same external systems and write data directly into Iceberg tables.

For more complex processing, earlier Apache Spark applications running on Dataproc were migrated to Amazon EMR with minimal code changes. This allowed the team to preserve the existing transformation logic while benefiting from the managed scaling capabilities of EMR and improved cost control on AWS.

Over time, these processes will be gradually refactored and consolidated into containerized workflows on the EKS cluster, fully orchestrated by Argo Workflows. This phased migration allows Yggdrasil to move workloads to AWS quickly and decommission GCP resources sooner, while still leaving room for continuous improvement and modernization of the data platform over time.

Finally, a large number of analytical transformations that previously lived as BigQuery stored procedures and scheduled queries were rebuilt as modular dbt models executed with dbt-athena. This shift made transformation logic more transparent, maintainable, and version-controlled, improving both developer experience and long-term governance.

Modernizing the transformation layer

With the ingestion pipelines migrated to AWS, GOStack turned its focus to simplifying and modernizing Yggdrasil’s analytical transformations. Rather than replicating the previous stored-procedure-driven approach, the team rebuilt the transformation layer using dbt to help improve maintainability, lineage visibility, orchestration, and long-term governance.

As part of this redesign, several data models were reshaped to fit the new lakehouse architecture. The most significant effort involved rewriting a critical Spark-based financial transformation into a set of SQL-driven dbt models. This shift not only aligned the logic with the lakehouse design but also removed the need for long-running Spark clusters, helping generate operational and cost savings.

For the curated data layers, which replace the legacy warehouse, GOStack consolidated numerous scheduled queries and stored procedures into structured dbt models. This provides standardized, version-controlled transformations and clear lineage across the analytical stack.

Orchestration was simplified as well. Previously, coordination was split between Apache Airflow for Spark workloads and scheduled queries for analytical transformations, creating operational friction and dependency risks. In the new architecture, Argo Workflows on Amazon EKS orchestrates dbt models centrally, consolidating the transformation logic within a single workflow engine. While most transformations still run on time-based schedules today, the platform now supports event-driven execution through Argo Events, giving the opportunity to progressively adopt trigger-based workflows as the transformation layer evolves.

This unified orchestration framework can bring several benefits:

  • Consistency: One orchestration layer for data workflows across ingestion and transformation.
  • Automation: Event-driven dbt runs help remove manual scheduling and reduce operational overhead.
  • Scalability: Argo Workflows scales with the EKS cluster, handling concurrent dbt jobs seamlessly.
  • Observability: Centralized logging and workflow visualization help improve visibility into job dependencies and data freshness.

Through this transformation, Yggdrasil successfully unified its data lakes and warehouses into a modern lakehouse architecture, powered by open data formats, serverless query engines, and modular transformation logic. The move to dbt and Athena not only simplified operations but also helped pave the way for faster iteration, simpler governance, and better developer productivity across the data environment.

Lakehouse performance optimizations

While performance tuning is an ongoing journey, as part of the transformation redesign GOStack made a few performance-oriented tweaks to make sure Athena queries are fast and cost-efficient. The Apache Iceberg tables were stored in Parquet with ZSTD compression, providing strong read performance and reducing the amount of data scanned by Athena.

Partitioning strategies were also aligned to actual access patterns using Iceberg’s native partitioning. Raw data zones were partitioned by ingestion timestamp, enabling efficient incremental processing. Curated data used business-driven partition keys, such as player or game identifiers and date dimensions, to help optimize analytical queries. These designs made sure Athena could prune unneeded data and consistently scan only the relevant partitions.

Iceberg’s native partitioning features, including transforms such as bucketing and time slicing, replace traditional Hive partitioning patterns. Because Iceberg manages partitions internally in its metadata layer, not all Glue or Athena partition constructs apply. Relying on Iceberg’s native partitioning helps provide predictable pruning and consistent performance across the lakehouse without introducing legacy Hive behaviors.
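As a rough illustration of these choices, the Python sketch below assembles an Athena `CREATE TABLE` statement for an Iceberg table that combines a time transform, a bucket transform, and Parquet with ZSTD compression. The helper name, table name, columns, bucket count, and S3 location are all hypothetical; only the general statement shape follows Athena's Iceberg DDL. Verify the table properties against the Athena engine version you run.

```python
def iceberg_ddl(table, columns, partition_transforms, location):
    """Build an Athena CREATE TABLE statement for an Iceberg table."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    parts = ", ".join(partition_transforms)
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        f"PARTITIONED BY ({parts})\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES (\n"
        "  'table_type' = 'ICEBERG',\n"
        "  'format' = 'parquet',\n"
        "  'write_compression' = 'zstd'\n"
        ")"
    )


# Hypothetical curated table: daily time slices plus hashed player buckets.
ddl = iceberg_ddl(
    table="curated.game_rounds",
    columns=[
        ("round_id", "string"),
        ("player_id", "string"),
        ("game_id", "string"),
        ("bet_amount", "double"),
        ("played_at", "timestamp"),
    ],
    partition_transforms=["day(played_at)", "bucket(16, player_id)"],
    location="s3://example-bucket/curated/game_rounds/",
)
print(ddl)
```

Because the transforms live in Iceberg metadata, queries filter on `played_at` or `player_id` directly and no derived partition column has to be maintained, which is the main practical difference from Hive-style partitioning.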

To address the high volume of small files produced by real-time ingestion, GOStack enabled AWS Glue Iceberg compaction. This automatically merges small Parquet files into larger ones, helping improve query performance and reduce metadata overhead without manual intervention.

Enable governance

The team adopted AWS Lake Formation as the primary governance layer for the curated zone of the lakehouse, leveraging Lake Formation hybrid access mode to manage fine-grained permissions alongside existing IAM-based access patterns. This hybrid mode provides an incremental and flexible pathway to adopt Lake Formation without forcing a full migration of legacy permissions or internal pipeline roles, making it a good fit for Yggdrasil’s phased modernization strategy.

Lake Formation offers centralized authorization, supporting database, table, column, and, critically for Yggdrasil, row-level permissions. These capabilities are essential because of the company’s multi-tenant operating model:

  • Game development partners require access to data and reports pertaining only to their own games, facilitating both security and compliance alignment with partner agreements.
  • iGaming operators integrating with Yggdrasil’s platform must receive operational and financial insights only for their own data, enforced automatically through reporting tools backed by curated Iceberg tables.

With Lake Formation hybrid access mode, tenant-specific row-level access policies are consistently enforced across Amazon Athena, AWS Glue, and Amazon EMR, without introducing breaking changes to existing IAM-based workloads. This allowed Yggdrasil to implement strong governance for external users while keeping internal operations stable and predictable.
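To show what a tenant-specific row-level policy can look like, here is a minimal Python sketch that builds the definition of a Lake Formation data cells filter for one tenant. The helper name, the account ID, and the database, table, column, and tenant identifiers are hypothetical and do not come from Yggdrasil's setup. The dictionary follows the `TableData` structure of the Lake Formation `CreateDataCellsFilter` API.

```python
def tenant_row_filter(account_id, database, table, tenant_column, tenant_id):
    """Build a data cells filter definition that exposes one tenant's rows."""
    # Double any single quotes so the tenant id stays inside the string literal.
    safe_id = tenant_id.replace("'", "''")
    return {
        "TableCatalogId": account_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": f"{table}_{tenant_id}_rows",
        # Row-level restriction: only rows belonging to this tenant are visible.
        "RowFilter": {"FilterExpression": f"{tenant_column} = '{safe_id}'"},
        # An empty wildcard keeps every column visible for the permitted rows.
        "ColumnWildcard": {},
    }


# Hypothetical operator tenant used only for illustration.
filter_spec = tenant_row_filter(
    account_id="111122223333",
    database="curated",
    table="game_rounds",
    tenant_column="operator_id",
    tenant_id="operator_42",
)
```

With boto3 installed and credentials configured, a definition like this could be passed as `boto3.client("lakeformation").create_data_cells_filter(TableData=filter_spec)` and then granted to the tenant's principal. Generating one filter per tenant from a shared helper keeps the policies uniform as new partners and operators are onboarded.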

Internally, Lake Formation is also used to grant the Analytics team and BI tools targeted access to curated datasets. These grants are simple but centrally managed, which maintains consistency and reduces administrative overhead.

For ingestion and transformation workloads, the team continues to rely on IAM roles and policies. Services such as Debezium, dbt, and Argo Workflows require broad but controlled access to raw and intermediate storage layers, and IAM provides a straightforward, least-privilege mechanism for granting these permissions without involving Lake Formation in the internal pipeline path.

By adopting Lake Formation in hybrid access mode and combining it with IAM for internal services, Yggdrasil established a governance model that balances strong security with operational flexibility, enabling the lakehouse to scale securely as the business grows.

Results and business impact

The new lakehouse, built on Amazon Athena, Amazon S3, and AWS Glue Data Catalog, now underpins advanced analytics and AI/ML use cases such as player behavior modeling, predictive game recommendations, and fraud detection.

The optimized lakehouse design allows Yggdrasil to rapidly onboard new analytics workloads and business use cases, helping deliver measurable outcomes:

  • Reduced operational complexity through consolidation on AWS analytics services
  • Cost optimization with a 60% reduction in data processing costs
  • Improved data freshness with 75% lower latency for analytics results (from 2 hours to 30 minutes)
  • Enhanced governance using AWS Lake Formation fine-grained controls
  • Future-ready architecture leveraging open formats and serverless analytics

Conclusion

Yggdrasil Gaming’s migration journey illustrates how organizations can successfully transition from proprietary analytics platforms to an open, flexible lakehouse architecture. By following a phased approach guided by AWS Well-Architected Framework principles, Yggdrasil maintained business continuity while establishing a modern foundation for their data needs.

Based on this experience, several lessons emerged to help guide your own move to an AWS-based lakehouse:

  1. Assess your current state: Identify pain points in your current data architecture and establish clear goals for modernization.
  2. Start small: Begin with a pilot project using AWS analytics services to validate the lakehouse approach for your specific use cases.
  3. Design for openness: Leverage open table formats like Apache Iceberg to maintain flexibility and avoid vendor lock-in.
  4. Implement gradually: Follow a phased migration strategy similar to Yggdrasil’s, prioritizing high-value workloads.
  5. Optimize continuously: Use performance tuning techniques for Amazon Athena to help maximize efficiency and minimize costs.

To learn more about building modern lakehouse architectures, refer to “The lakehouse architecture of Amazon SageMaker”.


About the authors

Edijs Drezovs

Edijs is the CEO and Founder of GOStack, an AWS Partner specializing in modernizing cloud-native infrastructures, data platforms, and analytics architectures. He brings over 12 years of experience driving complex cloud transformations and data engineering initiatives.

Viesturs Kols

Viesturs is a Data Architect at GOStack with deep expertise in lakehouse architectures and real-time analytics. He led the technical implementation of Yggdrasil Gaming’s migration to AWS analytics services and focuses on Apache Iceberg and streaming data platforms.

Krisjanis Beitans

Krisjanis is a Senior Data Engineer at GOStack specializing in lakehouse architectures, Apache Iceberg, Amazon Athena, and dbt-based transformation frameworks. During Yggdrasil Gaming’s migration to AWS, he rebuilt the analytical layer, designing Iceberg table structures, optimizing Athena performance, and implementing the dbt-driven transformation pipeline.

Alvaro Guerrero

Alvaro is an AWS Solutions Architect who helps customers build innovative cloud solutions, specializing in AWS analytics services.

Aleksandra Zgnilec

Aleksandra is an Account Executive at AWS supporting Betting & Gaming customers in their cloud and business transformations.

Zahi Njeim

Zahi is a Business Development Manager at AWS for Betting & Gaming, Media, Entertainment, Games and Sports.
