Conducted an in-depth feasibility study on leveraging advanced large language models (LLMs) for automating the generation of dbt (data build tool) documentation, addressing VRT's undocumented dbt codebase
In this project, I explored the feasibility of using Large Language Models (LLMs) to document dbt (data build tool) queries and implemented a solution tailored to the VRT data team's needs. dbt is an open-source command-line tool designed for data analysts and engineers to collaboratively develop, test, and deploy data transformations.
The VRT data team maintained a large dbt project with numerous models, packages, and macros, but the project lacked documentation, which made the queries difficult to understand and maintain. My goal was to determine whether LLMs could generate clear, accurate documentation for both technical and non-technical users.
I ran initial inference experiments on Google Colab but hit GPU memory limits, so Llama 2-7b-Chat was the only model I could test successfully there. To scale up, I later moved to AWS SageMaker and experimented with Hugging Face's hosted API.
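As a rough illustration of that Colab-style setup, the sketch below loads the model with the Hugging Face transformers library; the meta-llama/Llama-2-7b-chat-hf checkpoint and the generation settings are assumptions, not the exact notebook configuration.

```python
# Minimal sketch of the Colab-style inference setup (assumed checkpoint and
# settings; the actual notebook configuration may have differed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Loading in float16 helps keep the 7B model within a single Colab GPU's memory.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Explain what the following dbt query does:\n\nSELECT ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```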
I found that prompts explicitly stating "dbt query" produced better responses. Complex cases, such as explaining models with multiple joins, required multi-step prompting, highlighting the need for efficient prompting strategies.
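The sketch below illustrates the multi-step idea in hedged form: `generate` stands in for whichever LLM call is in use (Colab, SageMaker endpoint, or hosted API), and the prompt wording is illustrative rather than the exact prompts from the project.

```python
# Hedged sketch of a two-step prompting strategy for multi-join dbt models.
# `generate` is a placeholder for the underlying LLM call.

def document_dbt_model(sql: str, generate) -> str:
    # Step 1: naming "dbt query" explicitly steers the model toward the
    # right domain instead of a generic SQL explanation.
    join_prompt = (
        "The following is a dbt query. List every join it performs and "
        f"explain, in one sentence each, why the tables are joined:\n\n{sql}"
    )
    join_summary = generate(join_prompt)

    # Step 2: feed the intermediate summary back in to produce the final,
    # reader-facing documentation for non-technical users.
    doc_prompt = (
        "Using this summary of the joins in a dbt query:\n"
        f"{join_summary}\n\n"
        "Write a short, plain-language description of what the model produces."
    )
    return generate(doc_prompt)
```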
Research and Planning Documentation
Throughout these implementations, I iteratively refined the solution in collaboration with the VRT data engineers, incorporating their feedback on the generated documentation to ensure clarity, accuracy, and relevance.
The first approach, prompt-based, requires minimal infrastructure: the dbt query and instructions are sent directly to the LLM as a prompt.
Prompt-Based Approach workflow
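A minimal sketch of this workflow follows, assuming the Hugging Face Inference API as the hosted LLM; the model id, file path, and token placeholder are illustrative, not the team's actual configuration.

```python
# Minimal-infrastructure sketch: read a dbt model's SQL and send it directly
# to a hosted LLM via the Hugging Face Inference API.
from pathlib import Path
import requests

API_URL = "https://api-inference.huggingface.co/models/meta-llama/Llama-2-7b-chat-hf"
HEADERS = {"Authorization": "Bearer <HF_API_TOKEN>"}  # placeholder token

# Hypothetical dbt model file, used only for illustration.
sql = Path("models/marts/example_model.sql").read_text()

prompt = (
    "The following is a dbt query. Document what it does for both technical "
    f"and non-technical readers:\n\n{sql}"
)

response = requests.post(API_URL, headers=HEADERS, json={"inputs": prompt})
print(response.json())
```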
The second approach, retrieval-augmented generation (RAG), uses embeddings and a vector store to retrieve richer context from the dbt project before prompting the model.
RAG Notebook Workflow
Key components: LangChain for orchestration, all-MiniLM-L6-v2 sentence-transformer embeddings, and dynamic chunk retrieval from Postgres with pgvector (after an initial Pinecone prototype); a sketch of the retrieval step follows below.
Response after system prompt
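The following is a hedged sketch of that retrieval step, assuming LangChain's community PGVector wrapper and a pgvector-enabled Postgres instance; the connection string, collection name, and example chunks are placeholders rather than the team's actual setup.

```python
# Hedged sketch of embedding dbt project chunks and retrieving context
# from Postgres + pgvector via LangChain (placeholder configuration).
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores.pgvector import PGVector

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Hypothetical connection string; real credentials would live in a secrets store.
CONNECTION = "postgresql+psycopg2://user:password@localhost:5432/dbt_docs"

# Index chunks of dbt SQL / YAML so related context can be retrieved later.
chunks = [
    "{{ config(materialized='table') }} SELECT ...",
    "-- macro definition ...",
]
store = PGVector.from_texts(
    texts=chunks,
    embedding=embeddings,
    collection_name="dbt_project_chunks",
    connection_string=CONNECTION,
)

# At documentation time, pull the most relevant chunks and prepend them to the prompt.
context = store.similarity_search("What does this dbt model produce?", k=4)
context_text = "\n\n".join(doc.page_content for doc in context)
```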
The third approach, fine-tuning an LLM on dbt queries for domain-specific accuracy, remained conceptual and was not implemented.
Fine-tuning LLM conceptual diagram
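Purely as an illustration of the concept (this was not implemented during the project), a parameter-efficient fine-tuning setup could look roughly like the sketch below; all names and hyperparameters are assumptions.

```python
# Conceptual sketch only: parameter-efficient (LoRA) fine-tuning on
# (dbt query, documentation) pairs. Not implemented in the project;
# checkpoint and hyperparameters are illustrative.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# LoRA trains small adapter matrices instead of updating all 7B parameters.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)

# Training pairs would map raw dbt SQL to human-approved documentation,
# e.g. {"prompt": "<dbt query>", "completion": "<approved documentation>"}.
```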
During my internship at VRT, I not only deepened my technical expertise in both dbt and Large Language Models, but also learned the power of collaboration and continuous improvement. By iteratively refining prompt designs, embedding strategies, and retrieval workflows, always incorporating feedback from the VRT data engineers, I gained a practical understanding of how to balance model capabilities with real-world constraints like token limits and GPU memory. Experimenting on Google Colab and AWS SageMaker taught me to troubleshoot performance bottlenecks in real time, while migrating from Pinecone to a Postgres + pgvector solution reinforced best practices in cost efficiency and data security. Leveraging LangChain and sentence-transformer embeddings expanded my skill set in modular pipeline design and semantic search.

Above all, this project underscored the importance of clear communication: translating complex SQL transformations into user-friendly documentation and iteratively validating those explanations with my teammates not only improved the end product but also strengthened my adaptability, problem-solving, and teamwork, lessons I'll carry forward into any future data engineering challenge.