Data engineering expertise. From practice to training.

Hands-on Apache Kafka, Flink and Databricks training. Web IDE architecture, developer platforms and cloud infrastructure.

Real-time metrics dashboard: Kafka, Flink, and Databricks pipeline (Plugbee real-time pipeline monitor)

Trusted by

BMW
Continental
Doctolib
Geopost
Nokia
SAP
SNCF
Thales
TotalEnergies
Nuant

Services

Areas of expertise

Data Platform Architecture

Real-time and batch data pipeline architecture using Kafka, Flink, Spark. Event-driven systems, OpenSearch indexing, stream processing at scale.

Apache Kafka · Apache Flink · OpenSearch · Spark

Web IDE Engineering

Architecture and integration of browser-based IDEs using Eclipse Theia and VS Code extension APIs. Custom plugins, on-premise deployment, white-label platforms.

Eclipse Theia · Monaco Editor · LSP · TypeScript

DSL & Language Tooling

Design and implementation of domain-specific languages: grammars (ANTLR, Xtext), LSP servers, syntax highlighting, code completion, and validation.

ANTLR4 · Xtext · LSP · Tree-sitter

Cloud & DevOps Architecture

AWS infrastructure design (ECS, EKS), Terraform automation, CI/CD pipelines, Kubernetes deployments, observability (Datadog, ELK).

AWS · Terraform · Kubernetes · Datadog

Training

Production-grade training for data engineering teams.

Hands-on Kafka and Flink training taught by a practitioner — not a generalist instructor. Every session is scoped to your stack, your team level, and your real use cases.

How it works

01

Scoping call

We meet before the training to understand your infrastructure, your team's level, and what you're actually trying to solve. No generic slides.

02

Custom preparation

Exercises, datasets, and examples are adapted to your context. Your team works on scenarios that look like their daily work.

03

Hands-on sessions

2 to 3 days on-site or remote. Theory is minimal. Most of the time is spent writing, running, and debugging real code.

Modules

Apache Kafka & Kafka Streams

Design and operate event-driven architectures in production.

This training covers Apache Kafka from the fundamentals through to the real challenges of running distributed systems in production: topic architecture, consumption strategies, stateful processing, schema management, and observability. Participants work on concrete use cases inspired by real-time systems deployed at Doctolib, Geopost, and Nuant.

  • Kafka architecture: topics, partitions, replication, consumer groups
  • Producers, consumers, and advanced consumption strategies
  • Kafka Streams: stateful processing, joins, windowing
  • Schema Registry, Avro, and schema evolution
  • Error handling, retries, and resilience patterns
  • AWS deployment and monitoring with Datadog

Duration · 2 days

Level · Intermediate to senior engineers

Format · On-site or remote, 4 to 10 people

Apache Kafka · Kafka Streams · Avro · AWS · Datadog

Stream Processing with Apache Flink

Build robust, fault-tolerant real-time pipelines.

An advanced training focused on distributed stream processing with Apache Flink: event time management, stateful operators, windowing strategies, and Kubernetes deployment. Content emphasizes real production challenges: fault tolerance, checkpointing, observability, and integration with modern data stacks.

  • DataStream API and advanced transformations
  • Event time, watermarks, and windowing strategies
  • Stateful operators and distributed state management
  • Checkpointing, recovery, and fault tolerance
  • Deploying Flink on Kubernetes
  • Integration with Kafka, OpenSearch, and PostgreSQL

Duration · 2 days

Level · Intermediate to senior engineers

Format · On-site or remote, 4 to 10 people

Apache Flink · Kubernetes · Kafka · OpenSearch · PostgreSQL

ML Engineering with Databricks

Industrialize data and machine learning pipelines on a lakehouse architecture.

This training covers Databricks as a production-grade data and ML engineering platform: data structuring, Spark optimization, collaborative notebooks, model deployment, and ML CI/CD. Examples and exercises are drawn from real data platform and ML industrialization projects.

  • Unity Catalog, Delta Lake, and lakehouse architecture
  • Spark DataFrames, GraphX, and performance optimization
  • Collaborative development with Jupyter notebooks
  • Model training, tracking, and versioning with MLflow
  • ML pipeline CI/CD and deployment automation
  • Production deployment on AWS and platform integration

Duration · 2 days

Level · Intermediate to senior data and ML engineers

Format · On-site or remote, 4 to 10 people

Databricks · Delta Lake · Apache Spark · MLflow · Python · AWS

These training programs are built from real production experience on modern data stacks used at Doctolib, Geopost, and Nuant. Beyond the fundamentals, they cover the architecture, observability, and scaling challenges encountered in the field.

Let's talk about your team.

Get in touch →

Pricing is scoped per engagement. Get in touch to discuss.

Portfolio

Selected Work

Doctolib

Data Architect

2024–2025

Industrialization of a Kafka pipeline PoC for real-time and batch synchronization between the Doctolib monolith and OpenSearch.

  • Built a Kafka consumer handling real-time and batch OpenSearch index updates
  • Designed a data flow simulator for business scenario testing
  • Deployed on AWS EKS with full Terraform automation
Java · Spring · Apache Kafka · OpenSearch · AWS EKS · Terraform

Geopost

Tech Lead / Data Engineer

2024

Rebuilt a real-time tracking solution for food parcel delivery across the Geopost logistics network.

  • Scoped requirements and designed the full data pipeline architecture end-to-end
  • Modeled and optimized a time-series database for high-frequency delivery events
  • Built microservices for real-time event correlation and data persistence
Java · Spring · Apache Kafka · Kafka Streams · Redis

Nuant

Tech Lead / Data Engineer · Zurich

2021–2022

Led the Web IDE project: a browser-based development environment with a custom DSL for blockchain metric analysis.

  • Scoped requirements, delivered PoC, recruited and managed a 6-person team
  • Industrialized a KYC blockchain transaction monitoring system to production
  • Built fraud detection graph analysis scripts with Spark DataFrames and GraphX
Apache Kafka · Apache Flink · Databricks · Python · AWS ECS

TotalEnergies

Software Architect

2020–2021

Industrialized WISH, a well placement optimization component of the Sismage platform used in upstream oil & gas operations.

  • Brought the component architecture into compliance with TotalEnergies engineering standards
  • Developed a well visualization output module integrated into the Sismage UI framework
  • Prototyped a web-based DSL editor for the component configuration
Java · Spring · REST · Swing · CORBA · DSL

SNCF

Software Architect

2018–2020

Industrialized a web application for train station placement management.

  • Redesigned data model, refactored backend, established coding standards
  • Set up CI infrastructure and delivery pipelines
Java · Spring · PostgreSQL · GitLab CI

Thales

Software Architect

2015–2018

Developed a model co-evolution specification tool on Eclipse Melody / Capella.

  • Implemented an architecture viewpoint DSL for trade-off evaluation
  • Built an open-source framework for generating textual editors from ANTLR grammars, later released as DSL Forge
Java · Eclipse · Xtext · EMF · ANTLR

Open Source

DSL Forge

80+ GitHub stars · Open-source framework

DSL Forge is an open-source framework for generating web-deployable textual DSL editors from ANTLR grammars. It bridges the gap between language specification and browser-based IDE tooling, powering both Coding Park's interactive coding environment and client DSL platforms.

Developed during the Continental mission (2012–2014), open-sourced and maintained since. Used as the technical foundation of Coding Park (educational platform for primary schools). Funded in part by CIR/CII declarations under JEI status.

ANTLR4 · ACE Editor · LSP · Java · TypeScript
github.com/plugbee/dslforge
Coding Park platform powered by DSL Forge

About

About Plugbee

Amine Lajmi

Amine L. — PhD in Computer Science · Founder & Senior Engineer

Founded in 2015 by a computer science PhD, Plugbee works at the deep end of the developer tools stack: Web IDEs, language servers, DSL runtimes, and cloud infrastructure, an ecosystem where very few consultants operate at a senior level.

The company was born out of 10+ years of hands-on architecture work across 8 industries, including aerospace, finance, energy, logistics, transport, and education. Every engagement has been at the architecture or tech lead level. No generalist work, no junior handoffs.

Plugbee also maintains DSL Forge, an open-source framework for generating web-deployable DSL editors, and was the technical backbone behind Coding Park, an interactive coding platform for primary and secondary school students.

Contact

Let's work together

Available for training engagements and senior consulting in Web IDEs, DSL tooling, and data platform design.

Connect on LinkedIn →

Based in Paris, France · Remote-friendly