Transcript: Like Deep Dives?
Adam: What's up, everyone? I'm Adam.
Patrick: And I'm Patrick.
Adam: We are program managers on the Microsoft Fabric product team. We work on the Fabric CAT team, the customer advisory team, working with some of the world's largest customers. We eat, sleep, and breathe Fabric.
How many people here use Power BI?
[Audience reaction]
Adam: That's what I expected. How many people use Fabric?
Patrick: Liars. Everyone who raised their hand for Power BI should also have raised their hand for Fabric, because Power BI is Fabric.
Adam: It's amazing, right?
Patrick: Not really.
Adam: It is. It's all Fabric.
Patrick: I have a Pro license and that's not Fabric.
Adam: Power BI is Fabric.
Patrick: Okay, we’ll talk about that later.
Adam: To level-set what we're going through here: we have a lot of demos planned. We’re going to walk through the different options you can leverage inside Microsoft Fabric.
Microsoft Fabric is a unified platform that brings together your data and the different analytical engines that operate on it.
Patrick: So what do we mean by a unified system? For all the folks that say, “We’re Power BI, but we’re not Fabric” — what happened was, back in the day when we said “Power BI”, that meant capacities, workspaces, gateways, dataflows, Power BI reports, semantic models (I still say datasets).
Today, everything outside of the reports and datasets is Fabric. That’s the Fabric platform. Those are things available to every engine inside Microsoft Fabric, including Power BI.
Adam: What we want to focus on is: which analytical engine do you pick inside Microsoft Fabric? That’s a tough choice.
Patrick: It’s not that hard.
Adam: What would your initial answer be?
Patrick: Lakehouse.
Adam: No. The right answer is: it depends.
Patrick: Lakehouse! Lakehouse all the things.
Adam: We’ve got options:
- Data engineering with Lakehouse
- Data warehousing
- Real-Time Analytics (KQL)
- And of course Power BI, which is an analytical engine. Power BI is Analysis Services under the covers — a tabular model, the dataset, the semantic model.
Patrick: I’m confused.
Adam: So which one do you use?
Patrick: I use Power BI.
Adam: You use Power BI. So which engine, and how do you decide?
First, we’ll talk about Lakehouse.
What is a Lakehouse?
Patrick: People always ask, “What’s a Lakehouse?”
Think about a data lake. Remember that term? That was when everyone said “big data.” Then we had the data warehouse. Everyone knows Kimball star schemas: dimensions and facts. If you talk to Adam or the CAT team, they always say “build a star schema.”
So imagine the data lake and the data warehouse go out like we did last night. They have a good time, get drunk, and spend the night together. They produce the Lakehouse. The Lakehouse is the offspring of the data lake and the data warehouse.
In the data warehouse, you have a fine-tuned schema. In the data lake, you have all your raw and semi-structured data for engineers and data scientists — PDFs, images, JSON, XML, CSV files. Lakehouse brings both together.
Who knows what’s next? When we were in Oslo, someone in our workshop said, “Why can’t we just have the house?” Brilliant.
We’re just getting started with Fabric. Maybe in a few years, the baby will grow up.
Your Lakehouse is primarily for data engineers and data scientists. I was talking to Simon Whiteley — maybe you know him; he’s a crazy data engineer — Lakehouse is for people like him. Don’t ask him which engine he’d pick.
It’s built for Spark data engineers. How many of you can set up Spark clusters? Very few. How many write Python? The numbers are increasing.
Adam: Did you know that I write Python?
Patrick: No you don’t.
Adam: ChatGPT.
Patrick: You are not ChatGPT.
There is a delightful authoring experience (I stole that marketing phrase) — I’m not in marketing — and you basically have all the data you need for all types of analysis.
Business analysts can connect to a pristine set of managed tables. Data scientists can go against the raw data that you’re extracting from various sources. You have both.
Lakehouse from the Analyst Perspective
Patrick: This slide is eye-candy and full of words, but it basically says what I just said.
The interesting question is: what can a data analyst actually do with the Lakehouse if they don’t have a data engineer?
Adam: Copilot.
Patrick: No, this is not a Copilot session.
If you have a data engineer, they build the Lakehouse: notebooks, Python, Spark jobs. But if you don’t, what can an analyst do?
Let me show you.
I go into the Fabric Lakehouse experience. If an analyst wants to tinker and learn, often the way Adam and I learn is: we go break stuff, try things, then make videos about what worked.
To create a Lakehouse, in the Fabric UI you click “New”, go to “More options”, pick the data engineering experience, and choose Lakehouse. You don’t need to set up Spark clusters, storage, or networking. The Lakehouse is automatically created. Magic. Like a cooking show: we already have one created.
When you create the Lakehouse, you get two additional items:
- A default semantic model (dataset)
- A SQL analytical endpoint (we’ll call it SAE — SQL Analytical Endpoint)
We like acronyms at Microsoft.
Adam: That’s why he gets paid the big bucks. He’s in marketing.
Patrick: I am not in marketing. That’s why I’m doing most of the demos.
The SQL analytical endpoint in a Lakehouse is read-only; we’ll come back to that.
Your Lakehouse has two main sections:
- Tables
- Files
The Files section is where you upload all your messy, unstructured or semi-structured data: CSV files, Parquet files, JSON files, images, etc.
As an analyst, you can:
- Click the ellipsis on “Files” and choose “Upload files” or “Upload folder”.
- Or use OneLake Explorer and upload or copy files from your file system directly.
Pro tip: if you want to move data around without writing any code (not production-grade, but great for playing around), just open OneLake Explorer and copy the files every morning. Your mouse is your ETL process.
Once the files are there, second part of the analyst ETL process:
- Click the ellipsis next to a file.
- Choose “Load to tables”.
- Fabric will create managed tables from those files.
You just created a no-code Lakehouse. Boom. As an analyst.
Should you build serious, large-scale solutions like this? No. You’d use notebooks or pipelines or Dataflows Gen2. But it’s great for learning and smaller scenarios.
Shortcuts in Fabric / OneLake
Patrick: You may have noticed that a folder has a little link icon. That indicates a shortcut.
Adam’s desktop shortcut analogy was pretty bad. But the idea is that a shortcut is a reference to data stored somewhere else.
In Fabric, shortcuts can point to:
- Microsoft OneLake
- Amazon S3
- Azure Data Lake Storage Gen2
- Dataverse
(and more are coming, like Google Cloud as shown on slides)
You go to “New shortcut”, launch a wizard, pick the source, authenticate, and then you see a linked folder in your Lakehouse. The data still physically lives in that external storage, potentially even in a different tenant.
I have a shortcut folder that points to ADLS Gen2 in a separate tenant. I didn’t copy the data; I just created a shortcut.
Then, in a notebook, I can reference the data via that shortcut:
- Write Python over the shortcut
- Transform the data
- Persist it into managed tables that business analysts can use
We’re just referencing it as if it’s in OneLake, even though it’s not physically there. It’s a logical representation.
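The transcript mentions writing Python over the shortcut; the same idea can be sketched in Spark SQL in a notebook cell. This is a minimal sketch, not the presenters' actual demo — the shortcut folder `Files/sales_raw`, its headerless CSV layout, and the table name `sales_clean` are all hypothetical:

```sql
-- Spark SQL in a Fabric notebook cell (e.g. a %%sql cell).
-- Assumes a hypothetical shortcut folder Files/sales_raw containing headerless CSV files.

-- Read the shortcut data into a temporary view (columns default to _c0, _c1, ...)
CREATE OR REPLACE TEMPORARY VIEW sales_raw AS
SELECT * FROM csv.`Files/sales_raw/`;

-- Persist a cleaned, managed Delta table that business analysts can use
CREATE TABLE IF NOT EXISTS sales_clean USING DELTA AS
SELECT CAST(_c0 AS DATE)           AS order_date,
       CAST(_c1 AS DECIMAL(18, 2)) AS amount
FROM   sales_raw
WHERE  _c1 IS NOT NULL;
```

The point is that the data never has to be copied: the `csv.` path query reads straight through the shortcut, and only the cleaned result lands as a managed table.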
You can also schedule notebooks:
- In the Lakehouse UI, on a notebook, choose “Schedule”
- Schedule execution without using Data Factory-style pipelines
- You can orchestrate multiple notebooks, running them in parallel or sequentially (e.g., run dimension loads before fact loads)
There’s also a Python library called “SemPy” to interact with semantic models:
- Refresh semantic models
- Reframe models as part of your orchestration
Phil and others have talked about this in longer sessions.
Notebooks – What Are They?
Audience: What’s a notebook?
Adam: Great question.
Patrick: It’s a book that has paper in it.
No, seriously: a notebook is not where the data is. It’s an environment where you:
- Write code (Python, Spark SQL, etc.)
- Orchestrate processes
- Interact with data (read, transform, write)
Cathy has a full session about notebooks where she goes through every detail.
Audience: How would you trigger a notebook from an external event?
Patrick: I don’t know the answer off the top of my head. Maybe from a pipeline, maybe via an API.
Adam: There is an API according to colleagues in the room. That means we’ll probably do a video about it.
Patrick: If there’s something you want that doesn’t exist yet, go to ideas.fabric.microsoft.com and submit or vote on feedback.
Warehouse in Fabric
Patrick: How many of you use some type of data warehouse today?
[Hands]
How many used or are using Azure Synapse Analytics?
If you’re using Synapse, you know warehouses. The same team that built Synapse worked on the data warehouse in Fabric.
The Fabric Warehouse is:
- A lake-centric SQL database
- For citizen and pro developers
- Backed by Delta parquet files in OneLake
How many of you can write some dialect of SQL? Most of you. If you can write SQL, the warehouse is very approachable.
All warehouse data is stored as Delta tables in OneLake. Any compute engine in Fabric that speaks to Delta parquet can read that data:
- Power BI
- Lakehouse Spark
- KQL (via the right abstraction)
- Dataflows / pipelines
You can do cross-database queries with three-part naming (not four-part naming yet).
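As a hedged sketch of what three-part naming looks like — assuming a hypothetical Warehouse named SalesWarehouse and a hypothetical Lakehouse named SalesLakehouse in the same workspace, with made-up table names:

```sql
-- T-SQL run against the Warehouse endpoint; all item and table names are hypothetical.
-- Three-part naming: item.schema.table (no four-part, cross-server naming yet).
SELECT w.OrderKey,
       w.Amount,
       l.ProductName
FROM   SalesWarehouse.dbo.FactSales   AS w
JOIN   SalesLakehouse.dbo.DimProduct  AS l
       ON w.ProductKey = l.ProductKey;
```

The join spans a Warehouse table and a Lakehouse table in one query because both sit as Delta parquet in OneLake.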
From a management perspective:
- In a Lakehouse, when you do lots of inserts/updates/deletes against Delta parquet, you accumulate small files and transaction log entries, and you need to maintain them (optimize, vacuum) to prevent performance issues and to avoid hitting Direct Lake guardrails.
- In a Warehouse, Fabric handles file maintenance for you.
In a Lakehouse, you can also:
- Use the “Maintenance” menu: optimize, apply V-Order, vacuum
- These operations help keep file counts and logs manageable
Every time we create files using Fabric compute engines (notebooks, pipelines, Dataflows Gen2), they are created as V-ordered parquet files by default. But you still need occasional maintenance.
In a Warehouse, that maintenance is handled by the service.
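The Lakehouse maintenance operations mentioned above can also be run from a notebook rather than the UI menu. A minimal Spark SQL sketch, with a hypothetical table name:

```sql
-- Spark SQL in a Lakehouse notebook; sales_clean is a hypothetical table name.

-- Compact small files and apply V-Order while rewriting them
OPTIMIZE sales_clean VORDER;

-- Remove data files no longer referenced by the Delta transaction log
-- (the default retention window applies; shortening it affects time travel)
VACUUM sales_clean;
```

Running these on a schedule keeps file counts and log size manageable between larger loads.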
When to Use Lakehouse vs Warehouse (High-Level)
Patrick: This slide I “worked hard” on (copied from another deck and reformatted) compares when to use which:
- If you want to ingest and transform data using Spark, use a Lakehouse.
- If your team knows T‑SQL, use a Warehouse.
- If you need multi-row and transactional-style operations, Warehouse is often the better fit.
- If you’re okay with schema-bound tables and a classic warehouse schema, use a Warehouse.
- If you need to analyze unstructured or semi-structured data (images, videos, JSON, logs, etc.), or need advanced data engineering and ML exploration, use a Lakehouse.
Adam: Another Lakehouse scenario: if you’re working with other table formats like Iceberg, you can use Spark in the Lakehouse to work with them directly. To leverage them across other Fabric engines, you convert into Delta parquet.
So for initial exploration, Spark + Lakehouse is a good fit; for broader re-use across engines, you move to Delta parquet.
Audience: Can we make a shortcut from Warehouse?
Patrick: Not today. You can make shortcuts in KQL databases and Lakehouses, not in Warehouses. Please log it on ideas.fabric.microsoft.com.
Creating and Using a Warehouse
To create a Warehouse:
- In the Fabric UI, choose “New”, then “Warehouse”, give it a name.
- Fabric spins it up; you don’t install SQL Server. It’s a PaaS experience.
When you create a Warehouse, you get:
- The warehouse database
- A SQL endpoint for that warehouse
With a Lakehouse, you also get a SQL endpoint, but the difference is:
- Warehouse SQL endpoint: you can create objects, insert, update, delete, create stored procedures, etc.
- Lakehouse SQL endpoint: read-only; you can’t create objects or manipulate data there. If you want to use SQL to write into a Lakehouse, you’d use Spark SQL inside notebooks.
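A minimal sketch of that difference, with hypothetical table names — the T‑SQL statements below work against a Warehouse but would fail on a Lakehouse SQL endpoint, where the equivalent write has to happen in a notebook:

```sql
-- Warehouse SQL endpoint: full DDL/DML (table and column names hypothetical).
CREATE TABLE dbo.DimCustomer
(
    CustomerKey  INT,
    CustomerName VARCHAR(100)
);
INSERT INTO dbo.DimCustomer VALUES (1, 'Contoso');

-- Lakehouse SQL endpoint: the statements above are rejected (read-only).
-- To write to a Lakehouse with SQL, use Spark SQL in a notebook instead, e.g.:
--   INSERT INTO dim_customer VALUES (1, 'Contoso');
```

Same SQL skills, different endpoint capabilities — that is the practical distinction between the two items.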
Patrick: In my demo, I have:
- A Lakehouse with a simple star schema (one fact table, a couple of dimensions)
- A Warehouse fed via Dataflows Gen2 and pipelines, doing incremental loads
Today:
- No identity columns yet in Warehouse
- No MERGE statement support yet
So you have to finagle things:
- Use ROW_NUMBER with a partition to simulate identity keys
- Grab the max surrogate key on each load and increment from there
- Use that as a surrogate key to join back to alternate keys
It’s a bit of work, but that’s how you do it today. In Lakehouse, you do something similar with Python, often more elegantly.
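A hedged T‑SQL sketch of that workaround, assuming hypothetical dbo.DimCustomer and stg.Customer tables — this illustrates the pattern described above, not official guidance:

```sql
-- Simulating an identity column in a Fabric Warehouse (no IDENTITY support yet).
-- All table and column names are hypothetical.

-- 1. Grab the current max surrogate key (0 on an empty dimension).
DECLARE @maxKey INT =
    (SELECT COALESCE(MAX(CustomerKey), 0) FROM dbo.DimCustomer);

-- 2. Number the incoming rows and offset by the max key,
--    inserting only business (alternate) keys we have not seen before.
INSERT INTO dbo.DimCustomer (CustomerKey, CustomerAltKey, CustomerName)
SELECT @maxKey + ROW_NUMBER() OVER (ORDER BY s.CustomerAltKey),
       s.CustomerAltKey,
       s.CustomerName
FROM   stg.Customer AS s
LEFT JOIN dbo.DimCustomer AS d
       ON s.CustomerAltKey = d.CustomerAltKey
WHERE  d.CustomerAltKey IS NULL;
```

Once MERGE and identity columns land in the Warehouse, this collapses into much less code.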
Audience: Is Warehouse backed by the same engine as Azure Synapse?
Adam: There are differences in engineering for Fabric, but it still leverages the Polaris engine used by Azure Synapse Analytics.
Audience: What’s the physical storage for Warehouse?
Adam: OneLake, Delta parquet. If you inspect OneLake Explorer or the browser folders, you’ll see the Delta log and parquet files for your warehouse tables.
Audience: Are all T‑SQL operations supported in Warehouse?
Adam: Not yet. As mentioned, MERGE, ALTER, identity, etc. are not all there today. There are gaps, and the team is iterating. Documentation lists which T‑SQL features are supported.
Audience: Will managed tables support unique constraints?
Patrick: It’s on the roadmap with no timeline yet. Still, go to ideas.fabric.microsoft.com and upvote it.
Moving Between Lakehouse and Warehouse
Audience: If we decide to switch from Lakehouse to Warehouse, how do we move tables?
Patrick: You have to move them. Options:
- From a Lakehouse, you can create a shortcut to tables in a Warehouse.
- From a Warehouse, you can query a Lakehouse using T‑SQL.
- Potentially, you could use OneLake Explorer to copy the Delta files directly (we’d have to test this, might become a video idea).
Real-Time Analytics (KQL Database)
Adam: Now let’s talk about Real-Time Analytics. We’re short on time, but we’ll show the key ideas.
For true streaming, high-volume telemetry or log data, Real-Time Analytics with KQL databases is a great fit. If you saw the keynote, you saw the delivery tracking demo.
Typical flow:
- Sources: Event Hubs, IoT Hubs, custom apps
- Event streams in Fabric to route data
- Data lands in a KQL database (which can also expose its data to OneLake as Delta parquet)
- You can use Data Activator to set up real-time alerts and triggers
- You can query with KQL or SQL-like syntax
Patrick: Let me show you a simple scenario: I pretend I own a fleet of ice cream trucks in this city. Adam is one of my drivers. I’ve put IoT sensors in each truck to monitor temperature, because if it gets too warm the ice cream melts. Adam doesn’t like soft-serve, so we need to keep a stable temperature.
I have:
- An Azure Event Hub that receives simulated telemetry (a PowerShell script is sending events)
- A Fabric event stream that:
- Uses the Event Hub as a source
- Writes to a KQL database as destination
- Optionally does transformation/aggregation on the fly
In Fabric:
- You create a KQL database from “New”
- Configure the event stream: choose source, choose destination (KQL DB, Lakehouse, Reflex, etc.)
- Event processing can aggregate or filter: for example, stream raw data to one destination and aggregated data to another
Once data is streaming:
- In the KQL database, you can see ingestion metrics and tables, e.g. rta_icecream
- You can click “Explore” to query data
Fun story from Oslo:
Patrick: Right before the session, I asked Adam to write some KQL for me. I gave him a T‑SQL pattern and said, “Translate this to KQL.” He wrote a short KQL query with the correct project/summarize/operators.
We also mentioned a website that shows SQL-to-KQL examples. That’s a great resource.
Then Dang (from the engineering team) was in the front row and said: “You know you can just write SQL there, right?” We didn’t believe him at first.
In the KQL query window:
- You can write regular T‑SQL, like: SELECT * FROM rta_icecream
- Prepend the line EXPLAIN above the T‑SQL statement.
- Run it.
- Fabric converts it into the equivalent KQL and shows it to you.

So:
- You can write SQL you already know.
- Use EXPLAIN to see the KQL version.
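Put together, the pattern looks like this in the KQL query editor — the rta_icecream table is from the demo, but the truck_id and temperature columns are hypothetical, invented here for illustration:

```sql
-- Run in the KQL query window; EXPLAIN returns the equivalent KQL
-- instead of executing the query. Column names are hypothetical.
EXPLAIN
SELECT truck_id,
       AVG(temperature) AS avg_temp
FROM   rta_icecream
GROUP BY truck_id;
```

The editor responds with the translated KQL (roughly a summarize over the table), which is a handy way to learn the KQL equivalents of queries you already know how to write in SQL.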
Once you have a query, you can:
- Click the Power BI icon in the query results
- Generate a Power BI report on top of that result
- Turn on auto-refresh to see it update in near real time
Real-time analytics with KQL is especially good for:
- Telemetry
- Time-series and log analysis
- Streaming scenarios where you want real-time or “real real-time” data
If you don’t need streaming, use Lakehouse or Warehouse. If you need real-time telemetry and time-based analytics, look at KQL databases.
Power BI and Direct Lake
Patrick: Let’s talk about Power BI. There was a debate: is Power BI a place to create objects for others to consume in Fabric?
Adam: Of course.
Patrick: Historically, Power BI had two main storage modes:
- DirectQuery – pushes queries down to the source at query time, so data is always current; fine for smaller data or fast sources, but it can be challenging for big data and complex models.
- Import – loads data into the VertiPaq cache. Great for smaller or mid-sized data; for very large data you need aggregations, composite models, incremental refresh, etc.
Microsoft introduced Direct Lake mode for Fabric:
- Power BI pages data directly from Delta parquet files in OneLake.
- No full data import, no DirectQuery translations at query time.
- Near real-time access with good performance if you meet certain conditions.
We have a full session tomorrow at noon with Phil about Direct Lake.
OneLake Integration with Power BI
A powerful integration point is OneLake integration for semantic models:
Scenario:
- Many of you are doing your “ETL” in Power Query.
- You have a great model with dimension tables like Calendar, Geography, Product, etc.
- You want to reuse those tables in other Fabric items, like a Lakehouse, without recreating all the logic.
In your workspace:
- Go to the dataset/semantic model settings.
- Turn on “OneLake integration” for that model.
What happens:
- Fabric writes your model tables as Delta parquet in OneLake.
Then, from a Lakehouse:
- Create a shortcut to those tables under “OneLake tables”.
- Pick the workspace and semantic model, then choose the table you want (e.g., Calendar).
- Fabric creates a shortcut in your Lakehouse to that table.
Now you can:
- Query and use that table in Lakehouse notebooks or SQL endpoints.
- Reuse Power Query transformations from Power BI in other Fabric engines.
Today, the first phase is shortcuts. There’s more coming to integrate these tables more deeply into semantic models.
Identity Columns, Performance, and “It Depends”
Audience: If Warehouse doesn’t support identity yet and Lakehouse does similar things via Python, why use Warehouse?
Adam: Stored procedures, functions, T‑SQL-based views, and the whole relational database experience. Also the structured manageability you get from a classic database environment.
But yes, there’s an honest tradeoff right now. Some capabilities are still coming to Warehouse.
Audience: Which is faster, Lakehouse or Warehouse? Which is faster for Direct Lake in Power BI?
Adam: It depends:
- Your data volume
- Data structure and partitioning
- Query patterns and concurrency
- Model structure and design
In some scenarios Lakehouse will be faster; in others Warehouse will. There’s no universal answer. You need to test with your own data and workloads.
Mark (from the Warehouse team) says Warehouse is faster; he’s a bit biased. Always test.
Patrick: We’re almost out of time. I hope you learned something.
We covered:
- Lakehouse for data engineering, data science, and flexible storage
- Warehouse for SQL developers and managed lake-centric warehousing
- KQL databases for real-time analytics and telemetry
- Power BI, Direct Lake, and OneLake integration to tie everything together
We’ll have more deep-dive sessions, and there’s a panel later today. Come join us and bring your questions.