Transcript: Like Deep Dives?

Adam: What's up, everyone? I'm Adam.

Patrick: And I'm Patrick.

Adam: We are program managers on the Microsoft Fabric product team. We work on the Fabric CAT team, the customer advisory team, working with some of the world's largest customers. We eat, sleep, and breathe Fabric.

How many people here use Power BI?

[Audience reaction]

Adam: That's what I expected. How many people use Fabric?

Patrick: Liars. Everyone who raised their hand for Power BI should also have raised their hand for Fabric, because Power BI is Fabric.

Adam: It's amazing, right?

Patrick: Not really.

Adam: It is. It's all Fabric.

Patrick: I have a Pro license and that's not Fabric.

Adam: Power BI is Fabric.

Patrick: Okay, we’ll talk about that later.

Adam: To level-set what we're going through here: we have a lot of demos planned. We’re going to walk through the different options you can leverage inside Microsoft Fabric.

Microsoft Fabric is a unified platform that encompasses your data plus different analytical engines that are exposed inside Fabric.

Patrick: So what do we mean by a unified system? For all the folks that say, “We’re Power BI, but we’re not Fabric” — what happened was, back in the day when we said “Power BI”, that meant capacities, workspaces, gateways, dataflows, Power BI reports, semantic models (I still say datasets).

Today, everything outside of the reports and datasets is Fabric. That’s the Fabric platform. Those are things available to every engine inside Microsoft Fabric, including Power BI.

Adam: What we want to focus on is: which analytical engine do you pick inside Microsoft Fabric? That’s a tough choice.

Patrick: It’s not that hard.

Adam: What would your initial answer be?

Patrick: Lakehouse.

Adam: No. The right answer is: it depends.

Patrick: Lakehouse! Lakehouse all the things.

Adam: We’ve got options: Lakehouse, Warehouse, KQL database, Power BI.

Patrick: I’m confused.

Adam: So which one do you use?

Patrick: I use Power BI.

Adam: You use Power BI. So which engine, and how do you decide?

First, we’ll talk about Lakehouse.


What is a Lakehouse?

Patrick: People always ask, “What’s a Lakehouse?”

Think about a data lake. Remember that term? That was when everyone said “big data.” Then we had the data warehouse. Everyone knows Kimball star schemas: dimensions and facts. If you talk to Adam or the CAT team, they always say “build a star schema.”

So imagine the data lake and the data warehouse go out like we did last night. They have a good time, get drunk, and spend the night together. They produce the Lakehouse. The Lakehouse is the offspring of the data lake and the data warehouse.

In the data warehouse, you have a fine-tuned schema. In the data lake, you have all your raw and semi-structured data for engineers and data scientists — PDFs, images, JSON, XML, CSV files. Lakehouse brings both together.

Who knows what’s next? When we were in Oslo, someone in our workshop said, “Why can’t we just have the house?” Brilliant.

We’re just getting started with Fabric. Maybe in a few years, the baby will grow up.

Your Lakehouse is primarily for data engineers and data scientists. I was talking to Simon Whiteley — maybe you know him; he’s a crazy data engineer — Lakehouse is for people like him. Don’t ask him which engine he’d pick.

It’s built for Spark data engineers. How many of you can set up Spark clusters? Very few. How many write Python? The numbers are increasing.

Adam: Did you know that I write Python?

Patrick: No you don’t.

Adam: ChatGPT.

Patrick: You are not ChatGPT.

There is a delightful authoring experience (I stole that marketing phrase) — I’m not in marketing — and you basically have all the data you need for all types of analysis.

Business analysts can connect to a pristine set of managed tables. Data scientists can go against the raw data that you’re extracting from various sources. You have both.


Lakehouse from the Analyst Perspective

Patrick: This slide is eye-candy and full of words, but it basically says what I just said.

The interesting question is: what can a data analyst actually do with the Lakehouse if they don’t have a data engineer?

Adam: Co-pilot.

Patrick: No, this is not a Copilot session.

If you have a data engineer, they build the Lakehouse: notebooks, Python, Spark jobs. But if you don’t, what can an analyst do?

Let me show you.

I go into the Fabric Lakehouse experience. If an analyst wants to tinker and learn, often the way Adam and I learn is: we go break stuff, try things, then make videos about what worked.

To create a Lakehouse, in the Fabric UI you click “New”, go to “More options”, pick the data engineering experience, and choose Lakehouse. You don’t need to set up Spark clusters, storage, or networking. The Lakehouse is automatically created. Magic. Like a cooking show: we already have one created.

When you create the Lakehouse, you get two additional items: a SQL analytics endpoint and a default semantic model.

We like acronyms at Microsoft.

Adam: That’s why he gets paid the big bucks. He’s in marketing.

Patrick: I am not in marketing. That’s why I’m doing most of the demos.

The SQL analytical endpoint in a Lakehouse is read-only; we’ll come back to that.

Your Lakehouse has two main sections: Tables and Files.

The Files section is where you upload all your messy, unstructured or semi-structured data: CSV files, Parquet files, JSON files, images, etc.

As an analyst, you can upload files through the Lakehouse UI, or drag and drop them using OneLake Explorer in Windows.

Pro tip: if you want a repeatable way to do that data movement (not production-grade, but to play around), just open OneLake Explorer and copy the files over every morning. Your mouse is your ETL process.

Once the files are there, the second part of the analyst ETL process is to right-click a file or folder and choose “Load to Tables”, which turns it into a managed Delta table.

You just created a no-code Lakehouse. Boom. As an analyst.

Should you build serious, large-scale solutions like this? No. You’d use notebooks or pipelines or Dataflows Gen2. But it’s great for learning and smaller scenarios.
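That mouse-driven ETL can even be scripted. A minimal sketch, assuming OneLake Explorer has synced your Lakehouse to a local folder (all paths and names here are hypothetical, purely for illustration):

```python
import shutil
from pathlib import Path

def copy_new_files(source_dir: str, lakehouse_files_dir: str) -> list[str]:
    """Copy CSV files from a local folder into the Lakehouse Files area
    (as synced locally by OneLake Explorer), skipping files already there."""
    src, dst = Path(source_dir), Path(lakehouse_files_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in src.glob("*.csv"):
        target = dst / f.name
        if not target.exists():  # simple idempotency check
            shutil.copy2(f, target)
            copied.append(f.name)
    return copied

# Hypothetical OneLake Explorer sync path:
# copy_new_files(r"C:\exports",
#                r"C:\Users\me\OneLake\MyWorkspace\MyLakehouse.Lakehouse\Files\sales")
```

Still not production-grade, but one step up from the mouse.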


Shortcuts in Fabric / OneLake

Patrick: You may have noticed that a folder has a little link icon. That indicates a shortcut.

Adam’s desktop shortcut analogy was pretty bad. But the idea is that a shortcut is a reference to data stored somewhere else.

In Fabric, shortcuts can point to other locations within OneLake, to Azure Data Lake Storage Gen2, or to Amazon S3.

You go to “New shortcut”, launch a wizard, pick the source, authenticate, and then you see a linked folder in your Lakehouse. The data still physically lives in that external storage, potentially even in a different tenant.

I have a shortcut folder that points to ADLS Gen2 in a separate tenant. I didn’t copy the data; I just created a shortcut.

Then, in a notebook, I can reference the data via that shortcut.

We’re just referencing it as if it’s in OneLake, even though it’s not physically there. It’s a logical representation.

You can also schedule notebooks to run on a recurring basis, straight from the notebook’s scheduling settings.

There’s also a Python library called SemPy (semantic link) for interacting with semantic models from a notebook: you can list models and run DAX queries against them, pulling the results into pandas DataFrames.

Phil and others have talked about this in longer sessions.


Notebooks – What Are They?

Audience: What’s a notebook?

Adam: Great question.

Patrick: It’s a book that has paper in it.

No, seriously: a notebook is not where the data is. It’s an environment where you write and run code in cells (Python, Scala, R, Spark SQL), mix in markdown for documentation, and operate on data that lives elsewhere, like your Lakehouse.

Cathy has a full session about notebooks where she goes through every detail.

Audience: How would you trigger a notebook from an external event?

Patrick: I don’t know the answer off the top of my head. Maybe from a pipeline, maybe via an API.

Adam: There is an API according to colleagues in the room. That means we’ll probably do a video about it.
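The API the colleagues are referring to is, as far as we know, the Fabric job scheduler REST API. This sketch only assembles the request rather than sending it; the IDs are placeholders and the exact endpoint shape should be verified against the official REST documentation:

```python
def notebook_run_request(workspace_id: str, notebook_id: str) -> dict:
    """Build an on-demand run request for a Fabric notebook.
    Endpoint shape per the Fabric job scheduler API (verify against docs)."""
    return {
        "method": "POST",
        "url": (
            "https://api.fabric.microsoft.com/v1/workspaces/"
            f"{workspace_id}/items/{notebook_id}/jobs/instances"
            "?jobType=RunNotebook"
        ),
        # A bearer token from Microsoft Entra ID is required; omitted here.
        "headers": {"Authorization": "Bearer <token>"},
    }
```

An external event source (an Azure Function, a Logic App, anything that can POST) could fire this request to kick off the notebook.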

Patrick: If there’s something you want that doesn’t exist yet, go to ideas.fabric.microsoft.com and submit or vote on feedback.


Warehouse in Fabric

Patrick: How many of you use some type of data warehouse today?

[Hands]

How many used or are using Azure Synapse Analytics?

If you’re using Synapse, you know warehouses. The same team that built that worked on the data warehouse in Fabric.

The Fabric Warehouse is a fully transactional, T‑SQL-based warehouse, delivered as SaaS, with its data stored in open Delta format in OneLake.

How many of you can write some dialect of SQL? Most of you. If you can write SQL, the warehouse is very approachable.

All warehouse data is stored as Delta tables in OneLake. Any compute engine in Fabric that speaks Delta parquet can read that data: Spark notebooks, the Lakehouse, Power BI via Direct Lake, and so on.

You can do cross-database queries with three-part naming (not four-part naming yet).
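A cross-database query with three-part naming might look like the following; the tiny helper just formats the pattern, and all the object names are made up for illustration:

```python
def three_part(database: str, schema: str, table: str) -> str:
    """Format a three-part table name ([database].[schema].[table])
    for a cross-database query in Fabric."""
    return f"[{database}].[{schema}].[{table}]"

# Joining a Warehouse table to a Lakehouse table in one T-SQL query
# (hypothetical names):
query = f"""
SELECT f.SaleKey, d.CustomerName
FROM {three_part('MyWarehouse', 'dbo', 'FactSale')} AS f
JOIN {three_part('MyLakehouse', 'dbo', 'DimCustomer')} AS d
  ON f.CustomerKey = d.CustomerKey
"""
```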

From a management perspective, the Warehouse takes care of things like statistics and file compaction for you.

In a Lakehouse, you can also run that maintenance yourself, for example OPTIMIZE and VACUUM on your Delta tables from a notebook.

Every time we create files using Fabric compute engines (notebooks, pipelines, Dataflows Gen2), they are created as V-ordered parquet files by default. But you still need occasional maintenance.

In a Warehouse, that maintenance is handled by the service.
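In the Lakehouse, the occasional maintenance mentioned above is something you run yourself, typically as Spark SQL from a notebook. A sketch that just composes the statements (table names are hypothetical; OPTIMIZE and VACUUM are standard Delta Lake maintenance commands):

```python
def maintenance_statements(tables: list[str], retention_hours: int = 168) -> list[str]:
    """Compose Delta maintenance SQL: OPTIMIZE compacts small files,
    VACUUM removes old unreferenced files past the retention window."""
    stmts = []
    for t in tables:
        stmts.append(f"OPTIMIZE {t}")
        stmts.append(f"VACUUM {t} RETAIN {retention_hours} HOURS")
    return stmts

# In a Fabric notebook you would execute each statement with spark.sql(stmt).
```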


When to Use Lakehouse vs Warehouse (High-Level)

Patrick: This slide I “worked hard” on (copied from another deck and reformatted) compares when to use which. Roughly: Lakehouse when your team is Spark- and Python-oriented or you’re landing raw, semi-structured data; Warehouse when your team lives in T‑SQL and wants the full relational experience.

Adam: Another Lakehouse scenario: if you’re working with other table formats like Iceberg, you can use Spark in the Lakehouse to work with them directly. To leverage that data across the other Fabric engines, you convert it into Delta parquet.

So for initial exploration, Spark + Lakehouse is a good fit; for broader re-use across engines, you move to Delta parquet.

Audience: Can we make a shortcut from Warehouse?

Patrick: Not today. You can make shortcuts in KQL databases and Lakehouses, not in Warehouses. Please log it on ideas.fabric.microsoft.com.


Creating and Using a Warehouse

To create a Warehouse, it’s the same flow as the Lakehouse: click “New”, go to “More options”, pick the Data Warehouse experience, and choose Warehouse. Nothing to provision.

When you create a Warehouse, you get the Warehouse item itself plus a default semantic model.

With a Lakehouse, you also get a SQL endpoint, but the difference is that the Lakehouse endpoint is read-only, while the Warehouse endpoint is fully read-write: INSERT, UPDATE, DELETE, CTAS, stored procedures, the works.

Patrick: In my demo, I have a classic warehouse pattern: loading a dimension table that needs a surrogate key.

Today, the Warehouse doesn’t support identity columns.

So you have to finagle things: look up the current maximum key value, then add ROW_NUMBER() over the incoming rows to generate the new keys.

It’s a bit of work, but that’s how you do it today. In Lakehouse, you do something similar with Python, often more elegantly.
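One piece of that finagling, while the Warehouse lacks identity columns, is generating surrogate keys by hand. A sketch of the pattern in plain SQL, using SQLite here purely as a stand-in for the Warehouse (the tables and names are invented for illustration):

```python
import sqlite3

# Surrogate-key workaround: read the current MAX key, then assign
# new keys as max + ROW_NUMBER() over the incoming staging rows.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER, name TEXT);
    INSERT INTO dim_customer VALUES (1, 'Adam'), (2, 'Patrick');
    CREATE TABLE staging_customer (name TEXT);
    INSERT INTO staging_customer VALUES ('Cathy'), ('Phil');
""")

# Step 1: current max key (in T-SQL this could be a variable or a CTE).
max_key = con.execute(
    "SELECT COALESCE(MAX(customer_key), 0) FROM dim_customer"
).fetchone()[0]

# Step 2: insert staging rows with keys max + ROW_NUMBER().
con.execute(
    "INSERT INTO dim_customer "
    "SELECT ? + ROW_NUMBER() OVER (ORDER BY name), name FROM staging_customer",
    (max_key,),
)

rows = con.execute(
    "SELECT customer_key, name FROM dim_customer ORDER BY customer_key"
).fetchall()
# rows now includes Cathy and Phil with freshly generated keys
```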

Audience: Is Warehouse backed by the same engine as Azure Synapse?

Adam: There are differences in engineering for Fabric, but it still leverages the Polaris engine used by Azure Synapse Analytics.

Audience: What’s the physical storage for Warehouse?

Adam: OneLake, Delta parquet. If you inspect OneLake Explorer or the browser folders, you’ll see the Delta log and parquet files for your warehouse tables.

Audience: Are all T‑SQL operations supported in Warehouse?

Adam: Not yet. As mentioned, MERGE, ALTER, identity, etc. are not all there today. There are gaps, and the team is iterating. Documentation lists which T‑SQL features are supported.

Audience: Will managed tables support unique constraints?

Patrick: It’s on the roadmap with no timeline yet. Still, go to ideas.fabric.microsoft.com and upvote it.


Moving Between Lakehouse and Warehouse

Audience: If we decide to switch from Lakehouse to Warehouse, how do we move tables?

Patrick: You have to move them. Options include a cross-database CTAS in the Warehouse (selecting from the Lakehouse’s SQL endpoint), a pipeline Copy activity, or Dataflows Gen2.


Real-Time Analytics (KQL Database)

Adam: Now let’s talk about Real-Time Analytics. We’re short on time, but we’ll show the key ideas.

For true streaming, high-volume telemetry or log data, Real-Time Analytics with KQL databases is a great fit. If you saw the keynote, you saw the delivery tracking demo.

Typical flow: events come in through an eventstream, land in a KQL database, you query them with KQL, and you visualize the results in Power BI or a KQL queryset.

Patrick: Let me show you a simple scenario: I pretend I own a fleet of ice cream trucks in this city. Adam is one of my drivers. I’ve put IoT sensors in each truck to monitor temperature, because if it gets too warm the ice cream melts. Adam doesn’t like soft-serve, so we need to keep a stable temperature.

I have a stream of sensor readings (truck ID, timestamp, temperature) coming off the trucks.

In Fabric, I create a KQL database and an eventstream that routes those readings into a table.

Once data is streaming, I can query it immediately with KQL.
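The kind of alerting check you would run over that temperature stream can be sketched in plain Python; the reading shape and the threshold are made up for illustration:

```python
from datetime import datetime, timedelta

def melting_trucks(readings, now, threshold_c=-5.0, window_minutes=10):
    """Return truck IDs whose most recent reading inside the time window
    is warmer than the safe threshold. `readings` is an iterable of
    (truck_id, timestamp, temp_c) tuples."""
    cutoff = now - timedelta(minutes=window_minutes)
    latest = {}
    for truck_id, ts, temp in readings:
        # Keep only the newest in-window reading per truck.
        if ts >= cutoff and (truck_id not in latest or ts > latest[truck_id][0]):
            latest[truck_id] = (ts, temp)
    return sorted(t for t, (_, temp) in latest.items() if temp > threshold_c)
```

In practice you would express this as a KQL summarize over the ingestion table, but the logic is the same.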

Fun story from Oslo:
Patrick: Right before the session, I asked Adam to write some KQL for me. I gave him a T‑SQL pattern and said, “Translate this to KQL.” He wrote a short KQL query with the correct project and summarize operators.

We also mentioned a website that shows SQL-to-KQL examples. That’s a great resource.

Then Dang (from the engineering team) was in the front row and said: “You know you can just write SQL there, right?” We didn’t believe him at first.

In the KQL query window, you can just paste in a T‑SQL query and run it; Kusto translates it under the covers.

So: if you already know SQL, you can start there, and if you prefix the query with “explain”, Kusto shows you the equivalent KQL, which is a great way to learn the language.

Once you have a query, you can pin it to a dashboard, build a Power BI report on top of it, or save it in a KQL queryset to share.

Real-time analytics with KQL is especially good for high-volume telemetry, logs, and time-series data, where you need fast ingestion and time-based aggregations.

If you don’t need streaming, use Lakehouse or Warehouse. If you need real-time telemetry and time-based analytics, look at KQL databases.


Power BI and Direct Lake

Patrick: Let’s talk about Power BI. There was a debate: is Power BI a place to create objects for others to consume in Fabric?

Adam: Of course.

Patrick: Historically, Power BI had two main storage modes: Import, which copies the data into the model, and DirectQuery, which sends queries to the source at report time.

Microsoft introduced Direct Lake mode for Fabric: the semantic model reads the Delta parquet files in OneLake directly, so you get import-like performance without duplicating the data through a scheduled refresh.

We have a full session tomorrow at noon with Phil about Direct Lake.


OneLake Integration with Power BI

A powerful integration point is OneLake integration for semantic models: an import-mode model can write its data back out to OneLake in Delta format.

Scenario: you have an existing import model that’s refreshed on a schedule, and other teams want to reuse that cleansed data.

In your workspace, you open the semantic model’s settings and turn on OneLake integration.

What happens: on the next refresh, the model’s import tables are written to OneLake as Delta parquet files.

Then, from a Lakehouse, you create shortcuts to those exported tables.

Now you can query that same cleansed data from notebooks, the SQL endpoint, or any other Fabric engine.

Today, the first phase is shortcuts. There’s more coming to integrate these tables more deeply into semantic models.


Identity Columns, Performance, and “It Depends”

Audience: If Warehouse doesn’t support identity yet and Lakehouse does similar things via Python, why use Warehouse?

Adam: Stored procedures, functions, T‑SQL-based views, and the whole relational database experience. Also the structured manageability you get from a classic database environment.

But yes, there’s an honest tradeoff right now. Some capabilities are still coming to Warehouse.

Audience: Which is faster, Lakehouse or Warehouse? Which is faster for Direct Lake in Power BI?

Adam: It depends on your data volumes, file sizes and layout, query patterns, and how well-maintained the underlying Delta tables are.

In some scenarios Lakehouse will be faster; in others Warehouse will. There’s no universal answer. You need to test with your own data and workloads.

Mark (from the Warehouse team) says Warehouse is faster; he’s a bit biased. Always test.


Patrick: We’re almost out of time. I hope you learned something.

We covered the Lakehouse, the Warehouse, Real-Time Analytics with KQL databases, and Power BI with Direct Lake.

We’ll have more deep-dive sessions, and there’s a panel later today. Come join us and bring your questions.