-->

In recent years, the Data Lakehouse has become the primary data architecture for cloud-based data platforms, and the Medallion architecture (bronze, silver, gold) has become the de facto standard for building one. Until now, Microsoft's options for cloud-based data platforms have been the Azure Synapse Analytics PaaS solution or Databricks on Azure. In November 2023, Microsoft released a new SaaS-based analytics platform called Microsoft Fabric. If Fabric is not yet familiar to you, check out the Microsoft Learn overview.

Below are some of our thoughts on Fabric in relation to Data Lakehouse implementations and on why organizations should adopt Fabric.

Synapse vs Fabric

The purpose of Synapse Analytics was to bring the data services available in Azure under one umbrella. In practice it did, but under the hood the services did not always work together seamlessly. For example, Spark and SQL workloads did not communicate with each other natively, connections between different services did not work out of the box, and multiple copies of the data had to be stored to use it optimally in different workloads. These are not insurmountable issues, but they cause extra work that does not add value for end users. Of course, many of these things can be managed with automation, as the template-based solutions we have developed at Islet do. In both Synapse and Fabric, the Spark notebooks that perform the actual data handling are at the heart of the Lakehouse architecture. For this purpose, we have developed our own Spark libraries, which make implementation faster and improve quality, and they are fully compatible with Fabric's notebooks.

How does Fabric change the picture?

Fabric does not change the basic principles of the Data Lakehouse and Medallion architecture, but it provides a completely new type of platform for building them. Since it's a SaaS service, its setup and maintenance require less work than the PaaS-based Synapse.
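
As a rough illustration of the bronze, silver, gold flow that stays the same on either platform, the sketch below uses plain Python lists instead of Spark DataFrames. The record fields (`order_id`, `customer`, `amount`) and the function names are made up for this example; a real implementation would run in Spark notebooks against Delta tables.

```python
from collections import defaultdict

# Bronze: raw records as they arrived, including duplicates and bad values.
bronze = [
    {"order_id": "1", "customer": " Acme ", "amount": "100.50"},
    {"order_id": "1", "customer": " Acme ", "amount": "100.50"},  # duplicate
    {"order_id": "2", "customer": "Beta", "amount": "n/a"},       # invalid amount
    {"order_id": "3", "customer": "acme", "amount": "49.50"},
]


def to_silver(rows):
    """Clean, type, and deduplicate bronze rows."""
    seen = set()
    silver = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # keep only the first occurrence of each order
        seen.add(order_id)
        silver.append({
            "order_id": order_id,
            "customer": row["customer"].strip().lower(),
            "amount": amount,
        })
    return silver


def to_gold(rows):
    """Aggregate silver rows into a reporting-ready total per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

The point of the layering is that each step has one job: bronze keeps the data as received, silver makes it trustworthy, and gold shapes it for reporting.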

Fabric's common interface for all workloads is a clear advantage: it reduces the number of tools needed and the switching between them. As for the workloads, Fabric has a lot to choose from: Data Factory, Data Engineering + Lakehouse, Data Warehouse, Data Science, Real-Time Analytics, Data Activator, and Power BI. You can read more about these in the Fabric introduction linked above. Not all workloads need to be used; the most suitable tool is chosen for each need. The same thing can often be implemented with different workloads, for example in either a low-code or a code-first manner.

However, the most important feature is the OneLake storage layer under the hood and the open-source Delta Lake storage format used by all Fabric workloads. OneLake is built on Azure Data Lake Storage Gen2, which means that it supports the same ADLS Gen2 APIs and tools. Delta Lake, in turn, is an open storage format that supports ACID transactions and data versioning, and the same format is also used by Databricks. Synapse notebooks can use Delta Lake just as well, and the Serverless SQL pool can read it, but in Fabric all workloads both read and write Delta Lake natively. This makes it easier to share data between workloads and for people in different roles to use the data on the platform, which is exactly what a modern data platform should offer.
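
To make the versioning idea concrete: Delta Lake records every change as a numbered commit in a transaction log, and a reader reconstructs a table version by replaying the commits up to that number. The following is a heavily simplified, in-memory model of that idea, not the Delta Lake library itself. The commit contents and file names are invented, and real commits carry more action types and metadata.

```python
def commit_file_name(version: int) -> str:
    """Commit files in the Delta log are named by zero-padded version number."""
    return f"{version:020d}.json"


# Hypothetical commit history: each commit is a list of actions.
commits = [
    [{"add": {"path": "part-000.parquet"}}],       # version 0: initial load
    [{"add": {"path": "part-001.parquet"}}],       # version 1: append
    [{"remove": {"path": "part-000.parquet"}},     # version 2: rewrite of part-000
     {"add": {"path": "part-002.parquet"}}],
]


def files_at_version(log, version):
    """Replay commits 0..version and return the data files in that snapshot."""
    active = set()
    for actions in log[: version + 1]:
        for action in actions:
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:
                active.discard(action["remove"]["path"])
    return sorted(active)


print(commit_file_name(2))
print(files_at_version(commits, 1))  # snapshot before the rewrite
print(files_at_version(commits, 2))  # latest snapshot
```

Because old data files are only marked as removed in the log, an earlier version can still be read by stopping the replay at an earlier commit, which is the basis for time travel.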

With the unified Delta Lake format, the need to copy the same data in different formats for different tools or use cases is significantly reduced. In addition, Fabric has completely new features such as shortcuts and database mirroring, which allow existing data, for example in AWS S3, Azure Storage, Azure SQL, or Snowflake databases, to be linked to OneLake without necessarily being transferred there separately. Each case should of course be studied in more detail to find the most suitable solution for the specific need.

Among the new features, Power BI's Direct Lake mode deserves a separate mention. It can read data from OneLake in real time and very efficiently, essentially combining the best aspects of the DirectQuery and Import modes: an up-to-date data model and efficiency.

In addition to the above, Fabric has numerous other new features, and the product is developing continuously. It's important to note that although Microsoft released a production-ready (GA) version of Fabric in November 2023, there are still deficiencies in its features. However, these are being fixed at a rapid pace, and new features are announced weekly.

When is a good time to start using Fabric? 

Organizations that are just starting to transition to a cloud-based data platform should definitely consider Fabric as a primary option. On the other hand, organizations that have already built their data platforms on Synapse or Databricks are in no hurry to move already completed parts to Fabric, as Synapse will continue to be fully supported. For these organizations, however, it may be an interesting option to implement Fabric for a limited area of use and thus gain experience with the new platform.

There are indications that Microsoft will provide migration tools at some point. If an organization's current Synapse-based solution is a Lakehouse built with Spark notebooks, as in Islet's implementation model, the migration to Fabric will be a fairly light operation, regardless of whether it happens now or in a few years.

In summary, what benefits does Fabric bring to an organization?

Under the same service, you can now find everything related to data and analytics needs, from data integration to transformation, storage, and reporting, as well as machine learning and AI tools.

Since all Fabric tools recognize the centralized OneLake and use the same data storage format, it's easy for people in different roles to use the information stored on the platform. Time and money are saved when individuals do not have to figure out how to read the data they need.
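
One practical consequence of a single lake is that every table has one predictable address. The helper below is a minimal sketch of building such an address in the ADLS Gen2 style (`abfss://`) that OneLake exposes. The function name and the example workspace, lakehouse, and table names are hypothetical, and the URI layout should be checked against the current OneLake documentation before relying on it.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_table_uri(workspace: str, lakehouse: str, table: str) -> str:
    """Build an abfss URI for a Delta table stored in a Fabric lakehouse."""
    parts = (("workspace", workspace), ("lakehouse", lakehouse), ("table", table))
    for name, value in parts:
        # Reject empty names and names that would change the path structure.
        if not value or "/" in value:
            raise ValueError(f"invalid {name}: {value!r}")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Tables/{table}"


# Hypothetical names, for illustration only.
uri = onelake_table_uri("Sales", "Core", "orders")
print(uri)
```

A Spark notebook, a SQL query, or a Power BI model can then all point at the same table instead of each keeping its own copy.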

Likewise, work is made more efficient by Copilot. It is integrated into all Fabric workloads and has the same visibility into the data on the platform as the developers, so developers can ask Copilot to write code, calculate formulas, or analyze data, for example.

Fabric's costs are based on capacity units, which all workloads consume. As the amount of data grows and usage needs expand, more capacity can be purchased, and capacity can be reduced when needs shrink. Power BI licenses, however, are still purchased separately unless the organization uses the F64 capacity, i.e. the equivalent of the former Power BI Premium.
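
The capacity logic can be sketched as simple arithmetic: add up the capacity units (CUs) the workloads are expected to need and pick the smallest F SKU that covers them. In the sketch below, the per-workload estimates are invented numbers and the function names are our own; in practice, sizing is based on measured consumption rather than a static sum. The license check simply encodes the F64 threshold mentioned above.

```python
# Fabric F SKUs; the number in the name is the capacity units (CUs) provided.
F_SKUS = [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]


def smallest_sku(required_cus: float) -> str:
    """Return the smallest F SKU that provides at least the required CUs."""
    for cus in F_SKUS:
        if cus >= required_cus:
            return f"F{cus}"
    raise ValueError(f"no single F SKU covers {required_cus} CUs")


def power_bi_licensed_separately(sku: str) -> bool:
    """Below F64, Power BI licenses are still purchased separately."""
    return int(sku[1:]) < 64


# Hypothetical estimates of CUs needed per workload.
estimated_cus = {"pipelines": 6, "notebooks": 20, "reports": 14}

sku = smallest_sku(sum(estimated_cus.values()))
print(sku, power_bi_licensed_separately(sku))
```

Because all workloads draw from the same pool, a quiet workload leaves headroom for a busy one, which is the main difference from sizing each service separately.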

Islet and Fabric 

At Islet, we have been implementing Data Lakehouse architectures instead of traditional data warehouses for a long time, as in the case of Wihuri. We have developed generic, repeatable models and libraries for the efficient implementation of the Medallion architecture, and we use Delta Lake as the data storage format. Given this, the transition to Fabric doesn't greatly change our way of implementing a Lakehouse, but it brings many new possibilities and features for building the data platform and utilizing data.

- — - — -

The blog's author, Mika Kuivanen, is a data architect at Islet with over 15 years of experience in databases, data & analytics, and consulting.

More info:

Janne Anttila

CBO, Data and Analytics, Isletter

janne.anttila@isletgroup.fi

+358 45 672 8569

#MicrosoftFabric #Azure #lakehouse #deltalake #powerBI #data #analytics #AI #onelake #Microsoft
