{"id":229774,"date":"2023-04-24T15:22:36","date_gmt":"2023-04-24T12:22:36","guid":{"rendered":"https:\/\/isletgroup.fi\/?p=229774"},"modified":"2024-02-27T13:48:56","modified_gmt":"2024-02-27T11:48:56","slug":"databricks-and-lakehouse-architecture-a-comprehensive-overview","status":"publish","type":"post","link":"https:\/\/isletgroup.fi\/en\/2023\/04\/24\/databricks-and-lakehouse-architecture-a-comprehensive-overview\/","title":{"rendered":"Data\u00adbricks and Lake\u00adhouse Archi\u00adtec\u00adture: A&nbsp;Com\u00adpre\u00adhen\u00adsive Overview"},"content":{"rendered":"<p>[et_\u200bpb_\u200bsection fb_built=\u201c1\u201d _builder_version=\u201c4.16\u201d _module_preset=\u201cdefault\u201d da_disable_devices=\u201coff|off|off\u201d global_\u200bcolors_\u200binfo=\u201d{}\u201d da_is_popup=\u201coff\u201d da_exit_intent=\u201coff\u201d da_has_close=\u201con\u201d da_alt_close=\u201coff\u201d da_dark_close=\u201coff\u201d da_not_modal=\u201con\u201d da_is_singular=\u201coff\u201d da_with_loader=\u201coff\u201d da_has_shadow=\u201con\u201d][et_pb_row _builder_version=\u201c4.16\u201d _module_preset=\u201cdefault\u201d global_colors_info=\u201d{}\u201d][et_pb_column type=\u201c4_4\u201d _builder_version=\u201c4.16\u201d _module_preset=\u201cdefault\u201d global_colors_info=\u201d{}\u201d][et_pb_text _builder_version=\u201c4.17.4\u201d _module_preset=\u201cdefault\u201d text_orientation=\u201cjustified\u201d global_colors_info=\u201d{}\u201d]<\/p>\n<h3><strong>Intro\u00adduc\u00adtion to Databricks<\/strong><\/h3>\n<p><a href=\"https:\/\/www.databricks.com\/\" target=\"_blank\" rel=\"noopener\"><strong>Data\u00adbricks<\/strong><\/a>, a&nbsp;US-based soft\u00adware com\u00adpa\u00adny, was found\u00aded by the cre\u00adators of <a href=\"https:\/\/spark.apache.org\/\" target=\"_blank\" rel=\"noopener\"><strong>Apache Spark<\/strong><\/a>, an open-source ana\u00adlyt\u00adics engine. 
Their flagship product, the Databricks Lakehouse Platform, is a cloud-based data and analytics platform powered by Spark and <a href="https://delta.io/" target="_blank" rel="noopener"><strong>Delta Lake</strong></a>. The platform combines the best elements of data lakes and data warehouses, providing tools for data engineers, data scientists, and business intelligence analysts to collaboratively develop solutions ranging from small-scale, highly customized data needs to enterprise-level data platform solutions. With its roots in the Big Data world, Databricks is particularly adept at handling large volumes of data, which typically involves cloud storage. Databricks is available on all major public cloud platforms (<a href="https://azure.microsoft.com/en-us" target="_blank" rel="noopener"><strong>Azure</strong></a>, <a href="https://aws.amazon.com/" target="_blank" rel="noopener"><strong>AWS</strong></a>, and <a href="https://cloud.google.com/" target="_blank" rel="noopener"><strong>GCP</strong></a>) and integrates seamlessly with their storage, security, and compute infrastructure while offering administration capabilities to users.</p>
<p>Under the hood, Databricks relies on distributed, parallel computing powered by Apache Spark.
Users can determine the types of clusters required for computation, while Databricks procures the necessary machines, activates them on demand or according to a predetermined schedule, and shuts them down when no longer required. Data is stored in the cloud, in the customer's own storage solutions such as Azure Data Lake Storage Gen2 or AWS S3. The total cost of ownership combines storage and compute costs, and the two scale independently of each other.</p>
<h3><strong>Data processing with Databricks</strong></h3>
<p>During data processing, Spark DataFrames, which resemble database tables, serve as the typical unit of processing and are managed by the Spark engine. DataFrame processing occurs within the cluster's memory, and DataFrames are manipulated using Python, Scala, R, or SQL in data processing pipelines. Pipeline development takes place in <a href="https://www.databricks.com/product/collaborative-notebooks" target="_blank" rel="noopener"><strong>Databricks Notebooks</strong></a>, which consist of one or more command cells executed sequentially. Different languages can be used within a single notebook, with code (e.g., Python or SQL) written in individual command cells.
Generally, a single data processing pipeline comprises several sequentially run notebooks, each focusing on a specific type of processing, such as raw data cleaning, key generation, or dimensional model generation. The final output of each processing pipeline is stored in the cloud, typically in the Delta Lake format, which is the de facto solution for modern data lake analytics. Although the code runs on a distributed, parallel computing platform, development is technically quite straightforward, and developers don't necessarily need to delve into the intricacies of the Spark engine. However, full customization is available for developers who require it.</p>
<h3><strong>Workflow generation and dependency management</strong></h3>
<p><a href="https://www.databricks.com/product/delta-live-tables" target="_blank" rel="noopener"><strong>Delta Live Tables</strong></a> (DLT), a recent addition to Databricks, offers significant added value for data solutions. Its most essential feature is automatic directed acyclic graph (DAG) workflow generation and dependency management, which helps maintain the proper processing order and ensures the correct sequence of table loading.</p>
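Conceptually, the dependency management that DLT automates amounts to a topological sort over the table-dependency graph. A minimal pure-Python sketch of the idea, where the table names and dependencies are hypothetical examples rather than anything from DLT's API:

```python
from collections import deque

# Hypothetical dependency graph: each table lists the tables it reads from.
deps = {
    "raw_orders": [],
    "raw_customers": [],
    "clean_orders": ["raw_orders"],
    "clean_customers": ["raw_customers"],
    "dim_customer": ["clean_customers"],
    "fact_sales": ["clean_orders", "dim_customer"],
}

def load_order(deps):
    """Return a valid loading sequence (Kahn's algorithm); raise on cycles."""
    pending = {table: set(d) for table, d in deps.items()}
    ready = deque(sorted(t for t, d in pending.items() if not d))
    order = []
    while ready:
        table = ready.popleft()
        order.append(table)
        # A table becomes ready once its last unloaded dependency is done.
        for other, remaining in pending.items():
            if table in remaining:
                remaining.remove(table)
                if not remaining:
                    ready.append(other)
    if len(order) != len(deps):
        raise ValueError("cycle detected in table dependencies")
    return order

print(load_order(deps))
```

DLT builds and maintains this ordering automatically from the pipeline definitions, so developers never write this bookkeeping themselves.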
<p>DAG chains are visible to users, providing data lineage visibility from raw data to reporting-ready tables. Additionally, DLT enables declarative data pipeline development, allowing the same pipeline and code to process different source tables in batch mode or as a stream. This simplifies the architecture and reduces code complexity. Quality control is highly automated in DLT data pipelines: data validation occurs during processing based on predefined rules and conditions, and the results (e.g., the number of rows not meeting the conditions) are visually accessible to the user.</p>
<h3><strong>Data management using Unity Catalog</strong></h3>
<p>Among the interesting features introduced last year, the <a href="https://www.databricks.com/product/unity-catalog" target="_blank" rel="noopener"><strong>Databricks Unity Catalog</strong></a> offers visibility and manageability for multiple source systems and end-user groups, all centralized to ensure consistent data governance throughout the data lifecycle. Unity Catalog allows end users to find data based on metadata from all registered sources and to share data through the Delta Sharing feature within the same Databricks instance, within the same cloud, or even outside the cloud environment.
Data is easily accessible yet controlled: access to data within the Catalog is centrally managed, audited, and monitored in one place.</p>
<h3><strong>Data publishing for consumption with Databricks</strong></h3>
<p>Historically, Databricks' challenge has been on the data distribution side: ensuring that BI developers and business analysts have access to ready-made data using familiar methods and tools. Today, Databricks offers SQL Warehouse, which allows data to be published through an interface resembling a traditional relational database. With this feature, BI developers and analysts can query ready-made, modeled data directly from a web browser using the Databricks SQL IDE or from supported IDEs such as <a href="https://dbeaver.io/" target="_blank" rel="noopener"><strong>DBeaver</strong></a>. Widely used BI tools like <a href="https://powerbi.microsoft.com/en-us/" target="_blank" rel="noopener"><strong>Power BI</strong></a>, <a href="https://www.tableau.com/" target="_blank" rel="noopener"><strong>Tableau</strong></a>, and <a href="https://www.qlik.com/us/" target="_blank" rel="noopener"><strong>Qlik</strong></a> have native connectors that integrate easily with Databricks SQL Warehouse.</p>
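As a sketch of what a programmatic query against a SQL Warehouse can look like, the snippet below uses the open-source <code>databricks-sql-connector</code> package, which follows the Python DB-API 2.0 style. The query, table name, and environment-variable names are hypothetical placeholders invented for this example, not values from any real deployment:

```python
import os

# Hypothetical query against a modeled table published via SQL Warehouse.
QUERY = """
SELECT region, SUM(net_sales) AS net_sales
FROM gold.fact_sales
GROUP BY region
ORDER BY net_sales DESC
"""

def fetch_sales_by_region():
    """Run QUERY over a Databricks SQL Warehouse and return the result rows."""
    # Imported inside the function so the sketch is readable without the package.
    from databricks import sql  # pip install databricks-sql-connector

    with sql.connect(
        server_hostname=os.environ["DATABRICKS_HOST"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute(QUERY)
            return cursor.fetchall()
```

The same warehouse endpoint is what the native Power BI, Tableau, and Qlik connectors talk to under the hood.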
<p>At the time of writing, SQL Warehouse requires a running cluster to operate, but a serverless option is already in the public preview phase.</p>
<h3><strong>Typical use cases for Databricks</strong></h3>
<p>Databricks is suitable for a variety of use cases, including:</p>
<ol>
<li>Large data volumes:
<ul>
<li>Distributed computing, a wide range of cluster options, and rich programming-language support provide all the tools needed to work with large volumes of data efficiently and cost-effectively.</li>
</ul>
</li>
<li>Extensive and diverse data platforms:
<ul>
<li>The versatility of Databricks allows numerous use cases to be implemented within a single environment. Data engineers handle raw data retrieval, cleansing, and Data Vault/dimensional modeling; data scientists develop and train machine learning models on these datasets; and BI analysts serve data to commonly used visualization tools through the SQL Warehouse API.</li>
</ul>
</li>
<li>Precision solutions with challenging requirements:
<ul>
<li>For example, data from IoT devices may overwhelm data platforms built with traditional tools (e.g., relational databases) in terms of data volume and update frequency.
Delta Live Tables features, among other options, provide an easy-to-use Streaming Table for such purposes.</li>
<li>In some cases, raw data can be in such a challenging form that solutions built with traditional tools eventually run into performance or maintainability issues. Here, Spark DataFrames and Python's modularity and reusability (compared with SQL) enable significantly more efficient and easier-to-maintain solutions.</li>
</ul>
</li>
<li>Building a new data platform from scratch:
<ul>
<li>Databricks offers powerful tools for developing data platform solutions with tools familiar to developers. Choosing Databricks as the implementation technology from the start avoids inadvertently excluding use cases or being forced to integrate incompatible components later.</li>
</ul>
</li>
<li>Machine learning:
<ul>
<li>Machine learning (ML) has always been central to Databricks.
Its ML capabilities are built on top of an open lakehouse architecture, with features such as AutoML and MLflow supporting the development, lifecycle management, and monitoring of machine learning models.</li>
</ul>
</li>
</ol>
<h3><strong>What has Islet done with Databricks?</strong></h3>
<p>Our team has built an enterprise-level data platform using Databricks, where data engineers leveraged Delta Live Tables functionality to process data from raw inputs into a complete, distributable dimensional model delivered to end users via the SQL Warehouse interface. The solution integrated SAP S/4HANA data from a large Finnish company into Azure using Databricks Notebooks. Raw data was retrieved from SAP and stored in ADLS Gen2 using AecorSoft Data Integrator. The implementation was completely metadata-driven: when integrating new data, only the basic information about the new tables is written to the configuration file, after which the Databricks Notebooks retrieve the newly available raw data, clean it, and perform the necessary transformations for further processing. Finally, the dimensional model, designed in collaboration with the customer, is generated using SQL queries hosted in an Azure DevOps repository and read by Delta Live Tables pipelines.</p>
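To illustrate the metadata-driven pattern described above, the sketch below shows how a small configuration entry can drive table onboarding: adding one entry is enough to bring a new source table into the pipeline. The configuration fields, paths, and function names here are hypothetical, invented for this example, and not the actual project configuration:

```python
# Hypothetical configuration: one dict per source table to onboard.
TABLE_CONFIG = [
    {"source": "sap", "table": "VBAK", "keys": ["VBELN"], "load": "incremental"},
    {"source": "sap", "table": "KNA1", "keys": ["KUNNR"], "load": "full"},
]

def plan_ingestion(config):
    """Turn configuration entries into concrete processing steps."""
    steps = []
    for entry in config:
        # Derive the raw-data landing path from the config entry.
        raw_path = f"raw/{entry['source']}/{entry['table'].lower()}"
        steps.append({
            "table": entry["table"],
            "read_from": raw_path,
            "deduplicate_on": entry["keys"],
            # Incremental loads are merged; full loads overwrite the target.
            "mode": "merge" if entry["load"] == "incremental" else "overwrite",
        })
    return steps

for step in plan_ingestion(TABLE_CONFIG):
    print(step)
```

In the real solution, steps like these are executed by Databricks Notebooks and DLT pipelines rather than printed, but the principle is the same: new tables require only new configuration, not new code.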
<p>The finished dimensional model is then distributed to Power BI and analysts through SQL Warehouse.</p>
<p>To learn more about Databricks' capabilities and our experiences, or to brainstorm about Lakehouse architecture, please contact us!</p>
<p style="text-align: center;">Janne Anttila, CBO — Data and Analytics, Isletter<br>
<a href="mailto:janne.anttila@isletgroup.fi">janne.anttila@isletgroup.fi</a>, +358 45 672 8569</p>
<p>The authors of this blog, <strong><a href="https://www.linkedin.com/in/aku-rantala-72b8a443/" target="_blank" rel="noopener">Aku Rantala</a></strong> and <strong><a href="https://www.linkedin.com/in/mika-ronkko/" target="_blank" rel="noopener">Mika Rönkkö</a></strong>, are ISLET's Lead Cloud Data Architects and have managed a large number of projects. They have a wide range of skills across a variety of tools and technologies; Databricks and Microsoft Azure in particular are among their strengths.</p>
<p>#IsletGroup #data #analytics #DataBricks #DataLakehousePlatform #ApacheSpark</p>