
How can you master the new data management platforms?

  • Jan 30
  • 4 min read

Data-as-a-Service and data quality driving innovation: the new Data Management platforms


The new generation of data platforms is being rolled out under the motto “self-service data, for everyone, in real time.” The many initiatives we are seeing in the data market reflect a desire to make business users more autonomous in their use of data. This does not mean that technology and IT are fading into the background. On the contrary, they have never been so present, with one major objective: data quality. The distribution of roles and the new governance structures being put in place to facilitate the use of artificial intelligence also contribute to this quality.


The organization of business data


The Big Data moment was characterized by the 5 Vs (or 3, or 7; we ended up losing count): Volume, Velocity, Variety, etc.


If we keep in mind that this moment is ongoing and evolving, it is clear that the Vs have not all carried the same weight at each stage of Big Data deployment. Volume was the first and predominant concern: the initial goal was to build platforms capable of processing volumes never seen before.


Velocity and Variety are now being addressed too, in the sense that they are finally being organized:


  • Velocity thanks to streaming platforms,

  • Variety thanks to the modeling of possible metadata at the Data Catalog level.


This raises the question of how to better organize and utilize assets and improve data quality in a more industrial context, with better access to tools for business lines.


Streaming and data catalogs are both particularly hot topics right now. The first is technical. The second is more focused on describing information, and it fully addresses the major challenge of Data-as-a-Service.


Data catalogs are metadata directories. They describe, list, and locate information. Some also provide governance features that trace the transformations data undergoes across multiple processes, which contributes to the goal of data quality.
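To make the idea concrete, here is a minimal sketch of a catalog that describes, locates, and traces datasets. It is not modeled on any particular product: the class names, field names, dataset names, and storage locations are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing one dataset (all field names are illustrative)."""
    name: str
    location: str      # where the data physically lives
    description: str
    derived_from: list[str] = field(default_factory=list)  # direct upstream datasets
    transformation: str = ""  # how it was produced from its upstream datasets


class DataCatalog:
    """In-memory metadata directory: it stores descriptions, never the data itself."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def locate(self, name: str) -> str:
        return self._entries[name].location

    def lineage(self, name: str) -> list[str]:
        """Return every upstream dataset, nearest first, without duplicates."""
        seen: list[str] = []
        queue = list(self._entries[name].derived_from)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            parent = self._entries.get(current)
            if parent is not None:
                queue.extend(parent.derived_from)
        return seen


# Hypothetical datasets and locations, for illustration only.
catalog = DataCatalog()
catalog.register(CatalogEntry("raw_orders", "lake://landing/orders", "Orders as received"))
catalog.register(CatalogEntry("ref_customers", "mdm://customers", "Customer reference data"))
catalog.register(CatalogEntry(
    "clean_orders", "lake://curated/orders", "Deduplicated orders",
    derived_from=["raw_orders"], transformation="deduplicate on order id",
))
catalog.register(CatalogEntry(
    "monthly_revenue", "warehouse://finance/monthly_revenue", "Revenue per month",
    derived_from=["clean_orders", "ref_customers"], transformation="join and aggregate",
))

print(catalog.locate("monthly_revenue"))
print(catalog.lineage("monthly_revenue"))
```

The lineage lookup is what supports data quality here: when a figure in `monthly_revenue` looks wrong, the catalog tells you which upstream datasets and transformations to inspect.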


Company mergers and information system consolidations are among the drivers behind the adoption of these catalogs. The market for them is still fragmented because the vendors involved come from very different backgrounds.


The importance of reference data remains undeniable: data quality depends on building reference data clearly and using it properly to feed the entire information system. This field, which has existed for some fifteen years, remains at the center of discussions and work, which confirms its place in a cross-functional information system strategy.


Data Insights


Data analysis and data science remain key areas of the data field and are undergoing modernization.


It is not always easy to be creative if technology places too many constraints on how we implement our use cases. What analysts want now are ways to quickly and easily apply statistical algorithms to their data, so they can build models rapidly and immediately determine their reliability.


Although knowledge of Python and R remains essential, the use of Data Science-in-a-box platforms such as Tibco Data Science, Dataiku, and Alteryx can speed up analysis. The challenge is to ensure the reliability of the model and quickly detect correlations between variables.


If data is readily available (particularly thanks to data virtualization technologies) and the question is no longer how to build a model (Python or Alteryx does it for you), then it becomes easier to run multiple tests to find usable models (for prediction or prescription purposes).
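As a rough illustration of this kind of quick screening, the sketch below checks how strongly each candidate variable correlates with a target, then estimates how reliable a simple linear model is on data it was not fitted on. It uses only the Python standard library. The variable names and the figures are invented for the example, and a real project would typically rely on a dedicated library or platform instead.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def holdout_r2(xs, ys, train_size):
    """Fit on the first train_size points, then score R^2 on the remaining ones."""
    slope, intercept = fit_line(xs[:train_size], ys[:train_size])
    test_x, test_y = xs[train_size:], ys[train_size:]
    mean_y = sum(test_y) / len(test_y)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(test_x, test_y))
    ss_tot = sum((y - mean_y) ** 2 for y in test_y)
    return 1 - ss_res / ss_tot


# Invented toy data: one target and two candidate explanatory variables.
sales = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 15.9]
candidates = {
    "ad_spend": [1, 2, 3, 4, 5, 6, 7, 8],
    "temperature": [5, 1, 4, 2, 6, 3, 5, 2],
}

# Screen every candidate in one pass: correlation first, then holdout reliability.
results = {
    name: (pearson(values, sales), holdout_r2(values, sales, train_size=5))
    for name, values in candidates.items()
}
for name, (corr, r2) in results.items():
    print(f"{name}: correlation={corr:.3f}, holdout R^2={r2:.3f}")
```

Because each test costs a few lines, trying another idea is just one more entry in `candidates`. A holdout R² close to 1 suggests the model generalizes on this toy data, while a negative value means it predicts worse than simply using the mean.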


Reducing the time spent building models helps maintain team creativity, and technology can thus support ideation sessions where it becomes easier to test all the ideas that come to mind.


This also frees up time for DataViz, which remains a field in its own right and covers more advanced methodological and educational aspects than before. Storytelling has become a subject in its own right, and it is important to know how to build a narrative around visual representations and use it to communicate hypotheses.


Data Architecture


The data value chain remains simple, but the tools that comprise it have multiplied and become more complex. Despite the emergence of de facto standards, it is important to maintain in-depth knowledge of the tools, platforms, and their positioning in order to build high-performance data acquisition and exploitation chains capable of covering all use cases.


In this context, several building blocks currently stand out:


  • Storage, where technologies are diversifying. Our recent studies on VoltDB, MongoDB, and DataStax show the dynamism that exists in this sector and the need to closely monitor its evolution.

  • Streaming, for real-time data delivery to consumers.

  • Data virtualization, which creates secure, up-to-date business views without creating new databases, and exposes these views in the form of APIs, reinforcing the API strategy of the information system.


These platforms are consolidating and expanding to provide turnkey development workbenches that improve access performance, data governance, and data quality.
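The data virtualization building block listed above can be pictured with a small sketch: a business view that stores nothing itself, reads the source systems at query time, and exposes only an approved set of fields. Everything here is an assumption made for illustration, including the `VirtualView` class, the two stand-in source lists, and the field names. Actual virtualization platforms add query optimization, caching, and access control that this sketch ignores.

```python
# Hypothetical source systems. In practice these would be live queries against
# existing databases or services; plain lists stand in for them here.
CRM_CUSTOMERS = [
    {"customer_id": 1, "name": "Acme", "email": "billing@acme.example"},
    {"customer_id": 2, "name": "Globex", "email": "finance@globex.example"},
]
BILLING_INVOICES = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 50.0},
]


class VirtualView:
    """Business view computed at read time; it keeps no copy of the data."""

    def __init__(self, customers, invoices, exposed_fields):
        self._customers = customers          # callable returning customer rows
        self._invoices = invoices            # callable returning invoice rows
        self._exposed_fields = exposed_fields

    def rows(self):
        # Sources are re-read on every call, so results reflect their current state.
        totals = {}
        for invoice in self._invoices():
            cid = invoice["customer_id"]
            totals[cid] = totals.get(cid, 0.0) + invoice["amount"]
        for customer in self._customers():
            row = dict(customer, total_invoiced=totals.get(customer["customer_id"], 0.0))
            # Only approved fields leave the view (the email is withheld here).
            yield {key: row[key] for key in self._exposed_fields}

    def get(self, customer_id):
        """What an API endpoint backed by this view might return for one customer."""
        for row in self.rows():
            if row["customer_id"] == customer_id:
                return row
        return None


view = VirtualView(
    customers=lambda: CRM_CUSTOMERS,
    invoices=lambda: BILLING_INVOICES,
    exposed_fields=("customer_id", "name", "total_invoiced"),
)

before = view.get(1)
BILLING_INVOICES.append({"customer_id": 1, "amount": 25.0})  # the source changes
after = view.get(1)                                          # the view follows
print(before)
print(after)
```

The second lookup picks up the new invoice without any reload or copy step, which is the property that keeps such views up to date without creating a new database.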


We would be remiss not to mention the industrialization of data engineering chains. It requires a high degree of specialization and in-depth consideration of roles and processes, and therefore of data management governance. We addressed this subject in a white paper six years ago; it requires a complete revision and will be updated by the end of the year.


To help our clients build and deploy a data strategy that serves all users, it is important to consider all of these aspects and the construction of these new-generation platforms from the perspective of a comprehensive, iterative program focused on data quality as a cross-functional discipline.

© Gabriel Greenfield

