How a data exchange platform eases data integration

By David Boskovic

What has always fascinated me about Moore’s law is that for more than half a century, the technological computing innovations we take for granted—from the PC to smart watches to self-driving cars—hinged on solving one small, specific problem: the distance between transistors on a chip. As our software-powered world becomes more and more data-driven, unlocking and unblocking the coming decades of innovation hinges on data: how we collect it, exchange it, consolidate it, and use it.

In a way, the speed, ease, and accuracy of data exchange have become the new Moore’s law.

TL;DR: Safely and efficiently importing a myriad of data file types from thousands or even millions of different unmanaged external sources is a pervasive, growing problem. Most organizations struggle with file import because traditional ETL (extract, transform, and load) and iPaaS (integration platform-as-a-service) solutions are designed to transfer data only between tightly managed IT systems and databases.

Below, I’ll explain what data import is and the common problems companies face in taming unmanaged files. I’ll discuss how emerging new data exchange platforms are designed to solve these problems and how these platforms work individually and in tandem with traditional ETL solutions to make them faster and more agile.

Data import (aka “data exchange”)

Data import is the safe, fast, and efficient upload of data files that arrive from unmanaged external sources, come in a wide range of formats, and typically need mapping, cleanup, and validation before they can enter managed systems.

Six data file exchange challenges

Data files often require data mapping, review, cleanup, and validation, and they may need human oversight before they can be imported into managed databases and business systems. This work presents developers and IT teams with a variety of challenges.

Data import workarounds vs. a purpose-built data exchange solution

Most IT teams rely on a range of workarounds to bring data files into their business, usually with significant data quality issues and at a high cost. Businesses attempt to solve these data file issues by hiring outside IT services teams, using end-user templates and rules, or building a custom solution.

Beyond the direct costs of personnel and maintenance required for these workarounds, the opportunity cost of lost and delayed revenue vastly increases the true cost of manual data import. A data exchange solution will streamline, accelerate, and secure data import processes, improving business velocity and delivering rapid and sustained ROI.

The right solution will streamline the entire import workflow, from parsing and validation through transformation and delivery into downstream systems.

Build vs. buy (or a mixture of both)

In addition to building a file importer from scratch, companies can draw on several open-source libraries and commercial solutions to complete their enterprise data integration architecture. Building is always a long-term commitment and will entail developing new features as file import needs change (such as adding new languages, or navigating regulatory concerns that may come with supporting a new customer), on top of supporting and maintaining the tool over time.

Some companies opt to buy a CSV import tool, choosing among the many options that have emerged in recent years. These tools offer basic functionality but are typically limited to a narrowly defined use case and cannot address the varied and evolving needs of the enterprise.

The third option is a “build with” approach that provides the functionality and scalability of software, together with the flexibility to meet an organization’s specific business needs. An API-based file import platform enables developers to build fully customizable data file import, using code to drive business and data logic without having to maintain the underlying plumbing.
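
To make the “build with” model concrete, here is a minimal, hypothetical sketch of the division of labor: the developer expresses the target schema and a business rule in ordinary code, while the platform (not shown here, and not any specific vendor’s API) is assumed to handle the upload UI, parsing, progress tracking, and storage. The field names and the email rule are purely illustrative.

```python
# Illustrative sketch of the "build with" split: the developer owns the schema and
# business-logic hooks below; a hypothetical import platform would own the plumbing.

SCHEMA = [
    {"key": "company_name", "label": "Company Name", "required": True},
    {"key": "email", "label": "Email", "type": "email"},
]

def on_record(record):
    """Hook the platform would call for every incoming row."""
    warnings = []
    if record.get("email", "").endswith("@gmail.com"):
        # Example business rule: flag rather than reject, so the end user can fix it in place.
        warnings.append("Business email preferred")
    return record, warnings

# The platform would invoke the hook during import; calling it directly shows the effect.
print(on_record({"company_name": "Acme", "email": "jane@gmail.com"}))
```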

Whether an organization DIYs it, outsources it, or builds with a platform, there are certain basic functions that any data exchange solution needs to support.

Data parsing is the process of aggregating information (in a file) and breaking it into discrete parts. A data parsing feature transforms a file into an array of discrete data and streamlines this process for end users. Along with parsing, proper data structuring ensures that data is received into the system and labeled appropriately. APIs expect a specific format of data and will fail without it.
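
As a minimal sketch of parsing and structuring, the snippet below uses Python’s standard csv module to turn a delimited file into an array of labeled records; the file name and column names are illustrative only.

```python
import csv

def parse_csv(path):
    """Parse a delimited file into an array of discrete, labeled records."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)  # the header row supplies the labels
        return [dict(row) for row in reader]

# Each row is now a labeled record, e.g. {"Company Name": "Acme", "Email": "ops@acme.com"},
# rather than raw text in a file, which is the shape a downstream API expects.
records = parse_csv("customers.csv")
```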

Data validation involves checking the data to ensure it matches an expected format or value, preventing issues from occurring down the line and eliminating the need for your end users to remove and re-upload data. After validation, data mapping and matching refer to taking the previously unknown source data and matching it to a known target. Without data mapping, imports will fail when data elements—such as column headings—do not match exactly.
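
To illustrate, here is a hedged sketch of mapping and validation: variant column headings are matched to a known target schema, and each value is checked against an expected format before import. The header synonyms, the required company_name field, and the email rule are assumptions made for the example.

```python
import re

# Known target fields and the source header variants we are willing to match to them.
HEADER_MAP = {
    "company_name": {"company name", "company", "organization"},
    "email": {"email", "e-mail", "email address"},
}

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def map_headers(raw_headers):
    """Match previously unknown source headers to known target fields."""
    mapped = {}
    for header in raw_headers:
        for target, variants in HEADER_MAP.items():
            if header.strip().lower() in variants:
                mapped[header] = target
    return mapped

def validate(record):
    """Return a list of problems instead of failing the whole import."""
    errors = []
    if not record.get("company_name"):
        errors.append("company_name is required")
    if record.get("email") and not EMAIL_RE.match(record["email"]):
        errors.append(f"invalid email: {record['email']}")
    return errors
```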

Data transformation involves making changes to data as it flows into the system to ensure it meets an expected or desired value. Rather than sending data back to users with an error message, the data undergoes small, systematic tweaks to ensure that it is usable.
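
A minimal, assumed example of that kind of in-flight tweak: rather than rejecting a row, the sketch below normalizes whitespace, casing, and a phone number format so the record arrives in a usable state. The email and phone fields are illustrative.

```python
import re

def transform(record):
    """Apply small, systematic fixes instead of bouncing the row back to the user."""
    cleaned = {}
    for field, value in record.items():
        cleaned[field] = value.strip() if isinstance(value, str) else value
    if cleaned.get("email"):
        cleaned["email"] = cleaned["email"].lower()
    if cleaned.get("phone"):
        # Keep digits only, e.g. "(555) 123-4567" -> "5551234567".
        cleaned["phone"] = re.sub(r"\D", "", cleaned["phone"])
    return cleaned
```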

Data in / data out refers to all the ways data can be moved into and out of the tool. It can be as simple as downloading and uploading or as complex as automating imports and posting exports to an external API. Data ingress and egress should align with an organization's operational needs.
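
As a sketch of the automated end of that spectrum, the snippet below sweeps a drop folder for incoming files and posts their records to an external API using only the standard library. The folder path and endpoint URL are placeholders, and parsing, mapping, validation, and transformation would slot in between ingestion and export.

```python
import csv
import json
import pathlib
import urllib.request

INBOX = pathlib.Path("data/inbox")                  # assumed drop folder for incoming files
EXPORT_URL = "https://api.example.com/records"      # assumed external endpoint

def export_records(records):
    """Post processed records to a downstream system as JSON."""
    payload = json.dumps(records).encode("utf-8")
    req = urllib.request.Request(
        EXPORT_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

def run_once():
    """One automated pass: ingest every waiting file, then export its records."""
    for path in INBOX.glob("*.csv"):
        with open(path, newline="", encoding="utf-8-sig") as f:
            records = list(csv.DictReader(f))
        export_records(records)
```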

Performance at scale and collaboration among multiple users are imperative. What might suffice in the short term can swiftly devolve into a sluggish system unless you consider future requirements.

Security, compliance, and access functionalities ensure that the data import solution functions smoothly, aligns with regulatory requirements, safeguards data integrity, and increases transparency. These elements form the foundation of a trustworthy and dependable file import tool.

ETL + data import = stronger together

Data exchange and import solutions are designed to work seamlessly alongside traditional integration solutions. ETL tools integrate structured systems and databases and manage the ongoing transfer and synchronization of data records between these systems. Adding a data-file exchange solution next to an ETL tool lets teams seamlessly import and exchange variable, unmanaged data files.

The data exchange and ETL systems can run on separate, independent, parallel tracks, or they can be chained so that the data-file exchange solution feeds restructured, cleaned, and validated data into the ETL tool for further consolidation in downstream enterprise systems.
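
As a rough sketch of the chained arrangement, assume the data exchange layer produces clean, validated records and the ETL tool picks up whatever lands in a staging location it already watches. The staging path, batch name, and field names below are illustrative.

```python
import csv
import pathlib

STAGING = pathlib.Path("warehouse/staging")   # assumed location the ETL tool already watches

def hand_off_to_etl(records, batch_name):
    """Write validated, restructured records where the ETL pipeline picks them up."""
    STAGING.mkdir(parents=True, exist_ok=True)
    out_path = STAGING / f"{batch_name}.csv"
    fieldnames = sorted({key for record in records for key in record})
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    return out_path

# The data exchange platform handles the messy, unmanaged files; the ETL tool
# consolidates the cleaned output into downstream enterprise systems.
hand_off_to_etl([{"company_name": "Acme", "email": "ops@acme.com"}], "customers_batch_01")
```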

A data exchange platform integrated with a traditional ETL tool offers several advantages in managing and transferring data.

Combining a data exchange platform with an ETL tool will create a modern data integration and management ecosystem that enables companies to make better use of all of their data and start reaping the benefits of the new Moore’s law.

David Boskovic is the founder and CEO of Flatfile.
