New trade body wants to license training data for AI use

Seven companies that license music, images, videos, and other data used for training artificial intelligence systems have formed a trade association to promote responsible and ethical licensing of intellectual property.

The issue has become a concern for builders of generative AI models and the enterprises that use them, as some data sets used in AI training have legally and ethically uncertain origins. Musicians, authors, and actors are complaining about unauthorized use of their works, voices, and likenesses in such training data sets, and website operators are fighting to stop AI companies from scraping their content.

The founders of the Dataset Providers Alliance (DPA) include Rightsify, Global Copyright Exchange (GCX), vAIsual, Calliope Networks, ado, Datarade, and Pixta AI.

The companies aim to standardize the licensing of intellectual property for AI and ML datasets, promote ethical data practices, facilitate industry collaboration, advocate for content creators’ rights, and support innovation in AI and ML technologies while protecting intellectual property, they said in a statement.

“The DPA will serve as a powerful voice for dataset providers, ensuring that the rights of content creators are protected while AI developers get access to large amounts of high-quality AI training data,” Alex Bestall, CEO of Rightsify and GCX, said in the statement.

Implications for AI companies

AI companies have been training their models using vast quantities of content, often sourced from the internet without the consent of the original creators or rights holders, leading to numerous disputes.

Additionally, there are growing concerns over the unauthorized digital replication of individuals’ voices or likenesses. A prominent example involved Scarlett Johansson, who claimed that an OpenAI bot’s voice closely resembled hers.

In response to such issues, US lawmakers introduced the NO FAKES Act last year and the Generative AI Copyright Disclosure Act this year. Trade associations like the DPA may play a role in supporting the enforcement of such legislation and advocating for other similar measures.

But while they emphasize the need for transparency and responsible AI practices, these regulations can raise compliance costs and necessitate operational adjustments, says Charlie Dai, VP and principal analyst at Forrester.

“For instance, to comply with the Generative AI Copyright Disclosure Act, organizations will need to allocate workforce and budget for tracking and reporting copyrighted content, ensuring transparency, and complying with the disclosure requirements,” said Dai. “They must also introduce operational processes to document and disclose copyright-related information during dataset creation.”

Effective risk management will be crucial for addressing legal and reputational risks, and innovation strategies may require adjustments to comply with regulatory standards. The situation could become even more complex for multinational companies.

Swapnil Shende, associate research manager for AI at IDC Asia/Pacific, sees complications for multinational organizations. “While established markets like the US and Europe lead the way in setting regulatory standards that may influence other countries, each nation will have to customize its rules to fit local markets and capitalize on strengths,” Shende said. “This regulatory diversity presents challenges for multinational firms operating across borders, as they must navigate varying compliance requirements while striving for consistency.”

Strategy adjustments required

With a potential increase in demand for licensed data amid ongoing copyright disputes, enterprise tech companies may need to adjust their strategies for acquiring and using training data to mitigate legal and financial risks.

Dai suggested that AI security and governance leaders should align with business strategy and develop comprehensive risk mitigation frameworks. These frameworks should identify, evaluate, and address potential risks in AI projects and initiatives.

“Specifically, they should not only consider platforms and practices that ensure continuous data security and governance for AI-driven enterprises but also implement robust security measures to safeguard sensitive data and comply with regulations, revisiting the capabilities of their data and AI vendors on AI compliance in the meantime,” Dai said.

Shende added that enterprises should prioritize licensed data from compliant providers and verify ownership with clear contracts and indemnification clauses.

“By embracing rigorous standards for data sourcing and management, enterprises can set new industry benchmarks, enhance their operational integrity, and build greater trust with consumers and regulatory bodies – their ongoing engagement and innovation in ethical AI practices will be crucial in achieving sustainable growth and maintaining a competitive edge in the technology sector,” Shende said.

© Foundry