Seminar Report - Text and Data Mining Seminar, Brussels
A lively and very well attended PDLN seminar was held on 21st November on the topic of text and data mining, and the implications of Article 4 in the DSM Copyright Directive. Conversation ranged widely over many related topics, reflecting member interest in how implementation will change licensing models.
Article 4 creates a new right for commercial organisations (e.g. Meltwater) to copy news for data mining. This can only be managed if news organisations clearly mark their free web content as protected (as explained in the PDLN TDM briefing paper). The directive encourages this to be done digitally. Angela Mills Wade of EPC showed the Copyright Hub's proposed approach to marking websites and web pages, which uses a widget to provide this marking. The presentation is on the PDLN site. It was suggested that this be developed into simple, short and clear advice for PDLN members to use in briefing publishers. A case study or an early adopter would be helpful.
Protecting content implicitly requires a licensing model and a business case that publishers will respond to. Valuing the opportunity, or the risk of media monitoring moving from clippings to analysis based on TDM, was discussed at length. CLA and Belga were especially keen to develop clearer models as the basis for further action. Both are open to discussion, with CLA proposing a comprehensive research exercise. They and others noted that there is a direct corporate application as well as intermediary (MMO) licensees.
There was considerable discussion on the progress of national web content licensing models. NLA, CLA, CFC, Belga and NLI have working models. CFC now have 11 companies licensed. Some companies continue to evade and avoid paying. The general consensus is that fees based on a per-client charge (with or without weighting by client size) are needed, given that the value is in the data copied by the provider rather than the limited copyright value in material actually passed to users. Several reported challenges in separating evaluation rights from copying rights.
CLA addressed the wider value of TDM in anti-piracy initiatives and in extracting greater value from data held by CMOs. Google's API is a resource, as well as a concern.
Subjects for further work include Article 17, some of the other DSM sections (e.g. out of commerce) and (as noted above) business models and value. PDLN was also asked to ensure that data on wider implementation is shared, noting that not all publisher associations and countries had applied, or were applying, resources to this.
Presentations from CLA and CFC are available in the members-only area of the website.