
Christine Borgman is one of my scholarly heroines. With her fine nose for current developments in e-scholarship and digital information retrieval, and her thorough and concise way of communicating these issues (she is, after all, a specialist in scholarly communication) via monographs, articles and lectures, she definitely belongs in my scholarly all-star gallery. Her latest book, Scholarship in the Digital Age, was an indispensable resource for me when writing my Master’s thesis on the Scholarly Communication System and Open Access.

So I was really glad I found this lecture (which I can’t embed, sorry) by Christine Borgman online, in which she discusses most of her main topics: cyberinfrastructure and e-science, Open Access, the data deluge, collaborations and intellectual property and the scholarly communication value chain. The lecture is entitled Scholarship in the Digital Age: Information, Infrastructure, and the Internet, and was delivered at Columbia University.

I also found, via Open Access News, this podcast with Alma Swan, Key Perspectives’ main consultant on Open Access, Scholarly Communication and Academic Publishing. In this podcast, conducted by Sara Bartlett from Talis, she discusses, among other things, the current state of and difficulties concerning e-books or digital monographs in the Humanities and Social Sciences, the main subject of my current research for the OAPEN project. I especially like her closing recommendation that we stop the ‘pillarization’ of the Open Access focus: OA journals, OA books and OA data are mainly targeted as separate issues by separate initiatives, whilst they need to be combined to create a truly interconnected collaborative scholarship.

Alma Swan also maintains a weblog, Optimal Scholarship, and has been interviewed before by Richard Poynder. You can find that interview here.

‘Beauty is pregnant with potentiality’ – Bracha Ettinger

 

Again, delving deeper into the rabbit hole, let’s try to untangle the concepts in the web of free knowledge definitions.

In the previous post we mainly discussed the difference between free information and free knowledge. But we were not quite finished. We were still basically stuck when we hit the Cyberpunk definition, which gave information an inherent entelechy towards freedom, turning it in a way into an active agent.

 

But maybe we shouldn’t interpret the cyberpunk aphorism ‘information wants to be free’ in such a way. For as we established before, information in itself is not active. Information needs an agent. If we again look closer at the DIKW definition, we find that knowledge is the appropriate collection of information and is thus deterministic. Information has (or can have) use and meaning, but only becomes knowledge when it is ‘made active’, when it is put to use, involving an action/actor. In other words: information without action might have meaning and may be useful, but only when it is put to use can it become knowledge. As the definition says, the intent of knowledge is to be useful; information does not have this intent, it only has the potential.

Information needs an action/actor to combine information into knowledge: to give it meaning in context.

 

Looked at in this way, the cyberpunk claim that information should be free, or wants to be free, can be interpreted as: free in order to be able to become knowledge. And this is a possibility the web increasingly offers.

 

Now this potentiality of information entails two things:

 

- It entails an actor, who acts upon the information, collecting and combining it in such a manner and applying it to the appropriate context so that it can become knowledge. Since it is the actor (or actors, for of course on many occasions it is groups of people working together who turn information into knowledge) who is responsible for the creation of this knowledge, it is in a way his or her interpretation, combination and contextualization of the information. This explains why people have moral rights or even claim copyright or intellectual ownership over their active creation of knowledge out of information (if they publish it, that is; there is no such thing as copyright on thoughts, unfortunately one thinks sometimes…).

 

- It also entails a movement, a dynamic, as already expressed in the Cyberpunk definition. Not a dynamic inherent in information, however, but a dynamic from information towards knowledge (a force in between information and knowledge, in a way). It is the potentiality itself that creates the dynamic, the pull towards knowledge. As the Cyberpunk movement argues, the digital age and the coming of the Internet, which has freed information from its mostly physical and print-based constraints, have enlarged this potentiality of information enormously, making the dynamic or movement seem in a way more urgent, or more logical. In this way one can say that it is the digital age that makes information want to be free.


As a final remark, what is interesting in even more recent developments is that the actor can now also be a computer: with the rise of the semantic web the computer can turn information into knowledge, or at least into networked information, conceptualizing and contextualizing it and thus combining it in a useful manner. Maybe this means information and communication technologies as well as social media are adding a new layer to the DIKW hierarchy: Connectionism, or connected, interlinked information. As Kevin Kelly says when speaking about social media: ‘we are connecting everything to everything’, and: ‘when connected into a swarm, small thoughts become smart.’

As promised before, I would like to dig a little deeper into the meaning and complexities of the concept of free information, referring to the well-known aphorism ‘information wants to be free’. No better way to start than by throwing in some good old definitions we can all find scattered on the web.

So stay tuned (or run away screaming) for Everything you always wanted to know about… free information, free knowledge, open knowledge, gratis knowledge, open science, open access, open content, copyright, copyleft and Creative Commons and, most importantly, what the difference is between the lot of them.

 

First of all we need to establish the difference between information and knowledge. Often a distinction is made between data, information, knowledge and wisdom. Together they form the knowledge hierarchy. One can find a lot of sources for the origin of this model, but it seems the original distinction was born in poetry. As T.S. Eliot wrote in 1934 in the opening stanza of the choruses from his play “The Rock”:

 

Where is the Life we have lost in living?

Where is the wisdom we have lost in knowledge?

Where is the knowledge we have lost in information?

 

Milan Zeleny (who quotes Albert Einstein on his website with ‘information is not knowledge’, although I can find no source for this quote online nor can others) and Russell Ackoff are both said to have expanded the DIKW definition. The difference between the concepts comes down to something like this:

 

“Data… data is raw. It simply exists and has no significance beyond its existence (in and of itself). It can exist in any form, usable or not. It does not have meaning of itself. In computer parlance, a spreadsheet generally starts out by holding data.

Information… information is data that has been given meaning by way of relational connection. This “meaning” can be useful, but does not have to be. In computer parlance, a relational database makes information from the data stored within it.

Knowledge… knowledge is the appropriate collection of information, such that it’s intent is to be useful. Knowledge is a deterministic process. When someone “memorizes” information (as less-aspiring test-bound students often do), then they have amassed knowledge. This knowledge has useful meaning to them, but it does not provide for, in and of itself, an integration such as would infer further knowledge. For example, elementary school children memorize, or amass knowledge of, the “times table”. They can tell you that “2 x 2 = 4” because they have amassed that knowledge (it being included in the times table). But when asked what is “1267 x 300”, they can not respond correctly because that entry is not in their times table. To correctly answer such a question requires a true cognitive and analytical ability that is only encompassed in the next level… understanding. In computer parlance, most of the applications we use (modeling, simulation, etc.) exercise some type of stored knowledge.” (source: http://www.systems-thinking.org/dikw/dikw.htm)

Important in this respect is that both information and knowledge are in this definition already seen as processed data. However, knowledge has a utilitarian streak to it, for, as the above definition says, its intent is to be useful. Information does not have this intent. It is just there; you can do with it what you want and turn it into knowledge in whatever fashion you see fit. Information already has meaning conveyed in it but lacks, in a way, direction: it needs an actor. The most important aspect to remember in this respect is that knowledge is information that has been put to use, adding another layer of value to the raw data (hence one also speaks of the DIKW pyramid). This is an important difference which we will come back to later.

Now that we have established the difference between knowledge and information, let’s go back to the free-bit. What is the difference between free information and free knowledge?

When one googles free information, Wikipedia brings up ‘information wants to be free’, the slogan first coined by Stewart Brand. It has been, as Wikipedia says, given a normative spin by hacker Richard Stallman:

“I believe that all generally useful information should be free. By ‘free’ I am not referring to price, but rather to the freedom to copy the information and to adapt it to one’s own uses… When information is generally useful, redistributing it makes humanity wealthier no matter who is distributing and no matter who is receiving.”

Interestingly enough, Stallman here combines usefulness with information, into useful information. Is he talking about knowledge here? Not necessarily. Information can of course be useful (as also mentioned in the definition above), but only when it is put to use (again by an actor, requiring an action) does it become knowledge. Information is always also potential knowledge, and this is what makes it potentially valuable (in the right hands). The Cyberpunk movement goes even further according to Wikipedia, arguing that information and knowledge should be free since ‘its internal force or entelechy makes it essentially incompatible with proprietary notions. Information is dynamic, ever-growing and evolving and cannot be contained within (any) ideological structure.’

I like the way Wikipedia says that with these notions ‘desire’ is put into information, it is brought to life in a way, craving freedom, needing to be liberated. But if one makes information come to life in this fashion (giving it an internal drive), doesn’t one also create a sort of living Frankenstein, putting agency in a lifeless abstract and in itself (though not potentially) useless concept? Doesn’t the Cyberpunk movement in this way destroy the distinction between information and knowledge? Or do they simply not approve of the above definition? Or do cyberpunks see themselves as the agency ‘liberating’ information, in the form of the so-called hacker?

And what exactly is the difference with free knowledge?

I will get back to that later.

On the second and final day of APE, Sebastian Mislej of the Jozef Stefan Institute in Ljubljana talked about videolectures.net, a website streaming online video lectures that can be viewed for free. All the content on videolectures.net is scientifically approved (it has been peer reviewed), so in its entirety it forms a scientific repository of free content from top conferences. The site is hosted and supported by well-known academic institutions; its content partners and contributors include, for instance, MIT OpenCourseWare, the Mellon Foundation and the University of Cambridge. The site makes use of semantic web applications and additional functionalities like streaming video with synchronized slides. They also add links to other resources. As Mislej states, the website functions as a learning culture: links that are necessary to understand the topic can and will be added. In this way videolectures.net can be seen as a kind of scientific YouTube, a portal to high quality scientific video content on the web.

As Mislej explains, 99% of all people giving a lecture are very interested in putting their video online. Publishers of conference proceedings are also positive, since it is good promotion for their content. In the future videolectures.net will improve the web portal (redesign, improved navigation, automatic knowledge object linking), extract semantic information (speech indexing, text mining, video mining, automatic ontology construction, user tracking and profiling) and focus on solving some important issues regarding intellectual property on content, as well as problems regarding video formats, mobile platforms and accessibility.

 

Hans Pfeiffenberger, from the Helmholtz Association, gave a lecture on publishing data, focusing specifically on “Earth System Science Data”, a data publishing journal. As Pfeiffenberger stated, in polar research the incentive is to preserve the data and their meaning for centuries into the future. This data preservation is best achieved by publishing the data. The question is how publishing can help comply with the requirement of quality assurance for research data. As Pfeiffenberger remarks, there are of course different kinds of data, and this means we will also need different methods to take care of them. According to him, review guidelines for data should focus on originality, significance and data quality. The peer reviewers will have to look into the data itself and assess its quality and its connection to the article. Articles can then be seen as interpretations of the data.

Pfeiffenberger states that it is also important to have incentives for researchers to publish data. We need to have rewards for data publication, it needs to be citable and it needs to be part of the impact factor. And, as stated before, it needs to be quality assured data. Preservation and (open) access to data are also critical issues. The aim should be to reuse and reproduce the data. The data will be provided by the scientists but who will provide the infrastructure? And what about licensing and long-term preservation? Pfeiffenberger concludes that these are issues that we will need to consider in the future.


 

During the afternoon panel on Open Books, three panellists from the publishing world, Eelco Ferwerda from Amsterdam University Press, Frances Pinter from Bloomsbury Academic and Barbara Kalumenos from STM Publishers, were asked to discuss two questions.

The first question focused on the Academic book in the digital age: What it is now and in 5 years – what will users expect?

Eelco Ferwerda first said a few introductory words about the OAPEN project and afterwards replied to the first question by stating that users will in the future expect to find, access and search within books online. He stated that it was Google that changed the whole idea of books for us. Because of Google, books have now become an integral part of the Internet and in this way have gained a new future. Now where is the book heading? Ferwerda recalls Robert Darnton’s pyramid model, in which the book is seen as a pyramid consisting of different layers: the book itself and comments, updates, e-learning, primary sources and datasets in other connected layers. Ferwerda gave the example of the Driver II project, focusing on enhanced publications, where research data, extra materials and post-publication data will be added to the primary publication. The moment scholars recognize the value of these types of additions, they will become the norm.

 

Frances Pinter from Bloomsbury Academic went on to compare the old publishing model with a new future model. The old model is based on printed content, on publishers as gatekeepers who verify and brand the work, and on publishers as bankers. In this model costs can be a barrier to dissemination, together with a limited range of formats. In the new model, however, she states that there can be multiple versions and formats of content, in different locations and channels. In this new model there will be competition with free versions and it will be uncertain who will pay for the publishing process.

One existing online business model revolves around publishers charging for the premium content and putting free content around the premium content in order to generate traffic. Pinter asks what would happen if you inverted that model. What if you offered the premium content online for free (with a CC license) and then charged for the activities around it, like the print edition and a variety of other services?

According to Pinter, this is what academic authors will want because they do not need publishers anymore. Publishers need to find new models that meet user needs whilst still upholding the system of quality and added value and the reward structure.


Barbara Kalumenos from STM Publishers states that STM has focused mostly on journals. The problem, as she sees it, with forecasts about digital books is that there is still far too little hard factual material available on digitized books. She also states that the term books is way too general: we need to differentiate between textbooks, monographs etc. and then focus on these categories specifically. What Kalumenos especially regrets is the lack of figures on the number of books that are already digitized. As she states, we need to do some basic empirical research on what the status quo is at the moment. Only then can we speculate about what will happen in five years. And it also depends heavily on the discipline. The users, however, are at the centre of this development. What does the user want? Users want their content to be easy to reach, directly and with very few clicks; as Kalumenos remarks, user interaction research has shown that you lose users after more than three clicks. This kind of research can also show how users search and interact with the material. Kalumenos doubts that monographs in HSS will only be used in digital environments, which means that web 2.0 tools will be developed for books too, but maybe not for the full catalogue of books. So she concludes that we should look at user behavior and at what users expect of electronic books in the digital age.

 

During the discussion that followed the first question, remarks were made about the book format, the development towards more article use and production in HSS, and the possible emergence of a middle category in between the article and the book.

In addition, the funding possibilities for monographs were discussed. As Eelco Ferwerda remarked, the book is different in this respect: the economic model of distributing books is becoming impossible. Academic publishers struggle to sell proper monographs. An Open Access motive for books might be to come up with a new business model to keep the book alive as a research format. Frances Pinter made the point that with books, whether they are expensive or not, we need to look at the actual costs of reading a book in print and online.

The second question focused on Open Access publishing models for books: How will they work, change scholarly communication and change the market?

Eelco Ferwerda starts by talking about the IMISCOE series. The basic Open Access model for this kind of series, he remarks, is a hybrid model, which focuses on both online and print. The basic online edition is free and the printed edition is sold, while the author retains the copyright. OAPEN wants to expand this model; they want to develop a common approach or model to fund the Open Access edition, in collaboration with research councils. In their view a network is needed and funders need to see this as a service. The funding model will revolve around a fee for direct costs and revenues from additional services.

 

As Frances Pinter remarks, the situation for Bloomsbury Academic is rather different. Bloomsbury Academic is a commercial company, so it needs to cover its costs completely. Print editions appear simultaneously with the online edition. They are offering traditional publisher services along with free online access and added value services.

As Pinter mentions, this is also a start-up: the added value services to sell around the content still need to be developed. What about licensing contracts? Authors no longer need the publishers, so we will no longer have exclusive licenses between author and readers in both ways. The big question is whether people will really pay for the added services. According to Pinter that is a risk for the publisher to take. But who is going to fund the added value services that the publisher provides? Pinter asks what would happen if we did not see publishers as the people who take the risk. What if they become more like the service arm for scholarly work? Pinter imagines an independent party that tenders between different publishers for a service contract to put in that added functionality. According to her, this would make for a more streamlined system.

Barbara Kalumenos, however, remarks that putting a layer between publishers, as Pinter suggests, might not work well in a system made up of all kinds of different country policies, and it will probably only lead to extra bureaucracy. Kalumenos thinks more along the lines of Open Access as part of the research costs. Open Access should be paid for as part of this process. Afterwards remarks were made concerning the downgrading of the publisher in these kinds of new models from value-adding risk taker to service provider. Important in this respect is the second process of reviewing, for the commercial sustainability of a monograph. This also adds value, for it helps to bring out the better publications. What will happen to this when publishers become service providers: what about the commercial filter needed to see if a book is fit to be published? Finally, Eelco Ferwerda remarks that publishers will always be in competition for content, based on their reputation.

Open Reflections is created by Janneke Adema
