The Samizdat Protocol
David Christie
In the early 1990s Tim Berners-Lee, a researcher at CERN, published specifications and reference code for a set of computer protocols for client/server communication over IP (Internet Protocol) networks. Berners-Lee's 'web' protocols (HTTP, HTML, and URI) revolutionized the use of the internet, until then a government-sponsored computer network, little known to the public, shared by universities and military-industrial R&D organizations. His inventions gave rise to web sites, web browsing, and web-based applications: a phenomenon its inventor named the World Wide Web.
Because Berners-Lee's protocols were published as 'open source', other contributors implemented software that employed them. In 1992 two programmers at NCSA, Marc Andreessen and Eric Bina, wrote a point-and-click web browser called Mosaic that could display images embedded in HTML text. Within six months it had captured much of the user base of the internet community. The web protocols had found their 'killer application'.
The history of the web after these beginnings is familiar to us all, because it rapidly became an element of everyday life, a venue of popular culture, and very big business. Its explosive growth was all the more remarkable for being almost completely unanticipated. It is less commonly remembered that its origins lie in a set of network protocols, and a few thousand lines of C code -- neither of which inventions was 'rocket science' -- constructed on the initiative of obscure, farsighted people with time on their hands.
The Samizdat Protocol project is an attempt to bring about another such tipping point in the information revolution. Fifteen years on, the World Wide Web is outgrowing its egalitarian beginnings. Despite the welcome emergence of blogging and new forms of grass-roots media, the trend is toward huge dominant web sites. Consolidation is rampant in every application domain. Google seems well-placed to soon displace Microsoft as the ruling monopoly, much as Microsoft displaced IBM around 1990. But this time the stakes for society are higher. With the increasing importance of the network in our new media, politics, economy, culture and our democracy itself, massive consolidation is disturbing. It has been hypothesized that the architecture of the web defined by HTTP is part of the reason it is so easy for web services to grow so big, until they squeeze out all competitors, and so hard for individuals to retain control over their lives and data online.
Ideally, we would now be at the point where the network begins to empower individuals as never before. The web shows signs of this – but also signs that the revolution may be thwarted by the usual forces that co-opt all new technologies once they figure out how they can be monopolized. To protect democracy’s beachhead on the internet, it’s time to think about changing the dominant paradigm once again. It’s time for a new protocol that will vastly expand what the internet does (in practice, for ordinary people), much as Berners-Lee’s web protocols did so successfully before.
Around 1985 a Yale computer scientist named David Gelernter published a protocol for network coordination which he named 'Linda'. Over the next decade, Gelernter's invention made a modest stir in the computer science community, found its way into the literature, and then faded from view. The most permanent artifact created was a luminous book written by Gelernter in 1991 entitled Mirror Worlds. In a style reminiscent of Bucky Fuller at his most lucid, he described in layman-accessible prose a transformed society linked by networked computers that modeled the whole society as a growing set of applications. Mirror Worlds remains oft-cited as "the book that anticipated the web". In fact, it anticipated something better than the World Wide Web, something we still don't have.
Gelernter imagined that the computers of the new networked society would use something like his 'Linda' protocol for coordination. Like Berners-Lee's HTTP protocol which followed it, 'Linda' had four main primitive operations out of which all communications were constructed. The similarity ended there. The HTTP protocol gives rise to hub-and-spoke, "client/server" communication between web clients (browsers) and web servers (sites where data and applications reside). Gelernter's protocol, on the other hand, put all the computers on the network, user machines and server machines alike, on an equal footing as "peers". It encouraged distributed processing of a vast, shared database and did not centralize control over the data being processed at specialized web sites, as HTTP tends to do.
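To make the four primitives concrete, here is a minimal sketch of a Linda-style tuple space. It is a toy that runs inside a single Python process using threads, not a network implementation, and it is not taken from Gelernter's own code. The class name TupleSpace, the use of None as a wildcard field, and the timeout parameter are assumptions made for illustration. Linda's operations are conventionally called out, in, rd and eval; in is a reserved word in Python, so it is spelled in_ here.

```python
import threading


class TupleSpace:
    """Toy in-process tuple space in the spirit of Linda (illustrative only)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(pattern, tup):
        # Assumption: None in a pattern is a wildcard that matches any value.
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def _find(self, pattern):
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def out(self, tup):
        """Deposit a tuple into the shared space."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    def rd(self, pattern, timeout=None):
        """Wait for a matching tuple and return it, leaving it in the space."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._find(pattern) is not None, timeout):
                raise TimeoutError("no matching tuple")
            return self._find(pattern)

    def in_(self, pattern, timeout=None):
        """Wait for a matching tuple, remove it from the space, and return it."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._find(pattern) is not None, timeout):
                raise TimeoutError("no matching tuple")
            tup = self._find(pattern)
            self._tuples.remove(tup)
            return tup

    def eval(self, func, *args):
        """Run func in a new thread; its return value is deposited as a tuple."""
        worker = threading.Thread(target=lambda: self.out(func(*args)))
        worker.start()
        return worker


if __name__ == "__main__":
    space = TupleSpace()
    space.out(("temperature", "kitchen", 21))
    space.eval(lambda a, b: ("sum", a + b), 2, 3)
    print(space.rd(("temperature", "kitchen", None)))
    print(space.in_(("sum", None), timeout=5))
```

Note that no participant addresses another directly: producers and consumers meet only through the shared space, which is what puts them on an equal footing as peers.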
It now appears that Gelernter’s protocol did not prosper (as Berners-Lee’s later protocol did) because the former was ahead of its time, and the latter was more completely specified. Linda’s elegant simplicity of architecture swept certain thorny unsolved problems under the rug. HTTP is carefully contrived to make all its limitations crystal clear, so it is easy to implement.
Both protocols lacked any means of specifying the semantics (the meaning) of data or the pieces of code that operated on it. Berners-Lee would come to see this as a shortcoming. He has spent much of his time in the last decade working with others to develop the “semantic web”, initiatives designed to make the meaning of web data universally accessible to programs. One fruit of this community has been a semantic database format called Resource Description Framework (RDF). Elegant and well-standardized, it has yet to find its ‘killer application’. Meanwhile, web sites become larger, and more proprietary, every day – as if they did not really wish to ‘coordinate’.
The Samizdat Protocol is our synergetic solution to this problem. It combines a Linda-inspired set of operations with an RDF database, and borrows other elements from other traditions. It uses the web as its underlying infrastructure and builds upon it. We call its Linda-inspired part Resource Coordination Protocol (RCP), and the combination RCP/RDF. Like HTTP/HTML before it, it consists of a set of operations applied to a data structure.
RCP differs from Linda in adopting RDF’s semantic model. Unlike either Linda or RDF, it incorporates a transactional actor model, which functions like a stored procedure mechanism for attaching code to data in a database. RCP/RDF data is not passive; it has behaviors. Unlike Linda, which did not specify what its programs did, RCP/RDF programs have well-described transaction semantics encoded in the RDF database ‘statements’ themselves.
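The idea of data with behaviors can be sketched roughly as follows. This is not the RCP/RDF specification or its reference implementation, neither of which is reproduced here. It is a hypothetical toy in which RDF-style statements are (subject, predicate, object) triples, an actor is a function attached to a statement pattern, and a write either commits together with everything its actors derive or is rolled back entirely. The names TripleStore, write, read, take, attach and symmetric_friend, and the ex: identifiers, are all invented for illustration.

```python
class TripleStore:
    """Hypothetical sketch: a set of triples with actors attached to patterns."""

    def __init__(self):
        self.statements = set()
        self.actors = []  # list of (pattern, actor function) pairs

    @staticmethod
    def _matches(pattern, stmt):
        # Assumption: None in a pattern is a wildcard for that position.
        return all(p is None or p == s for p, s in zip(pattern, stmt))

    def read(self, pattern):
        """Return matching statements without removing them."""
        return sorted(s for s in self.statements if self._matches(pattern, s))

    def take(self, pattern):
        """Remove and return matching statements."""
        found = self.read(pattern)
        self.statements.difference_update(found)
        return found

    def attach(self, pattern, actor):
        """Attach an actor, much like a stored procedure bound to a pattern."""
        self.actors.append((pattern, actor))

    def write(self, stmt):
        """Add a statement and run matching actors as one transaction."""
        snapshot = set(self.statements)
        try:
            self.statements.add(stmt)
            for pattern, actor in self.actors:
                if self._matches(pattern, stmt):
                    # Derived statements are added directly; in this toy they
                    # do not trigger further actors.
                    self.statements.update(actor(stmt))
        except Exception:
            self.statements = snapshot  # roll back the whole transaction
            raise


def symmetric_friend(stmt):
    """Example actor: friendship written one way is recorded both ways."""
    subject, predicate, obj = stmt
    if subject == obj:
        raise ValueError("a resource cannot be its own friend")
    return [(obj, predicate, subject)]


if __name__ == "__main__":
    store = TripleStore()
    store.attach((None, "ex:friendOf", None), symmetric_friend)
    store.write(("ex:alice", "ex:friendOf", "ex:bob"))
    print(store.read((None, "ex:friendOf", None)))
```

The design point is that the rule travels with the data: any agent holding these statements and this actor would apply the same transaction, whichever machine it runs on.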
RCP/RDF is not a computer language; like Linda it is language-independent. RCP/RDF does not replace HTTP/HTML; it uses it as a transport mechanism.
We have written a specification and reference implementation of RCP/RDF, and are preparing it for release under the Free Software Foundation’s AGPL open-source license. The reference implementation combines existing open source components with our protocol implementation and its actor model. The result is the Samizdat Model Agent, a system service that can run on a client or server machine. It is portable across all major operating systems. Each Model Agent maintains an RDF database and serves RCP/RDF requests. It is an application development platform that runs actors.
Installed on your client machine, your Model Agent acts as an invisible assistant for your web browsing and email transactions, maintaining your personal view of the web and relationships with correspondents. Installed at a server, it can implement services for many clients.
The advantage of installing a Model Agent at each machine is that the same program can run on many Model Agents which coordinate via RCP/RDF. This is what makes RCP/RDF different from HTTP/HTML and the web as it exists today. It is not necessary to own the machines on which it runs to program an actor. Any reputable application can run anywhere users accept it for use, and perform local database transactions. Private user data stays on the user’s own machine. Shared data is not dependent on a single service provider. Applications are not proprietary, and they are always open source.
The reference implementation has grown to ~100,000 lines of original C++ code. We plan to release it soon and document its use for the open source developer community. We are working on a ‘killer application’ for the personal Model Agent, to drive adoption by motivating users to download and install the agent.
The AGPL is a free license subject to ‘copyleft’, i.e. derived works must also be shared with the community. This approach retains many opportunities for commercial applications while it protects the community from monopolistic competitors. Nothing less open would facilitate adoption of the protocol, the greatest enabler of commercial success.
