Monday, June 27, 2011

Libre Web

At the risk of sounding like RMS (i.e. Richard Stallman, who believes in complete software freedom), I am becoming concerned that the Web is becoming "free as in beer" rather than "free as in freedom" (the word Libre is often used to capture the second meaning).

We pour our digital lives into Google apps, Facebook, Twitter and the like, but where is our data, what is it being used for, and what happens when they stop providing the service?  I went to add someone's birthday into my Android phone, which is synced to GMail, and it struck me how birth dates are often used as identity confirmation questions over the phone, and how recent hacker activity has seen user information dumped to globally accessible sites.  I decided not to put the birth date in, but realised that many of this person's family and friends may already have done so (and my own birthday is probably on various servers as well, despite anything I might do).  I don't use Facebook, but all my emails, contacts and calendar entries are on Google servers, so they probably have a lot of information on me should they choose to use it.  I am becoming more and more dependent on these services too, but what would happen if there was a huge hack or legal issue and the servers were shut down?  I have no protection or control, but at least it is free (as in beer).

So what is the answer?  Well, I would like the web services I use to have Open Source code, to be able to run on servers I nominate, and to store data where I want it, encrypted in a way that keeps it safe.  Like Linux, this ensures I have control over applications and data, and the code is there for anyone to fix, extend and modify so the overall ecosystem grows.  We can also see how the data is being used and be vigilant for backdoors and the like.

For this to work I think we need to pay for use of the hardware that provides the service.  I am thinking of something like Amazon but dedicated to running Libre Web services.  You would pay a fair price for what you use, but it could be kept quite low.  It should even be possible for a user's PC to do computation on behalf of the cloud and get paid for it.  You could set up your own dedicated Web Services or use shared ones that had well documented version numbers, references to source code, and administrators who were trusted and accountable.

Another important component would be keeping the data separate from the Web Service.  We would need a standard Data Service API for the Web Service to use, such that the user could configure which Data Server (or even multiple Data Servers) stored the user's data, which of course would be encrypted.  The user could then even arrange to back up the Data Server data to their own PC (an interesting reversal).  The user can thus protect themselves from both data loss and data misuse.
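As a sketch of what that separation could look like, here is a toy Python model.  Everything in it is invented for illustration (the class names, the API, and especially the XOR "encryption" stand-in, which is not a real cipher): the point is only that the web service talks to a generic data-store interface, so the user decides which server holds the bytes, and the server only ever sees encrypted blobs.

```python
# Toy sketch of a Data Service API: the web service reads and writes opaque
# (already-encrypted) blobs; the user chooses which data server stores them.
from dataclasses import dataclass, field

@dataclass
class DataServer:
    """A user-chosen store that only ever sees encrypted bytes."""
    name: str
    _blobs: dict = field(default_factory=dict)

    def put(self, key: str, encrypted_blob: bytes) -> None:
        self._blobs[key] = encrypted_blob

    def get(self, key: str) -> bytes:
        return self._blobs[key]

def toy_crypt(user_key: bytes, data: bytes) -> bytes:
    """XOR stand-in for real client-side encryption (NOT secure)."""
    return bytes(b ^ user_key[i % len(user_key)] for i, b in enumerate(data))

class WebService:
    """Application code that talks only to the Data Service API, so the
    user can point it at any server (or several)."""
    def __init__(self, data_server: DataServer):
        self.data_server = data_server

    def save_contact(self, user_key: bytes, name: str, details: str) -> None:
        self.data_server.put(f"contact/{name}", toy_crypt(user_key, details.encode()))

    def load_contact(self, user_key: bytes, name: str) -> str:
        return toy_crypt(user_key, self.data_server.get(f"contact/{name}")).decode()

# The user configures which server stores the data:
svc = WebService(DataServer("my-home-nas"))
svc.save_contact(b"secret", "alice", "alice@example.org")
```

Because the server stores only ciphertext, backing it up to your own PC (the reversal mentioned above) is just copying blobs.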

Another component that could do with improvement is digital payment (I have ideas about this too, so there is probably another blog post coming).  If it was easy, safe and accessible for everyone (e.g. those without credit cards) to pay for computer-based services, then people would do it a lot more readily.

With these components we would start seeing free (as in freedom) Web Services and Data Services pop up, with associated source code, to be used by people who value the benefits of services that run on the web but also value their freedom, and who don't mind paying for the hardware that runs it.

Software Development Documentation 2.0

How should we do software development documentation?  How do we capture and document requirements, Use Cases, Software Architecture, High Level Design, Low Level Design, code and associated comments, fault reports, test plans, test cases and results?  More importantly, how do we keep them consistent, up to date and properly reviewed?  How do we know whether a requirement change has filtered down to implementation and testing?  When reviewing documents, how do we know what changed since we last reviewed them, or tell whether the comments we made last time are reflected in the document?

In my experience we struggle to get all of this right.  We know documentation is important, but we also know it can be very expensive to keep accurate.  A seemingly small software change can require updates to many documents.  When time is tight (which it usually is) documentation can be neglected.  We also tend to combine documents so there are fewer documents to maintain.  Software Architecture and even High Level Design can creep into requirements-related documents.  High Level Design and Low Level Design can also end up in the same document.  Low Level Design can often look like code.

Lots of people have tried to solve these problems in different ways.  I remember CASE (Computer Aided Software Engineering) was going to make the world a better place, and maybe it has, but I haven't seen it.  Perhaps that is because CASE tools are so expensive and it's hard to do it right.  Agile methodologies seem to reduce the emphasis on documentation and update it iteratively, which sounds like a good thing and may be the most realistic way to go about it, but it must be hard for upper management and the bean counters to see what a project will cost.

So my idea is to get rid of the static, linear MS Word document, which provides only one way to look at our information, and replace it with a dynamic, non-linear, web-based repository.  This is not a new idea as far as viewing information goes, since we see it all the time when we access information via a browser.  We select what we want to see (e.g. via a search term in Wikipedia) and the information is fetched, transformed and presented.

So take all the documentation, break it up into discrete chunks, define what the chunks are and how they link to each other, and then serve them up in whatever view is useful to the user.  Even code could be integrated into this view of the documentation.  We would need to be able to export to PDF and probably MS Word to keep happy those people who still like to have linear documents.

Once the documentation is accessible in this manner, very powerful use cases become possible.  For example, imagine we have a number of chunks tagged as Requirements that are linked to other chunks tagged as Design and Test Cases.  Also imagine that the Requirement and Design chunks have attributes indicating version number and review dates, and the Test Case chunks have attributes indicating the number of outstanding defects at various levels.  Now imagine a project is associated with a number of Requirements, and how easy it would be to run a report on the status of the project with respect to completion and review of all required documentation artifacts, as well as outstanding defects.  If a Requirement changed, it would be straightforward to determine what else needed to change, for reviewers to be assigned, and for them to only need to review the particular requirements that have changed.
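The chunk-and-link idea above can be sketched in a few lines.  This is a toy model (the chunk IDs, tags and attributes are all made up for illustration): once documentation is a graph of tagged chunks, both the "what is impacted by this change?" question and the status report become simple traversals.

```python
# Toy chunk graph: tagged chunks with typed attributes and outgoing links.
chunks = {
    "REQ-1": {"tag": "Requirement", "version": 3, "reviewed": True,
              "links": ["DES-1", "TC-1"]},
    "DES-1": {"tag": "Design", "version": 2, "reviewed": False,
              "links": []},
    "TC-1":  {"tag": "TestCase", "open_defects": 1, "links": []},
}

def impact_of(change_id: str) -> list:
    """Which chunks need re-review when a given chunk changes?"""
    seen, stack = set(), [change_id]
    while stack:
        for linked in chunks[stack.pop()]["links"]:
            if linked not in seen:
                seen.add(linked)
                stack.append(linked)
    return sorted(seen)

def status_report() -> dict:
    """Project status: unreviewed chunks and total outstanding defects."""
    unreviewed = [c for c, d in chunks.items() if not d.get("reviewed", True)]
    defects = sum(d.get("open_defects", 0) for d in chunks.values())
    return {"unreviewed": unreviewed, "open_defects": defects}

print(impact_of("REQ-1"))   # everything downstream of Requirement REQ-1
print(status_report())
```

A real repository would of course need typed links, versions on the links themselves, and review workflow, but the queries stay graph traversals.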

With this sort of approach we can be more granular in the documentation, so we could have Software Architecture, High Level Design and Low Level Design chunks that link together and are straightforward to keep consistent, while still enabling us to produce a High Level Design document if so desired.

Literate programming also becomes more viable if we can extract the documentation out of the code and view it in conjunction with the rest of the documentation.  We just need the appropriate links from Design chunks to the code.

So what's the catch?  Integration with existing tools is key.  One needs to be able to produce a good-looking PDF or Word document for any required view of the documentation.  This would require generating all the appropriate section headings at the correct levels and embedding pictures in the appropriate places.  Adobe and MS may actually provide tools to help with this (i.e. constructing documents programmatically rather than via a user interface), but another avenue could be to generate OpenOffice documents (assuming its XML format is documented well enough) and then use its export tools to create the PDF and MS Word output.

One also needs to be able to import data from other tools into the documentation system and then launch the original tool when the data needs to be edited again.  So, for example, Visio could still be used to create diagrams.  Over time, browser-based replacements for the external tools would appear and it would become a seamless experience.  One could even imagine integration with a tool like Eclipse, giving a single application for editing code and documentation (e.g. via an inline browser).

Sunday, June 26, 2011

Git Wave

Git Wave is something I actually started working on while I had some time off between jobs, but at the time of writing it has not progressed very far.  I needed a project to learn Scala with, and this was the idea I selected to put into the real world.  I may continue to work on it, but time will tell how much energy I have for it.

The idea of Git Wave is to combine the shared conversation idea from Google Wave with the decentralised, distributed versioning functionality of Git.  So you can have a shared conversation that is not centrally stored on someone else's server.  Effectively, everyone in the conversation has their own copy, and other people's changes get merged in with your own.  The conversation is like a source file that you collaborate on.

I'll admit I was a big fan of Wave and was disappointed when Google stopped working on it.  The concept of the shared conversation is a powerful one, even if Google didn't completely nail it.  In my opinion they gave up too early.  They needed to provide good email integration (so you could at least partially include non-Wave users), platform notification integration (so you could tell when a Wave was modified or created), notification prioritisation and control (to avoid notification overload), and third-party Wave servers (so that companies could control their own data).

The Git Wave data would end up in publicly accessible Git repositories so that others could access it when your computer is down (though a peer-to-peer mechanism would also be possible), so encryption, identity management and notification would be important parts of the implementation.  You make your changes to the conversation in your own Git repository and push them to a public Git repository (one that you are authorised to push to).  The data would be encrypted with a generated conversation key, and the conversation key would then be encrypted with each recipient's public encryption key (which of course you need to have locally stored or be able to retrieve).  Each recipient would need to be notified in some way that a new conversation involving them is available in the public repository.  Each recipient would then pull the changes into their own repository, make their own changes, push to their own public repository, and notify the other recipients of the repository that contains their updates.
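The envelope scheme described here (encrypt the conversation once, then wrap the conversation key per recipient) can be modelled in a few lines.  This is purely a structural sketch: the XOR keystream below is a stdlib stand-in for both the symmetric cipher and the public-key wrap, which in the real design would use proper cryptography, and all the names are invented.

```python
# Toy model of the envelope scheme: one conversation key, wrapped per recipient.
import hashlib
import secrets

def stream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR against a SHA-256-derived keystream.
    Applying it twice with the same key recovers the plaintext. NOT secure."""
    stream, counter = bytearray(), 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ k for d, k in zip(data, stream))

def publish(conversation: bytes, recipient_keys: dict) -> dict:
    """Encrypt the conversation once, then wrap the key for each recipient.
    (In the real design each wrap would use the recipient's *public* key.)"""
    conv_key = secrets.token_bytes(32)
    return {
        "ciphertext": stream_xor(conv_key, conversation),
        "wrapped_keys": {name: stream_xor(key, conv_key)
                         for name, key in recipient_keys.items()},
    }

def read(package: dict, name: str, my_key: bytes) -> bytes:
    """A recipient unwraps the conversation key, then decrypts the data."""
    conv_key = stream_xor(my_key, package["wrapped_keys"][name])
    return stream_xor(conv_key, package["ciphertext"])

keys = {"alice": b"alice-secret", "bob": b"bob-secret"}
pkg = publish(b"Hello, Git Wave!", keys)
assert read(pkg, "bob", keys["bob"]) == b"Hello, Git Wave!"
```

The nice property is that adding a recipient only means wrapping the existing conversation key once more, not re-encrypting the whole history.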

At first glance it seems a bit over-engineered, since there are effectively two Git repositories for each recipient (a private one and a public one).  The advantage, though, is that each recipient is in control of their own copy of the conversation and can choose what they want to merge, and the public repositories are necessary to ensure everyone can access the conversation even when some participants have their computers turned off.  It is also not too different from normal mail, where an email is copied to your ISP's mail server and then to the recipients' mail server(s) before reaching their computers.

Git Wave would not scale well to large numbers of active recipients, but each recipient does not really have to monitor all the other recipients' public repositories if they are willing to trust other recipients to do the merging for them, since all changes will eventually appear in all repositories.  For example, the creator of the conversation could monitor each of the other recipients' changes and merge them into his public repository, so that all the other recipients would only need to monitor the creator's public repository.

In a peer-to-peer approach the users would exchange details (like public key and notification method), but a more scalable approach would require identity servers.  This could be as simple as a REST API that responded to particular HTTP queries, e.g. http://foo.com/username/publickey - it would be simple to allow username@foo.com to be a synonym for the REST query prefix, enabling a more familiar style of user id.
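The synonym idea is just a string mapping from the familiar user id onto the REST query prefix given above.  A minimal sketch (the `resource` names beyond `publickey` are assumptions for illustration):

```python
# Map username@host onto the identity server's REST query URL.
def identity_url(user_id: str, resource: str = "publickey") -> str:
    """Turn 'alice@foo.com' into 'http://foo.com/alice/publickey'."""
    username, _, host = user_id.partition("@")
    return f"http://{host}/{username}/{resource}"

print(identity_url("alice@foo.com"))  # http://foo.com/alice/publickey
```

The same prefix could then serve other per-user resources, e.g. a hypothetical `notifymethod`, without changing the addressing scheme.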

Notification could be via email, Twitter or (my preference) a new generic notification service (which could logically be integrated with the identity service).  Again, it could use a simple REST API and allow users to register how they want to be notified.  I would also like to see a generic notification agent that runs on the client platform, communicates with the server and receives the notifications.  The combination of notification service and notification agent would be useful in much larger contexts, to consolidate and prioritise notifications from the many and varied sources they currently come from.

There would be a Git repository per conversation, since with Git you have to clone the whole repository, so multiple conversations in one repository would not work very well.  A conversation would be made up of multiple files, with XHTML a likely candidate for the data format of the viewable part of the conversation.  Other files containing metadata, like the encrypted conversation key, would also be required.

The gadgets were quite good in Google Wave - the data for a gadget was usually stored as JSON in the Wave, with the code for the gadget kept separate, so something similar should be possible in Git Wave.

Google Wave also had Robots.  They would be doable, but would work like automated humans.  There would need to be a remote robot service that could instantiate a robot with a given public key and a URL to the code to run.  Another possibility for robot-like behaviour is to have scripting and/or plugin behaviour, defined in the conversation or in the client, that executed before a change was submitted or merged.  This scripting could also reject other people's changes (i.e. refuse to merge them in) if they didn't conform to certain rules.  The thing I liked most about the robot idea in Google Wave was the potential for state-change-driven workflow to automate processes, e.g. a leave application process.
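The rule-based rejection idea can be sketched as a pre-merge hook.  Everything here is hypothetical (the rule, the hook shape, the workflow text): a client runs each rule against an incoming change, and a change that breaks a rule is refused rather than merged.

```python
# Sketch of a pre-merge rule hook: rules vet an incoming change before merge.
def no_deletions(old_text: str, new_text: str) -> bool:
    """Example rule: a change may only append lines, never remove them."""
    old_lines = old_text.splitlines()
    return new_text.splitlines()[:len(old_lines)] == old_lines

def try_merge(local: str, incoming: str, rules) -> str:
    """Accept the incoming change only if every rule passes."""
    if all(rule(local, incoming) for rule in rules):
        return incoming   # merge the change in
    return local          # refuse it, keep the local version

doc = "leave request: 3 days"
ok  = try_merge(doc, doc + "\napproved by manager", [no_deletions])
bad = try_merge(doc, "leave request: 30 days", [no_deletions])
```

Workflow-style automation (like the leave application) would then just be a rule that also appends a state transition when its conditions are met.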

I envision multiple clients.  Nowadays you have to have a web client - that would mean not keeping your conversation data on your own machine, but many people seem happy with that.  Native clients would be good as well, and I started using Eclipse RCP as a cross-platform basis for the client, which can even be turned into a web app using Eclipse RAP.

The big caveat to Git Wave is getting the merging done properly.  Merging multiple changes from multiple people could get messy.  I imagined that every recipient would have their own branch, and that each user could choose to merge others' changes in or keep them separate and view them side by side.  Mostly you would want to just do an automated merge and hope for the best, but I could see manual merging with merge tools (e.g. showing colour-coded differences) being a possibility.  Google Wave had its whole Operational Transformation functionality, which probably avoids some of the messier merge scenarios.

Thursday, June 23, 2011

Ideas Man

In this blog I will describe ideas that come to me from time to time.  A lot of them are big ideas, in that they would need a lot of work to make them happen.  Some may be unrealistic, impractical, or just destined for failure for one reason or another, but who really knows?  I don't even know who will end up reading these ideas, but since they are not doing much good rattling around in my head, I will just put them out there and see what happens.