Tuesday, September 25, 2007

Web 2.0 - Must be an upgrade…

What is this version number jazz all about? Is the internet getting an upgrade? Should I install a patch so that I can run the latest version of the internet? I have heard it all, most of it explained wrongly by nerds.

All good things require innovation to stay current and adapt to user demands. When revenue and market competition become factors, innovation and development become faster, more prominent, and necessary. The Web X.0 version number label simply refers to a highly generalized method in which innovation forces a change in how the internet is used to generate revenue.

Web 1.0

The original publicly available internet offered innovation through convenience. This was the age of the electronic storefront. You could buy everything online from groceries to cat food to ebola. It was a concept that sounded great and worked well in its infancy, because there are certain time-consuming aspects of commerce that people would rather handle from home. In this regard certain Web 1.0 outlets still exist and will likely always exist.

The problem is that there are certain aspects of a market economy that people prefer to spend time away from home investigating. It is next to impossible to convince people it is necessary to adopt pets, cure disease, shop for food, and solve all entertainment needs from a computer miles away from reality. People are not going to suddenly become xenophobic and fear reality merely to meet the demands of an emerging economic shift that could hardly qualify its existence beyond that of a mere trend. To make things worse, many corporate plans ignored market research in favor of a faster entry into a market with artificial market share. The result is known as the internet balloon, the dot bomb era, or the complete loss of reason. Web 1.0 can be summarized as the retail web.

Web 2.0

The purpose of an electronic medium is not solely to provide a retail marketplace if a marketplace in the real world has more to offer. So people had to rethink how to make money. The money will always be there to extract, but now it is painfully obvious that people simply won't give it to you merely for a website and shopping cart software unless the product is compelling. Innovative people, many of whom did not even originally intend to make money, realized that the internet was a great way to serve information.

Many new information services sprang up, such as social networks, information sharing tools, free online games, and so forth. It quickly became obvious that capturing the time and attention of a computer user is a market worth money. To capture this new market many new server-side languages were created, and many new businesses sprang to life that did little more than provide a repository for serving information. In summary, Web 2.0 is providing information services where revenue is generated from the marketing of a user's experience. This is the present.

Web 3.0

The next generation of internet services will be called the semantic web. The semantic web will build on the concepts paved by Web 2.0 by increasing data services through a more complete understanding and processing of linguistics and human communication. The immediately conceived technological method of performing linguistic processing is through the use of ontology languages and an emphasis on proper design structure.

It is not immediately known how Web 3.0 will translate into revenue generating services, but the solutions are not hard to imagine. Consider search engines that are capable of answering questions or delivering desired results instead of merely matching keywords. Consider how much more accurate advertisement placement techniques can become. Consider how retail goods can be suggested to the correct people at the correct times for targeted and accurate selling. These ideas are merely the tip of an uniformed iceberg.

Web 4.0

Internationalization is the ultimate goal of Sir Tim Berners-Lee's vision of the internet. Internationalization will build entirely on the foundations provided by the semantic web. Most immediately, internationalization refers to the translation of linguistic phrases with accuracy to the intent of the phrase's context, as opposed to merely a translation of the vocabulary. This will build to allow the possibility of inter-industry communication. Consider the possibilities for increased revenue if a translation scheme allows an electrical engineer to communicate directly, jargon and all, with a cardiologist for the proper planning of a medically advanced pacemaker device. Now consider the possibilities of inter-industry communication across different language classes and the marketing capabilities following such.

Web 5.0

Could this be the final frontier? Who knows. It is too early to determine what will exist so far into the future. We will have to wait and see where marketing and creative revenue generation take us when combined with the previously described technological concepts. Ultimately, it is the ability to make money that will determine the migration path of future development.

Monday, September 24, 2007

Data - What is it and why should you care?

Data is information, any information. Data is all that is capable of being dreamed, imagined, or communicated. For the scope of this blog, data will be confined solely to that which is capable of being communicated electronically.

All data, as previously defined, falls into exactly one of five categories: enumerated data, human consumable content, structure code, presentation, and binary large objects (referred to as BLOBs). This demands some clarification. Enumerated data is that which is manually filled into database tables. Human consumable content is all text created specifically to be read by humans. Structure code is that which defines the organization of other data by any means internal or external to the structure instance. Presentation refers to the code necessary to transform any data for any purpose other than structural. A BLOB, or binary large object, refers to collections or instances of data written in machine language.
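As a rough sketch of the five categories described above, here is how they might be written down in Python. The names DataCategory, DataItem, page_parts, and group_by_category are made up for illustration, and the sample page parts are assumptions about what a typical page contains.

```python
from dataclasses import dataclass
from enum import Enum


class DataCategory(Enum):
    """The five data categories named in the text."""
    ENUMERATED = "enumerated data"
    HUMAN_CONTENT = "human consumable content"
    STRUCTURE = "structure code"
    PRESENTATION = "presentation"
    BLOB = "binary large object"


@dataclass
class DataItem:
    """A piece of data tagged with exactly one category (hypothetical type)."""
    label: str
    category: DataCategory


# Hypothetical pieces of a single web page, one per category.
page_parts = [
    DataItem("rows in a product table", DataCategory.ENUMERATED),
    DataItem("article text", DataCategory.HUMAN_CONTENT),
    DataItem("HTML markup", DataCategory.STRUCTURE),
    DataItem("CSS stylesheet", DataCategory.PRESENTATION),
    DataItem("JPEG image", DataCategory.BLOB),
]


def group_by_category(items):
    """Return a dict mapping each category to the labels filed under it."""
    groups = {category: [] for category in DataCategory}
    for item in items:
        groups[item.category].append(item.label)
    return groups
```

Because each DataItem carries a single category field, an item cannot sit in two categories at once, which is the separation the rest of this post argues for.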

BLOBs may be wrapped in structure data, but any structure data that is created specifically to wrap a BLOB is still considered a functional component of the BLOB it is wrapping. An example of this is an EPS file that contains binary information. This EPS file can contain instructions written in ASCII for reproducing font information but is still likely to be largely a binary-based graphic, and as such it would be a BLOB no matter how many ASCII instructions accompany the binary information.

It is important that the five data categories always be well separated. A web designer is a low level information architect, because they are constantly creating data. It is important for the proper operation of a semantic web that the data be well-formed so that it can be used properly.

The early internet established no conventions strictly for presentation. The result is that designers were left with horribly bland documents or were forced to use unconventional methods to create presentation by exaggerating the document structure. This means a person reading the code could no longer determine the relationship, meaning, or layout of any of the document's content. The tags used in HTML imply meaning, called meta-data, about the content they contain. For instance a <p> block means a paragraph, a <h1> means a level one heading, and a <ul> refers to any unordered list instead of an easy method of invoking bullets.
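To make the point about tags carrying meaning concrete, here is a small sketch using Python's standard html.parser module. The TAG_MEANING table holds only the three tags mentioned above, and the names MeaningReader and read_meanings are hypothetical.

```python
from html.parser import HTMLParser

# Meanings taken from the paragraph above; the table itself is illustrative.
TAG_MEANING = {
    "p": "paragraph",
    "h1": "level one heading",
    "ul": "unordered list",
}


class MeaningReader(HTMLParser):
    """Records the meaning of each known tag in document order."""

    def __init__(self):
        super().__init__()
        self.meanings = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase.
        if tag in TAG_MEANING:
            self.meanings.append(TAG_MEANING[tag])


def read_meanings(markup):
    """Return the list of meanings implied by the tags in the markup."""
    reader = MeaningReader()
    reader.feed(markup)
    reader.close()
    return reader.meanings
```

Feeding it a heading, a paragraph, and a list returns those three meanings in order. A document that abuses tags for looks alone would return meanings that have nothing to do with its real content.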

Unfortunately, many designers mixed the human consumable content, structure, and presentation of the documents they wrote. To this day many designers still do. To solve this problem the W3C recommended a presentation language called CSS and gradually removed presentation components from HTML.

The capabilities of a well-formed document include the ability to separate the human consumable content from the rest of the document for advanced querying, the ability to see what a document looks like under different conditions, the ability to apply various presentation modules simultaneously, and the ability to compare or layer various document structures.
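The first of those capabilities is easy to sketch. Assuming a document that is well-formed XML in the XHTML namespace, Python's standard xml.etree.ElementTree can pull the human consumable content away from the structure. The sample document and the function name human_content are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree prefixes element names with their namespace in braces.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# A tiny, hypothetical well-formed XHTML document.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample</title></head>
  <body>
    <h1>Data</h1>
    <p>Data is <em>information</em>.</p>
  </body>
</html>"""


def human_content(document):
    """Return the text of each top-level block inside <body>."""
    root = ET.fromstring(document)  # raises ParseError if not well-formed
    body = root.find(XHTML_NS + "body")
    if body is None:
        return []
    pieces = []
    for element in body:  # direct children of <body> only
        # Join nested text and collapse runs of whitespace.
        text = " ".join("".join(element.itertext()).split())
        if text:
            pieces.append(text)
    return pieces
```

This only works because the parser can trust the structure. Hand it tag soup and it stops at the first error, which is exactly the trade being argued for here.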

KAON - Ontologies for the rest of us

While reading an IBM white-paper I discovered that somebody performing semantic research recommended an ontology system other than W3C's OWL language. This language is KAON, and it is completely open source.

What makes KAON stronger than OWL is not so much the language itself, but the tool suite included for authoring and validation. KAON is built as an application platform for establishing data relationships using a graphical model engine with necessary tools for ontology inclusion, validation, instance querying, and a clipboard. The documentation is rich and detailed. The freedom to make corrections to the code is always there. Just like OWL, the KAON language is built directly on XML/RDFS, so the code is easy to read and the logic is familiar. The real strength of the KAON language is logic evolution and its ability to reverse changes to the hierarchy in an evolutionary manner.
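To give a feel for what reversing changes to a hierarchy means, here is a toy sketch in plain Python. This is not KAON's API and it borrows nothing from KAON's code. The Hierarchy class and its method names are invented purely to illustrate a concept hierarchy that remembers its own changes.

```python
class Hierarchy:
    """Toy concept hierarchy with an undo log (illustration only, not KAON)."""

    def __init__(self):
        self.parent_of = {}  # concept -> set of direct parent concepts
        self.history = []    # applied (child, parent) changes, oldest first

    def add_subconcept(self, child, parent):
        """Declare that child is a kind of parent and log the change."""
        parents = self.parent_of.setdefault(child, set())
        if parent not in parents:
            parents.add(parent)
            self.history.append((child, parent))

    def undo(self):
        """Reverse the most recent change, if there is one."""
        if not self.history:
            return
        child, parent = self.history.pop()
        self.parent_of[child].discard(parent)

    def ancestors(self, concept):
        """Return every concept reachable by following parent links."""
        found = set()
        stack = [concept]
        while stack:
            for parent in self.parent_of.get(stack.pop(), ()):
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found
```

Add "Cardiologist" under "Physician" and "Physician" under "Person", and ancestors("Cardiologist") returns both. Call undo() once and "Person" drops back out.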

The KAON software comes in three separate flavors that together comprise the KAON Tool Suite: KAON, KAON Extensions, and TextToOnto. KAON is the main ontology building engine that includes KAON Workbench, KAON Portal, and the core ontology engine. KAON Extensions is a collection of optional enhancements not included with the core engine. TextToOnto is a KAON based tool for ontology engineering directly in the code.

I recommend reading their documentation and forming your own opinions. Also read up on their client/server scheme to determine if this could be a business solution for mining data relationships.

Saturday, September 22, 2007

Browsers Suck - Blame Idiot Designers

All modern browsers seek to be standards compliant… honest. The problem is that several of them are dragging an iron-chain history of standards ignorance and proprietary initiatives. That is a really colorful and bloated way of saying backwards compatibility with methods that are known to fail.

The cause of the problem is market-share. You cannot simply abandon all previously applied methods, no matter how much of a violation, without inciting a riot. This is partly the fault of the browsers for maintaining obsolete methods, but it is far more the fault of the designers still reliant upon those obsolete methods. Dated and obsolete designers are never going away until they lose the ability to produce according to such methods.

The solution is easy. The browser designers need to reduce their stress by making a compromise. Modern browsers should allow anything and everything to fly if the document contains either no mime type or a mime type of text/html. Simply ignore the doctype and proceed with everything in quirks mode and transitional.

For the designers who are attempting to use well structured code and comply with the standards, the browsers should only process code that is well-formed according to the syntax of XML and the definitions of the stated doctype. Browsers should operate in that manner if the documents are using the mime type of application/xhtml+xml. This means the browsers should have two separate processing engines, chosen according to the document's mime type.
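Here is a minimal sketch of that two-engine idea using only Python's standard library. The lenient engine uses html.parser, which accepts just about anything, and the strict engine uses xml.etree.ElementTree, which refuses anything that is not well-formed XML. The function names are hypothetical, the strict engine checks well-formedness only and does not validate against a doctype, and raising an error for any other mime type is my own assumption.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Collects start tag names without complaining about bad markup."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_engine(markup):
    """Anything goes: return whatever start tags can be recognized."""
    collector = _TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def strict_engine(markup):
    """Well-formed XML only: raises ET.ParseError on the first error."""
    root = ET.fromstring(markup)
    return [element.tag for element in root.iter()]


def process(markup, mime_type=None):
    """Pick the processing engine from the document's mime type."""
    if mime_type == "application/xhtml+xml":
        return strict_engine(markup)
    if mime_type in (None, "text/html"):
        return lenient_engine(markup)
    # Assumption: other mime types are outside this sketch.
    raise ValueError("unsupported mime type: " + mime_type)
```

Calling process("<p>unclosed<br>", "text/html") quietly returns the tags it found, while the same markup sent as application/xhtml+xml raises a parse error. The sloppy designer and the careful one each get the engine they asked for.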

An added bonus to this method of development is that HTML 5 can be completely ignored in favor of XHTML 2.0, which will allow for better data implementations. If groups like Google and Yahoo wish to preserve the use of obsolete mime types and dated techniques they can build their own browsers.

HTML 5

HTML 4.01 was supposed to be the final, end-all HTML standard as the W3C was moving towards a semantic web. The original plan was to force migration to an XML syntax compatible web where content, structure, and presentation were entirely separated. At that point, in regards to data collection and analysis, the web would become open to anybody. Extremely primitive tools built upon XML and RDF structures would allow any user to parse human supplied content from written documents as a result of proper structuring. The web, and its users, are clearly not there yet.

It seems this will happen later rather than sooner.

Currently a small collection of a very few have created brilliant tools and methods for collecting and parsing human supplied content on an otherwise unstructured web. It is not in the best interest of these elite few to reduce their marketshare by handing the abilities that comprise their business to the masses. If you were Google would you want to lose AdShare market revenue by making this technology and marketing openly available with free and quickly developed applications?

Ultimately, there is only one line of code on nearly every web document that this concerns: the mime type. HTML 5 was created only for the purpose of extending the life of the text/html mime type. Documents written in XML or the compatible XHTML use the mime type application/xhtml+xml. The idea inspiring HTML 5 is that something like 98% of web users have no idea what a mime type is.

This is merely delaying the inevitable. HTML 5 is of no benefit to web users and is a frustrating pain in the ass for browser and web application developers.

Most designers and developers will continue to not understand that the difference between XHTML and HTML is more than the allowance of tag attributes. All commonly used browsers now fully understand the application/xhtml+xml mime type. The XHTML 2.0 document standard will likely be finished and awaiting approval by the end of the year. The result is that every designer, web coder, or data architect who believes themselves a professional or to be credible should only be writing XHTML 1.1 compliant code using the application/xhtml+xml mime type.

When the next generation of browsers supports XHTML 2.0 I will recommend only writing for this standard. As long as you are writing code in the correct structure using the proper mime type, it will be the web users who must catch up to you, instead of you delaying massive migrations.