Searching for the Perfect Inline Code Documentation Tool
Even amongst programmers I’m weird because I have an intense love for documentation. No, that doesn’t mean I overly comment my code, or that you’ll catch me browsing happily through the product requirements document during my coffee break. I should be more specific.

I have an intense love for automatic documentation generation. Nothing tickles me pink quite like seeing code and documentation living side by side in perfect harmony. I hate seeing documentation put on the company intranet only to diverge from the code it’s supposed to explain as the days go past. I hate hitting my head against a brick wall as I’m poring through the source code trying to understand an API because at no point does it mention that it’s documented in a Word doc in another directory.
This is my rule of programming: documentation should live beside the code it documents, in the comments, especially if it’s API documentation. If your language of choice doesn’t already have some kind of automatic documentation generation tool then you’re probably using the wrong language.
Do You Make These Mistakes with Wikis? 9 Ways To Build a Wiki That Doesn’t Suck
There’s something about the hint of fall in the air that has always appealed to me. It’s my favorite time of the year, and as the seasons change I find the motivation to apply change to my own life. Last month I had the epiphany that I’ve been far too busy and I need to get a handle on the way I spend my time. The Internet is buzzing about using David Allen’s Getting Things Done system to be more productive. There are a hundred and one different software tools you can use with the system; for the past week I’ve been using a personal wiki software called d-cubed/d3 gtd to do it.
Astute readers may guess from the title that there’s a rant coming up, and I want to preface this by saying that I have nothing against d-cubed/d3 gtd. It’s good software. I respect Tom, the guy who built it, and appreciate what he’s done and how he’s been available for help. I’m still using and enjoying d-cubed/d3 gtd. No, my beef is with the entire foundation behind d3: that dark Hawaiian voodoo called wiki.
What Is a Wiki?
“Wiki is a piece of server software that allows users to freely create and edit Web page content using any Web browser. Wiki supports hyperlinks and has a simple text syntax for creating new pages and crosslinks between internal pages on the fly.
Wiki is unusual among group communication mechanisms in that it allows the organization of contributions to be edited in addition to the content itself.
Like many simple concepts, “open editing” has some profound and subtle effects on Wiki usage. Allowing everyday users to create and edit any page in a Web site is exciting in that it encourages democratic use of the Web and promotes content composition by nontechnical users.” — Ward Cunningham, creator of Wikis
The first time I saw a wiki was in 1996 and I remember being struck by two distinct thoughts: “wow, that’s ugly” and “why would I want to let people edit what I write?” Fast forward eleven years and one of the most popular sites on the Internet is Wikipedia, a publicly editable encyclopedia. My uncanny ability to predict what’s going to become widely popular explains why my stock portfolio is doing so badly (damn you, Microsoft Zune).
This video from CommonCraft does a very good job of explaining what wikis are good for. The wonderful thing about wikis is that they let you quickly edit a web page and let more than one person collaborate on a document.
You might recognize that video from a post I wrote on LorelleOnWordPress that includes other videos that explain RSS, Social Networks, and Social Bookmarking. How Stuff Works gives even more information about how wikis work and what they are good for.
Wikis in Practice
The two strongest features of a wiki are that 1) anyone can edit a page and 2) it is quick to do an edit. Put those together and the power of a wiki is that it makes it trivial to correct mistakes in the current document you are viewing. Wikipedia shows the power of collaborative editing — what is hidden is the massive effort and time sink that people put into it.
“Given enough eyeballs, all bugs are shallow.” But any programmer can tell you that a project with poor communication between different contributors always turns into a SNAFU. There is great benefit to having many people improving a document through collaborative editing, but not if they aren’t all heading towards the same final result.
| Scale | Contributors | How it works out |
| --- | --- | --- |
| Small scale wiki | Personal or a small group of people | Works well |
| Medium scale wiki | 10s of people | Can grow out of control if there isn’t a clear vision and people don’t own things |
| Large scale wiki | 100s or 1000s of people | Usually a public wiki where some contributors are actively acting against the best interests of the wiki |
| Massive scale wiki | 10000s of people | Works well because there are enough contributors to handle the massive amount of work involved |
I’m not an expert on wikis by any means, but I have used PBWiki, TWiki, TiddlyWiki and Wikipedia before. I’ve used wikis in multiple contexts, from personal information storage, to corporate intranet backbone, to internet social software. Wikis seem to work best on the small scale and the massive scale. It’s in the middle, with a medium to large number of collaborators, that information gets confusing. Like some plants, wikis tend to grow the same way no matter who is building them. Wikis grow wide and shallow, not narrow and deep. This makes them perfect for something like a dictionary or encyclopedia, but not as good for document tracking on an Intranet. Wikis favour a large number of pages at the same level instead of a tree hierarchy.
Creating a wiki is a grassroots process.
How Wikis Should Be Organized
- Administration
  - Forms
  - Time Tracking
  - Vacations
- Engineering
  - Project1
    - Design
      - Product Spec
    - Verification
      - Verification Spec
      - Verification Environment
      - Verification Plan
      - Regression Results
    - Validation
      - Design
  - Project2
- Manufacturing
What Wikis Really Grow Into
- Administration
  - Forms
  - Time Tracking
  - Vacations
  - Phone Numbers
  - Board Room Booking
- Engineering
  - Project1
    - Design
      - Product Spec
    - Verification
      - Verification Spec
      - Verification Environment
      - Verification Plan
      - Verification Review Minutes
      - Action Items from Friday’s Review
      - How to Run a Test
      - Lab Tracking
      - Regression Results
    - Validation
  - Project2
- Manufacturing
Wikis tend to spread out wide rather than have a strict hierarchy — and this can make it very hard to find what you’re looking for.
“Wikis are great for ad-hoc arrangement and re-arrangement of data, but they don’t respect existing data. And with 2-million-plus documents in dozens of formats sitting in our document management system, we need to respect existing data. Wikis will be useful to the extent they enable us to re-use, remix, reorganize, review, and extend those documents. What is needed is a wiki that is created, edited, and saved in Word.”
http://barelylegalsubstance.chattablogs.com/archives/027444.html
Why Do Wikis Suck?
If you aren’t familiar with wiki software (and you’re still reading?!) you should skip this section. I’m not talking about a specific wiki implementation, but general wikisms I’ve noticed in the various wiki software I’ve tried. If your WikiFlavour doesn’t have these problems then give yourself a pat on the back because you dodged a bullet.
The inventor of WikiWords should be shot
- I understand that the core of wikis is that they can be quickly edited but creating links haphazardly is the primary reason why wikis grow like weeds instead of carefully tended gardens.
- Having non-standard capitalization (CamelCase) automatically link to another page on the wiki is only useful approximately 10% of the time.
- The other 90% of the time you have to go back and re-edit a page to remove unintentional WikiWord links.
- It promotes writing everything in lowercase to avoid the unintentional creation of WikiWords.
CamelCase is the dumbest linking structure ever invented. Even the Wiki page on Wikipedia agrees with me: “Originally, most wikis used CamelCase when naming program identifiers. These are produced by capitalizing words in a phrase and removing the spaces between them (the word “CamelCase” is itself an example). While CamelCase makes linking very easy, it also leads to links which are written in a form that deviates from the standard spelling. … There is no easy way to determine which capital letters should remain capitalized. As a result, many wikis now have “free linking” using brackets, and some disable CamelCase by default.”
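As a sketch of why this happens: classic wiki engines auto-link anything matching a CamelCase pattern, roughly like the regex below. This is the textbook form of the rule, not taken from any particular engine, but it shows exactly how ordinary product names become accidental links:

```python
import re

# Rough shape of the CamelCase auto-link rule: two or more capitalized runs
# glued together with no spaces. Not copied from any specific wiki engine.
WIKIWORD = re.compile(r"\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b")

def find_wikiwords(text):
    """Return every word such an engine would turn into a link."""
    return WIKIWORD.findall(text)

# "PowerPoint" becomes an unintentional link -- the 90% case complained
# about in the list above.
print(find_wikiwords("JimsListOfBugs says the PowerPoint deck is stale"))
```

The intentional link and the accidental one are indistinguishable to the engine, which is why you end up re-editing pages to escape them.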
Wiki syntax reinvents the wheel
- Wiki software uses its own syntax for formatting text in an effort to be more human readable than HTML.
- Wiki syntax succeeds in being more concise than HTML, but more often than not this means your normal punctuation or capitalization is being misinterpreted as wiki syntax.
- Non-standard — different Wiki software uses different syntax.
- Is learning wiki syntax really easier than HTML? <b>bold</b> is easier to remember than ''bold''.
- WYSIWYG HTML editors are a solved problem thanks to software like TinyMCE — using wiki syntax is much more complicated than learning a WYSIWYG editor that essentially works like every other word processing software you’ve ever used.
- If I don’t like having to learn a non-standard formatting syntax when I switch between 3-5 different programming languages on a weekly basis, then how do normal people feel about it?
Wikis create an information sink-hole
- It is hard to import information into a wiki from other sources.
- It is hard to export information out of wikis (eg: RSS feeds).
- Wiki data remains stationary, when users want filtered data moving at them via email or RSS.
- Where’s the API? Wikis are intended to make it “easier for humans to edit” documents, but corporate wikis can benefit from automation like updating a report or a log on the wiki instead of sending updates by email. Wikis need an API so that it is easy to create scripts to add or edit pages on a Wiki.
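To make the regression-report scenario concrete, here is a minimal sketch of what a scriptable edit call could look like. The endpoint path and field names are invented for illustration (MediaWiki’s api.php is one real-world example of the general pattern); the point is only that a script, not a human, does the update:

```python
import urllib.parse

def build_edit_request(base_url, title, new_text, summary):
    """Build the URL and form body for a hypothetical 'edit page' API call.
    The /api/edit path and the field names are made up -- substitute your
    wiki's real API here."""
    url = base_url.rstrip("/") + "/api/edit"
    body = urllib.parse.urlencode({
        "title": title,
        "text": new_text,
        "summary": summary,
    })
    return url, body

# A nightly regression script could post its results instead of emailing them:
url, body = build_edit_request(
    "http://intranet.example.com/wiki",
    "Regression Results",
    "Nightly run: 412 passed, 3 failed.",
    "Automated update from the regression script",
)
```

With an API like this in place, “update the wiki” becomes one more step in a cron job rather than a chore someone forgets.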
Large scale wikis become chaotic and disorganized
- Multiple collaboration means no one owns anything — organization comes from someone having a vested interest to organize and maintain.
- Information is hard to navigate consistently because there is no unifying vision to the structure.
- Large scale wikis turn into a flat mass of documents with no hierarchy.
Having multiple editors *requires* tracking changes
- With multiple editors on a document, version control and discussion of changes become essential requirements.
- All changes should be saved and easily backed out of.
- Need the ability to protect pages (lock) from edits.
Wikis and Search
- Using WikiWords as titles makes it near impossible to build a decent search system. Wikis usually generate overly concise URLs or incomprehensible URLs with no meaning.
- Always have to click on search results to see what the document really is because the title isn’t descriptive enough.
- The default search results are usually “what search found first” with no attempt to sort by relevance.
- If I’ve learned anything from GMail it’s the power of search+tags. So the problem with finding information in a Wiki is really a problem with search.
- Search isn’t as big a problem with publicly accessible wikis because you can use Google. It is a much bigger problem with personal/intranet wikis. Data goes into the wiki but good luck EVER finding it again.
- The ability to “Jump” to a specific WikiWord:
  - is usually misinterpreted as a search form
  - encourages the use of short WikiWords that make a large scale wiki more of a mess
  - is sometimes case sensitive, which adds much more complexity than entering search terms for a specific document
Plugins
- Many wikis offer plugins for adding additional functionality.
- Adding plugins creates another layer of complexity and potential conflicts with upgrading the core software.
- Developing custom plugins can be a huge time sink — it’s nice to have the ability to do so, but it should be a last resort.
- Dependence on plugins can create chicken-and-egg scenarios that complicate upgrading the wiki software.
Plugins that greatly improve the wiki software’s functionality should always *become* core functionality. This is a classic problem with all software that supports plugins — at some point they need to be packaged together into a distribution so that the majority of users can appreciate them instead of living in the dark ages.
Building Wiki Software That Doesn’t Suck
You know what a wiki is, you know why wikis end up sucking, and if you’re still reading this far then you’ve probably used a wiki yourself. Some wiki software gets it right, but unfortunately the core distributions of many WikiFlavours are still missing some of these essential features. This is a list of what I think *every* wiki software should do to improve the WikiExperience for everyone.
1. Make It Simple to Edit, Not Just Quick to Edit
1.1 Disable WikiWords and CamelCase
Users have to create links by hand instead of unintentionally creating links because of capitalization. It will lead to meaningful document titles longer than JimsListOfBugs.
1.2 WYSIWYG text editor
Let Ctrl-B bold the selected text! Contributors should not have to write in all lowercase with no punctuation in constant fear of accidentally embedding wiki syntax.
2. Help Me Find What I’m Looking For
2.1 Indexed search that orders by relevance
I’ve mentioned before that wikis need to build meaningful URLs that are human readable. They also should be able to rank pages based on what links to them, and to do something smart like click tracking: if I always click on result #6 when I search for product plan then MAYBE it should be one of the first results.
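The click-tracking idea can be sketched in a few lines. The scoring scheme below is invented (real search engines do something far more sophisticated): each recorded click on a result for a given query adds a small bonus to that page’s base relevance, so the page I always pick floats upward.

```python
from collections import defaultdict

class ClickBoostRanker:
    """Toy ranker: base relevance plus a small bonus per recorded click.
    The 0.1 weight is arbitrary -- the point is the feedback loop, not
    the exact numbers."""

    def __init__(self):
        self.clicks = defaultdict(int)  # (query, page) -> click count

    def record_click(self, query, page):
        self.clicks[(query.lower(), page)] += 1

    def rank(self, query, scored_pages):
        """scored_pages: list of (page, base_relevance) pairs."""
        def score(item):
            page, base = item
            return base + 0.1 * self.clicks[(query.lower(), page)]
        return [page for page, _ in sorted(scored_pages, key=score, reverse=True)]

ranker = ClickBoostRanker()
for _ in range(10):  # users keep choosing the same result for this query
    ranker.record_click("product plan", "ProductPlan2007")
results = ranker.rank("product plan",
                      [("MeetingNotes", 1.0), ("ProductPlan2007", 0.5)])
```

After ten clicks the boosted page outranks one with twice its base relevance, which is the behaviour I actually want from an intranet search box.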
2.2 Navigation clues
Wikis need to support effective navigation with good titles, breadcrumbs, and easily created tables of contents. When I’m looking at a page I should be able to easily see the parent hierarchy and child pages, as well as neighbouring pages.
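If the wiki stores pages under “/”-separated paths (an assumption on my part; many wikis use a flat namespace, which is exactly part of the problem), breadcrumbs fall straight out of the path:

```python
def breadcrumbs(path):
    """'Engineering/Project1/Design' -> [(title, link_path), ...] from the
    root down to the current page. Assumes path-based page storage."""
    parts = path.split("/")
    return [(parts[i], "/".join(parts[: i + 1])) for i in range(len(parts))]
```

Each tuple gives the link text and the page it should point to, so rendering a breadcrumb bar is one loop over this list.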
3. Never Lose Data
3.1 Store and track changes
Wikis need version control for every change and easy rollback for all edits. Users need notifications, watchlists, and easy changelogs. This is mostly a solved problem to various degrees, but it’s still surprising that some personal wiki software doesn’t support this.
3.2 Refactoring
I’ve said that wikis grow like weeds and that they need a gardener to prune them. Refactoring and reorganizing pages needs to be simple to do and well supported. Information should be easy to move and should automatically leave a forwarding address behind.
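Here is a minimal sketch of “leave a forwarding address behind”, using a toy in-memory page store (a real wiki would persist this and show a “redirected from” note to the reader):

```python
class WikiStore:
    """Toy page store that leaves a redirect behind whenever a page moves,
    so old links keep resolving."""

    def __init__(self):
        self.pages = {}      # path -> page text
        self.redirects = {}  # old path -> new path

    def move(self, old, new):
        self.pages[new] = self.pages.pop(old)
        self.redirects[old] = new

    def fetch(self, path):
        """Resolve redirects, with a hop limit so a cycle can't hang us."""
        for _ in range(10):
            if path in self.pages:
                return path, self.pages[path]
            path = self.redirects[path]  # KeyError here means a dead link
        raise RuntimeError("redirect chain too long")

store = WikiStore()
store.pages["VerificationPlan"] = "nightly regression summary"
store.move("VerificationPlan", "Project1/VerificationPlan")
```

The old link still works after the move, which is what makes gardeners willing to reorganize in the first place.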
3.3 Discussions
Each page on the wiki needs to have a behind the scenes discussion page where direction can be agreed on, differences can be debated and issues can be captured in a message board / forum format.
4. Getting Data In and Out
4.1 Document management
People are going to want to attach all kinds of documents to wikis: from office documents like pdf, doc and xls, to traditional media files like images and video. These attachments should be treated the same as wiki pages when it comes to search and version control.
4.2 Wikis need APIs for in/out
One of the things people often complain about is importing/exporting data from a wiki. Wikis are meant to be easily human editable, but for some reason they overlook that you likely have existing information in another format that you want to merge in while retaining as much formatting/linking as possible. If there is an easy-to-use API then data can be moved around by writing scripts.
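As one concrete “data out” path, a wiki’s recent-changes list can be turned into a bare-bones RSS feed with very little code. The change-record shape below is invented; the feed elements are standard RSS 2.0:

```python
from xml.sax.saxutils import escape

def recent_changes_rss(site_title, site_url, changes):
    """changes: list of (page_title, page_url, summary) tuples, newest first.
    Returns a minimal RSS 2.0 document (no dates or GUIDs, just the
    skeleton)."""
    items = "".join(
        "<item><title>{}</title><link>{}</link>"
        "<description>{}</description></item>".format(
            escape(t), escape(u), escape(s))
        for t, u, s in changes
    )
    return (
        '<?xml version="1.0"?><rss version="2.0"><channel>'
        "<title>{}</title><link>{}</link>{}</channel></rss>"
    ).format(escape(site_title), escape(site_url), items)

feed = recent_changes_rss(
    "Team Wiki", "http://wiki.example.com",
    [("Regression Results", "http://wiki.example.com/RegressionResults",
      "3 new failures")],
)
```

Point a feed reader at that and the wiki starts pushing filtered data at users instead of waiting for them to visit.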
Conclusion
Wikis are very powerful when used correctly, but unfortunately there are 51 flavours of wikis, and what has become best practice in advanced wiki software can seem painfully archaic in software that still follows in the footsteps of the WorldWideWiki. Yes, I’m looking at you, WikiWords. Wikis are becoming the de facto standard for modern corporate intranets, and while they are undoubtedly better than the static and out-of-date web sites that existed before, they still have a long way to go in some areas where intranets have always been weak, namely search.
Any day now Google will be opening up registration for its JotSpot wiki software. It’ll be interesting to see if they can get over their product schizophrenia and intelligently integrate wikis with word processing, spreadsheets, slides, blogs, email, calendar, and RSS readers to build an intranet solution that far outclasses anything currently available. They have all the pieces, and the killer knowledge that everyone else is missing: how to build an intranet search that works over all those formats.
It’s sad that downloading documents from the corporate intranet and using Google Desktop search is still 95 times more effective than using intranet search.
Links You Can Use
- HowStuffWorks explains how wikis work.
- More information about why you would want to use a wiki and some of the problems you can experience using wikis.
- The Truth About Intranet Wikis
- How to Build a Grassroots Enterprise Wiki Culture
Related Posts
Automatic Documentation of Python Code using Doxygen
All programming is maintenance programming, meaning that the most value comes from code that can be picked up and maintained by someone else. I strongly believe that code and documentation should always go hand in hand. When someone else is trying to modify your code they have no way of knowing that they need to read a PDF API document to find out what a function is supposed to do. Whenever documentation exists in a separate file it always seems to drift away from the code.
A while back I compared several open source tools for automatically generating documentation based on code comments. Doxygen is easily one of the best programs. It was written for C/C++ but there are hacks/filters for getting it working with other languages like Python, Perl and Verilog.
Python comes with a tool for generating documentation called Pydoc, but I don’t like tools that use introspection because they usually choke on weird file import rules. I was elated to find out that they’ve included Python support in Doxygen without having to translate Python to C++. This is a guide for automatically generating documentation from Python source code using Doxygen.
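As a small taste, Doxygen’s Python support picks up special comment blocks that start with `##`, in the style described in the Doxygen manual (the function itself is a made-up example):

```python
## Compute running averages of a numeric sequence.
#
#  Doxygen treats this ## block as the documentation for the function
#  that follows it, much like a /** ... */ block in C++.
#
#  @param values  Iterable of numbers.
#  @return        List where element i is the mean of values[0..i].
def running_average(values):
    out, total = [], 0.0
    for i, v in enumerate(values, start=1):
        total += v
        out.append(total / i)
    return out
```

Run Doxygen over a file like this and the `@param`/`@return` tags end up in the generated HTML next to the function signature, with no separate document to drift out of date.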
Inline source code documentation (Language independent)
I am a firm believer that code documentation belongs as mark-up inside the code. It is natural that as code changes it drifts away from the documentation. When the documentation and the code reside in the same file at least there is a chance that they might remain in lock-step. Programmers are lazy (otherwise known as “overworked”), and having to make updates in multiple places is a recipe for disaster.
I get a lot of searches for people who are looking for a Doxygen (1, 2) like utility for documenting their code, especially for HDL/HVLs (Verilog, VHDL, SystemVerilog, Specman e, SystemC). So I decided to create a list of potential non-homebrew solutions to creating inline documentation for your HDL/HVL.
Much, much more information about tying documentation to source code can be found at the literate programming website. It focuses on the reverse process of generating source code from documentation, but still has some interesting points.
Here are some possible solutions:
- vhdldoc
- Example
- http://schwick.home.cern.ch/schwick/vhdldoc/
- VHDLDOC is a tool to generate automatically hyperlinked html-documentation of vhdl-code. The markup-language is like the one of JAVADOC.
- DOC++
- Example
- http://docpp.sourceforge.net/
- DOC++ is a documentation system for C, C++, IDL and Java generating both TeX output for high quality hardcopies and HTML output for sophisticated online browsing of your documentation. The documentation is extracted directly from the C/C++/IDL header/source files or Java class files. Here is a short list of highlights:
- hierarchically structured documentation
- automatic class graph generation (as Java applets for HTML)
- cross references
- high end formatting support including typesetting of equations
- ROBODoc
- Example of extending ROBODoc to support other languages
- http://www.xs4all.nl/~rfsber/Robo/robodoc.html
- ROBODoc can reformat the documentation in HTML, XML DocBook, TROFF, ASCII, LaTeX or RTF format. Indirectly you can convert to pdf and windows help format.
- ROBODoc can be used to document functions, methods, classes, variables, makefile entries, system tests, and anything else you can think of.
- ROBODoc works with C, C++, Fortran, Perl, shell scripts, Assembler, DCL, DB/C, Tcl/Tk, Forth, Lisp, COBOL, Occam, Basic, HTML, Clarion, and any other language that supports remarks.
- ProgDoc
- http://www.progdoc.org/
- Language independent documentation with strong typesetting features.
- LXR – Linux Cross Reference tool
- http://sourceforge.net/projects/lxr/
- A general purpose source code indexer and cross-referencer that provides web-based browsing of source code with links to the definition and usage of any identifier. Supports multiple languages.
- The main feature of the indexer is of course the ability to jump easily to the declaration of any global identifier. Indeed, even all references to global identifiers are indexed. Quick access to function declarations, data (type) definitions and preprocessor macros makes code browsing just that tad more convenient. An at-a-glance overview of which code areas will be affected by changing a function or type definition should also come in useful during development and debugging.
- Synopsis – Source Code Introspection Tool
- Example
- http://synopsis.fresco.org/
- Synopsis is a multi-language source code introspection tool that provides a variety of representations for the parsed code to enable further processing such as documentation extraction, reverse engineering, and source-to-source translation.
- Synopsis provides a framework of C++ and Python APIs to access these representations and allows Processor objects to be defined and composed into processing pipelines, making this framework very flexible and extensible.
- Natural Docs
- http://www.naturaldocs.org/
- Natural Docs is an open-source, extensible, multi-language documentation generator. You document your code in a natural syntax that reads like plain English. Natural Docs then scans your code and builds high-quality HTML documentation from it.
Knuth developed some solutions and methodology for generating code off of documentation (a backwards approach from what is ideal). It isn’t immediately useful for the problem at hand, but provides some history on the problem.
- CWEB
- http://sunburn.stanford.edu/~knuth/cweb.html
- CWEB is a version of WEB for documenting C, C++, and Java programs. It uses TeX for output.
- nuweb
- Example
- http://nuweb.sourceforge.net/
- In 1984, Knuth introduced the idea of literate programming. The idea was that a programmer wrote one document, the web file, that combined documentation with code. Nuweb works with any programming language and LaTeX.
Doxygen still takes the cake as the most ambitious tool for inline program documentation, and if you are using a supported language it should be your first choice.
Please comment if you have further suggestions.
Creating HTML documentation of Verilog code
v2html is a perl script for generating a beautified HTML representation of a Verilog design. The output HTML can be used to navigate signals throughout the source code (although there are many other commercial tools that can do the same thing now).
The parser can be used as a stand-alone perl module, and could be a useful tool for building custom scripts. Not to be confused with Verilog-Perl, another Verilog parser that supports SystemVerilog.
What is semantic markup, and why does it matter?
This is an interesting essay talking about the difference between semantic and non-semantic markup. The gist is that marking up what the data means makes it more valuable, and that the non-semantic visual appearance can be broken apart from the semantic meaning (like CSS and HTML, although that boundary isn’t perfectly clean).
He puts together some good reasons and examples (using HTML, LaTeX and Microsoft Word) why this distinction is important.
General Disarray » Blog Archive » What is semantic markup, and why does it matter?
Become an Excel ninja
This is a follow-up to the previous tips for Microsoft Word.
It covers:
- Formulas
- Show formulas
- Filling
- Special formula syntax
- Paste values
- Logic
- Conditionals
- Look ups
- Offset
- Substrings
Moving up the wisdom hierarchy // Creating Passionate Users
One of the writers from the “Heads First” series of books has an interesting post that the fundamental flaw with most technical documentation is that it focuses on the “What” instead of the higher levels of thinking.
The Data-Information-Knowledge-Wisdom hierarchy explained
| Level | Question it answers |
| --- | --- |
| Wisdom (systems thinking) | If and When |
| Understanding (grokking) | Why |
| Knowledge (useful patterns) | How |
| Information (organized data) | What |
| Data | |
Some key things to add to move yourself up the hierarchy:
- When NOT to use something
- Consequences
- How to recognize when it was NOT a good idea to use or apply this [whatever]
- Lessons learned from others, real case studies good AND bad
- Links/referrals to communities of practice
- Simulations (best of all–provide the tools and scenarios that let them discover what the long-term consequences could be)
LaTeX: from beginner to TeXPert
Tutorial on getting started with LaTeX.
General Disarray » Blog Archive » LaTeX: from beginner to TeXPert
Ten things every Microsoft Word user should know
The general gist is that most people treat word processors like a typewriter instead of using the software features to automate a lot of the styles they would be attempting using manual formatting. Having recently come off of writing a 70 page technical document, some of these techniques are so dead-on. Although I feel that “How to turn off auto-format” should be the first point, since that feature alone causes the greatest number of headaches when writing documentation that is going to contain code examples.
General Disarray » Blog Archive » Ten things every Microsoft Word user should know
Python and Doxygen — Automatically Document Your Python Code
You may have heard of Doxygen, a program that generates source code documentation by directly reading the source files and extracting the documentation from there. Unfortunately, doxygen can’t read Python code directly (it was originally written to extract documentation from C/C++ code). However, it is possible to apply a filter to an input file and doxygen will then process the filtered result. And this is what you can get right here: a filter that transforms Python code into C stubs so that doxygen can process the documentation.
Perl Doxygen Filter — Automatically Document Your Perl Code
Of course, Perl developers are used to using POD rather than other code documentation tools. However, most developers are not restricted to a single language. Instead of using multiple code documentation systems one tends to use one tool for all; Doxygen is quite a powerful code documentation system that already has built-in support for multiple programming languages.
Unfortunately, Doxygen does not directly support Perl. Thus, Doxygen Filter has been written in order to be able to use Doxygen for generating code documentation for Perl projects, too.
Doxygen — Generate Documentation from Code Comments
Doxygen is a documentation system for C++, C, Java, Objective-C, Python, IDL (Corba and Microsoft flavors) and to some extent PHP, C#, and D. It can help you in three ways:
1. It can generate an on-line documentation browser (in HTML) and/or an off-line reference manual (in LaTeX) from a set of documented source files. There is also support for generating output in RTF (MS-Word), PostScript, hyperlinked PDF, compressed HTML, and Unix man pages. The documentation is extracted directly from the sources, which makes it much easier to keep the documentation consistent with the source code.
2. You can configure doxygen to extract the code structure from undocumented source files. This is very useful to quickly find your way in large source distributions. You can also visualize the relations between the various elements by means of include dependency graphs, inheritance diagrams, and collaboration diagrams, which are all generated automatically.
3. You can even “abuse” doxygen for creating normal documentation (as I did for this manual).
http://www.stack.nl/~dimitri/doxygen/results.html