December 24, 2013

Day 24 - Ars Longa, Vita Brevis

Written By: Mohit Chawla (@a1cy)
Edited By: Aleksey Tsalolikhin (@atsaloli)

A common, recurring observation about the field of operations is that it's quite young compared to other engineering disciplines. Becoming a good operations engineer benefits from, and requires, a multi- and interdisciplinary approach. The career, the culture, the actual technical practices and runbooks, and the tools that we design, build and use every day all borrow ideas from other, more established and better understood industries and professions, in order to progress and improve understanding of our own endeavours in the field.

One fundamental area of study in operations is the human-computer interface. From the design of our systems' peripherals, to the automation and monitoring technologies we use and develop, to the postmortems we carry out after failures and disasters, a comprehensive understanding and acknowledgement of the intricacies of this interaction is central, as emphasized effectively in the talks and writings of many respected members of the community.

Another area of research at these crossroads, though not directly related to system administration, is the digital humanities, which make use of, but are not solely characterized by, powerful tools for information retrieval, text analysis, visualization and statistics to enhance existing understanding and gain new insights in the humanities and social sciences.

To complement ongoing efforts to formalize, develop and solidify knowledge in operations, such as Ops School and SABOK, along with more scattered forms of dissemination such as articles, podcasts, weeklies, books, talks and mailing lists, it could be beneficial to use some of the techniques employed by digital humanists as an additional aid, an extra lens through which to view our field.

In particular, topic modeling is an indispensable technique in digital humanities.

'Topic models are a suite of algorithms that uncover the hidden thematic structure in document collections. These algorithms help us develop new ways to search, browse and summarize large archives of texts.'

The above definition is taken from the page of David M. Blei, one of the pioneers of the field and one of the original developers of the Latent Dirichlet Allocation (LDA) algorithm, which is the dominant topic modeling algorithm used in digital humanities.

By doing a comparative analysis of corpora from other engineering disciplines alongside the existing literature and understanding of system administration, we can possibly develop newer ways of inference.

An oversimplified version of this process could be stated in the following steps:
1) Selection and gathering of literature from different engineering and scientific disciplines.
2) Gathering literature about system administration.
3) Application of LDA to these collections, followed by comparative analysis between the disciplines.
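
The steps above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not a tuned implementation: the six tiny "documents" and their discipline labels are invented placeholders standing in for steps 1 and 2, the sampler is a plain collapsed Gibbs sampler for LDA written with the standard library only (a library such as gensim would be the practical choice for real corpora), and the names fit_lda and mean_topic_mix, along with the values of alpha, beta and the iteration count, are hypothetical choices made for this example.

```python
import random

def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling. docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # tokens per (doc, topic)
    topic_word = [[0] * V for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                      # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word, vocab

def mean_topic_mix(theta, labels):
    """Average the topic proportions of all documents sharing a label."""
    sums, counts = {}, {}
    for mix, label in zip(theta, labels):
        acc = sums.setdefault(label, [0.0] * len(mix))
        for k, p in enumerate(mix):
            acc[k] += p
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

# Steps 1 and 2: placeholder corpora (invented for illustration only).
docs = [
    "server outage alert pager postmortem failure".split(),
    "deploy server monitoring alert runbook outage".split(),
    "postmortem failure monitoring pager deploy runbook".split(),
    "bridge load failure inspection steel fatigue".split(),
    "steel beam load inspection bridge safety".split(),
    "fatigue failure safety beam inspection load".split(),
]
labels = ["sysadmin"] * 3 + ["civil"] * 3

# Step 3: fit one model over both collections, then compare disciplines.
theta, topic_word, vocab = fit_lda(docs, num_topics=2)
profile = mean_topic_mix(theta, labels)
for label, mix in profile.items():
    print(label, [round(p, 2) for p in mix])
```

Fitting a single model over the combined collections and then averaging topic proportions per discipline is just one way to make the comparison; fitting a separate model per discipline and comparing the resulting topics is another, and neither is prescribed by the steps above.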

All three tasks pose various practical problems: procurement of literature limited by economic and personal resources, non-overlapping or conflicting vocabularies and terminologies across these disciplines, the relative lack of formally published literature in system administration, and the tuning of the algorithms themselves. Perhaps by the next installment of SysAdvent, I'll have ironed out some of these problems. Meanwhile, if other members of the community are interested in the idea, do get in touch.

References:
1) John Allspaw's Blog
2) Jordan Sissel's entry from SysAdvent 2011 on Learning From Other Industries
3) Slides by Lindsay Holmwood on Alert Design
4) Accessible introduction to Topic Modeling
5) David M. Blei's Topic Modeling page
6) gensim, a Python library for topic modeling
7) Mark Burgess's Amazon page
8) SABOK
9) Ops School
