Text will always be a major format for data, and it will never be well-organized. According to a popular variation on Phil Karlton’s famous joke, the two hard problems in computing are naming things, cache invalidation, and off-by-one errors. A third problem could be “formatting things”. Most textual data has formatting irregularities that make it a pain to process, and much of the work in text processing goes into dealing with them. These are just the sad facts.
Tom Radcliffe has over 20 years of experience in software development, data science, machine learning, and management in both academia and industry. He is a professional engineer (PEO and APEGBC) and holds a PhD in physics from Queen's University at Kingston. Tom brings a passion for quantitative, data-driven processes to ActiveState. He is deeply committed to the ideas of Bayesian probability theory, and assigns a high Bayesian plausibility to the idea that putting the best software tools in the hands of the most creative and capable people will make the world a better place.