Artificial Intelligence (AI) is everywhere in our everyday life, from analyzing our online shopping to predict our buying behavior to scrutinizing our movie or music preferences to anticipate what we might like. All the suggestions we receive come from algorithms that have examined the content we have already seen, listened to or rated.
We are surrounded by AI, so get on board!
Artificial intelligence is a field of computer science that aims to create intelligent machines that reproduce and enhance certain capabilities of the human brain, such as reading, understanding or predicting. Machine learning (ML) and Natural Language Processing (NLP) are core parts of AI. They aim to build mathematical models from sample data, commonly called training samples, in order to complete complex tasks such as prediction, classification, regression, filtering, detection, etc.
Artificial Neural Networks (ANN or NN) belong to the family of ML algorithms inspired by biological neural networks. Driven by the deep learning movement, research on NNs has been booming for a few years. NNs are commonly organized into layers that aggregate collections of connected units called neurons. Their behavior is similar to that of human neurons: they take in some inputs and trigger an output via an activation function, such as a hyperbolic tangent or a softmax function. Connections between neurons have weights that are adjusted as learning proceeds through gradient descent techniques. Various NN variants exist, such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), autoencoders, etc.
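To make the neuron description concrete, here is a minimal Python sketch of a single neuron with a hyperbolic tangent activation, trained by gradient descent on a squared error. The weights, inputs, target and learning rate are arbitrary values chosen for illustration; real networks stack many such units into layers.

```python
import math

def neuron_output(weights, bias, inputs):
    """Weighted sum of the inputs passed through a tanh activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(z)

def gradient_step(weights, bias, inputs, target, learning_rate=0.1):
    """One gradient descent step on the squared error 0.5 * (y - target)**2."""
    y = neuron_output(weights, bias, inputs)
    # Chain rule: dE/dz = (y - target) * (1 - y**2), since tanh'(z) = 1 - tanh(z)**2.
    delta = (y - target) * (1.0 - y * y)
    new_weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - learning_rate * delta
    return new_weights, new_bias

# Arbitrary starting point and training sample (illustrative values only).
weights, bias = [0.2, -0.4], 0.0
inputs, target = [1.0, 0.5], 0.8
for _ in range(200):
    weights, bias = gradient_step(weights, bias, inputs, target)

# After training, the output should be close to the target.
print(round(neuron_output(weights, bias, inputs), 3))
```

Each step nudges the weights in the direction that reduces the error, which is what "weights that adjust as learning proceeds" means in practice.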
AI has been studied for many years, but only recently has it been successfully applied in academia and industry. It is no longer reserved for an elite and is becoming accessible to a large community of engineers. Many schools and universities have already included it in their curricula: a new generation will soon be able to work on AI without necessarily mastering the advanced mathematics behind it.
AI is such a hot topic that the latest Neural Information Processing Systems (NeurIPS) conference, which specializes in machine learning, artificial intelligence and computational neuroscience, was limited to 8,000 attendees. Tickets sold out in only 11 minutes, i.e. faster than the Burning Man festival… but still slower than a Beyoncé show.
R&D and AI at MyScript
To some extent, AI is part of our DNA. MyScript was founded in 1998 with the mission to build the most advanced handwriting recognition system. Machine learning was already at the core of our first handwriting recognition system, released in the early 2000s. Both print and cursive writing were supported, and feedforward neural networks classified the various character hypotheses. We used a novel approach based on a global discriminative training scheme to estimate the gradient to be propagated through the NN. A similar technique is now commonly employed in the Connectionist Temporal Classification (CTC) framework to train sequence-to-sequence neural systems. The main difference is that we used a Maximum Mutual Information (MMI) criterion instead of a Maximum Likelihood criterion as the loss function, to better balance character frequencies and to reject bad segmentations during the learning process.
Over the years, our engine evolved into a multi-language system capable of recognizing a wide range of languages and scripts. In 2006, we were able to recognize more than 30,000 Chinese characters using a specific ANN architecture, pushing accuracy to a level never reached before. This architecture reduces the memory footprint by having neurons identify sub-character components, or radicals, common to several Hanzi characters. The activation potential of an output neuron is obtained by summing the values of its component radical neurons. Radical modeling also provides a better description of rare characters, for which we have fewer training samples. Inference in this wide NN is sped up by an efficient softmax activation function, in which outputs are grouped into clusters according to character similarities. A first softmax is computed to estimate the most probable clusters, and only the characters in those clusters are then activated in a second softmax, which avoids computing over all 30,000 outputs.
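The radical summing and the two-stage softmax can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the production architecture: the radical activations, the cluster scores and the grouping of characters into clusters are all hypothetical values, and only four characters are modeled.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical radical activations and similarity clusters (illustrative only).
RADICAL_ACTIVATION = {"氵": 1.5, "木": 0.5, "目": -0.5}
CHAR_RADICALS = {"沐": ["氵", "木"], "泪": ["氵", "目"],
                 "相": ["木", "目"], "林": ["木", "木"]}
CLUSTERS = [["沐", "泪"], ["相", "林"]]

def char_potential(char):
    """Activation potential of a character: the sum of its radical activations."""
    return sum(RADICAL_ACTIVATION[r] for r in CHAR_RADICALS[char])

def two_level_softmax(cluster_scores, top_k=1):
    """First softmax over clusters, second softmax inside the top_k clusters only."""
    cluster_probs = softmax(cluster_scores)
    ranked = sorted(range(len(CLUSTERS)), key=lambda i: cluster_probs[i], reverse=True)
    result = {}
    for c in ranked[:top_k]:
        chars = CLUSTERS[c]
        # Characters of the other clusters are never scored.
        within = softmax([char_potential(ch) for ch in chars])
        for ch, p in zip(chars, within):
            result[ch] = cluster_probs[c] * p
    return result

probs = two_level_softmax(cluster_scores=[2.0, 0.0], top_k=1)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

With 30,000 outputs, skipping the unlikely clusters is where the savings come from: only a small fraction of the character potentials has to be evaluated.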
For Arabic, we developed a spatio-temporal segmentation technique to handle the bidirectional nature of Arabic writing, where words are written from right to left, except for Latin words, which are written from left to right. In such a context, the management of delayed strokes (marks such as dots and diacritics added after the rest of the word) is a challenge, because it is difficult to attribute these marks to their corresponding characters. For this specific case, we developed a technique in which two processes intertwine: stroke ordering and character recognition. The basic idea is to model all possible orderings and let the system choose the one that maximizes the character probabilities.
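The basic idea of letting recognition pick the stroke order can be sketched in a few lines of Python. This is a brute-force illustration only: the stroke names and the pair probabilities are hypothetical, and `sequence_probability` is a toy stand-in for a real character recognizer.

```python
from itertools import permutations

# Hypothetical recognizer scores: probability that stroke b directly follows
# stroke a in a well-formed character sequence. Unlisted pairs get 0.5.
PAIR_PROBABILITY = {
    ("ba_body", "dot"): 0.9,   # the dot completes the letter body
    ("dot", "alif"): 0.8,      # the next letter starts after the dot
    ("alif", "dot"): 0.1,      # a dot attached to the wrong letter
}

def sequence_probability(order):
    """Toy stand-in for character recognition over an ordered stroke sequence."""
    score = 1.0
    for a, b in zip(order, order[1:]):
        score *= PAIR_PROBABILITY.get((a, b), 0.5)
    return score

def best_stroke_order(strokes):
    """Model every possible ordering and keep the most probable one."""
    return max(permutations(strokes), key=sequence_probability)

# Temporal order: the dot is a delayed stroke, written after the second letter.
written = ["ba_body", "alif", "dot"]
print(best_stroke_order(written))  # ('ba_body', 'dot', 'alif')
```

Enumerating every permutation grows factorially with the number of strokes, so this form is only workable for a handful of strokes; it is meant to show the principle, not an efficient search.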
Our NLP technology has also evolved over the years, moving from n-gram and n-class models to recurrent neural networks (RNN) to predict which character or word comes next (language modeling). RNNs have the advantage of addressing the sparsity problem encountered by n-gram models, that is, the lack of enough data in a corpus to accurately estimate the likelihood of a character or word given the previous ones. These neural models rely on embeddings that map a linguistic item (character or word) to a vector of real values, so that characters or words appearing in similar contexts or having semantic similarities are associated with relatively close vectors. We are currently applying this neural language model not only to natural languages like English, French or Chinese, but also to the language of mathematics.
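A short Python sketch shows both sides of this argument: how a count-based bigram model assigns zero probability to an unseen word pair, and how embeddings place related words close together. The tiny corpus and the three-dimensional vectors are made up for illustration; real embeddings are learned and have far more dimensions.

```python
import math
from collections import Counter

corpus = "the cat sleeps . the dog sleeps . the cat eats .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_probability(prev, word):
    """Maximum likelihood estimate of P(word | prev) from raw counts."""
    return bigrams[(prev, word)] / unigrams[prev]

# "dog eats" never occurs in the corpus, so the estimate collapses to zero.
print(bigram_probability("dog", "eats"))  # 0.0

# Hypothetical hand-picked embeddings (illustrative values only).
embedding = {
    "cat": [0.9, 0.1, 0.3],
    "dog": [0.8, 0.2, 0.3],
    "eats": [0.1, 0.9, 0.5],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# "cat" and "dog" appear in similar contexts, so their vectors are close.
print(cosine(embedding["cat"], embedding["dog"]) > cosine(embedding["cat"], embedding["eats"]))  # True
```

Because "dog" sits near "cat" in the vector space, a neural model can transfer what it learned from "cat eats" to the unseen "dog eats", which a pure count table cannot do.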
About 10 years ago, we extended our expertise in Artificial Intelligence to 2D technologies. This involves the analysis and interpretation of two-dimensional languages, such as mathematics and musical notation, as well as charts, graphs and diagrams. We developed specific AI algorithms based on 2D parsers and 2D grammars to understand and interpret the written version of these languages. The parser analyzes input ink by recursively applying production rules in order to build a parse tree representing the structure of the object to be recognized. For example, it can retrieve the elements of a fraction by identifying its numerator and denominator, assign music accidentals to their corresponding notes, or recover the relationship between connectors and nodes in a flowchart or a mind map.
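As a rough illustration of recursive production rules over 2D input, the Python sketch below parses already-recognized symbols into a tree with two toy rules, a fraction and a left-to-right row. The `Symbol` fields, the width test used to spot a fraction bar and the tuple-based tree are assumptions made for this example; a real 2D grammar has many more rules and works on ink with competing hypotheses.

```python
from dataclasses import dataclass

@dataclass
class Symbol:
    label: str          # recognized symbol, e.g. "1" or "-" for a bar
    x: float            # horizontal center
    y: float            # vertical center (y grows downward)
    width: float = 1.0  # a "-" wider than 1.0 is treated as a fraction bar

def parse(symbols):
    """Recursively apply two toy production rules and return a parse tree.

    Fraction -> Expr above a bar, Expr below the same bar
    Row      -> symbols read from left to right
    """
    bars = [s for s in symbols if s.label == "-" and s.width > 1.0]
    if bars:
        bar = max(bars, key=lambda s: s.width)  # widest bar = outermost fraction
        others = [s for s in symbols if s is not bar]
        above = [s for s in others if s.y < bar.y]
        below = [s for s in others if s.y > bar.y]
        if above and below:
            return ("fraction", parse(above), parse(below))
    row = sorted(symbols, key=lambda s: s.x)
    return ("row", [s.label for s in row])

# Handwritten "(1 + 2) over 3", given as symbols with positions.
ink = [
    Symbol("2", x=2, y=0), Symbol("1", x=0, y=0), Symbol("+", x=1, y=0),
    Symbol("-", x=1, y=1, width=3.0),
    Symbol("3", x=1, y=2),
]
print(parse(ink))
# ('fraction', ('row', ['1', '+', '2']), ('row', ['3']))
```

The recursion is the key point: the numerator and denominator are parsed with the same rules, so nested fractions fall out of the same two productions.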
We have recently invested a lot of R&D effort in one recurrent neural architecture: the LSTM network (Long Short-Term Memory). This network is based on specific memory units, each composed of a cell and three gates. The input gate controls the flow of a value into the cell, the forget gate controls how long this value is retained, and the output gate controls the activation of the unit based on this value. LSTM networks were developed to cope with the vanishing gradient problem frequently encountered by standard RNNs. We successfully applied this architecture to meet the many requirements of document and handwritten note layout analysis. Thanks to the LSTM’s strong ability to memorize context, we are now able to separate text strokes from geometrical shapes in a diagram, break down a text block into lines, or identify mathematical expressions or symbols in the middle of a text.
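The role of the three gates is easier to see in code. Below is a minimal Python sketch of one time step of a scalar LSTM unit, following the standard textbook formulation; the parameter values are hypothetical and chosen so that the forget gate stays open and the input gate stays closed.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM unit.

    params maps "input", "forget", "output" and "candidate"
    to a (w_x, w_h, bias) triple.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))         # input gate: how much new value enters the cell
    f = sigmoid(pre("forget"))        # forget gate: how much of the old cell value is kept
    o = sigmoid(pre("output"))        # output gate: how much of the cell is exposed
    g = math.tanh(pre("candidate"))   # candidate value computed from the current input
    c = f * c_prev + i * g            # new cell state
    h = o * math.tanh(c)              # new unit activation
    return h, c

# Hypothetical parameters: forget gate open, input gate closed.
params = {
    "input": (0.0, 0.0, -10.0),
    "forget": (0.0, 0.0, 10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = lstm_step(x=5.0, h_prev=0.0, c_prev=0.7, params=params)
print(round(c, 3))  # 0.7: the cell keeps its previous value despite the new input
```

Because the cell state is updated additively through the forget gate rather than squashed at every step, information (and gradients) can survive over many time steps.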
To challenge ourselves, we regularly participate in academic contests organized during international research conferences. We finished first in every contest in which we participated, for a total of 10 awards that demonstrate the quality of our technology.
Last summer, we attended the International Conference on Frontiers in Handwriting Recognition (ICFHR) in Niagara Falls. The contest was about Vietnamese online handwritten text recognition, and participants were invited to submit a system for recognizing words, text lines and full paragraphs. MyScript ranked first in all three tasks and won by a large margin over the runner-up.
The excellence of our AI and UX (User eXperience) research has also been repeatedly recognized by the market. We won the Mobile Apps Showdown contest at CES twice, with our Calculator and Nebo applications, each time winning both the online vote and the public vote.
20 years of experience and research in AI
Last year, we celebrated our 20th anniversary… What have we accomplished in this timeframe? Well, we have the most outstanding handwriting recognition system and a diverse team of skillful people improving it constantly. We work together as one team to provide our users with the most natural way to create and manage digital content.
But most of all, we went further than just solving the problem of recognizing digital ink. Thanks to MyScript Interactive Ink technology (iink), our latest innovation, we offer the opportunity to interact with it. Not only can users have their handwriting recognized, they are also able to more intuitively manipulate, edit, and share digital content. By combining the infinite possibilities of handwriting with Interactive Ink, we aim at pushing the boundaries of productivity.
MyScript Interactive Ink technology is based on an augmented digital ink representation. A linking system maintains consistency between the stroke coordinates and their corresponding interpretation. Each gesture (e.g. an erasure) occurring on an ink element is automatically replicated in real time on its recognition counterpart, and vice versa. It becomes as easy to handle ink as it is to handle ASCII characters on your computer. Users can select their handwritten text or position a cursor as they would in a text document. The layout of a handwritten note becomes fully responsive, making the note portable across devices. Handwritten text, mathematical equations and even diagrams are interpreted in real time, so that they are editable via simple gestures, responsive, and easy to convert to a neat output.
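The principle of a link between ink and its interpretation can be pictured with a toy Python data structure. This is purely illustrative and is not the Interactive Ink API: the `LinkedInk` class, its fields and its methods are hypothetical names invented for this sketch, and it simply drops a character when one of its strokes is erased, where a real engine would re-run recognition.

```python
from dataclasses import dataclass, field

@dataclass
class LinkedInk:
    """Toy model: each recognized character is linked to the ids of its strokes."""
    strokes: dict = field(default_factory=dict)  # stroke id -> list of (x, y) points
    chars: list = field(default_factory=list)    # list of (character, [stroke ids])

    def text(self):
        """Recognition side: the current interpretation as a string."""
        return "".join(ch for ch, _ in self.chars)

    def erase_stroke(self, stroke_id):
        """Ink side: erasing a stroke also removes the character it belongs to."""
        kept = []
        for ch, ids in self.chars:
            if stroke_id in ids:
                for sid in ids:
                    self.strokes.pop(sid, None)
            else:
                kept.append((ch, ids))
        self.chars = kept

    def delete_char(self, index):
        """Recognition side: deleting a character also removes its strokes."""
        _, ids = self.chars.pop(index)
        for sid in ids:
            self.strokes.pop(sid, None)

note = LinkedInk(
    strokes={1: [(0, 0), (0, 2)], 2: [(1, 1), (1, 2)], 3: [(1, 0)]},
    chars=[("h", [1]), ("i", [2, 3])],
)
note.erase_stroke(3)     # erase the dot of the "i" on the ink side
print(note.text())       # h
```

Whichever side an edit starts from, both views are updated together, which is the consistency property described above.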
What is the future of MyScript AI?
We will continue to boost innovation by making Interactive Ink a first-class citizen in tomorrow’s digital world. Our 10-year plan is full of challenges: table manipulation, chemical formula recognition, interactive sketching, ink collaboration, semantic analysis, virtual reality, etc. AI will remain forever in our DNA!
More information on MyScript Interactive Ink.
Human knowledge works in layers: we can rely on existing technology without mastering it and still be able to create, invent or develop something new. No need to know how to build a computer to use it. The same will soon happen with AI.
Pierre-Michel Lallican. CTO MyScript