News

03 JUL 2020

Coreon MKS as LLOD is European Language Grid top funded project

Coreon’s proposal for using the European Language Grid (ELG) as a platform for making multilingual interoperability assets discoverable and retrievable has been awarded funding. This will be achieved by complementing Multilingual Knowledge Systems with a SPARQL interface. The ELG Open Call 1 received 121 proposals, of which 110 were eligible and 10 were selected. Coreon’s proposal “MKS as Linguistic Linked Open Data” was amongst the three winning proposals from industry and received the highest funding.

The goals of the project are a) to enable Semantic Web systems to query Coreon’s richly elaborated multilingual terminologies stored in concept systems and knowledge graphs, and b) to demonstrate how to overcome the limits of RDF/knowledge graph editors, which are usually fine for modelling concept relations but weak at capturing linguistic information. When deployed on the ELG in March 2021, the innovation will enable the Semantic Web community to query rich multilingual data with a familiar, industry-standard syntax.
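
To give a flavour of what such an interface could look like, here is a minimal sketch of querying an MKS exposed as Linguistic Linked Open Data from Python with SPARQLWrapper. The endpoint URL and the SKOS modelling are illustrative assumptions; the post does not prescribe the vocabulary the ELG deployment will expose.

```python
# Minimal sketch of querying a Multilingual Knowledge System exposed as
# Linguistic Linked Open Data. The endpoint URL and the SKOS modelling are
# assumptions for illustration only.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://example.org/mks/sparql"  # hypothetical SPARQL endpoint

QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?concept ?label (LANG(?label) AS ?lang)
WHERE {
  ?concept skos:prefLabel "vehicle"@en .   # find the concept via its English term
  ?concept skos:prefLabel ?label .         # return its labels in other languages
  FILTER (LANG(?label) IN ("en", "de", "fr"))
}
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["lang"]["value"], row["label"]["value"])
```
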
07 NOV 2019

CEFAT4Cities Action Gets Funding

The CEFAT4Cities Action, to be executed by a multinational consortium of five partners led by CrossLang, has received funding. The action starts in April 2020 and runs until March 2022.
The main objective of the CEFAT4Cities Action is to develop a “Smart cities natural language context”, providing multilingual interoperability of the Context Broker DSI and making public “smart city” services multilingual, with pilots in Vienna and Brussels.
The language resources created in the action will be committed to the ELRC repository; the following languages will be covered: Dutch, English, French, German, Italian, Slovenian, Croatian and Norwegian.

Coreon's role in the consortium is to provide the appropriate technology to turn vocabularies into multilingual knowledge graphs, and to curate and extend them to model the domain of smart cities.
12 DEC 2018

Sunsetting CAT

Neural Machine Translation is making CAT tools obsolete.

For decades, Computer Assisted Translation (CAT), based on sentence translation memories, has been the standard tool for going global. Although CAT tools were originally designed with a mid-90s PC in mind, and despite proposals for changing the underlying data model, the basic architecture of CAT has remained unchanged. The dramatic advances in Neural Machine Translation (NMT) have now made the whole product category obsolete.

NMT Crossing the Rubicon

Neural networks, stacked deeply enough, do understand us sufficiently to create a well-formed translation.

When selling translation memory, I always said that machines would only be able to translate once they understand text; and if one day they can, MT will be a mere footnote to a totally different revolution. Now it turns out that neural networks, stacked deeply enough, do understand us sufficiently to create a well-formed translation. Over the last two years NMT has progressed dramatically. It has now achieved “human parity” for important language pairs and domains. That changes everything.

Industry Getting it Wrong

Most players in the $50b translation industry, service providers but also their customers, think that NMT is just another source for a translation proposal. In order to preserve their established way of delivery, they pitch the concept of “augmented translation”. However, if the machine translation is as good (or bad!) as human translation, who would you have revise it: another translator or a subject matter expert?

Yes, the expert who knows what the text is about. The workflow is thus changing to automatic translation and expert revision. Translation becomes faster, cheaper, and better!

Different Actors, Different Tools

A revision UI will have to look very different from CAT tools. The most dramatic change is that it has to be extremely simple. To support the current model of augmented translation, CAT tools have become very powerful. However, their complexity can only be handled by a highly sought-after group of perhaps a few thousand professional translators globally.

For the new workflow, a product design is required that can support millions of (mostly occasional) expert revisers. Also, the revisers need to be pointed to the sentences which need revision. This requires multilingual knowledge.

Disruption Powered by Coreon

Coreon can answer the two key questions for using NMT in a professional translation workflow: 1) which parts of the translated text are not fit-for-purpose, and 2) why aren't they? To do so, the multilingual knowledge system classifies linguistic assets, human resources, QA, and projects in a unified system which is expandable, dynamic, and provides fallback paths. Coreon is a key component for LangOps. In the future, linguists will engineer localization workflows such as Semiox and create multilingual knowledge in Coreon. “Doing words” is left to NMT.
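
As an illustration of the first question, here is a minimal sketch that flags sentences whose source mentions a known concept but whose machine translation contains none of the approved target-language terms. The sample term data and the matching rule are illustrative assumptions, not Coreon's actual fit-for-purpose classification.

```python
# Minimal sketch of pointing revisers to sentences that may need attention:
# a target sentence is flagged when its source mentions a known concept but
# none of the approved target-language terms appear in the translation.
# The term data and the matching rule are illustrative assumptions, not
# Coreon's actual fit-for-purpose classification.

# Hypothetical MKS excerpt: concept id -> approved terms per language
MKS = {
    "C001": {"en": ["charging station"], "de": ["Ladestation", "Ladesäule"]},
    "C002": {"en": ["plug-in hybrid"], "de": ["Plug-in-Hybrid"]},
}

def needs_revision(source_en: str, target_de: str) -> list[str]:
    """Return the concept ids whose approved German term is missing in the target."""
    src, tgt = source_en.lower(), target_de.lower()
    flagged = []
    for concept, terms in MKS.items():
        if any(t.lower() in src for t in terms["en"]) and \
           not any(t.lower() in tgt for t in terms["de"]):
            flagged.append(concept)
    return flagged

src = "The plug-in hybrid can use any public charging station."
mt  = "Der Plug-in-Hybrid kann jede öffentliche Stromtankstelle nutzen."
print(needs_revision(src, mt))  # ['C001'] -> point the expert reviser to this sentence
```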


12 DEC 2018

Why Machine Learning Still Needs Humans for Language

Despite recent advances, Machine Learning needs humans!

Outperforming Humans

Machine Learning (ML) has begun to outperform humans in many tasks which seemingly require intelligence. The hype about ML has even made it regularly into the mass media, and it can now read lips, recognize faces, or transform speech to text. Yet when it comes to dealing with the ambiguity, variety and richness of language, or understanding text or extracting knowledge, ML continues to need human experts.

Knowledge is Stored as Text

The web is certainly our greatest knowledge source. However, it has been designed for consumption by humans, not machines. The web’s knowledge is mostly stored in text and spoken language, enriched with images and video. It is not a structured relational database storing numeric data in machine processable form.

Text is Multilingual

The web is also very multilingual. Recent statistics surprisingly show that only 27% of the web’s content is in English, and only 21% in the next 5 most used languages. That means more than half of its knowledge is expressed in a long tail of other languages.

Constraints of Machine Learning

ML faces some serious challenges. Even with today’s availability of hardware, the demand for computing power can become astronomical when input and desired output are rather fuzzy (see the great NYT article, "The Great A.I. Awakening").

ML is great for 80/20 problems, but it is dangerous in contexts with high accuracy needs: “Digital assistants on personal smartphones can get away with mistakes, but for some business applications the tolerance for error is close to zero”, emphasizes Nikita Ivanov, from Datalingvo, a Silicon Valley startup.

ML performs well on n-to-1 questions. In facial recognition, for instance, there is only one correct answer to the question “which person do all these pixels show?” However, ML struggles with n-to-many or gradual problems: there are many ways to translate a text correctly or to express a certain piece of knowledge.

ML is only as good as its available relevant training material. For many tasks, mountains of data are needed, and that data should be of supreme quality. For language-related tasks these mountains of data are often required per language and per domain. Furthermore, it is hard to decide when the machine has learned enough.

Monolingual ML Good Enough?

Some would suggest we should just process everything in English. ML also does an 'OK' job at machine translation (Google Translate, for example). So why not translate everything into English and then simply run our ML algorithms? This is a very dangerous approach, since errors multiply. If the output of an 80% accurate machine translation becomes the input to an 80% accurate Sentiment Analysis, the combined accuracy drops to 64%. At that hit rate you are getting close to flipping a coin.
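
Spelled out as a short sketch (treating the two components as failing independently, which is a simplification):

```python
# Worked example of the error compounding described above; it assumes the two
# components fail independently, which is a simplification.
mt_accuracy = 0.80         # machine translation into English
sentiment_accuracy = 0.80  # sentiment analysis on the English output

pipeline_accuracy = mt_accuracy * sentiment_accuracy
print(f"End-to-end accuracy: {pipeline_accuracy:.0%}")  # End-to-end accuracy: 64%
```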

Human Knowledge to Help

The world is innovating constantly. Every day new products and services are created. To talk about them, we continuously craft new words: the bumpon, the ribbon, a plug-in hybrid, TTIP. Only with the innovative force of language can we communicate new things.

A Struggle with Rare Words

By definition, new words are rare. They first appear in one language and then may slowly propagate into other domains or languages. There is no knowledge without these rare words, the terms. Look at a typical product catalog description with the terms highlighted. Now imagine this description without the terms – it would be nothing but a meaningless scaffold of fill-words.

Knowledge Training Required

At university we acquire the specific language and terminology of the field we are studying. We become experts in that domain. Even so, when we later change jobs during our professional career, we still have to acquire the lingo of the new company: names of products, modules, services, but also job roles and their titles, names for departments, processes, etc. We get familiar with a specific corporate language by attending training and by reading policies, specifications, and functional descriptions. Machines need to be trained in the very same way, with that explicit knowledge and language.

Multilingual Knowledge Systems Boost ML with Knowledge

There is a remedy. Terminology databases, enterprise vocabularies, word lists, glossaries: organizations usually already own an inventory of “their” words. This invaluable data can be leveraged to boost ML with human knowledge, by transforming these inventories into a Multilingual Knowledge System (MKS). An MKS not only captures all words in all registers in all languages, but also structures them into a knowledge graph (a 'convertible' IS-A 'car' IS-A 'vehicle'…, 'front fork' IS-PART-OF 'frame' IS-PART-OF 'bicycle').
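
As a rough illustration of that structure, here is a minimal sketch in Python: concepts carry terms per language plus typed relations, and any term in any language resolves to its concept, from which the IS-A chain can be walked. The classes and sample data are assumptions for illustration, not Coreon's actual data model.

```python
# Minimal sketch of the structure described above: concepts carrying terms in
# several languages, linked by typed relations. Classes and sample data are
# illustrative assumptions only.
from dataclasses import dataclass, field

@dataclass
class Concept:
    cid: str
    terms: dict[str, list[str]]                                       # language -> terms
    relations: list[tuple[str, str]] = field(default_factory=list)    # (type, target cid)

graph = {
    "vehicle":     Concept("vehicle",     {"en": ["vehicle"], "de": ["Fahrzeug"]}),
    "car":         Concept("car",         {"en": ["car", "automobile"], "de": ["Auto", "PKW"]},
                           [("IS-A", "vehicle")]),
    "convertible": Concept("convertible", {"en": ["convertible"], "de": ["Cabrio"]},
                           [("IS-A", "car")]),
}

def concept_for(term: str, lang: str) -> Concept | None:
    """Map a term in any supported language onto its concept."""
    for c in graph.values():
        if term in c.terms.get(lang, []):
            return c
    return None

c = concept_for("Cabrio", "de")
while c:                                   # walk the IS-A chain up to the root
    print(c.cid, c.terms["en"])
    parents = [t for r, t in c.relations if r == "IS-A"]
    c = graph[parents[0]] if parents else None
# convertible ['convertible'] / car ['car', 'automobile'] / vehicle ['vehicle']
```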

It is the humanly curated Multilingual Knowledge System that enables Machine Learning and Artificial Intelligence solutions to work for specific domains, with only small amounts of textual data, including for less resourced languages.


12 DEC 2018

Centuple Your Market with Language-Neutral Product Search

Multilingual e-commerce means a hugely increased market for your business

A German and an Italian go to a Polish online shop… this may sound like the start of a bad bar joke, but this bizarre situation still occurs in real life, in spite of all the EU propaganda about the Digital Single Market. The German and the Italian simply won’t understand a single word. They cannot find any products in this Polish shop, as it has not embraced the idea of multilingual e-commerce.

Easy Fix Translation?

Even before considering multiple languages, surely translating static content and product catalogues at least into English would help? In some EU countries, half of the population has a sufficient passive command of English. However, that percentage quickly becomes minuscule when consumers have to enter the right English search term for a specific domain. Full translation into all languages, or even into the major ones only, is cost-prohibitive for most.

Domestic Customers Have a Hard Time, Too!

For domestic customers it’s not easy to find products either. Today’s string-based search often returns no matches, or it finds way too many and displays them in unintuitively sorted lists, which has the same effect. Instead of searching semantically for what the customer wants, online shops expect their customers to enter the very same strings they have used in their catalogues.

Scroll Hit Lists or Explore Product Offering?

Search results are always displayed in a list of matches: a column of product names or tiles of product images. Yet what if there are many matches in several different categories? An alternative, more natural way would be to display the search result graphically in a product tree with related products closely organized. This way the shopper can quickly find the product they were actually looking for and is motivated to buy more.

Social Shopping

E-commerce has increased buying options to a degree which leaves many consumers completely lost. Therefore, online shoppers often rely on third-party information such as test reports, customer feedback, and blog articles to make a buying decision. Shops should give their customers the comforting feeling of having made an informed and good decision, but without having to leave the shop.

Solution Architecture for Advanced Linguistic Product Search

All the above requirements, particularly the semantic and cross-language search, can be relatively easily fulfilled by deploying Advanced Linguistic Search (ALS) on top of a Multilingual Knowledge System (MKS). The following chart illustrates the architecture for finding products semantically and language-neutrally:

The ALS deals with language specifics such as morphology and spelling variations. Deployed with the MKS, it can expand searches semantically and across languages. The MKS stores the product information in a knowledge graph. This way, found products are listed by semantic proximity and not by string-match scores. Alternatively, the shopper can explore the offering in a product graph. Supporting third-party information is provided, and machine translated if originally in a different language, to help the consumer make buying decisions without leaving the shop.
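
A minimal sketch of the search flow, under the assumption of a toy data model rather than the actual ALS implementation: the query term, in whatever language, is mapped onto a concept in the MKS, and products are returned for that concept and its semantic neighbours.

```python
# Minimal sketch of language-neutral product search on top of an MKS.
# A query term in any language is mapped to a concept, and products are
# returned for that concept and its immediate neighbours in the graph.
# Data model and ranking are illustrative assumptions.

# concept id -> terms per language (excerpt of a hypothetical MKS)
TERMS = {
    "bike":   {"en": ["bicycle", "bike"], "de": ["Fahrrad"], "pl": ["rower"]},
    "e-bike": {"en": ["e-bike"],          "de": ["E-Bike"],  "pl": ["rower elektryczny"]},
}
NEIGHBOURS = {"bike": ["e-bike"], "e-bike": ["bike"]}           # semantic proximity
PRODUCTS = {"bike": ["City Bike 28\""], "e-bike": ["Trekking E-Bike 500 Wh"]}

def search(query: str) -> list[str]:
    """Return products for the matched concept first, then for nearby concepts."""
    hits = []
    for cid, langs in TERMS.items():
        if any(query.casefold() == t.casefold() for terms in langs.values() for t in terms):
            hits.append(cid)
            hits.extend(NEIGHBOURS.get(cid, []))
    return [p for cid in hits for p in PRODUCTS.get(cid, [])]

print(search("Fahrrad"))  # German query finds products in the Polish shop
# ['City Bike 28"', 'Trekking E-Bike 500 Wh']
```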

Find, Upsell, Advise = Higher Revenue

The above solution, based on a Multilingual Knowledge System such as Coreon, enables online shops to increase sales. Without ongoing translation efforts, shops can drastically extend their customer base in the Digital Single Market. For shops in almost half of the EU countries, that increase would be hundredfold!



12 FEB 2018

Internet of Things Banks on Semantic Interoperability

Internet of Things devices need semantic interoperability

The biggest challenge for widespread adoption of the Internet of Things is interoperability. A much-noticed McKinsey report states that achieving interoperability in IoT would unlock an additional 40% of value. This is not surprising since the IoT is in essence about connecting machines, devices, and sensors – ideally cross organization, cross industries, and even cross borders. But while technical and syntactic interoperability are pretty much solved, little has been available so far to make sure devices actually understand each other.

Focus on Semantic Interoperability

Embedded Computing Design superbly describes the situation in a recent series of articles: technical interoperability, the fundamental ability to exchange raw data (bits, frames, packets, messages), is well understood and standardized. Syntactic interoperability, the ability to exchange structured data, is supported by standard data formats such as XML and JSON. Core connectivity standards such as DDS or OPC-UA provide syntactic interoperability cross-industries by communicating through a proposed set of standardized gateways.

Semantic interoperability, though, requires that the meaning (context) of exchanged data is automatically and accurately interpreted. Several industry bodies have tried to implement semantic data models. However, these semantic data schemes have either been way too narrow for cross-industry use cases or had to stay too high-level. Without such schemes, data from Internet of Things devices lacks the information to describe its own meaning. Therefore, a laborious and, worse, inflexible normalization effort is required before that data can really be used.

Luckily there is a solution: abstract metadata from devices by creating an IoT knowledge system.

Controlled Vocabulary and Ontologies

A controlled vocabulary is a collection of identifiers which ensures consistency of metadata terminology. These terms are used to label concepts (nodes) in a graph which provides a standardized classification for a particular domain. Such an ontology, incorporating characteristics of a taxonomy and a thesaurus, links concepts with their terms and attributes in semantic relationships. This way it provides metadata abstraction. It represents knowledge in machine-readable form and thus functions as a knowledge system for specific domains and their IoT applications.
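
A minimal sketch of that metadata abstraction, with made-up concept ids, labels, and payloads rather than any specific IoT standard: readings from devices that label the same quantity differently are normalized onto shared concepts.

```python
# Minimal sketch of metadata abstraction via a controlled vocabulary: readings
# from devices that label the same quantity differently are normalized onto a
# shared concept before further processing. Concept ids, labels, and payloads
# are illustrative assumptions, not a specific IoT standard.

# Controlled vocabulary: concept id -> labels used by different vendors/languages
VOCAB = {
    "temperature": ["temp", "temperature", "Temperatur", "tmp_c"],
    "humidity":    ["hum", "humidity", "Feuchtigkeit", "rh"],
}
LABEL_TO_CONCEPT = {label.casefold(): cid for cid, labels in VOCAB.items() for label in labels}

def normalize(payload: dict) -> dict:
    """Re-key a device payload onto vocabulary concepts; drop unknown fields."""
    return {
        LABEL_TO_CONCEPT[key.casefold()]: value
        for key, value in payload.items()
        if key.casefold() in LABEL_TO_CONCEPT
    }

sensor_a = {"temp": 21.5, "rh": 40}                   # vendor A's field names
sensor_b = {"Temperatur": 22.0, "Feuchtigkeit": 38}   # vendor B's field names

print(normalize(sensor_a))  # {'temperature': 21.5, 'humidity': 40}
print(normalize(sensor_b))  # {'temperature': 22.0, 'humidity': 38}
```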

IoT Knowledge Systems made Easy

A domain ontology can be maintained in a repository completely abstracted from any programming environment. It needs to be created and maintained by domain experts. With the explosive growth of IoT, new devices, applications, organizations, industries, and even countries are constantly added. Metadata abstraction parallels object-oriented programming, and unfortunately so do the tools used so far to maintain and extend ontologies.

However, our SaaS solution Coreon now makes sure that IoT devices understand each other. Not only does Coreon function with its API as a semantic gateway in the IoT connectivity architecture, it also provides a modern, very easy-to-use application to maintain ontologies, featuring a user interface domain experts can actually work with. With Coreon they can deliver the knowledge necessary for semantic interoperability so that IoT applications can unlock their full value.

Coreon will be presented at the Bosch ConnectedWorld Internet of Things conference in February 2018 in Berlin. If you cannot come by our stand (S20), just flip through our presentation or drop us a mail with questions.

