When speaking of data, the first thought is usually a digital collection of tables with predominantly numerical values. Yet many companies are unknowingly sitting on another treasure trove of data: text. The art lies in analyzing and interpreting this text data in a meaningful way. Today, text mining and natural language processing (NLP), as a subarea of text mining, provide the means to do this. In this article, we show you how NLP and text mining relate to artificial intelligence and machine learning (ML), which exciting use cases exist in different business areas, and how an NLP project for your idea can be implemented.
What do text mining and NLP mean?
Text mining is similar to data mining: both extract useful information from data. The only difference is the type of data, since text mining processes unstructured text.
Here, text is read in and converted into useful information. Even the word count function of your favorite word processor is a form of text mining. So is determining the type of each term, i.e., whether it is a noun, an adjective, or a proper name. Virtually any linguistic information contained in a text can be extracted by text mining.
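To make the word-count example concrete, here is a minimal Python sketch using only the standard library. It assumes that a word is any run of letters, digits, or underscores (the regular expression `\w+`). This is a deliberate simplification, because word processors and NLP libraries use more careful tokenization rules.

```python
import re
from collections import Counter

def count_words(text):
    """Count how often each word occurs, ignoring case and punctuation."""
    # \w+ matches runs of letters, digits and underscores: a simple stand-in for tokenization
    words = re.findall(r"\w+", text.lower())
    return Counter(words)

text = "Text mining turns text into information. Mining text is useful."
counts = count_words(text)
print(sum(counts.values()))    # 10 (total number of words)
print(counts.most_common(2))   # [('text', 3), ('mining', 2)]
```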
Natural Language Processing is, as the name suggests, the processing of natural, i.e., human, languages. NLP includes various methods. Many of them are based on machine learning (ML) and work in the same way as other ML models: the algorithms learn patterns from the data. The techniques include Document Classification & Topic Modeling, Sentiment Analysis, and Text Generation:
- Document Classification & Topic Modeling:
An algorithm identifies the topics that occur across a collection of text documents. This process is called topic modeling. Once these topics have been determined, new documents can be assigned to them automatically (document classification).
- Sentiment Analysis:
Sentiment analysis uses specific words and word combinations to determine the mood, emotion, and subjectivity of a text document. For this purpose, a pre-trained algorithm calculates scores that indicate how positive or negative and how objective or subjective a document is.
- Text Generation:
Generating text automatically requires large amounts of text data, from which an ML model learns how words are arranged into sentences. This is an ongoing process, as the model can learn new relationships from each additional text document.
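Modern text generators rely on large neural language models, but the basic idea of learning word order from data can be shown with a much simpler technique: a bigram (first-order Markov chain) model. The following sketch is illustrative only; the function names `train_bigrams` and `generate` and the tiny training corpus are hypothetical. Passing an existing model back into `train_bigrams` extends it with new word pairs, mirroring the idea that the model keeps learning from each new document.

```python
import random
from collections import defaultdict

def train_bigrams(sentences, followers=None):
    """Record which words follow each word; pass an existing model to keep learning."""
    if followers is None:
        followers = defaultdict(list)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            followers[current].append(nxt)
    return followers

def generate(followers, start, max_words=10, seed=None):
    """Build a sentence by repeatedly sampling a word that followed the previous one."""
    rng = random.Random(seed)
    words = [start]
    # Stop at the length limit or when the last word was never followed by anything
    while len(words) < max_words and followers.get(words[-1]):
        words.append(rng.choice(followers[words[-1]]))
    return " ".join(words)

model = train_bigrams([
    "the model learns how words are ordered",
    "the model learns from new documents",
])
# Learning continues: a new document adds new word pairs to the same model
model = train_bigrams(["new documents add new word pairs"], model)
print(generate(model, "the", seed=1))
```

Each next word is chosen only from words that actually followed the previous word in the training text, so the output looks locally plausible but quickly loses coherence over longer stretches.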
Text Mining and NLP - Use Cases
In the following, typical use cases in business are explained. Text mining opens up new possibilities in every area of a company.
In recruiting and human resources, NLP can improve the classification of applications, especially unsolicited applications. The CVs are scanned and automatically assigned to a suitable job offer.
NLP has many applications in e-commerce. A typical use case is the automated reading of customer ratings; a subsequent sentiment analysis shows which products are well received and which are not. An advanced and increasingly common use case is chatbots.
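Scoring customer ratings can be illustrated with a lexicon-based approach, the same general idea behind pre-trained tools such as TextBlob, although those ship much larger, curated lexicons. In this sketch, the word scores in `LEXICON` and the simple negation handling are made-up assumptions for illustration, not values taken from any real tool.

```python
import re

# Hypothetical mini lexicon: polarity of each word from -1 (negative) to +1 (positive)
LEXICON = {"great": 1.0, "love": 1.0, "good": 0.5, "bad": -0.5, "broken": -1.0, "terrible": -1.0}
NEGATIONS = {"not", "never", "no"}

def sentiment_score(review):
    """Average polarity of the known words in a review; 0.0 if no known word occurs."""
    words = re.findall(r"[a-z]+", review.lower())
    scores = []
    for i, word in enumerate(words):
        if word in LEXICON:
            score = LEXICON[word]
            # A preceding negation flips the polarity, e.g. "not good" counts as negative
            if i > 0 and words[i - 1] in NEGATIONS:
                score = -score
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0

reviews = [
    "Great product, I love it!",
    "Arrived broken. Terrible.",
    "Not good, but not bad either",
]
for review in reviews:
    print(f"{sentiment_score(review):+.2f}  {review}")
```

Averaging these scores per product would then show which products are well received and which are not.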
In particular, the marketing department benefits from the results of text analyses. Companies that use blog articles for marketing purposes can write their articles in a targeted manner and know beforehand what is important to readers. This particular use case has been presented in detail in this article.
So-called fuzzy matching, an NLP method, helps the controlling department assign customer addresses correctly, for example. Manual input errors happen, but NLP can detect and correct many of them automatically, which leads to high-quality, standardized data. Fuzzy matching is already partially implemented and usable in Tableau and SAP.
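The idea of fuzzy matching can be sketched with Python's standard `difflib` module, whose `get_close_matches` function ranks candidates by a similarity ratio between 0 and 1. This is not how Tableau or SAP implement fuzzy matching; the address list and the cutoff of 0.8 are assumptions that would need tuning on real data.

```python
from difflib import get_close_matches

# Hypothetical master data: the standardized addresses in the customer database
CANONICAL_ADDRESSES = [
    "Main Street 12, Berlin",
    "Station Road 3, Munich",
    "Market Square 7, Hamburg",
]

def standardize(address, candidates=CANONICAL_ADDRESSES, cutoff=0.8):
    """Return the most similar canonical address, or None if nothing is similar enough."""
    matches = get_close_matches(address, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

for raw in ["Main Stret 12, Berlin", "Staton Road 3, Munich", "Unknown Lane 99, Cologne"]:
    print(raw, "->", standardize(raw))
```

Inputs below the cutoff return `None`, so they can be routed to manual review instead of being silently assigned to the wrong customer.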
That is only a small selection of use cases. NLP can be applied wherever information has to be extracted from text, for example, in the automated prioritization of e-mails, the routing of service tickets, and the extraction of information from documents such as invoices, contracts, or other receipts. NLP works particularly well in combination with web mining or web scraping, which serve to obtain information from the Internet. For example, automatically reading and analyzing a news feed helps colleagues in the finance department keep track of prices and events.
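For documents with a fixed layout, a simple baseline for extracting fields such as those on an invoice uses regular expressions. This rule-based technique is not machine learning and breaks as soon as the layout changes; ML-based NLP models are needed for more varied documents. The invoice text, field names, and patterns below are hypothetical.

```python
import re

# Hypothetical invoice text; real invoices vary widely in layout and wording
INVOICE = """
Example Supplies Ltd.
Invoice No.: INV-2021-0042
Invoice date: 15.03.2021
Total amount: 1,234.50 EUR
"""

# Assumed patterns for the fields of interest; each must be adapted to the actual documents
PATTERNS = {
    "invoice_number": r"Invoice No\.:\s*(\S+)",
    "invoice_date": r"Invoice date:\s*(\d{2}\.\d{2}\.\d{4})",
    "total": r"Total amount:\s*([\d,]+\.\d{2})\s*EUR",
}

def extract_fields(text):
    """Return the first match for each pattern, or None if a field is missing."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1) if match else None
    return fields

fields = extract_fields(INVOICE)
# Convert the amount to a number, assuming "," is the thousands separator
fields["total"] = float(fields["total"].replace(",", "")) if fields["total"] else None
print(fields)
```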
Requirements for the technical implementation of your NLP project
There are now various ways to implement a text mining or NLP project. Complex mathematical algorithms do not have to be developed from scratch, and programming knowledge is not always required.
SAP
In the SAP environment, there are already several applications for text analysis, text mining, and NLP.
SAP Conversational AI
With SAP Conversational AI, SAP offers a dedicated product for creating chatbots. As part of the SAP Business Technology Platform, it lets users flexibly build and extend a digital assistant and analyze the results. This makes it suitable both for customer contact and for supporting employees with automated assistance.
SAP Analytics Cloud (SAC)
The SAP Analytics Cloud (SAC) also offers exciting NLP use cases. Of particular interest is the Search to Insight feature, which lets users ask the system questions in natural language and get quick answers. If you want to learn more about Search to Insight and the possibilities that SAC offers in the field of AI and ML, we recommend this article.
SAP HANA
Users of SAP HANA have the option of text analysis and text mining. Text analysis includes the flexible SAP HANA text search (exact, fuzzy, linguistic) and the option to create a text index, which automatically breaks the text down into its components (nouns, verbs, adjectives, predicates, ...) and recognizes so-called entities. Text mining, on the other hand, offers similarity analysis of text documents. For this purpose, statistical analysis methods or k-nearest neighbors (KNN) categorization are used. The choice is limited but sufficient for first simple applications.
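The SAP HANA functions themselves are not shown here. Instead, the following library-free Python sketch illustrates the two underlying ideas in general form: measuring document similarity with TF-IDF vectors and cosine similarity, and assigning a new document the majority category of its k most similar labeled documents. The same approach could, for instance, match incoming CVs to job offers. The sample documents and categories are hypothetical, and the smoothed IDF formula is one common variant among several.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Turn each document into a sparse TF-IDF vector (dict: word -> weight)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each word occurs in
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF: rare words get a higher weight than words found in many documents
        vectors.append({w: c * (math.log((1 + n_docs) / (1 + df[w])) + 1) for w, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse vectors: 1.0 = same direction, 0.0 = no shared words."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_categorize(new_doc, labeled_docs, k=3):
    """Return the majority category among the k labeled documents most similar to new_doc."""
    texts = [text for text, _ in labeled_docs] + [new_doc]
    *train_vectors, query = tf_idf_vectors(texts)
    similarities = [cosine(query, vec) for vec in train_vectors]
    ranked = sorted(range(len(labeled_docs)), key=lambda i: similarities[i], reverse=True)
    top_labels = [labeled_docs[i][1] for i in ranked[:k]]
    # On a tie, most_common keeps insertion order, so the closest document's label wins
    return Counter(top_labels).most_common(1)[0][0]

LABELED = [
    ("invoice payment overdue please pay the invoice", "finance"),
    ("payment received for invoice", "finance"),
    ("quarterly budget and payment plan", "finance"),
    ("application for the data scientist job", "hr"),
    ("cv attached for the job opening", "hr"),
    ("unsolicited application with cv", "hr"),
]
print(knn_categorize("reminder: the invoice payment is overdue", LABELED))  # finance
print(knn_categorize("my cv for the open job", LABELED))                    # hr
```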
Data Intelligence
The disadvantage of these applications is their lack of transparency: it is often unclear how the results are achieved. For a traceable evaluation, we recommend using Python (see the section below) for the first NLP project.
As an alternative to the complete programming of an NLP use case or as a bridge between SAP and Python, Data Intelligence can be used. That is an innovative, visual data management tool. On the one hand, Data Intelligence offers its own ML options via the SAP Leonardo machine learning service, and on the other hand, Data Scientists, IT, and the business department can collaborate to optimally prepare the selected NLP/Text Mining use case as an overall project.
Python
For data scientists, ML engineers, and all those users who feel comfortable in the Python ecosystem, two essential libraries are available.
On the one hand, there is the Natural Language Toolkit (NLTK), the classic and leading platform for processing text data, which provides the open-source library nltk. With nltk, users can split text into its components and thus extract the information in a corpus in a structured way.
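As a first impression of nltk, the following sketch splits a sentence into tokens and counts them. It assumes nltk is installed (`pip install nltk`) and deliberately uses `wordpunct_tokenize`, a purely regular-expression-based tokenizer, because `word_tokenize` requires downloading additional model data first.

```python
# Assumes: pip install nltk (no extra data downloads are needed for the functions used here)
from nltk import FreqDist
from nltk.tokenize import wordpunct_tokenize

text = "NLP splits text into tokens. Tokens can then be counted, tagged, or classified."
tokens = wordpunct_tokenize(text)
print(tokens[:5])  # ['NLP', 'splits', 'text', 'into', 'tokens']

# Keep only alphabetic tokens, lower-cased, before counting
words = [t.lower() for t in tokens if t.isalpha()]
freq = FreqDist(words)
print(freq.most_common(1))  # [('tokens', 2)]
```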
The second open-source library for NLP is spaCy, which complements the nltk offering with further specific applications. At the beginning of February 2021, version 3 ("spaCy v3") was released, offering users several additional possibilities, such as using pre-trained transformers to train their own pipelines.
Furthermore, various libraries are available which already contain pre-trained scores for sentiment analysis (e.g., TextBlob), provide mathematical algorithms for similarity analysis or topic modeling (e.g., gensim), and can be used for text extraction of web content (e.g., scrapy, beautifulsoup).
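Extracting text from web content can be sketched without third-party dependencies using Python's built-in `html.parser` module; libraries such as beautifulsoup make this shorter, for example via `BeautifulSoup(html, "html.parser").find_all("h2")`. The HTML snippet and the assumption that headlines sit in `<h2>` tags are hypothetical.

```python
from html.parser import HTMLParser

class HeadlineParser(HTMLParser):
    """Collect the text inside <h2> tags, e.g. the headlines of a news page."""

    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_headline = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_headline = False

    def handle_data(self, data):
        # Text of nested tags such as <b> is appended to the current headline
        if self.in_headline:
            self.headlines[-1] += data

# Hypothetical page snippet; in practice the HTML would be downloaded first
html = """
<html><body>
  <h2>Oil prices rise</h2><p>Details follow.</p>
  <h2>Central bank holds <b>interest rates</b></h2>
</body></html>
"""
parser = HeadlineParser()
parser.feed(html)
parser.close()
print([h.strip() for h in parser.headlines])
```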
Natural Language Processing - Our Conclusion
Text mining and NLP enable several exciting applications, some of which companies are not yet aware of. Especially when working with text documents and records, whether digital or on paper, this special sub-area of ML can simplify or even replace manual tasks. NLP is gaining importance and is used in more places than one might realize, including translation tools and search engines that have already been substantially improved by machine learning.
The first chatbots are already appearing on some corporate websites; they can either be complex and individually programmed or implemented with ready-made tools. The tools presented here are suitable for implementing an NLP project with little programming experience, but they offer limited transparency into how results are produced. We therefore recommend that users with an affinity for programming take advantage of the broad range of possibilities that Python offers, although this requires specific technical skills.
If you are now interested in finding promising machine learning use cases for text mining and NLP in your company, or even if you have a specific problem in mind, please do not hesitate to contact us. Our team includes experts for various solutions and applications. Together with you, we would be more than pleased to develop a strategy for your text data and provide you with full support in the conception, implementation, and deployment of the solution. Feel free to contact us!