How to use text mining and NLP to increase your blog success

The potential of machine learning and advanced analytics is not limited to the structured data that can be easily extracted from a database or data warehouse. An even larger amount of data is hidden in documents, emails, comments and of course the internet.

This unstructured data contains information that is not directly accessible. The fields of text mining and natural language processing (NLP) offer methods for extracting a variety of insights from text data. In this article, you will learn about basic methods and associated frameworks through a practical example from marketing, opening up a further data source for your analyses.

Text mining can often be applied profitably to complaint comments or maintenance notes, for example, where the text data is used to derive predictive features for a machine learning project. In addition to a quantitative rating of customers based on their order history (see RFM analysis), text mining also makes qualitative ratings possible.

Below, we present the use case of predicting the success of blog articles.
It involves the following steps:

  1. Determining the concrete analysis objective
  2. Generating the database
  3. Forming features from the text content and title
  4. Creating a prediction model
  5. Interpreting the results

Determining the analysis objective

Before analysing text data, you should define a suitable analysis objective so that the analysis provides added value. From a marketing point of view, the success of a blog article is crucial, and it can be measured with various KPIs: for example, the number of views, the time spent on the article or the website, the conversion of readers into leads, or similar. Once a target value has been chosen and defined in detail, you can select the data basis and extract the relevant predictive features. In our example, the analysis objective was the average number of views in the first six months after the article's publication.
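
One way to compute this target value from a daily view log is sketched below. It assumes that views are available per day and reads "average views in the first six months" as average views per month, approximating a month as 30 days; the function name and data layout are hypothetical.

```python
from datetime import date, timedelta


def avg_monthly_views(published: date, daily_views: dict, months: int = 6,
                      days_per_month: int = 30) -> float:
    """Average views per month within the first `months` after publication.

    Assumptions: `daily_views` maps a date to the number of views on that day,
    and a month is approximated as `days_per_month` days.
    """
    window_end = published + timedelta(days=months * days_per_month)
    # Only views inside [published, window_end) count towards the target.
    total = sum(v for day, v in daily_views.items() if published <= day < window_end)
    return total / months
```

Views outside the window are ignored, so older and newer articles are measured over the same period.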

Generating the database

Depending on the data source, making the data available can be a simple or a complex process. In the simplest case, the text data is available directly as a database field, as an easily readable file or via an API. For common document formats (Word, PowerPoint, PDF), there are a number of useful Python libraries for extraction. If the desired data is only available on the internet, a so-called web scraper can automatically process web pages and extract text and other information. Properly designed, this provides up-to-date data and external information to enrich the data set. However, care must be taken to ensure that the process is legal and does not conflict with data protection laws. In our application example, the blog data is obtained via web extraction. The effort is justified because the web pages reflect the final state of the articles exactly as readers see them.

  • Direct access in file form or via APIs
  • Compliant web scraping with the Scrapy framework or the BeautifulSoup library
  • Extraction from PDF documents using pdfplumber, PyPDF4 or optical character recognition (OCR)
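
The core of web extraction, pulling the title and the paragraph text out of a page's HTML, can be sketched as follows. To stay dependency-free, the sketch uses Python's standard-library `html.parser` instead of Scrapy or BeautifulSoup; the class and function names are hypothetical, and fetching the page (and checking that scraping is permitted) is deliberately left out.

```python
from html.parser import HTMLParser


class ArticleTextParser(HTMLParser):
    """Collects the <title> and the text of all <p> elements of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            # Collapse whitespace; text from nested tags like <b> is kept.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        elif self._in_p:
            self._buffer.append(data)


def extract_article(html: str):
    """Return (title, list of non-empty paragraph texts) for an HTML string."""
    parser = ArticleTextParser()
    parser.feed(html)
    parser.close()
    return parser.title, parser.paragraphs
```

A real scraper would additionally restrict extraction to the article container, since navigation menus and footers also contain `<p>` elements.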

If the text data is to be stored in a database after extraction, the SAP HANA platform is a good choice. There, the text can be stored with the NCLOB data type alongside metadata such as title, date and tags. SAP HANA also offers the option of creating a text index, which breaks the text down into its components and enriches them with word classes, positions in the document and more. This breakdown is an excellent basis for data analysis.
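
To illustrate what such a breakdown looks like, the following toy sketch turns a text into one row per token, with a running counter and a character offset, and then builds an inverted index from those rows. It is not SAP HANA's implementation or API (in SAP HANA the index is defined in SQL and the analysis runs inside the database), and it does not attempt word classes; the function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize_with_positions(doc_id, text):
    """Toy text-index rows: one per token, with running counter and char offset.

    Only lowercases tokens; the linguistic enrichment a real text index adds
    (word classes, normalised forms) is not attempted here.
    """
    return [
        {"doc_id": doc_id, "token": m.group().lower(), "counter": n, "offset": m.start()}
        for n, m in enumerate(re.finditer(r"\w+", text), start=1)
    ]


def build_inverted_index(rows):
    """Map each token to the (doc_id, counter) positions where it occurs."""
    index = defaultdict(list)
    for row in rows:
        index[row["token"]].append((row["doc_id"], row["counter"]))
    return dict(index)
```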


Forming features from the text content and title

The next step is to explore the data and form the predictive features. This process is more creative and extensive than with structured data: based on the existing texts, a number of possible influencing factors can be formed and evaluated.
In addition to text characteristics such as the number of words and sentence length, the vocabulary used is also important. A variety of NLP techniques can also be used to create specific features. Sentiment analysis rates texts by subjectivity and polarity (negative, neutral, positive) and provides corresponding numeric features. Topic modelling clusters thematically similar documents, and the resulting cluster assignment can itself be used as a feature. Finally, existing metadata is also helpful. Since a blog builds up its community over time, the publication date of each article matters. We captured this community growth indirectly by including the average number of website visits in the month before publication as a feature.
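
A minimal lexicon-based sketch shows how sentiment becomes numeric features. The tiny word list and its scores are hypothetical, and "subjectivity" is approximated crudely as the share of opinion words; real analyses use curated lexicons or trained models, and German texts need their own resources.

```python
import re

# Hypothetical mini-lexicon: word -> polarity score in [-1, 1].
LEXICON = {
    "good": 1.0, "great": 1.0, "easy": 0.5, "helpful": 0.8,
    "bad": -1.0, "difficult": -0.5, "problem": -0.6, "error": -0.8,
}


def _words(text):
    return re.findall(r"\w+", text.lower())


def polarity(text):
    """Mean lexicon score of the matched words; 0.0 if no word matches."""
    scores = [LEXICON[w] for w in _words(text) if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def subjectivity(text):
    """Crude proxy: share of words that appear in the opinion lexicon."""
    words = _words(text)
    return sum(w in LEXICON for w in words) / len(words) if words else 0.0


def polarity_label(score, threshold=0.1):
    """Map a polarity score to negative / neutral / positive."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```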

 

  • Text characteristics such as word count, average sentence length and title length
  • Metadata such as topic tags and publication date
  • Features derived from the vocabulary used
  • Results of sentiment analysis
  • Topic assignment through topic modelling
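
The simpler features from this list can be computed with a few lines of Python. The sketch below is a hypothetical helper: it uses a naive sentence split, one-hot encodes tags, and adds an illustrative "how-to title" flag based on the assumption that such titles start with "How to".

```python
import re
from datetime import date

_WORD = re.compile(r"\w+")
# Hypothetical heuristic: a title starting with "How to" suggests a how-to guide.
_HOW_TO = re.compile(r"^\s*how\s+to\b", re.IGNORECASE)


def text_features(title: str, body: str, tags: list, published: date) -> dict:
    """Turn one article into a flat dict of numeric features."""
    words = _WORD.findall(body)
    # Naive sentence split on ., ! and ?; real pipelines use an NLP tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", body) if s.strip()]
    features = {
        "word_count": len(words),
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "title_length": len(_WORD.findall(title)),
        "title_is_how_to": int(bool(_HOW_TO.search(title))),
        "publish_month": published.month,
        "publish_weekday": published.weekday(),
    }
    # One-hot encode topic tags, e.g. "SAP Analytics Cloud" -> tag_sap_analytics_cloud.
    for tag in tags:
        features["tag_" + "_".join(_WORD.findall(tag.lower()))] = 1
    return features
```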

Creating a prediction model

Once the data and the predictive features are prepared, building a first prediction model is straightforward. From the data and the associated target value, the model derives the underlying rules on its own, which is why we speak of artificial intelligence and machine learning. In essence, the model parameters are fitted to the data. The model's results will deviate from reality to some degree, and the aim of modelling is to minimise these deviations on new, unseen data. To this end, different model types, model settings and data preparation steps are evaluated systematically.
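
The principle of fitting on one part of the data and measuring the deviations on data the model has not seen can be sketched with a simple hold-out split. Two deliberately simple, hypothetical model types (a mean baseline and a one-feature least-squares line) are compared by mean absolute error (MAE); in practice one would use a library such as scikit-learn and cross-validation.

```python
import random
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error between actual and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def fit_mean_baseline(xs, ys):
    """Always predicts the training mean, ignoring the feature."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Ordinary least squares for one feature: y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def holdout_compare(xs, ys, models, test_share=0.25, seed=42):
    """Fit each model on the training part and report its MAE on the test part."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_share))
    test, train = idx[:n_test], idx[n_test:]
    results = {}
    for name, fit in models.items():
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        results[name] = mae([ys[i] for i in test], [model(xs[i]) for i in test])
    return results
```

The model type with the lowest hold-out error is the more promising candidate; the same loop extends to different settings or preparation steps.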

In our case, the target value is the average number of views of a blog article in its first six months. Since the prediction is a numerical value, this is a regression problem. For the underlying data, a random forest model proved the most promising. To apply the model to new, unpublished blog articles, they must be prepared in exactly the same way as the training data. The implementation and orchestration of such data pipelines is also crucial for the long-term added value of a machine learning model.
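
To show what a random forest regressor does, here is a compact, dependency-free teaching sketch: each tree is grown on a bootstrap sample, considers a random subset of features at every split, chooses splits that minimise the squared error, and the forest averages the trees' predictions. The names and default values are hypothetical; in practice one would use a library implementation such as scikit-learn's `RandomForestRegressor`.

```python
import random
from statistics import mean


def _sse(values):
    """Sum of squared errors around the mean (regression split criterion)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def _build_tree(X, y, depth, min_leaf, n_features, rng):
    # Leaf: depth exhausted, too few samples to split, or a pure node.
    if depth == 0 or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)
    best = None
    # Random feature subset per split: the "random" in random forest.
    for f in rng.sample(range(len(X[0])), n_features):
        for threshold in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= threshold]
            right = [i for i, row in enumerate(X) if row[f] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, threshold, left, right)
    if best is None:
        return mean(y)
    _, f, threshold, left, right = best

    def grow(ids):
        return _build_tree([X[i] for i in ids], [y[i] for i in ids],
                           depth - 1, min_leaf, n_features, rng)

    return (f, threshold, grow(left), grow(right))


def _predict_tree(node, row):
    # Internal nodes are tuples; leaves are plain numbers.
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node


class RandomForestSketch:
    def __init__(self, n_trees=50, max_depth=5, min_leaf=2, seed=0):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.min_leaf = min_leaf
        self.seed = seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n = len(X)
        # Common regression heuristic: consider about a third of the features per split.
        n_features = max(1, len(X[0]) // 3)
        self.trees = []
        for _ in range(self.n_trees):
            # Bootstrap sample: draw n rows with replacement.
            ids = [rng.randrange(n) for _ in range(n)]
            self.trees.append(_build_tree([X[i] for i in ids], [y[i] for i in ids],
                                          self.max_depth, self.min_leaf, n_features, rng))
        return self

    def predict(self, X):
        # The forest's prediction is the average over all trees.
        return [mean(_predict_tree(t, row) for t in self.trees) for row in X]
```

New articles would pass through the same feature preparation as the training data before `predict` is called.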

Interpreting the results

The value of a machine learning application does not end with predicting new results. Suppose the model rates some blog articles as particularly likely to succeed, and this is confirmed repeatedly. You would not want to stop there: it now becomes interesting to find out why the model makes this prediction and which levers exist to increase reach.
Insights are easier to extract from some models than from others. For example, the decision rules of a trained decision tree indicate how important the individual influencing factors are. For more complex models, dedicated explainable AI frameworks are used, which determine the influence of each feature using approaches from game theory, for example.
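
The game-theoretic idea can be made concrete with Shapley values. The sketch below computes them exactly by enumerating all feature coalitions; features outside a coalition are replaced by a baseline point (for example, an average article). This is one of several possible definitions, the function name is hypothetical, and because the cost grows as 2^n it only suits a handful of features. Frameworks such as SHAP use efficient approximations and model-specific algorithms instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at point x relative to `baseline`.

    A coalition S "knows" the features in S (taken from x); all other
    features are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        point = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(point)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis
```

The values always add up to the difference between the prediction for `x` and the prediction for the baseline, so each feature's share of an article's predicted reach can be read off directly.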

When analysing the reach of our blog articles, we were able to derive and quantify several interesting findings. For example, an article on SAP Analytics Cloud or SAP Dashboarding achieves twice the reach of an average article. Reach is also particularly high as soon as the title of an article suggests a how-to guide.

Text Mining - Our Conclusion 

The analysis of unstructured text data can thus produce interesting insights and presents the analyst with an exciting new challenge. The open-source tool landscape is well suited to initial use cases, although libraries for German-language analyses are generally less mature. In the course of modelling, the real added value lies in extracting insights with explainable AI methods, which make the black box of the model transparent.

Would you like to discuss which machine learning use cases for text mining exist in your environment, or do you already have a concrete problem in mind? We would be happy to work with you to develop a strategy for your text data and to support you fully in the design, implementation and operation of the solution. Please do not hesitate to contact us.

Luise

Luise Wiesalla joined NextLytics AG in 2019 as a working student / student consultant in the field of data analytics and machine learning. She has experience with full-stack data science projects and using the open-source workflow management solution Apache Airflow. She likes to spend her free time exploring her surroundings and being on the move.
