Everything You Wanted to Know About Data Investigation Using Text Analytics!
Data investigation using text analytics may sound complicated, but it is easy enough to understand. Also referred to as text mining, text analytics is a discipline of computer science that breaks down large volumes of unstructured text into a more structured, specific format that is easy to analyze and derive meaning from. To do this, it relies on tools such as machine learning and natural language processing (NLP). You have probably heard these terms before, but if you aren’t sure what they really mean, let’s break them down into smaller bits. Much like what text analytics does!
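To make "breaking text into a structured format" concrete, here is a minimal sketch that turns a free-text sentence into a word-count table, one of the simplest structured representations used in text mining. The stop-word list and the sample sentence are hypothetical; real projects use much fuller stop-word lists and richer tokenizers.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (words too common to be informative).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "for", "on"}

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_frequencies(text: str) -> Counter:
    """Turn free text into a structured word -> count table, ignoring stop words."""
    return Counter(t for t in tokenize(text) if t not in STOP_WORDS)

if __name__ == "__main__":
    note = "The invoice is overdue. Please pay the overdue invoice."
    print(term_frequencies(note).most_common(3))
```

Once every document is reduced to a table like this, it can be stored, compared and queried like any other structured data.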
Let’s start with machine learning. With this data science technique, the machine is quite literally ‘learning’ how to solve problems and exhibit human-like abilities. The obvious next question is, “How can a machine learn anything at all?” Traditionally, we wrote algorithms as an explicit set of rules: the computer takes the input problem, executes the rules and returns the solution as output. Now, however, we demand much more from the computer with the information we provide. While it is easy for a human to identify the kind of vehicle or bird in an image, writing explicit rules that let a computer do the same is very complex. Machine learning programs instead learn these patterns from examples. Combined with linguistics, machine learning gives us natural language processing, which allows the computer to examine text in a given language in order to ‘understand’ it and perform tasks such as language translation and question answering.
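The difference between writing rules and learning from examples can be illustrated with a classic text classifier. The sketch below uses multinomial Naive Bayes with add-one smoothing, a technique chosen here purely for illustration (the article does not name a specific algorithm). The class name `NaiveBayes` and the training examples are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # Instead of hand-written rules, we only count words per label.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = tokenize(text)
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior of the label plus the log likelihood of each word.
            score = math.log(n_docs / total_docs)
            n_words = sum(self.word_counts[label].values())
            for w in tokenize(text):
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

For example, after fitting on two spam-like and two ordinary messages, `predict("free prize inside")` returns `"spam"` because those words were seen mostly in the spam examples. Nobody wrote a rule saying "free" is suspicious; the model picked it up from the data.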
Text analytics uses statistical patterns to break down unstructured text data into smaller, more coherent bits of information. Imagine being asked to find a specific piece of information in a book somewhere in the middle of a large library. Text analytics essentially breaks the content of all the books down into specific pieces of information, and then uses natural language processing and statistical algorithms to search for what was asked. Techniques such as text classification, sentiment analysis, named entity recognition and relation extraction are used to pull the necessary information out of an unstructured sea of data. We use many applications of text analytics in our daily lives without being aware of its capabilities, for example spam filtering, document retrieval and chatbots.
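The library analogy maps closely onto an inverted index, a common data structure behind document retrieval: instead of reading every book for each question, you record once which words appear in which documents. This is a minimal sketch under that assumption; the document IDs and texts are hypothetical, and real search engines add ranking, stemming and much more.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every word in the query (AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    # .get() avoids adding empty entries to the defaultdict for unknown words.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

With three hypothetical "books", `search(build_index(docs), "shipping invoices")` only intersects two small sets of IDs, no matter how long the books are.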
Sentiment analysis brings the human element of emotion into text analysis. Also known as opinion mining or emotion AI, it uses natural language processing and computational linguistics to systematically study affective states and subjective information. A basic task in sentiment analysis is classifying the polarity of a given text, whether a whole document or a single sentence: is the expressed opinion positive, negative or neutral? Advanced, “beyond polarity” sentiment classification looks at emotional states such as “angry”, “sad” and “happy”. It is widely applied to customer reviews, survey responses, and online and social media, in areas ranging from marketing and customer service to clinical medicine.
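Polarity classification can be sketched with a lexicon-based approach: each known word carries a positive or negative weight, and the sign of the total decides the label. The tiny `LEXICON`, the negation handling and the zero threshold are hypothetical simplifications; production systems use large curated lexicons or trained models.

```python
import re

# Hypothetical sentiment lexicon: word -> polarity weight.
LEXICON = {"good": 1, "great": 2, "happy": 1, "excellent": 2,
           "bad": -1, "poor": -1, "angry": -2, "terrible": -2}
# Words that flip the polarity of the word immediately after them.
NEGATORS = {"not", "never", "no"}

def polarity(text):
    """Classify text as 'positive', 'negative' or 'neutral' from lexicon scores."""
    score = 0
    negate = False
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in NEGATORS:
            negate = True
            continue
        value = LEXICON.get(word, 0)
        score += -value if negate else value
        negate = False  # a negator only affects the very next word
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For instance, "The food was not good" comes out negative because the negator flips the weight of "good", while a sentence with no lexicon words stays neutral.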
With that, let’s focus on data investigation. Data investigation is the search for malicious, misplaced or sensitive data, with the objective of collecting it as evidence for further review and analysis. Text analytics can be carried out using several tools and programming languages.
Let me explain the approach we took in one of our recent cases.
The email data was stored as Offline Storage Table (OST) files. Our first step was to convert the OST files to Personal Storage Table (PST) files, because information is much easier to extract from a PST. We then used Python to extract approximately 600 GB of email text into a database.
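The article does not describe how the PST contents were read, so the sketch below makes a loud assumption: that the messages have already been exported from the PST as individual `.eml` files. Under that assumption, Python's standard `email` and `sqlite3` modules are enough to load the text into a database. The function name `load_emails`, the table name `emails` and its columns are hypothetical.

```python
import sqlite3
from email import policy
from email.parser import BytesParser
from pathlib import Path

def load_emails(eml_dir, db_path):
    """Parse every .eml file under eml_dir and store key fields in SQLite.

    Assumes messages were already exported from the PST as .eml files.
    Returns the number of messages loaded.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS emails ("
        "id INTEGER PRIMARY KEY, sender TEXT, recipients TEXT, "
        "sent TEXT, subject TEXT, body TEXT)"
    )
    parser = BytesParser(policy=policy.default)
    count = 0
    for path in sorted(Path(eml_dir).rglob("*.eml")):
        with open(path, "rb") as fh:
            msg = parser.parse(fh)
        # Prefer the plain-text body; store an empty string if there is none.
        part = msg.get_body(preferencelist=("plain",))
        body = part.get_content() if part is not None else ""
        conn.execute(
            "INSERT INTO emails (sender, recipients, sent, subject, body) "
            "VALUES (?, ?, ?, ?, ?)",
            (str(msg.get("From", "")), str(msg.get("To", "")),
             str(msg.get("Date", "")), str(msg.get("Subject", "")), body),
        )
        count += 1
    conn.commit()
    conn.close()
    return count
```

At the scale mentioned in the article, you would likely batch inserts and index the columns you query most, but the basic idea of turning each message into one row stays the same.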
The second step was to clean and analyze the data according to the client’s requirements within a very limited time frame. For this we brought in Alteryx, which helped us transform and analyze the data. We identified emails containing critical words and performed an extensive, AI-based sentiment analysis on each email. This helped us gauge the stress level of each email, risk-rate it, and identify a critical set of emails highly relevant to the objective of the engagement. With all these technologies, we were able to eliminate 92% of the emails, allowing our team to focus on only the most relevant 8%.
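The article did this step in Alteryx; the Python sketch below only illustrates the general idea of combining critical-word hits with a negative-sentiment signal into a risk score and shortlisting the top-scoring emails. The word lists, the weights and the `keep_fraction` parameter are all hypothetical; in a real engagement the word list comes from the client’s requirements and the cut-off from the review team, not from a fixed percentage.

```python
import math
import re

# Hypothetical inputs: replace with the engagement's own critical-word list.
CRITICAL_WORDS = {"delete", "cash", "offshore", "confidential", "urgent"}
# Hypothetical stand-in for a sentiment/stress signal.
NEGATIVE_WORDS = {"angry", "worried", "threat", "problem", "pressure", "afraid"}

def risk_score(body, critical_weight=2.0, negative_weight=1.0):
    """Score an email as a weighted count of critical-word and negative-word hits."""
    words = re.findall(r"[a-z]+", body.lower())
    critical = sum(w in CRITICAL_WORDS for w in words)
    negative = sum(w in NEGATIVE_WORDS for w in words)
    return critical_weight * critical + negative_weight * negative

def shortlist(emails, keep_fraction=0.1):
    """Return the highest-risk emails (at most keep_fraction of them, score > 0)."""
    scored = sorted(emails, key=lambda e: risk_score(e["body"]), reverse=True)
    keep = max(1, math.ceil(len(scored) * keep_fraction))
    return [e for e in scored[:keep] if risk_score(e["body"]) > 0]
```

Emails that score zero are never shortlisted, so a quiet mailbox can yield fewer results than the fraction allows.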
Finally, for visualization we used Power BI and Tableau. With these tools, we were able to categorize the data and develop a storyline and insights from all the relevant emails. We also presented some interesting facts, such as how frequently users conversed and at what times of day. We chose these tools for their appealing visualizations, user-friendliness and accessibility.
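Charts about the frequency and timing of conversations start from simple aggregates. This sketch, which assumes each email is a dict with hypothetical `sender` and `date` keys (the date in standard email header format), counts emails per sender and per hour of day and writes the hourly counts to a CSV file that a tool such as Power BI or Tableau can import. It is an illustration, not the pipeline used in the case.

```python
import csv
from collections import Counter
from email.utils import parsedate_to_datetime

def conversation_stats(emails):
    """Count emails per sender and per hour of day (in each header's own time zone)."""
    per_sender = Counter()
    per_hour = Counter()
    for e in emails:
        per_sender[e["sender"]] += 1
        per_hour[parsedate_to_datetime(e["date"]).hour] += 1
    return per_sender, per_hour

def write_hour_csv(per_hour, path):
    """Write an hour,emails table (all 24 hours, zeros included) for a BI tool."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["hour", "emails"])
        for hour in range(24):
            writer.writerow([hour, per_hour.get(hour, 0)])
```

Writing all 24 hours, including empty ones, keeps the time axis continuous when the file is charted.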
With a combination of Python, Alteryx, Power BI and Tableau, we helped our client achieve their objective in just 6 days.
Nu-Pie provides Text Analytics services and solutions for your business needs. Learn more about our Text Analytics services at https://www.nu-pie.com/ or contact us for a free consultation at firstname.lastname@example.org.
Author: Mariyam Ghaley, Co-author: Krithika