
    Mining Your Organization's Communications with IzyCloud Part II - Elastic Data Grid and Content Extractor Scripts


    In part II of the series, we will demonstrate how you can leverage your IzyCloud data to extract structured data from your organization's communication archives, including emails, text messages, Slack messages, etc.


    Set Up the Environment

    You should purchase and install the Enterprise Version of the Content Pattern Extractor app. If you do not have an enterprise dashboard, you should contact sales.

    Elastic Data Grids

    IzyCloud allows you to build elastic data grids from streaming data that scale with content size and schema.

    While they are not limited to the Content Pattern Extractor, they form the cornerstone of the app and will be used heavily.
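    To make the idea concrete, here is a minimal sketch of the elastic behavior. The names here are hypothetical illustrations, not the IzyCloud API: the point is simply that the grid grows both its row count and its column set as streaming rows arrive.

    var ElasticGrid = function() {
      this.columns = {};
      this.rows = [];
    };

    ElasticGrid.prototype.push = function(row) {
      var self = this;
      // Any field not seen before becomes a new column, so the
      // schema stretches along with the content.
      Object.keys(row).forEach(function(key) {
        self.columns[key] = true;
      });
      self.rows.push(row);
    };

    var grid = new ElasticGrid();
    grid.push({ id: 1, content: 'hello' });
    grid.push({ id: 2, content: 'world', sender: 'alice' }); // adds the sender column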

    Extraction Pipeline

    You can pass the data stream through different stages of the extraction pipeline. Typically it is recommended that you use the hosted Izy Cloud Tika Server inside IzyCloud to sanitize and format the source data (be it HTML, PDF, Office, etc.) and then feed the data to the pattern extractors.
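    As a sketch of the sanitizing step, the snippet below sends a PDF to the /tika endpoint and reads back plain text. The endpoint, port, and Accept header follow the standard Apache Tika Server conventions, and the snippet assumes a Node.js 18+ runtime with the built-in fetch; the URL of the hosted Izy Cloud Tika Server may differ from the localhost default shown here.

    var fs = require('fs');

    // PUT the raw document body to the Tika server; the Accept header
    // asks Tika to respond with the extracted plain text.
    fetch('http://localhost:9998/tika', {
      method: 'PUT',
      headers: { 'Accept': 'text/plain' },
      body: fs.readFileSync('review.pdf')
    }).then(function(res) {
      return res.text();
    }).then(function(text) {
      // text is now sanitized and ready for the pattern extractors.
      console.log(text);
    });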

    LBL Extractor

    LBL (Line By Line) will simply organize the input stream into rows and generate a simple grid using the following snippet:

    // Split the stream into rows and drop empty lines.
    var lines = text.split('\n').filter(function(item) {
      return item != '';
    });
    var ret = [];
    var i;
    // Coerce limit and offset to numbers before adding them.
    var topLimit = params.limit*1 + params.offset*1;
    for (i = params.offset; i < topLimit; i++) {
      // Stop once the window runs past the end of the input.
      if (lines[i] == undefined) {
        break;
      }
      ret.push({
        id: i*1 + 1,               // line numbers are 1-based
        offset: i - params.offset, // position within the requested window
        content: lines[i]
      });
    }
    return cb({
      success: true,
      data: ret
    });
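    For example, assuming the runtime supplies text = 'alpha\nbeta\ngamma\ndelta' and params = { offset: 1, limit: 2 }, the callback receives:

    {
      success: true,
      data: [
        { id: 2, offset: 0, content: 'beta' },
        { id: 3, offset: 1, content: 'gamma' }
      ]
    }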

    This will allow the data to become addressable and linkable. Other pipeline processors, such as RAG, will often provide a link to the source of the generated data by linking to an LBL grid.

    RAG

    Review Aggregator will generate rows of JSON data from the source stream. Each JSON object is generated from a block within the stream:

    var blockDefinition = {
      offset: [-2, 0],
      rowBegin: '[bookmark: ',
      rowEnd: 'Leave a comment',
      fields: {
        reviewerName: [0],
        revieweeName: [1],
        reviewDate: [3]
      },
      blocks: [{
        rowBegin: 'Review Details:',
        rowEnd: 'Rate reviews',
        fields: {
          reviewItem1: [1, 1],
          reviewItem2: [27]
        }
      }]
    };

    The rowBegin/rowEnd pair will define the boundaries for the block. If omitted, they will default to the first line and last line of the stream.

    The offset will allow you to adjust the boundaries for the block initially found by rowBegin and rowEnd; in the example above, offset: [-2, 0] shifts the start boundary two lines up and leaves the end boundary unchanged.

    The fields will define how data should be extracted within the block. The format is:

    fields: {
      fieldName: [start(BLN), length]
    }

    start is the BLN (block line number) where the extraction starts, with indexing beginning at 0. If start is positive and greater than or equal to the length of the block, the field will be set to empty. If start is negative, it is used as an index from the end of the block.

    If length is not specified, the rest of the block will be attached to the field value.
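    These addressing rules can be summarized with a small sketch. extractField below is hypothetical (the real implementation lives inside the Content Pattern Extractor app); it assumes the block has already been split into lines:

    // Hypothetical helper illustrating the [start(BLN), length] rules;
    // not the actual Content Pattern Extractor implementation.
    function extractField(blockLines, spec) {
      // A negative start indexes from the end of the block.
      var start = spec[0] < 0 ? blockLines.length + spec[0] : spec[0];
      // A start at or beyond the end of the block yields an empty field.
      if (start >= blockLines.length) {
        return '';
      }
      // With no length, the rest of the block becomes the field value.
      var length = spec.length > 1 ? spec[1] : blockLines.length - start;
      return blockLines.slice(start, start + length).join('\n');
    }

    var block = ['Alice', 'Bob', 'Engineering', '2024-01-05'];
    extractField(block, [1, 1]);  // 'Bob'
    extractField(block, [-1, 1]); // '2024-01-05'
    extractField(block, [2]);     // 'Engineering\n2024-01-05'
    extractField(block, [9]);     // ''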

    Izy Cloud Tika Server: http://localhost:9998/

