
    Mining Your Organization's Communications with IzyCloud Part II - Elastic Data Grid and Content Extractor Scripts


    In part II of the series, we will demonstrate how you can leverage your IzyCloud data to extract structured data from your organization's communication archives, including emails, text messages, Slack messages, etc.


    Set Up the Environment

    You should purchase and install the Enterprise Version of the Content Pattern Extractor app. If you do not have an enterprise dashboard, you should contact sales.

    Elastic Data Grids

    IzyCloud allows you to build elastic data grids from streaming data that scale with content size and schema.

    While they are not limited to the Content Pattern Extractor, they form the cornerstone of the app and will be used heavily.
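    To make the idea concrete, here is a minimal sketch of the elastic behavior. The names here are hypothetical illustrations, not the IzyCloud API: the point is simply that the grid grows both its row count and its column set as streaming rows arrive.

    var ElasticGrid = function() {
      this.columns = {};
      this.rows = [];
    };

    ElasticGrid.prototype.push = function(row) {
      var self = this;
      // Any field not seen before becomes a new column, so the
      // schema stretches along with the content.
      Object.keys(row).forEach(function(key) {
        self.columns[key] = true;
      });
      self.rows.push(row);
    };

    var grid = new ElasticGrid();
    grid.push({ id: 1, content: 'hello' });
    grid.push({ id: 2, content: 'world', sender: 'alice' }); // adds the sender column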

    Extraction Pipeline

    You can pass the data stream through different stages of the extraction pipeline. Typically it is recommended that you use the hosted Izy Cloud Tika Server inside IzyCloud to sanitize and format the source data (be it HTML, PDF, Office, etc.) and then feed the data to the pattern extractors.
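    As a sketch of the sanitizing step, the snippet below sends a PDF to the /tika endpoint and reads back plain text. The endpoint, port, and Accept header follow the standard Apache Tika Server conventions, and the snippet assumes a Node.js 18+ runtime with the built-in fetch; the URL of the hosted Izy Cloud Tika Server may differ from the localhost default shown here.

    var fs = require('fs');

    // PUT the raw document body to the Tika server; the Accept header
    // asks Tika to respond with the extracted plain text.
    fetch('http://localhost:9998/tika', {
      method: 'PUT',
      headers: { 'Accept': 'text/plain' },
      body: fs.readFileSync('review.pdf')
    }).then(function(res) {
      return res.text();
    }).then(function(text) {
      // text is now sanitized and ready for the pattern extractors.
      console.log(text);
    });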

    LBL Extractor

    LBL (Line By Line) will simply organize the input stream into rows and generate a simple grid using the following snippet:

    // Split the stream into rows and drop empty lines.
    var lines = text.split('\n').filter(function(item) {
      return item != '';
    });
    var ret = [];
    var i;
    // Coerce limit and offset to numbers before adding them.
    var topLimit = params.limit*1 + params.offset*1;
    for (i = params.offset; i < topLimit; i++) {
      // Stop once the window runs past the end of the input.
      if (lines[i] == undefined) {
        break;
      }
      ret.push({
        id: i*1 + 1,               // line numbers are 1-based
        offset: i - params.offset, // position within the requested window
        content: lines[i]
      });
    }
    return cb({
      success: true,
      data: ret
    });
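    For example, assuming the runtime supplies text = 'alpha\nbeta\ngamma\ndelta' and params = { offset: 1, limit: 2 }, the callback receives:

    {
      success: true,
      data: [
        { id: 2, offset: 0, content: 'beta' },
        { id: 3, offset: 1, content: 'gamma' }
      ]
    }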

    This will allow the data to become addressable and linkable. Other pipeline processors, such as RAG, will often provide a link to the source of the generated data by linking to an LBL grid.

    RAG

    Review Aggregator will generate rows of JSON data from the source stream. Each JSON object is generated from a block within the stream:

    var blockDefinition = {
      offset: [-2, 0],
      rowBegin: '[bookmark: ',
      rowEnd: 'Leave a comment',
      fields: {
        reviewerName: [0],
        revieweeName: [1],
        reviewDate: [3]
      },
      blocks: [{
        rowBegin: 'Review Details:',
        rowEnd: 'Rate reviews',
        fields: {
          reviewItem1: [1, 1],
          reviewItem2: [27]
        }
      }]
    };

    The rowBegin/rowEnd pair will define the boundaries for the block. If omitted, they will default to the first line and last line of the stream.

    The offset will allow you to adjust the boundaries for the block initially found by rowBegin and rowEnd; in the example above, offset: [-2, 0] shifts the start boundary two lines up and leaves the end boundary unchanged.

    The fields will define how data should be extracted within the block. The format is:

    fields: {
      fieldName: [start(BLN), length]
    }

    start is the BLN (block line number) where the extraction starts, with indexing beginning at 0. If start is positive and greater than or equal to the length of the block, the field will be set to empty. If start is negative, it is used as an index from the end of the block.

    If length is not specified, the rest of the block will be attached to the field value.
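    These addressing rules can be summarized with a small sketch. extractField below is hypothetical (the real implementation lives inside the Content Pattern Extractor app); it assumes the block has already been split into lines:

    // Hypothetical helper illustrating the [start(BLN), length] rules;
    // not the actual Content Pattern Extractor implementation.
    function extractField(blockLines, spec) {
      // A negative start indexes from the end of the block.
      var start = spec[0] < 0 ? blockLines.length + spec[0] : spec[0];
      // A start at or beyond the end of the block yields an empty field.
      if (start >= blockLines.length) {
        return '';
      }
      // With no length, the rest of the block becomes the field value.
      var length = spec.length > 1 ? spec[1] : blockLines.length - start;
      return blockLines.slice(start, start + length).join('\n');
    }

    var block = ['Alice', 'Bob', 'Engineering', '2024-01-05'];
    extractField(block, [1, 1]);  // 'Bob'
    extractField(block, [-1, 1]); // '2024-01-05'
    extractField(block, [2]);     // 'Engineering\n2024-01-05'
    extractField(block, [9]);     // ''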

    Izy Cloud Tika Server: http://localhost:9998/

