May | 2014 | IT Assorti

Some time ago I outlined Six Graphs of Big Data as a pathway to the individual user experience. Then I did the same for Five Sources of Big Data. But what lies between them remained untold. Today I am going to give my vision of how different data sources allow us to build different data graphs. To make this post less dependent on those older ones, let’s start from a real-life situation and business needs, then bind them to data streams and data graphs.

Context is King

The same data has different value in different contexts. When you are running late for a flight and you get a message that the flight is delayed, the message is valuable. Receiving the same message two days ahead, when you are not late at all, is worth much less. And the message is useless if you are not traveling at all, but the airline has your contacts and notifies you about a flight you don’t care about. There was only one dimension here: time to flight. That was a friendly description of context, to warm you up.

Some professional contexts are hard to grasp for the unprepared. Take a situation from the office of some corporation. A department manager intensified his email communication with the CFO, started to use the phone more frequently (calling the CFO and other department managers), went to the CFO’s office multiple times, skipped a few lunches, and stayed at work till 10 PM on several days. Here we have multiple dimensions (five), which can be analyzed together to define the context. Most probably the department manager and the CFO were doing some budgeting: planning or analysis/reporting. Knowing that, it is possible to build and deliver individual prescriptive analytics to the department manager, focused on helping him handle the budget. That department may have other escalated issues at the same time, such as the release schedule. But the severity of the budgeting is much higher right now, hence the context belongs to budgeting for now.

With a data stream for each dimension we are able to build a run-time individual (personal) context. The data streams for that department manager were a kind of time series: events with attributes. Email is a dimension we are tracking; peers, timestamps, type of the letter, size of the letter, types and number of attachments are its attributes. Phone is a dimension; names, times, durations, number of people etc. are its attributes. Location is a dimension; own office, CFO’s office, lunch place, timestamps, durations and sequence are its attributes. And so on. We have defined potentially useful data streams. It is possible to build an exclusive context out of them, from their dynamics and patterns. That was a more complicated description of context.
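To make "events with attributes" more concrete, here is a minimal Python sketch of how such streams could be represented. The `Event` class, the `group_by_dimension` helper, the attribute names and the sample values are all hypothetical, made up for illustration; they are not taken from any real logging system.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Event:
    """One observation in a data stream (hypothetical structure)."""
    dimension: str                  # e.g. "email", "phone", "location"
    timestamp: datetime
    attributes: dict = field(default_factory=dict)


def group_by_dimension(events):
    """Split a mixed list of events into one time-ordered stream per dimension."""
    streams = defaultdict(list)
    for event in sorted(events, key=lambda e: e.timestamp):
        streams[event.dimension].append(event)
    return dict(streams)


# Made-up sample data for the department manager example.
events = [
    Event("email", datetime(2014, 5, 12, 9, 15),
          {"peer": "CFO", "attachments": 2, "size_kb": 340}),
    Event("phone", datetime(2014, 5, 12, 10, 2),
          {"peer": "CFO", "duration_min": 12}),
    Event("location", datetime(2014, 5, 12, 11, 30),
          {"place": "CFO office", "duration_min": 45}),
    Event("email", datetime(2014, 5, 12, 8, 40),
          {"peer": "CFO", "attachments": 0, "size_kb": 12}),
]

streams = group_by_dimension(events)
for dimension, stream in streams.items():
    print(dimension, len(stream))
```

Keeping the attributes as a free-form dictionary is a deliberate simplification: each dimension has its own attribute set, and a stricter schema per dimension would be a reasonable next step.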

Interpreting Context

Well, but how do we interpret those data streams, how do we interpret the context? What we have: multiple data streams. What we need: to identify the run-time context. So the pipeline is straightforward.

First, we have to log the Data from each dimension of interest. It can be done via software or hardware sensors. Software sensors are usually plugins, but they can be more sophisticated, such as object recognition on surveillance camera footage. Hardware sensors are GPS, Wi-Fi, turnstiles. There can be combinations, like a check-in somewhere. So keep in mind that a lot can be done with software sensors. For the department manager case, it’s a plugin to Exchange Server or Outlook to listen to emails, a plugin to the telephone exchange (ATS/PBX) to listen to the phone calls, and so on.

Second, it’s time for low-level analysis of the data. It’s Statistics, then Data Science: brute force to establish what is credible and what is not, then looking for the emerging patterns. The bottleneck of Data Science is the human factor. Somebody has to look at the patterns to reduce false positives and false negatives. This step is more about discovery, probing, and preparing the foundation for the more intelligent next step. More or less everything is clear with this step. Businesses have already started to build up their data science teams, but they still don’t have enough data for the science :)
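As an illustration of this low-level step, one simple way to notice that a stream has "intensified" is to compare recent activity with a historical baseline. The sketch below uses a plain z-score; that choice, the function name `intensified`, the threshold of 2.0 and the sample counts are my assumptions for illustration, not a prescribed method.

```python
from statistics import mean, stdev


def intensified(baseline_counts, recent_count, threshold=2.0):
    """Return True when recent_count lies more than `threshold`
    sample standard deviations above the baseline mean."""
    mu = mean(baseline_counts)
    sigma = stdev(baseline_counts)   # needs at least two baseline points
    if sigma == 0:
        # Flat baseline: any increase counts as intensification.
        return recent_count > mu
    return (recent_count - mu) / sigma > threshold


# Made-up daily counts of emails from the manager to the CFO.
baseline_emails_to_cfo = [2, 3, 1, 2, 4, 2, 3, 2, 1, 3]

print(intensified(baseline_emails_to_cfo, 11))  # far above the usual level
print(intensified(baseline_emails_to_cfo, 3))   # within the usual range
```

A check like this still produces false positives, which is exactly where the human factor mentioned above comes in.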

Third, it’s Data Intelligence. As MS said some time ago, “Data Intelligence is creating the path from data to information to knowledge”. This should be described in more detail, to avoid ambiguity. From Techopedia: “Data intelligence is the analysis of various forms of data in such a way that it can be used by companies to expand their services or investments. Data intelligence can also refer to companies’ use of internal data to analyze their own operations or workforce to make better decisions in the future. Business performance, data mining, online analytics, and event processing are all types of data that companies gather and use for data intelligence purposes.” Some data models need to be designed, calibrated and used at this level. Those models should work almost in real time.

Fourth is Business Intelligence. It is probably the first step familiar to the reader :) But we look further here: past data and real-time data meet. Past data is individual to the business entity. Real-time data is individual to the person. Of course there could be something in between. It is worth looking up a comparison between statistics, data science and business intelligence.

Fifth, and finally, comes Analytics. Here we are within the individual context of the person. There should be a snapshot of ‘AS-IS’ and recommendations of ‘TODO’; if the individual wants, there should also be the reasoning, ‘WHY’ and ‘HOW’. The final destination is the individual context. I have described it in detail in the series of Advanced Analytics posts, starting with Part I.

Data Streams

Data streams come from data sources. The same source can produce multiple streams. Some ideas are below; the list is unordered. Remember that special Data Intelligence must be put on top of the data from those streams.

Indoor positioning via Wi-Fi hotspots, contributing to the mobile/mobility/motion data stream. Where the person spent most of the time (at the workplace, in meeting rooms, in the kitchen, in the smoking room), when the person changed location frequently, directions, durations, sequence etc.

Corporate communication via email, phone, chat, meeting rooms, peer to peer, source control, process tools, productivity tools. It all makes sense for analysis, e.g. because at release time no new user stories should be created. Or take the volume and frequency of check-ins to source control…

Biometric wearable gadgets like BodyMedia, to log the intensity of mental (or physical) work. If calorie burn is low during long, bad meetings, that could be revealed. If there is not enough physical workload, then for the sake of better emotional productivity the person could be advised to take a walk.

Data Graphs from Data Streams

OK, but how do we build something tangible from all those data streams? The relation between Data Graphs and Data Streams is many-to-many. Look, it is possible to build the Mobile Graph from very different data sources, such as face recognition from a camera, authentication at an access point, IP address, GPS, Wi-Fi, Bluetooth, check-in, post etc. Hence, when designing the data streams for some graph, you should think in one-to-many relations: one graph can use multiple data streams from the corresponding data sources.
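The many-to-many relation can be sketched as a simple mapping that is readable in both directions. The stream names and the assignment of streams to graphs below are illustrative assumptions only, not a definitive design.

```python
from collections import defaultdict

# Hypothetical mapping: graph name -> data streams feeding it.
GRAPH_TO_STREAMS = {
    "mobile": {"gps", "wifi", "bluetooth", "check-in", "face-recognition"},
    "intention": {"gps", "email", "phone", "profile"},
}


def invert(graph_to_streams):
    """Build the reverse view: stream name -> graphs that consume it."""
    stream_to_graphs = defaultdict(set)
    for graph, streams in graph_to_streams.items():
        for stream in streams:
            stream_to_graphs[stream].add(graph)
    return dict(stream_to_graphs)


STREAM_TO_GRAPHS = invert(GRAPH_TO_STREAMS)

# One graph uses many streams...
print(sorted(GRAPH_TO_STREAMS["mobile"]))
# ...and one stream can feed many graphs.
print(sorted(STREAM_TO_GRAPHS["gps"]))
```

When designing a single graph you read the first mapping (one graph, many streams); the inverted mapping shows which graphs are affected when a stream is added or lost.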

To bring more clarity to the relations between graphs and streams, here is another example: the Intention Graph. How could we build the Intention Graph? The intentions of a person could be totally different in different contexts. Is it a weekday or a weekend? Is the person sitting in the office or driving a car? Who are the peers the person has communicated with a lot recently? What is the type of communication? What is the time of day? What are the person’s interests? What were the previous intentions? As you see, there could be data logged from machines, devices, comms, people, profiles etc. As a result we will build the Intention Graph and will be able to predict or prescribe what to do next.

Context from Data Graphs

Finally, having multiple data graphs, we can work on the individual context, the personal UX. Technically, it is hardly possible to deal with all those graphs at once. You cannot simply overlay two graphs; each graph is a separate modality (as one PhD taught me). Hence you must split them and work with a single modality. Select the graph that is most important for your needs and use it as the skeleton. Convert the relations from the other graphs into something else, such as attributes, which you can apply to the primary graph. Build an intelligence model for the single-modality graph with plenty of attributes from the other graphs. Obtain the personal/individual UX at the end.
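One possible reading of "use one graph as the skeleton" is sketched below: the primary graph keeps its edges, while relations from a secondary graph are collapsed into a per-node attribute. Collapsing them into a simple count is my assumption, as are the function name `fold_into_primary`, the node names and the sample edges.

```python
def fold_into_primary(primary_edges, secondary_edges, attr_name):
    """Keep the primary graph as the skeleton and turn secondary-graph
    relations into a count attribute on the primary graph's nodes."""
    nodes = {node for edge in primary_edges for node in edge}
    attrs = {node: {attr_name: 0} for node in nodes}
    for a, b in secondary_edges:
        for node in (a, b):
            if node in attrs:          # ignore nodes outside the skeleton
                attrs[node][attr_name] += 1
    return attrs


# Primary modality: who communicates with whom (made-up data).
communication_edges = [("manager", "cfo"), ("manager", "dev_lead")]

# Secondary modality: who visited which place (made-up data).
visit_edges = [
    ("manager", "cfo_office"),
    ("manager", "meeting_room"),
    ("cfo", "cfo_office"),
]

node_attrs = fold_into_primary(communication_edges, visit_edges, "visits")
for node in sorted(node_attrs):
    print(node, node_attrs[node])
```

The places themselves disappear from the result; only their effect on the people in the primary graph survives, which is the price of working with a single modality.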

blog by Vasyl Mylko

Big Data Graphs Revisited