Tag Archives: Graph

Big Data Graphs Revisited

Some time ago I’ve outlined Six Graphs of Big Data as a pathway to the individual user experience. Then I’ve did the same for Five Sources of Big Data. But what’s between them remained untold. Today I am going to give my vision how different data sources allow to build different data graphs. To make it less dependent on those older posts, let’s start from the real-life situation, business needs, then bind to data streams and data graphs.

 

Context is a King

Same data in different contexts has different value. When you are late to the flight, and you got message your flight was delayed, then it is valuable. In comparison to receiving same message two days ahead, when you are not late at all. Such message might be useless if you are not traveling, but airline company has your contacts and sends such message on the flight you don’t care about. There was only one dimension – time to flight. That was friendly description of the context, to warm you up.

Some professional contexts are difficult to grasp by the unprepared. Let’s take situation from the office of some corporation. Some department manager intensified his email communication with CFO, started to use a phone more frequently (also calling CFO, and other department managers), went to CFO office multiple times, skipped few lunches during a day, remained at work till 10PM several days. Here we got multiple dimensions (five), which could be analyzed together to define the context. Most probably that department manager and CFO were doing some budgeting: planning or analysis/reporting. Knowing that, it is possible to build and deliver individual prescriptive analytics to the department manager, focused and helping to handle budget. Even if that department has other escalated issues, such as release schedule or so. But severity of the budgeting is much higher right away, hence the context belongs to the budgeting for now.

By having data streams for each dimension we are capable to build run-time individual/personal context. Data streams for that department manager were kind of time series, events with attributes. Email is a dimension we are tracking; peers, timestamps, type of the letter, size of the letter, types and number of attachments are attributes. Phone is a dimension; names, times, durations, number of people etc. are attributes. Location is a dimension; own office, CFO’s office, lunch place, timestamps, durations, sequence are attributes. And so on. We defined potentially useful data streams. It is possible to build an exclusive context out of them, from their dynamics and patterns. That was more complicated description of the context.

 

Interpreting Context

Well, well, but how to interpret those data streams, how to interpret the context? What we have: multiple data streams. What we need: identify the run-time context. So, the pipeline is straightforward.

First, we have to log the Data, from each interested dimension. It could be done via software or hardware sensors. Software sensors are usually plugins, but could be more sophisticated, such as object recognition from surveillance cameras. Hardware sensors are GPS, Wi-Fi, turnstiles. There could be combinations, like check-in somewhere. So, think that it could be done a lot with software sensors. For the department manager case, it’s plugin to Exchange Server or Outlook to listen to emails, plugin to ATS to listen to the phone calls and so on.

Second, it’s time for low-level analysis of the data. It’s Statistics, then Data Science. Brute force to ensure what is credible or not, then looking for the emerging patterns. Bottleneck with Data Science is a human factor. Somebody has to look at the patterns to decrease false positives or false negatives. This step is more about discovery, probing and trying to prepare foundation to more intelligent next step. More or less everything clear with this step. Businesses already started to bring up their data science teams, but they still don’t have enough data for the science:)

Third, it’s Data Intelligence. As MS said some time ago “Data Intelligence is creating the path from data to information to knowledge”. This should be described in more details, to avoid ambiguity. From Technopedia: “Data intelligence is the analysis of various forms of data in such a way that it can be used by companies to expand their services or investments. Data intelligence can also refer to companies’ use of internal data to analyze their own operations or workforce to make better decisions in the future. Business performance, data mining, online analytics, and event processing are all types of data that companies gather and use for data intelligence purposes.” Some data models need to be designed, calibrated and used at this level. Those models should work almost in real-time.

Fourth, is Business Intelligence. Probably the first step familiar to the reader:) But we look further here: past data and real-time data meet together. Past data is individual for business entity. Real-time data is individual for the person. Of course there could be something in the middle. Go find comparison between stats, data science, business intelligence.

Fifth, finally it is Analytics. Here we are within individual context for the person. There worth to be a snapshot of ‘AS-IS’ and recommendations of ‘TODO’, if the individual wants, there should be reasoning ‘WHY’ and ‘HOW’. I have described it in details in previous posts. Final destination is the individual context. I’ve described it in the series of Advanced Analytics posts, link for Part I.

Data Streams

Data streams come from data sources. Same source could produce multiple streams. Some ideas below, the list is unordered. Remember that special Data Intelligence must be put on top of the data from those streams.

In-door positioning via Wi-Fi hotspots contributing to mobile/mobility/motion data stream. Where the person spent most time (at working place, in meeting rooms, on the kitchen, in the smoking room), when the person changed location frequently, directions, durations and sequence etc.

Corporate communication via email, phone, chat, meeting rooms, peer to peer, source control, process tools, productivity tools. It all makes sense for analysis, e.g. because at the time of release there should be no creation of new user stories. Or the volumes and frequency of check-ins to source control…

Biometric wearable gadgets like BodyMedia to log intensity of mental (or physical) work. If there is low calories burn during long bad meetings, then that could be revealed. If there is not enough physical workload, then for the sake of better emotional productivity, it could be suggested to take a walk.

 

Data Graphs from Data Streams

Ok, but how to build something tangible from all those data streams? The relation between Data Graphs and Data Streams is many to many. Look, it is possible to build Mobile Graph from the very different data sources, such as face recognition from the camera, authentication at the access point, IP address, GPS, Wi-Fi, Bluetooth, check-in, post etc. Hence when designing the data streams for some graph, you should think about one to many relations. One graph can use multiple data streams from corresponding data sources.

To bring more clarity into relations between graphs and streams, here is another example: Intention Graph. How could we build Intention Graph? The intentions of somebody could be totally different in different contexts. Is it week day or weekend? Is person static in the office or driving the car? Who are those peers that the person communicates a lot recently? What is a type of communication? What is a time of the day? What are person’s interests? What were previous intentions? As you see there could be data logged from machines, devices, comms, people, profiles etc. As a result we will build the Intention Graph and will be able to predict or prescribe what to do next.

 

Context from Data Graphs

Finally, having multiple data graphs we could work on the individual context, personal UX. Technically, it is hardly possible to deal with all those graphs easily. It’s not possible to overlay two graphs. It is called modality (as one PhD taught me). Hence you must split and work with single modality. Select which graph is most important for your needs, use it as skeleton. Convert relations from other graphs into other things, which you could apply to the primary graph. Build intelligence model for single modality graph with plenty of attributes from other graphs. Obtain personal/individual UX at the end.

Tagged , , , , , , , , , , , , , , , , , , , , , ,

Five Sources of Big Data

Some time ago I’ve described how to think when you build solutions from Big Data in the post Six Graphs of Big Data. Today I am going to look in the opposite direction, where Big Data come from? I see distinctive five sources of the data: Transactional, Crowdsourced, Social, Search and Machine. All the details are below.

Transactional Data

This is old good data, most familiar and usual for the geeks and managers. It’s plenty of RBDMSes, running or archived, on-premise and in the cloud. The majority of transactional data belong to corporations because the data was authored/created mainly by businesses. It was a golden era of Oracle and SQL Server (and some others). At some point the RDBMS technology appeared to be incapable of handling more transactional data, thus we got Teradata (and others) to fix the problem. But there was no significant shift for the way we work with those data sources. Data warehouses and analytic cubes are trending, but they were used for years already. Financial systems/modules of the enterprise architectures will continue to rely on transactional data solutions from Oracle or IBM.

Crowdsourced Data

This data source has emerged from the activity rather than from a type of technology. The phenomenon of Wikipedia confirmed that crowdsourcing really works. Much time passed since Wikipedia adoption by the masses… We got other fine data sources built by the crowds, for example, OpenStreetMaps, Flickr, Picasa, Instagram.

Interesting things happen with the rise of personal genetic testing (verifying DNA for millions of known markers via 23andme). This leads to public crowdsourced databases. More samples available, e.g. amateur astronomy. Volunteers do author useful data. The size of the crowdsourced data is increasing.

What differentiates it from transactional/enterprise data? It’s a price. Usually, crowdsourced data is free for use, with one of creative commons licenses. Often, the motivation for the creation of such data set is the digitization of our world or making a free alternative to paid content. With the rise of nanofactories, we will see the growth of 3D models of every physical product. By using crowdsourced models we will print the goods at home (or elsewhere).

Social Data

With the rise of Friendster–>MySpace–>Facebook and then others (Linkedin, Twitter, etc.) we got a new type of data — Social. It should not be mixed for Crowdsourced data, because of completely different nature of it. The social data is the digitization of ourselves as persons and our behavior. Social data is very well complementing the Crowdsourced data. Eventually, there will be digital representation of everyone… So far social profiles are good enough for meaningful use. Social data is dynamic, it is possible to analyze it in real-time. E.g. put Tweets or Facebook posts thru the Google Predictive API to grab emotions. I’m sure everybody intuitively understands this type of data source.

Search Data

This is my favorite. Not obvious for many of you, while really strong data source. Just recall how much do you search on Amazon or eBay? How do you search on Wikis (not messing up with Wikipedia)? Quora gets plenty of search requests. StackOverflow is a good source of search data within Information Technology. There are intranet searches within Confluence and SharePoint. If those search logs are analyzed properly, then it is clear about potential usefulness and business application. E.g. Intention Graph and Interest Graph are related to the search data.

There is a problem with “walled gardens” for search data… This problem is big, bigger than for social data, because public profiles are fully or partially available, while searches are kept behind the walls.

Machine Data

This is also my favorite. In the Internet of Things every physical thing will be connected. New things are designed to be connectable. Old things are got connected via M2M. Consumers adopted wearable technology. I’ve posted about it earlier. Go to Wearable Technology and Wearable Technology, Part II.

The cost of data gathering is decreasing. The cost of wireless data transfer is decreasing. The bandwidth of wireless transfer is increasing dramatically. Fraunhofer and KIT completed 100Gbps transmission. It’s fourteen times faster than the most robust 802.11ac. The moral is — measure everything, just gather data until it becomes Big Data, then analyze it properly and operate proactively. Machine data is probably the most important data source for Big Data during the next years. We will digitize the world and ourselves via devices. Open Street Map got competitors, the fleet of eBees described Matterhorn with million of spatial points. More to expect from machines.

Tagged , , , , , , , , , , , , , , , , , , , , ,

Six Graphs of Big Data

This post is about Big Data. We will talk about the value and economical benefits of Big Data, not the atoms that constitute it [Big Data]. For the atoms you can refer to Wearable Technology or Getting Ready for the Internet of Things by Alex Sukholeyster, or just logging of the clickstream… and you will get plenty of data, but it will be low-level, atom level, not much useful.

The value starts at the higher levels, when we use social connections of the people, understand their interests and consumptions, know their movement, predict their intentions, and link it all together semantically. In other words, we are talking about six graphs: Social, Interest, Consumption, Intention, Mobile and Knowledge. Forbes mentions five of them in Strategic Big Data insight. Gartner provided the report “The Competitive Dynamics of the Consumer Web: Five Graphs Deliver a Sustainable Advantage”, it is a paid resource, unfortunately. It would be fine to look inside, but we can move forward with our vision, then compare to Gartner’s and analyze the commonality and variability. I foresee that our vision is wider and more consistent!

Social Graph

This is mostly analyzed and discussed graph. It is about connections between people. There are fundamental researches about it, like Six degrees of separation. Since LiveJournal times (since 1999), the Social Graph concept has been widely adopted and implemented. Facebook and its predecessors for non-professionals, LinkedIn mainly for professionals, and then others such as Twitter, Pinterest. There is a good overview of Social Graph Concepts and Issues on ReadWrite. There is good practical review of the social graph by one of its pioneers, Brad Fitzpatrick, called Thoughts on the Social Graph. Mainly he reports a problem of the absence of a single graph that is comprehensive and decentralized. It is a pain for integrations because of all those heterogeneous authentications and “walled garden” related issues.

Regarding the implementation of the Social Graph, there are advises from the successful implementers, such as Pinterest. The official Pinterest engineering blog revealed how to Build a Follower Model from scratch. We can look at the same thing [Social Graph] from a totally different perspective – technology. The modern technology provider Redis features a tutorial on how to Build a Twitter clone in PHP and (of course) Redis. So the situation with Social Graph is less or more established. Many build it, but nobody solved the problem of having a single consistent independent graph (probably built from other graphs).

Interest Graph

It is a representation of the specific things in which an individual is interested. Read more about Interest Graph on Wikipedia. This is the next hot graph after the social. Indeed, the Interest Graph complements the Social one. Social Commerce sees the Interest + Social Graphs together. People provide the raw data on their public and private profiles. Crawling and parsing of that data, plus special analysis is capable of building the Interest Graph for each of you. Gravity Labs created a special technology for building the Interest Graph. They call it Interest Graph Builder. There is an overview (follow the previous link) and a demo. There are ontologies, entities, entity matching, etc. Interesting insight about the Future of Interest Graph is authored by Pinterest’s head of engineering. The idea is to improve Amazon’s recommendation engine, based on the classifiers (via pins). Pinterest knows the reasoning, “why” users pinned something, while Amazon doesn’t know. We are approaching the Intention Graph.

Intention Graph

Not much could be said about intentions. It is about what we do and why we do.  Social and Interests are static in comparison to Intentions. This is related to prescriptive analytics, because it deals with the reasoning and motivation, “why” it happens or will happen. It seems that other graphs together could reveal much more about intentions, than trying to figure them [Intentions] out separately.

Intention Graph is tightly bound to personal experience or personal UX. It was foreseen in far 1999, by Harvard Business Review, as Experience Economy. Many years were spent, but not much implemented towards personal UX. We still don’t stage a personal ad hoc experience from goods and services exclusively for each user. I predict that Social + Interest + Consumption + Mobile graphs will allow us to build useful Intention Graph and achieve capabilities to build/deliver individual experiences. When the individual is within the service, then we are ready to predict some intentions, but it is true when Service Design was done properly.

Consumption Graph

One of the most important graphs of Big Data. Some call it Payment Graph. But Consumption is a better name because we can consume without payment, Consumption Graph is relatively easy for e-commerce giants, like Amazon and eBay, but tricky for 3rd parties, like you. What if you want to know what the user consumes? There are no sources of such information. Both Amazon and eBay are “walled gardens”. Each tracks what you do (browse, buy, put into wish list, etc.), how you do it (when logging in, how long staying within, sequence of your activities, etc.), they send you some notifications/suggestions and measure how do you react, and many other tricks how to handle descriptive, predictive and prescriptive analytics. But what if a user buys from other e-stores? There is the same problem as with Social Graph. IMHO there should be a mechanism to grab the user’s Consumption Graph from sub-graphs (if the user identifies herself).

Well, but there is still big portion of retail consumption. How to they build your Consumption Graph? Very easy, via loyalty cards. You think about discounts by using those cards, while retailers think about your Consumption Graph and predict what to do with all of the users/clients together and even individually. There is the same problem of disconnected Consumption Graphs as in e-commerce because each store has its own card. There are aggregators like Key Ring. Theoretically, they simplify the life of the consumer by shielding her from all those cards. But in reality, the back-end logic is able to build a bigger Consumption Graph for retail consumption! Another aspect: consumption of goods vs. consumption of services and experiences, is there a difference? What is the difference between hard goods and digital goods? There are other cool things about retail, like tracking clients and detecting their sex and age. It is all becoming the Consumption Graph. Think about that yourself:)

Anyway, Consumption Graph is very interesting, because we are digitizing this World. We are printing digital goods on 3D printers. So far the shape and look & feel are identical to the cloned product (e.g. cup), but the internals are different. As soon as 3D printer will be able to reconstruct the crystal structure, it will be a brand new way of consumption. It is thrilling and wide topic, hence I am going to discuss it separately. Keep in touch to not miss it.

Mobile Graph

This graph is built from mobile data. It does not mean the data comes from mobile phones. Today may be the majority of data is still generated by smartphones, but tomorrow it will not be the truth. Check out Wearable Technology to figure out why. A second important notion is about the views of the understanding of the Mobile Graph. Marketing based view described on Floatpoint is indeed about smartphone usage. It is considered that Mobile Graph is a map of interactions (with contexts how people interact) such as Web, social apps/bookmarks/sharing, native apps, GPS and location/checkins, NFC, digital wallets, media authoring, pull/push notifications. I would view the Mobile Graph as a user-in-motion. Where user resides at each moment (home, office, on the way, school, hospital, store, etc.), how user relocates (fast by car, slow by bike, very slow by feet; or uniformly or not, e.g. via public transport), how user behaves on each location (static, dynamic, mixed), what other users’ motions take place around (who else traveled the same route, or who also reside in the same location for that time slot) and so on. I am looking at the Motion Graph more as to the Mesh Network.

Why dynamic networking view makes more sense? Consider users as people and machines. Recall about IoT and M2M. Recall the initiatives by Ford and Nokia for resolving the gridlock problems in real-time. Mobile Graphs is better related to motion, mobility, i.e. to the essence of the word “mobile”. If we consider it from a motion point of view and add/extend with the marketing point of view, we will get a pretty useful model for the user and society. Mobile Graph is not for oneself. At least it is more efficient for many than for one.

Knowledge Graph

This is a monster one. It is about the semantics between all digital and physical things. Why Google rocks still? Because they built the Knowledge Graph. You can see it in action here. Check out interesting tips & tricks here. Google’s Knowledge Graph is a tool to find the UnGoogleable. There is a post on Blumenthals that Google’s Local Graph is much better than Knowledge, but this probably will be eliminated with time. IMHO their Knowledge Graph is being taught iteratively.

As Larry Page said many times, Google is not a search engine or ads engine, but the company that is building the Artificial Intelligence. Ray Kurzweil joined Google to simulate the human brain and recreate the kind of intelligence. Here is a nice article How Larry Page and Knowledge Graph helped to seduce Ray Kurzweil to join Google. “The Knowledge Graph knows that Santa Cruz is a place and that this list of places is related to Santa Cruz”.

We can look at those graphs together. Social will be in the middle because we (people) like to be in the center of the Universe:) The Knowledge Graph could be considered as meta-graph, penetrating all other graphs, or as super-graph, including multiple parts from other graphs. Even now, the Knowledge Graph is capable of handling dynamics (e.g. flight status).

Other Graphs

There are other graphs in the world of Big Data. The technology ecosystems are emerging around those graphs. The boost is expected from the Biotech. There is plenty of gene data, but lack of structured information on top of it. Brand new models (graphs) to emerge, with ease of understanding those terabytes of data. Circos was invented in the field of genomic data, to simplify the understanding of data via visualization. More experiments could be found on Visual Complexity web site. We are living in a different World than a decade ago. And it is exciting. Just plan your strategies correspondingly. Consider Big Data strategically.

Tagged , , , , , , , , , , , , , , , , , , , , , , , ,