Tag Archives: five

Big Data Graphs Revisited

Some time ago I outlined Six Graphs of Big Data as a pathway to the individual user experience. Then I did the same for Five Sources of Big Data. But what lies between them remained untold. Today I am going to share my vision of how different data sources allow us to build different data graphs. To make this post less dependent on those older ones, let's start from a real-life situation and business needs, then bind them to data streams and data graphs.


Context is King

The same data has different value in different contexts. When you are late for your flight and you get a message that the flight was delayed, the message is valuable. Compare that to receiving the same message two days ahead, when you are not late at all. The message might even be useless: you are not traveling at all, but the airline has your contacts and notifies you about a flight you don't care about. There was only one dimension here, time to flight. That was a friendly description of context, to warm you up.

Some professional contexts are difficult to grasp for the unprepared. Let's take a situation from the office of some corporation. A department manager intensified his email communication with the CFO, started to use the phone more frequently (also calling the CFO and other department managers), went to the CFO's office multiple times, skipped a few lunches, and stayed at work till 10 PM for several days. Here we have multiple dimensions (five), which could be analyzed together to define the context. Most probably the department manager and the CFO were doing some budgeting: planning or analysis/reporting. Knowing that, it is possible to build and deliver individual prescriptive analytics to the department manager, focused on helping him handle the budget. Even if the department has other escalated issues, such as a release schedule, the severity of the budgeting is much higher right now, hence the context belongs to budgeting for the time being.

By having data streams for each dimension, we can build a run-time individual/personal context. The data streams for the department manager were essentially time series: events with attributes. Email is a dimension we are tracking; peers, timestamps, type of the letter, size of the letter, and the types and number of attachments are its attributes. Phone is a dimension; names, times, durations, number of people etc. are attributes. Location is a dimension; own office, CFO's office, lunch place, timestamps, durations, and sequence are attributes. And so on. We have defined potentially useful data streams. It is possible to build an exclusive context out of them, from their dynamics and patterns. That was a more complicated description of context.
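As a minimal sketch of the "events with attributes" idea, each dimension can be modeled as a stream of timestamped records. The record shape and field names below are illustrative assumptions, not a real schema:

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical event record: one entry in a per-dimension data stream.
@dataclass
class Event:
    dimension: str              # "email", "phone", "location", ...
    timestamp: datetime
    attributes: dict = field(default_factory=dict)

# A day in the life of the department manager, as raw stream entries.
stream = [
    Event("email", datetime(2014, 3, 3, 9, 15),
          {"peer": "CFO", "type": "sent", "attachments": 2}),
    Event("phone", datetime(2014, 3, 3, 11, 40),
          {"peer": "CFO", "duration_min": 25}),
    Event("location", datetime(2014, 3, 3, 14, 5),
          {"place": "CFO office", "duration_min": 50}),
]

# Group events by dimension to prepare per-dimension analysis.
by_dimension = {}
for e in stream:
    by_dimension.setdefault(e.dimension, []).append(e)

print(sorted(by_dimension))   # → ['email', 'location', 'phone']
```

Once the streams are grouped like this, each dimension can be analyzed on its own and then combined into a context.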


Interpreting Context

Well, well, but how do we interpret those data streams, how do we interpret the context? What we have: multiple data streams. What we need: to identify the run-time context. So the pipeline is straightforward.

First, we have to log the data from each dimension of interest. It could be done via software or hardware sensors. Software sensors are usually plugins, but they could be more sophisticated, such as object recognition from surveillance cameras. Hardware sensors are GPS, Wi-Fi, turnstiles. There could be combinations, like a check-in somewhere. In practice, a lot can be done with software sensors alone. For the department manager case, it's a plugin to Exchange Server or Outlook to listen to emails, a plugin to the ATS to listen to the phone calls, and so on.

Second, it's time for low-level analysis of the data: statistics first, then data science. Brute force to establish what is credible and what is not, then looking for emerging patterns. The bottleneck of data science is the human factor: somebody has to look at the patterns to decrease false positives and false negatives. This step is about discovery and probing, preparing the foundation for the more intelligent next step. More or less everything is clear with this step. Businesses have already started to bring up their data science teams, but they still don't have enough data for the science:)
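The statistical brute-force step can be as simple as a baseline-and-deviation check. A toy sketch, with invented daily email counts, flags the "intensified communication with the CFO" pattern from the earlier example:

```python
from statistics import mean, stdev

# Daily counts of emails from the manager to the CFO over three weeks
# (illustrative numbers): the last few days spike above the baseline.
daily_emails = [2, 1, 3, 2, 2, 1, 2, 3, 2, 1, 2, 2, 9, 11, 10]

baseline = daily_emails[:-3]          # history used to fit the baseline
mu, sigma = mean(baseline), stdev(baseline)

# Flag a day as anomalous if it sits more than 3 sigmas above the mean.
anomalies = [x for x in daily_emails[-3:] if (x - mu) / sigma > 3]
print(anomalies)                      # → [9, 11, 10]
```

Real pipelines would of course use richer models, but the principle is the same: establish what is normal, then surface what deviates.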

Third, it's Data Intelligence. As MS said some time ago, "Data Intelligence is creating the path from data to information to knowledge". This should be described in more detail, to avoid ambiguity. From Techopedia: "Data intelligence is the analysis of various forms of data in such a way that it can be used by companies to expand their services or investments. Data intelligence can also refer to companies' use of internal data to analyze their own operations or workforce to make better decisions in the future. Business performance, data mining, online analytics, and event processing are all types of data that companies gather and use for data intelligence purposes." Some data models need to be designed, calibrated and used at this level. Those models should work almost in real time.

Fourth is Business Intelligence. Probably the first step familiar to the reader:) But we look further here: past data and real-time data meet together. Past data is individual to the business entity. Real-time data is individual to the person. Of course, there could be something in the middle. Go find a comparison between statistics, data science and business intelligence.

Fifth, finally, it is Analytics. Here we are within the individual context of the person. There should be a snapshot of 'AS-IS' and recommendations of 'TO-DO', and, if the individual wants, the reasoning of 'WHY' and 'HOW'. I have described it in detail in previous posts. The final destination is the individual context. I've described it in the series of Advanced Analytics posts; see the link for Part I.

Data Streams

Data streams come from data sources. The same source could produce multiple streams. Some ideas are below; the list is unordered. Remember that special Data Intelligence must be put on top of the data from those streams.

Indoor positioning via Wi-Fi hotspots, contributing to a mobile/mobility/motion data stream: where the person spent most time (at the working place, in meeting rooms, in the kitchen, in the smoking room), when the person changed location frequently, directions, durations, sequence etc.

Corporate communication via email, phone, chat, meeting rooms, peer-to-peer, source control, process tools, productivity tools. It all makes sense for analysis, e.g. because at the time of a release there should be no creation of new user stories. Or the volume and frequency of check-ins to source control…

Biometric wearable gadgets like BodyMedia, to log the intensity of mental (or physical) work. If there is a low calorie burn during long bad meetings, that could be revealed. If there is not enough physical workload, then for the sake of better emotional productivity it could be suggested to take a walk.
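The dwell-time signal behind the indoor-positioning stream above can be sketched in a few lines. The sightings are invented; in a real system they would be derived from Wi-Fi hotspot logs:

```python
from collections import defaultdict

# Hypothetical indoor positioning log: (place, minutes spent) pairs,
# as they might be derived from Wi-Fi hotspot sightings during one day.
sightings = [
    ("desk", 90), ("meeting room", 60), ("kitchen", 15),
    ("desk", 120), ("smoking room", 10), ("meeting room", 45),
]

# Aggregate total dwell time per place.
dwell = defaultdict(int)
for place, minutes in sightings:
    dwell[place] += minutes

# Rank places by total time spent: a crude "where does the person work" signal.
ranking = sorted(dwell.items(), key=lambda kv: kv[1], reverse=True)
print(ranking[0])   # → ('desk', 210)
```

The same aggregate-then-rank pattern applies to the communication and biometric streams as well.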


Data Graphs from Data Streams

OK, but how do we build something tangible from all those data streams? The relation between data graphs and data streams is many-to-many. Look: it is possible to build the Mobile Graph from very different data sources, such as face recognition from a camera, authentication at an access point, IP address, GPS, Wi-Fi, Bluetooth, check-ins, posts etc. Hence, when designing the data streams for some graph, you should think in terms of one-to-many relations: one graph can use multiple data streams from the corresponding data sources.
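The one-graph-many-streams idea can be sketched with plain dictionaries. Stream names and records below are illustrative assumptions:

```python
# A minimal sketch of one Mobile Graph fed by several data streams.
wifi_stream = [("alice", "office-ap"), ("bob", "lobby-ap")]
gps_stream  = [("alice", "hq-campus")]
checkins    = [("bob", "cafe")]

# The graph is stored as adjacency sets: person -> set of observed places,
# each edge tagged with the stream it came from.
mobile_graph = {}

def feed(stream, source):
    """Merge one data stream into the graph, tagging edges by source."""
    for person, place in stream:
        mobile_graph.setdefault(person, set()).add((place, source))

feed(wifi_stream, "wifi")
feed(gps_stream, "gps")
feed(checkins, "checkin")

print(sorted(mobile_graph["alice"]))
# → [('hq-campus', 'gps'), ('office-ap', 'wifi')]
```

Each new stream only needs another `feed` call, which is exactly the one-to-many relation described above.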

To bring more clarity into the relations between graphs and streams, here is another example: the Intention Graph. How could we build it? Somebody's intentions could be totally different in different contexts. Is it a weekday or a weekend? Is the person static in the office or driving a car? Who are the peers the person has communicated with a lot recently? What is the type of communication? What is the time of day? What are the person's interests? What were the previous intentions? As you see, data could be logged from machines, devices, comms, people, profiles etc. As a result we will build the Intention Graph and will be able to predict or prescribe what to do next.


Context from Data Graphs

Finally, having multiple data graphs, we can work on the individual context, the personal UX. Technically, it is hardly possible to deal with all those graphs at once; it is not possible to simply overlay two graphs. This is called modality (as one PhD taught me), hence you must split the problem and work with a single modality. Select the graph that is most important for your needs and use it as a skeleton. Convert relations from the other graphs into attributes you can apply to the primary graph. Build an intelligence model for the single-modality graph, with plenty of attributes from the other graphs. Obtain the personal/individual UX at the end.
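A toy sketch of the single-modality approach: take one graph as the skeleton (here a social graph) and fold relations from a second graph (interests) into plain node attributes instead of overlaying two graphs. All names and data are made up for illustration:

```python
# Primary modality: social edges. Secondary graph: person -> interests.
social_edges = [("alice", "bob"), ("bob", "carol")]
interest_graph = {"alice": ["golf"], "bob": ["golf", "chess"], "carol": ["chess"]}

# Skeleton: adjacency sets of the primary (social) modality.
skeleton = {}
for a, b in social_edges:
    skeleton.setdefault(a, set()).add(b)
    skeleton.setdefault(b, set()).add(a)

# Relations from the other graph become attributes on skeleton nodes.
nodes = {person: {"peers": peers,
                  "interests": set(interest_graph.get(person, []))}
         for person, peers in skeleton.items()}

# Now single-graph questions can mix both modalities:
# peers of alice who share at least one interest with her.
shared = [p for p in nodes["alice"]["peers"]
          if nodes["alice"]["interests"] & nodes[p]["interests"]]
print(shared)   # → ['bob']
```

The intelligence model then runs over one graph whose nodes carry attributes borrowed from all the others.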


Next Five Years of Healthcare

This insight is related to all of you, your children and relatives. It is about health and healthcare. I feel confident envisioning the progress for five years, but cautious about guessing for longer. Even the next five years seem pretty exciting and revolutionary. Hope you will enjoy the pathway.

We have problems today

I will not bind this to any country, hence American readers will not find Obamacare, ACO or HIE here. I will go global, as I like to do.

The old industry of healthcare still sucks. It sucks everywhere in the world. The problem is the uncertainty of our [human] nature. It's a paradox: medicine is one of the oldest practices and sciences, but nowadays it is one of the least mature. We still don't know for sure why and how our bodies and souls operate. The reverse engineering should continue until we gain complete knowledge.

I believe there were civilisations tens of thousands of years ago… but let's concentrate on ours. It took many years to start in-depth study of ourselves. Leonardo da Vinci made a breakthrough into anatomy in the early 1500s. The accuracy of his anatomical sketches is amazing. Why didn't others draw at the same level of perfection? The first heart transplant was performed only in 1967, in Cape Town, by Christiaan Barnard. Today we are still weak at brain surgery, even at the knowledge of how the brain works and what it is. Paul Allen significantly contributed to the mapping of the brain. The ambitious Human Genome Project was completed only in the early 2000s, with 92% of sampling at 99.99% accuracy. Today there is no clear vision or understanding of what the majority of DNA is for. I personally do not believe in Junk DNA, and the ENCODE project confirmed it might be related to protein regulation. Hence there is still plenty of work to complete…

But even with current medical knowledge, healthcare could be better. Very often the patient is admitted from scratch as a new one. Almost always the patient is discharged without proper monitoring of medication, nutrition, behaviour and lifestyle. There are no mechanisms, practices or regulations to make it possible. For sure there are some post-discharge recommendations and assignments to aftercare professionals, but it is immature and very inaccurate in comparison to what it could be. There are glimpses of telemedicine, but it is still very immature.

And finally, the healthcare industry, in comparison to other industries such as retail, media, leisure and tourism, is far behind in terms of consumer orientation. Even the automotive industry is more consumer-oriented than healthcare today. Economically speaking, there must be a transformation to the consumer-centric model. It is the same winning pattern across industries. It [consumerism] should emerge in healthcare too. Enough about the current problems; let's switch to the positive things: the technology available!

There could be Care Anywhere

We need care anywhere: whether it is underground in a diamond mine, in the ocean on board the Queen Mary 2, in a medical center or at home, in secluded places, or in a car, bus, train or plane.

There are wireless networks (from cell providers), there are wearable medical devices, and there is the smartphone as a man-in-the-middle to connect with the back-end. It is obvious that diagnostics and prevention could be improved, especially for chronic diseases and emergency cases (first aid, paramedics).


I personally experienced two emergency landings: once by being on board a six-hour flight, the second time by driving a colleague to another airport. The impact is significant. Imagine that 300+ people landed in Canada; then, according to Canadian law, all luggage was unloaded, moved to X-ray, then loaded again. We all lost a few hours because of somebody's heart attack.

It could have been prevented if the passenger had a heart monitor, a blood pressure monitor or other devices, and they would have triggered an alarm to take a pill, or to ask the crew for one, in time. In the best case, all wearable devices are linked to the smartphone [it is often allowed to turn on Bluetooth or Wi-Fi in airplane mode]. Then the app would ring and display recommendations to the passenger.
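A hedged sketch of that in-flight scenario: a wearable streams heart-rate samples to a phone app, which raises an alert before the emergency. The threshold and readings are made up for illustration, not medical guidance:

```python
# Illustrative alert threshold in beats per minute (not a medical value).
ALERT_BPM = 130

def check_stream(samples, threshold=ALERT_BPM):
    """Return the index of the first sample that should trigger an alert,
    or None if the stream stays below the threshold."""
    for i, bpm in enumerate(samples):
        if bpm >= threshold:
            return i      # the app would now ring and show recommendations
    return None

readings = [72, 75, 81, 95, 118, 134, 141]
print(check_stream(readings))   # → 5 (the 134 bpm sample)
```

A real system would smooth the signal and combine several vitals, but the early-warning principle is the same.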

4P aka Four P’s

Medicine should go Personal, Predictive, Preventive and Participatory. It will become so in five years.

Personal is already partially explained above. Besides consumerism, which is a social or economic aspect, there is a truly biological personal aspect. We are all different by ~6 million genes. That biological difference does matter. It defines carrier status for illnesses, it is related to the risks of illnesses, it is related to individual drug response, and it uncovers other health-related traits [such as lactose intolerance or alcohol addiction].

Personal medicine is equivalent to Mobile Health, because you are in motion and you are unique. The single sufficiently smart device you carry with you everywhere is the smartphone. Other wearable devices are still not connected [directly to the Internet of Things], hence you have to use them all with the smartphone in the middle.

The shift is from volume to value: from pay-for-procedures to pay-for-performance. The model becomes outcome-based. The challenge is how to measure performance: good treatment vs. poor bedside manner, poor treatment vs. good bedside manner, and so on.

Predictive is a pathway to the healthcare transformation. As healthcare experts say, "the providers are flying blind". There is no good integration and interoperability between providers, or even within a single provider. The only rational way to "open the eyes" is analytics: descriptive analytics to get a snapshot of what is going on, predictive analytics to foresee the near future and make the right decisions, and prescriptive analytics to understand even better the reasoning behind future actions.
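The three analytics layers can be shown on one toy series. The numbers, the naive trend extrapolation, and the decision rule are all invented for illustration; real predictive models would be far richer:

```python
# Weekly hospital readmission counts (invented numbers).
weekly_readmissions = [12, 14, 13, 15, 18, 21]

# Descriptive: a snapshot of what is going on.
snapshot = {"last": weekly_readmissions[-1],
            "average": sum(weekly_readmissions) / len(weekly_readmissions)}

# Predictive: naive trend extrapolation from the last two points.
forecast = weekly_readmissions[-1] + (weekly_readmissions[-1] - weekly_readmissions[-2])

# Prescriptive: a rule that turns the forecast into an action.
action = "expand aftercare monitoring" if forecast > 20 else "keep current program"

print(snapshot["last"], forecast, action)
# → 21 24 expand aftercare monitoring
```

Descriptive tells you where you are, predictive where you are heading, and prescriptive what to do about it.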

Why is there still no good interoperability? Why is there no wide HL7 adoption? How many years have gone by since those initiatives and standards? My personal opinion is that the current [and former] interoperability efforts are a dead end. The rationale is simple: if it were worth doing, it would already have been done. There might be something in the middle: the providers will implement interoperability within themselves, but not at the scale of a state, a country, or globally.

There are two reasons for this "dead interop". The first is business-related: why should I share my stuff with others? I spent on expensive labs and scans; I don't want others to benefit from my investments into this patient's treatment. The second is the breakthrough in genomics and proteomics. Only 20 minutes are needed to purify DNA from body fluids with the Zymo Research DNA kit. A genome in 15 minutes under $100 has been planned by Pacific Biosciences for this year. Intel invested 100 million dollars into Pacific Biosciences in 2008. Besides gene mechanisms, there are others, not related to DNA change; they are also useful for analysis, prediction and decision making per individual patient. [Read about epigenetics for more details.] There is a third reason: Artificial Intelligence. We already classify with AI; very soon we will put much more responsibility onto AI.

Preventive is a very interesting transformation, because it blurs the borders between treatment and behaviour/lifestyle/wellness, and between drugs and nutrition. It is directly related to chronic diseases and to post-discharge aftercare, even self-aftercare. To prevent readmission, the patient should take proper medication, adjust her behaviour and lifestyle, and consume special nutrition. E.g. diabetes patients should eat special sugar-free meals. There is a question of where the drug ends and the nutrition starts. What is Coca-Cola Diet? A first step towards drugs?

Pharmacogenomics is on the rise, enabling proactive steps into the future with a known individual response to drugs. It is both predictive and preventive. It will be normal for mass universal drugs to start disappearing, while narrowly targeted drugs are designed. Personal drugs are the next step, when the patient is the foundation for an almost exclusive treatment.

Participatory is interesting in the way that non-healthcare organisations become related to healthcare. P&G produces sunscreens designed by skin type [at the molecular level], for older people and for children. Nestle produces dietary food. And recall that there are Johnson & Johnson, Unilever and even Coca-Cola. I strongly recommend investigating the PwC Health practice for insights and analysis.

Personal Starts from Wearable

The most important driver for the adoption of wearable medical devices is the ageing population. The average age of the population increases, while the mobility of the population decreases. People need access to healthcare from everywhere, and at a lower cost [for those who have retired]. Chronic diseases are related to the ageing population too. Chronic diseases require constant control, with the intervention of a physician in case of high or low measurements. Such control is possible via multiple medical devices. Many of them are smartphone-enabled, where a corresponding application runs and "decides" what to tell the user.

The glucose meter is much smaller now; here is a slick one from iBGStar. Heart rate monitors are available in plenty of choices. Fitness trackers and dietary apps make up the vast majority of [mobile health] apps in the stores. Wristbands are becoming an element of lifestyle, especially with the fashionably designed Jawbone Up. The BodyMedia triceps band is good for calorie tracking. Add here a wireless weight scale… I've described gadgets and principles in the previous posts Wearable Technology and Wearable Technology, Part II. Here I'd like to distinguish Scanadu Scout, measuring vitals like temperature, heart rate, oximetry [saturation of your hemoglobin], ECG, HRV, PWTT, UA [urine analysis] and mood/stress. Just put the appropriate gadgets onto your body, gather the data, analyse it and apply predictive analytics to react or to prevent.


Personal is the Future of Medicine

If you think about all those personal gadgets and brick-sized mobile phones as a sub-niche within medicine, then you are deeply mistaken, because medicine itself will become personal as a whole. It is a five-year transition from what we have to what should be [and will be]. The computer disappears, into the pocket and into the cloud. All pocket-sized and wearable gadgets will miniaturise, while cloud farms of servers will grow and run much smarter AI.

Every one of us will become a "thing" within the Internet of Things. IoT is not a Facebook [that's too primitive]; it is the quantified and connected you, linked to the intelligent health cloud, and sometimes to physicians and other people [patients like you]. This will happen within the next 5-10 years, I think rather sooner than later. Technology changes within a few years. There were no tablets 3.5 years ago; now we have plenty of them and even new bendable prototypes. Today we experience the first wearable breakthroughs; imagine how they will advance within the next 3 years. Remember, we are accelerating, the technology is accelerating. Much more is to come, and it will change our lives. I hope it will transform healthcare dramatically. Many current problems will become obsolete via new emerging alternatives.

Predictive & Preventive is AI

Both are AI. Period. Providers must employ strong mathematicians, physicists and other scientists to create smarter AI. Google works on the duplication of the human brain on a non-biological carrier. Qualcomm designs neuro chips. IBM demonstrated brainlike computing; their new computing architecture is called TrueNorth.

Other healthcare participatory providers [technology companies, ISVs, food and beverage companies, consumer goods companies, pharma and life sciences] must adopt a strong AI discipline, because all future solutions will deal with extreme data [even All Data], which is impossible to tame with the usual tools. Forget the simple business logic of if/else/loop. Get ready for massive grid computing by AI engines. You might need to recall all the math you were taught and multiply it 100x. [In case of a poor math background, get ready for 1000x the effort.]

Education is a Pathway

Both patients and providers must learn genetics, epigenetics, genomics, proteomics and pharmacogenomics. Right now we don't have enough physicians to translate your voluntarily made DNA analysis [by 23andMe] into personal treatment. There are advanced genetic labs that take your genealogy and markers to calculate the risks of diseases. It should be simpler in the future, and it will come through education.

Five years is the time frame for a new student to become a new physician. Actually, slightly more is needed [for residency and fellowship], but we could consider the first observable changes in five years from today. You should start learning it all for your own needs right now, because you also must be educated to bring better healthcare to us all!



Five Sources of Big Data

Some time ago I described how to think when you build solutions from Big Data, in the post Six Graphs of Big Data. Today I am going to look in the opposite direction: where does Big Data come from? I see five distinctive sources of data: Transactional, Crowdsourced, Social, Search and Machine. All details are below.

Transactional Data

This is good old data, most familiar and usual for geeks and managers. It's plenty of RDBMSes, running or archived, on premise and in the cloud. The majority of transactional data belongs to corporations, because the data was authored/created mainly by businesses. It was the golden era of Oracle and SQL Server (and some others). At some point the RDBMS technology appeared to be incapable of handling more transactional data, thus we got Teradata (and others) to fix the problem. But there was no significant shift in the way we work with those data sources. Data warehouses and analytic cubes are trending, but they have been used for years already. Financial systems/modules of enterprise architectures will continue to rely on transactional data solutions from Oracle or IBM.

Crowdsourced Data

This data source has emerged from an activity rather than from a type of technology. The phenomenon of Wikipedia confirmed that crowdsourcing really works. Much time has passed since Wikipedia's adoption by the masses… We got other fine data sources built by the crowds, for example OpenStreetMap, Flickr, Picasa, Instagram.

Interesting things happen with the rise of personal genetic testing (verifying DNA for millions of known markers via 23andMe). This leads to public crowdsourced databases. More samples are available, e.g. amateur astronomy. Volunteers do author useful data. The size of crowdsourced data is increasing.

What differentiates it from transactional/enterprise data? The price. Usually crowdsourced data is free to use, under one of the Creative Commons licenses. Often the motivation for the creation of such a data set is the digitization of our world, or the making of a free alternative to paid content. With the rise of nanofactories, we will see the growth of 3D models of every physical product. By using crowdsourced models we will print the goods at home (or elsewhere).

Social Data

With the rise of Friendster–>MySpace–>Facebook and then others (LinkedIn, Twitter etc.) we got a new type of data: Social. It should not be confused with Crowdsourced data, because its nature is completely different. Social data is a digitization of ourselves as persons and of our behavior. Social data complements Crowdsourced data very well. Eventually there will be a digital representation of everyone… So far social profiles are good enough for meaningful use. Social data is dynamic; it is possible to analyze it in real time, e.g. put tweets or Facebook posts through the Google Prediction API to grab emotions. I'm sure everybody intuitively understands this type of data source.
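As a deliberately naive stand-in for the emotion-grabbing idea, posts can be scored against tiny hand-made word lists. A real system would use a trained model (such as a prediction API), not this toy lexicon; all words and posts below are invented:

```python
# Toy sentiment lexicon: a crude proxy for a trained emotion model.
POSITIVE = {"great", "love", "happy"}
NEGATIVE = {"delay", "hate", "awful"}

def emotion_score(post):
    """Count positive minus negative words in a post."""
    words = set(post.lower().split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

posts = ["I love this airline", "Another awful delay I hate it"]
print([emotion_score(p) for p in posts])   # → [1, -3]
```

Even this crude score, streamed over posts in real time, hints at how social data turns into an emotion signal.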

Search Data

This is my favourite. Not obvious to many of you, but a really strong data source. Just recall how much you search on Amazon or eBay. How you search on wikis (not to be confused with Wikipedia). Quora gets plenty of search requests. StackOverflow is a good source of search data within information technology. There are intranet searches within Confluence and SharePoint. If those search logs are analyzed properly, the potential usefulness and business application become clear. E.g. the Intention Graph and Interest Graph are related to search data.
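Mining a search log for interests can be sketched as simple term counting. The queries are invented; a real log would come from an intranet search like Confluence or SharePoint:

```python
from collections import Counter

# Hypothetical intranet search log for one person.
search_log = [
    "kubernetes ingress", "kubernetes secrets", "budget template",
    "kubernetes operators", "budget approval workflow",
]

# Count topic words across queries: the top terms hint at the Interest Graph.
terms = Counter(word for query in search_log for word in query.split())
top = terms.most_common(2)
print(top)   # → [('kubernetes', 3), ('budget', 2)]
```

Real pipelines would normalize terms and weight by recency, but even raw counts already point at what the person cares about.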

There is a problem of “walled gardens” for search data… This problem is big, bigger than for social data, because public profiles are fully or partially available, while searches are kept behind the walls.

Machine Data

This is also my favourite. In the Internet of Things, every physical thing will be connected. New things are designed to be connectable. Old things get connected via M2M. Consumers have adopted wearable technology. I've posted about it earlier; see Wearable Technology and Wearable Technology, Part II.

The cost of data gathering is decreasing. The cost of wireless data transfer is decreasing. The bandwidth of wireless transfer is increasing dramatically. Fraunhofer and KIT completed a 100 Gbps transmission; it's fourteen times faster than the most robust 802.11ac. The moral is: measure everything, just gather data until it becomes Big Data, then analyze it properly and operate proactively. Machine data is probably the most important data source for Big Data in the coming years. We will digitize the world and ourselves via devices. OpenStreetMap got competitors: the fleet of eBees mapped the Matterhorn with millions of spatial points. More to expect from machines.
