Tag Archives: machine

Five Sources of Big Data

Some time ago I described how to think when you build solutions from Big Data in the post Six Graphs of Big Data. Today I am going to look in the opposite direction: where does Big Data come from? I see five distinct sources of data: Transactional, Crowdsourced, Social, Search and Machine. Details are below.

Transactional Data

This is good old data, the most familiar kind for geeks and managers. It lives in plenty of RDBMSes, running or archived, on premise and in the cloud. The majority of transactional data belongs to corporations, because the data was created mainly by businesses. It was the golden era of Oracle and SQL Server (and some others). At some point RDBMS technology turned out to be incapable of handling more transactional data, so we got Teradata (and others) to fix the problem. But there was no significant shift in the way we work with those data sources. Data warehouses and analytic cubes are trending, but they have been in use for years already. Financial systems and modules of enterprise architectures will continue to rely on transactional data solutions from Oracle or IBM.

Crowdsourced Data

This data source emerged from an activity rather than from a type of technology. The phenomenon of Wikipedia confirmed that crowdsourcing really works. Much time has passed since Wikipedia was adopted by the masses… We got other fine data sources built by crowds, for example OpenStreetMap, Flickr, Picasa and Instagram.

Interesting things are happening with the rise of personal genetic testing (checking DNA against a million known markers via 23andMe). This leads to public crowdsourced databases. More examples are available, e.g. amateur astronomy. Volunteers do author useful data, and the size of crowdsourced data is increasing.

What differentiates it from transactional/enterprise data? The price. Crowdsourced data is usually free to use, under one of the Creative Commons licenses. Often the motivation for creating such a data set is the digitization of our world or making a free alternative to paid content. With the rise of nanofactories, we will see the growth of 3D models of every physical product. Using crowdsourced models, we will print goods at home (or elsewhere).

Social Data

With the rise of Friendster, then MySpace, then Facebook and others (LinkedIn, Twitter etc.) we got a new type of data: Social. It should not be confused with Crowdsourced data, because its nature is completely different. Social data is a digitization of ourselves as persons and of our behavior. Social data complements Crowdsourced data very well. Eventually there will be a digital representation of everyone… So far social profiles are good enough for meaningful use. Social data is dynamic, and it is possible to analyze it in real time, e.g. put Tweets or Facebook posts through the Google Prediction API to grab emotions. I’m sure everybody intuitively understands this type of data source.

Search Data

This is my favourite. It is not obvious to many of you, yet it is a really strong data source. Just recall how much you search on Amazon or eBay. How do you search on wikis (not to be confused with Wikipedia)? Quora gets plenty of search requests. StackOverflow is a good source of search data within Information Technology. There are intranet searches within Confluence and SharePoint. If those search logs are analyzed properly, their potential usefulness and business application become clear. E.g. the Intention Graph and the Interest Graph are related to search data.
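To make the link between search logs and the Interest Graph concrete, here is a minimal Python sketch. It assumes a log of (user, query) pairs; the sample rows, the stop-word list and the function names are all made up for illustration, and a real system would need proper tokenization and much more data.

```python
from collections import Counter, defaultdict

# Hypothetical search log: (user, query) pairs. These rows are invented
# for illustration; real logs would come from a site's search backend.
SEARCH_LOG = [
    ("alice", "road bike tires"),
    ("alice", "bike computer gps"),
    ("bob", "python csv parsing"),
    ("bob", "python pandas merge"),
    ("alice", "gps watch"),
]

# Tiny illustrative stop-word list (an assumption, not a standard one).
STOP_WORDS = {"the", "a", "an", "of", "for", "to", "in"}


def build_interest_graph(log):
    """Map each user to a Counter of the terms they searched for."""
    graph = defaultdict(Counter)
    for user, query in log:
        for term in query.lower().split():
            if term not in STOP_WORDS:
                graph[user][term] += 1
    return graph


def top_interests(graph, user, n=2):
    """Return the n most frequent search terms for a user."""
    return [term for term, _ in graph[user].most_common(n)]


graph = build_interest_graph(SEARCH_LOG)
print(top_interests(graph, "alice"))
print(top_interests(graph, "bob", n=1))
```

Each user becomes a node connected to weighted interest terms. Even this naive count shows why the logs are valuable: the terms come straight from what people were actually looking for.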

There is a problem of “walled gardens” for search data… This problem is big, bigger than for social data, because public profiles are fully or partially available, while searches are kept behind the walls.

Machine Data

This is also my favourite. In the Internet of Things every physical thing will be connected. New things are designed to be connectable. Old things are getting connected via M2M. Consumers have adopted wearable technology. I’ve posted about it earlier: see Wearable Technology and Wearable Technology, Part II.

The cost of data gathering is decreasing. The cost of wireless data transfer is decreasing. The bandwidth of wireless transfer is increasing dramatically. Fraunhofer and KIT completed a 100 Gbps transmission. That is about fourteen times faster than the fastest 802.11ac configuration. The moral: measure everything, just gather data until it becomes Big Data, then analyze it properly and operate proactively. Machine data is probably the most important data source for Big Data in the coming years. We will digitize the world and ourselves via devices. OpenStreetMap has competitors: a fleet of eBee drones mapped the Matterhorn with millions of spatial points. There is more to expect from machines.
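The "fourteen times" figure can be checked with a little arithmetic. The sketch below assumes a theoretical 802.11ac peak of 6.93 Gbps (eight spatial streams on a 160 MHz channel); that number is my assumption and is not stated in the post. The 25 GB payload is also just an illustrative size.

```python
# 100 Gbps is the demonstration cited in the post. The 802.11ac figure is
# an assumed theoretical peak (8 streams, 160 MHz), not from the post.
DEMO_GBPS = 100.0
WIFI_AC_PEAK_GBPS = 6.93


def speedup(fast_gbps, slow_gbps):
    """How many times faster the first link is than the second."""
    return fast_gbps / slow_gbps


def transfer_seconds(size_gigabytes, link_gbps):
    """Ideal transfer time: 8 bits per byte, no protocol overhead."""
    return size_gigabytes * 8 / link_gbps


ratio = speedup(DEMO_GBPS, WIFI_AC_PEAK_GBPS)
print(f"{ratio:.1f}x faster")
print(f"25 GB at 100 Gbps: {transfer_seconds(25, DEMO_GBPS):.1f} s")
print(f"25 GB at 802.11ac peak: {transfer_seconds(25, WIFI_AC_PEAK_GBPS):.1f} s")
```

Real-world throughput is well below these theoretical peaks on both sides, so the ratio is only a rough guide.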


Wearable Technology. Part II

This story is a logical continuation of the previously published Wearable Technology.

Calories and Workouts

Here I will show how two different wearable gadgets complement each other for Quantified Self. To begin with we need two devices: one is worn by you, the other by your bike.

The first device is called BodyMedia, the world’s most precise calorie meter. It takes 5,000 data snapshots per minute from galvanic skin response, heat flux, skin temperature and a 3-axis accelerometer. You can read more about BodyMedia’s sensors online. BodyMedia uses extensive machine learning to classify your activity as cycling, and then estimates calories burned according to the cycling Big Data set used during training. Check out this paper: Machine Learning and Sensor Fusion for Estimating Continuous Energy Expenditure, for an excellent description of how the AI works.

The second device is called Garmin Edge 500, a simple and convenient bike computer. It has GPS, a barometric altimeter, a thermometer, motion detection and more features for workouts. You can read more about the Garmin Edge 500 spec online. My gadgets are pictured below.

04_gadgets

On the Route

The route was proposed by Mykola Hlibovych, a distinguished bike addict. So I put my gadgets on and measured it all. Below is the info about the route. Summary info such as distance, time, speed, pace, temperature and elevation is provided by Garmin. It tries to guess the calories too, but it is really poor at that. You should know there is no “silver bullet” and understand what to use for what. Garmin is one of the best GPS trackers, so don’t try to measure calories with it.

The juxtaposition of elevation vs. speed and temperature vs. elevation is interesting for comparison. Both charts are plotted against distance (rather than time). The 2D route on the map is a pretty standard thing. Garmin uses Bing Maps.

02_map_elev_speed_temp_dist

Burning Calories

Let’s look at BodyMedia, redraw the Garmin charts of speed, elevation and temperature along time (instead of distance), and stack them together for comparison and analysis. All the charts are aligned along the horizontal time line. The upper chart is the real-time calorie burn, also measured in METs. Its vertical axis shows calories per minute. Several times I burned at a rate of 11 cal/min, which was really hot. The big downtime between 1 PM and 2:30 PM was lunch.
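For readers unfamiliar with METs, the sketch below shows how a burn rate relates to a MET value using a common textbook approximation (kcal/min = METs × 3.5 × body weight in kg / 200). I do not know which formula BodyMedia uses internally; the 75 kg rider weight is hypothetical, and "cal" in the post is read here as kilocalories.

```python
def kcal_per_minute(mets, weight_kg):
    """Textbook approximation: kcal/min = METs * 3.5 * kg / 200."""
    return mets * 3.5 * weight_kg / 200.0


def mets_from_kcal_per_minute(kcal_min, weight_kg):
    """Inverse of the approximation above."""
    return kcal_min * 200.0 / (3.5 * weight_kg)


RIDER_WEIGHT_KG = 75.0  # hypothetical rider weight, not from the post

# The 11 cal/min peak from the post, interpreted as 11 kcal/min.
peak_mets = mets_from_kcal_per_minute(11.0, RIDER_WEIGHT_KG)
print(f"{peak_mets:.1f} METs")
```

Under these assumptions the peak works out to roughly 8.4 METs; a heavier or lighter rider would get a different value for the same burn rate.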

An interesting fact is observable on the temperature chart: the Garmin itself was warm and was cooling down to the ambient temperature. After that it started to record the temperature correctly. Another detail is a small spike in speed during the downtime window. It was Zhenia Novytskyy trying my bike to compare it with his.

01_calories_elev_speed_temp_time

Thorough Analysis

For a detailed analysis of performance on the route there is an animated playback, published on Garmin Cloud. You just need Flash Player. Click this link if WordPress does not render the embedded route player from Garmin Cloud. There is an iframe instruction below. You may see some ads from them, I think (because the service is free)…

The Mud

Wearable technology works in different conditions:)

03_mad


End of the World

This post was triggered by the speculation I have heard and keep hearing. There is a buzz in the air about something, but nobody seems frightened. Many talk about the end of the world and continue to live the same life. Strange? Not at all. People feel there is no end of life. Hence we could title this post End of the World vs. End of the Life. The fact is that nobody worries about the end of life. Then there is a question: what does End of the World mean?

What does End of the World mean?

The easiest answer is that it means the end of the current world, the one we are used to: the end of burning oil, the end of gasoline cars, the end of American dominance. Is that so hard to predict that it deserves attention? No. Then what is the real end of the current world? And what is that current world?

My vision (shared with others) is that the current world is defined by a biological civilization: humans. The end of the current world will start when machine intelligence equals human intelligence. We humans are the creators of a new civilization, the machine civilization. They will treat us as gods. They will be more intelligent than we are. They will respect us and remember every bit of information about each of us. Hence upload everything to the clouds to be indexed for the future :)

Below is a diagram by Ray Kurzweil that predicts the ‘end of the world’ beginning somewhere near 2025, when machine intelligence reaches the level of a single human brain. There are some concerns about that, because our brain works differently than a machine does. Our brain is capable of parallel pattern recognition but is slow at calculation. A machine is poor at recognition (not to mention concurrent recognition) but fast at calculation. Can super-fast calculation compensate for the lack of parallel recognition? Probably yes. Our processes are biological, chemical and electrical. Machines will probably do it partly electrically and partly with photons. At the end of the day we can measure steady progress: machines beat humans at chess, at poker and so on. Machines get ‘smarter’.

ExponentialGrowthofComputing

What is going on?

This point in our evolution is called the Singularity. We can observe accelerating returns from technology. New technologies are created faster, and their effects arrive faster. This is perfectly depicted in two other diagrams by Ray Kurzweil. One is logarithmic, so that all major events line up along a straight line. The second is plotted as-is, to emphasize the accelerating returns and to point to the expected moment of the Singularity.

singularity

singularity
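Why does a logarithmic scale put accelerating growth on a straight line? The small Python sketch below illustrates the idea with a made-up series that doubles every period. It is not Kurzweil's data, only an assumed exponential process: on a linear scale the steps keep growing, while on a log scale every step is the same size.

```python
import math


def doubling_series(start, periods):
    """A made-up quantity that doubles every period."""
    return [start * 2 ** i for i in range(periods)]


def steps(values):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


series = doubling_series(1.0, 6)  # 1, 2, 4, 8, 16, 32

# Linear scale: each step is bigger than the last (the curve bends up).
linear_steps = steps(series)

# Log scale: each step is log10(2), so the points fall on a straight line.
log_steps = steps([math.log10(v) for v in series])

print(linear_steps)
print([round(s, 3) for s in log_steps])
```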

End of the World == Beginning of the World

The end of something has always been the beginning of something else. The end of the current world will become the beginning of the New World. We should not be afraid of machines, because machines will be like us. Machines will not be able to outperform human intelligence without becoming human themselves. Artificial intelligence requires upbringing, mentoring and coaching. If machines go that way, then they will be no worse than humans. We have bad humans. We will have bad machines. But we will also have good machines, because there are many good humans. Initially machines will look like humans. Then the body will evolve.

Machines will create even better machines. Intelligence will grow and grow. The body will survive low and high temperatures and will not be afraid of radiation. Optical sensors will see a wider range of light wavelengths. Eventually a brand new epoch will be started by us, by humans. Will humanity survive it or die? I don’t know. But our expertise and knowledge will definitely grow and spread beyond the Earth and the solar system. Below is a diagram by Ray Kurzweil about six epochs, starting from the primitive evolution of the brain and ending with the conquering of the Universe.

six epochs

We created technology and are now in the epoch of merging technology with human intelligence. The technology epoch is the current world. We are just taking the first steps in mastering the methods of biology… There is a lot of work to do, but it is exciting. It is a brand New World! Happy 21st of December, 2012. The world will not end. The world will shift.
