
Five Sources of Big Data

Some time ago I described how to think about building solutions on Big Data in the post Six Graphs of Big Data. Today I am going to look in the opposite direction: where does Big Data come from? I see five distinct sources of data: Transactional, Crowdsourced, Social, Search and Machine. Details are below.

Transactional Data

This is good old data, the kind most familiar to geeks and managers. It lives in plenty of RDBMSes, running or archived, on premises and in the cloud. The majority of transactional data belongs to corporations, because it was created mainly by businesses. This was the golden era of Oracle and SQL Server (and some others). At some point RDBMS technology proved incapable of handling the growing volume of transactional data, so we got Teradata (and others) to fix the problem. But there was no significant shift in the way we work with those data sources. Data warehouses and analytic cubes are trending, but they have been in use for years. The financial systems and modules of enterprise architectures will continue to rely on transactional data solutions from Oracle or IBM.

Crowdsourced Data

This data source emerged from an activity rather than from a type of technology. The phenomenon of Wikipedia confirmed that crowdsourcing really works. Much time has passed since Wikipedia was adopted by the masses… We now have other fine data sources built by crowds, for example OpenStreetMap, Flickr, Picasa and Instagram.

Interesting things are happening with the rise of personal genetic testing (checking DNA against a million known markers via 23andMe). This leads to public crowdsourced databases. There are more examples, e.g. amateur astronomy. Volunteers do author useful data, and the size of crowdsourced data is increasing.

What differentiates it from transactional/enterprise data? The price. Crowdsourced data is usually free to use under one of the Creative Commons licenses. Often the motivation for creating such a data set is the digitization of our world or providing a free alternative to paid content. With the rise of nanofactories, we will see growth in 3D models of every physical product. Using crowdsourced models, we will print goods at home (or elsewhere).

Social Data

With the rise of Friendster → MySpace → Facebook and then others (LinkedIn, Twitter etc.) we got a new type of data: Social. It should not be confused with Crowdsourced data, because its nature is completely different. Social data is a digitization of ourselves as persons and of our behavior. It complements Crowdsourced data very well. Eventually there will be a digital representation of everyone… So far, social profiles are good enough for meaningful use. Social data is dynamic, and it is possible to analyze it in real time. E.g. put Tweets or Facebook posts through the Google Prediction API to extract emotions. I’m sure everybody intuitively understands this type of data source.

Search Data

This is my favourite. It is not obvious to many of you, yet it is a really strong data source. Just recall how much you search on Amazon or eBay. How often do you search on wikis (not to be confused with Wikipedia)? Quora gets plenty of search requests. StackOverflow is a good source of search data within Information Technology. There are intranet searches within Confluence and SharePoint. If those search logs are analyzed properly, their potential usefulness and business applications become clear. E.g. the Intention Graph and the Interest Graph are related to search data.
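As a rough illustration of what analyzing search logs could look like, here is a minimal Python sketch that counts the most frequent queries. The log format (a timestamp, a tab, then the query text), the function name top_queries and the sample lines are all assumptions made up for this example; real logs from Confluence, SharePoint or a web shop will look different.

```python
from collections import Counter


def top_queries(log_lines, n=3):
    """Return the n most frequent normalized queries from raw log lines."""
    counts = Counter()
    for line in log_lines:
        # Assumed (hypothetical) format: "<timestamp>\t<query>".
        # Lines without a tab are treated as malformed and skipped.
        parts = line.rstrip("\n").split("\t", 1)
        if len(parts) != 2:
            continue
        # Normalize: lowercase and collapse repeated whitespace.
        query = " ".join(parts[1].lower().split())
        if query:
            counts[query] += 1
    return counts.most_common(n)


# Made-up sample data, for illustration only.
sample_log = [
    "2013-10-01T09:00:01\tbike computer",
    "2013-10-01T09:00:07\tBike  Computer",
    "2013-10-01T09:01:12\theart rate monitor",
    "2013-10-01T09:02:30\tbike computer",
    "malformed line",
]

print(top_queries(sample_log, n=2))
```

Frequent queries are only the simplest signal. Grouping them by user or by time of day would be a natural next step towards something like an Interest Graph.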

There is a “walled garden” problem with search data… It is bigger than for social data, because public profiles are fully or partially available, while searches are kept behind the walls.

Machine Data

This is also my favourite. In the Internet of Things every physical thing will be connected. New things are designed to be connectable. Old things get connected via M2M. Consumers have adopted wearable technology. I’ve posted about it earlier: see Wearable Technology and Wearable Technology, Part II.

The cost of data gathering is decreasing. The cost of wireless data transfer is decreasing. The bandwidth of wireless transfer is increasing dramatically. Fraunhofer and KIT completed a 100 Gbps transmission. That is about fourteen times faster than the fastest 802.11ac. The moral: measure everything, gather data until it becomes Big Data, then analyze it properly and operate proactively. Machine data is probably the most important source of Big Data for the coming years. We will digitize the world and ourselves via devices. OpenStreetMap has got competitors: a fleet of eBees mapped the Matterhorn with millions of spatial points. Expect more from machines.
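The "fourteen times" comparison above can be checked with a couple of lines of Python. The 100 Gbps figure is the one quoted above. The 802.11ac figure is an assumption: I take the theoretical peak of roughly 6.93 Gbps (eight spatial streams on a 160 MHz channel), which real devices do not come close to.

```python
# 100 Gbps: the Fraunhofer/KIT transmission figure quoted in the text.
RECORD_GBPS = 100.0

# Assumption: theoretical 802.11ac peak of about 6.93 Gbps
# (8 spatial streams, 160 MHz channel); practical rates are far lower.
WIFI_AC_PEAK_GBPS = 6.93


def speedup(fast_gbps, slow_gbps):
    """How many times faster the first link is than the second."""
    return fast_gbps / slow_gbps


ratio = speedup(RECORD_GBPS, WIFI_AC_PEAK_GBPS)
print(f"{ratio:.1f}x")
```

Against typical real-world 802.11ac throughput the gap would be considerably larger than fourteen times.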


Wearable Technology. Part II

This story is a logical continuation of the previously published Wearable Technology.

Calories and Workouts

Here I will show how two different wearable gadgets complement each other for Quantified Self. To begin, we need two devices: one worn on yourself, the other mounted on your bike.

The first device is called BodyMedia, the world’s most precise calorie meter. It takes 5,000 data snapshots per minute from galvanic skin response, heat flux, skin temperature and a 3-axis accelerometer. You can read more about BodyMedia’s sensors online. BodyMedia uses extensive machine learning to classify your activity as cycling, and then measures calories burned according to the cycling Big Data set used during training. Check out this paper: Machine Learning and Sensor Fusion for Estimating Continuous Energy Expenditure, for an excellent description of how the AI works.

The second device is called Garmin Edge 500, a simple and convenient bike computer. It has GPS, a barometric altimeter, a thermometer, motion detection and more features for workouts. You can read more about the Garmin Edge 500 spec online. My gadgets are pictured here.


On the Route

The route was proposed by Mykola Hlibovych, a distinguished bike addict. So I put my gadgets on and measured it all. Below is the info about the route. Summary info such as distance, time, speed, pace, temperature and elevation is provided by Garmin. It tries to guess the calories too, but it is really poor at that. You should know there is no “silver bullet” and understand what to use for what. Garmin is one of the best GPS trackers, so don’t try to measure calories with it.

The juxtaposition of elevation vs. speed and temperature vs. elevation is interesting for comparison. Both charts are plotted against distance (rather than time). The 2D route on the map is a pretty standard thing. Garmin uses Bing Maps.


Burning Calories

Let’s look at BodyMedia, redraw the Garmin charts of speed, elevation and temperature against time (instead of distance) and stack them together for comparison and analysis. All the charts are aligned along the horizontal time line. The upper chart is the real-time calorie burn, which is also measured in METs. Its vertical axis shows calories per minute. Several times I burned at a rate of 11 cal/min, which was really hot. The big downtime between 1 PM and 2:30 PM was lunch.
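To give an idea of how the two data sets can be put on one time line, here is a minimal Python sketch. It averages Garmin-style samples into one value per minute so they line up with a per-minute calorie series, and converts calories per minute into METs using the common approximation that 1 MET is about 1 kcal per kg of body weight per hour. The sample values, the 75 kg body weight and the function names are all made up for illustration; this is not how Garmin or BodyMedia process their data.

```python
from collections import defaultdict


def per_minute_mean(samples):
    """Average (seconds_since_start, value) samples into one value per minute."""
    buckets = defaultdict(list)
    for seconds, value in samples:
        buckets[int(seconds // 60)].append(value)
    return {minute: sum(vals) / len(vals) for minute, vals in sorted(buckets.items())}


def kcal_per_min_to_mets(kcal_per_min, weight_kg):
    """Convert a burn rate to METs, assuming 1 MET ~ 1 kcal per kg per hour."""
    return kcal_per_min * 60.0 / weight_kg


# Hypothetical speed samples: (seconds since start, km/h).
speed_kmh = [(0, 18.0), (30, 22.0), (60, 25.0), (90, 27.0), (150, 12.0)]

# Hypothetical calorie series: minute index -> kcal/min.
calories = {0: 8.0, 1: 11.0, 2: 5.0}

speed_by_minute = per_minute_mean(speed_kmh)

# Print the two series side by side on the shared minute axis.
for minute in sorted(calories):
    speed = speed_by_minute.get(minute)  # None if no speed sample that minute
    mets = kcal_per_min_to_mets(calories[minute], weight_kg=75.0)  # assumed weight
    print(minute, speed, calories[minute], round(mets, 1))
```

Once both series share the same minute index, stacking the charts is just a matter of plotting them against that common axis.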

An interesting fact is visible on the Temperature chart: the Garmin itself was warm and was cooling down to the ambient temperature. After that it started to record the temperature correctly. Another detail is a small spike in speed during the downtime window. That was Zhenia Novytskyy trying my bike to compare it with his.


Thorough Analysis

For a detailed analysis of the performance on the route there is an animated playback, published on Garmin Cloud. You just need Flash Player. Click this link if WordPress does not render the embedded route player from Garmin Cloud. There is an iframe instruction below. You may see some ads from them, I think (because the service is free)…

The Mud

Wearable technology works in different conditions:)




