Data Visualization – GPS logs from Google Location History

It’s been 10 days since I got back from a 5-month stint in Europe, and it’s 10 days until I move on to my next stint (more on that very, very soon 😉 ). Being in Mumbai has its perks, with the alma mater, friends, vada pav and Marine Drive being high points. However, 20 days of doing nothing can really get to you, and you need to keep doing something just to stay sane.

I had wanted to do a data visualization project for quite some time, and I had always wondered what data I could use. Being back from Europe, having travelled to a few cities, and owning a tablet gave me the spark I was searching for. I had 5 months of GPS data (recorded automatically by my trusty companion, the Nexus 7) logging my travels. That amounted to 73,000 recordings, spread over 5 months, 6 cities and a lot of geography.

Well, Google Location History exported all the data to a KML file readily enough, and there I hit the first hurdle: too much data. I had data from all over, from India through my transit to Europe via the Middle East, and those points just seemed like “outliers” (data which was much different from the rest). So, before I got around to plotting anything, I figured I ought to delete some data, and I set about doing it. Unfortunately, the KML exported by Google Location History isn’t exactly the most beautiful. Each reading is split across two lines, one with the time and one with the co-ordinates, and needs a bit of work to be made into CSV, so one can do some mathemagic with the numbers. So, what do you do?

I learnt vim. I went and talked to a friend, and showed him what I was doing. He fired up vi, typed in a few commands, and there was my data for one city, all cleaned up and ready to be analysed. So, I went home that night, fired up vimtutor, and learnt vim. Totally worth the two hours I put into it.
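For anyone who would rather script the two-lines-to-CSV cleanup than do it in vim, here is a minimal Python sketch. It assumes the export stores each reading as a `<when>` element followed by a `<gx:coord>` element holding "longitude latitude altitude", which is how KML tracks are usually laid out; the function names `kml_to_rows` and `rows_to_csv` are made up for this example and are not from the original workflow.

```python
import csv
import io
import xml.etree.ElementTree as ET

# XML namespaces for standard KML and Google's gx extension (assumed layout).
KML_NS = "{http://www.opengis.net/kml/2.2}"
GX_NS = "{http://www.google.com/kml/ext/2.2}"


def kml_to_rows(kml_text):
    """Pair each <when> timestamp with the <gx:coord> at the same position."""
    root = ET.fromstring(kml_text)
    times = [el.text.strip() for el in root.iter(KML_NS + "when")]
    coords = [el.text.split() for el in root.iter(GX_NS + "coord")]
    rows = []
    for when, parts in zip(times, coords):
        # gx:coord is "longitude latitude altitude", separated by spaces.
        lon, lat = float(parts[0]), float(parts[1])
        rows.append((when, lat, lon))
    return rows


def rows_to_csv(rows):
    """Serialise (time, lat, lon) rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["time", "lat", "lon"])
    writer.writerows(rows)
    return out.getvalue()
```

Reading the exported file into a string and passing it through `kml_to_rows` and then `rows_to_csv` should give one reading per line, which also makes it easy to filter out the transit points by latitude and longitude before plotting.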

Discussing what I was doing also helped me brush up on statistics and algorithms. Several ways to “cluster” the data came to mind. Since a lot of points were pretty much the same location, with all the quirks of GPS measurements, they could be replaced with just one point. Additionally, I could take a city’s data, combine it with my knowledge of my walks in the city, and use k-means clustering to cluster it into a few places that I hung out at. That raised the question: did I really want to cluster the places, when my aim in keeping logging turned on was to record everywhere I had been and to later see and show all the streets that I had walked? Ought I just show all the data points, or would it be nicer to show the path I followed, step by step, with insights into how much time I spent where along the way, maybe for some other traveller looking for the same things I had been?
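To make the k-means idea concrete, here is a small dependency-free Python sketch. It is only an illustration under a few assumptions: latitude and longitude are treated as flat plane co-ordinates (reasonable within a single city, not across continents), the number of clusters `k` is picked by hand, and the function name `kmeans` and its parameters are invented for this example.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster (lat, lon) pairs into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k readings picked at random from the data.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid,
        # using squared straight-line distance in degrees.
        for i, (lat, lon) in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (lat - centroids[c][0]) ** 2
                + (lon - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels
```

Each centroid then stands in for one “place I hung out”, and the number of points carrying its label gives a rough sense of how much time was spent there. The catch is the one raised above: clustering throws away the streets walked between those places.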

Too many questions, or rather decisions and tests to make, and not too many answers. So, while I figured it out, I went ahead and plotted a few of the cities I visited on a map, using Google Fusion Tables (incredibly easy and handy). Here are a couple of those maps, until I decide to (and/or) finish this project. And if you want to play with this dataset yourself, you can get it here:

Also, some terms that you might like to look up, that I came across while working on this: [ clustering, k-means clustering, elbow method, visual block mode in vi, Manhattan distance, curse of dimensionality, forward difference, second forward difference, normalisation, outliers ]

Phototime – Photo organizer minion

Code is alright, but I have always been looking for opportunities to write small scripts, little things to automate life. Now that I had the opportunity, I was only too glad to indulge myself.
I have been maintaining a photo-log for a few days, and that means a lot of photos every day. The least I hope to gain out of this exercise is to know how my life progressed over a year.
So I figured I would have to name them consistently with their date and time as the file names.
What would let me do this? Either have the camera’s processor run the code (very far-fetched, true, but it might be a possibility) or have a script on my computer do the same.
So I chose this opportunity to write a small Python script to rename all photos in a folder in such a fashion.
What exactly it does is:
Supply the file type and the path to a photo folder as command line arguments, something like:
python --ftype=filetype --path=filepath
and the rest is taken care of.
As an additional step, you could copy it to your “bin” folder which is in the shell path, rename it as “phototime” without the “.py” extension, make it executable (chmod a+x), and then enjoy the added ease of use. More detailed documentation is on the GitHub page linked below.
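To give a feel for what such a renamer involves, here is a minimal Python sketch. It is not the script from the repository: it assumes the file’s modification time is a good enough stand-in for when the photo was taken (the real script may well read the camera’s own metadata instead), the `YYYY-MM-DD_HH-MM-SS` name format is an arbitrary choice, and the function names are made up for this example. Only the `--ftype` and `--path` arguments mirror the command shown above.

```python
import argparse
import os
from datetime import datetime


def timestamp_name(path):
    """Build a 'YYYY-MM-DD_HH-MM-SS.ext' name from the file's modification time."""
    taken = datetime.fromtimestamp(os.path.getmtime(path))
    ext = os.path.splitext(path)[1].lower()
    return taken.strftime("%Y-%m-%d_%H-%M-%S") + ext


def rename_photos(folder, ftype):
    """Rename every file in `folder` with extension `ftype`; return the new names."""
    renamed = []
    suffix = "." + ftype.lower().lstrip(".")
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(suffix):
            continue
        old = os.path.join(folder, name)
        new = os.path.join(folder, timestamp_name(old))
        # Skip files already named correctly or that would clobber another file.
        if old != new and not os.path.exists(new):
            os.rename(old, new)
            renamed.append(os.path.basename(new))
    return renamed


def main(argv=None):
    """Parse --ftype and --path, then print each new file name."""
    parser = argparse.ArgumentParser(description="Rename photos by timestamp.")
    parser.add_argument("--ftype", default="jpg", help="file extension to match")
    parser.add_argument("--path", required=True, help="folder containing the photos")
    args = parser.parse_args(argv)
    for name in rename_photos(args.path, args.ftype):
        print(name)
```

Calling `main()` at the bottom of the file turns this into a command line tool. Two photos taken within the same second would collide on the same name, so this sketch simply leaves the second one untouched rather than overwrite anything.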
I’ve uploaded the project to a GitHub repository. You can download it from here.
Please leave feedback if it’s useful or if it sucked. Remarks and suggestions for modifications are most welcome.