Tag Archives: crowdsourcing

Flickr as a vehicle of narrative: photos contextualised in space and time

After my project proposal had been accepted, I attended a workshop at ETH Zurich last summer, titled “Cartography & Narratives” and organised by Barbara Piatti, Sébastien Caquard and Anne-Kathrin Reuschel. The goal of the workshop was to explore “mapping as a conceptual framework to improve our understanding of narratives”. Narratives are

“an expression in discourse of a distinct mode of experiencing and thinking about the world, its structures, and its processes” (White 2010)


or any cultural artefact that ‘tells a story’ (Bal 2009)

I decided to investigate the photo-sharing platform Flickr as a vehicle of narratives. Think of the slide show of pictures from a trip, be it directly on the camera’s screen or as an image projected onto your living room wall: it is arguably one of the most ubiquitous types of everyday narrative.

I have uploaded a preliminary result of my workshop paper on Vimeo (view it large, for good quality):

[vimeo http://vimeo.com/56999213 w=600]


The movie shows the temporal and spatial patterns that emerge when we combine 80’000+ images taken by 4’000 photographers over the course of several years in the city of Zurich, Switzerland (I only looked at georeferenced photographs). See the description of the video on Vimeo for full information.
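To give an idea of how such patterns can be derived, here is a minimal sketch that counts georeferenced photos per grid cell and hour of day. It is not the code behind the movie: the sample records, the cell size and the function name `bin_photos` are all assumptions made for illustration.

```python
import math
from collections import Counter
from datetime import datetime

# Hypothetical sample records: (longitude, latitude, time taken).
PHOTOS = [
    (8.5417, 47.3769, "2011-07-02 14:05:00"),
    (8.5419, 47.3768, "2011-07-02 14:40:00"),
    (8.5392, 47.3667, "2010-12-24 19:10:00"),
    (8.5460, 47.3660, "2012-05-13 09:30:00"),
]

CELL_SIZE = 0.005  # grid cell size in degrees (an arbitrary choice)


def bin_photos(photos, cell_size=CELL_SIZE):
    """Count photos per (grid cell, hour of day)."""
    counts = Counter()
    for lon, lat, taken in photos:
        hour = datetime.strptime(taken, "%Y-%m-%d %H:%M:%S").hour
        # Integer column/row index of the grid cell containing the photo.
        cell = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        counts[(cell, hour)] += 1
    return counts


if __name__ == "__main__":
    for (cell, hour), n in sorted(bin_photos(PHOTOS).items()):
        print(cell, hour, n)
```

Rendering one frame per hour (or per month) from such counts would be one way to arrive at an animation; degree-based cells are a simplification that ignores map projection.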

I will post more about the workshop results and further work shortly.

Background of Mapocalypse

The internet has been abuzz about Apple’s iPhone 5 “mapocalypse”. The Verge has new background information: Apple took the decision to ship their own mapping app “over a year before the company’s agreement to use Google Maps expired”. Apparently, the people at Apple “felt that the older Google Maps-powered Maps in iOS were falling behind Android – particularly since they didn’t have access to turn-by-turn navigation (…).” Google, on the other hand, reportedly wanted more prominent branding and the inclusion of Latitude.

Anyway, “mapocalypse” is here and is presumably bound to stay for a while, until either Apple fixes its globe or Google finishes its iOS 6 mapping app and Apple admits it to the App Store. While competition is always a good thing, I’m not sure whether Apple really has the capacity to fix its data on a global scale within a short time. Combing through geodata from heterogeneous sources and improving its consistency is a daunting task, after all. Google Maps (started in 2005) and Google Earth (first released under this name in 2005) also took years to reach a level that most users are happy with most of the time. The praise heaped on the new Apple maps at their introduction is what their progress in quality will be measured against:

(…) when iOS software VP Scott Forstall introduced the new mapping system in June, he called it “beautiful” and “gorgeous” and stressed that “we’re doing all the cartography ourselves.”

(Source: The Verge)

Eric Fischer: Mapmaker, artist and programmer

The Atlantic Cities has a nice portrait of Eric Fischer: “Mapmaker, artist, or programmer?” If you have been following information visualization and geovisualization news online in recent years, I bet you have come across Fischer’s work. A few examples:

See something or say something: In this piece Fischer has overlaid georeferenced tweets (blue) and georeferenced Flickr pictures (orange). White areas mark places from which people have posted to both Twitter and Flickr.

Locals and tourists: In this piece Fischer has coloured georeferenced Flickr images depending on whether they were taken by tourists (red) or locals (blue). Pictures whose author’s origin was ambiguous are coloured yellow.

Race and ethnicity: A map of racial and ethnic divisions in Chicago, based on US Census 2010 data. Each dot represents 25 residents: red dots represent white people, blue dots black people, green dots Asian people, orange dots Hispanic people and yellow dots people of other origin.

“Ultimately, almost everything I have been making tries to take the dim, distant glimpse of the real world that we can see through data and magnify some aspect of it in an attempt to understand something about the structure of cities.”

“When the maps succeed, I think it is when they can confirm something that the viewer already knows about their neighborhood or their city, and then broaden that knowledge a little by showing how some other places that the viewer doesn’t know so well are similar or different.”
– Eric Fischer

What I like most about Fischer’s projects is that they are often crowdsourced (from Flickr or Twitter) and data-heavy, and that they often, though not always, employ quite simple analysis or visualization approaches to great effect. In the end it’s all about the ideas behind the visualizations, and Fischer doesn’t seem to be short of those.

Eric Fischer, formerly a programmer at Google, is currently artist-in-residence at a San Francisco museum, where he will hopefully continue to produce interesting maps and visualizations. It’s probably safe to answer the question in the Atlantic Cities article’s title by saying that Fischer is all three: mapmaker, artist and programmer.

Journalists’ Twitter network

[Deutsch weiter unten]

Recently, I’ve been looking into the analysis and visualization of Twitter networks, so David Bauer’s posting of a list of 300+ German-speaking, Twitter-using journalists came at just the right time. Scroll down to see the resulting network. By the way, you can find more information on the technical background of these Twitter network visualizations in this post.


Lately I have been tinkering with the extraction, analysis and visualization of Twitter networks. So it came at just the right time that David Bauer posted a list of more than 300 German-speaking, tweeting journalists. Here I present the resulting network. Note: the resulting graphics are visually complex and therefore require a sufficiently high resolution. They are thus not ideal for viewing on mobile devices with small screens.

By the way: in this post you will find more details on the technical background of how these Twitter network visualizations were produced.

Network of tweeting journalists, node size scaled according to “industry followers” (followers among fellow journalists)

The first graphic shows the network of tweeting journalists. The nodes are coloured by groups that emerge from the follower relationships. Node size reflects the number of followers each user has among the journalists on the list.

You can see that most journalists lie in a large but compact region of the network, which nonetheless breaks down into distinct groups. The originator of the list, David Bauer, is very well connected as measured by industry followers. Together with Ronnie Grob and Peter Hogenkamp he also forms a kind of bridgehead to the blue group. I suspect that the blue group consists of journalists from Germany. They are not well connected with the rest of the network.

At first glance, T. Benkö, M. Binswanger, P. Müller, S. Brotz, N. Lüthi, W. de Schepper, S. Bärtschi, M. Daum, K. Weber, C. Moser and A. Sautter are journalists with many “industry followers”.

The second visualization is illuminating as well. In it, the nodes are scaled by total Twitter followers (that is, not only followers within the industry):

Network of tweeting journalists, node size scaled according to total Twitter followers

The second visualization uses the overall follower counts on Twitter to show who can attract many followers outside journalism. Of course this is not entirely precise, because the graphic simply shows all followers, that is:

  • followers on the list (journalists) and
  • followers who do not appear on the list. These may or may not be journalists.

The visualization clearly shows that, for example, Nik Hartmann and Tom Brühwiler gain weight in this interpretation; they thus have an atypically high share of non-journalists among their followers. Peter Hogenkamp, Thomas Benkö and Michèle Binswanger are others who still appear large. It is also striking that the nodes of the blue group are drawn particularly large, hence my suspicion that these are German journalists, who have a larger pool of followers than their Swiss colleagues.
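The comparison between the two graphics boils down to one ratio per account: the share of followers who are not on the list. A minimal sketch of that calculation follows; the account names and follower counts are invented, and `off_list_share` and `rank_by_off_list_share` are hypothetical helper names, not part of the original analysis.

```python
# Hypothetical follower counts, invented for illustration only.
ACCOUNTS = {
    "account_a": {"total": 20000, "on_list": 100},
    "account_b": {"total": 1500, "on_list": 150},
    "account_c": {"total": 0, "on_list": 0},
}


def off_list_share(total, on_list):
    """Share of an account's followers that do not appear on the list."""
    if total == 0:
        return 0.0  # no followers at all, so no meaningful share
    return (total - on_list) / total


def rank_by_off_list_share(accounts):
    """Account names sorted from highest to lowest off-list share."""
    return sorted(
        accounts,
        key=lambda name: off_list_share(
            accounts[name]["total"], accounts[name]["on_list"]
        ),
        reverse=True,
    )


if __name__ == "__main__":
    for name in rank_by_off_list_share(ACCOUNTS):
        counts = ACCOUNTS[name]
        print(name, round(off_list_share(counts["total"], counts["on_list"]), 3))
```

As noted above, a high off-list share does not prove a non-journalist audience, since off-list followers may still be journalists.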

I would be interested to know what else there is to discover in these graphics. What do insiders see in them? Do you have corrections? What are your interpretations and hypotheses? Join the discussion!

[Update of 28 February 2012: Comments and pingbacks on this post are closed; more on this here]

Twitter networks – Mechanics

[Deutsch weiter unten]

Recently, I’ve been working on a Twitter-related project with two friends of mine. As there’s nothing to present yet, I won’t go into detail regarding that project. But working on Twitter-related stuff led me to explore the generation, modelling, analysis and visualization of Twitter networks.

An excerpt from a Twitter network

Then, some weeks back, Swiss journalist/author/blogger David Bauer started a Google Doc to collect Twitter handles of journalists (read his post here, in German). Two weeks later David Bauer’s list featured 300 accounts from German-speaking, mostly Swiss journalists (as of now there are 360 accounts) – a nice crowdsourcing success!

I think David Bauer had an interesting idea there. Some people even ran simple analyses of the list, such as the gender proportions of journalists on Twitter (see below; the ratio is disappointingly unbalanced!).

Gender proportion of journalists using Twitter (based on David Bauer's list)

Now, I wanted to visualize the network of these tweeting journalists. The tools of the trade in this case are the Twitter API, Python and Gephi.

Using the API I could get the user ID of each journalist (unlike Twitter handles, which can be changed, user IDs are numerical and stable), the user IDs of the people who follow them and the user IDs of the people they follow. As a by-product of this process I also got the current follower count of each journalist.

Now, all that was left to do to derive the Twitter network was to determine, for each pair of journalists, whether one of them followed the other, both followed each other, or neither followed the other. Using Python with custom modules I could generate this structure and export it to a GraphML file that can be read by Gephi. With Gephi I did some network analysis and created visualizations; check them out in this post.
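The step from follower lists to a GraphML file can be sketched roughly as follows. This is not my actual module: the user IDs are made up, the function names are hypothetical, and the sketch uses only Python’s standard library, writing a bare-bones GraphML document with nodes and directed edges but no attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: for each listed account (keyed by numeric user ID),
# the set of user IDs that this account follows, as fetched from the API.
FOLLOWING = {
    101: {102, 103, 999},  # 999 is not on the list and is ignored
    102: {101},
    103: set(),
}


def build_edges(following):
    """Directed edges (follower, followed) between accounts on the list."""
    members = set(following)
    edges = []
    for source in sorted(following):
        for target in sorted(following[source] & members):
            if target != source:
                edges.append((source, target))
    return edges


def in_list_followers(edges):
    """Number of followers each account has among the listed accounts."""
    counts = {}
    for _, target in edges:
        counts[target] = counts.get(target, 0) + 1
    return counts


def to_graphml(following, edges):
    """Serialise the network as a minimal GraphML string."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    for user_id in sorted(following):
        ET.SubElement(graph, "node", id=str(user_id))
    for source, target in edges:
        ET.SubElement(graph, "edge", source=str(source), target=str(target))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    edges = build_edges(FOLLOWING)
    print(in_list_followers(edges))
    print(to_graphml(FOLLOWING, edges))
```

A mutual follow simply shows up as two directed edges, one in each direction, so all three cases (one-way, mutual, none) are covered by a single directed graph.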


Lately I have been setting up a kind of Twitter project with two friends. There is nothing to present there yet, but this post deals with a sort of spin-off project: I have become increasingly interested in the generation and modelling, analysis and visualization of Twitter networks and have tried out a few things in these areas.

Excerpt from a Twitter network

Then, a few weeks ago, the Swiss journalist/author/blogger David Bauer opened a Google Doc in which he is crowdsourcing the Twitter handles of German-speaking (and mostly Swiss) journalists (see also his post here). Two weeks later the list already had 300 entries; currently there are 360, so a nice crowdsourcing success!

I find David Bauer’s idea very interesting. Other people were taken with it too and have even carried out simple analyses, for example of the gender ratio (which is disappointingly unbalanced):

Gender ratio of the tweeting journalists (based on David Bauer’s list)

For my part, I wanted to see the network of tweeting journalists. The tools I used for this are:

From the Twitter API I could retrieve the user IDs of all journalists (unlike the changeable Twitter handles, these numerical IDs are stable over time). I could also retrieve the user IDs of the people who follow a journalist and of the people a journalist follows. As a by-product I naturally also obtained the follower count of each journalist.

As the last step in deriving the network of journalists, I had to find out for each pair of people on the list whether one follows the other, both follow each other, or neither follows the other. With Python and a special module I could then build the network and export it as a GraphML file. This file in turn I could read into Gephi in order to carry out further analyses and create some visualizations. You can find the results in this post.

Economist’s Africa Twitter map provides some teachable insights

Mark Graham has posted a critique at Zerogeography of a “Twitter map” that featured in the Economist. The map was compiled by Portland Communications and Tweetminster and shows the number of tweets per country (the original version of the map can be found in this presentation by Portland Communications):

Africa Twitter map by Portland Communications, Economist

Mark Graham raises these interesting points regarding this map:

  • The figure of 11m tweets in Africa over a three-month period is probably a vast underestimate, since the joint Portland Communications/Tweetminster analysis looked only at geocoded tweets.
  • The analysis doesn’t account for the provenance of the tweets: are many of them issued by a few users, or are there actually many people behind the many tweets of a country? This is likely a very relevant point, since many crowdsourcing projects find that a small minority of users contributes the majority of the content. It may be the same with Twitter. The remaining question then is: could the proportion of heavy contributors vary between countries (thus harming the comparability of countries)?
  • The analysis doesn’t relate the number of tweets to the number of inhabitants. We have thus no way of knowing whether a big number of tweets means an extraordinarily high proportion of Twitter users in the population, or not.
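The last two points can be made concrete with a small sketch: one function relates tweet counts to population, the other measures how much of a country’s tweet volume comes from its most active users. All numbers below are invented for illustration and are not taken from the map or from Mark Graham’s post; the function names are hypothetical.

```python
def tweets_per_1000(tweets, population):
    """Tweets per 1,000 inhabitants."""
    return 1000 * tweets / population


def top_share(tweets_per_user, fraction=0.1):
    """Share of all tweets written by the most active `fraction` of users."""
    ranked = sorted(tweets_per_user, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


if __name__ == "__main__":
    # Hypothetical countries: A has more tweets, B has more tweets per head.
    print(tweets_per_1000(500_000, 50_000_000))  # country A
    print(tweets_per_1000(100_000, 2_000_000))   # country B

    # Hypothetical tweet counts for ten users of one country.
    print(top_share([90, 2, 2, 1, 1, 1, 1, 1, 1, 0]))
```

In this made-up example, country A leads in absolute numbers while country B has five times as many tweets per inhabitant, and a single user accounts for 90% of the third sample’s tweets.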

Mark states that a study he and his team conducted using the Twitter Streaming API found that only 0.7% of all tweets actually contain geolocation information (and thus the Africa Twitter map is based on a really small sample of the tweets sent from within African countries!). That proportion is something I have wondered about since I started to tinker with the Twitter REST API a few weeks ago. Unlike the Streaming API (whose full-volume variant is the so-called “firehose”), the REST API has tight query limits, so I haven’t acquired a big enough sample of tweets to judge the prevalence of location information in tweets (acquiring a random sample of tweets is also not the aim of my studies).

As Mark further points out, this shortcoming on the data side makes the map potentially useless and, in the worst case, even misleading: users in different countries may expose their location in their tweets with different probabilities, due to, for example:

  • a different brand mix of end-user devices (for example, a different prevalence of smartphones versus dumbphones, which can use Twitter via SMS)
  • a different mix of Twitter clients: clients may expose the location-sharing settings in different ways and may encourage or discourage a user to opt into or out of location sharing
  • varying awareness of, or views on, privacy issues around location sharing
  • different societal norms towards location sharing

If the prevalence of location sharing differs between countries, the Africa Twitter map cannot serve even as a proxy for the true number of tweets sent from African countries.
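A toy calculation shows how easily the ranking can flip. The 0.7% rate is the overall figure quoted above; the 2% rate, the country labels and the counts are purely hypothetical, as is the helper `estimate_total`.

```python
def estimate_total(geotagged_tweets, geotag_rate):
    """Extrapolate the total number of tweets from the geotagged subset."""
    return geotagged_tweets / geotag_rate


# Hypothetical countries: (geotagged tweets observed, assumed geotag rate).
COUNTRIES = {
    "X": (7_000, 0.007),  # users here rarely share their location
    "Y": (10_000, 0.02),  # users here share their location more often
}

if __name__ == "__main__":
    for name, (geotagged, rate) in COUNTRIES.items():
        print(name, geotagged, round(estimate_total(geotagged, rate)))
```

On a map of geotagged tweets, Y (10,000) would look more active than X (7,000), although under these assumed rates X would have sent roughly twice as many tweets overall. Since the per-country rates are unknown in practice, the correction cannot actually be applied; the sketch only illustrates the size of the possible distortion.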

Further takeaways thanks to Mark Graham:

  • Using the location information in description fields of Twitter users’ profiles is a bad substitute for actual location information attached to tweets.
  • Time zone information as another approach to rough positioning of a Twitter user isn’t a feasible alternative route either, since many users don’t bother to set it in their profile.
  • And, most importantly and generally applicable: Any analysis of data from social media or crowdsourcing initiatives has to scrutinise the data for potential confounding variables, inherent biases and flaws in data collection (sampling), data processing and analysis. No analysis is complete without asking these questions. If they are not addressed in the analysis, it falls to the end user to ask them, though unfortunately that can be difficult without access to the raw data.

OpenStreetMap: A valid competitor to official base maps?

Last year, Cédric Moullet, a contributor to MapFish and GeoExt among other projects, sparked a discussion with his post “Why OpenStreetMap fails to replace official or proprietary base maps in a sustainable way ?” (note how this doesn’t sound like a question but bears a question mark ;)

For simplicity, I will re-list Cédric’s 13 points here:

1. Because it’s not possible to make a map for all zoom levels
2. Because the finances are not secured on long term
3. Because the data model is not defined
4. Because the precision is heterogeneous
5. Because the reliability is heterogeneous
6. Because the completeness is heterogeneous
7. Because it requires attribution
8. Because the data are difficult to extract
9. Because noone takes the responsibility about the data
10. Because it lacks a QA step by an accountable body
11. Because it is not multilingual
12. Because first acquisition is fun and data update is boring
13. Because Google Map Maker Workflow is for the broad public and OpenStreetMap workflow for the map enthusiasts

Numerous reactions (not detailed here) prompted Cédric to post some clarifications.

Enter Stefan Keller. He is a professor at the Hochschule für Technik in Rapperswil (Switzerland) and (I think) could be described as an open source and OSM enthusiast and evangelist. In agreement with Cédric Moullet, Stefan Keller started a thread on geowebforum, the Swiss GIS/geospatial industry forum. In his post he objects to several of Cédric’s points. Marc Wick (founder of Geonames) also weighs in on the debate with some interesting points and, finally, Stephan Heuel and I contribute our views on the topic (spoiler: we agree with most of Cédric’s points).

Of course, I’m biased, but I think the thread which developed is definitely worth reading. If you feel like it, please do also contribute (everybody can on geowebforum) with your insights!