Web scraping – the Jekyll and Hyde of Web 2.0 (part 2)?

In the last post I wrote about a positive use of web scraping, the software process used to extract data from the HTML mark-up used on websites. I highlighted Planningalerts, a web and phone app that delivers real-time information, gathered from council websites, about development proposals that may affect a specific property.
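For readers unfamiliar with the mechanics, the core of the technique is small. The sketch below is a minimal illustration, not PlanningAlerts' actual code: it parses an invented council-style HTML fragment with Python's standard-library parser and pulls out the proposal rows. Real pages are far messier, and a real scraper would fetch the page over HTTP first.

```python
from html.parser import HTMLParser

# Hypothetical snippet of a council planning page; invented for illustration.
PAGE = """
<table>
  <tr><td class="address">1 Example St</td><td class="desc">New carport</td></tr>
  <tr><td class="address">5 Sample Rd</td><td class="desc">Two-storey extension</td></tr>
</table>
"""

class ProposalScraper(HTMLParser):
    """Collects the text of every <td> cell, grouping cells into table rows."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # buffer for the <td> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])       # start a new row
        elif tag == "td":
            self._cell = []            # start buffering a cell

    def handle_data(self, data):
        if self._cell is not None:     # only keep text that sits inside a <td>
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

scraper = ProposalScraper()
scraper.feed(PAGE)
proposals = [dict(zip(("address", "description"), row)) for row in scraper.rows]
```

The point is that once data is published as HTML, however informally, a few dozen lines of code can turn it into structured records ready for alerts, maps or databases.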

Now for a look at a social media application which opens the door to web scraping in a much more controversial manner. Foursquare is described by Wikipedia as a web and mobile application that allows registered users to connect with friends and update their location. Points are awarded for “checking in” at venues.

Foursquare website

Essentially, Foursquare uses the smartphone’s global positioning system (GPS) to broadcast the user’s location to their friends and, if the user allows it, to other Foursquare members.

The process is best described not on the Foursquare website but in a recent Guardian article. In summary, users “check in” on their phone whenever they arrive at a point of interest so that fellow users know where they are. They can also use their phone to check the names of all the other users in the same area, exactly where they are, and whether they are with other users.

The application is still in its infancy but is already attracting a lot of users. It recently signed up its two-millionth user, just three months after reaching its first million. According to the Guardian:

“Foursquare is now being widely touted as the app which will, after years of anticipation and prediction, mark the beginning of ‘life as a game’ computing. Whatever you do, wherever you go, you will be scoring points, earning ‘medals’, and be in, at the very least, social competition with other users around you.”

However, as the Guardian article points out, this “game” could come with a price – a potentially huge loss of privacy. There are at least three areas of concern. First, by its nature, Foursquare automatically reveals a fundamental item of information, the user’s precise location, which is not disclosed even to the user’s friends by any other social media application. This has implications which are only beginning to be understood.

Second, while they have recently been tightened, Foursquare’s privacy settings still require users to actively opt out of key data-sharing options, rather than opt in. As the recent fracas over Facebook privacy rules demonstrates, this approach can leave users very vulnerable.

This is a particular issue with Foursquare, however, as there is little point to the program unless you choose to release your location information to at least some other users. Even if you do opt to disclose your location only to your friends, this can still be risky, especially if you haven’t been too discriminating about who your “friends” are.

This risk is also compounded by the way in which the program facilitates the linking of Foursquare’s locational broadcast to a user’s Twitter feed, thus enabling their location to be spread even more widely.

The biggest concern, however, is that Foursquare could prove vulnerable to “malicious” web scraping. Unlike the Planningalerts use of web scraping described in my last post, this involves collecting and collating private data that users have revealed (intentionally or otherwise) on social media websites, rather than gathering public information made available on council or government websites.

Even if a user avoids the temptation to link their own Foursquare, Facebook, Twitter and other social media accounts to reduce the risks described above, someone with the right skills can gather pieces of information from these sites and link it with other publicly available information such as phone directories and electoral rolls to build a detailed picture of that user’s address, employment, lifestyle, friends, associates, shopping preferences etc.
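The linking step itself requires no sophistication. The sketch below uses entirely invented records and names to show how fragments from separate sources, once they share any common key such as a username, collapse into a single profile:

```python
# Hypothetical records "scraped" from different public sources.
# Every name, venue and field here is invented for illustration.
checkins = [
    {"user": "jsmith", "venue": "Central Cafe", "time": "08:45"},
    {"user": "jsmith", "venue": "Acme Corp HQ", "time": "09:10"},
]
directory = [{"user": "jsmith", "name": "J. Smith", "suburb": "Newtown"}]
tweets = [{"user": "jsmith", "text": "Off on holiday for two weeks!"}]

def build_profile(user, *sources):
    """Collate every record mentioning `user` into one combined profile."""
    profile = {"user": user, "records": []}
    for source in sources:
        profile["records"].extend(r for r in source if r.get("user") == user)
    return profile

profile = build_profile("jsmith", checkins, directory, tweets)
```

A dozen lines of joining logic is all that separates scattered, individually harmless disclosures from a combined record of where someone lives, works, drinks coffee and when their house is empty.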

The role of web-scraped Foursquare data could be particularly critical in providing information on the user’s movements on a day-to-day basis. The Guardian sums up the risks:

“The big worry … is who might get to make use of this information. Pick your paranoia. Someone with criminal intent, such as a burglar, identity thief or stalker? Governments, the security services or police? Terrorists? Or a corporation looking to target its products at you with incredible precision?”

This is not to condemn web scraping and similar data-gathering techniques out of hand – as Planningalerts demonstrates, they can provide a particularly effective way of making already publicly-available data even more accessible. Nor is it a criticism of the innovation demonstrated by applications such as Foursquare. It does however provide a strong argument for all social media applications to beef up their privacy measures and to inform users of all the risks involved.

If we are going to march into the brave new world promised by Foursquare and the other locationally-enabled social media apps to follow, we had better do so with our eyes open.

This entry was posted in Local Government, Web 2.0. Bookmark the permalink.

1 Response to Web scraping – the Jekyll and Hyde of Web 2.0 (part 2)?

  1. Paul Hempsall says:

    Not sure if you’ve heard of the following website that has been developed to highlight this growing concern of “over sharing” your private information.

    Please Rob Me (http://pleaserobme.com/), when first launched, provided a stream of people’s locations, gathered mostly from Foursquare, with the intent of humorously promoting an awareness of sharing too much personal data.

