Thursday, 12 August 2010

Humans are boring, say machines

Behavioural psychologist Gavin Potter is an expert on idiosyncrasy. He helps computers understand what it means to be human. Potter works as a consultant to online dating companies, using massive data sets to help match prospective partners. Every time a user clicks on a potential date’s profile, all the relevant information leading to that click (hair colour, eye colour and so on) is logged, helping the software to make predictions about the types of people we might be attracted to in the future.

While Potter is an expert on what makes us unique, his success in teaching computers to predict our behaviour comes from spotting the things that make us alike. For example, female daters click on profiles of men who are on average three years older than them. If you think that’s clichéd, consider that over 90 per cent of men seek women who are half their age plus seven. “People think they are unique,” says Potter. “They say, ‘you can’t possibly profile me’. The depressing thing is that you can actually do it pretty accurately.”

The key to accurate, automated profiling lies in the size of the dataset. The bigger the dataset, the easier it is to iron out statistical anomalies. With ever more data, those clichés get easier and easier to find. It’s ironic that in our attempt to examine humanity through artificially intelligent means, all we ultimately reveal is our own robotic predictability.

Large-scale datasets are becoming increasingly available to the public. Take a look at behavioural economist Dan Ariely’s blog for an interesting take on password selection. Data from 100,000 Israeli user accounts, which were published following a cyber attack, show a depressing lack of what the author defines as necessary “randomness”.

The accounts, which were hacked from user registrations for Pizza Hut and an Israeli property website, revealed individuals’ usernames, email addresses and passwords. The most commonly chosen password was 123456 (584 users), with 1234 as the runner-up (569) and 12345 in third (388). All in all, 1,786 passwords (nearly 6 per cent) consisted of consecutive increasing numerals.

“This means that one person in 18 didn’t muster the cognitive capacity to generate a password more intricate than 1234 and the like.”

Even worse,

“788 people (roughly 2.5 per cent, or one in forty people) chose a password identical to their username. [And] 417 people (1.32%) chose a password comprised of identical digits (e.g. 1111).”

The password analysis was completed by hand, but advances in computer processing power, and the addition of psychological insight to statistical analysis, have greatly improved our ability to automate the discovery of behavioural patterns from large datasets. It's unfortunate that the most enlightening discovery to date is our penchant for banality.
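For the curious, here is a rough idea of how the same tally might be automated. This is only a sketch in Python, not the method used in the original analysis: the function names, the category labels and the sample accounts are all made up for illustration, and it assumes each password falls into a single category, with "same as username" checked first.

```python
from collections import Counter


def is_increasing_run(password: str) -> bool:
    """True for digit strings where each digit is one more than the last, e.g. 1234."""
    if len(password) < 2 or not (password.isascii() and password.isdigit()):
        return False
    return all(int(b) - int(a) == 1 for a, b in zip(password, password[1:]))


def is_repeated_digit(password: str) -> bool:
    """True for digit strings made of one digit repeated, e.g. 1111."""
    return (
        len(password) >= 2
        and password.isascii()
        and password.isdigit()
        and len(set(password)) == 1
    )


def classify(username: str, password: str) -> str:
    """Assign one illustrative label per account (labels are hypothetical)."""
    if password == username:
        return "same_as_username"
    if is_increasing_run(password):
        return "increasing_digits"
    if is_repeated_digit(password):
        return "identical_digits"
    return "other"


def tally(accounts):
    """Return the share of accounts in each category as a fraction of the total."""
    counts = Counter(classify(user, pw) for user, pw in accounts)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Invented sample data, not taken from the leaked accounts.
sample_accounts = [
    ("dana", "123456"),
    ("avi", "avi"),
    ("noa", "1111"),
    ("tal", "x9$Qp"),
]
shares = tally(sample_accounts)
```

Run over a real list of username and password pairs, `tally` would give the kind of proportions quoted above, though a serious analysis would also have to decide how to treat passwords that fit more than one pattern.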

Pic credit: Tleilaxus