Did anyone really need a report telling them that conducting research using self-reported data from social media and online communities might make the results unreliable and more than a little biased?
In a just-published article in PLOS Medicine, “Research Conducted Using Data Obtained through Online Communities: Ethical Implications of Methodological Limitations,” epidemiologists from the Schools of Public Health at Harvard and Rollins note that an increasing number of government and commercial interests are using the internet to conduct scientific research. They cite as examples numerous online research initiatives that claim to enable anyone to create group health studies.
Information from the internet and social networks doesn’t make for a reliable database. It’s a clear example of “garbage in, garbage out.” The authors caution that using self-reported data from self-selected online participants produces every possible research bias, from selection bias to information bias. And any risk-factor correlation observed would have innumerable unmeasured confounding factors.
Asking the no-brainer, they write: “Can online data collection lead to major breakthroughs in health research?”
Social media is marketing. It’s also a virtual reality, with much of it not even real.
- Marketers and public relations companies are using social media as never before, and are expected to spend nearly one-fifth of their budgets on social media marketing over the next five years, according to the CMO Survey from Duke University.
- The average midsize to large company (with 1,000 or more employees) has 178 social media outlets (Twitter handles, blogs, etc.).
- 75% of those “thumbs-up” LIKES come from paid advertisers.
- Social media stats show that 43% of companies use blogs for marketing, paying ghost writers to write blog entries to advance clients’ objectives, then recruiting activists and consumers to share and reinforce key messages. It’s been a thriving business for years. Many blogs are fake, created by companies or organizations to market a product, service or political viewpoint.
- Spammers create up to 40% of the accounts on social media sites, according to Impermium.
- Facebook admitted in its IPO filing last December that between 42.25 and 50.70 million Facebook accounts were fake.
- Yet increasing numbers of people are unaware participants and easy targets for manipulation by marketers. The largest and fastest-growing segment of users of social network sites is 18- to 29-year-olds. College-age kids and Chicagoans make up the fastest-growing segments on Facebook.
Many web pages and social sites appear to the public to be legitimate and from credible sources; many times they’re linked to consumers’ healthcare providers, employers or insurance companies. Increasingly, marketing interests are encouraging consumers to fill out online health risk assessments and share personal health information, under the guise of providing disease information and wellness advice, conducting research, or helping to organize and safely store their medical information. Most consumers don’t realize that the personal information they volunteer on these sites is not protected under HIPAA (Health Insurance Portability and Accountability Act) privacy regulations, and that their information may be, and is, sold and used without their knowledge.
These sites were especially called out by the PLOS authors. In fact, they gave a fairly cursory mention of the problems with research bias stemming from the use of online data, devoting more of their discussion to ethical concerns with online health sites:
We worry that overstating the conclusions that can be drawn from these resources may impinge on individual autonomy and informed consent…
Clarity regarding the benefits of research using solicited personal data is particularly important when the data collected are also used for other purposes (e.g., PatientsLikeMe may sell members’ information to pharmaceutical and insurance companies), lest the allure of participation in a scientific study be used as a Trojan horse to entice individuals to part with information they might not otherwise volunteer….
Initiators of online data collections are strong advocates of openness and transparency, but they are relatively reserved about the methodological limitations of their research in communications to their participants…but this concern is not necessarily reflected on their websites, where they encourage people to provide information and where they list and describe their scientific discoveries…
Despite the limitations, the PLOS authors still appear to believe it is possible to use online media for epidemiological research, as long as the researchers address communication issues: the limitations of the results and conclusions are disclosed to participants, participants are informed that the results need to be replicated, and participants are advised that their personal data may be shared with third parties for commercial uses. The potential uses of online databases are too enticing for epidemiologists to ignore. They conclude by answering their opening question: “[A] responsible approach with realistic expectations about what can be done with and concluded from the data will benefit science in the long run.”
This succinctly illustrates the problems with much of today’s epidemiology.