Thursday, 11 August 2011

The rise of data science

Posted on 12:12 by Unknown
See also this follow up article from O'Reilly Radar, and the earlier post Exuberant geeks.



What is data science: ... Data science requires skills ranging from traditional computer science to mathematics to art. Describing the data science group he put together at Facebook (possibly the first data science group at a consumer-oriented web property), Jeff Hammerbacher said:



"... on any given day, a team member could author a multistage processing pipeline in Python, design a hypothesis test, perform a regression analysis over data samples with R, design and implement an algorithm for some data-intensive product or service in Hadoop, or communicate the results of our analyses to other members of the organization."



Where do you find the people this versatile? According to DJ Patil, chief scientist at LinkedIn (@dpatil), the best data scientists tend to be "hard scientists," particularly physicists, rather than computer science majors. Physicists have a strong mathematical background, computing skills, and come from a discipline in which survival depends on getting the most from the data. They have to think about the big picture, the big problem. When you've just spent a lot of grant money generating data, you can't just throw the data out if it isn't as clean as you'd like. You have to make it tell its story. You need some creativity for when the story the data is telling isn't what you think it's telling.



... Entrepreneurship is another piece of the puzzle. Patil's first flippant answer to "what kind of person are you looking for when you hire a data scientist?" was "someone you would start a company with." That's an important insight: we're entering the era of products that are built on data. We don't yet know what those products are, but we do know that the winners will be the people, and the companies, that find those products. Hilary Mason came to the same conclusion. Her job as scientist at bit.ly is really to investigate the data that bit.ly is generating, and find out how to build interesting products from it. No one in the nascent data industry is trying to build the 2012 Nissan Stanza or Office 2015; they're all trying to find new products. In addition to being physicists, mathematicians, programmers, and artists, they're entrepreneurs.



Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution. They are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions. They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: "here's a lot of data, what can you make from it?"



The future belongs to the companies who figure out how to collect and use data successfully. Google, Amazon, Facebook, and LinkedIn have all tapped into their datastreams and made that the core of their success. They were the vanguard, but newer companies like bit.ly are following their path. Whether it's mining your personal biology, building maps from the shared experience of millions of travellers, or studying the URLs that people pass to others, the next generation of successful businesses will be built around data.



Here is a nice talk on machine learning and data science by Hilary Mason of bit.ly. One of my students will be working with her starting in the fall.
Posted in careers, data mining, physics, statistics