
A Momentary Flow

Evolving Worldviews

Tag Results

25 posts tagged data


What Does Big Data Look Like? Visualization Is Key for Humans

A simple Google image search on “big data” reveals numerous instances of three-dimensional ones and zeros, a few explanatory infographics, and even the interface from The Matrix. So what does “big data” look like, within human comprehension? Ask a CEO of a major company what “big data” is, and they’ll likely describe something akin to a black box, the flight recorders on airplanes, or draw a cloud on a whiteboard. Ask a data scientist and you might get an explanation of the 4 V’s, itself an attempt at an infographic (but really just a visual collection of facts) and a corresponding explanation. The reason for this is that “big data” is a nebulous term with different meanings, representations, and uses for different organizations. Understandably, it’s hard to fathom where to start when there’s so darn much of it.

From the beginning of recorded time until 2003, humans had created 5 exabytes (5 billion gigabytes) of data. In 2011, the same amount was created every two days. It’s true that we’ve made leaps and bounds with showing earlier generations of data. However, when it comes to today’s big data, how it looks can help convey information, but it needs to be more than just beautiful and superficial. It has to work, show multiple dimensions, and be useful.

New software and technologies have enabled us to gain higher-level access to understanding these enormous sets of data. However, the only way we’re going to truly gather and juice all the information big data is worth is to apply a level of relatively unprecedented data visualization. How do we get to actionable analysis, deeper insight, and visually comprehensive representations of the information? The answer: we need to make data more human. (via What Does Big Data Look Like? Visualization Is Key for Humans | Innovation Insights | Wired.com)

Source Wired
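A rough scale check on the growth figures quoted in the excerpt, as a minimal sketch. The 5-exabyte and two-day numbers are taken as given from the article; the arithmetic and the unit conversion are the only things added here.

```python
# Rough scale check for the data-growth claim quoted above.
# Figures come from the excerpt; the arithmetic is illustrative only.

EXABYTE = 10**18  # bytes

historical_total = 5 * EXABYTE          # data created up to 2003 (per the excerpt)
window_seconds = 2 * 24 * 60 * 60       # "every two days" in 2011

rate_2011 = historical_total / window_seconds  # implied bytes per second in 2011

print(f"Implied 2011 creation rate: {rate_2011 / 10**12:.0f} TB per second")
# ~29 TB/s -- which is why visualization, not raw inspection, is the only
# realistic way for humans to engage with data at this scale.
```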


thenewenlightenmentage:

Biology’s Big Problem: There’s Too Much Data to Handle

Twenty years ago, sequencing the human genome was one of the most ambitious science projects ever attempted. Today, compared to the collection of genomes of the microorganisms living in our bodies, the ocean, the soil and elsewhere, each human genome, which easily fits on a DVD, is comparatively simple. Its 3 billion DNA base pairs and about 20,000 genes seem paltry next to the roughly 100 billion bases and millions of genes that make up the microbes found in the human body.

And a host of other variables accompanies that microbial DNA, including the age and health status of the microbial host, when and where the sample was collected, and how it was collected and processed. Take the mouth, populated by hundreds of species of microbes, with as many as tens of thousands of organisms living on each tooth. Beyond the challenges of analyzing all of these, scientists need to figure out how to reliably and reproducibly characterize the environment where they collect the data.

Continue Reading

(via megacosms)

Source Wired

Reblogged from The New Enlightenment Age
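To get a feel for the scale comparison in the excerpt (one human genome fitting on a DVD versus the microbiome), here is a minimal sketch. The 2-bits-per-base encoding is my own assumption, not something from the article; real sequencing formats carry quality scores and metadata and are much larger.

```python
# Back-of-the-envelope storage estimate for the genome figures quoted above.
# Assumes a naive 2-bits-per-base encoding (A/C/G/T); formats like FASTQ or
# BAM are considerably bigger in practice.

def bases_to_gigabytes(n_bases: int, bits_per_base: int = 2) -> float:
    """Convert a base count to gigabytes under a fixed-width encoding."""
    return n_bases * bits_per_base / 8 / 10**9

human_genome = bases_to_gigabytes(3_000_000_000)      # ~0.75 GB -- fits on a DVD
microbiome   = bases_to_gigabytes(100_000_000_000)    # ~25 GB -- already several DVDs

print(f"Human genome: {human_genome:.2f} GB")
print(f"Microbiome:   {microbiome:.2f} GB")
# And this is before the host metadata (age, health, sampling site, protocol)
# that the article says must accompany every microbial sample.
```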


Five Dimensions Store More Data Than Three

An experimental computer memory format uses five dimensions to store data with a density that would allow more than 300 terabytes to be crammed onto a standard optical disc. But unlike an optical disc, which is made of plastic, the experimental medium is quartz glass. Researchers have long been trying to use glass as a storage material because it is far more durable than existing plastics. A team led by optoelectronics researcher Jingyu Zhang at the University of Southampton, in the U.K., has demonstrated that information can be stored in glass by changing its birefringence, a property related to how polarized light moves through the glass.

In conventional optical media, such as DVDs, you store data by burning tiny pits into one or more layers of the plastic disc, which means you’re using three spatial dimensions to store information. But in Zhang’s experiment, he and colleagues exploit two additional, optical dimensions. (via Five Dimensions Store More Data Than Three - IEEE Spectrum)
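A crude way to picture the “five dimensions” is a voxel addressed by three spatial coordinates plus two optical properties of the laser-written structure. The field names and the number of levels per optical dimension below are illustrative assumptions for the sketch, not values from the Southampton work.

```python
# Illustrative model of a "5D" voxel: three spatial coordinates plus two
# optical degrees of freedom (birefringence orientation and strength).
# Field names and level counts are assumptions, not published figures.
from dataclasses import dataclass
import math

@dataclass
class Voxel5D:
    x: int            # spatial dimension 1
    y: int            # spatial dimension 2
    layer: int        # spatial dimension 3 (depth layer in the glass)
    orientation: int  # optical dimension 1: slow-axis orientation level
    retardance: int   # optical dimension 2: birefringence strength level

spot = Voxel5D(x=12, y=7, layer=2, orientation=3, retardance=5)

# If each optical dimension could take, say, 8 distinguishable levels,
# a single spot would store log2(8 * 8) = 6 bits instead of the 1 bit
# of a conventional pit/no-pit optical disc.
levels_per_dimension = 8
bits_per_spot = math.log2(levels_per_dimension ** 2)
print(f"Bits per written spot: {bits_per_spot:.0f}")
```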


A molecular database for developing organic solar cells

Harvard researchers have released a massive database of more than 2 million molecules that might be useful in the construction of organic solar cells (solar cells that rely on organic compounds) for the production of renewable energy.

Developed as part of the Materials Genome Initiative launched by the White House’s Office of Science and Technology Policy (OSTP), the database is meant to provide researchers with a starting point for research aimed at increasing the efficiency of this cheap, easy-to-produce solar energy technology.

“One of the problems with organic solar cells is, right now, there are only a handful of molecules that are in the same league with silicon in terms of efficiency,” Harvard Professor of Chemistry and Chemical Biology Alán Aspuru-Guzik said. “This is really a guide for experimentalists. What we’re doing is democratizing access to this type of data in the same way that the biologists did with the Human Genome Project.”

“In many ways, biology is far ahead of chemistry in these efforts,” he added. “You can find the genome of a frog online, or the genome of a worm, but you cannot do that for the quantum properties of molecular materials. This database will provide access to the ‘secret sauce’ of these materials, so people can explore innovative new ideas.”

The data was generated by the Harvard Clean Energy Project in partnership with IBM and the group of Prof. Zhenan Bao at Stanford University. It uses supercomputing power provided by a network of thousands of volunteer donors around the world. (via A molecular database for developing organic solar cells | KurzweilAI)
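The article does not describe the database’s schema, but the intended workflow (screening millions of candidate molecules for promising photovoltaic properties) might look roughly like the sketch below. The column names, the toy molecules, and the 10% efficiency threshold are purely hypothetical assumptions for illustration.

```python
# Hypothetical screening pass over a molecular-candidates table, in the spirit
# of the Clean Energy Project database described above. Column names, values,
# and the efficiency cutoff are assumptions for illustration only.
import pandas as pd

candidates = pd.DataFrame({
    "smiles": ["c1ccccc1", "c1ccc2ccccc2c1", "c1ccsc1"],   # toy molecules
    "homo_lumo_gap_ev": [6.7, 4.4, 5.6],                   # computed gap (made-up values)
    "predicted_pce_pct": [2.1, 10.8, 6.3],                 # predicted power-conversion efficiency
})

# Keep only molecules whose predicted efficiency clears a (hypothetical) bar,
# then rank them so experimentalists can start from the most promising ones.
shortlist = (
    candidates[candidates["predicted_pce_pct"] >= 10.0]
    .sort_values("predicted_pce_pct", ascending=False)
)
print(shortlist)
```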


When the Large Hadron Collider went online in 2009, most scientists saw it as an unprecedented opportunity to conduct experiments involving the building blocks of the physical world. But to Stanislav Shalunov, a networking engineer, it looked like a whole new kind of Big Data problem.

A few years before the LHC went live, Shalunov worked on Internet 2, an experimental network that connects universities and research organizations. Given the amount of data the Collider would be spitting out — about 10 gigabits per second, to 70 academic institutions — he knew the LHC was likely to clog up the Internet 2 network. So Shalunov developed a networking protocol designed to relieve the congestion that was sure to come. “This was an amount of traffic that neither the networks nor the transport protocols at the time were really prepared to cope with,” Shalunov remembers.

He didn’t realize it at the time, but by solving the Large Hadron Collider data-pumping problem, Shalunov was also helping fix a big problem for peer-to-peer networks. By the time scientists at CERN flipped the switch on the LHC, Shalunov was working for BitTorrent on its popular peer-to-peer file-sharing service. The work he started at Internet 2 and finished at BitTorrent was eventually rolled into an internet standard called Low Extra Delay Background Transport (LEDBAT). (via How the Large Hadron Collider Will Bring the Internet to Everything | Wired Enterprise | Wired.com)
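The core idea behind LEDBAT is that the sender watches one-way queuing delay and backs off before it fills the bottleneck queue, so bulk transfers yield to interactive traffic. Below is a minimal sketch of that window-update rule; the constants and the shape of the update are condensed from the general idea in the published spec, so treat it as an illustration rather than a conformant implementation.

```python
# Simplified LEDBAT-style congestion-window update (delay-based backoff).
# Constants and the update rule are condensed from the general idea of
# RFC 6817; this is an illustration, not a faithful implementation.

TARGET_DELAY = 0.100   # seconds of allowed queuing delay
GAIN = 1.0
MSS = 1452             # bytes per segment (assumed)

def update_cwnd(cwnd: float, queuing_delay: float, bytes_acked: int) -> float:
    """Grow the window while measured queuing delay is below target,
    shrink it as delay approaches or exceeds the target."""
    off_target = (TARGET_DELAY - queuing_delay) / TARGET_DELAY
    cwnd += GAIN * off_target * bytes_acked * MSS / cwnd
    return max(cwnd, MSS)  # never drop below one segment

# Example: with 20 ms of queuing delay the window grows; at 150 ms it shrinks.
cwnd = 10 * MSS
print(update_cwnd(cwnd, 0.020, MSS))   # below target -> larger window
print(update_cwnd(cwnd, 0.150, MSS))   # above target -> smaller window
```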


The Modern Data Nerd Isn’t as Nerdy as You Think


Data scientists are fast becoming the rock stars of the 21st century. Thanks in part to Nate Silver’s eerily accurate election predictions and Paul DePodesta’s baseball-revolutionizing Moneyball techniques, math nerds have become celebrities. It’s debatable how much their work differs from what statisticians have done for years, but it’s a growing field, and many companies are desperate to hire their own data scientists.

The irony is that many of these math nerds aren’t as math nerdy as you might expect.

Some of the best minds in the field lack the sort of heavy math or science training you might expect. Silver and Paul DePodesta have bachelor’s degrees in economics, but neither has a PhD. Former Facebook data scientist and Cloudera co-founder Jeff Hammerbacher — who helped define the field as it’s practiced today — has only a bachelor’s in mathematics. The top-ranked competitor at Kaggle — which runs regular contests for data scientists — doesn’t have a PhD, and many of the site’s other elite competitors don’t either.

“In fact, I argue that often Ph.D.s in computer science or statistics spend too much time thinking about what algorithm to apply and not enough thinking about common sense issues like which set of variables (or features) are most likely to be important,” says Kaggle CEO Anthony Goldbloom.

Data scientist John Candido agrees. “An understanding of math is important,” he says, “but equally important is understanding the research. Understanding why you are using a particular type of math is more important than understanding the math itself.” (via The Modern Data Nerd Isn’t as Nerdy as You Think | Wired Enterprise | Wired.com)

Source Wired
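Goldbloom’s point about features mattering more than algorithm choice can be made concrete with a very small experiment. The data below is synthetic and the absolute-correlation screen is a deliberately crude stand-in for “common sense about which variables matter”.

```python
# Tiny illustration of the point above: a crude ranking of which variables
# look informative, done before any modeling. Synthetic data; absolute
# correlation is a deliberately simple "importance" proxy.
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Three candidate features: only the first actually drives the outcome.
signal = rng.normal(size=n)
noise_a = rng.normal(size=n)
noise_b = rng.normal(size=n)
outcome = 2.0 * signal + rng.normal(scale=0.5, size=n)

for name, feature in [("signal", signal), ("noise_a", noise_a), ("noise_b", noise_b)]:
    corr = abs(np.corrcoef(feature, outcome)[0, 1])
    print(f"{name:8s} |corr with outcome| = {corr:.2f}")
# The common-sense screen already tells you where to spend modeling effort,
# regardless of which algorithm you eventually apply.
```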

Data struggles with the social. Your brain is pretty bad at math (quick, what’s the square root of 437), but it’s excellent at social cognition. People are really good at mirroring each other’s emotional states, at detecting uncooperative behavior and at assigning value to things through emotion.

Computer-driven data analysis, on the other hand, excels at measuring the quantity of social interactions but not the quality. Network scientists can map your interactions with the six co-workers you see during 76 percent of your days, but they can’t capture your devotion to the childhood friends you see twice a year, let alone Dante’s love for Beatrice, whom he met twice.

Therefore, when making decisions about social relationships, it’s foolish to swap the amazing machine in your skull for the crude machine on your desk.

What Data Can’t Do - NYTimes.com

Source The New York Times


Researchers in Japan have come up with a storage solution to keep your most important data with a method that seems to be drawn directly from the pages of Superman.

Everyone who has gone through the process of upgrading their computer system knows that the inevitable task of transferring data involves a certain amount of acceptance that some data will be lost forever.

Data gets lost when it is saved on storage devices that no longer have drives to read them, or when the storage substrate itself deteriorates.

Even Ray Kurzweil mentions in The Singularity Is Near how he resorts to paper printouts to save his most important data for the long term.

Now, Japanese storage and electronics company Hitachi has announced that it has come up with a solution that stores data on slivers of quartz glass, keeping important data safe and sound for perhaps as long as hundreds of millions of years. The company’s main research lab has developed a way to etch digital patterns into robust quartz glass with a laser at a data density that is better than compact discs, then read it using an optical microscope. The data is etched at four different layers in the glass using different focal points of the laser. (via 33rd Square | Superman’s Indestructible Data Crystals May Be Possible)
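The article gives the layer count (four) but not the dot pitch, so the sketch below only shows how areal density scales with both; the dot spacing and the chip size used here are placeholder assumptions, not Hitachi figures.

```python
# Rough density model for laser-etched dots in quartz glass.
# The 4-layer figure comes from the article above; the dot pitch and chip
# size are placeholder assumptions, not numbers Hitachi has published.

DOT_PITCH_UM = 2.0     # assumed center-to-center dot spacing, micrometres
LAYERS = 4             # focal depths used for etching, per the article

dots_per_mm2 = (1000 / DOT_PITCH_UM) ** 2        # dots per square millimetre, one layer
bits_per_mm2 = dots_per_mm2 * LAYERS             # one bit per dot, four layers

# A 20 mm x 20 mm sliver of glass, as an example chip size.
chip_bits = bits_per_mm2 * 20 * 20
print(f"~{chip_bits / 8 / 10**6:.0f} MB on a 20 mm square sliver")
# The point of the medium is durability rather than raw density: the same
# sliver is expected to remain readable for hundreds of millions of years.
```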