Drew Conway gave this list: http://www.quora.com/Programming-Challenges-1/What-are-some-good-toy-problems-in-data-science
- World Bank - http://data.worldbank.org/
  There are literally too many data sets here to count, but given the mission of the WB most of them are focused on growth and development. A small project that did some basic time-series or correlation analysis of this data could be interesting. Comparing post-earthquake metrics for the relief efforts in Haiti vs. Pakistan might be a cool place to start.
- U.S. Census - http://www.census.gov/main/www/a...
  Infochimps also has a great set of APIs focused on census data (http://api.infochimps.com/), and if you are an R hacker you could use my wrapper to access it (http://cran.r-project.org/web/pa...). Census data is great for doing spatial analysis, e.g., compare the average level of education to mean household income for all US ZIP codes and stick it on a map.
- ICPSR - http://www.icpsr.umich.edu/icpsr...
  The Inter-university Consortium for Political and Social Research is a treasure trove of socially relevant data, and includes current and past waves of the American National Election Study. This would be a good place to consider doing a mash-up, perhaps voting patterns in a given census tract, controlling for income and education.
- Yelp - http://www.yelp.com/developers/d...
  People love to eat and be entertained, and the Yelp API has a decent set of tools for extracting these preferences. Recently, I tried to play around with this API as part of a project involving health code violation data in NYC (http://www.nyc.gov/html/datamine...) and found it a bit unruly to work with. But if you had a smaller project in mind, it certainly fits your description.
- Local data
  Speaking of the NYC Data Mine, some of the most useful toy data apps I have seen involve local open data. Check to see if your city, or one nearby, maintains an open data repository and start hacking. Hint: people love to know where buses and taxis are.
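
As a starting point for the correlation idea in the census item (average education vs. mean household income by ZIP code), here is a minimal sketch in plain Python. The `pearson` helper, the row layout, and every number in `rows` are made up for illustration; they are not real census figures and this is not how any of the APIs above return data. You would swap in whatever you actually download.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder rows: (area label, avg. years of education,
# mean household income). These values are invented, NOT census data.
rows = [
    ("area_a", 11.5, 38000),
    ("area_b", 12.8, 45500),
    ("area_c", 14.1, 61000),
    ("area_d", 15.9, 88000),
    ("area_e", 13.2, 52000),
]

education = [r[1] for r in rows]
income = [r[2] for r in rows]
r_value = pearson(education, income)
print(f"education vs. income correlation: {r_value:.3f}")
```

If you are on Python 3.10 or later, `statistics.correlation` in the standard library computes the same coefficient; the hand-rolled version is here only to show what is going on. Putting the result on a map is a separate step that needs ZIP code boundary data.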