Stage 2:
This stage builds upon Dashboard v0.1, created in Fall 2008. This work is now complete (Apr 16, 2009). Please see Dashboard v0.3.
The goal of this stage is to get auto-updating streams of data from the internet. For this project, two streams of data have been chosen:
- the NiCHE News RSS Feed
- the NiCHE Google Analytics XML Feed depicting Daily Traffic to a NiCHE Member Project
1) The RSS Feed has successfully been downloaded with a spider and parsed for the desired information using a Python program and urllib2. This data is then displayed on a Phidgets LCD screen, ticker-tape style.
2) Using the techniques explained in "No Google Analytics API? No Problem", I was able to send an XML file of data from Google Analytics to a Gmail Account. That file was then automatically forwarded to a Google Group using the "Filter" feature in Gmail. Google Groups then automatically compiled an RSS feed which contained enough information to scrape together the stable URL of the XML file. A spider was then able to download the XML file to a local txt file.
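Both steps above depend on pulling fields out of an RSS feed. The following is a minimal sketch of that idea, not the original program: it uses Python 3's standard library rather than urllib2, the sample feed is made up, and the 20-character display width is an assumption about the LCD. Writing to the Phidgets screen is left out, and so is the way the stable XML URL was assembled from the Google Groups feed, since neither is described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the real NiCHE feed contents are not shown here.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item><title>First headline</title><link>http://example.com/1</link></item>
    <item><title>Second headline</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return a list of (title, link) pairs, one per <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items


def ticker_frames(text, width=20):
    """Yield successive fixed-width windows that scroll text right to left.

    width=20 is an assumed display width, not a value taken from the project.
    """
    padded = " " * width + text + " " * width
    for start in range(len(padded) - width + 1):
        yield padded[start:start + width]
```

Each string yielded by `ticker_frames` is exactly `width` characters long, so showing the frames one after another with a short pause between them would produce the ticker-tape effect.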
The XML file chosen contains daily traffic data for one of the NiCHE member projects. Once downloaded, this data is parsed and combined into four sets of seven days each. This gives the viewer an idea of how many people visited each week without making the current week seem undervisited (Google Analytics counts weeks from Monday to Sunday, not as the past 7 days, so a partial current week would look low). This information is then randomly displayed on a Phidgets LCD screen.
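The grouping described above can be sketched as follows. This is an illustration only: the function name and the assumption that the daily counts arrive as a list ordered oldest to newest are mine, not taken from the original program.

```python
def trailing_week_totals(daily_visits, weeks=4, days_per_week=7):
    """Sum the most recent weeks * days_per_week daily counts into windows.

    daily_visits is assumed to be ordered oldest to newest. The result is
    ordered oldest window first, so the last entry covers the past seven days.
    """
    needed = weeks * days_per_week
    if len(daily_visits) < needed:
        raise ValueError("need at least %d daily values" % needed)
    recent = daily_visits[-needed:]  # drop anything older than the 28 days
    return [
        sum(recent[i:i + days_per_week])
        for i in range(0, needed, days_per_week)
    ]
```

Because the windows are counted back from the newest value, every total covers a full seven days regardless of which weekday the data is downloaded on.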
The downside of this method is that it relies heavily on three Google services that could be changed or removed at any time. The data must also be stored publicly (not password protected, though not listed anywhere clearly visible). Google Groups also would not allow Python's urllib2 to download the RSS feed, responding with "403 Forbidden." I was able to get around this by sending HTTP header information stating that my Python request was coming from a Firefox browser, though Google could make changes that break this at any time.
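The header workaround can be sketched as below. This uses Python 3's `urllib.request` (the successor to urllib2) rather than the original code, and both the feed URL and the browser string are placeholder values, since the ones actually used are not recorded here.

```python
import urllib.request

# Hypothetical values: the real feed URL and the exact browser string
# used in the project are not given in this write-up.
FEED_URL = "http://example.com/feed.xml"
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
)


def build_browser_request(url, user_agent=BROWSER_UA):
    """Build (but do not send) a request carrying a browser-style User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Download the URL and return the raw response bytes."""
    request = build_browser_request(url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

Without the extra header, the library identifies itself as a Python client, which is presumably what triggered the 403 response.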
Ideally, we should be spidering the data directly from Google Analytics, though I have not been able to figure out how to log into secure sites using Python's urllib2. Live HTTP Headers showed 16 separate HTTP requests just to log into Google Analytics (not including the extra steps required to download the XML file), so that might be a big task.
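For a site with a simple one-step login form, the usual starting point is a cookie-aware opener, sketched below with Python 3's standard library. This is not a working Google Analytics login: the form field names `username` and `password` are hypothetical, and a 16-request flow like the one observed would need considerably more than this.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies and resends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def build_login_request(login_url, username, password):
    """Build a POST request for a hypothetical 'username'/'password' form."""
    form = urllib.parse.urlencode({"username": username, "password": password})
    # Supplying data makes urllib send the request as a POST.
    return urllib.request.Request(login_url, data=form.encode("ascii"))
```

In use, the login request would be sent with `opener.open(...)`, and the same opener would then be used to fetch the protected page so that the session cookies are sent along with it.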
—
These programs can only run one at a time; they have not yet been integrated. See Dashboard v0.3 for the next step in development.
Links:
- http://www.nomadjourney.com/2009/03/automatic-site-login-using-python-urllib2/
- http://lethain.com/entry/2008/sep/11/extracting-data-from-google-analytics-reports/
- http://thinkingphp.org/code/datasources/google_analytics_source.phps
- http://www.ibm.com/developerworks/webservices/library/ws-pyth11.html#code1
- http://blogoscoped.com/archive/2008-01-17-n73.html (No Google Analytics API? No Problem)
- http://www.python.org/doc/2.6/howto/urllib2.html (Sending a Firefox request using Python)
Books:
- Hemenway & Calishain, Spidering Hacks (O'Reilly, 2004).
- Fry, Ben. Visualizing Data (O'Reilly, 2008), p. 284.

