Investment Fund Analytics: Using Daily World News for Stock Market Prediction

This is a summary presentation about the final group project I worked on during this winter for the Data Mining course in the Masters of Data Science and Analytics program at Ryerson University.


In this project we use daily world news (more specifically, the /r/worldnews subreddit) to try to predict the daily trend (up or down) of the Dow Jones Industrial Average. The idea is not originally mine: it was first posted as part of a Kaggle dataset that has many kernel submissions. Our project changed a few things:

  • Reprocess the data from the source: Extract the /r/worldnews posts directly from the complete Reddit dataset, and derive the up/down labels from DJIA data from wsj.com
  • Change the analytics tool: Use KNIME instead of R, Python or the like
  • Spend more time on EDA: And it still wasn’t enough; with more time we might have reached the same conclusion much earlier

Using the complete Reddit dataset (posts, comments, everything!) to reprocess the data and arrive at the same data as the Kaggle dataset was a very interesting exercise: I used Azure HDInsight to quickly create a cluster and Hive to process and filter the JSON files, extracting just the subreddit content. The DJIA data is much smaller and simpler to manage. The two were then joined to obtain a dataset similar to the one from Kaggle.
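To give an idea of the kind of processing involved, here is a minimal Python sketch of the same three steps: filter one subreddit out of newline-delimited JSON, label each trading day as up or down, and join the two by date. This is not the Hive or KNIME work we actually did. The field names `subreddit`, `created_utc` and `title`, the function names, and the rule "1 when the close rose or stayed the same, 0 otherwise" are all assumptions made for illustration.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def filter_subreddit(json_lines, subreddit="worldnews"):
    """Group post titles by UTC date for a single subreddit.

    Assumes one JSON object per line with 'subreddit', 'created_utc'
    (epoch seconds) and 'title' fields; these names are an assumption.
    """
    headlines = defaultdict(list)
    for line in json_lines:
        post = json.loads(line)
        if post.get("subreddit") != subreddit:
            continue
        created = datetime.fromtimestamp(int(post["created_utc"]), tz=timezone.utc)
        headlines[created.date().isoformat()].append(post["title"])
    return headlines


def label_days(prices):
    """Label each day 1 (close rose or stayed the same) or 0 (close fell).

    prices is a list of (iso_date, close) pairs. The first day has no
    previous close to compare against, so it gets no label.
    """
    ordered = sorted(prices)
    labels = {}
    for (_, prev_close), (day, close) in zip(ordered, ordered[1:]):
        labels[day] = 1 if close >= prev_close else 0
    return labels


def join_news_and_labels(headlines, labels):
    """Inner join on date: keep only days that have both headlines and a label."""
    return [(day, labels[day], headlines[day])
            for day in sorted(labels) if day in headlines]
```

With the real data, `json_lines` would be the lines of the Reddit dump files and `prices` the daily DJIA closes; the joined rows would then be the input for modeling.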

In a future post, I will publish our project report paper, with our detailed procedure and reports.

#FluxFlow: A visual tool to analyze “anomalous information” in social media

Thanks to the Social Media Analytics course I’m taking as part of my Masters in Data Science program, I found a very interesting paper about #FluxFlow that I had to summarize and present.

#FluxFlow is a data visualization and analytics tool that helps identify and understand how ‘anomalous’ information spreads in social media. In the context of social media, “anomalous information” can in most cases be equated with rumors and ‘fake news’. Having a tool like this to understand how these patterns work can help in identifying potentially harmful consequences and taking action on them.

The original paper used for this research (written by Jian Zhao, Nan Cao, Zhen Wen, Yale Song, Yu-Ru Lin, Christopher Collins) is available here for you to read, along with a very concise and descriptive video here, and the real #FluxFlow tool is here for you to see and explore. I created a very simple, brief presentation to summarize the tool and its potential applications to other scenarios.


Social Media Analytics: Bell Let’s Talk 2017

Two weeks ago I started the second semester of the Masters in Data Science program, and as part of it I am taking a course in Social Media Analytics. The first lab assignment for this course was on January 25, and its objective was to analyze the Bell Let’s Talk social media campaign. The proposed tool was Netlytic, a community-supported text and social networks analyzer, created by the course’s professor Dr. Anatoliy Gruzd, that automatically summarizes and discovers social networks from online conversations on social media sites. Using it, I downloaded a tiny slice of #BellLetsTalk hashtagged data and created this super simple Power BI dashboard.

I have been wanting to play with Power BI’s Publish to Web functionality for quite some time and thought this was a great chance to put it to good use. The data was exported from Netlytic as three CSV files and then imported into Power BI Desktop. With the desktop tool I created a couple of simple measures (total number of tweets and posts, average number of tweets and posts per minute, and so on) and then some simple visualizations.
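For readers without Power BI, the two measures mentioned above can be approximated in a few lines of Python. This is only a sketch: the column name `pubdate`, the timestamp format, and the definition of "per minute" (total posts divided by the minutes elapsed between the first and last post) are my assumptions here, not the Netlytic export schema or the exact measures in the dashboard.

```python
import csv
import io
from datetime import datetime


def posting_rate(csv_text, time_column="pubdate", time_format="%Y-%m-%d %H:%M:%S"):
    """Return (total_posts, average_posts_per_minute) for a CSV export.

    time_column and time_format are hypothetical; adjust them to the
    actual export. The average divides the total by the minutes elapsed
    between the earliest and latest timestamps.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    times = [datetime.strptime(row[time_column], time_format) for row in rows]
    total = len(times)
    if total == 0:
        return 0, 0.0
    span_minutes = (max(times) - min(times)).total_seconds() / 60
    # If every post shares one timestamp there is no span to divide by,
    # so report the total as the rate for that single instant.
    per_minute = total / span_minutes if span_minutes > 0 else float(total)
    return total, per_minute
```

Calling `posting_rate(open("export.csv", encoding="utf-8").read())` on one of the exported files (the file name is a placeholder) would give the same kind of numbers the dashboard cards show.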


Working on Blog52

I am working on Blog52 so that I can publish a final version. The hardest part (really, the part I feel laziest about) is building a manager in ASP for the whole application. Of course, when you build your own applications you don't use managers; you touch the values and the code just as they come from the factory. We stick our hands into the heart of the bytes and squeeze the values until the application does what we please.
I don't know where that comment came from, but oh well.

Forum applications

I am about to put several sites online that need a forum application (something similar to a weblog), and I have been looking into which freeware ASP applications are floating around out there.
The best known, I think, is ASP Forums. It's incredible: the whole application weighs only 87 KB (in a zip file). It didn't quite convince me, because you have to dig deep into how it works before you can implement it. That said, since it is written in JavaScript with OOP, it lets you override the default methods of its own objects and customize it completely.
Then I found Toast Forums, which looks quite nice and skinnable.
I know the question will sound obvious, but why can't there be more open source software in ASP, like there is in PHP?

The satisfaction of a duty fulfilled

As I was saying earlier, after spending hours researching and reviewing, looking for a good weblog application in ASP that uses a database (I found blogworks, but it stores its data in text files on disk), I decided to write my own. It already has a name, Blog52, and a site too: http://www.blog52.com.ar. The site may not be functional for the first few days, at least until NIC finishes processing the domain registration and delegation. In any case, I already have some pages published there (plain text only, with no styling at all) to give the project a good start. I hope people get into the project and at least use it.