API

Below is the documentation found in the sources. It is autogenerated by Sphinx from the docstrings in the source code.

A Reddit bot for various analyses.

karma_breakdown.clean_comment(comment, stopwords=None)

Tokenize the string comment and remove the words found in the stopwords list. In particular, remove punctuation, numbers, URLs, and common short French and English words.

For the list of those words, see french_stopwords in Model.py.

The goal is to get a list of words that can be used for sentiment analysis.

Parameters:
  • comment (str) – A string to be cleaned.
  • stopwords (list of str, optional) – A list of strings to remove from the comment.
Returns:

List of clean words.

Return type:

list of str
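The cleaning steps described above can be illustrated with a minimal sketch. This is not the project's implementation: the name clean_comment_sketch, the SAMPLE_STOPWORDS list, and the URL pattern are assumptions made for illustration, and the real stopword list is the one referenced above.

```python
import re
import string

# Hypothetical stand-in for the project's stopword list (assumption).
SAMPLE_STOPWORDS = ["le", "la", "les", "de", "et", "the", "a", "of", "and"]

# Simple URL matcher used only for this sketch (assumption).
URL_PATTERN = re.compile(r"https?://\S+")


def clean_comment_sketch(comment, stopwords=None):
    """Return lowercase tokens with URLs, digits, punctuation and stopwords removed."""
    if stopwords is None:
        stopwords = SAMPLE_STOPWORDS
    stop = set(stopwords)
    # Drop URLs first so their fragments do not survive as tokens.
    text = URL_PATTERN.sub(" ", comment.lower())
    # Replace ASCII punctuation and digits with spaces.
    text = re.sub("[" + re.escape(string.punctuation) + r"\d]", " ", text)
    # Split on whitespace and filter out the stopwords.
    return [word for word in text.split() if word not in stop]


words = clean_comment_sketch("Le chat a vu 3 souris: https://example.com/x !")
```

The sketch only handles ASCII punctuation; a real tokenizer would likely need to treat apostrophes and typographic quotes in French text more carefully.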

karma_breakdown.create_mask(filename)

Create a mask from an image. The idea comes from https://github.com/amueller/word_cloud/blob/master/examples/masked.py

Parameters: filename (str or None) – The name of the file to create a mask with. If None, returns None.
Returns: An array defining the shape of the image, or None if filename is None.
Return type: np.array or None

karma_breakdown.extract_comment(data)

Take a dict of Post objects and return all of their comments, cleaned and concatenated into a single string.

Parameters: data (dict) – The data to extract comments from. Keys are str and values are Post objects.
Returns: A text composed of all the clean comments gathered in data.
Return type: str

karma_breakdown.generate_wordcloud(text, background_color='white', mask=None, max_words=500, savefilename=None)

Create and save a wordcloud.

Parameters:
  • text (str) – A text to make a wordcloud from.
  • background_color (str, optional) – The name of a color known to matplotlib. Defines the background color of the wordcloud. Default is “white”.
  • mask (str or None, optional) – The name of the image file used to build a mask for the wordcloud when it should not be a rectangle, or None for no mask. Default is None.
  • max_words (int, optional) – Maximum number of words to consider when creating the wordcloud from the text. Default is 500.
Returns:

None

karma_breakdown.load_counter(filename='counter2016-12-04', dirname='save')

Unshelve a counter from filename and return it. Return an empty counter if filename does not exist.

Parameters:
  • filename (str, optional) – The name of the file to retrieve data from. Default is “counter” + today_str(), where today_str() returns today's date in ISO format, e.g. “2016-11-22”.
  • dirname (str, optional) – The name of the directory to retrieve data from. Default is “save”.
Returns:

The counter object retrieved from filename or an empty counter.

Return type:

Counter

karma_breakdown.load_data(filename='posts2016-12-04', dirname='save')

Unshelve a dict from filename. The dict contains data about Post objects retrieved from reddit.

Parameters:
  • filename (str, optional) – The name of the file to retrieve the data from. Default is “posts” + today_str(), where today_str() returns today's date in ISO format, e.g. “2016-11-22”.
  • dirname (str, optional) – The name of the directory to retrieve the data from. Default is “save”.
Returns:

Dictionary containing data about Post.

Return type:

dict

karma_breakdown.save_counter(data, filename='counter2016-12-04', dirname='save')

Shelve data into filename.

Parameters:
  • data (Counter) – A counter object to shelve.
  • filename (str, optional) – The name of the file to shelve data to. Default is “counter” + today_str(), where today_str() returns today's date in ISO format, e.g. “2016-11-22”.
  • dirname (str, optional) – The name of the directory to shelve data to. Default is “save”.
Returns:

Nothing.

Return type:

None
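As an illustration of how save_counter and load_counter fit together, here is a minimal round-trip sketch built on the standard shelve module. The function names, the "counter" key, and the error handling are assumptions for this example and may differ from the actual source.

```python
import dbm
import os
import shelve
from collections import Counter


def save_counter_sketch(data, filename, dirname="save"):
    """Shelve a Counter under dirname/filename, creating dirname if needed."""
    os.makedirs(dirname, exist_ok=True)
    with shelve.open(os.path.join(dirname, filename)) as db:
        db["counter"] = data  # "counter" is a hypothetical key name


def load_counter_sketch(filename, dirname="save"):
    """Return the shelved Counter, or an empty Counter if nothing was saved."""
    try:
        # flag="r" opens read-only and fails if the shelf does not exist.
        with shelve.open(os.path.join(dirname, filename), flag="r") as db:
            return db.get("counter", Counter())
    except dbm.error:
        return Counter()
```

Opening the shelf read-only when loading avoids creating an empty file as a side effect of a failed lookup.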

karma_breakdown.save_data(data, filename='posts2016-12-04', dirname='save')

Shelve data to filename

Parameters:
  • data (dict) – The dict to be shelved. Keys are the ids (str) of Post objects, values are the Post objects.
  • filename (str, optional) – The name of the file to shelve data to. Default is “posts” + today_str(), where today_str() returns today's date in ISO format, e.g. “2016-11-22”.
  • dirname (str, optional) – The name of the directory to shelve data to. Default is “save”.
Returns:

Nothing

Return type:

None

karma_breakdown.save_wordcloud(wordcloud, filename='wordcloud.jpg', dirname='wordcloud')

Create a plot with wordcloud and save it in dirname/filename. Also creates dirname if not found.

Parameters:
  • wordcloud (WordCloud) – A WordCloud object. See module wordcloud.
  • filename (str, optional) – The name to save the file to. Default is “wordcloud.jpg”.
  • dirname (str, optional) – The name of the dir to save the file to. Default is “wordcloud”.
Returns:

As a side effect, creates a file and, if dirname did not exist, creates dirname.

Return type:

None

karma_breakdown.search_reddit(subreddit='france', limit=100, category=1)

Connect to reddit and gather posts from a subreddit.

Note

This can take a long time (a few minutes with limit=100), so cache the result with save_data.

Parameters:
  • subreddit (str, optional) – A string that designates an existing subreddit. Default is “france”.
  • limit (int, optional) – Number of reddit posts to retrieve. Default is 100.
  • category (int, optional) – Either HOT or TOP. Those constants are defined in model.py. They determine whether posts are gathered from the TOP (best posts ever) or HOT (newest posts) category. Default is HOT.
Returns:

A dict of the posts gathered, with the id of each post as key and the post itself as value.

Return type:

dict
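The note above recommends caching the result of a slow search. One way to do that is sketched below with the standard shelve module. The helper cached_fetch, its fetch argument (standing in for a call such as search_reddit), and the "posts" key are hypothetical names introduced for illustration only.

```python
import os
import shelve


def cached_fetch(fetch, filename, dirname="save"):
    """Return shelved posts if present; otherwise call fetch() and shelve the result."""
    os.makedirs(dirname, exist_ok=True)
    with shelve.open(os.path.join(dirname, filename)) as db:
        if "posts" not in db:  # "posts" is a hypothetical key name
            # Only pay the cost of the slow call when nothing is cached yet.
            db["posts"] = fetch()
        return db["posts"]
```

With this pattern the slow function runs at most once per filename, so using a dated filename would refresh the cache daily.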

karma_breakdown.show_wordcloud(wordcloud)

Create a plot with wordcloud and display it.

Parameters: wordcloud (WordCloud) – A WordCloud object. See module wordcloud.
Returns: Side effect is showing a plot on screen.
Return type: None

karma_breakdown.today_str()

Return today's date in ISO format, e.g. ‘2016-11-26’.
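A function with this behaviour can be written with the standard datetime module. The sketch below is an assumption about how it might look, not a copy of the source; the name today_str_sketch is hypothetical.

```python
import datetime


def today_str_sketch():
    """Return today's date as an ISO 8601 string (YYYY-MM-DD)."""
    return datetime.date.today().isoformat()


# Example: building a dated default filename like the ones shown in the signatures.
default_counter_name = "counter" + today_str_sketch()
```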

karma_breakdown.words_count_update(comment)

Take a clean comment and count its words, adding them to the previous count. Also update the shelve/pickle with the new count.

Parameters: comment (list) – A clean list of words. See the function clean_comment.
Returns: A counter object, retrieved with the function load_counter and updated with comment.
Return type: Counter

See also

clean_comment, load_counter, save_counter
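The counting step of words_count_update can be sketched with collections.Counter. This example keeps the counter in memory and leaves out the shelving step described above; the name words_count_update_sketch and the optional counter argument are assumptions for illustration.

```python
from collections import Counter


def words_count_update_sketch(comment, counter=None):
    """Add the words of a clean comment to an existing Counter and return it."""
    counter = Counter() if counter is None else counter
    # Counter.update adds counts instead of replacing them.
    counter.update(comment)
    return counter


counts = words_count_update_sketch(["chat", "souris", "chat"])
counts = words_count_update_sketch(["chat"], counts)
```

After the two calls, counts holds the accumulated totals for both comments.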