Reading modes

1. Quick skim to glean the basic hypothesis of the article/text
2. Careful read to check the logical consistency
3. Extra careful read to see if it’s well-written (usually a function of brevity and coherence)
4. Editor mode: look at spelling, grammar, punctuation; actually read it out aloud, etc.

I mostly indulge in modes 1 or 2. I’m trying to reduce mode 1 reading and increase mode 3 reading.
Let’s see.

Chennai Impressions — part II

Chennai has been a big revelation. I’m seeing things in a very different light: more cynical, more objective. Today, some propaganda song was playing here.
A few observations:
The rhetoric involves
1. Presenting correlation as causality
2. Overwrought metaphors
3. Repetition of memes/ideas/thoughts

While the ADMK seems to take a more communist/socialist approach, the DMK seems to take the “Tamil”, self-respect, authoritarian approach.

Generating a Plain Text Corpus from Wikipedia

Originally posted on After the Deadline:

AtD *thrives* on data and one of the best places for a variety of data is Wikipedia. This post describes how to generate a plain text corpus from a complete Wikipedia dump. This process is a modification of Extracting Text from Wikipedia by Evan Jones.

Evan’s post shows how to extract the top articles from the English Wikipedia and make a plain text file. Here I’ll show how to extract all articles from a Wikipedia dump with two helpful constraints. Each step should:

  • finish before I’m old enough to collect social security
  • tolerate errors and run to completion without my intervention (a rough sketch of this idea follows the list)
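
That second constraint is worth making concrete. Below is a minimal sketch of the idea (mine, not Evan’s toolkit): stream-parse the .xml.bz2 dump with Python’s standard library, wrap each page in a try/except so one bad article can’t kill the run, and count what gets skipped. The wiki_markup_to_text() stripper is a naive placeholder for the real markup-to-text step, and the export namespace version is an assumption that varies between dump releases.

    import bz2
    import re
    import sys
    import xml.etree.ElementTree as ET

    # MediaWiki export namespace; the version suffix differs between dump releases.
    NS = "{http://www.mediawiki.org/xml/export-0.10/}"

    def wiki_markup_to_text(markup):
        # Naive placeholder stripper; a real pipeline does far more than this.
        markup = re.sub(r"\{\{[^{}]*\}\}", "", markup)                     # drop simple templates
        markup = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", markup)  # unwrap [[links|labels]]
        return re.sub(r"'{2,}", "", markup)                                # drop bold/italic quote marks

    def extract_articles(dump_path, out_path):
        skipped = 0
        with bz2.open(dump_path, "rb") as dump, \
             open(out_path, "w", encoding="utf-8") as out:
            # iterparse streams the XML, so the whole dump never sits in memory.
            for _, elem in ET.iterparse(dump):
                if elem.tag != NS + "page":
                    continue
                try:
                    text = elem.findtext(NS + "revision/" + NS + "text") or ""
                    out.write(wiki_markup_to_text(text) + "\n")
                except Exception as e:
                    skipped += 1              # tolerate the error, keep going
                    print("skipped a page: %s" % e, file=sys.stderr)
                finally:
                    elem.clear()              # free the page we just handled
        print("done; %d pages skipped" % skipped, file=sys.stderr)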

Today, we’re going to do the French Wikipedia. I’m working on multi-lingual AtD, and French seems like a fun language to go with. Our systems guy, Stephane, speaks French. That’s as good a reason as any.
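
Fetching the dump itself (a step this excerpt doesn’t reach) is a single request to the standard Wikimedia dump mirror; the exact “latest” filename is an assumption on my part, so check dumps.wikimedia.org if it has moved:

    import urllib.request

    # Assumed dump location and filename; verify the current name on dumps.wikimedia.org.
    URL = ("https://dumps.wikimedia.org/frwiki/latest/"
           "frwiki-latest-pages-articles.xml.bz2")
    urllib.request.urlretrieve(URL, "frwiki-latest-pages-articles.xml.bz2")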

Step 1: Download the Wikipedia Extractors Toolkit

Evan made available a…


Messed up online payment

I cannot believe the amount of mess one has to go through to pay online now… It took me about 40-45 minutes to pay my internet bill. First of all, Airtel sends you a password by mobile before it will display your bill amount and the link to pay; next come the standard card number and expiry date; and then, once again, a Verified by Visa page with yet another password… Phew… am I supposed to set so many passwords and remember all of them?? Jesus… not to mention these passwords come with expiry periods; I had to reset mine. Of course, my address change and number were not updated either, because HDFC insists on your coming in person and showing an original rather than mailing in their form filled with the details… God… it’s a mess.