Tag Archives: text analysis

Etymologic cartography

I propose etymologic cartography as a field of study. Somebody had the simple but appealing idea of translating the toponyms on a map into English. In this case the map in question shows the USA:

Map of the USA with state names in English

Some of the names are rather interesting (and were unknown to me), e.g. Asleep for Iowa, Flattened Water for Nebraska, Great Hills for Massachusetts, Lord of War for Delaware, Dugout Canoe for Missouri (see here for an ordinary USA map for comparison). Note also that both peripheral Alaska and peripheral Maine consider(ed) themselves the Mainland and that Idaho was apparently named as a practical joke (really!? – possibly!)!

Also, the map nicely answers a question a friend of mine recently wondered about (and which I couldn’t answer): Kansas apparently means Wind, while Arkansas means People of the Wind!

The principle of etymologic cartography is of course easily transferred to other geographic areas. Though, come to think of it, given its history the USA probably has a substantial (more than average?) density of toponyms that don’t stem from the local language but rather from Spanish or Native American languages (think, for example, Utah). I wonder what other countries or regions would especially lend themselves to such an experiment?

(via gisn8)

Syntax highlighting on WordPress.com

To present readable and usable code here on WordPress.com without formatting it manually, I searched the intertubes for an elegant solution – and found one here.

import math, string

def readFile(file):
    """Reads a file from disk and returns its content as a string variable
    :param file: path to a file to be read
    """
    with open(file, 'r') as f:
        data = f.read()
    return data

def countCharsWhitespace(data):
    """Returns the number of characters in a string as an integer,
       including whitespace characters and punctuation marks, but
       not newline characters
    :param data: string variable containing the text to be analysed
    """
    data = data.replace('\n','')
    return len(data)

def countWords(data):
    """Returns the number of words in a string as an integer
    :param data: string variable containing the text to be analysed
    """
    return len(data.split())
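As a quick check of what the two counting functions return, the same expressions can be applied to a short in-memory string. This is only a sketch: the sample text is made up for illustration, and the expressions are copied inline so that the snippet runs on its own.

```python
# Sample text is invented for this illustration; it does not come from a file.
sample = "Kansas means Wind.\nArkansas means People of the Wind!"

# Same expression as in countCharsWhitespace: drop newlines, then count
# every remaining character (letters, spaces and punctuation).
chars = len(sample.replace('\n', ''))

# Same expression as in countWords: split on any run of whitespace
# (spaces and newlines alike) and count the pieces.
words = len(sample.split())

print(chars, words)
```

Note that the newline is dropped rather than replaced by a space, so it does not contribute to the character count, while the word count still treats it as a separator.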

The syntax highlighting works with what WordPress seems to call a shortcode: [ sourcecode ]. Presently some 25 languages are supported, and various options further customise the look and feel of the code representation (for example, line numbers could easily be added to the display above):

  • autolinks (true/false) — Makes all URLs in your posted code clickable. Defaults to true.
  • collapse (true/false) — If true, the code box will be collapsed when the page loads, requiring the visitor to click to expand it. Good for large code posts. Defaults to false.
  • firstline (number) — Use this to change what number the line numbering starts at. It defaults to 1.
  • gutter (true/false) — If false, the line numbering on the left side will be hidden. Defaults to true.
  • highlight (comma-separated list of numbers) — You can list the line numbers you want to be highlighted. For example “4,7,19”.
  • htmlscript (true/false) — If true, any HTML/XML in your code will be highlighted. This is useful when you are mixing code into HTML, such as PHP inside of HTML. Defaults to false and will only work with certain code languages.
  • light (true/false) — If true, the gutter (line numbering) and toolbar (see below) will be hidden. This is helpful when posting only one or two lines of code. Defaults to false.
  • padlinenumbers (true/false/integer) — Allows you to control the line number padding. true will result in automatic padding, false will result in no padding, and entering a number will force a specific amount of padding.
  • toolbar (true/false) — If false, the toolbar containing the helpful buttons that appears when you hover over the code will not be shown. Defaults to true.
  • wraplines (true/false) — If false, line wrapping will be disabled. This will cause a horizontal scrollbar to appear for long lines of code.
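To illustrate how these options combine, here is a minimal sketch of a helper that wraps a piece of code in the shortcode. The function name build_sourcecode is hypothetical (it is not part of WordPress), and the attribute syntax name="value" with a closing [/sourcecode] tag is my assumption of how the shortcode is written:

```python
def build_sourcecode(code, language="python", **options):
    """Wrap code in a sourcecode shortcode (hypothetical helper).

    Option names are passed through unchecked; booleans are written
    as "true"/"false", everything else is converted with str().
    """
    attrs = ['language="{}"'.format(language)]
    # Sort the options so the output is the same on every run.
    for name, value in sorted(options.items()):
        if isinstance(value, bool):
            value = "true" if value else "false"
        attrs.append('{}="{}"'.format(name, value))
    opening = "[sourcecode " + " ".join(attrs) + "]"
    return opening + "\n" + code + "\n[/sourcecode]"


# Hide the line numbers and highlight three lines.
shortcode = build_sourcecode("print('hello')", gutter=False, highlight="4,7,19")
print(shortcode)
```

The result can be pasted into the post editor; options that are left out simply keep the defaults listed above.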

(and yes, for one of my next projects I’m looking into (very basic!) language analysis ;)

Typealyzer: Analysing blog(ging) types

Recently, I was pointed to Typealyzer, a tool for analysing blog types or, actually, the personality types of the people behind the blogs. The information is visualised in a spider chart with eight personality dimensions.
Typealyzer is the work of Mattias Ostmar and Jon Kågström, the former being a (self-described) media and communication geek and Communication Analyst at Sweden-based PRfekt. Mattias specialises in psychological text analysis. Besides Typealyzer he has several other projects in that field. His website/blog is http://www.mattiasostmar.net and his Typealyzer profile looks like this:
