Per Erik Strandberg

Background

Remember that I am sort of new to Java - I've read the first couple of hundred pages of Java in a Nutshell ([1]) - and have only written hello world before.

There is an excellent IDE for programming in Java: Eclipse (read more on [2] or [3]). I used Eclipse and my close friend GNU Emacs.

Understanding Java is a little annoying since many different things are called Java (much like .NET shares its name with a popular top-level domain).

When Microsoft created a number of extra nice and fancy libraries that they wanted to add to the Sun Java platform, something interesting happened. These libraries only targeted Windows systems, but since Java is intended to target all systems, Sun did not allow them in. Microsoft got pissed off and created their own platform: .NET. It's not Java, but it has (more or less) everything Java has, with some lessons learned.

Since none of the parts of Java* were free software, the GNU project created free replacements for at least some of them (I don't really know which). But recently Sun relicensed Java under the GNU General Public License - making it free software.

Goal

I copied some name statistics from SCB ([9] and [10]) and was about to write a little Python script to harvest some statistics from them. But instead I decided to implement it in Java.

Show me the code

Indata

In namn.txt (download here: [11]), taken from the SCB homepage, only names with more than ten occurrences in at least one of the last ten years are listed.

Since I just copied it from their homepage the file contains unwanted line breaks, tabs and spaces.

PojkNamn
	2007 	2006 	2005 	2004 	2003 	2002 	2001 	2000 	1999 	1998
	
	
Aaron
	35 	31 	21 	24 	20 	15 	- 	12 	- 	-
Abbas
	11 	- 	- 	- 	- 	- 	- 	- 	- 	-
Abbe
	39 	27 	21 	16 	20 	11 	- 	- 	- 	-

...

Zion
	23 	16 	- 	- 	- 	- 	- 	- 	- 	-
Åke
	14 	19 	17 	13 	10 	10 	- 	13 	- 	-


----------------------------------------------------------------

FlickNamn
	2007 	2006 	2005 	2004 	2003 	2002 	2001 	2000 	1999 	1998
Ada
	12 	14 	14 	12 	12 	- 	14 	- 	- 	-
Adela
	13 	12 	- 	- 	12 	- 	- 	10 	- 	-

...


Åsa
	- 	- 	17 	- 	10 	- 	10 	18 	13 	13
Ängla
	31 	36 	24 	- 	11 	- 	- 	- 	- 	-

A name counter class

See NameCount.java (download here [12]).

I implemented my own class with an integer member and a string member. Inheriting from the String class would have been handy, since we might want to do things like nc1.startsWith("Per"), but String is final in Java and cannot be subclassed. Instead we call the method on the name member:

  if (nc1.name.startsWith("Per"))
  {
    // ...
  }

Also you might want to make the members private - but I didn't want to take the time to do so.

The class implements the interface Comparable so that we can later sort lots of name counters.

  public class NameCount implements Comparable<NameCount>

The interface requires us to write our own public int compareTo(NameCount nc). This is a sloppy and stupid implementation, but it does exactly what I want: it sorts name counters with a large n first, and if two counters have an equal n it sorts them alphabetically. The body of the function is:

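    // Alphabetical order of the two names.
    int mdiff = this.name.compareTo(nc.name);
    // Numeric order of the two counts (assumes n is an Integer, not an int).
    int ndiff = this.n.compareTo(nc.n);

    if (ndiff != 0)
      return -ndiff;  // larger counts sort first
    else
      return mdiff;   // equal counts: alphabetical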

The class also contains a constructor and a toString method.
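To make the pieces above concrete, here is a minimal sketch of what such a counter class could look like. It is not the actual NameCount.java: the field types (in particular n being an Integer), the constructor signature and the toString format are assumptions, and NameCountDemo is a hypothetical helper that only exists to show the sort order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the counter class; the real NameCount.java may differ in detail.
class NameCount implements Comparable<NameCount> {
    String name;  // members left non-private, as in the text
    Integer n;    // assumed to be an Integer so that compareTo is available

    NameCount(String name, int n) {
        this.name = name;
        this.n = n;
    }

    // Larger counts sort first; equal counts are ordered alphabetically.
    @Override
    public int compareTo(NameCount nc) {
        int mdiff = this.name.compareTo(nc.name);
        int ndiff = this.n.compareTo(nc.n);
        if (ndiff != 0)
            return -ndiff;
        else
            return mdiff;
    }

    // Assumed output format: "name, count".
    @Override
    public String toString() {
        return name + ", " + n;
    }
}

// Hypothetical demo class, not part of the original files.
class NameCountDemo {
    static List<NameCount> sortedSample() {
        List<NameCount> list = new ArrayList<NameCount>();
        list.add(new NameCount("Per", 359));
        list.add(new NameCount("Anna", 3999));
        list.add(new NameCount("Emilio", 359));
        Collections.sort(list);  // uses NameCount.compareTo
        return list;
    }

    public static void main(String[] args) {
        for (NameCount nc : sortedSample())
            System.out.println(nc);
    }
}
```

With these three sample counters, Anna comes first because of her larger count, and Emilio comes before Per because their counts are equal and the tie is broken alphabetically.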

Some kind of parser

My quick and dirty indata parser, SomeParser.java (download here: [13]), basically contains code that reads the indata file and sums the occurrences of each name into its own instance of a name counter, stored in a Vector (any other List would have done just as well).

Then it sorts the list with java.util.Collections.sort and prints it.
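The read, sum, sort and print steps could be sketched roughly as follows. This is not the real SomeParser.java: the class name ParserSketch, the nested Counter class (a stand-in for NameCount), the rules for skipping headings and separator lines, and the UTF-8 encoding are all assumptions based on the indata excerpt shown earlier.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for SomeParser.java; the real file may work differently.
class ParserSketch {

    // Minimal stand-in for NameCount: a name plus its total over all years.
    static class Counter implements Comparable<Counter> {
        final String name;
        final int total;

        Counter(String name, int total) {
            this.name = name;
            this.total = total;
        }

        // Largest total first, ties alphabetically.
        @Override
        public int compareTo(Counter other) {
            if (total != other.total)
                return Integer.compare(other.total, total);
            return name.compareTo(other.name);
        }

        @Override
        public String toString() {
            return name + ", " + total;
        }
    }

    // A row of counts consists only of numbers and "-" placeholders.
    static boolean isCountRow(String[] tokens) {
        for (String t : tokens)
            if (!t.matches("-|\\d+"))
                return false;
        return true;
    }

    static List<Counter> parse(List<String> lines) {
        List<Counter> counters = new ArrayList<Counter>();
        String pendingName = null;  // name waiting for its row of counts
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty())
                continue;  // stray line breaks, tabs and spaces
            String[] tokens = trimmed.split("\\s+");
            if (isCountRow(tokens)) {
                if (pendingName != null) {  // the year row after a heading has no name
                    int sum = 0;
                    for (String t : tokens)
                        if (!t.equals("-"))
                            sum += Integer.parseInt(t);
                    counters.add(new Counter(pendingName, sum));
                    pendingName = null;
                }
            } else if (trimmed.equals("PojkNamn") || trimmed.equals("FlickNamn")) {
                pendingName = null;  // section heading, not a name
            } else if (Character.isLetter(trimmed.charAt(0))) {
                pendingName = trimmed;
            }
            // anything else ("...", separator lines) is ignored
        }
        Collections.sort(counters);
        return counters;
    }

    public static void main(String[] args) throws Exception {
        // Assumes the file is UTF-8 encoded.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        int rank = 1;
        for (Counter c : parse(lines))
            System.out.println(rank++ + " " + c);
    }
}
```

In this sketch a name that occurs in both the boys' and the girls' list ends up as two separate entries; whether the original parser merges them is not stated in the text.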

Compile it

$ ls -l
total 96
-rw-r--r-- 1 per None   646 Aug  7 09:01 NameCount.java
-rw-r--r-- 1 per None  1928 Aug  7 10:47 SomeParser.java
-rw-r--r-- 1 per None 57878 Aug  5 11:37 namn.txt

$ javac *.java

Use it with BASH

Names that are almost as popular as...

$ java SomeParser namn.txt | grep -C 1 "Knut,\|Per,\|Anna,"
61 Sofia, 4151
62 Anna, 3999
63 Johan, 3979
--
361 Emilio, 359
362 Per, 359
363 Pelle, 357
--
594 Alba, 138
595 Knut, 138
596 Viking, 138

Most popular names that start with a ...

$ java SomeParser namn.txt | grep "\ A" | head -n 3
4 Alexander, 9652
5 Anton, 9625
12 Amanda, 7803

$ java SomeParser namn.txt | grep "\ B" | head -n 3
89 Benjamin, 2884
268 Bianca, 605
292 Beatrice, 535

$ java SomeParser namn.txt | grep "\ C" | head -n 3
71 Carl, 3741
85 Clara, 3104
108 Casper, 2367

