CSCI 544 - Applied Natural Language Processing


Homework 0
Due: Jan 19, 2015 (9:59 PST)


Overview:

Your objective is to write a program that reads a file, counts the words on each line, and writes an output file containing the number of words on each line of the input. This is primarily a chance for you to install and check your setup of the tools required for the class (Bitbucket, Python 3, and Ubuntu on VirtualBox) and to make sure you’re comfortable with basic programming.

Details:

You will write a Python 3 program (assignment0.py) that takes the absolute path to an input file as its first parameter and the absolute path to an output file as its second parameter. It will read the lines from the input and, for each input line, write the number of words on that line as a separate line in the output. So, your program would be expected to handle:

> python3 assignment0.py /path/to/input /path/to/output

If the line “People love to read about Nelson” occurred in the input file, you would write the line “6” to the output file. To simplify this task, the sentences will not include punctuation or contractions.
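
The counting step described above can be sketched as follows. This is only one possible approach, and it assumes that words are separated by whitespace and that the input is UTF-8 text; the handout specifies neither. The function names count_words and write_counts are illustrative and not required by the assignment.

```python
import sys


def count_words(line):
    """Return the number of whitespace-separated tokens in a line."""
    # str.split() with no arguments splits on any run of whitespace and
    # ignores leading/trailing whitespace, so a blank line counts as 0.
    return len(line.split())


def write_counts(input_path, output_path):
    """Write one word count per input line to output_path."""
    # UTF-8 is an assumption; the handout does not name an encoding.
    with open(input_path, encoding="utf-8") as infile, \
            open(output_path, "w", encoding="utf-8") as outfile:
        for line in infile:
            outfile.write(str(count_words(line)) + "\n")


if __name__ == "__main__":
    # Expected invocation: python3 assignment0.py /path/to/input /path/to/output
    if len(sys.argv) == 3:
        write_counts(sys.argv[1], sys.argv[2])
    else:
        print("usage: python3 assignment0.py INPUT OUTPUT", file=sys.stderr)
```

With this sketch, count_words("People love to read about Nelson") returns 6, matching the example above.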

You can test your code using this sample input file (dev.sentences) and a sample results file (dev.results). The actual test file will consist of a similar set of sentences.
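
One way to check your output against the sample results is a line-by-line comparison. The sketch below assumes dev.results holds one count per line in the same format as your output file; the helper name first_mismatch is hypothetical and not part of the assignment.

```python
def first_mismatch(actual_path, expected_path):
    """Return the 1-based number of the first differing line, or None if the files match."""
    with open(actual_path, encoding="utf-8") as actual, \
            open(expected_path, encoding="utf-8") as expected:
        # Strip surrounding whitespace so trailing newlines do not cause false mismatches.
        actual_lines = [line.strip() for line in actual]
        expected_lines = [line.strip() for line in expected]

    for number, (got, want) in enumerate(zip(actual_lines, expected_lines), start=1):
        if got != want:
            return number

    if len(actual_lines) != len(expected_lines):
        # One file is a prefix of the other: the first extra line is the mismatch.
        return min(len(actual_lines), len(expected_lines)) + 1
    return None
```

For example, after running your program on dev.sentences, first_mismatch("/path/to/output", "/path/to/dev.results") returns None when every line agrees.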

Submission:

All submissions will be completed through your Bitbucket account. If you haven't already, please register for a Bitbucket account and submit your username. You then need to create an assignment repository named "csci544-hw0" (case sensitive), following the instructions for repository settings and for giving grading access.

Getting started:

  1. Set up your Ubuntu development environment (setup guide)
  2. Set up your Bitbucket account (setup guide)
  3. In your Bitbucket account, create a new repository for homework 0 (csci544-hw0). Follow the instructions for correct repository settings.
  4. In your repository, create a file “assignment0.py”.
  5. Tips: