Recently I came across John Crickett's Coding Challenges. John suggests a lot of fun challenges for learning a new programming language or technology.
The first one is writing our own wc tool. Since I have used wc frequently and I am learning Go, I decided to try implementing the wc tool in Go.
wc reports different counts depending on the options passed to it. Run without any options, it prints the following:
wc gutenberg.txt
7137 58159 341836 gutenberg.txt
The above output shows the number of lines, words, and bytes in the file gutenberg.txt.
For the character count, we use the -m option.
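The character count can differ from the byte count whenever the text contains multi-byte characters. Here is a minimal sketch of that difference in Go, assuming the input is UTF-8 encoded. The helper name byteAndRuneCount and the sample string are made up for illustration and are not part of my implementation.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount is a hypothetical helper: it returns the number of bytes
// and the number of UTF-8 characters (runes) in s.
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// "naïve café": the two accented letters take two bytes each in UTF-8.
	b, r := byteAndRuneCount("na\u00efve caf\u00e9")
	fmt.Printf("bytes: %d, characters: %d\n", b, r) // bytes: 12, characters: 10
}
```

For plain ASCII text the two numbers are the same, which is why the difference only shows up on files with accented or other non-ASCII characters.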
Here is my current implementation of this exercise without the output formatting (shout out to my mentor, Abhishek Somani, for his help).
Breaking down the code:
- Define a struct called FileMetadata that holds the different counts:
  - RuneCount – count of characters in the file (should change this to CharCount)
  - ByteCount – count of bytes in the file
  - WordCount – number of words
  - LineCount – number of lines
- Counter function – This takes a pointer to a bufio.Scanner (add examples to explain this) as input and returns an integer. It counts the tokens produced by the scanner it is given, so the result depends on how that scanner was configured to split its input.
- The readfromFile function takes two arguments: the name of the file and a bufio.SplitFunc that determines how the scanner splits its input. The function opens the file, creates a new scanner, sets the split function that was passed in, and returns the count. (To Do – I should add examples to explain this function)
- The checkError function simplifies the error checking.
- The final function is GetFileMetadata, which is exported. This is where the main work happens: the different counts are calculated by calling the functions mentioned above, and the result is returned as a FileMetadata struct.
- The function is then called with test files from main.go, which is in the cmd directory.
The issue with this approach is that for each option, I have to open the file separately, as can be seen in lines 47-61 of the code.
I started exploring how I could avoid this and whether I could come up with a simpler way to count everything. Here again, Abhishek helped me.
Here is the simplified code.
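package main

import (
	"fmt"
	"os"
	"strings"
	"unicode/utf8"
)

func main() {
	// Guard against a missing file argument.
	if len(os.Args) < 2 {
		fmt.Println("Usage: go run main.go <file>")
		return
	}
	f := os.Args[1]

	// Read the whole file once; every count below is derived from this data.
	data, err := os.ReadFile(f)
	if err != nil {
		fmt.Println("File reading error", err)
		return
	}
	d := string(data)
	fmt.Printf("Number of bytes: %d, %s\n", len(data), f)

	// Count characters (runes) rather than bytes.
	runecount := utf8.RuneCountInString(d)
	// Split returns one more element than the number of "\n" separators.
	lines := strings.Split(d, "\n")
	// Fields splits on runs of whitespace.
	words := strings.Fields(d)

	fmt.Printf("Number of characters: %d, %s\n", runecount, f)
	fmt.Printf("Number of lines: %d, %s\n", len(lines), f)
	fmt.Printf("Number of words: %d, %s\n", len(words), f)
}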
Here is the output of the above code:
go run main.go gutenberg.txt
Number of bytes: 341836, gutenberg.txt
Number of characters: 339120, gutenberg.txt
Number of lines: 7138, gutenberg.txt
Number of words: 58159, gutenberg.txt
As you can see above, all of the counts match the wc output except the line count, which is off by 1 (7138 vs 7137). The reason is that wc counts newline characters, while strings.Split returns one more element than the number of separators it finds.
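One possible way to line the result up with wc is to count the newline characters directly instead of counting the pieces produced by a split. The sketch below uses strings.Count from the standard library. The helper name countLines and the sample text are only for illustration, and I have not yet folded this into the program above.

```go
package main

import (
	"fmt"
	"strings"
)

// countLines (hypothetical helper) counts newline characters in s.
func countLines(s string) int {
	return strings.Count(s, "\n")
}

func main() {
	text := "first\nsecond\n"

	// Splitting produces "first", "second" and a trailing empty string.
	fmt.Println(len(strings.Split(text, "\n"))) // 3

	// Counting the separators gives the number of newline characters.
	fmt.Println(countLines(text)) // 2
}
```

With this approach, text that does not end in a newline reports one line fewer than the number of visible lines, because the last line has no newline character to count.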
To Do – Add my Learnings.