Classical IO
- `<-` binds the result of executing an IO action to a name

    -- can run the program with runghc (runs the script without compiling)
    main = do
        putStrLn "Hello, what's your name?"
        name <- getLine
        putStrLn $ "Welcome to Haskell " ++ name ++ "!"

    -- putStrLn :: String -> IO ()
    -- getLine :: IO String

- functions with an `IO` return type may have side effects (e.g., showing something in the terminal) or return different values when called with the same arguments
- a value of type `IO t` is an action. Actions can be created, assigned and passed anywhere, but they are only executed from within another IO action (e.g., `main`).

    -- ghci> let writefoo = putStrLn "foo"
    -- ghci> writefoo   -- when given an IO action, ghci will execute it
    -- foo
    -- foo is not the return value of putStrLn, but the side effect of putStrLn writing foo to the terminal

- `IO t` indicates that the action's result is of type `t`; `IO ()` means there is no meaningful return value
- IO actions produce an effect when executed (performed) by something else in an IO context, but not when they are merely evaluated.

    writefoo = putStrLn "foo" -- nothing is printed here; writefoo just stores an IO action

- `main :: IO ()` is the entry point of any Haskell program. It is the mechanism that provides isolation from side effects: IO actions are executed in a controlled environment (a function's return type indicates whether it is pure).
- `do` defines a sequence of actions; it is only needed when there are multiple actions to execute. The value of a `do` block is the value of the last action executed.
- in a `do` block, use `<-` to get results from IO actions, and use `let` to get results from pure code
- every line in a `do` block opens a new scope, so one can re-declare (shadow) variables

    main = do
        let a = 1
        let a = 2 -- fine, shadows the previous a
        str <- getLine
        putStrLn $ "Data: " ++ str
        str <- getLine -- also fine
        putStrLn $ "Data: " ++ str

- `System.Environment.getArgs :: IO [String]` returns the list of command line arguments
- use `System.Environment.getEnv` to look up a specific environment variable
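As a rough illustration of the two functions above, the following sketch prints the program's arguments and one environment variable. It uses `lookupEnv` (also from `System.Environment`, not mentioned in these notes) rather than `getEnv`, because `getEnv` throws an exception when the variable is not set. The helper `describeArgs` and the choice of the `HOME` variable are only for illustration.

```haskell
import Data.Maybe (fromMaybe)
import System.Environment (getArgs, getProgName, lookupEnv)

-- Pure helper (hypothetical name): summarize the argument list.
describeArgs :: [String] -> String
describeArgs []   = "no arguments"
describeArgs args = show (length args) ++ " argument(s): " ++ unwords args

main :: IO ()
main = do
    prog <- getProgName          -- name of the running program
    args <- getArgs              -- command line arguments as [String]
    home <- lookupEnv "HOME"     -- Nothing if the variable is not set
    putStrLn $ prog ++ ": " ++ describeArgs args
    putStrLn $ "HOME = " ++ fromMaybe "(not set)" home
```

Keeping the formatting in a pure function means only the three lookups need to live in `IO`.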
Handles
- import `System.IO` for basic IO functions
- `openFile` returns a file `Handle` used to perform operations on the file, e.g., `hPutStrLn` works just like `putStrLn` but takes an additional `Handle` argument, and `hClose` closes the `Handle`. There are `h` functions corresponding to most of the non-`h` functions in `System.IO`.
- `return` in a `do` block is the opposite of `<-`: it takes a pure value and wraps it in IO, since every IO action must return some IO type. E.g., `return 7` creates an action of type `IO Int`. When executed, the action produces the result `7`.

    returnTest :: IO ()
    returnTest = do
        one <- return 1 -- return does not terminate the do block; <- pulls the value (e.g., an Int) out of the IO action
        let two = 2
        putStrLn $ show (one + two)

- `openFile :: FilePath -> IOMode -> IO Handle`; `openBinaryFile` handles binary files
- `FilePath` is just a type synonym for `String`
- `IOMode`: `ReadMode`, `WriteMode` (the file is completely emptied), `AppendMode` (writing starts from the end of the file), `ReadWriteMode`
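A minimal sketch of the handle functions described above: it writes two lines through one handle, then reads the first line back through another. The function name `writeThenReadFirst` and the file name `example.txt` are made up for this example.

```haskell
import System.IO

-- Writes two lines to the file, then reads the first one back.
-- Also reports whether the read handle is at end of file afterwards.
writeThenReadFirst :: FilePath -> IO (String, Bool)
writeThenReadFirst path = do
    outh <- openFile path WriteMode   -- truncates any existing content
    hPutStrLn outh "first line"
    hPutStrLn outh "second line"
    hClose outh                       -- flushes the buffer and releases the file

    inh <- openFile path ReadMode
    firstLine <- hGetLine inh         -- reads up to (not including) the newline
    atEnd <- hIsEOF inh               -- False: the second line is still unread
    hClose inh
    return (firstLine, atEnd)

main :: IO ()
main = do
    -- "example.txt" is a hypothetical file name
    result <- writeThenReadFirst "example.txt"
    print result   -- ("first line",False)
```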
- Haskell maintains internal buffers for files; until `hClose` is called on a file, the data may not be flushed out to the OS. Haskell normally takes care of closing files when a program exits, but this may not happen in some crashes, so call `hClose` explicitly.

    -- catchIOError is from System.IO.Error; Control.Exception.catch would need a type annotation on the handler
    do tempDir <- catchIOError getTemporaryDirectory (\_ -> return ".")
       -- tempfile and temph come from an openTempFile call (not shown)
       -- the return value of finally is the first action's return value
       -- finally ensures that even if an exception is thrown, the file is closed and removed
       finally (func tempfile temph)
               (do hClose temph
                   removeFile tempfile)

- when reading from and writing to a `Handle`, the OS maintains an internal record of the current position. `hTell` takes a `Handle` and returns the current position in the file; `hSeek` takes a `Handle`, a `SeekMode` and an offset, and moves the position to the specified location. `SeekMode` can be `AbsoluteSeek`, `RelativeSeek` (from the current position, the offset can be positive or negative), or `SeekFromEnd` (seek from the end of the file; `hSeek handle SeekFromEnd 0` goes to the end of the file).
- a `Handle` can also correspond to a network connection or a terminal; use `hIsSeekable` to check whether a `Handle` is seekable.
- there are 3 predefined `Handle`s: `stdin`, `stdout`, `stderr`. Some OSes allow redirecting them, e.g., `$ echo John | runghc callingpure.hs`

    getLine = hGetLine stdin -- non-h functions are shortcuts for h functions
    putStrLn = hPutStrLn stdout
    print = hPrint stdout

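The seeking functions above can be tried on a small file. This is a sketch under the assumption that the file is a regular (seekable) file containing only ASCII characters, so that character and byte positions coincide; `seekDemo` and `seek-demo.txt` are hypothetical names.

```haskell
import System.IO

-- Creates a ten-character file and moves around in it.
seekDemo :: FilePath -> IO (Integer, Char, Integer, String)
seekDemo path = do
    writeFile path "0123456789"
    h <- openFile path ReadMode
    start <- hTell h              -- position right after opening: 0
    hSeek h AbsoluteSeek 4        -- jump to offset 4 from the start
    c <- hGetChar h               -- reads the character at offset 4
    afterRead <- hTell h          -- reading one character advanced the position
    hSeek h SeekFromEnd (-2)      -- two characters before the end
    rest <- hGetLine h            -- reads the remaining characters
    hClose h
    return (start, c, afterRead, rest)

main :: IO ()
main = do
    -- "seek-demo.txt" is a hypothetical file name
    result <- seekDemo "seek-demo.txt"
    print result   -- (0,'4',5,"89")
```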
Files
- `System.Directory` provides `removeFile` and `renameFile` (which can also be used to move a file if the second argument is in a different directory)
- `openTempFile` takes a directory (e.g., `.` or the result of `System.Directory.getTemporaryDirectory`) and a template for naming the file (random characters are added to it), and returns the file path and a handle opened in `ReadWriteMode`, so one can `hClose` and `removeFile` when done

    -- the second argument is a lambda function that takes an argument, ignores it, and returns "."
    -- catchIOError is from System.IO.Error
    tempdir <- catchIOError getTemporaryDirectory (\_ -> return ".")

- `pos <- hTell temp` can appear multiple times in an IO context; each occurrence rebinds `pos`
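Putting the pieces of this section together, a temporary-file helper might look roughly like the sketch below. `withTempFile` is a hypothetical helper name, `notes.txt` is an arbitrary template, and `catchIOError` (from `System.IO.Error`) stands in for the plain `catch`, whose handler would otherwise need a type annotation.

```haskell
import Control.Exception (finally)
import System.Directory (doesFileExist, getTemporaryDirectory, removeFile)
import System.IO
import System.IO.Error (catchIOError)

-- Runs func with a fresh temporary file; the file is closed and removed
-- afterwards, even if func throws an exception.
withTempFile :: String -> (FilePath -> Handle -> IO a) -> IO a
withTempFile template func = do
    tempDir <- catchIOError getTemporaryDirectory (\_ -> return ".")
    (tempFile, tempH) <- openTempFile tempDir template
    finally (func tempFile tempH)
            (do hClose tempH
                removeFile tempFile)

main :: IO ()
main = do
    (path, line) <- withTempFile "notes.txt" $ \path h -> do
        hPutStrLn h "scratch data"
        hSeek h AbsoluteSeek 0    -- the handle is in ReadWriteMode, so rewind and read
        line <- hGetLine h
        return (path, line)
    stillThere <- doesFileExist path
    putStrLn line       -- scratch data
    print stillThere    -- False
```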
Lazy IO
- in classical IO, each line or block of data is requested and processed individually
- `hGetContents :: Handle -> IO String`; the string it returns is evaluated lazily. Data is only read from the `Handle` as the characters are processed. When elements are no longer used, the garbage collector automatically frees the memory (when all input is consumed, the file is automatically closed). The return value can be passed directly to pure functions.

    inpStr <- hGetContents inh
    hPutStr outh (map toUpper inpStr) -- consumed data is no longer referenced, so it can be freed

- `hGetContents` allows reading a file that is larger than memory, but the file must not be closed until processing is finished (e.g., after the `hPutStr` call). If `inpStr` is held onto, the data already read stays in memory and the file is kept open.
- `readFile :: FilePath -> IO String` wraps `openFile`, `hGetContents` and `hClose` together; `writeFile :: FilePath -> String -> IO ()` wraps `openFile`, `hPutStr` and `hClose` together

    import Data.Char (toUpper)

    main = do
        inpStr <- readFile "input.txt"
        writeFile "output.txt" (map toUpper inpStr)
        -- no hClose is needed

- one can think of the `String` between `readFile` and `writeFile` as a pipe linking the two: data goes in one end, is transformed, and comes out the other end. Therefore even for large files, the memory usage is small.
- `interact :: (String -> String) -> IO ()` reads from standard input, applies the function to the input, and writes the result to standard output.

    main = interact (map toUpper)
    -- runghc toUpper.hs < input.txt > output.txt -- redirect input and output
    -- runghc toUpper.hs -- ENTER will output the result and wait for more input

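Because `interact` only needs a `String -> String` function, line-oriented filters can be written as pure code. The following is a small sketch; `numberLines` is a name invented for this example.

```haskell
import Data.Char (toUpper)

-- Pure transformation: upper-case every line and prefix it with its number.
numberLines :: String -> String
numberLines = unlines . zipWith format [1 :: Int ..] . lines
  where
    format n l = show n ++ ": " ++ map toUpper l

main :: IO ()
main = interact numberLines
```

Since the transformation is pure, it can be checked in ghci without any input redirection, e.g. `numberLines "a\nb\n"` evaluates to `"1: A\n2: B\n"`.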
IO monad
- actions resemble functions
- IO actions are defined within the IO monad. Monads are a way of chaining functions together purely
- can store and pass actions in pure code

    list2actions :: [String] -> [IO ()]
    list2actions = map str2action

- every element, except `let`, in a `do` block must yield an IO action

    str2message :: String -> String
    str2message input = "Data: " ++ input

    str2action :: String -> IO ()
    str2action = putStrLn . str2message

    numbers :: [Int]
    numbers = [1..10]

    main = do
        str2action "Start of the program"
        -- mapM_ :: Monad m => (a -> m b) -> [a] -> m ()
        -- mapM :: Monad m => (a -> m b) -> [a] -> m [b]
        -- both take a function returning an IO action, which is executed for each element in the list
        -- map alone only builds the actions, it does not execute them
        mapM_ (str2action . show) numbers
        str2action "Done!"

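To make the point about storing actions in pure code concrete, here is a small sketch in which a list of actions is built, inspected and only later executed. It uses `sequence_`, which runs a list of actions in order and discards the results; the name `greetings` is made up for this example.

```haskell
-- A pure value: a list of actions, none of which has run yet.
greetings :: [IO ()]
greetings = map putStrLn ["one", "two", "three"]

main :: IO ()
main = do
    let n = length greetings          -- counting the actions does not run them
    putStrLn $ "Built " ++ show n ++ " actions"
    sequence_ (reverse greetings)     -- runs them, in whatever order we choose
    sequence_ (take 1 greetings)      -- the same action can be run again
    -- Output:
    -- Built 3 actions
    -- three
    -- two
    -- one
    -- one
```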
Sequencing
- `do` blocks join actions together; `(>>) :: (Monad m) => m a -> m b -> m b` and `(>>=) :: (Monad m) => m a -> (a -> m b) -> m b` are also sequencing operators
- `(>>)`: the first action is executed, then the second; the result is the result of the second action, e.g., `putStrLn "line 1" >> putStrLn "line 2"`
- `(>>=)`: runs an action, passes its result to a function that returns the second action, runs that second action and returns its result, e.g., `getLine >>= putStrLn` reads a line from the keyboard and displays it back

    main =
        putStrLn "Greetings! What is your name?" >>
        getLine >>=
        (\inpStr -> putStrLn $ "Welcome to Haskell, " ++ inpStr ++ "!")

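For comparison, the same program can be written once with `do` and once with the operators; the two definitions below are intended to behave identically. The names `welcome`, `greetDo` and `greetBind` are invented for this sketch.

```haskell
-- Pure helper shared by both versions.
welcome :: String -> String
welcome name = "Welcome to Haskell, " ++ name ++ "!"

-- Version using do notation.
greetDo :: IO ()
greetDo = do
    putStrLn "Greetings! What is your name?"
    inpStr <- getLine
    putStrLn (welcome inpStr)

-- The same sequence written with (>>) and (>>=).
greetBind :: IO ()
greetBind =
    putStrLn "Greetings! What is your name?" >>
    getLine >>= \inpStr ->
    putStrLn (welcome inpStr)

main :: IO ()
main = greetDo >> greetBind   -- asks twice, once per version
```

Each `name <- action` line in a `do` block corresponds to `action >>= \name -> ...`, and each line whose result is ignored corresponds to `>>`.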
Buffering
- writing to a disk can be thousands of times slower than writing to memory. Even if the operation does not directly communicate with the disk (e.g., the data is cached), IO involves a system call, which is slow
- buffering reduces the number of IO requests by requesting a large chunk of data at once, even if the code processes one character at a time
- can manually change buffering modes
  - `NoBuffering`: no buffering, data is read/written one character at a time
  - `LineBuffering`: the input/output buffer is read/written whenever a newline character is met, or when it gets too large
  - `BlockBuffering`: the input/output buffer is read/written in fixed-size chunks whenever possible; unusable for interactive programs
- can check the buffering mode with `hGetBuffering` and change it with `hSetBuffering`, e.g., `hSetBuffering stdin LineBuffering`
- can force Haskell to write out (flush) all data in the buffer with `hFlush` (`hClose` automatically flushes the buffer). This is useful when we want to make the data on disk available to another program that is reading it concurrently
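A common place where buffering becomes visible is an interactive prompt: with a buffered `stdout`, text written by `putStr` without a trailing newline may not appear before the program waits for input. The sketch below shows two ways around this using the functions above; `prompt` is a hypothetical helper name.

```haskell
import System.IO

-- Prints a question without a newline and makes sure it is visible
-- before waiting for the answer.
prompt :: String -> IO String
prompt question = do
    putStr question
    hFlush stdout          -- push the buffered question out to the terminal
    getLine

main :: IO ()
main = do
    mode <- hGetBuffering stdout
    putStrLn $ "stdout buffering: " ++ show mode
    name <- prompt "Name: "
    hSetBuffering stdout NoBuffering   -- alternative: disable buffering altogether
    putStr "Hello, "                   -- appears immediately, no flush needed
    putStrLn name
```

Flushing at specific points keeps the benefit of buffering elsewhere, whereas `NoBuffering` trades throughput for immediacy on every write.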