lisp代写: CS 480 / 580 Assignment 4

#|

;;;;;; Assignment 4

; Write a two-layer backpropagation neural network.
;
; Due THURSDAY, October 20, at MIDNIGHT.
;

Note that the 580 students must implement one additional function (K-fold validation).

I have decided to provide the list version of this code rather than the array version
(the array version is tougher to write but harder to write bugs in).

See the questions at the end of the file.

Note that this is a STOCHASTIC system and so you need to run a number of times to assess whether the
system is really working well -- sometimes it just gets lucky (or doesn't).  What is
"a number of times"?  I'd say: 5-10 runs per experiment to get an idea, and 30 to really
be able to make a claim.  I'd also be interested in modifications you made to the
project to add interesting gizmos.

In actuality this project isn't particularly difficult.  It's really just a few lines of code.
You should be able to get this in almost immediately.

In class I mentioned that you can do backpropagation entirely with
matrix manipulation, and it's a LOT easier.  Well, it is.  Here are
the rules for doing backpropagation:

"."  (a dot)    means matrix multiply
As in:  C = A . B
no dot          means element-by-element multiply -- by which
we mean if C = (A) (B) , then for each index i and j
Cij = Aij * Bij
sigmoid[]       sigmoid function applied to each element
tr[]            transpose the matrix
1 -             subtract from 1
-               subtract two matrices
+               add two marices
alpha (...)     multiply by the scalar value alpha

c is the expected output vector
i is the provided input vector
h, c, o, i, odelta, and hdelta are all COLUMN vectors

v is a matrix with (num hidden units) rows and (size of i) columns,
initially set to random values between -(initialsize) and initialsize
w is a matrix with (size of c) rows and (num hidden units) columns,
initially set to random values between -(initialsize) and initialsize

The forward pass rules are (in this order):

h = sigmoid[v . i]
o = sigmoid[w . h]

The backpropagation rules are (in this order):

odelta = (c - o) o (1 - o)
hdelta = (h (1 - h) (tr[w] . odelta) )
w = w + alpha (odelta . tr[h])
v = v + alpha (hdelta . tr[i])

Also for your enlightenment, the error metric ( 1/2 SUM_i (Ci - Oi)^2 )
can be matrixified simply as:

error =  0.5 ( tr[c - o] . (c - o) )

...of course error would be a 1x1 matrix -- you should be able to extract
the number out of that.  :-)

odelta and hdelta are temporary matrix variables to simplify the
equations for you.  You should be able to figure out what
i, h, c, o, v, and w are.  :-) BE CAREFUL not to mix up
matrix multiply with element-by-element multiply!

Important hint: the matrix manipulation functions I have provided do NOT
have error checking in them.  If you multiply two matrices together
which are mismatched size-wise, for example, the matrix multiply
function will truncate each matrix until they *are* matched size-wise.
Thus you should check to make sure that your matrices are coming out
right.  One good hint is to make sure your W and V matrices don't
change size after backpropagation!  :-)

You will need to implement the following functions:

SIGMOID
NET-ERROR
FORWARD-PROPAGATE
BACK-PROPAGATE
NET-BUILD
SIMPLE-GENERALIZATION

The extra CS580 function is

K-FOLD-VALIDATION

The long functions are back-propagate and net-build.
Hint: BACK-PROPAGATE will have to reimplement most of the FORWARD-PROPAGATE
function, but it doesn't help much to call FORWARD-PROPAGATE
from within BACK-PROPAGATE.

After you have implemented these functions, you will test them out on
some data.  I have chosen several problems:

1. XOR and NAND   (Well, actually EQUAL rather than XOR, but we'll
pretend it's XOR -- they're isomorphic so it doesn't matter much).
These are binary classification problems.

2. Congressional Voting Records for 1984 -- to see if you can
determine who's a democrat and who's a republican.  This is again
a binary classification problem.

3. Miles per gallon for various cars in the 1980s.  This is a
regression problem with a one-dimensional output.

4. Wine classifications.  This is a classification problem with three
classes.  I have chosen to encode those three classes as
corners of a three-dimensional hypercube (you'll see).

I have built in a bias in my conversion function so you can forget about
that issue.

For the NAND and XOR problems, you have to get the code working.  This usually requires
several runs before it will converge to a perfect solution.

Once you have gotten NAND and XOR working, try running the other problems and
seeing how they generalize.




You need to provide:

1. Your working code
2. Your 500-word report.




ABOUT THE MATRICES

Here's some info about my matrix-manipulation functions, to help you
in debugging....

A matrix as follows:

2   3   7
6   2   4
5   2   9

...takes the form:

'( (2   3   7)
(6   2   4)
(5   2   9) )

...so it's in row-major order.  Traditionally, a matrix's elements
are accessed as Wij,  where i is the ROW and j is the COLUMN.  Also,
rows and columns start with 1, not 0.  I'll adhere to those standards.

I'm implementing matrices as lists of lists.  That includes vectors
(1-dimensional matrices).  So the row vector:

3   4   5

...takes the form:

'( (3   4   5) )

...and the column vector

3
5
9

...takes the form:

'( (3)
  (5)
  (9) )

All the matrix manipulation functions I've given are non-destructive.




ABOUT *VERIFY*

The matrix math code can be run with or without verification that the
various matrices are the right size for the operations.  Without checking
it's a little slower.  By default the checks are on:  *verify* is set to t

You can turn off the checks by setting *verify* to nil but only do so after
you know for a fact that your network is operating properly.
|#



(defun shuffle (lis)
  "Shuffles a list.  Non-destructive.  O(length lis), so
pretty efficient.  Returns the shuffled version of the list."
  (let ((vec (apply #'vector lis)) bag (len (length lis)))
    (dotimes (x len)
      (let ((i (random (- len x))))
	(rotatef (svref vec i) (svref vec (- len x 1)))
	(push (svref vec (- len x 1)) bag)))
    bag))   ;; 65 s-expressions, by the way


(defparameter *verify* t)

;;; hmmm, openmcl keeps signalling an error of a different kind
;;; when I throw an error -- a bug in openmcl?  dunno...
(defun throw-error (str)
  (error (make-condition 'simple-error :format-control str)))

(defun verify-equal (funcname &rest matrices)
  ;; we presume they're rectangular -- else we're REALLY in trouble!
  (when *verify*
    (unless (and
	     (apply #'= (mapcar #'length matrices))
	     (apply #'= (mapcar #'length (mapcar #'first matrices))))
      (throw-error (format t "In ~s, matrix dimensions not equal: ~s"
			   funcname
			   (mapcar #'(lambda (mat) (list (length mat) 'by (length (first mat))))
				   matrices))))))

(defun verify-multiplicable (matrix1 matrix2)
  ;; we presume they're rectangular -- else we're REALLY in trouble!
  (when *verify*
    (if (/= (length (first matrix1)) (length matrix2))
	(throw-error (format t "In multiply, matrix dimensions not valid: ~s"
			     (list (list (length matrix1) 'by (length (first matrix1)))
				   (list (length matrix2) 'by (length (first matrix2)))))))))


;; Basic Operations

(defun map-m (function &rest matrices)
  "Maps function over elements in matrices, returning a new matrix"
  (apply #'verify-equal 'map-m  matrices)
  (apply #'mapcar #'(lambda (&rest vectors)       ;; for each matrix...
		      (apply #'mapcar #'(lambda (&rest elts)     ;; for each vector...
					  (apply function elts))
			     vectors))
	 matrices))   ;; pretty :-)

(defun transpose (matrix)
  "Transposes a matrix"
  (apply #'mapcar #'list matrix))  ;; cool, no?

(defun make-matrix (i j func)
  "Builds a matrix with i rows and j columns,
    with each element initialized by calling (func)"
  (map-m func (make-list i :initial-element (make-list j :initial-element nil))))

(defun make-random-matrix (i j val)
  "Builds a matrix with i rows and j columns,
    with each element initialized to a random
    floating-point number between -val and val"
  (make-matrix i j #'(lambda (x)
		       (declare (ignore x))  ;; quiets warnings about x not being used
		       (- (random (* 2.0 val)) val))))

(defun e (matrix i j)
  "Returns the element at row i and column j in matrix"
  ;; 1-based, not zero-based.  This is because it's traditional
  ;; for the top-left element in a matrix to be element (1,1),
  ;; NOT (0,0).  Sorry about that.  :-)
  (elt (elt matrix (1- i)) (1- j)))

(defun print-matrix (matrix)
  "Prints a matrix in a pleasing form, then returns matrix"
  (mapcar #'(lambda (vector) (format t "~%~{~8,4,,F~}" vector)) matrix) matrix)

;;; Matrix Multiplication

(defun multiply2 (matrix1 matrix2)
  "Multiplies matrix1 by matrix2
    -- don't use this, use multiply instead"
  (verify-multiplicable matrix1 matrix2)
  (let ((tmatrix2 (transpose matrix2)))
    (mapcar #'(lambda (vector1)
		(mapcar #'(lambda (vector2)
			    (apply #'+ (mapcar #'* vector1 vector2))) tmatrix2))
	    matrix1)))  ;; pretty :-)

(defun multiply (matrix1 matrix2 &rest matrices)
  "Multiplies matrices together"
  (reduce #'multiply2 (cons matrix1 (cons matrix2 matrices))))

;;; Element-by-element operations

(defun add (matrix1 matrix2 &rest matrices)
  "Adds matrices together, returning a new matrix"
  (apply #'verify-equal 'add matrix1 matrix2 matrices)
  (apply #'map-m #'+ matrix1 matrix2 matrices))

(defun e-multiply (matrix1 matrix2 &rest matrices)
  "Multiplies corresponding elements in matrices together,
        returning a new matrix"
  (apply #'verify-equal 'e-multiply matrix1 matrix2 matrices)
  (apply #'map-m #'* matrix1 matrix2 matrices))

(defun subtract (matrix1 matrix2 &rest matrices)
  "Subtracts matrices from the first matrix, returning a new matrix."
  (let ((all (cons matrix1 (cons matrix2 matrices))))
    (apply #'verify-equal 'subtract all)
    (apply #'map-m #'- all)))

(defun scalar-add (scalar matrix)
  "Adds scalar to each element in matrix, returning a new matrix"
  (map-m #'(lambda (elt) (+ scalar elt)) matrix))

(defun scalar-multiply (scalar matrix)
  "Multiplies each element in matrix by scalar, returning a new matrix"
  (map-m #'(lambda (elt) (* scalar elt)) matrix))

;;; This function could
;;; be done trivially with (scalar-add scalar (scalar-multiply -1 matrix))
(defun subtract-from-scalar (scalar matrix)
  "Subtracts each element in the matrix from scalar, returning a new matrix"
  (map-m #'(lambda (elt) (- scalar elt)) matrix))






;;; Functions you need to implement

;; IMPLEMENT THIS FUNCTION

(defun sigmoid (u)
  "Sigmoid function applied to the number u"
  )

;; output and correct-output are both column-vectors

;; IMPLEMENT THIS FUNCTION

(defun net-error (output correct-output)
  "Returns (as a scalar value) the error between the output and correct vectors"
  )


;; a single datum is of the form
;; (--input-column-vector--  -- output-column-vector--)
;;
;; Notice that this is different from the raw data provided in the problems below.
;; You can convert the raw data to this column-vector form using CONVERT-DATA

;; IMPLEMENT THIS FUNCTION

(defun forward-propagate (datum v w)
  "Returns as a vector the output of the OUTPUT units when presented
the datum as input."
  )


;; IMPLEMENT THIS FUNCTION

(defun back-propagate (datum alpha v w)
  "Back-propagates a datum through the V and W matrices,
returning a list consisting of new, modified V and W matrices."
  ;; Consider using let*
  ;; let* is like let, except that it lets you initialize local
  ;; variables in the context of earlier local variables in the
  ;; same let* statement.
  )




(defun optionally-print (x option)
  "If option is t, then prints x, else doesn't print it.
In any case, returns x"
  ;;; perhaps this might be a useful function for you
  (if option (print x) x))


(defparameter *a-good-minimum-error* 1.0e-9)



;; data is of the form:
;; (
;;  (--input-column-vector--  --output-column-vector--)
;;  (--input-column-vector--  --output-column-vector--)
;;  ...
;; )
;;
;;
;; Notice that this is different from the raw data provided in the problems below.
;; You can convert the raw data to this column-vector form using CONVERT-DATA


;;; IMPLEMENT THIS FUNCTION

(defun net-build (data num-hidden-units alpha initial-bounds max-iterations modulo &optional print-all-errors)
  "Builds a neural network with num-hidden-units and the appropriate number
of input and output units based on the data.  Each element should be a random
value between -(INITIAL-BOUNDS) and +(INITIAL-BOUNDS).

Then performs the following loop MAX-ITERATIONS times, or until the error condition
is met (see below):

   1. For each data element in a randomized version of the data, perform
      backpropagation.
   2. Every modulo iterations,
          For every data element in the data, perform forward propagation and
          A.  If print-all-errors is true, then print the error for each element
          B.  At any rate, always print the worst error and the mean error
          C.  If the worst error is better (lower) than A-GOOD-MINIMUM-ERROR,
              quit all loops and prepare to exit the function --
              the error condition was met.

The function should return a list of two items: the final V matrix
and the final W matrix of the learned network."
  )



;; For this function, you should pass in the data just like it's defined
;; in the example problems below (that is, not in the "column vector" format
;; used by NET-BUILD.  Of course, if you need to call NET_BUILD from this function
;; you can alway convert this data to column-vector format using CONVERT-DATA within
;; the SIMPLE-GENERALIZATION function.
;;
;; Yes, this is ridiculously inconsistent.  Deal with it.  :-)

;;; IMPLEMENT THIS FUNCTION

(defun simple-generalization (data num-hidden-units alpha initial-bounds max-iterations)
  "Given a set of data, trains a neural network on the first half
of the data, then tests generalization on the second half, returning
the average error among the samples in the second half.  Don't print any errors,
and use a modulo of MAX-ITERATIONS."
  )



;; For this function, you should pass in the data just like it's defined
;; in the example problems below (that is, not in the "column vector" format
;; used by NET-BUILD.  Of course, if you need to call NET_BUILD from this function
;; you can alway convert this data to column-vector format using CONVERT-DATA within
;; the SIMPLE-GENERALIZATION function.
;;
;; Yes, this is ridiculously inconsistent.  Deal with it.  :-)


;;; IMPLEMENT THIS FUNCTION FOR CS 580

(defun k-fold-validation (data k num-hidden-units alpha initial-bounds max-iterations)
  "Given a set of data, performs k-fold validation on this data for
the provided value of k, by training the network on (k-1)/k of the data,
then testing generalization on the remaining 1/k of the data.  This is
done k times for different 1/k chunks (and building k different networks).
The average error among all tested samples is returned.  Don't print any errors,
and use a modulo of MAX-ITERATIONS."
  )




;;;; Some useful preprocessing functions

(defun scale-list (lis)
  "Scales a list so the minimum value is 0.1 and the maximum value is 0.9.  Don't use this function, it's just used by scale-data."
  (let ((min (reduce #'min lis))
	(max (reduce #'max lis)))
    (mapcar (lambda (elt) (+ 0.1 (* 0.8 (/ (- elt min) (- max min)))))
	    lis)))

(defun scale-data (lis)
  "Scales all the attributes in a list of samples of the form ((attributes) (outputs))"
  (transpose (list (transpose (mapcar #'scale-list (transpose (mapcar #'first lis))))
		   (transpose (mapcar #'scale-list (transpose (mapcar #'second lis)))))))

(defun convert-data (raw-data)
  "Converts raw data into column-vector data of the form that
can be fed into NET-LEARN.  Also adds a bias unit of 0.5 to the input."
  (mapcar #'(lambda (datum)
	      (mapcar #'(lambda (vec)
			  (mapcar #'list vec))
		      (list (cons 0.5 (first datum))
			    (second datum))))
	  raw-data))

(defun average (lis)
  "Computes the average over a list of numbers.  Returns 0 if the list length is 0."
  (if (= (length lis) 0)
      0
      (/ (reduce #'+ lis) (length lis))))


;;; here are the inputs and outputs of your three problems to test
;;; the net function on.


(defparameter *nand*
  '(((0.1 0.1) (0.9))
    ((0.9 0.1) (0.9))
    ((0.1 0.9) (0.9))
    ((0.9 0.9) (0.1))))


(defparameter *xor*
  '(((0.1 0.1) (0.1))
    ((0.9 0.1) (0.9))
    ((0.1 0.9) (0.9))
    ((0.9 0.9) (0.1))))




;; I converted the attribute values as follows:
;; Democrat -> 0.9, Republican -> 0.1
;; y -> 0.9, n -> 0.1, no-vote -> 0.5

;; The output is democrat or republican

;; the input is the following votes:
					;   1. handicapped-infants: 2 (y,n)
					;   2. water-project-cost-sharing: 2 (y,n)
					;   3. adoption-of-the-budget-resolution: 2 (y,n)
					;   4. physician-fee-freeze: 2 (y,n)
					;   5. el-salvador-aid: 2 (y,n)
					;   6. religious-groups-in-schools: 2 (y,n)
					;   7. anti-satellite-test-ban: 2 (y,n)
					;   8. aid-to-nicaraguan-contras: 2 (y,n)
					;   9. mx-missile: 2 (y,n)
					;  10. immigration: 2 (y,n)
					;  11. synfuels-corporation-cutback: 2 (y,n)
					;  12. education-spending: 2 (y,n)
					;  13. superfund-right-to-sue: 2 (y,n)
					;  14. crime: 2 (y,n)
					;  15. duty-free-exports: 2 (y,n)
					;  16. export-administration-act-south-africa: 2 (y,n)

#|
;;; some test code for you to try

;;; you'll need to run this a couple of times before it globally converges.
;;; When it *doesn't* converge what is usually happening?
(net-build (convert-data *nand*) 3 1.0 5 20000 1000 t)

(net-build (convert-data *xor*) 3 1.0 5 20000 1000 t)


;; how well does this converge on average?  Can you modify it to do better?
(net-build (convert-data *voting-records*) 10 1.0 2 5000 250 t)


;;; how well does this generalize usually?  Can you modify it to typically generalize better?
(simple-generalization *voting-records* ...)  ;; pick appropriate values

;;; how well does this generalize usually?  Can you modify it to typically generalize better?
(simple-generalization *mpg* ...) ;; pick appropriate values

;;; how well does this generalize usually?  Can you modify it to typically generalize better?
(simple-generalization *wine* ...)  ;; pick appropriate values

|#