Thursday, October 13, 2011

Simple clojure string examples

One of the easiest places to delve into a new programming language, I've always found, is string manipulation. It's not one of the most exciting aspects of most modern programming languages, but so many atomic processes either boil down to manipulating strings or rely on it at one level or another. In clojure, as in lisp before it, much of that work is done with general sequence functions rather than string-specific ones (the clojure.string namespace does provide some of the latter). A string is seqable: the core sequence functions call seq on their argument and see the string as a sequence of characters. That makes those functions data type agnostic, which takes some real getting used to coming from statically typed languages such as Java or C#. It means that even though these examples use strings, most of them (not str and subs, which are string functions) work with other collections too: vectors, lists, sets, etc. In the results below, characters are written without clojure's leading backslash, so t stands for the character \t. Here are some basic data transformations you can do with clojure:
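Here is a short REPL sketch of the point above. It assumes Clojure 1.2 or later, which is when clojure.string appeared. The ;=> comments show the values I'd expect the REPL to print.

```clojure
(require '[clojure.string :as string])

;; seq views a string as a sequence of characters; the core sequence
;; functions call seq on their argument, which is why they accept strings.
(seq "test")                    ;=> (\t \e \s \t)

;; The same functions work unchanged on other collections.
(first "test")                  ;=> \t
(first [1 2 3])                 ;=> 1
(take-last 2 '(:a :b :c :d))    ;=> (:c :d)

;; Sequence functions return sequences, not strings; apply str rebuilds one.
(reverse "test")                ;=> (\t \s \e \t)
(apply str (reverse "test"))    ;=> "tset"

;; clojure.string holds the string-specific helpers.
(string/upper-case "test")      ;=> "TEST"
(string/join "-" ["a" "b" "c"]) ;=> "a-b-c"
```

The apply str step is the usual way back to a string once a sequence function has turned one into a seq of characters.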



01. First function example
  (first "test") = t
Return the first item in the collection.

02. Rest function example
  (rest "test") = (e s t)
Return all the items in the collection except the first element.

03. Str function example
  (str "test") = test
Returns its arguments as a string. With a single argument, str is equivalent to calling Java's toString() method on it (except that (str nil) returns an empty string). With more than one argument, the string values of the arguments are concatenated together.

04. Set function example
  (set "test") = #{e s t}
Return a set of the distinct items in the collection.

05. Subs function example
  (subs "test" 1) = est
Returns the substring from the designated starting index (0 based) to either the string's end or the designated ending index (exclusive). Unlike most of the functions here, subs only works on strings.

06. Nth function example
  (nth "test" 1) = e
Return the nth item in the collection (0 based).

07. Reverse function example
  (reverse "test") = (t s e t)
Return a collection with the original collection's items in reverse order.

08. Drop function example
  (drop 2 "test") = (s t)
Returns a lazy sequence of all but the first n items in the collection.

09. Drop-last function example
  (drop-last "test") = (t e s)
Returns a lazy sequence of all but the last item in the collection (or all but the last n items, if n is supplied).

10. Count function example
  (count "test") = 4
Return the size of the collection.

11. Cons function example
  (cons "test" "01") = ("test" 0 1)
The cons function from lisp. Returns a new sequence with "test" as its first item, followed by the elements of "01": the first argument is added whole, while the second is treated as a collection of characters.

12. Concat function example
  (concat "test" "01") = (t e s t 0 1)
Returns a lazy sequence representing the concatenation of the first collection's elements with the elements of the second collection.

13. Lazy-cat function example
  (lazy-cat "test" "01") = (t e s t 0 1)
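Returns a lazy sequence of the concatenation of the supplied collections. Unlike concat, lazy-cat is a macro, and each collection expression is not evaluated until the sequence actually needs it.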

14. Take function example
  (take 2 "test") = (t e)
Returns the first n items in the supplied collection.

15. Take-last function example
  (take-last 2 "test") = (s t)
Returns the last n items in the supplied collection.

16. Take-nth function example
  (take-nth 2 "test") = (t s)
Returns every nth item in the supplied collection.
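The results above use a simplified notation. A REPL session prints characters with a leading backslash and strings in quotes. The listing below restates each example in that form, assuming Clojure 1.3 as referenced below. The ;=> comments are the values I'd expect the REPL to print. Sets have no guaranteed print order, so that example is written as an equality check instead.

```clojure
;; Each example from the list, with the value a REPL should print.
(first "test")                ;=> \t
(rest "test")                 ;=> (\e \s \t)
(str "test")                  ;=> "test"
(str "test" "01" 2)           ;=> "test012"
(= (set "test") #{\t \e \s})  ;=> true
(subs "test" 1)               ;=> "est"
(subs "test" 1 3)             ;=> "es"
(nth "test" 1)                ;=> \e
(reverse "test")              ;=> (\t \s \e \t)
(drop 2 "test")               ;=> (\s \t)
(drop-last "test")            ;=> (\t \e \s)
(drop-last 2 "test")          ;=> (\t \e)
(count "test")                ;=> 4
(cons "test" "01")            ;=> ("test" \0 \1)
(concat "test" "01")          ;=> (\t \e \s \t \0 \1)
(lazy-cat "test" "01")        ;=> (\t \e \s \t \0 \1)
(take 2 "test")               ;=> (\t \e)
(take-last 2 "test")          ;=> (\s \t)
(take-nth 2 "test")           ;=> (\t \s)
```

Only str and subs hand back strings. Everything else returns a character, a number, a set, or a sequence.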

See the clojure 1.3 api for full specifications:  http://clojure.github.com/clojure/

Tuesday, July 12, 2011

Converting VARBINARY to VARCHAR yields only opening character

Came across this interesting feature of SQL Server the other week. The system we're working on takes incoming text files and stores their contents in full as VARBINARY fields in order to maintain a complete collection of messages that have been uploaded to the system. Currently we're circumventing this default load process and loading directly into the VARBINARY columns from our external database.

DECLARE @foo NVARCHAR(3)
SET @foo = N'bar'
SELECT CONVERT(VARBINARY,@foo)

This yields the expected Unicode (UTF-16, little-endian) binary: 0x620061007200

However, when unpacking these VARBINARY fields to VARCHARs we get the following:

SELECT CONVERT(VARCHAR,CONVERT(varbinary,@foo))

Yielding: 'b'

Did you see the error?

The problem arises when you convert binary data to a VARCHAR without knowing that it was originally an NVARCHAR. NVARCHAR stores each character as two bytes of UTF-16 (little-endian), so 'b' becomes 0x6200, 'a' becomes 0x6100, and so on. Unpacking that binary into a VARCHAR, which expects one byte per character, turns each of those 0x00 bytes into a NUL character. Since the displayed value stops at the first NUL, we're only left with the opening character. Converting back to the original type instead, SELECT CONVERT(NVARCHAR,CONVERT(VARBINARY,@foo)), returns 'bar' as expected.
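The byte-level mechanism can be reproduced outside SQL Server. The sketch below is not T-SQL. It uses Clojure with Java's standard charsets and rests on two assumptions. First, NVARCHAR data is stored as UTF-16 little-endian. Second, the VARCHAR conversion reads one byte per character. ISO-8859-1 stands in for the server's code page, which should behave the same for these ASCII bytes.

```clojure
;; "bar" encoded the way NVARCHAR stores it: UTF-16 little-endian,
;; two bytes per character, low byte first.
(def nvarchar-bytes (.getBytes "bar" "UTF-16LE"))

(vec nvarchar-bytes)   ;=> [98 0 97 0 114 0], i.e. 0x620061007200

;; Read the same bytes back one byte per character, as a stand-in for
;; the VARCHAR conversion: every 0x00 byte becomes a NUL character.
(def as-varchar (String. nvarchar-bytes "ISO-8859-1"))

(count as-varchar)                                    ;=> 6
(= (seq as-varchar) [\b \u0000 \a \u0000 \r \u0000])  ;=> true

;; Anything that treats NUL as the end of the string sees only "b".
(apply str (take-while #(not= % \u0000) as-varchar))  ;=> "b"

;; Decoding with the encoding the bytes were written in restores the text.
(String. nvarchar-bytes "UTF-16LE")                   ;=> "bar"
```

The text is not actually lost: all six bytes are still there. It is only being read with the wrong width.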

Say two different programmers wrote these two separate pieces. The first programmer loads NVARCHARs into the binary fields, and the second keeps trying to extract these binary fields as VARCHARs only to find the data truncated down to a single character. Obviously the best thing to do in this case is to have the programmers communicate with each other in order to maintain data consistency. However, if this proves impossible for whatever reason, it's worth checking that your fields aren't being inadvertently truncated when they're unpacked from binary.

tl;dr Just use NVARCHAR for everything and save yourself the headache

Sunday, February 6, 2011

Getting emacs's rgrep working in windows

In order to get many of the useful emacs utilities working in windows you'll need to install a UNIX-like environment for windows. I chose to go with cygwin.

There are 3 different ways of grepping in emacs:
grep
lgrep (local grep - will only search the directory you're currently in)
rgrep (recursive grep - will search subdirectories)

Installing cygwin seemed to grant access to 2 types of grep, grep and lgrep, while rgrep continued to spout "Parameter format not correct" messages.


> rgrep -nH "meta" *.*

find . "(" -path "*/CVS" -o -path "*/.svn" -o -path "*/{arch}" -o -path "*/.hg" -o -path "*/_darcs" -o -path "*/.git" -o -path "*/.bzr" ")" -prune -o -type f "(" -iname "*.*" ")" -exec grep -i -nH -e "meta" {} /dev/null ";"
FIND: Parameter format not correct

Grep exited abnormally with code 2 at Sun Feb 06 23:12:00


It turns out that lgrep only needs grep itself. rgrep, however, builds a "find" command that walks the subdirectory tree and runs grep on each matching file (note the -exec grep in the command above). By default emacs simply calls "find", which on windows resolves to the windows FIND.EXE, an unrelated text-search tool. Emacs must be pointed to cygwin's find instead in order for rgrep's unix style arguments to work.

Please add the following line to your .emacs:
(setq find-program "C:\\path-to-cygwin\\bin\\find.exe")

*NOTE* You need to fully exit emacs and restart, rather than merely running load-file on ~/.emacs, in order to clear out the grep and find settings emacs has already cached.

At this point rgrep sort of works. When running the same search query as above I started receiving the following output (note how we are now running cygwin's find):

> rgrep -nH "meta" *.*

C:\cygwin\bin\find.exe . "(" -path "*/CVS" -o -path "*/.svn" -o -path "*/{arch}" -o -path "*/.hg" -o -path "*/_darcs" -o -path "*/.git" -o -path "*/.bzr" ")" -prune -o -type f "(" -iname "*.*" ")" -exec grep -i -nH -e "meta" {} NUL ";"
/usr/bin/find: `grep': No such file or directory
/usr/bin/find: `grep': No such file or directory
/usr/bin/find: `grep': No such file or directory
.
.
.
etc

This emits a "/usr/bin/find: `grep': No such file or directory" error for each file find tries to search.

After much searching I found a workaround here detailing how to make the generated command use "/dev/null" instead of the windows null device "NUL", which suppresses the warning messages in the grep output buffer.

Please add the following to your .emacs file:

;; Prevent issues with the Windows null device (NUL)
;; when using cygwin find with rgrep.
(defadvice grep-compute-defaults (around grep-compute-defaults-advice-null-device)
  "Use cygwin's /dev/null as the null-device."
  (let ((null-device "/dev/null"))
    ad-do-it))
(ad-activate 'grep-compute-defaults)

This fixed rgrep for me. Nothing I've tried has gotten grep -r working, however; it still only searches the current directory. Supposedly installing ack will make all these problems go away, but I haven't gotten it working quite yet. Hope this helps.