m8ta
{1509}
ref: -2002 tags: hashing frequent items count sketch algorithm google date: 03-30-2020 02:04 gmt revision:7 [6] [5] [4] [3] [2] [1] [head]

Finding frequent items in data streams

  • Notation:
    • S is a data stream, $S = q_1, q_2, \ldots, q_n$, of length n.
    • Each object $q_i \in O = \{o_1, \ldots, o_m\}$. That is, there are m total possible objects (e.g. English words).
    • Object $o_i$ occurs $n_i$ times in S. The $o_i$ are ordered so that $n_1 \geq n_2 \geq \ldots \geq n_m$.
  • Task:
    • Given an input stream S, integer k, and real $\epsilon$
    • Output a list of k elements from S such that each element has $n_i > (1-\epsilon) n_k$.
      • That is, if the ordering is perfect, $n_i \geq n_k$, with equality on the last element.
  • Algorithm:
    • $h_1, \ldots, h_t$ are hashes from object q to buckets $\{1, \ldots, b\}$
    • $s_1, \ldots, s_t$ are hashes from object q to $\{-1, +1\}$
    • For each symbol q and each row i, add it to the 2D (t × b) hash array by first hashing with $h_i$ to pick a counter, then incrementing that counter by $s_i[q]$.
      • The double-hashing is to reduce the effect of collisions with high-frequency items.
    • When querying for the frequency of an object, hash as above, and take the median over i of $h_i[q] \cdot s_i[q]$, where $h_i[q]$ here denotes the value of the counter that $h_i$ maps q to.
    • $t = O(\log(\frac{n}{\delta}))$, where the algorithm fails with probability at most $\delta$.
  • Demonstrate proof of convergence / function with Zipfian distributions with varying exponent. (I did not read through this).
  • Also showed that it's possible to compare these hash-counts directly to see what's changed, or importantly whether the documents are different.
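
The algorithm above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the class name CountSketch, the defaults for t and b, and the use of salted BLAKE2 digests to stand in for $h_i$ and $s_i$ are all choices made here (the analysis assumes pairwise-independent hash families).

```python
import hashlib
import statistics


class CountSketch:
    """Minimal count sketch: t rows of b signed counters (names are illustrative)."""

    def __init__(self, t=5, b=256):
        self.t = t
        self.b = b
        self.table = [[0] * b for _ in range(t)]

    def _hashes(self, i, q):
        # Assumption: derive both h_i (bucket) and s_i (sign) from one digest
        # salted with the row index i.
        digest = hashlib.blake2b(
            str(q).encode(), digest_size=8, salt=i.to_bytes(8, "little")
        ).digest()
        v = int.from_bytes(digest, "little")
        bucket = (v >> 1) % self.b
        sign = 1 if v & 1 else -1
        return bucket, sign

    def add(self, q, count=1):
        # Update: in every row, add s_i[q] * count to the counter chosen by h_i.
        for i in range(self.t):
            bucket, sign = self._hashes(i, q)
            self.table[i][bucket] += sign * count

    def estimate(self, q):
        # Query: median over rows of (counter value) * s_i[q].
        vals = []
        for i in range(self.t):
            bucket, sign = self._hashes(i, q)
            vals.append(self.table[i][bucket] * sign)
        return statistics.median(vals)
```

Usage would look like cs = CountSketch(); cs.add("the") for each symbol of the stream, then cs.estimate("the"). Because colliding items enter a counter with random signs, their contributions tend to cancel, and the median over rows discards the rows where a heavy item happened to collide.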


Mission: Ultra large-scale feature selection using Count-Sketches
  • Task:
    • Given a labeled dataset $(X_i, y_i)$ for $i \in \{1, 2, \ldots, n\}$ and $X_i \in \mathbb{R}^p, y_i \in \mathbb{R}$
    • Find the k-sparse feature vector / linear regression for the mean squares problem $\min_{\|\beta\|_0 = k} \|y - X\beta\|_2$
      • $\|\beta\|_0 = k$ counts the non-zero elements in the feature vector.
    • The number of features $p$ is so large that a dense $\beta$ cannot be stored in memory. (X is of course sparse).
  • Such data may be from ad click-throughs, or from genomic analyses ...
  • Use the count-sketch algorithm (above) for capturing & continually updating the features for gradient update.
    • That is, treat the stream of gradient updates, in the normal form $g_i = 2 \lambda (y_i - X_i \beta_i X^t)^t X_i$, as the semi-continuous time series used above as $S$
  • Compare this with greedy thresholding, e.g. iterative hard thresholding (IHT), which throws away gradient information after each batch.
    • This discards small gradients which may be useful for the regression problem.
  • Works better than IHT, but not necessarily better than straight feature hashing (FH).
  • Meh.
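
A rough Python sketch of the idea above: accumulate the stream of gradient steps in a count sketch and keep only the k heaviest features explicitly. Everything named here (sketched_sgd, make_toy_data, the toy target y = 3*x_7 - 2*x_42, the learning rate, the salted BLAKE2 hashes) is a hypothetical illustration under those assumptions, not the paper's code.

```python
import hashlib
import random
import statistics


def _hash(i, j, b):
    # Assumption: salted BLAKE2 digest stands in for the hash pair (h_i, s_i).
    d = hashlib.blake2b(str(j).encode(), digest_size=8,
                        salt=i.to_bytes(8, "little")).digest()
    v = int.from_bytes(d, "little")
    return (v >> 1) % b, (1 if v & 1 else -1)


def sketched_sgd(samples, k, t=5, b=1024, lr=0.1, epochs=20):
    """samples: list of (x, y), with x a sparse dict {feature index: value}."""
    table = [[0.0] * b for _ in range(t)]  # the only O(t*b) weight storage
    top = {}                               # k heaviest features: index -> weight

    def query(j):
        vals = []
        for i in range(t):
            bucket, sign = _hash(i, j, b)
            vals.append(table[i][bucket] * sign)
        return statistics.median(vals)

    for _ in range(epochs):
        for x, y in samples:
            # Predict with the tracked top-k weights only.
            resid = y - sum(top.get(j, 0.0) * v for j, v in x.items())
            for j, v in x.items():
                step = lr * 2.0 * resid * v  # gradient step for feature j
                for i in range(t):
                    bucket, sign = _hash(i, j, b)
                    table[i][bucket] += sign * step
                top[j] = query(j)
            if len(top) > k:
                keep = sorted(top, key=lambda j: abs(top[j]), reverse=True)[:k]
                top = {j: top[j] for j in keep}
    return top


def make_toy_data(n=200, seed=0):
    # Hypothetical data: two informative features plus rare, small noise features.
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = {7: rng.uniform(0.5, 1.0) * rng.choice([-1, 1]),
             42: rng.uniform(0.5, 1.0) * rng.choice([-1, 1])}
        for _ in range(3):
            x[rng.randrange(1000, 10**6)] = rng.uniform(-0.1, 0.1)
        samples.append((x, 3.0 * x[7] - 2.0 * x[42]))
    return samples
```

Calling sketched_sgd(make_toy_data(), k=2) should recover weights near 3 and -2 for features 7 and 42. Unlike hard thresholding, the small gradient contributions of features outside the top k stay in the sketch rather than being discarded after each step.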

{1353}
ref: -0 tags: PEDOT electropolymerization electroplating gold TFB borate counterion acetonitrile date: 10-18-2016 07:49 gmt revision:3 [2] [1] [0] [head]

Electrochemical and Optical Properties of the Poly(3,4-ethylenedioxythiophene) Film Electropolymerized in an Aqueous Sodium Dodecyl Sulfate and Lithium Tetrafluoroborate Medium

  • EDOT has a higher oxidation potential than water, which makes polymers electropolymerized from water "poorly defined".
  • Addition of SDS lowers the oxidation potential to 0.76V, below that of EDOT in acetonitrile at 1.1V.
  • " The potential was first switched from open circuit potential to 0.5 V for 100 s before polarizing the electrode to the desired potential. This initial step was to allow double-layer charging of the Au electrode|solution interface, which minimizes the distortion of the polymerization current transient by double-layer capacitance charging.17,18 "
    • Huh, interesting.
  • Plated at 0.82 - 0.84V, 0.03M EDOT conc.
  • 0.1M LiBF4 anion / electrolyte; 0.07M SDS surfactant.
    • This SDS is incorporated into the film, and affects redox reactions as shown in the cyclic voltammogram (fig 4)
      • Doping level 0.36
    • BF4-, in comparison, can be driven out of the film.

Improvement of the Electrosynthesis and Physicochemical Properties of Poly(3,4-ethylenedioxythiophene) Using a Sodium Dodecyl Sulfate Micellar Aqueous Medium

  • The oxidation potential of thiophene = 1.8V; water = 1.23V.
  • Claim: "The polymer films prepared in micellar medium [SDS] are more stable than those obtained in organic solution as demonstrated by the fact that, when submitted to a great number of redox cycles (n ≈ 50), there is no significant loss of their electroactivity (<10%). These electrochemical properties are accompanied by color changes of the film which turns from blue-black to red-purple upon reduction."
  • Estimate that there is about 21% DS- anions in the PEDOT - SDS films.
    • Cl- was at ~7%.
  • I'm still not sure about incorporating soap into the electroplating solution.. !

Electrochemical Synthesis of Poly(3,4-ethylenedioxythiophene) on Steel Electrodes: Properties and Characterization

  • 0.01M EDOT and 0.1M LiClO4 in acetonitrile.
  • Claim excellent adhesion & film properties to 316 SS.
  • Oxidation / electrodeposition at 1.20V; voltages higher than 1.7V resulted in flaky films.

PMID-20715789 Investigation of near ohmic behavior for poly(3,4-ethylenedioxythiophene): a model consistent with systematic variations in polymerization conditions.

  • Again use acetonitrile.
  • 1.3V vs Ag/AgCl electrode.
  • Perchlorate and tetrafluoroborate both seemed the best counterions (figure 4).
  • Figure 5: Film was difficult to remove from surface.
    • They did use a polycrystalline Au layer:
    • "The plating process was allowed to run for 1 min (until approximately 100 mC had passed) at a constant potential of 0.3 V versus Ag/AgCl in 50 mM HAuCl4 prepared in 0.1 M NaCl."
  • Claim that the counterions are trapped; not in agreement with the SDS study above.
  • "Conditions for the consistent production of conducting polymer films employing potentiostatic deposition at 1.3 V for 60-90 s have been determined. The optimal concentration of the monomer is 0.0125 M, and that of the counterion is 0.05 M. "

PMID-24576579 Improving the performance of poly(3,4-ethylenedioxythiophene) for brain–machine interface applications

  • Show that TFB (BF4-) is a suitable counterion for EDOT electropolymerization.
  • Comparison is between PEDOT:TFB deposited in an anhydrous acetonitrile solution, and PEDOT:PSS deposited in an aqueous solution.
    • Presumably the PSS brings the EDOT into solution (??).
  • figure 3 is compelling, but long-term, electrodes are not that much better than Au!
    • Maybe we should just plate with that.

PEDOT-modified integrated microelectrodes for the detection of ascorbic acid, dopamine and uric acid

  • Direct comparison of acetonitrile and water solvents for electropolymerization of EDOT.
  • "PEDOT adhesion is best on gold surface due to the strong interactions between gold and sulphur atoms."
  • images/1353_2.pdf
    • Au plating is essential!

{91}
ref: notes-0 tags: perl one-liner svn strip lines count resize date: 03-22-2011 16:37 gmt revision:13 [12] [11] [10] [9] [8] [7] [head]

to remove lines beginning with a question mark (e.g. from subversion)

svn status | perl -nle 'print if !/^\?/'

here's another example, for cleaning up the output of ldd:

ldd kicadocaml.opt | perl -nle 'print $1 if /^(.*?)=>/'

and one for counting the lines of non-blank source code:

cat *.ml | perl -e '$n = 0; while ($k = <STDIN>) {if($k =~ /\w+/){$n++;}} print $n . "\n";'

By that metric, kicadocaml (check it out!), which I wrote in the course of learning Ocaml, has about 7500 lines of code.

Here is one for resizing a number of .jpg files in a directory into a thumb/ subdirectory:

ls -lah | perl -nle 'if( $_ =~ /(\w+)\.jpg/){ `convert $1.jpg -resize 25% thumb/$1.jpg`;}'
or, even simpler:
ls *.JPG | perl -nle '`convert $_ -resize 25% thumb/$_`;'

Note that the -e command line flag tells perl to evaluate the expression, -n causes the expression to be evaluated once per input line from standard input, and -l strips the line ending from each input line and puts a line break after every print statement.

For replacing characters in a file, do something like:

cat something |  perl -nle '$_ =~ s/,/\t/g; print $_'

{464}
ref: notes-0 tags: Blackfin perl loopcounters registers ABI application-binary interface gcc assembly date: 10-19-2007 17:24 gmt revision:2 [1] [0] [head]

The problem: I have an interrupt service routine (ISR) which can interrupt the main, radio-servicing routine at any time. To keep the ISR from corrupting the register values of the main routine while it works, these registers must be pushed to the stack, and later popped. Now, doing this takes time, so I'd prefer to push / pop as few registers as possible. Namely, I don't want to push/pop the hardware loop registers - LC0 (loop counter 0), LB0 (loop bottom 0, where the hardware loop ends) & LT0 (loop top 0, where the hardware loop starts).

Gcc seems to only touch bank 1, never bank 0, so I don't have to save the 3 regs above. However, to make sure, I've written a perl file to examine the assembled code:

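use strict;
use warnings;

my $file = "decompile.asm";
# die (non-zero exit) if the disassembly is missing, so make fails too.
open(my $fh, '<', $file) or die "cannot open $file: $!";
my @lines = <$fh>;
close($fh);
my $i = 0;
my @badregs = ("LC0", "LB0", "LT0");
foreach my $reg (@badregs){
	foreach my $k (@lines){
		if($k =~ /$reg/){
			$i++;
			print "touch register $reg : $k";
		}
	}
}
#tell make if we found problems or not.
exit($i > 0 ? 1 : 0);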

'make' looks at the exit status that perl returns, as instructed via the makefile (relevant portion below):

headstage.ldr:headstage.dxe
	rm -f *.ldr
	$(LDR) -T BF532 -c headstage.ldr $<
	bfin-elf-objdump -d headstage.dxe > decompile.asm
	perl register_check.pl

if it finds assembly which accesses the 'bad' registers, make fails.

{107}
ref: notes-0 tags: SQL kinarm count date: 0-0-2007 0:0 revision:0 [head]

SELECT file, COUNT(file) FROM info2 WHERE unit>1 AND maxinfo/infoshuf > 10 AND analog < 5 GROUP BY file ORDER BY COUNT(file) DESC

to count, per file, the number of rows matching the criteria.. and get aggregate frequentist statistics.
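
To see what the query returns, here is a small Python sketch that runs it against an in-memory SQLite table. The column types, the function name count_per_file and any rows passed in are assumptions for illustration only; the original database and its engine are not specified in these notes.

```python
import sqlite3

QUERY = """
SELECT file, COUNT(file) FROM info2
WHERE unit > 1 AND maxinfo / infoshuf > 10 AND analog < 5
GROUP BY file ORDER BY COUNT(file) DESC
"""


def count_per_file(rows):
    """rows: iterable of (file, unit, maxinfo, infoshuf, analog) tuples."""
    con = sqlite3.connect(":memory:")
    # Assumed schema: only the column names come from the query above.
    con.execute(
        "CREATE TABLE info2 "
        "(file TEXT, unit INTEGER, maxinfo REAL, infoshuf REAL, analog INTEGER)"
    )
    con.executemany("INSERT INTO info2 VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

The result is a list of (file, count) pairs, most frequent file first, where the count is the number of rows in that file that pass all three filters.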