GraphBI: Expanding Analytics to All Data Through the Combination of GenAI, Graph, & Visual Analytics // Paco Nathan & Weidong Yang // #310

Released Tuesday, 29th April 2025

Episode Transcript

Transcripts are displayed as originally observed. Some content, including advertisements, may have changed.

0:00

So I'm Wei. My full name is actually Weidong Yang, but Wei is easier to pronounce. I'm the CEO of Kineviz, a visual data analytics company. And I love coffee. I think civilization starts with the invention of coffee, so I have to drink coffee. I do add milk, because black coffee is a little bit too strong for me.

0:25

Welcome back to another MLOps Community podcast. Today we are lucky enough to have not one but two graph experts who have been doing this for a very long time. I got schooled; I felt like I learned a ton about how to use graphs as tools and ways that we can leverage them better. Let's get into this conversation with Paco and Wei. As always, I'm your host, Demetrios. And you know what is a huge help? If you can hit a little review on whatever you are listening to this on, that would mean the world to me. Boom, let's jump into it.

1:02

And oh yeah, if you are one of those people listening on a podcast player, I have got the music recommendation for you. This is thanks to one of the people in the community, Lee Wells, who just joined. Now, whenever someone joins the community, I ask them what their favorite music is. Today, we're listening to We Are One by Maze.

2:30

We're talking about PII and using different methods to anonymize data, right? And Paco, you had said something that I didn't fully understand, and then Wei, you said something else that I didn't fully understand, so maybe we can rehash that and I can understand it the second time. Awesome.

2:50

Well, I was going to ask if you all ever came across... there's another podcast that I follow called The Dark Money Files, and it's a couple of consultants who have worked in banks and understand a lot of the ins and outs of financial crimes and investigations. I was just going to preface with that, because they've had a great series recently. If you've ever heard of this thing called the SAR, it's a suspicious activity report. And the laws are really weird depending on what country the bank is in. But basically, if you're at a bank and you see some suspicious activity, like there's a money transfer and the counterparty is a known terrorist group or something, you see something weird going on: number one, you have an obligation to report a crime to a criminal investigation unit. If you see something suspicious and you don't report it, that's a crime. If you see something suspicious, you have not an obligation, but a responsibility, to send it up the chain so that other financial houses might share. If you send too much information, you might get sued.

3:57

And so there are these reports, and it usually costs on average about $50,000 to process each report. So you don't want to generate too many of them. And machine learning models could generate thousands per day, which would be tens of millions of dollars of liability. So there's this whole space of: what do I do? I'm getting attacked, what do I do? Because these people are taking money, and under some situations as a bank you might have to compensate if there is some kind of scam. So you could be losing money and facing legal threats from three sides. And meanwhile, there's this thing called a SAR. I've actually been yelled at for asking what I was supposed to integrate with; I asked, can I see what the schema is? No, you're not allowed to, it's too confidential. So it's just this whole tangle of worms about what you actually do once you have evidence of financial crime, or even suspicion of it. What next steps you take are really tangled. And I think, Weidong, you probably have a lot more experience about this in certain theaters too.

5:13

I have some similar experiences where you're not even allowed to see the schema, because the schema may actually reveal secrets, or certain activities may become liabilities to certain parties. So that can be pretty tricky.

5:30

And so it basically gives away information: because you know the schema, you can guess a few other parts of this puzzle and get information that people don't want out there.

5:45

The banks are using a lot of data that comes from providers. There may be other cases where there's data coming from, say, public sector agencies, crime investigations. There may be intelligence reports, and so there may be parts of the schema that are highly sensitive and only certain people are allowed to see.

6:05

But you were saying that with graphs, anonymizing that PII, you're still able to gather insights, right?

6:16

Yeah, that was cool. We were just in a talk, and Brad Corey from NICE Actimize was showing how they're preparing to do RAG. They were using, I think, Bedrock, and they know that they've got a hot potato: they've got a lot of customer PII that just can't go outside the bank. So what they were doing is substituting PII with unique identifiers. They generate tokens on the fly, and then they make the round trip: after they've run three LLMs and made a summary, they replace the tokens with the highly confidential material they keep internally. And so this is a way of being able to use some sort of external AI resources, but still manage a lot of data privacy.
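The round trip described here, swapping PII for opaque tokens before any text leaves the bank and then restoring the originals from an internal mapping, can be sketched roughly like this. The regex, token format, and function names are illustrative assumptions, not NICE Actimize's actual implementation:

```python
import re
import uuid

def tokenize_pii(text, patterns):
    """Replace every PII match with a fresh opaque token; return text + mapping."""
    mapping = {}
    def _swap(match):
        token = f"PII_{uuid.uuid4().hex[:8]}"
        mapping[token] = match.group(0)
        return token
    for pattern in patterns:
        text = re.sub(pattern, _swap, text)
    return text, mapping

def detokenize(text, mapping):
    """The round trip: put the confidential values back into the LLM's output."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

# Illustrative pattern for US-SSN-shaped strings
patterns = [r"\b\d{3}-\d{2}-\d{4}\b"]
safe, mapping = tokenize_pii("SSN 123-45-6789 flagged.", patterns)
# `safe` could now go to an external model; the mapping never leaves the bank.
restored = detokenize(safe, mapping)
```

The key property is that the mapping dictionary stays internal; only the tokenized text and the model's output ever cross the boundary.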

7:06

Yeah, I've seen it with... we had these folks on here from Tonic AI, and they were talking about how they would use basically the same information but swap it out. So if it is someone's name, they just changed the name, so it went from Paco to John. And if it is a social security number, they would swap out the social security number and totally randomize the number, but it still is a social security number. So at the end of the day, you get almost like this double blind. Even if you're a data scientist looking at the information, you can understand it, but you don't know if it is the true information that's going to reveal that PII.
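The substitution described here, replacing a value with a fake one that keeps the same shape, is a different trade-off from reversible tokenization: nothing maps back to the original. A minimal sketch of the idea, with an invented name pool and formats rather than Tonic's actual method:

```python
import random

# Hypothetical stand-in name pool for illustration
FAKE_NAMES = ["John", "Maria", "Ahmed", "Yuki"]

def swap_name(name, rng):
    """Replace a real name with a plausible fake one."""
    return rng.choice(FAKE_NAMES)

def randomize_ssn(ssn, rng):
    """Keep the NNN-NN-NNNN shape, but randomize every digit."""
    return "".join(str(rng.randint(0, 9)) if c.isdigit() else c for c in ssn)

rng = random.Random(42)  # seeded only so the sketch is reproducible
record = {"name": "Paco", "ssn": "123-45-6789"}
masked = {"name": swap_name(record["name"], rng),
          "ssn": randomize_ssn(record["ssn"], rng)}
# `masked` still looks like a person record, but reveals no real PII.
```

A data scientist can still tell "this field is a name, this one is an SSN," which is what keeps the masked data useful for analysis.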

7:55

Interesting, yeah. Although I do see situations where even the structure of the document itself reveals information that you do not want people to know. Like in the investigation space, very often you do not want the people being investigated to know that they're being investigated. But even the structure of the document being revealed can become a problem. So at some point I felt like an in-house, on-prem LLM might be necessary. Especially, I just read news that the M3 Ultra Studio with 500GB of RAM can run large models at 20 tokens per second. That could potentially be an interesting solution for that.

9:00

Yeah, I mean, for our end use cases, something like 60% of those are air-gapped. The largest chunk of that, they're going to be a lot of public sector agencies running in SCIFs, so they can't do any data out. And there's good news for running really interesting LLMs on local hardware, a lot of really good news. I will shout out to my friends over at Useful Sensors, Pete Warden and company; I'll put that in the chat. You can do a lot with local hardware.

9:42

What are they doing?

9:45

Useful Sensors. So Pete Warden and Manjunath Kudlur, they were part of the TensorFlow team at Google. For, I think, eight years, they evangelized the use of deep learning inside of products at Google, internally. And then they left, and the team has a startup in Mountain View now. What they're showing is: hey, here's $50 worth of hardware, here's an ARM chip with a neural network accelerator on it, and we can run LLMs on battery power. It's pretty cool, because they came out of the TinyML world; I don't know if you've ever seen the conference.

10:26

Oh, yeah.

10:29

And so, you know, this is a lot of the specialty that Pete has. He was on the CUDA team at NVIDIA before. So these folks really know how to make AI infrastructure run on hardware, and particularly how to handle a lot of low-power and low-latency kinds of situations, and where to punch through the bottlenecks. You don't necessarily have to have a ginormous GPU cluster, although in some cases it helps. But especially when you're running inference, you can be running on much lower power and doing really interesting things out in the field.

11:10

So wild.

11:12

Now, I know that we had originally wanted to chat a bit about this idea that, Wei, I think you had proposed. It's a little bit of a differentiation on GraphRAG, so maybe you can set the scene for us, because, yeah, I want to go deeper there.

11:39

Yeah, I run the danger of pulling us way far afield. Fundamentally,

11:48

I think with LLMs, the whole way machines process information has changed. Before LLMs, everything was exact, symbolic: exact matching, all the APIs, all the rigid data structures. Just think about Deep Blue when it beat chess; everything was rigid knowledge, as rules and things. LLMs changed everything, because LLMs started to understand things contextually, started to understand fuzzy things. And they suffer the same weaknesses as a human being: not exact. Like us, they glide over information, they draw conclusions, they make leaps, make jumps. But at the same time, the LLM's ability to reason like humans, for me, has fundamentally changed how we approach computing.

12:52

And so, in applying LLMs to analyze documents, my analysis is that now we can let the LLM work more like humans, rather than like the machines we understood in the past. That also implies which data structures are preferred for LLMs. I would argue for a data structure, a data management approach, that preserves as much contextual information as possible, preserves as much nuance as possible, because the subtle nuances may turn out to be important.

13:34

So I use this example: my wife is Brazilian. An American tourist in Brazil gets invited to a house party; they say the party starts at 6 p.m. So, as a good American guy, he shows up promptly on time at 6 p.m., and the hostess comes out still wrapped in a shower towel, totally confused. It turns out that over there, 6 p.m. is when the hostess starts thinking about the party, starts going out shopping, preparing food, and getting ready, and people usually don't show up until two or three hours later. A bit of a culture difference. Yeah, so if we try to capture that in a knowledge graph, what kind of construct allows us to capture those subtle cultural nuances? That might become important in understanding the document later. So I think that's the challenge.
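One minimal way to sketch the kind of construct Wei is asking about is to attach the fuzzy, contextual interpretation as properties on an otherwise exact relationship, so retrieval stays precise while the nuance rides along. Plain dicts stand in here for a property-graph store; the fields and data are invented for illustration:

```python
# Two edges state the same exact fact, (party)-[starts_at]->(18:00),
# but each carries the cultural context in which that fact holds.
edges = [
    {"src": "party", "rel": "starts_at", "dst": "18:00",
     "context": {"locale": "Brazil",
                 "interpretation": "host begins preparing; guests arrive two or three hours later"}},
    {"src": "party", "rel": "starts_at", "dst": "18:00",
     "context": {"locale": "US",
                 "interpretation": "guests expected promptly at the stated time"}},
]

def expected_arrival(edges, locale):
    """Retrieval stays exact, but the fuzzy context comes back with the fact."""
    for e in edges:
        if e["rel"] == "starts_at" and e["context"]["locale"] == locale:
            return e["context"]["interpretation"]
    return None
```

The design choice is simply to never store the bare fact alone: the "6 p.m." node is exact, while the nuance lives as edge properties that an LLM can read downstream.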

14:37

Paco, you want to add something there? Let's hear what you think.

14:42

Well, from the perspective of natural language, something the models bring in, and it's kind of a nuance that I don't think is talked about a lot: there's a very recursive nature to how we as people talk with each other, tell stories, and share information. We do reference it in the sense of going down the rabbit hole; if you follow a thread too far, you're kind of going down the rabbit hole. There's this very recursive nature to how we think, and especially how we express. It certainly comes across in written language, although we tend to think of written language as something linear: there are paragraphs and sentences, and it can all be diagrammed. But when you look at the actual references inside those sentences, they're making recursive calls throughout a story, throughout somebody's speech, or throughout a book. We can try to linearize that and come up with an index or a bibliography, but at the end of the day, it's a graph. You get this very self-referential thing in any text, and this is something that the LLMs have really, I think, pulled out.

15:48

And we were also just part of a talk where Tom Smoker from WhyHow was showing how they leverage ontology, leverage schema, and chase after information recursively. So that's just another kind of view on this. But, Wei, I love how you all are approaching this. You have a very powerful view of relaxing the constraints upfront, but then having the context propagated through.

16:21

I realized there's an important philosophical difference in approach between the East and the West. Western philosophy very much drives toward the nature of things, and that's important: that curiosity about the nature of things, the desire to have a definitive definition of the nature of something, led to the great scientific discoveries over the past several hundred years. Eastern philosophy, on the other side, is focused on the contextual, on the shifting, changing nature of things. Like the Chinese bible of Daoism, the Dao De Jing: the first verse says "Dao ke dao, fei chang dao," which means if you name something, you get it wrong, or it's not permanent. It's really focused on the impermanence of things. It focuses on how everything changes nature in context with other things. So that is essentially a graph.

17:19

Now, you're putting both things together. So, okay, I have to say that that attitude of "everything changes, thus we cannot say anything, thus everything is fuzzy" very much contributed to Chinese science and technology developing very far until about a thousand years ago, and then stalling. A lot of that is attributed to this philosophical stance, which reduced a lot of the curiosity and the drive down deeper into the nature of things. However, there are practical applications of that approach, and today, with LLMs and graphs, we really see a great combination: you allow certain things to be drilled down, to be very definitively defined, clearly defined within the context, but a lot of the information, the contextual information, stays fuzzy.

18:19

So, in fact, I feel really excited about integrating sensing and our graph kind of solution together, because the sensing helps to drive this definitive part. Once you have the definitive part drilled down, named, defined, it really speeds things up: you can make a lot of assessments fast, definitive, and precise, which is crucially important. But on the other hand, you allow this loose structure of information, decomposed as a graph, that you can easily retrieve without losing the nuances, the subtleties, like in the cultural differences; you still preserve that. So the two things come together. My feeling is that this is how you want to give the LLM a protocol: to be precise and accurate, and to know its limits, to know when it does not know and not make a judgment. I think that's also very, very important. So in my mind, graph and AI right now present an opportunity to allow this Western way of driving to the nature of things and the Eastern way of focusing on contextual information to come together, to work together, to solve practical problems.

19:37

So, very well said. And, you know, the challenge we face is that we don't really know what the downstream application will be. We're doing investigation, we're doing some kind of discovery, whether you're trying to find money launderers or whether you're trying to find who's my best customer for this hotel. It's a discovery process, and by the nature of discovery, you don't know what the answers are. In fact, in a complex system, you don't even know where or how; it's unknown unknowns, right? So by preserving that context, you are sort of fortifying yourself, so that when the time presents itself, you'll be able to make the right discoveries. You won't have cut them off in advance.

20:24

I think if you go back to before relational databases came out, you go back to some of the earlier writings from Ted Codd, and one of his colleagues was William Kent, who did a book called Data and Reality. If you go back to some of the early, like 1970s, thinking about data management, it's really interesting to see where the lines are drawn. Because in this Western view, so much of data management was about: let's have a data warehouse, let's pretty much throw away the relationships, let's focus on the facts. We had, as we were saying, a very Western view of, I just want to know millions of facts, and I will piece them together with a query; I'm not really interested in preserving the context. So I think we have a long history, from data warehousing, of going too far on the Western side.

21:18

Well, what is interesting to me is the conversation that we had with Robert Caulk on here, probably three months ago, and how he said, we've completely thrown out ontologies. For his specific use case, that isn't the way they wanted to go. And I wonder if you guys have thought through that: what that looks like, what the benefits are, and whether it's one of these things where you potentially are experimenting on those levels too?

21:51

In my perspective, ontology is important, but you have to know the boundaries. I'll give a parallel to theories in physics, like Newton's laws. Newton's laws are important; they capture important truths about nature. However, as with any physicist's theory, the moment the theory is proposed, a very important concept is that it's waiting to be disproved. You never accept it as the truth of everything. You have a theory; Paco was a math scientist, so I think he's also very familiar with the concept. When you propose a theory, it may test true, but you're always looking for situations, looking for the boundaries, where the theory will stop being true. I don't think ontology is any different. An ontology needs to be very well grounded: the context needs to be defined, and within this context, this ontological knowledge is real, it's truth. The problem I see with a lot of traditional knowledge graph approaches is that people ignore the fact that an ontology has to be confined within a specific domain. The moment you step out of the domain, you have a problem. But the other thing is, we think this domain ontology is fantastic. It helps you to solve problems so much faster, so much more precisely. But again, as long as you can define the boundaries, define the domains, it's great.
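The point about boundaries can be made concrete: treat each ontology as valid only inside a declared domain, and refuse to evaluate a relation outside it. A toy sketch with invented schemas, echoing the "same word, different meaning" problem that comes up later in the conversation:

```python
# Each domain declares its own ontology: type -> set of valid relations.
# The schemas below are invented purely for illustration.
ONTOLOGIES = {
    "sales":       {"Customer": {"buys"}},
    "procurement": {"Customer": {"supplies"}},  # same word, different meaning
}

def valid_in_domain(domain, subject_type, relation):
    """A relation only counts as 'true' within the domain that defines it."""
    schema = ONTOLOGIES.get(domain, {})
    return relation in schema.get(subject_type, set())

valid_in_domain("sales", "Customer", "buys")        # valid inside its domain
valid_in_domain("procurement", "Customer", "buys")  # invalid once you step out
```

Stepping out of the domain does not make the fact false so much as undefined, which is exactly the boundary failure described above.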

23:35

What Robert Caulk and Elin Törnquist and others at AskNews are doing is looking at news sources, especially regional news sources across the world, and they really are finding hard evidence, groundbreaking evidence on the ground, literally. If you're doing ESG work and you're trying to do diligence on a company or a set of suppliers, and you want to find out what their operations are really like over in that other country where they're based, and then you find out they're engaged in, I don't know, child labor or something, you want to make other arrangements before your shareholders find out. So I think with AskNews, they're out there looking, they're working with those publishers, and they're collecting that news and representing it in a graph.

24:27

And yeah, as you were saying, ontologies really don't work across domains. You really want to focus more on a closed world within a domain. Having a full enterprise-wide ontology is a nice idea, but I rarely see it work. And I think in the case of understanding news reports in the world, you don't know what the domain is in advance; you only know what is being published. So I think by relaxing that constraint at AskNews, they're able to come up with a graph of: here are things that are related, you can follow this evidence, and you can find more historically about this area. I think those are very important, but ultimately it will be shaped by some kind of context, some type of shared definitions. And ontology is really more about sharing definitions and making sure we're describing the same thing. Because, I swear, you go to a big company and use the word "customer" in front of one VP in sales, and it means something different to the VP in charge of procurement. So even the words themselves don't cross domains.

25:38

The graph is basically our idea that we know there are connections. If you do have your operations data, but then you also have your sales data, there are some connections across there; it's not exactly the same, but some stuff is connecting, so graphs show where those connections are. But think about the example of Google Maps: there are different levels of detail, and of course any video game has this too. If you're taking satellite data and trying to stitch together a map, you zoom in and you can see the beach, you zoom in and you see the car tracks, and if you zoom in further, at some point you're going to get to pixels, right? And you zoom out and maybe you see this landscape of a beach next to the ocean, but then probably you zoom out at some level and they've got the name of the beach. So there's a high-level view and a detailed view. I think graphs are much the same. There are connections at the low level, like AskNews is saying: here's reporting from Zimbabwe, these are the reporters on the ground. But then you zoom out and you're like, okay, well, what impact does this have on our supply network? Do we have to make different plans? Is there going to be a war breaking out that causes all those shipping containers to be delayed by three months?

26:56

I think at some level you need to think of graphs as collecting higher and higher into more abstracted, more refined concepts, if you will. The stuff at the low level is kind of like, let's see how it all fits together. The stuff at a higher level is like, oh, actually, we can maybe do some inference on this, or we can use this to help structure other data that we're going to piece together.
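The zoom-level idea sketched above maps to graph coarsening: group low-level nodes into supernodes and aggregate the edges between groups. A small sketch, with invented nodes and an invented grouping, loosely echoing the Zimbabwe-to-supply-network example:

```python
from collections import defaultdict

# Low-level graph: individual reports and events (invented for illustration).
edges = [("zimbabwe_report_1", "port_delay"),
         ("zimbabwe_report_2", "port_delay"),
         ("port_delay", "supplier_x")]

# The "zoom out" assignment: each low-level node belongs to a supernode.
group = {"zimbabwe_report_1": "Zimbabwe", "zimbabwe_report_2": "Zimbabwe",
         "port_delay": "Logistics", "supplier_x": "Suppliers"}

def coarsen(edges, group):
    """Merge nodes by group, counting how many low-level edges each super-edge absorbs."""
    super_edges = defaultdict(int)
    for u, v in edges:
        gu, gv = group[u], group[v]
        if gu != gv:               # intra-group detail disappears at this zoom level
            super_edges[(gu, gv)] += 1
    return dict(super_edges)

coarsen(edges, group)
# {('Zimbabwe', 'Logistics'): 2, ('Logistics', 'Suppliers'): 1}
```

The edge counts preserve a trace of the low-level evidence, so zooming out abstracts the detail without discarding it entirely.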

27:21

So, Demetrios, you actually touched on a really big subject there. Now, the exploratory process is combined with the questions. Knowing what question to ask is often 80 to 90 percent of the work. So a prescribed thing that gives you the answer often misses the point, or misses the important subtleties. But the problem is, how do you discover the question you need to ask? Our perception, our visual perception, our brain, is a fantastic... I don't want to call it a machine, I don't even want to call it a tool, but it has this great power of seeing patterns in information. Like, we look out at the sky, we see the clouds, and we have some kind of... Like, you are a performer: I look at your performance, your dance, and the information is being expressed without my being able to verbalize it, to define it, but I have to watch it to feel it. Maybe if you watch it long enough, you start to be able to describe it, you start to be able to say, oh, something is there. So, in a way, what the graph does... the graph is a fantastic medium for visualization. You look at the information it expresses, just like, when I think about you, Demetrios, I immediately think about Paco, because we were in the same breakout room together; that's association. So this association of multiple pieces of information, entities in a space, if you visualize it effectively, helps you to see the patterns, helps you to see the missing links, missing patterns, things that get our attention. And then we start to be able to formulate the question, to formulate, to answer the question.

29:30

More than a tabular data structure, I have to say, the graph really helps us to engage our brain in this way, to spot important information. Just go watch a dance performance: you see something definitive happening, but you know it before you engage your language or logical thinking. Afterwards, things, concepts start to form, and then you can start to build things around it.

30:06

Oh, dude. How cool is that? You know it before you can express it in that way.

30:12

think a lot of analytics workflow

30:14

is work the other way around. We

30:16

focus so much on building up

30:18

the queries, build up

30:20

the programs to

30:23

drive it. to

30:25

drive the answer.

30:29

But as Parkoon and we

30:31

in the investigative space,

30:33

we all know that too

30:35

often getting the hint

30:37

is 80 % of work. Like

30:41

if you know that you're being

30:43

attacked, you know that they came in

30:45

through some vector, there's probably some

30:47

set of machines that are compromised. You're

30:50

not seeing that. You're seeing where you

30:52

know, the bad things are happening, stuff

30:54

is being stolen or whatever. So

30:57

looking across your network, just building up a

30:59

graph of like the associations of what's happening

31:01

during an attack, there's some placeholders. There are

31:03

definite questions that could be generated like, which

31:05

machine was compromised? Maybe I should fix that.
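The investigative move described here, expanding outward from the hosts where bad behavior was observed, can be sketched as a graph traversal. This is a minimal illustration in Python; the host names and events are hypothetical, not from the episode:

```python
from collections import deque

# Hypothetical associations observed during an incident: (src, dst) connections.
events = [
    ("laptop-7", "fileserver"), ("laptop-7", "mail"),
    ("fileserver", "db-1"), ("db-1", "exfil-host"),
    ("printer", "mail"),
]

# Build an undirected association graph as an adjacency dict.
graph = {}
for a, b in events:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def neighborhood(graph, seeds, hops):
    """Hosts within `hops` of the known-bad seeds: candidates to inspect."""
    seen, frontier = set(seeds), deque((s, 0) for s in seeds)
    while frontier:
        node, dist = frontier.popleft()
        if dist == hops:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen

# We only *see* the exfiltration; the graph suggests where to look next.
candidates = neighborhood(graph, ["exfil-host"], hops=2)
```

Here the graph generates the definite questions: `db-1` and `fileserver` surface as machines worth checking even though nothing bad was observed on them directly.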

31:07

So I think from the operational perspective, you

31:09

know, I mean, you kind of have to

31:11

think of, I mean, we do think about

31:13

that, right? We do think about like, how

31:16

do we identify those unknowns? But the

31:18

problem is that the more complex

31:20

the problem becomes the more that

31:22

those unknowns are not something that

31:24

can really be charted. They have

31:26

to be sort of poked at

31:28

and explored. Yeah, and I think

31:30

that's why, Wei, what you're saying

31:33

with the graph being this visual

31:35

medium that we can poke at

31:37

and we can explore, and it

31:39

gives us a different perspective with

31:41

which we can work and

31:43

wrestle with the data, is

31:45

something that I hadn't heard before,

31:47

but it makes complete sense. From

31:50

a historical perspective, in terms of

31:52

data, you know, something to bring

31:54

up would be to consider spreadsheets,

31:56

because like spreadsheets are sort of my

31:58

go-to example. This is all in

32:00

tabular form. It's very, very sort of,

32:03

you know, left brain. Everything is

32:05

very buttoned down. But the thing about spreadsheets

32:07

that you never see is there is a really

32:09

complex graph behind it, and it only works

32:11

because of that. But they never

32:13

show that. They just show the tabular

32:15

part. But all the real knowledge and

32:17

dynamics and all the real information you're

32:19

capturing in a spreadsheet is about those different

32:21

dependencies and how that graph functions. Classic.
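The hidden dependency graph behind a spreadsheet can be made concrete with a toy example: each cell's formula depends on other cells, and recalculation only works because there is a dependency DAG underneath. A sketch, with made-up cells:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# A toy spreadsheet: cell -> (formula, cells it depends on).
# The tabular view shows only values; the dependency graph makes it work.
cells = {
    "A1": (lambda v: 2, []),
    "A2": (lambda v: 3, []),
    "B1": (lambda v: v["A1"] + v["A2"], ["A1", "A2"]),
    "C1": (lambda v: v["B1"] * 10, ["B1"]),
}

# The hidden graph: cell -> its predecessor cells.
deps = {name: set(d) for name, (_, d) in cells.items()}

# Recalculation must follow a topological order of that graph.
order = list(TopologicalSorter(deps).static_order())

values = {}
for name in order:
    formula, _ = cells[name]
    values[name] = formula(values)
```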

32:25

Of course we don't see it, because that

32:27

would be absolute chaos for us. Mind

32:29

blown. The graph is

32:31

this first-class medium for

32:33

this perceptive thinking. Well,

32:36

the challenge is like, when

32:38

we talk about graph, I think

32:40

that we need to really,

32:42

really separate two things.

32:45

Graph as the medium of information

32:47

capture, and graph as the

32:49

medium to help us

32:51

think. They are two different

32:53

things. Graph as information capture, the

32:56

sole purpose is to capture

32:58

information as precisely as possible,

33:00

as completely as possible. You

33:02

want to capture as much

33:04

truth as possible. However,

33:07

graph as a way of thinking,

33:10

If you take the raw

33:12

graph as captured, it preserves a

33:14

lot of truth. Well, the

33:16

problem is we can only

33:18

hold seven pieces of information in our

33:20

brain at any given moment.

33:23

We'll be overwhelmed by all those

33:25

graphs. If we think about

33:27

our brain, in that

33:29

way, even the vector

33:31

embedding, I call it an implicit

33:33

graph, because vector embedding gave

33:35

you a medium to compute the

33:37

similarity. Effectively, you

33:40

can construct a graph. Exactly.
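The "implicit graph" idea can be sketched directly: compute pairwise similarity between embeddings, then connect pairs above some threshold to manifest explicit edges. Toy vectors and an arbitrary threshold, purely for illustration:

```python
import math

# Toy embeddings (in practice these come from an embedding model).
vecs = {
    "coffee":   [0.9, 0.1, 0.0],
    "espresso": [0.85, 0.2, 0.0],
    "graph":    [0.0, 0.1, 0.95],
    "network":  [0.05, 0.0, 0.9],
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Manifest the implicit graph: an edge for every sufficiently similar pair.
threshold = 0.8
edges = {
    (a, b)
    for a in vecs for b in vecs
    if a < b and cosine(vecs[a], vecs[b]) >= threshold
}
```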

33:43

You can manifest a graph

33:45

out of it. So

33:48

you will see that the

33:50

graph being captured at the layer,

33:52

at the stage that's really

33:54

designed to preserve the ground truth,

33:57

as much truth as possible. But

34:00

then you need a way

34:02

to work the data into

34:04

a form that we can

34:06

easily digest with our perceptive

34:08

power. That is a challenge.

34:10

This is also why, in

34:12

my mind, there is a lot of

34:14

graphs. In theory, people

34:16

know the graph is how we

34:18

think. Thus, it's important.

34:21

But in practice, that is

34:23

a barrier. And how do

34:25

you reconcile the need between

34:27

graphs as information capture medium

34:30

and the graph to support

34:32

our perceptive thinking medium? It's

34:34

a very different thing. Just

34:39

going back to what you

34:41

were saying with, we can relate to

34:46

each other because we're on

34:46

this podcast together. We've done stuff

34:48

together. Maybe there's certain things

34:50

that come up in our memories

34:52

that are going to be

34:54

the most pertinent to that graph

34:56

that we have in our

34:58

head, but it's never going to

35:01

expand more than seven hops

35:03

or seven different parts of that

35:05

graph. Have you

35:07

ever worked with... there's

35:09

like a kind of, I guess,

35:11

rubric might be a way to

35:14

say it, that came out of Carnegie

35:16

Mellon, out of CMU. Jeanette Wing

35:18

had this idea of what's called

35:20

computational thinking. And so it's

35:22

sort of like a four-step process

35:24

of like breaking down a problem

35:26

and then abstracting back out. It's

35:29

really powerful, and I've used it a

35:31

lot in courses teaching people. But I

35:33

think that there there may be

35:35

something Kind of emerging as

35:37

like graph thinking and so just

35:39

to throw out like a straw

35:41

man here This is kind of

35:43

thinking out loud, but one of

35:46

the things that we see in

35:48

like fin crime in financial investigations

35:50

is a kind of graph thinking

35:52

a four-step process repeated over

35:54

and over where you know,

35:56

you do your best to build

35:58

out this graph and it might have hundreds

36:00

of millions of nodes or billions of

36:02

nodes or some ginormous number, something beyond human

36:04

scale, beyond human comprehension. But

36:07

then step two, partition. So

36:10

like, can we break out this

36:12

enormous graph into some areas of

36:14

subgraphs of patterns that are interesting?

36:16

Like, hey, this this looks like

36:19

a really good customer or hey,

36:21

this looks like a money mule.

36:24

you know, fraud scheme. And

36:27

so you go, you do this dimensionality

36:29

reduction then, because you go from like five

36:31

billion nodes in a graph down to

36:33

maybe 10 or 20 that are interesting. And

36:36

so that's like, there are graph algorithms

36:38

like Louvain or, like, you know, weakly

36:40

connected components, or there are different ways

36:42

to get down to that scale. And

36:45

in, like, in machine learning in general,

36:47

we're looking at a lot of dimensionality reduction,

36:49

right? So, Once

36:51

you've got down to that scale now

36:53

you can use other graph algorithms like

36:55

maybe betweenness centrality or different forms

36:57

of centrality to understand how are these

36:59

parts connected. And gosh, maybe there's like

37:01

one node in there who's orchestrating the

37:03

whole crime ring, which in a typical case

37:06

might be like a person with a

37:08

bunch of shell companies, right? And they're

37:10

doing fraud So that's step three is

37:12

like leveraging certain types of graph algorithms

37:14

to, sort of, think of PageRank:

37:16

let's bubble up to the top the

37:18

parts that are probably first

37:20

good steps to investigate. And

37:23

then step four, put

37:25

it through a work process.

37:27

And I mean, if you're working with people

37:29

in a bank, put it through case management

37:31

tools, you know, a level

37:33

A analyst gets assigned it, they go

37:36

and they start poking around the graph, they

37:38

do something interactive, they work with the

37:40

visualization, and they apply what they've learned. Or

37:43

you may have some agents involved

37:45

there too, to help, like, summarize

37:47

and dig up parts,

37:49

but it's a workflow. So it's

37:51

kind of a four-step process

37:53

of sort of graph thinking if

37:55

you will that can be applied

37:57

and can integrate people and also

37:59

AI technology together Yeah, I want

38:02

to add one more thing to what

38:04

Paco said. It's really, really important

38:06

to be able to narrow it

38:08

down, to be able to identify

38:10

things, to reduce, reduce, reduce. But

38:12

there's also another aspect

38:14

which is simplification and

38:17

abstraction. Like, very

38:19

often when you capture the data, you

38:21

don't really know the domain, or

38:23

you don't yet know the future

38:25

question. So the domain is

38:27

wide. But when we look for

38:29

the information, the domain

38:31

is narrowed. When the domain is narrowed,

38:33

for example, like I call Paco

38:35

a mad scientist; at some

38:37

point I can just refer to Paco

38:40

as "the mad scientist." I don't

38:42

need to add information, because the mad

38:44

scientist is Paco. And

38:46

that's only valid in the

38:48

specific domain. So

38:50

the reason I say that is

38:52

because there's a lot of information

38:54

when the domain is wide. Like

38:57

I said, when

38:59

you capture information, I

39:01

prefer a pure-edge

39:03

approach. Like, in the

39:05

graph, an edge has no

39:07

properties. It's just an edge, it's

39:09

just association. Anything where you need a

39:11

property means it's something you

39:13

may need to amend

39:15

later on; maybe you have

39:17

something pointing to it or pointing

39:19

out of it, so you keep

39:21

it as a node. Now, as

39:23

you're thinking, very often, like, "I

39:26

know Paco." "I know Paco,"

39:28

this relationship, can carry a

39:30

lot of context in it already.

39:32

I don't need additional information

39:34

to show, to tell how

39:36

I know Paco; it just can

39:38

be in there. "I know Paco"

39:41

itself is sufficient. So

39:43

what that means is when

39:45

we present, like, "I know Paco,"

39:47

that relationship, as a single

39:49

relationship, right? In

39:51

the data layer, there might be

39:53

tons, thousands or tens

39:56

of thousands of pieces of information there,

39:58

but it comes out as

40:00

one single piece of concise

40:02

information. I think

40:04

that is where I

40:06

think an analytic workflow

40:08

or visual analytic workflow

40:10

should be, is to

40:12

be able to go

40:14

from a very detailed,

40:16

broad, big, large information,

40:18

distill or aggregate down

40:20

to a simple representation,

40:22

but is grounded in

40:24

that particular domain, in

40:26

that particular context, so

40:29

for us to, so we can

40:31

communicate. We can communicate in

40:33

simple language rather than carry a

40:35

lot more information than we

40:37

have to. I know

40:39

Paco, that's it. We

40:41

don't need to know how we know each

40:43

other, where do we know each other

40:45

in certain contexts. Is

40:47

it almost like the data

40:49

underneath is like an

40:51

iceberg in a way and

40:53

you knowing Paco is

40:55

like the tip of the

40:57

iceberg. You have that one piece

41:00

of information, but then if you

41:02

wanted to get more granular you

41:04

can go down and see the

41:06

whole iceberg. Yes. Wei, could we,

41:09

could we say then that, you

41:11

know, we pull everything. We connect

41:13

everything together. It's very noisy. We

41:15

can go up different levels of

41:17

abstraction. But to your point then,

41:19

we're going up levels of abstraction

41:21

in particular domains, like for purpose.

41:24

So we have some shared definitions.

41:27

And then we can start to

41:29

say, OK, now let's do our

41:31

Louvain partitioning or whatever. Then we

41:33

start to drill down into subgraphs.

41:35

It's like maybe a five-step process.
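That multi-step loop (build a big graph, partition it, rank within partitions, hand leads to a workflow) might be sketched like this. Connected components stand in for Louvain, and plain degree stands in for betweenness or PageRank; the transaction data is entirely made up:

```python
from collections import defaultdict

# Step 1: build the graph from hypothetical transaction edges.
edges = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),                    # ordinary cluster
    ("hub", "s1"), ("hub", "s2"), ("hub", "s3"), ("hub", "s4"),  # suspicious star
]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Step 2: partition into subgraphs (connected components here;
# Louvain communities in a real pipeline).
def components(adj):
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            n = stack.pop()
            if n in part:
                continue
            part.add(n)
            stack.extend(adj[n] - part)
        seen |= part
        parts.append(part)
    return parts

# Step 3: within each partition, bubble up the most central node
# (degree here; betweenness centrality or PageRank in practice).
def top_node(part):
    return max(part, key=lambda n: len(adj[n]))

# Step 4: these leads would feed a case-management workflow.
leads = [top_node(p) for p in components(adj)]
```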

41:37

Yeah, even with Louvain

41:40

community calculation or any

41:42

centrality calculation, the graph

41:44

has to be simple.

41:46

Because very often, I

41:49

think the graph we

41:51

talk about is I

41:53

call it the multi-domain

41:57

graph. It has

41:57

different type of information

42:00

in one graph. So

42:02

computing a centrality

42:05

in that kind of a

42:07

hypergraph, as a hypergraph,

42:09

is very challenging. Or what

42:11

does it mean as

42:14

a result if you mix

42:16

humans and emails?

42:18

It's difficult. So that

42:20

process itself, to me,

42:22

is that we already need to

42:24

prepare, to transform, our

42:26

graph data into a form

42:28

that is suitable for

42:31

that centrality computation. Very often

42:33

like you have to

42:35

already project into a specific

42:37

domain for that computation

42:39

to happen. Very

42:41

good. That's what

42:43

I was thinking is like the

42:46

data that you have only becomes

42:48

relevant once you've narrowed it down

42:50

in a certain way and you're

42:52

looking at a certain plane of

42:54

that domain and you say, okay,

42:56

now we're going to be focusing

42:59

in on this plane. That's

43:01

when certain nodes

43:04

and certain data and certain

43:06

connections become relevant because

43:08

you're looking at that layer

43:10

almost in my head

43:12

if I visualize it. And

43:14

we're talking about that

43:16

Google Maps example again, you're

43:18

diving deeper and deeper

43:20

and you see different structures

43:23

depending on the layer

43:25

that you're looking at. And

43:29

and this fits very well with,

43:31

like, data mesh kinds of concepts,

43:33

you know, Zhamak Dehghani talking about

43:36

how different domains share. You have to

43:38

abstract, you have to come up

43:40

with the relations. I think Chad also

43:42

has the idea of, like, data contracts,

43:44

you know where you have relations across

43:46

domains So you share some definitions

43:48

you have to you have to condense

43:50

down to that level before you

43:53

can go across domain so Yeah,

43:55

if we use the domains in

43:57

an organization to kind of guide when

43:59

and where and how do we

44:01

condense down, then we can

44:03

really take advantage of this

44:05

kind of abstraction. But it's

44:07

almost like I realized after

44:09

I said it, there's

44:11

two vectors or there's two

44:14

dimensions that you are

44:16

looking at when you are

44:18

zooming in or zooming

44:20

out because you're playing on

44:22

the field of

44:24

granularity, but you're also playing

44:26

on the field of the domain

44:28

and what is relevant in

44:30

that domain. So if we have

44:33

that X and Y axis,

44:35

you can get more granular inside

44:37

of the domain, but then

44:39

you can also just go on

44:41

the X axis and change

44:43

domains. And so that, like a

44:45

kaleidoscope, when you turn it,

44:47

you see a whole different set

44:50

of relations. Yeah,

44:54

and I mean in an enterprise context

44:56

this gets really bizarre, because, you know,

44:58

the people in the domains that

45:00

you depend on may not even know

45:02

that you're out there You know, you

45:04

may be consuming from some log files

45:06

from another application that are like totally

45:08

driving your product So like can we

45:10

have some sort of contract so that

45:12

we know about each other? But

45:15

yeah, scooting across the domains,

45:17

that's the key

45:19

challenge to, like, leveraging these

45:21

kinds of technologies because usually

45:24

You are in a particular domain when

45:26

you're making those decisions, but for most

45:28

applications you have to combine a couple

45:30

domains, right? So it's usually

45:33

like there's something interesting going

45:35

on between, like, sales and

45:37

procurement, or sales and

45:39

marketing, or, you know,

45:41

some other business unit. So

45:43

oftentimes you will have

45:45

to combine. And do you

45:47

then try and create two

45:51

different graphs that are connected to each

45:53

other, or is it one larger graph?

45:55

How do you look at it in

45:57

that regard? Federation

45:59

sounds good. I think trying to

46:01

have one ginormous graph is usually...

46:03

weird. And those projects

46:06

usually don't ever end. But

46:08

federating and being able to go across

46:10

domains and say, okay, over there, let me,

46:12

let me send you something. I'd

46:15

like to know what you can,

46:17

what results can you bring back? So

46:19

are you making a prompt in

46:21

GraphRAG across a different domain? Are you

46:23

making a query running some algorithm,

46:25

whatever? There's some kind of information transfer,

46:27

but federation. Yeah,

46:30

I can talk

46:32

about a couple my

46:34

personal experience. First,

46:38

bringing information into a graph is

46:40

a step forward, a

46:42

step up. Because

46:44

information in a tabular format

46:46

needs to be confined

46:48

to very specific definitions,

46:50

a pretty narrow domain. Graph

46:53

is... here's one example. I

46:55

looked at the US flight

46:57

records. You can download them

46:59

from the Department of Transportation. They

47:01

release them every two weeks, after the fact. The

47:04

damn thing has 140

47:06

columns, I think. Really,

47:09

really wide. And

47:11

the reason is because the

47:13

flight may get diverted. Whenever

47:16

the flight gets diverted, you

47:18

add about 10, 15 columns

47:20

of information. So then

47:22

you need to capture that the flight

47:25

may be diverted more than once. twice

47:28

is that enough? No, three

47:30

times. Three is not no,

47:32

some is four times. So they

47:34

actually have five diversions. But

47:36

if you have six times too

47:39

bad, it cannot exist. So

47:41

that's the limits of

47:43

tabular format in the

47:45

information capture. With

47:47

the graph, it relaxes a lot. Naturally,

47:50

you can have a thousand diversions.

47:52

I don't care. You

47:54

can just, like, the graph

47:56

can keep appending into

47:58

it. So that is really,

48:00

really a big improvement with

48:02

the graph to allow you

48:04

to have a lot more

48:06

flexibility in capturing the information. And

48:09

the other thing is like

48:11

very often in the tabular

48:13

format, it's very difficult to

48:16

check the mismatch. We

48:18

have example of bringing

48:20

two dataset manager from two

48:22

or three different departments

48:24

in the same organizations. Everybody

48:27

know other person's data

48:29

has a problem, but

48:31

you can't force other people to

48:33

fix it. But with the

48:35

graph, when you bring things together,

48:37

you immediately see the mismatches. And

48:39

that, so we have one example

48:41

of a company that spent a couple of

48:43

years and could not reconcile the

48:45

data. But once they brought the

48:47

data into the graph, they started

48:49

to see the mismatches; in one

48:51

month, they fixed the data problem.

48:54

But they start to see the

48:57

mismatch because of the dependencies? Because

49:00

now, let's see,

49:02

you think the records are

49:04

unique, right? But

49:06

then when you link the

49:08

other records together, you start to

49:10

see, oh, this record is actually

49:12

duplicated in other systems that recorded it

49:15

differently. Somebody made a mistake there.

49:17

Yeah. We see that a lot for

49:19

entity resolution work. You think, like,

49:21

a social security number should be unique. But

49:24

then you're bringing in data from some other

49:26

sources. And there was an

49:28

application where maybe early on the product manager

49:30

said, yeah, we need to collect the

49:32

social security number. And then later on they

49:34

said, oh no, we can't do that.

49:36

Just put in a dummy number. And

49:39

so now you've got like this data

49:41

set that has, you know, 5 ,000 instances

49:43

of the same social security number. So once

49:45

you start to put in a graph,

49:47

you're like, wait, isn't that supposed to be

49:49

unique? How come there's like this enormous

49:51

node with like all these things connected to

49:53

it? Something's wrong. So

49:56

it's really also a great way

49:59

to figure out data quality issues.
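The duplicate-identifier pattern described here, a "unique" key that becomes one enormous node, falls out of simply measuring node degree. A sketch with fabricated placeholder data:

```python
from collections import Counter

# Hypothetical customer records keyed by a supposedly unique SSN field;
# an old application stuffed a dummy value into thousands of rows.
records = [
    ("alice", "111-11-1111"),
    ("bob", "222-22-2222"),
    *[(f"user{i}", "000-00-0000") for i in range(5000)],
]

# In graph terms each SSN is a node; its degree is how many people link to it.
degree = Counter(ssn for _, ssn in records)

# A "unique" identifier with degree > 1 is a data-quality red flag.
suspicious = [ssn for ssn, d in degree.items() if d > 1]
```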

50:01

Yeah. Although there's security.

50:04

I mean, going back to what we were

50:06

talking about before, if you are looking

50:08

in financial investigations, if you're looking in sort

50:10

of criminal investigation, okay, maybe

50:12

you've got some open data, like

50:14

here's, you know, sanctioned shell companies

50:16

or whatever. And then

50:19

maybe you've got some private information

50:21

like customers, but maybe you've also got

50:23

some feeds of like, oh yeah,

50:25

here's an active investigation. We're looking at

50:27

these people. But then

50:29

these particular people, they

50:31

have, you know, immunity

50:34

because they're diplomats. So

50:36

like there's all these different levels of

50:38

security and you start to pull it all

50:40

together in a graph. You get a

50:42

very comprehensive view. Maybe not everybody

50:45

can even see that. Like you don't,

50:47

you know, you don't want the police

50:49

officers who are doing parking tickets to

50:51

know that, you know, XYZ diplomat might

50:53

be investigated for a crime. Like that

50:55

information should not go out. So

50:59

where do you draw the line? Because

51:01

the graph really brings it all together.

51:03

But then how do you handle security

51:05

issues? The

51:07

access control with the

51:10

graph is automatically harder than

51:12

the tabular, the traditional

51:14

database. Well, it feels

51:16

like one of these, what

51:18

you were talking about, with

51:21

the ways that you visualize

51:23

it, you can

51:25

almost create different

51:27

access controls on

51:29

the visualizations. So

51:31

I don't know if you've thought through that.

51:33

in a way, but is that kind of

51:35

how you go about it? So

51:37

fundamentally, access control needs to be

51:39

in the data management layer. Like

51:43

if the database can

51:45

support access control, you're

51:47

great. We

51:49

actually, however, run into situations

51:51

where the database does not have

51:53

sufficient access

51:55

control to support business needs. So

51:57

in that situation, we actually

51:59

have to implement a filter layer

52:01

in the data access. When

52:04

we pull the data from the

52:06

database, depending on

52:08

the roles and teams,

52:10

we actually

52:12

prohibit certain information from

52:14

being accessed. But

52:16

that's not a fundamental solution.

52:18

Fundamental solution has to be

52:21

in the data management layer.
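The filter-layer workaround Wei describes, applied in the data-access path when the database itself lacks fine-grained access control, might look roughly like this. The role names and sensitivity labels are hypothetical:

```python
# Hypothetical nodes with a sensitivity label attached at ingestion time.
nodes = [
    {"id": "case-42",    "level": "restricted"},
    {"id": "shell-co-7", "level": "public"},
    {"id": "diplomat-x", "level": "restricted"},
]
edges = [("case-42", "diplomat-x"), ("shell-co-7", "case-42")]

ROLE_CLEARANCE = {
    "analyst":      {"public"},
    "investigator": {"public", "restricted"},
}

def visible_subgraph(role):
    """Filter layer: drop nodes the role cannot see, and any edge touching them."""
    allowed = ROLE_CLEARANCE[role]
    keep = {n["id"] for n in nodes if n["level"] in allowed}
    kept_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return keep, kept_edges

analyst_nodes, analyst_edges = visible_subgraph("analyst")
```

As noted in the conversation, this is a stopgap: the fundamental solution belongs in the data-management layer.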

52:23

It's a hard problem. In

52:25

previous work, which is

52:27

more like knowledge graphs

52:30

being used for large-scale

52:32

manufacturing, one

52:34

of the things we ran into

52:36

is security access because you take

52:38

procurement data, plus some operations data,

52:40

plus some sales data, put it

52:42

all into a graph. Suddenly,

52:44

you have a picture of how

52:46

the company works. But it's like a

52:48

really confidential picture. It's like maybe

52:50

the board could see this, but nobody

52:52

else in the company should see

52:54

it. So there's a real power

52:56

there, but there's always a risk. And

52:59

how do you manage that is

53:01

a mind-bogglingly difficult problem. I

53:05

read a book

53:07

that talks about the

53:09

certain, like, intelligence communities,

53:11

when they go to other

53:13

countries. In the past,

53:16

you used, like, falsified

53:18

identities, but today that's

53:20

not a good idea anymore, because

53:22

of all the open-source

53:25

intelligence out there. Even if you

53:27

want to withhold

53:29

some information, people can

53:31

stitch together a picture because

53:33

of related pieces of

53:36

information sitting out there,

53:38

on social media. Like,

53:40

maybe there's a picture of

53:42

you with somebody that you

53:44

did not take a picture

53:46

did not post it but

53:48

somebody posts on Instagram and

53:50

so all that information out

53:52

there essentially is a

53:54

graph that can link back to

53:57

you, even though you try

53:59

really hard to stay hidden.

54:01

That's the

54:03

fundamental problem in terms of

54:05

privacy and security: you want

54:07

to control access to information,

54:10

but because you have all

54:12

those connections in the

54:14

graph, that makes it really

54:16

hard. And a corollary

54:18

with that, when I talk

54:20

with people in enterprise who are

54:22

doing large-scale knowledge graph practices, the

54:25

one thing that I keep

54:27

hearing over and over again is

54:29

companies using graphs for market

54:31

intelligence, or maybe sometimes you would

54:33

say competitive intelligence. But

54:36

a lot of this might be

54:38

for sales win-back strategies, trying

54:40

to understand who's the competitor that got our bid

54:42

away from us. How can we go back

54:44

and try to... give

54:46

them a better quote. Oh,

54:48

wow. And so I've heard this

54:50

over and over again. We're like, that's

54:53

one of the first graphs that

54:55

starts making a lot of money is

54:57

like literally doing intelligence inside the

54:59

enterprise. Yeah,

55:01

I was going to go down

55:03

that route of like, let's talk

55:05

about a few other cool use

55:08

cases that you have seen, whether

55:10

it's just graphs, or it

55:12

is GraphRAG, which is

55:14

a hot term these days, you

55:16

know? I

55:19

mean, you know, it's

55:21

interesting. There's a lot

55:23

of graph database vendors, and they really kind

55:25

of lean heavy on the graph query

55:27

side of how to run this. And that's

55:30

something that's very familiar with people in

55:32

data engineering, data science, you

55:34

know, using a query. But I

55:36

think in the graph space, there

55:38

are other areas that aren't query

55:40

first, like using graph algorithms, or

55:42

using... There's a whole

55:44

other area of what would be

55:46

called statistical relational learning, but, you know,

55:48

you've probably heard of like Bayesian

55:50

nets or causality or different areas over

55:53

there of using graphs. But

55:55

then there's also graph neural networks,

55:57

like how can we train deep learning

55:59

models to like understand patterns and

56:01

try to suggest, hey, I'm

56:03

looking at like all the contracts you

56:06

have with your vendors. And

56:08

I noticed that these three here are missing some

56:10

terms. Do you, you know, is that a

56:12

mistake? So I

56:14

think that, you know, there's,

56:16

there's the queries, there's the algorithms,

56:18

there's the causality kind of,

56:20

you know, that

56:25

area of, there's

56:27

also the graph neural networks. There's

56:29

a few other areas too, but these

56:31

are These are all like different camps

56:34

inside of the graph space. They don't

56:36

always necessarily talk with each other, but

56:38

I think it's really fascinating now that

56:40

we're starting to see more and more

56:42

hybrid integrations of them. Yeah.

56:46

I like to point out

56:48

that, fundamentally, graph and table

56:50

are two sides of the

56:52

same coin. As

56:54

a physicist, we

56:56

look at the sound, music,

56:58

both from the frequency domain, like

57:00

A, C, D, E,

57:02

F, what's the frequency distribution,

57:05

and also look at the

57:07

waveform, like the time domain.

57:09

In some situations you

57:11

want to filter, or you

57:13

want to access more in

57:15

the frequency domain; sometimes it

57:17

makes more sense in the

57:19

waveform domain. The same

57:21

data. Like, graph essentially

57:23

is a giant... If

57:26

you think about the

57:28

large language model neural network,

57:31

it's a graph, but

57:33

it's a gigantic,

57:35

extremely sparse matrix, which

57:38

is a table, right? And

57:40

in fact, because

57:42

it's such a giant sparse

57:44

matrix, that's causing, today, NVIDIA

57:46

to be really hot, because NVIDIA

57:48

has these GPUs that

57:50

can process those matrices. But

57:52

guess what? My

57:55

brain consumes about 19

57:57

watts of energy. The

57:59

GPUs running a large language

58:01

model consume tens of

58:03

thousands of watts of

58:05

energy to meet similar

58:08

computation needs. And

58:10

that's extremely inefficient. Even

58:12

though the computation unit is

58:14

much smaller than my neurons, you'd

58:17

think it should be able to

58:19

compute at higher efficiency. That's

58:21

precisely because they're dealing with

58:23

extremely sparse matrices. They're not dealing with

58:25

the neural network as a

58:27

graph; they're dealing with the neural network as

58:29

a matrix. And that's fundamentally

58:31

the problem for the power efficiency.
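The matrix-versus-graph distinction can be illustrated by representing the same sparse network both ways: the dense-matrix view carries n² entries even when almost all are zero, while the adjacency-list (graph) view stores and traverses only the real edges. A toy sketch:

```python
# A network with many nodes but very few connections.
n = 1000
edges = [(i, i + 1) for i in range(0, n - 1, 100)]  # 10 edges

# Dense-matrix view: n * n entries, nearly all of them zero.
dense_entries = n * n

# Graph (adjacency-list) view: storage is proportional to real edges only.
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)

# A traversal over the graph view touches 10 entries, not 1,000,000.
touched = sum(len(vs) for vs in adj.values())
```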

58:34

So there are certain models

58:36

coming up that really

58:38

deal with AI as a

58:40

graph, with several orders of magnitude savings

58:43

in energy consumption. So

58:45

in real-world applications, the

58:48

one reason why graph hasn't taken

58:50

off as we all thought, for the

58:52

past 20 years, like "oh, graph is gonna

58:54

take off, graph is gonna take off," but

58:56

no, it did not. The

58:58

fundamental problem is because

59:01

we are so familiar with

59:03

all the tools and

59:05

methodologies; the workflows are well

59:07

established in the tabular-based

59:09

way of thinking. It's

59:11

like the Department of Transportation

59:14

does not release the flight

59:17

data as a graph; they

59:19

release it as a table.

59:21

It's easy to access; we

59:23

have all the tooling,

59:25

all mature. To change that

59:27

is extremely difficult. So,

59:29

in a way, I would

59:32

argue that AI is

59:34

almost made for graph,

59:36

because AI suddenly allows

59:38

you to process unstructured information,

59:40

like emails, reports, things

59:42

like podcast transcriptions, like

59:44

videos, into a

59:46

structured form that computers can access.

59:49

But guess what? It

59:51

is a graph that AI

59:53

will convert those data into. So

59:56

now you suddenly have this. Some

59:58

people argue, I think it's like

1:00:00

80% of information existing

1:00:03

in unstructured form. Some people argue

1:00:05

that the percentage is even larger. So

1:00:08

AI suddenly makes,

1:00:10

like, the majority

1:00:12

of the information available

1:00:14

for analytic workflows and

1:00:17

assessment. And

1:00:19

the funny thing is, you need

1:00:21

graph to do that. So

1:00:23

in the way that my

1:00:25

assessment is, because of AI,

1:00:27

because of AI, we're

1:00:30

actually entering the boom,

1:00:32

like exponential growth error

1:00:34

of a graph, because

1:00:36

the availability in the data.

1:00:39

It's like the internet of

1:00:41

things. We've been waiting

1:00:43

for it to happen since

1:00:45

2010 or 2005 whenever

1:00:47

and it's always just around

1:00:49

the corner. But now

1:00:52

it does make sense that if you

1:00:54

have all of this unstructured data and

1:00:56

you have these relations, then that sounds

1:00:58

like a graph to me. Yeah.

1:01:01

And going back to

1:01:04

like 1980s era, hard

1:01:06

AI, you know, whether we're

1:01:08

talking about like A star B star

1:01:10

kind of algorithms or talking about planning systems,

1:01:12

all of these were expressed as graphs. And

1:01:15

like, you know, some of the early

1:01:17

thinking that was like pre-Google

1:01:19

that led to Google, they were talking

1:01:21

about graphs. Some of that

1:01:23

work actually came out of like groupware,

1:01:25

but based on graphs. So it's there. Funny

1:01:28

Funny you say that, because we had one of the talks at the AI Quality Conference back last year from the guy who created Docker, Solomon. And his whole talk was really like: everything's a graph. If we really break it down, it's all graphs, and how one thing relates to another thing.
1:01:50

I'll throw something else in, to go back to our early part, where we were talking about East meets West. There's a book that is a real favorite of mine from the early days. This is going back to the early 90s, the early days of neural networks, and this idea of: yeah, there are some conventions in the West, maybe we can back off them. It's by a USC professor called Bart Kosko, and it's called Fuzzy Thinking. It's sort of his critique of science, but from a lens of more Eastern perspectives. I know this book is more than 30 years old, but I think there are some really great perspectives there that weigh in a lot, especially on what Wei was saying about where we are now with LLMs and how we're leveraging this in the context of graphs.
1:02:40

So, was there anything else that you guys wanted to talk about before we jump? I know there's a lot of cool data visualization stuff that you're doing, Wei.
1:02:52

Yeah, I just want to add one thing. I just want to say that visualization is not the end; the goal is to support analytics. I know everybody, when it comes to graph, talks about graph visualization. But in my mind, what we really need is visual analytics. How can we visually transform the information? How can we visually go, step by step, from information that was suited for data management and data capture towards information that is suitable for presentation, for answering the specific questions in that particular domain? That step requires a transformation of the data that is not just a filter, but fundamentally a mutation of the graph schema. The schema you have for data capture is not the schema that is suitable for presentation. They are two different things.
1:04:01

If you think about the big data era, the development of MapReduce allowed you to have this step-by-step flow of information, from the originally captured tabular format into a very different table that you can present. In graph, it's the same thing: what graph analytics needs is a step-by-step process, call it a calculus, or operators, to transform your data from the form in which it was captured into the form you want to present to answer the question.
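Wei's step-by-step "calculus of operators" idea can be sketched in plain Python. This is a toy illustration, not any product's API: the graph is just a list of (source, relation, target) triples, and each operator takes a graph and returns a new one, so the capture schema is transformed toward a presentation schema one step at a time.

```python
# Toy graph calculus: each operator maps a list of
# (source, relation, target) triples to a new list.

def filter_rel(triples, relation):
    """Keep only edges with the given relation."""
    return [t for t in triples if t[1] == relation]

def rename_rel(triples, old, new):
    """Schema mutation: rename a relation type."""
    return [(s, new if r == old else r, d) for s, r, d in triples]

# Data as captured: people attending meetings.
captured = [
    ("paco", "attended", "meeting_1"),
    ("wei", "attended", "meeting_1"),
    ("demetrios", "hosted", "meeting_1"),
]

# Pipeline: chain operators toward the presentation schema.
presented = rename_rel(filter_rel(captured, "attended"),
                       "attended", "participated_in")
print(presented)
```

Chaining small, composable operators like this mirrors the MapReduce analogy: each stage leaves the previous form intact and emits a new one.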

1:04:39

Now, that calculus needs to be in two forms. It needs to be in a form where you can process data in large quantity, a large graph mutated step by step. But it also needs to be visual. You need a parallel set of operators for a data analyst, but ideally for a domain expert: not somebody who can write Python or Cypher queries or GQL, but somebody with the domain knowledge, who can just look at it, because graph is so visual. You're like: hey, I want to simplify this. Oh, I know Paco and Wei have so many meeting points. Let's abstract that out. Let's just create a single edge, inferring that Wei and Paco know each other, and get rid of all the other information. Or it may say: hey, Paco knows a million people. Maybe I underestimate Paco a little bit, so sorry about that. No kidding, you probably know more than that. But from the graph we can quickly compute this number, put it on the Paco node, and make Paco very, very big, because Paco knows a million people. That kind of operation is highly intuitive. So I want to stress this: visualization for graph is not the end. Visualization for graph is a tool you use to transform the graph to get you the answer. It's a waypoint.
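The two operations Wei just described, collapsing shared meeting points into a single "knows" edge and sizing a node by how many people it connects to, can be sketched in a few lines of plain Python. The data and names are invented for illustration; this is not GraphXR's actual operator set.

```python
from itertools import combinations
from collections import Counter

# Bipartite capture schema: (person, meeting) attendance pairs.
attendance = [
    ("paco", "m1"), ("wei", "m1"),
    ("paco", "m2"), ("wei", "m2"), ("ana", "m2"),
]

# Operator 1: abstract shared meetings into a single "knows" edge.
by_meeting = {}
for person, meeting in attendance:
    by_meeting.setdefault(meeting, []).append(person)

knows = set()
for people in by_meeting.values():
    for a, b in combinations(sorted(people), 2):
        knows.add((a, b))

# Operator 2: compute degree, usable as a visual node size.
degree = Counter()
for a, b in knows:
    degree[a] += 1
    degree[b] += 1

print(sorted(knows))   # [('ana', 'paco'), ('ana', 'wei'), ('paco', 'wei')]
print(degree["paco"])  # 2
```

The point of the sketch is that both the batch form (this code) and the visual form (click "abstract meeting points") should express the same operator.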

1:06:17

Very good. Yeah, that is very in line with what you were saying earlier about how, when you don't know the question, that's sometimes the hardest part. So being able to wrestle with the data in different forms, one of them being visualizing it in different ways: that's one tool to hopefully help you get to the answer, or first to the question, which can then lead to the answer you're looking for.

1:06:50

Yeah. And to mutate the graph visually, so you can start poking at it.

1:06:57

Yeah, exactly. It does feel like the ability to just mutate the graph is such a strong tool, because of all these different reasons we mentioned: the depth, the way you're able to look at the domains, or to find anomalies, or find different data quality issues, whatever your use case is. It's very cool. It does sound instinctively a bit manual, though, right?

1:07:37

So far, I think Wei has brilliant examples of what they're doing, like with GraphXR: leveraging 3D visualizations, zooming in and out, in conjunction with algorithmic ways, using graph algorithms to focus the lens, focus the searchlight. I think more can be automated over time, and maybe this is where agents come in, actually helping determine how to be the cinematographer there on the graph.

1:08:07

Yeah. So there's definitely a way of helping you look at perspectives. And very often we deal with data that is graph-connected in nature, but also dimensional. Each node has so many properties, and each property is a dimension, so it's high-dimensional information. So which dimension set do you want to take, in combination with the network information, to help you see? You want a versatile, flexible way of choosing the dimension set, because very often, when you shift from one dimension to another, you reveal some flocking of things going together, some clustering that is happening. It really says: hey, those things always move in the same direction. Those signals help you formulate a lot of ideas, instincts, from the data. And then when you see that information, the next thing you want is: hey, I want to capture that as a feature. Can you represent that as a feature, so that what you see becomes a thing, becomes an entity in your visualization that you can put back in there? That is visual analytics.

1:09:26

Whoa. So, capturing it as a feature, and then you can feed it back into the tabular data, in a way. Yes, exactly.
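That feature-capture loop can be sketched like this: a connected-components pass stands in for whatever cluster you spotted visually, and the resulting label is written back as an ordinary tabular column. This is a toy stand-in under invented data, not any specific tool's workflow.

```python
# Toy feature capture: label connected components of an undirected
# graph, then attach the label to each node as a tabular feature.

edges = [("a", "b"), ("b", "c"), ("x", "y")]
nodes = {"a", "b", "c", "x", "y"}

# Union-find to recover the clusters a viewer might spot visually.
parent = {n: n for n in nodes}

def find(n):
    while parent[n] != n:
        parent[n] = parent[parent[n]]  # path compression
        n = parent[n]
    return n

for u, v in edges:
    parent[find(u)] = find(v)

# Feed the captured feature back as a tabular column: node -> cluster id.
table = [{"node": n, "cluster": find(n)} for n in sorted(nodes)]
print(table)
```

Once the cluster label lives in the table, it behaves like any other dimension: it can be filtered on, colored by, or fed to downstream analytics.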

1:09:35

Guys, this is awesome. Is there anything else that you want to hit on before we stop? I feel like I've learned a ton just from talking to you. I knew it was going to be a great conversation; I was hanging on to my seat this whole time, like, oh my God. I learned a lot too. Yeah.

1:09:51

In terms of cross-domain, I want to share one funny example of how difficult cross-domain is, and this example is an extreme of cross-domain. I organize a kind of tech-arts nonprofit, dance and science. One thing we do every week, every Wednesday, is bring people from the engineering and science domains together with people from the dance, art, and music domains; we explore something together and have a conversation. At the very first meeting, which happened about 11 years ago, we had about 20 people sitting in the room, everybody in very vibrant conversation. And then I suddenly realized something: it is true that everybody speaks English, but nobody can understand each other. They are using the same vocabulary, but because of the domain, just like Paco talked about earlier in the enterprise setting, because of the domain difference, the words mean totally different things. When a physicist talks about energy, we have very concrete things that we call energy. What a dancer calls energy is a very different kind of energy. When computer people talk about Python, we're not talking about a snake. But the dancers, when they hear Python, they're like: why are you bringing a snake into the conversation? So, just to echo what Paco said earlier in the enterprise data context: domain is very, very important. Be aware of the domain, know the limits of the domain, and find a way to cross domains. For us, it's generally a lot of conversation. I think it's a human problem, not a technical problem. Technology can help, but only so much.

1:12:06

We had a conversation on here a few months ago with folks who had created a data analyst agent, and they said one of the hardest parts for the success of this agent was to first create a glossary of business terms for the agent, really trying to nail down these fuzzy words, the words that for one person mean one thing and for another person mean another thing. The quintessential example of this is an MQL. At one company, or on one team, an MQL is one thing, and when you go to another team, an MQL is another thing. They all mean "marketing qualified lead," but when does that person become a marketing qualified lead? What do they have to have done, or what stage are they in? The agents and the LLMs understand what an MQL is, kind of, but you really have to flesh out this glossary to let them know all of these different terms that you use and that are in your database. So when the agent needs to go and pull "how many MQLs did we have last week?", it understands what that means.

1:13:37

Yeah, that's your semantic layer right there. That's a controlled vocabulary, and if you put enough of these together, you get your ontology. Yeah, exactly.
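A controlled vocabulary like the one described above can be sketched in a few lines of Python: a glossary of business terms whose definitions get expanded into the question before it reaches the agent. The terms, definitions, and function name below are invented for illustration, not any particular agent framework's API.

```python
# Hypothetical business glossary: each team pins down its fuzzy terms.
GLOSSARY = {
    "MQL": "a lead who downloaded a whitepaper AND opened 2+ emails",
    "churned": "no login for 90 consecutive days",
}

def expand_terms(question, glossary):
    """Prepend definitions of any glossary terms the question uses,
    so a downstream agent resolves them consistently."""
    used = [t for t in glossary if t.lower() in question.lower()]
    notes = "; ".join(f"{t} means {glossary[t]}" for t in used)
    return f"[Definitions: {notes}] {question}" if used else question

prompt = expand_terms("How many MQLs did we have last week?", GLOSSARY)
print(prompt)
```

Scale this up from a flat dictionary to terms with relationships between them, and you have the controlled-vocabulary-to-ontology progression Paco describes.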
